辅导案例-PSTAT130

欢迎使用51辅导，51作业君孵化低价透明的学长辅导平台，服务保持优质，平均费用压低50%以上！ 51fudao.top

PSTAT130: SAS Base Program
Project: Actuarial Science with SAS
Claire Mouminoux
Winter 2020
Instructions: Please note that you are responsible for the work sharing. Each
group will submit two files on GauchoSpace
• Your SAS code in .txt format - Please follow the libname name, vari-
able names and database names instruction to facilitate correction.
• The report produced using the OUTPUT DELIVERY SYSTEM (ODS)
as .pdf file
The code must be appropriately commented to refer to the instruction num-
ber, and the output customized with title so that we understand which
problem you are treating. Do not forget to add your names at the top of
the code.
1 Context
You have just been hired by a Californian Insurance compagny CA SUN INSURANCE
as a statistical analyst and SAS programer in the technical department of motor insur-
ance. The company wants to renew their motor insurance price strategy for the 2020
new business 1. To do so, you will have to analyze their past data to propose an ade-
quate price regarding:
- expected costs of motor claims for the different profile of customers;
- the volume of new business (number of new contracts);
- overall expected profit generated by the 2020 new business.
In addition, you will colaborate with the marketing department which will give you
some information about customers’ behavior and their probability to underwrite a
contract regarding their quotes (i.e. the price that you will determine).
The company has high expectation from your work and you will have to generate an
easely understandable SAS report to explain your price strategy. Also, you will have to
deal with some constraints imposed by your managers and demonstrate to them that
you have followed their recommendations.
1The 2020 new business corresponds to all new insureds who will underwrite a contract for the first
time at CA SUN INSURANCE throughout 2020
1
2 All you need to know
2.1 Actuarial Science: the art of estimating future costs
2.1.1 Pure Premium: Frequency × Severity
To estimate futur claim costs for a cohort i 2, actuaries and statistical analysts use the
past data collected over years to determine two important components:
- The frequency: corresponds to the number of claims of a cohort over the number
of individuals within this cohort: Freqi , it could be interpreted as the probability to
have a claim.
- The severity: corresponds to the average cost of claims within the cohort i: Costi .
Example - Step 1:
Name Age Gender Claims Claim Cost
Pierre 18 H Yes 500
Paul 50 H Yes 800
Luke 18 H No
Claire 50 F No
Anna 50 F Yes 200
Maria 18 F Yes 400
Jude 18 F Yes 500
Peter 50 H No
Sylvia 18 F Np
Henri 18 H Yes 300
Oliver 18 H Yes 900
Suppose we define a cohort according to the following characteristics: Age and Gen-
der. In this example, we have 4 different cohorts corresponding to 4 profiles. We need
to determine:
- the number of individuals within each cohort (number of individuals having the
same profile): NB Individuals;
- the number of claims recorded within each cohort: NB Claims;
- the sum of claim costs within each cohort: Sum of Claim Costs.
2A cohort is a group of individuals having the same characteristics such as the same age and the same
car for instance. It corresponds to the profile of an individual.
2
Cohort i Age Gender NB Individuals NB Claims Sum of Claim Cost
1 18 H 4 3 1700
2 50 H 2 1 800
3 18 F 3 2 900
4 50 F 2 1 200
We are now able to compute the frequency Freqi and the severity Costi of claims for
each cohort i = {1,2,3,4}. These two elements allow us to compute the P ure P remiumi
(P Pi) for each individual given his/her profile (i.e. given the cohort i in which he/she
belongs to). The pure premium is defined by:
P Pi = Freqi ×Costi .
It corresponds to the expected average claim costs of an individual belonging to cohort
i. Indeed, it takes into account both: the probability of an individual with a given
profile to have a claim and the expected costs of the claim if any.
Cohort i Age Gender Freqi Costi P Pi
1 18 H 0.75 1700 425
2 50 H 0.5 800 400
3 18 F 0.67 900 300
4 50 F 0.5 200 100
2.1.2 The Loaded Premium: Pure Premium × (1+Loading Factor)
The costs of claims is not the only cost to take into account for an insurance com-
pany. Besides the price of repairing a car after an accident, for instance, the insurance
company has to remunerate specialists (such as actuaries, accident experts, accounting
officers,...): theses costs are called the handling costs.
In general, we consider these costs as proportional to the expected average costs of
claims of an insured (i.e. the pure premium). In other words, policyholders who have
more claims or/and more costly claims generate higher handling costs (i.e. need of an
expert, take more processing time, ...). The Loaded P remiumi (LPi) for each individual
belonging to a cohort i is equal to:
LPi = P Pi × (1 +λ)
where λ > 0 is the Loading Factor. The loading factor is usually given by the account-
ing department of the insurance company.
Example - Step 2:
Let’s return to our example. We have already determined the P Pi for each profile/cohort
i. Suppose that the accounting department of the insurance company gives you the fol-
lowing information:
3
For a claim cost of $500 we also have additional handling costs of $25.
We thus estimate the loading factor at 5% = 25/500
We are now able to estimate the future costs (including claim costs and handling costs)
for each profile.
Cohort i Age Gender P Pi (1 +λ) LPi
1 18 H 425 1.05 446
2 50 H 400 1.05 420
3 18 F 300 1.05 315
4 50 F 100 1.05 105
2.2 The Insurance Price Strategy
2.2.1 Insurance Price: Loaded Premium × (1+Margin Rate)
The Insurance P ricei (Pi) is the final price proposed to a new prospect 3. To make prof-
its, the insurance company cannot define: Insurance Price = Loaded Premium. They
need to generate a margin on each contract and will define a price strategy by defining
a Margin Rate.
Therefore, the Insurance P ricei (Pi) is :
Pi = LPi × (1 + β).
where β > 0 is the Margin Rate.
Example - Step 3:
Let’s return to our example! Suppose that the managers are hesitating between two
price strategies:
- Strat A: βA = 0.01
- Strat B: βB = 0.05
We first compute the Insurance Price for each profile i and each strategy A or B:
Cohort i Age Gender LPi Pi,A = LPi × (1 + βA) Pi,B = LPi × (1 + βB)
1 18 H 446 451 469
2 50 H 420 424 441
3 18 F 315 318 331
4 50 F 105 106 110
3A prospect is a customer who asks for a quote, but who is not already a client/insured of the com-
pany. He/She will decide to become a client if he/she agrees with the Insurance Price.
4
2.2.2 Determining the Best Strategy
At this stage, we are not able to determine which of A or B price strategy is the best. A
highest price means a largest margin BUT a highest price also means a lowest proba-
bility that a prospect (the customer who receives the quote) accepts it and becomes a
new client!
We need more information about customers’ behaviors and that’s a good news since
an entire department is studying that! Indeed, the quantitative marketing departe-
ment of an insurance company studies the behavior of prospects with respect to an
insurance price quote. In other words, they compute the probability of a prospect to
underwrite the insurance contract given its price.
To do so, they generally use a logistic regression where :
P rob(Accept|Pj) = 11 + exp(f (Pj))
Here P rob(Accept|Pj) is the probability that a prospect j receiving an insurance price
Pj accepts it and becomes a new clients: a New Business contract.
To determine the best strategy you will need two elements from the marketing depart-
ment:
- The function f (Pj) that they have estimated;
- A database of prospects to compute the effect of your two insurance price strategies.
Example - Step 4:
The marketing department gives you the following information.
Please find below our estimated function and the database of prospect clients for 2020
P rob(Accept|Pj) = 11 + exp(f (Pj)) with f (Pj) = −0.45×
LPj
Pj
+ 0.001× Pj
we consider that a prospect becomes a new business client if P rob(Accept|Pj) > 0.5.
5
Prospect_ID Age Gender
1 18 F
3 18 F
5 18 F
6 18 F
7 18 F
12 50 F
14 50 F
2 18 H
4 18 H
8 18 H
9 18 H
10 50 H
11 50 H
13 50 H
15 50 H
Now we have to determine:
- what price the prospect will receive depending on each strategy A or B;
- what is the probability that a prospect accepts depending on each strategy A or B and
the information given by the marketing department ;
- who accepts depending on each strategy A or B and the rule: acceptance if P rob(Accept|Pj) >
0.5.
Prospect_ID Age Gender LP P_A P_B P(Accept |P_A) P(Accept |P_B) Accept | P_A Accept | P_B
1 18 F 315 318 331 0.532 0.524 Yes Yes
3 18 F 315 318 331 0.532 0.524 Yes Yes
5 18 F 315 318 331 0.532 0.524 Yes Yes
6 18 F 315 318 331 0.532 0.524 Yes Yes
7 18 F 315 318 331 0.532 0.524 Yes Yes
12 50 F 105 106 110 0.584 0.579 Yes Yes
14 50 F 105 106 110 0.584 0.579 Yes Yes
2 18 H 446 451 469 0.499 0.490 No No
4 18 H 446 451 469 0.499 0.490 No No
8 18 H 446 451 469 0.499 0.490 No No
9 18 H 446 451 469 0.499 0.490 No No
10 50 H 420 424 441 0.505 0.497 Yes No
11 50 H 420 424 441 0.505 0.497 Yes No
13 50 H 420 424 441 0.505 0.497 Yes No
15 50 H 420 424 441 0.505 0.497 Yes No
Now we are able to determine the best strategy to adopt and the insurance price to
offer depending on the prospect customer profile.
The expected profit is equal to:
Π(A) =
j=15∑
j=1
(Pj,A −LPj)×1j,A, for strategy A
6
or
Π(B) =
j=15∑
j=1
(Pj,B −LPj)×1j,B, for strategy B.
Here 1j,A = 1 if the prospect j accepts the insurance price PA and 0 otherwise. Simi-
larly, 1j,B = 1 if the prospect j accepts the insurance price PB and 0 otherwise.
Prospect_ID LPj Pj,A Pj,B Pj,A −LPj Pj,B −LPj 1j,A 1j,B (Pj,A −LPj )× 1j,A (Pj,B −LPj )× 1j,B
1 315 318 331 3 13 1 1 3 13
3 315 318 331 3 13 1 1 3 13
5 315 318 331 3 13 1 1 3 13
6 315 318 331 3 13 1 1 3 13
7 315 318 331 3 13 1 1 3 13
12 105 106 110 1 4 1 1 1 4
14 105 106 110 1 4 1 1 1 4
2 446 451 469 5 18 0 0 0 0
4 446 451 469 5 18 0 0 0 0
8 446 451 469 5 18 0 0 0 0
9 446 451 469 5 18 0 0 0 0
10 420 424 441 4 17 1 0 4 0
11 420 424 441 4 17 1 0 4 0
13 420 424 441 4 17 1 0 4 0
15 420 424 441 4 17 1 0 4 0
Sum 11 7 33 73
The best insurance price strategy is PB with a total profit of $73 and a new business
volume of 7.
2.2.3 The Final Price table
We determined the best price strategy to adopt to maximise the profit regarding to
the risk of each profile, under the constraints and recommandations of managers and
the help of other departments in the insurance company. Besides a report to sum
up your statistical analysis, your work ouput (a datatable) is the final insurance price
table depending on each profile (i.e. combination of prospects’ characteristics). This
insurance price table will be used to determine which price to propose to each new
prospects.
Example - Step 5:
In our exemple the final output correspond to the table bellow:
Age Gender Final Price
18 H 469
50 H 441
18 F 331
50 F 110
7
2.3 Others comments
Of course an insurance company has more data and this is the reason why SAS is es-
sential to manage big database. The following sections of this document will explain
you, step by step, how to do it using SAS!
Good luck and Enjoy!
3 Step by Step Project Instruction
3.1 Computation of future costs
In this section you will compute the frequency and severity for the different profile
based on previous claims data of the company. As a reminder, for each new database
it is important to use the proc contents procedure to have database information.
Since 2015, the company keep all the information about the insureds in their portefolio
and the claims that occured. You have at your disposal two database from the historical
data of the company:
• ptf.sas7bdat: corresponds to the portfolio of clients of the company from 2015 to
2019 and their characteristics: cars type, birthdate, zip code, their policyholder
id and the ongoing contract starting date (i.e. a policyholder can be present sev-
eral time in the portfolio database if it stay several years in the insurance com-
pany but will have a different ongoing contract date).
• claims.sas7bdat: corresponds to each claim registered by the company from
2015 to 2019 and includes: the policyholder id, the ongoing contract date and
the cost of the claim.
3.1.1 Cohorts’ profile
The company want to create cohorts according to the following information:
- age: the age of the client with the following level : age ≤ 21: "-21", 21 < age ≤ 35:
"21-35", 35 < age ≤ 60: "35-60" and 60 < age: "+60";
- hp: the horsepower of the car with the following level: horsepower ≤ 150: "low",
150 < horsepower ≤ 300: "medium", 300 < horsepower : "high";
- density: the density of the ZIP code where is living with the following level: population ≤
4000: "low", 4000 < population ≤ 30000: "medium", 30000 < population : "high";
Therefore, a profile correspond to the combinaison of these three variables. All in
all there are 36 different profiles: 4(age)× 3(hp)× 3(density) = 36.
Instruction 1: Import the ptf.sas7bdat database using the LIBNAME procedure with
"data" as the name of the path. Using the information of the data.ptf database, compute
the age of each policyholder and create the character variable ’age’ (with the informat
8
$15.) according to levels described before. The output should contain all the previous
variable already exisiting in data.ptf database and the new variable age. The output is
work.ptf: it is a database stored in the temporary library "work" and named ptf.
Hint: Be carefull, for each row we want to have the age of the policyholder with respect
to the date when the contract was ongoing. The if statement if the simplest way to
create the variable with the different level of age.
In addition, insurers often use external data to improve their statistical analysis. CA
SUN INSURANCE, uses two different external database:
- cars.csv: each row includes a cars type ("id" column) and its different features (in-
cluding horsepower).
- CA_ZIP_CODE.TXT: each row includes a californian zip code ("zip_code" column)
and the corresponding population ("population" column).
Instruction 2: Import the cars.csv in sas as work.cars using Proc import procedure
and create the character variable "hp" in the work.cars database (with the informat
$15.) according to levels decribed before. Merge it with the work.ptf database accord-
ing to the "id_cars" variable. The output is again the work.ptf database.
Instruction 3: Import the CA_ZIP_CODE.TXT in sas as work.CA_ZIP_CODE using
Proc import procedure and create the character variable "density" in work.cars (with
the informat $15.) according to levels decribed before. Merge it with the work.ptf
database according to the "Zip_code" variable. The output is again the work.ptf database.
For each row of the ptf database you have now the three characteristics variables in
order to define the policyholder profile.
3.1.2 Computation of the loaded premium
You have to determine the profile of policyholders having a claim.
Instruction 4: Import the claims.sas7bdat database using the LIBNAME procedure with
"data" as the name of the path. Merge the data.claims and the work.ptf by "policy-
holder_id" and "policy_starting_date" to recover the characterics variable of each pol-
icyholder having a claim for each year of contract. The ouput should be a database
named "claims" and stored in the temporary library work.
Hint: Be carefull we want to have in the output database only the rows present on the
claims database.
We now want to know the number of claims and the average cost of claim (severity)
for each profile and for each contract year.
Instruction 5: Create a variable "year" in the work.claims database corresponding to
the year of the ongoing contract and create a database work.claims_summary includ-
ing the number of claims and the average cost of claims for each year and profile. The
average cost of claim variable should be named "cost" and the number of claims vari-
able should be named "nb_claims".
9
Hint: the PROC MEANS procedure is the easiest way to do it (do not forget the output
out statement).
You need to know the number of policyholder on in each cohort and for each year
in order to determine the frequency of claim for each profile and for each year.
Instruction 6: Create a variable "year" in the work.ptf database corresponding to the
year of the ongoing contract and create a database work.ptf_summary including the
number of policyholders for each year and profile. The number of policyholder vari-
able should be named "nb".
Hint: the PROC MEANS procedure is the simplest way to do it (do not forget the output
out statement).
Now you have to compute the frequency of claim for each year and profile.
Instruction 7: Create a work.summary database corresponding to the merge between
work.claims_summary and work.ptf_summary by "age", "density", "population" and
"year" and compute the frequency of claims for each years and profile using variables
"nb_claims" and "nb". The frequency variable is named "freq".
For now, you have analyzed the frequency and the average cost of claim for each profile
and for each year of contract. Indeed, to define frequency to consider (over the years)
for the computation of the pure premium, the manager asks you to be "conservative"
and keep for each profile, the year where the frequency is the highest.
Instruction 8: Create a work.freq database corresponding to the highest frequency
for each profile. Keep on this database only the "age", "density", "hp" and "freq" vari-
ables.
Hint: the PROC SORT procedure is the simplest way to do it (do not forget the nodupkey
and be carefull on the order of by variables).
Similarly, to define averale claim costs to consider (over the years) for the computa-
tion of the pure premium, the manager asks you to be "conservative" and keep for each
profile, the year where the cost is the highest.
Instruction 9: Create a work.cost database corresponding to the highest average cost
of claims for each profile. Keep on this database only the "age", "density", "hp" and
"cost" variables.
Hint: the PROC SORT procedure is the simplest way to do it (do not forget the nodupkey
and be carefull on the order of by variables).
Now you can easely determine the pure premium since you have for each profile the
cost and frequency of claim.
Instruction 10: Create a work.pp database corresponding to the merge of the work.cost
and work.freq database by "age", "density" and "hp" variables. Create a new variable
"pp" corresponding to the pure premium.
Hint: the "pp" database should have 36 observations corresponding to the 36 possible
10
profiles.
Instruction 11: Create a new variable "lp" in the work.pp database corresponding to
the loaded premium according to the following information given by the accounting
department:
For a claim cost of $500 we also have additional handling costs of $25.
We thus estimate the loading factor at 5% = 25/500
3.2 Definition of the Best Insurance Price Strategy
Your managers want to use one of the following strategies:
- Strategy A: loading factor equal to 5%;
- Strategy B: loading factor equal to 10%;
- Strategy C: loading factor equal to 15%;
and they need you to determine which want generate the highest profit.
3.2.1 Define the quote received for each strategy and for each prospect
Instruction 12: Create a new database work.price using the work.pp database. In this
database keep only the "age", "density", "hp", "lp" variables and create three new vari-
ables "pA", "pB", "pC", corresponding respectively to the price of strategy A, B and C .
From the example at the beginning of this document, we know that the price is not
sufficient to determine the profit generated by a stratgegy. You receive the following
information from the marketing department:
Please find below our estimated function and the database of prospect clients for 2020
(Prospect.csv)
P rob(Accept|Pj) = 11 + exp(f (Pj)) with f (Pj) = −0.1×
LPj
Pj
+ 0.002× (Pj −LPj)
we consider that a prospect becomes a new business client if P rob(Accept|Pj) > 0.5.
Instruction 13: Import the Prospect.csv database given by the marketing department
as work.prospect using Proc import procedure Similarly to Instruction 1, 2 and 3, re-
covered the three characteristics to determine the profile of each prospect. The output
should be a database stored in the temporary work library named prospect.
Hint: the work.cars and work.CA_ZIP_CODE database are already existing.
Instruction 14: Merge the work.prospect database with the work.price database by
"age", "density" and "hp" in order to determine the price that they will receive for each
strategy A, B and C. The output should be the work.prospect database.
11
3.2.2 Compute the profit and volume generated by each strategy
Instruction 15: Create for each row three new variable "prob_a", "prob_b" and "prob_c"
in work.prospect database, corresponding to the probability that the prospect receiv-
ing respectively the price "pa", "pb" and "pc" accepts the offer.
Hint: of course you need to take into account the information given by the marketing
department.
Instruction 16: Create for each row three new variables "accept_a", "accept_b" and
"accept_c" in work.prospect database equal to 1 if the prospect accepts respectively
the price "pa", "pb" and "pc" and 0 otherwise.
Hint: the if statement is the simplest way to do it.
Instruction 17: Create a new database work.table_volume corresponding to the vol-
ume of new business for each strategy.
Hint: the Proc MEANS is the simplest way to do it.
Instruction 18: Compute the profit for each row of the work.prospect database and
call the variable respectively "pi_a", "pi_b" and "pi_c" in the work.prospect database
for strategy a, b and c.
Instruction 19: Create a new database work.table_profit corresponding to the profit
for each strategy.
Hint: the Proc MEANS is the simplest way to do it.
You are now able to determine the best strategy to implement to maximise the
profit for the new business in 2020 of this motor insurance company!
3.3 Create a report
You job is almost finished but you need to provide the key elements of your reasoning
and the final output (i.e. the table of the insurance price by profile that you recom-
mand in order to optimize the new business profit).
Your manager ask you to create a pdf report with the following information (in this
order):
• On the top of each page your name and ucsb email adress
• A summary table with the average claim frequency (used in the pp) by age of the
current portfolio
• A summary table with the average claim frequency (used in the pp) by density of
the current portfolio
• A summary table with the average claim frequency (used in the pp) by horse
power of the current portfolio
• A summary table with the average claim costs (used in the pp) by age of the
current portfolio
12
• A summary table with the average claim costs (used in the pp) by density of the
current portfolio
• A summary table with the average claim costs (used in the pp) by horse power of
the current portfolio
• A summary table with the average pure premium of the current portfolio
• The profile with the lowest pure premium
• The profile with the highest pure premium
• A summary table with the volume of new business of each strategy
• A summary table with the total profit of each strategy
• The final pricing strategy table
Instruction 20: Create a pdf report using ODS procedure named "final_report_lastnames.pdf".
To avoid to have one page for each output (corresponding to each procedure in the STATEMENT>) and to add your name at the top of each page use the following ODS op-
tions:
options nodate pdfpageview=FITPAGE;
ods NOPROCTITLE;
ODS PDF style=journal File="...final_report_lastname.pdf"
startpage=no;
title1 "Homework Project PSTAT130";
title2 "Your names;
title3 "Your email adresses";
ods layout gridded;

ods layout end;
ODS PDF CLOSE;
This ODS procedure can generate a warning in the log windows : "WARNING: Layout
is not supported in the RTF(WEB) destination. Layout will be ignored." do
not take it into account. Complete the with the following instruc-
tions.
Instruction 21: Create a work.ptf_pp database by merging the work.ptf database and
the work.pp database previously created. Keep only the current portefolio (i.e. the
policy ongoing date equal to 2019).
Instruction 22: Using Proc mean procedure on the work.ptf_pp database, display
three consecutive tables for the average claim frequency by age, density and horse-
power.
13
Hint: do not forget to add a title to each procedure
Instruction 23: Using Proc mean procedure on the work.ptf_pp database, display
three consecutive tables for the average claim costs by age, density and horsepower.
Hint: do not forget to add a title to each procedure
Instruction 24: Using Proc mean procedure on the work.ptf_pp database, display the
average pure premium of the current portfolio.
Instruction 25: Display only the three variables of the profile with the lowest pure
premium.
Hint: Use Proc sort procedure and proc print procedure with obs option.
Instruction 26: Display only the three variables of the profile with the highest pure
premium.
Hint: Use Proc sort procedure and proc print procedure with obs option.
Instruction 27: Using the database work.table_volume previsouly created , display
the volume of new business generated by each strategy. Add a temporary label to re-
place the name column with Strategy A, Strategy B and Strategy C.
Hint: Use Proc print procedure
Instruction 28: Using the database work.table_profit previsouly created , display the
profit generated by each strategy. Add a temporary label to replace the name column
with Strategy A, Strategy B and Strategy C. Add a footnote to give the best strategy
chosen.
Hint: Use Proc print procedure
Instruction 29: Create a database work.final_price using "work.price database. Keep
only the characteristics variables used to determine profiles and rename the price strat-
egy selected by "best_price". Print this database without the observation index .
14