Predictive Modeling for Property-Casualty Insurance



  1. 1. Predictive Modeling for Property-Casualty Insurance James Guszcza, FCAS, MAAA Peter Wu, FCAS, MAAA SoCal Actuarial Club LAX September 22, 2004
  2. 2. Predictive Modeling: 3 Levels of Discussion <ul><li>Strategy </li></ul><ul><ul><ul><li>Profitable growth </li></ul></ul></ul><ul><ul><ul><li>Retain most profitable policyholders </li></ul></ul></ul><ul><li>Methodology </li></ul><ul><ul><ul><li>Model design (actuarial) </li></ul></ul></ul><ul><ul><ul><li>Modeling process </li></ul></ul></ul><ul><li>Technique </li></ul><ul><ul><ul><li>GLM vs. decision trees vs. neural nets… </li></ul></ul></ul>
  3. 3. Methodology vs. Technique <ul><li>Why does data mining need actuarial science? </li></ul><ul><ul><ul><li>Variable creation </li></ul></ul></ul><ul><ul><ul><li>Model design </li></ul></ul></ul><ul><ul><ul><li>Model evaluation </li></ul></ul></ul><ul><li>Why does actuarial science need data mining? </li></ul><ul><ul><ul><li>Advances in computing, modeling techniques </li></ul></ul></ul><ul><ul><ul><li>Ideas from other fields can be applied to insurance problems </li></ul></ul></ul>
  4. 4. Semantics: DM vs PM <ul><li>One connotation: Data Mining (DM) is about knowledge discovery in large industrial databases </li></ul><ul><ul><ul><li>Data exploration techniques (some brute force) </li></ul></ul></ul><ul><ul><ul><li>e.g. discover strength of credit variables </li></ul></ul></ul><ul><li>Predictive Modeling (PM) applies statistical techniques (like regression) after knowledge discovery phase is completed. </li></ul><ul><ul><ul><li>Quantify & synthesize relationships found during knowledge discovery </li></ul></ul></ul><ul><ul><ul><li>e.g. build a credit model </li></ul></ul></ul>
  5. 5. Strategy: Why do Data Mining? Think Baseball!
  6. 6. Bay Area Baseball <ul><li>In 1999 Billy Beane (general manager of the Oakland Athletics) found a novel use of data mining. </li></ul><ul><li>Not a wealthy team </li></ul><ul><ul><ul><li>Ranked 12th (out of 14) in payroll </li></ul></ul></ul><ul><ul><ul><li>How to compete with rich teams? </li></ul></ul></ul><ul><li>Beane hired a statistics whiz to analyze statistics advocated by baseball guru Bill James </li></ul><ul><li>Beane was able to hire excellent players undervalued by the market. </li></ul><ul><li>A year after Beane took over, the A’s ranked 2nd! </li></ul>
  7. 7. Implication <ul><li>Beane quantified how well a player would do. </li></ul><ul><ul><li>Not perfectly, just better than his peers </li></ul></ul><ul><li>Implication: </li></ul><ul><ul><li>Be on the lookout for fields where an expert is required to reach a decision based on judgmentally synthesizing quantifiable information across many dimensions . </li></ul></ul><ul><ul><li>(sound like insurance underwriting?) </li></ul></ul><ul><ul><li>Maybe a predictive model can beat the pro. </li></ul></ul>
  8. 8. Example <ul><li>Who is worse?... And by how much? </li></ul><ul><ul><ul><li>20 y.o. driver with 1 minor violation who pays his bills on time and was written by your best agent </li></ul></ul></ul><ul><ul><ul><li>Mature driver with a recent accident and has paid his bills late a few times </li></ul></ul></ul><ul><li>Unlike the human, the algorithm knows how much weight to give each dimension… </li></ul><ul><li>Classic PM strategy: build underwriting models to achieve profitable growth . </li></ul>
  9. 9. Keeping Score (insurance ↔ baseball) <ul><li>You! (or people on your team) ↔ Billy Beane’s number cruncher </li></ul><ul><li>Predictive variables – old or new (e.g. credit) ↔ Bill James’ stats </li></ul><ul><li>Potential insured ↔ Potential team member </li></ul><ul><li>Underwriter ↔ Beane’s scouts </li></ul><ul><li>CEO who wants to run the next Progressive ↔ Billy Beane </li></ul>
  10. 10. What is Predictive Modeling?
  11. 11. Three Concepts <ul><li>Scoring engines </li></ul><ul><ul><ul><li>A “predictive model” by any other name… </li></ul></ul></ul><ul><li>Lift curves </li></ul><ul><ul><ul><li>How much worse than average are the policies with the worst scores? </li></ul></ul></ul><ul><li>Out-of-sample tests </li></ul><ul><ul><ul><li>How well will the model work in the real world? </li></ul></ul></ul><ul><ul><ul><li>Unbiased estimate of predictive power </li></ul></ul></ul>
  12. 12. Classic Application: Scoring Engines <ul><li>Scoring engine : formula that classifies or separates policies (or risks, accounts, agents…) into </li></ul><ul><ul><ul><li>profitable vs. unprofitable </li></ul></ul></ul><ul><ul><ul><li>Retaining vs. non-retaining… </li></ul></ul></ul><ul><li>(Non-)Linear equation f ( ) of several predictive variables </li></ul><ul><li>Produces continuous range of scores </li></ul><ul><li>score = f ( X 1 , X 2 , …, X N ) </li></ul>
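A scoring engine of the form score = f(X1, X2, …, XN) can be sketched in a few lines. This is an illustration only: the variable names and weights below are made up, and a real engine's f( ) and coefficients come out of the fitted model.

```python
import math

def score(policy, weights, intercept=0.0):
    """score = f(X1, ..., XN): a linear combination of predictive
    variables pushed through a logistic link to give a score in (0, 1)."""
    linear = intercept + sum(w * policy[name] for name, w in weights.items())
    return 1.0 / (1.0 + math.exp(-linear))

# Hypothetical variables and weights, for illustration only.
weights = {"driver_age": -0.02, "prior_claims": 0.8, "late_payments": 0.3}
s = score({"driver_age": 20, "prior_claims": 1, "late_payments": 0}, weights)
```

The continuous output is what lets the engine rank a whole book of business rather than just flag individual risks.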
  13. 13. What “Powers” a Scoring Engine? <ul><li>Scoring Engine: </li></ul><ul><li>score = f ( X 1 , X 2 , …, X N ) </li></ul><ul><li>The X 1 , X 2 ,…, X N are as important as the f ( ) ! </li></ul><ul><ul><ul><li>Why actuarial expertise is necessary </li></ul></ul></ul><ul><li>A large part of the modeling process consists of variable creation and selection </li></ul><ul><ul><ul><li>Usually possible to generate 100’s of variables </li></ul></ul></ul><ul><ul><ul><li>Steepest part of the learning curve </li></ul></ul></ul>
  14. 14. Model Evaluation: Lift Curves <ul><li>Sort data by score </li></ul><ul><li>Break the dataset into 10 equal pieces </li></ul><ul><ul><ul><li>Best “decile”: lowest score → lowest LR </li></ul></ul></ul><ul><ul><ul><li>Worst “decile”: highest score → highest LR </li></ul></ul></ul><ul><ul><ul><li>Difference: “Lift” </li></ul></ul></ul><ul><li>Lift = segmentation power </li></ul><ul><li>Lift → ROI of the modeling project </li></ul>
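The decile calculation on this slide can be sketched directly. The record layout (score, losses, premium) and the toy book of policies are illustrative.

```python
def decile_loss_ratios(records):
    """records: (score, losses, premium) tuples. Sort by score, cut into
    10 equal pieces, and compute each decile's loss ratio."""
    ranked = sorted(records, key=lambda r: r[0])
    n = len(ranked)
    ratios = []
    for d in range(10):
        piece = ranked[d * n // 10:(d + 1) * n // 10]
        ratios.append(sum(r[1] for r in piece) / sum(r[2] for r in piece))
    return ratios  # ratios[0] = best decile, ratios[9] = worst

# Toy book of 20 policies whose losses happen to track the score exactly
book = [(i, 10.0 * i, 100.0) for i in range(1, 21)]
ratios = decile_loss_ratios(book)
lift = ratios[9] - ratios[0]  # worst-decile LR minus best-decile LR
```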
  15. 15. Out-of-Sample Testing <ul><li>Randomly divide data into 3 pieces </li></ul><ul><ul><ul><li>Training data, Test data, Validation data </li></ul></ul></ul><ul><li>Use Training data to fit models </li></ul><ul><li>Score the Test data to create a lift curve </li></ul><ul><ul><ul><li>Perform the train/test steps iteratively until you have a model you’re happy with </li></ul></ul></ul><ul><ul><ul><li>During this iterative phase, validation data is set aside in a “lock box” </li></ul></ul></ul><ul><li>Once model has been finalized, score the Validation data and produce a lift curve </li></ul><ul><ul><ul><li>Unbiased estimate of future performance </li></ul></ul></ul>
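A minimal sketch of the three-way split; the 50/25/25 proportions are an illustrative choice, not a prescription from the slides.

```python
import random

def three_way_split(data, seed=0, fractions=(0.5, 0.25)):
    """Randomly divide data into training, test, and validation pieces.
    The validation piece stays in a 'lock box' until the model is final."""
    rng = random.Random(seed)  # fixed seed => reproducible split
    shuffled = list(data)
    rng.shuffle(shuffled)
    n_train = int(fractions[0] * len(shuffled))
    n_test = int(fractions[1] * len(shuffled))
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_test],
            shuffled[n_train + n_test:])

# Illustrative 50/25/25 split of 100 policy records
train, test, validation = three_way_split(range(100), seed=42)
```

Because the validation piece is never touched during the iterative train/test phase, its lift curve is an honest estimate of how the model will do on new business.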
  16. 16. Comparison of Techniques <ul><li>Models built to detect whether an email message is really spam. </li></ul><ul><li>“Gains charts” from several models → </li></ul><ul><ul><ul><li>Analogous to lift curves </li></ul></ul></ul><ul><ul><ul><li>Good for binary target </li></ul></ul></ul><ul><li>All techniques work OK! </li></ul><ul><ul><ul><li>Good variable creation at least as important as modeling technique. </li></ul></ul></ul>
  17. 17. Credit Scoring is an Example <ul><li>All of these concepts apply to Credit Scoring </li></ul><ul><ul><ul><li>Knowledge discovery in databases (KDD) </li></ul></ul></ul><ul><ul><ul><li>Scoring engine </li></ul></ul></ul><ul><ul><ul><li>Lift curve evaluation → LR improvement → ROI </li></ul></ul></ul><ul><ul><ul><li>Blind-test validation </li></ul></ul></ul><ul><li>Credit scoring has been the insurance industry’s segue into data mining </li></ul>
  18. 18. Applications Beyond Credit <ul><li>The classic: Profitability Scoring Model </li></ul><ul><ul><ul><li>Underwriting/Pricing applications </li></ul></ul></ul><ul><li>Retention models </li></ul><ul><li>Elasticity models </li></ul><ul><li>Cross-sell models </li></ul><ul><li>Lifetime Value models </li></ul><ul><li>Agent/agency monitoring </li></ul><ul><li>Target marketing </li></ul><ul><li>Fraud detection </li></ul><ul><li>Customer segmentation </li></ul><ul><ul><ul><li>no target variable (“unsupervised learning”) </li></ul></ul></ul>
  19. 19. Data Sources <ul><li>Company’s internal data </li></ul><ul><ul><li>Policy-level records </li></ul></ul><ul><ul><li>Loss & premium transactions </li></ul></ul><ul><ul><li>Agent database </li></ul></ul><ul><ul><li>Billing </li></ul></ul><ul><ul><li>VIN… </li></ul></ul><ul><li>Externally purchased data </li></ul><ul><ul><li>Credit </li></ul></ul><ul><ul><li>CLUE </li></ul></ul><ul><ul><li>MVR </li></ul></ul><ul><ul><li>Census </li></ul></ul><ul><ul><li>… </li></ul></ul>
  20. 20. The Predictive Modeling Process Early: Variable Creation Middle: Data Exploration & Modeling Late: Analysis & Implementation
  21. 21. Variable Creation <ul><li>Research possible data sources </li></ul><ul><li>Extract/purchase data </li></ul><ul><li>Check data for quality (QA) </li></ul><ul><ul><ul><li>Messy! (still deep in the mines) </li></ul></ul></ul><ul><li>Create Predictive and Target Variables </li></ul><ul><ul><ul><li>Opportunity to quantify tribal wisdom </li></ul></ul></ul><ul><ul><ul><li>…and come up with new ideas </li></ul></ul></ul><ul><ul><ul><li>Can be a very big task! </li></ul></ul></ul><ul><li>Steepest part of the learning curve </li></ul>
  22. 22. Types of Predictive Variables <ul><li>Behavioral </li></ul><ul><ul><ul><li>Historical Claim, billing, credit … </li></ul></ul></ul><ul><li>Policyholder </li></ul><ul><ul><ul><li>Age/Gender, # employees … </li></ul></ul></ul><ul><li>Policy specifics </li></ul><ul><ul><ul><li>Vehicle age, Construction Type … </li></ul></ul></ul><ul><li>Territorial </li></ul><ul><ul><ul><li>Census, Weather … </li></ul></ul></ul>
  23. 23. Data Exploration & Variable Transformation <ul><li>1-way analyses of predictive variables </li></ul><ul><li>Exploratory Data Analysis (EDA) </li></ul><ul><li>Data Visualization </li></ul><ul><li>Use EDA to cap / transform predictive variables </li></ul><ul><ul><ul><li>Extreme values </li></ul></ul></ul><ul><ul><ul><li>Missing values </li></ul></ul></ul><ul><ul><ul><li>… etc </li></ul></ul></ul>
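The capping and missing-value handling from the EDA step can be sketched as follows; the percentile cutoffs and median fill are illustrative choices.

```python
def cap_and_fill(values, lower_pct=0.01, upper_pct=0.99, fill=None):
    """Cap extreme values at chosen percentiles and fill missing (None)
    entries, defaulting the fill to the median of the observed values."""
    present = sorted(v for v in values if v is not None)
    lo = present[int(lower_pct * (len(present) - 1))]
    hi = present[int(upper_pct * (len(present) - 1))]
    if fill is None:
        fill = present[len(present) // 2]  # median as default fill
    return [min(max(v, lo), hi) if v is not None else fill for v in values]

# The extreme 1000 is capped and the missing entry is filled
cleaned = cap_and_fill([1, 2, 3, None, 1000])
```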
  24. 24. Multivariate Modeling <ul><li>Examine correlations among the variables </li></ul><ul><li>Weed out redundant, weak, poorly distributed variables </li></ul><ul><li>Model design </li></ul><ul><li>Build candidate models </li></ul><ul><ul><ul><li>Regression/GLM </li></ul></ul></ul><ul><ul><ul><li>Decision Trees/MARS </li></ul></ul></ul><ul><ul><ul><li>Neural Networks </li></ul></ul></ul><ul><li>Select final model </li></ul>
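The correlation screen for weeding out redundant variables can be sketched in a few lines; the 0.95 threshold is an illustrative cutoff.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def drop_redundant(variables, threshold=0.95):
    """variables: dict of name -> list of values. Walk the variables in
    order, keeping only the first of any highly correlated pair."""
    kept = []
    for name in variables:
        if all(abs(pearson(variables[name], variables[k])) < threshold
               for k in kept):
            kept.append(name)
    return kept

# "b" is a perfect multiple of "a", so it is weeded out
survivors = drop_redundant(
    {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [1, -1, 2, -2]})
```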
  25. 25. Building the Model <ul><li>Pare down the collection of predictive variables to a manageable set </li></ul><ul><li>Iterative process </li></ul><ul><ul><li>Build candidate models on “training data” </li></ul></ul><ul><ul><li>Evaluate on “test data” </li></ul></ul><ul><ul><li>Many things to tweak </li></ul></ul><ul><ul><ul><li>Different target variables </li></ul></ul></ul><ul><ul><ul><li>Different predictive variables </li></ul></ul></ul><ul><ul><ul><li>Different modeling techniques </li></ul></ul></ul><ul><ul><ul><li># NN nodes, hidden layers; tree splitting rules… </li></ul></ul></ul>
  26. 26. Considerations <ul><li>Do signs/magnitudes of parameters make sense? Statistically significant? </li></ul><ul><li>Is the model biased for/against certain types of policies? States? Policy sizes? ... </li></ul><ul><li>Does predictive power hold up for large policies? </li></ul><ul><li>Continuity </li></ul><ul><ul><ul><li>Are there small changes in input values that might produce large swings in scores? </li></ul></ul></ul><ul><ul><ul><li>Make sure that an agent can’t game the system </li></ul></ul></ul>
  27. 27. Model Analysis & Implementation <ul><li>Perform model analytics </li></ul><ul><ul><ul><li>Necessary for client to gain comfort with the model </li></ul></ul></ul><ul><li>Calibrate Models </li></ul><ul><ul><ul><li>Create user-friendly “scale” – client dictates </li></ul></ul></ul><ul><li>Implement models </li></ul><ul><ul><ul><li>Programming skills are critical here </li></ul></ul></ul><ul><li>Monitor performance </li></ul><ul><ul><ul><li>Distribution of scores over time, predictiveness, usage of model... </li></ul></ul></ul><ul><ul><ul><li>Plan model maintenance </li></ul></ul></ul>
  28. 28. Modeling Techniques Where Actuarial Science Needs Data Mining
  29. 29. The Greatest Hits <ul><li>Unsupervised: no target variable </li></ul><ul><ul><ul><li>Clustering </li></ul></ul></ul><ul><ul><ul><li>Principal Components (dimension reduction) </li></ul></ul></ul><ul><li>Supervised: predict a target variable </li></ul><ul><ul><ul><li>Regression → GLM </li></ul></ul></ul><ul><ul><ul><li>Neural Networks </li></ul></ul></ul><ul><ul><ul><li>MARS: Multivariate Adaptive Regression Splines </li></ul></ul></ul><ul><ul><ul><li>CART: Classification And Regression Trees </li></ul></ul></ul>
  30. 30. Regression and its Relations <ul><li>GLM: relax regression’s distributional assumptions </li></ul><ul><ul><ul><li>Logistic regression (binary target) </li></ul></ul></ul><ul><ul><ul><li>Poisson regression (count target) </li></ul></ul></ul><ul><li>MARS & NN </li></ul><ul><ul><ul><li>Clever ways of automatically transforming and interacting input variables </li></ul></ul></ul><ul><ul><ul><li>Why: sometimes “true” relationships aren’t linear </li></ul></ul></ul><ul><ul><ul><li>Universal approximators: model any functional form </li></ul></ul></ul><ul><li>CART is simplified MARS </li></ul>
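As an illustration of the GLM side, a logistic regression (binomial error, logit link) can be fitted with plain gradient ascent in stdlib Python. This is a didactic sketch on a made-up toy target; production GLM software fits by iteratively reweighted least squares instead.

```python
import math

def predict(w, x):
    """Logit link: P(y=1) = 1 / (1 + exp(-(w0 + w.x)))."""
    return 1.0 / (1.0 + math.exp(-(w[0] + sum(wi * xi
                                              for wi, xi in zip(w[1:], x)))))

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit a logistic regression (a GLM with binomial error and logit
    link) by gradient ascent on the log-likelihood."""
    w = [0.0] * (len(xs[0]) + 1)  # intercept + one coefficient per variable
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            err = y - predict(w, x)  # gradient of the log-likelihood
            w[0] += lr * err
            for j, xj in enumerate(x):
                w[j + 1] += lr * err * xj
    return w

# Tiny illustrative binary target (e.g. claim / no claim) vs. one variable
w = fit_logistic([[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1])
```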
  31. 31. Neural Net Motivation <ul><li>Let X 1 , X 2 , X 3 be three predictive variables </li></ul><ul><ul><ul><li>policy age, historical LR, driver age </li></ul></ul></ul><ul><li>Let Y be the target variable </li></ul><ul><ul><ul><li>Loss ratio </li></ul></ul></ul><ul><li>A NNET model is a complicated, non-linear, function φ such that: </li></ul><ul><li>φ ( X 1 , X 2 , X 3 ) ≈ Y </li></ul>
  32. 32. In visual terms…
  33. 33. NNET lingo <ul><li>Green: “input layer” </li></ul><ul><li>Red: “hidden layer” </li></ul><ul><li>Yellow: “output layer” </li></ul><ul><li>The { a , b } numbers are “weights” to be estimated. </li></ul><ul><li>The network architecture and the weights constitute the model. </li></ul>
  34. 34. In more detail…
  35. 35. In more detail… <ul><li>The NNET model results from substituting the expressions for Z 1 and Z 2 in the expression for Y . </li></ul>
  36. 36. In more detail… <ul><li>Notice that the expression for Y has the form of a logistic regression. </li></ul><ul><li>Similarly with Z 1 , Z 2 . </li></ul>
  37. 37. In more detail… <ul><li>You can therefore think of a NNET as a set of logistic regressions embedded in another logistic regression. </li></ul>
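In code, the "logistic regressions embedded in another logistic regression" picture looks like this. The 3-input, 2-hidden-node shape matches the slides; the weight values are made up, since in practice the {a, b} weights are estimated from data.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def nnet_forward(x, hidden_weights, output_weights):
    """One hidden layer: each hidden node Z_j is a logistic regression on
    the inputs X_i, and the output Y is a logistic regression on the Z_j."""
    z = [sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))
         for w in hidden_weights]
    b = output_weights
    return sigmoid(b[0] + sum(bi * zi for bi, zi in zip(b[1:], z)))

# Made-up weights: 3 inputs, 2 hidden nodes, 1 output
y = nnet_forward(
    [0.5, 1.2, -0.3],
    hidden_weights=[[0.1, 0.4, -0.2, 0.7], [-0.3, 0.5, 0.1, -0.6]],
    output_weights=[0.2, 1.1, -0.9],
)
```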
  38. 38. Universal Approximators <ul><li>The essential idea : by layering several logistic regressions in this way… </li></ul><ul><li>… we can model any functional form </li></ul><ul><ul><li>no matter how many non-linearities or interactions between variables X 1 , X 2 ,… </li></ul></ul><ul><ul><li>by varying # of nodes and training cycles only </li></ul></ul><ul><li>NNETs are sometimes called “universal function approximators”. </li></ul>
  39. 39. MARS / CART Motivation <ul><li>NNETs use the logistic function to combine variables and automatically model any functional form </li></ul><ul><li>MARS uses an analogous clever idea to do the same work </li></ul><ul><ul><ul><li>MARS “basis functions” </li></ul></ul></ul><ul><li>CART can be viewed as simplified MARS </li></ul><ul><ul><ul><li>Basis functions are horizontal step functions </li></ul></ul></ul><ul><li>⇒ NNETs, MARS, and CART are all cousins of classic regression analysis </li></ul>
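A sketch of the basis functions just described: the MARS hinge max(0, x − knot), the CART horizontal step, and a one-variable MARS-style fit built from hinges. The knots and coefficients are illustrative.

```python
def hinge(x, knot):
    """MARS basis function: max(0, x - knot)."""
    return max(0.0, x - knot)

def step(x, split):
    """CART analogue: a horizontal step function at the split point."""
    return 1.0 if x >= split else 0.0

def mars_like(x, intercept, terms):
    """A MARS-style fit in one variable: intercept plus a sum of
    coefficient * hinge(x, knot) terms, i.e. a piecewise-linear curve."""
    return intercept + sum(c * hinge(x, k) for c, k in terms)

# Illustrative knots at 2 and 5 with made-up coefficients:
# 1 + 0.5*max(0, x-2) - 0.8*max(0, x-5), evaluated at x = 6
y = mars_like(6.0, intercept=1.0, terms=[(0.5, 2.0), (-0.8, 5.0)])
```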
  40. 40. Reference <ul><li>For Beginners: </li></ul><ul><li>Data Mining Techniques </li></ul><ul><ul><ul><li>--Michael Berry & Gordon Linoff </li></ul></ul></ul><ul><li>For Mavens: </li></ul><ul><li>The Elements of Statistical Learning </li></ul><ul><ul><ul><li>--Jerome Friedman, Trevor Hastie, Robert Tibshirani </li></ul></ul></ul>