2005 Spring Meeting - Predictive Modeling Presentation

1. Predictive Modeling
Spring 2005 CAMAR Meeting
Louise Francis, FCAS, MAAA
Francis Analytics and Actuarial Data Mining, Inc.
www.data-mines.com
2. Objectives
- Introduce predictive modeling
- Why use it?
- Describe some methods in depth
  - Trees
  - Neural networks
  - Clustering
- Apply to fraud data
3. Predictive Modeling Family
- Predictive Modeling
  - Classical Linear Models
  - GLMs
  - Data Mining
4. Why Predictive Modeling?
- Better use of insurance data
- Advanced methods for dealing with messy data now available
5. Major Kinds of Modeling
- Supervised learning
  - Most common situation
  - A dependent variable
    - Frequency
    - Loss ratio
    - Fraud/no fraud
  - Some methods
    - Regression
    - CART
    - Some neural networks
- Unsupervised learning
  - No dependent variable
  - Group like records together
    - A group of claims with similar characteristics might be more likely to be fraudulent
  - Some methods
    - Association rules
    - K-means clustering
    - Kohonen neural networks
6. Two Big Specialties in Predictive Modeling
- GLMs
  - Regression
  - Logistic regression
  - Poisson regression
- Data Mining
  - Trees
  - Neural networks
  - Clustering
7. Modeling Process
- Internal Data
- Data Cleaning
- External Data
- Other Preprocessing
- Build Model
- Validate Model
- Test Model
- Deploy Model
8. Data Complexities Affecting Insurance Data
- Nonlinear functions
- Interactions
- Missing data
- Correlations
- Non-normal data
9. Kinds of Applications
- Classification
- Prediction
10. The Fraud Study Data
- 1993 Automobile Insurers Bureau closed Personal Injury Protection claims
- Dependent variables
  - Suspicion score
    - Number from 0 to 10
  - Expert assessment of likelihood of fraud or abuse
    - 5 categories
    - Used to create a binary indicator
- Predictor variables
  - Red flag indicators
  - Claim file variables
11. Introduction of Two Methods
- Trees
  - Sometimes known as CART (Classification and Regression Trees)
- Neural networks
  - Will introduce the backpropagation neural network
12. Decision Trees
- Recursively partition the data
  - Often sequentially bifurcate the data, but can split into more groups
- At each step, apply a goodness-of-fit statistic to compare candidate partitions
- Select the partition that yields the largest improvement in the goodness-of-fit statistic (a sketch of one step follows below)
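To make one step of the recursion concrete, here is a minimal Python sketch for a binary target and a single numeric predictor; the helper names (`gini`, `best_split`) are illustrative, not from the deck.

```python
import numpy as np

def gini(y):
    """Two-class Gini impurity of an array of 0/1 labels."""
    if len(y) == 0:
        return 0.0
    p = y.mean()
    return 2 * p * (1 - p)

def best_split(x, y):
    """Find the cutpoint on x giving the largest drop in weighted impurity."""
    parent = gini(y)
    best_cut, best_gain = None, 0.0
    for cut in np.unique(x):
        left, right = y[x <= cut], y[x > cut]
        if len(left) == 0 or len(right) == 0:
            continue  # skip degenerate partitions
        child = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if parent - child > best_gain:
            best_cut, best_gain = cut, parent - child
    return best_cut, best_gain
```

A full tree repeats this search on each resulting subset until a stopping rule is met.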
13. Goodness-of-Fit Statistics
- Chi square → CHAID (Fish, Gallagher, Monroe; Discussion Paper Program, 1990)
- Deviance → CART
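The formulas on this slide did not survive extraction; in standard notation, with observed and expected cell counts $O$ and $E$, and fitted probabilities $\hat p_i$ for binary outcomes $y_i$, they are:

$$\chi^2 = \sum \frac{(O - E)^2}{E}, \qquad D = -2\sum_i \left[ y_i \ln \hat p_i + (1 - y_i)\ln(1 - \hat p_i) \right]$$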
14. Goodness-of-Fit Statistics
- Gini measure → CART
- i denotes the impurity measure
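The Gini formula itself was lost in extraction; the standard definition, for node class proportions $p_j$, is:

$$i(t) = 1 - \sum_j p_j^2 = \sum_{j \neq k} p_j p_k,$$

which for two classes reduces to $i = 2p(1 - p)$.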
15. Goodness-of-Fit Statistics
- Entropy → C4.5
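Again the slide's formula was lost; the standard entropy impurity is:

$$i(t) = -\sum_j p_j \log_2 p_j$$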
16. An Illustration from the Fraud Data: Gini Measure
17. First Split
- All claims: p(fraud) = 0.36
- Legal Rep = Yes: p(fraud) = 0.612
- Legal Rep = No: p(fraud) = 0.113
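As a worked check, applying the two-class Gini formula to the probabilities quoted on the slide gives:

$$i_{\text{all}} = 2(0.36)(0.64) \approx 0.461, \quad i_{\text{yes}} = 2(0.612)(0.388) \approx 0.475, \quad i_{\text{no}} = 2(0.113)(0.887) \approx 0.200$$

The split's improvement is the parent impurity minus the size-weighted average of the two child impurities; the node counts needed for that weighting are not shown on the slide.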
18. Example (continued)
19. Example of Nonlinear Function: Suspicion Score vs. 1st Provider Bill
20. An Approach to Nonlinear Functions: Fit a Tree
21. Fitted Curve from Tree
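The fitted-curve figure is not reproduced here. As a minimal sketch of how such a step-function fit arises, the following uses scikit-learn on simulated data; the log-shaped relationship and the variable names are assumptions, not the AIB data:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
bill = rng.uniform(0, 5000, 500).reshape(-1, 1)            # stand-in for 1st provider bill
score = np.log1p(bill.ravel()) + rng.normal(0, 0.3, 500)   # stand-in suspicion score

tree = DecisionTreeRegressor(max_depth=3).fit(bill, score)
grid = np.linspace(0, 5000, 200).reshape(-1, 1)
fitted = tree.predict(grid)   # piecewise-constant: one level per terminal node
```

Plotting `fitted` against `grid` shows the characteristic staircase a tree uses to approximate a smooth curve.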
22. Neural Networks
- Developed by artificial intelligence experts, but now also used by statisticians
- Based on how neurons function in the brain
23. Neural Networks
- Fit by minimizing the squared deviation between fitted and actual values
- Can be viewed as non-parametric, non-linear regression
- Often thought of as a “black box”
  - Due to the complexity of the fitted model, it is difficult to understand the relationship between the dependent and predictor variables
24. The Backpropagation Neural Network
25. Neural Network
- Fits a nonlinear function at each node of each layer
26. The Logistic Function
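The slide's formula and curve were lost in extraction; the standard logistic (sigmoid) activation is:

$$f(x) = \frac{1}{1 + e^{-x}},$$

which squashes any real-valued input into the interval (0, 1).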
27. Universal Function Approximator
- The backpropagation neural network with one hidden layer is a universal function approximator
- Theoretically, with a sufficient number of nodes in the hidden layer, any continuous nonlinear function can be approximated
28. Nonlinear Function Fit by Neural Network
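The figure is not reproduced here. A minimal sketch of the same idea, assuming scikit-learn's `MLPRegressor` as a stand-in for the deck's backpropagation network, on simulated data like the tree example above:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
bill = rng.uniform(0, 5000, 500).reshape(-1, 1)
score = np.log1p(bill.ravel()) + rng.normal(0, 0.3, 500)

# One hidden layer of logistic nodes -- the universal-approximator setup;
# fitting minimizes squared error, as described on slide 23.
net = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(5,), activation='logistic',
                 solver='lbfgs', max_iter=5000, random_state=0),
)
net.fit(bill, score)
```

Unlike the tree's staircase, the network's fitted curve is smooth, since it is a weighted sum of logistic functions.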
29. Interactions
- The functional relationship between a predictor variable and the dependent variable depends on the value of one or more other variables
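As an illustrative regression example (not from the deck), an interaction appears as a cross-product term, so the slope on $x_1$ shifts with the value of $x_2$:

$$E[y] = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$$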
30. Interactions
- Neural networks
  - The hidden nodes play a key role in modeling the interactions
- CART partitions the data
  - Partitions capture the interactions
31. Simple Tree of Injury and Provider Bill
32. Missing Data
- Occurs frequently in insurance data
  - There are some sophisticated methods for addressing this (e.g., the EM algorithm)
- CART finds surrogates for variables with missing values
- Neural networks have no explicit procedure for missing values
33. More Complex Example
- Dependent variable: expert's assessment of the likelihood the claim is legitimate
  - A classification application
- Predictor variables: a combination of
  - Claim file variables (age of claimant, legal representation)
  - Red flag variables (injury is strain/sprain only, claimant has history of previous claim)
- Used an enhancement of CART known as boosting
34. Red Flag Predictor Variables
35. Claim File Variables
36. Neural Network Measure of Variable Importance
- Look at the weights to the hidden layer
- Compute sensitivities (a sketch follows below):
  - A measure of how much the prediction error increases when the variables are excluded from the model one at a time
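A minimal sketch of a sensitivity-style measure. Rather than refit the model with each variable excluded, this proxy scrambles one column at a time and measures the rise in squared error; the substitution of scrambling for exclusion is an assumption, and `model` stands for any fitted regressor with a `predict` method:

```python
import numpy as np

def sensitivities(model, X, y, seed=0):
    """Rise in mean squared error when each column's information is destroyed."""
    rng = np.random.default_rng(seed)
    base = np.mean((model.predict(X) - y) ** 2)
    result = {}
    for j in range(X.shape[1]):
        Xp = X.copy()
        rng.shuffle(Xp[:, j])   # break the link between variable j and y
        result[j] = np.mean((model.predict(Xp) - y) ** 2) - base
    return result
```

Larger values flag variables the model leans on more heavily.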
37. Variable Importance
38. Testing: Hold Out Part of Sample
- Fit model on 1/2 to 2/3 of data
- Test fit of model on remaining data (see the sketch below)
- Need a large sample
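A minimal sketch of the holdout approach using scikit-learn; the data are simulated placeholders, and one third is held out, matching the "fit on 2/3" rule of thumb:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=3000, random_state=0)
X_fit, X_test, y_fit, y_test = train_test_split(X, y, test_size=1/3, random_state=0)

model = DecisionTreeClassifier(max_depth=4).fit(X_fit, y_fit)
print(model.score(X_test, y_test))   # accuracy on the untouched third
```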
39. Testing: Cross-Validation
- Hold out 1/n (say 1/10) of data
- Fit model to remaining data
- Test on the portion of the sample held out
- Do this n (say 10) times and average the results (see the sketch below)
- Used for moderate sample sizes
- Jackknifing is similar to cross-validation
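A minimal sketch of 10-fold cross-validation with scikit-learn, again on placeholder data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)

# Fit on 9/10, test on the held-out 1/10, repeat 10 times, average.
scores = cross_val_score(DecisionTreeClassifier(max_depth=4), X, y, cv=10)
print(scores.mean())
```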
40. Results of Classification on Test Data
41. Unsupervised Learning
- Common method: clustering
- No dependent variable; records are grouped into classes with similar values on the variables
- Start with a measure of similarity or dissimilarity
- Maximize dissimilarity between members of different clusters
42. Dissimilarity (Distance) Measures
- Euclidean distance
- Manhattan distance
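The distance formulas were lost in extraction; the standard definitions for records $x$ and $y$ with coordinates indexed by $k$ are:

$$d_{\text{Euclidean}}(x, y) = \sqrt{\sum_k (x_k - y_k)^2}, \qquad d_{\text{Manhattan}}(x, y) = \sum_k \lvert x_k - y_k \rvert$$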
43. Binary Variables
44. Binary Variables
- Simple matching
- Rogers and Tanimoto
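The coefficients themselves were lost in extraction; in the standard notation of a 2×2 agreement table (a = both records 1, d = both 0, b and c = mismatches), they are:

$$s_{\text{SM}} = \frac{a + d}{a + b + c + d}, \qquad s_{\text{RT}} = \frac{a + d}{a + d + 2(b + c)},$$

where SM is simple matching and RT is Rogers and Tanimoto, with dissimilarity taken as $1 - s$; Rogers and Tanimoto double-weights the disagreements.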
45. Results for 2 Clusters
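The cluster results table is not reproduced here. A minimal sketch of how such a two-cluster solution could be produced with k-means; the simulated blobs stand in for standardized claim variables:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=2, random_state=0)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
# labels assigns each record to cluster 0 or 1
```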
46. Beginners' Library
- Berry, Michael J. A., and Linoff, Gordon, Data Mining Techniques, John Wiley and Sons, 1997
- Kaufman, Leonard, and Rousseeuw, Peter, Finding Groups in Data, John Wiley and Sons, 1990
- Smith, Murray, Neural Networks for Statistical Modeling, International Thomson Computer Press, 1996
47. Data Mining
CAMAR Spring Meeting
Louise Francis, FCAS, MAAA
[email_address]
www.data-mines.com
