Applications of Unsupervised Learning
in Property and Casualty Insurance
with emphasis on fraud analysis




          Louise Francis, FCAS, MAAA
          Francis Analytics and Actuarial
          Data Mining, Inc.
          www.data-mines.com
          Louise.francus@data-mines.com
Objectives
   Review classic unsupervised learning
    techniques
   Introduce 2 new unsupervised
    learning techniques
     RandomForest
     PRIDIT
   Apply the techniques to insurance
    data
     Automobile Fraud data set
     A publicly available automobile
      insurance database
Motivation for Topic
   New book: Predictive Modeling in
    Actuarial Science
       An introduction to predictive modeling for
        actuaries and other insurance
        professionals
   Publisher: Cambridge University Press
   Hope to Publish: Fall 2012
   Chapter on Unsupervised Learning
   Li Yang and Louise Francis
       Li Yang – Variable grouping (PCA)
       Louise Francis – Record grouping
        (clustering)
Book Project
   Predictive Modeling Two-Volume Book Project
   A joint project leading to a two-volume set of
    books on Predictive Modeling in Actuarial Science.
   Volume 1 will cover Theory and Methods and
   Volume 2 will cover Property and Casualty
    Applications.
   The first volume will be introductory, covering basic
    concepts and a wide range of techniques, and is designed
    to acquaint actuaries with this class of
    problem-solving techniques. The second volume will be
    a collection of applications to P&C problems,
    written by authors who are well aware of the
    advantages and disadvantages of the first-volume
    techniques and who can explore relevant
    applications in detail, with positive results.
The Fraud Study Data
            •   1993 AIB closed PIP claims
            •   Dependent Variables
                •   Suspicion Score
                •   Expert assessment of likelihood of
                    fraud or abuse
            •   Predictor Variables
                •   Red flag indicators
                •   Claim file variables

                     Francis Analytics and Actuarial
6/26/2012                                               5
                           Data Mining, Inc.
The Fraud Problem
            from: www.agentinsure.com




The Fraud Problem (2)
            from Coalition Against Insurance Fraud




Fraud and Abuse
               Planned fraud
                   Staged accidents
               Abuse
                 Opportunistic
                 Exaggerate claim




The Fraud Red Flags
   Binary variables that capture
    characteristics of claims
    associated with fraud and abuse
   Accident variables (acc01 – acc19)
   Injury variables (inj01 – inj12)
   Claimant variables (clt01 – clt11)
   Insured variables (ins01 – ins06)
   Treatment variables (trt01 – trt09)
   Lost wages variables (lw01 – lw07)
The Red Flag Variables
                        Red Flag Variables

            Subject      Indicator   Description
                         Variable
            Accident     ACC01       No report by police officer at scene
                         ACC04       Single vehicle accident
                         ACC09       No plausible explanation for accident
                         ACC10       Claimant in old, low valued vehicle
                         ACC11       Rental vehicle involved in accident
                         ACC14       Property damage was inconsistent with accident
                         ACC15       Very minor impact collision
                         ACC16       Claimant vehicle stopped short
                         ACC19       Insured felt set up, denied fault
            Claimant     CLT02       Had a history of previous claims
                         CLT04       Was an out of state accident
                         CLT07       Was one of three or more claimants in vehicle
            Injury       INJ01       Injury consisted of strain or sprain only
                         INJ02       No objective evidence of injury
                         INJ03       Police report showed no injury or pain
                         INJ05       No emergency treatment was given
                         INJ06       Non-emergency treatment was delayed
                         INJ11       Unusual injury for auto accident
            Insured      INS01       Had history of previous claims
                         INS03       Readily accepted fault for accident
                         INS06       Was difficult to contact/uncooperative
                         INS07       Accident occurred soon after effective date
            Lost Wages   LW01        Claimant worked for self or a family member
                         LW03        Claimant recently started employment


Dependent Variable
            Problem
               Insurance companies frequently do
                not collect information as to
                whether a claim is suspected of
                fraud or abuse
               Even when claims are referred for
                special investigation
               Solution: unsupervised learning


Supervised Learning




Dimension Reduction
            ZipCode  Frequency  Frequency  Frequency  PolicyCount     VehicleCount    Severity   Severity
                     BI         PD         Comb       NonBusinessUse  NonBusinessUse  BI         PD
            90095    -          54.50      0.03        2.00            3.00                      1,973.50
            93741    -          -          -           1.00            1.00
            90015    22.65      43.93      0.04        1.00            2.00           10,181.16  2,442.36
            90067    15.53      44.41      0.04        3.00            6.00           13,146.57  2,565.56
            90004    26.71      48.45      0.04       11.00           17.00            8,538.56  2,354.08




The CAARP Data
               This assigned risk automobile data was made
                available to researchers in 2005 for the purpose of
                studying the effect of a change in regulation on territorial
                variables
               Records contain exposure information (car counts, premium)
                and claim and loss information (Bodily Injury (BI)
                counts, BI ultimate losses, Property Damage (PD)
                claim counts, PD ultimate losses).
               Each record is a zip code
               Good example of using unsupervised learning for
                territory construction
R Cluster Library
               The “cluster” library from R was used
               Many of the functions in the library
                are described in Kaufman and
                Rousseeuw’s (1990) classic
                book on clustering,
                Finding Groups in Data.




Grouping Records




Dissimilarity
               Euclidean distance: the square root of the
                sum, over all variables, of the squared
                differences between a record's values and
                the values of the record it is being
                compared to.
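The Euclidean dissimilarity between two claims can be sketched in a few lines. This is only an illustration: the two claims and their four binary red flag values are made up for the example and do not come from the fraud study data.

```python
from math import sqrt

def euclidean(a, b):
    """Square root of the summed squared differences across variables."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two hypothetical claims, each described by four binary red flags.
claim_a = [1, 0, 1, 0]
claim_b = [0, 0, 1, 1]

# The claims differ on two flags, so the distance is sqrt(2).
d = euclidean(claim_a, claim_b)
print(d)
```

With binary flags the squared distance is simply the number of flags on which two claims disagree.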




RF Similarity
           Varies between 0 and 1
           Proximity matrix is an output of RF
             After a tree is fit, all records run through model
             If 2 records in same terminal node, their
              proximity increased by 1
             1-proximity forms distance

           Can be used as an input to clustering and other
            unsupervised learning procedures
           See “Unsupervised Learning with Random
            Forest Predictors” by Shi and Horvath
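The proximity calculation described above can be sketched as follows. This is a minimal illustration, not the Salford or R implementation: it assumes the terminal node reached by each record in each tree is already available (for example, scikit-learn's `RandomForestClassifier.apply` returns such leaf indices), and the small `leaf_ids` table is invented for the example.

```python
def proximity_matrix(leaf_ids):
    """leaf_ids[t][r] is the terminal node reached by record r in tree t.

    Each pair of records gains 1 for every tree in which both land in the
    same terminal node; dividing by the number of trees scales the result
    to the range 0 to 1.
    """
    n_trees = len(leaf_ids)
    n_records = len(leaf_ids[0])
    prox = [[0.0] * n_records for _ in range(n_records)]
    for leaves in leaf_ids:
        for i in range(n_records):
            for j in range(n_records):
                if leaves[i] == leaves[j]:
                    prox[i][j] += 1
    return [[count / n_trees for count in row] for row in prox]

# Hypothetical terminal node assignments: 3 trees, 4 records.
leaf_ids = [
    [1, 1, 2, 2],
    [3, 3, 3, 4],
    [5, 6, 6, 6],
]

prox = proximity_matrix(leaf_ids)
# 1 - proximity serves as the dissimilarity passed to clustering.
rf_distance = [[1 - p for p in row] for row in prox]
```

Records 0 and 1 share a terminal node in two of the three trees, so their proximity is 2/3; records 0 and 3 never share one, so their distance is 1.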
Clustering
               Hierarchical clustering
               K-Means clustering
               This analysis uses k-means




K-means Clustering
               An iterative procedure is used to assign
                each record in the data to one of the k
                clusters.
               The number of clusters, k, is specified by
                the user.
               The iteration begins with the initial centers
                or medoids for k groups.
               A dissimilarity measure is used to assign
                records to a group and to iterate to a final
                grouping.
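The iteration can be sketched as below. This is a bare-bones version of the standard k-means loop using Euclidean distance, not the R "cluster" library code used in the analysis; the six two-dimensional records and the choice of starting centers are assumptions made for the example.

```python
from math import dist  # Euclidean distance, Python 3.8+

def kmeans(records, initial_centers, max_iter=100):
    """Assign each record to its nearest center, then recompute centers.

    Stops when assignments no longer change or max_iter is reached.
    """
    centers = [list(c) for c in initial_centers]
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(len(centers)), key=lambda c: dist(r, centers[c]))
            for r in records
        ]
        if new_assignment == assignment:
            break
        assignment = new_assignment
        for c in range(len(centers)):
            members = [r for r, a in zip(records, assignment) if a == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centers

# Hypothetical records forming two well-separated groups; k = 2.
records = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
assignment, centers = kmeans(records, [records[0], records[3]])
print(assignment)
```

The result depends on the starting centers, which is one reason practical implementations usually try several random starts.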
R Cluster Output




Cluster Plot




Silhouette Plot




Silhouette Plot – RF Proximity
Silhouette Plot – Euclidean Distance Clustering
Testing using Expert Scores: Fit a Tree to Suspicion Score for Importance Ranking




Importance Ranking of the Clusters




Fit Tree to Binary Fraud Indicator




Importance Ranking (2)




RF Ranking of the “Predictors”: Top 10 of 44
            Variable   MeanDecreaseGini   Description
            acc10           10.50         Claimant in old low value vehicle
            trt01            9.05         Large # visits to chiro
            inj01            8.64         strain or sprain
            inj02            8.64         readily accepted fault
            inj05            8.62         non emergency treatment given for injury
            acc01            8.55         no police report
            clt07            7.47         one of 3 or more claimants in vehicle
            inj06            7.44         non emergency trt delayed
            acc15            7.36         very minor collision
            trt03            6.82         large # visits to PT

Problem: Categorical Variables
   It is not clear how to best perform
    Principal Components/Factor
    Analysis on categorical variables
     The categories may be coded as a
      series of binary dummy variables
     If the categories are ordered
      categories, you may lose
      important information
   This is the problem that PRIDIT
    addresses
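The information loss from dummy coding can be illustrated with a small sketch. The three-level ordered variable below is hypothetical and is not one of the study's variables; it only shows that, once an ordered variable is split into binary indicators, every pair of distinct categories looks equally far apart.

```python
# Hypothetical ordered categorical variable (not from the study data).
LEVELS = ["minor", "moderate", "severe"]

def dummy_code(value):
    """One binary indicator per category."""
    return [1 if value == level else 0 for level in LEVELS]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

minor, moderate, severe = (dummy_code(v) for v in LEVELS)

# Adjacent and extreme categories are the same distance apart,
# so the ordering minor < moderate < severe is no longer visible.
d_adjacent = squared_distance(minor, moderate)
d_extreme = squared_distance(minor, severe)
print(d_adjacent, d_extreme)
```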
RIDIT
         Variables are ordered so that
          lowest value is associated with
          highest probability of fraud
         Use Cumulative distribution of
          claims at each value, i, to create
          RIDIT statistic for claim t, value i

\[ R_{ti} = \sum_{j<i} \hat{p}_{tj} - \sum_{j>i} \hat{p}_{tj} \]
Example: RIDIT for Legal Representation

                       Legal Representation
                                      Proportion  Proportion
Value   Code   Number   Proportion    Below       Above        RIDIT
Yes     1      706      0.504         0.000       0.496        -0.496
No      2      694      0.496         0.504       0.000         0.504
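The RIDIT values in the table can be reproduced from the claim counts alone. The sketch below computes, for each ordered category, the proportion of claims in lower categories minus the proportion in higher categories; the function name `ridit_scores` is chosen for this illustration.

```python
def ridit_scores(counts):
    """counts maps each ordered category code to its number of claims.

    Returns, per code, (proportion below) - (proportion above).
    """
    total = sum(counts.values())
    scores = {}
    below = 0.0
    for code in sorted(counts):
        p = counts[code] / total
        above = 1.0 - below - p
        scores[code] = below - above
        below += p
    return scores

# Legal representation counts from the table: 1 = Yes, 2 = No.
legal_rep = {1: 706, 2: 694}
scores = ridit_scores(legal_rep)
for code, r in scores.items():
    print(code, round(r, 3))
# prints:
# 1 -0.496
# 2 0.504
```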
PRIDIT
   Use RIDIT statistics in Principal
    Components Analysis

                    Component Matrix (a)
                                   Component
                                       1
        SIU                           .248
        Police Report                 .220
        At Fault                      .709
        Legal Rep                     .752
        Medical Audit                 .341
        Prior Claim                   .406
        Extraction Method: Principal Component Analysis.
          a. 1 components extracted.
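The PRIDIT idea, RIDIT scoring followed by a first principal component, can be sketched as below. Several things here are assumptions made for illustration: the eight claims and three flags are invented, the component is taken from the covariance matrix, and power iteration stands in for the statistical package's principal component extraction. The weights are a unit-length eigenvector, so they are not directly comparable to the loadings in the component matrix above.

```python
from collections import Counter
from math import sqrt

def ridit(values):
    """Per-record RIDIT: share of records below minus share above."""
    n = len(values)
    counts = Counter(values)
    score, below = {}, 0.0
    for code in sorted(counts):
        p = counts[code] / n
        score[code] = below - (1.0 - below - p)
        below += p
    return [score[v] for v in values]

def first_component(columns, iterations=200):
    """Leading eigenvector of the covariance matrix via power iteration.

    Starting from a vector of ones assumes the variables are mostly
    positively related, as red flags are expected to be.
    """
    n, k = len(columns[0]), len(columns)
    centered = [[x - sum(col) / n for x in col] for col in columns]
    cov = [[sum(a * b for a, b in zip(centered[i], centered[j])) / n
            for j in range(k)] for i in range(k)]
    w = [1.0] * k
    for _ in range(iterations):
        w = [sum(cov[i][j] * w[j] for j in range(k)) for i in range(k)]
        norm = sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
    scores = [sum(w[j] * centered[j][r] for j in range(k)) for r in range(n)]
    return w, scores

# Hypothetical flags for 8 claims; 1 = yes (more suspicious), 2 = no.
flags = [
    [1, 1, 1, 1, 2, 2, 2, 2],  # legal representation
    [1, 1, 1, 2, 1, 2, 2, 2],  # at fault
    [1, 1, 2, 1, 2, 2, 1, 2],  # prior claim
]
weights, pridit = first_component([ridit(col) for col in flags])
```

Because the lowest code is the most suspicious value, the claim with every flag set (the first record) gets the lowest PRIDIT score and the claim with none set (the last record) gets the highest.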
PRIDITS of Accident Flags




Fit Tree with PRIDITS for Each Type of Flag




Importance Ranking of PRIDITS




Importance Ranking of Factors




Add RF and Euclid Clusters to PRIDIT Factors




Use Salford RF MDS
               Top variable in importance (acc10)
                used as binary dependent variable
               Run a forest with 1,000 trees
               Output proximities and MDS
               Use MDS scales as inputs to clustering
                (k=3)
               Run Tree to get importance
                ranking

MDS Graph




Rank of Cluster Procedures to Tree Prediction




Labeling Clusters




Relation Between PRIDIT Factor and Suspicion




Next Steps
               Add claim file variables
                 Rerun clusters
                 Rerun PRIDITS

               Do Random Forest proximities on
                the RIDITS
               Apply the procedures to other
                fraud databases


PRIDIT REFERENCES
Ai, J., Brockett, Patrick L., and Golden, Linda L. (2009) “Assessing Consumer
       Fraud Risk in Insurance Claims with Discrete and Continuous Data,”
       North American Actuarial Journal 13: 438-458.

Brockett, Patrick L., Derrig, Richard A., Golden, Linda L., Levine, Albert and
     Alpert, Mark, (2002), Fraud Classification Using Principal Component
     Analysis of RIDITs, Journal of Risk and Insurance, 69:3, 341-373.

Brockett, Patrick L., Xia, Xiaohua and Derrig, Richard A., (1998), Using
     Kohonen’s Self-Organizing Feature Map to Uncover Automobile Bodily
     Injury Claims Fraud, Journal of Risk and Insurance, 65:245-274

Bross, Irwin D.J., (1958), How To Use RIDIT Analysis, Biometrics,
       14:18-38.
Chipman, H., E.I. George and R.E. McCulloch, 2006, Bayesian Ensemble Learning,
     Neural Information Processing Systems

Lieberthal, Robert D., (2008), Hospital Quality: A PRIDIT Approach, Health
     Services Research, 43:3, 988–1005.
Questions?





 
The Do's and Don'ts of Data Mining
The Do's and Don'ts of Data MiningThe Do's and Don'ts of Data Mining
The Do's and Don'ts of Data Mining
 
Introduction to Random Forests by Dr. Adele Cutler
Introduction to Random Forests by Dr. Adele CutlerIntroduction to Random Forests by Dr. Adele Cutler
Introduction to Random Forests by Dr. Adele Cutler
 
9 Data Mining Challenges From Data Scientists Like You
9 Data Mining Challenges From Data Scientists Like You9 Data Mining Challenges From Data Scientists Like You
9 Data Mining Challenges From Data Scientists Like You
 
Statistically Significant Quotes To Remember
Statistically Significant Quotes To RememberStatistically Significant Quotes To Remember
Statistically Significant Quotes To Remember
 
Using CART For Beginners with A Teclo Example Dataset
Using CART For Beginners with A Teclo Example DatasetUsing CART For Beginners with A Teclo Example Dataset
Using CART For Beginners with A Teclo Example Dataset
 
CART Classification and Regression Trees Experienced User Guide
CART Classification and Regression Trees Experienced User GuideCART Classification and Regression Trees Experienced User Guide
CART Classification and Regression Trees Experienced User Guide
 
Evolution of regression ols to gps to mars
Evolution of regression   ols to gps to marsEvolution of regression   ols to gps to mars
Evolution of regression ols to gps to mars
 
Data Mining for Higher Education
Data Mining for Higher EducationData Mining for Higher Education
Data Mining for Higher Education
 
Comparison of statistical methods commonly used in predictive modeling
Comparison of statistical methods commonly used in predictive modelingComparison of statistical methods commonly used in predictive modeling
Comparison of statistical methods commonly used in predictive modeling
 
Molecular data mining tool advances in hiv
Molecular data mining tool  advances in hivMolecular data mining tool  advances in hiv
Molecular data mining tool advances in hiv
 
TreeNet Tree Ensembles & CART Decision Trees: A Winning Combination
TreeNet Tree Ensembles & CART Decision Trees:  A Winning CombinationTreeNet Tree Ensembles & CART Decision Trees:  A Winning Combination
TreeNet Tree Ensembles & CART Decision Trees: A Winning Combination
 
SPM v7.0 Feature Matrix
SPM v7.0 Feature MatrixSPM v7.0 Feature Matrix
SPM v7.0 Feature Matrix
 
SPM User's Guide: Introducing MARS
SPM User's Guide: Introducing MARSSPM User's Guide: Introducing MARS
SPM User's Guide: Introducing MARS
 
Hybrid cart logit model 1998
Hybrid cart logit model 1998Hybrid cart logit model 1998
Hybrid cart logit model 1998
 
Session Logs Tutorial for SPM
Session Logs Tutorial for SPMSession Logs Tutorial for SPM
Session Logs Tutorial for SPM
 
Some of the new features in SPM 7
Some of the new features in SPM 7Some of the new features in SPM 7
Some of the new features in SPM 7
 

Recently uploaded

Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsSeth Reyes
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?IES VE
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemAsko Soukka
 
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfJamie (Taka) Wang
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaborationbruanjhuli
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Websitedgelyza
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1DianaGray10
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UbiTrack UK
 
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDEADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDELiveplex
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URLRuncy Oommen
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAshyamraj55
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureEric D. Schabell
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024SkyPlanner
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationIES VE
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPathCommunity
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7DianaGray10
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Commit University
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfinfogdgmi
 
20230202 - Introduction to tis-py
20230202 - Introduction to tis-py20230202 - Introduction to tis-py
20230202 - Introduction to tis-pyJamie (Taka) Wang
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.YounusS2
 

Recently uploaded (20)

Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and Hazards
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystem
 
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Website
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
 
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDEADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URL
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability Adventure
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation Developers
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdf
 
20230202 - Introduction to tis-py
20230202 - Introduction to tis-py20230202 - Introduction to tis-py
20230202 - Introduction to tis-py
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.
 

Fraud Analysis and Other Applications of Unsupervised Learning in Property and Casualty Insurance

  • 1. Applications of Unsupervised Learning in Property and Casualty Insurance with emphasis on fraud analysis Louise Francis, FCAS, MAAA Francis Analytics and Actuarial Data Mining, Inc. www.data-mines.com Louise.francus@data-mines.com
  • 2. Objectives  Review classic unsupervised learning techniques  Introduce 2 new unsupervised learning techniques  RandomForest  PRIDIT  Apply the techniques to insurance data  Automobile Fraud data set  A publicly available automobile insurance database
  • 3. Motivation for Topic  New book: Predictive Modeling in Actuarial Science  An introduction to predictive modeling for actuaries and other insurance professionals  Publisher: Cambridge University Press  Hope to Publish: Fall 2012  Chapter on Unsupervised Learning  Li Yang and Louise Francis  Li Yang – Variable grouping (PCA)  Louise Francis- record grouping (clustering)
  • 4. Book Project  Predictive Modeling 2 Volume Book Project   A joint project leading to a two volume pair of books on Predictive Modeling in Actuarial Science.  Volume 1 would be on Theory and Methods and  Volume 2 would be on Property and Casualty Applications.  The first volume will be introductory with basic concepts and a wide range of techniques designed to acquaint actuaries with this sector of problem solving techniques. The second volume would be a collection of applications to P&C problems, written by authors who are well aware of the advantages and disadvantages of the first volume techniques but who can explore relevant applications in detail with positive results.
  • 5. The Fraud Study Data • 1993 AIB closed PIP claims • Dependent Variables • Suspicion Score • Expert assessment of likelihood of fraud or abuse • Predictor Variables • Red flag indicators • Claim file variables Francis Analytics and Actuarial Data Mining, Inc. 6/26/2012
  • 6. The Fraud Problem from: www.agentinsure.com
  • 7. The Fraud Problem (2) from Coalition Against Insurance Fraud
  • 8. Fraud and Abuse  Planned fraud  Staged accidents  Abuse  Opportunistic  Exaggerate claim
  • 9. The Fraud Red Flags  Binary variables that capture characteristics of claims associated with fraud and abuse  Accident variables (acc01 - acc19)  Injury variables (inj01 – inj12)  Claimant variables (ch01 – ch11)  Insured variables (ins01 – ins06)  Treatment variables (trt01 – trt09)  Lost wages variables (lw01 – lw07)
  • 10. The Red Flag Variables
      Subject      Indicator Variable  Description
      Accident     ACC01               No report by police officer at scene
                   ACC04               Single vehicle accident
                   ACC09               No plausible explanation for accident
                   ACC10               Claimant in old, low valued vehicle
                   ACC11               Rental vehicle involved in accident
                   ACC14               Property Damage was inconsistent with accident
                   ACC15               Very minor impact collision
                   ACC16               Claimant vehicle stopped short
                   ACC19               Insured felt set up, denied fault
      Claimant     CLT02               Had a history of previous claims
                   CLT04               Was an out of state accident
                   CLT07               Was one of three or more claimants in vehicle
      Injury       INJ01               Injury consisted of strain or sprain only
                   INJ02               No objective evidence of injury
                   INJ03               Police report showed no injury or pain
                   INJ05               No emergency treatment was given
                   INJ06               Non-emergency treatment was delayed
                   INJ11               Unusual injury for auto accident
      Insured      INS01               Had history of previous claims
                   INS03               Readily accepted fault for accident
                   INS06               Was difficult to contact/uncooperative
                   INS07               Accident occurred soon after effective date
      Lost Wages   LW01                Claimant worked for self or a family member
                   LW03                Claimant recently started employment
  • 11. Dependent Variable Problem  Insurance companies frequently do not collect information as to whether a claim is suspected of fraud or abuse  Even when claims are referred for special investigation  Solution: unsupervised learning
  • 12. Supervised Learning
  • 13. Dimension Reduction
      ZipCode  FrequencyBI  FrequencyPD  FrequencyComb  PolicyCountNonBusinessUse  VehicleCountNonBusinessUse  SeverityBI  SeverityPD
      90095    -            54.50        0.03           2.00                       3.00                                    1,973.50
      93741    -            -            -              1.00                       1.00
      90015    22.65        43.93        0.04           1.00                       2.00                        10,181.16   2,442.36
      90067    15.53        44.41        0.04           3.00                       6.00                        13,146.57   2,565.56
      90004    26.71        48.45        0.04           11.00                      17.00                       8,538.56    2,354.08
  • 14. The CAARP Data  This assigned risk automobile data was made available to researchers in 2005 for the purpose of studying the effect of a change in regulation on territorial variables  The data contain exposure information (car counts, premium) and claim and loss information (Bodily Injury (BI) counts, BI ultimate losses, Property Damage (PD) claim counts, PD ultimate losses).  Each record is a zip code  Good example of using unsupervised learning for territory construction
  • 15. R Cluster Library  The “cluster” library from R was used  Many of the functions in the library are described in Kaufman and Rousseeuw’s (1990) classic book on clustering, Finding Groups in Data.
  • 16. Grouping Records
  • 17. Dissimilarity  Euclidean distance: the square root of the sum, over all variables, of the squared differences between a record's values and the values of the record it is being compared to.
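The dissimilarity on this slide can be sketched in a few lines of Python. This is only an illustration: the function names and the three red-flag records below are hypothetical and are not taken from the AIB data or from the R "cluster" library.

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two records measured on the same variables."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of variables")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dissimilarity_matrix(records):
    """Record-by-record matrix of pairwise Euclidean distances."""
    n = len(records)
    return [[euclidean_distance(records[i], records[j]) for j in range(n)]
            for i in range(n)]

# Hypothetical binary red-flag records (1 = flag present); illustration only.
claims = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 1],
]
dist = dissimilarity_matrix(claims)
```

For binary flags, the squared distance is simply the number of flags on which two claims disagree, so claims that share most red flags end up close together.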
  • 18. RF Similarity  Varies between 0 and 1  Proximity matrix is an output of RF  After a tree is fit, all records are run through the model  If 2 records land in the same terminal node, their proximity is increased by 1  Proximities are divided by the number of trees, which keeps them between 0 and 1  1-proximity forms a distance  Can be used as an input to clustering and other unsupervised learning procedures  See “Unsupervised Learning with Random Forest Predictors” by Shi and Horvath
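The proximity bookkeeping described on this slide can be sketched without fitting a forest. The sketch assumes the terminal-node assignment of every record in every tree is already available; the `leaf_ids` table below is hypothetical and stands in for the output of whatever Random Forest software is used.

```python
def proximity_matrix(leaf_ids):
    """leaf_ids[t][i] is the terminal node that record i reaches in tree t.

    Returns the share of trees in which each pair of records lands in the
    same terminal node, so every entry lies between 0 and 1.
    """
    n_trees = len(leaf_ids)
    n_records = len(leaf_ids[0])
    counts = [[0.0] * n_records for _ in range(n_records)]
    for leaves in leaf_ids:
        for i in range(n_records):
            for j in range(n_records):
                if leaves[i] == leaves[j]:
                    counts[i][j] += 1.0  # same terminal node: proximity + 1
    return [[c / n_trees for c in row] for row in counts]

# Hypothetical terminal-node ids for 4 records in 4 trees; illustration only.
leaf_ids = [
    [1, 1, 2, 2],
    [3, 3, 3, 4],
    [5, 6, 6, 6],
    [7, 7, 8, 8],
]
prox = proximity_matrix(leaf_ids)
rf_distance = [[1.0 - p for p in row] for row in prox]  # 1 - proximity
```

The resulting `rf_distance` matrix can be passed to any clustering routine that accepts a precomputed dissimilarity matrix.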
  • 19. Clustering  Hierarchical clustering  K-Means clustering  This analysis uses k-means
  • 20. K-means Clustering  An iterative procedure is used to assign each record in the data to one of the k clusters, where k is chosen by the user.  The iteration begins with the initial centers or medoids for k groups.  The procedure uses a dissimilarity measure to assign records to a group and to iterate to a final grouping.
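The iteration described above can be sketched as plain k-means (Lloyd's algorithm) with Euclidean distance. This is an assumption about the variant: the slides do not say which routine of the R "cluster" library was run, and a medoid-based method would update the centers differently. The function name `kmeans` and the six two-variable records are hypothetical.

```python
import math

def kmeans(records, centers, max_iter=100):
    """Plain k-means: alternate assignment and center updates until stable."""
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each record joins its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda k: math.dist(r, centers[k]))
            for r in records
        ]
        if new_labels == labels:
            break  # no record changed cluster, so the grouping is final
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for k in range(len(centers)):
            members = [r for r, lab in zip(records, labels) if lab == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Hypothetical records with two variables each; illustration only.
records = [[1.0, 1.0], [1.5, 2.0], [1.0, 1.5],
           [8.0, 8.0], [9.0, 8.5], [8.5, 9.0]]
labels, centers = kmeans(records, centers=[records[0], records[3]])
```

The user supplies k implicitly through the number of starting centers; different starting centers can lead to different final groupings.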
  • 21. R Cluster Output
  • 22. Cluster Plot
  • 23. Silhouette Plot
  • 25. Silhouette Plot – Euclidean Distance Clustering
  • 26. Testing using Expert Scores: Fit a Tree to Suspicion Score for Importance Ranking
  • 27. Importance Ranking of the Clusters
  • 28. Fit Tree to Binary Fraud Indicator
  • 29. Importance Ranking (2)
  • 30. RF Ranking of the “Predictors”: Top 10 of 44
      Variable  MeanDecreaseGini  Description
      acc10     10.50             Claimant in old low value vehicle
      trt01     9.05              Large # visits to chiro
      inj01     8.64              strain or sprain
      inj02     8.64              readily accepted fault
      inj05     8.62              non emergency treatment given for injury
      acc01     8.55              no police report
      clt07     7.47              one of 3 or more claimants in vehicle
      inj06     7.44              non emergency trt delayed
      acc15     7.36              very minor collision
      trt03     6.82              large # visits to PT
  • 31. Problem: Categorical Variables  It is not clear how to best perform Principal Components/Factor Analysis on categorical variables  The categories may be coded as a series of binary dummy variables  If the categories are ordered categories, you may lose important information  This is the problem that PRIDIT addresses
  • 32. RIDIT  Variables are ordered so that lowest value is associated with highest probability of fraud  Use Cumulative distribution of claims at each value, i, to create RIDIT statistic for claim t, value i: $R_{ti} = \sum_{j<i} \hat{p}_{tj} - \sum_{j>i} \hat{p}_{tj}$
  • 33. Example: RIDIT for Legal Representation
      Value  Code  Number  Proportion  Proportion Below  Proportion Above  RIDIT
      Yes    1     706     0.504       0.000             0.496             -0.496
      No     2     694     0.496       0.504             0.000             0.504
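The legal representation example can be reproduced with a short Python sketch that computes, for each ordered category, the proportion of claims below it minus the proportion above it. The function name `ridit_scores` is hypothetical; the counts 706 and 694 come from the slide.

```python
def ridit_scores(counts):
    """RIDIT score for each ordered category, given the claim count in each.

    Score = (proportion of claims in lower categories)
          - (proportion of claims in higher categories).
    """
    total = sum(counts)
    props = [c / total for c in counts]
    scores = []
    for i in range(len(props)):
        below = sum(props[:i])
        above = sum(props[i + 1:])
        scores.append(below - above)
    return scores

# Legal representation: code 1 = Yes (706 claims), code 2 = No (694 claims).
legal_rep = ridit_scores([706, 694])
print([round(s, 3) for s in legal_rep])  # prints [-0.496, 0.504]
```

A useful check on any RIDIT coding is that the scores average to zero when weighted by the category proportions.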
  • 34. PRIDIT  Use RIDIT statistics in Principal Components Analysis
      Component Matrix (a)
      Variable       Component 1
      SIU            .248
      Police Report  .220
      At Fault       .709
      Legal Rep      .752
      Medical Audit  .341
      Prior Claim    .406
      Extraction Method: Principal Component Analysis.
      a. 1 components extracted.
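A minimal sketch of the PRIDIT idea follows: convert each categorical variable to RIDIT scores, then take the first principal component of those scores. Several choices here are assumptions and not the presentation's method: the component is found by power iteration on the correlation matrix, the eight claims are hypothetical, and the output is a unit-length weight vector, which is not on the same scale as the loadings in the component matrix above.

```python
import math

def ridit_column(values):
    """Replace ordered category codes with RIDIT scores (below minus above)."""
    n = len(values)
    cats = sorted(set(values))
    prop = {c: values.count(c) / n for c in cats}
    score = {c: sum(prop[d] for d in cats if d < c)
                - sum(prop[d] for d in cats if d > c) for c in cats}
    return [score[v] for v in values]

def first_component(columns, n_iter=500):
    """Weights and claim scores for the first principal component."""
    n = len(columns[0])
    std = []
    for col in columns:
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
        std.append([(x - mean) / sd for x in col])
    p = len(std)
    corr = [[sum(a * b for a, b in zip(std[i], std[j])) / n for j in range(p)]
            for i in range(p)]
    w = [1.0] * p
    for _ in range(n_iter):  # power iteration toward the leading eigenvector
        w = [sum(corr[i][j] * w[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
    scores = [sum(w[j] * std[j][r] for j in range(p)) for r in range(n)]
    return w, scores

# Hypothetical claims, coded 1 = Yes, 2 = No; illustration only.
legal_rep = [1, 1, 1, 1, 2, 2, 2, 2]
at_fault = [1, 1, 1, 2, 1, 2, 2, 2]
police = [1, 2, 1, 1, 2, 2, 1, 2]
ridits = [ridit_column(v) for v in (legal_rep, at_fault, police)]
weights, pridit_scores = first_component(ridits)
```

With the lowest code tied to the highest suspicion, as on the RIDIT slide, claims with the most negative `pridit_scores` are the ones the first component ranks as most suspicious.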
  • 35. PRIDITS of Accident Flags
  • 36. Fit Tree with PRIDITS for Each Type of Flag
  • 37. Importance Ranking of Pridits
  • 38. Importance Ranking of Factors
  • 39. Add RF and Euclid Clusters to PRIDIT Factors
  • 40. Use Salford RF MDS  Top variable in importance (acc10) used as binary dependent  Run forest with 1,000 trees  Output proximities and MDS  Use MDS scales as inputs to clustering (k=3)  Run Tree to get Importance ranking
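The step from proximities to MDS scales can be sketched with classical (metric) multidimensional scaling. This is an assumption: the slides do not say which MDS variant the Salford software applies. The sketch also assumes a symmetric distance matrix that is close to Euclidean, and it is demonstrated on four hypothetical points, not on Random Forest distances.

```python
import math

def classical_mds(dist, n_dims=2, n_iter=1000):
    """Coordinates whose pairwise distances approximate a symmetric `dist`."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # Double-center the squared distances to get an inner-product matrix.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * n_dims for _ in range(n)]
    for dim in range(n_dims):
        v = [math.sin(i + 1.0 + dim) for i in range(n)]  # arbitrary start
        for _ in range(n_iter):  # power iteration for the leading eigenvector
            bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in bv))
            if norm < 1e-12:
                break
            v = [x / norm for x in bv]
        eigval = sum(v[i] * sum(b[i][j] * v[j] for j in range(n))
                     for i in range(n))
        if eigval <= 1e-12:
            break  # no further positive dimension to extract
        for i in range(n):
            coords[i][dim] = v[i] * math.sqrt(eigval)
        # Deflate so the next pass finds the next eigenvector.
        b = [[b[i][j] - eigval * v[i] * v[j] for j in range(n)]
             for i in range(n)]
    return coords

# Hypothetical points on a 3-by-4 rectangle; illustration only.
points = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0), (3.0, 4.0)]
dist = [[math.dist(p, q) for q in points] for p in points]
coords = classical_mds(dist)
```

The rows of `coords` play the role of the MDS scales, which could then be clustered with k = 3 as the slide describes.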
  • 41. MDS Graph
  • 42. Rank of cluster procedures to Tree Prediction
  • 43. Labeling Clusters
  • 44. Relation Between PRIDIT Factor and Suspicion
  • 45. Next Steps  Add claim file variables  Rerun clusters  Rerun PRIDITS  Do Random Forest proximities on the RIDITS  Apply the procedures to other fraud databases
  • 46. PRIDIT REFERENCES
      Ai, J., Brockett, Patrick L., and Golden, Linda L., (2009), Assessing Consumer Fraud Risk in Insurance Claims with Discrete and Continuous Data, North American Actuarial Journal, 13: 438-458.
      Brockett, Patrick L., Derrig, Richard A., Golden, Linda L., Levine, Albert and Alpert, Mark, (2002), Fraud Classification Using Principal Component Analysis of RIDITs, Journal of Risk and Insurance, 69:3, 341-373.
      Brockett, Patrick L., Xia, Xiaohua and Derrig, Richard A., (1998), Using Kohonen’s Self-Organizing Feature Map to Uncover Automobile Bodily Injury Claims Fraud, Journal of Risk and Insurance, 65: 245-274.
      Bross, Irwin D.J., (1958), How To Use RIDIT Analysis, Biometrics, 14: 18-38.
      Chipman, H., George, E.I. and McCulloch, R.E., (2006), Bayesian Ensemble Learning, Neural Information Processing Systems.
      Lieberthal, Robert D., (2008), Hospital Quality: A PRIDIT Approach, Health Services Research, 43:3, 988-1005.
  • 47. Questions?