This document discusses applications of unsupervised learning techniques, such as clustering, in property and casualty insurance fraud analysis. It reviews classic clustering methods and introduces two new techniques, Random Forest and PRIDIT. The document applies these techniques to an automobile insurance fraud dataset to group fraudulent and abusive claims. It aims to develop models that can identify fraud patterns without needing pre-classified examples.
This notice provides information about the driving history and discounts applied to the policyholder's auto insurance policy. It lists three events in the policyholder's driving history: two at-fault accidents in 2009 and 2011 and a non-chargeable license suspension in 2010. The notice also lists available discounts for good drivers with no more than one violation point or at-fault accident in the past three years. Finally, it provides contact information for the consumer reporting agency that provided the driving history data and instructions for obtaining a vehicle report or disputing inaccuracies.
Machine Learning and Real-World Applications (MachinePulse)
This presentation was created by Ajay, Machine Learning Scientist at MachinePulse, to present at a Meetup on Jan. 30, 2015. These slides provide an overview of widely used machine learning algorithms. The slides conclude with examples of real world applications.
Ajay Ramaseshan is a Machine Learning Scientist at MachinePulse. He holds a Bachelor's degree in Computer Science from NITK Surathkal and a Master's in Machine Learning and Data Mining from Aalto University School of Science, Finland. He has extensive experience in the machine learning domain and has worked on a variety of real-world problems.
Application of Clustering in Data Science using Real-life Examples (Edureka!)
This document outlines an Edureka webinar on applications of clustering in real life. The webinar instructor is Kumaran Ponnambalam. The objectives are to understand data science applications and prospects, machine learning categories, clustering and k-means clustering. Examples of clustering applications include wine recommendation, pizza delivery optimization, and news summarization. K-means clustering is demonstrated on pizza delivery location data. The webinar also discusses data science job trends and covers 10 modules on data science topics including machine learning techniques in R.
Cluster analysis for market segmentation (Vishal Tandel)
Cluster analysis is a technique used to segment markets by grouping consumers into clusters based on their characteristics. It aims to maximize similarity within clusters and dissimilarity between clusters. Marketers can use cluster analysis to discover distinct groups of customers and develop targeted marketing programs for each group. Common variables used to segment markets include demographics, psychographics, geographics, product benefits, and behavior.
Look through the slides from the July 17th FierceWireless webinar with guests from Cisco, SDNCentral, and Openwave Mobility as they examine the place of SDN in facilitating next-generation application services.
You will learn:
1. How to provide subscriber-awareness in an SDN network via the SDN Controller
2. Why a hierarchical (L2-4 and L7) SDN approach is necessary
3. The critical business and ROI drivers for service providers considering Gi-LAN services
For more SP Mobility related content, visit our Cisco SP Mobility Community: http://cisco.com/go/mobilitycommunity
Recent Advances in Machine Learning: Bringing a New Level of Intelligence to ... (Brocade)
Presentation by Brocade Chief Scientist and Fellow, David Meyer, given at Orange Gardens July 2016. What is Machine Learning and what is all the excitement about?
An associated blog is available here: http://community.brocade.com/t5/CTO-Corner/Networking-Meets-Artificial-Intelligence-A-Glimpse-into-the-Very/ba-p/88196
Summit 16: Applying Machine Learning to Intent-based Networking and NFV Scali... (OPNFV)
The talk will highlight how Machine Learning techniques can be used to address different aspects of the operation and control of NFV and propose future OPNFV activities in this area. First, Diego will introduce how Machine Learning is being applied by the CogNet project to address intent-based networking, and discuss the architecture defined there as a potential framework for future ML integration. Glen will demonstrate a policy-based system for automating VNF scaling using performance data collection and analytics with machine learning (ML), based on OPNFV Brahmaputra and the underlying OpenStack telemetry system (Ceilometer), as well as the open-source Apache Kafka, Apache Zookeeper and Apache Spark streaming and MLlib libraries. Available as open-source, it combines predictive and reactive inputs to make the VNF scaling decision and trigger action in the MANO stack. The presentation will provide an overview of the system, demonstrate the VNF auto-scaling use case and discuss how this system will fit into a future OPNFV release.
PCA is an unsupervised learning technique used to reduce the dimensionality of large data sets by transforming the data to a new set of variables called principal components. The first principal component accounts for as much of the variability in the data as possible, and each succeeding component accounts for as much of the remaining variability as possible. PCA is commonly used for applications like dimensionality reduction, data compression, and visualization. The document discusses PCA algorithms and applications of PCA in domains like face recognition, image compression, and noise filtering.
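The variance-ordering property described in that summary can be illustrated with a small NumPy sketch (the `pca` helper below is illustrative, not taken from the presentation itself): after centering the data, the right singular vectors of the data matrix are the principal axes, and the squared singular values give each component's share of the variance, in decreasing order.

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its top principal components via SVD.

    Centers the data, then uses the right singular vectors as the
    component directions; squared singular values (scaled by n - 1)
    give each component's explained variance.
    """
    X_centered = X - X.mean(axis=0)
    # SVD of the centered data: rows of Vt are the principal axes,
    # already sorted by decreasing singular value.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained_variance = (S ** 2) / (len(X) - 1)
    components = Vt[:n_components]
    return X_centered @ components.T, explained_variance

rng = np.random.default_rng(0)
# 200 points stretched strongly along one axis: the first component
# should account for most of the variability.
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [0.0, 0.5]])
scores, var = pca(X, n_components=1)
```

Here `var[0] > var[1]` holds by construction of the SVD, matching the summary's statement that each succeeding component accounts for as much of the remaining variability as possible.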
Application of machine learning in industrial applications (Anish Das)
The group will present an introduction to machine learning and its basics, along with applications of machine learning in industry such as product categorization, improving the accuracy of inertial measurement units using supervised machine learning, data mining techniques, and machine learning for medical diagnosis. They will also discuss the future scope of machine learning.
Machine Learning with Applications in Categorization, Popularity and Sequence... (Nicolas Nicolov)
This document provides an overview of machine learning techniques including categorization, popularity, and sequence labeling applications. It outlines the goals of introducing important machine learning concepts and illustrating techniques through examples. The tutorial aims to be self-contained and explain notation. The outline includes examples of machine learning applications, encoding objects with features, the machine learning framework, linear models, tree models, boosting, ranking evaluation, and sequence labeling with hidden Markov models.
Real-Time Fraud Detection in Payment Transactions (Christian Gügi)
This document discusses building a real-time fraud detection system using big data technologies. It outlines the cyber threat landscape, what anomalies and fraud detection are, and proposes an architecture with a data layer to integrate various sources and an analytics layer using stream processing, rules engines, and machine learning to score transactions in real-time and detect fraud. The system aims to scalably and reliably detect threats for increased security.
Three case studies deploying cluster analysis (Greg Makowski)
Three case studies that include cluster analysis as a component are discussed.
1) Customer description for a credit card attrition model, to describe how to talk to customers.
2) Hotel price optimization. Use clusters to find subsets of similar behavior, and optimize prices within each cluster. Use a neural net as the objective function.
3) Retail supply chain: planning replenishment with 52-week demand curves, using thousands of seasonal "profiles" or clusters.
A 45-minute talk given at the LondonR Meetup in March 2014.
The presentation describes how one might go about an insights-driven data science project using the R language and packages, using an open source dataset.
The document provides an overview of cluster analysis techniques. It discusses the need for segmentation to group large populations into meaningful subsets. Common clustering algorithms like k-means are introduced, which assign data points to clusters based on similarity. The document also covers calculating distances between observations, defining the distance between clusters, and interpreting the results of clustering analysis. Real-world applications of segmentation and clustering are mentioned such as market research, credit risk analysis, and operations management.
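The assign-then-update loop that k-means performs can be sketched in a few lines of NumPy (the `kmeans` helper and its farthest-first initialization are illustrative choices, not taken from the document above): each point is assigned to its nearest centroid by Euclidean distance, each centroid then moves to the mean of its assigned points, and the two steps repeat until the centroids stop moving.

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Minimal k-means: alternate nearest-centroid assignment and mean update."""
    # Farthest-first initialization: start from the first point, then
    # repeatedly add the point farthest from the centroids chosen so far.
    centroids = [X[0]]
    for _ in range(k - 1):
        d = np.min(np.linalg.norm(X[:, None, :] - np.array(centroids)[None, :, :],
                                  axis=2), axis=1)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        # Assignment step: label each point with its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its points.
        new = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

# Two well-separated blobs: the algorithm should recover them exactly.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, (50, 2)),
               rng.normal(5.0, 0.5, (50, 2))])
labels, centroids = kmeans(X, k=2)
```

On cleanly separated data like this, the loop converges in a handful of iterations; with overlapping clusters, the result depends on initialization, which is why production implementations typically restart from several random seeds.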
This presentation provides a brief insight into the need to undertake an analytics project, particularly as it pertains to claims management and fraud. To this end the presentation will touch on the general challenges confronting the property and casualty insurance industry, as well as the challenges and lessons learnt from early adopters of business intelligence. In the face of these challenges analytics holds the potential to generate substantial value as evidenced by several short case study examples. The presentation concludes with a look at the issue of fraud as it pertains to the industry and some of the metrics that are influenced by it.
The presentation draws extensively on, and focuses on, the work and viewpoints of industry participants including: Accenture, IBM, Ernst & Young, Strategy Meets Action, Ordnance Survey, Gartner, Insurance Institute of America, American Institute for Chartered Property Casualty Underwriters, International Risk Management Institute and John Standish Consulting. References are included on each slide as well as on the "References" slides at the end of the presentation.
Neira Jones PCI London January 2013 (Neira Jones)
Data breaches increased 36% in 2012 compared to 2011. Personal information breaches now yield larger amounts of stolen data than payment card breaches. Verizon predicts that social engineering, web application exploits, and authentication failures will be the most common causes of data breaches in 2013. Organizations are also warned that mobile devices and payments will increase risk and that third parties and lost/stolen devices often contribute to breaches. Effective incident response is important for mitigating costs, which a CISO and outside consultants can help reduce. The role of the CIO is expanding to include more legal, financial, security, and vendor management responsibilities.
Using Advanced Analytics to Combat P&C Claims Fraud (Cognizant)
P&C insurers need to embrace predictive and advanced analytics, as well as analytics as a service, to combat the growing complexity and sophistication of claims fraud.
The 2016 Verizon Data Breach Investigations Report analyzed over 100,000 security incidents, including 3,141 confirmed data breaches. Key findings include:
- 89% of breaches were financially or espionage motivated. Phishing and point-of-sale intrusions were the most common initial vectors for breaches.
- External actors were responsible for the majority of breaches, with hacking and malware being the most common threat actions.
- Common asset targets included people falling for phishing scams and user devices like desktops and POS terminals getting infected with malware.
The 2016 Verizon Data Breach Investigations Report analyzed over 100,000 security incidents, including 3,141 confirmed data breaches affecting organizations in 82 countries. Some key points:
- 89% of breaches were financially or espionage motivated.
- The public sector accounted for most incidents, though industries like accommodation and retail saw a larger share of actual breaches due to handling of desirable consumer data.
- The U.S. remained the most affected country, likely due to mandatory reporting requirements, but incidents impacted organizations globally.
- The nine main incident classification patterns from 2014 (including web application attacks, POS intrusions, insider threats, etc.) continued to dominate the threat landscape.
Our ninth Data Breach Investigations Report (DBIR) pulls together incident data from 67 contributors around the world to reveal the biggest IT security risks you’ll face.
How safe is your web application?
How safe is your Network?
How safe is your e-commerce site, which holds customer card and banking details?
When did you last check your internal and external assets for vulnerabilities?
- Insurance companies are increasingly using data and analytics to improve fraud detection. By analyzing large amounts of data from claims, underwriting, and other sources, insurers can identify patterns and flags of potentially fraudulent activity.
- However, adopting new analytic technologies can be costly, and regulations make sharing some data between insurers and departments difficult. Insurers must weigh these challenges against the losses caused by fraud.
- As analytic capabilities advance, fraud detection is moving from a siloed function to one integrated across the insurance lifecycle, from underwriting to claims. This holistic approach allows insurers to gain a more complete view of fraud risks.
This document discusses issues with the accuracy of criminal background checks conducted by commercial background screening companies. It finds that these companies routinely make mistakes that harm job applicants, including misidentifying applicants, reporting sealed or expunged records, omitting key details, providing misleading information, and misclassifying offenses. These errors are attributed to practices like purchasing bulk records without updating them, failing to verify information from subcontractors, using simple matching criteria, and lacking understanding of state criminal justice systems. The document recommends that regulators implement mandatory accuracy measures, define reasonable time for applicants to dispute reports before adverse actions, require agency registration, and investigate major companies and employers for Fair Credit Reporting Act violations. It also calls for states to improve how they provide
Analytics, Big Data and The Cloud II Conference - Kiribatu Labs (Pawel Brzeminski)
Kiribatu is a predictive analytics company that serves the Canadian financial sector, predominantly property and casualty insurance. They help insurers assess risk by analyzing large datasets to predict human behavior. Currently, most insurers still use outdated risk assessment methods from the 1960s-1970s. Kiribatu's predictive models generate an underwriting score to assess risk and profitability, helping insurers optimize their risk sharing pools by predicting which risks require pooling and which can be independently underwritten. The presentation outlines their methodology including data preparation, rating factor analysis, model development, and gain assessment.
On April 25th, the Equal Employment Opportunity Commission issued new “enforcement guidance on the consideration of arrest and conviction records in employment decisions under Title VII of the Civil Rights Act of 1964.”
This is the first guidance on this topic issued by the EEOC in more than 20 years. It reflects the EEOC’s recent litigation trend of trying to limit employers’ use of criminal records in making employment decisions.
An overview of historical trends related to suspect counterfeit and nonconfor... (Kristal Snider)
The document analyzes trends in counterfeit electronic component reporting from 2005 to 2013 using data from the ERAI and GIDEP databases. It finds that counterfeit incident reporting increased significantly after the 2007 BIS study when reporting was mandated compared to voluntary reporting previously. While counterfeit incidents appear correlated to market fluctuations, data sharing needs more widespread participation to accurately measure trends. Most counterfeits are still identified through established screening processes, but more sophisticated fakes may be slipping through.
When a structural failure occurs, the cause of the failure is not immediately apparent. Engineer uses an investigative process to determine the root cause.
Intelligent Transportation Trends chpt.5 - Tolling and EnforcementNovavia Solutions
The term Intelligent Transportation Systems (ITS) was coined over two decades ago to designate applications of information and communication technologies to the operational management of transportation networks. The main promise of ITS has been very consistent over that period: network capacity can be freed up by optimizing traffic controls and empowering users with accurate travel information.
It can be debated how much faith practitioners and policy makers have placed in technology by investing their resources, as well as the extent to which Intelligent Transportation Systems have delivered on their promise. However, there is no question that steady and sometimes spectacular advances in computing technologies and usage trickle down to transportation applications in important ways. As a result, new products and services emerge continuously. They include systems that address the direct needs of networks managers, as well as others that are developed in tangential markets (e.g. automotive) or even through non-market mechanisms (e.g. many mobile web applications).
This talk presentation reviews major trends in information and communication technologies and demonstrate how each of them is driving innovative transportation services. We attempt to envision how those trends might develop in the future, so that we can finally examine some of their implications for travel demand and network management. There lie both challenges and opportunities for transportation engineers and planners, but either way, profound changes appear inevitable.
ORX Analytics & Scenario Forum 2019 - summaryLuke Carrivick
Discover more about the ORX Analytics and Scenarios forum.
On 3-4 July, more than 50 operational risk and scenario experts from banking and insurance met in London for two days of discussion and networking. The ORX Analytics and Scenario Forum takes place each year, and gives participants the chance to talk about the biggest issues facing the industry today.
A TIERED BLOCKCHAIN FRAMEWORK FOR VEHICULAR FORENSICSIJNSA Journal
This document proposes a tiered blockchain framework for vehicular forensics. It identifies key entities involved in the forensics process such as vehicles, manufacturers, technicians, authorities. It describes how these entities interact and how their interactions would be recorded on a permissioned blockchain to generate comprehensive evidence. It also introduces a watchdog entity to prevent collusion and proposes a vehicle state mechanism to verify vehicle sensor data after an accident. Finally, it conducts a security analysis and compares the framework to other proposals.
A TIERED BLOCKCHAIN FRAMEWORK FOR VEHICULAR FORENSICSIJNSA Journal
In this paper, we present a tiered vehicular forensics framework based on permission BlockChain. We integrate all entities involved in the forensics process and record their interactions in the BlockChain to generate comprehensive evidence for settling disputes and appropriating blame. We incorporate a watchdog entity in our tiered framework to prevent collusive tendencies of potentiality liable entities and to prevent exploitation of evidence. Also, we incorporate a state mechanism to prove the state of a smart vehicle when an accident occurs. Furthermore, we conduct a security analysis to demonstrate the resilience of our framework against identified attacks and describe security mechanisms used to achieve key requirements for vehicular forensics. Finally, we comparatively evaluate our framework against existing proposals.
Fraud Analysis and Other Applications of Unsupervised Learning in Property and Casualty Insurance
1. Applications of Unsupervised Learning in Property and Casualty Insurance
with emphasis on fraud analysis
Louise Francis, FCAS, MAAA
Francis Analytics and Actuarial Data Mining, Inc.
www.data-mines.com
Louise.francis@data-mines.com
2. Objectives
Review classic unsupervised learning techniques
Introduce two new unsupervised learning techniques: Random Forest and PRIDIT
Apply the techniques to insurance data:
the automobile fraud data set
a publicly available automobile insurance database
3. Motivation for Topic
New book: Predictive Modeling in Actuarial Science
An introduction to predictive modeling for actuaries and other insurance professionals
Publisher: Cambridge University Press
Hoped publication date: Fall 2012
Chapter on Unsupervised Learning, by Li Yang and Louise Francis
Li Yang: variable grouping (PCA)
Louise Francis: record grouping (clustering)
4. Book Project
Predictive Modeling: a two-volume book project
A joint project leading to a two-volume pair of books on Predictive Modeling in Actuarial Science. Volume 1 is on theory and methods; Volume 2 is on property and casualty applications.
The first volume will be introductory, with basic concepts and a wide range of techniques designed to acquaint actuaries with this sector of problem-solving techniques. The second volume will be a collection of applications to P&C problems, written by authors who are well aware of the advantages and disadvantages of the first volume's techniques but who can explore relevant applications in detail with positive results.
5. The Fraud Study Data
• 1993 AIB closed PIP claims
• Dependent variables
• Suspicion score: expert assessment of likelihood of fraud or abuse
• Predictor variables
• Red flag indicators
• Claim file variables
Francis Analytics and Actuarial Data Mining, Inc., 6/26/2012
6. The Fraud Problem
from: www.agentinsure.com
7. The Fraud Problem (2)
from Coalition Against Insurance Fraud
8. Fraud and Abuse
Planned fraud: staged accidents
Abuse: opportunistic, e.g. exaggerating a claim
9. The Fraud Red Flags
Binary variables that capture characteristics of claims associated with fraud and abuse:
Accident variables (acc01 – acc19)
Injury variables (inj01 – inj12)
Claimant variables (ch01 – ch11)
Insured variables (ins01 – ins06)
Treatment variables (trt01 – trt09)
Lost wages variables (lw01 – lw07)
10. The Red Flag Variables

Subject     Indicator  Description
Accident    ACC01      No report by police officer at scene
            ACC04      Single vehicle accident
            ACC09      No plausible explanation for accident
            ACC10      Claimant in old, low-valued vehicle
            ACC11      Rental vehicle involved in accident
            ACC14      Property damage was inconsistent with accident
            ACC15      Very minor impact collision
            ACC16      Claimant vehicle stopped short
            ACC19      Insured felt set up, denied fault
Claimant    CLT02      Had a history of previous claims
            CLT04      Was an out-of-state accident
            CLT07      Was one of three or more claimants in vehicle
Injury      INJ01      Injury consisted of strain or sprain only
            INJ02      No objective evidence of injury
            INJ03      Police report showed no injury or pain
            INJ05      No emergency treatment was given
            INJ06      Non-emergency treatment was delayed
            INJ11      Unusual injury for auto accident
Insured     INS01      Had history of previous claims
            INS03      Readily accepted fault for accident
            INS06      Was difficult to contact/uncooperative
            INS07      Accident occurred soon after effective date
Lost Wages  LW01       Claimant worked for self or a family member
            LW03       Claimant recently started employment
11. Dependent Variable Problem
Insurance companies frequently do not collect information as to whether a claim is suspected of fraud or abuse, even when claims are referred for special investigation.
Solution: unsupervised learning
12. Supervised Learning
13. Dimension Reduction
ZipCode  FrequencyBI  FrequencyPD  FrequencyComb  PolicyCountNonBusinessUse  VehicleCountNonBusinessUse  SeverityBI  SeverityPD
90095    -            54.50        0.03           2.00                       3.00                        -           1,973.50
93741    -            -            -              1.00                       1.00                        -           -
90015    22.65        43.93        0.04           1.00                       2.00                        10,181.16   2,442.36
90067    15.53        44.41        0.04           3.00                       6.00                        13,146.57   2,565.56
90004    26.71        48.45        0.04           11.00                      17.00                       8,538.56    2,354.08
14. The CAARP Data
This assigned risk automobile data was made available to researchers in 2005 for the purpose of studying the effect of a change in regulation on territories.
Variables contain exposure information (car counts, premium) and claim and loss information (Bodily Injury (BI) claim counts, BI ultimate losses, Property Damage (PD) claim counts, PD ultimate losses).
Each record is a zip code.
Good example of using unsupervised learning for territory construction.
15. R Cluster Library
The “cluster” library from R is used.
Many of the functions in the library are described in Kaufman and Rousseeuw’s (1990) classic book on clustering, Finding Groups in Data.
16. Grouping Records
17. Dissimilarity
Euclidean distance: for each pair of records, the square root of the sum of squared differences between the value of each variable for one record and the corresponding value for the record it is being compared to.
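The Euclidean dissimilarity described on this slide can be sketched in a few lines (the analysis itself uses R's cluster library; this Python version and its function names are only illustrative):

```python
import math

def euclidean_distance(rec_a, rec_b):
    """Square root of the sum of squared differences between
    corresponding variables of two records."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(rec_a, rec_b)))

def dissimilarity_matrix(records):
    """Record-by-record pairwise Euclidean distances."""
    n = len(records)
    return [[euclidean_distance(records[i], records[j]) for j in range(n)]
            for i in range(n)]
```

A matrix like this is what the clustering routines below consume in place of raw data.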
18. RF Similarity
Varies between 0 and 1
Proximity matrix is an output of RF
After a tree is fit, all records are run through the model
If 2 records land in the same terminal node, their proximity is increased by 1
1 - proximity forms a distance
Can be used as an input to clustering and other unsupervised learning procedures
See “Unsupervised Learning with Random Forest Predictors” by Shi and Horvath
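The proximity steps above can be sketched directly from the terminal-node assignments of a fitted forest. Here `leaf_ids` is a hypothetical trees-by-records layout of leaf labels, not output from Salford or R:

```python
def rf_proximity(leaf_ids):
    """leaf_ids: one list per tree, giving the terminal node each record
    falls into. Proximity of two records = fraction of trees in which
    they land in the same terminal node (so it varies between 0 and 1)."""
    n_trees = len(leaf_ids)
    n_recs = len(leaf_ids[0])
    prox = [[0.0] * n_recs for _ in range(n_recs)]
    for tree in leaf_ids:
        for i in range(n_recs):
            for j in range(n_recs):
                if tree[i] == tree[j]:
                    prox[i][j] += 1.0
    return [[p / n_trees for p in row] for row in prox]

def rf_distance(prox):
    """1 - proximity forms a dissimilarity usable by clustering."""
    return [[1.0 - p for p in row] for row in prox]
```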
19. Clustering
Hierarchical clustering
K-Means clustering
This analysis uses k-means
20. K-means Clustering
An iterative procedure is used to assign each record in the data to one of k clusters, where k is specified by the user.
The iteration begins with initial centers or medoids for the k groups.
A dissimilarity measure is used to assign records to a group and to iterate to a final grouping.
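The assignment/update iteration above can be sketched as follows. This is an illustrative one-dimensional toy in Python, not the R cluster routine used in the analysis:

```python
def kmeans(records, centers, n_iter=10):
    """Toy 1-D k-means: assign each record to its nearest center,
    then move each center to the mean of its assigned records."""
    for _ in range(n_iter):
        # assignment step: nearest center by squared distance
        groups = [[] for _ in centers]
        for x in records:
            nearest = min(range(len(centers)), key=lambda c: (x - centers[c]) ** 2)
            groups[nearest].append(x)
        # update step: recompute each center as its group mean
        centers = [sum(g) / len(g) if g else centers[c]
                   for c, g in enumerate(groups)]
    return centers, groups
```

For example, starting with centers 1 and 10 on the data 1, 2, 3, 10, 11, 12, the procedure converges to centers 2 and 11 with the obvious two groups.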
21. R Cluster Output
22. Cluster Plot
23. Silhouette Plot
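A silhouette plot displays, for every record, the silhouette width s = (b - a) / max(a, b), where a is the record's average distance to its own cluster-mates and b its average distance to the nearest other cluster. A minimal sketch, assuming a caller-supplied distance function (names are illustrative, not from R's cluster library):

```python
def silhouette_width(point, own_cluster, other_clusters, dist):
    """s = (b - a) / max(a, b). own_cluster: the point's cluster-mates
    (excluding the point itself); other_clusters: the remaining clusters."""
    a = sum(dist(point, p) for p in own_cluster) / len(own_cluster)
    b = min(sum(dist(point, p) for p in c) / len(c) for c in other_clusters)
    return (b - a) / max(a, b)
```

Widths near 1 indicate a record sits well inside its cluster; widths near 0 or below suggest it lies between clusters.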
30. RF Ranking of the
“Predictors”: Top 10 of 44
Variable  MeanDecreaseGini  Description
acc10     10.50             claimant in old, low-value vehicle
trt01     9.05              large # visits to chiro
inj01     8.64              strain or sprain
inj02     8.64              readily accepted fault
inj05     8.62              no emergency treatment given for injury
acc01     8.55              no police report
clt07     7.47              one of 3 or more claimants in vehicle
inj06     7.44              non-emergency trt delayed
acc15     7.36              very minor collision
trt03     6.82              large # visits to PT
31. Problem: Categorical
Variables
It is not clear how to best perform Principal Components/Factor Analysis on categorical variables
The categories may be coded as a series of binary dummy variables
If the categories are ordered categories, you may lose important information
This is the problem that PRIDIT addresses
32. RIDIT
Variables are ordered so that the lowest value is associated with the highest probability of fraud
Use the cumulative distribution of claims at each value i to create the RIDIT statistic for claim t, value i:

R_{ti} = \sum_{j < i} \hat{p}_{tj} - \sum_{j > i} \hat{p}_{tj}

(the proportion of claims below value i minus the proportion above it)
33. Example: RIDIT for Legal
Representation
Legal Representation
Value  Code  Number  Proportion  Proportion Below  Proportion Above  RIDIT
Yes    1     706     0.504       0.000             0.496             -0.496
No     2     694     0.496       0.504             0.000             0.504
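A small sketch of the RIDIT calculation (illustrative Python; the function name is ours). Given the counts 706 and 694 from the Legal Representation table above, it returns the same scores, -0.496 and 0.504:

```python
def ridit_scores(counts):
    """counts: number of claims at each ordered value of a variable.
    RIDIT for value i = proportion of claims below i
                        minus proportion of claims above i."""
    total = sum(counts)
    props = [c / total for c in counts]
    scores = []
    for i in range(len(props)):
        below = sum(props[:i])
        above = sum(props[i + 1:])
        scores.append(below - above)
    return scores
```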
34. PRIDIT
Use RIDIT statistics in Principal Components Analysis

Component Matrix (Component 1)
SIU            .248
Police Report  .220
At Fault       .709
Legal Rep      .752
Medical Audit  .341
Prior Claim    .406

Extraction Method: Principal Component Analysis. 1 component extracted.
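PRIDIT scores each categorical response by its RIDIT and then runs an ordinary principal components analysis; the first component's loadings weight the flags, as in the component matrix above. A minimal numpy sketch of extracting that first component, assuming a records-by-flags matrix of RIDIT scores (the actual loadings shown on the slide came from the analysis, not from this code):

```python
import numpy as np

def first_principal_component(ridit_matrix):
    """Loadings of the first PC of a (records x flags) matrix of RIDIT scores."""
    X = np.asarray(ridit_matrix, dtype=float)
    Xc = X - X.mean(axis=0)              # center each flag
    cov = Xc.T @ Xc / (len(X) - 1)       # covariance of the RIDIT scores
    vals, vecs = np.linalg.eigh(cov)     # eigh returns ascending eigenvalues
    return vecs[:, -1]                   # eigenvector of the largest eigenvalue
```

On two perfectly correlated flags the loadings come out equal in magnitude, as expected.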
35. PRIDITS of Accident
Flags
36. Fit Tree with PRIDITS for
Each Type of Flag
37. Importance Ranking of
Pridits
38. Importance Ranking of
Factors
39. Add RF and Euclid
Clusters to PRIDIT
Factors
40. Use Salford RF MDS
Top variable in importance (acc10) used as binary dependent
Run a forest with 1,000 trees
Output proximities and MDS
Use MDS scales to cluster (k=3)
Run tree to get importance ranking
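The MDS step above, scaling the RF proximities down to a few coordinates, can be sketched as classical (Torgerson) MDS with numpy. This is an illustrative stand-in, not the Salford routine: it double-centers the squared distances and takes the top eigenvectors as coordinates.

```python
import numpy as np

def classical_mds(dist, n_dims=2):
    """Embed records in n_dims coordinates so that Euclidean distances
    approximate the input distance matrix (e.g. 1 - RF proximity)."""
    D = np.asarray(dist, dtype=float)
    n = len(D)
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered squared distances
    vals, vecs = np.linalg.eigh(B)           # ascending eigenvalues
    idx = np.argsort(vals)[::-1][:n_dims]    # keep the largest n_dims
    L = np.sqrt(np.maximum(vals[idx], 0))    # clip tiny negatives from rounding
    return vecs[:, idx] * L                  # coordinates: scaled eigenvectors
```

The resulting scales can then be fed to k-means (k=3) as on this slide.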
41. MDS Graph
42. Rank of cluster
procedures to Tree
Prediction
43. Labeling Clusters
44. Relation Between
PRIDIT Factor and
Suspicion
45. Next Steps
Add claim file variables
Rerun clusters
Rerun PRIDITS
Do Random Forest proximities on
the RIDITS
Apply the procedures to other
fraud databases
46. PRIDIT REFERENCES
Ai, J., Brockett, P.L., and Golden, L.L. (2009), “Assessing Consumer Fraud Risk in Insurance Claims with Discrete and Continuous Data,” North American Actuarial Journal, 13: 438-458.
Brockett, P.L., Derrig, R.A., Golden, L.L., Levine, A., and Alpert, M. (2002), “Fraud Classification Using Principal Component Analysis of RIDITs,” Journal of Risk and Insurance, 69:3, 341-373.
Brockett, P.L., Xia, X., and Derrig, R.A. (1998), “Using Kohonen’s Self-Organizing Feature Map to Uncover Automobile Bodily Injury Claims Fraud,” Journal of Risk and Insurance, 65: 245-274.
Bross, I.D.J. (1958), “How To Use RIDIT Analysis,” Biometrics, 14: 18-38.
Chipman, H., George, E.I., and McCulloch, R.E. (2006), “Bayesian Ensemble Learning,” Neural Information Processing Systems.
Lieberthal, R.D. (2008), “Hospital Quality: A PRIDIT Approach,” Health Services Research, 43:3, 988-1005.
47. Questions?