Morse-Smale Regression Extensions for Risk Modeling
Colleen M. Farrelly
Introduction
• Subgroups are ubiquitous in scientific research and actuarial science.
• Risk is not uniform.
• Risk can vary in both degree and kind: high risk on some factors may amount to less overall risk than lower risk on other factors.
• Piecewise regression is one method that can capture this phenomenon accurately.
• Morse-Smale regression is a topologically based piecewise regression method that has shown promise on various Tweedie-distributed outcomes, including distributions commonly used in risk modeling.
• The method currently uses elastic net and generalized linear models to fit the regression pieces to Morse-Smale-complex-based partitions.
• Many machine learning extensions of regression can capture multivariate trends (such as interactions) in the data; neither elastic net nor generalized linear models capture these unless the terms are specified explicitly.
• Extending Morse-Smale regression to machine-learning-based models could therefore improve both accuracy and the understanding of risk.
Tweedie Regression Overview
• Tweedie model framework (many biological/social count variables):
  • Variance function: $\mathrm{Var}(y) = \phi\,\mu^{\xi}$
  • where φ is the dispersion parameter (associated with extra zeros in the model; here = 1.5)
  • µ is the mean
  • ξ is the Tweedie parameter (controls the mass near zero and the tail fatness of the non-zero part of the distribution)
• Many exponential-dispersion distributions belong to the Tweedie family: normal (ξ = 0), Poisson (ξ = 1), compound Poisson-gamma (1 < ξ < 2), gamma (ξ = 2), and others
• Examples
  • Number of students enrolled by an advisor in a month
  • Insurance claim payout
  • Heroin use per month
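To make the relationship Var(y) = φµ^ξ concrete, the sketch below draws Tweedie samples for the compound Poisson-gamma case (1 < ξ < 2, which includes the ξ = 1.5 setting used in the simulation) by summing a Poisson number of gamma variables, then compares the empirical mean and variance with µ and φµ^ξ. It is a minimal NumPy sketch under the standard Poisson-sum-of-gammas construction; the function names are hypothetical and not taken from the paper.

```python
import numpy as np


def tweedie_variance(mu, phi, xi):
    """Tweedie variance function: Var(y) = phi * mu**xi."""
    return phi * np.power(mu, xi)


def sample_compound_poisson_gamma(mu, phi, xi, size, rng):
    """Draw Tweedie samples for 1 < xi < 2 as a Poisson sum of gamma variables."""
    if not 1.0 < xi < 2.0:
        raise ValueError("the compound Poisson-gamma form needs 1 < xi < 2")
    lam = mu ** (2.0 - xi) / (phi * (2.0 - xi))    # Poisson rate (e.g. claim count)
    alpha = (2.0 - xi) / (xi - 1.0)                # gamma shape of each summand
    theta = phi * (xi - 1.0) * mu ** (xi - 1.0)    # gamma scale of each summand
    counts = rng.poisson(lam, size=size)
    y = np.zeros(size)                             # zero counts give exact zeros
    pos = counts > 0
    # A sum of n iid Gamma(alpha, theta) draws is Gamma(n * alpha, theta).
    y[pos] = rng.gamma(counts[pos] * alpha, theta)
    return y


rng = np.random.default_rng(42)
y = sample_compound_poisson_gamma(mu=2.0, phi=1.5, xi=1.5, size=200_000, rng=rng)
print("mean:", round(y.mean(), 3), "(target 2.0)")
print("variance:", round(y.var(), 3), "(target", round(tweedie_variance(2.0, 1.5, 1.5), 3), ")")
print("share of exact zeros:", round((y == 0).mean(), 3))
```

The point mass at zero comes from draws with no Poisson events, which is why the Tweedie family suits outcomes such as claim payouts where many policies have no claims.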
Morse-Smale Regression Overview: I
• To build intuition:
  • Imagine a soccer player kicking a ball across the ground of a hilly field to explore it.
  • The high and low points (maxima and minima) determine the paths the ball takes and where it comes to rest.
  • These paths define which parts of the field share common hills and valleys.
  • They are gradient paths of the height function over the field's topological space.
  • The regions they define form the Morse-Smale complex of the field, partitioning it into different regions (clusters).
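The ball-rolling picture can be turned into a small partitioning routine: on a k-nearest-neighbor graph, follow each point's steepest-descent path to a minimum and its steepest-ascent path to a maximum, then label the point by the (minimum, maximum) pair it reaches. This is a simplified NumPy sketch in the spirit of the graph-based approximation of Gerber et al. (2013), not the implementation used in the paper; the brute-force neighbor search, the choice of k, and the function name are assumptions.

```python
import numpy as np


def morse_smale_labels(X, y, k=8):
    """Label each point by the (minimum, maximum) its gradient paths reach.

    Gradients are approximated on a k-nearest-neighbor graph: a point's
    steepest-ascent (descent) successor is the neighbor with the largest
    positive (most negative) slope; points without one are local maxima (minima).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # O(n^2) memory
    np.fill_diagonal(dist, np.inf)
    nbrs = np.argsort(dist, axis=1)[:, :k]

    up = np.arange(n)    # steepest-ascent successor (self = local maximum)
    down = np.arange(n)  # steepest-descent successor (self = local minimum)
    for i in range(n):
        js = nbrs[i]
        slope = (y[js] - y[i]) / dist[i, js]
        if slope.max() > 0:
            up[i] = js[np.argmax(slope)]
        if slope.min() < 0:
            down[i] = js[np.argmin(slope)]

    def follow(succ, i):
        # y strictly increases (or decreases) along a path, so this terminates
        while succ[i] != i:
            i = succ[i]
        return int(i)

    return [(follow(down, i), follow(up, i)) for i in range(n)]


# A 1-D "valley": one minimum in the middle, maxima at both ends.
x = np.linspace(0.0, 1.0, 101)[:, None]
labels = morse_smale_labels(x, (x[:, 0] - 0.5) ** 2, k=2)
print(sorted(set(labels)))  # -> [(50, 0), (50, 100)]
```

Each distinct (minimum, maximum) pair is one cell of the approximate Morse-Smale complex; here the valley splits into a left and a right cell that share the same minimum.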
Morse-Smale Regression Overview: II
• Morse-Smale clusters partition the data space into sections that share a common minimum and maximum, based on the flow of the function.
• Groups can be visualized in low-dimensional space to see commonalities and differences (top right).
• Groups can also be examined through differences in predictor values (bottom right).
• This gives users a useful visualization tool for understanding the data.
Example: 2 groups, 3 predictors
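Examining groups through their predictor values amounts to comparing predictor summaries across partitions. A minimal NumPy sketch, assuming the partition labels are already available as an array (for example, from a Morse-Smale clustering); the helper name and the toy values are hypothetical.

```python
import numpy as np


def group_profiles(X, labels):
    """Return (groups, means) where means[g, j] is the mean of predictor j in group g."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    groups = np.unique(labels)
    means = np.vstack([X[labels == g].mean(axis=0) for g in groups])
    return groups, means


# Toy data: 2 groups, 3 predictors (hypothetical values)
X = np.array([[1.0, 10.0, 0.0],
              [3.0, 12.0, 1.0],
              [8.0, 10.0, 5.0],
              [6.0, 14.0, 7.0]])
labels = np.array([0, 0, 1, 1])
groups, means = group_profiles(X, labels)
print(means)  # rows: groups 0 and 1; columns: predictors 1-3
# Gap between the two groups per predictor, in units of the overall standard deviation
print((means[1] - means[0]) / X.std(axis=0))
```

Standardizing the gaps makes predictors on different scales comparable when deciding which ones separate the groups most.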
Extending Morse-Smale Regression
• Multivariate algorithms to fit partitioned regression models:
  • Random forest
    • Bagged ensemble of tree models
    • Akin to combining the novel summaries written by a class whose students were each randomly assigned a few chapters
  • Boosted regression
    • Model built by iteratively adding main effects and interaction terms
    • Akin to guessing a puzzle's picture by adding key pieces until the picture is mostly there
  • Homotopy LASSO
    • Extends the penalized regression model (LASSO) through homotopy estimation methods
    • Akin to a blindfolded person navigating around obstacles between two set points by following a rope
  • Conditional inference tree
    • Tree method that partitions the space by testing for independence between each covariate and the outcome
  • Extreme learning machine
    • Single-hidden-layer feed-forward neural network in which the mapping from the input layer to the hidden layer is random
    • Has universal approximation properties
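Each extension keeps the Morse-Smale partition and swaps the model fitted inside each piece. The sketch below shows that structure with a pluggable model factory: any object with scikit-learn-style `fit(X, y)` and `predict(X)` methods (for example `sklearn.ensemble.RandomForestRegressor`) could be plugged in. To stay self-contained it uses a small least-squares model. The class names are hypothetical, and routing new points to the partition of their nearest training point is an assumption made for illustration, not the paper's prediction rule.

```python
import numpy as np


class LeastSquares:
    """Ordinary least squares with intercept (stand-in for any fit/predict model)."""

    def fit(self, X, y):
        A = np.column_stack([np.ones(len(X)), X])
        self.coef_, *_ = np.linalg.lstsq(A, y, rcond=None)
        return self

    def predict(self, X):
        return np.column_stack([np.ones(len(X)), X]) @ self.coef_


class PartitionedRegressor:
    """Fit one model per partition; route new points by their nearest training point."""

    def __init__(self, make_model):
        self.make_model = make_model  # e.g. lambda: RandomForestRegressor()

    def fit(self, X, y, labels):
        self.X_ = np.asarray(X, dtype=float)
        self.labels_ = np.asarray(labels)
        y = np.asarray(y, dtype=float)
        self.models_ = {
            g: self.make_model().fit(self.X_[self.labels_ == g], y[self.labels_ == g])
            for g in np.unique(self.labels_)
        }
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        d = np.linalg.norm(X[:, None, :] - self.X_[None, :, :], axis=-1)
        groups = self.labels_[np.argmin(d, axis=1)]  # assumed routing rule
        out = np.empty(len(X))
        for g, model in self.models_.items():
            mask = groups == g
            if mask.any():
                out[mask] = model.predict(X[mask])
        return out


# Piecewise-linear toy target with a break at x = 0
x = np.linspace(-1.0, 1.0, 201)[:, None]
y = np.where(x[:, 0] < 0, -2.0 * x[:, 0], 3.0 * x[:, 0])
pr = PartitionedRegressor(LeastSquares).fit(x, y, labels=(x[:, 0] >= 0).astype(int))
print(pr.predict(np.array([[-0.5], [0.5]])))
```

Keeping the per-piece model behind a factory is what lets the same partition be reused with random forests, boosting, or any other learner that exposes `fit` and `predict`.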
Simulation and Swedish Motor Insurance
• Simulation
  • Design parameters:
    • 4 true predictors and 11 noise variables
    • Sample size of 10,000
    • Tweedie-distributed outcome, with the Tweedie parameter varied over (1, 1.5, 2) and the dispersion over (1, 2, 4)
    • Nature of the predictor relationships varied (4 main effects, 2 interaction effects, or 2 main effects plus 1 interaction effect)
  • Each trial was run 10 times with a 70/30 training/test split.
  • Mean squared error (MSE) was used to assess model accuracy.
• Swedish Third-Party Motor Insurance, 1977
  • 2,182 observations with 6 predictors (kilometers traveled per year, geographic zone, bonus, car make, number of years insured, total claims)
  • MSE assessed for all models on a 70/30 training/test split
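The simulation protocol described above can be sketched as a design grid plus a repeated hold-out loop. This is a minimal NumPy sketch: how each condition's data are generated (coefficients, predictor distributions, link function) is not specified on the slide, so the random data and the mean-only `baseline` below are placeholders, and all names are hypothetical.

```python
import itertools
import numpy as np

# Design factors listed on the slide
XI_VALUES = (1.0, 1.5, 2.0)    # Tweedie parameter
PHI_VALUES = (1.0, 2.0, 4.0)   # dispersion
RELATIONSHIPS = ("4 main effects", "2 interaction effects", "2 main + 1 interaction")
N_OBS, N_TRUE, N_NOISE = 10_000, 4, 11


def design_grid():
    """All 3 x 3 x 3 = 27 simulation conditions."""
    return list(itertools.product(XI_VALUES, PHI_VALUES, RELATIONSHIPS))


def split_70_30(n, rng):
    """Random 70/30 train/test index split."""
    idx = rng.permutation(n)
    cut = int(round(0.7 * n))
    return idx[:cut], idx[cut:]


def mse(y_true, y_pred):
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))


def repeated_holdout_mse(X, y, fit_predict, n_repeats=10, seed=0):
    """Average test MSE over repeated random 70/30 splits."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_repeats):
        train, test = split_70_30(len(y), rng)
        y_hat = fit_predict(X[train], y[train], X[test])
        scores.append(mse(y[test], y_hat))
    return float(np.mean(scores))


def baseline(X_tr, y_tr, X_te):
    """Mean-only predictor (placeholder for a real model)."""
    return np.full(len(X_te), y_tr.mean())


# Placeholder data of the stated size; not the paper's generating process
rng = np.random.default_rng(1)
X = rng.normal(size=(N_OBS, N_TRUE + N_NOISE))
y = rng.gamma(2.0, 1.0, size=N_OBS)
print(len(design_grid()), "conditions; baseline MSE:", round(repeated_holdout_mse(X, y, baseline), 3))
```

A mean-only baseline like this gives a floor against which the piecewise and non-piecewise models can be compared within each condition.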
Simulation Results
• Most multivariate Morse-Smale regression algorithms compare favorably with the original Morse-Smale regression algorithm, particularly in trials with linear or mixed predictor relationships and in trials with lower dispersion.
• Some of these models also outperformed their non-piecewise counterparts.
• Even when they perform similarly to their non-piecewise counterparts, they allow predictor importance to be compared across risk subgroups and provide ways to visualize these differences (random forest model shown below).
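Comparing predictor importance across subgroups can be done with any model-agnostic importance measure computed separately in each partition. The sketch below uses permutation importance (the increase in MSE when one predictor is shuffled) with a least-squares model inside each group. This is a swapped-in technique for illustration, not necessarily the random forest importance measure used in the paper; it is computed in-sample for brevity, and all names and toy data are hypothetical.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit with intercept; returns a predict function."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Z: np.column_stack([np.ones(len(Z)), Z]) @ coef


def permutation_importance_by_group(X, y, labels, rng, n_repeats=5):
    """Return {group: array of mean MSE increases, one per predictor}."""
    result = {}
    for g in np.unique(labels):
        Xg, yg = X[labels == g], y[labels == g]
        predict = fit_ols(Xg, yg)
        base = np.mean((yg - predict(Xg)) ** 2)
        imp = np.zeros(X.shape[1])
        for j in range(X.shape[1]):
            for _ in range(n_repeats):
                Xp = Xg.copy()
                Xp[:, j] = rng.permutation(Xp[:, j])  # break predictor j's link to y
                imp[j] += np.mean((yg - predict(Xp)) ** 2) - base
        result[g] = imp / n_repeats
    return result


# Toy data: group 0 depends on predictor 0, group 1 on predictor 2
rng = np.random.default_rng(7)
X = rng.normal(size=(2000, 3))
labels = np.repeat([0, 1], 1000)
y = np.where(labels == 0, 3.0 * X[:, 0], 3.0 * X[:, 2]) + rng.normal(scale=0.1, size=2000)
for g, imp in permutation_importance_by_group(X, y, labels, rng).items():
    print(g, np.round(imp, 2))
```

In practice the importances would be computed on held-out data, so that a subgroup's importance profile reflects predictive value rather than fit to the training sample.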
Swedish Motor Insurance Results: I
• Most machine learning models perform well, and the multivariate Morse-Smale regression methods perform exceptionally well.
Swedish Motor Insurance Results: II
• Three distinct subgroups were found, and risk type varied significantly among them.
  • Group 1: relatively high dependence on make and number of claims
  • Group 2: relatively high dependence on bonus and number of years insured
  • Group 3: almost solely dependent on number of claims and geographic zone
Conclusions
• Multivariate Morse-Smale regression models typically:
  • Outperform the original Morse-Smale regression algorithm
  • Perform comparably to non-partitioned models built with the same machine learning algorithm
• Multivariate Morse-Smale regression models provide subgroup-based analytics and the ability to differentiate risk structure, which can help actuaries:
  • Better understand risk
  • Create models based on insurance policy risk groups (as well as risk level)
  • Visualize this process so that others in the industry can understand the models (less black-box)
• However, some black-box algorithms perform better on Tweedie regression problems (particularly the KNN regression ensembles of Farrelly, 2017); these methods do not allow risk factors to be visualized or compared.
• Large sample sizes are needed for good performance, but most insurance datasets are large enough to avoid potential convergence issues.
References
• Talk is a summary of:
  • Farrelly, C. M. (2017). Extensions of Morse-Smale Regression with Application to Actuarial Science. arXiv preprint arXiv:1708.05712. Accepted December 2017 for publication by the Casualty Actuarial Society.
• Selected references from the 2017 Farrelly paper:
  • De Jong, P., & Heller, G. Z. (2008). Generalized linear models for insurance data (Vol. 10). Cambridge: Cambridge University Press.
  • Farrelly, C. M. (2017). KNN Ensembles for Tweedie Regression: The Power of Multiscale Neighborhoods. arXiv preprint arXiv:1708.02122.
  • Gerber, S., Rübel, O., Bremer, P. T., Pascucci, V., & Whitaker, R. T. (2013). Morse-Smale regression. Journal of Computational and Graphical Statistics, 22(1), 193-214.
  • McGee, V. E., & Carleton, W. T. (1970). Piecewise regression. Journal of the American Statistical Association, 65(331), 1109-1124.
  • Tomoda, K., Morino, K., Murata, H., Asaoka, R., & Yamanishi, K. (2016). Predicting Glaucomatous Progression with Piecewise Regression Model from Heterogeneous Medical Data. HEALTHINF, 2016.
