Morse-Smale Regression
Extensions for Risk
Modeling
Colleen M. Farrelly
Introduction
 Subgroups are ubiquitous in scientific research and actuarial science.
 Risk is not uniform.
 Risk types can vary in degree and kind, and high risk on some factors may amount
to less overall risk than lower risk on other factors.
 Piecewise regression is one method that can accurately capture this
phenomenon.
 Morse-Smale regression is a topologically-based piecewise regression method
that has shown promise on various Tweedie-distributed outcomes, including
common distributions used in modeling risk.
 This method currently employs elastic net and generalized linear modeling to fit
the regression pieces to Morse-Smale-complex-based partitions.
 Many machine learning extensions of regression exist and can capture multivariate
trends in the data, a known limitation of both elastic net and generalized linear
modeling.
 Extending Morse-Smale regression to machine-learning-based models can
potentially improve accuracy and understanding of risk.
Tweedie Regression Overview
• Tweedie model framework (many
biological/social count variables):
• Var(y) = φµ^ξ
• Where φ is the dispersion parameter (extra
zeros in model, here=1.5)
• µ is the mean
• ξ is the Tweedie parameter (increased mass
near zero and fatness of non-zero
distributions)
• Many exponential family distributions are special or limiting cases of
Tweedie distributions (normal, Poisson,
gamma, compound Poisson-gamma…)
• Examples
• Number of students enrolled by advisor in a
month
• Insurance claim payout
• Heroin use per month
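To make the roles of µ, φ, and ξ concrete, here is a minimal sketch that draws compound Poisson-gamma outcomes (the Tweedie case with 1 < ξ < 2) using only the Python standard library. It assumes the standard variance function Var(y) = φµ^ξ and the usual mapping from (µ, φ, ξ) to a Poisson claim count and gamma claim sizes; the function names `sample_poisson` and `sample_tweedie` and the parameter values are illustrative, not taken from the talk.

```python
import math
import random

def sample_poisson(lam, rng):
    """Draw a Poisson count with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def sample_tweedie(mu, phi, xi, rng):
    """One draw with mean mu and variance phi * mu**xi, valid for 1 < xi < 2."""
    lam = mu ** (2 - xi) / (phi * (2 - xi))    # Poisson rate: expected number of claims
    shape = (2 - xi) / (xi - 1)                # gamma shape of each claim size
    scale = phi * (xi - 1) * mu ** (xi - 1)    # gamma scale of each claim size
    n_claims = sample_poisson(lam, rng)
    # A sum over zero claims is exactly 0, which produces the point mass at zero.
    return sum(rng.gammavariate(shape, scale) for _ in range(n_claims))

rng = random.Random(0)
mu, phi, xi = 2.0, 1.5, 1.5                    # illustrative values only
draws = [sample_tweedie(mu, phi, xi, rng) for _ in range(20000)]
mean = sum(draws) / len(draws)
variance = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
zero_share = sum(d == 0 for d in draws) / len(draws)
print(mean, variance, zero_share)
```

The sample mean should land near µ, the sample variance near φµ^ξ, and the share of exact zeros near exp(-λ), which is the kind of zero-heavy, right-skewed outcome seen in claim payouts.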
Morse-Smale Regression Overview: I
 To build intuition:
 Imagine a soccer player kicking a ball on the ground of a hilly field to explore the
field.
 The high and low points (maxima and minima) determine where the ball will come to rest.
 These paths of the ball define which parts of the field share common hills and valleys.
 These paths are actually gradient paths defined by height on the field’s topological
space.
 The spaces they define are the Morse-Smale complex of the field, partitioning it
into different regions (clusters).
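The ball-rolling intuition can be sketched on sampled data: connect each point to its nearest neighbors, then walk uphill and downhill along the neighbor graph until no neighbor is higher or lower. Points that end at the same (minimum, maximum) pair fall in the same partition. This is a simplified, assumed illustration in pure Python, not the implementation used in the talk or in Gerber et al.; the helper names are hypothetical, and the walk uses the raw height difference rather than a true slope.

```python
import math

def knn_graph(points, k):
    """For each point, the indices of its k nearest neighbors (brute force)."""
    neighbors = []
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        neighbors.append(others[:k])
    return neighbors

def follow(i, values, neighbors, sign):
    """Walk from point i uphill (sign=+1) or downhill (sign=-1) until stuck."""
    while True:
        best = max(neighbors[i], key=lambda j: sign * (values[j] - values[i]))
        if sign * (values[best] - values[i]) <= 0:
            return i                      # no neighbor improves: a local extremum
        i = best

def morse_smale_labels(points, values, k=2):
    """Label each point with the (minimum, maximum) indices its walks reach."""
    nb = knn_graph(points, k)
    return [(follow(i, values, nb, -1), follow(i, values, nb, +1))
            for i in range(len(points))]

# Toy "field": heights follow sin(x) on a 1-D grid from 0 to 12.5.
points = [(0.25 * i,) for i in range(51)]
values = [math.sin(p[0]) for p in points]
labels = morse_smale_labels(points, values)
print(sorted(set(labels)))
```

On this toy field the walks split the grid into five regions, one for each monotone stretch between neighboring low and high points.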
Morse-Smale Regression Overview: II
 Morse-Smale clusters
partition data space into
sections with common
minima and maxima
based on the function flow.
 Groups can be visualized in
low-dimensional space to
see commonalities and
differences (top right).
 Groups can also be
examined based on
differences in predictor
values (bottom right).
 This provides users with a
good visualization tool to
understand the data.
Example: 2 groups,
3 predictors
Extending Morse-Smale Regression
 Multivariate algorithms to fit partitioned regression models
 Random forest
 Bagged ensemble of tree models
 Akin to combining the novel summaries from a class in which each student was randomly assigned a few chapters
 Boosted regression
 Iteratively added model of main effects and interaction terms
 Akin to guessing a puzzle’s picture by adding key pieces until the picture is mostly there
 Homotopy LASSO
 Extends penalized regression model (LASSO) through homotopy estimation methods
 Akin to a blindfolded person navigating around obstacles between two set points by
following a rope
 Conditional inference tree
 Tree method that partitions space by assessing covariate independence
 Extreme learning machine
 Single-layer feed-forward neural networks based on random mapping between layers
 Has universal approximation properties
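Whatever learner is chosen, the extension follows one pattern: group observations by partition label, fit one model per group, and route each prediction to the model for its partition. The sketch below illustrates that pattern with a one-predictor least-squares line as a deliberately simple stand-in for random forest, boosting, and the other learners listed above. The names `fit_line`, `fit_piecewise`, and `predict`, and the toy data, are hypothetical.

```python
from collections import defaultdict

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; a stand-in for any per-partition learner."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def fit_piecewise(labels, xs, ys, fit=fit_line):
    """Fit one model per partition label and return them keyed by label."""
    groups = defaultdict(lambda: ([], []))
    for label, x, y in zip(labels, xs, ys):
        groups[label][0].append(x)
        groups[label][1].append(y)
    return {label: fit(gx, gy) for label, (gx, gy) in groups.items()}

def predict(models, label, x):
    """Route the prediction to the model fitted on the point's partition."""
    return models[label](x)

# Toy data: two partitions whose outcome depends on x in opposite directions.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
labels = ["A"] * 4 + ["B"] * 4
ys = [2 * x for x in xs[:4]] + [20 - 3 * x for x in xs[4:]]
models = fit_piecewise(labels, xs, ys)
print(predict(models, "A", 1.5), predict(models, "B", 5.5))
```

Swapping in a different learner only requires passing another `fit` function that takes training predictors and outcomes and returns a callable model.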
Simulation and Swedish Motor Insurance
 Simulation
 Simulation design parameters
 4 true predictors, 11 noise variables
 Sample size set to 10,000
 Outcome Tweedie-distributed, with Tweedie parameter varying (1, 1.5, 2) and dispersion
(1, 2, 4)
 Nature of predictor relationships varied (4 main effects, 2 interaction effects, or a
combination of 2 main effects and 1 interaction effect)
 Each trial was run 10 times with a 70/30 training/test split.
 Mean square error (MSE) was used to assess model accuracy.
 Swedish 3rd Party Motor Insurance 1977
 2182 observations with 6 predictors (kilometers traveled per year, geographic zone,
bonus, car model make, number of years insured, total claims)
 MSE assessed for all models based on 70/30 training/test split
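The evaluation protocol described above (repeated 70/30 training/test splits scored by MSE) can be sketched as follows. This is an assumed outline rather than the code behind the reported results; `split_70_30`, `repeated_mse`, and the mean-only baseline `fit_mean` are hypothetical names, and the toy data is made up purely to exercise the functions.

```python
import random

def split_70_30(n, rng):
    """Shuffle indices 0..n-1 and return (train, test) with a 70/30 split."""
    indices = list(range(n))
    rng.shuffle(indices)
    cut = (7 * n) // 10
    return indices[:cut], indices[cut:]

def mse(y_true, y_pred):
    """Mean square error between observed and predicted outcomes."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def repeated_mse(xs, ys, fit, n_trials=10, seed=0):
    """Fit on the training 70% and score on the held-out 30%, n_trials times."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_trials):
        train, test = split_70_30(len(ys), rng)
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(mse([ys[i] for i in test], [model(xs[i]) for i in test]))
    return scores

def fit_mean(train_x, train_y):
    """Baseline learner: always predict the training mean."""
    mean_y = sum(train_y) / len(train_y)
    return lambda x: mean_y

xs = list(range(100))
ys = [2.0 * x for x in xs]                 # toy outcome, for illustration only
scores = repeated_mse(xs, ys, fit_mean)
print(sum(scores) / len(scores))
```

Averaging the per-trial scores gives one test MSE per model and design condition, which is the quantity compared across methods.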
Simulation Results
 Most multivariate Morse-Smale regression algorithms perform well against the
original Morse-Smale regression algorithm, particularly for trials involving linear or
mixed predictor relationships and trials with lower dispersion.
 Some of these models outperformed their non-piecewise counterpart models, as
well.
 Even when algorithms perform similarly to non-piecewise counterparts, they
provide a comparison of predictor importance among different risk subgroups and
methods to visualize these differences (random forest model shown below).
Swedish Motor Insurance Results: I
 Most machine learning models perform well, and multivariate Morse-Smale
regression methods perform exceptionally well.
Swedish Motor Insurance Results: II
 Three distinct subgroups were found, and risk type varied significantly
between them.
 Group 1: relatively high dependence on make and number of claims
 Group 2: relatively high dependence on bonus and number of years insured
 Group 3: almost solely dependent on number of claims and geographic zone
Conclusions
 Multivariate Morse-Smale regression models typically:
 Outperform the original Morse-Smale regression algorithm
 Perform comparably to the non-partitioned models built with the same machine
learning algorithm.
 Multivariate Morse-Smale regression models support subgroup-based analytics
and can distinguish risk structures across subgroups, which can help actuaries:
 Better understand risk
 Create models based on insurance policy risk groups (as well as risk level)
 Visualize this process to help others within the industry understand the models
(less black-box)
 However, some black-box algorithms perform better on Tweedie regression
problems (particularly the KNN regression ensembles of Farrelly, 2017), but these
methods don’t allow for visualization or comparison of risk factors.
 Large sample sizes are needed for good performance, but most insurance
datasets are large enough to circumvent potential convergence issues.
References
 Talk is a summary of:
 Farrelly, C. M. (2017). Extensions of Morse-Smale Regression with Application to
Actuarial Science. arXiv preprint arXiv:1708.05712. Accepted Dec 2017 for
publication by the Casualty Actuarial Society.
 Selected references from 2017 Farrelly paper:
 De Jong, P., & Heller, G. Z. (2008). Generalized linear models for insurance data
(Vol. 10). Cambridge: Cambridge University Press.
 Farrelly, C. M. (2017). KNN Ensembles for Tweedie Regression: The Power of
Multiscale Neighborhoods. arXiv preprint arXiv:1708.02122.
 Gerber, S., Rübel, O., Bremer, P. T., Pascucci, V., & Whitaker, R. T. (2013).
Morse-Smale regression. Journal of Computational and Graphical Statistics, 22(1),
193-214.
 McGee, V. E., & Carleton, W. T. (1970). Piecewise regression. Journal of the
American Statistical Association, 65(331), 1109-1124.
 Tomoda, K., Morino, K., Murata, H., Asaoka, R., & Yamanishi, K. (2016). Predicting
Glaucomatous Progression with Piecewise Regression Model from Heterogeneous
Medical Data. HEALTHINF, 2016.

Determinants of health, dimensions of health, positive health and spectrum of...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 

Morse-Smale Regression for Risk Modeling

  • 1. Morse-Smale Regression Extensions for Risk Modeling Colleen M. Farrelly
  • 2. Introduction
      • Subgroups are ubiquitous in scientific research and actuarial science.
      • Risk is not uniform.
      • Risk types can vary in degree and kind, and high risk on some factors is not as high a risk overall as lower risk on other factors.
      • Piecewise regression is one method that can accurately capture this phenomenon.
      • Morse-Smale regression is a topologically based piecewise regression method that has shown promise on various Tweedie-distributed outcomes, including common distributions used in modeling risk.
      • This method currently employs elastic net and generalized linear modeling to fit the regression pieces to Morse-Smale-complex-based partitions.
      • Many machine learning extensions of regression exist and can capture multivariate trends in the data, which is a limitation of both elastic net and generalized linear modeling.
      • Extending Morse-Smale regression to machine-learning-based models can potentially improve accuracy and understanding of risk.
  • 3. Tweedie Regression Overview
      • Tweedie model framework (many biological/social count variables):
          • Var(y) = φµ^ξ
          • Where φ is the dispersion parameter (extra zeros in model, here = 1.5)
          • µ is the mean
          • ξ is the Tweedie parameter (increased mass near zero and fatness of non-zero distributions)
      • Many exponential-family distributions converge to Tweedie distributions (normal, Poisson, gamma, compound Poisson-gamma, ...)
      • Examples
          • Number of students enrolled by advisor in a month
          • Insurance claim payout
          • Heroin use per month
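The variance relationship on this slide can be checked numerically. The following is a minimal sketch, not the author's code: it assumes NumPy and uses the standard compound Poisson-gamma construction, which only covers 1 < ξ < 2. The function name `sample_tweedie` and the values µ = 2, φ = 1.5, ξ = 1.5 are illustrative assumptions.

```python
import numpy as np

def sample_tweedie(mu, phi, xi, size, rng):
    """Draw compound Poisson-gamma samples (hypothetical helper, 1 < xi < 2 only)."""
    if not 1.0 < xi < 2.0:
        raise ValueError("this sampler only covers 1 < xi < 2")
    lam = mu ** (2.0 - xi) / (phi * (2.0 - xi))   # Poisson rate: number of events
    shape = (2.0 - xi) / (xi - 1.0)               # gamma shape of one event size
    scale = phi * (xi - 1.0) * mu ** (xi - 1.0)   # gamma scale of one event size
    counts = rng.poisson(lam, size=size)
    y = np.zeros(size)
    positive = counts > 0
    # A sum of n independent gamma(shape, scale) draws is gamma(n * shape, scale).
    y[positive] = rng.gamma(counts[positive] * shape, scale)
    return y

rng = np.random.default_rng(0)
mu, phi, xi = 2.0, 1.5, 1.5                       # assumed illustration values
y = sample_tweedie(mu, phi, xi, 200_000, rng)

empirical_mean = y.mean()
empirical_var = y.var()
theoretical_var = phi * mu ** xi                  # Var(y) = phi * mu^xi
zero_share = (y == 0).mean()                      # point mass at exactly zero

print(f"mean {empirical_mean:.3f} (target {mu})")
print(f"variance {empirical_var:.3f} (target {theoretical_var:.3f})")
print(f"share of exact zeros {zero_share:.3f}")
```

The exact zeros come from draws with no events, which is why this family suits outcomes such as claim payouts where many policies pay nothing.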
  • 4. Morse-Smale Regression Overview: I
      • To build intuition:
          • Imagine a soccer player kicking a ball on the ground of a hilly field to explore the field.
          • The high and low points (maxima and minima) determine where the ball will come to rest.
          • These paths of the ball define which parts of the field share common hills and valleys.
          • These paths are actually gradient paths defined by height on the field’s topological space.
          • The spaces they define are the Morse-Smale complex of the field, partitioning it into different regions (clusters).
  • 5. Morse-Smale Regression Overview: II
      • Morse-Smale clusters partition data space into sections with common minimums and maximums based on the function flow.
      • Groups can be visualized in low-dimensional space to see commonalities and differences (top right).
      • Groups can also be examined based on differences in predictor values (bottom right).
      • This provides users with a good visualization tool to understand the data.
      • Example: 2 groups, 3 predictors
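The partitioning idea in the two overview slides can be sketched on sampled data. This is a simplified illustration under stated assumptions, not the algorithm from the cited papers: it assumes NumPy, approximates gradient paths by repeatedly stepping to the highest (or lowest) of the k nearest neighbors, and labels each point by the (minimum, maximum) pair its two paths reach. The names `follow` and `morse_smale_labels` are hypothetical.

```python
import numpy as np

def follow(values, neighbors, start, sign):
    """Step to the best neighbor (sign=+1 uphill, sign=-1 downhill) until stuck."""
    current = start
    while True:
        candidates = neighbors[current]
        best = candidates[np.argmax(sign * values[candidates])]
        if sign * values[best] <= sign * values[current]:
            return int(current)          # no neighbor improves: a local extremum
        current = best

def morse_smale_labels(X, values, k):
    """Label each point by the (minimum, maximum) its descent and ascent reach."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    neighbors = np.argsort(dists, axis=1)[:, 1:k + 1]   # column 0 is the point itself
    return [
        (follow(values, neighbors, i, -1.0), follow(values, neighbors, i, 1.0))
        for i in range(len(X))
    ]

# A one-dimensional "hilly field": height sin(x) sampled on [0, 4*pi].
x = np.linspace(0.0, 4.0 * np.pi, 400)
height = np.sin(x)
labels = morse_smale_labels(x[:, None], height, k=2)

cells = sorted(set(labels))
print(f"{len(cells)} cells found")
for low, high in cells:
    print(f"  valley at x={x[low]:.2f}, hill at x={x[high]:.2f}")
```

Each cell is one monotone stretch between a valley and a hill, the one-dimensional analogue of the regions the ball paths carve out of the field. In higher dimensions the same neighbor-graph walk applies, with k and the neighbor metric as tuning choices.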
  • 6. Extending Morse-Smale Regression
      • Multivariate algorithms to fit partitioned regression models
          • Random forest
              • Bagged ensemble of tree models
              • Akin to combining novel summaries of a class randomly assigned a few chapters
          • Boosted regression
              • Iteratively added model of main effects and interaction terms
              • Akin to guessing a puzzle’s picture by adding key pieces until the picture is mostly there
          • Homotopy LASSO
              • Extends penalized regression model (LASSO) through homotopy estimation methods
              • Akin to a blindfolded person navigating around obstacles between two set points by following a rope
          • Conditional inference tree
              • Tree method that partitions space by assessing covariate independence
          • Extreme learning machine
              • Single-layer feed-forward neural networks based on random mapping between layers
              • Has universal approximation properties
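The common pattern behind these extensions, fitting one learner per partition and routing predictions by partition, can be sketched generically. This is a minimal sketch assuming NumPy. `PartitionedRegressor` and `LeastSquares` are hypothetical names, and ordinary least squares is only a stand-in learner: any object with scikit-learn-style `fit(X, y)` and `predict(X)` methods, such as a random forest, could be passed in instead. The partition labels here are made up for the demonstration and do not come from a Morse-Smale complex.

```python
import numpy as np

class LeastSquares:
    """Stand-in learner exposing fit/predict (assumption: any such learner works)."""
    def fit(self, X, y):
        design = np.column_stack([np.ones(len(X)), X])
        self.coef_, *_ = np.linalg.lstsq(design, y, rcond=None)
        return self

    def predict(self, X):
        return np.column_stack([np.ones(len(X)), X]) @ self.coef_

class PartitionedRegressor:
    """Fit one model per partition label and predict with the matching model."""
    def __init__(self, make_model):
        self.make_model = make_model
        self.models_ = {}

    def fit(self, X, y, labels):
        for group in np.unique(labels):
            rows = labels == group
            self.models_[group] = self.make_model().fit(X[rows], y[rows])
        return self

    def predict(self, X, labels):
        # Labels must have been seen during fit; unseen labels are not handled here.
        out = np.empty(len(X))
        for group, model in self.models_.items():
            rows = labels == group
            if rows.any():
                out[rows] = model.predict(X[rows])
        return out

rng = np.random.default_rng(1)
X = rng.normal(size=(600, 3))
labels = (X[:, 0] > 0).astype(int)               # hypothetical two-group partition
slope = np.where(labels == 1, 3.0, -3.0)         # the effect of X[:, 1] flips by group
y = slope * X[:, 1] + rng.normal(scale=0.1, size=600)

global_fit = LeastSquares().fit(X, y)
piecewise = PartitionedRegressor(LeastSquares).fit(X, y, labels)
global_mse = float(np.mean((y - global_fit.predict(X)) ** 2))
piecewise_mse = float(np.mean((y - piecewise.predict(X, labels)) ** 2))
print(f"global MSE {global_mse:.3f}, piecewise MSE {piecewise_mse:.3f}")
```

Keeping the learner behind a factory argument is a design choice that lets the same partition routing be reused across the algorithms listed on the slide.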
  • 7. Simulation and Swedish Motor Insurance
      • Simulation
          • Simulation design parameters
              • 4 true predictors, 11 noise variables
              • Sample size set to 10,000
              • Outcome Tweedie-distributed, with Tweedie parameter varying (1, 1.5, 2) and dispersion (1, 2, 4)
              • Nature of predictor relationships varied (4 main effects, 2 interaction effects, or a combination of 2 main effects and 1 interaction effect)
          • Each trial was run 10 times with a 70/30 training/test split.
          • Mean square error (MSE) was used to assess model accuracy.
      • Swedish 3rd Party Motor Insurance 1977
          • 2182 observations with 6 predictors (kilometers traveled per year, geographic zone, bonus, car model make, number of years insured, total claims)
          • MSE assessed for all models based on 70/30 training/test split
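One cell of the simulation design can be sketched as follows. This is an illustration under assumptions, not the study's code: it assumes NumPy, standard normal predictors, a log link, main effects only, and coefficient values chosen purely for the example. It uses Tweedie parameter 1.5 and dispersion 1, and compares a mean-only baseline with the true mean rather than any of the models from the talk.

```python
import numpy as np

rng = np.random.default_rng(42)
n, n_true, n_noise = 10_000, 4, 11
xi, phi = 1.5, 1.0                                # one (Tweedie parameter, dispersion) cell

X = rng.normal(size=(n, n_true + n_noise))        # assumed: standard normal predictors
beta = np.array([0.4, 0.4, -0.4, 0.4])            # hypothetical main-effect sizes
mu = np.exp(X[:, :n_true] @ beta)                 # assumed log link; noise columns unused

# Compound Poisson-gamma draw with mean mu and variance phi * mu**xi (1 < xi < 2).
lam = mu ** (2 - xi) / (phi * (2 - xi))
shape = (2 - xi) / (xi - 1)
scale = phi * (xi - 1) * mu ** (xi - 1)
counts = rng.poisson(lam)
y = np.zeros(n)
pos = counts > 0
y[pos] = rng.gamma(counts[pos] * shape, scale[pos])

# 70/30 training/test split on shuffled row indices.
order = rng.permutation(n)
n_train = n * 7 // 10
train, test = order[:n_train], order[n_train:]

def mse(y_true, y_pred):
    """Mean square error, the accuracy measure named on the slide."""
    return float(np.mean((y_true - y_pred) ** 2))

baseline_mse = mse(y[test], np.full(test.size, y[train].mean()))  # predicts the training mean
oracle_mse = mse(y[test], mu[test])                               # predicts the true mean
print(f"baseline MSE {baseline_mse:.3f}, oracle MSE {oracle_mse:.3f}")
```

The two numbers bracket what a fitted model could achieve on this draw: no model can be expected to beat the oracle, and a useful one should beat the baseline.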
  • 8. Simulation Results
      • Most multivariate Morse-Smale regression algorithms perform well against the original Morse-Smale regression algorithm, particularly for trials involving linear or mixed predictor relationships and trials with lower dispersion.
      • Some of these models outperformed their non-piecewise counterpart models, as well.
      • Even when algorithms perform similarly to non-piecewise counterparts, they provide a comparison of predictor importance among different risk subgroups and methods to visualize these differences (the slide figure shows the random forest model).
  • 9. Swedish Motor Insurance Results: I
      • Most machine learning models perform well, and multivariate Morse-Smale regression methods perform exceptionally well.
  • 10. Swedish Motor Insurance Results: II
      • Three distinct subgroups were found, and risk type varied significantly between them.
          • Group 1: relatively high dependence on make and number of claims
          • Group 2: relatively high dependence on bonus and number of years insured
          • Group 3: almost solely dependent on number of claims and geographic zone
  • 11. Conclusions
      • Multivariate Morse-Smale regression models typically:
          • Outperform the original Morse-Smale regression algorithm
          • Perform comparably to the non-partitioned models built with the same machine learning algorithm
      • Multivariate Morse-Smale regression models provide subgroup-based analytics capabilities and differentiated risk structure abilities that can help actuaries:
          • Better understand risk
          • Create models based on insurance policy risk groups (as well as risk level)
          • Visualize this process to help others within the industry understand the models (less black-box)
      • However, some black-box algorithms perform better on Tweedie regression problems (particularly Farrelly, 2017, KNN regression ensembles); these methods don’t allow for visualization or comparison of risk factors.
      • Large sample sizes are needed for good performance, but most insurance datasets are large enough to circumvent potential convergence issues.
  • 12. References
      • Talk is a summary of:
          • Farrelly, C. M. (2017). Extensions of Morse-Smale Regression with Application to Actuarial Science. arXiv preprint arXiv:1708.05712. Accepted Dec 2017 for publication by the Casualty Actuarial Society.
      • Selected references from 2017 Farrelly paper:
          • De Jong, P., & Heller, G. Z. (2008). Generalized linear models for insurance data (Vol. 10). Cambridge: Cambridge University Press.
          • Farrelly, C. M. (2017). KNN Ensembles for Tweedie Regression: The Power of Multiscale Neighborhoods. arXiv preprint arXiv:1708.02122.
          • Gerber, S., Rübel, O., Bremer, P. T., Pascucci, V., & Whitaker, R. T. (2013). Morse-Smale regression. Journal of Computational and Graphical Statistics, 22(1), 193-214.
          • McGee, V. E., & Carleton, W. T. (1970). Piecewise regression. Journal of the American Statistical Association, 65(331), 1109-1124.
          • Tomoda, K., Morino, K., Murata, H., Asaoka, R., & Yamanishi, K. (2016). Predicting Glaucomatous Progression with Piecewise Regression Model from Heterogeneous Medical Data. HEALTHINF, 2016.