HEALTH INSURANCE RATE ANALYSIS
AND PREDICTION
USING HEALTHCARE.GOV
MARKETPLACE DATA
By Sunitha Flowerhill
Big Data, BI, Hadoop Data lake Engineer and Architect
1
The Health Insurance Marketplace Public Use Files
(PUF) contain data on health and dental plans
offered to individuals and small businesses through
the US Health Insurance Marketplace.
2
PROJECT DIRECTIONS, PROCEDURES, GOALS:
• DOWNLOAD NATIONWIDE DATASETS FROM HEALTHCARE.GOV
• LOOK AT THE METADATA AND SEE WHETHER IT MATCHES YOUR PROJECT GOALS
• IDENTIFY THE BEST-SUITED DATASET AMONG THE DOWNLOADED INSURANCE
DATASETS
• CLEAN UP THE DATA USING JMP TOOLS: ROWS AND COLS MENUS, DATA FILTER, ROW
SELECTION, ETC.
• NARROW IT DOWN TO STATE OF DELAWARE DATA
• PRELIMINARY ANALYSIS OF THE DATA – MARK THE NECESSARY COLUMNS, DELETE EMPTY
COLUMNS
• CHECK FOR CONSISTENCY OF DATA USING GRAPH BUILDER
• CONVERT THE CATEGORICAL VARIABLES: AGE TO NUMERIC, RATE TO CURRENCY, REMOVE
$ SYMBOL
• FURTHER CORE ANALYSIS: DECISION TREE, PARTIAL LEAST SQUARES, NEURAL NETWORKS
3
I selected the large individual rates file from the 18 downloaded
datasets and filtered it down to DE data. I cleaned up the age column
and made it numeric, cleaned up the rate column by removing the dollar
sign, removed columns that are insignificant for DE (such as tobacco),
and eliminated empty columns. Tools used: data filter, row selection,
formula editor, etc.
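
The cleanup steps above were done interactively in JMP. As a rough sketch only, the same steps might look like this in plain Python. The column names (StateCode, Age, IndividualRate, Tobacco, Notes), the sample values, and the handling of non-numeric age labels are all assumptions made for illustration, not taken from the actual file.

```python
import csv
import io

# Tiny in-memory stand-in for rate_puf.csv.
# Column names and values are assumptions for illustration only.
SAMPLE = """StateCode,IssuerId,VersionNum,Age,IndividualRate,Tobacco,Notes
DE,76168,6,21,$251.40,No Preference,
DE,76168,6,65 and over,$754.20,No Preference,
AK,21989,4,30,$310.00,No Preference,
DE,81914,3,Family Option,$0.00,No Preference,
"""


def clean_age(raw):
    """Return the leading number of an age label as int, or None if there is none."""
    parts = raw.strip().split()
    first = parts[0] if parts else ""
    return int(first) if first.isdigit() else None


def clean_rate(raw):
    """Strip the dollar sign and thousands separators, then parse as float."""
    return float(raw.replace("$", "").replace(",", "").strip())


def load_state(text, state="DE", drop=("Tobacco",)):
    """Keep one state's rows, drop unwanted and empty columns, fix types."""
    rows = [r for r in csv.DictReader(io.StringIO(text)) if r["StateCode"] == state]
    if not rows:
        return []
    # A column that is blank in every remaining row carries no information.
    empty = {c for c in rows[0] if all(not r[c].strip() for r in rows)}
    cleaned = []
    for r in rows:
        age = clean_age(r["Age"])
        if age is None:
            continue  # skip labels such as "Family Option"
        out = {k: v for k, v in r.items() if k not in empty and k not in drop}
        out["Age"] = age
        out["IndividualRate"] = clean_rate(r["IndividualRate"])
        cleaned.append(out)
    return cleaned


rows = load_state(SAMPLE)
```

Treating "65 and over" as 65 and skipping rows without a numeric age is one possible choice; the slides do not say how such labels were handled.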
4
THE DATA
rate_puf.csv has now become rate_DE.jmp, containing only the cleaned data
5
Rate increases steadily
by month and year
Rate increases steadily
with age
Finding out which issuer
holds the most business in
the state of DE
Which issuers have
marked-up and marked-down
versions of plans
I ran various analyses to make sure I was choosing the correct X
factors. There is an interesting 3D plot with Rate as Y, and Age and
version number as X and Z.
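
The exploratory checks above were done with JMP graphs. A minimal sketch of the underlying summaries (rows per issuer, mean rate per age) could look like the following; the tuple layout and every value in it are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical cleaned rows: (issuer_id, age, monthly_rate). Values are made up.
rows = [
    ("A", 21, 200.0), ("A", 40, 260.0), ("A", 64, 600.0),
    ("B", 21, 220.0), ("B", 40, 280.0),
]


def count_by(rows, key_index):
    """Count rows per distinct value of the chosen field."""
    counts = defaultdict(int)
    for r in rows:
        counts[r[key_index]] += 1
    return dict(counts)


def mean_rate_by(rows, key_index):
    """Average the rate (field 2) per distinct value of the chosen field."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[key_index]].append(r[2])
    return {k: mean(v) for k, v in sorted(groups.items())}


issuer_counts = count_by(rows, 0)   # which issuer has the most rows
rate_by_age = mean_rate_by(rows, 1)  # does the mean rate rise with age?
```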
6
The first analysis is the Partition decision tree – I chose this because of
the significant number of categorical variables. The major report
elements are towards the right.
7
Here is a story unfolding from the Healthcare.gov insurance rates for the
state of Delaware: out of 15,928 individuals, 1,350 people of prime age
have a premium of 0. The major contributors to the premium are listed in
the green rectangle. Age is the most decisive factor, with 14 splits. The
second is the version number, which I believe is the marked-up or
marked-down version of the same plan on healthcare.gov, with 8 splits.
Then comes the issuer, i.e. the various companies that offer healthcare
plans. The rest of the components are insignificant. Altogether there
are 25 splits on the prime components mentioned above. A decision tree
is a good choice when many of the variables are categorical, and there
is only one Y, which is the rate per individual.
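
To illustrate what a "split" on age means, here is a minimal sketch of how a regression tree can pick a threshold by reducing the sum of squared errors. This is a generic textbook criterion, not a reproduction of JMP Partition's own split selection, and the ages and rates are invented.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, sse_reduction) for the best split of the form x < threshold."""
    pairs = sorted(zip(xs, ys))
    total = sse([y for _, y in pairs])
    best = (None, 0.0)
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between identical x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        gain = total - sse(left) - sse(right)
        if gain > best[1]:
            best = (pairs[i][0], gain)
    return best


# Invented data: rates jump between age 30 and age 50.
ages = [21, 25, 30, 50, 55, 60]
rates = [200.0, 210.0, 220.0, 400.0, 420.0, 440.0]
threshold, gain = best_split(ages, rates)
```

A full tree repeats this search inside each resulting group, which is how one variable such as age can account for many splits.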
8
3D TREE
The RSquare looks good, and the Actual by Predicted plot is symmetrical.
Three split trials gave similar results.
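
For reference, the RSquare statistic compares the model's squared error with that of simply predicting the mean. A minimal sketch using the standard definition, with invented numbers:

```python
def r_squared(actual, predicted):
    """Standard coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Invented actual and predicted rates, for illustration only.
actual = [200.0, 300.0, 400.0]
predicted = [210.0, 290.0, 400.0]
r2 = r_squared(actual, predicted)
```

In an Actual by Predicted plot, a good fit shows points scattered evenly around the diagonal, which is what "symmetrical" refers to here.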
9
This is Fit Model with partial least squares (PLS)
10
The minimum number of factors is 8;
there are 16 factors for VIP
11
DENOTES
INCOMPLETE
MODEL
12
13
14
COMPARING PREDICTION PROFILERS :
PLS, DECISION TREE, NEURAL NETWORK
Out of curiosity, I compared the decision tree with another method,
partial least squares, which mostly supports continuous variables. The
prediction profilers above are worth a close look. The major factors in
rate prediction for the state of Delaware are: 1. Age (rate increases
with age); 2. Version number (the higher the number, the lower the rate;
low version numbers have a marked-up premium). Categorical variables
such as issuerid1 and issuerid2 take the next places. We have 2014, 2015
and 2016 data, and there is a constant but insignificant increase with
month and year.
15
THE
BEGINNING...
LESSONS LEARNED, CONCLUSIONS, APPENDIX:
✓ START EARLY, MAKE EVERY EFFORT TO CLEAN DATA, ANALYZE AND RE-ANALYZE USING GRAPHS
✓ ELIMINATE UNWANTED DATA, GET OPTIMUM DATA FOR EVALUATION
➢ WHEN THERE ARE SIGNIFICANT CATEGORICAL VARIABLES, PARTITION DECISION TREE IS A GOOD
CHOICE.
➢ FIT MODEL->PLS ALSO ACCEPTS A MIXTURE OF CATEGORICAL AND NUMERIC VARIABLES AND GIVES
OPTIMUM RESULTS.
➢ NEURAL NETWORKS WORK WONDERS WITH LARGER, CLEANER DATASETS.
➢ FROM ALL THE ANALYSES, AGE, ISSUER, AND THE MARKED-UP/MARKED-DOWN VERSION NUMBER ARE THE MOST
SIGNIFICANT FACTORS IN DECIDING THE INDIVIDUAL RATE.
➢ FOR RATE PREDICTION, MAJOR COMPONENTS ARE:
➢ 1. AGE 2. VERSION NUMBER
➢ 3. ISSUERID, ISSUERID2 4. MONTH AND YEAR
APPENDIX:
HTTPS://DATA.HEALTHCARE.GOV/
HTTP://DHSS.DELAWARE.GOV/DHCC/
16
