Midterm Project
Due 03/25/2020 11:55 PM
The goal of the project is to model and understand the socio-economic
factors affecting cancer mortality.
The data were aggregated from a number of sources including the
American Community Survey (census.gov, http://census.gov),
clinicaltrials.gov (http://clinicaltrials.gov), and cancer.gov
(http://cancer.gov). The data dictionary is provided in the Appendix.
We will attempt to predict cancer mortality in different counties in
the nation (TARGET_deathRate) and try to understand how different
socio-economic factors might influence health and mortality.
The data have been partitioned into two files: (1) CancerData.csv and
(2) CancerHoldoutData.csv. Use CancerData.csv for model training,
parameter tuning (if any), etc. CancerHoldoutData.csv should be used
only for evaluation of model performance. It should not be used in any
way in the model development process.
Analyze the following. Note that the items need not be
presented in a sequential order. You can address
them in any order. For example, missing data analysis can be
integrated with regression analysis.
1. Exploratory Data Analysis 20 Points
- [...] mortality from exploratory data analysis? Why?
- How does addressing outliers affect model performance?
- [...] techniques to handle missing values. Note that the approach to
  handle missing data might be different for different variables.
  Document model performance improvement obtained by missing data
  handling.
- [...] detected? Document how addressing collinearity affects model
  performance.
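As a rough illustration of the missing-data and collinearity items above, the following sketch fills missing values with column medians and then computes variance inflation factors (VIF). It runs on synthetic data, not on CancerData.csv; the three columns only borrow names from the data dictionary, and the helper names `impute_median` and `vif` are hypothetical. Median imputation is one option among several and may not suit every variable.

```python
import numpy as np

def impute_median(X):
    """Replace NaNs in each column with that column's median (NaNs ignored)."""
    X = np.array(X, dtype=float)           # copy, so the caller's array is untouched
    medians = np.nanmedian(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = medians[cols]
    return X

def vif(X):
    """VIF of each column: 1 / (1 - R^2) from regressing it on the other columns."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1.0 - resid.var() / y.var()
        out[j] = 1.0 / (1.0 - r2)
    return out

# Synthetic stand-in data (assumption: NOT the assignment data).
rng = np.random.default_rng(0)
n = 200
median_age = rng.normal(40, 5, n)
median_age_male = median_age - 1 + rng.normal(0, 0.3, n)   # nearly collinear with median_age
poverty_percent = rng.normal(15, 4, n)
X = np.column_stack([median_age, median_age_male, poverty_percent])
X[::25, 2] = np.nan                        # knock out a few values to imitate missing data

X_filled = impute_median(X)
vifs = vif(X_filled)
for name, value in zip(["MedianAge", "MedianAgeMale", "povertyPercent"], vifs):
    print(f"{name:<16} VIF = {value:.1f}")
```

A common rule of thumb treats a VIF above roughly 5 to 10 as a sign of problematic collinearity; dropping or combining one of the offending columns and refitting shows the effect on model performance.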
2. Linear Regression 25 Points
- [...] removing insignificant variables affect model performance?
- [...]ent and interpret model diagnosis. What insights did you obtain
  to improve the model from diagnosis?
- [...] non-linear and interaction terms and evaluate how they affect
  model performance and diagnosis.
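One way to see what an interaction term does is to fit the same least-squares model with and without it and compare the error. The sketch below does this on synthetic data in which the target truly depends on the product of two predictors; the variable names, coefficients, and the helper `fit_ols` are assumptions for illustration only. It reports training MSE, so it says nothing about holdout performance.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns (coefficients, training MSE)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    mse = float(np.mean((y - A @ beta) ** 2))
    return beta, mse

# Synthetic stand-in data (assumption: NOT the assignment data).
rng = np.random.default_rng(1)
n = 300
income = rng.normal(0, 1, n)       # standardized, hypothetical predictor
poverty = rng.normal(0, 1, n)      # standardized, hypothetical predictor
y = 180 - 5 * income + 4 * poverty + 3 * income * poverty + rng.normal(0, 1, n)

X_main = np.column_stack([income, poverty])                     # main effects only
X_inter = np.column_stack([income, poverty, income * poverty])  # plus interaction term

beta_main, mse_main = fit_ols(X_main, y)
beta_inter, mse_inter = fit_ols(X_inter, y)

print(f"main effects only: training MSE = {mse_main:.2f}")
print(f"with interaction : training MSE = {mse_inter:.2f}")
print(f"estimated interaction coefficient = {beta_inter[3]:.2f}")
```

Non-linear terms such as a squared predictor can be added the same way, as an extra column. Residual plots from the two fits are a natural next step for the diagnosis part.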
3. KNN
- [...] data into 70% training and 30% testing. Evaluate test MSE for
  at least 5 different values of K and find the K that minimizes test
  MSE. 20 Points
- [...] non-linear technique, but does not work well with
  high-dimensional data. Try to identify important variables from the
  Linear Regression model and use only a subset of important features
  in the KNN model. Document impact on test performance. 20 Points
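The KNN steps above can be sketched as follows: a 70/30 split, test MSE over several values of K, and a refit on a feature subset. This is a minimal NumPy version on synthetic data, not the assignment data; `knn_predict`, the K grid, and the choice of which features to keep are all assumptions. Features are standardized with training statistics only, since KNN distances are scale-sensitive.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k):
    """Predict each test row as the mean target of its k nearest training rows."""
    # Pairwise squared Euclidean distances, shape (n_test, n_train).
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)

# Synthetic stand-in data (assumption): the third feature is pure noise.
rng = np.random.default_rng(2)
n = 400
X = rng.normal(size=(n, 3))
y = 170 + 10 * np.sin(X[:, 0]) + 5 * X[:, 1] ** 2 + rng.normal(0, 2, n)

# 70% training / 30% testing split.
perm = rng.permutation(n)
n_train = round(0.7 * n)
train_idx, test_idx = perm[:n_train], perm[n_train:]

# Standardize using training statistics only, to avoid leaking test information.
mu, sigma = X[train_idx].mean(axis=0), X[train_idx].std(axis=0)
X_train, X_test = (X[train_idx] - mu) / sigma, (X[test_idx] - mu) / sigma
y_train, y_test = y[train_idx], y[test_idx]

# Test MSE for several values of K.
test_mse = {}
for k in (1, 3, 5, 10, 20, 50):
    pred = knn_predict(X_train, y_train, X_test, k)
    test_mse[k] = float(np.mean((y_test - pred) ** 2))
best_k = min(test_mse, key=test_mse.get)

# Refit on a feature subset (here the two informative columns, by construction).
keep = [0, 1]
pred_sub = knn_predict(X_train[:, keep], y_train, X_test[:, keep], best_k)
mse_subset = float(np.mean((y_test - pred_sub) ** 2))

for k, m in test_mse.items():
    print(f"K = {k:>2}: test MSE = {m:.2f}")
print(f"best K = {best_k}; test MSE with feature subset = {mse_subset:.2f}")
```

In the actual project the subset would come from the linear regression results rather than from knowledge of how the data were generated.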
4. Feature Selection 10 Points
Write an “Executive Summary” section documenting your
interpretation of the important features
impacting cancer mortality and how they influence cancer
mortality.
5. Performance reporting on Holdout data 5 Points
Summarize and compare the model performance (MSE) of LR and KNN on the
holdout dataset as a table.
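A small helper along these lines could produce the comparison table. The target values and predictions below are made-up placeholders that only exercise the code; they are not results from any model or from CancerHoldoutData.csv, and the function names are hypothetical.

```python
def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def format_table(rows):
    """Render (model name, MSE) pairs as a plain-text table."""
    lines = [f"{'Model':<20}{'Holdout MSE':>12}", "-" * 32]
    for name, value in rows:
        lines.append(f"{name:<20}{value:>12.2f}")
    return "\n".join(lines)

# Placeholder numbers (assumption): NOT real holdout data or model output.
y_holdout = [180.0, 165.0, 190.0, 172.0]
pred_lr = [178.0, 168.0, 185.0, 170.0]
pred_knn = [175.0, 166.0, 192.0, 177.0]

table = format_table([
    ("Linear Regression", mse(y_holdout, pred_lr)),
    ("KNN", mse(y_holdout, pred_knn)),
])
print(table)
```

In the real report, the predictions would come from models fitted on CancerData.csv only, with the holdout file touched once at the end.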
Appendix: Data Dictionary
1. TARGET_deathRate: Dependent variable. Mean per capita
(100,000) cancer mortalities
2. incidenceRate: Mean per capita (100,000) cancer diagnoses
3. medianIncome: Median income per county
4. povertyPercent: Percent of populace in poverty
5. MedianAge: Median age of county residents
6. MedianAgeMale: Median age of male county residents
7. MedianAgeFemale: Median age of female county residents
8. Geography: County name
9. AvgHouseholdSize: Mean household size of county
10. PercentMarried: Percent of county residents who are
married
11. PctNoHS18_24: Percent of county residents ages 18-24
highest education attained: less than high
school
12. PctHS18_24: Percent of county residents ages 18-24 highest
education attained: high school
diploma
13. PctSomeCol18_24: Percent of county residents ages 18-24
highest education attained: some
college
14. PctBachDeg18_24: Percent of county residents ages 18-24
highest education attained: bachelor's
degree
15. PctPrivateCoverage: Percent of county residents with
private health coverage
16. PctPublicCoverage: Percent of county residents with
government-provided health coverage
17. PctPubliceCoverageAlone: Percent of county residents with
government-provided health
coverage alone
18. PctWhite: Percent of county residents who identify as White
19. PctBlack: Percent of county residents who identify as Black
20. PctAsian: Percent of county residents who identify as Asian
21. PctOtherRace: Percent of county residents who identify in a
category which is not White, Black,
or Asian
22. PctMarriedHouseholds: Percent of married households

EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
 
History Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptxHistory Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptx
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
 

Cancer Mortality Prediction Using Socioeconomic Factors

  • 1. Midterm Project. Due 03/25/2020 11:55 PM. The goal of the project is to model and understand the socio-economic factors affecting cancer mortality. The data were aggregated from a number of sources, including the American Community Survey (census.gov (http://census.gov)), clinicaltrials.gov (http://clinicaltrials.gov), and cancer.gov (http://cancer.gov). The data dictionary is provided in the Appendix. We will attempt to predict cancer mortality in different counties in the nation (TARGET_deathRate) and try to understand how different socio-economic factors might influence health and mortality. The data has been partitioned into two files: (1) CancerData.CSV and (2) CancerHoldoutData.csv. Use CancerData.csv for model training, parameter tuning (if any), etc. CancerHoldoutData.csv should only be used for evaluation of model performance. It should not be used in any way in the model development process.
  • 2. Analyze the following. Note that the items need not be presented in a sequential order. You can address them in any order. For example, missing data analysis can be integrated with regression analysis.
       1. Exploratory Data Analysis (20 Points)
          - mortality from exploratory data analysis? Why?
          - How does addressing outliers affect model performance?
          - techniques to handle missing values. Note that the approach to handle missing data might be different for different variables. Document model performance improvement obtained by missing data handling.
          - detected? Document how addressing collinearity affects model performance?
       2. Linear Regression (25 Points)
  • 3.
          - removing insignificant variables affect model performance?
          - ent and interpret model diagnosis. What insights did you obtain to improve the model from diagnosis?
          - -linear and interaction terms and evaluate how they affect model performance and diagnosis.
       3. KNN
          - data into 70% training and 30% testing. Evaluate test MSE for at least 5 different values of K and find the K that minimizes test MSE. (20 Points)
          - -linear technique, but does not work well with high dimensional data. Try to identify important variables from Linear Regression model and use only a subset of important features in the KNN model. Document impact on test performance. (20 Points)
  • 4.
       4. Feature Selection (10 Points)
          Write an “Executive Summary” section documenting your interpretation of the important features impacting cancer mortality and how they influence cancer mortality.
       5. Performance reporting on Holdout data (5 Points)
          Summarize and compare the model performance (MSE) of LR and KNN on holdout dataset as a table.
       Appendix: Data Dictionary
       1. TARGET_deathRate: Dependent variable. Mean per capita (100,000) cancer mortalities
       2. incidenceRate: Mean per capita (100,000) cancer diagnoses
       3. medianIncome: Median income per county
       4. povertyPercent: Percent of populace in poverty
       5. MedianAge: Median age of county residents
       6. MedianAgeMale: Median age of male county residents
  • 5.
       7. MedianAgeFemale: Median age of female county residents
       8. Geography: County name
       9. AvgHouseholdSize: Mean household size of county
       10. PercentMarried: Percent of county residents who are married
       11. PctNoHS18_24: Percent of county residents ages 18-24 highest education attained: less than high school
       12. PctHS18_24: Percent of county residents ages 18-24 highest education attained: high school diploma
       13. PctSomeCol18_24: Percent of county residents ages 18-24 highest education attained: some college
       14. PctBachDeg18_24: Percent of county residents ages 18-24 highest education attained: bachelor's degree
       15. PctPrivateCoverage: Percent of county residents with private health coverage
       16. PctPublicCoverage: Percent of county residents with government-provided health coverage
       17. PctPubliceCoverageAlone: Percent of county residents with government-provided health coverage alone
  • 6.
       18. PctWhite: Percent of county residents who identify as White
       19. PctBlack: Percent of county residents who identify as Black
       20. PctAsian: Percent of county residents who identify as Asian
       21. PctOtherRace: Percent of county residents who identify in a category which is not White, Black, or Asian
       22. PctMarriedHouseholds: Percent of married households
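
The KNN item above (70% training and 30% testing, test MSE for at least 5 values of K) can be sketched roughly as follows. This is a minimal illustration in plain Python, not the assignment's expected solution: the helper names (`train_test_split`, `knn_predict`, `knn_test_mse`, `best_k`) are hypothetical, the data is a synthetic stand-in rather than CancerData.csv, and the candidate K values are arbitrary. With the real data, features would likely need to be standardized first, since KNN distances are sensitive to scale.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def knn_predict(train, x, k):
    """Predict y for x as the mean y of the k nearest training rows
    (squared Euclidean distance on the feature tuple)."""
    nearest = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)),
    )[:k]
    return sum(y for _, y in nearest) / len(nearest)


def knn_test_mse(train, test, k):
    """Mean squared error of KNN predictions on the test rows."""
    errors = [(knn_predict(train, x, k) - y) ** 2 for x, y in test]
    return sum(errors) / len(errors)


def best_k(train, test, candidates):
    """Return (K with the lowest test MSE, {K: test MSE})."""
    scores = {k: knn_test_mse(train, test, k) for k in candidates}
    return min(scores, key=scores.get), scores


# Synthetic stand-in data (assumption): two features, noisy linear target.
rng = random.Random(42)
rows = []
for _ in range(200):
    x = (rng.uniform(0, 1), rng.uniform(0, 1))
    y = 3.0 * x[0] - 2.0 * x[1] + rng.gauss(0, 0.1)
    rows.append((x, y))

train, test = train_test_split(rows, test_fraction=0.3, seed=1)
k_star, scores = best_k(train, test, [1, 3, 5, 10, 20])
for k, mse in scores.items():
    print(f"K={k:>2}  test MSE={mse:.4f}")
print("K with lowest test MSE:", k_star)
```

Note that this split is made inside the training file only; under the rules stated above, CancerHoldoutData.csv would be scored once at the end, after K has been fixed.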