The Burden of Disease
Group 30
Aman Desai, Jim Huang, Gloria Marín, Carmen Chen, Dimitris Charitos, Lorenzo Gherardi
Index
• Introduction to the case
• Methodology
• Data Visualization
• Initial Regressions
• Moving Forward
Index
• Introduction to the case
• Main variables
• Exploratory Analysis
• Initial Regressions
• Moving Forward
DALYs: Disability-Adjusted Life Years
• Metric used to measure the Burden of Disease
• DALY is the sum of mortality and morbidity due to a specific disease
• One DALY = loss of 1 year in good health because of:
  • Premature death
  • Disease
  • Disability
- Mortality is commonly used to assess a population's health:
  - Through ‘child mortality’
  - Through ‘life expectancy’
- The problem with this method is that it does not account for people who live on while suffering from a disease that prevents them from leading a normal life.
- For a population to become healthy, attention needs to be given to the impact of disease on the lives of the people living with it. Years of contribution to one’s community, industry, and nation are lost.
ASSIGNMENT PURPOSE:
1. Understand causes
2. Identify factors that magnify its impact
Introduction to the Case
Introduction to the Case: background and preparation
Communicable diseases
• Diarrhoea, lower respiratory & other common infectious diseases
• Neonatal disorders
• Maternal disorders
• Malaria & neglected tropical diseases
• Nutritional deficiencies
• HIV/AIDS
• Tuberculosis
• Other communicable diseases
Non-communicable diseases (NCDs)
• Cardiovascular diseases (inc. stroke, heart disease and heart failure)
• Cancers
• Respiratory disease
• Diabetes, blood and endocrine diseases
• Mental and substance use disorders
• Liver diseases
• Digestive diseases
• Musculoskeletal disorders
• Neurological disorders (including dementia)
• Other NCDs
Injuries
• Road injuries
• Other transport injuries
• Falls
• Drowning
• Fire, heat and hot substances
• Poisonings
• Self-harm
• Interpersonal violence
• Conflict & terrorism
• Natural disasters
Methodology: Linear Model and Data Visualization
Step 1. Identify relevant variables
• Explanatory variables (x): from hundreds of candidate factors, we selected factors covering the following dimensions: diet habits (e.g., fruit consumption), healthcare level (e.g., healthcare expenditure), living habits (e.g., smoking %), and other demographics (e.g., education, overweight %)
• Response variables (y): by comparing the models, we chose ‘Overall DALYs’, ‘Communicable Diseases DALYs’, ‘Non-Communicable Diseases DALYs’, and ‘Injuries DALYs’ as our response variables from 24 possible variables
Step 2. Check for non-linear relations
Step 3. Generate the linear regression and prediction model
• Dropped all insignificant variables
• Checked the VIF to control the risk of multicollinearity
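The slides describe Step 3 without showing code. Below is a minimal sketch of one way to carry it out in R. It is not the group's actual script.

The sketch repeatedly drops the least significant predictor until every remaining p-value is below 0.01, the threshold reported for the final model. It then computes each VIF by hand as 1 / (1 - R^2_j). The data are simulated, and the variable names (`overweight`, `fruit`, `education`, `noise`) are hypothetical stand-ins. In practice a package function such as `car::vif()` is often used instead of the manual VIF.

```r
# Simulated data; variable names are hypothetical stand-ins for the study's predictors
set.seed(42)
n <- 200
sim <- data.frame(
  overweight = runif(n, 5, 60),
  fruit      = runif(n, 10, 150),
  education  = runif(n, 1, 14),
  noise      = rnorm(n)  # unrelated to the outcome by construction
)
sim$daly <- 60000 - 200 * sim$overweight - 40 * sim$fruit -
  500 * sim$education + rnorm(n, sd = 2000)

# Backward elimination: refit without the least significant predictor
# until every remaining predictor has p < alpha
backward_eliminate <- function(data, response, predictors, alpha = 0.01) {
  repeat {
    fit <- lm(reformulate(predictors, response), data = data)
    p <- summary(fit)$coefficients[-1, "Pr(>|t|)"]  # drop the intercept row
    if (all(p < alpha) || length(predictors) == 1) return(fit)
    predictors <- setdiff(predictors, names(which.max(p)))
  }
}

# VIF_j = 1 / (1 - R^2_j), with R^2_j from regressing predictor j on the others
vif_manual <- function(data, predictors) {
  if (length(predictors) < 2) return(setNames(rep(1, length(predictors)), predictors))
  sapply(predictors, function(v) {
    r2 <- summary(lm(reformulate(setdiff(predictors, v), v), data = data))$r.squared
    1 / (1 - r2)
  })
}

fit  <- backward_eliminate(sim, "daly", c("overweight", "fruit", "education", "noise"))
kept <- attr(terms(fit), "term.labels")
print(kept)
print(round(vif_manual(sim, kept), 2))  # independent predictors, so VIFs near 1
```

Dropping one variable at a time matters because removing one predictor can change the p-values of the others.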
Data Visualization
• Bar Chart and Stacked Bar Chart: Compare causes of DALYs by continent
• Area Chart: DALY rate over 2000–2017, by continent
• Scatter Plot: DALY against the proportion of GDP spent on healthcare, by continent
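A stacked bar chart compares how each cause contributes within a continent. Below is a minimal base-R sketch of the data preparation, under these assumptions:

- Each row is one country, with a continent label.
- The cause columns are named like those created in the data-import code (`daly_cvs`, `daly_unintentional_injuries`, `daly_self_harm`).
- The numbers are invented for illustration.

Stacking each continent's shares is one possible design; the slides do not say whether shares or absolute values were plotted.

```r
# Hypothetical country-level rows: cause columns hold DALY counts
causes <- data.frame(
  continent = c("Africa", "Africa", "Europe", "Europe"),
  daly_cvs = c(5000, 7000, 9000, 8000),
  daly_unintentional_injuries = c(3000, 2000, 1500, 1000),
  daly_self_harm = c(500, 700, 1200, 900)
)

# Sum each cause within continent
by_cont <- aggregate(cbind(daly_cvs, daly_unintentional_injuries, daly_self_harm) ~ continent,
                     data = causes, FUN = sum)

# Continent x cause matrix, then row-wise shares
m <- as.matrix(by_cont[, -1])
rownames(m) <- by_cont$continent
shares <- prop.table(m, margin = 1)  # each continent's row sums to 1
round(shares, 3)

# Stacked bars, one per continent (base graphics):
# barplot(t(shares), legend.text = colnames(shares))
```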
Linear Model
[Figure: Accumulated DALYs per capita, 1980–2017, due to communicable diseases, non-communicable diseases, and injuries]
• Overall, Africa has the highest accumulated average DALY rate (9), followed by Asia (5) and Oceania (4).
• Africa's high figure is mainly due to communicable diseases, for which its rate is about triple that of the next-highest continent, Asia.
• DALY rates for non-communicable diseases and injuries are relatively uniform around the world.
• Africa's communicable DALY rate has declined since 2008. Even so, the burden of disease on the continent remains high, which motivates a closer look at its causes and potential solutions.
Summary
[Figure: Communicable DALYs over 1980–2017]
Results & Conclusion: Data Visualization (1)
• The variables affecting DALY have been further broken down. The largest contributing factors were:
  • Cardiovascular diseases, for non-communicable diseases, and
  • Unintentional injuries, for injuries.
• No significant cause of disease was found across the different continents.
• GDP per capita is negatively correlated with DALYs from communicable diseases; however, the proportion of GDP spent on healthcare is positively correlated with them. This could be because poorer countries are more likely to have to combat communicable diseases and, as a result, spend more of their GDP on healthcare.
Summary
Results & Conclusion: Data Visualization (2)
[Figure: Causes of DALYs by continent due to non-communicable disease]
[Figure: Causes of DALYs by continent due to injuries]
[Figure: Healthcare expense vs. DALY from non-communicable disease]
[Figure: Healthcare expense vs. DALY from communicable disease]
Analysis of the Correlation among Variables
Results & Conclusion: Final Model
Model of best fit
DALY = 66523 - 202.37 overweight% - 33.86 veg_consump - 1030.84 animal_protein_consump - 534.61 education - 8.67 pocket_per_cap - 40.86 fruit_consump - 7140.44 Asia + 13792.58 Africa - 9335.46 NorthA - 5196.99 Europe - 9146.72 SouthA
1. All explanatory variables’ Pr(>|t|) < 0.01
2. VIF (Variance Inflation Factor) < 10
• After ruling out all insignificant variables, our best model retained 7 variables.
• The risk of multicollinearity was checked by ensuring that every VIF is below 10.
• The R-squared of 75.35% indicates that the model explains about three quarters of the variance in DALY.
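Plugging values into the final equation is an easy way to read it. The sketch below evaluates the reported coefficients for one hypothetical country. It rests on two assumptions:

- The continent terms are 0/1 dummy variables with one omitted baseline group. The equation does not say which continent that is.
- The input values and the helper name `predict_daly` are invented for illustration.

```r
# Coefficients as reported in the final model equation
coefs <- c(
  intercept = 66523, overweight = -202.37, veg_consump = -33.86,
  animal_protein_consump = -1030.84, education = -534.61,
  pocket_per_cap = -8.67, fruit_consump = -40.86,
  Asia = -7140.44, Africa = 13792.58, NorthA = -9335.46,
  Europe = -5196.99, SouthA = -9146.72
)

# Predicted DALY for one country; continent = NULL means the assumed
# baseline group, whose dummies are all 0
predict_daly <- function(x, continent = NULL) {
  vars <- c("overweight", "veg_consump", "animal_protein_consump",
            "education", "pocket_per_cap", "fruit_consump")
  stopifnot(all(vars %in% names(x)))
  y <- coefs[["intercept"]] + sum(coefs[vars] * x[vars])
  if (!is.null(continent)) y <- y + coefs[[continent]]
  y
}

# Hypothetical inputs, in the units of the study's variables
country <- c(overweight = 30, veg_consump = 100, animal_protein_consump = 10,
             education = 8, pocket_per_cap = 50, fruit_consump = 60)

predict_daly(country, "Africa")  # 53388.1
predict_daly(country)            # 39595.52
```

Holding the other inputs fixed, the Africa term alone shifts the prediction by 13,792.58 relative to the baseline group.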
Summary
Results & Conclusion: Prediction
Step 1. Using our linear model, we estimated worldwide DALY rates for 2013 from our data for all years up to 2012.
Step 2. The data were filtered to all periods before 2013, and a linear model was fitted.
Step 3. Using this linear model, DALY rates for 2013 were predicted.
Step 4. The predictions were compared with the actual 2013 data to determine the accuracy.
Prediction accuracy was 85.7%.
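The four steps above amount to a time-based holdout. Below is a minimal sketch on simulated data. The slides do not say how the 85.7% accuracy was computed, so the sketch assumes accuracy = 1 - mean absolute percentage error. The countries, predictors, and coefficients are all hypothetical.

```r
# Simulated panel: 20 hypothetical countries observed 1990-2013
set.seed(1)
panel <- expand.grid(country = paste0("C", 1:20), period = 1990:2013)
panel$education  <- runif(nrow(panel), 2, 14)
panel$overweight <- runif(nrow(panel), 5, 60)
panel$daly <- 50000 - 600 * panel$education - 150 * panel$overweight +
  rnorm(nrow(panel), sd = 3000)

train   <- subset(panel, period < 2013)   # Step 2: all periods before 2013
holdout <- subset(panel, period == 2013)  # year to be predicted

fit  <- lm(daly ~ education + overweight, data = train)
pred <- predict(fit, newdata = holdout)   # Step 3: predict 2013

# Step 4 (assumed metric): accuracy = 1 - mean absolute percentage error
mape     <- mean(abs(pred - holdout$daly) / abs(holdout$daly))
accuracy <- 1 - mape
round(100 * accuracy, 1)
```

Splitting by year rather than at random keeps 2013 data out of the fitting step, so the score reflects genuine out-of-sample prediction.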
Prediction
Moving Forward: Adding New Variables
What other ‘external’ elements may be magnifying results?
COMMON TO ALL
• Percentage of population insured with health insurance.
• Number of medical doctors per 1,000 people.
• Number of nurses per 1,000 people.
• Out-of-pocket expenditure for healthcare.
SPECIFIC TO
a) Communicable, maternal, neonatal, and nutritional diseases
• Nutritional deficiencies.
• Hygiene practices.
• Housing space per person.
b) Non-communicable diseases
• Physical inactivity.
• Wellbeing.
• Genetics.
c) Injuries
• Surveillance.
• Regulations for safety.
The Burden of Disease
Group 30 - Aman Desai, Jim Huang, Gloria Marín, Carmen Chen, Dimitris Charitos, Lorenzo Gherardi
Introduction
A glance at DALY
Linear Regressions
DALY = 66523 - 202.37 overweight% - 33.86 veg_consum - 1030.84 animal_consum - 534.61 education - 8.67 pocket/cap - 40.86 fruit_consum - 7140.44 Asia + 13792.58 Africa - 9335.46 NorthA - 5196.99 Europe - 9146.72 SouthA
All explanatory variables’ Pr(>|t|) < 0.01; VIF (Variance Inflation Factor) < 10
Conclusions
Moving Forward
Methodology (Model)
Model of best fit
DALYs: Disability-Adjusted Life Years
• Metric used to measure the Burden of Disease
• It is the sum of mortality and morbidity
• One DALY = loss of 1 year in good health because of premature death, disease, or disability
Aim of study
• Understand causes
• Identify factors that magnify the impact
Background & Preparation
[Figure: Burden of Disease, 2017]
[Figure: Disease burden due to communicable disease vs GDP per capita]
Category of Disease
• Communicable disease
• Non-communicable disease (e.g., cancers)
• Injuries (e.g., falls, fire)
[Figure: DALY around the world due to communicable disease]
[Figure: DALY around the world due to non-communicable disease]
Step 1. Identify relevant variables
• Explanatory variables (x): from hundreds of candidate factors, select factors covering the following dimensions: diet habits (e.g., fruit consumption), healthcare level (e.g., healthcare expenditure), living habits (e.g., smoking %), and other demographics (e.g., education, overweight %)
• Response variables (y): by comparing the models, choose ‘Overall DALYs’, ‘Communicable Diseases DALYs’, ‘Non-Communicable Diseases DALYs’, and ‘Injuries DALYs’ as our response variables from 24 possible variables
Step 2. Check for non-linear relations
Step 3. Generate the regression model
• Dropped all insignificant variables
• Checked the VIF to control the risk of multicollinearity
Statistical Technique
[Figure: Healthcare expense vs. DALY from non-communicable and communicable disease]
[Figure: Stacked bar chart of causes of DALYs by continent for non-communicable disease and injuries]
What other ‘external’ elements may be magnifying results?
• Common to all:
• Percentage of population insured with health insurance
• Number of medical doctors/nurses per 1,000 people
• Specific to:
a) Communicable diseases: family size
b) Non-communicable diseases: literacy rate
c) Injuries: alcohol consumption
Linear Regression
• The best model had 7 variables (overweight%, veg_consum, animal_consum, education, pocket/cap, fruit_consum, continent), all with Pr(>|t|) < 0.01 and VIF < 10
• The high R-squared (75.35%) suggests the model explains about three quarters of the variance in DALY
Data Visualization
• Africa has the highest average DALY rate (c. 9), followed by Asia (c. 5) and Oceania.
• Africa's high contrast is mainly due to communicable diseases, with a rate more than triple that of the second-highest continent.
• Africa's communicable DALY rate has declined since 2008 but remains higher than in other continents, leaving room to further consider the causes and potential solutions
[Figure: Communicable DALYs over 1980–2017]
Analysis of Correlation among Variables
22/02/2021 CM30_GroupProject_SG30
CM30_GroupProject_SG30
Team 30
2021-02-14
1 Burden of Disease
Mortality rates are a common method used to assess a population’s health. Commonly used measures include child mortality and life expectancy. However, a focus on mortality neglects the suffering of people who still live with a disease. A disease affects, directly or indirectly, the ability to live a normal life, and potential contributions to one’s community, work, or nation are often lost.
Our study, therefore, seeks to understand the magnitude of the burden of disease by disease type, as well as to identify factors that amplify such effects.
The metric used to measure disease burden is DALY, which stands for Disability-Adjusted Life Years. This metric is the sum of mortality and morbidity. One DALY represents the loss of one year in good health due to premature death, disease, or disability.
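The mortality and morbidity parts are usually written as DALY = YLL + YLD:

- YLL (years of life lost) counts deaths, each weighted by the life expectancy remaining at the age of death.
- YLD (years lived with disability) weights time spent with a condition by a disability weight. The weight runs from 0 (full health) to 1 (equivalent to death).

The sketch below is a simplified, prevalence-based version. The function names and numbers are hypothetical. The GBD figures used in this study are age-standardized rates that involve further adjustments not shown here.

```r
# Years of Life Lost: deaths x remaining life expectancy at age of death
yll <- function(deaths, life_exp_remaining) {
  deaths * life_exp_remaining
}

# Years Lived with Disability (simplified, prevalence-based):
# prevalent cases x disability weight (0 = full health, 1 = equivalent to death)
yld <- function(prevalent_cases, disability_weight) {
  prevalent_cases * disability_weight
}

# DALY = mortality component + morbidity component
daly <- function(deaths, life_exp_remaining, prevalent_cases, disability_weight) {
  yll(deaths, life_exp_remaining) + yld(prevalent_cases, disability_weight)
}

# Hypothetical example: 10 deaths each losing 30 years,
# plus 200 people living with a condition of weight 0.2
total_daly <- daly(deaths = 10, life_exp_remaining = 30,
                   prevalent_cases = 200, disability_weight = 0.2)
print(total_daly)  # 300 + 40 = 340
```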
1.1 Data import and inspection
1.1.0.1 Importing data for overall disease burden (DALY)
Rows: 48,698
Columns: 7
$ entity <chr> …
$ code <chr> …
$ year <dbl> …
$ total_population_gapminder_hyde_un <dbl> …
$ continent <chr> …
$ health_expenditure_per_capita_current_us <dbl> …
$ dal_ys_disability_adjusted_life_years_all_causes_sex_both_age_age_standardized_rate <dbl> …
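#source: https://ourworldindata.org/burden-of-disease
# Packages used throughout (assumed: the report's setup chunk is not shown in this export)
library(tidyverse) # read_csv(), dplyr verbs, gather()
library(janitor)   # clean_names()
library(here)      # here::here() project-relative paths
# Reading first file
daly_total <- read_csv(here::here('Data', "disease-burden-vs-health-expenditure-per-capita.csv")) %>%
  clean_names()
# Checking for variable types
glimpse(daly_total)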
# Changing variable names and variable types
daly_total<- daly_total %>%
mutate(
location=as.factor(entity),
period=year,
health_expenditure_per_capita=health_expenditure_per_capita_current_us,
daly_adjusted=dal_ys_disability_adjusted_life_years_all_causes_sex_both_age_age_standardized_rate,
total_population = total_population_gapminder_hyde_un) %>%
select(location,period,daly_adjusted,health_expenditure_per_capita,total_population)
Although important as a whole, DALY rates can be further divided into three sub-categories by cause: communicable diseases, non-communicable diseases, and injuries. We therefore included the dataset for each subcategory below.
1.1.0.2 Adding data for burden of non-communicable diseases
Rows: 6,468
Columns: 4
$ entity <chr> …
$ code <chr> …
$ year <dbl> …
$ dal_ys_disability_adjusted_life_years_non_communicable_diseases_sex_both_age_age_standardized_rate <dbl> …
Rows: 6,468
Columns: 3
$ location <fct> Afghanistan, Afghanistan, Afghanistan, Afghanistan, Afghani…
$ period <dbl> 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999,…
$ daly_ncds <dbl> 41145.51, 40587.17, 39644.60, 39821.31, 40641.76, 40790.73,…
1.1.0.3 Adding data for burden from communicable, neonatal, maternal and nutritional diseases
Rows: 6,468
Columns: 4
$ entity <chr> …
$ code <chr> …
$ year <dbl> …
$ dal_ys_disability_adjusted_life_years_communicable_maternal_neonatal_and_nutritional_diseases_sex_both_age_age_standardized_rate <dbl> …
#source: https://ourworldindata.org/burden-of-disease
# Reading the file
ncds <- read_csv(here::here('Data', "burden-of-disease-rates-from-ncds.csv")) %>%
  clean_names()
# Checking for variable types
glimpse(ncds)
# Changing variable names and variable types
ncds <- ncds %>%
  mutate(location = as.factor(entity),
         period = year,
         daly_ncds = dal_ys_disability_adjusted_life_years_non_communicable_diseases_sex_both_age_age_standardized_rate) %>%
  select(location, period, daly_ncds)
glimpse(ncds)
# Merging data frames
total <- merge(daly_total, ncds, by = c("location", "period"))
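Every `merge()` call in this pipeline uses the default `all = FALSE`. That keeps only location-period pairs present in both data frames, i.e. an inner join. A small sketch with hypothetical tables shows the effect. `all.x = TRUE` is the base-R option for keeping the unmatched rows of the left table instead.

```r
# Two hypothetical tables keyed by location and period
a <- data.frame(location = c("A", "A", "B"), period = c(2000, 2001, 2000),
                daly_total = c(30000, 29500, 41000))
b <- data.frame(location = c("A", "B", "C"), period = c(2000, 2000, 2000),
                gdp = c(1200, 800, 5000))

# Default all = FALSE: only pairs present in both tables survive
inner <- merge(a, b, by = c("location", "period"))
# all.x = TRUE: every row of `a` is kept, with NA where `b` has no match
left  <- merge(a, b, by = c("location", "period"), all.x = TRUE)

nrow(inner)  # 2: ("A", 2001) and ("C", 2000) have no partner
nrow(left)   # 3: ("A", 2001) kept with gdp = NA
```

Rows are matched on the exact location string, so a country spelled differently in two sources is also dropped by the inner join.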
#source: https://ourworldindata.org/burden-of-disease
# Reading the file
cnmnd <- read_csv(here::here('Data', "burden-of-disease-rates-from-communicable-neonatal-maternal-nutritional-diseases.csv")) %>%
  clean_names()
# Checking for variable types
glimpse(cnmnd)
Rows: 6,468
Columns: 3
$ location <fct> Afghanistan, Afghanistan, Afghanistan, Afghanistan, Afghan…
$ period <dbl> 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999…
$ daly_cnmnd <dbl> 51181.84, 47263.29, 38908.25, 36882.69, 38809.79, 38262.20…
1.1.0.4 Adding data for burden from injuries, violence, self-harm and accidents
Rows: 6,468
Columns: 4
$ entity <chr> …
$ code <chr> …
$ year <dbl> …
$ dal_ys_disability_adjusted_life_years_injuries_sex_both_age_age_standardized_rate <dbl> …
Rows: 6,468
Columns: 3
$ location <fct> Afghanistan, Afghanistan, Afghanistan, Afghanistan, Afghani…
$ period <dbl> 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999,…
$ daly_ivsa <dbl> 11775.715, 13390.289, 12365.622, 11530.363, 13546.148, 1238…
Within each of the three sub-categories of disease cause, there are specific diseases that fall under it. We included all of these categories in our dataset.
1.1.0.5 Adding data for disease burden by cause (DALY by cause)
# Changing variable names and variable types
cnmnd <- cnmnd %>%
  mutate(location = as.factor(entity),
         period = year,
         daly_cnmnd = dal_ys_disability_adjusted_life_years_communicable_maternal_neonatal_and_nutritional_diseases_sex_both_age_age_standardized_rate) %>%
  select(location, period, daly_cnmnd)
glimpse(cnmnd)
#Merging data frames
total <- merge(total,cnmnd,by=c("location","period"))
#source:https://ourworldindata.org/burden-of-disease
#Reading the file
ivsa <- read_csv(here::here('Data',"burden-of-disease-rates-from-injuries.csv")) %>%
clean_names()
# Checking for variable types
glimpse(ivsa)
# Changing variable names and variable types
ivsa<- ivsa %>%
mutate(location=as.factor(entity),
period=year,
daly_ivsa=dal_ys_disability_adjusted_life_years_injuries_sex_both_age_age_standardized_rate) %>%
select(location,period,daly_ivsa)
glimpse(ivsa)
#Merging data frames
total <- merge(total,ivsa,by=c("location","period"))
Aside from the main variables, additional variables that may contribute to the final DALY rates were included in the dataset.
1.1.0.6 Adding data for GDP per capita
#source: https://ourworldindata.org/burden-of-disease
# Reading second file
daly_by_cause <- read_csv(here::here('Data', "burden-of-disease-by-cause.csv")) %>%
  clean_names()
# Checking for variable types
#glimpse(daly_by_cause)
# Changing variable names and variable types
daly_by_cause <- daly_by_cause %>%
  mutate(
    location = as.factor(entity),
    period = year,
    daly_conflict_terrorism = dal_ys_disability_adjusted_life_years_conflict_and_terrorism_sex_both_age_all_ages_number,
    daly_hiv_tuberculosis = dal_ys_disability_adjusted_life_years_hiv_aids_and_tuberculosis_sex_both_age_all_ages_number,
    daly_diahrrea_respiratory = dal_ys_disability_adjusted_life_years_diarrhea_lower_respiratory_and_other_common_infectious_diseases_sex_both_age_all_ages_number,
    daly_cvs = dal_ys_disability_adjusted_life_years_cardiovascular_diseases_sex_both_age_all_ages_number,
    daly_self_harm = dal_ys_disability_adjusted_life_years_self_harm_sex_both_age_all_ages_number,
    daly_violence = dal_ys_disability_adjusted_life_years_interpersonal_violence_sex_both_age_all_ages_number,
    daly_nutritional_deficiencies = dal_ys_disability_adjusted_life_years_nutritional_deficiencies_sex_both_age_all_ages_number,
    daly_transport_injuries = dal_ys_disability_adjusted_life_years_transport_injuries_sex_both_age_all_ages_number,
    daly_unintentional_injuries = dal_ys_disability_adjusted_life_years_unintentional_injuries_sex_both_age_all_ages_number,
    daly_maternal_disorders = dal_ys_disability_adjusted_life_years_maternal_disorders_sex_both_age_all_ages_number,
    daly_neonatal_disorders = dal_ys_disability_adjusted_life_years_neonatal_disorders_sex_both_age_all_ages_number,
    daly_other_communicable = dal_ys_disability_adjusted_life_years_other_communicable_maternal_neonatal_and_nutritional_diseases_sex_both_age_all_ages_number,
    daly_nature_forces = dal_ys_disability_adjusted_life_years_exposure_to_forces_of_nature_sex_both_age_all_ages_number,
    daly_chronic_respiratory = dal_ys_disability_adjusted_life_years_chronic_respiratory_diseases_sex_both_age_all_ages_number,
    daly_chronic_liver = dal_ys_disability_adjusted_life_years_cirrhosis_and_other_chronic_liver_diseases_sex_both_age_all_ages_number,
    daly_digestive = dal_ys_disability_adjusted_life_years_digestive_diseases_sex_both_age_all_ages_number,
    daly_tropical_and_malaria = dal_ys_disability_adjusted_life_years_neglected_tropical_diseases_and_malaria_sex_both_age_all_ages_number,
    daly_musculoskeletal = dal_ys_disability_adjusted_life_years_musculoskeletal_disorders_sex_both_age_all_ages_number,
    daly_other_non_communicable = dal_ys_disability_adjusted_life_years_other_non_communicable_diseases_sex_both_age_all_ages_number,
    daly_neurological = dal_ys_disability_adjusted_life_years_neurological_disorders_sex_both_age_all_ages_number,
    daly_mental_and_substance = dal_ys_disability_adjusted_life_years_mental_and_substance_use_disorders_sex_both_age_all_ages_number,
    daly_diabetes_urogenital_blood_endocrine = dal_ys_disability_adjusted_life_years_diabetes_urogenital_blood_and_endocrine_diseases_sex_both_age_all_ages_number,
    daly_neoplasms = dal_ys_disability_adjusted_life_years_neoplasms_sex_both_age_all_ages_number) %>%
  # Keep the keys plus every renamed cause column (all start with "daly_")
  select(location, period, starts_with("daly_"))
#glimpse(daly_by_cause)
# Merging dataframes
total <- merge(total, daly_by_cause, by = c("location", "period"))
# We will consider dropping health expenditure per capita, since its completion rate is only 57.4% and it may distort the final data.
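The comment above refers to a 57.4% complete rate for health expenditure per capita. The same statistic can be computed in base R: it is the share of non-missing values per column, which skimr reports as `complete_rate`. The sketch below uses a hypothetical frame. The 0.8 cut-off is an arbitrary example, not a threshold used in the study.

```r
# Hypothetical merged frame with some missing values
total_demo <- data.frame(
  location = c("A", "A", "B", "B", "C"),
  period   = c(2000, 2001, 2000, 2001, 2000),
  daly_adjusted = c(30000, 29500, 41000, 40200, 35000),
  health_expenditure_per_capita = c(120, NA, 45, NA, NA)
)

# Share of non-missing values in each column
complete_rate <- colMeans(!is.na(total_demo))
round(complete_rate, 2)

# Columns falling below the example threshold
names(complete_rate)[complete_rate < 0.8]
```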
#source: https://data.worldbank.org/indicator/NY.GDP.PCAP.CD
# Reading third file
gdp <- read_csv(here::here('Data',"API_NY.GDP.PCAP.CD_DS2_en_csv_v2_1926744.csv"),skip=3) %>%
clean_names()
# Checking for variable types
glimpse(gdp)
Rows: 264
Columns: 66
$ country_name <chr> "Aruba", "Afghanistan", "Angola", "Albania", "Andorra"…
$ country_code <chr> "ABW", "AFG", "AGO", "ALB", "AND", "ARB", "ARE", "ARG"…
$ indicator_name <chr> "GDP per capita (current US$)", "GDP per capita (curre…
$ indicator_code <chr> "NY.GDP.PCAP.CD", "NY.GDP.PCAP.CD", "NY.GDP.PCAP.CD", …
$ x1960 <dbl> NA, 59.77319, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1807…
$ x1961 <dbl> NA, 59.86087, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1874…
$ x1962 <dbl> NA, 58.45801, NA, NA, NA, NA, NA, 1155.89017, NA, NA, …
$ x1963 <dbl> NA, 78.70639, NA, NA, NA, NA, NA, 850.30474, NA, NA, N…
$ x1964 <dbl> NA, 82.09523, NA, NA, NA, NA, NA, 1173.23821, NA, NA, …
$ x1965 <dbl> NA, 101.10830, NA, NA, NA, NA, NA, 1279.11343, NA, NA,…
$ x1966 <dbl> NA, 137.59435, NA, NA, NA, NA, NA, 1272.80298, NA, NA,…
$ x1967 <dbl> NA, 160.89859, NA, NA, NA, NA, NA, 1062.54355, NA, NA,…
$ x1968 <dbl> NA, 129.10832, NA, NA, NA, 224.87811, NA, 1141.08048, …
$ x1969 <dbl> NA, 129.32971, NA, NA, NA, 240.03563, NA, 1329.05866, …
$ x1970 <dbl> NA, 156.5189, NA, NA, 3238.5568, 262.8663, NA, 1322.59…
$ x1971 <dbl> NA, 159.56758, NA, NA, 3498.17365, 295.97104, NA, 1372…
$ x1972 <dbl> NA, 135.31731, NA, NA, 4217.17358, 343.56582, NA, 1408…
$ x1973 <dbl> NA, 143.14465, NA, NA, 5342.16856, 423.13508, NA, 2097…
$ x1974 <dbl> NA, 173.65376, NA, NA, 6319.73903, 777.56068, NA, 2844…
$ x1975 <dbl> NA, 186.5109, NA, NA, 7169.1010, 836.2083, 26847.7944,…
$ x1976 <dbl> NA, 197.4455, NA, NA, 7152.3751, 1007.1404, 30118.1378…
$ x1977 <dbl> NA, 224.2248, NA, NA, 7751.3702, 1123.1433, 33823.3196…
$ x1978 <dbl> NA, 247.3541, NA, NA, 9129.7062, 1193.7456, 28456.7374…
$ x1979 <dbl> NA, 275.7382, NA, NA, 11820.8494, 1563.7035, 33512.741…
$ x1980 <dbl> NA, 272.6553, 710.9816, NA, 12377.4116, 2052.9558, 427…
$ x1981 <dbl> NA, 264.1113, 642.3839, NA, 10372.2328, 2050.7698, 449…
$ x1982 <dbl> NA, NA, 619.9614, NA, 9610.2663, 1864.8707, 40026.1663…
$ x1983 <dbl> NA, NA, 623.4406, NA, 8022.6548, 1699.2152, 34843.1029…
$ x1984 <dbl> NA, NA, 637.7152, 639.4847, 7728.9067, 1672.2788, 3230…
$ x1985 <dbl> NA, NA, 758.2376, 639.8659, 7774.3938, 1606.7558, 2972…
$ x1986 <dbl> 6472.5020, NA, 685.2701, 693.8735, 10361.8160, 1489.84…
$ x1987 <dbl> 7885.7965, NA, 756.2619, 674.7934, 12616.1676, 1543.51…
$ x1988 <dbl> 9764.7900, NA, 792.3031, 652.7743, 14304.3570, 1476.04…
$ x1989 <dbl> 11392.4558, NA, 890.5541, 697.9956, 15166.4379, 1505.5…
$ x1990 <dbl> 12307.3117, NA, 947.7042, 617.2304, 18878.5060, 2009.4…
$ x1991 <dbl> 13496.0031, NA, 865.6927, 336.5870, 19532.5402, 1929.6…
$ x1992 <dbl> 14046.5038, NA, 656.3618, 200.8522, 20547.7118, 2027.8…
$ x1993 <dbl> 14936.8272, NA, 441.2007, 367.2792, 16516.4710, 1996.9…
$ x1994 <dbl> 16241.0465, NA, 328.6733, 586.4163, 16234.8090, 1989.4…
$ x1995 <dbl> 16439.3564, NA, 397.1795, 750.6044, 18461.0649, 2072.7…
$ x1996 <dbl> 16586.0684, NA, 522.6438, 1009.9777, 19017.1746, 2235.…
$ x1997 <dbl> 17927.7496, NA, 514.2952, 717.3806, 18353.0597, 2319.0…
$ x1998 <dbl> 19078.3432, NA, 423.5937, 813.7903, 18894.5215, 2188.9…
$ x1999 <dbl> 19356.2034, NA, 387.7843, 1033.2417, 19261.7105, 2331.…
$ x2000 <dbl> 20620.7006, NA, 556.8363, 1126.6833, 21854.2468, 2605.…
$ x2001 <dbl> 20669.0320, NA, 527.3335, 1281.6594, 22971.5355, 2506.…
$ x2002 <dbl> 20436.8871, 179.4266, 872.4945, 1425.1248, 25066.8822,…
$ x2003 <dbl> 20833.7616, 190.6838, 982.9609, 1846.1188, 32271.9639,…
$ x2004 <dbl> 22569.9750, 211.3821, 1255.5640, 2373.5798, 37969.1750…
$ x2005 <dbl> 23300.0396, 242.0313, 1902.4223, 2673.7873, 40066.2569…
$ x2006 <dbl> 24045.2725, 263.7337, 2599.5665, 2972.7433, 42675.8128…
$ x2007 <dbl> 25835.1327, 359.6932, 3121.9956, 3595.0372, 47803.6936…
$ x2008 <dbl> 27084.7037, 364.6607, 4080.9414, 4370.5401, 48718.4969…
$ x2009 <dbl> 24630.4537, 438.0760, 3122.7808, 4114.1401, 43503.1855…
$ x2010 <dbl> 23512.6026, 543.3030, 3587.8838, 4094.3503, 40852.6668…
$ x2011 <dbl> 24985.9933, 591.1628, 4615.4680, 4437.1429, 43335.3289…
$ x2012 <dbl> 24713.6980, 641.8715, 5100.0958, 4247.6300, 38686.4613…
$ x2013 <dbl> 26189.4355, 637.1655, 5254.8823, 4413.0609, 39538.7667…
$ x2014 <dbl> 26647.9381, 613.8567, 5408.4105, 4578.6320, 41303.9294…
$ x2015 <dbl> 27980.8807, 578.4664, 4166.9797, 3952.8012, 35762.5231…
$ x2016 <dbl> 28281.3505, 509.2187, 3506.0729, 4124.0557, 37474.6654…
$ x2017 <dbl> 29007.6930, 519.8848, 4095.8129, 4531.0208, 38962.8804…
$ x2018 <dbl> NA, 493.7504, 3289.6467, 5284.3802, 41793.0553, 6601.8…
$ x2019 <dbl> NA, 507.1034, 2790.7266, 5353.2449, 40886.3912, 6584.7…
$ x2020 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ x66 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
1.1.0.7 Adding data for smoking percentages
1.1.0.8 Adding data for healthcare expenditure per capita
Rows: 4,675
Columns: 4
$ entity <chr> "Afghan…
$ code <chr> "AFG", …
$ year <dbl> 2002, 2…
$ health_expenditure_per_capita_ppp_constant_2011_international <dbl> 75.9835…
# Changing variable names and variable types
gdp <- gdp %>%
gather(year, gdp,-c(country_name, country_code,indicator_name,indicator_code)) %>%
mutate(location=as.factor(country_name),
period=readr::parse_number(year)) %>%
select(location,period,gdp)
# Merging dataframes
total <- merge(total,gdp,by=c("location","period"))
#skim(total)
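The `gather()` call above turns the World Bank's one-column-per-year layout into one row per country-year. For comparison, here is a base-R sketch of the same reshape on a two-country slice, with values copied from the `glimpse()` output above.

The sketch keeps only columns named `x` followed by four digits, so it also skips the empty `x66` column. With `gather()` and `parse_number()`, that column becomes period 66, which the following inner merge then drops because no other source has that period.

```r
# Two-country slice of the wide World Bank layout (values from the glimpse above)
wide <- data.frame(country_name = c("Aruba", "Afghanistan"),
                   x2016 = c(28281.3505, 509.2187),
                   x2017 = c(29007.6930, 519.8848),
                   x66   = c(NA, NA))

# Keep only genuine year columns: "x" followed by four digits
year_cols <- grep("^x[0-9]{4}$", names(wide), value = TRUE)

# Stack the year columns column by column: one row per country-year
long <- data.frame(
  location = rep(wide$country_name, times = length(year_cols)),
  period   = rep(as.numeric(sub("^x", "", year_cols)), each = nrow(wide)),
  gdp      = unlist(wide[year_cols], use.names = FALSE)
)
long
```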
#source: http://ghdx.healthdata.org/record/ihme-data/gbd-2015-smoking-prevalence-1980-2015
#Reading fourth file
smoking_percentage <- read_csv(here::here('Data',"IHME_GBD_2015_SMOKING_PREVALENCE_1980_2015_Y2017M04D05.CSV")) %>%
clean_names()
# Checking for variable types
#skim(smoking_percentage)
# Changing variable names and variable types
smoking_percentage <- smoking_percentage %>%
filter(age_group_name=="Age-standardized",
metric=="Percent",
sex=="Both") %>%
mutate(location=as.factor(location_name),
period=year_id,
smoking_percentage=mean) %>%
select(location,period,smoking_percentage)
#skim(smoking_percentage)
#Merging data frames
total <- merge(total,smoking_percentage,by=c("location","period"))
#source: https://ourworldindata.org/grapher/annual-healthcare-expenditure-per-capita?tab=chart&time=1995..2014&region=World
#Reading fifth file
healthcare_expenditure <- read_csv(here::here('Data',"annual-healthcare-expenditure-per-capita.CSV")) %>%
clean_names()
# Checking for variable types
glimpse(healthcare_expenditure)
1.1.0.9 Adding data for percentage of population being overweight
Rows: 8,316
Columns: 4
$ entity <chr> "Afghanistan", "A…
$ code <chr> "AFG", "AFG", "AF…
$ year <dbl> 1975, 1976, 1977,…
$ prevalence_of_overweight_adults_both_sexes_who_2019 <dbl> 5.3, 5.5, 5.7, 5.…
1.1.0.10 Adding data for fruit consumption per capita
Rows: 11,028
Columns: 4
$ entity <chr> "Afg…
$ code <chr> "AFG…
$ year <dbl> 1961…
$ fruits_excluding_wine_food_supply_quantity_kg_capita_yr_fao_2020 <dbl> 41.1…
# Changing variable names and variable types
healthcare_expenditure <- healthcare_expenditure %>%
mutate(location=as.factor(entity),
period=year,
healthcare_expenditure=health_expenditure_per_capita_ppp_constant_2011_international) %>%
select(location,period,healthcare_expenditure)
#glimpse(healthcare_expenditure)
#Merging data frames
total <- merge(total,healthcare_expenditure,by=c("location","period"))
#source: https://ourworldindata.org/obesity
#Reading sixth file
percentage_overweight <- read_csv(here::here('Data',"share-of-adults-who-are-overweight.csv")) %>%
clean_names()
# Checking for variable types
glimpse(percentage_overweight)
# Changing variable names and variable types
percentage_overweight <- percentage_overweight %>%
mutate(location=as.factor(entity),
period=year,
percentage_overweight=prevalence_of_overweight_adults_both_sexes_who_2019) %>%
select(location,period,percentage_overweight)
#glimpse(percentage_overweight)
#Merging data frames
total <- merge(total,percentage_overweight,by=c("location","period"))
#source: https://ourworldindata.org/diet-compositions
#Reading seventh file
fruit_consumption <- read_csv(here::here('Data',"fruit-consumption-per-capita.csv")) %>%
clean_names()
# Checking for variable types
glimpse(fruit_consumption)
1.1.0.11 Adding data for vegetable consumption per capita
Rows: 11,028
Columns: 4
$ entity <chr> "Afghanistan", …
$ code <chr> "AFG", "AFG", "…
$ year <dbl> 1961, 1962, 196…
$ vegetables_food_supply_quantity_kg_capita_yr_fao_2020 <dbl> 36.75, 37.47, 3…
1.1.0.12 Adding data for animal based foods consumption per capita
# Changing variable names and variable types
fruit_consumption <- fruit_consumption %>%
mutate(location=as.factor(entity),
period=year,
fruit_consumption=fruits_excluding_wine_food_supply_quantity_kg_capita_yr_fao_2020) %>%
select(location,period,fruit_consumption)
#glimpse(fruit_consumption)
#Merging data frames
total <- merge(total,fruit_consumption,by=c("location","period"))
#source: https://ourworldindata.org/diet-compositions
#Reading eighth file
vegetable_consumption <- read_csv(here::here('Data',"vegetable-consumption-per-capita.csv")) %>%
clean_names()
#Checking for variable types
glimpse(vegetable_consumption)
## Changing variable names and variable types
vegetable_consumption <- vegetable_consumption %>%
mutate(location=as.factor(entity),
period=year,
vegetable_consumption=vegetables_food_supply_quantity_kg_capita_yr_fao_2020) %>%
select(location,period,vegetable_consumption)
#glimpse(vegetable_consumption)
#Merging dataframes
total <- merge(total,vegetable_consumption,by=c("location","period"))
#skim(total)
#source: https://ourworldindata.org/diet-compositions
#Reading ninth file
animal_protein_consumption <- read_csv(here::here('Data', "share-of-calories-from-animal-protein-vs-gdp-per-capita.csv")) %>%
  clean_names()
#Checking for variable types
glimpse(animal_protein_consumption)
Rows: 24,472
Columns: 7
$ entity <chr> …
$ code <chr> …
$ year <dbl> …
$ total_population_gapminder <dbl> …
$ continent <chr> …
$ share_of_calories_from_animal_protein_fao_2017 <dbl> …
$ real_gdp_per_capita_in_2011us_2011_benchmark_maddison_project_database_2018 <dbl> …
1.1.0.13 Adding data for mean years of schooling
1.1.0.14 Adding data for physicians per 1000 people
#Changing variable names and type
animal_protein_consumption <- animal_protein_consumption %>%
mutate(location=as.factor(entity),
period=year,
animal_protein_consumption=share_of_calories_from_animal_protein_fao_2017) %>%
select(location,period,animal_protein_consumption)
#glimpse(animal_protein_consumption)
#Merging dataframes
total <- merge(total,animal_protein_consumption,by=c("location","period"))
#glimpse(total)
#source: https://ourworldindata.org/global-education
#Reading file
education_years <- read_csv(here::here('Data',"mean-years-of-schooling-1.csv")) %>%
clean_names()
#Checking for variable types
#glimpse(education_years)
#Changing variable names and type
education_years <- education_years %>%
mutate(location=as.factor(entity),
period=year,
education_years=average_total_years_of_schooling_for_adult_population_lee_lee_2016_barro_lee_2018_and_undp_2018) %>%
select(location,period,education_years)
#glimpse(education_years)
#Merging dataframes
total <- merge(total,education_years,by=c("location","period"))
1.1.0.15 Adding data for nurses per 1000 people
Rows: 1,542
Columns: 4
$ entity <chr> "Afghanistan", "Afghanistan", "A…
$ code <chr> "AFG", "AFG", "AFG", "AFG", "AFG…
$ year <dbl> 2005, 2006, 2007, 2008, 2009, 20…
$ nurses_and_midwives_per_1_000_people <dbl> 0.612000, 0.462000, 0.519000, 0.…
Nurses had too little incidences. Thus, it was not included in our nal dataset.
1.1.0.16 Adding data for out-of-pocket expenditure
Rows: 3,002
Columns: 4
$ entity <chr> …
$ code <chr> …
$ year <dbl> …
$ out_of_pocket_expenditure_per_capita_on_healthcare_ppp_usd_who_global_health_expenditure <dbl> …
#source:https://ourworldindata.org/grapher/physicians-per-1000-people
#Reading file
physicians <- read_csv(here::here('Data',"physicians-per-1000-people.csv")) %>%
clean_names()
#Checking for variable types
#glimpse(physicians)
#Changing variable names and type
physicians <- physicians %>%
mutate(location=as.factor(entity),
period=year,
physicians_1000=physicians_per_1_000_people) %>%
select(location,period,physicians_1000)
#glimpse(physicians)
#Merging dataframes
total <- merge(total,physicians,by=c("location","period"))
Hide
#source:https://ourworldindata.org/grapher/nurses-and-midwives-per-1000-people?
#Reading file
nurses <- read_csv(here::here('Data',"nurses-and-midwives-per-1000-people.csv")) %>%
clean_names()
#Checking for variable types
glimpse(nurses)
Hide
#source:https://ourworldindata.org/grapher/out-of-pocket-expenditure-per-capita-on-healthcare
#Reading file
pocket_exp <- read_csv(here::here('Data',"out-of-pocket-expenditure-per-capita-on-healthcare.csv")) %>%
clean_names()
#Checking for variable types
glimpse(pocket_exp)
Hide
22/02/2021 CM30_GroupProject_SG30
file:///Users/Aman/Downloads/The Burden of Disease Code.html 12/39
1.1.0.17 Adding data for health protection coverage
Rows: 162
Columns: 4
$ entity <chr> "Albania", "…
$ code <chr> "ALB", "DZA"…
$ year <dbl> 2008, 2005, …
$ share_of_population_covered_by_health_insurance_ilo_2014 <dbl> 23.6, 85.2, …
Health coverage had too little incidences. Thus, it was not included in our nal dataset.
1.1.0.18 Adding data for literacy rate
Rows: 215
Columns: 4
$ entity <chr> "Afghanistan", "Albania", "Algeria", …
$ code <chr> "AFG", "ALB", "DZA", "ASM", "AND", "A…
$ year <dbl> 2000, 2011, 2006, 1980, 2011, 2011, 1…
$ literacy_rate_cia_factbook_2016 <dbl> 28.1, 96.8, 72.6, 97.0, 100.0, 70.4, …
Literacy had too little incidences. Thus, it was not included in our nal dataset.
1.1.0.19 Adding data for grouping locations into continents
Rows: 194
Columns: 2
$ continent <chr> "Africa", "Africa", "Africa", "Africa", "Africa", "Africa",…
$ country <chr> "Algeria", "Angola", "Benin", "Botswana", "Burkina", "Burun…
#Changing variable names and type
pocket_exp <- pocket_exp %>%
mutate(location=as.factor(entity),
period=year,
pocket_per_cap=out_of_pocket_expenditure_per_capita_on_healthcare_ppp_usd_who_global_health_expenditure) %>%
select(location,period,pocket_per_cap)
#Merging dataframes
total <- merge(total,pocket_exp,by=c("location","period"))
Hide
#Reading file
health_protect <- read_csv(here::here('Data',"health-protection-coverage.csv")) %>%
clean_names()
#Checking for variable types
glimpse(health_protect)
Hide
#Reading file
literacy <- read_csv(here::here('Data',"literacy-rate-by-country.csv")) %>%
clean_names()
#Checking for variable types
glimpse(literacy)
Hide
#source: https://github.com/dbouquin/IS_608/blob/master/NanosatDB_munging/Countries-Continents.csv
#Reading file
continents <- read_csv(here::here('Data',"Continents.csv")) %>%
clean_names()
#Checking for variable types
glimpse(continents)
22/02/2021 CM30_GroupProject_SG30
file:///Users/Aman/Downloads/The Burden of Disease Code.html 13/39
Rows: 194
Columns: 2
$ location <fct> Algeria, Angola, Benin, Botswana, Burkina, Burundi, Cameroo…
$ continent <fct> Africa, Africa, Africa, Africa, Africa, Africa, Africa, Afr…
1.1.0.20 Dealing with NAs
After including all potentially-relevant and signi cant variables into our dataset, an inital exploration of the data was
made.
1.2 Exploratory Data Analsys
1.2.0.1 DALY Rates per Continent
Hide
#Changing variable names and type
continents <- continents %>%
mutate(location=as.factor(country),
continent=as.factor(continent))%>%
select(location, continent)
glimpse(continents)
Hide
#Merging dataframes
total <- merge(total,continents,by=c("location"))
Hide
#Adding variables of per capita healthcare expenditure - per capita gdp
total <- total%>%
mutate(healthcare_gdp_rate = healthcare_expenditure/gdp)
#skim(total)
total <- total %>%
na.omit()
#skim(total)
Hide
#Selecting data only from 1980 - onward (to gain better insights on the recent situation)
total_short <-total %>%
filter(period>=1980)
#Re-coding DALY variables as averages per continent, per year
total_cont<-total_short%>%
group_by(period,continent)%>%
# Note: daly_ivsa is divided by 10,000 rather than 100,000, so the injury series is on a 10x larger scale than the other three
summarise(daly_adjusted=mean(daly_adjusted/100000), daly_cnmnd = mean(daly_cnmnd/100000), daly_ncds = mean(daly_ncds/100000), daly_ivsa = mean(daly_ivsa/10000))
#Plotting for average DALY rates per capita accumulated from 1980 to 2017
ggplot(total_cont, aes(x = continent, y = daly_adjusted, fill = continent)) +
geom_bar(stat = "identity") +
labs(x= "Continent", y = "Overall DALYs", title = "Accumulated Average DALYs per Capita, per Continent 1980 - 2017")
ggplot(total_cont, aes(x = continent, y = daly_cnmnd, fill = continent)) +
geom_bar(stat = "identity")+
labs(x= "Continent", y = "Communicable Diseases DALYs", title = "Accumulated Average DALYs per Capita from Communicable Diseases, per Continent 1980 - 2017")
ggplot(total_cont, aes(x = continent, y = daly_ncds, fill = continent)) +
geom_bar(stat = "identity")+
labs(x= "Continent", y = "Non-Communicable Diseases DALYs", title = "Accumulated Average DALYs per Capita from Non-Communicable Diseases, per Continent 1980 - 2017")
ggplot(total_cont, aes(x = continent, y = daly_ivsa, fill = continent)) +
geom_bar(stat = "identity")+
labs(x= "Continent", y = "Injuries DALYs", title = "Accumulated Average DALYs per Capita from Injuries, per Continent 1980 - 2017")
Overall, we find that Africa has the highest accumulated average DALY rate per capita of all continents (c. 90), followed by Asia (c. 50) and Oceania (c. 40). Africa's strong contrast with the other continents is mainly due to its high accumulated average for communicable diseases. In this category, Africa has more than three times the rate of the second-highest continent (c. 55 for Africa compared with c. 17 for Asia).
For non-communicable diseases and injuries, rates are fairly even across continents. For non-communicable diseases, DALY rates range from c. 27 to 33, with North America the lowest and Africa the highest. Injury DALY rates are much lower, ranging from c. 4 to 6, with Europe the lowest and Africa the highest. Note that the injury series is divided by 10,000 rather than 100,000 in the code above, so on the same scale as the other categories these values would be about ten times smaller.
Consequently, communicable diseases drive most of the difference in burden between continents, and Africa carries (or has carried) the highest burden from them. We therefore took a closer look at how these rates have evolved over time.
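The "accumulated average" in the bar charts above is the sum, over years, of each continent's yearly mean. `total_cont` holds one row per `period` × `continent`, and `geom_bar(stat = "identity")` stacks those rows within each bar.

The base-R sketch below uses made-up numbers; the `yearly` frame is hypothetical. It reproduces the bar heights directly and shows how to turn them back into a plain average per year. One consequence worth keeping in mind is that a continent with more years of data in the merged set will tend to get a taller bar.

```r
# Hypothetical yearly continent means, shaped like `total_cont`
# (one row per period x continent); values are invented.
yearly <- data.frame(
  period     = c(2000, 2001, 2002, 2000, 2001, 2002),
  continent  = c("Africa", "Africa", "Africa", "Europe", "Europe", "Europe"),
  daly_cnmnd = c(0.60, 0.58, 0.55, 0.05, 0.05, 0.04)
)

# Height of each stacked bar = sum of that continent's yearly means.
bar_height <- sapply(split(yearly$daly_cnmnd, yearly$continent), sum)

# Dividing by the number of years gives a plain average per year instead.
n_years <- sapply(split(yearly$period, yearly$continent), length)
mean_per_year <- bar_height / n_years

print(round(bar_height, 2))
print(round(mean_per_year, 3))
```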
1.2.1 Communicable Diseases
# geom_area() stacks the continents on top of each other by default;
# the `text` aesthetic is not used by ggplot2 itself but feeds the ggplotly() hover tooltip
graph1 <- total_cont %>%
ggplot(aes(x=period, y=daly_cnmnd, fill=continent, text=continent)) +
geom_area(alpha = 1) +
theme(legend.position="none") +
labs(x= "Year", y = "DALY for communicable disease", title = "Time Series Average DALYs per Capita from Communicable Diseases per Continent")
ggplotly(graph1)
[Interactive figure: Time Series Average DALYs per Capita from Communicable Diseases per Continent. x-axis: Year; y-axis: DALY for communicable disease.]
As the graph shows, Africa's communicable-disease DALY rate appears to have been declining since 2008. However, the continent has consistently ranked above the others, which warrants a closer look at the causes and potential solutions.
According to the Our World in Data report, neonatal disorders are the leading communicable-disease cause in terms of share of the total burden (7.45% of all causes). DALYs from communicable diseases are also known to be strongly negatively correlated with GDP and, similarly, negatively correlated with health expenditure per capita.
What about healthcare expenditure as a share of GDP?
ggplot(total_short, aes(x = healthcare_gdp_rate, y = daly_cnmnd, color = continent))+
geom_point()+
labs(x= "Healthcare Expenditure as Share of GDP", y = "DALY from Communicable Diseases", title = "Communicable Disease DALY Rates vs. Proportion of GDP Spent on Healthcare")
# No clear correlation yet, but interesting
total_short%>%
select(daly_cnmnd, healthcare_gdp_rate, gdp, pocket_per_cap)%>%
ggpairs()
> A higher GDP per country seems to have a significant negative correlation with DALYs from communicable diseases. However, the proportion of GDP spent on healthcare seems to have a significant positive correlation with DALYs from communicable diseases, and GDP seems to have a significant negative correlation with the proportion of GDP spent on healthcare. This could indicate that poorer countries are more likely to have to combat communicable diseases and, consequently, spend a greater proportion of their GDP on healthcare than richer countries. Out-of-pocket expenditure per capita is also strongly negatively correlated with DALYs from communicable diseases, while being strongly positively correlated with GDP. Because it is measured in absolute per-capita terms, it largely tracks income. This suggests that poorer countries, where the population is individually responsible for funding its medical care but can afford to spend less on it, are the most likely to have high communicable-disease DALY rates.
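In recent GGally versions, the upper panels of `ggpairs()` show, by default, the Pearson correlation between each pair of variables with significance stars from a test of zero correlation. The base-R sketch below recovers the same two numbers with `cor()` and `cor.test()`, which is handy when exact values need to be reported. The data are simulated stand-ins (`gdp_sim`, `daly_sim`), not the project's variables.

```r
set.seed(42)
# Simulated stand-ins (not project data): DALYs fall as GDP rises, plus noise.
gdp_sim  <- runif(200, 500, 60000)
daly_sim <- 40000 - 0.5 * gdp_sim + rnorm(200, sd = 5000)

r  <- cor(gdp_sim, daly_sim)        # Pearson correlation coefficient
ct <- cor.test(gdp_sim, daly_sim)   # tests H0: true correlation is zero

cat(sprintf("r = %.3f, p-value = %.3g\n", r, ct$p.value))
```

A significant correlation here only says the two variables move together. As the paragraph above notes, confounders such as income can sit behind several of these pairwise relationships at once.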
1.2.2 Injuries
Because injury DALY rates are similar across all continents, we first looked at which types of injury were most prominent overall.
#This plot shows injury-related DALYs per capita in a stacked bar chart.
start <- total%>%
group_by(continent)%>%
summarise(daly_conflict_terrorism = mean(daly_conflict_terrorism/total_population),
daly_self_harm = mean(daly_self_harm/total_population),
daly_violence = mean(daly_violence/total_population),
daly_transport_injuries = mean(daly_transport_injuries/total_population),
daly_nature_forces = mean(daly_nature_forces/total_population),
daly_unintentional_injuries = mean(daly_unintentional_injuries/total_population))
pivot <- pivot_longer(start, cols=c(daly_conflict_terrorism, daly_self_harm, daly_violence, daly_transport_injuries,
daly_unintentional_injuries, daly_nature_forces), names_to = "diseases", values_to = "value")
#select columns from dataset
plots <- pivot %>%
select(continent,diseases,value)
ggplot(plots, aes(fill=diseases, y=value, x=continent)) +
geom_bar(position="stack", stat="identity") +
labs(x= "Continent", y = "Injuries DALYs", title = "Accumulated Average DALYs per Capita from Injuries, per Continent 1980 - 2017")
#Plot on Terrorism and Violence
terrorism_violence <- start %>%
select(daly_conflict_terrorism, daly_violence, continent)
terrorism_violence <- pivot_longer(terrorism_violence, c(daly_conflict_terrorism, daly_violence), names_to = "diseases", values_to = "value")
#select columns from dataset
terrorism_violence <- terrorism_violence%>%
select(diseases,value,continent)
#stacked bar chart
ggplot(terrorism_violence, aes(fill=diseases, y=value, x=continent)) +
geom_bar(position="stack", stat="identity") +
labs(x= "Continent", y = "Injuries DALYs", title = "Accumulated Average DALYs per Capita from Terrorism and Violence 1980 - 2017")
total_short%>%
select(daly_ivsa, gdp, daly_mental_and_substance, physicians_1000, education_years)%>%
ggpairs()
1.2.3 Non-Communicable Diseases
start1 <- total%>%
group_by(continent)%>%
summarise(daly_cvs = mean(daly_cvs/total_population),
daly_nutritional_deficiencies = mean(daly_nutritional_deficiencies/total_population),
daly_maternal_disorders = mean(daly_maternal_disorders/total_population),
daly_musculoskeletal = mean(daly_musculoskeletal/total_population),
daly_other_non_communicable = mean(daly_other_non_communicable/total_population),
daly_neurological = mean(daly_neurological/total_population),
daly_mental_and_substance = mean(daly_mental_and_substance/total_population),
daly_diabetes_urogenital_blood_endocrine = mean(daly_diabetes_urogenital_blood_endocrine/total_population),
daly_neoplasms = mean(daly_neoplasms/total_population),
daly_chronic_liver = mean(daly_chronic_liver/total_population))
pivot1 <- pivot_longer(start1, c(daly_cvs, daly_nutritional_deficiencies, daly_maternal_disorders, daly_musculoskeletal,
daly_other_non_communicable, daly_neurological, daly_mental_and_substance, daly_diabetes_urogenital_blood_endocrine,
daly_neoplasms, daly_chronic_liver), names_to = "diseases", values_to = "value")
#select columns from data set
total_short_ncds <- pivot1%>%
select(continent,diseases,value)
#stacked bar chart
# This stacked bar chart shows the DALYs for non-communicable diseases again, but normalised by total population
# (each daly_ column is divided by total_population above, i.e. per capita). The bars are coloured by
# category of non-communicable disease.
# Asia has the highest DALYs for non-communicable diseases, closely followed by Europe. Several factors may keep
# DALYs high in both regions. In Asia, limited affordability, a shortage of doctors and healthcare that is not
# always of the highest standard may all contribute. In Europe, the ageing population makes non-communicable
# diseases more prevalent. As the earlier graphs suggest, as nations modernise and develop, their populations
# shift from suffering mainly from communicable diseases towards non-communicable diseases, which tend to come with age.
ggplot(total_short_ncds, aes(fill=diseases, y=value, x=continent)) +
geom_bar(position="stack", stat="identity") +
labs(x= "Continent", y = "Non-Comm DALYs", title = "Accumulated Average DALYs per Capita from Non-Comm, per Continent 1980 - 2017")
# Looking into CVS in more detail.
ggplot(total_short, aes(x= continent, y = daly_cvs))+
geom_col()+
labs(x= "Continent", y = "Daly due to CVS related conditions", title = "DALY per capita due to CVS conditions per continent")
# Looking into neoplasms in more detail.
ggplot(total_short, aes(x= continent, y = daly_neoplasms))+
geom_col()+
labs(x= "Continent", y = "Daly due to neoplasm", title = "DALY per capita due to neoplasms per continent")
# Looking into diabetes, urogenital, blood, endocrine in more detail.
ggplot(total_short, aes(x= continent, y = daly_diabetes_urogenital_blood_endocrine))+
geom_col()+
labs(x= "Continent", y = "Daly due to diabetes, urogenital, blood and endocrine related conditions.", title = "DALY per capita due to diabetes, urogenital, blood and endocrine related conditions per continent")
ggplot(total_short, aes(x= continent, y = daly_mental_and_substance))+
geom_col()+
labs(x= "Continent", y = "Daly due to mental and substance related conditions.", title = "DALY per capita due to mental and substance related conditions per continent")
ggplot(total_short, aes(x = healthcare_gdp_rate, y = daly_ncds/100000, color = continent))+
geom_point()+
labs(x= "Healthcare Expenditure as Share of GDP", y = "DALY from Non-Communicable Diseases", title = "Non-Communicable Disease DALY Rates vs. Proportion of GDP Spent on Healthcare")
total_short%>%
select(daly_ncds, healthcare_gdp_rate, gdp, pocket_per_cap)%>%
ggpairs()
1.3 Regression analysis
Although DALY rates are shaped by many complex societal and economic variables, we focused on those with enough data to be used in our analysis. These variables, which may affect both cause-specific and overall DALY rates, fall into several categories:
diet habits (fruit consumption per capita per year, share of daily calories from animal protein, vegetable consumption, and percentage of the population that is overweight); healthcare (annual healthcare expenditure, out-of-pocket expenditure on healthcare, healthcare expenditure relative to GDP, and number of physicians per 1,000 people); living habits (smoking percentage); and other demographics (years of education).
In addition, we considered the effect of each continent separately by transforming the continents into dummy variables.
1.3.0.1 Models 0 and 1
#Transforming continent factors into dummy variables (Oceania, the omitted continent, is the reference category)
total <- total %>%
mutate(Asia = case_when(continent == "Asia" ~ 1, TRUE ~ 0),
Europe = case_when(continent == "Europe" ~ 1, TRUE ~ 0),
NorthA = case_when(continent == "North America" ~ 1, TRUE ~ 0),
Africa = case_when(continent == "Africa" ~ 1, TRUE ~ 0),
SouthA = case_when(continent == "South America" ~ 1, TRUE ~ 0))
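The five hand-built dummies leave one continent out, and that continent acts as the reference level against which the continent coefficients are measured. This assumes the continent column holds exactly the six values Africa, Asia, Europe, North America, Oceania and South America, in which case the omitted level is Oceania.

The base-R sketch below uses a made-up `toy` data frame. It checks that passing a factor with Oceania as the reference level straight to `lm()` gives the same fit, because `lm()` builds the same indicator columns internally.

```r
# Toy data: six continents, two invented observations each.
toy <- data.frame(
  continent = rep(c("Africa", "Asia", "Europe", "North America",
                    "Oceania", "South America"), each = 2),
  daly = c(80, 82, 50, 52, 45, 47, 40, 42, 55, 57, 41, 43)
)

# Manual dummies, mirroring the case_when() approach (Oceania omitted).
toy$Africa <- as.integer(toy$continent == "Africa")
toy$Asia   <- as.integer(toy$continent == "Asia")
toy$Europe <- as.integer(toy$continent == "Europe")
toy$NorthA <- as.integer(toy$continent == "North America")
toy$SouthA <- as.integer(toy$continent == "South America")
fit_manual <- lm(daly ~ Africa + Asia + Europe + NorthA + SouthA, data = toy)

# Same model from a factor whose reference level is Oceania.
toy$continent_f <- relevel(factor(toy$continent), ref = "Oceania")
fit_factor <- lm(daly ~ continent_f, data = toy)

# Identical fitted values; the intercept is the Oceania mean in both models.
print(all.equal(fitted(fit_manual), fitted(fit_factor)))
print(coef(fit_manual)[["(Intercept)"]])
```

Using the factor directly avoids maintaining one `case_when()` per level. The manual version keeps the short coefficient names (`Asia`, `NorthA`, …) used in the model outputs below.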
# lm0 was created to show that daly_ivsa, daly_ncds and daly_cnmnd make up daly_adjusted. As a result, these three variables are not included in the subsequent linear models.
# Caution: gdp is passed here as a separate positional argument, so lm() matches it to `subset` rather than using it as a regressor
# (hence `subset = gdp` in the Call below, which most likely also explains the 817 deleted observations). Writing `+ gdp` inside the formula would add it as a predictor.
lm0= lm(daly_adjusted ~ smoking_percentage+ percentage_overweight+ fruit_consumption+ vegetable_consumption+ animal_protein_consumption+ education_years+ physicians_1000+ pocket_per_cap+ healthcare_gdp_rate + daly_ivsa + daly_ncds + daly_cnmnd, gdp, data = total)
summary(lm0)
Call:
lm(formula = daly_adjusted ~ smoking_percentage + percentage_overweight +
fruit_consumption + vegetable_consumption + animal_protein_consumption +
education_years + physicians_1000 + pocket_per_cap + healthcare_gdp_rate +
daly_ivsa + daly_ncds + daly_cnmnd, data = total, subset = gdp)
Residuals:
Min 1Q Median 3Q Max
-6.602e-11 -4.258e-12 -5.730e-13 2.894e-12 1.220e-10
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.867e-12 1.064e-11 -3.640e-01 0.716590
smoking_percentage -1.016e-10 1.900e-11 -5.346e+00 2.49e-07 ***
percentage_overweight -6.735e-13 1.006e-13 -6.694e+00 2.26e-10 ***
fruit_consumption -7.039e-14 2.012e-14 -3.499e+00 0.000579 ***
vegetable_consumption -1.385e-14 2.245e-14 -6.170e-01 0.537948
animal_protein_consumption 5.153e-12 7.264e-13 7.094e+00 2.35e-11 ***
education_years 1.769e-12 6.270e-13 2.821e+00 0.005277 **
physicians_1000 3.200e-12 1.696e-12 1.887e+00 0.060675 .
pocket_per_cap -2.376e-14 8.464e-15 -2.807e+00 0.005506 **
healthcare_gdp_rate 3.037e-11 1.838e-11 1.652e+00 0.100074
daly_ivsa 1.000e+00 1.045e-15 9.570e+14 < 2e-16 ***
daly_ncds 1.000e+00 3.813e-16 2.623e+15 < 2e-16 ***
daly_cnmnd 1.000e+00 1.157e-16 8.645e+15 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.446e-11 on 195 degrees of freedom
(817 observations deleted due to missingness)
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 3.025e+31 on 12 and 195 DF, p-value: < 2.2e-16
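lm0's perfect fit is an accounting identity rather than a finding. If `daly_adjusted` is exactly the sum of `daly_cnmnd`, `daly_ncds` and `daly_ivsa`, then regressing it on the three components returns coefficients of 1 and an R-squared of 1, whatever other covariates are included.

The small simulated check below illustrates this. All values are synthetic, and `smoking` is an unrelated stand-in covariate. Expect `summary()` to emit an "essentially perfect fit" warning, which is exactly the symptom seen in lm0.

```r
set.seed(1)
n <- 50
# Synthetic components; the total is defined as their exact sum.
cnmnd   <- runif(n, 1000, 40000)
ncds    <- runif(n, 15000, 35000)
ivsa    <- runif(n, 2000, 8000)
smoking <- runif(n, 0.05, 0.4)   # unrelated covariate
total_daly <- cnmnd + ncds + ivsa

fit_demo <- lm(total_daly ~ smoking + cnmnd + ncds + ivsa)
print(round(coef(fit_demo), 6))           # component slopes are 1, smoking is ~0
print(summary(fit_demo)$r.squared)        # 1 (with a perfect-fit warning)
```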
lm1= lm(daly_adjusted ~ smoking_percentage+ percentage_overweight+ fruit_consumption+ vegetable_consumption+ animal_protein_consumption+ education_years+ physicians_1000+ pocket_per_cap+ healthcare_gdp_rate + gdp, data = total)
summary(lm1)
Call:
lm(formula = daly_adjusted ~ smoking_percentage + percentage_overweight +
fruit_consumption + vegetable_consumption + animal_protein_consumption +
education_years + physicians_1000 + pocket_per_cap + healthcare_gdp_rate +
gdp, data = total)
Residuals:
Min 1Q Median 3Q Max
-25122 -6233 -812 4866 59542
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.921e+04 1.894e+03 41.814 < 2e-16 ***
smoking_percentage -2.391e+04 6.139e+03 -3.894 0.000105 ***
percentage_overweight -2.590e+02 3.531e+01 -7.335 4.53e-13 ***
fruit_consumption -5.051e+01 8.655e+00 -5.836 7.19e-09 ***
vegetable_consumption -3.828e+01 6.980e+00 -5.484 5.26e-08 ***
animal_protein_consumption -1.799e+03 2.719e+02 -6.615 6.00e-11 ***
education_years -1.270e+03 2.027e+02 -6.268 5.41e-10 ***
physicians_1000 7.964e+02 4.986e+02 1.597 0.110538
pocket_per_cap -1.011e+01 2.508e+00 -4.031 5.96e-05 ***
healthcare_gdp_rate 6.335e+03 6.803e+03 0.931 0.351968
gdp 6.325e-02 3.290e-02 1.923 0.054808 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 11340 on 1014 degrees of freedom
Multiple R-squared: 0.6371, Adjusted R-squared: 0.6335
F-statistic: 178 on 10 and 1014 DF, p-value: < 2.2e-16
Model 1 alone already reaches an adjusted R-squared of 0.6335, meaning these factors explain approximately 63 percent of the variation in overall DALY rates. In the following models, we sequentially dropped the variable with the highest p-value.
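As a sanity check on the reported figures: adjusted R-squared penalises R-squared for the number of predictors p, given n observations. The formula is adj R² = 1 − (1 − R²)(n − 1)/(n − p − 1).

Plugging in the lm1 values reproduces the printed 0.6335. Here n is recovered as residual degrees of freedom + p + 1, which assumes an intercept, as in lm1.

```r
# Values copied from the lm1 output above.
r2       <- 0.6371          # Multiple R-squared
df_resid <- 1014            # residual degrees of freedom
p        <- 10              # predictors, excluding the intercept
n        <- df_resid + p + 1  # observations used in the fit (1025)

adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(round(adj_r2, 4))
```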
# lm2: drop healthcare_gdp_rate, the variable with the highest p-value in lm1
lm2 = lm( daly_adjusted~smoking_percentage+ percentage_overweight+ vegetable_consumption+ animal_protein_consumption+ education_years+ physicians_1000+ pocket_per_cap+ fruit_consumption + gdp, data = total)
summary(lm2)
Call:
lm(formula = daly_adjusted ~ smoking_percentage + percentage_overweight +
vegetable_consumption + animal_protein_consumption + education_years +
physicians_1000 + pocket_per_cap + fruit_consumption + gdp,
data = total)
Residuals:
Min 1Q Median 3Q Max
-25123 -6213 -846 4897 59243
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.034e+04 1.453e+03 55.295 < 2e-16 ***
smoking_percentage -2.365e+04 6.132e+03 -3.857 0.000122 ***
percentage_overweight -2.620e+02 3.516e+01 -7.452 1.96e-13 ***
vegetable_consumption -3.893e+01 6.944e+00 -5.606 2.66e-08 ***
animal_protein_consumption -1.852e+03 2.657e+02 -6.970 5.68e-12 ***
education_years -1.267e+03 2.027e+02 -6.254 5.90e-10 ***
physicians_1000 9.062e+02 4.845e+02 1.871 0.061701 .
pocket_per_cap -1.008e+01 2.508e+00 -4.018 6.29e-05 ***
fruit_consumption -5.072e+01 8.651e+00 -5.863 6.16e-09 ***
gdp 5.507e-02 3.170e-02 1.737 0.082661 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 11340 on 1015 degrees of freedom
Multiple R-squared: 0.6368, Adjusted R-squared: 0.6335
F-statistic: 197.7 on 9 and 1015 DF, p-value: < 2.2e-16
Dropping healthcare_gdp_rate makes physicians_1000 significant at the 10% level (p ≈ 0.062, down from 0.111 in lm1). gdp, now the variable with the highest p-value (0.083), is dropped next.
# lm3: drop gdp, the variable with the highest p-value in lm2
lm3=lm( daly_adjusted~smoking_percentage+ percentage_overweight+ vegetable_consumption+ animal_protein_consumption+ education_years+ pocket_per_cap+ fruit_consumption + physicians_1000, data = total)
summary(lm3)
Call:
lm(formula = daly_adjusted ~ smoking_percentage + percentage_overweight +
vegetable_consumption + animal_protein_consumption + education_years +
pocket_per_cap + fruit_consumption + physicians_1000, data = total)
Residuals:
Min 1Q Median 3Q Max
-25203 -6215 -447 4669 59274
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 79726.798 1410.663 56.517 < 2e-16 ***
smoking_percentage -24292.378 6127.035 -3.965 7.86e-05 ***
percentage_overweight -266.174 35.115 -7.580 7.76e-14 ***
vegetable_consumption -40.473 6.894 -5.871 5.87e-09 ***
animal_protein_consumption -1741.370 258.204 -6.744 2.58e-11 ***
education_years -1222.730 201.228 -6.076 1.74e-09 ***
pocket_per_cap -7.568 2.052 -3.688 0.000238 ***
fruit_consumption -47.366 8.442 -5.611 2.59e-08 ***
physicians_1000 859.752 484.202 1.776 0.076097 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 11350 on 1016 degrees of freedom
Multiple R-squared: 0.6357, Adjusted R-squared: 0.6328
F-statistic: 221.6 on 8 and 1016 DF, p-value: < 2.2e-16
1.3.0.2 Drop physicians_1000
lm4=lm( daly_adjusted~smoking_percentage+ percentage_overweight+ vegetable_consumption+ animal_protein_consumption+ education_years+ pocket_per_cap+ fruit_consumption, data = total)
summary(lm4)
Call:
lm(formula = daly_adjusted ~ smoking_percentage + percentage_overweight +
vegetable_consumption + animal_protein_consumption + education_years +
pocket_per_cap + fruit_consumption, data = total)
Residuals:
Min 1Q Median 3Q Max
-25602 -6210 -408 4778 59363
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 78595.709 1259.974 62.379 < 2e-16 ***
smoking_percentage -21927.213 5986.815 -3.663 0.000263 ***
percentage_overweight -256.075 34.687 -7.382 3.23e-13 ***
vegetable_consumption -37.819 6.737 -5.613 2.56e-08 ***
animal_protein_consumption -1623.099 249.729 -6.499 1.26e-10 ***
education_years -1107.609 190.699 -5.808 8.44e-09 ***
pocket_per_cap -6.709 1.996 -3.361 0.000806 ***
fruit_consumption -49.239 8.384 -5.873 5.80e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 11360 on 1017 degrees of freedom
Multiple R-squared: 0.6346, Adjusted R-squared: 0.632
F-statistic: 252.3 on 7 and 1017 DF, p-value: < 2.2e-16
All variables are now significant, giving a model with an adjusted R-squared of 0.632.
1.3.1 Stepwise regression & VIF examination
We can also use stepwise regression to find an optimal model. The stepwise method is more systematic than dropping variables manually. At each step it can add a previously dropped variable back if doing so improves the model (lowers its AIC), and it re-evaluates every candidate addition and removal after each change.
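The manual procedure used for lm1 to lm4 (refit, then drop the predictor with the largest p-value) can also be automated. The base-R sketch below does this. The helper `drop_by_pvalue()` and the simulated `sim` data are hypothetical, not part of the project code, and the 0.1 threshold mirrors the significance level used above.

```r
# Backward elimination by p-value: refit after removing the least significant
# predictor until every remaining slope has p < alpha (or none are left).
drop_by_pvalue <- function(response, predictors, data, alpha = 0.1) {
  repeat {
    if (length(predictors) == 0) {
      return(lm(reformulate("1", response = response), data = data))
    }
    fit <- lm(reformulate(predictors, response = response), data = data)
    coefs <- summary(fit)$coefficients
    # p-values of the slopes only (row 1 is the intercept)
    pvals <- setNames(coefs[-1, "Pr(>|t|)"], rownames(coefs)[-1])
    if (max(pvals) < alpha) return(fit)
    predictors <- setdiff(predictors, names(which.max(pvals)))
  }
}

# Usage on simulated data: x1 and x2 drive y, `noise` does not.
set.seed(7)
sim <- data.frame(x1 = rnorm(300), x2 = rnorm(300), noise = rnorm(300))
sim$y <- 3 * sim$x1 - 2 * sim$x2 + rnorm(300)
final_fit <- drop_by_pvalue("y", c("x1", "x2", "noise"), sim)
print(formula(final_fit))
```

Unlike `step()`, which ranks candidate models by AIC, this loop only ever removes terms and uses p-values. The two approaches can therefore disagree: here the stepwise search keeps `gdp` and `physicians_1000`, which the manual procedure dropped.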
fit1_step=step(lm1,direction="both")
Start: AIC=19149.68
daly_adjusted ~ smoking_percentage + percentage_overweight +
fruit_consumption + vegetable_consumption + animal_protein_consumption +
education_years + physicians_1000 + pocket_per_cap + healthcare_gdp_rate +
gdp
Df Sum of Sq RSS AIC
- healthcare_gdp_rate 1 111483768 1.3048e+11 19149
<none> 1.3036e+11 19150
- physicians_1000 1 327963566 1.3069e+11 19150
- gdp 1 475232954 1.3084e+11 19151
- smoking_percentage 1 1949782662 1.3231e+11 19163
- pocket_per_cap 1 2089322026 1.3245e+11 19164
- vegetable_consumption 1 3866128944 1.3423e+11 19178
- fruit_consumption 1 4378585222 1.3474e+11 19182
- education_years 1 5050374107 1.3541e+11 19187
- animal_protein_consumption 1 5625702511 1.3599e+11 19191
- percentage_overweight 1 6917422353 1.3728e+11 19201
Step: AIC=19148.55
daly_adjusted ~ smoking_percentage + percentage_overweight +
fruit_consumption + vegetable_consumption + animal_protein_consumption +
education_years + physicians_1000 + pocket_per_cap + gdp
Df Sum of Sq RSS AIC
<none> 1.3048e+11 19149
- gdp 1 387923531 1.3086e+11 19150
+ healthcare_gdp_rate 1 111483768 1.3036e+11 19150
- physicians_1000 1 449760026 1.3093e+11 19150
- smoking_percentage 1 1912380306 1.3239e+11 19162
- pocket_per_cap 1 2075803261 1.3255e+11 19163
- vegetable_consumption 1 4040506535 1.3452e+11 19178
- fruit_consumption 1 4418260882 1.3489e+11 19181
- education_years 1 5027016883 1.3550e+11 19185
- animal_protein_consumption 1 6245781148 1.3672e+11 19194
- percentage_overweight 1 7139008696 1.3761e+11 19201
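For linear models, `step()` ranks candidates with `extractAIC()`, which computes n·log(RSS/n) + 2k, where k is the number of estimated coefficients. This differs from `AIC()` only by terms that are the same for every model fitted to the same data.

The sketch below first plugs in the starting model's printed values: RSS from the `<none>` row, n = 1025 observations, k = 11 coefficients. This recovers the reported AIC of about 19149.7, up to the rounding of RSS in the printout. It then confirms the formula against `extractAIC()` on a small simulated fit; `fit_demo` is a hypothetical name.

```r
# Part 1: reproduce the starting AIC printed by step() from the reported values.
rss_start <- 1.3036e11   # RSS of the starting model (<none> row, rounded)
n_obs     <- 1025        # 1014 residual df + 11 estimated coefficients
k_coef    <- 11          # intercept + 10 slopes
aic_start <- n_obs * log(rss_start / n_obs) + 2 * k_coef
print(round(aic_start, 1))

# Part 2: check the formula against extractAIC() on a simulated fit.
set.seed(3)
x <- rnorm(100)
y <- 2 * x + rnorm(100)
fit_demo <- lm(y ~ x)
manual_aic <- length(y) * log(sum(residuals(fit_demo)^2) / length(y)) +
  2 * length(coef(fit_demo))
print(all.equal(manual_aic, extractAIC(fit_demo)[2]))
```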
Call:
lm(formula = daly_adjusted ~ smoking_percentage + percentage_overweight +
fruit_consumption + vegetable_consumption + animal_protein_consumption +
education_years + physicians_1000 + pocket_per_cap + gdp,
data = total)
Residuals:
Min 1Q Median 3Q Max
-25123 -6213 -846 4897 59243
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.034e+04 1.453e+03 55.295 < 2e-16 ***
smoking_percentage -2.365e+04 6.132e+03 -3.857 0.000122 ***
percentage_overweight -2.620e+02 3.516e+01 -7.452 1.96e-13 ***
fruit_consumption -5.072e+01 8.651e+00 -5.863 6.16e-09 ***
vegetable_consumption -3.893e+01 6.944e+00 -5.606 2.66e-08 ***
animal_protein_consumption -1.852e+03 2.657e+02 -6.970 5.68e-12 ***
education_years -1.267e+03 2.027e+02 -6.254 5.90e-10 ***
physicians_1000 9.062e+02 4.845e+02 1.871 0.061701 .
pocket_per_cap -1.008e+01 2.508e+00 -4.018 6.29e-05 ***
gdp 5.507e-02 3.170e-02 1.737 0.082661 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 11340 on 1015 degrees of freedom
Multiple R-squared: 0.6368, Adjusted R-squared: 0.6335
F-statistic: 197.7 on 9 and 1015 DF, p-value: < 2.2e-16
Hide
summary(fit1_step)
22/02/2021 CM30_GroupProject_SG30
file:///Users/Aman/Downloads/The Burden of Disease Code.html 36/39
smoking_percentage percentage_overweight
1.668205 2.913117
fruit_consumption vegetable_consumption
1.425008 1.502601
animal_protein_consumption education_years
3.110474 3.088943
physicians_1000 pocket_per_cap
3.754590 3.210832
gdp
2.999374
From the nal result we can see that six variables are signi cant with a p-value lower than 0.1. Expense and
pocket_per_cap are both signi cant in this case. However, dropping one of them may lead to insigni cance of the other.
This could be because these two have a joint effect on the burden of disease. We can choose from these two models
according to our con dence interval.
Continents were also considered as part of the model to see their effect.
Call:
lm(formula = daly_adjusted ~ percentage_overweight + vegetable_consumption +
animal_protein_consumption + education_years + pocket_per_cap +
fruit_consumption + Asia + Africa + NorthA + Europe + SouthA +
Asia + Africa + NorthA + Europe + SouthA, data = total)
Residuals:
Min 1Q Median 3Q Max
-29935 -4148 -431 3996 50866
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 66522.998 2345.279 28.365 < 2e-16 ***
percentage_overweight -202.369 33.403 -6.058 1.94e-09 ***
vegetable_consumption -33.863 6.185 -5.475 5.51e-08 ***
animal_protein_consumption -1030.842 209.884 -4.911 1.05e-06 ***
education_years -534.613 165.120 -3.238 0.00124 **
pocket_per_cap -8.669 1.678 -5.168 2.86e-07 ***
fruit_consumption -40.857 6.824 -5.987 2.96e-09 ***
Asia -7140.437 1807.499 -3.950 8.34e-05 ***
Africa 13792.577 1918.746 7.188 1.27e-12 ***
NorthA -9335.463 1794.310 -5.203 2.38e-07 ***
Europe -5196.987 1650.294 -3.149 0.00169 **
SouthA -9146.724 1917.915 -4.769 2.12e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 9298 on 1013 degrees of freedom
Multiple R-squared: 0.7562, Adjusted R-squared: 0.7535
F-statistic: 285.6 on 11 and 1013 DF, p-value: < 2.2e-16
Hide
vif(fit1_step)
Hide
fit = lm(daly_adjusted~ percentage_overweight+ vegetable_consumption+ animal_protein_consumption+ education_years+ po
cket_per_cap+ fruit_consumption+Asia+Africa+NorthA+Europe+SouthA+ Asia+ Africa+ NorthA+ Europe+ SouthA, data
= total)
print(summary(fit))
Hide
print(vif(fit))
22/02/2021 CM30_GroupProject_SG30
file:///Users/Aman/Downloads/The Burden of Disease Code.html 37/39
percentage_overweight vegetable_consumption
3.908936 1.772149
animal_protein_consumption education_years
2.885232 3.048594
pocket_per_cap fruit_consumption
2.136245 1.318169
Asia Africa
7.257346 6.590683
NorthA Europe
3.209642 7.556342
SouthA
2.440735
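The VIF values above can be reproduced by hand: for predictor j, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing that predictor on all the other predictors. The helper `manual_vif()` below is a hypothetical sketch of that formula on simulated data; for single-degree-of-freedom terms such as the predictors here, it should agree with `car::vif()`.

```r
# Hypothetical helper: VIF of each column, from regressing it on the others
manual_vif <- function(X) {
  sapply(names(X), function(v) {
    others <- setdiff(names(X), v)
    r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
    1 / (1 - r2)
  })
}

# Simulated predictors: x2 is built from x1, x3 is independent
set.seed(1)
n <- 500
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.6)
x3 <- rnorm(n)

vifs <- manual_vif(data.frame(x1, x2, x3))
round(vifs, 2)

# Rule of thumb used in this project: flag predictors with VIF >= 10
vifs[vifs >= 10]
```

With only two predictors, R_j^2 is the squared correlation between them, so both VIFs equal 1 / (1 - r^2).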
1.3.2 Interpretation of the final model
Our final model has 11 explanatory variables: six continuous predictors and five continent indicators.
Call:
lm(formula = daly_adjusted ~ percentage_overweight + vegetable_consumption +
animal_protein_consumption + education_years + pocket_per_cap +
fruit_consumption + Asia + Africa + NorthA + Europe + SouthA +
Asia + Africa + NorthA + Europe + SouthA, data = total)
Residuals:
Min 1Q Median 3Q Max
-29935 -4148 -431 3996 50866
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 66522.998 2345.279 28.365 < 2e-16 ***
percentage_overweight -202.369 33.403 -6.058 1.94e-09 ***
vegetable_consumption -33.863 6.185 -5.475 5.51e-08 ***
animal_protein_consumption -1030.842 209.884 -4.911 1.05e-06 ***
education_years -534.613 165.120 -3.238 0.00124 **
pocket_per_cap -8.669 1.678 -5.168 2.86e-07 ***
fruit_consumption -40.857 6.824 -5.987 2.96e-09 ***
Asia -7140.437 1807.499 -3.950 8.34e-05 ***
Africa 13792.577 1918.746 7.188 1.27e-12 ***
NorthA -9335.463 1794.310 -5.203 2.38e-07 ***
Europe -5196.987 1650.294 -3.149 0.00169 **
SouthA -9146.724 1917.915 -4.769 2.12e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 9298 on 1013 degrees of freedom
Multiple R-squared: 0.7562, Adjusted R-squared: 0.7535
F-statistic: 285.6 on 11 and 1013 DF, p-value: < 2.2e-16
percentage_overweight vegetable_consumption
3.908936 1.772149
animal_protein_consumption education_years
2.885232 3.048594
pocket_per_cap fruit_consumption
2.136245 1.318169
Asia Africa
7.257346 6.590683
NorthA Europe
3.209642 7.556342
SouthA
2.440735
continent_fit=fit
summary(continent_fit)
vif(continent_fit)
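As a consistency check on the summary output, adjusted R-squared and the overall F-statistic can be recomputed from the multiple R-squared (0.7562), the number of predictors (p = 11) and the number of observations (n = 1025, i.e. 1013 residual degrees of freedom + 11 predictors + 1 intercept). The arithmetic below uses only those reported values; small differences come from R-squared being rounded to four digits.

```r
# Values reported by summary(continent_fit)
r2 <- 0.7562            # multiple R-squared
p <- 11                 # number of predictors (excluding the intercept)
df_resid <- 1013        # residual degrees of freedom
n <- df_resid + p + 1   # number of observations

# Adjusted R-squared penalises the fit for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Overall F-statistic: explained vs unexplained variance per degree of freedom
f_stat <- (r2 / p) / ((1 - r2) / (n - p - 1))

round(c(n = n, adj_r2 = adj_r2, f_stat = f_stat), 4)
```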
                            (1)            (2)            (3)            (4)            (5)
(Intercept)                 79209.24 ***   80341.02 ***   79726.80 ***   78595.71 ***   66523.00 ***
                            (1894.33)      (1452.94)      (1410.66)      (1259.97)      (2345.28)
smoking_percentage          -23905.42 ***  -23651.69 ***  -24292.38 ***  -21927.21 ***
                            (6138.51)      (6132.06)      (6127.03)      (5986.82)
percentage_overweight       -259.02 ***    -262.03 ***    -266.17 ***    -256.08 ***    -202.37 ***
                            (35.31)        (35.16)        (35.11)        (34.69)        (33.40)
fruit_consumption           -50.51 ***     -50.72 ***     -47.37 ***     -49.24 ***     -40.86 ***
                            (8.65)         (8.65)         (8.44)         (8.38)         (6.82)
vegetable_consumption       -38.28 ***     -38.93 ***     -40.47 ***     -37.82 ***     -33.86 ***
                            (6.98)         (6.94)         (6.89)         (6.74)         (6.18)
animal_protein_consumption  -1798.59 ***   -1852.18 ***   -1741.37 ***   -1623.10 ***   -1030.84 ***
                            (271.90)       (265.72)       (258.20)       (249.73)       (209.88)
education_years             -1270.47 ***   -1267.36 ***   -1222.73 ***   -1107.61 ***   -534.61 **
                            (202.70)       (202.66)       (201.23)       (190.70)       (165.12)
physicians_1000             796.40         906.18         859.75
                            (498.63)       (484.46)       (484.20)
pocket_per_cap              -10.11 ***     -10.08 ***     -7.57 ***      -6.71 ***      -8.67 ***
                            (2.51)         (2.51)         (2.05)         (2.00)         (1.68)
healthcare_gdp_rate         6334.55
                            (6802.52)
gdp                         0.06           0.06
                            (0.03)         (0.03)
Asia                                                                                    -7140.44 ***
                                                                                        (1807.50)
Africa                                                                                  13792.58 ***
                                                                                        (1918.75)
NorthA                                                                                  -9335.46 ***
                                                                                        (1794.31)
Europe                                                                                  -5196.99 **
                                                                                        (1650.29)
SouthA                                                                                  -9146.72 ***
                                                                                        (1917.91)
N                           1025           1025           1025           1025           1025
R2                          0.64           0.64           0.64           0.63           0.76
logLik                      -11018.25      -11018.69      -11020.21      -11021.80      -10814.42
AIC                         22060.50       22059.38       22060.42       22061.60       21654.84
*** p < 0.001; ** p < 0.01; * p < 0.05.
huxtable::huxreg(lm1, lm2, lm3, lm4, continent_fit,
                 number_format = "%.2f")
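The AIC row follows directly from the logLik row: AIC = -2 logLik + 2k, where k counts the estimated coefficients plus the residual variance. The sketch below recomputes it from the table's reported values, with the coefficient counts (11, 10, 9, 8 and 12, intercept included) read off the rows each model contains.

```r
# Log-likelihoods and AICs as reported for models (1)-(5)
loglik <- c(-11018.25, -11018.69, -11020.21, -11021.80, -10814.42)
reported_aic <- c(22060.50, 22059.38, 22060.42, 22061.60, 21654.84)

# Coefficients per model (intercept included), plus 1 for the error variance
n_coef <- c(11, 10, 9, 8, 12)
k <- n_coef + 1

aic <- -2 * loglik + 2 * k
names(aic) <- paste0("model", 1:5)
aic

# AIC difference of each model relative to the best (lowest-AIC) model
delta_aic <- aic - min(aic)
delta_aic
```

Models (1) to (4) lie within about 2.2 AIC points of one another, while model (5) is roughly 400 points lower, a much larger gap than its one extra coefficient would suggest.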
actual predicted
actual 1.0000000 0.8572182
predicted 0.8572182 1.0000000
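From the five models, continent_fit was chosen as the final model because all of its variables are significant and it has the highest R-squared (0.76) and the lowest AIC (21654.84). On the 2013 hold-out data, the model's predicted DALY rates have a correlation of 0.857 with the actual rates. This is a correlation coefficient rather than a percentage accuracy: it measures how closely the predictions track the actual values, not how close they are in absolute terms.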
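The 0.857 figure comes from `cor(ac_pred)`, a Pearson correlation between actual and predicted 2013 values. A correlation ignores systematic bias and scale errors, so it can be high even when predictions are far off. The toy example below (made-up numbers, not project data) shows predictions with a correlation of essentially 1 but large errors; error-based metrics such as MAE, RMSE or out-of-sample R-squared are natural complements.

```r
# Hypothetical actual values and predictions that are perfectly correlated
# with them but systematically too high (predicted = 2 * actual + 5)
actual <- c(10, 20, 30, 40)
predicted <- 2 * actual + 5

r <- cor(actual, predicted)                 # correlation of (essentially) 1
mae <- mean(abs(predicted - actual))        # mean absolute error
rmse <- sqrt(mean((predicted - actual)^2))  # root mean squared error
# Out-of-sample R-squared: 1 - SSE / SST (negative when worse than the mean)
r2_oos <- 1 - sum((actual - predicted)^2) / sum((actual - mean(actual))^2)

c(r = r, mae = mae, rmse = rmse, r2_oos = r2_oos)
```

Applied to the project, the same lines would use `ac_pred$actual` and `ac_pred$predicted` in place of the toy vectors.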
best_model <- continent_fit
# Part 2: test the model's predictive performance by checking that it can predict,
# with a certain level of confidence, the DALYs for the last full year of data (2013)
library(dplyr)  # provides %>% and filter()

# Train on all years before 2013; hold out 2013 for testing
train <- total %>%
  filter(period < 2013)
test_2013 <- total %>%
  filter(period == 2013)

# Refit the final model specification on the pre-2013 data only
continent_fit2 <- update(continent_fit, data = train)
final_prediction <- predict(continent_fit2, newdata = test_2013)

# Correlation between actual and predicted 2013 DALY rates
ac_pred <- data.frame(actual = test_2013$daly_adjusted,
                      predicted = final_prediction)
correlation_accuracy <- cor(ac_pred)
correlation_accuracy
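The comment above refers to predicting 2013 with a level of confidence. One way to make that concrete is a prediction interval from `predict()` with `interval = "prediction"`, followed by a check of how many held-out observations fall inside it. The sketch below does this on simulated data (`sim`, `train_sim`, `test_sim` and `fit_sim` are hypothetical); on the project data the same call would take `continent_fit2` and the 2013 rows.

```r
# Simulated panel: 20 hypothetical countries observed over 2000-2013
set.seed(7)
sim <- data.frame(period = rep(2000:2013, each = 20))
sim$x <- rnorm(nrow(sim))
sim$y <- 50 + 3 * sim$x + rnorm(nrow(sim), sd = 2)

# Same temporal split as the project: train before 2013, test on 2013
train_sim <- sim[sim$period < 2013, ]
test_sim <- sim[sim$period == 2013, ]

fit_sim <- lm(y ~ x, data = train_sim)

# 95% prediction intervals for the held-out year
pred_int <- predict(fit_sim, newdata = test_sim,
                    interval = "prediction", level = 0.95)
head(pred_int)

# Share of 2013 observations falling inside their interval
coverage <- mean(test_sim$y >= pred_int[, "lwr"] & test_sim$y <= pred_int[, "upr"])
coverage
```

If the model is well specified, coverage should be close to the nominal 95%; much lower coverage would suggest the intervals, and the model, are overconfident.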

The Burden of Disease: Data analysis, interpretation and linear regression

  • 1. The Burden of Disease Group 30 Aman Desai, Jim Huang, Gloria Marín, Carmen Chen, Dimitris Charitos, Lorenzo Gherardi Index • Introduction to the case • Methodology • Data Visualization • Initial Regressions • Moving Forward
  • 2. The Burden of Disease [Slogan here] Index • Introduction to the case • Main variables • Exploratory Analysis • Initial Regressions • Moving Forward DALYs Disability Adjusted Life Years • Metric used to measure the Burden of Disease • DALY includes the sum of mortality and morbidity due to a specific disease • One DALY = loss of 1 year in good health because of • Premature death • Disease • Disability - Mortality is used as a method to assess a population's health - Through ‘child mortality’ - Through ‘life expectancy’ - The Problem with this method is that it does not account for a population that lives through suffering due to a disease which otherwise prevents a normal life. - For people to get healthy, attention needs to be given to the impact on the lives of people suffering with disease. Years of contribution to ones one’s community, industry, and nation, are lost. ASSIGNMENT PURPOSE: 1. Understand causes 2. Identify factors that magnify its impact Introduction to the Case
  • 3. The Burden of Disease [Slogan here] Introduction to the Case: background and preparation Communicable diseases Non-communicable diseases (NCDs) Injuries Diarrhoea, lower respiratory & other common infectious diseases Cardiovascular diseases (inc. stroke, heart disease and heart failure) Road injuries Neonatal disorders Cancers Other transport injuries Maternal disorders Respiratory disease Falls Malaria & neglected tropical diseases Diabetes, blood and endocrine diseases Drowning Nutritional deficiencies Mental and substance use disorders Fire, heat and hot substances HIV/AIDS Liver diseases Poisonings Tuberculosis Digestive diseases Self-harm Other communicable diseases Musculoskeletal disorders Interpersonal violence Neurological disorders (including dementia) Conflict & terrorism Other NCDs Natural disasters
  • 4. Methodology: Linear Model and Data Visualization Step 1. Identify relevant variables • Explanatory Variable (x): Select factors covering the following dimensions from hundreds of other factors: Diet habit (E.g., fruit consumption), Healthcare level (E.g., healthcare expense), Living habit (smoking %), and Other demographics (E.g., education, overweight %) • Response Variable (y): Choose ‘Overall DALYs’, ‘Communicable Diseases DALYs’, ‘Non- Communicable Diseases DALYs’, and ‘Injuries DALYs’ as our response variable from 24 possible variable by comparing the models Step 2. Check for non-linear relations Step 3. Generate the linear regression and prediction model • Dropped all the insignificant level • Checked the VIF to eliminate the risk of multilinearity Data Visualization • Bar Chart and Stacked Bar Chart: Compare causes of DALYs by continent • Area Chart: Look into the DALY rate over 2000~2017 by continent • Scatter Plot: Measure DALY due to Proportion of GDP spent on Healthcare by continent Linear Model
  • 5. Accumulated DALYs per Capita 1980 - 2017 Due to Communicable Diseases Due to Non-Communicable Diseases Due to Injuries • Overall, Africa has the highest accumulated average DALY rate (9), followed by Asia (5), and Oceania (4). • The high contrast of Africa is mainly due to communicable diseases, with a rate triple of that of the next highest continent, Asia • DALY rate for non-communicable diseases and injuries are relatively uniform around the world. • Africa's communicable DALY rate has declined since 2008. Despite this, the burden of disease on the continent remains high and this leaves room to consider the causes and potential solutions for this. Summary Communicable DALYs over 1980 - 2017 Results & Conclusion: Data Visualization (1)
  • 6. • The variables affecting DALY have been further broken down. The largest contribution factor found were: Ø Cardiovascular Diseases for Non- Communicable Diseases, and Ø Unintentional Injuries for Injuries. • There was no significant cause of disease found across the different continents. • GDP has a negative correlation to DALY for Communicable Diseases, however, the proprtion of GDP used has a positive corrlation to the same. This could be because poorer countries have a higher liklihood of having to combat communicable diseases and as a result spend more of their GDP on healthcare. Summary Results & Conclusion: Data Visualization (2) Causes of DALYs by Continent due to Non-Communicable Disease Causes of DALYs by Continent due to Injuries Healthcare Expense vs. DALY from Non-Communicable Disease Healthcare Expense vs. DALY from Communicable Disease Analysis of the Correlation among Variables
• 7. Results & Conclusion: Final Model
Model of best fit
DALY = 66523 - 202.37 overweight% - 33.86 veg_consump - 1030.84 animal_protein_consump - 534.61 education - 8.67 pocket_per_cap - 40.86 fruit_consump - 7140.44 Asia + 13792.58 Africa - 9335.46 NorthA - 5196.99 Europe - 9146.72 SouthA
1. All explanatory variables' Pr(>|t|) < 0.01
2. VIF (Variance Inflation Factor) < 10
Summary
• After ruling out all insignificant variables, our best model kept 7 variables, with continent counted as one.
• The risk of multicollinearity was checked by ensuring that VIF < 10.
• The R-squared of 75.35% suggests that the model explains about three quarters of the variance in DALY.
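To read the equation, note that the continent terms act as dummy variables. Only one of them is non-zero for a given country. Oceania has no term, so it appears to be the reference level; this is an inference, not something the slides state. The following is a hypothetical helper that evaluates the reported coefficients under that assumption.

```r
# Hypothetical helper that evaluates the reported model of best fit.
# Assumption: Oceania is the reference continent (it has no dummy term).
coefs <- c(intercept = 66523, overweight = -202.37, veg_consump = -33.86,
           animal_protein_consump = -1030.84, education = -534.61,
           pocket_per_cap = -8.67, fruit_consump = -40.86)
continent_shift <- c(Asia = -7140.44, Africa = 13792.58, NorthA = -9335.46,
                     Europe = -5196.99, SouthA = -9146.72, Oceania = 0)

predict_daly <- function(x, continent) {
  vars <- setdiff(names(coefs), "intercept")
  stopifnot(all(vars %in% names(x)), continent %in% names(continent_shift))
  unname(coefs["intercept"] + sum(coefs[vars] * x[vars]) +
           continent_shift[continent])
}

# Same covariate values, different continents: only the dummy term changes.
x_zero <- setNames(rep(0, 6), setdiff(names(coefs), "intercept"))
predict_daly(x_zero, "Africa") - predict_daly(x_zero, "Europe")
# 18989.57
```

Holding every other variable fixed, the model places Africa about 13792.58 + 5196.99 = 18989.57 DALY units above Europe.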
• 8. Results & Conclusion: Prediction
Step 1. Using our linear model, we estimated worldwide DALY rates for 2013 from our data for all years up to 2012.
Step 2. The data were filtered to all periods before 2013, and a linear model was fitted.
Step 3. Using the linear model, the 2013 values were predicted.
Step 4. The predictions were compared with the actual 2013 data to determine accuracy.
Prediction accuracy was 85.7%.
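Steps 1 to 4 describe a temporal holdout: fit on every year before 2013, predict 2013, and compare the predictions with the observed values. The slides do not define how the 85.7% accuracy was computed. The sketch below assumes accuracy = 1 - mean absolute percentage error (MAPE). The data are simulated, and the column names `x` and `daly` are hypothetical.

```r
# Sketch of the Step 1-4 holdout on simulated data (not the study's data).
# Assumptions: accuracy = 1 - MAPE; column names x and daly are hypothetical.
set.seed(42)
panel <- expand.grid(location = paste0("country_", 1:30), period = 2000:2013)
panel$x <- runif(nrow(panel), 0, 10)
panel$daly <- 30000 - 1500 * panel$x + rnorm(nrow(panel), sd = 2000)

train <- subset(panel, period < 2013)    # Step 2: all periods before 2013
test  <- subset(panel, period == 2013)   # the held-out year

fit  <- lm(daly ~ x, data = train)       # Step 2: fit the linear model
pred <- predict(fit, newdata = test)     # Step 3: predict 2013

# Step 4: compare with the actual 2013 values (assumed metric: 1 - MAPE)
accuracy <- 1 - mean(abs(pred - test$daly) / abs(test$daly))
print(round(accuracy, 3))
```

Splitting by year rather than at random keeps every 2013 observation out of the fit, which mirrors the forecasting question the slide poses.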
• 9. Moving Forward: Adding New Variables
What other 'external' elements may be magnifying the results?
COMMON TO ALL
• Percentage of population insured with health insurance
• Number of medical doctors per 1,000 people
• Number of nurses per 1,000 people
• Out-of-pocket expenditure on healthcare
SPECIFIC TO
a) Communicable, maternal, neonatal, and nutritional diseases
• Nutritional deficiencies
• Hygiene practices
• Housing space per person
b) Non-communicable diseases
• Physical inactivity
• Wellbeing
• Genetics
c) Injuries
• Surveillance
• Safety regulations
• 10. The Burden of Disease
Group 30 - Aman Desai, Jim Huang, Gloria Marín, Carmen Chen, Dimitris Charitos, Lorenzo Gherardi
Introduction
Disability Adjusted Life Years (DALYs)
• Metric used to measure the Burden of Disease
• It includes the sum of mortality and morbidity
• One DALY = loss of 1 year in good health because of premature death, disease, or disability
Aim of study
• Understand causes
• Identify factors that magnify the impact
Background & Preparation
Charts: Burden of Disease, 2017; Disease Burden due to Communicable Disease vs GDP per Capita
Category of Disease
• Communicable disease
• Non-communicable disease (e.g., cancers)
• Injuries (e.g., falls, fire)
A Glance at DALY
Charts: DALY Around the World (Due to Communicable Disease; Due to Non-Communicable Disease); Healthcare Expense vs. DALY from Non-Communicable and Communicable Disease; Stacked Bar Chart, Causes of DALYs by Continent for Non-Communicable Disease and Injuries
Methodology (Model)
Statistical Technique
Step 1. Identify relevant variables
• Explanatory variables (x): from hundreds of candidate factors, select ones covering the following dimensions: diet habits (e.g., fruit consumption), healthcare level (e.g., healthcare expenditure), lifestyle habits (e.g., smoking %), and other demographics (e.g., education, overweight %)
• Response variables (y): from 24 possible variables, choose 'Overall DALYs', 'Communicable Diseases DALYs', 'Non-Communicable Diseases DALYs', and 'Injuries DALYs' by comparing the models
Step 2. Check for non-linear relations
Step 3. Generate the regression model
• Dropped all insignificant variables
• Checked the VIF to reduce the risk of multicollinearity
Linear Regressions
Model of best fit
DALY = 66523 - 202.37 overweight% - 33.86 veg_consum - 1030.84 animal_consum - 534.61 education - 8.67 pocket/cap - 40.86 fruit_consum - 7140.44 Asia + 13792.58 Africa - 9335.46 NorthA - 5196.99 Europe - 9146.72 SouthA
All explanatory variables' Pr(>|t|) < 0.01
VIF (Variance Inflation Factor) < 10
Conclusions
Linear Regression
• The best model had 7 variables (overweight%, veg_consum, animal_consum, education, pocket/cap, fruit_consum, continent), all with Pr(>|t|) < 0.01 and VIF < 10
• The high R-squared (75.35%) suggests the model explains the variance of DALY well
Data Visualization
• Africa has the highest average DALY rate (c 9), followed by Asia (c 5) and Oceania.
• Africa's high contrast is mainly due to communicable diseases, with a rate more than triple that of the second highest continent.
• Africa's communicable DALY rate has declined since 2008 but remains high relative to other continents, leaving room to further consider the causes and potential solutions.
Charts: Communicable DALYs over 1980 - 2017; Analysis of Correlation among Variables
Moving Forward
What other 'external' elements may be magnifying results?
• Common to all:
  • Percentage of population insured with health insurance
  • Number of medical doctors/nurses per 1,000 people
• Specific to:
  a) Communicable diseases: family size
  b) Non-communicable diseases: literacy rate
  c) Injuries: alcohol consumption
• 11. 22/02/2021 CM30_GroupProject_SG30 file:///Users/Aman/Downloads/The Burden of Disease Code.html 1/39
CM30_GroupProject_SG30
Team 30
2021-02-14
1 Burden of Disease
Mortality rates are a common way to assess a population's health, with child mortality and life expectancy among the most frequently used. However, a focus on mortality neglects the suffering of people who continue to live with a disease. A disease directly or indirectly impairs a person's ability to live a normal life, and potential contributions to one's community, work, or nation are often lost. Our study therefore seeks to understand the magnitude of the burden of disease by disease type and to identify factors that amplify its effects. The metric used to measure disease burden is the DALY, which stands for Disability Adjusted Life Year. It is the sum of mortality and morbidity: one DALY represents the loss of one year of good health due to premature death, disease, or disability.
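The notebook describes the DALY only in words. The source data come from the Global Burden of Disease (GBD) study, which writes the DALY as years of life lost (YLL) plus years lived with disability (YLD); YLD uses the prevalence-based form. The original does not spell this out. Below is a worked sketch with made-up numbers.

```latex
\mathrm{DALY} = \mathrm{YLL} + \mathrm{YLD}, \qquad
\mathrm{YLL} = N \times L, \qquad
\mathrm{YLD} = P \times DW
% N: deaths; L: standard life expectancy remaining at the age of death
% P: prevalent cases; DW: disability weight, from 0 (full health) to 1 (death)
% Illustrative (hypothetical) numbers: N = 10, L = 30, P = 1000, DW = 0.1
\mathrm{DALY} = 10 \times 30 + 1000 \times 0.1 = 300 + 100 = 400
```

The rates used below are age-standardized DALYs per 100,000 people, so they can be compared across countries of different size and age structure.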
1.1 Data import and inspection
1.1.0.1 Importing data for overall disease burden (DALY)
# Packages used throughout (the setup chunk is not shown in this export)
library(tidyverse)  # readr, dplyr, tidyr, ggplot2
library(janitor)    # clean_names()
#source: https://ourworldindata.org/burden-of-disease
# Reading first file
daly_total <- read_csv(here::here('Data', "disease-burden-vs-health-expenditure-per-capita.csv")) %>%
  clean_names()
# Checking for variable types
glimpse(daly_total)
Rows: 48,698
Columns: 7
$ entity <chr> …
$ code <chr> …
$ year <dbl> …
$ total_population_gapminder_hyde_un <dbl> …
$ continent <chr> …
$ health_expenditure_per_capita_current_us <dbl> …
$ dal_ys_disability_adjusted_life_years_all_causes_sex_both_age_age_standardized_rate <dbl> …
# Changing variable names and variable types
daly_total <- daly_total %>%
  mutate(
    location = as.factor(entity),
    period = year,
    health_expenditure_per_capita = health_expenditure_per_capita_current_us,
    daly_adjusted = dal_ys_disability_adjusted_life_years_all_causes_sex_both_age_age_standardized_rate,
    total_population = total_population_gapminder_hyde_un) %>%
  select(location, period, daly_adjusted, health_expenditure_per_capita, total_population)
• 12.
Although important as a whole, DALY rates can be further divided into 3 sub-categories by cause: communicable diseases, non-communicable diseases, and injuries. We therefore included the dataset for each sub-category below.
1.1.0.2 Adding data for burden of non-communicable diseases
#source: https://ourworldindata.org/burden-of-disease
#Reading the file
ncds <- read_csv(here::here('Data', "burden-of-disease-rates-from-ncds.csv")) %>%
  clean_names()
# Checking for variable types
glimpse(ncds)
Rows: 6,468
Columns: 4
$ entity <chr> …
$ code <chr> …
$ year <dbl> …
$ dal_ys_disability_adjusted_life_years_non_communicable_diseases_sex_both_age_age_standardized_rate <dbl> …
# Changing variable names and variable types
ncds <- ncds %>%
  mutate(location = as.factor(entity),
         period = year,
         daly_ncds = dal_ys_disability_adjusted_life_years_non_communicable_diseases_sex_both_age_age_standardized_rate) %>%
  select(location, period, daly_ncds)
glimpse(ncds)
Rows: 6,468
Columns: 3
$ location <fct> Afghanistan, Afghanistan, Afghanistan, Afghanistan, Afghani…
$ period <dbl> 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999,…
$ daly_ncds <dbl> 41145.51, 40587.17, 39644.60, 39821.31, 40641.76, 40790.73,…
#Merging data frames
total <- merge(daly_total, ncds, by = c("location", "period"))
1.1.0.3 Adding data for burden from communicable, neonatal, maternal and nutritional diseases
#source: https://ourworldindata.org/burden-of-disease
#Reading the file
cnmnd <- read_csv(here::here('Data', "burden-of-disease-rates-from-communicable-neonatal-maternal-nutritional-diseases.csv")) %>%
  clean_names()
# Checking for variable types
glimpse(cnmnd)
Rows: 6,468
Columns: 4
$ entity <chr> …
$ code <chr> …
$ year <dbl> …
$ dal_ys_disability_adjusted_life_years_communicable_maternal_neonatal_and_nutritional_diseases_sex_both_age_age_standardized_rate <dbl> …
# Changing variable names and variable types
cnmnd <- cnmnd %>%
  mutate(location = as.factor(entity),
         period = year,
         daly_cnmnd = dal_ys_disability_adjusted_life_years_communicable_maternal_neonatal_and_nutritional_diseases_sex_both_age_age_standardized_rate) %>%
  select(location, period, daly_cnmnd)
glimpse(cnmnd)
Rows: 6,468
Columns: 3
$ location <fct> Afghanistan, Afghanistan, Afghanistan, Afghanistan, Afghan…
$ period <dbl> 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999…
$ daly_cnmnd <dbl> 51181.84, 47263.29, 38908.25, 36882.69, 38809.79, 38262.20…
#Merging data frames
total <- merge(total, cnmnd, by = c("location", "period"))
1.1.0.4 Adding data for burden from injuries, violence, self-harm and accidents
#source: https://ourworldindata.org/burden-of-disease
#Reading the file
ivsa <- read_csv(here::here('Data', "burden-of-disease-rates-from-injuries.csv")) %>%
  clean_names()
# Checking for variable types
glimpse(ivsa)
Rows: 6,468
Columns: 4
$ entity <chr> …
$ code <chr> …
$ year <dbl> …
$ dal_ys_disability_adjusted_life_years_injuries_sex_both_age_age_standardized_rate <dbl> …
# Changing variable names and variable types
ivsa <- ivsa %>%
  mutate(location = as.factor(entity),
         period = year,
         daly_ivsa = dal_ys_disability_adjusted_life_years_injuries_sex_both_age_age_standardized_rate) %>%
  select(location, period, daly_ivsa)
glimpse(ivsa)
Rows: 6,468
Columns: 3
$ location <fct> Afghanistan, Afghanistan, Afghanistan, Afghanistan, Afghani…
$ period <dbl> 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999,…
$ daly_ivsa <dbl> 11775.715, 13390.289, 12365.622, 11530.363, 13546.148, 1238…
#Merging data frames
total <- merge(total, ivsa, by = c("location", "period"))
Each of the 3 sub-categories of disease cause contains specific diseases. We included all of these categories in our dataset.
1.1.0.5 Adding data for disease burden by cause (DALY by cause)
• 14.
#source: https://ourworldindata.org/burden-of-disease
# Reading second file
daly_by_cause <- read_csv(here::here('Data', "burden-of-disease-by-cause.csv")) %>%
  clean_names()
# Checking for variable types
#glimpse(daly_by_cause)
# Changing variable names and variable types
daly_by_cause <- daly_by_cause %>%
  mutate(
    location = as.factor(entity),
    period = year,
    daly_conflict_terrorism = dal_ys_disability_adjusted_life_years_conflict_and_terrorism_sex_both_age_all_ages_number,
    daly_hiv_tuberculosis = dal_ys_disability_adjusted_life_years_hiv_aids_and_tuberculosis_sex_both_age_all_ages_number,
    daly_diahrrea_respiratory = dal_ys_disability_adjusted_life_years_diarrhea_lower_respiratory_and_other_common_infectious_diseases_sex_both_age_all_ages_number,
    daly_cvs = dal_ys_disability_adjusted_life_years_cardiovascular_diseases_sex_both_age_all_ages_number,
    daly_self_harm = dal_ys_disability_adjusted_life_years_self_harm_sex_both_age_all_ages_number,
    daly_violence = dal_ys_disability_adjusted_life_years_interpersonal_violence_sex_both_age_all_ages_number,
    daly_nutritional_deficiencies = dal_ys_disability_adjusted_life_years_nutritional_deficiencies_sex_both_age_all_ages_number,
    daly_transport_injuries = dal_ys_disability_adjusted_life_years_transport_injuries_sex_both_age_all_ages_number,
    daly_unintentional_injuries = dal_ys_disability_adjusted_life_years_unintentional_injuries_sex_both_age_all_ages_number,
    daly_maternal_disorders = dal_ys_disability_adjusted_life_years_maternal_disorders_sex_both_age_all_ages_number,
    daly_neonatal_disorders = dal_ys_disability_adjusted_life_years_neonatal_disorders_sex_both_age_all_ages_number,
    daly_other_communicable = dal_ys_disability_adjusted_life_years_other_communicable_maternal_neonatal_and_nutritional_diseases_sex_both_age_all_ages_number,
    daly_nature_forces = dal_ys_disability_adjusted_life_years_exposure_to_forces_of_nature_sex_both_age_all_ages_number,
    daly_chronic_respiratory = dal_ys_disability_adjusted_life_years_chronic_respiratory_diseases_sex_both_age_all_ages_number,
    daly_chronic_liver = dal_ys_disability_adjusted_life_years_cirrhosis_and_other_chronic_liver_diseases_sex_both_age_all_ages_number,
    daly_digestive = dal_ys_disability_adjusted_life_years_digestive_diseases_sex_both_age_all_ages_number,
    daly_tropical_and_malaria = dal_ys_disability_adjusted_life_years_neglected_tropical_diseases_and_malaria_sex_both_age_all_ages_number,
    daly_musculoskeletal = dal_ys_disability_adjusted_life_years_musculoskeletal_disorders_sex_both_age_all_ages_number,
    daly_other_non_communicable = dal_ys_disability_adjusted_life_years_other_non_communicable_diseases_sex_both_age_all_ages_number,
    daly_neurological = dal_ys_disability_adjusted_life_years_neurological_disorders_sex_both_age_all_ages_number,
    daly_mental_and_substance = dal_ys_disability_adjusted_life_years_mental_and_substance_use_disorders_sex_both_age_all_ages_number,
    daly_diabetes_urogenital_blood_endocrine = dal_ys_disability_adjusted_life_years_diabetes_urogenital_blood_and_endocrine_diseases_sex_both_age_all_ages_number,
    daly_neoplasms = dal_ys_disability_adjusted_life_years_neoplasms_sex_both_age_all_ages_number) %>%
  select(location, period, daly_conflict_terrorism, daly_hiv_tuberculosis, daly_diahrrea_respiratory,
         daly_cvs, daly_self_harm, daly_violence, daly_nutritional_deficiencies, daly_transport_injuries,
         daly_unintentional_injuries, daly_maternal_disorders, daly_neonatal_disorders, daly_other_communicable,
         daly_nature_forces, daly_chronic_respiratory, daly_chronic_liver, daly_digestive,
         daly_tropical_and_malaria, daly_musculoskeletal, daly_other_non_communicable, daly_neurological,
         daly_mental_and_substance, daly_diabetes_urogenital_blood_endocrine, daly_neoplasms)
#glimpse(daly_by_cause)
# Merging dataframes
total <- merge(total, daly_by_cause, by = c("location", "period"))
# We will consider taking out health expenditure per capita, since it has a complete rate of only 57.4% and may distort the final data.
Besides the main variables, we added further variables that may contribute to the final DALY rates.
1.1.0.6 Adding data for GDP per capita
• 15.
#source: https://data.worldbank.org/indicator/NY.GDP.PCAP.CD
# Reading third file
gdp <- read_csv(here::here('Data', "API_NY.GDP.PCAP.CD_DS2_en_csv_v2_1926744.csv"), skip = 3) %>%
  clean_names()
# Checking for variable types
glimpse(gdp)
• 16.
Rows: 264
Columns: 66
$ country_name <chr> "Aruba", "Afghanistan", "Angola", "Albania", "Andorra"…
$ country_code <chr> "ABW", "AFG", "AGO", "ALB", "AND", "ARB", "ARE", "ARG"…
$ indicator_name <chr> "GDP per capita (current US$)", "GDP per capita (curre…
$ indicator_code <chr> "NY.GDP.PCAP.CD", "NY.GDP.PCAP.CD", "NY.GDP.PCAP.CD", …
$ x1960 <dbl> NA, 59.77319, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1807…
$ x1961 <dbl> NA, 59.86087, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1874…
$ x1962 <dbl> NA, 58.45801, NA, NA, NA, NA, NA, 1155.89017, NA, NA, …
$ x1963 <dbl> NA, 78.70639, NA, NA, NA, NA, NA, 850.30474, NA, NA, N…
$ x1964 <dbl> NA, 82.09523, NA, NA, NA, NA, NA, 1173.23821, NA, NA, …
$ x1965 <dbl> NA, 101.10830, NA, NA, NA, NA, NA, 1279.11343, NA, NA,…
$ x1966 <dbl> NA, 137.59435, NA, NA, NA, NA, NA, 1272.80298, NA, NA,…
$ x1967 <dbl> NA, 160.89859, NA, NA, NA, NA, NA, 1062.54355, NA, NA,…
$ x1968 <dbl> NA, 129.10832, NA, NA, NA, 224.87811, NA, 1141.08048, …
$ x1969 <dbl> NA, 129.32971, NA, NA, NA, 240.03563, NA, 1329.05866, …
$ x1970 <dbl> NA, 156.5189, NA, NA, 3238.5568, 262.8663, NA, 1322.59…
$ x1971 <dbl> NA, 159.56758, NA, NA, 3498.17365, 295.97104, NA, 1372…
$ x1972 <dbl> NA, 135.31731, NA, NA, 4217.17358, 343.56582, NA, 1408…
$ x1973 <dbl> NA, 143.14465, NA, NA, 5342.16856, 423.13508, NA, 2097…
$ x1974 <dbl> NA, 173.65376, NA, NA, 6319.73903, 777.56068, NA, 2844…
$ x1975 <dbl> NA, 186.5109, NA, NA, 7169.1010, 836.2083, 26847.7944,…
$ x1976 <dbl> NA, 197.4455, NA, NA, 7152.3751, 1007.1404, 30118.1378…
$ x1977 <dbl> NA, 224.2248, NA, NA, 7751.3702, 1123.1433, 33823.3196…
$ x1978 <dbl> NA, 247.3541, NA, NA, 9129.7062, 1193.7456, 28456.7374…
$ x1979 <dbl> NA, 275.7382, NA, NA, 11820.8494, 1563.7035, 33512.741…
$ x1980 <dbl> NA, 272.6553, 710.9816, NA, 12377.4116, 2052.9558, 427…
$ x1981 <dbl> NA, 264.1113, 642.3839, NA, 10372.2328, 2050.7698, 449…
$ x1982 <dbl> NA, NA, 619.9614, NA, 9610.2663, 1864.8707, 40026.1663…
$ x1983 <dbl> NA, NA, 623.4406, NA, 8022.6548, 1699.2152, 34843.1029…
$ x1984 <dbl> NA, NA, 637.7152, 639.4847, 7728.9067, 1672.2788, 3230…
$ x1985 <dbl> NA, NA, 758.2376, 639.8659, 7774.3938, 1606.7558, 2972…
$ x1986 <dbl> 6472.5020, NA, 685.2701, 693.8735, 10361.8160, 1489.84…
$ x1987 <dbl> 7885.7965, NA, 756.2619, 674.7934, 12616.1676, 1543.51…
$ x1988 <dbl> 9764.7900, NA, 792.3031, 652.7743, 14304.3570, 1476.04…
$ x1989 <dbl> 11392.4558, NA, 890.5541, 697.9956, 15166.4379, 1505.5…
$ x1990 <dbl> 12307.3117, NA, 947.7042, 617.2304, 18878.5060, 2009.4…
$ x1991 <dbl> 13496.0031, NA, 865.6927, 336.5870, 19532.5402, 1929.6…
$ x1992 <dbl> 14046.5038, NA, 656.3618, 200.8522, 20547.7118, 2027.8…
$ x1993 <dbl> 14936.8272, NA, 441.2007, 367.2792, 16516.4710, 1996.9…
$ x1994 <dbl> 16241.0465, NA, 328.6733, 586.4163, 16234.8090, 1989.4…
$ x1995 <dbl> 16439.3564, NA, 397.1795, 750.6044, 18461.0649, 2072.7…
$ x1996 <dbl> 16586.0684, NA, 522.6438, 1009.9777, 19017.1746, 2235.…
$ x1997 <dbl> 17927.7496, NA, 514.2952, 717.3806, 18353.0597, 2319.0…
$ x1998 <dbl> 19078.3432, NA, 423.5937, 813.7903, 18894.5215, 2188.9…
$ x1999 <dbl> 19356.2034, NA, 387.7843, 1033.2417, 19261.7105, 2331.…
$ x2000 <dbl> 20620.7006, NA, 556.8363, 1126.6833, 21854.2468, 2605.…
$ x2001 <dbl> 20669.0320, NA, 527.3335, 1281.6594, 22971.5355, 2506.…
$ x2002 <dbl> 20436.8871, 179.4266, 872.4945, 1425.1248, 25066.8822,…
$ x2003 <dbl> 20833.7616, 190.6838, 982.9609, 1846.1188, 32271.9639,…
$ x2004 <dbl> 22569.9750, 211.3821, 1255.5640, 2373.5798, 37969.1750…
$ x2005 <dbl> 23300.0396, 242.0313, 1902.4223, 2673.7873, 40066.2569…
$ x2006 <dbl> 24045.2725, 263.7337, 2599.5665, 2972.7433, 42675.8128…
$ x2007 <dbl> 25835.1327, 359.6932, 3121.9956, 3595.0372, 47803.6936…
$ x2008 <dbl> 27084.7037, 364.6607, 4080.9414, 4370.5401, 48718.4969…
$ x2009 <dbl> 24630.4537, 438.0760, 3122.7808, 4114.1401, 43503.1855…
$ x2010 <dbl> 23512.6026, 543.3030, 3587.8838, 4094.3503, 40852.6668…
$ x2011 <dbl> 24985.9933, 591.1628, 4615.4680, 4437.1429, 43335.3289…
$ x2012 <dbl> 24713.6980, 641.8715, 5100.0958, 4247.6300, 38686.4613…
$ x2013 <dbl> 26189.4355, 637.1655, 5254.8823, 4413.0609, 39538.7667…
$ x2014 <dbl> 26647.9381, 613.8567, 5408.4105, 4578.6320, 41303.9294…
$ x2015 <dbl> 27980.8807, 578.4664, 4166.9797, 3952.8012, 35762.5231…
$ x2016 <dbl> 28281.3505, 509.2187, 3506.0729, 4124.0557, 37474.6654…
$ x2017 <dbl> 29007.6930, 519.8848, 4095.8129, 4531.0208, 38962.8804…
$ x2018 <dbl> NA, 493.7504, 3289.6467, 5284.3802, 41793.0553, 6601.8…
$ x2019 <dbl> NA, 507.1034, 2790.7266, 5353.2449, 40886.3912, 6584.7…
$ x2020 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ x66 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
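The `glimpse()` output shows the World Bank table in wide format. It has one column per year, from `x1960` to `x2020`, plus an empty trailing column `x66`. The `gather()` call reshapes it to one row per country-year so that it can be merged on `location` and `period`. Note that `readr::parse_number("x66")` yields 66, so that column becomes all-NA rows for period 66, which the later merge drops. Below is a minimal base-R sketch of the same reshape. It uses `stack()` instead of tidyr purely for illustration, applied to a two-country, two-year slice of the values above.

```r
# Two-country slice of the wide World Bank layout (values from the glimpse above)
wide <- data.frame(
  country_name = c("Aruba", "Afghanistan"),
  x2016 = c(28281.3505, 509.2187),
  x2017 = c(29007.6930, 519.8848)
)

# Keep only real year columns; this pattern would also exclude a column like x66
year_cols <- grep("^x[0-9]{4}$", names(wide), value = TRUE)

# stack() returns one row per (column, row) pair, column by column
stacked <- stack(wide[year_cols])
long <- data.frame(
  location = rep(wide$country_name, times = length(year_cols)),
  period   = as.numeric(sub("^x", "", as.character(stacked$ind))),
  gdp      = stacked$values
)
print(long)
```

In current tidyr, `pivot_longer()` supersedes `gather()` and performs the same reshape.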
• 17.
# Changing variable names and variable types
gdp <- gdp %>%
  gather(year, gdp, -c(country_name, country_code, indicator_name, indicator_code)) %>%
  mutate(location = as.factor(country_name),
         period = readr::parse_number(year)) %>%
  select(location, period, gdp)
# Merging dataframes
total <- merge(total, gdp, by = c("location", "period"))
#skim(total)
1.1.0.7 Adding data for smoking percentages
#source: http://ghdx.healthdata.org/record/ihme-data/gbd-2015-smoking-prevalence-1980-2015
#Reading fourth file
smoking_percentage <- read_csv(here::here('Data', "IHME_GBD_2015_SMOKING_PREVALENCE_1980_2015_Y2017M04D05.CSV")) %>%
  clean_names()
# Checking for variable types
#skim(smoking_percentage)
# Changing variable names and variable types
smoking_percentage <- smoking_percentage %>%
  filter(age_group_name == "Age-standardized", metric == "Percent", sex == "Both") %>%
  mutate(location = as.factor(location_name),
         period = year_id,
         smoking_percentage = mean) %>%
  select(location, period, smoking_percentage)
#skim(smoking_percentage)
#Merging data frames
total <- merge(total, smoking_percentage, by = c("location", "period"))
1.1.0.8 Adding data for healthcare expenditure per capita
#source: https://ourworldindata.org/grapher/annual-healthcare-expenditure-per-capita?tab=chart&time=1995..2014&region=World
#Reading fifth file
healthcare_expenditure <- read_csv(here::here('Data', "annual-healthcare-expenditure-per-capita.CSV")) %>%
  clean_names()
# Checking for variable types
glimpse(healthcare_expenditure)
Rows: 4,675
Columns: 4
$ entity <chr> "Afghan…
$ code <chr> "AFG", …
$ year <dbl> 2002, 2…
$ health_expenditure_per_capita_ppp_constant_2011_international <dbl> 75.9835…
• 18.
# Changing variable names and variable types
healthcare_expenditure <- healthcare_expenditure %>%
  mutate(location = as.factor(entity),
         period = year,
         healthcare_expenditure = health_expenditure_per_capita_ppp_constant_2011_international) %>%
  select(location, period, healthcare_expenditure)
#glimpse(healthcare_expenditure)
#Merging data frames
total <- merge(total, healthcare_expenditure, by = c("location", "period"))
1.1.0.9 Adding data for percentage of population being overweight
#source: https://ourworldindata.org/obesity
#Reading sixth file
percentage_overweight <- read_csv(here::here('Data', "share-of-adults-who-are-overweight.csv")) %>%
  clean_names()
# Checking for variable types
glimpse(percentage_overweight)
Rows: 8,316
Columns: 4
$ entity <chr> "Afghanistan", "A…
$ code <chr> "AFG", "AFG", "AF…
$ year <dbl> 1975, 1976, 1977,…
$ prevalence_of_overweight_adults_both_sexes_who_2019 <dbl> 5.3, 5.5, 5.7, 5.…
# Changing variable names and variable types
percentage_overweight <- percentage_overweight %>%
  mutate(location = as.factor(entity),
         period = year,
         percentage_overweight = prevalence_of_overweight_adults_both_sexes_who_2019) %>%
  select(location, period, percentage_overweight)
#glimpse(percentage_overweight)
#Merging data frames
total <- merge(total, percentage_overweight, by = c("location", "period"))
1.1.0.10 Adding data for fruit consumption per capita
#source: https://ourworldindata.org/diet-compositions
#Reading seventh file
fruit_consumption <- read_csv(here::here('Data', "fruit-consumption-per-capita.csv")) %>%
  clean_names()
# Checking for variable types
glimpse(fruit_consumption)
Rows: 11,028
Columns: 4
$ entity <chr> "Afg…
$ code <chr> "AFG…
$ year <dbl> 1961…
$ fruits_excluding_wine_food_supply_quantity_kg_capita_yr_fao_2020 <dbl> 41.1…
• 19.
# Changing variable names and variable types
fruit_consumption <- fruit_consumption %>%
  mutate(location = as.factor(entity),
         period = year,
         fruit_consumption = fruits_excluding_wine_food_supply_quantity_kg_capita_yr_fao_2020) %>%
  select(location, period, fruit_consumption)
#glimpse(fruit_consumption)
#Merging data frames
total <- merge(total, fruit_consumption, by = c("location", "period"))
1.1.0.11 Adding data for vegetable consumption per capita
#source: https://ourworldindata.org/diet-compositions
#Reading eighth file
vegetable_consumption <- read_csv(here::here('Data', "vegetable-consumption-per-capita.csv")) %>%
  clean_names()
#Checking for variable types
glimpse(vegetable_consumption)
Rows: 11,028
Columns: 4
$ entity <chr> "Afghanistan", …
$ code <chr> "AFG", "AFG", "…
$ year <dbl> 1961, 1962, 196…
$ vegetables_food_supply_quantity_kg_capita_yr_fao_2020 <dbl> 36.75, 37.47, 3…
# Changing variable names and variable types
vegetable_consumption <- vegetable_consumption %>%
  mutate(location = as.factor(entity),
         period = year,
         vegetable_consumption = vegetables_food_supply_quantity_kg_capita_yr_fao_2020) %>%
  select(location, period, vegetable_consumption)
#glimpse(vegetable_consumption)
#Merging dataframes
total <- merge(total, vegetable_consumption, by = c("location", "period"))
#skim(total)
1.1.0.12 Adding data for animal based foods consumption per capita
#source: https://ourworldindata.org/diet-compositions
#Reading ninth file
animal_protein_consumption <- read_csv(here::here('Data', "share-of-calories-from-animal-protein-vs-gdp-per-capita.csv")) %>%
  clean_names()
#Checking for variable types
glimpse(animal_protein_consumption)
• 20.
Rows: 24,472
Columns: 7
$ entity <chr> …
$ code <chr> …
$ year <dbl> …
$ total_population_gapminder <dbl> …
$ continent <chr> …
$ share_of_calories_from_animal_protein_fao_2017 <dbl> …
$ real_gdp_per_capita_in_2011us_2011_benchmark_maddison_project_database_2018 <dbl> …
#Changing variable names and type
animal_protein_consumption <- animal_protein_consumption %>%
  mutate(location = as.factor(entity),
         period = year,
         animal_protein_consumption = share_of_calories_from_animal_protein_fao_2017) %>%
  select(location, period, animal_protein_consumption)
#glimpse(animal_protein_consumption)
#Merging dataframes
total <- merge(total, animal_protein_consumption, by = c("location", "period"))
#glimpse(total)
1.1.0.13 Adding data for mean years of schooling
#source: https://ourworldindata.org/global-education
#Reading file
education_years <- read_csv(here::here('Data', "mean-years-of-schooling-1.csv")) %>%
  clean_names()
#Checking for variable types
#glimpse(education_years)
#Changing variable names and type
education_years <- education_years %>%
  mutate(location = as.factor(entity),
         period = year,
         education_years = average_total_years_of_schooling_for_adult_population_lee_lee_2016_barro_lee_2018_and_undp_2018) %>%
  select(location, period, education_years)
#glimpse(education_years)
#Merging dataframes
total <- merge(total, education_years, by = c("location", "period"))
1.1.0.14 Adding data for physicians per 1000 people
• 21.
#source: https://ourworldindata.org/grapher/physicians-per-1000-people
#Reading file
physicians <- read_csv(here::here('Data', "physicians-per-1000-people.csv")) %>%
  clean_names()
#Checking for variable types
#glimpse(physicians)
#Changing variable names and type
physicians <- physicians %>%
  mutate(location = as.factor(entity),
         period = year,
         physicians_1000 = physicians_per_1_000_people) %>%
  select(location, period, physicians_1000)
#glimpse(physicians)
#Merging dataframes
total <- merge(total, physicians, by = c("location", "period"))
1.1.0.15 Adding data for nurses per 1000 people
#source: https://ourworldindata.org/grapher/nurses-and-midwives-per-1000-people?
#Reading file
nurses <- read_csv(here::here('Data', "nurses-and-midwives-per-1000-people.csv")) %>%
  clean_names()
#Checking for variable types
glimpse(nurses)
Rows: 1,542
Columns: 4
$ entity <chr> "Afghanistan", "Afghanistan", "A…
$ code <chr> "AFG", "AFG", "AFG", "AFG", "AFG…
$ year <dbl> 2005, 2006, 2007, 2008, 2009, 20…
$ nurses_and_midwives_per_1_000_people <dbl> 0.612000, 0.462000, 0.519000, 0.…
The nurses data had too few observations, so this variable was not included in our final dataset.
1.1.0.16 Adding data for out-of-pocket expenditure
#source: https://ourworldindata.org/grapher/out-of-pocket-expenditure-per-capita-on-healthcare
#Reading file
pocket_exp <- read_csv(here::here('Data', "out-of-pocket-expenditure-per-capita-on-healthcare.csv")) %>%
  clean_names()
#Checking for variable types
glimpse(pocket_exp)
Rows: 3,002
Columns: 4
$ entity <chr> …
$ code <chr> …
$ year <dbl> …
$ out_of_pocket_expenditure_per_capita_on_healthcare_ppp_usd_who_global_health_expenditure <dbl> …
• 22.
#Changing variable names and type
pocket_exp <- pocket_exp %>%
  mutate(location = as.factor(entity),
         period = year,
         pocket_per_cap = out_of_pocket_expenditure_per_capita_on_healthcare_ppp_usd_who_global_health_expenditure) %>%
  select(location, period, pocket_per_cap)
#Merging dataframes
total <- merge(total, pocket_exp, by = c("location", "period"))
1.1.0.17 Adding data for health protection coverage
#Reading file
health_protect <- read_csv(here::here('Data', "health-protection-coverage.csv")) %>%
  clean_names()
#Checking for variable types
glimpse(health_protect)
Rows: 162
Columns: 4
$ entity <chr> "Albania", "…
$ code <chr> "ALB", "DZA"…
$ year <dbl> 2008, 2005, …
$ share_of_population_covered_by_health_insurance_ilo_2014 <dbl> 23.6, 85.2, …
The health coverage data had too few observations, so this variable was not included in our final dataset.
1.1.0.18 Adding data for literacy rate
#Reading file
literacy <- read_csv(here::here('Data', "literacy-rate-by-country.csv")) %>%
  clean_names()
#Checking for variable types
glimpse(literacy)
Rows: 215
Columns: 4
$ entity <chr> "Afghanistan", "Albania", "Algeria", …
$ code <chr> "AFG", "ALB", "DZA", "ASM", "AND", "A…
$ year <dbl> 2000, 2011, 2006, 1980, 2011, 2011, 1…
$ literacy_rate_cia_factbook_2016 <dbl> 28.1, 96.8, 72.6, 97.0, 100.0, 70.4, …
The literacy data had too few observations, so this variable was not included in our final dataset.
1.1.0.19 Adding data for grouping locations into continents
#source: https://github.com/dbouquin/IS_608/blob/master/NanosatDB_munging/Countries-Continents.csv
#Reading file
continents <- read_csv(here::here('Data', "Continents.csv")) %>%
  clean_names()
#Checking for variable types
glimpse(continents)
Rows: 194
Columns: 2
$ continent <chr> "Africa", "Africa", "Africa", "Africa", "Africa", "Africa",…
$ country <chr> "Algeria", "Angola", "Benin", "Botswana", "Burkina", "Burun…
• 23.
#Changing variable names and type
continents <- continents %>%
  mutate(location = as.factor(country),
         continent = as.factor(continent)) %>%
  select(location, continent)
glimpse(continents)
Rows: 194
Columns: 2
$ location <fct> Algeria, Angola, Benin, Botswana, Burkina, Burundi, Cameroo…
$ continent <fct> Africa, Africa, Africa, Africa, Africa, Africa, Africa, Afr…
#Merging dataframes
total <- merge(total, continents, by = c("location"))
1.1.0.20 Dealing with NAs
# Adding the ratio of per capita healthcare expenditure to per capita GDP
# (healthcare_expenditure is in PPP constant 2011 international $, gdp in current US$)
total <- total %>%
  mutate(healthcare_gdp_rate = healthcare_expenditure / gdp)
#skim(total)
# Dropping every row that has a missing value in any column
total <- total %>% na.omit()
#skim(total)
After including all potentially relevant and significant variables in our dataset, we carried out an initial exploration of the data.
1.2 Exploratory Data Analysis
1.2.0.1 DALY Rates per Continent
#Selecting data only from 1980 onward (to gain better insights on the recent situation)
total_short <- total %>% filter(period >= 1980)
#Re-coding DALY variables as averages per continent, per year
total_cont <- total_short %>%
  group_by(period, continent) %>%
  summarise(daly_adjusted = mean(daly_adjusted / 100000),
            daly_cnmnd = mean(daly_cnmnd / 100000),
            daly_ncds = mean(daly_ncds / 100000),
            daly_ivsa = mean(daly_ivsa / 10000))  # note: divided by 10,000, unlike the other columns (100,000)
#Plotting average DALY rates per capita accumulated from 1980 to 2017
ggplot(total_cont, aes(x = continent, y = daly_adjusted, fill = continent)) +
  geom_bar(stat = "identity") +
  labs(x = "Continent", y = "Overall DALYs",
       title = "Accumulated Average DALYs per Capita, per Continent 1980 - 2017")
ggplot(total_cont, aes(x = continent, y = daly_cnmnd, fill = continent)) +
  geom_bar(stat = "identity") +
  labs(x = "Continent", y = "Communicable Diseases DALYs",
       title = "Accumulated Average DALYs per Capita from Communicable Diseases, per Continent 1980 - 2017")
ggplot(total_cont, aes(x = continent, y = daly_ncds, fill = continent)) +
  geom_bar(stat = "identity") +
  labs(x = "Continent", y = "Non-Communicable Diseases DALYs",
       title = "Accumulated Average DALYs per Capita from Non-Communicable Diseases, per Continent 1980 - 2017")
ggplot(total_cont, aes(x = continent, y = daly_ivsa, fill = continent)) +
  geom_bar(stat = "identity") +
  labs(x = "Continent", y = "Injuries DALYs",
       title = "Accumulated Average DALYs per Capita from Injuries, per Continent 1980 - 2017")
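In the four bar charts above, `geom_bar(stat = "identity")` stacks one segment per year for each continent. Each bar's height is therefore the sum of that continent's yearly averages, the "accumulated average" in the titles. A consequence is that a continent with fewer years of complete data after `na.omit()` gets a shorter bar. The sketch below reproduces the two steps in base R on made-up numbers. It assumes, as the division by 100000 suggests, that the input DALY columns are rates per 100,000 people.

```r
# Hypothetical DALY rates per 100,000 for a handful of country-years
d <- data.frame(
  period    = c(2000, 2000, 2000, 2001, 2001, 2001),
  continent = c("Africa", "Africa", "Europe", "Africa", "Africa", "Europe"),
  daly      = c(60000, 40000, 30000, 50000, 30000, 20000)
)

# Step 1 (the group_by/summarise step): mean across countries per
# continent-year, rescaled from "per 100,000" to "per person"
yearly <- aggregate(daly ~ period + continent, data = d, FUN = mean)
yearly$per_capita <- yearly$daly / 100000

# Step 2 (geom_bar(stat = "identity")): one stacked segment per year,
# so each bar's height is the sum of the yearly averages
bar_height <- tapply(yearly$per_capita, yearly$continent, sum)
print(bar_height)

# Dividing by the number of years gives a plain average per year instead
years_per_continent <- tapply(yearly$period, yearly$continent, length)
print(bar_height / years_per_continent)
```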
Overall, we find that Africa has the highest accumulated average DALY rate per capita of all continents (c. 90), followed by Asia (c. 50) and Oceania (c. 40). Africa stands out against the other continents mainly because of its high accumulated average for communicable diseases. In this category, Africa has more than triple the rate of the second-highest continent (c. 55 for Africa compared with c. 17 for Asia). For non-communicable diseases and injuries, rates are fairly even across continents. For non-communicable diseases, DALY rates range from c. 27 to 33 (North America lowest, Africa highest). Injury rates are much lower and range from c. 4 to 6 (Europe lowest, Africa highest). Consequently, communicable diseases account for most of the difference in burden between continents, and Africa bears (or has borne) the highest burden from them. We took a closer look at these rates to better understand how they evolved over time.
1.2.1 Communicable Diseases
graph1 <- total_cont %>%
  ggplot(aes(x = period, y = daly_cnmnd, fill = continent, text = continent)) +
  geom_area(alpha = 1) +
  theme(legend.position = "none") +
  labs(x = "Year", y = "DALY for communicable disease",
       title = "Time Series Average DALYs per Capita from Communicable Diseases per Continent")
ggplotly(graph1)
[Figure: Time Series Average DALYs per Capita from Communicable Diseases per Continent]
[Figure axes: x = Year, y = DALY for communicable disease]
As the graph shows, Africa's communicable-disease DALY rate seems to have been declining since 2008. However, Africa has consistently ranked above the other continents, which leaves room to further consider the causes and potential solutions. According to the Our World in Data report, neonatal disorders are the leading cause in the communicable-disease group in terms of total share of burden (7.45% of all causes). There is also a strong negative correlation between GDP and DALYs from communicable diseases. Similarly, health expenditure per capita is negatively correlated with DALYs from communicable diseases. What about healthcare expenditure as a percentage of GDP?
ggplot(total_short, aes(x = healthcare_gdp_rate, y = daly_cnmnd, color = continent)) +
  geom_point() +
  labs(x = "Healthcare Expenditure as percentage of GDP", y = "DALY from Communicable Diseases",
       title = "Rates due to Proportion of GDP spent on Healthcare")
# No clear correlation yet, but interesting
total_short %>%
  select(daly_cnmnd, healthcare_gdp_rate, gdp, pocket_per_cap) %>%
  ggpairs()
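The significance stars in the upper panels of `ggpairs()` come, as far as we know, from a pairwise correlation test like the one sketched here with base R's `cor.test()`. The numbers below are made up for illustration. One caveat applies to our data: each country appears once per year. A test that treats every country-year as an independent observation therefore tends to overstate significance.

```r
# Hypothetical country values, for illustration only
gdp        <- c(1200, 2500, 4000, 8000, 15000, 30000, 45000, 60000)
daly_cnmnd <- c(42000, 35000, 30000, 18000, 9000, 4000, 2500, 2000)

# Pearson correlation with a t-test of H0: rho = 0
pearson <- cor.test(gdp, daly_cnmnd, method = "pearson")
print(pearson$estimate)
print(pearson$p.value)

# Spearman's rank correlation is less sensitive to the skew typical of GDP
spearman <- cor.test(gdp, daly_cnmnd, method = "spearman")
print(spearman$estimate)
```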
A higher GDP per country seems to have a significant negative correlation with DALYs from communicable diseases. However, the proportion of GDP spent on healthcare seems to have a significant positive correlation with DALYs from communicable diseases. GDP, in turn, seems to have a significant negative correlation with the proportion of GDP spent on healthcare. This could indicate that poorer countries are more likely to have to combat communicable diseases and, consequently, spend a greater proportion of their GDP on healthcare than richer countries. Out-of-pocket expenditure per capita is also strongly negatively correlated with DALYs from communicable diseases, while being strongly positively correlated with GDP. This suggests that poorer countries, where people rely on limited out-of-pocket spending for their own medical care, are the most likely to have higher communicable-disease DALY rates.
1.2.2 Injuries
Because DALY rates for injuries (and their individual causes) are similar across all continents, we decided to first take a closer look at which types of causes were most prominent overall.
#This plot shows injury-related DALYs in a stacked bar chart.
start <- total %>%
  group_by(continent) %>%
  summarise(daly_conflict_terrorism = mean(daly_conflict_terrorism/total_population),
            daly_self_harm = mean(daly_self_harm/total_population),
            daly_violence = mean(daly_violence/total_population),
            daly_transport_injuries = mean(daly_transport_injuries/total_population),
            daly_nature_forces = mean(daly_nature_forces/total_population),
            daly_unintentional_injuries = mean(daly_unintentional_injuries/total_population))
pivot <- pivot_longer(start,
                      cols = c(daly_conflict_terrorism, daly_self_harm, daly_violence,
                               daly_transport_injuries, daly_unintentional_injuries, daly_nature_forces),
                      names_to = "diseases", values_to = "value")
#select columns from dataset
plots <- pivot %>% select(continent, diseases, value)
ggplot(plots, aes(fill = diseases, y = value, x = continent)) +
  geom_bar(position = "stack", stat = "identity") +
  labs(x = "Continent", y = "Injuries DALYs",
       title = "Accumulated Average DALYs per Capita from Injuries, per Continent 1980 - 2017")
#Plot on Terrorism and Violence
terrorism_violence <- start %>% select(daly_conflict_terrorism, daly_violence, continent)
terrorism_violence <- pivot_longer(terrorism_violence,
                                   c(daly_conflict_terrorism, daly_violence),
                                   names_to = "diseases", values_to = "value")
#select columns from dataset
terrorism_violence <- terrorism_violence %>% select(diseases, value, continent)
#stacked bar chart
ggplot(terrorism_violence, aes(fill = diseases, y = value, x = continent)) +
  geom_bar(position = "stack", stat = "identity") +
  labs(x = "Continent", y = "Injuries DALYs",
       title = "Accumulated Average DALYs per Capita from Terrorism and Violence 1980 - 2017")
total_short %>%
  select(daly_ivsa, gdp, daly_mental_and_substance, physicians_1000, education_years) %>%
  ggpairs()
1.2.3 Non-Communicable Diseases
start1 <- total %>%
  group_by(continent) %>%
  summarise(daly_cvs = mean(daly_cvs/total_population),
            daly_nutritional_deficiencies = mean(daly_nutritional_deficiencies/total_population),
            daly_maternal_disorders = mean(daly_maternal_disorders/total_population),
            daly_musculoskeletal = mean(daly_musculoskeletal/total_population),
            daly_other_non_communicable = mean(daly_other_non_communicable/total_population),
            daly_neurological = mean(daly_neurological/total_population),
            daly_mental_and_substance = mean(daly_mental_and_substance/total_population),
            daly_diabetes_urogenital_blood_endocrine = mean(daly_diabetes_urogenital_blood_endocrine/total_population),
            daly_neoplasms = mean(daly_neoplasms/total_population),
            daly_chronic_liver = mean(daly_chronic_liver/total_population))
pivot1 <- pivot_longer(start1,
                       c(daly_cvs, daly_nutritional_deficiencies, daly_maternal_disorders, daly_musculoskeletal,
                         daly_other_non_communicable, daly_neurological, daly_mental_and_substance,
                         daly_diabetes_urogenital_blood_endocrine, daly_neoplasms, daly_chronic_liver),
                       names_to = "diseases", values_to = "value")
#select columns from data set
total_short_ncds <- pivot1 %>% select(continent, diseases, value)
#stacked bar chart
# This stacked bar chart again shows the DALYs for non-communicable diseases, this time divided by
# total population, and colored by non-communicable disease category.
# Asia has the highest DALYs for non-communicable diseases, closely followed by Europe. Several factors
# may explain why DALYs remain high in both regions. For Asia, limited affordability, a shortage of
# doctors, and healthcare that is not always of the highest standard may all contribute. Because of
# Europe's aging population, non-communicable diseases are more likely to be present among its population.
# As the earlier graphs suggest, as nations modernise and develop, their populations shift from
# suffering mainly from communicable diseases towards non-communicable diseases, which come with age.
ggplot(total_short_ncds, aes(fill = diseases, y = value, x = continent)) +
  geom_bar(position = "stack", stat = "identity") +
  labs(x = "Continent", y = "Non-Comm DALYs",
       title = "Accumulated Average DALYs per Capita from Non-Comm, per Continent 1980 - 2017")
# Looking into CVS in more detail.
ggplot(total_short, aes(x = continent, y = daly_cvs)) +
  geom_col() +
  labs(x = "Continent", y = "DALY due to CVS related conditions",
       title = "DALY per capita due to CVS conditions per continent")
# Looking into neoplasms in more detail.
ggplot(total_short, aes(x = continent, y = daly_neoplasms)) +
  geom_col() +
  labs(x = "Continent", y = "DALY due to neoplasm",
       title = "DALY per capita due to neoplasms per continent")
# Looking into diabetes, urogenital, blood, endocrine in more detail.
ggplot(total_short, aes(x = continent, y = daly_diabetes_urogenital_blood_endocrine)) +
  geom_col() +
  labs(x = "Continent", y = "DALY due to diabetes, urogenital, blood and endocrine related conditions",
       title = "DALY per capita due to diabetes, urogenital, blood and endocrine related conditions per continent")
ggplot(total_short, aes(x = continent, y = daly_mental_and_substance)) +
  geom_col() +
  labs(x = "Continent", y = "DALY due to mental and substance related conditions",
       title = "DALY per capita due to mental and substance related conditions per continent")
ggplot(total_short, aes(x = healthcare_gdp_rate, y = daly_ncds/100000, color = continent)) +
  geom_point() +
  labs(x = "Healthcare Expenditure as percentage of GDP", y = "DALY from Non-Communicable Diseases",
       title = "Rates due to Proportion of GDP spent on Healthcare")
total_short %>%
  select(daly_cnmnd, healthcare_gdp_rate, gdp, pocket_per_cap) %>%
  ggpairs()
1.3 Regression analysis
DALY rates are highly complex and are shaped by many societal and economic variables. We therefore focused on variables that had enough data to be used in our analysis. These variables, which affect both DALY rates by cause and overall DALY rates, fall into several categories:
- Diet habits: fruit consumption per capita per year, percentage of total daily calories from animal protein, vegetable consumption, and percentage of the population that is overweight.
- Healthcare: annual healthcare expenditure, out-of-pocket expenditure on healthcare, healthcare expenditure as a share of GDP, and number of physicians per 1,000 people.
- Living habits: smoking percentage.
- Other demographics: years of education.
In addition, we considered the effect of each continent separately by transforming the continents into dummy variables.
1.3.0.1 Models 0 and 1
#Transforming continent factors into dummy variables (Oceania, with no dummy, is the reference category)
total <- total %>%
  mutate(Asia = case_when(continent == "Asia" ~ 1, TRUE ~ 0),
         Europe = case_when(continent == "Europe" ~ 1, TRUE ~ 0),
         NorthA = case_when(continent == "North America" ~ 1, TRUE ~ 0),
         Africa = case_when(continent == "Africa" ~ 1, TRUE ~ 0),
         SouthA = case_when(continent == "South America" ~ 1, TRUE ~ 0))
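The five `case_when()` dummies above are the same treatment coding that R builds automatically from a factor. Oceania, the one continent without a dummy, is the reference level. The sketch below checks this on one hypothetical row per continent. It assumes the continent factor has exactly the six levels of Continents.csv. It uses base R's `relevel()` and `model.matrix()`:

```r
# One hypothetical row per continent, as in Continents.csv
continent <- factor(c("Africa", "Asia", "Europe", "North America",
                      "Oceania", "South America"))

# Manual 0/1 dummies, mirroring the case_when() calls (Oceania gets none)
manual <- data.frame(
  Asia   = as.numeric(continent == "Asia"),
  Europe = as.numeric(continent == "Europe"),
  NorthA = as.numeric(continent == "North America"),
  Africa = as.numeric(continent == "Africa"),
  SouthA = as.numeric(continent == "South America")
)

# R's own treatment coding, with Oceania set as the reference level
continent_ref <- relevel(continent, ref = "Oceania")
auto <- model.matrix(~ continent_ref)
print(colnames(auto))

# Compare the automatically built columns with the manual ones
same_coding <- all(auto[, "continent_refAsia"] == manual$Asia) &&
  all(auto[, "continent_refEurope"] == manual$Europe) &&
  all(auto[, "continent_refNorth America"] == manual$NorthA) &&
  all(auto[, "continent_refAfrica"] == manual$Africa) &&
  all(auto[, "continent_refSouth America"] == manual$SouthA)
print(same_coding)
```

Under that assumption, writing `relevel(continent, ref = "Oceania")` directly in an `lm()` formula should give the same continent coefficients as the five manual dummies. The difference is that the coefficient names would carry the factor prefix.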
Call:
lm(formula = daly_adjusted ~ smoking_percentage + percentage_overweight +
    fruit_consumption + vegetable_consumption + animal_protein_consumption +
    education_years + physicians_1000 + pocket_per_cap + healthcare_gdp_rate +
    daly_ivsa + daly_ncds + daly_cnmnd, data = total, subset = gdp)

Residuals:
       Min         1Q     Median         3Q        Max
-6.602e-11 -4.258e-12 -5.730e-13  2.894e-12  1.220e-10

Coefficients:
                             Estimate Std. Error    t value Pr(>|t|)
(Intercept)                -3.867e-12  1.064e-11 -3.640e-01 0.716590
smoking_percentage         -1.016e-10  1.900e-11 -5.346e+00 2.49e-07 ***
percentage_overweight      -6.735e-13  1.006e-13 -6.694e+00 2.26e-10 ***
fruit_consumption          -7.039e-14  2.012e-14 -3.499e+00 0.000579 ***
vegetable_consumption      -1.385e-14  2.245e-14 -6.170e-01 0.537948
animal_protein_consumption  5.153e-12  7.264e-13  7.094e+00 2.35e-11 ***
education_years             1.769e-12  6.270e-13  2.821e+00 0.005277 **
physicians_1000             3.200e-12  1.696e-12  1.887e+00 0.060675 .
pocket_per_cap             -2.376e-14  8.464e-15 -2.807e+00 0.005506 **
healthcare_gdp_rate         3.037e-11  1.838e-11  1.652e+00 0.100074
daly_ivsa                   1.000e+00  1.045e-15  9.570e+14  < 2e-16 ***
daly_ncds                   1.000e+00  3.813e-16  2.623e+15  < 2e-16 ***
daly_cnmnd                  1.000e+00  1.157e-16  8.645e+15  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.446e-11 on 195 degrees of freedom
  (817 observations deleted due to missingness)
Multiple R-squared:  1,  Adjusted R-squared:  1
F-statistic: 3.025e+31 on 12 and 195 DF,  p-value: < 2.2e-16
# Lm0 was created to show that daly_ivsa, daly_ncds and daly_cnmnd make up daly_adjusted. As a result, these three variables are not included in the linear models.
# gdp belongs inside the formula. Passed positionally after the formula (as in the run shown above),
# lm() matches it to its `subset` argument, which is why that output reads "subset = gdp"
# and drops 817 observations.
lm0 = lm(daly_adjusted ~ smoking_percentage + percentage_overweight + fruit_consumption +
           vegetable_consumption + animal_protein_consumption + education_years +
           physicians_1000 + pocket_per_cap + healthcare_gdp_rate +
           daly_ivsa + daly_ncds + daly_cnmnd + gdp,
         data = total)
summary(lm0)
lm1 = lm(daly_adjusted ~ smoking_percentage + percentage_overweight + fruit_consumption +
           vegetable_consumption + animal_protein_consumption + education_years +
           physicians_1000 + pocket_per_cap + healthcare_gdp_rate + gdp,
         data = total)
summary(lm1)
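Two details of model 0 are easier to see on toy data. First, an unnamed argument placed after the formula in `lm()` is matched to `subset`, not treated as a regressor. Second, when the response is exactly the sum of some regressors, those regressors get coefficients of 1 and the fit is perfect. That is the identity daly_adjusted = daly_ivsa + daly_ncds + daly_cnmnd that lm0 was built to show. The data frame `toy` and its columns below are hypothetical:

```r
set.seed(1)
toy <- data.frame(a = runif(20), b = runif(20), c = runif(20))
toy$g <- seq(3, 60, by = 3)      # stands in for gdp: most values exceed nrow(toy)
toy$y <- toy$a + toy$b + toy$c   # y is exactly the sum of its parts

# g passed positionally after the formula is matched to lm()'s `subset` argument:
# rows 3, 6, ..., 18 exist; indices 21 to 60 become NA rows and are dropped
m_bad <- lm(y ~ a + b + c, g, data = toy)
print(m_bad$call)   # lm(formula = y ~ a + b + c, data = toy, subset = g)
print(nobs(m_bad))  # 6

# Intended model: g as a regressor; the three components get coefficients of 1
m_good <- lm(y ~ a + b + c + g, data = toy)
print(round(coef(m_good), 6))
```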
Call:
lm(formula = daly_adjusted ~ smoking_percentage + percentage_overweight +
    fruit_consumption + vegetable_consumption + animal_protein_consumption +
    education_years + physicians_1000 + pocket_per_cap + healthcare_gdp_rate +
    gdp, data = total)

Residuals:
   Min     1Q Median     3Q    Max
-25122  -6233   -812   4866  59542

Coefficients:
                             Estimate Std. Error t value Pr(>|t|)
(Intercept)                 7.921e+04  1.894e+03  41.814  < 2e-16 ***
smoking_percentage         -2.391e+04  6.139e+03  -3.894 0.000105 ***
percentage_overweight      -2.590e+02  3.531e+01  -7.335 4.53e-13 ***
fruit_consumption          -5.051e+01  8.655e+00  -5.836 7.19e-09 ***
vegetable_consumption      -3.828e+01  6.980e+00  -5.484 5.26e-08 ***
animal_protein_consumption -1.799e+03  2.719e+02  -6.615 6.00e-11 ***
education_years            -1.270e+03  2.027e+02  -6.268 5.41e-10 ***
physicians_1000             7.964e+02  4.986e+02   1.597 0.110538
pocket_per_cap             -1.011e+01  2.508e+00  -4.031 5.96e-05 ***
healthcare_gdp_rate         6.335e+03  6.803e+03   0.931 0.351968
gdp                         6.325e-02  3.290e-02   1.923 0.054808 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 11340 on 1014 degrees of freedom
Multiple R-squared:  0.6371,  Adjusted R-squared:  0.6335
F-statistic:   178 on 10 and 1014 DF,  p-value: < 2.2e-16
Model 1 already reaches an adjusted R-squared of 0.6335. These factors can therefore explain approximately 63 percent of the variation in overall DALYs. For the models below, the variable with the highest p-value was dropped at each step.
lm2 = lm(daly_adjusted ~ smoking_percentage + percentage_overweight + vegetable_consumption +
           animal_protein_consumption + education_years + physicians_1000 +
           pocket_per_cap + fruit_consumption + gdp,
         data = total)
summary(lm2)
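The manual procedure used here can be written as a small loop: drop the predictor with the largest p-value, refit, and repeat until every remaining p-value is below a threshold. `backward_eliminate()` below is a hypothetical helper of ours, not code from the original analysis. The default `alpha = 0.05` is our assumption, and the data are simulated.

```r
# Hypothetical helper: repeatedly drop the predictor with the largest
# p-value until all remaining p-values are <= alpha
backward_eliminate <- function(fit, alpha = 0.05) {
  repeat {
    pvals <- summary(fit)$coefficients[, "Pr(>|t|)"]
    pvals <- pvals[names(pvals) != "(Intercept)"]
    if (length(pvals) == 0 || max(pvals) <= alpha) return(fit)
    worst <- names(which.max(pvals))
    message("Dropping ", worst, " (p = ", signif(max(pvals), 3), ")")
    fit <- update(fit, as.formula(paste(". ~ . -", worst)))
  }
}

# Simulated data: y depends on x1 and x2 only
set.seed(42)
n <- 200
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), noise1 = rnorm(n), noise2 = rnorm(n))
toy$y <- 3 * toy$x1 - 2 * toy$x2 + rnorm(n)

full  <- lm(y ~ x1 + x2 + noise1 + noise2, data = toy)
final <- backward_eliminate(full, alpha = 0.05)
print(formula(final))
```

Unlike `step()`, this loop never re-adds a variable once it has been dropped. It also assumes every predictor is a numeric column whose coefficient name equals its term name. That holds for the models in this section, but not for factor predictors.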
Call:
lm(formula = daly_adjusted ~ smoking_percentage + percentage_overweight +
    vegetable_consumption + animal_protein_consumption + education_years +
    physicians_1000 + pocket_per_cap + fruit_consumption + gdp, data = total)

Residuals:
   Min     1Q Median     3Q    Max
-25123  -6213   -846   4897  59243

Coefficients:
                             Estimate Std. Error t value Pr(>|t|)
(Intercept)                 8.034e+04  1.453e+03  55.295  < 2e-16 ***
smoking_percentage         -2.365e+04  6.132e+03  -3.857 0.000122 ***
percentage_overweight      -2.620e+02  3.516e+01  -7.452 1.96e-13 ***
vegetable_consumption      -3.893e+01  6.944e+00  -5.606 2.66e-08 ***
animal_protein_consumption -1.852e+03  2.657e+02  -6.970 5.68e-12 ***
education_years            -1.267e+03  2.027e+02  -6.254 5.90e-10 ***
physicians_1000             9.062e+02  4.845e+02   1.871 0.061701 .
pocket_per_cap             -1.008e+01  2.508e+00  -4.018 6.29e-05 ***
fruit_consumption          -5.072e+01  8.651e+00  -5.863 6.16e-09 ***
gdp                         5.507e-02  3.170e-02   1.737 0.082661 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 11340 on 1015 degrees of freedom
Multiple R-squared:  0.6368,  Adjusted R-squared:  0.6335
F-statistic: 197.7 on 9 and 1015 DF,  p-value: < 2.2e-16
Dropping the healthcare-to-GDP ratio makes the number of physicians per 1,000 people marginally significant (p = 0.062, compared with p = 0.111 in model 1). Out-of-pocket expenditure was already significant in model 1.
Call:
lm(formula = daly_adjusted ~ smoking_percentage + percentage_overweight +
    vegetable_consumption + animal_protein_consumption + education_years +
    pocket_per_cap + fruit_consumption + physicians_1000, data = total)

Residuals:
   Min     1Q Median     3Q    Max
-25203  -6215   -447   4669  59274

Coefficients:
                             Estimate Std. Error t value Pr(>|t|)
(Intercept)                 79726.798   1410.663  56.517  < 2e-16 ***
smoking_percentage         -24292.378   6127.035  -3.965 7.86e-05 ***
percentage_overweight        -266.174     35.115  -7.580 7.76e-14 ***
vegetable_consumption         -40.473      6.894  -5.871 5.87e-09 ***
animal_protein_consumption  -1741.370    258.204  -6.744 2.58e-11 ***
education_years             -1222.730    201.228  -6.076 1.74e-09 ***
pocket_per_cap                 -7.568      2.052  -3.688 0.000238 ***
fruit_consumption             -47.366      8.442  -5.611 2.59e-08 ***
physicians_1000               859.752    484.202   1.776 0.076097 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 11350 on 1016 degrees of freedom
Multiple R-squared:  0.6357,  Adjusted R-squared:  0.6328
F-statistic: 221.6 on 8 and 1016 DF,  p-value: < 2.2e-16
1.3.0.2 Drop physicians_1000
lm3 = lm(daly_adjusted ~ smoking_percentage + percentage_overweight + vegetable_consumption +
           animal_protein_consumption + education_years + pocket_per_cap +
           fruit_consumption + physicians_1000,
         data = total)
summary(lm3)
Call:
lm(formula = daly_adjusted ~ smoking_percentage + percentage_overweight +
    vegetable_consumption + animal_protein_consumption + education_years +
    pocket_per_cap + fruit_consumption, data = total)

Residuals:
   Min     1Q Median     3Q    Max
-25602  -6210   -408   4778  59363

Coefficients:
                             Estimate Std. Error t value Pr(>|t|)
(Intercept)                 78595.709   1259.974  62.379  < 2e-16 ***
smoking_percentage         -21927.213   5986.815  -3.663 0.000263 ***
percentage_overweight        -256.075     34.687  -7.382 3.23e-13 ***
vegetable_consumption         -37.819      6.737  -5.613 2.56e-08 ***
animal_protein_consumption  -1623.099    249.729  -6.499 1.26e-10 ***
education_years             -1107.609    190.699  -5.808 8.44e-09 ***
pocket_per_cap                 -6.709      1.996  -3.361 0.000806 ***
fruit_consumption             -49.239      8.384  -5.873 5.80e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 11360 on 1017 degrees of freedom
Multiple R-squared:  0.6346,  Adjusted R-squared:  0.632
F-statistic: 252.3 on 7 and 1017 DF,  p-value: < 2.2e-16
All variables are now significant, giving a model with an adjusted R-squared of 0.632.
1.3.1 Stepwise regression & VIF examination
We can also use stepwise regression to find the model with the lowest AIC. The stepwise method is more thorough than dropping variables manually. With direction = "both", it can add previously dropped variables back in later steps if doing so improves the model (lowers the model's AIC). It re-evaluates every candidate addition and removal at each step by its effect on AIC.
lm4 = lm(daly_adjusted ~ smoking_percentage + percentage_overweight + vegetable_consumption +
           animal_protein_consumption + education_years + pocket_per_cap + fruit_consumption,
         data = total)
summary(lm4)
fit1_step = step(lm1, direction = "both")
Start:  AIC=19149.68
daly_adjusted ~ smoking_percentage + percentage_overweight + fruit_consumption +
    vegetable_consumption + animal_protein_consumption + education_years +
    physicians_1000 + pocket_per_cap + healthcare_gdp_rate + gdp

                             Df  Sum of Sq        RSS   AIC
- healthcare_gdp_rate         1  111483768 1.3048e+11 19149
<none>                                     1.3036e+11 19150
- physicians_1000             1  327963566 1.3069e+11 19150
- gdp                         1  475232954 1.3084e+11 19151
- smoking_percentage          1 1949782662 1.3231e+11 19163
- pocket_per_cap              1 2089322026 1.3245e+11 19164
- vegetable_consumption       1 3866128944 1.3423e+11 19178
- fruit_consumption           1 4378585222 1.3474e+11 19182
- education_years             1 5050374107 1.3541e+11 19187
- animal_protein_consumption  1 5625702511 1.3599e+11 19191
- percentage_overweight       1 6917422353 1.3728e+11 19201

Step:  AIC=19148.55
daly_adjusted ~ smoking_percentage + percentage_overweight + fruit_consumption +
    vegetable_consumption + animal_protein_consumption + education_years +
    physicians_1000 + pocket_per_cap + gdp

                             Df  Sum of Sq        RSS   AIC
<none>                                     1.3048e+11 19149
- gdp                         1  387923531 1.3086e+11 19150
+ healthcare_gdp_rate         1  111483768 1.3036e+11 19150
- physicians_1000             1  449760026 1.3093e+11 19150
- smoking_percentage          1 1912380306 1.3239e+11 19162
- pocket_per_cap              1 2075803261 1.3255e+11 19163
- vegetable_consumption       1 4040506535 1.3452e+11 19178
- fruit_consumption           1 4418260882 1.3489e+11 19181
- education_years             1 5027016883 1.3550e+11 19185
- animal_protein_consumption  1 6245781148 1.3672e+11 19194
- percentage_overweight       1 7139008696 1.3761e+11 19201

Call:
lm(formula = daly_adjusted ~ smoking_percentage + percentage_overweight +
    fruit_consumption + vegetable_consumption + animal_protein_consumption +
    education_years + physicians_1000 + pocket_per_cap + gdp, data = total)

Residuals:
   Min     1Q Median     3Q    Max
-25123  -6213   -846   4897  59243

Coefficients:
                             Estimate Std. Error t value Pr(>|t|)
(Intercept)                 8.034e+04  1.453e+03  55.295  < 2e-16 ***
smoking_percentage         -2.365e+04  6.132e+03  -3.857 0.000122 ***
percentage_overweight      -2.620e+02  3.516e+01  -7.452 1.96e-13 ***
fruit_consumption          -5.072e+01  8.651e+00  -5.863 6.16e-09 ***
vegetable_consumption      -3.893e+01  6.944e+00  -5.606 2.66e-08 ***
animal_protein_consumption -1.852e+03  2.657e+02  -6.970 5.68e-12 ***
education_years            -1.267e+03  2.027e+02  -6.254 5.90e-10 ***
physicians_1000             9.062e+02  4.845e+02   1.871 0.061701 .
pocket_per_cap             -1.008e+01  2.508e+00  -4.018 6.29e-05 ***
gdp                         5.507e-02  3.170e-02   1.737 0.082661 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 11340 on 1015 degrees of freedom
Multiple R-squared:  0.6368,  Adjusted R-squared:  0.6335
F-statistic: 197.7 on 9 and 1015 DF,  p-value: < 2.2e-16
summary(fit1_step)
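The AIC values in the stepwise trace differ from the AIC reported later in the huxreg comparison table. For the same model (1), the trace starts at AIC = 19149.68 while the table shows 22060.50. The reason is that `step()` ranks models with `extractAIC()`, which for `lm` is n·log(RSS/n) + 2k. `AIC()` instead uses the full Gaussian log-likelihood, −2·logLik + 2(k + 1). The gap is n(log 2π + 1) + 2, which depends only on n. For n = 1025 it is about 2910.82, exactly 22060.50 − 19149.68. The sketch below checks both formulas with base R on simulated data (`toy` is hypothetical):

```r
set.seed(7)
toy <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
toy$y <- 1 + 2 * toy$x1 + rnorm(50)
fit <- lm(y ~ x1 + x2, data = toy)

n   <- nobs(fit)
rss <- sum(residuals(fit)^2)
k   <- length(coef(fit))   # number of estimated coefficients (here 3)

# Scale used by step(): extractAIC() = n * log(RSS / n) + 2k
aic_step <- n * log(rss / n) + 2 * k
# Scale used by AIC() and regression tables: -2 logLik + 2(k + 1),
# where the extra parameter is the residual variance
aic_full <- -2 * as.numeric(logLik(fit)) + 2 * (k + 1)

print(c(extractAIC = extractAIC(fit)[2], by_hand = aic_step))
print(c(AIC = AIC(fit), by_hand = aic_full))

# The gap depends only on n, so both scales rank models on the same data identically
print(aic_full - aic_step)
print(n * (log(2 * pi) + 1) + 2)
```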
        smoking_percentage      percentage_overweight
                  1.668205                   2.913117
         fruit_consumption      vegetable_consumption
                  1.425008                   1.502601
animal_protein_consumption            education_years
                  3.110474                   3.088943
           physicians_1000             pocket_per_cap
                  3.754590                   3.210832
                       gdp
                  2.999374
From the final stepwise result we can see that all nine retained variables are significant with a p-value lower than 0.1. Expense and pocket_per_cap are both significant in this case. However, dropping one of them may make the other insignificant, possibly because the two have a joint effect on the burden of disease. We can choose between these two models according to the significance level we require.
Continents were also added to the model to see their effect.
Call:
lm(formula = daly_adjusted ~ percentage_overweight + vegetable_consumption +
    animal_protein_consumption + education_years + pocket_per_cap +
    fruit_consumption + Asia + Africa + NorthA + Europe + SouthA +
    Asia + Africa + NorthA + Europe + SouthA, data = total)

Residuals:
   Min     1Q Median     3Q    Max
-29935  -4148   -431   3996  50866

Coefficients:
                             Estimate Std. Error t value Pr(>|t|)
(Intercept)                 66522.998   2345.279  28.365  < 2e-16 ***
percentage_overweight        -202.369     33.403  -6.058 1.94e-09 ***
vegetable_consumption         -33.863      6.185  -5.475 5.51e-08 ***
animal_protein_consumption  -1030.842    209.884  -4.911 1.05e-06 ***
education_years              -534.613    165.120  -3.238  0.00124 **
pocket_per_cap                 -8.669      1.678  -5.168 2.86e-07 ***
fruit_consumption             -40.857      6.824  -5.987 2.96e-09 ***
Asia                        -7140.437   1807.499  -3.950 8.34e-05 ***
Africa                      13792.577   1918.746   7.188 1.27e-12 ***
NorthA                      -9335.463   1794.310  -5.203 2.38e-07 ***
Europe                      -5196.987   1650.294  -3.149  0.00169 **
SouthA                      -9146.724   1917.915  -4.769 2.12e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 9298 on 1013 degrees of freedom
Multiple R-squared:  0.7562,  Adjusted R-squared:  0.7535
F-statistic: 285.6 on 11 and 1013 DF,  p-value: < 2.2e-16
vif(fit1_step)
fit = lm(daly_adjusted ~ percentage_overweight + vegetable_consumption + animal_protein_consumption +
           education_years + pocket_per_cap + fruit_consumption +
           Asia + Africa + NorthA + Europe + SouthA + Asia + Africa + NorthA + Europe + SouthA,
         data = total)
print(summary(fit))
print(vif(fit))
     percentage_overweight      vegetable_consumption
                  3.908936                   1.772149
animal_protein_consumption            education_years
                  2.885232                   3.048594
            pocket_per_cap          fruit_consumption
                  2.136245                   1.318169
                      Asia                     Africa
                  7.257346                   6.590683
                    NorthA                     Europe
                  3.209642                   7.556342
                    SouthA
                  2.440735
1.3.2 Interpretation of the final model
Our final model has 11 variables: six predictors plus five continent dummies.
continent_fit = fit
summary(continent_fit)
vif(continent_fit)
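Each VIF above measures how well one predictor is explained by all the others: VIF_j = 1 / (1 − R_j²). In the final model, Europe (7.56), Asia (7.26) and Africa (6.59) have the highest values. Common rules of thumb flag values above 5 or 10. The sketch below is a hypothetical helper of ours, not `car::vif()` itself. It computes VIFs for numeric predictors by running the auxiliary regressions, then checks them against the equivalent closed form, the diagonal of the inverse correlation matrix. For terms with more than one degree of freedom, `car::vif()` reports generalized VIFs instead, which this sketch does not cover.

```r
# Hypothetical helper: VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from
# regressing predictor j on all the other predictors
vif_manual <- function(X) {
  X <- as.data.frame(X)
  sapply(names(X), function(v) {
    others <- setdiff(names(X), v)
    r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
    1 / (1 - r2)
  })
}

# Made-up predictors: x2 is largely a noisy copy of x1, x3 is unrelated
set.seed(3)
n <- 100
X <- data.frame(x1 = rnorm(n))
X$x2 <- X$x1 + rnorm(n, sd = 0.5)
X$x3 <- rnorm(n)

v <- vif_manual(X)
print(round(v, 2))

# Same numbers from the diagonal of the inverse correlation matrix
print(round(diag(solve(cor(X))), 2))
```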
                            (1)            (2)            (3)            (4)            (5)
(Intercept)                 79209.24 ***   80341.02 ***   79726.80 ***   78595.71 ***   66523.00 ***
                            (1894.33)      (1452.94)      (1410.66)      (1259.97)      (2345.28)
smoking_percentage          -23905.42 ***  -23651.69 ***  -24292.38 ***  -21927.21 ***
                            (6138.51)      (6132.06)      (6127.03)      (5986.82)
percentage_overweight       -259.02 ***    -262.03 ***    -266.17 ***    -256.08 ***    -202.37 ***
                            (35.31)        (35.16)        (35.11)        (34.69)        (33.40)
fruit_consumption           -50.51 ***     -50.72 ***     -47.37 ***     -49.24 ***     -40.86 ***
                            (8.65)         (8.65)         (8.44)         (8.38)         (6.82)
vegetable_consumption       -38.28 ***     -38.93 ***     -40.47 ***     -37.82 ***     -33.86 ***
                            (6.98)         (6.94)         (6.89)         (6.74)         (6.18)
animal_protein_consumption  -1798.59 ***   -1852.18 ***   -1741.37 ***   -1623.10 ***   -1030.84 ***
                            (271.90)       (265.72)       (258.20)       (249.73)       (209.88)
education_years             -1270.47 ***   -1267.36 ***   -1222.73 ***   -1107.61 ***   -534.61 **
                            (202.70)       (202.66)       (201.23)       (190.70)       (165.12)
physicians_1000             796.40         906.18         859.75
                            (498.63)       (484.46)       (484.20)
pocket_per_cap              -10.11 ***     -10.08 ***     -7.57 ***      -6.71 ***      -8.67 ***
                            (2.51)         (2.51)         (2.05)         (2.00)         (1.68)
healthcare_gdp_rate         6334.55
                            (6802.52)
gdp                         0.06           0.06
                            (0.03)         (0.03)
Asia                                                                                    -7140.44 ***
                                                                                        (1807.50)
Africa                                                                                  13792.58 ***
                                                                                        (1918.75)
NorthA                                                                                  -9335.46 ***
                                                                                        (1794.31)
Europe                                                                                  -5196.99 **
                                                                                        (1650.29)
SouthA                                                                                  -9146.72 ***
                                                                                        (1917.91)
N                           1025           1025           1025           1025           1025
R2                          0.64           0.64           0.64           0.63           0.76
logLik                      -11018.25      -11018.69      -11020.21      -11021.80      -10814.42
huxtable::huxreg(lm1, lm2, lm3, lm4, continent_fit, number_format = "%.2f")
AIC                         22060.50       22059.38       22060.42       22061.60       21654.84
*** p < 0.001; ** p < 0.01; * p < 0.05.
             actual predicted
actual    1.0000000 0.8572182
predicted 0.8572182 1.0000000
Of the five models, continent_fit was chosen as the final model. All of its variables are significant, and it has the highest R-squared (0.76) and the lowest AIC (21654.84). When the model is refitted on data before 2013 and used to predict the 2013 DALY rates, its predictions correlate with the actual values at r = 0.857.
best_model <- continent_fit
#Part 2: We wanted to test the predictive power of our model by checking that it could predict,
#with a certain level of confidence, the DALYs for the last full year of data (2013)
train <- total %>% filter(period < 2013)
test_2013 <- total %>% filter(period == 2013)
#Refit the final model's formula on the pre-2013 data only
continent_fit2 <- update(continent_fit, data = train)
final_prediction <- predict(continent_fit2, newdata = test_2013)
ac_pred <- data.frame(actual = test_2013$daly_adjusted, predicted = final_prediction)
correlation_accuracy <- cor(ac_pred)
correlation_accuracy
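The 0.857 above is a correlation coefficient, which measures linear association between predicted and actual values. Predictions that were all too high by the same amount would correlate just as well. An error measure in DALY units, such as RMSE or MAE, therefore complements it. The sketch below repeats the year-based holdout on a simulated panel (`panel`, `x` and the noise levels are hypothetical). It reports all three measures and shows that correlation ignores a constant bias.

```r
# Simulated panel: 30 hypothetical countries observed 2005 to 2013
set.seed(11)
panel <- expand.grid(country = paste0("c", 1:30), period = 2005:2013)
panel$x <- rnorm(nrow(panel))
panel$daly_adjusted <- 40000 - 5000 * panel$x + rnorm(nrow(panel), sd = 3000)

# Train on earlier years, predict the last year
train   <- subset(panel, period < 2013)
test_13 <- subset(panel, period == 2013)
fit  <- lm(daly_adjusted ~ x, data = train)
pred <- predict(fit, newdata = test_13)

r    <- cor(test_13$daly_adjusted, pred)                  # association only
rmse <- sqrt(mean((test_13$daly_adjusted - pred)^2))      # typical error, DALY units
mae  <- mean(abs(test_13$daly_adjusted - pred))
print(c(r = r, rmse = rmse, mae = mae))

# Shifting every prediction by 10,000 leaves r unchanged but inflates the error
r_shifted    <- cor(test_13$daly_adjusted, pred + 10000)
rmse_shifted <- sqrt(mean((test_13$daly_adjusted - pred - 10000)^2))
print(c(r_shifted = r_shifted, rmse_shifted = rmse_shifted))
```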