22/02/2021 CM30_GroupProject_SG30
file:///Users/Aman/Downloads/The Burden of Disease Code.html 1/39
CM30_GroupProject_SG30
Team 30
2021-02-14
1 Burden of Disease
Mortality rates are commonly used to assess a population’s health; typical measures include child mortality and life expectancy. However, a focus on mortality neglects the suffering of people who live with a disease. A disease affects, directly or indirectly, a person’s ability to live a normal life, and potential contributions to one’s community, work, or nation are often lost.
Our study therefore seeks to understand the magnitude of the burden of disease across disease types and to identify factors that amplify it.
The metric used to measure disease burden is the DALY (Disability-Adjusted Life Year), which combines mortality and morbidity. One DALY represents one year of healthy life lost to premature death, disease, or disability.
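As a rough illustration of how the two components combine, the sketch below follows the standard decomposition DALY = YLL + YLD (years of life lost plus years lived with disability). The function name and all input figures are hypothetical and are not taken from the datasets used in this report.

```r
# Hypothetical helper: DALY as the sum of its two standard components.
# yll: years of life lost to premature death
# yld: years lived with disability (already weighted by severity)
daly <- function(yll, yld) {
  yll + yld
}

# Illustrative numbers only (not from the report's data):
# 10 deaths, each 30 years before the reference life expectancy
yll_example <- 10 * 30
# 200 cases lasting 2 years each, with a disability weight of 0.1
yld_example <- 200 * 2 * 0.1

daly_example <- daly(yll_example, yld_example)
print(daly_example)
```

The datasets below report DALYs either as age-standardized rates per 100,000 people or as absolute numbers, so the two kinds of columns should not be compared directly.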
1.1 Data import and inspection
1.1.0.1 Importing data for overall disease burden (DALY)
Rows: 48,698
Columns: 7
$ entity <chr> …
$ code <chr> …
$ year <dbl> …
$ total_population_gapminder_hyde_un <dbl> …
$ continent <chr> …
$ health_expenditure_per_capita_current_us <dbl> …
$ dal_ys_disability_adjusted_life_years_all_causes_sex_both_age_age_standardized_rate <dbl> …
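# Packages used throughout this report
library(tidyverse)  # readr, dplyr, tidyr, ggplot2
library(janitor)    # clean_names()
library(GGally)     # ggpairs()
library(plotly)     # ggplotly()
#source: https://ourworldindata.org/burden-of-disease
# Reading first file
daly_total <- read_csv(here::here('Data',"disease-burden-vs-health-expenditure-per-capita.csv")) %>%
clean_names()
# Checking for variable types
glimpse(daly_total)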
# Changing variable names and variable types
daly_total<- daly_total %>%
mutate(
location=as.factor(entity),
period=year,
health_expenditure_per_capita=health_expenditure_per_capita_current_us,
daly_adjusted=dal_ys_disability_adjusted_life_years_all_causes_sex_both_age_age_standardized_rate,
total_population = total_population_gapminder_hyde_un) %>%
select(location,period,daly_adjusted,health_expenditure_per_capita,total_population)
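The same read, rename, and select pattern recurs for almost every file in this section. The sketch below shows one way it could be wrapped in a helper; `standardise_owid()` and the toy input are hypothetical and are not part of the original analysis. It is written in base R so that it runs without extra packages.

```r
# Hypothetical helper: keep only location, period and one renamed value column.
# df        : data frame with `entity` and `year` columns (after clean_names())
# value_col : name of the long source column holding the measurement
# new_name  : short name to give that column
standardise_owid <- function(df, value_col, new_name) {
  out <- data.frame(location = as.factor(df$entity), period = df$year)
  out[[new_name]] <- df[[value_col]]
  out
}

# Toy input mimicking the shape of an Our World in Data file (made-up values)
toy_raw <- data.frame(
  entity = c("Afghanistan", "Afghanistan"),
  code = c("AFG", "AFG"),
  year = c(1990, 1991),
  some_long_owid_column_name = c(1.5, 2.5)
)

toy_clean <- standardise_owid(toy_raw, "some_long_owid_column_name", "daly_toy")
print(names(toy_clean))
```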
Although important as a whole, DALY rates can further be divided into three sub-categories by cause: communicable diseases, non-communicable diseases, and injuries. We therefore included the dataset for each sub-category below.
1.1.0.2 Adding data for burden of non-communicable diseases
Rows: 6,468
Columns: 4
$ entity <chr> …
$ code <chr> …
$ year <dbl> …
$ dal_ys_disability_adjusted_life_years_non_communicable_diseases_sex_both_age_age_standardized_rate <dbl> …
Rows: 6,468
Columns: 3
$ location <fct> Afghanistan, Afghanistan, Afghanistan, Afghanistan, Afghani…
$ period <dbl> 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999,…
$ daly_ncds <dbl> 41145.51, 40587.17, 39644.60, 39821.31, 40641.76, 40790.73,…
1.1.0.3 Adding data for burden from communicable, neonatal, maternal and nutritional diseases
Rows: 6,468
Columns: 4
$ entity <chr> …
$ code <chr> …
$ year <dbl> …
$ dal_ys_disability_adjusted_life_years_communicable_maternal_neonatal_and_nutritional_diseases_sex_both_age_age_standardized_rate <dbl> …
#source:https://ourworldindata.org/burden-of-disease
#Reading the file
ncds <- read_csv(here::here('Data',"burden-of-disease-rates-from-ncds.csv")) %>%
clean_names()
# Checking for variable types
glimpse(ncds)
# Changing variable names and variable types
ncds<- ncds %>%
mutate(location=as.factor(entity),
period=year,
daly_ncds=dal_ys_disability_adjusted_life_years_non_communicable_diseases_sex_both_age_age_standardized_rate) %>%
select(location,period,daly_ncds)
glimpse(ncds)
#Merging data frames
total <- merge(daly_total,ncds,by=c("location","period"))
#source:https://ourworldindata.org/burden-of-disease
#Reading the file
cnmnd <- read_csv(here::here('Data',"burden-of-disease-rates-from-communicable-neonatal-maternal-nutritional-diseases.csv")) %>%
clean_names()
# Checking for variable types
glimpse(cnmnd)
Rows: 6,468
Columns: 3
$ location <fct> Afghanistan, Afghanistan, Afghanistan, Afghanistan, Afghan…
$ period <dbl> 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999…
$ daly_cnmnd <dbl> 51181.84, 47263.29, 38908.25, 36882.69, 38809.79, 38262.20…
1.1.0.4 Adding data for burden from injuries, violence, self-harm and accidents
Rows: 6,468
Columns: 4
$ entity <chr> …
$ code <chr> …
$ year <dbl> …
$ dal_ys_disability_adjusted_life_years_injuries_sex_both_age_age_standardized_rate <dbl> …
Rows: 6,468
Columns: 3
$ location <fct> Afghanistan, Afghanistan, Afghanistan, Afghanistan, Afghani…
$ period <dbl> 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999,…
$ daly_ivsa <dbl> 11775.715, 13390.289, 12365.622, 11530.363, 13546.148, 1238…
Within each of the three sub-categories of disease cause there are specific diseases. We included all of these categories in our dataset.
1.1.0.5 Adding data for disease burden by cause (DALY by cause)
# Changing variable names and variable types
cnmnd<- cnmnd %>%
mutate(location=as.factor(entity),
period=year,
daly_cnmnd=dal_ys_disability_adjusted_life_years_communicable_maternal_neonatal_and_nutritional_diseases_sex_both_age_age_standardized_rate) %>%
select(location,period,daly_cnmnd)
glimpse(cnmnd)
#Merging data frames
total <- merge(total,cnmnd,by=c("location","period"))
#source:https://ourworldindata.org/burden-of-disease
#Reading the file
ivsa <- read_csv(here::here('Data',"burden-of-disease-rates-from-injuries.csv")) %>%
clean_names()
# Checking for variable types
glimpse(ivsa)
# Changing variable names and variable types
ivsa<- ivsa %>%
mutate(location=as.factor(entity),
period=year,
daly_ivsa=dal_ys_disability_adjusted_life_years_injuries_sex_both_age_age_standardized_rate) %>%
select(location,period,daly_ivsa)
glimpse(ivsa)
#Merging data frames
total <- merge(total,ivsa,by=c("location","period"))
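Base R's `merge()` performs an inner join by default, so every merge above keeps only the location and period pairs that appear in both inputs. The sketch below uses small hypothetical data frames (not the report's data) to show the effect, and how `all.x = TRUE` would retain unmatched rows instead.

```r
# Toy data frames (hypothetical values) sharing the keys location and period
daly_toy <- data.frame(
  location = c("A", "A", "B"),
  period   = c(1990, 1991, 1990),
  daly     = c(100, 90, 80)
)
ncds_toy <- data.frame(
  location  = c("A", "B", "C"),
  period    = c(1990, 1990, 1990),
  daly_ncds = c(40, 30, 20)
)

# Default merge(): inner join, only keys present in both inputs survive
inner <- merge(daly_toy, ncds_toy, by = c("location", "period"))
print(nrow(inner))

# all.x = TRUE keeps every row of the first data frame and fills gaps with NA
left <- merge(daly_toy, ncds_toy, by = c("location", "period"), all.x = TRUE)
print(nrow(left))
print(sum(is.na(left$daly_ncds)))
```

Comparing `nrow()` before and after each merge is a cheap way to see how many observations a new dataset costs.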
Aside from the main variables, we added further variables that may contribute to the final DALY rates.
1.1.0.6 Adding data for GDP per capita
#source: https://ourworldindata.org/burden-of-disease
# Reading second file
daly_by_cause <- read_csv(here::here('Data',"burden-of-disease-by-cause.csv")) %>%
clean_names()
# Checking for variable types
#glimpse(daly_by_cause)
# Changing variable names and variable types
daly_by_cause <- daly_by_cause %>%
mutate(
location=as.factor(entity),
period=year,
daly_conflict_terrorism=dal_ys_disability_adjusted_life_years_conflict_and_terrorism_sex_both_age_all_ages_number,
daly_hiv_tuberculosis=dal_ys_disability_adjusted_life_years_hiv_aids_and_tuberculosis_sex_both_age_all_ages_number,
daly_diahrrea_respiratory=dal_ys_disability_adjusted_life_years_diarrhea_lower_respiratory_and_other_common_infectious_diseases_sex_both_age_all_ages_number,
daly_cvs=dal_ys_disability_adjusted_life_years_cardiovascular_diseases_sex_both_age_all_ages_number,
daly_self_harm=dal_ys_disability_adjusted_life_years_self_harm_sex_both_age_all_ages_number,
daly_violence=dal_ys_disability_adjusted_life_years_interpersonal_violence_sex_both_age_all_ages_number,
daly_nutritional_deficiencies=dal_ys_disability_adjusted_life_years_nutritional_deficiencies_sex_both_age_all_ages_number,
daly_transport_injuries=dal_ys_disability_adjusted_life_years_transport_injuries_sex_both_age_all_ages_number,
daly_unintentional_injuries=dal_ys_disability_adjusted_life_years_unintentional_injuries_sex_both_age_all_ages_number,
daly_maternal_disorders=dal_ys_disability_adjusted_life_years_maternal_disorders_sex_both_age_all_ages_number,
daly_neonatal_disorders=dal_ys_disability_adjusted_life_years_neonatal_disorders_sex_both_age_all_ages_number,
daly_other_communicable=dal_ys_disability_adjusted_life_years_other_communicable_maternal_neonatal_and_nutritional_diseases_sex_both_age_all_ages_number,
daly_nature_forces=dal_ys_disability_adjusted_life_years_exposure_to_forces_of_nature_sex_both_age_all_ages_number,
daly_chronic_respiratory=dal_ys_disability_adjusted_life_years_chronic_respiratory_diseases_sex_both_age_all_ages_number,
daly_chronic_liver=dal_ys_disability_adjusted_life_years_cirrhosis_and_other_chronic_liver_diseases_sex_both_age_all_ages_number,
daly_digestive=dal_ys_disability_adjusted_life_years_digestive_diseases_sex_both_age_all_ages_number,
daly_tropical_and_malaria=dal_ys_disability_adjusted_life_years_neglected_tropical_diseases_and_malaria_sex_both_age_all_ages_number,
daly_musculoskeletal=dal_ys_disability_adjusted_life_years_musculoskeletal_disorders_sex_both_age_all_ages_number,
daly_other_non_communicable=dal_ys_disability_adjusted_life_years_other_non_communicable_diseases_sex_both_age_all_ages_number,
daly_neurological=dal_ys_disability_adjusted_life_years_neurological_disorders_sex_both_age_all_ages_number,
daly_mental_and_substance=dal_ys_disability_adjusted_life_years_mental_and_substance_use_disorders_sex_both_age_all_ages_number,
daly_diabetes_urogenital_blood_endocrine=dal_ys_disability_adjusted_life_years_diabetes_urogenital_blood_and_endocrine_diseases_sex_both_age_all_ages_number,
daly_neoplasms=dal_ys_disability_adjusted_life_years_neoplasms_sex_both_age_all_ages_number)%>%
select(location, period, daly_conflict_terrorism, daly_hiv_tuberculosis, daly_diahrrea_respiratory, daly_cvs,
  daly_self_harm, daly_violence, daly_nutritional_deficiencies, daly_transport_injuries, daly_unintentional_injuries,
  daly_maternal_disorders, daly_neonatal_disorders, daly_other_communicable, daly_nature_forces, daly_chronic_respiratory,
  daly_chronic_liver, daly_digestive, daly_tropical_and_malaria, daly_musculoskeletal, daly_other_non_communicable,
  daly_neurological, daly_mental_and_substance, daly_diabetes_urogenital_blood_endocrine, daly_neoplasms)
#glimpse(daly_by_cause)
# Merging dataframes
total <- merge(total,daly_by_cause,by=c("location","period"))
# We will consider taking out health expenditure per capita since it has a complete rate of 57.4%
# and may distort the final data.
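The complete rate referred to in the comment above is the share of non-missing values in a column. A minimal base R sketch of that calculation, on a hypothetical toy data frame rather than the report's data:

```r
# Share of non-missing values in a vector
complete_rate <- function(x) {
  mean(!is.na(x))
}

# Toy data frame with made-up values and a few gaps
toy <- data.frame(
  a = c(1, NA, 3, 4),
  b = c(NA, NA, 3, 4)
)

# One complete rate per column
rates <- sapply(toy, complete_rate)
print(rates)
```

A column with a low complete rate removes many rows once `na.omit()` is applied to the merged data, which is the concern raised in the comment.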
#source: https://data.worldbank.org/indicator/NY.GDP.PCAP.CD
# Reading third file
gdp <- read_csv(here::here('Data',"API_NY.GDP.PCAP.CD_DS2_en_csv_v2_1926744.csv"),skip=3) %>%
clean_names()
# Checking for variable types
glimpse(gdp)
Rows: 264
Columns: 66
$ country_name <chr> "Aruba", "Afghanistan", "Angola", "Albania", "Andorra"…
$ country_code <chr> "ABW", "AFG", "AGO", "ALB", "AND", "ARB", "ARE", "ARG"…
$ indicator_name <chr> "GDP per capita (current US$)", "GDP per capita (curre…
$ indicator_code <chr> "NY.GDP.PCAP.CD", "NY.GDP.PCAP.CD", "NY.GDP.PCAP.CD", …
$ x1960 <dbl> NA, 59.77319, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1807…
$ x1961 <dbl> NA, 59.86087, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1874…
$ x1962 <dbl> NA, 58.45801, NA, NA, NA, NA, NA, 1155.89017, NA, NA, …
$ x1963 <dbl> NA, 78.70639, NA, NA, NA, NA, NA, 850.30474, NA, NA, N…
$ x1964 <dbl> NA, 82.09523, NA, NA, NA, NA, NA, 1173.23821, NA, NA, …
$ x1965 <dbl> NA, 101.10830, NA, NA, NA, NA, NA, 1279.11343, NA, NA,…
$ x1966 <dbl> NA, 137.59435, NA, NA, NA, NA, NA, 1272.80298, NA, NA,…
$ x1967 <dbl> NA, 160.89859, NA, NA, NA, NA, NA, 1062.54355, NA, NA,…
$ x1968 <dbl> NA, 129.10832, NA, NA, NA, 224.87811, NA, 1141.08048, …
$ x1969 <dbl> NA, 129.32971, NA, NA, NA, 240.03563, NA, 1329.05866, …
$ x1970 <dbl> NA, 156.5189, NA, NA, 3238.5568, 262.8663, NA, 1322.59…
$ x1971 <dbl> NA, 159.56758, NA, NA, 3498.17365, 295.97104, NA, 1372…
$ x1972 <dbl> NA, 135.31731, NA, NA, 4217.17358, 343.56582, NA, 1408…
$ x1973 <dbl> NA, 143.14465, NA, NA, 5342.16856, 423.13508, NA, 2097…
$ x1974 <dbl> NA, 173.65376, NA, NA, 6319.73903, 777.56068, NA, 2844…
$ x1975 <dbl> NA, 186.5109, NA, NA, 7169.1010, 836.2083, 26847.7944,…
$ x1976 <dbl> NA, 197.4455, NA, NA, 7152.3751, 1007.1404, 30118.1378…
$ x1977 <dbl> NA, 224.2248, NA, NA, 7751.3702, 1123.1433, 33823.3196…
$ x1978 <dbl> NA, 247.3541, NA, NA, 9129.7062, 1193.7456, 28456.7374…
$ x1979 <dbl> NA, 275.7382, NA, NA, 11820.8494, 1563.7035, 33512.741…
$ x1980 <dbl> NA, 272.6553, 710.9816, NA, 12377.4116, 2052.9558, 427…
$ x1981 <dbl> NA, 264.1113, 642.3839, NA, 10372.2328, 2050.7698, 449…
$ x1982 <dbl> NA, NA, 619.9614, NA, 9610.2663, 1864.8707, 40026.1663…
$ x1983 <dbl> NA, NA, 623.4406, NA, 8022.6548, 1699.2152, 34843.1029…
$ x1984 <dbl> NA, NA, 637.7152, 639.4847, 7728.9067, 1672.2788, 3230…
$ x1985 <dbl> NA, NA, 758.2376, 639.8659, 7774.3938, 1606.7558, 2972…
$ x1986 <dbl> 6472.5020, NA, 685.2701, 693.8735, 10361.8160, 1489.84…
$ x1987 <dbl> 7885.7965, NA, 756.2619, 674.7934, 12616.1676, 1543.51…
$ x1988 <dbl> 9764.7900, NA, 792.3031, 652.7743, 14304.3570, 1476.04…
$ x1989 <dbl> 11392.4558, NA, 890.5541, 697.9956, 15166.4379, 1505.5…
$ x1990 <dbl> 12307.3117, NA, 947.7042, 617.2304, 18878.5060, 2009.4…
$ x1991 <dbl> 13496.0031, NA, 865.6927, 336.5870, 19532.5402, 1929.6…
$ x1992 <dbl> 14046.5038, NA, 656.3618, 200.8522, 20547.7118, 2027.8…
$ x1993 <dbl> 14936.8272, NA, 441.2007, 367.2792, 16516.4710, 1996.9…
$ x1994 <dbl> 16241.0465, NA, 328.6733, 586.4163, 16234.8090, 1989.4…
$ x1995 <dbl> 16439.3564, NA, 397.1795, 750.6044, 18461.0649, 2072.7…
$ x1996 <dbl> 16586.0684, NA, 522.6438, 1009.9777, 19017.1746, 2235.…
$ x1997 <dbl> 17927.7496, NA, 514.2952, 717.3806, 18353.0597, 2319.0…
$ x1998 <dbl> 19078.3432, NA, 423.5937, 813.7903, 18894.5215, 2188.9…
$ x1999 <dbl> 19356.2034, NA, 387.7843, 1033.2417, 19261.7105, 2331.…
$ x2000 <dbl> 20620.7006, NA, 556.8363, 1126.6833, 21854.2468, 2605.…
$ x2001 <dbl> 20669.0320, NA, 527.3335, 1281.6594, 22971.5355, 2506.…
$ x2002 <dbl> 20436.8871, 179.4266, 872.4945, 1425.1248, 25066.8822,…
$ x2003 <dbl> 20833.7616, 190.6838, 982.9609, 1846.1188, 32271.9639,…
$ x2004 <dbl> 22569.9750, 211.3821, 1255.5640, 2373.5798, 37969.1750…
$ x2005 <dbl> 23300.0396, 242.0313, 1902.4223, 2673.7873, 40066.2569…
$ x2006 <dbl> 24045.2725, 263.7337, 2599.5665, 2972.7433, 42675.8128…
$ x2007 <dbl> 25835.1327, 359.6932, 3121.9956, 3595.0372, 47803.6936…
$ x2008 <dbl> 27084.7037, 364.6607, 4080.9414, 4370.5401, 48718.4969…
$ x2009 <dbl> 24630.4537, 438.0760, 3122.7808, 4114.1401, 43503.1855…
$ x2010 <dbl> 23512.6026, 543.3030, 3587.8838, 4094.3503, 40852.6668…
$ x2011 <dbl> 24985.9933, 591.1628, 4615.4680, 4437.1429, 43335.3289…
$ x2012 <dbl> 24713.6980, 641.8715, 5100.0958, 4247.6300, 38686.4613…
$ x2013 <dbl> 26189.4355, 637.1655, 5254.8823, 4413.0609, 39538.7667…
$ x2014 <dbl> 26647.9381, 613.8567, 5408.4105, 4578.6320, 41303.9294…
$ x2015 <dbl> 27980.8807, 578.4664, 4166.9797, 3952.8012, 35762.5231…
$ x2016 <dbl> 28281.3505, 509.2187, 3506.0729, 4124.0557, 37474.6654…
$ x2017 <dbl> 29007.6930, 519.8848, 4095.8129, 4531.0208, 38962.8804…
$ x2018 <dbl> NA, 493.7504, 3289.6467, 5284.3802, 41793.0553, 6601.8…
$ x2019 <dbl> NA, 507.1034, 2790.7266, 5353.2449, 40886.3912, 6584.7…
$ x2020 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ x66 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
1.1.0.7 Adding data for smoking percentages
1.1.0.8 Adding data for healthcare expenditure per capita
Rows: 4,675
Columns: 4
$ entity <chr> "Afghan…
$ code <chr> "AFG", …
$ year <dbl> 2002, 2…
$ health_expenditure_per_capita_ppp_constant_2011_international <dbl> 75.9835…
# Changing variable names and variable types
gdp <- gdp %>%
gather(year, gdp,-c(country_name, country_code,indicator_name,indicator_code)) %>%
mutate(location=as.factor(country_name),
period=readr::parse_number(year)) %>%
select(location,period,gdp)
# Merging dataframes
total <- merge(total,gdp,by=c("location","period"))
#skim(total)
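The World Bank file stores one column per year (`x1960`, `x1961`, and so on), so it has to be reshaped into one row per country and year before it can be merged. The report does this with `gather()`; the sketch below shows the same reshaping with `tidyr::pivot_longer()`, which the tidyr documentation recommends for new code. The toy data frame and its values are hypothetical.

```r
library(tidyr)

# Toy wide table shaped like the World Bank file after clean_names()
gdp_wide <- data.frame(
  country_name = c("Country A", "Country B"),
  x1960 = c(100, 200),
  x1961 = c(110, 210)
)

# One row per country and year; the "x" prefix is stripped and the
# remaining text is converted to an integer year
gdp_long <- pivot_longer(
  gdp_wide,
  cols = starts_with("x"),
  names_to = "period",
  names_prefix = "x",
  names_transform = list(period = as.integer),
  values_to = "gdp"
)
print(gdp_long)
```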
#source: http://ghdx.healthdata.org/record/ihme-data/gbd-2015-smoking-prevalence-1980-2015
#Reading fourth file
smoking_percentage <- read_csv(here::here('Data',"IHME_GBD_2015_SMOKING_PREVALENCE_1980_2015_Y2017M04D05.CSV")) %>%
clean_names()
# Checking for variable types
#skim(smoking_percentage)
# Changing variable names and variable types
smoking_percentage <- smoking_percentage %>%
filter(age_group_name=="Age-standardized",
metric=="Percent",
sex=="Both") %>%
mutate(location=as.factor(location_name),
period=year_id,
smoking_percentage=mean) %>%
select(location,period,smoking_percentage)
#skim(smoking_percentage)
#Merging data frames
total <- merge(total,smoking_percentage,by=c("location","period"))
#source:https://ourworldindata.org/grapher/annual-healthcare-expenditure-per-capita?tab=chart&time=1995..2014&region=World
#Reading fifth file
healthcare_expenditure <- read_csv(here::here('Data',"annual-healthcare-expenditure-per-capita.CSV")) %>%
clean_names()
# Checking for variable types
glimpse(healthcare_expenditure)
1.1.0.9 Adding data for percentage of population being overweight
Rows: 8,316
Columns: 4
$ entity <chr> "Afghanistan", "A…
$ code <chr> "AFG", "AFG", "AF…
$ year <dbl> 1975, 1976, 1977,…
$ prevalence_of_overweight_adults_both_sexes_who_2019 <dbl> 5.3, 5.5, 5.7, 5.…
1.1.0.10 Adding data for fruit consumption per capita
Rows: 11,028
Columns: 4
$ entity <chr> "Afg…
$ code <chr> "AFG…
$ year <dbl> 1961…
$ fruits_excluding_wine_food_supply_quantity_kg_capita_yr_fao_2020 <dbl> 41.1…
# Changing variable names and variable types
healthcare_expenditure <- healthcare_expenditure %>%
mutate(location=as.factor(entity),
period=year,
healthcare_expenditure=health_expenditure_per_capita_ppp_constant_2011_international) %>%
select(location,period,healthcare_expenditure)
#glimpse(healthcare_expenditure)
#Merging data frames
total <- merge(total,healthcare_expenditure,by=c("location","period"))
#source: https://ourworldindata.org/obesity
#Reading sixth file
percentage_overweight <- read_csv(here::here('Data',"share-of-adults-who-are-overweight.csv")) %>%
clean_names()
# Checking for variable types
glimpse(percentage_overweight)
# Changing variable names and variable types
percentage_overweight <- percentage_overweight %>%
mutate(location=as.factor(entity),
period=year,
percentage_overweight=prevalence_of_overweight_adults_both_sexes_who_2019) %>%
select(location,period,percentage_overweight)
#glimpse(percentage_overweight)
#Merging data frames
total <- merge(total,percentage_overweight,by=c("location","period"))
#source: https://ourworldindata.org/diet-compositions
#Reading seventh file
fruit_consumption <- read_csv(here::here('Data',"fruit-consumption-per-capita.csv")) %>%
clean_names()
# Checking for variable types
glimpse(fruit_consumption)
1.1.0.11 Adding data for vegetable consumption per capita
Rows: 11,028
Columns: 4
$ entity <chr> "Afghanistan", …
$ code <chr> "AFG", "AFG", "…
$ year <dbl> 1961, 1962, 196…
$ vegetables_food_supply_quantity_kg_capita_yr_fao_2020 <dbl> 36.75, 37.47, 3…
1.1.0.12 Adding data for animal based foods consumption per capita
# Changing variable names and variable types
fruit_consumption <- fruit_consumption %>%
mutate(location=as.factor(entity),
period=year,
fruit_consumption=fruits_excluding_wine_food_supply_quantity_kg_capita_yr_fao_2020) %>%
select(location,period,fruit_consumption)
#glimpse(fruit_consumption)
#Merging data frames
total <- merge(total,fruit_consumption,by=c("location","period"))
#source: https://ourworldindata.org/diet-compositions
#Reading eighth file
vegetable_consumption <- read_csv(here::here('Data',"vegetable-consumption-per-capita.csv")) %>%
clean_names()
#Checking for variable types
glimpse(vegetable_consumption)
## Changing variable names and variable types
vegetable_consumption <- vegetable_consumption %>%
mutate(location=as.factor(entity),
period=year,
vegetable_consumption=vegetables_food_supply_quantity_kg_capita_yr_fao_2020) %>%
select(location,period,vegetable_consumption)
#glimpse(vegetable_consumption)
#Merging dataframes
total <- merge(total,vegetable_consumption,by=c("location","period"))
#skim(total)
#source: https://ourworldindata.org/diet-compositions
#Reading ninth file
animal_protein_consumption <- read_csv(here::here('Data',"share-of-calories-from-animal-protein-vs-gdp-per-capita.csv")) %>%
clean_names()
#Checking for variable types
glimpse(animal_protein_consumption)
Rows: 24,472
Columns: 7
$ entity <chr> …
$ code <chr> …
$ year <dbl> …
$ total_population_gapminder <dbl> …
$ continent <chr> …
$ share_of_calories_from_animal_protein_fao_2017 <dbl> …
$ real_gdp_per_capita_in_2011us_2011_benchmark_maddison_project_database_2018 <dbl> …
1.1.0.13 Adding data for mean years of schooling
1.1.0.14 Adding data for physicians per 1000 people
#Changing variable names and type
animal_protein_consumption <- animal_protein_consumption %>%
mutate(location=as.factor(entity),
period=year,
animal_protein_consumption=share_of_calories_from_animal_protein_fao_2017) %>%
select(location,period,animal_protein_consumption)
#glimpse(animal_protein_consumption)
#Merging dataframes
total <- merge(total,animal_protein_consumption,by=c("location","period"))
#glimpse(total)
#source: https://ourworldindata.org/global-education
#Reading file
education_years <- read_csv(here::here('Data',"mean-years-of-schooling-1.csv")) %>%
clean_names()
#Checking for variable types
#glimpse(education_years)
#Changing variable names and type
education_years <- education_years %>%
mutate(location=as.factor(entity),
period=year,
education_years=average_total_years_of_schooling_for_adult_population_lee_lee_2016_barro_lee_2018_and_undp_2018) %>%
select(location,period,education_years)
#glimpse(education_years)
#Merging dataframes
total <- merge(total,education_years,by=c("location","period"))
1.1.0.15 Adding data for nurses per 1000 people
Rows: 1,542
Columns: 4
$ entity <chr> "Afghanistan", "Afghanistan", "A…
$ code <chr> "AFG", "AFG", "AFG", "AFG", "AFG…
$ year <dbl> 2005, 2006, 2007, 2008, 2009, 20…
$ nurses_and_midwives_per_1_000_people <dbl> 0.612000, 0.462000, 0.519000, 0.…
The nurses dataset had too few observations, so it was not included in our final dataset.
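One way to make this kind of decision explicit is to measure what share of the reference location and period pairs a candidate dataset covers before merging it. The sketch below is a hypothetical check on toy data; the `coverage()` helper and any cut-off applied to its result are assumptions, not the procedure the report followed.

```r
# Hypothetical helper: share of reference key pairs that also appear in candidate
coverage <- function(reference, candidate, keys = c("location", "period")) {
  ref_keys <- unique(reference[keys])
  cand_keys <- unique(candidate[keys])
  matched <- merge(ref_keys, cand_keys, by = keys)
  nrow(matched) / nrow(ref_keys)
}

# Toy reference table with four location-period pairs (made-up values)
reference_toy <- data.frame(
  location = c("A", "A", "B", "B"),
  period   = c(2005, 2006, 2005, 2006)
)
# Toy candidate that only covers one of those pairs
candidate_toy <- data.frame(
  location = c("A", "C"),
  period   = c(2005, 2005),
  nurses   = c(0.6, 1.2)
)

share_covered <- coverage(reference_toy, candidate_toy)
print(share_covered)
```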
1.1.0.16 Adding data for out-of-pocket expenditure
Rows: 3,002
Columns: 4
$ entity <chr> …
$ code <chr> …
$ year <dbl> …
$ out_of_pocket_expenditure_per_capita_on_healthcare_ppp_usd_who_global_health_expenditure <dbl> …
#source:https://ourworldindata.org/grapher/physicians-per-1000-people
#Reading file
physicians <- read_csv(here::here('Data',"physicians-per-1000-people.csv")) %>%
clean_names()
#Checking for variable types
#glimpse(physicians)
#Changing variable names and type
physicians <- physicians %>%
mutate(location=as.factor(entity),
period=year,
physicians_1000=physicians_per_1_000_people) %>%
select(location,period,physicians_1000)
#glimpse(physicians)
#Merging dataframes
total <- merge(total,physicians,by=c("location","period"))
#source:https://ourworldindata.org/grapher/nurses-and-midwives-per-1000-people?
#Reading file
nurses <- read_csv(here::here('Data',"nurses-and-midwives-per-1000-people.csv")) %>%
clean_names()
#Checking for variable types
glimpse(nurses)
#source:https://ourworldindata.org/grapher/out-of-pocket-expenditure-per-capita-on-healthcare
#Reading file
pocket_exp <- read_csv(here::here('Data',"out-of-pocket-expenditure-per-capita-on-healthcare.csv")) %>%
clean_names()
#Checking for variable types
glimpse(pocket_exp)
1.1.0.17 Adding data for health protection coverage
Rows: 162
Columns: 4
$ entity <chr> "Albania", "…
$ code <chr> "ALB", "DZA"…
$ year <dbl> 2008, 2005, …
$ share_of_population_covered_by_health_insurance_ilo_2014 <dbl> 23.6, 85.2, …
The health coverage dataset had too few observations, so it was not included in our final dataset.
1.1.0.18 Adding data for literacy rate
Rows: 215
Columns: 4
$ entity <chr> "Afghanistan", "Albania", "Algeria", …
$ code <chr> "AFG", "ALB", "DZA", "ASM", "AND", "A…
$ year <dbl> 2000, 2011, 2006, 1980, 2011, 2011, 1…
$ literacy_rate_cia_factbook_2016 <dbl> 28.1, 96.8, 72.6, 97.0, 100.0, 70.4, …
The literacy dataset had too few observations, so it was not included in our final dataset.
1.1.0.19 Adding data for grouping locations into continents
Rows: 194
Columns: 2
$ continent <chr> "Africa", "Africa", "Africa", "Africa", "Africa", "Africa",…
$ country <chr> "Algeria", "Angola", "Benin", "Botswana", "Burkina", "Burun…
#Changing variable names and type
pocket_exp <- pocket_exp %>%
mutate(location=as.factor(entity),
period=year,
pocket_per_cap=out_of_pocket_expenditure_per_capita_on_healthcare_ppp_usd_who_global_health_expenditure) %>%
select(location,period,pocket_per_cap)
#Merging dataframes
total <- merge(total,pocket_exp,by=c("location","period"))
#Reading file
health_protect <- read_csv(here::here('Data',"health-protection-coverage.csv")) %>%
clean_names()
#Checking for variable types
glimpse(health_protect)
#Reading file
literacy <- read_csv(here::here('Data',"literacy-rate-by-country.csv")) %>%
clean_names()
#Checking for variable types
glimpse(literacy)
#source: https://github.com/dbouquin/IS_608/blob/master/NanosatDB_munging/Countries-Continents.csv
#Reading file
continents <- read_csv(here::here('Data',"Continents.csv")) %>%
clean_names()
#Checking for variable types
glimpse(continents)
Rows: 194
Columns: 2
$ location <fct> Algeria, Angola, Benin, Botswana, Burkina, Burundi, Cameroo…
$ continent <fct> Africa, Africa, Africa, Africa, Africa, Africa, Africa, Afr…
1.1.0.20 Dealing with NAs
After including all potentially relevant and significant variables in our dataset, we made an initial exploration of the data.
1.2 Exploratory Data Analysis
1.2.0.1 DALY Rates per Continent
#Changing variable names and type
continents <- continents %>%
mutate(location=as.factor(country),
continent=as.factor(continent))%>%
select(location, continent)
glimpse(continents)
#Merging dataframes
total <- merge(total,continents,by=c("location"))
#Adding variables of per capita healthcare expenditure - per capita gdp
total <- total%>%
mutate(healthcare_gdp_rate = healthcare_expenditure/gdp)
#skim(total)
total <- total %>%
na.omit()
#skim(total)
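Because `na.omit()` drops every row that has at least one missing value, a single sparse column can remove many otherwise usable rows. The sketch below uses a hypothetical toy data frame (not the report's data) to show one way of checking where the gaps are and how many rows survive.

```r
# Toy merged table with gaps in two different columns (made-up values)
toy <- data.frame(
  location = c("A", "B", "C", "D"),
  gdp      = c(1, NA, 3, 4),
  smoking  = c(10, 20, NA, 40)
)

# Missing values per column, to see which variable costs the most rows
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# Rows that remain once any row with a missing value is dropped
complete <- na.omit(toy)
print(nrow(complete))
```

Running the same two checks on `total` before calling `na.omit()` would show how much each added variable shrinks the final sample.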
#Selecting data only from 1980 - onward (to gain better insights on the recent situation)
total_short <-total %>%
filter(period>=1980)
#Re-coding DALY variables as averages per continent, per year
total_cont<-total_short%>%
group_by(period,continent)%>%
# All four rates are divided by the same 100,000 so that they are on a common per-capita scale
summarise(daly_adjusted=mean(daly_adjusted/100000), daly_cnmnd = mean(daly_cnmnd/100000), daly_ncds = mean(daly_ncds/100000), daly_ivsa = mean(daly_ivsa/100000))
#Plotting for average DALY rates per capita accumulated from 1980 to 2017
ggplot(total_cont, aes(x = continent, y = daly_adjusted, fill = continent)) +
geom_bar(stat = "identity") +
labs(x= "Continent", y = "Overall DALYs", title = "Accumulated Average DALYs per Capita, per Continent 1980 - 2017")
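The "accumulated average" in these bar charts arises because `total_cont` holds one row per continent and year, and `geom_bar(stat = "identity")` stacks all rows that share the same x value. Each bar's height is therefore the sum of the yearly continent means. The sketch below reproduces that calculation explicitly in base R on hypothetical toy values.

```r
# Toy country-level data (hypothetical values), two years and two continents
toy <- data.frame(
  continent = c("X", "X", "X", "X", "Y", "Y"),
  period    = c(2000, 2000, 2001, 2001, 2000, 2001),
  daly      = c(0.2, 0.4, 0.1, 0.3, 0.5, 0.7)
)

# Step 1: mean per continent and year (the role of group_by() + summarise())
yearly_mean <- aggregate(daly ~ continent + period, data = toy, FUN = mean)

# Step 2: sum of those yearly means per continent (the stacked bar height)
accumulated <- aggregate(daly ~ continent, data = yearly_mean, FUN = sum)
print(accumulated)
```

Since the bar heights grow with the number of years available, continents are only comparable in these charts if they cover the same set of years.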
ggplot(total_cont, aes(x = continent, y = daly_cnmnd, fill = continent)) +
geom_bar(stat = "identity")+
labs(x= "Continent", y = "Communicable Diseases DALYs", title = "Accumulated Average DALYs per Capita from Communicable Diseases, per Continent 1980 - 2017")
ggplot(total_cont, aes(x = continent, y = daly_ncds, fill = continent)) +
geom_bar(stat = "identity")+
labs(x= "Continent", y = "Non-Communicable Diseases DALYs", title = "Accumulated Average DALYs per Capita from Non-Communicable Diseases, per Continent 1980 - 2017")
ggplot(total_cont, aes(x = continent, y = daly_ivsa, fill = continent)) +
geom_bar(stat = "identity")+
labs(x= "Continent", y = "Injuries DALYs", title = "Accumulated Average DALYs per Capita from Injuries, per Continent 1980 - 2017")
Overall, we find that Africa has the highest accumulated average DALY rate per capita of all continents (c. 90), followed by Asia (c. 50) and Oceania (c. 40). The sharp contrast between Africa and the other continents is mainly due to its high accumulated average for communicable diseases. In this category, Africa's value is more than triple that of the second-highest continent (c. 55 for Africa compared to c. 17 for Asia).
For non-communicable diseases and injuries, rates are fairly even across continents. For non-communicable diseases, DALY rates range from c. 27 to c. 33 (North America being the lowest and Africa the highest). Injury rates are much lower, ranging from c. 4 to c. 6 (Europe being the lowest and Africa the highest).
Consequently, communicable diseases are found to carry the highest burden in the population, with Africa taking (or having taken) the largest share. We took a closer look at these rates to better understand their evolution through time.
1.2.1 Communicable Diseases
graph1 <- total_cont %>%
ggplot(aes(x=period, y=daly_cnmnd, fill=continent, text=continent)) +
geom_area(alpha = 1) +
theme(legend.position="none") +
labs(x= "Year", y = "DALY for communicable disease", title = "Time Series Average DALYs per Capita from Communicable Diseases per Continent")
ggplotly(graph1)
[Figure: Time Series Average DALYs per Capita from Communicable Diseases per Continent (interactive area chart; x-axis: Year, y-axis: DALY for communicable disease)]
As seen from the graph, Africa’s communicable DALY rate seems to have been in decline since 2008. However, the continent has consistently ranked above the others, which leaves room to further consider the causes and potential solutions.
From the Our World in Data report, neonatal disorders are the leading cause within the communicable category in terms of total share of burden (7.45% of all causes). It is also known that there is a strong negative correlation between GDP and DALYs from communicable diseases. Similarly, a negative correlation is found between health expenditure per capita and DALYs from communicable diseases.
What about healthcare expenditure as a percentage of GDP?
ggplot(total_short, aes(x = healthcare_gdp_rate, y = daly_cnmnd, color = continent))+
geom_point()+
labs(x= "Healthcare Expenditure as percentage of GDP", y = "DALY from Communicable Diseases", title = "Rates due to Proportion of GDP spent on Healthcare")
# No clear correlation yet, but interesting
total_short%>%
select(daly_cnmnd, healthcare_gdp_rate, gdp, pocket_per_cap)%>%
ggpairs()
> A higher GDP per country seems to have a signi cant negative correlation to DALY of communicable diseases. However, the proportion of GDP used for
healthcare seems to have a signi cant positive correlation to DALY of communicable diseases. GDP seems to have a signi cant negative correlation to the
proportion of GDP spent on healthcare. This could indicate that poorer countries have a higher likelihood of having to combat communicable diseases.
Consequently, they spend a greater proportion of their GDP on healthcare than richer countries. Out of pocket expenditure is also highly negatively
correlated to DALY of communicable diseases, although highly positively correlated to gdp. This leads to the interpretation that poor countries in which the
population is individually responsible for investing in their medical care and are most likely to have higher DALY communicable disease rates.
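The pairwise relationships read off the ggpairs() panel can also be checked numerically. The following is a minimal sketch, assuming a data frame with the same column names as total_short but filled with synthetic values (the numbers are invented purely to make the example runnable and are not the report's data):

```r
# Synthetic stand-in for total_short; values are invented, not the report's data.
n <- 40
toy <- data.frame(gdp = seq(1000, 50000, length.out = n))
toy$daly_cnmnd <- 0.6 - 1e-5 * toy$gdp + 0.02 * sin(seq_len(n))
toy$healthcare_gdp_rate <- 0.10 - 1e-6 * toy$gdp + 0.005 * cos(seq_len(n))

# Pearson correlation matrix; pairwise deletion tolerates missing values.
cor_mat <- cor(toy, use = "pairwise.complete.obs")
round(cor_mat, 2)

# cor.test() adds a p-value and a confidence interval for a single pair.
ct <- cor.test(toy$gdp, toy$daly_cnmnd)
ct$estimate
ct$p.value
```

A correlation matrix only describes linear association between two variables at a time, so the signs seen here can change once other variables are controlled for in a regression.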
1.2.2 Injuries
With DALY rates for injuries and additional causes having similar rates across all continents, we decided to first take a closer look at which types of causes were most prominent overall.
#This plot shows injury related DALY in a stacked bar chart.
start <- total %>%
  group_by(continent) %>%
  summarise(
    daly_conflict_terrorism = mean(daly_conflict_terrorism/total_population),
    daly_self_harm = mean(daly_self_harm/total_population),
    daly_violence = mean(daly_violence/total_population),
    daly_transport_injuries = mean(daly_transport_injuries/total_population),
    daly_nature_forces = mean(daly_nature_forces/total_population),
    daly_unintentional_injuries = mean(daly_unintentional_injuries/total_population))
pivot <- pivot_longer(start,
  cols = c(daly_conflict_terrorism, daly_self_harm, daly_violence, daly_transport_injuries,
           daly_unintentional_injuries, daly_nature_forces),
  names_to = "diseases", values_to = "value")
#select columns from dataset
plots <- pivot %>%
  select(continent,diseases,value)
ggplot(plots, aes(fill=diseases, y=value, x=continent)) +
  geom_bar(position="stack", stat="identity") +
  labs(x= "Continent", y = "Injuries DALYs", title = "Accumulated Average DALYs per Capita from Injuries, per Continent 1980 - 2017")
#Plot on Terrorism and Violence
terrorism_violence <- start %>%
select(daly_conflict_terrorism, daly_violence, continent)
terrorism_violence <- pivot_longer(terrorism_violence, c(daly_conflict_terrorism, daly_violence),
  names_to = "diseases", values_to = "value")
#select columns from dataset
terrorism_violence <- terrorism_violence%>%
select(diseases,value,continent)
#stacked bar chart
ggplot(terrorism_violence, aes(fill=diseases, y=value, x=continent)) +
geom_bar(position="stack", stat="identity") +
labs(x= "Continent", y = "Injuries DALYs", title = "Accumulated Average DALYs per Capita from Terrorism and Violence 1980 - 2017")
total_short%>%
select(daly_ivsa, gdp, daly_mental_and_substance, physicians_1000, education_years)%>%
ggpairs()
1.2.3 Non-Communicable Diseases
start1 <- total %>%
  group_by(continent) %>%
  summarise(
    daly_cvs = mean(daly_cvs/total_population),
    daly_nutritional_deficiencies = mean(daly_nutritional_deficiencies/total_population),
    daly_maternal_disorders = mean(daly_maternal_disorders/total_population),
    daly_musculoskeletal = mean(daly_musculoskeletal/total_population),
    daly_other_non_communicable = mean(daly_other_non_communicable/total_population),
    daly_neurological = mean(daly_neurological/total_population),
    daly_mental_and_substance = mean(daly_mental_and_substance/total_population),
    daly_diabetes_urogenital_blood_endocrine = mean(daly_diabetes_urogenital_blood_endocrine/total_population),
    daly_neoplasms = mean(daly_neoplasms/total_population),
    daly_chronic_liver = mean(daly_chronic_liver/total_population))
pivot1 <- pivot_longer(start1,
  c(daly_cvs, daly_nutritional_deficiencies, daly_maternal_disorders, daly_musculoskeletal,
    daly_other_non_communicable, daly_neurological, daly_mental_and_substance,
    daly_diabetes_urogenital_blood_endocrine, daly_neoplasms, daly_chronic_liver),
  names_to = "diseases", values_to = "value")
#select columns from data set
total_short_ncds <- pivot1%>%
select(continent,diseases,value)
#stacked bar chart
# This stacked bar chart again shows the DALY for non-communicable diseases, adjusted to show data per 100000 population.
# Additionally, the data has been colored to show the different categories of non-communicable diseases.
# Asia has the highest DALY for non-communicable diseases, closely followed by Europe. There are reasons that may explain
# why DALY remains high in both regions. For Asia, the lack of affordability, the lack of doctors, and healthcare that is
# not of the highest standard may all contribute. Due to Europe's aging population, non-communicable diseases are more
# likely to be present among its population. As seen in the graphs earlier, as nations become modern and developed, their
# populations transition from suffering from communicable diseases towards non-communicable diseases, which come with age.
ggplot(total_short_ncds, aes(fill=diseases, y=value, x=continent)) +
geom_bar(position="stack", stat="identity") +
labs(x= "Continent", y = "Non-Comm DALYs", title = "Accumulated Average DALYs per Capita from Non-Comm, per Continent 1980 - 2017")
# Looking into CVS in more detail.
ggplot(total_short, aes(x= continent, y = daly_cvs))+
geom_col()+
labs(x= "Continent", y = "Daly due to CVS related conditions", title = "DALY per capita due to CVS conditions per continent")
# Looking into neoplasms in more detail.
ggplot(total_short, aes(x= continent, y = daly_neoplasms))+
geom_col()+
labs(x= "Continent", y = "Daly due to neoplasm", title = "DALY per capita due to neoplasms per continent")
# Looking into diabetes, urogenital, blood, endocrine in more detail.
ggplot(total_short, aes(x= continent, y = daly_diabetes_urogenital_blood_endocrine))+
geom_col()+
labs(x= "Continent", y = "Daly due to diabetes, urogenital, blood and endocrine related conditions.", title
= "DALY per capita due to diabetes, urogenital, blood and endocrine related conditions per continent")
ggplot(total_short, aes(x= continent, y = daly_mental_and_substance))+
geom_col()+
labs(x= "Continent", y = "Daly due to mental and substance related conditions.", title = "DALY per capita due to mental and substance related conditions per continent")
ggplot(total_short, aes(x = healthcare_gdp_rate, y = daly_ncds/100000, color = continent))+
geom_point()+
labs(x= "Healthcare Expenditure as percentage of GDP", y = "DALY from Non-Communicable Diseases", title = "Rates due to Proportion of GDP spent on Healthcare")
total_short%>%
select(daly_cnmnd, healthcare_gdp_rate, gdp, pocket_per_cap)%>%
ggpairs()
1.3 Regression analysis
Although DALY rates are highly complex, with many different societal and economic variables affecting the final values, we decided to look into certain variables that had enough data to be used for our analysis. These variables, which affect both DALY rates by cause and general DALY rates, can be divided into several categories.
Diet habit variables (fruit consumption per capita per year, percentage of animal protein consumption out of total daily calories, vegetable consumption, percentage of population being overweight), healthcare variables (annual healthcare expenditure, out-of-pocket expenditure on healthcare, healthcare expenditure as a share of GDP, and number of physicians per 1,000 people), living habits (smoking percentages), and other demographics (education years).
In addition to these elements, we considered the effect of each continent separately by transforming the continents into dummy variables.
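For reference, R builds such indicator columns automatically when a factor enters a model formula. The sketch below uses a made-up continent vector (not the report's data) to show the encoding: with k levels, k - 1 dummy columns are created, and the remaining level acts as the baseline against which the other coefficients are read.

```r
# Hypothetical continent labels, for illustration only.
continent <- factor(c("Africa", "Asia", "Europe", "Asia", "Oceania", "Africa"))

# model.matrix() expands the factor into 0/1 indicator columns.
# The first level (alphabetically "Africa") gets no column and becomes the baseline.
dummies <- model.matrix(~ continent)
colnames(dummies)
dummies[, "continentAsia"]

# relevel() picks a different baseline if that reads more naturally.
continent2 <- relevel(continent, ref = "Oceania")
colnames(model.matrix(~ continent2))
```

The same logic applies to hand-made dummies: whichever continent has no dummy of its own is absorbed into the intercept and serves as the reference group.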
1.3.0.1 Models 0 and 1
#Transforming continent factors into dummy variables
total=total%>%
mutate(Asia=case_when(total$continent=="Asia"~1,TRUE~0))%>%
mutate(Europe=case_when(total$continent=="Europe"~1,TRUE~0))%>%
mutate(NorthA=case_when(total$continent=="North America"~1,TRUE~0))%>%
mutate(Africa=case_when(total$continent=="Africa"~1,TRUE~0))%>%
mutate(SouthA=case_when(total$continent=="South America"~1,TRUE~0))
Call:
lm(formula = daly_adjusted ~ smoking_percentage + percentage_overweight +
fruit_consumption + vegetable_consumption + animal_protein_consumption +
education_years + physicians_1000 + pocket_per_cap + healthcare_gdp_rate +
daly_ivsa + daly_ncds + daly_cnmnd, data = total, subset = gdp)
Residuals:
Min 1Q Median 3Q Max
-6.602e-11 -4.258e-12 -5.730e-13 2.894e-12 1.220e-10
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.867e-12 1.064e-11 -3.640e-01 0.716590
smoking_percentage -1.016e-10 1.900e-11 -5.346e+00 2.49e-07 ***
percentage_overweight -6.735e-13 1.006e-13 -6.694e+00 2.26e-10 ***
fruit_consumption -7.039e-14 2.012e-14 -3.499e+00 0.000579 ***
vegetable_consumption -1.385e-14 2.245e-14 -6.170e-01 0.537948
animal_protein_consumption 5.153e-12 7.264e-13 7.094e+00 2.35e-11 ***
education_years 1.769e-12 6.270e-13 2.821e+00 0.005277 **
physicians_1000 3.200e-12 1.696e-12 1.887e+00 0.060675 .
pocket_per_cap -2.376e-14 8.464e-15 -2.807e+00 0.005506 **
healthcare_gdp_rate 3.037e-11 1.838e-11 1.652e+00 0.100074
daly_ivsa 1.000e+00 1.045e-15 9.570e+14 < 2e-16 ***
daly_ncds 1.000e+00 3.813e-16 2.623e+15 < 2e-16 ***
daly_cnmnd 1.000e+00 1.157e-16 8.645e+15 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.446e-11 on 195 degrees of freedom
(817 observations deleted due to missingness)
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 3.025e+31 on 12 and 195 DF, p-value: < 2.2e-16
# lm0 was created to show that daly_ivsa, daly_ncds and daly_cnmnd make up daly_adjusted. As a result, these three
# variables are not included in the linear models.
# Note: gdp is passed positionally here, so lm() matches it to its `subset` argument (see `subset = gdp` in the
# Call output) rather than treating it as a predictor.
lm0= lm(daly_adjusted ~ smoking_percentage+ percentage_overweight+ fruit_consumption+ vegetable_consumption+
    animal_protein_consumption+ education_years+ physicians_1000+ pocket_per_cap+ healthcare_gdp_rate +
    daly_ivsa + daly_ncds + daly_cnmnd, gdp, data = total)
summary(lm0)
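The coefficients of exactly 1 and the R-squared of 1 in the lm0 output are what a regression returns whenever the response is an exact sum of some of its predictors. A minimal sketch on synthetic data follows; the data frame toy and its columns are invented for illustration and are not the report's data.

```r
# Synthetic stand-ins for three cause-specific rates (not the report's data).
set.seed(1)
toy <- data.frame(
  ivsa  = runif(50, 1000, 5000),
  ncds  = runif(50, 10000, 30000),
  cnmnd = runif(50, 2000, 40000)
)
# The "total" is constructed as the exact sum of its parts.
toy$daly_total <- toy$ivsa + toy$ncds + toy$cnmnd

toy_fit <- lm(daly_total ~ ivsa + ncds + cnmnd, data = toy)

# Slopes are 1 (up to floating-point error) and the residuals vanish,
# so nothing is left for any other predictor to explain.
round(coef(toy_fit), 6)
max(abs(residuals(toy_fit)))
```

This is why the component DALY variables have to be left out of the later models: with them included, the remaining coefficients are estimated on numerical noise.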
lm1= lm(daly_adjusted ~ smoking_percentage+ percentage_overweight+ fruit_consumption+ vegetable_consumption+
    animal_protein_consumption+ education_years+ physicians_1000+ pocket_per_cap+ healthcare_gdp_rate + gdp,
    data = total)
summary(lm1)
Call:
lm(formula = daly_adjusted ~ smoking_percentage + percentage_overweight +
fruit_consumption + vegetable_consumption + animal_protein_consumption +
education_years + physicians_1000 + pocket_per_cap + healthcare_gdp_rate +
gdp, data = total)
Residuals:
Min 1Q Median 3Q Max
-25122 -6233 -812 4866 59542
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.921e+04 1.894e+03 41.814 < 2e-16 ***
smoking_percentage -2.391e+04 6.139e+03 -3.894 0.000105 ***
percentage_overweight -2.590e+02 3.531e+01 -7.335 4.53e-13 ***
fruit_consumption -5.051e+01 8.655e+00 -5.836 7.19e-09 ***
vegetable_consumption -3.828e+01 6.980e+00 -5.484 5.26e-08 ***
animal_protein_consumption -1.799e+03 2.719e+02 -6.615 6.00e-11 ***
education_years -1.270e+03 2.027e+02 -6.268 5.41e-10 ***
physicians_1000 7.964e+02 4.986e+02 1.597 0.110538
pocket_per_cap -1.011e+01 2.508e+00 -4.031 5.96e-05 ***
healthcare_gdp_rate 6.335e+03 6.803e+03 0.931 0.351968
gdp 6.325e-02 3.290e-02 1.923 0.054808 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 11340 on 1014 degrees of freedom
Multiple R-squared: 0.6371, Adjusted R-squared: 0.6335
F-statistic: 178 on 10 and 1014 DF, p-value: < 2.2e-16
Already from model one we reach an adjusted R-squared of 0.6335, meaning these factors can explain approximately 63 percent of the variation in general DALY rates. For the models below, the variable with the highest p-value was dropped one at a time.
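The adjusted figure can be reproduced from the summary output above. With n = 1025 observations (1014 residual degrees of freedom plus 11 estimated coefficients) and p = 10 predictors, the standard adjustment gives:

```latex
\bar{R}^2 \;=\; 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
          \;=\; 1 - (1 - 0.6371)\,\frac{1024}{1014}
          \;\approx\; 0.6335
```

The penalty is small here because n is large relative to p, which is why R-squared and adjusted R-squared differ only in the third decimal place.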
lm2 = lm( daly_adjusted~smoking_percentage+ percentage_overweight+ vegetable_consumption+ animal_protein_consumption+
education_years+ physicians_1000+ pocket_per_cap+ fruit_consumption + gdp, data = total)
summary(lm2)
Call:
lm(formula = daly_adjusted ~ smoking_percentage + percentage_overweight +
vegetable_consumption + animal_protein_consumption + education_years +
physicians_1000 + pocket_per_cap + fruit_consumption + gdp,
data = total)
Residuals:
Min 1Q Median 3Q Max
-25123 -6213 -846 4897 59243
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.034e+04 1.453e+03 55.295 < 2e-16 ***
smoking_percentage -2.365e+04 6.132e+03 -3.857 0.000122 ***
percentage_overweight -2.620e+02 3.516e+01 -7.452 1.96e-13 ***
vegetable_consumption -3.893e+01 6.944e+00 -5.606 2.66e-08 ***
animal_protein_consumption -1.852e+03 2.657e+02 -6.970 5.68e-12 ***
education_years -1.267e+03 2.027e+02 -6.254 5.90e-10 ***
physicians_1000 9.062e+02 4.845e+02 1.871 0.061701 .
pocket_per_cap -1.008e+01 2.508e+00 -4.018 6.29e-05 ***
fruit_consumption -5.072e+01 8.651e+00 -5.863 6.16e-09 ***
gdp 5.507e-02 3.170e-02 1.737 0.082661 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 11340 on 1015 degrees of freedom
Multiple R-squared: 0.6368, Adjusted R-squared: 0.6335
F-statistic: 197.7 on 9 and 1015 DF, p-value: < 2.2e-16
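Dropping healthcare_gdp_rate leaves the adjusted R-squared unchanged at 0.6335. Out-of-pocket expenditure remains significant, and physicians_1000 becomes marginally significant (p = 0.062, compared with 0.111 in model one).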
Call:
lm(formula = daly_adjusted ~ smoking_percentage + percentage_overweight +
vegetable_consumption + animal_protein_consumption + education_years +
pocket_per_cap + fruit_consumption + physicians_1000, data = total)
Residuals:
Min 1Q Median 3Q Max
-25203 -6215 -447 4669 59274
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 79726.798 1410.663 56.517 < 2e-16 ***
smoking_percentage -24292.378 6127.035 -3.965 7.86e-05 ***
percentage_overweight -266.174 35.115 -7.580 7.76e-14 ***
vegetable_consumption -40.473 6.894 -5.871 5.87e-09 ***
animal_protein_consumption -1741.370 258.204 -6.744 2.58e-11 ***
education_years -1222.730 201.228 -6.076 1.74e-09 ***
pocket_per_cap -7.568 2.052 -3.688 0.000238 ***
fruit_consumption -47.366 8.442 -5.611 2.59e-08 ***
physicians_1000 859.752 484.202 1.776 0.076097 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 11350 on 1016 degrees of freedom
Multiple R-squared: 0.6357, Adjusted R-squared: 0.6328
F-statistic: 221.6 on 8 and 1016 DF, p-value: < 2.2e-16
1.3.0.2 Drop physicians_1000
lm3=lm( daly_adjusted~smoking_percentage+ percentage_overweight+ vegetable_consumption+ animal_protein_consumption+
    education_years+ pocket_per_cap+ fruit_consumption + physicians_1000, data = total)
summary(lm3)
Call:
lm(formula = daly_adjusted ~ smoking_percentage + percentage_overweight +
vegetable_consumption + animal_protein_consumption + education_years +
pocket_per_cap + fruit_consumption, data = total)
Residuals:
Min 1Q Median 3Q Max
-25602 -6210 -408 4778 59363
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 78595.709 1259.974 62.379 < 2e-16 ***
smoking_percentage -21927.213 5986.815 -3.663 0.000263 ***
percentage_overweight -256.075 34.687 -7.382 3.23e-13 ***
vegetable_consumption -37.819 6.737 -5.613 2.56e-08 ***
animal_protein_consumption -1623.099 249.729 -6.499 1.26e-10 ***
education_years -1107.609 190.699 -5.808 8.44e-09 ***
pocket_per_cap -6.709 1.996 -3.361 0.000806 ***
fruit_consumption -49.239 8.384 -5.873 5.80e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 11360 on 1017 degrees of freedom
Multiple R-squared: 0.6346, Adjusted R-squared: 0.632
F-statistic: 252.3 on 7 and 1017 DF, p-value: < 2.2e-16
All variables are now significant, leading to a model with 0.632 as its adjusted R-squared.
1.3.1 Stepwise regression & VIF exam
We can also use stepwise regression to find the optimal model. The stepwise method is more systematic than dropping variables manually, since it allows dropped variables to be added back in later steps if doing so improves the model (lowers the model’s AIC), and it also allows significance to be examined after variables are added or dropped.
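As an illustration of the procedure, the sketch below runs step() on R's built-in mtcars data rather than the report's data; the formula is arbitrary. It also shows why AIC values printed by step() are on a different scale from those returned by AIC(): step() relies on extractAIC(), which drops constants of the Gaussian log-likelihood. For a linear model the two differ by n(1 + log 2π) + 2, so the ranking of models fitted to the same data is unaffected.

```r
# Built-in data, used only to demonstrate the mechanics.
full <- lm(mpg ~ wt + hp + qsec + drat + disp, data = mtcars)

# Bidirectional search: at each step a term may be dropped or re-added,
# and the move with the lowest AIC is taken until no move improves it.
reduced <- step(full, direction = "both", trace = 0)
formula(reduced)

# step() reports extractAIC(); AIC() uses the full log-likelihood.
n <- nrow(mtcars)
aic_step <- extractAIC(full)[2]
aic_full <- AIC(full)
aic_full - aic_step          # constant offset between the two scales
n * (1 + log(2 * pi)) + 2    # the same value, computed directly
```

Because the offset depends only on n, AIC values from the two functions should never be compared with each other, only within the same scale.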
lm4=lm( daly_adjusted~smoking_percentage+ percentage_overweight+ vegetable_consumption+ animal_protein_consumption+
    education_years+ pocket_per_cap+ fruit_consumption, data = total)
summary(lm4)
fit1_step=step(lm1,direction="both")
Start: AIC=19149.68
daly_adjusted ~ smoking_percentage + percentage_overweight +
fruit_consumption + vegetable_consumption + animal_protein_consumption +
education_years + physicians_1000 + pocket_per_cap + healthcare_gdp_rate +
gdp
Df Sum of Sq RSS AIC
- healthcare_gdp_rate 1 111483768 1.3048e+11 19149
<none> 1.3036e+11 19150
- physicians_1000 1 327963566 1.3069e+11 19150
- gdp 1 475232954 1.3084e+11 19151
- smoking_percentage 1 1949782662 1.3231e+11 19163
- pocket_per_cap 1 2089322026 1.3245e+11 19164
- vegetable_consumption 1 3866128944 1.3423e+11 19178
- fruit_consumption 1 4378585222 1.3474e+11 19182
- education_years 1 5050374107 1.3541e+11 19187
- animal_protein_consumption 1 5625702511 1.3599e+11 19191
- percentage_overweight 1 6917422353 1.3728e+11 19201
Step: AIC=19148.55
daly_adjusted ~ smoking_percentage + percentage_overweight +
fruit_consumption + vegetable_consumption + animal_protein_consumption +
education_years + physicians_1000 + pocket_per_cap + gdp
Df Sum of Sq RSS AIC
<none> 1.3048e+11 19149
- gdp 1 387923531 1.3086e+11 19150
+ healthcare_gdp_rate 1 111483768 1.3036e+11 19150
- physicians_1000 1 449760026 1.3093e+11 19150
- smoking_percentage 1 1912380306 1.3239e+11 19162
- pocket_per_cap 1 2075803261 1.3255e+11 19163
- vegetable_consumption 1 4040506535 1.3452e+11 19178
- fruit_consumption 1 4418260882 1.3489e+11 19181
- education_years 1 5027016883 1.3550e+11 19185
- animal_protein_consumption 1 6245781148 1.3672e+11 19194
- percentage_overweight 1 7139008696 1.3761e+11 19201
Call:
lm(formula = daly_adjusted ~ smoking_percentage + percentage_overweight +
fruit_consumption + vegetable_consumption + animal_protein_consumption +
education_years + physicians_1000 + pocket_per_cap + gdp,
data = total)
Residuals:
Min 1Q Median 3Q Max
-25123 -6213 -846 4897 59243
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.034e+04 1.453e+03 55.295 < 2e-16 ***
smoking_percentage -2.365e+04 6.132e+03 -3.857 0.000122 ***
percentage_overweight -2.620e+02 3.516e+01 -7.452 1.96e-13 ***
fruit_consumption -5.072e+01 8.651e+00 -5.863 6.16e-09 ***
vegetable_consumption -3.893e+01 6.944e+00 -5.606 2.66e-08 ***
animal_protein_consumption -1.852e+03 2.657e+02 -6.970 5.68e-12 ***
education_years -1.267e+03 2.027e+02 -6.254 5.90e-10 ***
physicians_1000 9.062e+02 4.845e+02 1.871 0.061701 .
pocket_per_cap -1.008e+01 2.508e+00 -4.018 6.29e-05 ***
gdp 5.507e-02 3.170e-02 1.737 0.082661 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 11340 on 1015 degrees of freedom
Multiple R-squared: 0.6368, Adjusted R-squared: 0.6335
F-statistic: 197.7 on 9 and 1015 DF, p-value: < 2.2e-16
summary(fit1_step)
smoking_percentage percentage_overweight
1.668205 2.913117
fruit_consumption vegetable_consumption
1.425008 1.502601
animal_protein_consumption education_years
3.110474 3.088943
physicians_1000 pocket_per_cap
3.754590 3.210832
gdp
2.999374
From the final result we can see that six variables are significant with a p-value lower than 0.1. Expense and pocket_per_cap are both significant in this case. However, dropping one of them may lead to insignificance of the other. This could be because these two have a joint effect on the burden of disease. We can choose from these two models according to our confidence interval.
Continents were also considered as part of the model to see their effect.
Call:
lm(formula = daly_adjusted ~ percentage_overweight + vegetable_consumption +
animal_protein_consumption + education_years + pocket_per_cap +
fruit_consumption + Asia + Africa + NorthA + Europe + SouthA +
Asia + Africa + NorthA + Europe + SouthA, data = total)
Residuals:
Min 1Q Median 3Q Max
-29935 -4148 -431 3996 50866
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 66522.998 2345.279 28.365 < 2e-16 ***
percentage_overweight -202.369 33.403 -6.058 1.94e-09 ***
vegetable_consumption -33.863 6.185 -5.475 5.51e-08 ***
animal_protein_consumption -1030.842 209.884 -4.911 1.05e-06 ***
education_years -534.613 165.120 -3.238 0.00124 **
pocket_per_cap -8.669 1.678 -5.168 2.86e-07 ***
fruit_consumption -40.857 6.824 -5.987 2.96e-09 ***
Asia -7140.437 1807.499 -3.950 8.34e-05 ***
Africa 13792.577 1918.746 7.188 1.27e-12 ***
NorthA -9335.463 1794.310 -5.203 2.38e-07 ***
Europe -5196.987 1650.294 -3.149 0.00169 **
SouthA -9146.724 1917.915 -4.769 2.12e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 9298 on 1013 degrees of freedom
Multiple R-squared: 0.7562, Adjusted R-squared: 0.7535
F-statistic: 285.6 on 11 and 1013 DF, p-value: < 2.2e-16
vif(fit1_step)
fit = lm(daly_adjusted~ percentage_overweight+ vegetable_consumption+ animal_protein_consumption+ education_years+
    pocket_per_cap+ fruit_consumption+Asia+Africa+NorthA+Europe+SouthA+ Asia+ Africa+ NorthA+ Europe+ SouthA,
    data = total)
print(summary(fit))
print(vif(fit))
percentage_overweight vegetable_consumption
3.908936 1.772149
animal_protein_consumption education_years
2.885232 3.048594
pocket_per_cap fruit_consumption
2.136245 1.318169
Asia Africa
7.257346 6.590683
NorthA Europe
3.209642 7.556342
SouthA
2.440735
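A variance inflation factor measures how well one predictor is explained by the others: VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the remaining predictors. The sketch below computes this by hand on the built-in mtcars data (not the report's data); the helper manual_vif is a hypothetical name introduced for illustration, not a function from the package used in the report.

```r
# Hypothetical helper: VIF for every predictor of a fitted lm, from first principles.
manual_vif <- function(fit) {
  X <- model.matrix(fit)[, -1, drop = FALSE]   # drop the intercept column
  sapply(colnames(X), function(j) {
    # Regress predictor j on all other predictors and take that R-squared.
    others <- X[, colnames(X) != j, drop = FALSE]
    r2 <- summary(lm(X[, j] ~ others))$r.squared
    1 / (1 - r2)
  })
}

fit_cars <- lm(mpg ~ wt + hp + disp, data = mtcars)
round(manual_vif(fit_cars), 2)
```

Values near 1 indicate little collinearity, and common rules of thumb flag values above 5 or 10. Read this way, the Asia and Europe dummies (about 7.3 and 7.6 in the output above) are the terms most entangled with the rest of the model.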
1.3.2 Interpretation of the final model
Our final model had 11 variables.
Call:
lm(formula = daly_adjusted ~ percentage_overweight + vegetable_consumption +
animal_protein_consumption + education_years + pocket_per_cap +
fruit_consumption + Asia + Africa + NorthA + Europe + SouthA +
Asia + Africa + NorthA + Europe + SouthA, data = total)
Residuals:
Min 1Q Median 3Q Max
-29935 -4148 -431 3996 50866
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 66522.998 2345.279 28.365 < 2e-16 ***
percentage_overweight -202.369 33.403 -6.058 1.94e-09 ***
vegetable_consumption -33.863 6.185 -5.475 5.51e-08 ***
animal_protein_consumption -1030.842 209.884 -4.911 1.05e-06 ***
education_years -534.613 165.120 -3.238 0.00124 **
pocket_per_cap -8.669 1.678 -5.168 2.86e-07 ***
fruit_consumption -40.857 6.824 -5.987 2.96e-09 ***
Asia -7140.437 1807.499 -3.950 8.34e-05 ***
Africa 13792.577 1918.746 7.188 1.27e-12 ***
NorthA -9335.463 1794.310 -5.203 2.38e-07 ***
Europe -5196.987 1650.294 -3.149 0.00169 **
SouthA -9146.724 1917.915 -4.769 2.12e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 9298 on 1013 degrees of freedom
Multiple R-squared: 0.7562, Adjusted R-squared: 0.7535
F-statistic: 285.6 on 11 and 1013 DF, p-value: < 2.2e-16
percentage_overweight vegetable_consumption
3.908936 1.772149
animal_protein_consumption education_years
2.885232 3.048594
pocket_per_cap fruit_consumption
2.136245 1.318169
Asia Africa
7.257346 6.590683
NorthA Europe
3.209642 7.556342
SouthA
2.440735
continent_fit=fit
summary(continent_fit)
vif(continent_fit)
huxtable::huxreg(lm1,lm2,lm3,lm4, continent_fit,
number_format = "%.2f")
                            (1)             (2)             (3)             (4)             (5)
(Intercept)                 79209.24 ***    80341.02 ***    79726.80 ***    78595.71 ***    66523.00 ***
                            (1894.33)       (1452.94)       (1410.66)       (1259.97)       (2345.28)
smoking_percentage          -23905.42 ***   -23651.69 ***   -24292.38 ***   -21927.21 ***
                            (6138.51)       (6132.06)       (6127.03)       (5986.82)
percentage_overweight       -259.02 ***     -262.03 ***     -266.17 ***     -256.08 ***     -202.37 ***
                            (35.31)         (35.16)         (35.11)         (34.69)         (33.40)
fruit_consumption           -50.51 ***      -50.72 ***      -47.37 ***      -49.24 ***      -40.86 ***
                            (8.65)          (8.65)          (8.44)          (8.38)          (6.82)
vegetable_consumption       -38.28 ***      -38.93 ***      -40.47 ***      -37.82 ***      -33.86 ***
                            (6.98)          (6.94)          (6.89)          (6.74)          (6.18)
animal_protein_consumption  -1798.59 ***    -1852.18 ***    -1741.37 ***    -1623.10 ***    -1030.84 ***
                            (271.90)        (265.72)        (258.20)        (249.73)        (209.88)
education_years             -1270.47 ***    -1267.36 ***    -1222.73 ***    -1107.61 ***    -534.61 **
                            (202.70)        (202.66)        (201.23)        (190.70)        (165.12)
physicians_1000             796.40          906.18          859.75
                            (498.63)        (484.46)        (484.20)
pocket_per_cap              -10.11 ***      -10.08 ***      -7.57 ***       -6.71 ***       -8.67 ***
                            (2.51)          (2.51)          (2.05)          (2.00)          (1.68)
healthcare_gdp_rate         6334.55
                            (6802.52)
gdp                         0.06            0.06
                            (0.03)          (0.03)
Asia                                                                                        -7140.44 ***
                                                                                            (1807.50)
Africa                                                                                      13792.58 ***
                                                                                            (1918.75)
NorthA                                                                                      -9335.46 ***
                                                                                            (1794.31)
Europe                                                                                      -5196.99 **
                                                                                            (1650.29)
SouthA                                                                                      -9146.72 ***
                                                                                            (1917.91)
N                           1025            1025            1025            1025            1025
R2                          0.64            0.64            0.64            0.63            0.76
logLik                      -11018.25       -11018.69       -11020.21       -11021.80       -10814.42
AIC                         22060.50        22059.38        22060.42        22061.60        21654.84
*** p < 0.001; ** p < 0.01; * p < 0.05. Standard errors in parentheses.
actual predicted
actual 1.0000000 0.8572182
predicted 0.8572182 1.0000000
From the 5 models, continent_fit was chosen as the final model because all of its variables are significant and it has the highest R-squared (0.76). As can be seen from our predictions, the DALY rates the model predicts for 2013 have a correlation of 0.857 with the actual rates.
best_model <- continent_fit
#Part 2: We wanted to test the prediction efficacy of our model by ensuring that it was able to predict, with a certain
# level of confidence, the DALYs for the last full year of data (2013)
train <- total %>%
  filter(period<2013)
# Hold-out set; named `test` so that it does not shadow the predict() function
test <- total %>%
  filter(period == 2013)
# Refit the same formula on the training years only
continent_fit2 <- lm(formula(continent_fit), data = train)
final_prediction <- predict(continent_fit2, newdata = test)
ac_pred <- data.frame(cbind(actual = test$daly_adjusted, predicted = final_prediction))
correlation_accuracy <- cor(ac_pred)
correlation_accuracy
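A correlation between actual and predicted values measures how closely the two move together, not the share of predictions that are correct, so it helps to report it alongside an error measure in the outcome's own units. Below is a minimal sketch of the same train-then-predict pattern, using R's built-in airquality data instead of the report's data (Temp, Wind and Month are columns of that dataset, chosen arbitrarily for illustration):

```r
# Built-in airquality data stands in for the report's panel; Month plays the role of period.
train_aq <- subset(airquality, Month < 9)
test_aq  <- subset(airquality, Month == 9)

fit_aq <- lm(Temp ~ Wind, data = train_aq)
pred   <- predict(fit_aq, newdata = test_aq)

r    <- cor(test_aq$Temp, pred)                  # linear association only
r2   <- r^2                                      # share of variance tracked by the predictions
rmse <- sqrt(mean((test_aq$Temp - pred)^2))      # typical error, in units of Temp
c(correlation = r, r_squared = r2, rmse = rmse)
```

A high correlation can coexist with a large RMSE when predictions are systematically too high or too low, which is why both are worth looking at.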

More Related Content

What's hot

Socio economic differentials in health care seeking behaviour and out-of-pock...
Socio economic differentials in health care seeking behaviour and out-of-pock...Socio economic differentials in health care seeking behaviour and out-of-pock...
Socio economic differentials in health care seeking behaviour and out-of-pock...Alexander Decker
 
Catastrophic health expenditure and poverty and Malawi by Martina Rhino Mchenga
Catastrophic health expenditure and poverty and Malawi by Martina Rhino MchengaCatastrophic health expenditure and poverty and Malawi by Martina Rhino Mchenga
Catastrophic health expenditure and poverty and Malawi by Martina Rhino MchengaIFPRIMaSSP
 
Addressing health equity &amp; the risk in providing care
Addressing health equity &amp; the risk in providing careAddressing health equity &amp; the risk in providing care
Addressing health equity &amp; the risk in providing careEvan Osborne
 
Zarone,_Jordan_MPH_Essay(edited)
Zarone,_Jordan_MPH_Essay(edited)Zarone,_Jordan_MPH_Essay(edited)
Zarone,_Jordan_MPH_Essay(edited)Jordan Zarone
 
Utah’s All Payer Claims Dataset: A vital resource for health reform
Utah’s All Payer Claims Dataset: A  vital resource for health reformUtah’s All Payer Claims Dataset: A  vital resource for health reform
Utah’s All Payer Claims Dataset: A vital resource for health reformState of Utah, Salt Lake City
 
AIDSTAR-One Issue Paper: The Debilitating Cycle of HIV, Food Insecurity, and ...
AIDSTAR-One Issue Paper: The Debilitating Cycle of HIV, Food Insecurity, and ...AIDSTAR-One Issue Paper: The Debilitating Cycle of HIV, Food Insecurity, and ...
AIDSTAR-One Issue Paper: The Debilitating Cycle of HIV, Food Insecurity, and ...AIDSTAROne
 
Cancer Mortality in WI Presentation
Cancer Mortality in WI PresentationCancer Mortality in WI Presentation
Cancer Mortality in WI PresentationCallie Fohrman
 
Catastrohpic out-of-pocket payment for health care and its impact on househol...
Catastrohpic out-of-pocket payment for health care and its impact on househol...Catastrohpic out-of-pocket payment for health care and its impact on househol...
Catastrohpic out-of-pocket payment for health care and its impact on househol...Jeff Knezovich
 
3 dimensions of care for diabetes June 2015
The Burden of Disease: Data analysis, interpretation and linear regression

  • 1. 22/02/2021 CM30_GroupProject_SG30 file:///Users/Aman/Downloads/The Burden of Disease Code.html 1/39 CM30_GroupProject_SG30 Team 30 2021-02-14 1 Burden of Disease Mortality rates are a common method used to assess a population’s health. Often used rates for such assessment include child mortality or life expectancy. However, a focus on mortality neglects the suffering caused to people who still live with the disease. A disease impacts, in a direct or indirect manner, the ability of living a normal life. Potential contributions to one’s community, work, or nation, are often lost. Our study, therefore, seeks to understand the magnitude of the burden of diseases by the different disease types, as well as identify factors that amplify such effects. The metric that will be used to measure disease burden is called DALY, which stands for Disability Adjusted Life Years. This metric includes the sum of mortality and morbidity. One DALY stands for 1 year loss in good health due to either premature death, disease, or disability. 
1.1 Data import and inspection 1.1.0.1 Importing data for overall disease burden (DALY) Rows: 48,698 Columns: 7 $ entity <chr> … $ code <chr> … $ year <dbl> … $ total_population_gapminder_hyde_un <dbl> … $ continent <chr> … $ health_expenditure_per_capita_current_us <dbl> … $ dal_ys_disability_adjusted_life_years_all_causes_sex_both_age_age_standardized_rate <dbl> … Code Hide #source: https://ourworldindata.org/burden-of-disease # Reading first file daly_total <- read_csv(here::here('Data',"disease-burden-vs-health-expenditure-per-capita.csv")) %>% clean_names() # Checking for variable types glimpse(daly_total) Hide # Changing variable names and variable types daly_total<- daly_total %>% mutate( location=as.factor(entity), period=year, health_expenditure_per_capita=health_expenditure_per_capita_current_us, daly_adjusted=dal_ys_disability_adjusted_life_years_all_causes_sex_both_age_age_standardized_rate, total_population = total_population_gapminder_hyde_un) %>% select(location,period,daly_adjusted,health_expenditure_per_capita,total_population) 1 Burden of Disease
  • 2. 22/02/2021 CM30_GroupProject_SG30 file:///Users/Aman/Downloads/The Burden of Disease Code.html 2/39 Although important as a whole, DALY rates can futher be divided into 3 sub-categories of disease cause; these being: communicable diseases, non-communicable diseases, and injuries. We, therefore, included the datasets for each individual subcategory below. 1.1.0.2 Adding data for burden of non-communicable diseases Rows: 6,468 Columns: 4 $ entity <chr> … $ code <chr> … $ year <dbl> … $ dal_ys_disability_adjusted_life_years_non_communicable_diseases_sex_both_age_age_standardized_rate <dbl> … Rows: 6,468 Columns: 3 $ location <fct> Afghanistan, Afghanistan, Afghanistan, Afghanistan, Afghani… $ period <dbl> 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999,… $ daly_ncds <dbl> 41145.51, 40587.17, 39644.60, 39821.31, 40641.76, 40790.73,… 1.1.0.3 Adding data for burden from communicable, neonatal, maternal and nutritional diseases Rows: 6,468 Columns: 4 $ entity <chr> … $ code <chr> … $ year <dbl> … $ dal_ys_disability_adjusted_life_years_communicable_maternal_neonatal_and_nutritional_diseases_sex_both_age_age_stan dardized_rate <dbl> … Hide #source:https://ourworldindata.org/burden-of-disease #Reading the file ncds <- read_csv(here::here('Data',"burden-of-disease-rates-from-ncds.csv")) %>% clean_names() # Checking for variable types glimpse(ncds) # Changing variable names and variable types ncds<- ncds %>% mutate(location=as.factor(entity), period=year, daly_ncds=dal_ys_disability_adjusted_life_years_non_communicable_diseases_sex_both_age_age_standardized_rat e) %>% select(location,period,daly_ncds) glimpse(ncds) #Merging data frames total <- merge(daly_total,ncds,by=c("location","period")) Hide #source:https://ourworldindata.org/burden-of-disease #Reading the file cnmnd <- read_csv(here::here('Data',"burden-of-disease-rates-from-communicable-neonatal-maternal-nutritional-disease s.csv")) %>% clean_names() # Checking for variable types glimpse(cnmnd) Hide
  • 3. 22/02/2021 CM30_GroupProject_SG30 file:///Users/Aman/Downloads/The Burden of Disease Code.html 3/39 Rows: 6,468 Columns: 3 $ location <fct> Afghanistan, Afghanistan, Afghanistan, Afghanistan, Afghan… $ period <dbl> 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999… $ daly_cnmnd <dbl> 51181.84, 47263.29, 38908.25, 36882.69, 38809.79, 38262.20… 1.1.0.4 Adding data for burden from injuries, violence, self-harm and accidents Rows: 6,468 Columns: 4 $ entity <chr> … $ code <chr> … $ year <dbl> … $ dal_ys_disability_adjusted_life_years_injuries_sex_both_age_age_standardized_rate <dbl> … Rows: 6,468 Columns: 3 $ location <fct> Afghanistan, Afghanistan, Afghanistan, Afghanistan, Afghani… $ period <dbl> 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999,… $ daly_ivsa <dbl> 11775.715, 13390.289, 12365.622, 11530.363, 13546.148, 1238… Within each of the 3 sub-categories of disease causes, there are speci c diseases that classify as such. We included all categories in our dataset. 
1.1.0.5 Adding data for disease burden by cause (DALY by cause) # Changing variable names and variable types cnmnd<- cnmnd %>% mutate(location=as.factor(entity), period=year, daly_cnmnd=dal_ys_disability_adjusted_life_years_communicable_maternal_neonatal_and_nutritional_diseases_sex _both_age_age_standardized_rate) %>% select(location,period,daly_cnmnd) glimpse(cnmnd) Hide #Merging data frames total <- merge(total,cnmnd,by=c("location","period")) Hide #source:https://ourworldindata.org/burden-of-disease #Reading the file ivsa <- read_csv(here::here('Data',"burden-of-disease-rates-from-injuries.csv")) %>% clean_names() # Checking for variable types glimpse(ivsa) Hide # Changing variable names and variable types ivsa<- ivsa %>% mutate(location=as.factor(entity), period=year, daly_ivsa=dal_ys_disability_adjusted_life_years_injuries_sex_both_age_age_standardized_rate) %>% select(location,period,daly_ivsa) glimpse(ivsa) Hide #Merging data frames total <- merge(total,ivsa,by=c("location","period"))
  • 4. 22/02/2021 CM30_GroupProject_SG30 file:///Users/Aman/Downloads/The Burden of Disease Code.html 4/39 Aside from the main variables, additional variables that may be contributing to the nal effect of DALY rates were included in the dataset. 1.1.0.6 Adding data for GDP per capita Hide #source: https://ourworldindata.org/burden-of-disease # Reading second file daly_by_cause <- read_csv(here::here('Data',"burden-of-disease-by-cause.csv")) %>% clean_names() # Checking for variable types #glimpse(daly_by_cause) # Changing variable names and variable types daly_by_cause <- daly_by_cause %>% mutate( location=as.factor(entity), period=year, daly_conflict_terrorism=dal_ys_disability_adjusted_life_years_conflict_and_terrorism_sex_both_age_all_ages_numbe r, daly_hiv_tuberculosis=dal_ys_disability_adjusted_life_years_hiv_aids_and_tuberculosis_sex_both_age_all_ages_numbe r, daly_diahrrea_respiratory=dal_ys_disability_adjusted_life_years_diarrhea_lower_respiratory_and_other_common_infec tious_diseases_sex_both_age_all_ages_number, daly_cvs=dal_ys_disability_adjusted_life_years_cardiovascular_diseases_sex_both_age_all_ages_number, daly_self_harm=dal_ys_disability_adjusted_life_years_self_harm_sex_both_age_all_ages_number, daly_violence=dal_ys_disability_adjusted_life_years_interpersonal_violence_sex_both_age_all_ages_number, daly_nutritional_deficiencies=dal_ys_disability_adjusted_life_years_nutritional_deficiencies_sex_both_age_all_age s_number, daly_transport_injuries=dal_ys_disability_adjusted_life_years_transport_injuries_sex_both_age_all_ages_number, daly_unintentional_injuries=dal_ys_disability_adjusted_life_years_unintentional_injuries_sex_both_age_all_ages_nu mber, daly_maternal_disorders=dal_ys_disability_adjusted_life_years_maternal_disorders_sex_both_age_all_ages_number, daly_neonatal_disorders=dal_ys_disability_adjusted_life_years_neonatal_disorders_sex_both_age_all_ages_number, 
daly_other_communicable=dal_ys_disability_adjusted_life_years_other_communicable_maternal_neonatal_and_nutritiona l_diseases_sex_both_age_all_ages_number, daly_nature_forces=dal_ys_disability_adjusted_life_years_exposure_to_forces_of_nature_sex_both_age_all_ages_numbe r, daly_chronic_respiratory=dal_ys_disability_adjusted_life_years_chronic_respiratory_diseases_sex_both_age_all_ages _number, daly_chronic_liver=dal_ys_disability_adjusted_life_years_cirrhosis_and_other_chronic_liver_diseases_sex_both_age_ all_ages_number, daly_digestive=dal_ys_disability_adjusted_life_years_digestive_diseases_sex_both_age_all_ages_number, daly_tropical_and_malaria=dal_ys_disability_adjusted_life_years_neglected_tropical_diseases_and_malaria_sex_both_ age_all_ages_number, daly_musculoskeletal=dal_ys_disability_adjusted_life_years_musculoskeletal_disorders_sex_both_age_all_ages_numbe r, daly_other_non_communicable=dal_ys_disability_adjusted_life_years_other_non_communicable_diseases_sex_both_age_al l_ages_number, daly_neurological=dal_ys_disability_adjusted_life_years_neurological_disorders_sex_both_age_all_ages_number, daly_mental_and_substance=dal_ys_disability_adjusted_life_years_mental_and_substance_use_disorders_sex_both_age_a ll_ages_number, daly_diabetes_urogenital_blood_endocrine=dal_ys_disability_adjusted_life_years_diabetes_urogenital_blood_and_endo crine_diseases_sex_both_age_all_ages_number, daly_neoplasms=dal_ys_disability_adjusted_life_years_neoplasms_sex_both_age_all_ages_number)%>% select(location, period,daly_conflict_terrorism,daly_hiv_tuberculosis,daly_diahrrea_respiratory,daly_cvs,daly_self_ harm,daly_violence,daly_nutritional_deficiencies,daly_transport_injuries,daly_unintentional_injuries,daly_mat ernal_disorders,daly_neonatal_disorders,daly_other_communicable,daly_nature_forces,daly_chronic_respiratory,d aly_chronic_liver,daly_digestive,daly_tropical_and_malaria,daly_musculoskeletal,daly_other_non_communicable,d 
aly_neurological,daly_mental_and_substance,daly_diabetes_urogenital_blood_endocrine,daly_neoplasms) #glimpse(daly_by_cause) # Merging dataframes total <- merge(total,daly_by_cause,by=c("location","period")) #We will consider taking out health expenditure per capita since it has a complete rate of 57.4% and may distort the final data. Hide
#source: https://data.worldbank.org/indicator/NY.GDP.PCAP.CD
# Reading third file
gdp <- read_csv(here::here('Data',"API_NY.GDP.PCAP.CD_DS2_en_csv_v2_1926744.csv"),skip=3) %>%
  clean_names()
# Checking for variable types
glimpse(gdp)
Rows: 264
Columns: 66
$ country_name   <chr> "Aruba", "Afghanistan", "Angola", "Albania", "Andorra"…
$ country_code   <chr> "ABW", "AFG", "AGO", "ALB", "AND", "ARB", "ARE", "ARG"…
$ indicator_name <chr> "GDP per capita (current US$)", "GDP per capita (curre…
$ indicator_code <chr> "NY.GDP.PCAP.CD", "NY.GDP.PCAP.CD", "NY.GDP.PCAP.CD", …
$ x1960          <dbl> NA, 59.77319, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1807…
$ x1961          <dbl> NA, 59.86087, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1874…
$ x1962          <dbl> NA, 58.45801, NA, NA, NA, NA, NA, 1155.89017, NA, NA, …
$ x1963          <dbl> NA, 78.70639, NA, NA, NA, NA, NA, 850.30474, NA, NA, N…
$ x1964          <dbl> NA, 82.09523, NA, NA, NA, NA, NA, 1173.23821, NA, NA, …
$ x1965          <dbl> NA, 101.10830, NA, NA, NA, NA, NA, 1279.11343, NA, NA,…
$ x1966          <dbl> NA, 137.59435, NA, NA, NA, NA, NA, 1272.80298, NA, NA,…
$ x1967          <dbl> NA, 160.89859, NA, NA, NA, NA, NA, 1062.54355, NA, NA,…
$ x1968          <dbl> NA, 129.10832, NA, NA, NA, 224.87811, NA, 1141.08048, …
$ x1969          <dbl> NA, 129.32971, NA, NA, NA, 240.03563, NA, 1329.05866, …
$ x1970          <dbl> NA, 156.5189, NA, NA, 3238.5568, 262.8663, NA, 1322.59…
$ x1971          <dbl> NA, 159.56758, NA, NA, 3498.17365, 295.97104, NA, 1372…
$ x1972          <dbl> NA, 135.31731, NA, NA, 4217.17358, 343.56582, NA, 1408…
$ x1973          <dbl> NA, 143.14465, NA, NA, 5342.16856, 423.13508, NA, 2097…
$ x1974          <dbl> NA, 173.65376, NA, NA, 6319.73903, 777.56068, NA, 2844…
$ x1975          <dbl> NA, 186.5109, NA, NA, 7169.1010, 836.2083, 26847.7944,…
$ x1976          <dbl> NA, 197.4455, NA, NA, 7152.3751, 1007.1404, 30118.1378…
$ x1977          <dbl> NA, 224.2248, NA, NA, 7751.3702, 1123.1433, 33823.3196…
$ x1978          <dbl> NA, 247.3541, NA, NA, 9129.7062, 1193.7456, 28456.7374…
$ x1979          <dbl> NA, 275.7382, NA, NA, 11820.8494, 1563.7035, 33512.741…
$ x1980          <dbl> NA, 272.6553, 710.9816, NA, 12377.4116, 2052.9558, 427…
$ x1981          <dbl> NA, 264.1113, 642.3839, NA, 10372.2328, 2050.7698, 449…
$ x1982          <dbl> NA, NA, 619.9614, NA, 9610.2663, 1864.8707, 40026.1663…
$ x1983          <dbl> NA, NA, 623.4406, NA, 8022.6548, 1699.2152, 34843.1029…
$ x1984          <dbl> NA, NA, 637.7152, 639.4847, 7728.9067, 1672.2788, 3230…
$ x1985          <dbl> NA, NA, 758.2376, 639.8659, 7774.3938, 1606.7558, 2972…
$ x1986          <dbl> 6472.5020, NA, 685.2701, 693.8735, 10361.8160, 1489.84…
$ x1987          <dbl> 7885.7965, NA, 756.2619, 674.7934, 12616.1676, 1543.51…
$ x1988          <dbl> 9764.7900, NA, 792.3031, 652.7743, 14304.3570, 1476.04…
$ x1989          <dbl> 11392.4558, NA, 890.5541, 697.9956, 15166.4379, 1505.5…
$ x1990          <dbl> 12307.3117, NA, 947.7042, 617.2304, 18878.5060, 2009.4…
$ x1991          <dbl> 13496.0031, NA, 865.6927, 336.5870, 19532.5402, 1929.6…
$ x1992          <dbl> 14046.5038, NA, 656.3618, 200.8522, 20547.7118, 2027.8…
$ x1993          <dbl> 14936.8272, NA, 441.2007, 367.2792, 16516.4710, 1996.9…
$ x1994          <dbl> 16241.0465, NA, 328.6733, 586.4163, 16234.8090, 1989.4…
$ x1995          <dbl> 16439.3564, NA, 397.1795, 750.6044, 18461.0649, 2072.7…
$ x1996          <dbl> 16586.0684, NA, 522.6438, 1009.9777, 19017.1746, 2235.…
$ x1997          <dbl> 17927.7496, NA, 514.2952, 717.3806, 18353.0597, 2319.0…
$ x1998          <dbl> 19078.3432, NA, 423.5937, 813.7903, 18894.5215, 2188.9…
$ x1999          <dbl> 19356.2034, NA, 387.7843, 1033.2417, 19261.7105, 2331.…
$ x2000          <dbl> 20620.7006, NA, 556.8363, 1126.6833, 21854.2468, 2605.…
$ x2001          <dbl> 20669.0320, NA, 527.3335, 1281.6594, 22971.5355, 2506.…
$ x2002          <dbl> 20436.8871, 179.4266, 872.4945, 1425.1248, 25066.8822,…
$ x2003          <dbl> 20833.7616, 190.6838, 982.9609, 1846.1188, 32271.9639,…
$ x2004          <dbl> 22569.9750, 211.3821, 1255.5640, 2373.5798, 37969.1750…
$ x2005          <dbl> 23300.0396, 242.0313, 1902.4223, 2673.7873, 40066.2569…
$ x2006          <dbl> 24045.2725, 263.7337, 2599.5665, 2972.7433, 42675.8128…
$ x2007          <dbl> 25835.1327, 359.6932, 3121.9956, 3595.0372, 47803.6936…
$ x2008          <dbl> 27084.7037, 364.6607, 4080.9414, 4370.5401, 48718.4969…
$ x2009          <dbl> 24630.4537, 438.0760, 3122.7808, 4114.1401, 43503.1855…
$ x2010          <dbl> 23512.6026, 543.3030, 3587.8838, 4094.3503, 40852.6668…
$ x2011          <dbl> 24985.9933, 591.1628, 4615.4680, 4437.1429, 43335.3289…
$ x2012          <dbl> 24713.6980, 641.8715, 5100.0958, 4247.6300, 38686.4613…
$ x2013          <dbl> 26189.4355, 637.1655, 5254.8823, 4413.0609, 39538.7667…
$ x2014          <dbl> 26647.9381, 613.8567, 5408.4105, 4578.6320, 41303.9294…
$ x2015          <dbl> 27980.8807, 578.4664, 4166.9797, 3952.8012, 35762.5231…
$ x2016          <dbl> 28281.3505, 509.2187, 3506.0729, 4124.0557, 37474.6654…
$ x2017          <dbl> 29007.6930, 519.8848, 4095.8129, 4531.0208, 38962.8804…
$ x2018          <dbl> NA, 493.7504, 3289.6467, 5284.3802, 41793.0553, 6601.8…
$ x2019          <dbl> NA, 507.1034, 2790.7266, 5353.2449, 40886.3912, 6584.7…
$ x2020          <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ x66            <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
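The World Bank file is in wide format, with one `xYYYY` column per year, so it has to be reshaped to one row per country-year before it can be merged on `location` and `period`. The sketch below shows that reshaping in base R on a two-country, two-year excerpt of the values printed above. The names `gdp_wide`, `gdp_long` and `year_cols` are illustrative, and the regular expression that keeps only four-digit year columns (so a stray column such as `x66` is ignored) is an assumption about what is wanted, not the report's own code.

```r
# Two countries and two years taken from the glimpse() output above.
gdp_wide <- data.frame(
  country_name = c("Aruba", "Afghanistan"),
  x2016 = c(28281.3505, 509.2187),
  x2017 = c(29007.6930, 519.8848),
  x66   = c(NA, NA)  # stray all-NA column, as in the raw file
)

# Keep only columns that look like a year ("x" followed by four digits).
year_cols <- grep("^x[0-9]{4}$", names(gdp_wide), value = TRUE)

# Stack the year columns: one row per country-year.
gdp_long <- data.frame(
  location = rep(gdp_wide$country_name, times = length(year_cols)),
  period   = rep(as.integer(sub("^x", "", year_cols)), each = nrow(gdp_wide)),
  gdp      = unlist(gdp_wide[year_cols], use.names = FALSE)
)

gdp_long
```

Converting `period` to a number at this stage matters because the merge key in the other datasets is numeric.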
# Changing variable names and variable types
gdp <- gdp %>%
  gather(year, gdp,-c(country_name, country_code,indicator_name,indicator_code)) %>%
  mutate(location=as.factor(country_name),
         period=readr::parse_number(year)) %>%
  select(location,period,gdp)
# Merging dataframes
total <- merge(total,gdp,by=c("location","period"))
#skim(total)
1.1.0.7 Adding data for smoking percentages
#source: http://ghdx.healthdata.org/record/ihme-data/gbd-2015-smoking-prevalence-1980-2015
#Reading fourth file
smoking_percentage <- read_csv(here::here('Data',"IHME_GBD_2015_SMOKING_PREVALENCE_1980_2015_Y2017M04D05.CSV")) %>%
  clean_names()
# Checking for variable types
#skim(smoking_percentage)
# Changing variable names and variable types
smoking_percentage <- smoking_percentage %>%
  filter(age_group_name=="Age-standardized", metric=="Percent", sex=="Both") %>%
  mutate(location=as.factor(location_name),
         period=year_id,
         smoking_percentage=mean) %>%
  select(location,period,smoking_percentage)
#skim(smoking_percentage)
#Merging data frames
total <- merge(total,smoking_percentage,by=c("location","period"))
1.1.0.8 Adding data for healthcare expenditure per capita
#source:https://ourworldindata.org/grapher/annual-healthcare-expenditure-per-capita?tab=chart&time=1995..2014&region=World
#Reading fifth file
healthcare_expenditure <- read_csv(here::here('Data',"annual-healthcare-expenditure-per-capita.CSV")) %>%
  clean_names()
# Checking for variable types
glimpse(healthcare_expenditure)
Rows: 4,675
Columns: 4
$ entity <chr> "Afghan…
$ code <chr> "AFG", …
$ year <dbl> 2002, 2…
$ health_expenditure_per_capita_ppp_constant_2011_international <dbl> 75.9835…
# Changing variable names and variable types
healthcare_expenditure <- healthcare_expenditure %>%
  mutate(location=as.factor(entity),
         period=year,
         healthcare_expenditure=health_expenditure_per_capita_ppp_constant_2011_international) %>%
  select(location,period,healthcare_expenditure)
#glimpse(healthcare_expenditure)
#Merging data frames
total <- merge(total,healthcare_expenditure,by=c("location","period"))
1.1.0.9 Adding data for percentage of population being overweight
#source: https://ourworldindata.org/obesity
#Reading sixth file
percentage_overweight <- read_csv(here::here('Data',"share-of-adults-who-are-overweight.csv")) %>%
  clean_names()
# Checking for variable types
glimpse(percentage_overweight)
Rows: 8,316
Columns: 4
$ entity <chr> "Afghanistan", "A…
$ code <chr> "AFG", "AFG", "AF…
$ year <dbl> 1975, 1976, 1977,…
$ prevalence_of_overweight_adults_both_sexes_who_2019 <dbl> 5.3, 5.5, 5.7, 5.…
# Changing variable names and variable types
percentage_overweight <- percentage_overweight %>%
  mutate(location=as.factor(entity),
         period=year,
         percentage_overweight=prevalence_of_overweight_adults_both_sexes_who_2019) %>%
  select(location,period,percentage_overweight)
#glimpse(percentage_overweight)
#Merging data frames
total <- merge(total,percentage_overweight,by=c("location","period"))
1.1.0.10 Adding data for fruit consumption per capita
#source: https://ourworldindata.org/diet-compositions
#Reading seventh file
fruit_consumption <- read_csv(here::here('Data',"fruit-consumption-per-capita.csv")) %>%
  clean_names()
# Checking for variable types
glimpse(fruit_consumption)
Rows: 11,028
Columns: 4
$ entity <chr> "Afg…
$ code <chr> "AFG…
$ year <dbl> 1961…
$ fruits_excluding_wine_food_supply_quantity_kg_capita_yr_fao_2020 <dbl> 41.1…
# Changing variable names and variable types
fruit_consumption <- fruit_consumption %>%
  mutate(location=as.factor(entity),
         period=year,
         fruit_consumption=fruits_excluding_wine_food_supply_quantity_kg_capita_yr_fao_2020) %>%
  select(location,period,fruit_consumption)
#glimpse(fruit_consumption)
#Merging data frames
total <- merge(total,fruit_consumption,by=c("location","period"))
1.1.0.11 Adding data for vegetable consumption per capita
#source: https://ourworldindata.org/diet-compositions
#Reading eighth file
vegetable_consumption <- read_csv(here::here('Data',"vegetable-consumption-per-capita.csv")) %>%
  clean_names()
#Checking for variable types
glimpse(vegetable_consumption)
Rows: 11,028
Columns: 4
$ entity <chr> "Afghanistan", …
$ code <chr> "AFG", "AFG", "…
$ year <dbl> 1961, 1962, 196…
$ vegetables_food_supply_quantity_kg_capita_yr_fao_2020 <dbl> 36.75, 37.47, 3…
## Changing variable names and variable types
vegetable_consumption <- vegetable_consumption %>%
  mutate(location=as.factor(entity),
         period=year,
         vegetable_consumption=vegetables_food_supply_quantity_kg_capita_yr_fao_2020) %>%
  select(location,period,vegetable_consumption)
#glimpse(vegetable_consumption)
#Merging dataframes
total <- merge(total,vegetable_consumption,by=c("location","period"))
#skim(total)
1.1.0.12 Adding data for animal based foods consumption per capita
#source: https://ourworldindata.org/diet-compositions
#Reading ninth file
animal_protein_consumption <- read_csv(here::here('Data',"share-of-calories-from-animal-protein-vs-gdp-per-capita.csv")) %>%
  clean_names()
#Checking for variable types
glimpse(animal_protein_consumption)
Rows: 24,472
Columns: 7
$ entity <chr> …
$ code <chr> …
$ year <dbl> …
$ total_population_gapminder <dbl> …
$ continent <chr> …
$ share_of_calories_from_animal_protein_fao_2017 <dbl> …
$ real_gdp_per_capita_in_2011us_2011_benchmark_maddison_project_database_2018 <dbl> …
#Changing variable names and type
animal_protein_consumption <- animal_protein_consumption %>%
  mutate(location=as.factor(entity),
         period=year,
         animal_protein_consumption=share_of_calories_from_animal_protein_fao_2017) %>%
  select(location,period,animal_protein_consumption)
#glimpse(animal_protein_consumption)
#Merging dataframes
total <- merge(total,animal_protein_consumption,by=c("location","period"))
#glimpse(total)
1.1.0.13 Adding data for mean years of schooling
#source: https://ourworldindata.org/global-education
#Reading file
education_years <- read_csv(here::here('Data',"mean-years-of-schooling-1.csv")) %>%
  clean_names()
#Checking for variable types
#glimpse(education_years)
#Changing variable names and type
education_years <- education_years %>%
  mutate(location=as.factor(entity),
         period=year,
         education_years=average_total_years_of_schooling_for_adult_population_lee_lee_2016_barro_lee_2018_and_undp_2018) %>%
  select(location,period,education_years)
#glimpse(education_years)
#Merging dataframes
total <- merge(total,education_years,by=c("location","period"))
1.1.0.14 Adding data for physicians per 1000 people
#source:https://ourworldindata.org/grapher/physicians-per-1000-people
#Reading file
physicians <- read_csv(here::here('Data',"physicians-per-1000-people.csv")) %>%
  clean_names()
#Checking for variable types
#glimpse(physicians)
#Changing variable names and type
physicians <- physicians %>%
  mutate(location=as.factor(entity),
         period=year,
         physicians_1000=physicians_per_1_000_people) %>%
  select(location,period,physicians_1000)
#glimpse(physicians)
#Merging dataframes
total <- merge(total,physicians,by=c("location","period"))
1.1.0.15 Adding data for nurses per 1000 people
#source:https://ourworldindata.org/grapher/nurses-and-midwives-per-1000-people?
#Reading file
nurses <- read_csv(here::here('Data',"nurses-and-midwives-per-1000-people.csv")) %>%
  clean_names()
#Checking for variable types
glimpse(nurses)
Rows: 1,542
Columns: 4
$ entity <chr> "Afghanistan", "Afghanistan", "A…
$ code <chr> "AFG", "AFG", "AFG", "AFG", "AFG…
$ year <dbl> 2005, 2006, 2007, 2008, 2009, 20…
$ nurses_and_midwives_per_1_000_people <dbl> 0.612000, 0.462000, 0.519000, 0.…
The nurses dataset had too few observations. Thus, it was not included in our final dataset.
1.1.0.16 Adding data for out-of-pocket expenditure
#source:https://ourworldindata.org/grapher/out-of-pocket-expenditure-per-capita-on-healthcare
#Reading file
pocket_exp <- read_csv(here::here('Data',"out-of-pocket-expenditure-per-capita-on-healthcare.csv")) %>%
  clean_names()
#Checking for variable types
glimpse(pocket_exp)
Rows: 3,002
Columns: 4
$ entity <chr> …
$ code <chr> …
$ year <dbl> …
$ out_of_pocket_expenditure_per_capita_on_healthcare_ppp_usd_who_global_health_expenditure <dbl> …
#Changing variable names and type
pocket_exp <- pocket_exp %>%
  mutate(location=as.factor(entity),
         period=year,
         pocket_per_cap=out_of_pocket_expenditure_per_capita_on_healthcare_ppp_usd_who_global_health_expenditure) %>%
  select(location,period,pocket_per_cap)
#Merging dataframes
total <- merge(total,pocket_exp,by=c("location","period"))
1.1.0.17 Adding data for health protection coverage
#Reading file
health_protect <- read_csv(here::here('Data',"health-protection-coverage.csv")) %>%
  clean_names()
#Checking for variable types
glimpse(health_protect)
Rows: 162
Columns: 4
$ entity <chr> "Albania", "…
$ code <chr> "ALB", "DZA"…
$ year <dbl> 2008, 2005, …
$ share_of_population_covered_by_health_insurance_ilo_2014 <dbl> 23.6, 85.2, …
The health coverage dataset had too few observations. Thus, it was not included in our final dataset.
1.1.0.18 Adding data for literacy rate
#Reading file
literacy <- read_csv(here::here('Data',"literacy-rate-by-country.csv")) %>%
  clean_names()
#Checking for variable types
glimpse(literacy)
Rows: 215
Columns: 4
$ entity <chr> "Afghanistan", "Albania", "Algeria", …
$ code <chr> "AFG", "ALB", "DZA", "ASM", "AND", "A…
$ year <dbl> 2000, 2011, 2006, 1980, 2011, 2011, 1…
$ literacy_rate_cia_factbook_2016 <dbl> 28.1, 96.8, 72.6, 97.0, 100.0, 70.4, …
The literacy dataset had too few observations. Thus, it was not included in our final dataset.
1.1.0.19 Adding data for grouping locations into continents
#source: https://github.com/dbouquin/IS_608/blob/master/NanosatDB_munging/Countries-Continents.csv
#Reading file
continents <- read_csv(here::here('Data',"Continents.csv")) %>%
  clean_names()
#Checking for variable types
glimpse(continents)
Rows: 194
Columns: 2
$ continent <chr> "Africa", "Africa", "Africa", "Africa", "Africa", "Africa",…
$ country <chr> "Algeria", "Angola", "Benin", "Botswana", "Burkina", "Burun…
#Changing variable names and type
continents <- continents %>%
  mutate(location=as.factor(country), continent=as.factor(continent))%>%
  select(location, continent)
glimpse(continents)
Rows: 194
Columns: 2
$ location <fct> Algeria, Angola, Benin, Botswana, Burkina, Burundi, Cameroo…
$ continent <fct> Africa, Africa, Africa, Africa, Africa, Africa, Africa, Afr…
#Merging dataframes
total <- merge(total,continents,by=c("location"))
1.1.0.20 Dealing with NAs
#Adding variables of per capita healthcare expenditure - per capita gdp
total <- total%>%
  mutate(healthcare_gdp_rate = healthcare_expenditure/gdp)
#skim(total)
total <- total %>%
  na.omit()
#skim(total)
After including all potentially relevant and significant variables in our dataset, an initial exploration of the data was made.
1.2 Exploratory Data Analysis
1.2.0.1 DALY Rates per Continent
#Selecting data only from 1980 - onward (to gain better insights on the recent situation)
total_short <- total %>%
  filter(period>=1980)
#Re-coding DALY variables as averages per continent, per year
#Note: daly_ivsa is scaled by 10,000 here, while the other three rates are scaled by 100,000
total_cont <- total_short%>%
  group_by(period,continent)%>%
  summarise(daly_adjusted=mean(daly_adjusted/100000),
            daly_cnmnd = mean(daly_cnmnd/100000),
            daly_ncds = mean(daly_ncds/100000),
            daly_ivsa = mean(daly_ivsa/10000))
#Plotting for average DALY rates per capita accumulated from 1980 to 2017
ggplot(total_cont, aes(x = continent, y = daly_adjusted, fill = continent)) +
  geom_bar(stat = "identity") +
  labs(x= "Continent", y = "Overall DALYs", title = "Accumulated Average DALYs per Capita, per Continent 1980 - 2017")
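The word "accumulated" in the bar chart titles follows from how the plots are built: `total_cont` holds one row per continent and year, and `geom_bar(stat = "identity")` stacks all rows that share the same x value, so each bar's height is the sum of the yearly averages rather than a single average. The base R sketch below reproduces that arithmetic on invented numbers (`cont_year`, `accumulated` and `average` are illustrative names, not objects from the report).

```r
# Toy yearly averages per continent; the values are invented for illustration.
cont_year <- data.frame(
  continent = rep(c("Africa", "Asia"), each = 3),
  period    = rep(2000:2002, times = 2),
  daly_rate = c(0.9, 0.8, 0.7, 0.5, 0.5, 0.4)
)

# What a stacked identity bar displays: the sum over years.
accumulated <- aggregate(daly_rate ~ continent, data = cont_year, FUN = sum)

# The average yearly rate, for comparison.
average <- aggregate(daly_rate ~ continent, data = cont_year, FUN = mean)

accumulated
average
```

Because the sum grows with the number of years available, continents are only comparable on the accumulated scale if they cover the same years.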
ggplot(total_cont, aes(x = continent, y = daly_cnmnd, fill = continent)) +
  geom_bar(stat = "identity")+
  labs(x= "Continent", y = "Communicable Diseases DALYs", title = "Accumulated Average DALYs per Capita from Communicable Diseases, per Continent 1980 - 2017")
ggplot(total_cont, aes(x = continent, y = daly_ncds, fill = continent)) +
  geom_bar(stat = "identity")+
  labs(x= "Continent", y = "Non-Communicable Diseases DALYs", title = "Accumulated Average DALYs per Capita from Non-Communicable Diseases, per Continent 1980 - 2017")
ggplot(total_cont, aes(x = continent, y = daly_ivsa, fill = continent)) +
  geom_bar(stat = "identity")+
  labs(x= "Continent", y = "Injuries DALYs", title = "Accumulated Average DALYs per Capita from Injuries, per Continent 1980 - 2017")
Overall, we find that Africa has the highest accumulated average DALY rate per capita of all continents (c. 90), followed by Asia (c. 50) and Oceania (c. 40). The sharp contrast between Africa and the rest of the continents is mainly due to its high accumulated average for communicable diseases. In this category, Africa's rate is more than triple that of the second-highest continent (c. 55 for Africa compared with c. 17 for Asia).
When it comes to non-communicable diseases and injuries, rates are fairly even. For non-communicable diseases, DALY rates range from c. 27 to c. 33 (North America being the lowest and Africa the highest). Injury rates are much lower, ranging from c. 4 to c. 6 (Europe being the lowest and Africa the highest).
Consequently, communicable diseases are found to impose the highest burden on the population, with Africa carrying (or having carried) the largest share. We took a closer look at these rates to better understand their evolution through time.
1.2.1 Communicable Diseases
graph1 <- total_cont %>%
  ggplot(aes(x=period, y=daly_cnmnd, fill=continent, text=continent)) +
  geom_area(alpha = 1) +
  theme(legend.position="none") +
  labs(x= "Year", y = "DALY for communicable disease", title = "Time Series Average DALYs per Capita from Communicable Diseases per Continent")
ggplotly(graph1)
[Figure: Time Series Average DALYs per Capita from Communicable Diseases per Continent (interactive area chart; x-axis: Year, y-axis: DALY for communicable disease)]
As seen from the graph, Africa's communicable-disease DALY rate seems to have been in decline since 2008. However, the continent has consistently ranked above the other continents, which leaves room to further consider the causes and potential solutions.
From the Our World in Data report, neonatal disorders are the top communicable diseases in terms of total share of burden (7.45% of all causes). It is also known that there is a strong negative correlation between GDP and DALY from communicable diseases. Similarly, a negative correlation is found between health expenditure per capita and DALY from communicable diseases. What about healthcare expenditure as a percentage of GDP?
ggplot(total_short, aes(x = healthcare_gdp_rate, y = daly_cnmnd, color = continent))+
  geom_point()+
  labs(x= "Healthcare Expenditure as percentage of GDP", y = "DALY from Communicable Diseases", title = "Rates due to Proportion of GDP spent on Healthcare")
# No clear correlation yet, but interesting
total_short%>%
  select(daly_cnmnd, healthcare_gdp_rate, gdp, pocket_per_cap)%>%
  ggpairs()
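The pairwise plot is read for the sign and strength of each correlation. As a rough illustration of the numbers behind such a reading, the sketch below computes Pearson and Spearman correlations and a significance test in base R. The data frame `demo` contains invented values chosen only to mimic a "richer countries, lower burden" pattern; it is not the project data.

```r
# Invented values: GDP per capita rising while communicable-disease DALYs fall.
demo <- data.frame(
  gdp        = c(500, 1500, 4000, 12000, 40000),
  daly_cnmnd = c(60000, 42000, 25000, 9000, 3000)
)

# Pearson measures linear association; Spearman only needs a monotone relation.
pearson  <- cor(demo$gdp, demo$daly_cnmnd)
spearman <- cor(demo$gdp, demo$daly_cnmnd, method = "spearman")

# cor.test() adds a p-value and confidence interval for the Pearson estimate.
test <- cor.test(demo$gdp, demo$daly_cnmnd)

pearson
spearman
test$p.value
```

With heavily skewed variables such as GDP, comparing the two coefficients (or correlating against `log(gdp)`) can show whether a weak Pearson value reflects a curved rather than an absent relationship.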
A higher GDP per country seems to have a significant negative correlation with the DALY of communicable diseases. However, the proportion of GDP used for healthcare seems to have a significant positive correlation with the DALY of communicable diseases. GDP also seems to have a significant negative correlation with the proportion of GDP spent on healthcare. This could indicate that poorer countries have a higher likelihood of having to combat communicable diseases and, consequently, spend a greater proportion of their GDP on healthcare than richer countries.
Out-of-pocket expenditure is also highly negatively correlated with the DALY of communicable diseases, and highly positively correlated with GDP. This leads to the interpretation that poor countries in which the population is individually responsible for investing in their medical care are most likely to have higher communicable-disease DALY rates.
1.2.2 Injuries
With DALY rates for injuries and additional causes being similar across all continents, we decided to first take a closer look at which types of causes were most prominent overall.
#This plot shows injury related DALY in a stacked bar chart.
start <- total%>%
  group_by(continent)%>%
  summarise(daly_conflict_terrorism = mean(daly_conflict_terrorism/total_population),
            daly_self_harm = mean(daly_self_harm/total_population),
            daly_violence = mean(daly_violence/total_population),
            daly_transport_injuries = mean(daly_transport_injuries/total_population),
            daly_nature_forces = mean(daly_nature_forces/total_population),
            daly_unintentional_injuries = mean(daly_unintentional_injuries/total_population))
pivot <- pivot_longer(start,
                      cols=c(daly_conflict_terrorism, daly_self_harm, daly_violence, daly_transport_injuries,
                             daly_unintentional_injuries, daly_nature_forces),
                      names_to = "diseases", values_to = "value")
#select columns from dataset
plots <- pivot %>%
  select(continent,diseases,value)
ggplot(plots, aes(fill=diseases, y=value, x=continent)) +
  geom_bar(position="stack", stat="identity") +
  labs(x= "Continent", y = "Injuries DALYs", title = "Accumulated Average DALYs per Capita from Injuries, per Continent 1980 - 2017")
#Plot on Terrorism and Violence
terrorism_violence <- start %>%
  select(daly_conflict_terrorism, daly_violence, continent)
terrorism_violence <- pivot_longer(terrorism_violence,
                                   c(daly_conflict_terrorism, daly_violence),
                                   names_to = "diseases", values_to = "value")
#select columns from dataset
terrorism_violence <- terrorism_violence%>%
  select(diseases,value,continent)
#stacked bar chart
ggplot(terrorism_violence, aes(fill=diseases, y=value, x=continent)) +
  geom_bar(position="stack", stat="identity") +
  labs(x= "Continent", y = "Injuries DALYs", title = "Accumulated Average DALYs per Capita from Terrorism and Violence 1980 - 2017")
total_short%>%
  select(daly_ivsa, gdp, daly_mental_and_substance, physicians_1000, education_years)%>%
  ggpairs()
1.2.3 Non-Communicable Diseases
start1 <- total%>%
  group_by(continent)%>%
  summarise(daly_cvs = mean(daly_cvs/total_population),
            daly_nutritional_deficiencies = mean(daly_nutritional_deficiencies/total_population),
            daly_maternal_disorders = mean(daly_maternal_disorders/total_population),
            daly_musculoskeletal = mean(daly_musculoskeletal/total_population),
            daly_other_non_communicable = mean(daly_other_non_communicable/total_population),
            daly_neurological = mean(daly_neurological/total_population),
            daly_mental_and_substance = mean(daly_mental_and_substance/total_population),
            daly_diabetes_urogenital_blood_endocrine = mean(daly_diabetes_urogenital_blood_endocrine/total_population),
            daly_neoplasms = mean(daly_neoplasms/total_population),
            daly_chronic_liver = mean(daly_chronic_liver/total_population))
pivot1 <- pivot_longer(start1,
                       c(daly_cvs, daly_nutritional_deficiencies, daly_maternal_disorders, daly_musculoskeletal,
                         daly_other_non_communicable, daly_neurological, daly_mental_and_substance,
                         daly_diabetes_urogenital_blood_endocrine, daly_neoplasms, daly_chronic_liver),
                       names_to = "diseases", values_to = "value")
#select columns from data set
total_short_ncds <- pivot1%>%
  select(continent,diseases,value)
#stacked bar chart
# This stacked bar chart shows the DALY once again for non-communicable diseases, but adjusted to show data per capita (DALYs divided by total population). Additionally, the data has been colored to show the different categories of non-communicable diseases.
#Asia has the highest DALY for non-communicable diseases, closely followed by Europe. There are reasons to suggest why DALY remains high in both regions. For Asia, the lack of affordability, the lack of doctors, and healthcare that is not of the highest standard may all contribute towards this. Due to Europe's aging population, non-communicable diseases are more likely to be present among its population. As seen in the graphs earlier, as nations become modern and developed, their populations transition from suffering from communicable diseases towards non-communicable diseases, which come with age.
ggplot(total_short_ncds, aes(fill=diseases, y=value, x=continent)) +
  geom_bar(position="stack", stat="identity") +
  labs(x= "Continent", y = "Non-Comm DALYs", title = "Accumulated Average DALYs per Capita from Non-Comm, per Continent 1980 - 2017")
# Looking into CVS in more detail.
ggplot(total_short, aes(x= continent, y = daly_cvs))+
  geom_col()+
  labs(x= "Continent", y = "Daly due to CVS related conditions", title = "DALY per capita due to CVS conditions per continent")
# Looking into neoplasms in more detail.
ggplot(total_short, aes(x= continent, y = daly_neoplasms))+
  geom_col()+
  labs(x= "Continent", y = "Daly due to neoplasm", title = "DALY per capita due to neoplasms per continent")
# Looking into diabetes, urogenital, blood, endocrine in more detail.
ggplot(total_short, aes(x= continent, y = daly_diabetes_urogenital_blood_endocrine))+
  geom_col()+
  labs(x= "Continent", y = "Daly due to diabetes, urogenital, blood and endocrine related conditions.", title = "DALY per capita due to diabetes, urogenital, blood and endocrine related conditions per continent")
ggplot(total_short, aes(x= continent, y = daly_mental_and_substance))+
  geom_col()+
  labs(x= "Continent", y = "Daly due to mental and substance related conditions.", title = "DALY per capita due to mental and substance related conditions per continent")
ggplot(total_short, aes(x = healthcare_gdp_rate, y = daly_ncds/100000, color = continent))+
  geom_point()+
  labs(x= "Healthcare Expenditure as percentage of GDP", y = "DALY from Non-Communicable Diseases", title = "Rates due to Proportion of GDP spent on Healthcare")
total_short%>%
  select(daly_cnmnd, healthcare_gdp_rate, gdp, pocket_per_cap)%>%
  ggpairs()
1.3 Regression analysis
DALY rates are highly complex and are affected by many different societal and economic variables. We decided to look into those variables that had enough data to be used for our analysis. These variables, which affect both DALY rates by cause and general DALY rates, can be divided into several categories: diet habits (fruit consumption per capita per year, percentage of animal protein consumption out of total daily calories, vegetable consumption, and percentage of the population being overweight), healthcare (annual healthcare expenditure, out-of-pocket expenditure on healthcare, healthcare expenditure per GDP, and number of physicians per 1,000 people), living habits (smoking percentages), and other demographics (education years). In addition to these elements, we considered the effect of each continent separately by transforming them into dummy variables.
1.3.0.1 Models 0 and 1
#Transforming continent factors into dummy variables
total=total%>%
  mutate(Asia=case_when(total$continent=="Asia"~1,TRUE~0))%>%
  mutate(Europe=case_when(total$continent=="Europe"~1,TRUE~0))%>%
  mutate(NorthA=case_when(total$continent=="North America"~1,TRUE~0))%>%
  mutate(Africa=case_when(total$continent=="Africa"~1,TRUE~0))%>%
  mutate(SouthA=case_when(total$continent=="South America"~1,TRUE~0))
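The hand-built 0/1 columns above correspond to what R calls treatment coding of a factor. As a hedged illustration (not the report's own approach), the sketch below shows how `model.matrix()` derives the same kind of dummy columns from a factor, with one level left out as the baseline. The small `continent` vector and the choice of Oceania as reference level are assumptions made for the example.

```r
# A small illustrative factor; the real data has one entry per country-year.
continent <- factor(c("Africa", "Asia", "Europe", "Oceania", "Asia"))

# Choose the baseline level explicitly; every other level gets its own 0/1 column.
continent <- relevel(continent, ref = "Oceania")

# model.matrix() builds the intercept plus one dummy column per non-baseline level.
dummies <- model.matrix(~ continent)

colnames(dummies)
dummies
```

Passing a factor directly in a formula, as in `lm(y ~ continent)`, applies the same coding automatically, so the coefficients are read as differences from the baseline continent.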
# Lm0 was created to show that daly_ivsa, daly_ncds and daly_cnmnd make up daly_adjusted. As a result, these three variables are not included in the linear models.
# Note: the original call passed gdp positionally, which lm() matches to its `subset` argument (see `subset = gdp` in the Call output), so gdp is not a predictor in lm0. The argument is named here to make that explicit.
lm0= lm(daly_adjusted ~ smoking_percentage+ percentage_overweight+ fruit_consumption+ vegetable_consumption+
          animal_protein_consumption+ education_years+ physicians_1000+ pocket_per_cap+ healthcare_gdp_rate +
          daly_ivsa + daly_ncds + daly_cnmnd,
        data = total, subset = gdp)
summary(lm0)
Call:
lm(formula = daly_adjusted ~ smoking_percentage + percentage_overweight +
    fruit_consumption + vegetable_consumption + animal_protein_consumption +
    education_years + physicians_1000 + pocket_per_cap + healthcare_gdp_rate +
    daly_ivsa + daly_ncds + daly_cnmnd, data = total, subset = gdp)
Residuals:
       Min         1Q     Median         3Q        Max
-6.602e-11 -4.258e-12 -5.730e-13  2.894e-12  1.220e-10
Coefficients:
                             Estimate Std. Error    t value Pr(>|t|)
(Intercept)                -3.867e-12  1.064e-11 -3.640e-01 0.716590
smoking_percentage         -1.016e-10  1.900e-11 -5.346e+00 2.49e-07 ***
percentage_overweight      -6.735e-13  1.006e-13 -6.694e+00 2.26e-10 ***
fruit_consumption          -7.039e-14  2.012e-14 -3.499e+00 0.000579 ***
vegetable_consumption      -1.385e-14  2.245e-14 -6.170e-01 0.537948
animal_protein_consumption  5.153e-12  7.264e-13  7.094e+00 2.35e-11 ***
education_years             1.769e-12  6.270e-13  2.821e+00 0.005277 **
physicians_1000             3.200e-12  1.696e-12  1.887e+00 0.060675 .
pocket_per_cap             -2.376e-14  8.464e-15 -2.807e+00 0.005506 **
healthcare_gdp_rate         3.037e-11  1.838e-11  1.652e+00 0.100074
daly_ivsa                   1.000e+00  1.045e-15  9.570e+14  < 2e-16 ***
daly_ncds                   1.000e+00  3.813e-16  2.623e+15  < 2e-16 ***
daly_cnmnd                  1.000e+00  1.157e-16  8.645e+15  < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.446e-11 on 195 degrees of freedom
  (817 observations deleted due to missingness)
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 3.025e+31 on 12 and 195 DF, p-value: < 2.2e-16
lm1= lm(daly_adjusted ~ smoking_percentage+ percentage_overweight+ fruit_consumption+ vegetable_consumption+
          animal_protein_consumption+ education_years+ physicians_1000+ pocket_per_cap+ healthcare_gdp_rate + gdp,
        data = total)
summary(lm1)

Call:
lm(formula = daly_adjusted ~ smoking_percentage + percentage_overweight +
    fruit_consumption + vegetable_consumption + animal_protein_consumption +
    education_years + physicians_1000 + pocket_per_cap + healthcare_gdp_rate +
    gdp, data = total)

Residuals:
   Min     1Q Median     3Q    Max
-25122  -6233   -812   4866  59542

Coefficients:
                             Estimate Std. Error t value Pr(>|t|)
(Intercept)                 7.921e+04  1.894e+03  41.814  < 2e-16 ***
smoking_percentage         -2.391e+04  6.139e+03  -3.894 0.000105 ***
percentage_overweight      -2.590e+02  3.531e+01  -7.335 4.53e-13 ***
fruit_consumption          -5.051e+01  8.655e+00  -5.836 7.19e-09 ***
vegetable_consumption      -3.828e+01  6.980e+00  -5.484 5.26e-08 ***
animal_protein_consumption -1.799e+03  2.719e+02  -6.615 6.00e-11 ***
education_years            -1.270e+03  2.027e+02  -6.268 5.41e-10 ***
physicians_1000             7.964e+02  4.986e+02   1.597 0.110538
pocket_per_cap             -1.011e+01  2.508e+00  -4.031 5.96e-05 ***
healthcare_gdp_rate         6.335e+03  6.803e+03   0.931 0.351968
gdp                         6.325e-02  3.290e-02   1.923 0.054808 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 11340 on 1014 degrees of freedom
Multiple R-squared: 0.6371, Adjusted R-squared: 0.6335
F-statistic: 178 on 10 and 1014 DF, p-value: < 2.2e-16

Model 1 already reaches an adjusted R-squared of 0.6335, meaning these factors explain approximately 63 percent of the variation in general DALY rates. For the models below, the variable with the highest p-value was dropped at each step.

lm2 <- lm(daly_adjusted ~ smoking_percentage + percentage_overweight + vegetable_consumption +
    animal_protein_consumption + education_years + physicians_1000 + pocket_per_cap +
    fruit_consumption + gdp,
  data = total)
summary(lm2)

Call:
lm(formula = daly_adjusted ~ smoking_percentage + percentage_overweight +
    vegetable_consumption + animal_protein_consumption + education_years +
    physicians_1000 + pocket_per_cap + fruit_consumption + gdp, data = total)

Residuals:
   Min     1Q Median     3Q    Max
-25123  -6213   -846   4897  59243

Coefficients:
                             Estimate Std. Error t value Pr(>|t|)
(Intercept)                 8.034e+04  1.453e+03  55.295  < 2e-16 ***
smoking_percentage         -2.365e+04  6.132e+03  -3.857 0.000122 ***
percentage_overweight      -2.620e+02  3.516e+01  -7.452 1.96e-13 ***
vegetable_consumption      -3.893e+01  6.944e+00  -5.606 2.66e-08 ***
animal_protein_consumption -1.852e+03  2.657e+02  -6.970 5.68e-12 ***
education_years            -1.267e+03  2.027e+02  -6.254 5.90e-10 ***
physicians_1000             9.062e+02  4.845e+02   1.871 0.061701 .
pocket_per_cap             -1.008e+01  2.508e+00  -4.018 6.29e-05 ***
fruit_consumption          -5.072e+01  8.651e+00  -5.863 6.16e-09 ***
gdp                         5.507e-02  3.170e-02   1.737 0.082661 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 11340 on 1015 degrees of freedom
Multiple R-squared: 0.6368, Adjusted R-squared: 0.6335
F-statistic: 197.7 on 9 and 1015 DF, p-value: < 2.2e-16

After dropping healthcare_gdp_rate, out-of-pocket expenditure remains highly significant, while physicians_1000 (p = 0.062) and gdp (p = 0.083) are significant only at the 0.1 level. gdp now has the highest p-value and is dropped next.

lm3 <- lm(daly_adjusted ~ smoking_percentage + percentage_overweight + vegetable_consumption +
    animal_protein_consumption + education_years + pocket_per_cap + fruit_consumption +
    physicians_1000,
  data = total)
summary(lm3)

Call:
lm(formula = daly_adjusted ~ smoking_percentage + percentage_overweight +
    vegetable_consumption + animal_protein_consumption + education_years +
    pocket_per_cap + fruit_consumption + physicians_1000, data = total)

Residuals:
   Min     1Q Median     3Q    Max
-25203  -6215   -447   4669  59274

Coefficients:
                             Estimate Std. Error t value Pr(>|t|)
(Intercept)                 79726.798   1410.663  56.517  < 2e-16 ***
smoking_percentage         -24292.378   6127.035  -3.965 7.86e-05 ***
percentage_overweight        -266.174     35.115  -7.580 7.76e-14 ***
vegetable_consumption         -40.473      6.894  -5.871 5.87e-09 ***
animal_protein_consumption  -1741.370    258.204  -6.744 2.58e-11 ***
education_years             -1222.730    201.228  -6.076 1.74e-09 ***
pocket_per_cap                 -7.568      2.052  -3.688 0.000238 ***
fruit_consumption             -47.366      8.442  -5.611 2.59e-08 ***
physicians_1000               859.752    484.202   1.776 0.076097 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 11350 on 1016 degrees of freedom
Multiple R-squared: 0.6357, Adjusted R-squared: 0.6328
F-statistic: 221.6 on 8 and 1016 DF, p-value: < 2.2e-16

1.3.0.2 Drop physicians_1000
lm4 <- lm(daly_adjusted ~ smoking_percentage + percentage_overweight + vegetable_consumption +
    animal_protein_consumption + education_years + pocket_per_cap + fruit_consumption,
  data = total)
summary(lm4)

Call:
lm(formula = daly_adjusted ~ smoking_percentage + percentage_overweight +
    vegetable_consumption + animal_protein_consumption + education_years +
    pocket_per_cap + fruit_consumption, data = total)

Residuals:
   Min     1Q Median     3Q    Max
-25602  -6210   -408   4778  59363

Coefficients:
                             Estimate Std. Error t value Pr(>|t|)
(Intercept)                 78595.709   1259.974  62.379  < 2e-16 ***
smoking_percentage         -21927.213   5986.815  -3.663 0.000263 ***
percentage_overweight        -256.075     34.687  -7.382 3.23e-13 ***
vegetable_consumption         -37.819      6.737  -5.613 2.56e-08 ***
animal_protein_consumption  -1623.099    249.729  -6.499 1.26e-10 ***
education_years             -1107.609    190.699  -5.808 8.44e-09 ***
pocket_per_cap                 -6.709      1.996  -3.361 0.000806 ***
fruit_consumption             -49.239      8.384  -5.873 5.80e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 11360 on 1017 degrees of freedom
Multiple R-squared: 0.6346, Adjusted R-squared: 0.632
F-statistic: 252.3 on 7 and 1017 DF, p-value: < 2.2e-16

All variables are now significant, giving a model with an adjusted R-squared of 0.632.

1.3.1 Stepwise regression & VIF exam
We can also use stepwise regression to find the optimal model. The stepwise method is more systematic than dropping variables manually: at each step it can add a previously dropped variable back if doing so improves the model (lowers the model's AIC), and it re-evaluates every candidate addition or removal after each change.

fit1_step <- step(lm1, direction = "both")

Start:  AIC=19149.68
daly_adjusted ~ smoking_percentage + percentage_overweight + fruit_consumption +
    vegetable_consumption + animal_protein_consumption + education_years +
    physicians_1000 + pocket_per_cap + healthcare_gdp_rate + gdp

                             Df  Sum of Sq        RSS   AIC
- healthcare_gdp_rate         1  111483768 1.3048e+11 19149
<none>                                     1.3036e+11 19150
- physicians_1000             1  327963566 1.3069e+11 19150
- gdp                         1  475232954 1.3084e+11 19151
- smoking_percentage          1 1949782662 1.3231e+11 19163
- pocket_per_cap              1 2089322026 1.3245e+11 19164
- vegetable_consumption       1 3866128944 1.3423e+11 19178
- fruit_consumption           1 4378585222 1.3474e+11 19182
- education_years             1 5050374107 1.3541e+11 19187
- animal_protein_consumption  1 5625702511 1.3599e+11 19191
- percentage_overweight       1 6917422353 1.3728e+11 19201

Step:  AIC=19148.55
daly_adjusted ~ smoking_percentage + percentage_overweight + fruit_consumption +
    vegetable_consumption + animal_protein_consumption + education_years +
    physicians_1000 + pocket_per_cap + gdp

                             Df  Sum of Sq        RSS   AIC
<none>                                     1.3048e+11 19149
- gdp                         1  387923531 1.3086e+11 19150
+ healthcare_gdp_rate         1  111483768 1.3036e+11 19150
- physicians_1000             1  449760026 1.3093e+11 19150
- smoking_percentage          1 1912380306 1.3239e+11 19162
- pocket_per_cap              1 2075803261 1.3255e+11 19163
- vegetable_consumption       1 4040506535 1.3452e+11 19178
- fruit_consumption           1 4418260882 1.3489e+11 19181
- education_years             1 5027016883 1.3550e+11 19185
- animal_protein_consumption  1 6245781148 1.3672e+11 19194
- percentage_overweight       1 7139008696 1.3761e+11 19201

summary(fit1_step)

Call:
lm(formula = daly_adjusted ~ smoking_percentage + percentage_overweight +
    fruit_consumption + vegetable_consumption + animal_protein_consumption +
    education_years + physicians_1000 + pocket_per_cap + gdp, data = total)

Residuals:
   Min     1Q Median     3Q    Max
-25123  -6213   -846   4897  59243

Coefficients:
                             Estimate Std. Error t value Pr(>|t|)
(Intercept)                 8.034e+04  1.453e+03  55.295  < 2e-16 ***
smoking_percentage         -2.365e+04  6.132e+03  -3.857 0.000122 ***
percentage_overweight      -2.620e+02  3.516e+01  -7.452 1.96e-13 ***
fruit_consumption          -5.072e+01  8.651e+00  -5.863 6.16e-09 ***
vegetable_consumption      -3.893e+01  6.944e+00  -5.606 2.66e-08 ***
animal_protein_consumption -1.852e+03  2.657e+02  -6.970 5.68e-12 ***
education_years            -1.267e+03  2.027e+02  -6.254 5.90e-10 ***
physicians_1000             9.062e+02  4.845e+02   1.871 0.061701 .
pocket_per_cap             -1.008e+01  2.508e+00  -4.018 6.29e-05 ***
gdp                         5.507e-02  3.170e-02   1.737 0.082661 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 11340 on 1015 degrees of freedom
Multiple R-squared: 0.6368, Adjusted R-squared: 0.6335
F-statistic: 197.7 on 9 and 1015 DF, p-value: < 2.2e-16
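The stepwise search can be sketched on a small built-in dataset. The example below is only an illustration: it uses `mtcars` and an arbitrary set of predictors as stand-ins for `total` and our variables. It also checks a detail that is easy to miss when comparing numbers: the AIC printed in the `step()` trace comes from `extractAIC()`, which for a linear model differs from `AIC()` by the constant n(1 + log 2π) + 2. That constant does not change which model is preferred, but it explains why the trace reports values near 19150 for lm1 while a regression table based on `AIC()` reports values near 22060 for the same model.

```r
# Stand-in data and predictors; not the project's data set.
full <- lm(mpg ~ wt + hp + disp + qsec + drat, data = mtcars)

# Stepwise selection in both directions, scored by AIC (trace suppressed).
sel <- step(full, direction = "both", trace = 0)
formula(sel)

# step() never accepts a move that raises the AIC, so the selected model is at
# least as good as the starting model on that criterion.
AIC(sel) <= AIC(full)

# The trace's AIC (extractAIC) and AIC() differ only by a constant in n.
n <- nrow(mtcars)
aic_gap <- AIC(full) - extractAIC(full)[2]
all.equal(aic_gap, n * (1 + log(2 * pi)) + 2)
```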
vif(fit1_step)

        smoking_percentage      percentage_overweight
                  1.668205                   2.913117
         fruit_consumption      vegetable_consumption
                  1.425008                   1.502601
animal_protein_consumption            education_years
                  3.110474                   3.088943
           physicians_1000             pocket_per_cap
                  3.754590                   3.210832
                       gdp
                  2.999374

From the stepwise result, all nine retained predictors have a p-value below 0.1 (seven of them below 0.001). Expense and pocket_per_cap are both significant in this case. However, dropping one of them may make the other insignificant, which could be because the two have a joint effect on the burden of disease. We can choose between these two models according to the significance level we require.

Continents were also considered as part of the model to see their effect.

# The continent dummies appear twice in the formula; lm() ignores the repeated terms.
fit <- lm(daly_adjusted ~ percentage_overweight + vegetable_consumption +
    animal_protein_consumption + education_years + pocket_per_cap + fruit_consumption +
    Asia + Africa + NorthA + Europe + SouthA + Asia + Africa + NorthA + Europe + SouthA,
  data = total)
print(summary(fit))

Call:
lm(formula = daly_adjusted ~ percentage_overweight + vegetable_consumption +
    animal_protein_consumption + education_years + pocket_per_cap +
    fruit_consumption + Asia + Africa + NorthA + Europe + SouthA +
    Asia + Africa + NorthA + Europe + SouthA, data = total)

Residuals:
   Min     1Q Median     3Q    Max
-29935  -4148   -431   3996  50866

Coefficients:
                            Estimate Std. Error t value Pr(>|t|)
(Intercept)                66522.998   2345.279  28.365  < 2e-16 ***
percentage_overweight       -202.369     33.403  -6.058 1.94e-09 ***
vegetable_consumption        -33.863      6.185  -5.475 5.51e-08 ***
animal_protein_consumption -1030.842    209.884  -4.911 1.05e-06 ***
education_years             -534.613    165.120  -3.238  0.00124 **
pocket_per_cap                -8.669      1.678  -5.168 2.86e-07 ***
fruit_consumption            -40.857      6.824  -5.987 2.96e-09 ***
Asia                       -7140.437   1807.499  -3.950 8.34e-05 ***
Africa                     13792.577   1918.746   7.188 1.27e-12 ***
NorthA                     -9335.463   1794.310  -5.203 2.38e-07 ***
Europe                     -5196.987   1650.294  -3.149  0.00169 **
SouthA                     -9146.724   1917.915  -4.769 2.12e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 9298 on 1013 degrees of freedom
Multiple R-squared: 0.7562, Adjusted R-squared: 0.7535
F-statistic: 285.6 on 11 and 1013 DF, p-value: < 2.2e-16

print(vif(fit))

     percentage_overweight      vegetable_consumption
                  3.908936                   1.772149
animal_protein_consumption            education_years
                  2.885232                   3.048594
            pocket_per_cap          fruit_consumption
                  2.136245                   1.318169
                      Asia                     Africa
                  7.257346                   6.590683
                    NorthA                     Europe
                  3.209642                   7.556342
                    SouthA
                  2.440735

1.3.2 Interpretation on the final model
Our final model had 11 variables.

continent_fit <- fit
summary(continent_fit)
vif(continent_fit)

continent_fit is the same fitted object as fit, so its summary and VIF output are identical to the output shown above.
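To make the VIF figures easier to read, the sketch below computes variance inflation factors by hand in base R. For predictor j, the VIF is 1 / (1 - R²_j), where R²_j comes from regressing that predictor on all the others. The helper name `vif_manual` and the use of `mtcars` are assumptions made for illustration only; this is not the `vif()` function called above. A commonly quoted rule of thumb treats values above roughly 5 to 10 as a sign of problematic collinearity, and by that yardstick the Asia, Africa, and Europe dummies in our final model deserve a closer look.

```r
# Hypothetical helper: VIF of each predictor in a fitted lm, computed from
# the auxiliary regression of that predictor on the remaining predictors.
vif_manual <- function(model) {
  X <- model.matrix(model)[, -1, drop = FALSE]  # drop the intercept column
  sapply(colnames(X), function(v) {
    others <- X[, colnames(X) != v, drop = FALSE]
    r2 <- summary(lm(X[, v] ~ others))$r.squared
    1 / (1 - r2)
  })
}

# Stand-in model on a built-in data set.
m <- lm(mpg ~ wt + hp + disp, data = mtcars)
vifs <- vif_manual(m)
vifs

# Cross-check: the same values are the diagonal of the inverse correlation
# matrix of the predictors.
vifs_check <- diag(solve(cor(mtcars[, c("wt", "hp", "disp")])))
all.equal(unname(vifs), unname(vifs_check))
```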
huxtable::huxreg(lm1, lm2, lm3, lm4, continent_fit, number_format = "%.2f")

                                     (1)             (2)             (3)             (4)             (5)
(Intercept)                     79209.24 ***    80341.02 ***    79726.80 ***    78595.71 ***    66523.00 ***
                               (1894.33)       (1452.94)       (1410.66)       (1259.97)       (2345.28)
smoking_percentage             -23905.42 ***   -23651.69 ***   -24292.38 ***   -21927.21 ***
                               (6138.51)       (6132.06)       (6127.03)       (5986.82)
percentage_overweight            -259.02 ***     -262.03 ***     -266.17 ***     -256.08 ***     -202.37 ***
                                 (35.31)         (35.16)         (35.11)         (34.69)         (33.40)
fruit_consumption                 -50.51 ***      -50.72 ***      -47.37 ***      -49.24 ***      -40.86 ***
                                  (8.65)          (8.65)          (8.44)          (8.38)          (6.82)
vegetable_consumption             -38.28 ***      -38.93 ***      -40.47 ***      -37.82 ***      -33.86 ***
                                  (6.98)          (6.94)          (6.89)          (6.74)          (6.18)
animal_protein_consumption      -1798.59 ***    -1852.18 ***    -1741.37 ***    -1623.10 ***    -1030.84 ***
                                (271.90)        (265.72)        (258.20)        (249.73)        (209.88)
education_years                 -1270.47 ***    -1267.36 ***    -1222.73 ***    -1107.61 ***     -534.61 **
                                (202.70)        (202.66)        (201.23)        (190.70)        (165.12)
physicians_1000                   796.40          906.18          859.75
                                (498.63)        (484.46)        (484.20)
pocket_per_cap                    -10.11 ***      -10.08 ***       -7.57 ***       -6.71 ***       -8.67 ***
                                  (2.51)          (2.51)          (2.05)          (2.00)          (1.68)
healthcare_gdp_rate              6334.55
                               (6802.52)
gdp                                 0.06            0.06
                                  (0.03)          (0.03)
Asia                                                                                             -7140.44 ***
                                                                                                (1807.50)
Africa                                                                                           13792.58 ***
                                                                                                (1918.75)
NorthA                                                                                           -9335.46 ***
                                                                                                (1794.31)
Europe                                                                                           -5196.99 **
                                                                                                (1650.29)
SouthA                                                                                           -9146.72 ***
                                                                                                (1917.91)
N                                   1025            1025            1025            1025            1025
R2                                  0.64            0.64            0.64            0.63            0.76
logLik                         -11018.25       -11018.69       -11020.21       -11021.80       -10814.42
AIC                             22060.50        22059.38        22060.42        22061.60        21654.84
*** p < 0.001; ** p < 0.01; * p < 0.05.

From the 5 models, continent_fit was chosen as the final model because all of its variables are significant and it has the highest R-squared (0.76).

best_model <- continent_fit
# Part 2: We wanted to test the predictive ability of our model by checking how well it
# predicts the DALYs for the last full year of data (2013).
train <- total %>%
  filter(period < 2013)
test <- total %>%
  filter(period == 2013)
# Refit the same formula on the pre-2013 data only
continent_fit2 <- lm(formula(continent_fit), data = train)
final_prediction <- predict(continent_fit2, newdata = test)
ac_pred <- data.frame(actual = test$daly_adjusted, predicted = final_prediction)
correlation_accuracy <- cor(ac_pred)
correlation_accuracy

             actual predicted
actual    1.0000000 0.8572182
predicted 0.8572182 1.0000000

As can be seen from our predictions, the model refitted on pre-2013 data produces 2013 predictions whose correlation with the actual DALY rates is 0.857 (a squared correlation of about 0.73). This measures how closely the predictions track the actual values rather than the share of predictions that are correct.
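A correlation between actual and predicted values is easier to interpret next to an error measure in the units of the outcome. The sketch below is an illustration under stated assumptions: it uses `mtcars`, a random 24/8 split, and a two-predictor model as stand-ins, not our `total` data or `continent_fit`. It reports the correlation together with RMSE and MAE, and shows that adding a constant bias to every prediction leaves the correlation unchanged while the RMSE grows, which is why correlation alone should not be read as accuracy.

```r
set.seed(42)

# Stand-in hold-out split on a built-in data set.
idx       <- sample(nrow(mtcars), 24)
train_toy <- mtcars[idx, ]
test_toy  <- mtcars[-idx, ]

m_toy <- lm(mpg ~ wt + hp, data = train_toy)
pred  <- predict(m_toy, newdata = test_toy)

r    <- cor(test_toy$mpg, pred)
rmse <- sqrt(mean((test_toy$mpg - pred)^2))  # typical error, in mpg
mae  <- mean(abs(test_toy$mpg - pred))       # average absolute error, in mpg
c(correlation = r, r_squared = r^2, rmse = rmse, mae = mae)

# Systematically biased predictions: same correlation, larger error.
pred_biased <- pred + 10
r_biased    <- cor(test_toy$mpg, pred_biased)
rmse_biased <- sqrt(mean((test_toy$mpg - pred_biased)^2))
c(correlation = r_biased, rmse = rmse_biased)
```

Applied to our case, the same two lines for RMSE and MAE could be computed from `ac_pred$actual` and `ac_pred$predicted` to express the 2013 prediction error in DALY units.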