Decades of data on the global burden of disease (measured in disability-adjusted life years) were cleaned, interpreted and visualised. A linear regression model was then fitted to predict the future burden of disease (with a prediction accuracy of 85.7%); the model can be adjusted for changes in demographics, health systems, diet, education, and so on.
This presentation was created as a group project during the Business Analytics course at London Business School.
The Burden of Disease: Data analysis, interpretation and linear regression
1. The Burden of Disease
Group 30
Aman Desai, Jim Huang, Gloria Marín, Carmen Chen, Dimitris Charitos, Lorenzo Gherardi
Index
• Introduction to the case
• Methodology
• Data Visualization
• Initial Regressions
• Moving Forward
2. The Burden of Disease
Index
• Introduction to the case
• Main variables
• Exploratory Analysis
• Initial Regressions
• Moving Forward
DALYs: Disability-Adjusted Life Years
• Metric used to measure the Burden of Disease
• DALY includes the sum of mortality and morbidity due to a specific disease
• One DALY = loss of 1 year in good health because of
• Premature death
• Disease
• Disability
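The "sum of mortality and morbidity" above is conventionally written with two components that the slide does not name: years of life lost to premature death (YLL) and years lived with disability (YLD). As a sketch in that standard notation, which is assumed here and not taken from the deck:

```latex
\mathrm{DALY} = \mathrm{YLL} + \mathrm{YLD}
```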
- Mortality is used as a method to assess a population's health
- Through ‘child mortality’
- Through ‘life expectancy’
- The problem with this method is that it does not account for people who live with suffering caused by a disease that prevents a normal life.
- For people to get healthy, attention needs to be given to the impact on the lives of people suffering from disease. Years of contribution to one’s community, industry, and nation are lost.
ASSIGNMENT PURPOSE:
1. Understand the causes of the burden of disease
2. Identify factors that magnify its impact
Introduction to the Case
3. The Burden of Disease
Introduction to the Case: background and preparation
Communicable diseases
• Diarrhoea, lower respiratory & other common infectious diseases
• Neonatal disorders
• Maternal disorders
• Malaria & neglected tropical diseases
• Nutritional deficiencies
• HIV/AIDS
• Tuberculosis
• Other communicable diseases
Non-communicable diseases (NCDs)
• Cardiovascular diseases (inc. stroke, heart disease and heart failure)
• Cancers
• Respiratory disease
• Diabetes, blood and endocrine diseases
• Mental and substance use disorders
• Liver diseases
• Digestive diseases
• Musculoskeletal disorders
• Neurological disorders (including dementia)
• Other NCDs
Injuries
• Road injuries
• Other transport injuries
• Falls
• Drowning
• Fire, heat and hot substances
• Poisonings
• Self-harm
• Interpersonal violence
• Conflict & terrorism
• Natural disasters
4. Methodology: Linear Model and Data Visualization
Step 1. Identify relevant variables
• Explanatory variables (x): Selected from hundreds of candidate factors to cover the following dimensions: dietary habits (e.g., fruit consumption), healthcare level (e.g., healthcare expense), lifestyle habits (e.g., smoking %), and other demographics (e.g., education, overweight %)
• Response variables (y): Chose ‘Overall DALYs’, ‘Communicable Diseases DALYs’, ‘Non-Communicable Diseases DALYs’, and ‘Injuries DALYs’ from 24 possible variables by comparing the models
Step 2. Check for non-linear relations
Step 3. Generate the linear regression and prediction model
• Dropped all statistically insignificant variables
• Checked the VIF to reduce the risk of multicollinearity
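The VIF check in Step 3 can be sketched as follows. This is a minimal illustration and not the group's actual code: it regresses each explanatory variable on the others and reports 1 / (1 - R^2). The helper names and the synthetic columns `a`, `b`, `c` are hypothetical and only serve to show the VIF < 10 rule of thumb.

```python
import random

def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def r_squared(columns, y):
    """R^2 from regressing y on the given predictor columns plus an intercept."""
    n = len(y)
    design = [[1.0] * n] + [list(col) for col in columns]
    k = len(design)
    xtx = [[sum(design[i][t] * design[j][t] for t in range(n)) for j in range(k)]
           for i in range(k)]
    xty = [sum(design[i][t] * y[t] for t in range(n)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(beta[i] * design[i][t] for i in range(k)) for t in range(n)]
    mean_y = sum(y) / n
    ss_res = sum((y[t] - fitted[t]) ** 2 for t in range(n))
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - ss_res / ss_tot

def variance_inflation_factors(columns):
    """VIF of each column: 1 / (1 - R^2) of that column regressed on the rest."""
    vifs = []
    for j, col in enumerate(columns):
        others = columns[:j] + columns[j + 1:]
        r2 = r_squared(others, col)
        vifs.append(float("inf") if r2 >= 1.0 else 1.0 / (1.0 - r2))
    return vifs

# Synthetic data (not from the study): c is almost a copy of a.
random.seed(0)
a = [random.gauss(0, 1) for _ in range(200)]
b = [random.gauss(0, 1) for _ in range(200)]
c = [x + 0.05 * random.gauss(0, 1) for x in a]

vif_independent = variance_inflation_factors([a, b])
vif_collinear = variance_inflation_factors([a, b, c])
print(vif_independent, vif_collinear)
```

With unrelated columns the VIFs stay close to 1, while the near-duplicate pair is pushed far above the threshold of 10, which is the situation the check is meant to catch.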
Data Visualization
• Bar Chart and Stacked Bar Chart: Compare causes of DALYs by continent
• Area Chart: Look into the DALY rate over 2000 - 2017 by continent
• Scatter Plot: Plot DALYs against the proportion of GDP spent on healthcare, by continent
Linear Model
5. Accumulated DALYs per Capita 1980 - 2017
[Chart panels: Due to Communicable Diseases; Due to Non-Communicable Diseases; Due to Injuries]
• Overall, Africa has the highest accumulated average DALY rate (9), followed by Asia (5) and Oceania (4).
• Africa's high contrast with the other continents is mainly due to communicable diseases, with a rate triple that of the next highest continent, Asia.
• DALY rates for non-communicable diseases and injuries are relatively uniform around the world.
• Africa's communicable DALY rate has declined since 2008. Despite this, the burden of disease on the continent remains high, which leaves room to consider its causes and potential solutions.
Summary
Communicable DALYs over 1980 - 2017
Results & Conclusion: Data Visualization (1)
6. • The variables affecting DALY have been further broken down. The largest contributing factors found were:
- Cardiovascular Diseases for Non-Communicable Diseases, and
- Unintentional Injuries for Injuries.
• No single cause of disease stood out as significant across the different continents.
• GDP is negatively correlated with DALYs from Communicable Diseases; however, the proportion of GDP spent on healthcare is positively correlated with them. This could be because poorer countries are more likely to have to combat communicable diseases and, as a result, spend more of their GDP on healthcare.
Summary
Results & Conclusion: Data Visualization (2)
Causes of DALYs by Continent due to Non-Communicable Disease
Causes of DALYs by Continent due to Injuries
Healthcare Expense vs. DALY from Non-Communicable Disease
Healthcare Expense vs. DALY from Communicable Disease
Analysis of the Correlation among Variables
7. 1. All explanatory variables’ Pr(>|t|) < 0.01
Results & Conclusion: Final Model
DALY = 66523 - 202.37 overweight% - 33.86 veg_consump - 1030.84 animal_protein_consump - 534.61 education - 8.67 pocket_per_cap - 40.86 fruit_consump - 7140.44 Asia + 13792.58 Africa - 9335.46 NorthA - 5196.99 Europe - 9146.72 SouthA
Model of best fit
2. VIF (Variance Inflation Factor) is <10
• After dropping all insignificant variables, 7 variables remained in our best model.
• The risk of multicollinearity was checked by ensuring that VIF < 10.
• The R-squared obtained (75.35%) indicates that the model explains about three quarters of the variance of DALY.
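To make the fitted equation easier to read, here is a minimal sketch that evaluates it as a function. The coefficients are copied from the slide; everything else is an assumption: `overweight%` is renamed `overweight_pct` so it can be used as a key, Oceania is assumed to be the baseline continent because it has no dummy term, and the units of the inputs are not stated in the deck.

```python
# Coefficients copied from the "Model of best fit" equation on the slide.
INTERCEPT = 66523.0
SLOPES = {
    "overweight_pct": -202.37,          # "overweight%" on the slide
    "veg_consump": -33.86,
    "animal_protein_consump": -1030.84,
    "education": -534.61,
    "pocket_per_cap": -8.67,
    "fruit_consump": -40.86,
}
# Continent dummy coefficients; Oceania = 0 is an assumption (no term on the slide).
CONTINENT_OFFSETS = {
    "Asia": -7140.44,
    "Africa": 13792.58,
    "NorthA": -9335.46,
    "Europe": -5196.99,
    "SouthA": -9146.72,
    "Oceania": 0.0,
}

def predict_daly(features, continent):
    """Evaluate the linear model for one country-year."""
    total = INTERCEPT + CONTINENT_OFFSETS[continent]
    for name, slope in SLOPES.items():
        total += slope * features[name]
    return total

# Illustrative inputs only: all explanatory variables set to zero.
zero = {name: 0.0 for name in SLOPES}
baseline = predict_daly(zero, "Oceania")
africa_gap = predict_daly(zero, "Africa") - predict_daly(zero, "Europe")
education_effect = predict_daly({**zero, "education": 1.0}, "Oceania") - baseline
print(baseline, africa_gap, education_effect)
```

Read this way, each coefficient is the change in predicted DALY for a one-unit change in that variable with the others held fixed, and the continent terms shift the whole prediction up or down.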
Summary
8. Results & Conclusion: Prediction
Step 1. We set out to estimate the worldwide DALY rates for 2013 using our data for all years up to 2012.
Step 2. The data was filtered to the years before 2013 and a linear model was fitted.
Step 3. Using this linear model, the DALY rates for 2013 were predicted.
Step 4. The predictions were compared with the actual data available for 2013 to determine the accuracy.
Prediction accuracy was 85.7%
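The four steps can be sketched as a simple hold-out evaluation. This is an illustration under stated assumptions, not the group's code: it uses a single hypothetical predictor `x`, synthetic rows, and defines accuracy as 1 minus the mean absolute percentage error, since the deck does not say how the 85.7% figure was computed.

```python
def fit_simple_ols(xs, ys):
    """Least-squares intercept and slope for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def holdout_accuracy(rows, holdout_year):
    """Fit on years before holdout_year, predict that year, return 1 - MAPE."""
    train = [r for r in rows if r["year"] < holdout_year]
    test = [r for r in rows if r["year"] == holdout_year]
    intercept, slope = fit_simple_ols(
        [r["x"] for r in train], [r["daly"] for r in train]
    )
    errors = [
        abs(intercept + slope * r["x"] - r["daly"]) / abs(r["daly"])
        for r in test
    ]
    return 1.0 - sum(errors) / len(errors)

# Synthetic rows (not study data) lying exactly on daly = 100 - 2 * x.
rows = [
    {"year": year, "x": float(x), "daly": 100.0 - 2.0 * x}
    for year in range(2010, 2014)
    for x in (5, 10, 20)
]
accuracy = holdout_accuracy(rows, 2013)
print(accuracy)
```

Because the synthetic rows are perfectly linear, the held-out year is predicted without error here; real data would give a value below 1, and a different accuracy definition would give a different number.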
Prediction
9. Moving Forward: Adding New Variables
What other ‘external’ elements may be magnifying results?
COMMON TO ALL
• Percentage of population insured with health insurance.
• Number of medical doctors per 1,000 people.
• Number of nurses per 1,000 people.
• Out-of-pocket expenditure for healthcare.
SPECIFIC TO
a) Communicable, maternal, neonatal, and nutritional diseases
• Nutritional deficiencies.
• Hygiene practices.
• Housing space per person.
b) Non-communicable diseases
• Physical inactivity.
• Wellbeing.
• Genetics.
c) Injuries
• Surveillance.
• Regulations for safety.
10. The Burden of Disease
Group 30 - Aman Desai, Jim Huang, Gloria Marín, Carmen Chen, Dimitris Charitos, Lorenzo Gherardi
Introduction
A glance at DALYs
Linear Regressions
DALY = 66523 - 202.37 overweight% - 33.86 veg_consum - 1030.84 animal_consum - 534.61 education - 8.67 pocket/cap - 40.86 fruit_consum - 7140.44 Asia + 13792.58 Africa - 9335.46 NorthA - 5196.99 Europe - 9146.72 SouthA
All explanatory variables’ Pr(>|t|) < 0.01
VIF (Variance Inflation Factor) is < 10
Conclusions
Moving Forward
Methodology (Model)
Model of best fit
DALYs: Disability-Adjusted Life Years
• Metric used to measure the Burden of Disease
• It includes the sum of mortality and morbidity
• One DALY = loss of 1 year in good health because of premature death, disease, or disability
Aim of study
• Understand causes
• Identify factors that magnify the impact
Background & Preparation
Burden of Disease, 2017
Disease Burden due to Communicable disease vs GDP per capita
Categories of Disease
• Communicable disease
• Non-Communicable disease (e.g., Cancers)
• Injuries (e.g., Falls, Fire)
DALY Around the World Due to Communicable Disease
Due to Non-Communicable Disease
Step 1. Identify relevant variables
• Explanatory variables (x): Selected from hundreds of candidate factors to cover the following dimensions: dietary habits (e.g., fruit consumption), healthcare level (e.g., healthcare expense), lifestyle habits (e.g., smoking %), and other demographics (e.g., education, overweight %)
• Response variables (y): Chose ‘Overall DALYs’, ‘Communicable Diseases DALYs’, ‘Non-Communicable Diseases DALYs’, and ‘Injuries DALYs’ from 24 possible variables by comparing the models
Step 2. Check for non-linear relations
Step 3. Generate the regression model
• Dropped all statistically insignificant variables
• Checked the VIF to reduce the risk of multicollinearity
Statistical Technique
Healthcare Expense vs. DALY from Non-Communicable and Communicable Disease
Stacked Bar Chart - Causes of DALYs by continent for Non-Communicable Disease and Injuries
What other ‘external’ elements may be magnifying results?
• Common to all:
• Percentage of population insured with health insurance
• Number of medical doctors/nurses per 1,000 people
• Specific to
a) Communicable diseases: Family size
b) Non-communicable diseases: Literacy rate
c) Injuries: Alcohol consumption
Linear Regression
• The best model had 7 variables (overweight%, veg_consum, animal_consum, education, pocket/cap, fruit_consum, continent), all with Pr(>|t|) < 0.01 and VIF < 10
• High R-squared (75.35%) suggests the model explains the variance of DALY well
Data Visualization
• Africa has the highest avg. DALY rate (c. 9), followed by Asia (c. 5) and Oceania.
• Africa's high contrast with the other continents is mainly due to communicable diseases, with a rate more than triple that of the second-highest continent.
• Africa's communicable DALY rate has declined since 2008 but remains higher than on other continents, leaving room to further consider the causes and potential solutions
Communicable DALYs over 1980 - 2017
Analysis of Correlation among Variables