1 RoshikGanesan CIS 5810
Death Cause Analysis in United States
Roshik Ganesan
305641224
California State University Los Angeles
A. Data Sets:
URL:
https://catalog.data.gov/dataset/nchs-potentially-excess-deaths-from-the-five-leading-causes-of-death
Twitter Hashtags:
#Cancer, #lowerrespiratorydiesease, #HeartDisease, #Stroke, #UnintentionalInjury.
B. Data Description:
This dataset covers potentially excess deaths in the United States of America and is rich in detail. It spans the years 2005 to 2015, just over a decade, and with more than 2 million records it gives us a good opportunity to drill down and gain insights. It records the cause of death, limited to the five leading causes of death in the United States: cancer, chronic lower respiratory disease, heart disease, stroke and unintentional injury. The state in which the deaths occurred is also provided, which lets us analyze the results by geographic location, together with each state's abbreviation for easier reading and analysis. Each row also carries an HHS region code, which identifies the Department of Health and Human Services region that the state belongs to. The age range of the deceased is provided, which helps in analyzing the major cause of death in each age range by state. The dataset drills deeper still and indicates whether the deaths occurred in a metropolitan or a non-metropolitan area, which helps us compare the facilities in each state based on these counts. It then gives the number of observed deaths in each state for metropolitan areas, for non-metropolitan areas, and for both combined. The population column gives the population of the area in that year. The dataset also gives the expected deaths for the year, which helps in identifying excess deaths: the potentially excess deaths column is the observed deaths for the year minus the expected deaths. This tells us the number of unexpected deaths in a particular year for a particular region. Finally, the data gives the percentage of potentially excess deaths, which expresses this excess in relative terms.
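The excess-death column described above is a simple difference. A minimal sketch, assuming each record holds integer observed and expected death counts; the function name is hypothetical and not part of the dataset:

```python
def potentially_excess_deaths(observed: int, expected: int) -> int:
    """Potentially excess deaths as defined above: observed deaths minus expected deaths.

    How the source dataset treats rows where observed < expected is not covered by
    this sketch; here the difference is simply allowed to go negative.
    """
    return observed - expected


# Illustrative values: the 2005 averages quoted under Question 2.
print(potentially_excess_deaths(2277, 1514))  # 763
```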
C. Data Refinement:
Category 1: Missing Value Removal
Pre-Refinement:
A few states have no non-metropolitan data. Since these rows cannot be used in the analysis, they are removed.
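The removal step can be sketched in Python as a filter over rows. This assumes rows are dictionaries and that a missing value appears as None or an empty string; the column names and the placeholder states "State A" and "State B" are hypothetical:

```python
def drop_rows_with_missing(rows: list[dict], fields: tuple[str, ...]) -> list[dict]:
    """Keep only rows where every listed field has a value (not None and not an empty string)."""
    return [row for row in rows if all(row.get(f) not in (None, "") for f in fields)]


# Hypothetical rows; "State A" stands in for a state with no non-metropolitan data.
rows = [
    {"State": "State A", "Locality": "Non-Metropolitan", "Observed Deaths": None},
    {"State": "State B", "Locality": "Non-Metropolitan", "Observed Deaths": 120},
]
print(drop_rows_with_missing(rows, ("Observed Deaths",)))
# [{'State': 'State B', 'Locality': 'Non-Metropolitan', 'Observed Deaths': 120}]
```

A value of 0 is kept, since only None and empty strings count as missing here.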
Post-Refinement:
Category 2: Improper column name
Pre-Refinement:
The column is named “Potentially Excess Death”, which does not describe its content well, so it is renamed to “Excess Death Observed”.
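Outside of the analytics tool, the same rename can be sketched in Python over dictionary rows. This is an illustration under that assumption, not the tool's own operation:

```python
def rename_column(rows: list[dict], old: str, new: str) -> list[dict]:
    """Return copies of the rows with the key `old` renamed to `new`; other keys are untouched."""
    renamed = []
    for row in rows:
        row = dict(row)  # copy so the caller's rows are not mutated
        if old in row:
            row[new] = row.pop(old)
        renamed.append(row)
    return renamed


rows = [{"State": "State A", "Potentially Excess Death": 42}]  # hypothetical row
print(rename_column(rows, "Potentially Excess Death", "Excess Death Observed"))
# [{'State': 'State A', 'Excess Death Observed': 42}]
```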
Post-Refinement:
Category 3: Setting conditions/Filters
A) For regular dataset
Pre-Refinement:
The “locality” field contains the values Metropolitan, Non-Metropolitan and “ALL”. Because “ALL” is the combination of Metropolitan and Non-Metropolitan, those rows are excluded so that the same deaths are not counted twice.
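The reason for this filter shows up when summing: keeping the “ALL” rows would double every total. A minimal Python sketch, assuming dictionary rows with a "Locality" key (the row values are hypothetical):

```python
def exclude_all_locality(rows: list[dict]) -> list[dict]:
    """Drop rows whose Locality is "ALL" (the metro + non-metro total) so deaths are not counted twice."""
    return [row for row in rows if row["Locality"].strip().upper() != "ALL"]


rows = [  # hypothetical rows for one state, year and cause
    {"Locality": "Metropolitan", "Observed Deaths": 900},
    {"Locality": "Non-Metropolitan", "Observed Deaths": 100},
    {"Locality": "ALL", "Observed Deaths": 1000},
]
kept = exclude_all_locality(rows)
print(sum(row["Observed Deaths"] for row in kept))  # 1000, not 2000
```

The comparison is case-insensitive, so a value spelled “All” is excluded as well.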
Post-Refinement:
B) For Twitter Dataset:
Because the analysis covers only the United States of America, the tweets matching the hashtags are restricted to those from the United States.
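The same condition can be sketched as a filter in Python. The field name "Author Country" and the value "United States" are assumptions; the report does not describe the schema of the Twitter data:

```python
def us_tweets_only(tweets: list[dict], country_field: str = "Author Country") -> list[dict]:
    """Keep tweets whose country field is the United States; tweets without a country are dropped."""
    return [t for t in tweets if t.get(country_field) == "United States"]


tweets = [  # hypothetical tweet records
    {"Hashtag": "#Cancer", "Author Country": "United States"},
    {"Hashtag": "#Cancer", "Author Country": "Canada"},
    {"Hashtag": "#Stroke"},
]
print(len(us_tweets_only(tweets)))  # 1
```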
Category 4: Correcting Misfed Values
Pre-Refinement:
The state name Arizona is misspelled in the data, so the misspelled values are corrected to “Arizona”.
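A correction like this is usually a lookup table of known bad values. A minimal Python sketch; the misspelling "Arizonia" is hypothetical, since the report does not say how the name was misspelled:

```python
# Hypothetical misspelling: the report does not say how "Arizona" was misspelled.
STATE_NAME_FIXES = {"Arizonia": "Arizona"}


def correct_state_names(rows: list[dict], fixes: dict[str, str] = STATE_NAME_FIXES) -> list[dict]:
    """Return copies of the rows with known misspelled state names replaced."""
    corrected = []
    for row in rows:
        row = dict(row)  # copy so the caller's rows are not mutated
        row["State"] = fixes.get(row["State"], row["State"])
        corrected.append(row)
    return corrected


rows = [{"State": "Arizonia"}, {"State": "Texas"}]
print([row["State"] for row in correct_state_names(rows)])  # ['Arizona', 'Texas']
```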
Post-Refinement:
Calculated Field:
Formula Used: (Observed Death/Population) * 100
This calculated field expresses the observed deaths as a percentage of the population, for both metropolitan and non-metropolitan areas in each state. It tells us what percentage of the population died from these causes.
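The formula above translates directly into a small Python function. The guard against a non-positive population is an added assumption, and the example numbers are hypothetical:

```python
def observed_death_percentage(observed_deaths: float, population: float) -> float:
    """(Observed Death / Population) * 100, the calculated field described above."""
    if population <= 0:
        raise ValueError("population must be positive")
    return observed_deaths / population * 100


# Hypothetical area: 2,500 observed deaths in a population of 1,000,000.
print(round(observed_death_percentage(2500, 1_000_000), 4))  # 0.25
```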
Data Grouping:
This grouping assigns each US state to the region in which it lies. Four groups are created (East Coast, West Coast, Central East and Central West), and each state is placed in the corresponding group.
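A grouping like this amounts to a state-to-region lookup. The report does not list which states fall into each group, so the partial mapping below is an illustrative assumption, not the author's actual assignment:

```python
from collections import defaultdict

# Illustrative, partial assignments (assumption): the report does not list the states in each group.
REGION_BY_STATE = {
    "California": "West Coast",
    "Oregon": "West Coast",
    "Washington": "West Coast",
    "New York": "East Coast",
    "New Jersey": "East Coast",
    "Florida": "East Coast",
    "Texas": "Central West",
    "Colorado": "Central West",
    "Illinois": "Central East",
    "Ohio": "Central East",
}


def group_states_by_region(states: list[str]) -> dict[str, list[str]]:
    """Group state names by region; states missing from the mapping go to "Unassigned"."""
    groups: dict[str, list[str]] = defaultdict(list)
    for state in states:
        groups[REGION_BY_STATE.get(state, "Unassigned")].append(state)
    return dict(groups)


print(group_states_by_region(["California", "Texas", "Ohio", "Florida", "Alaska"]))
```

Sending unmapped states to "Unassigned" makes gaps in the mapping visible instead of silently dropping rows.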
D. Data Quality:
Before the empty values were removed, the dataset had a data quality score of 77. After removing them and applying the further refinements, the score rose to 80.
These screenshots show the change in the column-level quality of the data before and after refinement.
E. Data Exploration:
Question 1:
How does the number of Age Range Compare by Cause of Death?
This visualization shows which cause of death accounts for the most entries over the whole span from 2005 to 2015. The top three causes are unintentional injury, cancer and heart disease, in that order. Unintentional injury tops the list with a count of 26,400, followed by cancer with 26,370 and heart disease with 26,346. Stroke is fourth with 24,414, and chronic lower respiratory disease is fifth with 24,147. Because the measure is the number of Age Range entries, these figures count records in the dataset rather than individual deaths.
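Counting rows per cause and summing observed deaths per cause can rank the causes differently. The Python sketch below makes that distinction concrete, assuming dictionary rows with "Cause of Death" and "Observed Deaths" keys; the data is hypothetical:

```python
from collections import Counter


def totals_per_cause(rows: list[dict]) -> tuple[Counter, Counter]:
    """Return (number of rows, sum of observed deaths) per cause of death."""
    row_counts: Counter = Counter()
    death_sums: Counter = Counter()
    for row in rows:
        cause = row["Cause of Death"]
        row_counts[cause] += 1
        death_sums[cause] += row["Observed Deaths"]
    return row_counts, death_sums


rows = [  # hypothetical rows
    {"Cause of Death": "Cancer", "Observed Deaths": 500},
    {"Cause of Death": "Unintentional Injury", "Observed Deaths": 40},
    {"Cause of Death": "Unintentional Injury", "Observed Deaths": 60},
]
counts, deaths = totals_per_cause(rows)
print(counts.most_common(1), deaths.most_common(1))
# [('Unintentional Injury', 2)] [('Cancer', 500)]
```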
Question 2:
What is the trend of Expected death and Observed death over Years?
This visualization shows how deaths have trended over the years. The gap between expected and observed deaths was largest in the earlier years and has narrowed since. In 2005 the average observed deaths were 2,277 against 1,514 expected, a gap of 763, compared with a gap of 644 in 2015. The alarming sign is that, although the narrowing gap suggests that observed deaths now track the expected deaths more closely, the average number of deaths first decreased and has since been rising steadily, from a low of 2,233 to 2,390 in a span of five years.
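The yearly averages and the gap between them can be reproduced with a group-by over years. The Python sketch below assumes dictionary rows with "Year", "Observed Deaths" and "Expected Deaths" keys. The two rows are hypothetical, chosen so that their 2005 averages match the figures quoted above:

```python
from collections import defaultdict
from statistics import mean


def yearly_averages(rows: list[dict]) -> dict[int, tuple[float, float, float]]:
    """Map each year to (average observed, average expected, observed minus expected)."""
    by_year = defaultdict(lambda: ([], []))  # year -> (observed list, expected list)
    for row in rows:
        observed, expected = by_year[row["Year"]]
        observed.append(row["Observed Deaths"])
        expected.append(row["Expected Deaths"])
    result = {}
    for year in sorted(by_year):
        avg_obs, avg_exp = mean(by_year[year][0]), mean(by_year[year][1])
        result[year] = (avg_obs, avg_exp, avg_obs - avg_exp)
    return result


rows = [  # hypothetical rows chosen so that the 2005 averages match the text
    {"Year": 2005, "Observed Deaths": 2200, "Expected Deaths": 1500},
    {"Year": 2005, "Observed Deaths": 2354, "Expected Deaths": 1528},
]
print(yearly_averages(rows)[2005])
```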
Question 3:
How do the values of Observed Death compare by Year and Age Range?
To drill further into the previous visualization, this analysis identifies which age ranges account for the rise in the average number of deaths over the years. The visualization shows the average observed deaths for each age range. The average for the “0-84” range decreased from 2005 to 2010 and has risen since, from 4,479 in 2010 to 4,760 in 2015, so the yearly average grew by 281 deaths. A similar rise appears in the “0-79” range, from 3,581 to 3,901, an alarming increase of 320. For the lower ranges “0-49” and “0-59”, by contrast, the change is small, only about 10 to 20. Since every range starts at age 0, this suggests that the increase in deaths over the years comes mainly from the older ages covered by the wider ranges rather than from people under 60.
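The changes quoted above are differences between two yearly averages for the same age range. A small Python sketch of that arithmetic, assuming the averages are keyed by (age range, year); the input values are the ones quoted in the text:

```python
def change_in_average(averages: dict[tuple[str, int], float], start: int, end: int) -> dict[str, float]:
    """For each age range present in both years, return average(end) - average(start)."""
    age_ranges = sorted({age for age, _ in averages})
    return {
        age: averages[(age, end)] - averages[(age, start)]
        for age in age_ranges
        if (age, start) in averages and (age, end) in averages
    }


# Yearly averages of observed deaths quoted in the text above.
averages = {
    ("0-84", 2010): 4479, ("0-84", 2015): 4760,
    ("0-79", 2010): 3581, ("0-79", 2015): 3901,
}
print(change_in_average(averages, 2010, 2015))  # {'0-79': 320, '0-84': 281}
```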
Question 4:
What is the relationship between Population and Observed Deaths by State?
This analysis shows the relationship between a state's population and its observed deaths. The analysis is filtered to the top 10 states. Unsurprisingly, California, the most populous state, has the highest average observed deaths, followed by Texas, another large state. New York, Florida and New Jersey take the third, fourth and fifth positions respectively. Interestingly, although these three states do not differ starkly in population, Florida's observed deaths are noticeably higher than those of New York and New Jersey, with New Jersey the lowest of the three. Florida, despite a smaller population than Texas, has only slightly fewer observed deaths than Texas. The last five states in the list are Illinois, Pennsylvania, Ohio, Michigan and North Carolina. Notably, Illinois has a larger population than Pennsylvania and Ohio but fewer observed deaths than either. This may mean that health care facilities in Illinois offer better service than those in the other two states, although other factors, such as differences in the age of each state's population, could also explain the gap.
Question 5:
How do the values of Observed Deaths compare by States Based on Region and Locality?
In this visualization we analyze observed deaths by region. Using the regional grouping created earlier, the country is divided into four regions: East Coast, Central East, Central West and West Coast. An interesting finding is that the West Coast, which includes California (the state with the largest observed deaths and population in the previous analysis), has lower observed deaths than the East Coast. The East Coast tops the list with the highest average observed deaths in both metropolitan and non-metropolitan areas (5,924 and 1,729 respectively). The gap between metropolitan and non-metropolitan observed deaths is also larger on the East Coast than on the West Coast, 4,195 versus 2,183, a difference of 2,012 deaths. This might suggest that health care facilities on the East Coast are somewhat inferior to those on the West Coast, although the comparison uses average observed deaths rather than rates adjusted for population.
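The regional comparison averages observed deaths per (region, locality) pair and then subtracts the non-metropolitan average from the metropolitan one. A Python sketch, assuming dictionary rows with "Region", "Locality" and "Observed Deaths" keys and the locality labels used in this report. The final lines redo the arithmetic with the figures quoted above, since the report gives only the gap for the West Coast, not its two averages:

```python
from collections import defaultdict
from statistics import mean


def average_by_region_and_locality(rows: list[dict]) -> dict[tuple[str, str], float]:
    """Average observed deaths for each (region, locality) pair."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["Region"], row["Locality"])].append(row["Observed Deaths"])
    return {key: mean(values) for key, values in groups.items()}


def metro_gap(averages: dict[tuple[str, str], float], region: str) -> float:
    """Metropolitan average minus non-metropolitan average for one region."""
    return averages[(region, "Metropolitan")] - averages[(region, "Non-Metropolitan")]


# East Coast averages quoted above, and the reported West Coast gap.
east = {("East Coast", "Metropolitan"): 5924, ("East Coast", "Non-Metropolitan"): 1729}
east_gap = metro_gap(east, "East Coast")
west_gap = 2183
print(east_gap, east_gap - west_gap)  # 4195 2012
```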
Question 6:
What is the Breakdown of the number of Author name by Matching Hashtags?
This simple pie chart of social media (Twitter) data shows what people have been talking about. Recalling the first visualization, in which unintentional injury ranked first among the causes of death, it is surprising that nobody has tweeted about unintentional injury. The conversation on social media has centered on cancer, which has been on the rise since 2010, followed by stroke and heart disease. Notably, there have been no tweets about lower respiratory disease either.
Question 7:
How does the number of Author Name compare by Author State and Matching Hashtags?
To find out in which states people have been most active in tweeting, we analyze further. It is no surprise that California tops the top-10 list, with its dense population the likely reason. A closer look at this chart shows that it is consistent with the analysis in Question 4, where population and observed deaths were highest in California, Texas, New York and Florida: most tweets come from the same states that recorded the highest observed deaths. Interestingly, although New Jersey was among the top 5 states for observed deaths, it is not one of the states that tweet much about these deaths.
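Both Twitter charts reduce to counting records by one or two fields. A minimal Python sketch, assuming one record per tweet with hypothetical "Hashtag" and "Author State" keys (the tool's actual field names may differ):

```python
from collections import Counter


def tweets_per_hashtag(tweets: list[dict]) -> Counter:
    """Number of tweets for each matching hashtag (the pie chart in Question 6)."""
    return Counter(t["Hashtag"] for t in tweets)


def top_author_states(tweets: list[dict], n: int = 10) -> list[tuple[str, int]]:
    """The n author states with the most tweets (the top-10 chart in Question 7)."""
    return Counter(t["Author State"] for t in tweets).most_common(n)


tweets = [  # hypothetical tweet records
    {"Hashtag": "#Cancer", "Author State": "California"},
    {"Hashtag": "#Cancer", "Author State": "California"},
    {"Hashtag": "#Stroke", "Author State": "Texas"},
]
print(top_author_states(tweets, n=1))  # [('California', 2)]
print(tweets_per_hashtag(tweets)["#UnintentionalInjury"])  # 0: a Counter returns 0 for absent keys
```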
F. Prediction:
Fig F.1
Fig F.2
Fig F.3
Fig F.4
Fig F.5
Fig F.6
The observed death predictor aims to predict the number of observed deaths in the future. It takes the following columns as input: Year, State, HHS Region, Population, Expected Deaths, Excess Death Observed, Age Range, Locality and Percent Potential Excess Death. The predictive strength is only 21.7%, because most of the fields are not closely related to one another. The prediction indicates that observed deaths are most strongly influenced by six top relationships, which involve the columns Expected Death, Excess Death Observed and Population. Fig F.4 shows that Observed Death is modeled by linear regression on Expected Death and Excess Death Observed. The decision tree in Fig F.2 shows that observed deaths are influenced most by Expected Death together with seven other columns, and these columns give the highest predictive strength, 21.7%. The seven other columns include HHS Region, Excess Death Observed, Percent Potential Excess Death, Population, Age Range and Cause of Death. The decision table in Fig F.3 shows that observed death is a continuous target, so the algorithm Watson uses for prediction is a CHAID regression tree (Fig F.6). The second strongest prediction uses the combination of Excess Death Observed and HHS Region, with a predictive strength of 18.6%; it also uses a regression algorithm, linear regression (ANOVA). Another two-field prediction, using HHS Region and Expected Death together, predicts observed deaths with a strength of 13.9%. The details of this prediction (Fig F.5) show that, because observed death is a continuous value, the same linear regression (ANOVA) algorithm is used. Thus, of the 12 fields used for prediction, only 8 have a meaningful influence on predicting observed deaths in the states, and the remaining 4 have no impact on the target. Because the fields are not strongly correlated with one another, the predictive strength for observed deaths remains low.
G. Dashboard:
The dashboard shows four important visualizations, two of which are analyzed in depth in the exploration section. The first visualization shows which cause of death ranks highest: unintentional injury comes first, followed by cancer and heart disease. The second visualization, a pie chart, shows the top 10 states by excess deaths observed. The top 5 include Texas, Florida, Ohio and California. Although California has the largest population and the highest observed deaths, it ranks fifth in this list, which suggests that its observed deaths stay closer to its expected deaths than those of the states ranked above it. The third visualization, a scatter plot analyzed in depth in Question 4, shows the relationship between population and observed deaths by state; the top 5 states here are California, Texas, New York, Florida and New Jersey. The final visualization shows the number of tweets for each hashtag, broken down by gender. It shows more tweets about cancer from women than from men, whereas for heart disease and stroke there are more tweets from men than from women. This may be consistent with men being more prone to heart disease and stroke and women more prone to cancer, although tweet counts are only an indirect signal. Overall, this analysis gives an in-depth review of the deaths that have occurred in the United States of America, covering the major causes by age range and state and the trends in deaths over the years. It can serve as guidance for health care facilities to tailor their services to each age range and cause of death.

Digi Khata Problem along complete plan.pptxTanveerAhmed817946
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改atducpo
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service LucknowAminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknowmakika9823
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 

Recently uploaded (20)

定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project Presentation
 
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
Digi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxDigi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptx
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service LucknowAminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 

United States Death Cause Analysis

RoshikGanesan CIS 5810
Death Cause Analysis in United States
Roshik Ganesan
305641224
California State University Los Angeles
A. Data Sets:
URL:
https://catalog.data.gov/dataset/nchs-potentially-excess-deaths-from-the-five-leading-causes-of-death
Twitter Hashtags:
#Cancer, #lowerrespiratorydiesease, #HeartDisease, #Stroke, #UnintentionalInjury.

B. Data Description:
This dataset covers potentially excess deaths in the United States of America and is richly detailed. It spans the years 2005 to 2015, a little over a decade. With that much history it contains more than 2 million records, which gives us a good opportunity to drill down and gain better insights. It also records the cause of death, limited to the five leading causes of death in the United States: cancer, chronic lower respiratory disease, heart disease, stroke and unintentional injury.

The dataset provides the state in which each death occurred, so the results can be analyzed by geographic location. State abbreviations are included for easier reading and analysis. Each state also carries an HHS region code, which the Department of Health and Human Services assigns to identify the HHS region the state belongs to. The age group of the deceased is provided, which helps in analyzing the major cause of death in each age group by state. The dataset drills deeper still by recording whether the death occurred in a metropolitan or a non-metropolitan area, which helps us compare the facilities within each state based on these counts. It then gives the number of observed deaths in each state for metropolitan areas, non-metropolitan areas and both combined. The population column gives the population of the area in that year. The dataset also gives the expected number of deaths for the year, which is what allows excess deaths to be identified. The Potentially Excess Death column is the observed deaths for the year minus the expected deaths for the year. This tells us how many unexpected deaths occurred in a particular year in a particular region. Finally, the data provides the percentage of potentially excess deaths.

C. Data Refinement:
Category 1: Missing Value Removal
Pre-Refinement: A few states have no non-metropolitan data. Because these states cannot be used for the analyses, they are removed.
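The excess-death definition and the Category 1 removal above can be sketched in plain Python. This is a minimal, hypothetical sketch, not the Watson Analytics refinement that was actually used. It assumes each record is a dict keyed by column names modeled on this report ("Observed Deaths" and "Expected Deaths" are assumed spellings), and the state names and values are invented.

```python
def excess_deaths(observed: float, expected: float) -> float:
    """Potentially excess deaths: observed deaths minus expected deaths."""
    return observed - expected


def drop_missing(rows: list[dict], required: list[str]) -> list[dict]:
    """Keep only rows in which every required field is present and non-empty."""
    return [
        row for row in rows
        if all(row.get(field) not in (None, "") for field in required)
    ]


# Hypothetical records: "Example State B" has no non-metropolitan counts.
rows = [
    {"State": "Example State A", "Locality": "Nonmetropolitan",
     "Observed Deaths": 120.0, "Expected Deaths": 90.0},
    {"State": "Example State B", "Locality": "Nonmetropolitan",
     "Observed Deaths": None, "Expected Deaths": None},
]

clean = drop_missing(rows, ["Observed Deaths", "Expected Deaths"])
for row in clean:
    # Derive the excess-death column for each remaining row.
    row["Potentially Excess Death"] = excess_deaths(
        row["Observed Deaths"], row["Expected Deaths"]
    )

print(len(clean), clean[0]["Potentially Excess Death"])  # prints: 1 30.0
```

The check runs before the subtraction, so an incomplete row is never turned into a misleading excess-death value.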
Post-Refinement:
Category 2: Improper Column Name
Pre-Refinement: The column is improperly named "Potentially Excess Death", so it is renamed to "Excess Death Observed".
Post-Refinement:
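The Category 2 rename moves each record's value from the old key to the new one. Here is a minimal sketch, again assuming hypothetical dict-per-row records. Only the two column names come from this report.

```python
def rename_column(rows: list[dict], old: str, new: str) -> list[dict]:
    """Return copies of the rows with the key `old` renamed to `new`.

    Rows without `old` are copied unchanged. The renamed key moves to the
    end of each dict's key order.
    """
    renamed = []
    for row in rows:
        updated = dict(row)
        if old in updated:
            updated[new] = updated.pop(old)
        renamed.append(updated)
    return renamed


rows = [{"State": "Example State A", "Potentially Excess Death": 30.0}]
renamed = rename_column(rows, "Potentially Excess Death", "Excess Death Observed")
print(renamed)  # prints: [{'State': 'Example State A', 'Excess Death Observed': 30.0}]
```

The function returns copies instead of editing in place, so the original records stay available for comparison with the pre-refinement state.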
Category 3: Setting Conditions/Filters
A) For the regular dataset:
Pre-Refinement: The "Locality" field contains Metropolitan, Non-Metropolitan and "ALL". Because "ALL" is the consolidation of Metropolitan and Non-Metropolitan, it is excluded.
Post-Refinement:
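The Category 3 filter keeps only the Metropolitan and Non-Metropolitan rows, so the consolidated "ALL" rows are not double-counted. This is a minimal sketch with invented records. The comparison is case-insensitive because the exact spelling of the value in the raw file is not certain; that choice is an assumption.

```python
def exclude_consolidated(rows: list[dict], field: str = "Locality") -> list[dict]:
    """Drop rows whose locality is the consolidated 'ALL' total (case-insensitive)."""
    return [row for row in rows if str(row.get(field, "")).strip().lower() != "all"]


rows = [
    {"State": "Example State A", "Locality": "Metropolitan", "Observed Deaths": 900.0},
    {"State": "Example State A", "Locality": "Nonmetropolitan", "Observed Deaths": 100.0},
    {"State": "Example State A", "Locality": "All", "Observed Deaths": 1000.0},
]

kept = exclude_consolidated(rows)
print([row["Locality"] for row in kept])  # prints: ['Metropolitan', 'Nonmetropolitan']
# The two remaining rows already add up to the dropped "All" total.
print(sum(row["Observed Deaths"] for row in kept))  # prints: 1000.0
```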
B) For the Twitter dataset:
Because the analysis covers only the United States of America, the Twitter hashtag data is filtered to the United States.
Category 4: Correcting Misfed Values
Pre-Refinement: The state name Arizona has been misspelled, so it is corrected.
Post-Refinement:
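Category 4 can be done with a lookup-table correction. The report does not show the misspelled form of Arizona, so the key "Arizonia" below is purely hypothetical. Only the correct name comes from the report.

```python
# Hypothetical misspelling -> correct value; the actual typo is not shown in the report.
CORRECTIONS = {"Arizonia": "Arizona"}


def correct_values(rows: list[dict], field: str, corrections: dict[str, str]) -> list[dict]:
    """Return copies of the rows with misfed values in `field` replaced."""
    fixed = []
    for row in rows:
        updated = dict(row)
        value = updated.get(field)
        if value in corrections:
            updated[field] = corrections[value]
        fixed.append(updated)
    return fixed


rows = [{"State": "Arizonia"}, {"State": "California"}]
fixed = correct_values(rows, "State", CORRECTIONS)
print([row["State"] for row in fixed])  # prints: ['Arizona', 'California']
```

Keeping the corrections in one table makes it easy to add further misfed values if more turn up.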
Calculated Field:
Formula Used: (Observed Death / Population) * 100
This calculated field expresses the observed deaths in each state as a percentage of the population, for both metropolitan and non-metropolitan areas. It tells us what percentage of the total population died.
Data Grouping:
This data grouping assigns each state of the USA to the region in which it lies. Four major groups are created: East Coast, West Coast, Central East and Central West. Each state is placed in the corresponding group.

D. Data Quality:
Before the empty values were eliminated, the dataset had a data quality score of 77. After eliminating the empty values and carrying out the further refinements, the data quality rose to 80.
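The calculated field and the regional grouping above can be sketched together. The percentage follows the report's formula. The region table is only an assumption: the report places California on the West Coast but does not list the other assignments, so every other entry below is hypothetical. The output field names "Percent Observed Death" and "Region" are also illustrative, and all values are invented.

```python
# Only California's region is stated in the report; the other entries are assumptions.
REGION_BY_STATE = {
    "California": "West Coast",
    "New York": "East Coast",    # assumed
    "Ohio": "Central East",      # assumed
    "Texas": "Central West",     # assumed
}


def percent_observed(observed: float, population: float) -> float:
    """(Observed Death / Population) * 100, computed as 100 * observed / population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 100.0 * observed / population


def add_derived_fields(rows: list[dict]) -> list[dict]:
    """Return copies of the rows with the percentage and region fields added."""
    out = []
    for row in rows:
        updated = dict(row)
        updated["Percent Observed Death"] = percent_observed(
            row["Observed Deaths"], row["Population"]
        )
        # States missing from the table are flagged rather than guessed.
        updated["Region"] = REGION_BY_STATE.get(row["State"], "Unassigned")
        out.append(updated)
    return out


rows = [
    {"State": "California", "Observed Deaths": 50.0, "Population": 1000.0},
    {"State": "Example State A", "Observed Deaths": 3.0, "Population": 600.0},
]
derived = add_derived_fields(rows)
for row in derived:
    print(row["State"], row["Region"], row["Percent Observed Death"])
```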
These screenshots show the change in column quality before and after refinement.

E. Data Exploration:
Question 1: How does the number of Age Range compare by Cause of Death?
This visualization shows which ailment caused the most deaths over the entire 10-year span. The top three causes are unintentional injury, cancer and heart disease, in that order. Unintentional injury tops the list with a total of 26400 deaths over the 10 years. Cancer caused 26370 deaths and heart disease 26346 deaths over the same period. Stroke ranks fourth with 24414 deaths, followed by chronic lower respiratory disease with 24147 deaths.

Question 2: What is the trend of Expected Death and Observed Death over the years?
This visualization shows the trend of deaths over the years. The gap between expected and observed deaths was significantly larger in the early years and has narrowed since. In 2005 the average observed deaths were 2277 and the average expected deaths 1514, a difference of 763. By 2015 the difference had fallen to 644. The narrowing gap suggests that expected deaths are now predicted more accurately. The alarming sign is that the number of deaths first decreased and has since been rising steadily. The average rose from a record low of 2233 to 2390 within 5 years.

Question 3: How do the values of Observed Death compare by Year and Age Range?
To drill deeper into the previous visualization, this analysis identifies which age ranges drove the increase in the average number of deaths over the years. The visualization shows the average deaths for each age range. The average for the age range "0-84" decreased from 2005 to 2010 and has risen since. Between 2010 and 2015 it went from 4479 to 4760, an increase of 281 in the average number of deaths per year. A similar increase appears for the age range "0-79", from 3581 to 3901, an alarming increase of 320. For the lower age ranges, "0-49" and "0-59", the difference is very small, about 10 to 20 on average. The lesson from this insight is that the older age ranges, from 49 to 89, account for the increase in deaths over the years.

Question 4: What is the relationship between Population and Observed Deaths by State?
This analysis shows the relationship between a state's population and its observed deaths, filtered to the top 10 states in the country. Unsurprisingly, California, which has the highest population, also has the highest average observed deaths. Texas, also a fairly large state, comes second. New York, Florida and New Jersey take the 3rd, 4th and 5th positions respectively. These 3 states do not differ starkly in population, yet the observed death rate for Florida is comparatively high next to New York and New Jersey. New Jersey is the lowest of the three. Florida has a lower population than Texas, yet its observed deaths differ only trivially from those of Texas. The last 5 states in the list are Illinois, Pennsylvania, Ohio, Michigan and North Carolina. Notably, Illinois has a higher population than Pennsylvania and Ohio but fewer observed deaths than either. This may mean that health care facilities in Illinois offer better service than those in the other 2 states.

Question 5: How do the values of Observed Deaths compare by State, based on Region and Locality?
This visualization analyzes observed deaths by the region of each state. Using the earlier regional grouping, the country is divided into 4 regions: East Coast, Central East, Central West and West Coast. In the previous analysis, California recorded the largest observed deaths and population. Interestingly, its region, the West Coast, has lower observed deaths than the East Coast. The East Coast tops the list with the highest average observed deaths in both the metropolitan (5924) and non-metropolitan (1729) localities. The gap between metropolitan and non-metropolitan observed deaths is also larger on the East Coast (4195) than on the West Coast (2183), a difference of 2012 deaths. This may be evidence that health care facilities on the East Coast are somewhat inferior to those on the West Coast.

Question 6: What is the breakdown of the number of Author Name by Matching Hashtags?
This simple pie chart of the social media (Twitter) data shows what people have been talking about. The first visualization showed that unintentional injury caused the most deaths, so it is surprising that nobody has tweeted about unintentional injury. The analysis shows that social media conversation has centered on cancer, which has been on the rise since 2010, followed by stroke and heart disease. There have been no tweets about lower respiratory disease either.

Question 7: How does the number of Author Name compare by Author State and Matching Hashtags?
This chart looks at which states' residents tweet most actively. It is no surprise that California tops the top-10 list, most likely because of its large population. A closer look shows that this chart is consistent with the analysis from Question 4. There, population and observed deaths were highest in California, Texas, New York and Florida, and most tweets come from those same states. Although New Jersey is among the top 5 states by observed deaths, it is not one of the states that tweets much about deaths.
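Questions 2 to 5 all come down to grouping rows by one or more fields, averaging Observed Deaths within each group, and comparing the averages. Here is a minimal standard-library sketch of that aggregation. The rows are invented, so the results are not the report's figures. The same subtraction, applied to the report's averages, gives the East Coast gap of 5924 minus 1729, which is 4195.

```python
from collections import defaultdict


def average_by(rows: list[dict], keys: tuple[str, ...], value: str) -> dict[tuple, float]:
    """Average `value` over rows grouped by the fields named in `keys`."""
    totals: dict[tuple, float] = defaultdict(float)
    counts: dict[tuple, int] = defaultdict(int)
    for row in rows:
        group = tuple(row[key] for key in keys)
        totals[group] += row[value]
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Invented rows for illustration only.
rows = [
    {"Region": "East Coast", "Locality": "Metropolitan", "Observed Deaths": 6000.0},
    {"Region": "East Coast", "Locality": "Metropolitan", "Observed Deaths": 5000.0},
    {"Region": "East Coast", "Locality": "Nonmetropolitan", "Observed Deaths": 1700.0},
    {"Region": "West Coast", "Locality": "Metropolitan", "Observed Deaths": 3000.0},
    {"Region": "West Coast", "Locality": "Nonmetropolitan", "Observed Deaths": 1000.0},
]

averages = average_by(rows, ("Region", "Locality"), "Observed Deaths")
for region in ("East Coast", "West Coast"):
    # Metropolitan minus non-metropolitan average, as compared in Question 5.
    gap = averages[(region, "Metropolitan")] - averages[(region, "Nonmetropolitan")]
    print(region, gap)
```

Grouping by ("Year",) instead gives the per-year averages of Question 2, and grouping by ("Year", "Age Range") gives those of Question 3.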
F. Prediction:
Fig F.1, Fig F.2, Fig F.3, Fig F.4, Fig F.5 and Fig F.6
The observed-death predictor aims to predict the number of observed deaths in the future. It takes the following columns as input: Year, State, HHS Region, Population, Expected Deaths, Excess Death Observed, Age Range, Locality and Percent Potentially Excess Death. It reaches a predictive strength of only 21.7%, because most of the values are not closely related to one another. The prediction shows that observed deaths are strongly influenced by the top 6 relationships, which involve the columns Expected Death, Excess Death Observed and Population. Fig F.4 shows that Observed Death is linearly regressed on Expected Death and Excess Death Observed. The decision tree in Fig F.2 shows that observed deaths are influenced most by Expected Death together with 7 other columns. The highest predictive strength, 21.7%, is achieved using those columns. The 7 other columns include HHS Region, Excess Death Observed, Percent Potential Excess Death, Population, Excess Death Observed, Age Range and Cause of Death. The decision table in Fig F.3 shows that observed death is a continuous target, so Watson uses a CHAID regression tree for the prediction (Fig F.6). The second strongest prediction combines Excess Death Observed and HHS Region and reaches a predictive strength of 18.6%. It also uses a regression algorithm, Linear Regression (ANOVA). Another two-field prediction, using HHS Region and Expected Death together, predicts observed deaths with a strength of 13.9%. The details of this prediction (Fig F.5) show that the same Linear Regression (ANOVA) algorithm is used, because observed death is a continuous value.
Thus, of the 12 fields used for prediction, only 8 potentially influence the prediction of observed deaths in the states. The remaining 4 have no impact on predicting the target. Because the columns do not appear to be highly correlated, the predictive strength for observed deaths tends to be low.
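The report does not explain how Watson Analytics computes its "predictive strength". As a rough, swapped-in analogue, the sketch below fits one target on one predictor by ordinary least squares and reports R², the share of the target's variance that the fitted line explains. This is not CHAID or Watson's ANOVA procedure, and the data is invented.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float, float]:
    """Ordinary least-squares fit y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares / total sum of squares).
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot else 1.0
    return slope, intercept, r_squared


# A predictor that explains the target perfectly gives R^2 = 1.
print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))  # prints: (2.0, 1.0, 1.0)
# A loosely related predictor gives a much lower R^2 (about 0.36 here).
print(fit_line([1, 2, 3, 4], [2, 1, 4, 3]))
```

A low R² in this sense matches the report's reading: when the inputs are only weakly related to the target, no fitted line explains much of its variation.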
G. Dashboard:
The dashboard shows 4 important visualizations, 2 of which are analyzed in depth in the exploration section. The first visualization shows which cause of death has taken the greatest toll. Unintentional injury has cost the most lives, followed by cancer and heart disease. The second visualization, a pie chart, shows the top 10 states with the highest excess deaths observed. The top 5 include Texas, Florida, Ohio and California. California has the highest population and the highest observed deaths, yet it ranks 5th on this list. This suggests that its expected deaths per year are predicted more accurately. The 3rd visualization is a scatter plot, analyzed in depth in Question 4, that shows the relationship between population and observed deaths by state. Its top 5 states are California, Texas, New York, Florida and New Jersey. The final visualization shows the number of tweets for each hashtag, broken down by gender. It shows more tweets about cancer from women than from men. For heart disease and stroke, men tweeted more than women. This may be consistent with the view that men are more prone to heart disease and stroke, whereas women are more prone to cancer.
Thus, this analysis gives an in-depth review of the deaths that have occurred in the United States of America. We have analyzed the major causes by age range and state, and the trends in deaths over the years. This analysis could serve as good guidance for health care facilities in tailoring their services to each age range and cause of death.
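The gender breakdown of tweets per hashtag in the final dashboard visualization is a count over (hashtag, gender) pairs. Here is a minimal sketch using `collections.Counter`. The tweet records and their field names are hypothetical and do not reflect the actual Twitter data schema used in Watson Analytics.

```python
from collections import Counter

# Hypothetical tweet records; field names and values are illustrative only.
tweets = [
    {"hashtag": "#Cancer", "gender": "female"},
    {"hashtag": "#Cancer", "gender": "female"},
    {"hashtag": "#Cancer", "gender": "male"},
    {"hashtag": "#HeartDisease", "gender": "male"},
    {"hashtag": "#Stroke", "gender": "male"},
    {"hashtag": "#Stroke", "gender": "female"},
]

# Count tweets per (hashtag, gender) pair; missing pairs read as 0.
counts = Counter((t["hashtag"], t["gender"]) for t in tweets)

for hashtag in sorted({t["hashtag"] for t in tweets}):
    female = counts[(hashtag, "female")]
    male = counts[(hashtag, "male")]
    # Report which gender tweeted more about this hashtag, or a tie.
    leader = "female" if female > male else "male" if male > female else "tie"
    print(hashtag, female, male, leader)
```

Counting by (author state, hashtag) instead gives the per-state breakdown used in Question 7.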