IS6030: Data Management-Individual Project
Topic: Drug-related deaths in the state of Connecticut
A. Data Description:
This dataset lists each accidental death associated with drug overdose in Connecticut from
2012 to June 2016. In this dataset, the columns from ‘Heroin’ to ‘Any Opioid’ have the values Y or Null,
which indicate whether the particular drug was a cause of death or not. A death can be caused by
one or more drugs. The data was derived from an investigation by the Office of the Chief Medical Examiner,
which includes the toxicity report, the death certificate, and a scene investigation. I got this data from
the catalog.data.gov website, and the link is as follows:
https://catalog.data.gov/dataset/accidental-drug-related-deaths-january-2012-sept-2015
The following table describes the data types stored in each of the columns and their precision and length:
Table: 1
After importing the dataset into SQL Server, I made sure that all data types were appropriate and that the data was
imported correctly (the code for this is included in the code file). In the next step I did some basic checks
on important columns, such as finding the distinct values, the number of null records, and the maximum, minimum
and average values of the numerical variables:
Sex:
Race:
Death cause:
Death locations:
This data can be normalized using ‘Case Number’ as the primary key (this column was removed from the
dataset as it was not necessary for the analysis). The other columns, such as age, race and ‘ImmediateCauseA’,
can be put into different tables, with foreign keys in the main table.
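The normalization idea above can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in SQLite module as a stand-in for SQL Server, the table and column names (`race`, `death`, `case_number`, `race_id`) are hypothetical, and the rows are made up rather than taken from the dataset.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # ask SQLite to enforce foreign keys

# Hypothetical schema: one lookup table for race, one main table keyed by case number.
conn.executescript("""
CREATE TABLE race (
    race_id   INTEGER PRIMARY KEY,
    race_name TEXT NOT NULL UNIQUE
);
CREATE TABLE death (
    case_number TEXT PRIMARY KEY,
    age         INTEGER,
    sex         TEXT,
    race_id     INTEGER REFERENCES race(race_id)
);
""")

# Made-up rows, for illustration only.
conn.executemany(
    "INSERT INTO race (race_id, race_name) VALUES (?, ?)",
    [(1, "White"), (2, "Black")],
)
conn.executemany(
    "INSERT INTO death (case_number, age, sex, race_id) VALUES (?, ?, ?, ?)",
    [("C-001", 34, "Male", 1), ("C-002", 51, "Female", 2), ("C-003", 45, "Male", 1)],
)

# The race name is stored once and recovered through a join on the foreign key.
deaths_by_race = conn.execute("""
    SELECT r.race_name, COUNT(*)
    FROM death AS d
    JOIN race AS r ON r.race_id = d.race_id
    GROUP BY r.race_name
    ORDER BY r.race_name
""").fetchall()
print(deaths_by_race)
```

The same pattern would apply to any other repeated text column, for example the immediate cause of death.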
B. Data Issues:
There were many data issues that needed to be resolved before starting the analysis:
1. Null values: There were some null values in some columns of the dataset. As the number was
not very large (max: 7), these records were removed from the dataset. This was done in Excel.
2. Date format: While importing the dataset into Tableau, I found that the date format was not consistent (I did
not face this issue while importing the data into SQL). To solve this I created two more columns, for
year and month. (Before doing this, some year values were missing from the visualization due to the
improper format.)
3. Data structure: With the current data structure it was not possible to get the required visualizations
in Tableau. The data was restructured in Excel to get them.
4. Inconsistency in time frame: In order to compare the data across the years, the average death
count per month was used, as only six months of data is available for 2016.
Most of these operations were done using Excel. Functions such as ‘SUMIFS’, ‘CONCAT’,
‘RIGHT’, ‘MID’, ‘YEAR()’ and ‘MONTH()’ were used.
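As a rough sketch of steps 2 and 4, the snippet below derives year and month from dates in mixed formats and then computes the average death count per month, so that the six-month year 2016 can be compared with full years. The report did this in Excel; the Python version, the two date formats, the sample dates and the names `raw_dates` and `months_available` are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime

# Made-up dates in two assumed formats (the report does not say which formats occurred).
raw_dates = ["01/15/2015", "2015-03-02", "06/30/2016", "2016-01-09", "11/20/2015"]


def parse_date(text):
    """Try each assumed format in turn and return the first successful parse."""
    for fmt in ("%m/%d/%Y", "%Y-%m-%d"):
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {text!r}")


# Step 2: separate year and month columns derived from the parsed date.
year_month = [(d.year, d.month) for d in map(parse_date, raw_dates)]

# Step 4: divide each year's count by the number of months actually covered.
months_available = {2015: 12, 2016: 6}  # 2016 covers January to June only
deaths_per_year = Counter(year for year, _ in year_month)
avg_per_month = {
    year: count / months_available[year] for year, count in deaths_per_year.items()
}
print(avg_per_month)
```

Dividing by the months covered, rather than by 12, is what keeps the 2016 figure comparable with the earlier years.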
C. Data Analysis in SQL:
Total number of rows:
Total number of columns:
Number of deaths by year (the count for 2016 will be lower as it has only six months of data):
Number of deaths by sex:
Number of deaths by age bracket:
Max, min and average values for age:
Number of deaths by race:
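The kinds of queries listed above might look like the following sketch. The original queries were run in SQL Server and are in the separate code file; here SQLite (via Python's standard library) is used as a stand-in, and the table name `deaths`, its columns, the bracket boundaries and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE deaths (death_year INTEGER, sex TEXT, age INTEGER)")
# Made-up rows, not taken from the dataset.
conn.executemany(
    "INSERT INTO deaths VALUES (?, ?, ?)",
    [
        (2014, "Male", 23),
        (2014, "Female", 47),
        (2015, "Male", 35),
        (2015, "Male", 52),
        (2016, "Female", 41),
    ],
)

# Total number of rows.
total_rows = conn.execute("SELECT COUNT(*) FROM deaths").fetchone()[0]

# Number of deaths by year.
by_year = conn.execute(
    "SELECT death_year, COUNT(*) FROM deaths GROUP BY death_year ORDER BY death_year"
).fetchall()

# Number of deaths by age bracket, using a CASE expression to build the bracket.
by_bracket = conn.execute("""
    SELECT CASE
               WHEN age < 20 THEN 'under 20'
               WHEN age < 30 THEN '20-29'
               WHEN age < 40 THEN '30-39'
               WHEN age < 50 THEN '40-49'
               WHEN age < 60 THEN '50-59'
               ELSE '60 and over'
           END AS bracket,
           COUNT(*)
    FROM deaths
    GROUP BY bracket
    ORDER BY bracket
""").fetchall()

# Minimum, maximum and average age.
age_stats = conn.execute("SELECT MIN(age), MAX(age), AVG(age) FROM deaths").fetchone()

print(total_rows, by_year, by_bracket, age_stats)
```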
D. Primary Data Analysis using Tableau:
The average death count per month is increasing at an almost constant rate over the past 5 years:
Fig. 1
From Figures 2, 3 and 4, we can see that although the average number of deaths per month is highest
for White people, the areas with the maximum number of deaths (count of all deaths from 2012-2016) are
mainly concentrated near the locations where the population of Black, Hispanic and Latino people is dense:
Fig. 2
Fig. 3
Fig. 4
For all races except Black, Heroin was the leading cause of death; for Black people, Cocaine was
the leading cause:
Fig. 5
The average number of deaths per month is highest for the age group 40-49, and overall the age group 20-60 is
the primary victim:
Fig. 6
Heroin is the main cause of death, followed by cocaine:
Fig. 7
Compared to all other drugs, Fentanyl has the highest increase in deaths over the years. As we can
see from the figure below, the death count due to all other drugs increases steadily, but there is a jump
in the number of deaths due to Fentanyl (especially in 2016):
Fig. 8
From the following plot, we can clearly see that the areas with the maximum number of deaths are
concentrated near the locations where per capita income is quite low:
Fig. 9
The following is the graph of age vs total number of deaths from 2012-2016. From this graph we
can see that there is a strong positive correlation between age and number of deaths in the lower
spectrum of age and a strong negative correlation in the higher spectrum of age.
Fig. 10
E. Correlation and Regression Analysis using R-studio:
Let’s check the correlation and run the regression analysis on the same:
R-studio was used to run the statistical analysis on the data.
a. Correlation Analysis:
1. The following is the correlation between age (lower age group, 15-25) and the average number of
deaths per year (i.e. total number of deaths / 4.5, as the total number of years is 4.5):
0.9812866
2. The following is the correlation between age (middle age group, 26-44) and the average number of
deaths per year:
0.1022106
3. The following is the correlation between age (higher age group, 45-80) and the average number of
deaths per year:
-0.955015
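For reference, the correlation coefficients above are Pearson correlations between age and deaths per year. The report computed them in R-studio; the sketch below shows the same calculation in plain Python. The helper name `pearson` and the sample counts are assumptions for illustration and do not reproduce the values above.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up total death counts per age, for illustration only.
ages = [15, 16, 17, 18, 19]
total_deaths = [9, 18, 18, 27, 36]

# Average per year as defined in the report: total deaths divided by 4.5 years.
avg_per_year = [d / 4.5 for d in total_deaths]

r = pearson(ages, avg_per_year)
print(r)
```

Because correlation is unchanged by rescaling, dividing the totals by 4.5 does not alter the coefficient; it only matters later for the units of the regression model.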
b. Regression Analysis:
As we can see from the above values, there is a high correlation between age and the average
number of deaths per year in the lower and higher age ranges. Now we will run the regression analysis
(using R-studio) on these age groups:
1. Regression analysis on the lower age group (15-25):
The following is the plot of the lower age group vs average number of deaths per year:
Fig. 11
Let’s run the regression model on the data:
From the above output we can see that the p-values for both age and the intercept are less than 0.05. This
means that the ‘Beta’ coefficient for age is significantly different from 0 and age is a significant factor in the
regression model. As this is a simple linear regression model, we get the same p-value for the t-test on the
slope and for the F-test.
Also, the values of R-square and adjusted R-square are quite high, i.e. 0.9629 and 0.9583 respectively.
So, the final model that we generate from the above analysis:
Average number of deaths per year = 1.7576*(Age) - 28.1160
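To make the quantities in this output concrete, here is a minimal least-squares sketch in plain Python. It is not the R-studio code used for the report; the function name `fit_line` and the sample data are assumptions, and only the slope, intercept, R-square and adjusted R-square are computed (no p-values).

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x, with R-square values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    fitted = [intercept + slope * x for x in xs]
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r2 = 1 - ss_res / ss_tot
    # Adjusted R-square for a model with one predictor.
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)
    return slope, intercept, r2, adj_r2


# Made-up data, for illustration only.
sample_ages = list(range(15, 25))
sample_deaths = [0.2, 1.1, 2.0, 3.6, 5.1, 7.2, 8.4, 10.9, 12.0, 14.2]
slope, intercept, r2, adj_r2 = fit_line(sample_ages, sample_deaths)
print(slope, intercept, r2, adj_r2)

# Consistency check using values quoted in the report: in simple linear
# regression, R-square equals the squared correlation coefficient.
r_reported = 0.9812866
print(round(r_reported ** 2, 4))
```

The last line squares the correlation reported for the lower age group; the result rounds to the R-square value of 0.9629 quoted above, so the two sections of the analysis agree.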
Let us take a look at the plot of residuals vs fitted values:
Fig. 12
As we can see from the above plot, there is no specific pattern in the residuals; they are randomly
scattered. This means that we have captured most of the signal from the data in the deterministic part of
our model and what remains is just random noise.
Now, let’s check the normality of the residuals using the q-q plot. This is our assumption and we need to
validate it:
Fig. 13
We can clearly see that the above q-q plot is pretty much a straight line passing through 0, which validates
our assumption of normality of errors with mean 0 (as the line is passing through 0).
2. Regression analysis on the higher age group (45-80):
The following is the plot of the higher age group vs average number of deaths per year:
Fig. 14
Now, let’s run the regression model on the data:
From the above output we can see that the p-values for both age and the intercept are less than 0.05 for the
higher age group as well. This means that the ‘Beta’ coefficient for age is significantly different from 0 and
age is a significant factor in the regression model. As this is a simple linear regression model, we get the
same p-value for the t-test on the slope and for the F-test.
Also, the values of R-square and adjusted R-square are quite high, i.e. 0.9121 and 0.9089 respectively.
So, the final model that we generate from the above analysis:
Average number of deaths per year = (-0.91072)*(Age) + 65.06579
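As a quick worked example, the two fitted equations can be evaluated at a few ages. The coefficients below are copied from the report; the function names and the chosen ages are assumptions for illustration.

```python
def lower_group_model(age):
    """Fitted line reported for the lower age group (15-25)."""
    return 1.7576 * age - 28.1160


def higher_group_model(age):
    """Fitted line reported for the higher age group (45-80)."""
    return -0.91072 * age + 65.06579


# Predicted average number of deaths per year at a few illustrative ages.
for age in (20, 25):
    print(age, round(lower_group_model(age), 3))
for age in (50, 65, 80):
    print(age, round(higher_group_model(age), 3))

# Age at which each straight line crosses zero.
lower_zero_age = 28.1160 / 1.7576
higher_zero_age = 65.06579 / 0.91072
print(round(lower_zero_age, 2), round(higher_zero_age, 2))

# Squared correlation reported for the higher age group, for comparison with R-square.
print(round((-0.955015) ** 2, 4))
```

Two things are worth noting when reading these numbers. The squared correlation for the higher group rounds to the reported R-square of 0.9121. Also, a straight line is unbounded, so the higher-group line drops below zero before the top of its 45-80 range; predictions near the ends of each range should be read with that in mind.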
Let us take a look at the plot of residuals vs fitted values:
Fig. 15
As we can see from the above plot, there is a straight line of residuals in the lower region of fitted values,
but on the overall level it looks quite scattered. This means that we have captured most of the signal from
the data (specifically in the higher fitted value spectrum) in the deterministic part of our model and what
remains is just random noise.
Now, let’s check the normality of the residuals using the q-q plot. This is our assumption and we need to
validate it:
Fig. 16
We can see from the above plot that, apart from the curvature at the (-1) quantile, our plot is mostly a
straight line.
F. Key Findings and Insights:
1. The areas with the maximum number of deaths are concentrated near the locations where
per capita income is quite low.
2. The areas with the maximum number of deaths are mainly concentrated near the locations where
the population of Black, Hispanic and Latino people is dense, though the number of deaths by drug
is highest for White people.
3. For all races except Black, Heroin was the leading cause of death; for Black people it was
Cocaine.
4. Though Heroin is the main cause, Fentanyl has the highest rate of increase in the death count
over the years.
5. The average number of deaths per month is highest for the age group 40-49.
6. We could see peaks in the death count around age 30 and age 50, and there is a dip in the
death count around age 40.
G. Suggestions:
1. As we can clearly see, the age group 20-60, which is the backbone generation of any nation, is the
primary victim of drugs, and that is mainly due to low income, which in turn I think is due to
lack of education (which can provide them with decent jobs). This is a big concern as the number
is increasing every year, and the government needs to address this issue and plan to provide basic
education to these people, which can make them employable.
2. As Fentanyl has the highest growth in death count among the drugs, it is not enough to curb the supply of just
heroin or cocaine.
H. Challenges:
1. Many data issues needed to be resolved while plotting the data in Tableau. I learned various
functions in Excel to overcome them.
2. As there were too many variables in the data, it was difficult to carry out a structured
exploratory data analysis to gain meaningful insights. For example, variables like age, race and types of
drugs form a large number of combinations on which the trend of the death count could be
analyzed.
