SlideShare a Scribd company logo
1 of 45
Detecting Usage of Illegal Drugs with
Machine Learning
MSBA 2019
Business
Understanding
60% of Employers are not Drug Testing
“For some employers the cost to find a single drug user can be high… a positive test result cannot
determine use or impairment at work and [that there’s a] general lack of evidence that drug testing
has an impact on performance or safety.”
- Michael Frone in his book Alcohol and Illicit Drug Use in the Workforce and Workplace
According to the National Council on Alcoholism and Drug Dependence, Inc. (NCADD), drug abuse costs
employers $81 billion annually.
70 percent of the estimated 14.8
million Americans who use illegal
drugs are employed
Companies who drug test are spending money on
job candidates that are not taking drugs
Accommodations and Food
Services
Management
Professional, Scientific, and
Professional Services
Finance and Insurance
19.1%
Other Companies
who are not drug
testing because
money is being
wasted
12.1%
9.0%
6.5%
Past month illicit drug use among
adults ages 18-64 employed full
time, by job category
2008-2001, SAMHA, Center for Behavioral Health Statistics and Quality
National Survey on Drug use and Health
We aim to drug test people according to their
specific risk
Who is at risk for taking drugs?
Based on personality traits and use of legal substances such as
caffeine, alcohol, and nicotine
What drugs are they are risk for taking?
So that we can drug test accordingly and eliminate the cost
associated with sick days, lack of productivity, and inefficiencies due
to drug using employees
What type of drug test should they be given?
To eliminate over- or under- spending on drug testing
01
02
03
There are different costs associated with different
degrees of drug tests
Hair
$100
90+ Days
All
Drug Use
Urine
$50
~60 Days
Marijuana, Cocaine,
Amphetamines,
Opioids
Drug Use
Blood
$200
~3 Days
All
Drug Impairment
Saliva
$70
~7 Days
Marijuana, Cocaine,
Amphetamines,
Methamphetamines
Drug Impairment
Price
Detection
Window
Drugs Tested
Purpose
Data Understanding
UCI Drug Consumption Data Set
UCI DRUG
CONSUMPTION
Use of Legal Drugs
Alcohol, chocolate, caffeine,
nicotine
Use of Illegal Drugs
Amphetamines, amyl nitrite,
benzodiazepine, cannabis,
cocaine, crack, ecstasy, heroin,
ketamine, legal highs, LSD,
methadone, mushrooms, and
volatile substance abuse and
one fictitious drug (Semeron)
Personality Traits
Demographic Information
Level of education, age, gender,
country of residence, and
ethnicity
NEO-FFI-R (neuroticism,
extraversion, openness to
experience, agreeableness, and
conscientiousness), BIS-11
(impusivity), and ImpSS
(sensation seeking)
Drug Consumption Layout
Over half of the participants
in our data set has taken
drugs in the last year
59%
41%
The Methodology
Big Five Measures are used as Indicators
Drug consumptions has an effect on the Big Five
Scores
Data Prep
Our model building process for a
multi-classification problem
Deployment
Evaluation
Model
Creation
Clean
data
Reclassify
Frequency
Reclassify drug
usage to fit our
needs
Splitting,
resampling,
deleting
insignificant
variables
k-NN, logistic
regression, tree
induction
Determine best
model for each
prediction
Data Cleaning
01
Reclassify Drug Use
“Potential user” if taken in the last year or more recently
“Non use” if taken in the last decade or less recently
02
Resample for Rare Classes by Upsampling
Coke, Ketamine, LSD, Heroine, Meth, VSA
03
Delete Insignificant Data
Delete drugs that created to curb over-reporting of drugs (Semler)
04
Categorized Drugs as “Legal”
Cannabis, Coke, Crack, Ecstasy, Heroin, Ketamine, Legalh, LSD, Meth,
Mushrooms, VSA
05
Categorized Drugs as “Illegal”
Alcohol, Amphetamine, Amyls, Benzos, Caffeine, Nicotine
Model Building
Facets of our Model
Target Variable
“Risk” for each individual
illegal drug
A person is coded as “risk” if
they are predicted to have
taken a drug one year or
less ago and “not risk” if
they are predicted not to
have taken the drug in the
past year
Predictor Variables
Personality Traits
Legal Drug Use
Grid search determines proper
parameters and weights
Algorithms
K-NN
Logistic Regression
Decision Tree
Parameters chosen through
gridsearch, model chosen
through cross validation
We created individual models for each drug based on...
We chose the best model
for that specific illegal drug
based on which model had
the smallest recall and
highest AUC
Our goal is minimize false
negatives
Choosing Best Model
There was too little data to
minimize false predictions
beyond our values
We would prefer if the
false negative rate was
lower
Problems with Model
More data would better
inform the distinctions
between classes
Varied data sets would
eliminate overfitting to the
sample
Potential Model
Improvements
Evaluation
Evaluation Metrics Across All Target Variables
We measured the average evaluation metrics
after creating models that analyzed the drug use
risk for each individual drug...
01
Average AUC Score on all target
variables 82%
02
Average accuracy on all target
variables 94%
03 Average recall on all target variables
71%
04 Average F-Measure
75%
Cannabis
Model
Based on the AUC, the
logistic regression model is
the best predictor of the risk
for cannabis use.
This results in the collection
of 211 cannabis users out of
300 total users. The positive
recall is 70%.
Accuracy: 79%
AUC: 89%
Recall: 79%
Precision: 81%
F-Measure: 79%
AUC
Confusion Matrix Normalized Confusion Matrix
Cannabis Model in Accomodation and Food Services Industry
01
With the model, we can reduce 71% of
the total drug users 71%
02
We assume there are 14% of the food
industry that uses cannabis
14%
03 Total salary paid on the industry
$294BN
04 ROI on the industry $117M
Deployment
Using the model
will reduce costs
associated with
drug using
employees
Non-Drug Testing
Companies
Using the model
will reduce number
of candidates
tested, resulting in
$50-100 saved per
untested candidate
Drug Testing
Companies
Based on the
model for cannabis,
there is a drop in
risk of from 53% (is
taking drug) to 16%
(predicted not
taking and is taking
the drug)
Biggest Risk
It may be unethical
to selectively drug
test people, or to
target people
based on their
personalities or
demographics
Ethical
Considerations The biggest
improvement to the
model would be
increasing amount
and diversity of
data to eventually
lower the false
negative rate
Model Needs
Deployment Considerations
Thank you
Our Services
To eliminate the costs associated
with employing someone who
uses drugs
Getting
People Drug
Tested
Keeping Cost
of Drug Tests
Low
To curb the fear of over spending
on drug testing, specify who they
need to drug test and what that
individual requires
Only invest time and money into
testing the candidates who are “at
risk” for drug use
Pinpoint the people to be tested
Different drugs require different tests, and
those tests come at different prices
Specify the type of drug test each
individual needs
By capturing the job candidates that use drugs
before they are hired, costs associated with
hiring drug users will be diminished
Minimize cost of drug users in
the workplace
Avoiding overtesting candidates that are
not likely to take drugs
Only test “at risk” candidates
Our Services
the total salary pay on accommodation and food services industry is 21903 * 13.4M = 293.5 BN, and
about 19.1% percentage of the salary is paid to workers with marijuana smoking history, which is about
56.1 BN. Given our technique, we could discover up 70% of the addicts in the whole industry, which
involves about 39.2 BN salary payment.
If we apply the marijuana screening check on the industry’s recruiting market, then we could potentially
identify 70% of the 19.5% weed smokers in 0.28M new recruits at almost no cost. If we could replace
these new addict recruits with clean workers, then the industry could save a total salary payment of
0.28M * 19.5% * 70% * 21903 = $117M.
Appendix
1.Data Cleaning
2. FOR Loop
2a. Grid Search
2b. ROC-AUC scores
2c. Selecting the best model
Data cleaning
Our data had labelled every positive drug user among one of six classes from CL0 to CL6
The description for these is as follows:
CL0- Never Consumed the drug
CL1-Consumed more than a decade ago
CL2-Consumed more than a year and less than a decade ago
CL3-Consumed less than a year ago
CL4-Consumed last month
CL5-Consumed last week
CL6- Consumed yesterday
We decided that classes CL1 and CL2 should be treated as non-consumers of the drug because taking
the drug once or twice almost a year ago doesn’t qualify the subject as a regular user
This is why we decided to rename those classes as negative classes along with CL0
This was done as follows:
‘0’ signifies non-user and ‘1’ signifies user
After this we split the dataset into three parts each containing:
1. Personality traits and subject background(personality)
2. Usage data of legal/prescription drugs(legal)
3. Usage data of illegal drugs(illegal)
NOTE: For creating the training set; the personality and legal dataframes along with columns for
choclolate and nicotine consumption were merged to create a single dataframe named ‘legal’
After this we plotted statistics to see examine what the rare classes are and how many subjects
belong to the negative class on all drugs
Each of the columns in the ‘illegal’ dataframe corresponds to a single target variable that our model
predicts
So, our model predicts consumption for 12 illegal drugs each of which
are divided into classes consumed(1) and not-consumed (0)
The rare classes were identified using the following code
Any class with a ratio less than 70/30 was considered as a rare class
FOR Loop
A massive For loop, iterating over each target drug and
containing the following sequence of operations was
constructed:
1. Split the data into train and test sets and resample
2. Perform grid search on 3 classifiers
3. Calculate the ROC-AUC score using best parameters for classifier
4. Choose the best model for the particular drug class
Data splitting
Note: The concatenation at the end is done for the purpose of resampling
Resampling
The minority class for each drug is identified and upsampling is done. We choose upsampling because we have
less data available
Upsampling code
The upsampled training data is combined again:
Grid Search
Grid search was done for each of the three classifiers:
1. Decision Tree Classifier
2. KNN Classifier
3. Logistic regression Classifier
scoring= ’recall’
We have used the recall score to optimise for our parameters. Grid search tries to maximise the recall score. This
will result in a model that gives less number of false negatives
The best parameters generated by grid search are then saved:
ROC-AUC Score
The best parameters for each classifier are used to calculate the AUC score for the best model
Selecting best model for the drug
The model with the best AUC score on test-data was selected as the best model for predicting
that particular illegal drug
The model parameters for each of these best models would be saved and then used to make predictions for each
drug when a new data-instance comes in
Cannabis example
Confusion Matrix
Based on true positives and false negatives:
Original risk: 300/566=56%
Risk after using our model: 89/566=16%
Risk Reduction: (56%-16%)/56%=71%
Fitting Curve
The best classifier on cannabis was logistic regression
END

More Related Content

Similar to Drug Detection using Machine Learning

20130909-best practices work group-presentation.ppt
20130909-best practices work group-presentation.ppt20130909-best practices work group-presentation.ppt
20130909-best practices work group-presentation.pptShirazKhokhar1
 
Market survey on Chemist
Market survey on ChemistMarket survey on Chemist
Market survey on ChemistSandhya Rathi
 
Rx15 tt tues_1230_1_carnevale_2green_3paone-tuazon
Rx15 tt tues_1230_1_carnevale_2green_3paone-tuazonRx15 tt tues_1230_1_carnevale_2green_3paone-tuazon
Rx15 tt tues_1230_1_carnevale_2green_3paone-tuazonOPUNITE
 
Post Marketing Surveillance and Case Study
Post Marketing Surveillance and Case StudyPost Marketing Surveillance and Case Study
Post Marketing Surveillance and Case StudyMuhammadNafees42
 
WP_Life Sciences_Drug Utilization_Alok Anand
WP_Life Sciences_Drug Utilization_Alok AnandWP_Life Sciences_Drug Utilization_Alok Anand
WP_Life Sciences_Drug Utilization_Alok AnandAlok Anand
 
COMMUNITY PHARMACY PRESENTATION FINAL.pptx
COMMUNITY PHARMACY PRESENTATION FINAL.pptxCOMMUNITY PHARMACY PRESENTATION FINAL.pptx
COMMUNITY PHARMACY PRESENTATION FINAL.pptxsardarjarrar
 
Accelerate Healthy Outcomes with Data and AI
Accelerate Healthy Outcomes with Data and AIAccelerate Healthy Outcomes with Data and AI
Accelerate Healthy Outcomes with Data and AICognizant
 
Meta analysis and spontaneous reporting
Meta analysis and spontaneous reportingMeta analysis and spontaneous reporting
Meta analysis and spontaneous reportinghamzakhan643
 
Green pt1 state-pmp
Green pt1 state-pmpGreen pt1 state-pmp
Green pt1 state-pmpWelcome40
 
The economic burden of prescription opioid overdose... 2013.
The economic burden of prescription opioid overdose... 2013.The economic burden of prescription opioid overdose... 2013.
The economic burden of prescription opioid overdose... 2013.Paul Coelho, MD
 
Rx16 tpp wed_330_1_stack_2nelson_3roberts_4skinner
Rx16 tpp wed_330_1_stack_2nelson_3roberts_4skinnerRx16 tpp wed_330_1_stack_2nelson_3roberts_4skinner
Rx16 tpp wed_330_1_stack_2nelson_3roberts_4skinnerOPUNITE
 
Proper prescribing module1_revised v2
Proper prescribing module1_revised v2Proper prescribing module1_revised v2
Proper prescribing module1_revised v2bahlinnm
 
Masters thesis differential effectiveness of substance abuse treatment by j f...
Masters thesis differential effectiveness of substance abuse treatment by j f...Masters thesis differential effectiveness of substance abuse treatment by j f...
Masters thesis differential effectiveness of substance abuse treatment by j f...Joyce Fuller
 

Similar to Drug Detection using Machine Learning (20)

20130909-best practices work group-presentation.ppt
20130909-best practices work group-presentation.ppt20130909-best practices work group-presentation.ppt
20130909-best practices work group-presentation.ppt
 
Market survey on Chemist
Market survey on ChemistMarket survey on Chemist
Market survey on Chemist
 
eco620finalpaper
eco620finalpapereco620finalpaper
eco620finalpaper
 
Rx15 tt tues_1230_1_carnevale_2green_3paone-tuazon
Rx15 tt tues_1230_1_carnevale_2green_3paone-tuazonRx15 tt tues_1230_1_carnevale_2green_3paone-tuazon
Rx15 tt tues_1230_1_carnevale_2green_3paone-tuazon
 
Post Marketing Surveillance and Case Study
Post Marketing Surveillance and Case StudyPost Marketing Surveillance and Case Study
Post Marketing Surveillance and Case Study
 
WP_Life Sciences_Drug Utilization_Alok Anand
WP_Life Sciences_Drug Utilization_Alok AnandWP_Life Sciences_Drug Utilization_Alok Anand
WP_Life Sciences_Drug Utilization_Alok Anand
 
COMMUNITY PHARMACY PRESENTATION FINAL.pptx
COMMUNITY PHARMACY PRESENTATION FINAL.pptxCOMMUNITY PHARMACY PRESENTATION FINAL.pptx
COMMUNITY PHARMACY PRESENTATION FINAL.pptx
 
Twillman preventing rx abuse
Twillman preventing rx abuseTwillman preventing rx abuse
Twillman preventing rx abuse
 
Accelerate Healthy Outcomes with Data and AI
Accelerate Healthy Outcomes with Data and AIAccelerate Healthy Outcomes with Data and AI
Accelerate Healthy Outcomes with Data and AI
 
Meta analysis and spontaneous reporting
Meta analysis and spontaneous reportingMeta analysis and spontaneous reporting
Meta analysis and spontaneous reporting
 
EOP.SOJA.S5
EOP.SOJA.S5EOP.SOJA.S5
EOP.SOJA.S5
 
PharmMD
PharmMDPharmMD
PharmMD
 
Pharmaconomics
PharmaconomicsPharmaconomics
Pharmaconomics
 
Pharmaconomics
PharmaconomicsPharmaconomics
Pharmaconomics
 
Green pt1 state-pmp
Green pt1 state-pmpGreen pt1 state-pmp
Green pt1 state-pmp
 
The economic burden of prescription opioid overdose... 2013.
The economic burden of prescription opioid overdose... 2013.The economic burden of prescription opioid overdose... 2013.
The economic burden of prescription opioid overdose... 2013.
 
High Dose Initiation
High Dose InitiationHigh Dose Initiation
High Dose Initiation
 
Rx16 tpp wed_330_1_stack_2nelson_3roberts_4skinner
Rx16 tpp wed_330_1_stack_2nelson_3roberts_4skinnerRx16 tpp wed_330_1_stack_2nelson_3roberts_4skinner
Rx16 tpp wed_330_1_stack_2nelson_3roberts_4skinner
 
Proper prescribing module1_revised v2
Proper prescribing module1_revised v2Proper prescribing module1_revised v2
Proper prescribing module1_revised v2
 
Masters thesis differential effectiveness of substance abuse treatment by j f...
Masters thesis differential effectiveness of substance abuse treatment by j f...Masters thesis differential effectiveness of substance abuse treatment by j f...
Masters thesis differential effectiveness of substance abuse treatment by j f...
 

More from Shaik Khasim

More from Shaik Khasim (6)

Resume
ResumeResume
Resume
 
Tag fintech
Tag fintech Tag fintech
Tag fintech
 
Wikipedia Data Mining
Wikipedia Data MiningWikipedia Data Mining
Wikipedia Data Mining
 
Analyzing networks of Hip-Hop Artists
Analyzing networks of Hip-Hop ArtistsAnalyzing networks of Hip-Hop Artists
Analyzing networks of Hip-Hop Artists
 
Resume
ResumeResume
Resume
 
Sino indian relations
Sino indian relationsSino indian relations
Sino indian relations
 

Recently uploaded

dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
Data Warehouse , Data Cube Computation
Data Warehouse   , Data Cube ComputationData Warehouse   , Data Cube Computation
Data Warehouse , Data Cube Computationsit20ad004
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改atducpo
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 

Recently uploaded (20)

dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Data Warehouse , Data Cube Computation
Data Warehouse   , Data Cube ComputationData Warehouse   , Data Cube Computation
Data Warehouse , Data Cube Computation
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 

Drug Detection using Machine Learning

  • 1. Detecting Usage of Illegal Drugs with Machine Learning MSBA 2019
  • 3. 60% of Employers are not Drug Testing “For some employers the cost to find a single drug user can be high… a positive test result cannot determine use or impairment at work and [that there’s a] general lack of evidence that drug testing has an impact on performance or safety.” - Michael Frone in his book Alcohol and Illicit Drug Use in the Workforce and Workplace
  • 4. According to the National Council on Alcoholism and Drug Dependence, Inc. (NCADD), drug abuse costs employers $81 billion annually. 70 percent of the estimated 14.8 million Americans who use illegal drugs are employed
  • 5. Companies who drug test are spending money on job candidates that are not taking drugs Accommodations and Food Services Management Professional, Scientific, and Professional Services Finance and Insurance 19.1% Other Companies who are not drug testing because money is being wasted 12.1% 9.0% 6.5% Past month illicit drug use among adults ages 18-64 employed full time, by job category 2008-2001, SAMHA, Center for Behavioral Health Statistics and Quality National Survey on Drug use and Health
  • 6. We aim to drug test people according to their specific risk Who is at risk for taking drugs? Based on personality traits and use of legal substances such as caffeine, alcohol, and nicotine What drugs are they are risk for taking? So that we can drug test accordingly and eliminate the cost associated with sick days, lack of productivity, and inefficiencies due to drug using employees What type of drug test should they be given? To eliminate over- or under- spending on drug testing 01 02 03
  • 7. There are different costs associated with different degrees of drug tests Hair $100 90+ Days All Drug Use Urine $50 ~60 Days Marijuana, Cocaine, Amphetamines, Opioids Drug Use Blood $200 ~3 Days All Drug Impairment Saliva $70 ~7 Days Marijuana, Cocaine, Amphetamines, Methamphetamines Drug Impairment Price Detection Window Drugs Tested Purpose
  • 9. UCI Drug Consumption Data Set UCI DRUG CONSUMPTION Use of Legal Drugs Alcohol, chocolate, caffeine, nicotine Use of Illegal Drugs Amphetamines, amyl nitrite, benzodiazepine, cannabis, cocaine, crack, ecstasy, heroin, ketamine, legal highs, LSD, methadone, mushrooms, and volatile substance abuse and one fictitious drug (Semeron) Personality Traits Demographic Information Level of education, age, gender, country of residence, and ethnicity NEO-FFI-R (neuroticism, extraversion, openness to experience, agreeableness, and conscientiousness), BIS-11 (impusivity), and ImpSS (sensation seeking)
  • 10. Drug Consumption Layout Over half of the participants in our data set has taken drugs in the last year 59% 41%
  • 12. Big Five Measures are used as Indicators
  • 13. Drug consumptions has an effect on the Big Five Scores
  • 15. Our model building process for a multi-classification problem Deployment Evaluation Model Creation Clean data Reclassify Frequency Reclassify drug usage to fit our needs Splitting, resampling, deleting insignificant variables k-NN, logistic regression, tree induction Determine best model for each prediction
  • 16. Data Cleaning 01 Reclassify Drug Use “Potential user” if taken in the last year or more recently “Non use” if taken in the last decade or less recently 02 Resample for Rare Classes by Upsampling Coke, Ketamine, LSD, Heroine, Meth, VSA 03 Delete Insignificant Data Delete drugs that created to curb over-reporting of drugs (Semler) 04 Categorized Drugs as “Legal” Cannabis, Coke, Crack, Ecstasy, Heroin, Ketamine, Legalh, LSD, Meth, Mushrooms, VSA 05 Categorized Drugs as “Illegal” Alcohol, Amphetamine, Amyls, Benzos, Caffeine, Nicotine
  • 18. Facets of our Model Target Variable “Risk” for each individual illegal drug A person is coded as “risk” if they are predicted to have taken a drug one year or less ago and “not risk” if they are predicted not to have taken the drug in the past year Predictor Variables Personality Traits Legal Drug Use Grid search determines proper parameters and weights Algorithms K-NN Logistic Regression Decision Tree Parameters chosen through gridsearch, model chosen through cross validation
  • 19. We created individual models for each drug based on... We chose the best model for that specific illegal drug based on which model had the smallest recall and highest AUC Our goal is minimize false negatives Choosing Best Model There was too little data to minimize false predictions beyond our values We would prefer if the false negative rate was lower Problems with Model More data would better inform the distinctions between classes Varied data sets would eliminate overfitting to the sample Potential Model Improvements
  • 21. Evaluation Metrics Across All Target Variables We measured the average evaluation metrics after creating models that analyzed the drug use risk for each individual drug... 01 Average AUC Score on all target variables 82% 02 Average accuracy on all target variables 94% 03 Average recall on all target variables 71% 04 Average F-Measure 75%
  • 22. Cannabis Model Based on the AUC, the logistic regression model is the best predictor of the risk for cannabis use. This results in the collection of 211 cannabis users out of 300 total users. The positive recall is 70%. Accuracy: 79% AUC: 89% Recall: 79% Precision: 81% F-Measure: 79% AUC Confusion Matrix Normalized Confusion Matrix
  • 23. Cannabis Model in Accomodation and Food Services Industry 01 With the model, we can reduce 71% of the total drug users 71% 02 We assume there are 14% of the food industry that uses cannabis 14% 03 Total salary paid on the industry $294BN 04 ROI on the industry $117M
  • 25. Using the model will reduce costs associated with drug using employees Non-Drug Testing Companies Using the model will reduce number of candidates tested, resulting in $50-100 saved per untested candidate Drug Testing Companies Based on the model for cannabis, there is a drop in risk of from 53% (is taking drug) to 16% (predicted not taking and is taking the drug) Biggest Risk It may be unethical to selectively drug test people, or to target people based on their personalities or demographics Ethical Considerations The biggest improvement to the model would be increasing amount and diversity of data to eventually lower the false negative rate Model Needs Deployment Considerations
  • 27. Our Services To eliminate the costs associated with employing someone who uses drugs Getting People Drug Tested Keeping Cost of Drug Tests Low To curb the fear of over spending on drug testing, specify who they need to drug test and what that individual requires Only invest time and money into testing the candidates who are “at risk” for drug use Pinpoint the people to be tested Different drugs require different tests, and those tests come at different prices Specify the type of drug test each individual needs By capturing the job candidates that use drugs before they are hired, costs associated with hiring drug users will be diminished Minimize cost of drug users in the workplace Avoiding overtesting candidates that are not likely to take drugs Only test “at risk” candidates
  • 28. Our Services the total salary pay on accommodation and food services industry is 21903 * 13.4M = 293.5 BN, and about 19.1% percentage of the salary is paid to workers with marijuana smoking history, which is about 56.1 BN. Given our technique, we could discover up 70% of the addicts in the whole industry, which involves about 39.2 BN salary payment. If we apply the marijuana screening check on the industry’s recruiting market, then we could potentially identify 70% of the 19.5% weed smokers in 0.28M new recruits at almost no cost. If we could replace these new addict recruits with clean workers, then the industry could save a total salary payment of 0.28M * 19.5% * 70% * 21903 = $117M.
  • 29. Appendix 1.Data Cleaning 2. FOR Loop 2a. Grid Search 2b. ROC-AUC scores 2c. Selecting the best model
  • 30. Data cleaning Our data had labelled every positive drug user among one of six classes from CL0 to CL6 The description for these is as follows: CL0- Never Consumed the drug CL1-Consumed more than a decade ago CL2-Consumed more than a year and less than a decade ago CL3-Consumed less than a year ago CL4-Consumed last month CL5-Consumed last week CL6- Consumed yesterday We decided that classes CL1 and CL2 should be treated as non-consumers of the drug because taking the drug once or twice almost a year ago doesn’t qualify the subject as a regular user This is why we decided to rename those classes as negative classes along with CL0
  • 31. This was done as follows: ‘0’ signifies non-user and ‘1’ signifies user
  • 32. After this we split the dataset into three parts each containing: 1. Personality traits and subject background(personality) 2. Usage data of legal/prescription drugs(legal) 3. Usage data of illegal drugs(illegal) NOTE: For creating the training set; the personality and legal dataframes along with columns for choclolate and nicotine consumption were merged to create a single dataframe named ‘legal’
  • 33. After this we plotted statistics to see examine what the rare classes are and how many subjects belong to the negative class on all drugs Each of the columns in the ‘illegal’ dataframe corresponds to a single target variable that our model predicts So, our model predicts consumption for 12 illegal drugs each of which are divided into classes consumed(1) and not-consumed (0)
  • 34. The rare classes were identified using the following code Any class with a ratio less than 70/30 was considered as a rare class
  • 35. FOR Loop A massive For loop, iterating over each target drug and containing the following sequence of operations was constructed: 1. Split the data into train and test sets and resample 2. Perform grid search on 3 classifiers 3. Calculate the ROC-AUC score using best parameters for classifier 4. Choose the best model for the particular drug class
  • 36. Data splitting Note: The concatenation at the end is done for the purpose of resampling
  • 37. Resampling The minority class for each drug is identified and upsampling is done. We choose upsampling because we have less data available
  • 38. Upsampling code The upsampled training data is combined again:
  • 39. Grid Search Grid search was done for each of the three classifiers: 1. Decision Tree Classifier 2. KNN Classifier 3. Logistic regression Classifier
  • 40. scoring= ’recall’ We have used the recall score to optimise for our parameters. Grid search tries to maximise the recall score. This will result in a model that gives less number of false negatives The best parameters generated by grid search are then saved:
  • 41. ROC-AUC Score The best parameters for each classifier are used to calculate the AUC score for the best model
  • 42. Selecting best model for the drug The model with the best AUC score on test-data was selected as the best model for predicting that particular illegal drug The model parameters for each of these best models would be saved and then used to make predictions for each drug when a new data-instance comes in
  • 43. Cannabis example Confusion Matrix Based on true positives and false negatives: Original risk: 300/566=56% Risk after using our model: 89/566=16% Risk Reduction: (56%-16%)/56%=71%
  • 44. Fitting Curve The best classifier on cannabis was logistic regression
  • 45. END