www.datakind.org | @DataKindSF
Advancing Alcohol Behavior Change with Data Science
Jaya Pokuri
Data Ambassador @ DataKind SF
Introduction
DataKind by the Numbers
20,000+ community members worldwide in 98 countries, representing the largest global data science for social good network
5 global chapters
250+ events around the world
Volunteer sign-ups from 174 countries
300+ projects completed, providing the most comprehensive library of data science for social good projects
150+ organizations helped
200,000+ hours donated
$35M+ pro bono services delivered
From evening or weekend events to multi-month projects, our programs are designed to provide social organizations with the pro bono data science innovation team they need to tackle critical humanitarian issues.
DataKind’s Global Network
DataKind’s strategy is grounded in four key principles
Catalyze a thriving Data Science for Good ecosystem through partnerships: A healthy
ecosystem requires forming the right “we” to deliver on a range of data science needs – while we will
continue to deliver data science projects, we will also use partnerships wherever possible to create
an accessible ecosystem of data science resources.
Be the connective tissue between the social sector and private sector data science resources:
We will elevate attention to the Data Science for Good field, with the goal of building nonprofit
demand, data science talent / resources, and philanthropic investment for all.
Identify the brightest opportunities for data science: We are known for our ability to scope how to
apply data science to social sector organizations, ensuring that data science solutions are designed
thoughtfully, implemented ethically, and used effectively.
Build data science projects that advance the field: DataKind will work directly with nonprofits in
targeted issue areas where there are unmet data science needs that stretch beyond the individual
organization and a coalition of interested and committed partners.
How might data science help nonprofits?
Expand impact by anticipating future needs
Scale services by providing personalized support
Save staff time by automating processes
Better understand the communities served
Better target efforts and find those in need of services
Use open/external data sources to inform decision making
DataCorps Process
1. Problem Exploration: We explore what’s possible, then staff an expert volunteer team.
2. Data Discovery: The team wrangles the data and identifies external data sources to leverage.
3. Prototyping: The team co-creates solutions with the partner, while DataKind oversees their work.
4. Refinement: Based on feedback, the team makes adjustments to meet the partner’s needs.
5. Solution: The team delivers the final version and documentation so the partner can increase its impact.
About Hello Sunday Morning
Hello Sunday Morning (HSM) is an Australia-based non-profit focused on helping people form healthier relationships with alcohol.
HSM created the Daybreak app, a professional and community support social network.
Daybreak
Members select a
mood they’re feeling.
Share how they’re feeling
by making a post.
Comment, like, and save
other members’ posts.
Set goals and
reminders.
CHALLENGE
Moderators read every post and flag those that are potentially problematic: either posts that indicate potentially harmful behavior or posts in breach of community guidelines.
Moderators will either provide support or escalate members to a clinical team.
HSM faces the problem of growing membership: the moderators' task is becoming unmanageable, with hundreds of thousands of community activities (posts, comments, reactions) to review and flag if necessary.
Ask
Moderators need an automated, efficient, and scalable way to flag and
categorize risky or breach activity.
APPROACH
Data Provided
HSM provided historical (Jan-Sept 2019), labeled post data containing raw text (with PII removed), timestamp of post, and risk/breach category.
The dataset is large but has significant class imbalance (< 0.1% of the posts were risky/breach).
Objective 1: Identify Risky Posts
A model was built to predict the probability of a post being risky.
Steps:
1) Remove weekend posts from the dataset. 
2) Calculate lexicon-based sentiment score.
3) Clean text data.
4) Tokenize posts. 
5) Create more features. 
6) Train model.
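The first four steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's actual code: the toy lexicon, the cleaning rules, and the function names (`is_weekend`, `sentiment_score`, `clean_text`, `tokenize`, `prepare_posts`) are all hypothetical, and feature creation and model training are left out.

```python
import re
from datetime import datetime

# Hypothetical toy lexicon; a real project would use a published sentiment lexicon.
LEXICON = {"good": 1.0, "better": 1.0, "proud": 1.0,
           "bad": -1.0, "awful": -1.0, "hopeless": -1.0}


def is_weekend(created_at):
    """True if a 'YYYY-MM-DD HH:MM:SS' timestamp falls on Saturday or Sunday."""
    return datetime.strptime(created_at, "%Y-%m-%d %H:%M:%S").weekday() >= 5


def sentiment_score(text):
    """Mean lexicon value over the words found in the lexicon; 0.0 if none match."""
    words = re.findall(r"[a-z']+", text.lower())
    hits = [LEXICON[w] for w in words if w in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


def clean_text(text):
    """Lowercase, then replace anything other than letters, digits, apostrophes and spaces."""
    text = re.sub(r"[^a-z0-9' ]+", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text):
    """Split cleaned text on whitespace."""
    return text.split()


def prepare_posts(posts):
    """Steps 1-4: drop weekend posts, score sentiment, clean, tokenize."""
    rows = []
    for post in posts:
        if is_weekend(post["created_at"]):
            continue
        raw = post["share_content"]
        rows.append({
            "sentiment": sentiment_score(raw),
            "tokens": tokenize(clean_text(raw)),
        })
    return rows
```

The returned rows would then feed whatever feature engineering and model the team chose, which the slides do not specify.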
Assessment of model
Table 1: Model performance on test data at varying probability thresholds

Threshold    0.1   0.3   0.5   0.7
Recall       0.8   0.5   0.3   0.2
Precision    0.8   0.9   0.9   0.9
F1 score     0.8   0.7   0.5   0.4
The model was tested on a sample of post data unseen by the
model (Nov 2019 – Jan 2020).
HSM is looking to use a threshold of 0.1 to minimize the number of
false negatives.
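To make the threshold trade-off concrete, the sketch below computes precision, recall, and F1 for a list of predicted probabilities at several cut-offs. The scores and labels are made up for illustration, the function name `metrics_at_threshold` is hypothetical, and flagging at "probability >= threshold" is an assumption (the slides do not say whether the comparison is inclusive).

```python
def metrics_at_threshold(probabilities, labels, threshold):
    """Precision, recall and F1 when posts with probability >= threshold are flagged."""
    flagged = [p >= threshold for p in probabilities]
    tp = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
    fp = sum(1 for f, y in zip(flagged, labels) if f and y == 0)
    fn = sum(1 for f, y in zip(flagged, labels) if not f and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up scores and labels (1 = risky), purely for illustration.
probs = [0.05, 0.15, 0.35, 0.60, 0.90, 0.02]
labels = [0, 1, 1, 0, 1, 0]

for t in (0.1, 0.3, 0.5, 0.7):
    p, r, f = metrics_at_threshold(probs, labels, t)
    print(f"threshold={t}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Lowering the threshold flags more posts, so fewer risky posts are missed (higher recall) at the cost of more false alarms, which is the trade-off behind choosing 0.1.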
Objective 1: Identify Risky Posts
A keyword detector was built to flag potentially risky
words/phrases in a post, by category:
• Suicide
• Domestic Violence
• DUI
• Risky Behavior
• Detox Withdrawal
• Mental Health
• Self Harm
• Other
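A keyword detector of this kind can be sketched as a phrase lookup per category. Only the category names come from the slides; the phrase lists, the whole-phrase matching rule, and the function name `find_risk_keywords` are assumptions made for illustration.

```python
import re

# Hypothetical phrase lists; only the category names come from the slides.
RISK_PHRASES = {
    "DUI": ["drank too much", "drove home drunk"],
    "Mental Health": ["bad place", "can't cope"],
    "Self Harm": ["hurt myself"],
}


def find_risk_keywords(text):
    """Return {category: [matched phrases]} for every category with at least one match."""
    lowered = text.lower()
    found = {}
    for category, phrases in RISK_PHRASES.items():
        # Word boundaries avoid matching a phrase inside a longer word.
        hits = [p for p in phrases
                if re.search(r"\b" + re.escape(p) + r"\b", lowered)]
        if hits:
            found[category] = hits
    return found
```

Returning the matched phrases alongside the category lets a moderator see why a post was flagged.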
Objective 2: Identify Breach Posts
Pre-trained models were used to detect posts with PII or profanity.
Steps:
1) Detecting PII by utilizing pre-trained/off-the-shelf models for named
entity recognition and regex-based detection. Detects text related to
people, organizations, locations, dates, times, email addresses, phone
numbers, and street addresses.
2) Detecting Profanity by using an off-the-shelf regex-based model.
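The regex-based part of PII detection can be illustrated with two simplified patterns. These patterns, the category labels, and the function name `find_pii` are hypothetical; the off-the-shelf models the team used are not named in the slides, and named entity recognition for people, organizations, and locations is not covered here.

```python
import re

# Simplified, hypothetical patterns; production-grade PII detection needs broader coverage.
PII_PATTERNS = {
    "Email Addresses": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "Phone Numbers": re.compile(r"(?<!\w)\+?\d[\d\s-]{7,}\d(?!\w)"),
}


def find_pii(text):
    """Return {category: [matched strings]} for each pattern that matches the text."""
    found = {}
    for category, pattern in PII_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            found[category] = matches
    return found
```

The phone pattern is deliberately loose and would also match other long digit runs such as dates, so a real detector would need tighter rules or a dedicated library.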
Implementation
Deploying in Production
A REST API was built in Flask to enable usage of the solutions created.
The API handles:
• Pre-processing Data
• Feature Engineering
• Generating Model Predictions/Outputs
HSM (Daybreak) sends a Request to the API and receives a Response.
Deploying in Production
The API has three endpoints that HSM can utilize.
Outputs of the endpoints:
1. Probability a post is risky on a scale of 0-1.
2. Risk keywords in the post.
3. PII categories and words in the post.
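The request handling behind three such endpoints might look like the framework-agnostic sketch below; in a Flask app each entry would typically be wrapped in its own route. The route names, the validation rules, and the placeholder scorers are all assumptions, since the slides show only the request fields and the shape of each response.

```python
from datetime import datetime


def validate_payload(payload):
    """Check the two request fields shown in the slide examples."""
    if not isinstance(payload.get("share_content"), str):
        raise ValueError("share_content must be a string")
    # Raises KeyError if missing, ValueError if not 'YYYY-MM-DD HH:MM:SS'.
    datetime.strptime(payload["created_at"], "%Y-%m-%d %H:%M:%S")


# Placeholder scorers standing in for the real model and detectors.
def score_risk(text):
    return 0.0


def risk_keywords(text):
    return {}


def pii_entities(text):
    return {}


# Hypothetical route names mapped to the three outputs listed above.
ENDPOINTS = {
    "/risk-probability": lambda text: {"Prediction Risk": score_risk(text)},
    "/risk-keywords": risk_keywords,
    "/pii": pii_entities,
}


def handle(path, payload):
    """Validate the request body and return (status code, response body) for one endpoint."""
    if path not in ENDPOINTS:
        return 404, {"error": "unknown endpoint"}
    try:
        validate_payload(payload)
    except (KeyError, ValueError) as exc:
        return 400, {"error": str(exc)}
    return 200, ENDPOINTS[path](payload["share_content"])
```

Keeping validation separate from scoring means all three endpoints reject malformed requests in the same way.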
Examples of API Output: Endpoint #1
Probability Risk: Probability a post is risky on a scale of 0-1.
Request:
{
  "share_content": "I feel like things are starting to turn around for me.",
  "created_at": "2020-01-01 09:17:42"
}
Response:
{'Prediction Risk': 0.01}
Examples of API Output: Endpoint #2
Risk Keywords: Risk keywords in the post.
Request:
{
  "share_content": "I drank too much last night and am now in a bad place.",
  "created_at": "2019-11-24 08:18:31"
}
Response:
{
  'DUI': ['drank too much'],
  'Mental Health': ['bad place']
}
Examples of API Output: Endpoint #3
PII/Profanity: PII breaches in the post.
Request:
{
  "share_content": "Hey guys my name is John, anyone in Sydney want to meet up at Hyde Park this Saturday at noon?",
  "created_at": "2019-10-18 02:11:21"
}
Response:
{
  'People': ['John'],
  'Locations': ['Sydney', 'Hyde Park'],
  'Dates': ['this Saturday'],
  'Times': ['noon']
}
Application
HSM is working on a system that allows moderators to review shares more
efficiently. The API endpoints will be part of this system.
https://www.linkedin.com/company/datakind-sf/
Get involved with DataKind!
Advancing Alcohol Behavior Change

  • 1. www.datakind.org | @DataKindSF Advancing Alcohol Behavior Change with Data Science Jaya Pokuri
  • 2. Jaya Pokuri Data Ambassador @ DataKind SF Introduction
  • 3. 20,000+ community members worldwide in 98 countries, representing the largest global data science for social good network 5 global chapters 250+ events around the world Volunteer sign-ups from 174 countries 300+ projects completed, providing the most comprehensive library of data science for social good projects 150+ organizations helped 200,000+ hours donated $35M+ pro bono services delivered DataKind by the Numbers From evening or weekend events to multi-month projects, our programs are designed to provide social organizations with the pro bono data science innovation team they need to tackle critical humanitarian issues.
  • 5. 5 DataKind’s strategy is grounded in four key principles Catalyze a thriving Data Science for Good ecosystem through partnerships: A healthy ecosystem requires forming the right “we” to deliver on a range of data science needs – while we will continue to deliver data science projects, we will also use partnerships wherever possible to create an accessible ecosystem of data science resources. Be the connective tissue between the social sector and private sector data science resources: We will elevate attention to the Data Science for Good field, with the goal of building nonprofit demand, data science talent / resources, and philanthropic investment for all. Identify the brightest opportunities for data science: We are known for our ability to scope how to apply data science to social sector organizations, ensuring that data science solutions are designed thoughtfully, and implemented ethically, and used effectively. Build data science projects that advance the field: DataKind will work directly with nonprofits in targeted issue areas where there are unmet data science needs that stretch beyond the individual organization and a coalition of interested and committed partners.
  • 6. How might data science help nonprofits? Expand impact by anticipating future needs. Scale services by providing personalized support. Save staff time by automating processes. Better understand the communities served. Better target efforts and find those in need of services. Use open/external data sources to inform decision making.
  • 7. DataCorps Process: 1. Problem Exploration: We explore what’s possible, then staff an expert volunteer team. 2. Data Discovery: The team wrangles the data and identifies external data sources to leverage. 3. Prototyping: The team co-creates solutions with the partner, while DataKind oversees their work. 4. Refinement: Based on feedback, the team makes adjustments to meet the partner’s needs. 5. Solution: The team delivers the final version and documentation so the partner can increase its impact.
  • 8. About Hello Sunday Morning: Hello Sunday Morning (HSM) is an Australia-based nonprofit focused on helping people form healthier relationships with alcohol. It created the Daybreak app, a social network offering professional and community support.
  • 9. Daybreak: Members select a mood they’re feeling. Share how they’re feeling by making a post. Comment, like, and save other members’ posts. Set goals and reminders.
  • 11. Challenge: Moderators read every post and flag those that are potentially problematic: posts that indicate potentially harmful behavior or that breach community guidelines. Moderators then either provide support or escalate members to a clinical team. As membership grows, the moderators' task is becoming unmanageable, with hundreds of thousands of community activities (posts, comments, reactions) to review and flag if necessary.
  • 12. Ask: Moderators need an automated, efficient, and scalable way to flag and categorize risky or breach activity.
  • 14. Data Provided: HSM provided historical (Jan-Sept 2019), labeled post data containing raw text (with PII removed), the timestamp of each post, and a risk/breach category. The dataset is large but heavily imbalanced (< 0.1% of the posts were risky/breach).
  • 15. Objective 1: Identify Risky Posts. A model was built to predict the probability of a post being risky. Steps: 1) Remove weekend posts from the dataset. 2) Calculate a lexicon-based sentiment score. 3) Clean text data. 4) Tokenize posts. 5) Create more features. 6) Train model.
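The first few steps on this slide can be sketched in plain Python. This is only a minimal illustration under stated assumptions: the tiny `SENTIMENT_LEXICON`, the cleaning rules, and the `n_tokens` feature are hypothetical stand-ins, since the deck does not name the lexicon, the cleaning rules, the engineered features, or the model. Model training (step 6) is left out for the same reason.

```python
import re
from datetime import datetime

# Hypothetical toy lexicon; the deck does not say which lexicon was used.
SENTIMENT_LEXICON = {"good": 1.0, "better": 1.0, "hopeful": 1.0,
                     "bad": -1.0, "worse": -1.0, "hopeless": -1.0}


def is_weekend(created_at):
    """True if a 'YYYY-MM-DD HH:MM:SS' timestamp falls on Saturday or Sunday."""
    return datetime.strptime(created_at, "%Y-%m-%d %H:%M:%S").weekday() >= 5


def sentiment_score(text):
    """Mean lexicon score over the words of the raw text (0.0 if there are none)."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return sum(SENTIMENT_LEXICON.get(w, 0.0) for w in words) / len(words)


def clean_text(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9' ]+", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text):
    """Split cleaned text into whitespace-delimited tokens."""
    return clean_text(text).split()


def prepare(posts):
    """Steps 1-5: drop weekend posts, score sentiment, clean, tokenize, add a feature."""
    rows = []
    for post in posts:
        if is_weekend(post["created_at"]):
            continue  # step 1: weekend posts are removed
        tokens = tokenize(post["share_content"])  # steps 3 and 4
        rows.append({
            "sentiment": sentiment_score(post["share_content"]),  # step 2
            "tokens": tokens,
            "n_tokens": len(tokens),  # step 5: one illustrative extra feature
        })
    return rows
```

The field names `share_content` and `created_at` follow the request examples shown later in the deck.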
  • 16. Assessment of model: The model was tested on a sample of post data unseen by the model (Nov 2019 – Jan 2020). HSM is looking to use a threshold of 0.1 to minimize the number of false negatives.
    Table 1: Model performance on test data at varying probability thresholds
    Threshold   0.1   0.3   0.5   0.7
    Recall      0.8   0.5   0.3   0.2
    Precision   0.8   0.9   0.9   0.9
    F1 score    0.8   0.7   0.5   0.4
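For readers who want to see how recall, precision, and F1 depend on the probability threshold, here is a small sketch. The function name and the toy scores used in the usage lines are made up for illustration and are not HSM's data or results.

```python
def metrics_at_threshold(probs, labels, threshold):
    """Precision, recall, and F1 when posts with prob >= threshold are flagged as risky."""
    flagged = [p >= threshold for p in probs]
    tp = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
    fp = sum(1 for f, y in zip(flagged, labels) if f and y == 0)
    fn = sum(1 for f, y in zip(flagged, labels) if not f and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy, made-up scores and labels (1 = risky), purely to show the mechanics.
toy_probs = [0.05, 0.15, 0.35, 0.6, 0.9, 0.2]
toy_labels = [0, 1, 1, 0, 1, 0]
for t in (0.1, 0.3, 0.5, 0.7):
    print(t, metrics_at_threshold(toy_probs, toy_labels, t))
```

Lowering the threshold flags more posts, so fewer risky posts are missed (higher recall), usually at the cost of more false alarms. That trade-off is why a low threshold suits a setting where false negatives are the main concern.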
  • 17. Objective 1: Identify Risky Posts. A keyword detector was built to indicate potentially risky words/phrases in a post, grouped into these categories: Suicide, Domestic Violence, DUI, Risky Behavior, Detox Withdrawal, Mental Health, Self Harm, Other.
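A keyword detector of this kind could look roughly like the sketch below. The deck does not publish the actual phrase lists or matching rules, so `RISK_PHRASES` is an assumption: it contains only the two phrases that appear in the example output for the risk keywords endpoint later in the deck, and whole-phrase, case-insensitive matching is a guess at the approach.

```python
import re

# Illustrative phrase lists only; a real detector would cover every category.
RISK_PHRASES = {
    "DUI": ["drank too much"],
    "Mental Health": ["bad place"],
}


def find_risk_keywords(text):
    """Return {category: [matched phrases]} for phrases found as whole words."""
    lowered = text.lower()
    found = {}
    for category, phrases in RISK_PHRASES.items():
        hits = [p for p in phrases
                if re.search(r"\b" + re.escape(p) + r"\b", lowered)]
        if hits:
            found[category] = hits
    return found
```

Categories with no matching phrase are omitted from the result, which mirrors the shape of the example response in the deck.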
  • 18. Objective 2: Identify Breach Posts. Pre-trained models were used to detect posts with PII or profanity. Steps: 1) Detecting PII by utilizing pre-trained/off-the-shelf models for named entity recognition and regex-based detection. This detects text related to people, organizations, locations, dates, times, email addresses, phone numbers, and street addresses. 2) Detecting profanity by using an off-the-shelf regex-based model.
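The regex-based part of the PII step might look something like this sketch. The two patterns and the function name are hypothetical and deliberately loose, not the off-the-shelf models the team used. The named entity recognition part (people, organizations, locations) needs a trained model and is not shown.

```python
import re

# Hypothetical, loose patterns for illustration. The phone pattern can also
# match other long digit runs, so real use would need stricter rules.
PII_PATTERNS = {
    "Email Addresses": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "Phone Numbers": re.compile(r"(?<!\w)\+?\d(?:[\s().-]?\d){8,}(?!\w)"),
}


def find_regex_pii(text):
    """Return {category: [matches]} for every pattern that matches the text."""
    found = {}
    for category, pattern in PII_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            found[category] = matches
    return found
```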
  • 20. Deploying in Production: A REST API was built in Flask to enable usage of the solutions created. HSM (Daybreak) sends a request to the API, which handles pre-processing data, feature engineering, and generating model predictions/outputs, and then returns a response.
  • 21. Deploying in Production: The API has three endpoints that HSM can utilize. Outputs of the endpoints: 1. Probability a post is risky on a scale of 0-1. 2. Risk keywords in the post. 3. PII categories and words in the post.
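One way to picture the request handling behind such endpoints is the framework-free sketch below. It is an assumption-laden illustration, not the team's Flask code: the function names, the status codes, and the idea of a path-to-handler table are all hypothetical. Only the request fields `share_content` and `created_at` come from the examples in the deck.

```python
import json
from datetime import datetime

REQUIRED_FIELDS = ("share_content", "created_at")


def validate_request(body):
    """Parse a JSON request body and check the two fields used in the deck's examples."""
    payload = json.loads(body)  # raises ValueError on malformed JSON
    if not isinstance(payload, dict):
        raise ValueError("request body must be a JSON object")
    missing = [f for f in REQUIRED_FIELDS if f not in payload]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    # Raises ValueError if the timestamp is not 'YYYY-MM-DD HH:MM:SS'.
    datetime.strptime(payload["created_at"], "%Y-%m-%d %H:%M:%S")
    return payload


def make_dispatcher(handlers):
    """Build a dispatch(path, body) function from a {path: handler} table."""
    def dispatch(path, body):
        if path not in handlers:
            return 404, json.dumps({"error": "unknown endpoint"})
        try:
            payload = validate_request(body)
        except ValueError as exc:
            return 400, json.dumps({"error": str(exc)})
        return 200, json.dumps(handlers[path](payload))
    return dispatch
```

Each of the three endpoints would then be one entry in the handler table, mapping a path of the implementer's choosing to a function that takes the validated payload and returns the output described on the slide.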
  • 22. Examples of API Output: Endpoint #1 Probability Risk: Probability a post is risky on a scale of 0-1. Request: {"share_content": "I feel like things are starting to turn around for me.", "created_at": "2020-01-01 09:17:42"} Response: {'Prediction Risk': 0.01}
  • 23. Examples of API Output: Endpoint #2 Risk Keywords: Risk keywords in the post. Request: {"share_content": "I drank too much last night and am now in a bad place.", "created_at": "2019-11-24 08:18:31"} Response: {'DUI': ['drank too much'], 'Mental Health': ['bad place']}
  • 24. Examples of API Output: Endpoint #3 PII/Profanity: PII breaches in the post. Request: {"share_content": "Hey guys my name is John, anyone in Sydney want to meet up at Hyde Park this Saturday at noon?", "created_at": "2019-10-18 02:11:21"} Response: {'People': ['John'], 'Locations': ['Sydney', 'Hyde Park'], 'Dates': ['this Saturday'], 'Times': ['noon']}
  • 26. Application: HSM is working on a system that allows moderators to review shares more efficiently. The API endpoints will be part of this system.