www.datakind.org | @DataKindSF
Advancing Alcohol Behavior Change with Data Science
Jaya Pokuri
Jaya Pokuri
Data Ambassador @ DataKind SF
Introduction
DataKind by the Numbers
• 20,000+ community members worldwide in 98 countries, representing the largest global data science for social good network
• 5 global chapters
• 250+ events around the world
• Volunteer sign-ups from 174 countries
• 300+ projects completed, providing the most comprehensive library of data science for social good projects
• 150+ organizations helped
• 200,000+ hours donated
• $35M+ pro bono services delivered
From evening or weekend events to multi-month projects, our programs are designed to provide social organizations with the pro bono data science innovation team they need to tackle critical humanitarian issues.
DataKind’s Global Network
DataKind’s strategy is grounded in four key principles
Catalyze a thriving Data Science for Good ecosystem through partnerships: A healthy
ecosystem requires forming the right “we” to deliver on a range of data science needs – while we will
continue to deliver data science projects, we will also use partnerships wherever possible to create
an accessible ecosystem of data science resources.
Be the connective tissue between the social sector and private sector data science resources:
We will elevate attention to the Data Science for Good field, with the goal of building nonprofit
demand, data science talent / resources, and philanthropic investment for all.
Identify the brightest opportunities for data science: We are known for our ability to scope how to
apply data science to social sector organizations, ensuring that data science solutions are designed
thoughtfully, implemented ethically, and used effectively.
Build data science projects that advance the field: DataKind will work directly with nonprofits in
targeted issue areas where there are unmet data science needs that stretch beyond the individual
organization and a coalition of interested and committed partners.
How might data science help nonprofits?
Expand impact by anticipating future needs
Scale services by providing personalized support
Save staff time by automating processes
Better understand the communities served
Better target efforts and find those in need of services
Use open/external data sources to inform decision making
DataCorps Process
1. Problem Exploration: We explore what’s possible, then staff an expert volunteer team.
2. Data Discovery: The team wrangles the data and identifies external data sources to leverage.
3. Prototyping: The team co-creates solutions with the partner, while DataKind oversees their work.
4. Refinement: Based on feedback, the team makes adjustments to meet the partner’s needs.
5. Solution: The team delivers the final version and documentation so the partner can increase its impact.
About Hello Sunday Morning
Hello Sunday Morning (HSM) is an Australia-based non-profit focused on helping people form healthier relationships with alcohol.
HSM created the Daybreak app, a professional and community support social network.
Daybreak
Members select a
mood they’re feeling.
Share how they’re feeling
by making a post.
Comment, like, and save
other members’ posts.
Set goals and
reminders.
CHALLENGE
Challenge
Moderators read every post and flag those that are potentially problematic: either those that indicate potentially harmful behavior or those in breach of community guidelines.
Moderators then either provide support or escalate members to a clinical team.
HSM faces the problem of growing membership: the moderators' task is becoming unmanageable, with hundreds of thousands of community activities (posts, comments, reactions) to review and flag if necessary.
Ask
Moderators need an automated, efficient, and scalable solution to help flag and categorize risky or breach activity.
APPROACH
Data Provided
HSM provided historical (Jan-Sept 2019), labeled post data containing raw text (with PII removed), timestamp of post, and risk/breach category.
Large amount of data, but with significant class imbalance (< 0.1% of the posts were risky/breach).
Objective 1: Identify Risky Posts
A model was built to predict the probability of a post being risky.
Steps:
1) Remove weekend posts from the dataset. 
2) Calculate lexicon-based sentiment score.
3) Clean text data.
4) Tokenize posts. 
5) Create more features. 
6) Train model.
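The text-processing steps above can be sketched roughly as follows. This is only an illustration, not the team's code: the tiny lexicon, the cleaning rules, and the function names (clean_text, tokenize, sentiment_score, build_features) are all hypothetical, and the model-training step is left out because the slides do not say which model was used.

```python
import re

# Hypothetical toy lexicon, for illustration only; the slides do not name
# the sentiment lexicon that was actually used.
TOY_LEXICON = {
    "good": 1.0, "better": 1.0, "hopeful": 1.5,
    "bad": -1.0, "worse": -1.5, "hopeless": -2.0,
}


def sentiment_score(raw_text):
    """Mean lexicon score of the words found in the lexicon (0.0 if none match)."""
    words = re.findall(r"[a-z']+", raw_text.lower())
    scores = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


def clean_text(raw_text):
    """Lowercase, drop URLs and non-letter characters, collapse whitespace."""
    text = raw_text.lower()
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"[^a-z'\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(raw_text):
    """Split the cleaned text on whitespace."""
    return clean_text(raw_text).split()


def build_features(raw_text):
    """Combine the sentiment score, the tokens, and one simple extra feature."""
    tokens = tokenize(raw_text)
    return {
        "sentiment": sentiment_score(raw_text),
        "tokens": tokens,
        "n_tokens": len(tokens),
    }


if __name__ == "__main__":
    print(build_features("I feel BAD, really hopeless... http://x.com"))
```

The resulting feature dictionaries would then be vectorized and passed to whichever classifier is chosen.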
Assessment of model
             Threshold = 0.1   Threshold = 0.3   Threshold = 0.5   Threshold = 0.7
Recall             0.8               0.5               0.3               0.2
Precision          0.8               0.9               0.9               0.9
F1 score           0.8               0.7               0.5               0.4
Table 1: Model performance on test data at varying probability thresholds
The model was tested on a sample of post data unseen by the model (Nov 2019 – Jan 2020).
HSM is looking to use the threshold of 0.1 so as to minimize the number of false negatives.
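To make the threshold trade-off concrete, here is a minimal sketch of how recall, precision, and F1 can be computed at a given probability threshold. The labels and probabilities are made-up toy values, not HSM data, and metrics_at_threshold is a hypothetical helper name.

```python
def metrics_at_threshold(y_true, y_prob, threshold):
    """Precision, recall and F1 when posts with prob >= threshold are flagged."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy values for illustration only (1 = risky post, 0 = not risky).
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    y_prob = [0.9, 0.6, 0.35, 0.15, 0.2, 0.05, 0.02, 0.01]
    for threshold in (0.1, 0.3, 0.5, 0.7):
        print(threshold, metrics_at_threshold(y_true, y_prob, threshold))
```

A lower threshold flags more posts, which raises recall (fewer missed risky posts) at the cost of more false positives for moderators to review.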
Objective 1: Identify Risky Posts
A keyword detector was built to indicate potentially risky words/phrases in a post.
• Suicide
• Domestic Violence
• DUI
• Risky Behavior
• Detox Withdrawal
• Mental Health
• Self Harm
• Other
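A keyword detector over these categories could look something like the sketch below. The slides do not show how it was implemented, so the regex matching and the phrase lists are assumptions; only "drank too much" and "bad place" come from the deck's own example output, and the other phrases are hypothetical placeholders.

```python
import re

# Phrase lists are illustrative. Only "drank too much" and "bad place" appear
# in the deck's example output; the remaining phrases are hypothetical.
RISK_KEYWORDS = {
    "DUI": ["drank too much", "drove drunk"],
    "Mental Health": ["bad place", "panic attack"],
    "Detox Withdrawal": ["withdrawal", "detox"],
}


def find_risk_keywords(text):
    """Return {category: [matched phrases]} for whole-phrase, case-insensitive hits."""
    found = {}
    for category, phrases in RISK_KEYWORDS.items():
        hits = [
            phrase for phrase in phrases
            if re.search(r"\b" + re.escape(phrase) + r"\b", text, flags=re.IGNORECASE)
        ]
        if hits:
            found[category] = hits
    return found


if __name__ == "__main__":
    print(find_risk_keywords("I drank too much last night and am now in a bad place."))
```

Categories with no matching phrase are omitted from the result, so an empty dictionary means no risk keywords were found.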
Objective 2: Identify Breach Posts
Pre-trained models were used to detect posts with PII or profanity.
Steps:
1) Detecting PII by utilizing pre-trained/off-the-shelf models for named
entity recognition and regex-based detection. Detects text related to
people, organizations, locations, dates, times, email addresses, phone
numbers, and street addresses.
2) Detecting Profanity by using an off-the-shelf regex-based model.
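The regex-based part of PII detection might be sketched as follows. The two patterns are deliberately simplified assumptions, not the off-the-shelf models the team used, and the named entity recognition part (people, organizations, locations) is not covered here.

```python
import re

# Simplified, hypothetical patterns for illustration; production-grade
# email and phone matching needs considerably more care.
PII_PATTERNS = {
    "Email Addresses": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "Phone Numbers": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def find_pii(text):
    """Return {category: [matched strings]} for every pattern that matches."""
    found = {}
    for category, pattern in PII_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            found[category] = matches
    return found


if __name__ == "__main__":
    print(find_pii("Email me at jo@example.com or call 0412 345 678."))
```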
Implementation
Deploying in Production
A REST API was built in Flask to enable usage of the solutions created.
HSM (Daybreak) sends a request to the API and receives a response. The API handles:
• Pre-processing Data
• Feature Engineering
• Generating Model Predictions/Outputs
Deploying in Production
The API has three endpoints that HSM can utilize.
Outputs of the endpoints:
1. Probability a post is risky on a scale of 0-1.
2. Risk keywords in the post.
3. PII categories and words in the post.
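As a rough idea of what one such endpoint could look like in Flask, here is a minimal sketch of the risk-probability endpoint. The route path /risk-probability, the score_risk stub, and the error message are hypothetical; the slides do not show the real routes or code, and the stub simply returns a fixed number in place of the trained model.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def score_risk(text):
    """Placeholder for the trained model; returns a fixed probability."""
    return 0.01


# Hypothetical route name; the deck does not list the real endpoint paths.
@app.route("/risk-probability", methods=["POST"])
def risk_probability():
    payload = request.get_json(silent=True) or {}
    text = payload.get("share_content")
    if not isinstance(text, str) or not text.strip():
        return jsonify({"error": "share_content is required"}), 400
    return jsonify({"Prediction Risk": score_risk(text)})


if __name__ == "__main__":
    app.run(debug=True)
```

A client such as Daybreak would POST a JSON body containing share_content and created_at and read the probability from the JSON response.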
Examples of API Output: Endpoint #1
Probability Risk: Probability a post is risky on a scale of 0-1.
Request:
{
share_content: "I feel like things are starting to turn around for me.",
created_at: "2020-01-01 09:17:42"
}
Response:
{'Prediction Risk': 0.01}
Examples of API Output: Endpoint #2
Risk Keywords: Risk keywords in the post.
Request:
{
share_content: "I drank too much last night and am now in a bad place.",
created_at: "2019-11-24 08:18:31"
}
Response:
{
'DUI': ['drank too much'],
'Mental Health': ['bad place']
}
Examples of API Output: Endpoint #3
PII/Profanity: PII breaches in the post.
Request:
{
share_content: "Hey guys my name is John, anyone in Sydney want to meet up at Hyde Park this Saturday at noon?",
created_at: "2019-10-18 02:11:21"
}
Response:
{
'People': ['John'],
'Locations': ['Sydney', 'Hyde Park'],
'Dates': ['this Saturday'],
'Times': ['noon']
}
Application
Application
HSM is working on a system that allows moderators to review shares more
efficiently. The API endpoints will be part of this system.
https://www.linkedin.com/company/datakind-sf/
Get involved with DataKind!

Advancing Alcohol Behavior Change

  • 1. www.datakind.org | @DataKindSF Advancing Alcohol Behavior Change with Data Science Jaya Pokuri
  • 2. Jaya Pokuri Data Ambassador @ DataKind SF Introduction
  • 3. 20,000+ community members worldwide in 98 countries, representing the largest global data science for social good network 5 global chapters 250+ events around the world Volunteer sign-ups from 174 countries 300+ projects completed, providing the most comprehensive library of data science for social good projects 150+ organizations helped 200,000+ hours donated $35M+ pro bono services delivered DataKind by the Numbers From evening or weekend events to multi-month projects, our programs are designed to provide social organizations with the pro bono data science innovation team they need to tackle critical humanitarian issues.
  • 5. 5 DataKind’s strategy is grounded in four key principles Catalyze a thriving Data Science for Good ecosystem through partnerships: A healthy ecosystem requires forming the right “we” to deliver on a range of data science needs – while we will continue to deliver data science projects, we will also use partnerships wherever possible to create an accessible ecosystem of data science resources. Be the connective tissue between the social sector and private sector data science resources: We will elevate attention to the Data Science for Good field, with the goal of building nonprofit demand, data science talent / resources, and philanthropic investment for all. Identify the brightest opportunities for data science: We are known for our ability to scope how to apply data science to social sector organizations, ensuring that data science solutions are designed thoughtfully, and implemented ethically, and used effectively. Build data science projects that advance the field: DataKind will work directly with nonprofits in targeted issue areas where there are unmet data science needs that stretch beyond the individual organization and a coalition of interested and committed partners.
  • 6. How might data science help nonprofits? Expand impact by anticipating future needs; scale services by providing personalized support; save staff time by automating processes; better understand the communities served; better target efforts and find those in need of services; use open/external data sources to inform decision making.
  • 7. DataCorps Process 1. Problem Exploration: We explore what’s possible, then staff an expert volunteer team. 2. Data Discovery: The team wrangles the data and identifies external data sources to leverage. 3. Prototyping: The team co-creates solutions with the partner, while DataKind oversees their work. 4. Refinement: Based on feedback, the team makes adjustments to meet the partner’s needs. 5. Solution: The team delivers the final version and documentation so the partner can increase its impact.
  • 8. About Hello Sunday Morning Hello Sunday Morning (HSM) is an Australia-based non-profit focused on helping people form healthier relationships with alcohol. It created the Daybreak app, a social network offering professional and community support.
  • 9. Daybreak Members select a mood they’re feeling. Share how they’re feeling by making a post. Comment, like, and save other members’ posts. Set goals and reminders.
  • 11. Challenge Moderators read every post and flag those that are potentially problematic: either posts that indicate potentially harmful behavior or posts that breach community guidelines. Moderators then either provide support or escalate members to a clinical team. HSM’s growing membership is making the moderators’ task unmanageable, with hundreds of thousands of community activities (posts, comments, reactions) to review and flag if necessary.
  • 12. Ask Moderators need an automated, efficient, and scalable way to flag and categorize risky or breach activity.
  • 14. Data Provided HSM provided historical (Jan-Sept 2019), labeled post data containing raw text (with PII removed), timestamp of post, and risk/breach category. The dataset is large but has significant class imbalance (< 0.1% of the posts were risky/breach).
  • 15. Objective 1: Identify Risky Posts A model was built to predict the probability of a post being risky. Steps: 1) Remove weekend posts from the dataset.  2) Calculate lexicon-based sentiment score. 3) Clean text data. 4) Tokenize posts.  5) Create more features.  6) Train model.
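Steps 1 to 4 above can be sketched roughly as follows. This is only an illustration using the Python standard library: the toy `LEXICON`, the cleaning rules, and the function names are assumptions, since the slides do not say which lexicon, tokenizer, or features the team used. The `share_content` and `created_at` field names follow the API examples later in the deck.

```python
import re
from datetime import datetime

# Hypothetical toy lexicon; the slides do not name the sentiment lexicon used.
LEXICON = {"good": 1.0, "better": 1.0, "hopeful": 1.0,
           "bad": -1.0, "worse": -1.0, "hopeless": -2.0}


def is_weekend(created_at: str) -> bool:
    """Step 1 helper: True if the timestamp falls on Saturday or Sunday."""
    ts = datetime.strptime(created_at, "%Y-%m-%d %H:%M:%S")
    return ts.weekday() >= 5  # Monday is 0, so 5 and 6 are the weekend


def sentiment_score(text: str) -> float:
    """Step 2: average lexicon score over the words of the raw text."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return sum(LEXICON.get(w, 0.0) for w in words) / len(words)


def clean(text: str) -> str:
    """Step 3: lowercase, drop non-letter characters, collapse whitespace."""
    text = re.sub(r"[^a-z\s']", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text: str) -> list:
    """Step 4: whitespace tokenization of already-cleaned text."""
    return text.split()


def prepare(posts: list) -> list:
    """Run steps 1-4 and return one row per weekday post."""
    rows = []
    for post in posts:
        if is_weekend(post["created_at"]):
            continue  # step 1: weekend posts are removed
        rows.append({
            "sentiment": sentiment_score(post["share_content"]),
            "tokens": tokenize(clean(post["share_content"])),
        })
    return rows
```

The rows produced here would then feed the feature creation and model training in steps 5 and 6, which are not sketched because the slides do not describe them.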
  • 16. Assessment of model The model was tested on a sample of post data unseen by the model (Nov 2019 – Jan 2020). HSM is looking to use the threshold of 0.1 to minimize the number of false negatives.
    Table 1: Model performance on test data at varying probability thresholds
                 Threshold = 0.1   Threshold = 0.3   Threshold = 0.5   Threshold = 0.7
    Recall       0.8               0.5               0.3               0.2
    Precision    0.8               0.9               0.9               0.9
    F1 score     0.8               0.7               0.5               0.4
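For readers less familiar with these metrics, here is a small sketch of how recall, precision, and F1 can be computed at a given probability threshold. The function name and the rule that a post is flagged when its probability is at or above the threshold are assumptions; the toy labels and probabilities in the usage note are made up and are not HSM data.

```python
def metrics_at_threshold(y_true, y_prob, threshold):
    """Compute recall, precision, and F1 for one probability threshold.

    y_true: iterable of 0/1 labels (1 = risky post).
    y_prob: iterable of predicted probabilities of being risky.
    Assumption: a post is flagged when its probability >= threshold.
    """
    y_pred = [p >= threshold for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)        # risky and flagged
    fp = sum(1 for t, p in pairs if not t and p)    # safe but flagged
    fn = sum(1 for t, p in pairs if t and not p)    # risky but missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"recall": recall, "precision": precision, "f1": f1}
```

For example, with made-up labels `[1, 1, 1, 0, 0, 0, 0, 0]` and probabilities `[0.9, 0.4, 0.15, 0.2, 0.05, 0.02, 0.6, 0.01]`, lowering the threshold from 0.5 to 0.1 flags more posts, so fewer risky posts are missed (higher recall) at the cost of more false alarms. That is the trade-off behind preferring a low threshold when false negatives are the main concern.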
  • 17. Objective 1: Identify Risky Posts A keyword detector was built to flag potentially risky words/phrases in a post, grouped into categories: Suicide, Domestic Violence, DUI, Risky Behavior, Detox Withdrawal, Mental Health, Self Harm, Other.
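A keyword detector of this kind could look something like the sketch below. The phrase lists are illustrative assumptions: "drank too much" and "bad place" are taken from the example API output on slide 23, "hurt myself" is hypothetical, and the real lists and matching rules used for HSM are not given in the slides.

```python
import re

# Illustrative phrase lists only; the actual lists are not in the slides.
RISK_KEYWORDS = {
    "DUI": ["drank too much"],
    "Mental Health": ["bad place"],
    "Self Harm": ["hurt myself"],  # hypothetical phrase
}


def find_risk_keywords(text: str) -> dict:
    """Return {category: [matched phrases]} for phrases found in the text.

    Matching is case-insensitive and anchored on word boundaries so that
    a phrase does not match inside a longer word.
    """
    found = {}
    for category, phrases in RISK_KEYWORDS.items():
        hits = [
            phrase for phrase in phrases
            if re.search(r"\b" + re.escape(phrase) + r"\b", text, re.IGNORECASE)
        ]
        if hits:
            found[category] = hits
    return found
```

Categories with no matching phrase are left out of the result, so a post with no risk keywords yields an empty dictionary.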
  • 18. Objective 2: Identify Breach Posts Pre-trained models were used to detect posts with PII or profanity. Steps: 1) Detecting PII by utilizing pre-trained/off-the-shelf models for named entity recognition and regex-based detection. Detects text related to people, organizations, locations, dates, times, email addresses, phone numbers, and street addresses. 2) Detecting Profanity by using an off-the-shelf regex-based model.
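The regex-based part of step 1 might be sketched as below for two of the listed PII types, email addresses and phone numbers. The patterns, category names, and function name are assumptions chosen for illustration; they are deliberately loose and are not the off-the-shelf models the team used. The named entity recognition part (people, organizations, locations) is not sketched because the slides do not name the model.

```python
import re

# Simplified, assumed patterns; production detectors are usually stricter.
PII_PATTERNS = {
    "Email Addresses": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # A digit, then at least six digits/spaces/brackets/dots/hyphens, then a digit.
    "Phone Numbers": re.compile(r"(?<!\w)\+?\d[\d\s().-]{6,}\d(?!\w)"),
}


def find_pii(text: str) -> dict:
    """Return {category: [matched strings]} for each pattern that matches."""
    found = {}
    for category, pattern in PII_PATTERNS.items():
        hits = pattern.findall(text)
        if hits:
            found[category] = hits
    return found
```

A loose phone pattern like this one will also match other long digit runs, so in practice its hits would need review or a stricter pattern.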
  • 20. Deploying in Production A REST API was built in Flask to enable usage of the solutions created. The API handles pre-processing data, feature engineering, and generating model predictions/outputs. HSM (Daybreak) sends a request to the API and receives a response.
  • 21. Deploying in Production The API has three endpoints that HSM can utilize. Outputs of the endpoints: 1. Probability a post is risky on a scale of 0-1. 2. Risk keywords in the post. 3. PII categories and words in the post.
  • 22. Examples of API Output: Endpoint #1 Probability Risk: Probability a post is risky on a scale of 0-1. Request: { share_content: "I feel like things are starting to turn around for me.", created_at: "2020-01-01 09:17:42" } Response: {'Prediction Risk': 0.01}
  • 23. Examples of API Output: Endpoint #2 Risk Keywords: Risk keywords in the post. Request: { share_content: "I drank too much last night and am now in a bad place.", created_at: "2019-11-24 08:18:31" } Response: { 'DUI': ['drank too much'], 'Mental Health': ['bad place'] }
  • 24. Examples of API Output: Endpoint #3 PII/Profanity: PII breaches in the post. Request: { share_content: "Hey guys my name is John, anyone in Sydney want to meet up at Hyde Park this Saturday at noon?", created_at: "2019-10-18 02:11:21" } Response: { 'People': ['John'], 'Locations': ['Sydney', 'Hyde Park'], 'Dates': ['this Saturday'], 'Times': ['noon'] }
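All three example requests share the same two fields, `share_content` and `created_at`. As a rough sketch, a request body of that shape could be checked before it reaches any model. The function name, the error messages, and the assumption that the body arrives as a JSON string with quoted keys are illustrative and not taken from the HSM API.

```python
import json
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # matches the example timestamps


def parse_request(body: str) -> dict:
    """Validate a JSON request body and return its parsed fields.

    Raises ValueError if the body is not a JSON object, a field is
    missing or empty, or the timestamp does not match the format.
    """
    payload = json.loads(body)  # JSONDecodeError is a ValueError subclass
    if not isinstance(payload, dict):
        raise ValueError("request body must be a JSON object")
    for field in ("share_content", "created_at"):
        value = payload.get(field)
        if not isinstance(value, str) or not value:
            raise ValueError(f"missing or invalid field: {field}")
    return {
        "share_content": payload["share_content"],
        "created_at": datetime.strptime(payload["created_at"], TIMESTAMP_FORMAT),
    }
```

Rejecting malformed input early keeps the downstream pre-processing and prediction code free of defensive checks.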
  • 26. Application HSM is working on a system that allows moderators to review shares more efficiently. The API endpoints will be part of this system.