Confidential Material – Chegg Inc. © 2005 - 2016. All Rights Reserved.© 2005 – 2017 by Chegg Inc. All Rights Reserved. 1
Natural Language Comprehension: Human Machine Collaboration.
Sanghamitra Deb, Data Scientist
Gabriela Brown, Summer Intern
Chegg
What is Chegg Tutors?
Unstructured data in Business
Dark Data at Chegg
Chats between tutors and students
Chegg Study Q&A
Bringing light to Dark Data
DeepDive and Snorkel process documents from the public and dark web to extract evidential data such as names, addresses, phone numbers, job types, job requirements, and information about rates of service.
Wikipedia extractions
Detecting Online Sex Trafficking
Professor Chris Ré
Student looking for tutors
I need a 10 page essay written on the deforestation of the amazon rainforest. must have 7 resources.
Student intents: Fraud
Examples:
• Do my homework
• Take online quiz for me
• Do my scheduled take home exam
Universities typically have strict honor policies stating that homework, exams, take-home assignments, etc. should be completed by the student without any external help.
A small number of students come to the platform to get their homework done or to ask someone to take their exam for them. This is a strict violation of the honor code.
Typical NLP Machine Learning Flow
High-performing machine learning models could require 100,000 labelled examples!
Traditional Feature Engineering
Winning solution: feature engineering
Generating a training set
• Human reading and labeling
• Several hundreds of expert hours
• Difficult to scale with evolving business questions
The Snorkel pipeline
Human Machine Collaboration
Knowledge transfer: what is important to product and business
Language, business needs and teams evolve.
Roles: Data Scientist, Product/Business, SME
Iterate:
• Create filters
• Create rules
• Redefine filters
• Redefine rules
Replaces manual generation of labelled data
Automated Features
Creating Filters: Candidate Extraction
Creating Filters: Candidate Extraction
Observing the candidates
Humans/SMEs look at ~100-200 of the candidates and label them.
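A minimal sketch of what a candidate filter followed by a labeling sample might look like. The keyword list, the naive sentence splitter, and the function names are all hypothetical; the slides do not show the actual filters used at Chegg.

```python
import random
import re

# Hypothetical keyword list; the talk does not give the real filter terms.
ASSIGNMENT_TERMS = ("homework", "essay", "quiz", "exam", "assignment")
TERM_PATTERN = re.compile(r"\b(" + "|".join(ASSIGNMENT_TERMS) + r")\b", re.IGNORECASE)


def extract_candidates(messages):
    """Return the sentences that mention any assignment term."""
    candidates = []
    for message in messages:
        # Naive sentence split: break after ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", message.strip()):
            if sentence and TERM_PATTERN.search(sentence):
                candidates.append(sentence)
    return candidates


def sample_for_labeling(candidates, k=100, seed=0):
    """Draw up to k candidates for a human or SME to label by hand."""
    rng = random.Random(seed)
    return rng.sample(candidates, min(k, len(candidates)))


messages = [
    "I need help with calculus. Can you do my homework?",
    "Looking for a chemistry tutor.",
]
candidates = extract_candidates(messages)
to_label = sample_for_labeling(candidates, k=100)
```

The filter is deliberately broad: it only narrows the text down to sentences worth looking at, and the hand labels on the sample are what later rules get judged against.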
Creating Rules
I will pay someone to write my essay.
Reference to the tutor + verb followed by “my”
This is an honor code violation ✓
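The rule described above can be sketched as a plain Python function. This is not the Snorkel API; the word lists, the label constants, and the choice of 1 for a violation are assumptions made for illustration.

```python
import re

# Label convention assumed here: 1 = violation, 0 = rule abstains, -1 = not a violation.
VIOLATION, ABSTAIN, NOT_VIOLATION = 1, 0, -1

# Hypothetical word lists; the real rules are not shown in the slides.
TUTOR_REFERENCES = r"(?:someone|somebody|anyone|tutor|you)"
ACTION_VERBS = r"(?:write|do|take|complete|finish)"

# A reference to the tutor, later followed by an action verb and "my".
TUTOR_VERB_MY = re.compile(
    rf"\b{TUTOR_REFERENCES}\b.*\b{ACTION_VERBS}\s+my\b", re.IGNORECASE
)


def rule_tutor_verb_my(text):
    """Vote VIOLATION when the pattern matches, otherwise abstain."""
    return VIOLATION if TUTOR_VERB_MY.search(text) else ABSTAIN


vote_essay = rule_tutor_verb_my("I will pay someone to write my essay.")
vote_question = rule_tutor_verb_my("Can you explain photosynthesis?")
```

Note that the rule abstains rather than voting for the other class when it does not match, so it says nothing about sentences it was not written for.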
What do the rule functions look like?
Several tens of rules create the training set.
The rules are judged based on the labels provided by humans or SMEs.
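One way to judge a rule against hand labels is to measure how often it fires (coverage) and how often it agrees with the human label when it does (accuracy). The sketch below assumes that reading; the toy rule and example data are hypothetical.

```python
def judge_rule(rule, labelled_examples):
    """Return (coverage, accuracy) of a rule on hand-labelled (text, label) pairs.

    coverage: fraction of examples where the rule does not abstain (vote != 0)
    accuracy: fraction of non-abstaining votes that match the human label,
              or None if the rule never fires
    """
    votes = [(rule(text), gold) for text, gold in labelled_examples]
    fired = [(vote, gold) for vote, gold in votes if vote != 0]
    coverage = len(fired) / len(votes) if votes else 0.0
    accuracy = sum(vote == gold for vote, gold in fired) / len(fired) if fired else None
    return coverage, accuracy


def toy_rule(text):
    """Hypothetical rule: vote 1 when the text contains 'do my', else abstain."""
    return 1 if "do my" in text.lower() else 0


# Hypothetical hand labels: 1 = violation, -1 = not a violation.
labelled = [
    ("do my homework", 1),
    ("explain derivatives", -1),
    ("do my laundry list review", -1),
    ("help with essay structure", -1),
]
coverage, accuracy = judge_rule(toy_rule, labelled)
```

A rule with low accuracy on the labelled sample is a candidate for rewriting; a rule with low coverage may still be useful if it is precise.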
Developing the training set: one rule
[Figure: bar chart of training set label counts (y-axis 0 to 200). 1: Class 1; 0: unlabelled data; -1: Class 2]
Developing the training set: four rules
[Figure: bar chart of training set label counts (y-axis 0 to 200). 1: Class 1; 0: unlabelled data; -1: Class 2]
Developing the training set: eight rules
[Figure: bar chart of training set label counts (y-axis 0 to 200). 1: Class 1; 0: unlabelled data; -1: Class 2]
Developing the training set: one rule
[Figure: bar chart of training set label counts (y-axis 0 to 200). 1: Class 1; 0: unlabelled data; -1: Class 2]
Developing the training set: twenty rules
[Figure: bar chart of training set label counts (y-axis 0 to 200). 1: Class 1; 0: unlabelled data; -1: Class 2]
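The charts in this sequence count how many examples end up as 1, 0 (unlabelled), or -1 as rules are added. A rough sketch of that bookkeeping follows. It combines votes with a simple sign-of-sum majority vote, which is a simplification chosen for illustration and not Snorkel's generative model; the rules and texts are hypothetical.

```python
from collections import Counter


def combine_votes(votes):
    """Majority vote: sign of the summed votes (1, 0, or -1)."""
    total = sum(votes)
    return (total > 0) - (total < 0)


def label_distribution(texts, rules):
    """Count how many texts get label 1, 0 (unlabelled), and -1."""
    labels = [combine_votes([rule(text) for rule in rules]) for text in texts]
    counts = Counter(labels)
    return {label: counts.get(label, 0) for label in (1, 0, -1)}


def rule_do_my(text):
    """Hypothetical rule: vote 1 on 'do my', else abstain."""
    return 1 if "do my" in text else 0


def rule_explain(text):
    """Hypothetical rule: vote -1 on 'explain', else abstain."""
    return -1 if "explain" in text else 0


texts = ["do my homework", "explain calculus to me", "hello"]
with_one_rule = label_distribution(texts, [rule_do_my])
with_two_rules = label_distribution(texts, [rule_do_my, rule_explain])
```

In this toy run the count of unlabelled examples drops from 2 to 1 when the second rule is added, which is the effect the sequence of charts appears to illustrate.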
Performance
Evaluation metrics:
Positive accuracy: 68.3%
Negative accuracy: 90.7%
Precision: 71.8%
Recall: 68.3%
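For reference, a sketch of how these four metrics could be computed from hand labels and model predictions. It assumes "positive accuracy" means accuracy on the truly positive examples and "negative accuracy" means accuracy on the truly negative ones; under that reading positive accuracy equals recall, which matches the two identical 68.3% figures. The data below is made up.

```python
def evaluation_metrics(gold, predicted, positive=1):
    """Compute precision, recall, and per-class accuracy for a binary task."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    tn = sum(g != positive and p != positive for g, p in pairs)

    def ratio(numerator, denominator):
        # None when the metric is undefined (no examples in the denominator).
        return numerator / denominator if denominator else None

    return {
        "positive_accuracy": ratio(tp, tp + fn),  # same quantity as recall
        "negative_accuracy": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
    }


# Hypothetical labels: 1 = violation, -1 = not a violation.
gold = [1, 1, 1, -1, -1, -1, -1, -1]
predicted = [1, 1, -1, 1, -1, -1, -1, -1]
metrics = evaluation_metrics(gold, predicted)
```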
Production: Iterations
• Snorkel code runs on the opportunities sent the day before; humans check the list and update a file with real honor code violations.
• After doing unsupervised learning (topic modeling, word2vec) on the positive and negative honor code violations (HCVs) from the human-generated data, the rules are changed to improve positive accuracy.
In dynamic two-sided marketplaces, language and behavior change continuously, so iterating every 3-4 months keeps the model fresh.
Generalization: Matching Problem for Chegg Tutors
• Feature generation for student-tutor pairs.
• Chegg Tutors is a two-sided marketplace with students and tutors being paired based on their overlapping characteristics.
• Generating features is an important part of creating this recommendation system. Snorkel helps generate key phrases associated with student-tutor pairs.
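As an illustration of turning key phrases into pair features, the sketch below computes a simple overlap between a student's and a tutor's phrase sets. The feature names and the use of Jaccard similarity are assumptions; the slides do not say which features Chegg actually builds.

```python
def overlap_features(student_phrases, tutor_phrases):
    """Build overlap features from two collections of key phrases."""
    student = {phrase.lower() for phrase in student_phrases}
    tutor = {phrase.lower() for phrase in tutor_phrases}
    shared = student & tutor
    union = student | tutor
    return {
        "shared_count": len(shared),
        # Jaccard similarity: shared phrases over all distinct phrases.
        "jaccard": len(shared) / len(union) if union else 0.0,
        "shared_phrases": sorted(shared),
    }


# Hypothetical key phrases for one student-tutor pair.
features = overlap_features(
    ["organic chemistry", "exam prep"],
    ["Organic Chemistry", "calculus"],
)
```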
Behind training set generation: probabilistic graphical models (PGMs)
https://arxiv.org/pdf/1605.07723.pdf
Model the rules as independent, similar to Naïve Bayes.
Consider interdependencies between the rules: similar, fix, reinforce, exclude.
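To make the independent-rules case concrete, here is a small sketch of a Naïve-Bayes-style combination. It assumes each rule has a known accuracy, that rules err independently given the true class, that abstaining carries no information, and that the two classes are equally likely a priori. In the linked paper the accuracies are estimated without hand labels; here they are simply supplied, and the numbers are made up.

```python
import math


def prob_class_one(votes, accuracies):
    """Posterior probability of class 1 given independent rule votes.

    votes: one of 1, 0 (abstain), -1 per rule
    accuracies: assumed probability that each rule is right when it fires
    Each firing rule adds vote * log(a / (1 - a)) to the log-odds of class 1.
    """
    log_odds = sum(
        vote * math.log(accuracy / (1.0 - accuracy))
        for vote, accuracy in zip(votes, accuracies)
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


# A 90%-accurate rule votes 1 while a 60%-accurate rule votes -1.
p_disagree = prob_class_one([1, -1], [0.9, 0.6])
# Both rules abstain: the posterior stays at the uniform prior.
p_abstain = prob_class_one([0, 0], [0.9, 0.6])
```

When the rules disagree, the more accurate rule dominates, so the result is a probabilistic label rather than a hard vote. Modeling dependencies between rules goes beyond this sketch.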
Noisy sources of truth
Credit: https://hazyresearch.github.io/snorkel/
Generalization
Thank you
sdeb@chegg.com
@sangha_deb
Stanford NLP & Tools
Intent of Honor Code Violation
Opportunities
• Are students and tutors having classes offsite?
• Are tutors committing fraud?
• Are students/tutors using offensive language?
• Do students want lessons immediately or are they willing to wait? …
Other business questions
Other datasets: Chat

Natural Language Comprehension: Human Machine Collaboration.

  • 1. Confidential Material – Chegg Inc. © 2005 - 2016. All Rights Reserved.© 2005 – 2017 by Chegg Inc. All Rights Reserved. 1 Natural Language Comprehension: Human Machine Collaboration. Sanghamitra Deb, Data Scientist Gabriela Brown, Summer Intern
  • 2. Example Slide Chegg Inc. © 2005 – 2017. All Rights Reserved.2 Chegg
  • 3. Example Slide Chegg Inc. © 2005 – 2017. All Rights Reserved.3 What is Chegg Tutors?
  • 4. Example Slide Chegg Inc. © 2005 – 2017. All Rights Reserved.4 Unstructured data in Business
  • 5. Example Slide Chegg Inc. © 2005 – 2017. All Rights Reserved.5 Dark Data at Chegg Chats between tutors and students Chegg Study Q&A
  • 6. Example Slide Chegg Inc. © 2005 – 2017. All Rights Reserved.6 Bringing light to Dark Data DeepDive and snorkel processes such documents from public and dark web to extract evidential data, such as names, addresses, phone numbers, job types, job requirements, information about rates of service, etc. Wikipedia extractions Detecting Online Sex Trafficking Professor Chris Re
  • 7. Example Slide Chegg Inc. © 2005 – 2017. All Rights Reserved.7 Student looking for tutors I need a 10 page essay written on the deforestation of the amazon rainforest. must have 7 resources.
  • 8. Student intents: Fraud. Examples: • Do my homework • Take online quiz for me • Do my scheduled take-home exam. Universities typically have strict honor policies stating that homework, exams, take-home assignments, etc. should be completed by the student without any external help. A small number of students come to the platform to get their homework done or to ask someone to take their exam for them. This is a strict violation of the honor code.
  • 9. Typical NLP Machine Learning Flow. High-performing machine learning models could require 100,000 labelled examples!
  • 10. Traditional Feature Engineering. Winning solution!! Feature Engineering
  • 11. Generating a training set • Human reading and labeling • Several hundred expert hours • Difficult to scale with evolving business questions
  • 12. The Snorkel pipeline
  • 13. Human Machine Collaboration. Knowledge transfer between Data Scientist and Product/Business SME: what is important to product and business. Language, business needs and teams evolve. Iterate: • Create filters • Create rules • Redefine filters • Redefine rules. Replaces manual generation of labelled data.
  • 14. Automated Features
  • 15. Creating Filters: Candidate Extraction
  • 16. Creating Filters: Candidate Extraction
  • 17. Observing the candidates. Humans/SMEs look at ~100-200 of them and label them.
  • 18. Creating Rules. Example: "I will pay someone to write my essay." Rule: reference to the tutor + verb followed by "my". ✓ This is an honor code violation.
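The rule on slide 18 can be sketched as a small Python function. This is only an illustration, not the code used in the talk: the word lists, the regular expression, and the name `lf_tutor_verb_my` are hypothetical, and the convention that class 1 means "honor code violation" is an assumption based on the 1 / 0 / -1 labels shown on the later slides.

```python
import re

# Label convention borrowed from the slides: 1 = class 1 (assumed here to be
# "honor code violation"), 0 = unlabelled, i.e. the rule abstains.
VIOLATION, ABSTAIN = 1, 0

# Hypothetical pattern for "reference to the tutor + verb followed by 'my'".
# The word lists are illustrative, not taken from the talk.
TUTOR_VERB_MY = re.compile(
    r"\b(?:someone|somebody|you|tutor)\b"     # reference to the tutor
    r"(?:\s+\w+){0,3}?"                       # up to three filler words, e.g. "to"
    r"\s+(?:write|do|take|complete|finish)"   # verb
    r"\s+my\b",                               # immediately followed by "my"
    re.IGNORECASE,
)


def lf_tutor_verb_my(text: str) -> int:
    """Vote VIOLATION when the pattern matches, otherwise abstain."""
    return VIOLATION if TUTOR_VERB_MY.search(text) else ABSTAIN


print(lf_tutor_verb_my("I will pay someone to write my essay."))
print(lf_tutor_verb_my("Can you help me understand my homework?"))
```

A single rule like this is deliberately narrow: it abstains on most messages and only becomes useful when combined with other rules.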
  • 19. What do the rule functions look like? Several tens of rules create the training set. The rules are judged based on the labels provided by humans or SMEs.
  • 20. Developing the training set: one rule. [Plot of the training set; x-axis 0 to 200; y-axis 1: Class 1, 0: unlabelled data, -1: Class 2]
  • 21. Developing the training set: four rules. [Plot of the training set; x-axis 0 to 200; y-axis 1: Class 1, 0: unlabelled data, -1: Class 2]
  • 22. Developing the training set: eight rules. [Plot of the training set; x-axis 0 to 200; y-axis 1: Class 1, 0: unlabelled data, -1: Class 2]
  • 23. Developing the training set: one rule. [Plot of the training set; x-axis 0 to 200; y-axis 1: Class 1, 0: unlabelled data, -1: Class 2]
  • 24. Developing the training set: twenty rules. [Plot of the training set; x-axis 0 to 200; y-axis 1: Class 1, 0: unlabelled data, -1: Class 2]
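The progression on slides 20 to 24 (more rules, fewer unlabelled points) can be illustrated with a toy example. This is a minimal sketch under stated assumptions: the three rules, the sample messages, and the simple majority vote are all hypothetical stand-ins. Snorkel itself combines the votes with a generative model rather than a plain majority vote.

```python
from typing import Callable, List, Sequence

LabelingFunction = Callable[[str], int]  # returns 1, 0 (abstain) or -1


# Three hypothetical rules; the keywords are illustrative only.
def lf_do_my(text: str) -> int:
    return 1 if "do my" in text.lower() else 0


def lf_take_my_exam(text: str) -> int:
    t = text.lower()
    return 1 if "take my" in t and ("exam" in t or "quiz" in t) else 0


def lf_wants_explanation(text: str) -> int:
    t = text.lower()
    return -1 if ("explain" in t or "understand" in t) else 0


def apply_rules(texts: Sequence[str], rules: Sequence[LabelingFunction]) -> List[List[int]]:
    """Build the label matrix: one row per text, one column per rule."""
    return [[rule(t) for rule in rules] for t in texts]


def majority_vote(votes: Sequence[int]) -> int:
    """Sign of the summed votes: 1, -1, or 0 when tied or all rules abstain."""
    total = sum(votes)
    return (total > 0) - (total < 0)


def coverage(label_matrix: Sequence[Sequence[int]]) -> float:
    """Fraction of texts that receive at least one non-abstaining vote."""
    return sum(1 for row in label_matrix if any(row)) / len(label_matrix)


texts = [
    "Please do my homework",
    "Can someone take my online quiz for me",
    "Can you explain derivatives",
    "I need tutoring in chemistry",
]
one_rule = apply_rules(texts, [lf_do_my])
all_rules = apply_rules(texts, [lf_do_my, lf_take_my_exam, lf_wants_explanation])
print(coverage(one_rule), coverage(all_rules))
print([majority_vote(row) for row in all_rules])
```

With one rule only a quarter of the toy messages get a label; with three rules the coverage rises, which mirrors the shrinking band of zeros in the plots.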
  • 25. Performance Evaluation Metrics: positive accuracy 68.3%, negative accuracy 90.7%, precision 71.8%, recall 68.3%
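For reference, the four metrics on slide 25 can be computed from true and predicted labels as sketched below. The slides do not define "positive accuracy" and "negative accuracy"; this sketch assumes they are the per-class accuracies (true positive rate and true negative rate), which is consistent with positive accuracy and recall being the same number on the slide. The function name `evaluate` and the toy labels are hypothetical.

```python
from typing import Dict, Sequence


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Per-class accuracy, precision and recall for labels in {1, -1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p != 1)
    fp = sum(1 for t, p in pairs if t == -1 and p == 1)
    tn = sum(1 for t, p in pairs if t == -1 and p != 1)

    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    recall = ratio(tp, tp + fn)
    return {
        "positive_accuracy": recall,              # assumed: share of true positives caught
        "negative_accuracy": ratio(tn, tn + fp),  # assumed: share of true negatives kept
        "precision": ratio(tp, tp + fp),
        "recall": recall,
    }


# Toy labels for illustration only; not data from the talk.
y_true = [1, 1, 1, -1, -1, -1, -1, -1]
y_pred = [1, 1, -1, 1, -1, -1, -1, -1]
print(evaluate(y_true, y_pred))
```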
  • 26. Production: Iterations • Snorkel code runs on the opportunities sent the day before; humans check the list and update a file with real honor code violations. • After unsupervised learning (topic modeling, word2vec) on the positive and negative honor code violations (HCVs) from human-generated data, the rules are changed to improve positive accuracy. In dynamic two-sided marketplaces, language and behavior change continuously, so iterating every 3-4 months keeps the model fresh.
  • 27. Generalization: Matching Problem for Chegg Tutors • Feature generation for student-tutor pairs. • Chegg Tutors is a two-sided marketplace in which students and tutors are paired based on their overlapping characteristics. • Generating features is an important part of creating this recommendation system. Snorkel helps generate key phrases associated with student-tutor pairs.
  • 28. Behind training set generation: PGMs. https://arxiv.org/pdf/1605.07723.pdf • Model the rules as independent, similar to Naïve Bayes. • Consider interdependencies between the rules: similar, fix, reinforce, exclude.
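The "independent rules, similar to Naïve Bayes" idea on slide 28 can be sketched as follows. This is an illustration under simplifying assumptions, not Snorkel's implementation: each rule is assumed to be correct with a known probability whenever it votes, abstentions are assumed to carry no information, and both classes are assumed equally likely a priori. The accuracies are supplied by hand here, whereas the referenced paper estimates them from the rule outputs without ground-truth labels. The name `posterior_positive` is hypothetical.

```python
import math
from typing import Sequence


def posterior_positive(votes: Sequence[int], accuracies: Sequence[float]) -> float:
    """P(label = 1 | votes) for independent rules voting 1, 0 (abstain) or -1.

    Each voting rule adds or subtracts log(a / (1 - a)) to the log-odds,
    where a is that rule's assumed accuracy; abstentions contribute nothing.
    """
    log_odds = 0.0
    for vote, acc in zip(votes, accuracies):
        log_odds += vote * math.log(acc / (1.0 - acc))
    return 1.0 / (1.0 + math.exp(-log_odds))


# Two rules vote for class 1, one votes for class 2; accuracies are made up.
print(posterior_positive([1, 1, -1], [0.9, 0.6, 0.7]))
# When every rule abstains, the result stays at the 0.5 prior.
print(posterior_positive([0, 0, 0], [0.9, 0.6, 0.7]))
```

Unlike a plain majority vote, this weighting lets one reliable rule outweigh several unreliable ones. Handling the interdependencies listed on the slide requires a richer model than this independent one.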
  • 29. Noisy sources of truth. Credit: https://hazyresearch.github.io/snorkel/
  • 30. Generalization
  • 31. Thank you. sdeb@chegg.com @sangha_deb
  • 32. Stanford NLP & Tools. Intent of Honor Code Violation. Opportunities. • Are students and tutors having classes offsite? • Are tutors committing fraud? • Are students/tutors using offensive language? • Do students want lessons immediately or are they willing to wait? …... Other business questions. Other datasets: Chat