Data Science at Whisper: From Content Quality to Personalization
ULAS BARDAK, MAARTEN BOSMA, MARK HSIAO
Whisper @ Big Data Day, LA - 2015/06/27
This Talk
• Introduction to Whisper
• Data Science problems overview
• Examples
• Personalization and Recommendations
• Deep dive – Identifying Similar Users and its applications
  • Like-minded users for recommendations
  • Identifying content with broad appeal
The Rise of the Anonymous Apps
Google Trends – “Anonymous App”
A little background on Whisper
• Anonymous Social Network focused on mobile apps
• Users come to share secrets, make confessions, find others to connect to
• No need to create an account
• Engagement through replies, direct messages, “hearts”
• Millions of users & hundreds of millions of whispers
High Level Usage Patterns
[Diagram: App Launch, Recommended Whispers, Recommendation Engine, User + Content Models, User Engagement, Whisper Create, Suggest Image; paths labeled Creation Flow and Interaction Flow]
Some Problems We Are Tackling
Content Understanding
• Spam detection
• Language detection
• Content quality prediction
• Content classification
• Image Suggestion
User Understanding
• Spammer detection
• Personalization
• Similar user detection
• Churn prediction
Overall
• A/B testing
• Reporting
Language Detection
Image Quality Estimation
Low Quality High Quality
Recommendations
Problem:
Showing every user the exact same content is not
efficient. Engagement and interest depend on matching
users’ preferences to content, i.e. personalization.
Requirements:
Fast and able to work with little data
Recommendation Engine
High Personalization
• Like-minded users
• Collaborative Filtering
• …
High Coverage
• Popular in location
• Recently popular
• Popular with new users
• …
Combiner
• Merge results, deciding on the right ordering
• If not enough results, use fallback methods to backfill
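The combiner step can be sketched as follows. This is only an illustration: the slide does not say how the ordering is decided, so round-robin interleaving is an assumption, and the function name `combine` and the sample whisper IDs are hypothetical.

```python
from itertools import zip_longest

def combine(personalized, coverage, fallback, n):
    """Merge two ranked lists, drop duplicates, then backfill from fallback."""
    merged, seen = [], set()
    # Assumed ordering policy: alternate between the two sources.
    for pair in zip_longest(personalized, coverage):
        for item in pair:
            if item is not None and item not in seen:
                seen.add(item)
                merged.append(item)
    # Backfill only while we still have fewer than n results.
    for item in fallback:
        if len(merged) >= n:
            break
        if item not in seen:
            seen.add(item)
            merged.append(item)
    return merged[:n]

feed = combine(["w1", "w2"], ["w2", "w3"], ["w9", "w1", "w8"], n=5)
print(feed)
```

A production combiner would likely weight sources by expected engagement rather than alternate between them.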
High Personalization
• Identifying Like-minded Users for Recommendations (by Nick Stucky-Mack)
• Online learning to rank for Collaborative Filtering
Ko-Jen (Mark) Hsiao
How do we find like-minded users?
1. Agglomeration [Convert the user into a giant document]
2. Pre-processing [Lowercase, remove stopwords, etc.]
3. Vectorization [Bag of words into vectors]
4. Dimensionality reduction [Autoencoder maps 5K+ dimensions into ~100]
5. Similarity calculation [Top k users via cosine similarity]
6. Recommendation [Collect whispers from similar users]
7. Feedback [Regenerate model with new activity]
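A minimal sketch of steps 1 to 3, 5, and 6 using only the Python standard library. The autoencoder (step 4) and the feedback loop (step 7) are left out, and the stopword list, user IDs, and whisper texts are made up for illustration.

```python
import math
import re
from collections import Counter

STOPWORDS = {"i", "a", "the", "my", "to", "and", "is", "of"}  # toy list (assumed)

def user_vector(whispers):
    """Steps 1-3: join a user's whispers, lowercase, drop stopwords, count words."""
    doc = " ".join(whispers).lower()
    tokens = re.findall(r"[a-z']+", doc)
    return Counter(t for t in tokens if t not in STOPWORDS)

def cosine(a, b):
    """Cosine similarity between two sparse bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k_similar(target, vectors, k):
    """Step 5: rank the other users by cosine similarity to the target."""
    scores = [(cosine(vectors[target], v), u)
              for u, v in vectors.items() if u != target]
    return [u for _, u in sorted(scores, reverse=True)[:k]]

users = {
    "u1": ["I miss my dog", "My dog is the best"],
    "u2": ["Dog walks are the best"],
    "u3": ["I hate exams", "Exams are stressful"],
}
vectors = {u: user_vector(ws) for u, ws in users.items()}
similar = top_k_similar("u1", vectors, k=1)
# Step 6: collect whispers from the similar users.
recommended = [w for u in similar for w in users[u]]
print(similar, recommended)
```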
High Personalization
• Identifying Like-minded Users for Recommendations
• Online learning to rank for Collaborative Filtering
Collaborative Filtering
[Figure: Users × Whispers matrix = User matrix × Whisper matrix]
Collaborative Filtering
• We want to learn a low-dimensional embedding for each user and each whisper.
• Instead of solving this problem by matrix factorization, we view it as a ranking problem.
• We only care about the top recommended results, not accurate score predictions for all whispers.
Learning to rank
Basic idea:
Every time there is an interaction between a user a and a whisper w, update the embeddings of a and w so that their inner product has a higher value.
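The basic idea can be illustrated with a single gradient-ascent step on the inner product. This is a sketch only: the learning rate and the sample vectors are assumptions, and it ignores the negative sampling a full ranking loss would need.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def interaction_update(user_vec, whisper_vec, lr=0.1):
    """One gradient-ascent step on <user_vec, whisper_vec>.

    The gradient of the inner product with respect to the user vector is
    the whisper vector, and vice versa, so each moves toward the other.
    """
    new_user = [u + lr * w for u, w in zip(user_vec, whisper_vec)]
    new_whisper = [w + lr * u for u, w in zip(user_vec, whisper_vec)]
    return new_user, new_whisper

a = [0.1, -0.2, 0.3]   # hypothetical user embedding
w = [0.4, 0.1, -0.1]   # hypothetical whisper embedding
before = dot(a, w)
a, w = interaction_update(a, w)
after = dot(a, w)
print(before, after)
```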
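Collaborative Filtering
• Learn a score function f(u, w) that gives scores for whispers given a user. Ex.:
  $f(u, w) = U_u^{T} W_w$
• Define a rank function that ranks all whispers for all users:
  $\mathrm{rank}(u, w) = \sum_{k \in \mathcal{W},\, k \neq w} I\{ f(u, k) > f(u, w) \}$
  where $\mathcal{W}$ is the set of all whispers and $I\{\cdot\}$ is the indicator function.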
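The score and rank functions can be written out directly. In this sketch the embeddings are stored in plain dictionaries, and the two-dimensional vectors and IDs are made up for illustration.

```python
def score(U, W, u, w):
    """f(u, w): inner product of user u's and whisper w's embeddings."""
    return sum(x * y for x, y in zip(U[u], W[w]))

def rank(U, W, u, w):
    """Number of other whispers k that user u scores strictly higher than w."""
    target = score(U, W, u, w)
    return sum(1 for k in W if k != w and score(U, W, u, k) > target)

U = {"u1": [1.0, 0.0]}
W = {"w1": [0.9, 0.1], "w2": [0.2, 0.8], "w3": [0.5, 0.5]}
ranks = {w: rank(U, W, "u1", w) for w in W}
print(ranks)
```

A rank of 0 means no other whisper outscores it for that user. Computing this exactly requires scoring every whisper, which is the cost the sampling approach is meant to avoid.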
Learning to rank
We can then define an error function using the template:
$\mathrm{err}(f(x), y) = L(\mathrm{rank}(x, y))$
where L is a non-decreasing loss function and rank is the actual rank.
[Figure: plot with axes Rank and Loss]
Learning to rank
Problem: For large datasets like ours, it is computationally
expensive to obtain exact ranks of items.
Solution: Don’t use exact rank! Follow a sampling process to approximate the rank:
$\text{approx-rank} = \frac{\#\text{TotalWhispers}}{D}$
where D is the number of random samples drawn before we find the first violation.
Online learning to rank: utilize the Weighted Approximate-Rank Pairwise (WARP) loss* and optimize with stochastic gradient descent.
*Jason Weston, Samy Bengio, and Nicolas Usunier. Large scale image annotation: learning to rank with joint word-image embeddings. Machine Learning, 81(1):21–35, 2010.
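The sampling approximation can be sketched as below. Assumptions: a "violation" here means a sampled whisper scoring higher than the interacted one (the margin term used in the WARP paper is omitted), the function returns 0 when no violation is found within the trial budget, and the scores are made up.

```python
import random

def approx_rank(scores, positive, rng, max_trials=None):
    """Sample whispers until one outscores `positive`; return total / D."""
    candidates = [k for k in scores if k != positive]
    max_trials = max_trials or len(candidates)
    for d in range(1, max_trials + 1):
        k = rng.choice(candidates)
        if scores[k] > scores[positive]:
            # First violation after d draws, so D = d.
            return len(scores) / d
    return 0  # no violation found: treat the positive as top-ranked

scores = {"w1": 0.9, "w2": 0.2, "w3": 0.5}
rng = random.Random(0)
best = approx_rank(scores, "w1", rng)    # nothing can outscore w1
worst = approx_rank(scores, "w2", rng)   # every sample outscores w2
print(best, worst)
```

The fewer draws it takes to find a violation, the larger the estimated rank, so poorly ranked positives get larger updates.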
Evaluation
• Achieved recall@1%:
  • 40% for our Hearts dataset.
  • 20% for our Conversations dataset.
• Used for personalized push notifications and feed generation.
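The slide does not define recall@1%. One common reading, assumed here, is the share of held-out interactions whose whisper lands in the top 1% of the ranked list. The function name and data below are hypothetical.

```python
import math

def recall_at_percent(ranked_items, held_out, percent):
    """Share of held-out items that appear in the top `percent`% of the ranking."""
    cutoff = max(1, math.ceil(len(ranked_items) * percent / 100))
    top = set(ranked_items[:cutoff])
    return sum(1 for item in held_out if item in top) / len(held_out)

ranking = [f"w{i}" for i in range(200)]          # best first, 200 whispers
held_out = ["w0", "w1", "w57", "w120", "w3"]     # whispers the user engaged with
recall = recall_at_percent(ranking, held_out, percent=1)
print(recall)
```

With 200 whispers the top 1% is two items, so two of the five held-out whispers count as hits.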
Examples Using Hearts
Examples Using Hearts
For Users Without Sufficient Signals
Identifying High-Quality Content
Maarten Bosma
Whisper’s variety of content
What is high quality?
• Liked by a wide variety of people
• Deep, Emotional
• Text: Writing style, grammar & spelling
• Image: Quality photo
• Original
• “High stakes”
Popular ≠ High quality
• Great content might not get exposure
• Selection bias
• Low-quality content might be engaging
• Low-quality content generates attention
• Still useful to rank a set of preselected whispers
The problem with using only recommendations
• May be one-sided
• Exploitation, no exploration
• Algorithm makes mistakes
• New content problem
The solution
• Two potential uses:
  • Human Curation
    • Use quality score as a tool to find high-quality whispers
  • Quality Score Filter for recommendations
• Quality score for each piece of content
Basic Model
• 40k whispers promoted by curators
• 100k whispers from background collection
• Model
  • Logistic Regression
  • χ² feature selection
  • Character n-grams, n = 1 to 6
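As an illustration of the feature extraction only, the sketch below counts character n-grams for n = 1 to 6 with the standard library. The χ² selection and the logistic regression are not shown, and the function name and sample text are hypothetical.

```python
from collections import Counter

def char_ngrams(text, n_min=1, n_max=6):
    """Count every character n-gram of length n_min..n_max in `text`."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts

features = char_ngrams("secret")
print(features.most_common(3))
```

Character n-grams capture spelling, punctuation, and word fragments without tokenization, which suits short, informal text.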
Additional Features
• Length of text
• POS tags
• Punctuation, Capitalization
• Similarity with background corpus
• Likelihood under language model
• Out-of-vocabulary
• Topic Models
• KL Divergence
• See Agichtein et al., Finding High-Quality Content in Social Media, WSDM, 2008
TextShape
• Opposite of stop word removal and stemming: function words and suffixes are kept, content-word stems are replaced with x
• Used as an alternative model to find good whispers
Ex.:
Original: I danced with two people at my wedding. The one I married, and the one man I loved.
TextShape: I xed with two x at my xing. The one I xed, and the one x I xed.
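A rough sketch of the transformation, reverse-engineered from the slide's example rather than taken from Whisper's implementation. The function-word list and the suffix list are assumptions, chosen only so that the example is reproduced.

```python
import re

FUNCTION_WORDS = {"i", "with", "two", "at", "my", "the", "one", "and"}  # toy list (assumed)
SUFFIXES = ("ing", "ed")  # assumed; enough to cover the slide's example

def text_shape(text):
    """Keep function words; replace other words with 'x' plus a known suffix."""
    def repl(match):
        word = match.group(0)
        if word.lower() in FUNCTION_WORDS:
            return word
        for suffix in SUFFIXES:
            if word.lower().endswith(suffix):
                return "x" + suffix
        return "x"
    # Only alphabetic runs are rewritten, so punctuation and spacing survive.
    return re.sub(r"[A-Za-z']+", repl, text)

shaped = text_shape(
    "I danced with two people at my wedding. "
    "The one I married, and the one man I loved."
)
print(shaped)
```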
Thank You for Listening!
Questions?
• For more info:
  • http://www.whisper.sh
  • Contact us at ulas@whisper.sh
  • Try out the app for yourself!
Our Technology Stack for DS
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 

Recently uploaded (20)

08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 

Big Data Day LA 2015 - Data Science at Whisper - From content quality to personalization by Ulas Bardak of Whisper

  • 1. Data Science at Whisper: From Content Quality to Personalization ULAS BARDAK, MAARTEN BOSMA, MARK HSIAO Whisper @ Big Data Day, LA - 2015/06/27
  • 2. This Talk
      • Introduction to Whisper
      • Data Science problems overview
      • Examples
      • Personalization and Recommendations
      • Deep dive – Identifying Similar Users and its applications
      • Like-minded users for recommendations
      • Identifying content with broad appeal
  • 3. The Rise of the Anonymous Apps (chart: Google Trends – “Anonymous App”)
  • 4. A little background on Whisper
      • Anonymous Social Network focused on mobile apps
      • Users come to share secrets, make confessions, find others to connect to
      • No need to create an account
      • Engagement through replies, direct messages, “hearts”
      • Millions of users & hundreds of millions of whispers
  • 5. High Level Usage Patterns (diagram labels: App Launch; Recommended Whispers; Recommendation Engine; User + Content Models; User Engagement; Whisper Create; Suggest Image; Creation Flow; Interaction Flow)
  • 6. Some Problems We Are Tackling
      Content Understanding
        • Spam detection
        • Language detection
        • Content quality prediction
        • Content classification
        • Image Suggestion
      User Understanding
        • Spammer detection
        • Personalization
        • Similar user detection
        • Churn prediction
      Overall
        • A/B testing
        • Reporting
  • 7. Language Detection
  • 8. Image Quality Estimation (image examples: Low Quality, High Quality)
  • 9. Recommendations
      Problem: Showing every user the exact same content is not efficient. Engagement and interest depend on matching users’ preferences to content, i.e. personalization.
      Requirements: Fast and able to work with little data
  • 10. Recommendation Engine
      High Personalization
        • Like-minded users
        • Collaborative Filtering
        • …
      High Coverage
        • Popular in location
        • Recently popular
        • Popular with new users
        • …
      Combiner
        • Merge results, deciding on the right ordering
        • If not enough results, use fallback methods to backfill.
  • 11. High Personalization
      • Identifying Like-minded Users for Recommendations (by Nick Stucky-Mack)
      • Online learning to rank for Collaborative Filtering
      Ko-Jen (Mark) Hsiao
  • 12. How do we find like-minded users?
      1. Agglomeration [Convert the user into a giant document]
      2. Pre-processing [Lowercase, remove stopwords, etc.]
      3. Vectorization [Bag of words into vectors]
      4. Dimensionality reduction [Autoencoder maps 5K+ into ~100]
      5. Similarity calculation [Top k users via cosine similarity]
      6. Recommendation [Collect whispers from similar users]
      7. Feedback [Regenerate model with new activity]
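The first steps of this pipeline can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not Whisper's implementation: the stopword list, the `user_docs` sample data, and the function names are all hypothetical, and step 4 (the autoencoder) is left out, so similarity is computed directly on the bag-of-words counts.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a much larger one.
STOPWORDS = {"a", "an", "and", "i", "is", "my", "the", "to"}

def vectorize(document):
    """Steps 2-3: lowercase, drop stopwords, and count the remaining words."""
    words = re.findall(r"[a-z']+", document.lower())
    return Counter(w for w in words if w not in STOPWORDS)

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k_similar(user_id, user_docs, k=2):
    """Step 5: rank every other user by cosine similarity to user_id."""
    vectors = {uid: vectorize(doc) for uid, doc in user_docs.items()}
    target = vectors[user_id]
    scored = [(cosine(target, vec), uid)
              for uid, vec in vectors.items() if uid != user_id]
    scored.sort(reverse=True)
    return [uid for _, uid in scored[:k]]

# Step 1 (assumed data): each user is one document built from their activity.
user_docs = {
    "u1": "I miss my dog. My dog is the best friend.",
    "u2": "The dog is a loyal friend",
    "u3": "Exams and homework stress me out",
}
print(top_k_similar("u1", user_docs, k=1))
```

Step 6 would then collect whispers that the returned users engaged with, and step 7 would rebuild `user_docs` as new activity arrives.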
  • 13. High Personalization
      • Identifying Like-minded Users for Recommendations
      • Online learning to rank for Collaborative Filtering
  • 15. Collaborative Filtering
      • We want to learn a low-dimensional embedding for each user and each Whisper.
      • Instead of solving this problem by matrix factorization, we view it as a ranking problem.
      • We only care about the top recommended results, not accurate score predictions for all whispers.
  • 16. Learning to rank
      Basic idea: Every time there is an interaction between a user a and a whisper w, update the embeddings of a and w so that their inner product has a higher value.
  • 17. Collaborative Filtering
      • Learn a score function f(u, w) that gives scores for whispers given a user. Ex: $f(u, w) = U_u^{T} W_w$
      • Define a rank function that ranks all whispers for all users (the sum runs over all whispers k other than w): $\mathrm{rank}(u, w) = \sum_{k \neq w} I\{ f(u, k) > f(u, w) \}$
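A minimal sketch of the two definitions on this slide, assuming embeddings are stored as plain lists of floats (`U[u]` for users, `W[w]` for whispers). The toy values and function names are made up for illustration.

```python
def score(U, W, u, w):
    """f(u, w): inner product of user embedding U[u] and whisper embedding W[w]."""
    return sum(a * b for a, b in zip(U[u], W[w]))

def rank(U, W, u, w):
    """Number of other whispers k that score strictly higher than w for user u."""
    target = score(U, W, u, w)
    return sum(1 for k in range(len(W))
               if k != w and score(U, W, u, k) > target)

# Toy embeddings (assumed values): one user, three whispers.
U = [[1.0, 0.0]]
W = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
print([rank(U, W, 0, w) for w in range(len(W))])
```

A rank of 0 means no other whisper outscores w for that user. Note that computing the exact rank touches every whisper, which is what makes it expensive at scale.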
  • 18. Learning to rank
      We can then define an error function using the template $\mathrm{err}(f(x), y) = L(\mathrm{rank}(x, y))$, where L is a non-decreasing loss function and rank is the actual rank. (plot: Loss vs. Rank)
  • 19. Learning to rank
      Problem: For large datasets like ours, it is computationally expensive to obtain exact ranks of items.
      Solution: Don’t use exact rank! Follow a sampling process to approximate the rank: $\text{approx-rank} = \frac{\#\text{TotalWhispers}}{D}$, where D is the number of random samples drawn before the first violation is found.
      Online learning to rank - utilize Weighted Approximate Rank Pairwise Loss* and optimize with stochastic gradient descent.
      *Jason Weston, Samy Bengio, and Nicolas Usunier. Large scale image annotation: learning to rank with joint word-image embeddings. Machine Learning, 81(1):21–35, 2010.
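The sampling idea can be sketched as a single stochastic gradient step. This is a rough illustration, not the production code: the hinge margin, the learning rate, the integer division in the rank estimate, and the harmonic weighting L(k) = 1 + 1/2 + … + 1/k (one common choice for WARP) are all assumptions, as are the function names.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def warp_step(U, W, u, pos, rng, lr=0.1, margin=1.0):
    """One WARP-style update for an observed (user u, whisper pos) interaction.

    Samples other whispers until one violates the margin, estimates the rank
    from the number of draws D, and nudges the embeddings in place.
    Returns the approximate rank, or 0 if no violation was found.
    """
    n = len(W)
    pos_score = dot(U[u], W[pos])
    for draws in range(1, n):                  # give up after n - 1 draws
        r = rng.randrange(n - 1)
        neg = r if r < pos else r + 1          # uniform over whispers != pos
        if dot(U[u], W[neg]) + margin > pos_score:
            approx_rank = n // draws           # slide: #TotalWhispers / D
            # Assumed weighting: harmonic sum, so higher ranks get larger steps.
            weight = sum(1.0 / i for i in range(1, approx_rank + 1))
            step = lr * weight
            old_user = list(U[u])              # gradients use pre-update values
            for d in range(len(old_user)):
                U[u][d] += step * (W[pos][d] - W[neg][d])
                W[pos][d] += step * old_user[d]
                W[neg][d] -= step * old_user[d]
            return approx_rank
    return 0

# Toy data (assumed): one user, two whispers, the user hearted whisper 0.
rng = random.Random(0)
U = [[1.0, 0.0]]
W = [[0.0, 1.0], [1.0, 0.0]]
print(warp_step(U, W, u=0, pos=0, rng=rng))
```

A violation found after few draws suggests the positive whisper is ranked poorly, so the estimated rank and the step size are large; when many draws are needed, the update is small.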
  • 20. Evaluation
      • Achieved recall@1%:
          • 40% for our Hearts dataset.
          • 20% for our Conversations dataset.
      • Used for personalized push notifications and feed generation.
  • 21. Examples Using Hearts
  • 22. Examples Using Hearts
  • 23. For Users Without Sufficient Signals
  • 24. Identifying High-Quality Content
      Maarten Bosma
  • 25. Whisper’s variety of content
  • 26. What is high quality?
      • Liked by a wide variety of people
      • Deep, Emotional
      • Text: Writing style, grammar & spelling
      • Image: Quality photo
      • Original
      • “High stakes”
  • 27. Popular ≠ High quality
      • Great content might not get exposure
      • Selection bias
      • Low quality content might be engaging
      • Low quality content generates attention
      • Still useful to rank a set of preselected whispers
  • 28. The problem with using only recommendations
      • May be one-sided
      • Exploitation, no exploration
      • Algorithm makes mistakes
      • New content problem
  • 29. The solution
      • Two potential uses:
      • Human Curation
      • Use quality score as a tool to find high quality whispers
      • Quality Score Filter for recommendations
      • Quality score for each piece of content
  • 30. Basic Model
      • 40k whispers promoted by curators
      • 100k whispers from background collection
      • Model
          • Logistic Regression
          • χ² feature selection
          • 1 to 6 n-grams of characters
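The feature side of this model can be illustrated as follows. This is a sketch under assumptions: the function names are hypothetical, the χ² score is the textbook statistic for a 2×2 table (n-gram present or absent versus promoted or background), and the logistic regression training itself is omitted.

```python
from collections import Counter

def char_ngrams(text, n_min=1, n_max=6):
    """Count every character n-gram of length n_min..n_max in text."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts

def chi2_score(pos_with, pos_without, neg_with, neg_without):
    """Chi-squared statistic of a 2x2 table: feature presence vs. class label."""
    a, b, c, d = pos_with, pos_without, neg_with, neg_without
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

print(char_ngrams("abc", 1, 2))
# An n-gram seen only in promoted whispers scores high; an uninformative one scores 0.
print(chi2_score(10, 0, 0, 10), chi2_score(5, 5, 5, 5))
```

Keeping only the n-grams with the highest χ² scores shrinks the feature space before the classifier is trained.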
  • 31. Additional Features
      • Length of text
      • POS tags
      • Punctuation, Capitalization
      • Similarity with background corpus
      • Likelihood under language model
      • Out-of-vocabulary
      • Topic Models
      • KL Divergence
      • See Agichtein et al., Finding High-Quality Content in Social Media, WSDM, 2008
  • 32. TextShape
      • Opposite of stop word removal and stemming
      • Used as alternative model to find good whispers
      Ex: I danced with two people at my wedding. The one I married, and the one man I loved. → I xed with two x at my xing. The one I xed, and the one x I xed.
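One way to reproduce the TextShape example is sketched here. The stopword set and suffix list are assumptions chosen just large enough for the slide's sentence; the actual lists used at Whisper are not given in the talk.

```python
import re

# Hypothetical lists: stop words are kept, content words collapse to "x" + suffix.
STOPWORDS = {"i", "with", "two", "at", "my", "the", "one", "and"}
SUFFIXES = ("ing", "ed")

def text_shape(text):
    """Keep stop words and inflectional suffixes; mask every content stem as 'x'."""
    def shape(match):
        word = match.group(0)
        lowered = word.lower()
        if lowered in STOPWORDS:
            return word
        for suffix in SUFFIXES:
            if lowered.endswith(suffix) and len(lowered) > len(suffix):
                return "x" + suffix
        return "x"
    return re.sub(r"[A-Za-z']+", shape, text)

print(text_shape("I danced with two people at my wedding. "
                 "The one I married, and the one man I loved."))
```

The output keeps exactly what stop word removal and stemming would throw away, so a model trained on it sees sentence structure rather than topic.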
  • 33. Thank You for Listening! Questions?
      • For more info:
      • http://www.whisper.sh
      • Contact us at ulas@whisper.sh
      • Try out the app for yourself!
  • 34. Our Technology Stack for DS