Data Science at Whisper: From Content Quality to Personalization
ULAS BARDAK, MAARTEN BOSMA, MARK HSIAO
Whisper @ Big Data Day, LA - 2015/06/27
This Talk
• Introduction to Whisper
• Data Science problems overview
• Examples
• Personalization and Recommendations
• Deep dive – Identifying Similar Users and Its Applications
  • Like-minded users for recommendations
  • Identifying content with broad appeal
The Rise of the Anonymous Apps
Google Trends – “Anonymous App”
A little background on Whisper
• Anonymous social network focused on mobile apps
• Users come to share secrets, make confessions, and find others to connect with
• No need to create an account
• Engagement through replies, direct messages, and “hearts”
• Millions of users and hundreds of millions of whispers
High-Level Usage Patterns
[Diagram. Interaction Flow: App Launch, Recommendation Engine, User + Content Models, Recommended Whispers, User Engagement. Creation Flow: Whisper Create, Suggest Image.]
Some Problems We Are Tackling
Content Understanding
• Spam detection
• Language detection
• Content quality prediction
• Content classification
• Image suggestion
User Understanding
• Spammer detection
• Personalization
• Similar user detection
• Churn prediction
Overall
• A/B testing
• Reporting
Language Detection
Image Quality Estimation
[Figure: example images labeled Low Quality and High Quality]
Recommendations
Problem:
Showing every user the exact same content is not efficient. Engagement and interest depend on matching users’ preferences to content, i.e. personalization.
Requirements:
Must be fast and able to work with little data.
Recommendation Engine
High Personalization
• Like-minded users
• Collaborative filtering
• …
High Coverage
• Popular in location
• Recently popular
• Popular with new users
• …
Combiner
• Merges results, deciding on the right ordering
• If there are not enough results, uses fallback methods to backfill
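The combiner can be sketched as below. This is a minimal illustration, not Whisper's implementation: the slide does not say how results are ordered, so the sketch assumes a simple policy (round-robin interleaving of the high-personalization sources, then the high-coverage sources, then the fallback list), and the names `combine` and `round_robin` are hypothetical.

```python
from itertools import chain, zip_longest
from typing import Iterable, Iterator, List, Sequence


def round_robin(sources: Sequence[Sequence[str]]) -> Iterator[str]:
    """Interleave ranked lists: the first item of each source, then the second, and so on."""
    missing = object()  # placeholder for sources that have run out
    for row in zip_longest(*sources, fillvalue=missing):
        for item in row:
            if item is not missing:
                yield item


def combine(
    high_personalization: Sequence[Sequence[str]],
    high_coverage: Sequence[Sequence[str]],
    fallback: Iterable[str],
    n: int,
) -> List[str]:
    """Merge candidate whisper IDs into a feed of at most n items.

    Personalized sources come first, then coverage sources; if the feed is
    still short, it is backfilled from `fallback`. A duplicate keeps its
    first (highest-priority) position.
    """
    feed: List[str] = []
    seen = set()
    candidates = chain(
        round_robin(high_personalization), round_robin(high_coverage), fallback
    )
    for whisper_id in candidates:
        if len(feed) >= n:
            break
        if whisper_id not in seen:
            seen.add(whisper_id)
            feed.append(whisper_id)
    return feed
```

For example, `combine([["w1", "w2"], ["w3"]], [["w4", "w1"]], ["w5", "w6"], n=5)` gives `["w1", "w3", "w2", "w4", "w5"]`: the duplicate `w1` from the coverage source is dropped and `w5` backfills the last slot. Keeping the ordering policy in one place makes it easy to swap in a learned ranker later.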
High Personalization
• Identifying Like-minded Users for Recommendations (by Nick Stucky-Mack)
• Online learning to rank for Collaborative Filtering
Ko-Jen (Mark) Hsiao
How do we find like-minded users?
1. Agglomeration [Convert each user into one giant document]
2. Pre-processing [Lowercase, remove stop words, etc.]
3. Vectorization [Bag of words into vectors]
4. Dimensionality reduction [An autoencoder maps 5K+ dimensions to ~100]
5. Similarity calculation [Top-k users by cosine similarity]
6. Recommendation [Collect whispers from similar users]
7. Feedback [Regenerate the model with new activity]
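Steps 1, 2, 3, 5 and 6 above can be sketched with the standard library alone. This is an illustrative sketch under stated assumptions: the autoencoder of step 4 is skipped (cosine similarity is computed on the raw bag-of-words counts), step 7 is just re-running the pipeline on new data, `STOPWORDS` is a tiny hypothetical list, and `users` is a hypothetical mapping from user ID to that user's whisper texts.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Hypothetical, deliberately tiny stop-word list for illustration.
STOPWORDS = {"a", "an", "and", "the", "i", "to", "of", "my", "is", "it"}


def user_document(whispers: List[str]) -> str:
    """Step 1: agglomerate all of a user's whispers into one document."""
    return " ".join(whispers)


def preprocess(text: str) -> List[str]:
    """Step 2: lowercase, tokenize, and drop stop words."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOPWORDS]


def vectorize(tokens: List[str]) -> Counter:
    """Step 3: bag-of-words term counts (sparse vector)."""
    return Counter(tokens)


def cosine(u: Counter, v: Counter) -> float:
    dot = sum(count * v[term] for term, count in u.items())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def top_k_similar(target: str, users: Dict[str, List[str]], k: int) -> List[str]:
    """Step 5: the k users whose documents are most cosine-similar to the target's."""
    target_vec = vectorize(preprocess(user_document(users[target])))
    scored = [
        (cosine(target_vec, vectorize(preprocess(user_document(whispers)))), uid)
        for uid, whispers in users.items()
        if uid != target
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep input order
    return [uid for _, uid in scored[:k]]


def recommend(target: str, users: Dict[str, List[str]], k: int) -> List[str]:
    """Step 6: collect whispers from similar users that the target has not seen."""
    seen = set(users[target])
    recs: List[str] = []
    for uid in top_k_similar(target, users, k):
        for whisper in users[uid]:
            if whisper not in seen and whisper not in recs:
                recs.append(whisper)
    return recs
```

In production, the reduced ~100-dimensional autoencoder vectors would replace the sparse `Counter` vectors in `cosine`, which also makes nearest-neighbor search far cheaper than comparing raw 5K+-dimensional vocabularies.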
High Personalization
• Identifying Like-minded Users for Recommendations
• Online learning to rank for Collaborative Filtering
Collaborative Filtering
[Diagram: the Users × Whispers matrix = User matrix × Whisper matrix]
Collaborative Filtering
• We want to learn a low-dimensional embedding for each user and each whisper.
• Instead of solving this problem by matrix factorization, we view it as a ranking problem.
• We only care about the top recommended results, not accurate score predictions for all whispers.
Learning to rank
Basic idea:
Every time there is an interaction between a user u and a whisper w, update the embeddings $U_u$ and $W_w$ so that their inner product increases.
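Collaborative Filtering
• Learn a score function f(u, w) that scores whispers for a given user, e.g.
$$f(u, w) = U_u^\top W_w$$
• Define a rank function that ranks all whispers for every user:
$$\mathrm{rank}(u, w) = \sum_{k \in \mathcal{W},\, k \neq w} I\big[f(u, k) > f(u, w)\big]$$
where $\mathcal{W}$ is the set of all whispers and I is the indicator function.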
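The score and rank functions above translate directly into code. This minimal sketch assumes embeddings are stored as plain lists of floats, with row `u` of `U` for user u and row `w` of `W` for whisper w; the names `score` and `rank` are illustrative.

```python
from typing import List

Matrix = List[List[float]]


def score(U: Matrix, W: Matrix, u: int, w: int) -> float:
    """f(u, w) = U_u^T W_w: inner product of the user and whisper embeddings."""
    return sum(a * b for a, b in zip(U[u], W[w]))


def rank(U: Matrix, W: Matrix, u: int, w: int) -> int:
    """Count the other whispers k that user u scores strictly above w (0 means top)."""
    target = score(U, W, u, w)
    return sum(1 for k in range(len(W)) if k != w and score(U, W, u, k) > target)
```

Computing `rank` exactly means scoring every whisper for every interaction, which is the cost the sampling approximation is designed to avoid.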
Learning to rank
We can then define an error function using the template
$$\mathrm{err}\big(f(u), w\big) = L\big(\mathrm{rank}(u, w)\big)$$
where L is a non-decreasing loss function and rank is the actual rank.
[Plot: loss L as a function of rank]
Learning to rank
Problem: For large datasets like ours, obtaining the exact rank of every item is computationally expensive.
Solution: Don’t use the exact rank! Approximate it with a sampling process:
$$\text{approx-rank} = \frac{\#\mathrm{TotalWhispers}}{D}$$
where D is the number of random samples drawn before the first violation is found, i.e. the first sampled whisper that the model scores too close to (or above) the whisper the user interacted with.
Online learning to rank: use the Weighted Approximate-Rank Pairwise (WARP) loss* and optimize it with stochastic gradient descent.
*Jason Weston, Samy Bengio, and Nicolas Usunier. Large scale image annotation: learning to rank with joint word-image embeddings. Machine Learning, 81(1):21–35, 2010.
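One WARP update can be sketched as follows, following the sampling procedure in the Weston et al. paper cited above rather than any Whisper code. Assumptions: embeddings are plain lists of floats, the margin is 1, the loss weight is $L(r) = \sum_{j=1}^{r} 1/j$, and the rank estimate uses the paper's $\lfloor (\#\mathrm{TotalWhispers} - 1)/D \rfloor$, which differs from the slide's formula only by the −1 and the rounding. `warp_step` and its parameters are hypothetical names.

```python
import random
from typing import List, Optional

Matrix = List[List[float]]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def warp_weight(rank: int) -> float:
    """L(rank) = sum_{j=1..rank} 1/j: non-decreasing, growing fastest over the first ranks."""
    return sum(1.0 / j for j in range(1, rank + 1))


def warp_step(
    U: Matrix, W: Matrix, u: int, w: int, lr: float = 0.05,
    rng: Optional[random.Random] = None,
) -> Optional[int]:
    """One WARP SGD update for an observed (user u, whisper w) interaction.

    Samples negative whispers until one violates the margin
    1 + f(u, k) > f(u, w). With D draws, the rank estimate is
    (len(W) - 1) // D, and L(rank) scales a hinge-loss gradient step.
    Returns the violating whisper, or None if no violation was found.
    """
    rng = rng or random.Random()
    num_whispers = len(W)
    pos_score = dot(U[u], W[w])
    for draws in range(1, num_whispers):  # at most num_whispers - 1 draws
        k = rng.randrange(num_whispers - 1)
        if k >= w:
            k += 1  # never sample the positive whisper itself
        if 1.0 + dot(U[u], W[k]) > pos_score:  # margin violation
            approx_rank = (num_whispers - 1) // draws
            step = lr * warp_weight(approx_rank)
            user_vec = list(U[u])  # gradients use the pre-update user vector
            # Descend on L(rank) * (1 - f(u, w) + f(u, k)).
            U[u] = [x + step * (p - n) for x, p, n in zip(U[u], W[w], W[k])]
            W[w] = [x + step * y for x, y in zip(W[w], user_vec)]
            W[k] = [x - step * y for x, y in zip(W[k], user_vec)]
            return k
    return None  # the positive already outranks the sampled negatives
```

The key design point is that a violation found after few draws implies the positive whisper ranks poorly, so it receives a large weight; a positive that is already near the top needs many draws and gets a small or no update.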
Evaluation
• Achieved recall@1%:
  • 40% on our Hearts dataset
  • 20% on our Conversations dataset
• Used for personalized push notifications and feed generation
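A recall@1% figure like the ones above could be computed as in this sketch. The slide does not define the metric precisely, so this assumes one common reading: for each held-out (user, whisper) interaction, rank all whispers by the model's score for that user and count a hit if the held-out whisper falls in the top 1%. `recall_at_fraction` and its input layout are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def recall_at_fraction(
    scores_by_user: Dict[str, Sequence[float]],
    held_out: List[Tuple[str, int]],
    fraction: float = 0.01,
) -> float:
    """Share of held-out (user, whisper index) pairs whose whisper lands in
    the top `fraction` of that user's ranked whispers."""
    if not held_out:
        return 0.0
    hits = 0
    for user, whisper in held_out:
        scores = scores_by_user[user]  # one score per whisper for this user
        cutoff = max(1, math.ceil(fraction * len(scores)))
        top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:cutoff]
        if whisper in top:
            hits += 1
    return hits / len(held_out)
```

With 200 whispers, the top 1% is 2 whispers, so a held-out whisper ranked first counts as a hit and one ranked last does not.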
Examples Using Hearts
For Users Without Sufficient Signals
Identifying High-Quality Content
Maarten Bosma
Whisper’s variety of content
What is high quality?
• Liked by a wide variety of people
• Deep, emotional
• Text: writing style, grammar & spelling
• Image: quality photo
• Original
• “High stakes”
Popular ≠ High quality
• Great content might not get exposure
  • Selection bias
• Low-quality content might be engaging
  • Low-quality content generates attention
• Still useful to rank a set of preselected whispers
The problem with using only recommendations
• May be one-sided
• Exploitation, no exploration
• Algorithm makes mistakes
• New content problem
The solution
• Two potential uses:
  • Human curation: use the quality score as a tool to find high-quality whispers
  • Quality-score filter for recommendations: a quality score for each piece of content
Basic Model
• 40k whispers promoted by curators
• 100k whispers from a background collection
• Model
  • Logistic regression
  • χ² feature selection
  • Character n-grams of length 1 to 6
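The feature side of the basic model can be sketched as follows: extract character n-grams of length 1 to 6, then score each n-gram with a χ² statistic against the curated/background label and keep the top k. Assumptions: the statistic is computed on n-gram presence (a 2×2 contingency table), which may differ from the exact variant used in the talk, and the logistic regression that would be trained on the selected features is omitted. All function names are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def char_ngrams(text: str, n_min: int = 1, n_max: int = 6) -> set[str]:
    """All distinct character n-grams of length n_min..n_max in text."""
    return {text[i:i + n] for n in range(n_min, n_max + 1) for i in range(len(text) - n + 1)}


def chi2_scores(docs: list[str], labels: list[int]) -> dict[str, float]:
    """Chi-squared statistic of each n-gram's presence vs. the binary label
    (1 = promoted by curators, 0 = background), from a 2x2 contingency table."""
    n = len(docs)
    n_pos = sum(labels)
    n_neg = n - n_pos
    pos_counts: Counter[str] = Counter()
    all_counts: Counter[str] = Counter()
    for doc, y in zip(docs, labels):
        grams = char_ngrams(doc)
        all_counts.update(grams)
        if y == 1:
            pos_counts.update(grams)
    scores = {}
    for gram, present in all_counts.items():
        a = pos_counts[gram]   # positive docs containing the n-gram
        b = present - a        # negative docs containing it
        c = n_pos - a          # positive docs without it
        d = n_neg - b          # negative docs without it
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[gram] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores: dict[str, float], k: int) -> list[str]:
    """Keep the k n-grams most associated with the label."""
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

An n-gram that appears equally in both classes scores 0, while one that appears only in promoted whispers scores highest, so the selection step discards most of the very large 1 to 6 character vocabulary before the classifier sees it.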
Additional Features
• Length of text
• POS tags
• Punctuation, capitalization
• Similarity with background corpus
• Likelihood under a language model
• Out-of-vocabulary words
• Topic models
• KL divergence
• See Agichtein et al., “Finding High-Quality Content in Social Media”, WSDM 2008
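A few of the simpler surface features listed above could be computed as in this sketch. It covers only length, punctuation, capitalization, and out-of-vocabulary rate; POS tags, language-model likelihood, topic models, and KL divergence need additional models and are not shown. The exact feature definitions and the `vocabulary` input are assumptions for illustration.

```python
import string
from typing import Dict, Set


def surface_features(text: str, vocabulary: Set[str]) -> Dict[str, float]:
    """Hypothetical surface features for one whisper's text."""
    tokens = text.split()
    words = [t.strip(string.punctuation).lower() for t in tokens]
    words = [w for w in words if w]  # drop tokens that were pure punctuation
    letters = [ch for ch in text if ch.isalpha()]
    return {
        "num_chars": float(len(text)),
        "num_tokens": float(len(tokens)),
        "punctuation_count": float(sum(ch in string.punctuation for ch in text)),
        # Share of letters that are uppercase (shouting, ALL CAPS).
        "uppercase_ratio": sum(ch.isupper() for ch in letters) / len(letters) if letters else 0.0,
        # Share of words missing from a reference vocabulary (misspellings, slang).
        "oov_rate": sum(w not in vocabulary for w in words) / len(words) if words else 0.0,
    }
```

For `surface_features("Hello World!", {"hello"})`, the text has 12 characters, 2 tokens, 1 punctuation mark, an uppercase ratio of 0.2 (2 of 10 letters), and an OOV rate of 0.5.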
TextShape
• The opposite of stop-word removal and stemming: keep the function words and mask the content words
• Used as an alternative model to find good whispers
Example:
I danced with two people at my wedding. The one I married, and the one man I loved.
I xed with two x at my xing. The one I xed, and the one x I xed.
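The TextShape transformation in the example can be reproduced with a short sketch. Assumptions: the rules are inferred from the single example on the slide (keep function words and punctuation; replace every other word with "x", preserving an "-ing" or "-ed" suffix), and `FUNCTION_WORDS` is a hypothetical list just large enough for that example.

```python
import re

# Hypothetical function-word list covering only the slide's example;
# a real system would use a full stop-word list.
FUNCTION_WORDS = {"i", "with", "two", "at", "my", "the", "one", "and"}


def text_shape(text: str) -> str:
    """Keep function words and punctuation; mask content words as "x",
    preserving an "-ing" or "-ed" suffix."""

    def mask(match: re.Match) -> str:
        word = match.group(0)
        lower = word.lower()
        if lower in FUNCTION_WORDS:
            return word
        if lower.endswith("ing"):
            return "xing"
        if lower.endswith("ed"):
            return "xed"
        return "x"

    return re.sub(r"[A-Za-z']+", mask, text)
```

Because the output keeps only the sentence's skeleton, a classifier trained on it learns stylistic structure (rhythm, parallelism, function-word use) rather than topic.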
Thank You for Listening!
Questions?
• For more info:
  • http://www.whisper.sh
  • Contact us at ulas@whisper.sh
• Try out the app for yourself!
Our Technology Stack for DS
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
saastr
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Jeffrey Haguewood
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Wask
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
Wouter Lemaire
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
tolgahangng
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
Postman
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
Ivanti
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Speck&Tech
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
SitimaJohn
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 

Recently uploaded (20)

Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
 
Project Management Semester Long Project - Acuity
Project Management Semester Long Project - AcuityProject Management Semester Long Project - Acuity
Project Management Semester Long Project - Acuity
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 

Big Data Day LA 2015 - Data Science at Whisper - From content quality to personalization by Ulas Bardak of Whisper

  • 1. Data Science at Whisper: From Content Quality to Personalization ULAS BARDAK, MAARTEN BOSMA, MARK HSIAO Whisper @ Big Data Day, LA - 2015/06/27
  • 2. This Talk
      • Introduction to Whisper
      • Data Science problems overview
      • Examples
      • Personalization and Recommendations
      • Deep dive: Identifying Similar Users and its applications
      • Like-minded users for recommendations
      • Identifying content with broad appeal
  • 3. The Rise of the Anonymous Apps
      (Chart: Google Trends search interest for “Anonymous App”)
  • 4. A little background on Whisper
      • Anonymous social network focused on mobile apps
      • Users come to share secrets, make confessions, and find others to connect with
      • No need to create an account
      • Engagement through replies, direct messages, and “hearts”
      • Millions of users and hundreds of millions of whispers
  • 5. High-Level Usage Patterns
      (Diagram) Interaction Flow: App Launch, Recommendation Engine (User + Content Models), Recommended Whispers, User Engagement
      Creation Flow: Whisper Create, Suggest Image
  • 6. Some Problems We Are Tackling
      Content Understanding
      • Spam detection
      • Language detection
      • Content quality prediction
      • Content classification
      • Image suggestion
      User Understanding
      • Spammer detection
      • Personalization
      • Similar user detection
      • Churn prediction
      Overall
      • A/B testing
      • Reporting
  • 7. Language Detection
  • 8. Image Quality Estimation
      (Figure: examples labeled Low Quality and High Quality)
  • 9. Recommendations
      Problem: Showing every user exactly the same content is inefficient. Engagement and interest depend on matching content to each user's preferences, i.e. personalization.
      Requirements: fast, and able to work with little data.
  • 10. Recommendation Engine
      High Personalization
      • Like-minded users
      • Collaborative Filtering
      • …
      High Coverage
      • Popular in location
      • Recently popular
      • Popular with new users
      • …
      Combiner
      • Merges the results and decides on the right ordering
      • If there are not enough results, uses fallback methods to backfill
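The Combiner's job (merge, order, backfill) can be sketched as below. This is a minimal illustration under stated assumptions: the function name `combine`, the list-of-ids inputs, and the alternating merge policy are all hypothetical; the slide does not say how Whisper actually orders merged results.

```python
from itertools import zip_longest


def combine(personalized, coverage, fallback, k):
    """Merge ranked candidate lists into at most k unique whisper ids.

    Hypothetical policy: alternate between the personalized and coverage
    lists (personalized first), skip duplicates, then backfill from the
    fallback list if fewer than k results were collected.
    """
    results, seen = [], set()

    def add(item):
        # zip_longest pads with None; ignore padding, duplicates, and overflow.
        if item is not None and item not in seen and len(results) < k:
            seen.add(item)
            results.append(item)

    # Interleave the two primary sources.
    for p, c in zip_longest(personalized, coverage):
        add(p)
        add(c)
    # Backfill from the fallback source (e.g. globally popular whispers).
    for item in fallback:
        add(item)
    return results
```

For example, `combine(["a", "b"], ["b", "c", "d"], ["e", "f"], 5)` returns `["a", "b", "c", "d", "e"]`: the duplicate `"b"` is dropped and one fallback item fills the last slot.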
  • 11. High Personalization (Ko-Jen (Mark) Hsiao)
      • Identifying Like-minded Users for Recommendations (by Nick Stucky-Mack)
      • Online learning to rank for Collaborative Filtering
  • 12. How do we find like-minded users?
      1. Agglomeration: convert each user into one giant document
      2. Pre-processing: lowercase, remove stopwords, etc.
      3. Vectorization: turn the bag of words into vectors
      4. Dimensionality reduction: an autoencoder maps 5K+ features into ~100
      5. Similarity calculation: find the top k users via cosine similarity
      6. Recommendation: collect whispers from similar users
      7. Feedback: regenerate the model with new activity
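Steps 1 to 6 above can be sketched end to end with NumPy. This is a toy sketch, not Whisper's implementation: the stopword list, the function names, and the input format (a dict from user id to that user's whisper texts) are hypothetical, and step 4 swaps the autoencoder for a truncated SVD purely to keep the example short and dependency-free.

```python
import re
from collections import Counter

import numpy as np

# Tiny illustrative stopword list (hypothetical).
STOPWORDS = {"i", "a", "the", "my", "to", "and", "is", "of", "in", "it"}


def preprocess(text):
    # Step 2: lowercase, tokenize on letters, drop stopwords.
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOPWORDS]


def user_vectors(user_whispers, dim):
    users = sorted(user_whispers)
    # Step 1: agglomerate each user's whispers into one document.
    docs = [preprocess(" ".join(user_whispers[u])) for u in users]
    # Step 3: bag-of-words count vectors over a shared vocabulary.
    vocab = sorted({t for d in docs for t in d})
    index = {t: j for j, t in enumerate(vocab)}
    X = np.zeros((len(users), len(vocab)))
    for i, doc in enumerate(docs):
        for t, c in Counter(doc).items():
            X[i, index[t]] = c
    # Step 4: reduce dimensionality. Stand-in: truncated SVD, not an autoencoder.
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    return users, U[:, :dim] * S[:dim]


def top_k_similar(users, Z, target, k):
    # Step 5: cosine similarity between the target user and everyone else.
    norms = np.linalg.norm(Z, axis=1)
    norms[norms == 0] = 1.0
    Zn = Z / norms[:, None]
    i = users.index(target)
    sims = Zn @ Zn[i]
    sims[i] = -np.inf  # never return the user themself
    return [users[j] for j in np.argsort(-sims)[:k]]


def recommend(user_whispers, target, k=2, dim=2):
    users, Z = user_vectors(user_whispers, dim)
    neighbours = top_k_similar(users, Z, target, k)
    # Step 6: collect whispers from similar users that the target has not posted.
    own = set(user_whispers[target])
    return [w for n in neighbours for w in user_whispers[n] if w not in own]
```

Step 7 (feedback) amounts to rerunning `user_vectors` as new activity arrives; in production that regeneration would be a batch job rather than a per-request call.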
  • 13. High Personalization
      • Identifying Like-minded Users for Recommendations
      • Online learning to rank for Collaborative Filtering
  • 15. Collaborative Filtering
      • We want to learn a low-dimensional embedding for each user and each whisper.
      • Instead of solving this problem with matrix factorization, we treat it as a ranking problem.
      • We only care about the top recommended results, not accurate score predictions for all whispers.
  • 16. Learning to rank
      Basic idea: every time user a interacts with whisper w, update the embeddings of a and w so that their inner product increases.
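The "basic idea" is one gradient-ascent step on the pair's inner product: the gradient of a·w with respect to a is w, and vice versa. The sketch below assumes embeddings are stored as rows of NumPy arrays; the names `A`, `Wh`, `interaction_update`, and the learning rate are hypothetical. On its own this update only ever raises scores, which is why the following slides compare each positive whisper against other whispers with a ranking loss.

```python
import numpy as np


def interaction_update(A, Wh, a, w, lr=0.1):
    """Nudge user a's and whisper w's embeddings so that A[a] @ Wh[w] grows.

    d(A[a] @ Wh[w]) / dA[a] = Wh[w] and d(A[a] @ Wh[w]) / dWh[w] = A[a],
    so both rows take a step along the other's (pre-update) vector.
    """
    a_old = A[a].copy()
    A[a] += lr * Wh[w]
    Wh[w] += lr * a_old
```

The new score is a·w + lr(|a|² + |w|²) + lr²(a·w), which is strictly larger than a·w for any lr below 2 unless both vectors are zero.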
  • 17. Collaborative Filtering  Learn a score function f(u,w) that gives scores for whispers given a user. Ex:  Define a rank function that ranks all whispers for all users f u,w( )=Uu T ×Ww rank u,w( )= I f u,k( )> f u,w( ){ } kÎw,k¹w å Whisper @ Big Data Day, LA - 2015/06/27
  • 18. Learning to rank
      We can then define an error function using the template
        $$\mathrm{err}\big(f(x), y\big) = L\big(\mathrm{rank}(x, y)\big)$$
      where L is a non-decreasing loss function and rank is the actual rank.
      (Plot: loss vs. rank)
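One concrete choice of L comes from the Weston, Bengio, and Usunier paper cited in this deck: L(k) = α_1 + … + α_k with α_1 ≥ α_2 ≥ … ≥ 0, and α_j = 1/j, which weights mistakes near the top of the list most heavily. The slide does not say which α Whisper used, so treat the harmonic choice below as an assumption; `warp_loss` is a hypothetical name.

```python
def warp_loss(rank):
    """L(k) = sum_{j=1}^{k} 1/j: non-decreasing in k, with shrinking increments.

    rank 0 (the positive whisper is already on top) costs nothing.
    """
    return sum(1.0 / j for j in range(1, rank + 1))
```

Moving from rank 0 to rank 1 costs 1.0, while moving from rank 99 to rank 100 costs only 0.01, so the optimizer focuses on getting the top of the list right.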
  • 19. Learning to rank
      Problem: For large datasets like ours, obtaining the exact rank of every item is computationally expensive.
      Solution: Don't use the exact rank! Approximate it with a sampling process:
        $$\text{approx-rank} = \frac{\#\text{TotalWhispers}}{D}$$
      where D is the number of random samples drawn before the first violation is found.
      Online learning to rank: use the Weighted Approximate-Rank Pairwise (WARP) loss* and optimize it with stochastic gradient descent.
      *Jason Weston, Samy Bengio, and Nicolas Usunier. Large scale image annotation: learning to rank with joint word-image embeddings. Machine Learning, 81(1):21–35, 2010.
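A single WARP update, following the cited Weston et al. paper, can be sketched as follows. Assumptions: embeddings are rows of NumPy arrays, a "violation" means a sampled negative whose score comes within a margin of 1 of the positive's score (the paper's definition), and the approximate rank uses the paper's ⌊(|𝒲| − 1)/D⌋, which is the slide's #TotalWhispers / D with the positive excluded. `warp_step`, `lr`, and `margin` are hypothetical names.

```python
import numpy as np


def warp_step(U, W, u, pos, rng, lr=0.05, margin=1.0):
    """One WARP SGD update for an observed (user u, whisper pos) interaction.

    Returns the approximate rank used for weighting, or None if no violating
    negative was found (in which case nothing is updated).
    """
    n_items = W.shape[0]
    pos_score = U[u] @ W[pos]
    for trials in range(1, n_items):  # at most n_items - 1 samples
        # Sample a negative uniformly from all whispers except `pos`.
        neg = int(rng.integers(n_items - 1))
        if neg >= pos:
            neg += 1
        if margin + U[u] @ W[neg] > pos_score:  # first violation
            approx_rank = (n_items - 1) // trials
            weight = sum(1.0 / j for j in range(1, approx_rank + 1))  # L(rank)
            # Loss: weight * (margin - U[u]·W[pos] + U[u]·W[neg]); take its gradients.
            u_old = U[u].copy()
            U[u] -= lr * weight * (W[neg] - W[pos])
            W[pos] += lr * weight * u_old
            W[neg] -= lr * weight * u_old
            return approx_rank
    return None
```

The saving comes from the early exit: when the positive is ranked badly, a violation turns up after a few samples (large approximate rank, large step); when it is already near the top, sampling runs long and the step is small or skipped. Feeding the stream of interactions through `warp_step` one at a time is what makes the learning online.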
  • 20. Evaluation  Achieved recall@1%:  40% for our Hearts dataset.  20% for our Conversations dataset.  Used for personalized push notifications and feed generation. Whisper @ Big Data Day, LA - 2015/06/27
  • 21. Examples Using Hearts
  • 22. Examples Using Hearts
  • 23. For Users Without Sufficient Signals
  • 24. Identifying High-Quality Content (Maarten Bosma)
  • 25. Whisper’s variety of content
  • 26. What is high quality?
      • Liked by a wide variety of people
      • Deep, emotional
      • Text: writing style, grammar, and spelling
      • Image: quality photo
      • Original
      • “High stakes”
  • 27. Popular ≠ High quality
      • Great content might not get exposure
      • Selection bias
      • Low-quality content might be engaging
      • Low-quality content generates attention
      • Still useful for ranking a set of preselected whispers
  • 28. The problem with using only recommendations
      • May be one-sided
      • Exploitation, no exploration
      • The algorithm makes mistakes
      • New content problem
  • 29. The solution
      Two potential uses:
      • Human curation: use the quality score as a tool to find high-quality whispers
      • Quality score filter for recommendations: a quality score for each piece of content
  • 30. Basic Model
      • 40k whispers promoted by curators
      • 100k whispers from a background collection
      • Model
          • Logistic regression
          • χ² feature selection
          • Character n-grams, n = 1 to 6
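The basic model's three ingredients map onto standard scikit-learn components. This is a sketch assuming scikit-learn is available; the binary n-gram features, the number of kept features, and the name `build_quality_model` are illustrative choices, not settings from the talk. Labels are 1 for curator-promoted whispers and 0 for the background collection.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def build_quality_model(k_features=1000):
    return make_pipeline(
        # Character n-grams of length 1 to 6, as presence/absence features.
        CountVectorizer(analyzer="char", ngram_range=(1, 6), binary=True),
        # Keep the k n-grams most associated with the label (chi-squared test).
        SelectKBest(chi2, k=k_features),
        # Linear classifier; predict_proba gives a quality score in [0, 1].
        LogisticRegression(max_iter=1000),
    )
```

Usage: `build_quality_model().fit(texts, labels).predict_proba(new_texts)[:, 1]` yields one quality score per whisper, which could serve either the curation tool or the recommendation filter from the previous slide.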
  • 31. Additional Features
      • Length of text
      • POS tags
      • Punctuation, capitalization
      • Similarity with background corpus
          • Likelihood under a language model
          • Out-of-vocabulary rate
          • Topic models
          • KL divergence
      • See Agichtein et al., Finding High-Quality Content in Social Media, WSDM, 2008
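Several of these features need nothing beyond the standard library. The definitions below are assumptions made for illustration (for instance, KL divergence of the whisper's unigram distribution from an add-one-smoothed background distribution, with all out-of-vocabulary words pooled into one bucket so both distributions are proper). POS tags and topic models need an NLP library and are omitted. `additional_features` is a hypothetical name.

```python
import math
import re
import string
from collections import Counter


def additional_features(text, background_counts):
    """Hand-crafted quality features; background_counts maps word -> corpus count."""
    vocab = set(background_counts)
    tokens = re.findall(r"[a-z']+", text.lower())
    letters = [c for c in text if c.isalpha()]
    n_tok = max(len(tokens), 1)
    feats = {
        "length_chars": len(text),
        "length_tokens": len(tokens),
        "punct_ratio": sum(c in string.punctuation for c in text) / max(len(text), 1),
        "upper_ratio": sum(c.isupper() for c in letters) / max(len(letters), 1),
        "oov_rate": sum(t not in vocab for t in tokens) / n_tok,
    }
    # KL(text || background) over vocab plus one shared "<unk>" bucket,
    # with add-one smoothing on the background counts.
    mapped = Counter(t if t in vocab else "<unk>" for t in tokens)
    denom = sum(background_counts.values()) + len(vocab) + 1
    kl = 0.0
    for t, c in mapped.items():
        p = c / n_tok
        q = (background_counts.get(t, 0) + 1) / denom
        kl += p * math.log(p / q)
    feats["kl_divergence"] = kl
    return feats
```

A whisper made entirely of unseen words gets a high KL divergence and an out-of-vocabulary rate of 1.0, which is the kind of signal these features are meant to surface.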
  • 32. TextShape
      • Opposite of stop word removal and stemming
      • Used as an alternative model to find good whispers
      Ex: I danced with two people at my wedding. The one I married, and the one man I loved.
       → I xed with two x at my xing. The one I xed, and the one x I xed.
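The TextShape transform keeps function words and inflectional endings and masks everything else, so a model trained on it sees writing structure rather than topic. The sketch reproduces the slide's example, but the stopword list and suffix set are hypothetical and were chosen just for that example; a real version would use a full stopword list and more suffixes.

```python
import re

# Hypothetical word lists, just large enough to reproduce the slide's example.
STOPWORDS = {"i", "with", "two", "at", "my", "the", "one", "and"}
SUFFIXES = ("ing", "ed")


def text_shape(text):
    """Keep stopwords; replace every other word with 'x' plus its suffix, if any."""

    def shape(match):
        word = match.group(0)
        if word.lower() in STOPWORDS:
            return word  # keep function words with their original casing
        for suffix in SUFFIXES:
            if word.lower().endswith(suffix):
                return "x" + suffix
        return "x"

    # Punctuation and spacing are untouched because only letter runs are replaced.
    return re.sub(r"[A-Za-z']+", shape, text)
```

Applied to the slide's sentence, `text_shape` returns `"I xed with two x at my xing. The one I xed, and the one x I xed."`.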
  • 33. Thank You for Listening! Questions?
      • For more info: http://www.whisper.sh
      • Contact us at ulas@whisper.sh
      • Try out the app for yourself!
  • 34. Our Technology Stack for DS