Data Science plays an important role in improving the user experience at Whisper, an anonymous social network. In this talk, we first give an overview of the various problems the Data Science team tackles at Whisper. The focus is on user-understanding strategies that lend themselves to recommendation and personalization, as well as on identifying content with wide appeal. This talk has greater depth and new content compared to our talk at the Data Science meetup in March.
Big Data Day LA 2015 - Data Science at Whisper - From content quality to personalization by Ulas Bardak of Whisper
1. Data Science at Whisper:
From Content Quality to
Personalization
ULAS BARDAK, MAARTEN BOSMA, MARK HSIAO
Whisper @ Big Data Day, LA - 2015/06/27
2. This Talk
Introduction to Whisper
Data Science problems overview
Examples
Personalization and Recommendations
Deep dive – Identifying Similar Users and its applications
Like-minded users for recommendations
Identifying content with broad appeal
3. The Rise of Anonymous Apps
Google Trends – “Anonymous App”
4. A little background on Whisper
Anonymous social network focused on mobile apps
Users come to share secrets, make confessions, and find others to connect with
No need to create an account
Engagement through replies, direct messages, and “hearts”
Millions of users & hundreds of millions of whispers
5. High Level Usage Patterns
Interaction Flow: App Launch → Recommendation Engine (driven by User + Content Models) → Recommended Whispers → User Engagement (feeding back into the models)
Creation Flow: Whisper Create → Suggest Image
6. Some Problems We Are Tackling
Content Understanding
• Spam detection
• Language detection
• Content quality prediction
• Content classification
• Image Suggestion
User Understanding
• Spammer detection
• Personalization
• Similar user detection
• Churn prediction
Overall
• A/B testing
• Reporting
9. Recommendations
Problem:
Showing every user the exact same content is inefficient. Engagement and interest depend on matching users’ preferences to content, i.e. personalization.
Requirements:
Fast, and able to work with little data
10. Recommendation Engine
High Personalization
• Like-minded users
• Collaborative Filtering
• …
High Coverage
• Popular in location
• Recently popular
• Popular with new users
• …
Combiner
• Merge results, deciding on the right ordering
• If not enough results, use fallback methods to backfill.
11. High Personalization
• Identifying Like-minded Users for Recommendations (by Nick Stucky-Mack)
• Online learning to rank for Collaborative Filtering
Ko-Jen (Mark) Hsiao
12. How do we find like-minded users?
1. Agglomeration [Concatenate a user’s whispers into one giant document]
2. Pre-processing [Lowercase, remove stop words, etc.]
3. Vectorization [Bag of words into vectors]
4. Dimensionality reduction [Autoencoder maps 5K+ dimensions down to ~100]
5. Similarity calculation [Top k users via cosine similarity]
6. Recommendation [Collect whispers from similar users]
7. Feedback [Regenerate the model with new activity]
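The pipeline above can be sketched with scikit-learn on toy data. This is a minimal illustration, not Whisper's production code: `TruncatedSVD` stands in for the autoencoder, and the three hypothetical "user documents" are stand-ins for real concatenated whisper text.

```python
# Minimal sketch of the like-minded-user pipeline (toy data; SVD stands in
# for the autoencoder used in the talk).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

# Steps 1-2: each "user document" is the concatenation of that user's
# lowercased posts (hypothetical examples here).
user_docs = [
    "love hiking mountains trails",
    "hiking camping outdoors trails",
    "cooking recipes baking bread",
]

# Step 3: bag-of-words vectors with TF-IDF weighting, stop words removed.
vecs = TfidfVectorizer(stop_words="english").fit_transform(user_docs)

# Step 4: dimensionality reduction (SVD as a stand-in for an autoencoder).
low_dim = TruncatedSVD(n_components=2, random_state=0).fit_transform(vecs)

# Step 5: top-k similar users via cosine similarity.
sims = cosine_similarity(low_dim)
np.fill_diagonal(sims, -np.inf)           # exclude self-similarity
top_k = np.argsort(-sims, axis=1)[:, :1]  # k = 1 nearest neighbor per user
print(top_k.ravel().tolist())
```

The two "hiking" users come out as each other's nearest neighbor, while the "cooking" user is far from both; step 6 would then collect whispers from those neighbors.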
13. High Personalization
• Identifying Like-minded Users for Recommendations
• Online learning to rank for Collaborative Filtering
15. Collaborative Filtering
We want to learn a low-dimensional embedding for each user and each whisper.
Instead of solving this by matrix factorization, we view it as a ranking problem.
We only care about the top recommended results, not accurate score predictions for all whispers.
16. Learning to rank
Basic idea:
Every time there is an interaction between a user u and a whisper w, update the embeddings U_u and W_w such that their inner product gets a higher value.
17. Collaborative Filtering
Learn a score function f(u, w) that scores whispers for a given user, e.g. the inner product of the user and whisper embeddings:
f(u, w) = U_u^T W_w
Define a rank function that gives the rank of whisper w for user u, i.e. the number of whispers scoring higher:
rank(u, w) = Σ_{k ≠ w} I{ f(u, k) > f(u, w) }
where the sum runs over all whispers k and I is the indicator function.
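The score and rank functions above are straightforward in NumPy. The embedding matrices `U` and `W` below are hypothetical toy values, not the talk's trained model:

```python
# f(u, w) = U_u^T W_w and the exact rank, on toy embeddings.
import numpy as np

rng = np.random.default_rng(0)
U = rng.normal(size=(4, 3))   # 4 users, 3-dim embeddings
W = rng.normal(size=(10, 3))  # 10 whispers, 3-dim embeddings

def f(u, w):
    """Score of whisper w for user u: inner product of their embeddings."""
    return U[u] @ W[w]

def rank(u, w):
    """How many other whispers score strictly higher than w for user u."""
    scores = U[u] @ W.T
    return int(np.sum(scores > scores[w]))

best = int(np.argmax(U[0] @ W.T))
print(rank(0, best))  # 0: nothing outscores the top whisper
```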
18. Learning to rank
We can then define an error function using the template:
err(f(x), y) = L(rank(x, y))
where L is a non-decreasing loss function and rank is the actual rank.
[Plot: loss L as a function of rank]
19. Learning to rank
Problem: For large datasets like ours, it is computationally expensive to obtain the exact ranks of items.
Solution: Don’t use the exact rank! Follow a sampling process to approximate it:
approx_rank(u, w) ≈ #TotalWhispers / D
where D is the number of random samples drawn before we find the first violation.
Online learning to rank: use the Weighted Approximate-Rank Pairwise (WARP) loss* and optimize with stochastic gradient descent.
*Jason Weston, Samy Bengio, and Nicolas Usunier. Large scale image annotation: learning to rank with joint word-image embeddings. Machine Learning, 81(1):21–35, 2010.
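The sampling approximation can be sketched as follows, again on hypothetical toy embeddings. We draw random whispers until one scores higher than w (the "first violation") and estimate the rank from the number of draws D; a cap on the draws guards against the case where w is the top whisper:

```python
# Approximate rank via sampling (toy sketch of the WARP-style estimator).
import numpy as np

rng = np.random.default_rng(1)
U = rng.normal(size=(1, 5))     # 1 user, 5-dim embedding
W = rng.normal(size=(1000, 5))  # 1000 whispers
N = W.shape[0]

def exact_rank(u, w):
    """Number of whispers scoring strictly higher than w for user u."""
    scores = U[u] @ W.T
    return int(np.sum(scores > scores[w]))

def approx_rank(u, w, rng, max_draws=10_000):
    """Estimate the rank by sampling until the first violating whisper."""
    target = U[u] @ W[w]
    for D in range(1, max_draws + 1):
        k = int(rng.integers(N))
        if k != w and U[u] @ W[k] > target:
            return (N - 1) // D
    return 0  # no violation found: w is (approximately) top-ranked

best = int(np.argmax(U[0] @ W.T))
print(exact_rank(0, best), approx_rank(0, best, rng))  # 0 0
```

The estimate is cheap precisely when the exact rank would be expensive: highly ranked whispers need many draws to find a violation, but those draws replace a full pass over all whispers.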
20. Evaluation
Achieved recall@1%:
40% for our Hearts dataset.
20% for our Conversations dataset.
Used for personalized push notifications and feed generation.
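One way to compute a recall@1% metric like the one quoted above (a sketch, not necessarily the exact evaluation used in the talk): the share of a user's held-out positive items that appear in the top 1% of the ranked list.

```python
# Recall at a percentage cutoff of the ranked list (illustrative sketch).
import numpy as np

def recall_at_pct(scores, held_out, pct=0.01):
    """scores: (n_items,) predicted scores; held_out: set of true item ids."""
    k = max(1, int(len(scores) * pct))
    top_k = set(np.argsort(-scores)[:k].tolist())
    return len(top_k & held_out) / len(held_out)

scores = np.linspace(1.0, 0.0, 200)        # item 0 scored highest
print(recall_at_pct(scores, {0, 1, 150}))  # top 1% = top 2 items -> 2/3
```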
26. What is high quality?
Liked by a wide variety of people
Deep, Emotional
Text: Writing style, grammar & spelling
Image: Quality photo
Original
“High stakes”
27. Popular ≠ High quality
Great content might not get exposure
Selection bias
Low quality content might be engaging
Low quality content generates attention
Still useful to rank a set of preselected whispers
28. The problem with using only recommendations
May be one-sided
Exploitation, no exploration
Algorithm makes mistakes
New content problem
29. The solution
A quality score for each piece of content
Two potential uses:
Human Curation – use the quality score as a tool to find high-quality whispers
Quality Score Filter – for recommendations
30. Basic Model
40k whispers promoted by curators
100k whispers from background collection
Model
Logistic Regression
χ2 feature selection
Character n-grams (lengths 1 to 6)
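The basic model can be sketched as a scikit-learn pipeline. The texts and labels below are toy stand-ins for the 40k curated and 100k background whispers, and the feature-selection size is an illustrative choice:

```python
# Sketch of the basic quality model: char 1-6 grams, chi-squared feature
# selection, logistic regression (toy data, not the talk's datasets).
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression

texts = [
    "I finally told my family the truth and it felt amazing",
    "deep confession about something I have never shared",
    "lol random spam click here now",
    "buy followers cheap cheap cheap",
]
labels = [1, 1, 0, 0]  # 1 = curator-promoted, 0 = background (toy labels)

model = Pipeline([
    ("ngrams", CountVectorizer(analyzer="char_wb", ngram_range=(1, 6))),
    ("select", SelectKBest(chi2, k=100)),   # illustrative k
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(texts, labels)
print(model.predict(["I never told anyone this secret"])[0])
```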
31. Additional Features
Length of text
Pos-Tags
Punctuation, Capitalization
Similarity with background corpus
Likelihood under language model
Out-of-vocabulary
Topic Models
KL Divergence
See Agichtein et al., Finding High-Quality Content in Social Media, WSDM 2008
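One of the features above, sketched: the KL divergence between a whisper's word distribution and the background corpus distribution. The counts below are toy numbers, and additive smoothing is assumed so the divergence stays finite:

```python
# KL divergence feature sketch: how far a text's word distribution sits
# from the background corpus (toy counts, add-alpha smoothing).
import numpy as np

def kl_divergence(p_counts, q_counts, alpha=1.0):
    """KL(P || Q) over a shared vocabulary, with add-alpha smoothing."""
    p = (p_counts + alpha) / (p_counts + alpha).sum()
    q = (q_counts + alpha) / (q_counts + alpha).sum()
    return float(np.sum(p * np.log(p / q)))

background = np.array([50.0, 30.0, 15.0, 5.0])  # corpus word counts
on_topic = np.array([48.0, 32.0, 14.0, 6.0])    # similar distribution
off_topic = np.array([1.0, 2.0, 7.0, 90.0])     # very different

print(kl_divergence(on_topic, background)
      < kl_divergence(off_topic, background))  # True
```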
32. TextShape
Opposite of stop word removal and stemming
Used as an alternative model to find good whispers
Ex:
I danced with two people at my wedding. The one I married, and the one man I loved.
I xed with two x at my xing. The one I xed, and the one x I xed.
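The TextShape transform can be sketched as follows: keep function words and punctuation, replace content words with "x" plus their suffix (as in the "xed"/"xing" example). The stop-word list and suffix rules below are illustrative assumptions, not the talk's actual lists:

```python
# TextShape sketch: the opposite of stop word removal and stemming.
# STOP and SUFFIXES are illustrative assumptions.
import re

STOP = {"i", "the", "one", "and", "with", "at", "my", "two",
        "a", "an", "to", "of", "in"}
SUFFIXES = ("ing", "ed", "s")

def textshape(text):
    def shape(match):
        word = match.group(0)
        if word.lower() in STOP:
            return word                  # keep function words as-is
        for suf in SUFFIXES:
            if word.lower().endswith(suf) and len(word) > len(suf) + 2:
                return "x" + suf         # keep the grammatical suffix
        return "x"                       # blank out the content word
    return re.sub(r"[A-Za-z]+", shape, text)

print(textshape("I danced with two people at my wedding."))
# I xed with two x at my xing.
```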
33. Thank You for Listening!
Questions?
For more info:
http://www.whisper.sh
Contact us at ulas@whisper.sh
Try out the app for yourself!