Data-based user segmentation - a practical guide for data analysts

VSCO→CONFIDENTIAL→DONOTDISTRIBUTE
Data Update - 01/27/2016
vsco.co/blevishkin
Data Update - 03/17/17
vsco.co/prazakj
07 DEC 2017
RUBEN KOGEL ( VSCO )
RUBEN@VSCO.CO
@CHILICONDATA on Twitter
Data-based User Segmentation

What is VSCO?
→ Community and tools for creators
→ 45M monthly audience (web + mobile)
→ 12B images served monthly
→ 70% of the daily audience creates content

Why segment?
→ Marketing / Design
• where do we position our product?
• how do we message our target audience?
• what usage do we design for?
• how do we make our UI more intuitive?
→ Growth / Biz Ops
• are our users engaged?
• how are they using our app in practice?
vsco.co/evanhundelt

the theory
[figure: usage frequency vs. miles driven, dividing drivers into commuters, taxi drivers, weekenders, and greenies]

the practice
where do you draw the line?
[figure: histograms of editing usage, sessions, and publishing usage; x-axis: number of actions (0 to 100), y-axis: number of people (in thousands)]

step 1: choose the right inputs
→ k-means measures how far apart users are across all input dimensions and uses that separation to form “clusters”
• each additional dimension will change the output, but does it add information?
→ eliminate unnecessary input variables
• use intuition and data exploration
→ segment only on the things that matter:
• age on the platform
• sum of past behavior
• current behavior (what we want to model)
→ this is an iterative process: redo this step after running the clustering algorithm

step 2: log transform (almost) everything
[figure: distribution of the number of actions per user, before (0 to 100 actions) and after the log transform (0 to 4)]
→ without it, your model assumes the gap between people who edited 1 and 2 photos is the same as the gap between people who edited 101 and 102 photos
→ a log transform blows up the gaps between small numbers of actions and shrinks the gaps between large numbers (natural log):
• log(2) - log(1) ≈ 0.69
• log(102) - log(101) ≈ 0.01
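Step 2 in code. A minimal sketch, assuming raw per-user action counts: it uses `math.log1p` (the log of 1 + x) rather than plain `log`, which is our assumption rather than something stated in the talk, because users with zero actions would otherwise make `log` fail. The first two prints reproduce the gaps quoted above with plain natural logs.

```python
import math


def log_transform(counts: list[int]) -> list[float]:
    """Natural log of (1 + count) for each user.

    log1p keeps zero counts defined (log1p(0) == 0.0) instead of raising
    a math domain error the way log(0) would.
    """
    return [math.log1p(c) for c in counts]


# The gaps from the slide, using plain natural logs:
print(round(math.log(2) - math.log(1), 2))      # → 0.69
print(round(math.log(102) - math.log(101), 2))  # → 0.01

# Transformed counts: small counts spread out, large counts squeeze together.
print([round(x, 2) for x in log_transform([0, 1, 2, 101, 102])])
# → [0.0, 0.69, 1.1, 4.62, 4.63]
```

The "(almost)" in the step title matters: inputs that are not skewed counts, such as ratios already bounded between 0 and 1, may not need the transform.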

step 3: choose a number of clusters that makes sense
balance:
→ sparseness
→ interpretability
• does it match intuition?
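Step 3 leaves the choice of k to judgment. One common quantitative aid, used here as an assumption rather than the method from the talk, is the "elbow" heuristic: run k-means for several values of k and look for where inertia (the total squared distance from each user to its cluster center) stops dropping sharply. The pure-Python sketch below implements Lloyd's k-means with random restarts; the sample data is hypothetical, and inertia only speaks to separation, so interpretability still has to be judged by hand.

```python
import random

Point = tuple[float, ...]


def _sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _assign(points: list[Point], centroids: list[Point]) -> list[int]:
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: _sq_dist(p, centroids[j]))
            for p in points]


def kmeans(points: list[Point], k: int, iters: int = 100, n_init: int = 10,
           seed: int = 0) -> tuple[list[Point], list[int], float]:
    """Lloyd's k-means, restarted n_init times; keeps the lowest-inertia run.

    Returns (centroids, labels, inertia).
    """
    rng = random.Random(seed)
    best = None
    for _ in range(n_init):
        centroids = rng.sample(points, k)  # k input points as starting centers
        for _ in range(iters):
            labels = _assign(points, centroids)
            new_centroids = []
            for j in range(k):
                members = [p for p, lab in zip(points, labels) if lab == j]
                # Move each center to the mean of its members; keep it if empty.
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                    if members else centroids[j])
            if new_centroids == centroids:
                break  # converged
            centroids = new_centroids
        labels = _assign(points, centroids)
        inertia = sum(_sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
        if best is None or inertia < best[2]:
            best = (centroids, labels, inertia)
    return best


# Hypothetical log-transformed activity for nine users (one dimension).
users = [(0.0,), (0.1,), (0.2,), (2.0,), (2.1,), (2.2,), (4.0,), (4.1,), (4.2,)]
for k in range(1, 6):
    _, _, inertia = kmeans(users, k)
    print(k, round(inertia, 3))
```

Restarting from several random starts guards against a single bad initialization; the same idea appears as `n_init` in common library implementations.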

step 4: deliver the insights in an intuitive way
cluster        1     2     3     4     5     6
dimension    0.0   0.0   0.0   0.9   2.8   0.5
dimension    0.0   0.0   0.0   0.6   1.9   0.3
dimension    0.0   0.0   0.0   0.5   1.5   0.3
dimension    0.2   0.1   0.1   8.5  18.4   2.5
dimension    0.2   0.1   0.1   3.1   3.9   1.4
dimension    0.3   4.8  27.1   2.1  20.5  22.7
dimension    0.3   2.5   7.6   1.3   7.7   6.9
dimension    0.3   1.9   3.3   1.1   3.4   3.3
dimension    0.2   3.6  21.4   0.3   3.4   7.3
dimension    0.1   0.2   0.1   2.7  13.0  10.5
dimension    0.1   0.1   0.1   1.6   6.5   4.1
dimension    0.1   0.1   0.1   1.3   3.2   2.5
dimension    0.0   0.0   0.0   0.5   6.4   0.1
dimension    0.0   0.0   0.0   0.4   4.2   0.1
dimension    0.0   0.0   0.0   0.4   2.5   0.1
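A table in this shape (one row per dimension, one column per cluster) can be generated from the clustering output. A minimal sketch, assuming each cell is the per-cluster average of the raw, untransformed value; the talk does not say exactly what the cells hold, and the dimension names and data below are hypothetical. Averaging raw counts rather than reporting log-space centers keeps the numbers readable for non-analysts.

```python
def cluster_profile(rows: list[dict[str, float]], labels: list[int],
                    dims: list[str]) -> dict[str, list[float]]:
    """Average raw value of each dimension per cluster.

    Returns {dimension: [mean in cluster 0, mean in cluster 1, ...]},
    rounded to one decimal.
    """
    k = max(labels) + 1
    profile: dict[str, list[float]] = {}
    for d in dims:
        means = []
        for c in range(k):
            vals = [row[d] for row, lab in zip(rows, labels) if lab == c]
            # An empty cluster has no average; mark it as NaN.
            means.append(round(sum(vals) / len(vals), 1) if vals else float("nan"))
        profile[d] = means
    return profile


def format_profile(profile: dict[str, list[float]]) -> str:
    """Plain-text table: one row per dimension, one column per cluster (1-based)."""
    k = len(next(iter(profile.values())))
    lines = ["cluster".ljust(16) + "".join(f"{c + 1:>7}" for c in range(k))]
    for d, means in profile.items():
        lines.append(d.ljust(16) + "".join(f"{m:>7.1f}" for m in means))
    return "\n".join(lines)


# Hypothetical raw per-user counts and the labels a clustering run assigned.
rows = [
    {"edits_30d": 0, "publishes_30d": 0},
    {"edits_30d": 2, "publishes_30d": 0},
    {"edits_30d": 30, "publishes_30d": 4},
    {"edits_30d": 25, "publishes_30d": 6},
]
labels = [0, 0, 1, 1]
profile = cluster_profile(rows, labels, ["edits_30d", "publishes_30d"])
print(format_profile(profile))
```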

step 5: use programmatic rules to track segments
→ what happens if we re-compute the clusters every month?
• k-means will produce different-looking clusters for each new dataset
• a user classified as “super editor” in one period might be classified as “casual editor” the next period with exactly the same behavior
→ instead, infer the segment boundaries from the cluster analysis and use these fixed boundaries to classify users on an ongoing basis
• more stable
• easier to explain
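A minimal sketch of step 5 for a single input dimension, assuming the cluster centers were computed on log1p-transformed counts. Placing each boundary halfway between adjacent centers is our assumption, not necessarily how the talk derived its boundaries; the centers and segment names are hypothetical. Once the cut-offs are frozen, the same behavior always maps to the same segment, whatever month it is.

```python
import bisect
import math


def boundaries_from_centroids(centroids_log: list[float]) -> list[float]:
    """Cut-offs halfway between adjacent 1-D cluster centers.

    Centers are assumed to be in log1p space; each midpoint is mapped back to
    a raw action count with expm1 so the rule can be stated in plain numbers.
    """
    c = sorted(centroids_log)
    return [math.expm1((lo + hi) / 2) for lo, hi in zip(c, c[1:])]


def classify(actions: float, cutoffs: list[float], names: list[str]) -> str:
    """Fixed rule: the segment depends only on where `actions` falls.

    A value equal to a cut-off goes to the higher segment.
    """
    return names[bisect.bisect_right(cutoffs, actions)]


# Hypothetical centers from one clustering run (log1p of 0, 5 and 60 edits).
centers = [0.0, math.log1p(5), math.log1p(60)]
cutoffs = [round(x) for x in boundaries_from_centroids(centers)]  # freeze as whole numbers
names = ["inactive", "casual editor", "super editor"]             # hypothetical labels
print(cutoffs)  # → [1, 18]
for edits in (0, 5, 18, 100):
    print(edits, classify(edits, cutoffs, names))
```

Rounding the cut-offs to whole numbers is a readability choice: "18 or more edits" is easier to explain than an exact midpoint.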

step 6: track ongoing classification on a dashboard
[figure: segmentation over time; source of the “green” segment in each month]
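The two dashboard views above can be computed from monthly classifications. A minimal sketch, assuming each month's output is a mapping from user ID to segment name: one function gives the segment mix for a month, the other gives where this month's members of one segment (such as "green") were the month before. User IDs and segment names are hypothetical, and treating users absent last month as "new" is our assumption.

```python
from collections import Counter


def segment_shares(month: dict[str, str]) -> dict[str, float]:
    """Fraction of users in each segment for one month ({user_id: segment})."""
    counts = Counter(month.values())
    total = sum(counts.values())
    return {segment: n / total for segment, n in counts.items()}


def segment_sources(prev: dict[str, str], curr: dict[str, str],
                    target: str) -> Counter:
    """For users in `target` this month, count the segment each was in last month.

    Users missing from last month's classification are counted as "new".
    """
    return Counter(prev.get(user, "new")
                   for user, seg in curr.items() if seg == target)


# Hypothetical classifications for two consecutive months.
prev_month = {"u1": "casual", "u2": "green", "u3": "casual"}
this_month = {"u1": "green", "u2": "green", "u3": "casual", "u4": "green"}
print(segment_shares(this_month))  # → {'green': 0.75, 'casual': 0.25}
print(segment_sources(prev_month, this_month, "green"))
```

Running both functions once per month and storing the results gives one row per month for each chart.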

Summary
→ marketers, designers, and analysts use different but complementary segmentation approaches
→ data-based segmentation is useful for tracking usage and should be based on behavioral data only
→ most usage data is heavily skewed (roughly exponential), so it needs a log transform and a clustering algorithm to identify cluster boundaries
6 steps to a clustering analysis
1. choose the right inputs
2. log transform (almost) everything
3. choose a number of clusters that makes sense
4. deliver the insights in an intuitive way
5. use programmatic rules to track segments
6. track ongoing classification on a dashboard
vsco.co/sannalinn

Questions?
