VSCO→CONFIDENTIAL→DONOTDISTRIBUTE
Data Update - 01/27/2016 vsco.co/blevishkin
Data Update - 03/17/17 vsco.co/prazakj
07 DEC 2017
RUBEN KOGEL ( VSCO )
RUBEN@VSCO.CO
@CHILICONDATA on Twitter
Data-based User Segmentation
What is VSCO?
→ Community and tools for creators
→ 45M monthly audience (web + mobile)
→ 12B images served monthly
→ 70% of daily audience create
Why segment?
→ Marketing / Design
• where do we position our product?
• how do we message our target audience?
• what usage do we design for?
• how do we make our UI more intuitive?
→ Growth / Biz Ops
• are our users engaged?
• how are they using our app in practice?
vsco.co/evanhundelt
the theory
[Figure: driver segments (commuters, taxi drivers, weekenders, greenies) plotted by usage frequency and miles driven]
the practice
where do you draw the line?
[Figure: three histograms (editing usage, sessions, publishing usage); x-axis: number of actions, 0 to 100; y-axis: number of people, in thousands]
step 1: choose the right inputs
→ k-means groups users by distance across all input dimensions, so the dimensions with the most separation drive the “clusters”
• each additional dimension will change the output - but does it add information?
→ eliminate unnecessary input variables
• use intuition and data exploration
→ segment only on the things that matter:
• age on the platform
• sum of past behavior
• current behavior - what we want to model
→ this is an iterative process: re-do this step after running the clustering algorithm
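One way to explore whether an extra input adds information is to check how strongly it correlates with inputs already in the model. The sketch below is only an illustration of that idea, not the method used in the talk: the feature names, the sample counts, and the 0.95 threshold are all made up.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def redundant_pairs(features, threshold=0.95):
    """Return pairs of feature names whose absolute correlation exceeds threshold."""
    names = sorted(features)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(features[a], features[b])) > threshold:
                pairs.append((a, b))
    return pairs

# Hypothetical per-user counts, invented for illustration.
features = {
    "edits": [1, 2, 3, 10, 50],
    "edit_sessions": [1, 2, 3, 9, 48],  # nearly a copy of "edits"
    "publishes": [5, 0, 1, 2, 0],
}
print(redundant_pairs(features))
```

A flagged pair is a candidate for dropping one of the two inputs; whether to drop it is still a judgment call.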
step 2: log transform (almost) everything
→ otherwise your model assumes the gap between people editing 1 and 2 photos counts the same as the gap between people editing 101 and 102 photos
→ log transform so that the gap between small numbers of actions gets blown up and the gap between large numbers gets shrunk
• log(2) - log(1) = 0.69
• log(102) - log(101) = 0.01
[Figure: two histograms; x-axes 0 to 100 and 0 to 4]
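The arithmetic above can be checked with the natural log. This sketch also applies log(1 + x) rather than log(x) to raw counts so that users with zero actions stay defined; that choice and the sample counts are assumptions, not something stated on the slide.

```python
import math

def log_transform(counts):
    """Apply log(1 + x) to each count so zero counts stay defined."""
    return [math.log1p(c) for c in counts]

# The slide's two gaps, using the natural log.
gap_small = math.log(2) - math.log(1)
gap_large = math.log(102) - math.log(101)
print(round(gap_small, 2), round(gap_large, 2))

# Hypothetical per-user edit counts, invented for illustration.
edits = [0, 1, 2, 101, 102]
print([round(v, 2) for v in log_transform(edits)])
```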
step 3: choose the number of clusters that make sense
balance:
→ sparseness
→ interpretability
• does it match intuition?
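A rough way to explore this balance is to run the clustering for several values of k and look at both the fit and the size of the smallest cluster. The sketch below is a minimal one-dimensional k-means with deterministic starting centers, written for illustration only; reading "sparseness" as "clusters with very few users" is an assumption, and the sample values are made up.

```python
def kmeans_1d(values, k, iterations=50):
    """Plain Lloyd's algorithm on 1-D data with deterministic starting centers."""
    ordered = sorted(values)
    # Spread the starting centers across the sorted data (a simplification;
    # library implementations typically use randomized starts).
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assign every value to its nearest center.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: (v - centers[j]) ** 2)
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        new_centers = [
            sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    # Inertia: total squared distance from each value to its nearest center.
    inertia = sum(min((v - c) ** 2 for c in centers) for v in values)
    return centers, clusters, inertia

# Hypothetical log-transformed usage values, invented for illustration.
values = [0.0, 0.1, 0.2, 2.0, 2.1, 2.2, 5.0, 5.1, 5.2]
for k in range(2, 6):
    _, clusters, inertia = kmeans_1d(values, k)
    print(k, round(inertia, 2), min(len(c) for c in clusters))
```

When the fit stops improving much while the smallest cluster shrinks to a handful of users, the extra clusters are probably not worth explaining to stakeholders.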
step 4: deliver the insights in an intuitive way
               1     2     3     4     5     6
dimension    0.0   0.0   0.0   0.9   2.8   0.5
dimension    0.0   0.0   0.0   0.6   1.9   0.3
dimension    0.0   0.0   0.0   0.5   1.5   0.3
dimension    0.2   0.1   0.1   8.5  18.4   2.5
dimension    0.2   0.1   0.1   3.1   3.9   1.4
dimension    0.3   4.8  27.1   2.1  20.5  22.7
dimension    0.3   2.5   7.6   1.3   7.7   6.9
dimension    0.3   1.9   3.3   1.1   3.4   3.3
dimension    0.2   3.6  21.4   0.3   3.4   7.3
dimension    0.1   0.2   0.1   2.7  13.0  10.5
dimension    0.1   0.1   0.1   1.6   6.5   4.1
dimension    0.1   0.1   0.1   1.3   3.2   2.5
dimension    0.0   0.0   0.0   0.5   6.4   0.1
dimension    0.0   0.0   0.0   0.4   4.2   0.1
dimension    0.0   0.0   0.0   0.4   2.5   0.1
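A table like this could be produced by averaging each input dimension within each cluster. That reading is an assumption, since the slide labels neither its rows nor its columns; the dimension names, values, and cluster labels below are invented for illustration.

```python
from collections import defaultdict

def cluster_profile(rows, labels):
    """Average each dimension within each cluster.

    rows   -- list of dicts mapping dimension name to value, one per user
    labels -- cluster label for each row, in the same order
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for row, label in zip(rows, labels):
        counts[label] += 1
        for dim, value in row.items():
            sums[label][dim] += value
    return {
        label: {dim: total / counts[label] for dim, total in dims.items()}
        for label, dims in sums.items()
    }

# Hypothetical users and cluster assignments.
rows = [
    {"edits": 0, "publishes": 1},
    {"edits": 2, "publishes": 0},
    {"edits": 20, "publishes": 4},
    {"edits": 30, "publishes": 6},
]
labels = [1, 1, 2, 2]
profile = cluster_profile(rows, labels)

# One row per dimension, one column per cluster, as in the table above.
for dim in ["edits", "publishes"]:
    print(dim, [round(profile[c][dim], 1) for c in sorted(profile)])
```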
step 5: use programmatic rules to track segments
→ what happens if we re-compute the clusters every month?
• k-means will define different-looking clusters for every different dataset
• a user classified “super editor” one period might be classified “casual editor” the next period with the exact same behavior
→ instead, infer the segment boundaries from the cluster analysis and use these fixed boundaries to classify users on an ongoing basis
• more stable
• easier to explain
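Fixed boundaries can be as simple as a sorted list of cut points. In this sketch the cut points, the single input (an edit count), and the "non editor" segment name are hypothetical stand-ins; real boundaries would be read off the cluster analysis.

```python
import bisect

# Hypothetical cut points on a per-period edit count.
# A count below 1 falls in the first segment, 1 to 9 in the second, 10 and up in the third.
BOUNDARIES = [1, 10]
SEGMENTS = ["non editor", "casual editor", "super editor"]

def classify(edit_count):
    """Map an edit count to a segment using fixed boundaries."""
    return SEGMENTS[bisect.bisect_right(BOUNDARIES, edit_count)]

for count in [0, 1, 9, 10, 50]:
    print(count, classify(count))
```

Because the boundaries do not move between periods, the same behavior always maps to the same segment.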
step 6: track ongoing classification on a dashboard
[Charts: segmentation over time; source of the “green” segment in each month]
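The first chart could be fed by a per-month share of users in each segment. The sketch below assumes one (month, segment) record per active user; the record format and sample data are invented for illustration.

```python
from collections import Counter, defaultdict

def monthly_shares(records):
    """records: iterable of (month, segment) pairs, one per active user."""
    counts = defaultdict(Counter)
    for month, segment in records:
        counts[month][segment] += 1
    shares = {}
    for month, counter in counts.items():
        total = sum(counter.values())
        shares[month] = {seg: n / total for seg, n in counter.items()}
    return shares

# Hypothetical classification output for two months.
records = [
    ("2017-10", "casual editor"),
    ("2017-10", "casual editor"),
    ("2017-10", "casual editor"),
    ("2017-10", "super editor"),
    ("2017-11", "casual editor"),
    ("2017-11", "super editor"),
]
shares = monthly_shares(records)
for month in sorted(shares):
    print(month, shares[month])
```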
Summary
→ marketers, designers, and analysts use different but complementary segmentation approaches
→ data-based segmentation is useful to track usage; it should be based on behavioral data only
→ most usage data is exponentially distributed, so it needs a log transform and machine learning algorithms to identify cluster boundaries
6 steps to doing a clustering analysis
1. choose the right inputs
2. log transform (almost) everything
3. choose the number of clusters that make sense
4. deliver the insights in an intuitive way
5. use programmatic rules to track segments
6. track ongoing classification on a dashboard
vsco.co/sannalinn
Questions?

Data based user segmentation - a practical guide for data analysts
