How we built analytics from scratch
(in seven easy steps)
Jodi Moran, Co-founder & CTO
1
Plumbee: social casino games
2
Plumbee’s growth
3
Oct 2011
• 3 founders & 3
founding
employees
• 0 in data
March 2012
• Mirrorball Slots
on Facebook
launch
• 15 staff
• 0 in data
Dec 2012
• Mirrorball Slots
on iOS beta
launch
• 29 staff
• 4 in data
Today
• 1.2M MAU
• 250K DAU
• 39 staff
• 5 in data
“Build, measure, learn”
4
Timing and targeting of offers
Balancing of the virtual economy
Creation of engaging features
Cost-effective acquisition
Goals
5
Never say “we don’t have that data”
Breadth of data use
Depth of data use
Agile data use
Scalable foundation for the future
In the beginning…
6
Step #1:
7
Blank slate
No time
No
bandwidth
No
experience
3rd party analytics
Third-party analytics
• Low opportunity
cost
• Full stack solution
• Lots of choices
• Get useful data to
everyone fast
8
Step #2:
10
3rd party
systems lack
flexibility
Want to own
the data
Don’t know
what we want
to know
Analytics is
strategic
Collect everything
What is everything?
• State-changing calls from client to server
• Changes of state
• State-changing calls from client to third
parties (Facebook)
Yes, this is a lot of data: 450 million events (45 GB
compressed) per day.
Using Amazon Web Services makes this
possible.
11
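A quick back-of-the-envelope check on those figures. This is only arithmetic on the two numbers quoted on the slide; it assumes decimal gigabytes and an event rate spread evenly over the day, neither of which the slide states.

```python
# Figures quoted on the slide.
EVENTS_PER_DAY = 450_000_000
COMPRESSED_BYTES_PER_DAY = 45 * 10**9  # assumption: 1 GB = 10^9 bytes

SECONDS_PER_DAY = 24 * 60 * 60


def bytes_per_event(total_bytes: int, events: int) -> float:
    """Average compressed size of a single event."""
    return total_bytes / events


def events_per_second(events_per_day: int) -> float:
    """Average event rate, assuming events are spread evenly over the day."""
    return events_per_day / SECONDS_PER_DAY


avg_bytes = bytes_per_event(COMPRESSED_BYTES_PER_DAY, EVENTS_PER_DAY)
avg_rate = events_per_second(EVENTS_PER_DAY)

print(f"about {avg_bytes:.0f} compressed bytes per event")
print(f"about {avg_rate:.0f} events per second on average")
```

Real traffic will peak well above the daily average, so the per-second figure is a floor for capacity planning, not a target.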
Why we like it
No need:
– To test instrumentation
– To add instrumentation of new features
– To touch transactional databases
– To worry we won’t have the data
Easy and fast to implement
... but we still miss things.
13
Step #3:
15
Lots and lots
of data
Need access
Data is
unstructured
No time to
build
structure
Elastic MapReduce & Hive
The secret to success
17
The
right
analyst
Technical
skills
Unstructured
data
Data
architecture
Step #4:
18
Only access
via SQL
Lack of
visibility
Want data to
be everyday
Google Spreadsheets
Step #5:
21
Want to know
what worked
Can’t
separate
factors
Want
flexibility
In-house split testing
It’s easy to serve experiments…
• Server-side random assignment of users
• Second tier allows deep tests (bonus:
canary deployments)
• Tool for configuration-only tests
• Test & variant pairs attached to every
analytics event
22
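A minimal sketch of how serving an experiment might look. The slide says only that assignment is random and server-side; hashing the user ID, as done here, is a common stand-in that keeps a user in the same variant without storing any state, and it is an assumption, not Plumbee's stated method. The names assign_variant and tag_event, and the "experiments" field, are hypothetical.

```python
import hashlib
from typing import Dict, Sequence


def assign_variant(user_id: str, test_name: str, variants: Sequence[str]) -> str:
    """Pick a variant for a user, stable across calls and servers.

    Assumption: a hash of (test, user) stands in for random assignment.
    Including the test name keeps assignments independent between tests.
    """
    digest = hashlib.sha256(f"{test_name}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


def tag_event(event: dict, user_id: str,
              active_tests: Dict[str, Sequence[str]]) -> dict:
    """Return a copy of the event with its test & variant pairs attached."""
    tagged = dict(event)
    tagged["experiments"] = {
        name: assign_variant(user_id, name, variants)
        for name, variants in active_tests.items()
    }
    return tagged


# Hypothetical test name and variants, for illustration only.
tests = {"new_lobby": ["control", "treatment"]}
print(tag_event({"type": "spin"}, "user-42", tests))
```

Attaching the pairs to every event means any metric computed later can be grouped by variant without joining against a separate assignment table.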
… but it’s hard to analyse experiments
23
Web
analytics
Conversion
rate
Binomial
distribution
Simple
tests
•Measuring variables that don’t satisfy
“conversion rate” assumptions
•The need for an Overall Evaluation Criterion
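For reference, the "simple test" from the web analytics world is roughly the following: a pooled two-proportion z-test on conversion counts, using the normal approximation to the binomial. This is a sketch of the textbook method, not Plumbee's analysis code, and the counts in the example call are made up for illustration.

```python
import math
from typing import Tuple


def two_proportion_z_test(conv_a: int, n_a: int,
                          conv_b: int, n_b: int) -> Tuple[float, float]:
    """Compare two conversion rates; returns (z, two-sided p-value).

    Assumes each user converts or not, independently, with a fixed
    probability per variant (the "conversion rate" assumptions).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts: 100 of 1000 users convert in A, 150 of 1000 in B.
z, p = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

The test only makes sense for yes/no outcomes per user. Metrics such as spend per user are not yes/no outcomes and tend to be heavily skewed, which is the kind of variable the first bullet above refers to.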
Step #6:
24
All data
processing is
manual
This is getting
expensive
And it takes a
long time to
run
Automation & optimization
(Basic) optimization
• Spot instances
• Output compression with snappy
• Python streaming jobs
• There’s a lot more we could do…
26
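As an illustration of the Python streaming jobs mentioned above, here is a minimal sketch of a Hadoop-streaming-style mapper and reducer that count events by type. It assumes one JSON event per input line and a hypothetical "type" field; the real event schema is not shown in the slides. Spot instances and Snappy output compression are cluster and job settings, so they do not appear in the script.

```python
import json
import sys
from typing import Iterable, Iterator


def map_events(lines: Iterable[str]) -> Iterator[str]:
    """Mapper: emit 'event_type<TAB>1' for each parseable JSON event."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except ValueError:
            continue  # skip malformed lines rather than fail the whole job
        if not isinstance(event, dict):
            continue
        # "type" is a hypothetical field name used for illustration.
        yield f"{event.get('type', 'unknown')}\t1"


def reduce_counts(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Reducer: sum counts per key; input must already be sorted by key."""
    current_key = None
    total = 0
    for line in sorted_lines:
        key, _, value = line.rstrip("\n").partition("\t")
        if key != current_key:
            if current_key is not None:
                yield f"{current_key}\t{total}"
            current_key, total = key, 0
        total += int(value)
    if current_key is not None:
        yield f"{current_key}\t{total}"


# Run as "script.py map" or "script.py reduce"; streaming jobs talk via stdin/stdout.
if __name__ == "__main__" and len(sys.argv) > 1 and sys.argv[1] in ("map", "reduce"):
    stage = map_events if sys.argv[1] == "map" else reduce_counts
    for out in stage(sys.stdin):
        print(out)
```

The same pipeline can be tried locally by piping a file of events through the map stage, sort, and the reduce stage, which is a cheap way to test a job before paying for a cluster.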
Step #7:
27
Expensive
Hive clusters
Queries take a
long time to
run
Hive
functionality
is limited
Relational data mart
Why Hive AND a traditional database?
15 GB of
aggregates
20 TB total
28
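A small sketch of the data mart idea: the heavy aggregation runs over the raw events elsewhere, and only the compact result is loaded into a relational table that answers everyday questions quickly. SQLite stands in for the unnamed "traditional database", and the table name, columns, and row values are all made up for illustration.

```python
import sqlite3
from typing import Iterable, Optional, Tuple


def build_mart(rows: Iterable[Tuple[str, str, int]]) -> sqlite3.Connection:
    """Load pre-aggregated (day, platform, spenders) rows into a table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE daily_spenders ("
        " day TEXT, platform TEXT, spenders INTEGER,"
        " PRIMARY KEY (day, platform))"
    )
    conn.executemany("INSERT INTO daily_spenders VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def average_daily_spenders(conn: sqlite3.Connection, month: str) -> Optional[float]:
    """Mean of per-day spender totals for a month given as 'YYYY-MM'."""
    row = conn.execute(
        "SELECT AVG(total) FROM ("
        " SELECT SUM(spenders) AS total FROM daily_spenders"
        " WHERE day LIKE ? GROUP BY day)",
        (month + "-%",),
    ).fetchone()
    return row[0]  # None when the month has no rows


# Made-up aggregate rows, standing in for the output of a batch job.
sample_rows = [
    ("2013-01-01", "facebook", 10),
    ("2013-01-01", "ios", 5),
    ("2013-01-02", "facebook", 20),
    ("2013-01-02", "ios", 5),
]
mart = build_mart(sample_rows)
print(average_daily_spenders(mart, "2013-01"))
```

Going by the sizes on the slide, the aggregates are well under a thousandth of the total data, which is why an ordinary relational database can serve them.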
Plumbee analytics today
Goals
30
Never say “we don’t have that data”
Breadth of data use
Depth of data use
Agile data use
Scalable foundation for the future
The results: average daily spenders
31
Month
But we have tons to do.
Engineering
• Replace our custom event
aggregators with Flume
• Replace pull-based Hive &
Python streaming jobs with
Cascading + JVM-based
languages
• Change event storage from
JSON to Avro
• Better dashboards and tools
• Consider in-memory
processing, e.g. Spark/Shark
• Toward “big data”
Analysis
• More “actionable”, less
“interesting”
• Continuous optimization: split
/ multivariate testing,
multi-armed bandit
• Better predictive models
• Clustering, segmentation,
personalization
• Toward “data science”
32
33
Jodi Moran jobs.plumbee.com
jodi@plumbee.com www.plumbee.com
@jodi_p_moran apps.facebook.com/mirrorballslots
www.facebook.com/jodipmoran
www.linkedin.com/in/jmoran
Questions? Get in touch!
