BigDataCloud Sept 8 2011 meetup - Big Data Analytics at Play (Social Gaming) by Tim Piatenko

Big Data Analytics at Play - a "Social Gaming" industry perspective at Zynga.
- Tim Piatenko, ex-Zynga, ex-eBay



Presentation Transcript

  • 1. Big Data Analytics at Play
    a Social Gaming industry perspective at Zynga
  • 2. Before we begin
    Why does a(n online) company need analytics?
    • To monitor its operations (data)
      • is the site/app online and functional?
      • is data flowing?
      • do we get alerted when something breaks?
    • To monitor its business (information)
      • are top line metrics looking healthy?
      • are we on target for this week/month/quarter?
    • To understand its business (knowledge)
      • how are metrics related?
      • what drives changes?
    • To use knowledge strategically (insight)
  • So what about Zynga?
    • Monitoring need is the same as everyone else's, plus:
    • It's an app within an app (FB) within a browser
      • more places for things to break
    • It's a huge operational challenge to keep everything running when millions are playing
    • It's a content push model with a (really) fast release cycle
    • Collecting all the data and keeping it flowing internally is also a huge challenge
    • So all of that makes it imperative to stay on top of things 24/7
  • That's operations, but what about the business?
    • Content driven means you have to monitor business metrics all the time as well!
    • Best to have overlap with operational metrics
      • use raw counts for things like visits
    • But also need calculated metrics with trends
      • engagement, retention, virality, reach
    • Need a system that can handle this in real time and near real time
    • Need human beings to run the system and use the data
  • Zynga's approach
    • Robust, simple real-time system (memcache, MySQL)
    • Robust, sophisticated, and scalable data warehousing solution (Vertica)
    • In-house developed reporting platform
      • also includes easy to use A/B testing
    • A rather large team of engineers and analysts
      • software tools and DB developers and admins
      • reporting analysts embedded in game studios
      • central analysts working with marketing etc.
      • a research team for deeper understanding
  • Real-time analytics (monitoring)
    • Meant for quickly pushing raw data into a simple database without any calculations
    • The point is to know when something is broken as soon as possible
    • This is not a system for answers, it's a system for alerts!
    • Throw a chart up on a monitor and watch it every few minutes
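
The "system for alerts" idea on the slide above can be illustrated with a minimal sketch. It assumes raw per-minute event counts are already available; the function name `should_alert`, the five-minute window, and the 50% drop threshold are all hypothetical choices for illustration, not details from the talk.

```python
from collections import deque
from statistics import mean


def should_alert(recent_counts, current_count, drop_ratio=0.5):
    """Return True if current_count is below drop_ratio times the recent average."""
    if not recent_counts:
        return False  # no baseline yet, so nothing to compare against
    baseline = mean(recent_counts)
    return current_count < drop_ratio * baseline


# Hypothetical raw counts per minute; minute 4 simulates a partial outage.
counts_per_minute = [1000, 1020, 980, 1010, 300, 990]

window = deque(maxlen=5)  # rolling baseline of the last five minutes
alerts = []
for minute, count in enumerate(counts_per_minute):
    if should_alert(window, count):
        alerts.append(minute)
    window.append(count)

print("alert at minutes:", alerts)
```

No derived metrics are computed here: the check works directly on raw counts, which matches the slide's point that this layer exists to flag breakage quickly rather than to answer business questions.
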
  • The big guns — Vertica!
    • 90+% of analytics happens here
      • near real-time processed data
        • remove duplicates and such
      • nightly aggregated data = warehouse
    • Column storage ideal for huge datasets, where most work is performed on aggregated data
    • Is scaling very nicely to large clusters
    • Has very sophisticated SQL extensions
    • Does have its quirks as well...
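
The two processing steps mentioned on the slide above (removing duplicates, then aggregating nightly) might look roughly like the sketch below. It uses Python's built-in `sqlite3` purely as a stand-in so the example runs anywhere; it is not Vertica, and the table and column names (`raw_events`, `events`, `event_id`, and so on) are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw_events (event_id TEXT, user_id INTEGER, day TEXT, action TEXT)"
)
conn.executemany(
    "INSERT INTO raw_events VALUES (?, ?, ?, ?)",
    [
        ("e1", 1, "2011-09-08", "click"),
        ("e1", 1, "2011-09-08", "click"),  # the same event delivered twice
        ("e2", 1, "2011-09-08", "load"),
        ("e3", 2, "2011-09-08", "click"),
    ],
)

# Step 1: near real-time cleanup, dropping exact duplicate rows.
conn.execute(
    "CREATE TABLE events AS "
    "SELECT DISTINCT event_id, user_id, day, action FROM raw_events"
)

# Step 2: nightly aggregation into a small summary table for reporting.
daily = conn.execute(
    "SELECT day, action, COUNT(*) AS n_events, COUNT(DISTINCT user_id) AS n_users "
    "FROM events GROUP BY day, action ORDER BY day, action"
).fetchall()

for row in daily:
    print(row)
```

Most reporting queries would then read the small aggregated result rather than the raw rows, which is the access pattern the slide says column storage suits well.
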
  • Why Vertica? Why not Hadoop?
    • Speed: often want to know things in near real time, not wait for a big map/reduce job to come back
    • Synergy with the company: good to be the biggest client of a surging business. Our success is your success!
    • Easier to find good (business) analysts with great SQL background, while map/reduce is often the domain of engineers and academics
    • In the end, for practical rather than religious reasons :)
  • Data Warehouse(s)
    • Production cluster runs the reporting and A/B testing platforms
    • Mirror cluster for ad hoc analysis and deep dives
    • 1% sample cluster for order of magnitude calculations and games like CityVille and FarmVille with too much data :)
      • not really useful for virals...
    • Given the number of people accessing data and the amount of data recorded, very important to understand the limitations!
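
One common way to build a 1% sample like the one on the slide above is to hash each user ID and keep a fixed slice of the hash space. The sketch below shows that technique under stated assumptions: the talk does not say how the sample cluster was populated, and `in_sample` and `estimate_total` are hypothetical helper names.

```python
import hashlib


def in_sample(user_id, percent=1):
    """Deterministically place a user in the sample based on a hash of the ID."""
    digest = hashlib.md5(str(user_id).encode("utf-8")).hexdigest()
    return int(digest, 16) % 100 < percent


def estimate_total(sample_value, percent=1):
    """Scale a count measured on the sample back up to the full population."""
    return sample_value * 100 / percent


# Hypothetical population of 100,000 user IDs.
users = range(100_000)
sampled = [u for u in users if in_sample(u)]

print("sampled users:", len(sampled))
print("estimated population:", estimate_total(len(sampled)))
```

Sampling by user rather than by row keeps each sampled user's full history together, and the scaled-up estimate is only good to roughly an order of magnitude, which is the use the slide describes.
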
  • How big is Big?
    • Let's say a game has 10M DAU, some come multiple times
    • Even a very short session will have 10s of recorded activities
      • game load tracking, assets loading, game state, clicks
    • And then there are virals: FB feed posts and requests
    • So all in all, 10s of billions of rows, several terabytes a day
      • not unusual to pull a dataset of 1B rows
      • not something you dump into Excel :)
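
A quick back-of-envelope calculation puts the figures on the slide above in perspective. The specific values of 30 billion rows and 3 TB per day are assumptions picked from within the slide's stated ranges ("10s of billions of rows, several terabytes"), not numbers from the talk. The Excel limit of 1,048,576 rows per worksheet applies to Excel 2007 and later.

```python
EXCEL_MAX_ROWS = 1_048_576  # rows per worksheet in Excel 2007 and later


def bytes_per_row(total_bytes, total_rows):
    """Average stored size of one row."""
    return total_bytes / total_rows


def sheets_needed(rows, max_rows=EXCEL_MAX_ROWS):
    """Number of full-size worksheets needed to hold the given row count."""
    return -(-rows // max_rows)  # ceiling division


rows_per_day = 30_000_000_000  # assumed: "10s of billions of rows"
bytes_per_day = 3 * 10**12     # assumed: "several terabytes" (3 TB, decimal)

row_size = bytes_per_row(bytes_per_day, rows_per_day)
pull_rows = 1_000_000_000      # the 1B-row dataset mentioned on the slide
pull_gb = pull_rows * row_size / 10**9
sheets = sheets_needed(pull_rows)

print("average bytes per row:", row_size)
print("size of a 1B-row pull in GB:", pull_gb)
print("Excel worksheets needed:", sheets)
```

Under these assumptions a single 1B-row pull is on the order of 100 GB and would need hundreds of worksheets, which is why such a dataset has to stay in the warehouse.
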
  • In-house analytics
    • Scale and data specifics make it hard to find canned solutions
    • Want the ability to dig to arbitrary depth
    • Want the ability to combine arbitrary data ad hoc
    • Want to cater to a studio's specific needs
    • Want to create a simple, scalable, usable system to:
      • minimize data sources that need reconciliation
      • minimize operational points of failure
      • minimize the number of steps involved in analysis
  • In-house analytics continued
    • Need a balance of self-service and analyst support
    • Simple reporting web portal with SQL queries wrapped in XML + basic FusionCharts visualizations
      • created, maintained, and used by reporting analysts
      • available to everyone 24/7
      • everyone is looking at the same data!
    • Analysts embedded directly into individual studios
      • "on the ground" understanding of each game
      • part of the fabric of the studio
      • yet leveraging the support of the wider analytics org
    • Analysts in direct contact with infrastructure
      • solid understanding of the data flow + business needs
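
The "SQL queries wrapped in XML" idea from the slide above could work along the lines of the sketch below. The XML layout (a `report` element with `name` and `chart` attributes and a nested `sql` element) is entirely hypothetical, since the talk does not show the portal's actual format, and `sqlite3` again stands in for the real warehouse.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical report definition: one SQL query plus a hint for the chart layer.
REPORT_XML = """
<report name="daily_active_users" chart="line">
  <sql>
    SELECT day, COUNT(DISTINCT user_id) AS dau
    FROM visits
    GROUP BY day
    ORDER BY day
  </sql>
</report>
"""


def run_report(conn, report_xml):
    """Parse a report definition, run its query, and return chart-ready data."""
    root = ET.fromstring(report_xml)
    sql = root.findtext("sql").strip()
    cursor = conn.execute(sql)
    columns = [description[0] for description in cursor.description]
    return {
        "name": root.get("name"),
        "chart": root.get("chart"),
        "columns": columns,
        "rows": cursor.fetchall(),
    }


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (user_id INTEGER, day TEXT)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [(1, "2011-09-07"), (2, "2011-09-07"), (1, "2011-09-08"), (1, "2011-09-08")],
)

result = run_report(conn, REPORT_XML)
print(result)
```

Keeping each report as a small declarative file means an analyst who knows SQL can add or change a report without touching portal code, which fits the self-service goal on the slide.
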
  • Fine, so what is it all used for?
    • Dashboards and reports
      • MAU/WAU/DAU, user acquisition, daily/weekly retention, lapse/death, player engagement, virality, k-factor, levels, game actions, and of course revenues
      • Distributions, trends, funnels, segmentation
    • Combining metrics, understanding feature performance, user behavior, revenue successes and failures
    • Adjusting quickly, learning from mistakes
    • Deploying successes widely, planning ahead
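
A few of the metrics listed on the slide above can be sketched from a simple visit log. The definitions used here are common industry conventions (next-day retention as the share of a day's users who return the following day, k-factor as invites per user times invite conversion rate) and are assumptions, since the talk does not define them. The function names and sample data are hypothetical.

```python
from datetime import date, timedelta


def dau(visits, day):
    """Daily active users: distinct users with at least one visit on the given day."""
    return len({user for user, visit_day in visits if visit_day == day})


def next_day_retention(visits, day):
    """Share of the given day's users who also visit on the following day."""
    today = {user for user, visit_day in visits if visit_day == day}
    tomorrow = {
        user for user, visit_day in visits if visit_day == day + timedelta(days=1)
    }
    return len(today & tomorrow) / len(today) if today else 0.0


def k_factor(invites_sent, invites_accepted, active_users):
    """Invites sent per active user multiplied by the invite conversion rate."""
    if active_users == 0 or invites_sent == 0:
        return 0.0
    return (invites_sent / active_users) * (invites_accepted / invites_sent)


d0 = date(2011, 9, 7)
d1 = d0 + timedelta(days=1)
# Hypothetical (user_id, day) visit records.
visits = [(1, d0), (2, d0), (3, d0), (4, d0), (1, d1), (2, d1), (5, d1)]

print("DAU on day 0:", dau(visits, d0))
print("next-day retention:", next_day_retention(visits, d0))
print("k-factor:", k_factor(invites_sent=200, invites_accepted=30, active_users=100))
```
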
  • Role of an analyst
    • PMs can
      • track metrics for games/features
      • pull various reports when something is off
      • run "simple" ad hoc queries
      • create and run A/B tests
    • But analysts can
      • bridge business and infrastructure
      • dig deeper into the data
      • combine huge datasets efficiently
      • apply their intuitive "feel" for big data
      • leverage each other's work
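
The slides mention A/B tests without saying how results were evaluated. One standard way to read a test on a conversion-style metric is a two-proportion z-test, sketched below using only the standard library. The choice of test, the function name, and the sample numbers are all assumptions made for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # both groups are all-or-nothing, so no measurable difference
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the normal distribution
    return z, p_value


# Hypothetical test: variant B converts 1,100 of 10,000 users versus 1,000 of 10,000 for A.
z, p_value = two_proportion_z_test(conv_a=1000, n_a=10_000, conv_b=1100, n_b=10_000)
print("z = %.2f, p = %.3f" % (z, p_value))
```

A small p-value suggests the difference is unlikely to be noise, but deciding whether the lift matters for the game is the kind of judgment the slide assigns to analysts.
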