BigDataCloud Sept 8 2011 meetup - Big Data Analytics at Play (Social Gaming) by Tim Piatenko


Big Data Analytics at Play - a "Social Gaming" industry perspective at Zynga.
- Tim Piatenko, ex-Zynga, ex-eBay

Published in Technology, Business
  • 1. Big Data Analytics at Play
    a Social Gaming industry perspective at Zynga
  • 2. Before we begin
    Why does a(n online) company need analytics?
    • To monitor its operations (data)
      • is the site/app online and functional?
      • is data flowing?
      • do we get alerted when something breaks?
    • To monitor its business (information)
      • are top line metrics looking healthy?
      • are we on target for this week/month/quarter?
    • To understand its business (knowledge)
      • how are metrics related?
      • what drives changes?
    • To use knowledge strategically (insight)
  • 3. So what about Zynga?
    • Monitoring need is the same as everyone else's, plus:
    • It's an app within an app (FB) within a browser
      • more places for things to break
    • It's a huge operational challenge to keep everything running when millions are playing
    • It's a content push model with a (really) fast release cycle
    • Collecting all the data and keeping it flowing internally is also a huge challenge
    • So all of that makes it imperative to stay on top of things 24/7
  • 4. That's operations, but what about the business?
    • Content driven means you have to monitor business metrics all the time as well!
    • Best to have overlap with operational metrics
      • use raw counts for things like visits
    • But also need calculated metrics with trends
      • engagement, retention, virality, reach
    • Need a system that can handle this in real time and near real time
    • Need human beings to run the system and use the data
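
The difference between a raw count and a calculated metric can be sketched in a few lines. The example below is only an illustration, not Zynga's implementation: the `(user_id, day)` visit tuples and the function names `daily_active_users` and `day1_retention` are assumptions made for this sketch.

```python
from collections import defaultdict
from datetime import date, timedelta


def daily_active_users(visits):
    """Raw-count style metric: distinct users seen on each day."""
    users_by_day = defaultdict(set)
    for user_id, day in visits:
        users_by_day[day].add(user_id)
    return {day: len(users) for day, users in users_by_day.items()}


def day1_retention(visits, cohort_day):
    """Calculated metric: share of users first seen on cohort_day
    who come back on the following day."""
    first_seen = {}
    users_by_day = defaultdict(set)
    for user_id, day in visits:
        users_by_day[day].add(user_id)
        if user_id not in first_seen or day < first_seen[user_id]:
            first_seen[user_id] = day
    cohort = {u for u, d in first_seen.items() if d == cohort_day}
    if not cohort:
        return 0.0
    returned = cohort & users_by_day[cohort_day + timedelta(days=1)]
    return len(returned) / len(cohort)


# Hypothetical visit log: user "a" visits twice on the first day.
visits = [
    ("a", date(2011, 9, 1)), ("b", date(2011, 9, 1)), ("a", date(2011, 9, 1)),
    ("a", date(2011, 9, 2)), ("c", date(2011, 9, 2)),
]
dau = daily_active_users(visits)
retention = day1_retention(visits, date(2011, 9, 1))
```

With this toy log, each day has two active users, and one of the two users who first appeared on the first day returns the next day.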
  • 5. Zynga's approach
    • Robust, simple real-time system (memcache, MySQL)
    • Robust, sophisticated, and scalable data warehousing solution (Vertica)
    • In-house developed reporting platform
      • also includes easy-to-use A/B testing
    • A rather large team of engineers and analysts
      • software tools and DB developers and admins
      • reporting analysts embedded in game studios
      • central analysts working with marketing etc.
      • a research team for deeper understanding
  • 6. Real-time analytics (monitoring)
    • Meant for quickly pushing raw data into a simple database without any calculations
    • The point is to know when something is broken as soon as possible
    • This is not a system for answers, it's a system for alerts!
    • Throw a chart up on a monitor and watch it every few minutes
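
A "system for alerts" over raw counts can be as simple as comparing the latest count to a recent average. The following is a minimal sketch under assumed inputs (per-minute event counts); the function name `should_alert` and the `drop_ratio` threshold are hypothetical and not taken from the talk.

```python
from statistics import mean


def should_alert(recent_counts, latest_count, drop_ratio=0.5):
    """Return True when the latest raw count falls below
    drop_ratio times the average of the recent counts."""
    if not recent_counts:
        # No baseline yet, so there is nothing to compare against.
        return False
    return latest_count < drop_ratio * mean(recent_counts)


# Hypothetical per-minute visit counts followed by a sudden drop.
baseline = [100, 110, 90]
alert_on_drop = should_alert(baseline, 30)
alert_on_normal = should_alert(baseline, 95)
```

No calculation beyond an average is involved, which matches the slide's point that this tier exists to flag breakage quickly rather than to answer business questions.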
  • 7. The big guns: Vertica!
    • 90+% of analytics happens here
      • near real-time processed data
      • remove duplicates and such
      • nightly aggregated data = warehouse
    • Column storage is ideal for huge datasets, where most work is performed on aggregated data
    • Scales very nicely to large clusters
    • Has very sophisticated SQL extensions
    • Does have its quirks as well...
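
The two processing steps named on this slide, removing duplicates and aggregating nightly, can be illustrated in plain Python. In practice this work would be done in SQL inside the warehouse; the event fields (`event_id`, `day`, `action`) and the function names below are assumptions made for the sketch.

```python
from collections import Counter


def deduplicate(events):
    """Keep the first occurrence of each event_id, preserving order."""
    seen = set()
    unique = []
    for event in events:
        if event["event_id"] not in seen:
            seen.add(event["event_id"])
            unique.append(event)
    return unique


def nightly_rollup(events):
    """Aggregate cleaned events into counts per (day, action)."""
    return Counter((e["day"], e["action"]) for e in events)


# Hypothetical raw events; event 1 was recorded twice.
events = [
    {"event_id": 1, "day": "2011-09-01", "action": "click"},
    {"event_id": 1, "day": "2011-09-01", "action": "click"},
    {"event_id": 2, "day": "2011-09-01", "action": "click"},
    {"event_id": 3, "day": "2011-09-01", "action": "game_load"},
]
clean = deduplicate(events)
rollup = nightly_rollup(clean)
```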
  • 8. Why Vertica? Why not Hadoop?
    • Speed: often want to know things in near real time, not wait for a big map/reduce job to come back
    • Synergy with the company: good to be the biggest client of a surging business. Our success is your success!
    • Easier to find good (business) analysts with great SQL background, while map/reduce is often the domain of engineers and academics
    • In the end, for practical rather than religious reasons :)
  • 9. Data Warehouse(s)
    • Production cluster runs the reporting and A/B testing platforms
    • Mirror cluster for ad hoc analysis and deep dives
    • 1% sample cluster for order-of-magnitude calculations and for games like Cityville and Farmville with too much data :)
      • not really useful for virals...
    • Given the number of people accessing data and the amount of data recorded, it is very important to understand the limitations!
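
One common way to build a stable 1% user sample is to hash the user ID and keep a fixed slice of the hash space, so the same users are always in or out of the sample. The talk does not say how the sample cluster was populated; the sketch below is an assumption, and `in_sample` and `estimate_total` are hypothetical names.

```python
import hashlib


def in_sample(user_id, percent=1):
    """Deterministically place roughly `percent`% of users in the sample."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).hexdigest()
    return int(digest, 16) % 100 < percent


def estimate_total(sample_count, percent=1):
    """Scale a count measured on the sample up to an order-of-magnitude total."""
    return sample_count * 100 // percent


# Hypothetical population of 100,000 user IDs.
sampled = [uid for uid in range(100_000) if in_sample(uid)]
estimated_users = estimate_total(len(sampled))
```

The scaled-up figure is only an estimate, which is why the slide limits this cluster to order-of-magnitude work.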
  • 10. How big is Big?
    • Let's say a game has 10M DAU, some of whom come multiple times
    • Even a very short session will have tens of recorded activities
      • game load tracking, assets loading, game state, clicks
    • And then there are virals: FB feed posts and requests
    • So all in all, tens of billions of rows, several terabytes a day
      • not unusual to pull a dataset of 1B rows
      • not something you dump into Excel :)
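
The back-of-the-envelope arithmetic behind these figures can be written out explicitly. Only the 10M DAU figure comes from the slide; the sessions per user, events per session, and bytes per row below are illustrative assumptions, not numbers from the talk.

```python
def rows_per_day(dau, sessions_per_user, events_per_session):
    """Recorded rows per day for a single game."""
    return dau * sessions_per_user * events_per_session


def terabytes_per_day(rows, bytes_per_row):
    """Daily data volume in terabytes (1 TB = 10**12 bytes)."""
    return rows * bytes_per_row / 1e12


# 10M DAU is from the slide; 2 sessions and 50 events are assumed values.
single_game_rows = rows_per_day(10_000_000, 2, 50)

# Assumed: 30 billion rows in total at roughly 100 bytes per row.
total_tb = terabytes_per_day(30_000_000_000, 100)
```

Under these assumptions a single 10M-DAU game produces about a billion rows a day, and 30 billion rows at about 100 bytes each come to 3 TB, which is consistent with "several terabytes a day".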
  • 11. In-house analytics
    • Scale and data specifics make it hard to find canned solutions
    • Want the ability to dig to arbitrary depth
    • Want the ability to combine arbitrary data ad hoc
    • Want to cater to a studio's specific needs
    • Want to create a simple, scalable, usable system to:
      • minimize data sources that need reconciliation
      • minimize operational points of failure
      • minimize the number of steps involved in analysis
  • 12. In-house analytics (continued)
    • Need a balance of self-service and analyst support
    • Simple reporting web portal with SQL queries wrapped in XML + basic Fusion Charts visualizations
      • created, maintained, and used by reporting analysts
      • available to everyone 24/7
      • everyone is looking at the same data!
    • Analysts embedded directly into individual studios
      • "on the ground" understanding of each game
      • part of the fabric of the studio
      • yet leveraging the support of the wider analytics org
    • Analysts in direct contact with infrastructure
      • solid understanding of the data flow + business needs
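
To make "SQL queries wrapped in XML" concrete, here is a minimal sketch of how a report definition might be parsed and executed. The XML layout (`report`, `title`, `query`), the `visits` table, and the `run_report` function are invented for illustration, and an in-memory SQLite database stands in for the warehouse; the actual format of Zynga's portal is not described in the talk.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical report definition: one SQL query wrapped in XML metadata.
REPORT_XML = """
<report name="daily_visits">
  <title>Visits per day</title>
  <query>
    SELECT day, COUNT(*) AS visits FROM visits GROUP BY day ORDER BY day
  </query>
</report>
"""


def run_report(connection, report_xml):
    """Parse the report definition and run its query."""
    root = ET.fromstring(report_xml.strip())
    sql = root.findtext("query").strip()
    rows = connection.execute(sql).fetchall()
    return {"name": root.get("name"), "title": root.findtext("title"), "rows": rows}


# SQLite is only a stand-in for the warehouse in this sketch.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE visits (user_id TEXT, day TEXT)")
connection.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [("a", "2011-09-01"), ("b", "2011-09-01"), ("a", "2011-09-02")],
)
report = run_report(connection, REPORT_XML)
```

Keeping each report as a small declarative file is one way analysts could create and maintain reports themselves while everyone reads the same query results.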
  • 13. Fine, so what is it all used for?
    • Dashboards and reports
    • MAU/WAU/DAU, user acquisition, daily/weekly retention, lapse/death, player engagement, virality, k-factor, levels, game actions, and of course revenues
    • Distributions, trends, funnels, segmentation
    • Combining metrics, understanding feature performance, user behavior, revenue successes and failures
    • Adjusting quickly, learning from mistakes
    • Deploying successes widely, planning ahead
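
Of the metrics listed, k-factor has a compact standard definition: invites sent per user multiplied by the conversion rate of those invites. The sketch below uses that common definition; whether Zynga computed it exactly this way is not stated in the talk, and the input figures are made up.

```python
def k_factor(active_users, invites_sent, invites_accepted):
    """Viral coefficient: invites per user times invite conversion rate."""
    if active_users == 0 or invites_sent == 0:
        return 0.0
    invites_per_user = invites_sent / active_users
    conversion_rate = invites_accepted / invites_sent
    return invites_per_user * conversion_rate


# Hypothetical numbers: 1,000 users send 5,000 invites, 500 are accepted.
k = k_factor(1000, 5000, 500)
```

A value above 1 means each user brings in more than one new user on average; in this example each user brings in about half a new user.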
  • 14. Role of an analyst
    • PMs can
      • track metrics for games/features
      • pull various reports when something is off
      • run "simple" ad hoc queries
      • create and run A/B tests
    • But analysts can
      • bridge business and infrastructure
      • dig deeper into the data
      • combine huge datasets efficiently
      • apply their intuitive "feel" for big data
      • leverage each other's work