BigDataCloud Sept 8 2011 meetup - Big Data Analytics at Play (Social Gaming) by Tim Piatenko
Big Data Analytics at Play - a "Social Gaming" industry perspective at Zynga.
- Tim Piatenko, ex-Zynga, ex-eBay

Published in: Technology, Business

Big Data Analytics at Play
a Social Gaming industry perspective at Zynga

Before we begin
Why does a(n online) company need analytics?
- To monitor its operations (data)
  - is the site/app online and functional?
  - is data flowing?
  - do we get alerted when something breaks?
- To monitor its business (information)
  - are top-line metrics looking healthy?
  - are we on target for this week/month/quarter?
- To understand its business (knowledge)
  - how are metrics related?
  - what drives changes?
- To use knowledge strategically (insight)

So what about Zynga?
- Monitoring needs are the same as everyone else's, plus:
  - It's an app within an app (FB) within a browser
    - more places for things to break
  - It's a huge operational challenge to keep everything running when millions are playing
  - It's a content push model with a (really) fast release cycle
  - Collecting all the data and keeping it flowing internally is also a huge challenge
- All of that makes it imperative to stay on top of things 24/7

That's operations, but what about the business?
- Being content-driven means you have to monitor business metrics all the time as well!
  - Best to have overlap with operational metrics
    - use raw counts for things like visits
  - But also need calculated metrics with trends
    - engagement, retention, virality, reach
- Need a system that can handle this in real time and near real time
- Need human beings to run the system and use the data

Zynga's approach
- Robust, simple real-time system (memcache, MySQL)
- Robust, sophisticated, and scalable data warehousing solution (Vertica)
- In-house developed reporting platform
  - also includes easy-to-use A/B testing
- A rather large team of engineers and analysts
  - software tools and DB developers and admins
  - reporting analysts embedded in game studios
  - central analysts working with marketing etc.
  - a research team for deeper understanding

Real-time analytics (monitoring)
- Meant for quickly pushing raw data into a simple database without any calculations
- The point is to know when something is broken as soon as possible
- This is not a system for answers, it's a system for alerts!
- Throw a chart up on a monitor and watch it every few minutes

The big guns: Vertica!
- 90+% of analytics happens here
  - near-real-time processed data
    - remove duplicates and such
  - nightly aggregated data = warehouse
- Column storage is ideal for huge datasets where most work is performed on aggregated data
- Scales very nicely to large clusters
- Has very sophisticated SQL extensions
- Does have its quirks as well...

Why Vertica? Why not Hadoop?
- Speed: we often want to know things in near real time, not wait for a big map/reduce job to come back
- Synergy with the company: good to be the biggest client of a surging business. Our success is your success!
- Easier to find good (business) analysts with a great SQL background, while map/reduce is often the domain of engineers and academics
- In the end, for practical rather than religious reasons :)

Data Warehouse(s)
- Production cluster runs the reporting and A/B testing platforms
- Mirror cluster for ad hoc analysis and deep dives
- 1% sample cluster for order-of-magnitude calculations and for games like CityVille and FarmVille that have too much data :)
  - not really useful for virals...
- Given the number of people accessing data and the amount of data recorded, it is very important to understand the limitations!

How big is Big?
- Let's say a game has 10M DAU, and some of them visit multiple times
- Even a very short session will have 10s of recorded activities
  - game load tracking, asset loading, game state, clicks
- And then there are virals: FB feed posts and requests
- So all in all, 10s of billions of rows, several terabytes a day
  - not unusual to pull a dataset of 1B rows
  - not something you dump into Excel :)

In-house analytics
- Scale and data specifics make it hard to find canned solutions
- Want the ability to dig to arbitrary depth
- Want the ability to combine arbitrary data ad hoc
- Want to cater to a studio's specific needs
- Want to create a simple, scalable, usable system to:
  - minimize data sources that need reconciliation
  - minimize operational points of failure
  - minimize the number of steps involved in analysis

In-house analytics (continued)
- Need a balance of self-service and analyst support
- Simple reporting web portal with SQL queries wrapped in XML + basic Fusion Charts visualizations
  - created, maintained, and used by reporting analysts
  - available to everyone 24/7
  - everyone is looking at the same data!
- Analysts embedded directly into individual studios
  - "on the ground" understanding of each game
  - part of the fabric of the studio
  - yet leveraging the support of the wider analytics org
- Analysts in direct contact with infrastructure
  - solid understanding of the data flow + business needs

Fine, so what is it all used for?
- Dashboards and reports
  - MAU/WAU/DAU, user acquisition, daily/weekly retention, lapse/death, player engagement, virality, k-factor, levels, game actions, and of course revenues
  - Distributions, trends, funnels, segmentation
- Combining metrics, understanding feature performance, user behavior, revenue successes and failures
- Adjusting quickly, learning from mistakes
- Deploying successes widely, planning ahead

Role of an analyst
- PMs can
  - track metrics for games/features
  - pull various reports when something is off
  - run "simple" ad hoc queries
  - create and run A/B tests
- But analysts can
  - bridge business and infrastructure
  - dig deeper into the data
  - combine huge datasets efficiently
  - apply their intuitive "feel" for big data
  - leverage each other's work
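
The real-time monitoring slide describes a system that pushes raw counts into a simple database and exists to raise alerts rather than to answer questions. The sketch below illustrates one possible alert rule over such raw counts, assuming per-minute event counts can be read back from the store (the slide names memcache and MySQL but gives no schema). The function `check_for_drop` and its thresholds are hypothetical, not Zynga's actual logic.

```python
from statistics import median


def check_for_drop(counts_per_minute, window=10, min_ratio=0.5):
    """Return True if the latest per-minute count is below min_ratio
    times the median of the preceding `window` minutes.

    Hypothetical alert rule: a sudden drop in raw event volume is taken
    as a sign that the game or the data pipeline is broken.
    """
    if len(counts_per_minute) <= window:
        return False  # not enough history to form a baseline yet
    baseline = median(counts_per_minute[-window - 1:-1])
    latest = counts_per_minute[-1]
    return latest < min_ratio * baseline


history = [1000, 1020, 990, 1010, 1005, 998, 1003, 1012, 995, 1001, 400]
if check_for_drop(history):
    print("ALERT: event volume dropped sharply")  # fires here: 400 < 0.5 * 1002
```

A trailing median is used so that one noisy minute does not distort the baseline.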
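
The Vertica slide mentions near-real-time data with duplicates removed, followed by a nightly aggregation that becomes the warehouse. The sketch below spells out those two steps in Python, assuming each raw event carries a unique event id, a user id, a game name, and a date. These field names are hypothetical; the slide does not describe the actual schema or SQL.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Event(NamedTuple):
    event_id: str  # assumed unique per action; a resent log line repeats the same id
    user_id: int
    game: str
    day: str       # ISO date, e.g. "2011-09-08"


def dedupe(events: Iterable[Event]) -> List[Event]:
    """Keep the first occurrence of each event_id, dropping duplicates."""
    seen = set()
    unique = []
    for e in events:
        if e.event_id not in seen:
            seen.add(e.event_id)
            unique.append(e)
    return unique


def nightly_rollup(events: Iterable[Event]) -> Dict[Tuple[str, str], Dict[str, int]]:
    """Aggregate events into per-(day, game) distinct users (DAU) and event counts."""
    users = defaultdict(set)
    counts = defaultdict(int)
    for e in events:
        key = (e.day, e.game)
        users[key].add(e.user_id)
        counts[key] += 1
    return {key: {"dau": len(users[key]), "events": counts[key]} for key in counts}
```

In the warehouse itself this would more likely be a deduplicating query followed by a `GROUP BY day, game` with `COUNT(DISTINCT user_id)`; the Python version only makes the two steps explicit.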
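
The Data Warehouse(s) slide mentions a 1% sample cluster for order-of-magnitude calculations. The slide does not say how the sample was drawn. A common approach, assumed here, is to keep every event for users whose hashed id falls into 1 of 100 buckets and then scale counts up by 100. Sampling by user rather than by row keeps each sampled user's history intact. The last function points to one plausible reason for the slide's remark that the sample is "not really useful for virals" (our reading, not stated in the talk): a viral involves a sender and a recipient, and if each is included independently at 1%, only about 0.01% of pairs have both users in the sample. All names here are hypothetical.

```python
import hashlib

SAMPLE_BUCKETS = 100  # keep 1 in 100 users -> a 1% sample


def in_sample(user_id: int, bucket: int = 0) -> bool:
    """Deterministically place a user in or out of the sample via a stable hash."""
    digest = hashlib.sha256(str(user_id).encode()).hexdigest()
    return int(digest, 16) % SAMPLE_BUCKETS == bucket


def estimate_total(sampled_count: int) -> int:
    """Scale a count measured on the 1% sample back up to the full population."""
    return sampled_count * SAMPLE_BUCKETS


def fraction_of_pairs_sampled(rate: float = 1 / SAMPLE_BUCKETS) -> float:
    """Share of sender->recipient pairs with BOTH users in the sample,
    assuming the two users are included independently."""
    return rate * rate
```

With 1,000,000 users, about 10,000 land in the sample, and `estimate_total` scales their count back to roughly 1,000,000.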
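
The "How big is Big?" slide's numbers can be checked with back-of-envelope arithmetic. Apart from the 10M DAU, the inputs below are illustrative assumptions, not figures from the talk: 2 sessions per user per day, 50 recorded activities per session, and 200 bytes per stored row. Under these assumptions, one such game logs about a billion rows a day. Reaching the slide's 10s of billions of rows would then presumably involve several games and the viral traffic on top.

```python
def daily_rows(dau: int, sessions_per_user: float, events_per_session: float) -> float:
    """Rows logged per day for one game (all inputs are assumptions)."""
    return dau * sessions_per_user * events_per_session


def daily_bytes(rows: float, bytes_per_row: int) -> float:
    """Raw storage per day, ignoring compression."""
    return rows * bytes_per_row


rows = daily_rows(dau=10_000_000, sessions_per_user=2, events_per_session=50)
print(f"{rows:,.0f} rows/day")                        # 1,000,000,000 rows/day
print(f"{daily_bytes(rows, 200) / 1e9:,.0f} GB/day")  # 200 GB/day
```

At the same assumed 200 bytes per row, 20 billion rows would come to about 4 TB, which is in line with the slide's "several terabytes a day".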
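
The in-house reporting portal is described as SQL queries wrapped in XML and rendered with Fusion Charts. The sketch below shows how such a report definition might bundle a query with chart hints. The XML element and attribute names (`report`, `chart`, `query`) are hypothetical, and SQLite stands in for Vertica so the example runs anywhere. It does not reproduce Fusion Charts' own data format.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical report definition: a named SQL query plus a chart hint.
REPORT_XML = """
<report name="daily_active_users">
  <chart type="line" x="day" y="dau"/>
  <query>
    SELECT day, COUNT(DISTINCT user_id) AS dau
    FROM events
    GROUP BY day
    ORDER BY day
  </query>
</report>
"""


def run_report(xml_text, conn):
    """Parse the report definition, run its SQL, and return rows plus chart hints."""
    root = ET.fromstring(xml_text.strip())
    sql = root.findtext("query")
    chart = root.find("chart").attrib
    rows = conn.execute(sql).fetchall()
    return {"name": root.get("name"), "chart": chart, "rows": rows}


# Usage with a throwaway in-memory table standing in for the warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (day TEXT, user_id INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("2011-09-07", 1), ("2011-09-07", 2), ("2011-09-08", 1)],
)
result = run_report(REPORT_XML, conn)
print(result["rows"])  # [('2011-09-07', 2), ('2011-09-08', 1)]
```

Keeping the query text in a shared definition file means that every viewer of a report runs the same SQL, which is one way to get the slide's "everyone is looking at the same data".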
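
The Zynga's approach slide notes that the reporting platform includes easy-to-use A/B testing, but the slides do not describe how it works. One common approach, sketched here as an assumption, has two steps. First, users are assigned to variants deterministically by hashing the user id together with an experiment name. Second, conversion rates are compared with a two-proportion z-test. The function names and the numbers in the note below are made up for illustration.

```python
import hashlib
import math


def assign_variant(user_id: int, experiment: str, variants=("control", "treatment")) -> str:
    """Stable assignment: a user always sees the same variant of a given experiment.

    Salting the hash with the experiment name gives independent splits per experiment.
    """
    key = f"{experiment}:{user_id}".encode()
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % len(variants)
    return variants[bucket]


def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """z statistic for the difference in conversion rates, using a pooled standard error."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se


def two_sided_p_value(z: float) -> float:
    """Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2))
```

For example, 500 conversions out of 10,000 in control against 560 out of 10,000 in treatment gives z of about 1.9 and p of roughly 0.06. That result is suggestive but not significant at the 5% level.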
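
The "Fine, so what is it all used for?" slide lists dashboard metrics such as MAU/WAU/DAU, daily retention, and k-factor. The definitions below are common industry formulas and are assumed here rather than taken from the talk, so Zynga's exact definitions may differ. Stickiness is DAU/MAU. Day-N retention is the share of a cohort that plays exactly N days after first play. The k-factor is invites sent per user times the invite conversion rate.

```python
from datetime import date, timedelta
from typing import Dict, Set


def stickiness(dau: int, mau: int) -> float:
    """DAU/MAU: the average share of monthly users who play on a given day."""
    return dau / mau


def day_n_retention(first_seen: Dict[int, date],
                    active_days: Dict[int, Set[date]],
                    n: int) -> float:
    """Share of a cohort active exactly n days after first play.

    All users in first_seen are treated as one cohort; in practice you
    would pass the users who started on a single day.
    """
    if not first_seen:
        return 0.0
    retained = sum(
        1 for user, start in first_seen.items()
        if start + timedelta(days=n) in active_days.get(user, set())
    )
    return retained / len(first_seen)


def k_factor(invites_per_user: float, invite_conversion: float) -> float:
    """Viral coefficient: new users brought in per existing user.

    Above 1.0, each user recruits more than one new user (ignoring churn).
    """
    return invites_per_user * invite_conversion
```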
