BigDataCloud Sept 8 2011 meetup - Big Data Analytics at Play (Social Gaming) by Tim Piatenko




Big Data Analytics at Play - a "Social Gaming" industry perspective at Zynga.

- Tim Piatenko, ex-Zynga, ex-eBay




Usage Rights

© All Rights Reserved

    Presentation Transcript

    • Big Data Analytics at Play
      a Social Gaming industry perspective at Zynga
    • Before we begin
      Why does a(n online) company need analytics?
      • To monitor its operations (data)
      • is the site/app online and functional?
      • is data flowing?
      • do we get alerted when something breaks?
      • To monitor its business (information)
      • are top line metrics looking healthy?
      • are we on target for this week/month/quarter?
      • To understand its business (knowledge)
      • how are metrics related?
      • what drives changes?
      • To use knowledge strategically (insight)
    • So what about Zynga?
      • Monitoring needs are the same as everyone else's, plus:
      • It's an app within an app (FB) within a browser
      • more places for things to break
      • It's a huge operational challenge to keep everything running, when millions are playing
      • It's a content push model with a (really) fast release cycle
      • Collecting all the data and keeping it flowing internally is also a huge challenge
      • So all of that makes it imperative to stay on top of things 24/7
    • That's operations, but what about the business?
      • Content driven means you have to monitor business metrics all the time as well!
      • Best to have overlap with operational metrics
      • use raw counts for things like visits
      • But also need calculated metrics with trends
      • engagement, retention, virality, reach
      • Need a system that can handle this in real time and near real time
      • Need human beings to run the system and use the data
    • Zynga's approach
      • Robust, simple real-time system (memcache, MySQL)
      • Robust, sophisticated, and scalable data warehousing solution (Vertica)
      • In-house developed reporting platform
      • also includes easy-to-use A/B testing
      • A rather large team of engineers and analysts
      • software tools and DB developers and admins
      • reporting analysts embedded in game studios
      • central analysts working with marketing etc.
      • a research team for deeper understanding
    • Real-time analytics (monitoring)
      • Meant for quickly pushing raw data into a simple database without any calculations
      • The point is to know when something is broken as soon as possible
      • This is not a system for answers, it's a system for alerts!
      • Throw a chart up on a monitor and watch it every few minutes
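The "system for alerts" idea above can be sketched in a few lines of Python. This is only an illustration, not Zynga's implementation: the function name `should_alert`, the `drop_ratio` threshold, and the visits-per-minute numbers are all assumptions made up for the example. It compares the latest raw count against the average of recent counts and flags a sharp drop, with no further calculation.

```python
from collections import deque

def should_alert(history, latest, drop_ratio=0.5):
    """Return True when the latest raw count falls below
    drop_ratio times the average of the recent counts."""
    if not history:
        return False  # nothing to compare against yet
    baseline = sum(history) / len(history)
    return latest < drop_ratio * baseline

# Hypothetical visits-per-minute counts; maxlen keeps only the last hour.
recent = deque([1000, 980, 1020, 1000], maxlen=60)

normal_minute = should_alert(recent, 990)   # close to the baseline
broken_minute = should_alert(recent, 300)   # sharp drop, something broke
```

A check like this answers only "is something broken right now?", which matches the slide's point that the real-time system is for alerts rather than answers.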
    • The big guns — Vertica!
      • 90+% of analytics happens here
      • near real-time processed data
      • remove duplicates and such
      • nightly aggregated data = warehouse
      • Column storage ideal for huge datasets, where most work is performed on aggregated data
      • Scales very nicely to large clusters
      • Has very sophisticated SQL extensions
      • Does have its quirks as well...
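The two processing steps named on this slide, removing duplicates and aggregating nightly, can be illustrated with a small Python sketch. In the talk this work happens inside Vertica, presumably in SQL. The Python below is a stand-in, and the field names `event_id`, `day`, and `action` are hypothetical.

```python
from collections import Counter

def dedupe(events):
    """Drop repeated events, keeping the first occurrence of each event_id."""
    seen = set()
    unique = []
    for event in events:
        if event["event_id"] not in seen:
            seen.add(event["event_id"])
            unique.append(event)
    return unique

def nightly_rollup(events):
    """Count deduplicated events per (day, action) pair."""
    return Counter((e["day"], e["action"]) for e in dedupe(events))

# Hypothetical raw events; event 1 was delivered twice.
events = [
    {"event_id": 1, "day": "2011-09-08", "action": "click"},
    {"event_id": 1, "day": "2011-09-08", "action": "click"},
    {"event_id": 2, "day": "2011-09-08", "action": "click"},
    {"event_id": 3, "day": "2011-09-08", "action": "game_load"},
]
rollup = nightly_rollup(events)
```

Most later analysis would then read the small rollup rather than the raw rows, which is the access pattern the slide says column storage suits.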
    • Why Vertica? Why not Hadoop?
      • Speed: often want to know things in near real time, not wait for a big map/reduce job to come back
      • Synergy with the company: good to be the biggest client of a surging business. Our success is your success!
      • Easier to find good (business) analysts with great SQL background, while map/reduce is often the domain of engineers and academics
      • In the end, for practical rather than religious reasons :)
    • Data Warehouse(s)
      • Production cluster runs the reporting and A/B testing platforms
      • Mirror cluster for ad hoc analysis and deep dives
      • 1% sample cluster for order-of-magnitude calculations and for games like CityVille and FarmVille with too much data :)
      • not really useful for virals...
      • Given the number of people accessing data and the amount of data recorded, very important to understand the limitations!
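A 1% sample like the one above could be drawn deterministically by hashing user ids, so that the same users land in the sample every day. The sketch below is one common way to do this and is not confirmed by the talk. The names `in_sample` and `estimate_total` and the choice of MD5 are assumptions.

```python
import hashlib

def in_sample(user_id, percent=1):
    """Deterministically place a user in the sample based on a hash of the id."""
    digest = hashlib.md5(str(user_id).encode("utf-8")).hexdigest()
    return int(digest, 16) % 100 < percent

def estimate_total(sample_count, percent=1):
    """Scale a count measured on the sample up to the full population."""
    return sample_count * 100 / percent

# Hypothetical population of 100,000 user ids.
sampled = [uid for uid in range(100_000) if in_sample(uid)]
estimate = estimate_total(len(sampled))
```

One plausible reason such a sample is "not really useful for virals": a viral event links a sender and a receiver, and if users are sampled independently at 1%, only about 0.01% of sender-receiver pairs have both users in the sample.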
    • How big is Big?
      • Let's say a game has 10M DAU, some come multiple times
      • Even a very short session will have 10s of recorded activities
      • game load tracking, assets loading, game state, clicks
      • And then there are virals: FB feed posts and requests
      • So all in all, 10s of billions of rows, several terabytes a day
      • not unusual to pull a dataset of 1B rows
      • not something you dump into Excel :)
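The scale on this slide can be sanity-checked with back-of-the-envelope arithmetic. Only the 10M DAU figure comes from the slide. The 2 sessions per user, 50 events per session, 30 billion rows, and 3 TB used below are illustrative assumptions picked from within the slide's loose ranges ("10s of recorded activities", "10s of billions of rows", "several terabytes").

```python
def rows_per_day(dau, sessions_per_user, events_per_session):
    """Rows recorded per day if every event becomes one row."""
    return dau * sessions_per_user * events_per_session

def bytes_per_row(total_bytes, total_rows):
    """Average row size implied by a daily volume and a daily row count."""
    return total_bytes / total_rows

# One game with 10M DAU (from the slide); the other two values are assumed.
game_rows = rows_per_day(dau=10_000_000, sessions_per_user=2, events_per_session=50)

# Assumed totals: 3 TB and 30 billion rows per day.
implied_row_size = bytes_per_row(total_bytes=3 * 10**12, total_rows=30 * 10**9)
```

Under these assumed values, one such game yields about a billion rows a day before virals are counted, and the assumed totals imply rows of roughly 100 bytes each.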
    • In-house analytics
      • Scale and data specifics make it hard to find canned solutions
      • Want the ability to dig to arbitrary depth
      • Want the ability to combine arbitrary data ad hoc
      • Want to cater to a studio's specific needs
      • Want to create a simple, scalable, usable system to:
      • minimize data sources that need reconciliation
      • minimize operational points of failure
      • minimize the number of steps involved in analysis
    • In-house analytics continued
      • Need a balance of self-service and analyst support
      • Simple reporting web portal with SQL queries wrapped in XML + basic Fusion Charts visualizations
      • created, maintained, and used by reporting analysts
      • available to everyone 24/7
      • everyone is looking at the same data!
      • Analysts embedded directly into individual studios
      • "on the ground" understanding of each game
      • part of the fabric of the studio
      • yet leveraging the support of the wider analytics org
      • Analysts in direct contact with infrastructure
      • solid understanding of the data flow + business needs
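The "SQL queries wrapped in XML" idea from this slide can be sketched as follows. The talk does not show the actual report format, so the `<report>` and `<query>` element names, the `run_report` function, and the `visits` table are all hypothetical. SQLite stands in for Vertica so the example is self-contained, and the Fusion Charts rendering step is left out.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical report definition: one named SQL query wrapped in XML.
REPORT_XML = """
<report name="daily_visits">
  <query>SELECT day, COUNT(*) AS visits FROM visits GROUP BY day ORDER BY day</query>
</report>
"""

def run_report(xml_text, connection):
    """Parse a report definition and run its query on the given connection."""
    root = ET.fromstring(xml_text)
    sql = root.findtext("query").strip()
    rows = connection.execute(sql).fetchall()
    return {"name": root.get("name"), "rows": rows}

# In-memory SQLite database with a few made-up visit rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (day TEXT, user TEXT)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [("2011-09-07", "a"), ("2011-09-07", "b"), ("2011-09-08", "a")],
)
result = run_report(REPORT_XML, conn)
```

Keeping the query in a declarative file like this lets analysts add or change a report without touching the portal code, which fits the self-service goal on the slide.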
    • Fine, so what is it all used for?
      • Dashboards and reports
      • MAU/WAU/DAU, user acquisition, daily/weekly retention, lapse/death, player engagement, virality, k-factor, levels, game actions, and of course revenues
      • Distributions, trends, funnels, segmentation
      • Combining metrics, understanding feature performance, user behavior, revenue successes and failures
      • Adjusting quickly, learning from mistakes
      • Deploying successes widely, planning ahead
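A few of the metrics listed above can be written down concretely. The definitions below are common simplified forms, not necessarily the ones Zynga used: DAU as distinct users active on a day, retention as the share of one day's active users who are active again on a later day, and k-factor as invites sent per user times the invite conversion rate. Field names and sample data are made up.

```python
def dau(events, day):
    """Distinct users with at least one event on the given day."""
    return len({e["user"] for e in events if e["day"] == day})

def retention(events, cohort_day, return_day):
    """Share of users active on cohort_day who are also active on return_day."""
    cohort = {e["user"] for e in events if e["day"] == cohort_day}
    returned = {e["user"] for e in events if e["day"] == return_day}
    if not cohort:
        return 0.0
    return len(cohort & returned) / len(cohort)

def k_factor(invites_sent, users, installs_from_invites):
    """Invites per user multiplied by the invite conversion rate."""
    if users == 0 or invites_sent == 0:
        return 0.0
    return (invites_sent / users) * (installs_from_invites / invites_sent)

# Hypothetical activity log: four users on day 1, three on day 2.
events = [
    {"user": "a", "day": 1}, {"user": "b", "day": 1},
    {"user": "c", "day": 1}, {"user": "d", "day": 1},
    {"user": "a", "day": 2}, {"user": "b", "day": 2}, {"user": "e", "day": 2},
]
day1_dau = dau(events, 1)
day2_retention = retention(events, 1, 2)
viral_k = k_factor(invites_sent=200, users=100, installs_from_invites=30)
```

A k-factor below 1, as in this made-up example, means each user brings in less than one new user on average, so growth would also depend on other acquisition channels.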
    • Role of an analyst
      • PMs can
      • track metrics for games/features
      • pull various reports when something is off
      • run "simple" ad hoc queries
      • create and run A/B tests
      • But analysts can
      • bridge business and infrastructure
      • dig deeper into the data
      • combine huge datasets efficiently
      • apply their intuitive "feel" for big data
      • leverage each other's work