Gdco12 kartik ayyar
© All Rights Reserved

  • .. or "How being a gamer can teach you how to run a live game"
  • Have continuous integration testing, though not continuous deployment
  • Releasing is a vast superset of building. Very different from a shrink-wrapped product: - Not an SCM admin problem - A distributed systems problem - Client and server on the same version
  • Always measure what performance is doing for your game.
  • Be sure to try to accurately replicate prod data for certain performance bugs, e.g. a recurring n^2 loop in a data structure that was initialized very differently in production than in test environments.
  • Timescale: Mar-2011 – Sep 2012
  • We started with just a proxy graph. Built a new dashboard that is now adopted Zynga-wide. Many data sources allow cross-referencing: - How are users engaging with our features? - How is our infrastructure behaving? - What are the key performance metrics of our game? - How are we affected by external services?
  • We are an update-heavy application, unlike many other websites. User state grows with time, and leaving it unchecked can translate into server inefficiencies.
  • ODUS and Hidef are both proprietary native PHP extensions. Small changes can have a big impact: - Quest hook caching - AOE optimization

Presentation Transcript

  • CityVille: Lessons learned & tools used to run a large social game
      • Kartik Ayyar - @ayyar
      • Studio CTO, Zynga
  • Me
      • Kartik Ayyar
      • CTO of CityVille @ Zynga
  • What is CityVille?
      • Casual, social city-building game
      • Largest social game by MAU peak
      • Winner, Social Game of the Year, IAIS ’11
      • Winner, Crunchie, Best Time Sink App ’10
  • What can I learn from it?
      • Grew rapidly between 12/2010 and 1/2011
      • Top overall social game of 2011
      • Lessons of velocity and scale are general
  • It's alive. Congrats. What now?
      • Stay on target
      • Grow
      • Ship
      • Keep your game healthy
  • Grow
  • Growth I: Server growth
      • Most learning here happened before us
      • 3 tiers: Web tier, MemCache, MySQL
      • MySQL mostly used as NoSQL
      • Very sharding-friendly architecture
      • General flow: Client -> Web -> MemCache -> MySQL
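
The general flow on this slide can be illustrated with a minimal sketch. It assumes plain dicts as stand-ins for the MemCache and MySQL tiers; the shard count, the hashing scheme, and the function names are hypothetical and are not taken from the talk.

```python
import zlib

NUM_SHARDS = 4  # hypothetical shard count

# Dicts stand in for a MemCache pool and for MySQL shards used as key-value stores.
cache = {}
shards = [dict() for _ in range(NUM_SHARDS)]


def shard_for(user_id: str) -> dict:
    """Pick a shard from a stable hash of the user id."""
    return shards[zlib.crc32(user_id.encode("utf-8")) % NUM_SHARDS]


def save_user_to_db(user_id: str, blob: dict) -> None:
    """Store the whole user state as one value keyed by user id."""
    shard_for(user_id)[user_id] = blob


def load_user(user_id: str):
    """Web tier read path: try the cache first, fall back to the user's shard."""
    if user_id in cache:
        return cache[user_id]
    blob = shard_for(user_id).get(user_id)
    if blob is not None:
        cache[user_id] = blob  # populate the cache for the next request
    return blob
```

Because every lookup is keyed by user id alone, each user's data lives on exactly one shard, which is one way to read "sharding friendly".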
  • Observations
      • Developers think about the game
      • Insulated from persistence and queries
      • Writeback caching
      • Migration and sharding are invisible
      • Failure recovery is under the hood
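
Writeback caching, as named on this slide, can be sketched roughly as follows. This is an illustration of the general technique only: the class, its methods, and the dict used as the backing store are assumptions, not the persistence layer described in the talk.

```python
class WriteBackCache:
    """Writes land in the cache and are flushed to the backing store later."""

    def __init__(self, store: dict):
        self.store = store   # stand-in for the database tier
        self.cache = {}
        self.dirty = set()   # keys changed since the last flush

    def get(self, key):
        """Read through the cache, loading from the store on a miss."""
        if key not in self.cache and key in self.store:
            self.cache[key] = self.store[key]
        return self.cache.get(key)

    def put(self, key, value) -> None:
        """Update the cached copy only and remember that it needs persisting."""
        self.cache[key] = value
        self.dirty.add(key)

    def flush(self) -> int:
        """Persist dirty entries; returns how many were written."""
        written = 0
        for key in list(self.dirty):
            self.store[key] = self.cache[key]
            written += 1
        self.dirty.clear()
        return written
```

Game code would only call get and put; when flush runs is a policy decision hidden from it, which matches the point that developers are insulated from persistence.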
  • We did have some hiccups
      • Persistence relies on loose typing
      • Very easy to add data
      • Also very easy to modify data
      • Having many friends taxed some code
  • Ship
  • How do we ship code?
      • Ship 2-4 times daily, 4-5 days a week
      • Code profile keeps moving
      • Taxed our testing and release tools
  • Shipping Lesson I: Content
      • Content is core to our game
      • Started with a hand-edited text file
      • One giant database in a text file
      • Gave us iteration flexibility
      • Thankfully, we fixed this post launch
  • Content tools
      • Built Game Chef post launch
      • Replace yourself as engineers
      • Tools and tests are game changers
  • Shipping Lesson II: Release tools
      • There were subtle bugs in the tools
      • This was a distributed systems problem
      • The release tool is also your rollback tool
  • Release tool
      • Zcon: parallel and paranoid release tool
          • Runs PHP and FlexUnit tests
          • Performs and verifies CDN uploads
          • Checks for unpropagated commits
          • Includes notifications
      • All steps must pass before a release
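
The "all steps must pass" rule can be sketched as a small release gate. This is not Zcon itself, whose internals the slides do not show; the function name, the check names, and the choice to treat an exception as a failure are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def run_release_checks(checks):
    """Run every check in parallel; release only if all of them pass.

    `checks` maps a step name to a zero-argument callable returning True/False.
    A check that raises is treated as a failure (the "paranoid" part).
    Returns (ok, sorted list of failed step names).
    """
    def safe(check):
        try:
            return bool(check())
        except Exception:
            return False

    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(safe, fn) for name, fn in checks.items()}
        results = {name: fut.result() for name, fut in futures.items()}

    failed = sorted(name for name, ok in results.items() if not ok)
    return len(failed) == 0, failed
```

A caller would pass one callable per step (unit tests, CDN upload verification, unpropagated-commit check) and refuse to release unless the first return value is True.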
  • Shipping Lesson III: Testing
      • Not enough automation at release
      • Lots of major iterations pre launch
      • Inaccuracies in testing
      • Inadequate unit tests at launch
      • Thankfully, we fixed this
  • Automated testing
      • Enter automated testing:
          • Unit tests, via PHPUnit / FlexUnit
          • VM cluster running Genie tests
          • Mandatory for new features to add unit tests
      • Cut down test times to 45 minutes
      • Lots of bugs caught earlier
  • Health
  • Health I: Performance
      • Treat performance as a first-class feature
      • Keep running, keep experimenting
      • Measure load time / FPS vs. business metrics
  • Different profiling strategies
      • Runtime, programmatic: CIPRO, reports
      • Runtime, interactive: Monocle
      • Summary, for alerting: Zops
  • Load time performance
      • Load time depends on geography
      • Good geographies are CPU bound
      • Bad geographies are network bound
      • 1st percentile => USA and Europe
      • 99th percentile => Asia and South America ... and the USA too!
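
One way to look at load time by geography is to compute percentiles per region rather than a single global average. The sketch below assumes samples arrive as (region, seconds) pairs and uses the nearest-rank percentile definition; the function names and the sample shape are illustrative, not from the talk.

```python
import math
from collections import defaultdict


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def load_time_summary(samples, pcts=(1, 50, 99)):
    """samples: iterable of (region, load_seconds). Returns {region: {pct: value}}."""
    by_region = defaultdict(list)
    for region, seconds in samples:
        by_region[region].append(seconds)
    return {
        region: {p: percentile(times, p) for p in pcts}
        for region, times in by_region.items()
    }
```

Looking at the 99th percentile inside each region is what surfaces the slide's last point: a region that dominates the fast end overall can still contain a slow tail.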
  • Load time over time
      • Daily shipping is awesome, but avoid death by a thousand cuts
  • Loading optimizations
      • Network: compress, cache, prefetch
      • CPU: lazy process, spread processing
      • Understanding dependencies is key
      • Keep experimenting
  • Rendering performance
      • 2010: Shipped with a display list engine
      • 2011: Switched over to blitting
      • Mostly bypasses the Flash display list
      • Uses low-level copyPixels() APIs
      • 2012: CityVille GPU
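
Blitting means composing each frame by copying rectangles of pixels from a sprite sheet into one frame buffer, rather than maintaining a tree of display objects. The sketch below is a Python analogue of that idea using lists of rows as bitmaps; it is not the Flash API, and the function names and argument order are assumptions.

```python
def copy_pixels(dest, src, src_rect, dest_x, dest_y):
    """Copy a rectangle of pixels from src into dest, clipping at dest's edges.

    dest and src are lists of rows of pixel values; src_rect is (x, y, width, height).
    """
    sx, sy, width, height = src_rect
    for row in range(height):
        dy = dest_y + row
        if not 0 <= dy < len(dest):
            continue
        for col in range(width):
            dx = dest_x + col
            if 0 <= dx < len(dest[dy]):
                dest[dy][dx] = src[sy + row][sx + col]


def render(frame_w, frame_h, sprite_sheet, sprites):
    """Build one frame by blitting each sprite; sprites is a list of (src_rect, x, y)."""
    frame = [[0] * frame_w for _ in range(frame_h)]
    for src_rect, x, y in sprites:
        copy_pixels(frame, sprite_sheet, src_rect, x, y)
    return frame
```

Draw order is simply list order, so later sprites overwrite earlier ones where they overlap.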
  • Cityville GPU – work in progress
  • Health II: Debugging
      • We shipped with one traffic graph
      • Had Vertica reports, but they took a long time
      • Lots of changes going on at many layers
      • We needed to debug in real time
  • Zops Dashboard
      • We built a responsive ops dashboard
      • Aggregates data from Splunk, DBs, Nagios, services
      • Be aware of external events: social network, browser upgrades, soccer, ISPs, infrastructure providers, royal weddings
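
Cross-referencing several data sources usually starts with putting them on a shared time axis. The sketch below assumes each source has already been reduced to (unix_timestamp, value) pairs; the function name, the source names, and the 60-second bucket are hypothetical, and no real Splunk or Nagios API is used.

```python
from collections import defaultdict


def merge_sources(sources, bucket_seconds=60):
    """Align events from several sources on shared time buckets.

    sources: {source_name: [(unix_timestamp, value), ...]}
    Returns {bucket_start: {source_name: [values...]}} in ascending time order,
    so signals from different systems can be read side by side.
    """
    merged = defaultdict(lambda: defaultdict(list))
    for name, events in sources.items():
        for timestamp, value in events:
            bucket = int(timestamp) // bucket_seconds * bucket_seconds
            merged[bucket][name].append(value)
    return {bucket: dict(per_source) for bucket, per_source in sorted(merged.items())}
```

With this shape, an error spike and a deploy or external event that fall in the same bucket appear in the same row.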
  • Health III: Scaling
      • Concurrents at 20M DAU? No problem
      • However, our app is write intensive
      • User data keeps growing
      • Watch out for data per user
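
Watching data per user can start with a breakdown of which parts of a user's state take the most space. The sketch below assumes the state is a dict and measures each top-level key on a compact JSON encoding; the function name and the encoding are assumptions, and the actual storage format is not described in the slides.

```python
import json


def blob_breakdown(user_blob: dict):
    """Return (total_bytes, [(key, bytes), ...]) with the largest keys first.

    total_bytes is the sum of the per-key value sizes, measured on a compact
    JSON encoding. A real store may serialize differently, so treat the
    numbers as relative rather than exact.
    """
    sizes = {
        key: len(json.dumps(value, separators=(",", ":")).encode("utf-8"))
        for key, value in user_blob.items()
    }
    ranked = sorted(sizes.items(), key=lambda item: item[1], reverse=True)
    return sum(sizes.values()), ranked
```

Running this over a sample of users and tracking the totals over time is one way to notice per-user growth before it shows up as server load.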
  • Improving memory per request
      • Blob analyzer
      • Hidef - low-memory shared constants
      • Blob splitting to add new worlds
      • ODUS - lazy serialization extension
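
The idea behind lazy serialization can be sketched as decoding a field only when a request actually touches it. ODUS is a proprietary PHP extension whose design is not shown here, so the class below is only an illustration of the general technique; the class name, the per-field JSON encoding, and the helper method are assumptions.

```python
import json


class LazyBlob:
    """Holds each top-level field as encoded text and decodes it on first access."""

    def __init__(self, encoded_fields: dict):
        self._encoded = dict(encoded_fields)  # field name -> JSON text
        self._decoded = {}

    def __getitem__(self, field):
        if field not in self._decoded:
            self._decoded[field] = json.loads(self._encoded[field])
        return self._decoded[field]

    def decoded_fields(self):
        """Names of the fields that have actually been decoded so far."""
        return sorted(self._decoded)
```

A request that reads only one small field then pays the decoding and memory cost for that field alone, not for the whole user state.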
  • Future directions
      • Better content tools
      • Experimenting with HipHop
      • Scheduler for animations
      • Extending tests for performance testing
  • Parting thoughts
      • Control >= game code
      • Tools >= user-facing code for control
      • Top of mind: content, monitoring, perf and releasing
      • Assume change and watch out for it
  • Acknowledgements
      • CityVille team
      • Zynga Shared Technology
      • Zcloud team
      • CityVille Tencent team
      • Anyone else that I missed
  • Thank you!
      • Reaching me:
          • @ayyar on Twitter
          • kayyar [at] zynga
          • www.quora.com/Kartik-Ayyar