.. or "How being a gamer can teach you how to run a live game"
Have continuous integration testing, though not continuous deployment
Release is a vast superset of building.
Very different from a shrink-wrapped product:
- Not an SCM admin problem
- Distributed systems problem
Client <-> Server on the same version
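One way to picture the "client and server on the same version" constraint is a version check on every request. This is only a minimal sketch: the function name, the version strings, and the response shape are all hypothetical and not taken from the CityVille code base.

```python
def check_client_version(server_version: str, client_version: str) -> dict:
    """Accept a request only when the client build matches the server build."""
    if client_version != server_version:
        # Hypothetical response shape: ask the stale client to reload itself.
        return {"status": "reload", "expected": server_version}
    return {"status": "ok"}


# Illustrative version labels only.
matching = check_client_version("r1042", "r1042")
stale = check_client_version("r1042", "r1041")
```

During a rollout both versions are live for a while, which is why the release reads as a distributed systems problem rather than a source control one.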
Always measure what performance is doing for your game.
- Be sure to accurately replicate prod data when chasing certain performance bugs, e.g. a recurring n^2 loop over a data structure that was initialized very differently in production than in test environments.
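A small sketch of why the shape of the data matters for an n^2 loop. The all-pairs scan and both list sizes are assumptions chosen for illustration; they are not measurements from the game.

```python
def count_pair_comparisons(items):
    """Count the comparisons made by a naive all-pairs scan: n * (n - 1) / 2."""
    comparisons = 0
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            comparisons += 1
    return comparisons


test_fixture = list(range(10))    # assumed size of a fresh test account
prod_like = list(range(1000))     # assumed size of a long-lived production account

small = count_pair_comparisons(test_fixture)   # 45 comparisons
large = count_pair_comparisons(prod_like)      # 499500 comparisons
```

With 100 times more items the loop does roughly 10,000 times more work, so a test fixture that is too small hides the bug completely.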
Timescale: Mar-2011 – Sep 2012
We started with just a proxy graph.
Built a new dashboard that is now adopted Zynga-wide.
- Many data sources allow cross-referencing
- How are users engaging with our features?
- How is our infrastructure behaving?
- What do the key performance metrics of our game look like?
- How are we affected by external services?
We are an update-heavy application, unlike many other websites. User state grows with time, and leaving it unchecked can translate into greater server inefficiency.
ODUS and Hidef are both proprietary native PHP extensions.
Small changes can have a big impact:
- Quest hook caching
- AOE optimization
Gdco12 kartik ayyar
CityVille
Lessons learned & tools used to run a large social game
Kartik Ayyar - @ayyar
Studio CTO, Zynga
Growth I: Server growth
• Most learning here happened before us
• 3 tiers: Web tier, MemCache, MySQL
• MySQL mostly used as NoSQL
• Very sharding-friendly architecture
• General flow:
• Client -> Web -> MemCache -> MySQL
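The slide does not say how users are mapped to shards. As a purely illustrative sketch, a key-value layout like this is often sharded by hashing the user key; the function name, the key format, and the shard count below are assumptions.

```python
import zlib


def shard_for(user_key: str, shard_count: int) -> int:
    """Map a user key to a shard index with a stable hash (CRC32 modulo shard count)."""
    return zlib.crc32(user_key.encode("utf-8")) % shard_count


# Hypothetical key format and shard count.
shard = shard_for("user:42", 8)
```

Because each request touches only one user's record, every lookup lands on exactly one shard, which is what makes a key-value use of MySQL easy to shard.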
Observations
• Developers think about the game
• Insulated from persistence and queries
• Writeback caching
• Migration and sharding are invisible
• Failure recovery is under the hood
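Write-back caching in general can be sketched as follows. This is not the Zynga persistence layer: the class name is hypothetical and two plain dictionaries stand in for MemCache and MySQL.

```python
class WriteBackStore:
    """Toy write-back cache: writes land in the cache and are persisted later."""

    def __init__(self):
        self.cache = {}      # stand-in for MemCache
        self.database = {}   # stand-in for MySQL used as a key-value store
        self.dirty = set()   # keys changed in the cache but not yet persisted

    def load(self, user_id):
        # Read through the cache; fall back to the database on a miss.
        if user_id not in self.cache:
            self.cache[user_id] = dict(self.database.get(user_id, {}))
        return self.cache[user_id]

    def save(self, user_id, state):
        # Game code only touches the cache; persistence happens in flush().
        self.cache[user_id] = dict(state)
        self.dirty.add(user_id)

    def flush(self):
        # Persist every dirty record and report how many were written.
        for user_id in self.dirty:
            self.database[user_id] = dict(self.cache[user_id])
        flushed = len(self.dirty)
        self.dirty.clear()
        return flushed


store = WriteBackStore()
store.save(1, {"coins": 5})
before_flush = dict(store.database)   # still empty: nothing persisted yet
written = store.flush()
```

The trade-off is that anything still marked dirty is lost if the cache node dies before a flush, so failure recovery has to be part of the layer.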
We did have some hiccups
• Persistence relies on loose typing
• Very easy to add data
• Also very easy to modify data
• Having many friends taxed some code
How do we ship code?
• Ship 2-4 times daily, 4-5 days a week
• Code profile keeps moving
• Taxed our testing and release tools
Lesson I: Content
• Content is core to our game
• Started with a hand-edited text file
• One giant database in a text file
• Gave us iteration flexibility
• Thankfully, we fixed these post launch
Content tools
• Built Game Chef post launch
• Replace yourself as engineers
• Tools and tests are game changers
Shipping Lesson II: Release tools
• There were subtle bugs in the tools
• This was a distributed systems problem
• Also your rollback tool
Release tool
• Zcon: Parallel and paranoid release tool
  - Runs PHP and Flexunit tests
  - Performs and verifies CDN uploads
  - Checks for unpropagated commits
  - Includes notifications
• All steps must pass before a release
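The "parallel and paranoid" idea can be sketched as a gate that runs every check concurrently and refuses to release if any one fails. This is not Zcon itself: the function name and the check names are hypothetical, and the lambdas stand in for real test runs and upload verification.

```python
from concurrent.futures import ThreadPoolExecutor


def run_release_gate(checks):
    """Run every check in parallel; allow the release only if all of them pass."""
    results = {}
    with ThreadPoolExecutor(max_workers=max(1, len(checks))) as pool:
        futures = {name: pool.submit(check) for name, check in checks.items()}
        for name, future in futures.items():
            try:
                results[name] = bool(future.result())
            except Exception:
                results[name] = False   # a crashing check counts as a failure
    failed = sorted(name for name, ok in results.items() if not ok)
    return {"release": not failed, "failed": failed}


# Hypothetical checks standing in for the real steps.
outcome = run_release_gate({
    "unit_tests": lambda: True,
    "cdn_upload_verified": lambda: True,
    "no_unpropagated_commits": lambda: False,
})
```

Treating an exception as a failed check keeps the gate paranoid: a broken tool blocks the release instead of silently passing it.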
Ship lesson II: Testing
• Not enough automation at release
• Lots of major iterations pre launch
• Inaccuracies in testing
• Inadequate unit tests at launch
• Thankfully, we fixed this.
Automated testing
• Enter automated testing:
  - Unit tests, via PHPUnit / Flexunit
  - VM cluster running Genie tests
  - Mandatory for new features to add unit tests
• Cut down test times to 45 minutes
• Lots of bugs caught earlier
Health I: Performance
• Treat performance as a first class feature
• Keep running, keep experimenting
• Measure load time / FPS vs. business metrics.
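Measuring load time against a business metric can be as simple as bucketing sessions by load time and comparing a rate per bucket. The sketch below assumes next-day return as the metric and uses made-up sessions purely for illustration; neither comes from the talk.

```python
def metric_by_load_time(sessions, bucket_seconds=5):
    """Group (load_seconds, returned_next_day) pairs into load-time buckets
    and compute the return rate for each bucket."""
    buckets = {}
    for load_seconds, returned_next_day in sessions:
        bucket = int(load_seconds // bucket_seconds) * bucket_seconds
        total, returned = buckets.get(bucket, (0, 0))
        buckets[bucket] = (total + 1, returned + (1 if returned_next_day else 0))
    return {b: returned / total for b, (total, returned) in sorted(buckets.items())}


# Made-up sessions, for illustration only.
sessions = [(2.0, True), (3.5, True), (4.0, False),
            (7.0, True), (8.0, False), (12.0, False)]
rates = metric_by_load_time(sessions)
```

A table like this shows a relationship, not a cause; slow loads and low return rates can share an underlying reason such as geography.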
Different profiling strategies
• Runtime, programmatic: CIPRO, reports
• Runtime, interactive: Monocle
• Summary, for alerting: Zops
Load time performance
• Load time depends on geography
• Good geographies are CPU bound
• Bad geographies are network bound
• 1st percentile => USA and Europe
• 99th percentile => Asia and South America ..
• .. and USA too!
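Looking at percentiles per region, rather than one global average, is what surfaces this kind of split. Here is a minimal sketch using the nearest-rank method; the region names and sample values are invented for illustration.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of values at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def load_time_summary(samples_by_region, pcts=(50, 99)):
    """Compute the requested percentiles of load time for every region."""
    return {
        region: {p: percentile(times, p) for p in pcts}
        for region, times in samples_by_region.items()
    }


# Invented load times in seconds.
summary = load_time_summary({
    "region_a": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "region_b": [5, 5, 6, 7, 9, 12, 15, 20, 30, 60],
})
```

A region can have a healthy median and still contribute to the slow tail, which matches the ".. and USA too" observation.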
Load time over time
• Daily shipping is awesome, but..
• .. avoid death by a thousand cuts
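"Death by a thousand cuts" is easy to miss when each release is only compared with the previous one. A sketch of comparing against a fixed baseline instead; the 10% budget, the 4000 ms baseline, and the 1% growth per release are all assumed numbers.

```python
def largest_step_pct(history_ms):
    """Largest release-over-release increase, in percent."""
    return max((b - a) / a * 100 for a, b in zip(history_ms, history_ms[1:]))


def drift_from_baseline(baseline_ms, history_ms, budget_pct=10.0):
    """Compare the latest load time with a fixed baseline, not the previous release."""
    drift_pct = (history_ms[-1] - baseline_ms) / baseline_ms * 100
    return {"drift_pct": drift_pct, "over_budget": drift_pct > budget_pct}


baseline_ms = 4000.0
# Twelve hypothetical releases, each 1% slower than the one before it.
history_ms = [baseline_ms * 1.01 ** i for i in range(13)]

worst_step = largest_step_pct(history_ms)            # about 1% per release
drift = drift_from_baseline(baseline_ms, history_ms) # about 12.7% in total
```

No single release looks alarming here, yet the cumulative drift has already blown through the budget.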
Health II: Debugging
• We shipped with one traffic graph
• Had Vertica reports, but they took a long time
• Lots of changes going on at many layers
• We needed to debug in real time
Zops Dashboard
• We built a responsive ops dashboard
• Aggregates data from:
  - Splunk, DBs, Nagios, services
• Be aware of external events
  - Social network, browser upgrades, soccer, ISPs, infrastructure providers, royal weddings
Health III: Scaling
• Concurrents at 20M DAU? No problem
• However, our app is write-intensive
• User data keeps growing
• Watch out for data per user
Improving memory per request
• Blob analyzer
• Hidef - low memory shared constants
• Blob splitting to add new worlds
• ODUS - lazy serialization extension
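Two of these ideas can be sketched generically: measuring which part of a user blob takes the most space, and decoding a section only when it is first read. ODUS and the blob analyzer are proprietary, so the code below is not their implementation; the names, the JSON encoding, and the sample blob are assumptions.

```python
import json


def analyze_blob(blob):
    """Return (key, serialized size in bytes) pairs, largest first."""
    sizes = {
        key: len(json.dumps(value, separators=(",", ":")).encode("utf-8"))
        for key, value in blob.items()
    }
    return sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)


class LazyBlob:
    """Keep each section serialized until something actually reads it."""

    def __init__(self, raw_sections):
        self._raw = dict(raw_sections)   # section name -> JSON text
        self._decoded = {}               # sections decoded so far

    def get(self, section):
        if section not in self._decoded:
            self._decoded[section] = json.loads(self._raw[section])
        return self._decoded[section]

    def decoded_sections(self):
        return sorted(self._decoded)


# Hypothetical user blob.
blob = {"coins": 5, "buildings": [{"id": i} for i in range(100)]}
report = analyze_blob(blob)

lazy = LazyBlob({key: json.dumps(value) for key, value in blob.items()})
coins = lazy.get("coins")   # only the "coins" section is decoded
```

A request that only needs the coin balance never pays to decode the building list, which is the memory saving lazy serialization is after.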
Future directions
• Better content tools
• Experimenting with HipHop
• Scheduler for animations
• Extending tests for performance testing
Parting thoughts
• Control >= game code
• Tools >= user facing code for control
• Top of mind:
• Content, monitoring, perf and releasing
• Assume change and watch out for it
Acknowledgements
• CityVille team
• Zynga Shared Technology
• Zcloud team
• CityVille Tencent team
• Anyone else that I missed
Thank you!
Reaching me:
• @ayyar on Twitter
• kayyar [at] zynga
• www.quora.com/Kartik-Ayyar