Gdco12 kartik ayyar

Published in: Technology
  • .. or "How being a gamer can teach you how to run a live game"
  • Have continuous integration testing, though not continuous deployment
  • Release is a vast superset of building. It is very different from a shrink-wrapped product: not an SCM admin problem but a distributed systems problem. Client and server must be on the same version.
  • Always measure what performance is doing for your game.
  • Try to replicate production data accurately for certain performance bugs, e.g. a recurring n^2 loop over a data structure that was initialized very differently in production than in test environments.
  • Timescale: Mar-2011 – Sep 2012
  • We started with just a proxy graph. We built a new dashboard that is now adopted Zynga-wide. Many data sources allow cross-referencing: How are users engaging with our features? How is our infrastructure behaving? What are the key performance metrics of our game? How are we affected by external services?
  • We are an update-heavy application, unlike many other websites. User state grows with time, and leaving it unchecked can translate to higher server inefficiencies.
  • ODUS and Hidef are both proprietary native PHP extensions. Small changes can have a big impact: quest hook caching, AOE optimization.
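
The notes above describe release as a distributed systems problem in which client and server end up on the same version, and the slides say that all steps must pass before a release. A minimal Python sketch of that gating idea follows. It is not Zcon's actual implementation; `Check`, `run_release_gate`, and `versions_match` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Check:
    """One named release step (hypothetical; e.g. unit tests, CDN verification)."""
    name: str
    run: Callable[[], bool]


def run_release_gate(checks: List[Check]) -> Tuple[bool, List[str]]:
    """Run every check and allow the release only if all of them pass.

    Returns (ok, names_of_failed_checks).
    """
    failed: List[str] = []
    for check in checks:
        try:
            ok = check.run()
        except Exception:
            # Paranoid by design: a check that crashes counts as a failure.
            ok = False
        if not ok:
            failed.append(check.name)
    return (not failed, failed)


def versions_match(client_version: str, server_versions: List[str]) -> bool:
    """True only if at least one server reported and all match the client build."""
    return bool(server_versions) and all(
        version == client_version for version in server_versions
    )
```

A gate built this way reports every failing step rather than stopping at the first one, which makes a blocked release easier to diagnose.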

    1. CityVille: Lessons learned & tools used to run a large social game • Kartik Ayyar - @ayyar • Studio CTO, Zynga
    2. Me • Kartik Ayyar • CTO of CityVille @ Zynga
    3. What is CityVille? • Casual, social city-building game • Largest social game by MAU peak • Winner, Social Game of the Year, IAIS ’11 • Winner, Crunchie, Best Time Sink App ’10
    4. What can I learn from it? • Grew rapidly between 12/2010 and 1/2011 • Top overall social game of 2011 • Lessons of velocity and scale are general
    5. It's alive. Congrats. What now? • Stay on target • Grow • Ship • Keep your game healthy
    6. Grow
    7. Growth I: Server growth • Most learning here happened before us • 3 tiers: Web tier, MemCache, MySQL • MySQL mostly used as NoSQL • Very sharding-friendly architecture • General flow: Client -> Web -> MemCache -> MySQL
    8. Observations • Developers think about the game • Insulated from persistence and queries • Writeback caching • Migration and sharding are invisible • Failure recovery is under the hood
    9. We did have some hiccups • Persistence relies on loose typing • Very easy to add data • Also very easy to modify data • Having many friends taxed some code
    10. Ship
    11. How do we ship code? • Ship 2-4 times daily, 4-5 days a week • Code profile keeps moving • Taxed our testing and release tools
    12. Lesson I: Content • Content is core to our game • Started with a hand-edited text file • One giant database in a text file • Gave us iteration flexibility • Thankfully, we fixed these post launch
    13. Content tools • Built Game Chef post launch • Replace yourself as engineers • Tools and tests are game changers
    14. Shipping Lesson II: Release tools • There were subtle bugs in the tools • This was a distributed systems problem • Also your rollback tool
    15. Release tool • Zcon: parallel and paranoid release tool • - Runs PHP and FlexUnit tests • - Performs and verifies CDN uploads • - Checks for unpropagated commits • - Includes notifications • All steps must pass before a release
    16. Ship lesson II: Testing • Not enough automation at release • Lots of major iterations pre launch • Inaccuracies in testing • Inadequate unit tests at launch • Thankfully, we fixed this
    17. Automated testing • Enter automated testing: • - Unit tests, via PHPUnit / FlexUnit • - VM cluster running Genie tests • - Mandatory for new features to add unit tests • Cut down test times to 45 minutes • Lots of bugs caught earlier
    18. Health
    19. Health I: Performance • Treat performance as a first-class feature • Keep running, keep experimenting • Measure load time / FPS vs. business metrics
    20. Different profiling strategies • Runtime, programmatic: CIPRO, reports • Runtime, interactive: Monocle • Summary, for alerting: Zops
    21. Load time performance • Load time depends on geography • Good geographies are CPU bound • Bad geographies are network bound • 1st percentile => USA and Europe • 99th percentile => Asia and South America ... • ... and USA too!
    22. Load time over time • Daily shipping is awesome, but ... • ... avoid death by a thousand cuts
    23. Loading optimizations • Network: compress, cache, prefetch • CPU: lazy processing, spread processing • Understanding dependencies is key • Keep experimenting
    24. Rendering performance • 2010: Shipped with a display list engine • 2011: Switched over to blitting • Mostly bypasses the Flash display list • Uses low-level copyPixels() APIs • 2012: CityVille GPU
    25. CityVille GPU: work in progress
    26. Health II: Debugging • We shipped with one traffic graph • Had Vertica reports, but they took a long time • Lots of changes going on at many layers • We needed to debug in real time
    27. Zops Dashboard • We built a responsive ops dashboard • Aggregates data from Splunk, DBs, Nagios, services • Be aware of external events: social network, browser upgrades, soccer, ISPs, infrastructure providers, royal weddings
    28. Health III: Scaling • Concurrents at 20M DAU? No problem • However, our app is write intensive • User data keeps growing • Watch out for data / user
    29. Improving memory per request • Blob analyzer • Hidef: low-memory shared constants • Blob splitting to add new worlds • ODUS: lazy serialization extension
    30. Future directions • Better content tools • Experimenting with HipHop • Scheduler for animations • Extending tests for performance testing
    31. Parting thoughts • Control >= game code • Tools >= user-facing code for control • Top of mind: content, monitoring, perf and releasing • Assume change and watch out for it
    32. Acknowledgements • CityVille team • Zynga Shared Technology • Zcloud team • CityVille Tencent team • Anyone else that I missed
    33. Thank you! • Reaching me: • @ayyar on Twitter • kayyar [at] zynga • www.quora.com/Kartik-Ayyar
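
Slides 7 and 8 describe a Client -> Web -> MemCache -> MySQL flow with writeback caching over a sharding-friendly store. The sketch below illustrates that general pattern in Python, with plain in-memory dictionaries standing in for MemCache and the MySQL shards. It is an assumption-laden illustration, not Zynga's persistence layer: the `WriteBackStore` class, its method names, and the CRC32-modulo shard choice are all hypothetical.

```python
import zlib
from typing import Dict, List, Set


class WriteBackStore:
    """Toy write-back cache in front of sharded storage (illustrative only)."""

    def __init__(self, shard_count: int) -> None:
        self.cache: Dict[str, dict] = {}  # stands in for MemCache
        self.dirty: Set[str] = set()      # users changed since the last flush
        # Each dict stands in for one MySQL shard used as a key-value store.
        self.shards: List[Dict[str, dict]] = [{} for _ in range(shard_count)]

    def _shard_for(self, user_id: str) -> Dict[str, dict]:
        # Assumed sharding rule: stable hash of the user id modulo shard count.
        index = zlib.crc32(user_id.encode("utf-8")) % len(self.shards)
        return self.shards[index]

    def load(self, user_id: str) -> dict:
        """Read through the cache, falling back to the user's shard on a miss."""
        if user_id not in self.cache:
            self.cache[user_id] = dict(self._shard_for(user_id).get(user_id, {}))
        return dict(self.cache[user_id])

    def save(self, user_id: str, state: dict) -> None:
        """Write to the cache only; the shard is updated later by flush()."""
        self.cache[user_id] = dict(state)
        self.dirty.add(user_id)

    def flush(self) -> int:
        """Persist all dirty users to their shards; return how many were written."""
        written = 0
        for user_id in sorted(self.dirty):
            self._shard_for(user_id)[user_id] = dict(self.cache[user_id])
            written += 1
        self.dirty.clear()
        return written
```

Game code in this sketch only calls `load` and `save`; which shard holds a user and when the write reaches it stay hidden, which is the insulation slide 8 describes. The trade-off of write-back caching is that unflushed state is lost if the cache fails before `flush` runs.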
