Your SlideShare is downloading. ×
0
Zero to ten million daily users in four weeks: sustainable speed is king
Zero to ten million daily users in four weeks: sustainable speed is king
Zero to ten million daily users in four weeks: sustainable speed is king
Zero to ten million daily users in four weeks: sustainable speed is king
Zero to ten million daily users in four weeks: sustainable speed is king
Zero to ten million daily users in four weeks: sustainable speed is king
Zero to ten million daily users in four weeks: sustainable speed is king
Zero to ten million daily users in four weeks: sustainable speed is king
Zero to ten million daily users in four weeks: sustainable speed is king
Zero to ten million daily users in four weeks: sustainable speed is king
Zero to ten million daily users in four weeks: sustainable speed is king
Zero to ten million daily users in four weeks: sustainable speed is king
Zero to ten million daily users in four weeks: sustainable speed is king
Zero to ten million daily users in four weeks: sustainable speed is king
Zero to ten million daily users in four weeks: sustainable speed is king
Zero to ten million daily users in four weeks: sustainable speed is king
Zero to ten million daily users in four weeks: sustainable speed is king
Zero to ten million daily users in four weeks: sustainable speed is king
Zero to ten million daily users in four weeks: sustainable speed is king
Zero to ten million daily users in four weeks: sustainable speed is king
Zero to ten million daily users in four weeks: sustainable speed is king
Zero to ten million daily users in four weeks: sustainable speed is king
Zero to ten million daily users in four weeks: sustainable speed is king
Zero to ten million daily users in four weeks: sustainable speed is king
Zero to ten million daily users in four weeks: sustainable speed is king
Zero to ten million daily users in four weeks: sustainable speed is king
Zero to ten million daily users in four weeks: sustainable speed is king
Zero to ten million daily users in four weeks: sustainable speed is king
Zero to ten million daily users in four weeks: sustainable speed is king
Zero to ten million daily users in four weeks: sustainable speed is king
Zero to ten million daily users in four weeks: sustainable speed is king
Zero to ten million daily users in four weeks: sustainable speed is king
Zero to ten million daily users in four weeks: sustainable speed is king
Zero to ten million daily users in four weeks: sustainable speed is king
Zero to ten million daily users in four weeks: sustainable speed is king
Zero to ten million daily users in four weeks: sustainable speed is king
Zero to ten million daily users in four weeks: sustainable speed is king
Zero to ten million daily users in four weeks: sustainable speed is king
Zero to ten million daily users in four weeks: sustainable speed is king
Zero to ten million daily users in four weeks: sustainable speed is king
Zero to ten million daily users in four weeks: sustainable speed is king
Zero to ten million daily users in four weeks: sustainable speed is king
Zero to ten million daily users in four weeks: sustainable speed is king
Zero to ten million daily users in four weeks: sustainable speed is king
Zero to ten million daily users in four weeks: sustainable speed is king
Zero to ten million daily users in four weeks: sustainable speed is king
Zero to ten million daily users in four weeks: sustainable speed is king
Zero to ten million daily users in four weeks: sustainable speed is king
Zero to ten million daily users in four weeks: sustainable speed is king
Zero to ten million daily users in four weeks: sustainable speed is king
Zero to ten million daily users in four weeks: sustainable speed is king
Zero to ten million daily users in four weeks: sustainable speed is king
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×
Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

Zero to ten million daily users in four weeks: sustainable speed is king

933

Published on

A successful social game can achieve very rapid growth to large scale during a short period of time, with extreme cases (CityVille, The Sims Social, Gardens of Time) achieving millions of daily active …

A successful social game can achieve very rapid growth to large scale during a short period of time, with extreme cases (CityVille, The Sims Social, Gardens of Time) achieving millions of daily active users and tens of millions of monthly active users within a few weeks of launch. In this presentation I will argue that achieving the fastest possible sustainable speed of development is the most important technical factor in launching and operating a successful social game, or indeed any large-scale web business -- whether at startup phase, during rapid growth, or established and mature. I will propose some techniques, both technical and non-technical, to optimize rapid development at large scale. Finally, I will describe some of the typical architecture, processes, and tools used in the development of both the transactional and analytical systems of social games.

Published in: Technology
0 Comments
2 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
933
On Slideshare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
0
Comments
0
Likes
2
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide
  • Presentation: "Zero to ten million daily users in four weeks: sustainable speed is king"Track: Highly-available systems / Time: Wednesday 15:20 - 16:20 / Location: FlemingA successful social game can achieve very rapid growth to large scale during a short period of time, with extreme cases (CityVille, The Sims Social, Gardens of Time) achieving millions of daily active users and tens of millions of monthly active users within a few weeks of launch. In this presentation I will argue that achieving the fastest possible sustainable speed of development is the most important technical factor in launching and operating a successful social game, or indeed any large-scale web business -- whether at startup phase, during rapid growth, or established and mature. I will propose some techniques, both technical and non-technical, to optimize rapid development at large scale. Finally, I will describe some of the typical architecture, processes, and tools used in the development of both the transactional and analytical systems of social games.
  • Variety of target audiences, different problems Habbo, Betfair,NCSoft, Atari, Playfish, PlumbeeI am an “organizer”, always looking for the connections between lots of different piecesSo this talk is a bit different, it’s about the big picture and touches just a bit on many different topics
  • I’m trying to apply the lessons I’ve learned from massive, high-growth systems to the startup space= on a budget!!
  • Not enough to crunch to get one feature doneWeeks, months, and years
  • There’s a lot of thinking about speed these days, it goes under different names agile, lean, radical managementReturns greater => you see them earlier, so they accumulate over more timeInvestments are less => there’s a bunch of things you won’t have to do
  • My first lesson: be agileStarting off with agile, because agile is sustainable speed 101I include in agile any framework for incremental deliveryThings like deliver frequently, embrace change, maximize the work not doneThe best places don’t “do” agile, they “are” agile.Process is a tool, a means to an end -- find what works for your business and don’t be afraid to change it
  • Automate everything, but especially the routine work of build, test, deployMachines are faster than people at routine workMachines make fewer mistakes than people when doing routine work, which also makes things faster
  • Don’t try to make process perfect from the startDon’t try to make a perfect system from the startDo what needs doing
  • Has a cumulative effect over development iterationsBecause you can incur it retroactively (you realize later there was a better way)
  • 1. User edits config in googlespreadsheet2. User requests deployment to stable3. Transformer pulls data, transforms to XML4. Transformer commits to Git5. Transformer tells Jenkins to deploy6. Jenkins fetches XML and packages7. Jenkins uploads new config package to S38. Jenkins updates environment config in S3 with new version number9. Jenkins bounces server in stable10. Server gets new config location from S311. Sever reloads config12. Jenkins emails userGoogle docs spreadsheet (CT)TransformerGit (CT)Jenkins (CT)S3 (CT)
  • Dev communities => easy and quick to hire fluent developers, easy access to informationLots of open source components => you can use to make things fasterEncourages componentization => your devs will make things that can be recombined and re-leveraged
  • You don’t have to buildYou don’t have to maintain
  • Push launch dates forward and backward without concern for hardware lead times; never be caught without capacity100 to 1 ratio of servers to opsLower knowledge barrier, lower risk if we choose to migrateThe amount of engineering that goes on! Availability zones, highly reliable distributed storage, block store migrated between machinesDesign for failure tolerance and horizontal server scaling
  • Splitting values into multiple fields means don’t read/write whole thing every time (better performance)Secondary indexes have to be maintained manually, if needed
  • - Reads esp. of friend data offloaded to slave- Shards have to be manually rebalanced with a custom tool
  • You can now get even highly redundant storage at ~10 cents per GB-month = Storing hundreds of GB a day is quite reasonableSo what excuse to not collect everything and keep it forever?Better to collect data through events – can capture non-persistent things, done real time instead of in bulk = more timely, doesn’t overload databasesAt plumbee, we capture the content of every server call and every database write = can recreate any session
  • Personalization: price points,payment methods, discounts,bonuses,make recommendations,
  • Each user who visits while a test is running is permanently assigned to a variant in that test according to the variant weights (may be the control variant)Each combination of variants for all running tests is potentially directed to a different server group => this allows serving totally different functionality to that user
  • Collect everythingSegue: how we do this @ PlumbeeWhat you can do with dataSegue: how we do split testing @ PlumbeeWhy you need this for speed
  • Motivated by object-oriented programmingWe encapsulate data & functions on that data into one componentAn extrapolation of OOP to distributed systemsOrganize system according to functionalityEach block of functionality is called a serviceEach service is developed and deployed independentlyServices interact only through remote calls
  • … and in order to motivate those, I want to talk a bit philosophically about building a software-as-a-service organization.
  • Think about your organization as a system designed for two thingsthe constant delivery of softwarethe operation of constantly changing software. This means you should organize to optimize flow: the flow of software from development to operations the flow of information from operations back into developmentAll of this creates a positive feedback loop that will continuously improve the service you are operating.
  • When designing distd system, Minimize comm + long distance commSo how do you deto optimize flow? Any technical people in the audience will recognize that when designing distributed systems you try to minimize communication across the network because it’s the most expensive and slow thing to do, especially the further you have to send the message. It’s the same thing with an organization – the more people have to communicate in order to get something done, and the more widely spread those people are, the slower things are going to happen. Seems obvious -- the best way to organize is so that the people who have to communicate the most are in the same team.
  • The best way to optimize for flow and minimizecross-team communications is to create small, cross-functional teams who are completely dedicated to a particular product or slice of functionality. These teams should contain all of the skill sets necessary to deliver and operate a service, relying as little as possible on other teams. Small teams can operate very rapidly because fewer people are communicating to make a decision and get things done, which also means that less process is needed to manage that communication. When all the needed skill sets are available to the team, there are no dependencies or gates that can slow things down. Also, restricting the size of the team also means you are forced to restrict the size of the product that the team is tackling. This means that everyone on the team can fully understand what they are trying to achieve, and there’s a greater sense of responsibility for each team member to contribute to making their product better. So how small should your team be? Evidence suggests that the optimal team size is what Amazon calls a two-pizza team – if you can’t feed the team with two pizzas, the team is too large. When this happens the team should be split into smaller teams, each with its own separate full slice of functionality.
  • These are roles, not people – some people may cover more than oneThis is not exhaustive, more roles might be requiredEngineers that cover every required skill set (actionscript, javascript, java)Yes I’m oversimplifying
  • Internal “commodity tech” Faster developmentTeam independenceReduced scope for each teamFlexibilityDevelopment scalabilityGrow through division
  • To encourage use of commodity tech… a postmoden programming mentalityJames Noble and Robert Biddle
  • Google Test Automation ConfWhat’s really dead: taking away the responsibility for quality from the people who create the workThe best teams have the least QA.The last one is the most important, but it can’t be determined by test => product needs to be in the hands of the user!Testing less is actually LESS risky =>(1) you get to the most important test of all (with the user) quicker and so minimize your greatest risk (2) your bugs are less costly because they can do less damage because you can fix them quicker
  • It should involve experimental design (stating hypothesis, removing confounding factors) and proper statistical analysis -- this is rareSimulating actual user behaviour is hard, predicting user behaviour even harderIn fact, you already have gradual rollout – you need same infrastructure for split-testing!Every new product version should be a split test
  • More reliabilityBetter monitoringBetter loggingSelf-managing systems
  • Good developers by nature are carefulHire the best => then need to encourage them to lean forward all the time, “move fast and break things”The best way to get sustainable speed is to create a high-speed culture.
  • Transcript

    • 1. Zero to ten million daily users in fourweeks: sustainable speed is king Jodi Moran, CTO, Plumbee 1
    • 2. Who am I?• Mass market web entertainment for more than 8 years• Small businesses, large businesses• Small teams, large teams• Variety of products• I’m all about the big picture 2
    • 3. About social games • Free-to-play games on Facebook, monetized with microtransactions • Highly interactive • Cost per user matters • Can grow very quickly 3
    • 4. Case study: The Sims Social • Released mid-August 2011 • By mid-September 2011 – 10 million daily active users – 65 million monthly active users – 1 TB of analytics data collected daily 4
    • 5. About Plumbee• Social casino games• Development started October 2011 with 3 engineers• 5 engineers Dec 2011, 8 engineers today• Launching first product in just a few weeks 5
    • 6. What is sustainable speed? • Speed measured by end-to-end time for each change • Sustainability measured by maintaining speed over long time periods 6
    • 7. Why sustainable speed? • Responsiveness – To fickle audience – To changing competition – To changing platform • Returns are greater • Investments are less 7
    • 8. Achieving sustainable speed • Iterate and automate • Use commodity technology • Analyse and improve • Build services • Create a high-speed culture 8
    • 9. Achieving sustainable speed • Iterate and automate • Use commodity technology • Analyse and improve • Build services • Create a high-speed culture 9
    • 10. Be agile • Framework for incremental delivery • Incremental delivery (small batches) by definition improves end-to-end time • Framework for reflecting on process • Focus on principles, not practices: process is a means to an end 10
    • 11. Automate routine work Code build and Deployment of code Provisioning of test execution of unit and configuration to environment tests test environment Execution of Provisioning of Promotion of build to automated end-to- production production end regression test environment suite Deployment of code and configuration to production environment 11
    • 12. Isolate changes • Makes problem causes easy to identify • Place high-risk parts of the system on different release tracks from low-risk parts • Each release track can have a different cadence • At Plumbee: Client, server, application configuration / content, environment configuration each separately versioned and independently releaseable 12
    • 13. Make it minimally viable first • Launch with minimal product … and minimal process … and minimal tech • “If you aren’t embarrassed by your first launch, you didn’t launch early enough” 13
    • 14. Prepare for technical debt • Too much slows you down • But it’s not possible to avoid • So you will need a way to keep it under control • Take it on intentionally when needed 14
    • 15. Case study: content tools • Special case of change isolation / automation • Games have a lot of “configuration” or content that needs to be tweaked and balanced • Content tools usually end at build stage • At Plumbee: – Edit with familiar interface – Button click to deploy to playtesting – Button click to deploy to live 15
    • 16. Content editing tools 16
    • 17. Iterate and automate • Small batches reduce end-to-end time • Small batches help you find problems faster • Automation makes things faster • Automation reduces errors 17
    • 18. Achieving sustainable speed • Iterate and automate • Use commodity technology • Analyse and improve • Build services • Create a high-speed culture 18
    • 19. Use commodity languages• Large developer communities• Many open source components• Aids and encourages componentization and reuse• For example: Java, Javascript, Actionscript 19
    • 20. Use third-party services• World-class features Internal• Low opportunity Company email, calendars, documents, accounting, HR, cost bug-tracking• Maintain business focus Technical Version control, build systems, monitoring, analytics, infrastructure User- facing Customer support, bulk and transactional email 20
    • 21. Virtualized infrastructure with AWS • Flexibility & agility • Small operations team • Infrastructure, not platform • Advanced features • Forces good software practice 21
    • 22. Case study: Highly-scalable storage with commodity tech 22
    • 23. Plumbee data access patterns • Thick client means data is cached client side • High ratio of writes to reads • User primarily reads and writes their own data • Secondarily, reads and rare writes of friends’ data 23
    • 24. Plumbee data storage: micro view • Data stored in key-value form, key is user id • Multiple values stored against the user id • Each value is a data structure serialized to binary format with Google Protocol Buffers • {userid, valueid, value} tuples stored in single table in InnoDB/MySQL: {int, int, blob} • Transactions managed with (modified) Spring / AspectJ 24
    • 25. Plumbee data storage: macro view • MySQL on multi-AZ RDS • Read slaves handle e.g. reads of friend data • Users are spread across many shards • Shards are managed with custom library • Users allocated using simple round-robin to shards, shard mapping persisted 25
    • 26. The results • By using commodity tech + services: MySQL/InnoDB, RDS, Java, Spring/AspectJ, GPB • We have: – Fast access for use cases – Easy to understand and use – No downtime for schema changes – Easy monitoring and tuning – Horizontal scaling – Highly reliability – Automatic failover (with replica reassignment) – Easy snapshot backups • All with just a few man-weeks of effort! 26
    • 27. Commodity technology• Easy and cheap to acquire• Easy to hire people who know it• Quick assembly of product from many parts• Easy to change 27
    • 28. Achieving sustainable speed • Iterate and automate • Use commodity technology • Analyse and improve • Build services • Create a high-speed culture 28
    • 29. Collect user data • Never too much data: collect everything and store it forever • Collect data through events • At Plumbee: we collect the entire content of every request and every database write 29
    • 30. Collect system data • Just another kind of analytics • Instead of “what is user doing”, “what is system doing” • Collect and use system data alongside user data • Report on and monitor both user and system metrics 30
    • 31. Analytics with commodity tech 31
    • 32. What can you do with data? • Reporting • Monitoring • Data mining • Predictive analytics • Personalization • Split-testing 32
    • 33. Split testing • Run controlled experiments to determine how changes affect users • To do this: assign users randomly to one of several product versions, called “variants” • Tag all collected events with variant • Calculate metrics are separately for each variant • Perform statistical tests to determine whether the difference in metric is significant 33
    • 34. Simple random sampling variants = empty; foreach (test in currently running tests) { selectedVariant = test.getStoredVariantForUser(user); if (selectedVariant == null) { if (shard in test) { selectedVariant = test.chooseRandomWeightedVariant(); user.storeVariantForTest(test, selectedVariant); } } variants.addTestAndVariantPair(test, selectedVariant); } serverGroup = getServerGroupForVariants(variants); serverGroup.forwardRequest(request, user, shard, variants); 34
    • 35. Simple significance testing • Conditions – Metric to be improved is a proportion: e.g. percent of users converting to spender. – Proportions are not too close to 0 or 1 – Independent samples – Random sampling • Result: super-simple test for confidence (z-test) that runs in linear time wrt size of test 35
    • 36. Analyse and improve • Analysing your system tells you how to improve it • The more accessible and timely your data, the quicker your decision-making • And the greater your responsiveness to changes • Good analysis and split-testing means you do less work! 36
    • 37. Achieving sustainable speed • Iterate and automate • Use commodity technology • Analyse and improve • Build services • Create a high-speed culture 37
    • 38. What are services? • Essential quality: data & functions on that data combined into one component • Data only accessible through remote API • Each service is developed, deployed, and operated independently of other services • “Service-oriented architecture” is an extrapolation of object-oriented programming to distributed systems 38
    • 39. Technical benefits of SOA • Scalability improvements: data is partitioned • Performance improvements: data storage optimized for specific use cases • System availability improvements: system can fail in parts • But there’s other benefits too… 39
    • 40. Consider the round-trip Concept Live system Development analysis Operation 40
    • 41. Apply distributed systems design • Minimize communication – especially long-distance communication • Make local progress • Optimize the 90% case • Place people who need to communicate the most in the same team 41
    • 42. Match architecture and organization • Create little startups – small cross-functional teams – completely accountable for a specific service – act independently of other teams • “Two-pizza teams” – Small teams can act fast – Scope is restricted – Everyone can understand team purpose – Greater sense of shared responsibility 42
    • 43. A social game team Business, product management Production, process Art, design Data analysis Marketing, community Automated Engineering infrastructure 43
    • 44. Build services • Small, independent, cross-functional teams can iterate quickly • Forming little startups creates organizational scalability • Services act as internal “commodity tech” that can be easily reused 44
    • 45. Create a high-speed culture • Iterate and automate • Use commodity technology • Analyse and improve • Build services • Create a high-speed culture 45
    • 46. Post-modern programming• Glue code is beautiful• “Functionality is an asset, code is a liability”• Embrace heterogeneity, don’t try to standardize• There is no big picture 46
    • 47. Test is dead • Alberto Savoias keynote at GTAC 2011 • What are tests for? – Is the functionality what the developer intended? – Is the functionality what the product owner intended? – Is the functionality what the user wants? • Testing less is less risky 47
    • 48. Corollary: load test is dead • (Good) load tests are very difficult • To test ability to add capacity? – Design for horizontal scalability • To predict capacity needed for more users? – If you have IaaS, good design, good monitoring, and speed, you can scale just-in-time • To predict capacity needed for new features? – Invest in ability to do dark launches and gradual rollouts 48
    • 49. And also: operations is dead • All engineers need mission-critical mentality • When all engineers operate the system, the constraints and requirements of the operational environment will always be taken into account • At Plumbee: all engineers have (audited) access to production, and all engineers take turns on call 49
    • 50. High-speed culture • Hire the best people • Get them passionate about the product • Provide easy access to information • Give them freedom & responsibility • Trust them to make decisions • Encourage them to take risks 50
    • 51. Sustainable speedFrom this… Get this! • Iterate & automate • Get ahead of your • Use commodity customers, competition technology , and platform • Analyse & improve • Achieve greater returns • Build services • Do less work! • Create a high-speed culture 51
    • 52. Come and help us!www.plumbeegames.com 52

    ×