When Devs Do Ops

At wooga the separate game teams operate their own games. That means that two developers not only develop the backend for a social game but also take care of the administrator's part.
This presentation gives an insight into how this is done, which tools are used, and how the most important challenges were solved.




Usage Rights

© All Rights Reserved


When Devs Do Ops Presentation Transcript

  • When Devs Do Ops: 1,000,000 daily users and just two developers
    - Jesper Richter-Reichhelm, Head of Engineering, wooga
  • About
    - Founded January 2009
    - Funding: Founders, Balderton Capital, Holtzbrinck Ventures (total of …)
    - International team of …0 from 20 countries in Berlin
    - … games on Facebook, 20m active users
    - Biggest European social game developer, … world wide
    - Launched May 2010
    - Biggest seller of magic wands in the world
    - Total team size is 1… (2 backend, … frontend)
    - Hosted at Facebook
    - Flash client, Ruby on Rails backend, MySQL + Redis DB
    - Only …% of users from advertising
    - …0% of users are female (age 20–40)
  • When Devs Do Ops
    - Starting Point
    - Finding Helpers
    - Challenges and Solutions
    - Looking back
  • Starting Point
    - In October 2009 we set out to build a backend for wooga's first game with a persistent world.
    - Our goal was to have more than 1,000,000 daily active users.
    - We had never done anything like this before (who had?)
  • Hosting model must fit the needs
    - Small team dedicated to a single game
      - 2 backend folks do both development and operations
    - "Extreme" life cycle of a game (graphic by RightScale)
    - We simply did not know what to expect
      - Scale up hosting when you are successful, not before!
  • When Devs Do Ops: Finding Helpers
  • Focus on what you do best… and get help for the rest
    - Amazon Web Services
      - Easy to scale up and down
      - No limitations
    - Scalarium
      - Makes operating a large cluster easy
      - Provides a default setup
    - New Relic
      - Profiling of the application at runtime
      - Info from the HTTP request down to the SQL query
  • When Devs Do Ops: Challenges and Solutions
  • Challenge: Growing traffic
    [Chart: growing traffic; y-axis from 0 to 1,200,000; x-axis dates from 4/22/14 to 11/22/14]
  • Solution: Automate to scale up and out easily
    - Scaling up
      - Application servers: 2 cores => 8 cores
      - DB servers: 7.5GB => 68GB of RAM
    - Scaling out
      - Application servers: 2 => up to 50
      - MySQL servers: 2 => 16 => 8
    - Easy installation by automation
      - Chef recipes managed by Scalarium make that easy
  • Challenge: Idle servers cost money, too!
    - Peak : valley ratio: 20:1 on VZ, 5:1 on Facebook
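To make the cost argument concrete: if a fleet is sized once for peak traffic and never scaled down, the peak : valley ratio alone says how much of it sits idle at the quietest hour. The sketch below is plain arithmetic on the two ratios from the slide, assuming capacity scales linearly with the number of servers; the function name is hypothetical.

```python
from fractions import Fraction


def idle_share_at_valley(peak_to_valley: int) -> Fraction:
    """Fraction of a peak-sized fleet that is idle at the daily low point.

    Assumes the fleet is provisioned once for peak load and that capacity
    scales linearly with the number of servers.
    """
    load_at_valley = Fraction(1, peak_to_valley)  # valley load relative to peak
    return 1 - load_at_valley


# Peak : valley ratios from the slide
for network, ratio in {"VZ": 20, "Facebook": 5}.items():
    idle = idle_share_at_valley(ratio)
    print(f"{network}: {float(idle):.0%} of peak capacity idle at the valley")
```

On VZ that is 19/20 of the fleet idling at night, on Facebook 4/5, which is why the bigger ratio made time-based instances pay off most there.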
  • Solution: Run servers only when needed
    - Scalarium offers time- and load-based instances
      - Start and stop instances based on time
      - Start and stop instances based on load
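Time-based and load-based instances can be combined into one decision: the schedule sets a floor, and load moves the count above it. This is a minimal sketch of that idea, not Scalarium's implementation; the `ScalingPolicy` fields, the peak window, and the CPU thresholds are invented for illustration, and only the cap of 50 application servers comes from the talk.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class ScalingPolicy:
    # Hypothetical numbers: the talk gives no schedule or thresholds.
    base_instances: int = 2           # always running
    peak_start: time = time(17, 0)    # time-based instances start here...
    peak_end: time = time(23, 0)      # ...and stop here (window must not cross midnight)
    peak_extra: int = 4               # time-based instances added during the window
    cpu_up: float = 0.75              # add a load-based instance above this average CPU
    cpu_down: float = 0.30            # remove one below this average CPU
    max_instances: int = 50           # "up to 50" application servers, per the talk


def desired_instances(policy: ScalingPolicy, now: time, running: int, avg_cpu: float) -> int:
    """Return how many instances should run after this evaluation."""
    # Time-based part: the schedule sets a floor.
    in_peak = policy.peak_start <= now < policy.peak_end
    floor = policy.base_instances + (policy.peak_extra if in_peak else 0)

    # Load-based part: move one step up or down per evaluation.
    target = running
    if avg_cpu > policy.cpu_up:
        target += 1
    elif avg_cpu < policy.cpu_down:
        target -= 1

    # Clamp between the time-based floor and the hard cap.
    return max(floor, min(target, policy.max_instances))


policy = ScalingPolicy()
print(desired_instances(policy, time(20, 0), running=6, avg_cpu=0.9))  # 7
print(desired_instances(policy, time(3, 0), running=6, avg_cpu=0.1))   # 5
```

Stepping one instance per evaluation keeps the load-based part from overreacting to a single noisy reading; the trade-off is a slower reaction to sudden spikes, which the time-based floor partly covers.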
  • Solution: Run servers only when needed
  • Challenge: No application is perfect
    - Do you know your application's behavior?
      - How is it used?
      - What's the throughput right now?
      - Which HTTP requests stress the DB the most?
    - What changed in the last release?
      - How up-to-date is your information?
      - Can you compare performance now with last week's?
  • Solution: New Relic provides trace information
  • Solution: New Relic provides overviews
  • Solution: New Relic provides custom charts
    - Screenshot: Scalarium
  • Challenge: It's hard to scale out MySQL
    - Caching requests would not work
      - Almost all HTTP requests were changing something in the DB
    - We optimized our MySQL configuration
      - Percona's XtraDB, innodb_flush_method = O_DIRECT
      - Patches to ActiveRecord and the data_fabric gem
    - Still, the I/O performance of EBS was a hard limit
      - Maximum of 1,000 write transactions / sec / server
      - But already 5,000 writes / sec at peak for 8 masters
    - So we sharded our MySQL databases
      - But handling 16 DBs is no fun…
      - … and at that time we only had 300,000 users
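The talk does not say how rows were assigned to shards beyond the use of the data_fabric gem. A common scheme, shown here as a hypothetical sketch, keys every row by user and picks the shard with a modulo, so all of one player's data lives on one database.

```python
def shard_for(user_id: int, num_shards: int) -> int:
    """Pick the shard that holds all rows of one user (hypothetical modulo scheme).

    Keying by user keeps a typical game request, which only touches the
    player's own data, on a single database.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    return user_id % num_shards


print(shard_for(123_457, 16))  # 1
```

A side effect of the modulo choice: because `u % 8 == (u % 16) % 8`, shrinking from 16 shards to 8 merges shard k and shard k + 8 into shard k, while an arbitrary change (say 16 to 12) would move most users. Whether wooga relied on this when going from 16 to 8 MySQL servers is not stated.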
  • Solution: Pick a DB that's better suited
    - Redis was our choice
      - Master runs in-memory only (45,000 writes / sec / server)
      - Slaves back up data to disk every 15 minutes
      - Rich data model that goes way beyond simple key/value
    - We migrated the most write-heavy tables to Redis
      - Currently Redis handles 2.5x as many transactions / sec as MySQL
      - But MySQL still holds more data (256 GB vs. 40 GB)
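The write figures quoted in the talk make the case for Redis with simple arithmetic. Assuming writes spread evenly and per-server capacity adds up linearly (both simplifications), the 5,000 writes/sec peak needs at least five EBS-backed MySQL masters at 1,000 writes/sec each, but fits on a single in-memory Redis master at 45,000 writes/sec. The constant and function names are illustrative.

```python
import math

PEAK_WRITES = 5_000       # writes/sec at peak (MySQL slide)
MYSQL_ON_EBS = 1_000      # max write transactions/sec/server on EBS (MySQL slide)
REDIS_IN_MEMORY = 45_000  # writes/sec/server for an in-memory Redis master (Redis slide)


def servers_needed(peak: int, per_server: int) -> int:
    # Simplification: writes spread evenly and capacity adds up linearly.
    return math.ceil(peak / per_server)


print("MySQL masters needed:", servers_needed(PEAK_WRITES, MYSQL_ON_EBS))     # 5
print("Redis masters needed:", servers_needed(PEAK_WRITES, REDIS_IN_MEMORY))  # 1
```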
  • Challenge: Handling data(bases) is hard!
    - MySQL has its problems
      - Making a backup of 64GB takes about 30 minutes…
      - But restoring it can take 6 hours or more
    - Redis is not perfect, either
      - Memory consumption of the process grows over time
      - If too much memory is used, backup to disk no longer works
      - Every two weeks we had to replace servers to "reset" RAM
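The asymmetry between backup and restore is easier to see as throughput. This arithmetic takes the slide's 64 GB, "about 30 minutes" and "6 hours or more" at face value and counts 1 GB as 1,024 MB; because the restore time is a lower bound, the restore rate is an upper bound.

```python
def throughput_mb_per_s(size_gb: float, seconds: float) -> float:
    """Average transfer rate, counting 1 GB as 1,024 MB."""
    return size_gb * 1024 / seconds


backup = throughput_mb_per_s(64, 30 * 60)       # "about 30 minutes"
restore = throughput_mb_per_s(64, 6 * 60 * 60)  # "6 hours or more", so an upper bound
print(f"backup:  {backup:.1f} MB/s")            # 36.4 MB/s
print(f"restore: {restore:.1f} MB/s")           # 3.0 MB/s
print(f"restore is at least {backup / restore:.0f}x slower")  # 12x
```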
  • Challenge: Redis memory fragmentation
  • "Solution": Automated setups always help
    - Replacing MySQL DBs
      - Start up a new master / slave and restore the backup
      - Make the new master a slave of the existing slave
      - Wait until replication is in sync again (some hours)
      - Switch to the new master and remove the old master / slave
    - Replacing Redis DBs
      - Same procedure as above
      - But everything can be done in 30 minutes
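The slowest step of the MySQL replacement is waiting for replication to catch up. A minimal sketch of that wait, assuming a hypothetical `seconds_behind` hook that returns the `Seconds_Behind_Master` column of `SHOW SLAVE STATUS` on the new master (or `None` while replication is stopped); connecting to MySQL and the other steps are left out, and the poll interval and 12-hour timeout are arbitrary choices.

```python
import time
from typing import Callable, Optional


def wait_until_in_sync(
    seconds_behind: Callable[[], Optional[int]],
    poll_interval: float = 60.0,
    timeout: float = 12 * 3600,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll replication lag until the new master has caught up.

    `seconds_behind` is a hypothetical hook returning Seconds_Behind_Master,
    or None when replication is not running.
    Returns True once the lag reaches 0, False if `timeout` expires first.
    """
    waited = 0.0
    while waited <= timeout:
        lag = seconds_behind()
        if lag == 0:
            return True
        # Still behind (or not replicating yet): wait and check again.
        sleep(poll_interval)
        waited += poll_interval
    return False
```

`Seconds_Behind_Master` only measures how far the SQL thread trails the relay log, so a value of 0 can still hide an I/O thread that is behind; a more careful check would also compare binary log positions before switching masters. MySQL 8.0.22 and later name the statement `SHOW REPLICA STATUS` and the column `Seconds_Behind_Source`.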
  • When Devs Do Ops: Looking back
  • We still have only 2 backend developers to operate this!
  • Know what it means to be in the cloud
    - Using a cloud has some disadvantages
      - Another game on dedicated hardware gets 8x better performance
      - I/O and network performance of EC2 is quite … err … limited
      - You cannot pick the best hardware possible
      - All hosts have the same chance of failure
    - But it offers unique advantages
      - Having unlimited servers on demand is just awesome!
      - You pay only for what you need when you need it
      - You can concentrate on your product
      - It's very easy to experiment
  • Play to its strengths and adjust for its weaknesses
    - Play to its strengths
      - Program your infrastructure, automate as much as possible
      - Measure closely and react to changes
      - Scaling up and out is quite easy
      - Sit back and relax…
    - And adjust for its weaknesses
      - Avoid I/O: consider an in-memory database or caching
      - Be prepared for any host to fail
  • Thank you!
    - P.S. wooga.com/jobs
    - jesper@wooga.com