A Tale of a Server Architecture (Frozen Rails 2012)
Ville Lautanala's talk from Frozen Rails 2012: how Flowdock uses Chef and ZooKeeper to manage a set of distributed services.

Transcript of "A Tale of a Server Architecture (Frozen Rails 2012)"

    1. A Tale of a Server Architecture. Ville Lautanala, @lautis
    2. Who am I: @lautis
    3. Flowdock is a team collaboration app with software developers as the primary target audience. Right-hand side: chat; left-hand side: an inbox or activity stream for your team. If you’ve read a Node.js tutorial, you probably know the architecture needed.
    4. Facts
       • Single-page JavaScript front-end
       • WebSocket-based communication layer
       • Three replicated databases
       • Running on dedicated servers in Germany
       • 99.98% availability
       WebSockets mean no third-party load balancers or PaaS for us. (99.99% according to the CEO, but I’m being conservative.)
    5. Goal: beat your hosting provider in uptime. Have good uptime on unreliable hardware.
    6. We don’t want to wake up at night to fix our app like the guy in this picture. The founders previously had a hosting company.
    7. This is not an exact science; every app is different.
    8. Architecture archaeology: we haven’t always been doing very well.
    9. Flowdock 2010: Apache in front of Messages and Rails, backed by MongoDB and PostgreSQL. A simple stack, but the messaging part quickly became hairy: it had HTTP streaming, Twitter integration, and an e-mail server. Lots of brittle state.
    10. Divide and conquer: a nice strategy for building your SOA, sorting lists, and taking over the world.
    11. GeoDNS → Stunnel → HAproxy, fronting HTTP, WebSocket, RSS, IRC, and the Streaming API, with Redis, the API, Rails, and the message backend on top of MongoDB and PostgreSQL. These are all separate processes. More components, but this has let us easily add new features to individual components.
    12. Separated concerns...
    13. ...but many parts to configure.
    14. So, you need to set up boxes...
    15. Chef: infrastructure as (Ruby) code. Chef lets you automate server configuration with Ruby code.
    16. Chef at Flowdock
        • Firewall configuration
        • Distribute SSH host keys
        • User setup
        • Join the mesh-based VPN
        • And app/server-specific stuff
        The firewall setup is based on an IP whitelist: only nodes known to Chef can access private services. SSH host keys prevent MITM attacks. We have a mesh-based VPN that is configured automatically from Chef data.
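The IP-whitelist firewall described above can be driven from the Chef server's node index. A hypothetical recipe sketch (the resource names, paths, and template are invented, not the actual Flowdock cookbook); it only runs inside a chef-client run, so treat it as an illustration of the pattern:

```ruby
# Collect every node registered with the Chef server in this environment
# and render their addresses into an iptables whitelist, so that only
# hosts known to Chef can reach private services.
peers = search(:node, "chef_environment:#{node.chef_environment}")

template "/etc/iptables.d/whitelist.rules" do
  source "whitelist.rules.erb"
  variables :addresses => peers.map { |n| n["ipaddress"] }.compact.sort.uniq
  owner "root"
  mode "0600"
  notifies :run, "execute[reload-firewall]"
end

execute "reload-firewall" do
  command "iptables-restore < /etc/iptables.d/whitelist.rules"
  action :nothing
end
```

Because the whitelist is regenerated on every Chef run, adding a box to the cluster automatically grants it access once it registers with the Chef server.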
    17. Cookbooks, recipes, roles.
    18. Chef server: a centralized Chef server that nodes communicate with and get updates from.
    19. cookbooks/flowdock/oulu.rb, the recipe for our IRC server:

        include_recipe "flowdock::users"

        package "ruby"

        %w{port listen_to flowdock_domain}.each do |e|
          template "#{node[:flowdock][:oulu][:envdir]}/#{e.upcase}" do
            source "envdir_file.erb"
            variables :value => node[:flowdock][:oulu][e]
            owner "oulu"
            mode "0600"
          end
        end

        runit_service "oulu" do
          options :use_config => true
        end
    20. roles/rails.rb, a role in the Ruby DSL. Each node can be assigned any number of roles, and override attributes can be used to override recipe attributes:

        name "rails"
        description "Rails Box"
        run_list(
          "recipe[nginx]",
          "recipe[passenger]"
        )
        override_attributes(
          passenger: {
            version: "3.0.7"
          }
        )
    21. Managing a Chef cluster:

        $ knife cookbook upload -a -o cookbooks
    22. Managing a Chef cluster:

        $ knife search node role:flowdock-app-server
        Node Name:   imaginary-server
        Environment: qa
        FQDN:        imaginary-server.flowdock.dmz
        IP:          10.0.0.1
        Run List:    role[qa], role[flowdock-app-server], role[web-server]
        Roles:       qa, flowdock-app-server, web-server
        Recipes:     ubuntu, firewall, chef, flowdock, unicorn, haproxy
        Platform:    ubuntu 12.04
        Tags:
    23. Managing a Chef cluster:

        $ knife ssh role:qa echo "lol"
        imaginary-server lol
        qa-db1           lol
        qa-db2           lol

        The most useful variant: triggering a Chef run on a set of servers, e.g. knife ssh role:qa "sudo chef-client".
    24. Testing Chef recipes
        • Use Chef environments to isolate changes
        • Run chef-client on throw-away VMs
        • cucumber-chef
        sous-chef could be used to automate VM setup; our experience with cucumber-chef and sous-chef is limited. You also need to monitor things, e.g. that runs have finished on nodes and that backups are really being taken.
    25. Automatic failover: avoiding single points of failure. MongoDB works flawlessly since failover is built in, but how do we handle Redis?
    26. HAproxy: a TCP/HTTP load balancer with failover handling. HAproxy provides easy failover for Rails instances.
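As a sketch of the idea (not Flowdock's actual configuration; the listener name, health-check path, and addresses are made up), an HAproxy section that health-checks two Rails backends and fails over to the second might look like:

```
listen rails
  bind :80
  mode http
  option httpchk GET /up
  server app1 10.0.0.11:8080 check
  server app2 10.0.0.12:8080 check backup
```

With `check` enabled, HAproxy stops routing to a backend whose health check fails; the `backup` server only receives traffic when the primary is down.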
    27. MongoDB has automatic failover built in. MongoDB might have many problems, but failover isn’t one of them: drivers are always connected to the master.
    28. Redis and Postgres have replication, but failover is manual. Not only do you need to promote a new master automatically, you also need to change the application configuration.
    29. ZooKeeper
    30. Distributed coordination. Each operation has to be agreed on by a majority of servers. Eventual consistency.
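To make the majority requirement concrete, here is a small Ruby sketch (my illustration, not from the talk) of the quorum arithmetic: how many ZooKeeper servers must agree on a write, and how many failures an ensemble can therefore tolerate:

```ruby
# A write succeeds once floor(n/2) + 1 servers agree, so an ensemble
# of n servers tolerates n - quorum(n) failed servers.
def quorum(ensemble_size)
  ensemble_size / 2 + 1
end

def tolerated_failures(ensemble_size)
  ensemble_size - quorum(ensemble_size)
end

[3, 4, 5].each do |n|
  puts "#{n} servers: quorum #{quorum(n)}, tolerates #{tolerated_failures(n)} failure(s)"
end
```

This is why ensembles are usually sized with an odd number of servers: going from three to four raises the quorum without improving fault tolerance.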
    31. Using the high-level zk gem. The registered block is run every time the value is updated; the gem also has locks and other primitives implemented:

        require "zk"

        $queue = Queue.new
        zk = ZK.new

        zk.register("/hello_world") do |event|
          # need to reset the watch
          data = zk.get("/hello_world", watch: true).first
          # do stuff
          $queue.push(:event)
        end

        zk.create("/hello_world", "sup?")
        $queue.pop # handle local synchronization
        zk.set("/hello_world", "omg, update")
    32. Locks with the zk gem:

        zk = ZK.new
        zk.with_lock("/lock", :wait => 5.0) do |lock|
          # do stuff
          # others have to wait
        end
    33. Redis master failover using ZooKeeper.
    34. gem install redis_failover, but in three programming languages.
    35. Redis failover. (Diagram: Node Managers monitor the Redis nodes and update ZooKeeper; the apps watch ZooKeeper for changes.) Our apps might not use redis_failover or read ZK directly: a script restarts the app when ZK changes. HAproxy- or DNS-based solutions are also possible, but this gives us more control over the app restart.
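The "restart the app when ZooKeeper changes" pattern can be sketched in plain Ruby. This is my illustration, not Flowdock's actual script: the ZooKeeper client is stubbed out (FakeZK) so the control flow is runnable anywhere; in production it would be a real connection, e.g. via the zk gem.

```ruby
# Stand-in for a ZooKeeper connection: stores one value per path
# (simplified to a single value) and fires callbacks on change.
class FakeZK
  def initialize(initial)
    @data = initial
    @watchers = []
  end

  def get(_path)
    @data
  end

  def register(_path, &block)
    @watchers << block
  end

  def set(_path, value)
    @data = value
    @watchers.each(&:call)
  end
end

# Watches the master address in ZooKeeper and "restarts" the app
# (here just counted) whenever the address changes.
class AppSupervisor
  attr_reader :master, :restarts

  def initialize(zk, path)
    @zk = zk
    @path = path
    @restarts = 0
    @master = zk.get(path)
    zk.register(path) { on_change }
  end

  def on_change
    new_master = @zk.get(@path)
    return if new_master == @master
    @master = new_master
    @restarts += 1 # in production: restart the app process here
  end
end

zk = FakeZK.new("redis-1:6379")
supervisor = AppSupervisor.new(zk, "/redis/master")
zk.set("/redis/master", "redis-2:6379") # a Node Manager promotes a new master
puts supervisor.master
puts supervisor.restarts
```

The supervisor only acts when the master actually changes, so spurious watch events do not cause needless restarts.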
    36. Postgres failover with pgpool-II and ZooKeeper. pgpool manages the pg cluster, and queries can be distributed to the slaves. I’m afraid of pgpool: its configuration and monitoring scripts are really scary.
    37. Postgres failover. (Diagram: a PGpool monitor and ZooKeeper sit alongside the app, pgpool, and the PG servers.) ZooKeeper-based pgpool monitoring is used to provide redundancy for pgpool; if pgpool fails, the app needs to reconnect to a new server.
    38. Zoos are kept. A similar scheme can be used for other master-slave replication setups, e.g. handling Twitter integration failover. Remember to test.
    39. Test your failover. You might only need a given failover path a few times a year. I’m not sure everything we have is top-notch, but there have been one-time use cases for even the complicated stuff.
    40. Chef vs ZooKeeper

        Chef                    ZooKeeper
        Configuration files     Dynamic configuration variables
        Server bootstrap        Failover handling

        Chef writes long configuration files; ZooKeeper holds only a few variables. Chef bootstraps servers and keeps them up to date; ZooKeeper is used to elect master nodes in master-slave scenarios.
    41. Mesh-based VPN between boxes. MongoDB traffic between masters and slaves is encrypted. This has saved the day a few times when there have been routing issues between data centers.
    42. SSL endpoints in AWS. There were routing issues between our German ISP and Comcast; moving the SSL front ends closer to the clients fixed this and reduced latency. The front page loads 150 ms faster.
    43. Winning. We don’t need to worry about waking up at night. The whole team could go sailing and be without internet access at the same time.
    44. Lessons learned. What have we learned?
    45. WebSockets are cool, but make your life harder. Heroku, Amazon Elastic Load Balancer, CloudFlare, and Google App Engine don’t work with WebSockets. If you only need to stream data to clients, HTTP event streaming is a better choice.
    46. Let it crash. Make your app crash: at least you are there to fix things.
    47. Questions?
    48. Thanks!