Rails infrastructure

Published in: Technology

  1. 1. Rails Infrastructure (http://omarqureshi.net, @omarqureshi)
  7. 7. Topics Covered
     • Lots of facepalm
     • Rackspace
     • Linux distribution choices
     • Automation and Orchestration
     • Logging
  15. 15. Edison Nation
     • Distributed team (US/Canada/UK)
     • Old (2009) and poorly maintained application
     • Rails 2.3 app
     • (previous) focus on churn
     • 3 Rails developers (+ 1 designer and an intern)
     • >100,000 members
     • Little in-house sysadmin experience
  20. 20. Additional Quirks
     • Used Ruby 1.8.7, since God does not play nicely with Ruby Enterprise Edition and we couldn’t use 1.9 because of Rails 2.3
     • Provisioning process was terribly slow
     • Very little caching
     • Quite a lot of server-generated JS
  21. 21. SURPRISE!
  25. 25. Featured on Nightline
     • No warning (announced pretty late EST)
     • No preparation time (engineers already signed off for the night)
     • Couldn’t provision servers to deal with the traffic spike in time (and we would have needed a lot of them)
  27. 27. Load balancer recorded 3000 concurrent requests including assets, or around 300 excluding assets
  28. 28. The Stack
  29. 29. Figuring out the bottlenecks
  30. 30. Nginx kept serving, though these were 502 errors
  31. 31. Post-mortem of the requests that did make it through made it look like the application servers were to blame
  32. 32. Database was under heavy load but by no means the bottleneck
  33. 33. Make better use of the application server pool
  34. 34. Got some quick wins in the code by caching more and moving jQuery to Google's CDN
  35. 35. <script src="//ajax.googleapis.com/ajax/libs/jquery/1.6.2/jquery.min.js"></script>
  36. 36. Get rid of any server-generated JS
  37. 37. Pretty much re-trained myself to be a systems administrator
  38. 38. Completely re-think the way we do Operations
  39. 39. What components make up a solid multi-server setup?
  40. 40. Load balancing
  41. 41. TLS SNI Extension
  42. 42. Theoretically only need two load balancers for ALL domains
  43. 43. Simplified SSL Nginx config
        server {
          listen 443;
          server_name www.edisonnation.com;
          ssl on;
          ssl_certificate /path/to/cert/en.com.cert;
          ssl_certificate_key /path/to/cert/en.com.key;
        }
        server {
          listen 443;
          server_name www.edisonnation.vn;
          ssl on;
          ssl_certificate /path/to/cert/en.vn.cert;
          ssl_certificate_key /path/to/cert/en.vn.key;
        }
  44. 44. Windows XP + Internet Explorer
  45. 45. Windows XP
     • Internet Explorer 6-8 on Windows XP does not support SNI, unlike modern OS + browser combinations
     • Ignores the server name for HTTPS
     • Will give you an invalid SSL certificate error when browsing
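To make the Windows XP problem concrete, here is a minimal Ruby sketch of how a server picks a certificate by server name. It is a model of the behaviour, not Nginx's actual code: the `CERTS` table, `DEFAULT_CERT` and `certificate_for` are hypothetical names, and treating the first certificate as the default is an assumption.

```ruby
# Hypothetical certificate table mirroring the two server blocks in the
# simplified Nginx config.
CERTS = {
  "www.edisonnation.com" => "/path/to/cert/en.com.cert",
  "www.edisonnation.vn"  => "/path/to/cert/en.vn.cert"
}.freeze

# Assumption: the first configured certificate acts as the default.
DEFAULT_CERT = CERTS.values.first

# Returns the certificate path presented for a TLS handshake.
# sni_name is nil when the client sends no server name, which is what
# Internet Explorer on Windows XP does.
def certificate_for(sni_name)
  CERTS.fetch(sni_name.to_s.downcase, DEFAULT_CERT)
end
```

A client that sends `www.edisonnation.vn` gets the matching certificate. A client that sends nothing gets the default one, which is the wrong certificate for every other domain and shows up as an invalid certificate error.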
  46. 46. Rackspace (v2) Load Balancer
  47. 47. Rackspace Load Balancer
     • SSL termination at the Load Balancer
        • No need to serve HTTPS traffic from Nginx any more: X-Forwarded-Proto tells Rails whether the page is supposed to be encrypted
        • Less processing required here
        • Less complexity managing certificates and Nginx configs
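With SSL terminated at the load balancer, the application only ever sees plain HTTP and has to trust the forwarded header. A minimal sketch of that check against a Rack-style `env` hash follows; `request_scheme` and `ssl?` are hypothetical helper names, not the Rails implementation, and the sketch assumes the load balancer is the only thing that can set the header.

```ruby
# Returns "https" or "http" for a Rack-style env hash.
# Rack exposes the X-Forwarded-Proto header as HTTP_X_FORWARDED_PROTO.
def request_scheme(env)
  # The header may carry a comma-separated chain; the first entry is the
  # scheme the original client used.
  forwarded = env["HTTP_X_FORWARDED_PROTO"].to_s.split(",").first.to_s.strip.downcase
  return forwarded if %w[http https].include?(forwarded)

  # No usable header: fall back to the scheme of the direct connection.
  env.fetch("rack.url_scheme", "http")
end

# True when the original client connection was encrypted.
def ssl?(env)
  request_scheme(env) == "https"
end
```

For example, `ssl?("HTTP_X_FORWARDED_PROTO" => "https")` is true even though the hop from the load balancer to Nginx was unencrypted.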
  48. 48. Split up the application servers
  49. 49. Move Nginx to its own machine and reverse proxy back to Unicorn app servers
  50. 50. New stack
  51. 51. Switch Unicorn to use TCP sockets rather than Unix sockets
  52. 52. Linux
  53. 53. Debian Squeeze
  57. 57. Why Debian?
     • Pick the most stable distribution
     • Debian is pretty stable, plus you can use Lucid Lynx packages for anything cutting edge that you need
     • However, God requires you to use a custom kernel before it will work properly: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=609004
  58. 58. Ubuntu LTS is also a viable choice, as is any RHEL
  59. 59. Basically, anything where the packages aren’t crazy and support is still there (not Arch/Fedora/non-LTS Ubuntu)
  60. 60. Packaging
  61. 61. We don’t image servers (but may start doing so)
  62. 62. Provisioning tools should be able to build a server on any hardware
  66. 66. Never build from source
     • Either package yourself or get from a reliable source
     • Ditch RVM (though they now have binary rubies, anyone tried?)
     • Check out Brightbox Next Generation Ubuntu packages: http://wiki.brightbox.co.uk/docs:ruby-ng
  67. 67. Pin everything else
        Package: *
        Pin: release a=squeeze-backports
        Pin-Priority: 200

        Package: puppet
        Pin: release a=squeeze-backports
        Pin-Priority: 900

        Package: puppet-common
        Pin: release a=squeeze-backports
        Pin-Priority: 900
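Since a bootstrap script has to write these pins on every new server, generating the stanzas from data keeps them consistent. The following is a minimal sketch, not the deck's actual script: `pin_stanza` and `preferences` are hypothetical helpers, and the output follows the stanza layout shown on the pinning slide.

```ruby
# Builds one apt preferences stanza for a package and release archive.
def pin_stanza(package, release, priority)
  "Package: #{package}\nPin: release a=#{release}\nPin-Priority: #{priority}\n"
end

# Builds a full preferences file: a low-priority catch-all for the release,
# followed by one higher-priority stanza per explicitly pinned package.
def preferences(release, default_priority, pinned)
  stanzas = [pin_stanza("*", release, default_priority)]
  pinned.each { |package, priority| stanzas << pin_stanza(package, release, priority) }
  stanzas.join("\n")
end

puts preferences("squeeze-backports", 200, { "puppet" => 900, "puppet-common" => 900 })
```

The catch-all at priority 200 keeps backports from being pulled in by default, while the 900 entries let Puppet come from backports.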
  68. 68. Server build time decreased from 45 minutes to < 15 minutes
  69. 69. How do we provision servers?
  70. 70. A small bash script + Puppet
  71. 71. Bash script does basic pinning and installs essential packages (Ruby + Emacs + Puppet + puppet-el)
  72. 72. Works very well since we use Hetzner EX4S servers for non-critical systems
  73. 73. Hetzner + (Xen/OpenVZ) == FANTASTIC
  74. 74. (See me at the end if you want to talk about provisioning some more)
  75. 75. Managing Puppet
  76. 76. Puppet is always running rather than run on demand
  77. 77. Encourage developers to document infrastructure changes
  78. 78. Still unsure about how to go about Puppet testing
  79. 79. Campfire reporting
  80. 80. Orchestration
  81. 81. MCollective
  82. 82. STOMP server connects all of our servers together
  83. 83. MCollective executes Remote Procedure Calls
  84. 84. Great for pushing out urgent Puppet updates
  85. 85. Also great for Munin
        #!/bin/bash
        # Rebuild munin.conf from the hosts MCollective currently knows about.
        str="includedir /etc/munin/munin-conf.d"
        # One iteration per IP address reported by the ipaddress fact.
        for addr in `/usr/bin/mco facts ipaddress | awk '{gsub("found", "");print $1}' | grep "^[0-9]"`
        do
          # Look up the FQDN of the node that owns this address.
          fqdn=`/usr/bin/mco facts fqdn -F ipaddress=$addr | grep "^W" | awk '{print $1}'`
          str="$str
        [$fqdn]
            address $addr
            use_node_name yes"
        done
        echo "$str" > /etc/munin/munin.conf
        /usr/sbin/service munin-node restart
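The config-building half of the Munin script can also be written as a pure function, which makes it easy to test without a live MCollective. This is a minimal Ruby sketch under stated assumptions: `munin_conf` is a hypothetical helper, the node list is passed in as plain hashes rather than fetched with `mco facts`, and the hostnames in the usage line are made up.

```ruby
# Builds munin.conf content from a list of nodes.
# Each node is a hash with :fqdn and :address keys (hypothetical shape).
def munin_conf(nodes)
  lines = ["includedir /etc/munin/munin-conf.d", ""]
  # Sort by hostname so the generated file is stable between runs.
  nodes.sort_by { |node| node[:fqdn] }.each do |node|
    lines << "[#{node[:fqdn]}]"
    lines << "    address #{node[:address]}"
    lines << "    use_node_name yes"
    lines << ""
  end
  lines.join("\n")
end

puts munin_conf([{ fqdn: "app1.example.com", address: "10.0.0.1" }])
```

Keeping the discovery step separate from the rendering step means the output can be diffed against the current file before anything is restarted.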
  86. 86. No longer have to manually maintain Munin
  87. 87. Can be used for other painful tasks, such as making sure packages are up to date on all the servers
  88. 88. RPC libraries are written in Ruby
  89. 89. Service management
  90. 90. M/Monit
  91. 91. Not free, but extremely worthwhile. Can hook into shell scripts
  92. 92. Log management
  93. 93. Graylog2
  94. 94. Java JAR with a Rails frontend and Elasticsearch + Mongo backend
  95. 95. Deals with exception management
  96. 96. Can do analytics on logs
  97. 97. Specify streams of logs (e.g. 404 errors)
  98. 98. No longer have to juggle lots of files which exist on different machines
  99. 99. A little tricky to set up
  100. 100. Use the gelf-rb gem sparingly in your Rails app and NOT as your main logger
  101. 101. Found out that the log requests were not threaded
  102. 102. For us, gelf-rb ONLY sends exception notifications
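The "exceptions only" rule can be sketched as a wrapper that stays silent on the happy path and builds a GELF-style message only when something raises. This sketch deliberately does not use gelf-rb: `gelf_payload` and `with_exception_reporting` are hypothetical helpers, the `sink` array stands in for whatever transport is used, and the version string is an assumption about the GELF format in use.

```ruby
require "json"
require "socket"

# Builds a GELF-style payload for an exception.
def gelf_payload(exception, host = Socket.gethostname)
  {
    "version"       => "1.1", # assumption: adjust to the server's GELF version
    "host"          => host,
    "short_message" => "#{exception.class}: #{exception.message}",
    "full_message"  => Array(exception.backtrace).join("\n"),
    "timestamp"     => Time.now.to_f,
    "level"         => 3 # syslog severity "error"
  }
end

# Runs the block; reports to the sink only if it raises, then re-raises so
# normal error handling still happens. Nothing is sent for successful calls.
def with_exception_reporting(sink)
  yield
rescue StandardError => e
  sink << JSON.generate(gelf_payload(e))
  raise
end
```

Because ordinary requests never touch the sink, a blocking, unthreaded sender only costs time on the rare failing request rather than on every log line.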
  103. 103. Introducing Logstash
  104. 104. Written by the awesome Jordan Sissel (also the author of FPM)
  105. 105. Nginx doesn’t support sending logs to Graylog directly
  106. 106. Logstash acts as a log tailing and transporting mechanism
  107. 107. Runs in its own process, so threading doesn’t matter so much
  108. 108. What’s left?
  109. 109. Upgrade to Rails 3
  110. 110. Great benefits with Rails 3 such as Dalli for memcached failovers and Lograge
  111. 111. Oh yeah, the asset pipeline!
  112. 112. Implement read slaves for backups
  113. 113. Make Jenkins do our deployment
  114. 114. Better caching solutions, maybe Varnish / conditional GET
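For reference, conditional GET boils down to comparing a validator sent by the client with one computed for the current response. The following is a minimal Rack-style sketch, not the Rails implementation: `conditional_get` is a hypothetical helper, and using an MD5 of the body as the ETag is an assumption made for illustration.

```ruby
require "digest/md5"

# Returns a Rack-style [status, headers, body] triple.
# Replies 304 with an empty body when the client's cached copy is current.
def conditional_get(env, body)
  etag = %("#{Digest::MD5.hexdigest(body)}")
  if env["HTTP_IF_NONE_MATCH"] == etag
    [304, { "ETag" => etag }, []]
  else
    [200, { "ETag" => etag, "Content-Type" => "text/html" }, [body]]
  end
end
```

The first request gets a 200 with an `ETag`; a repeat request that sends the same value in `If-None-Match` gets a 304 and no body. This version still renders the body to compute the tag, so it saves bandwidth rather than render time.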
  115. 115. Re-implement TLS SNI once Windows XP security updates stop
  116. 116. Handle large spikes better
  117. 117. Autoscaling?
  118. 118. Using AWS as an additional cloud failover
  119. 119. Hybrid Dedicated and Cloud for production
