Performance Optimization of Rails Applications

  1. 1. Advanced Performance Optimization of Rails Applications Serge Smetana RuPy 2009 www.acunote.com
  2. 2. What Am I Optimizing?
      Acunote (www.acunote.com): online project management and scrum software.
      Ruby on Rails application since inception in 2006.
      - ~5300 companies
      - ~13000 users
      - Hosted on Engine Yard
      - Hosted on customers' servers
      - nginx + mongrel
      - PostgreSQL
  8. 8. Performance Degradation Over Time
      [Chart: request time on development box (%), April 2008 to July 2008]
      Actually happens: O(n^c). Best case: O(log n).
  9. 9. Solutions? Throw Some Hardware at it!
  10. 10. Solutions? Performance Optimization!
  11. 11. What to optimize?
  12. 12. What To Optimize? Development?
  13. 13. What To Optimize? Development AND Production
  14. 14. How to optimize?
  15. 15. How To Optimize? Three rules of performance optimization
  16. 16. Three Rules Of Performance Optimization 1. Measure!
  17. 17. Three Rules Of Performance Optimization 2. Optimize only what's slow!
  18. 18. Three Rules Of Performance Optimization 3. Optimize for the user!
  19. 19. Things To Optimize
      - Development: Ruby code, Rails code, database queries, alternative Ruby
      - Production: shared filesystems and databases, live debugging, load balancing
      - Frontend: HTTP, Javascript, Internet Explorer
  27. 27. Optimizing Ruby: Date Class
      What's wrong with Date?
          > puts Benchmark.realtime { 1000.times { Time.mktime(2009, 5, 6, 0, 0, 0) } }
          0.005
          > puts Benchmark.realtime { 1000.times { Date.civil(2009, 5, 6) } }
          0.080
      16x slower than Time! Why?
          %self  total  self  wait  child  calls  name
           7.23   0.66  0.18  0.00   0.48  18601  <Class::Rational>#reduce
           6.83   0.27  0.17  0.00   0.10   5782  <Class::Date>#jd_to_civil
           6.43   0.21  0.16  0.00   0.05  31528  Rational#initialize
           5.62   0.23  0.14  0.00   0.09  18601  Integer#gcd
  28. 28. Optimizing Ruby: Date Class
      Fixing Date: Use C, Luke!
      Date::Performance gem, with Date partially rewritten in C by Ryan Tomayko (with patches by Alex Dymo in 0.4.7)
          > puts Benchmark.realtime { 1000.times { Time.mktime(2009, 5, 6, 0, 0, 0) } }
          0.005
          > puts Benchmark.realtime { 1000.times { Date.civil(2009, 5, 6) } }
          0.080
          > require 'date/performance'
          > puts Benchmark.realtime { 1000.times { Date.civil(2009, 5, 6) } }
          0.006
      To install:
          git clone git://github.com/rtomayko/date-performance.git
          cd date-performance
          rake package:build
          cd dist && gem install date-performance-0.4.8.gem
  29. 29. Optimizing Ruby: Date Class Real-world impact of Date::Performance: Before: 0.95 sec After: 0.65 sec 1.5x!
  30. 30. Optimizing Ruby: Misc
      Use String#<< instead of +=
          > long_string = "foo" * 100000
          > Benchmark.realtime { long_string += "foo" }
          0.0003
          > Benchmark.realtime { long_string << "foo" }
          0.000004
      In theory: 75x. In practice: up to 70x.
      Avoid BigDecimal comparisons with strings and integers
          > n = BigDecimal("4.5")
          > Benchmark.realtime { 10000.times { n <=> 4.5 } }
          0.063
          > Benchmark.realtime { 10000.times { n <=> BigDecimal("4.5") } }
          0.014
      In theory: 4.5x. In practice: 1.15x.
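A minimal sketch of why the two string operations differ so much, assuming nothing beyond core Ruby. `a += b` is shorthand for `a = a + b`, so it builds a new String and copies the long operand, while `<<` appends to the existing object. The helper names `append_with_plus` and `append_with_shovel` are made up for this illustration.

```ruby
# `str += suffix` allocates a new String and copies both operands.
def append_with_plus(str, suffix)
  str += suffix   # rebinds the local variable to a brand-new String
  str
end

# `str << suffix` grows the receiver in place and returns it.
def append_with_shovel(str, suffix)
  str << suffix
end

original = "foo" * 100_000

copy = append_with_plus(original, "bar")
puts copy.equal?(original)   # is it the same object?
puts original.length         # the original is not modified by +=

same = append_with_shovel(original, "bar")
puts same.equal?(original)   # is it the same object?
puts original.length         # the original has grown by three characters
```

The copy is what costs time on a 300 KB string, and the discarded copies are also extra work for the garbage collector.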
  39. 39. Optimizing Rails: String Callbacks
      What can be wrong with this code?
          class Task < ActiveRecord::Base
            before_save "some_check()"
          end
          ...
          100.times { Task.create attributes }
      Kernel#binding is called to eval() the string callback.
      That will duplicate your execution context in memory!
      More memory taken => more time for GC.
  40. 40. Optimizing Rails: String Callbacks
      What to do:
          class Task < ActiveRecord::Base
            before_save :some_check
          end
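The difference between the two callback styles can be sketched in plain Ruby. This is a simplified, hypothetical callback runner (`MiniModel` and `run_callback` are invented here), not ActiveRecord's actual implementation; it only shows that a string callback needs `eval` plus a captured `Binding`, while a symbol callback is an ordinary method dispatch.

```ruby
# Hypothetical stand-in for a model with one callback method.
class MiniModel
  attr_reader :checks

  def initialize
    @checks = 0
  end

  def some_check
    @checks += 1
  end

  # A String callback has to be evaluated in the object's context, which
  # means capturing a Binding on every call. A Symbol callback is just `send`.
  def run_callback(callback)
    case callback
    when String then eval(callback, binding)
    when Symbol then send(callback)
    else raise ArgumentError, "unsupported callback: #{callback.inspect}"
    end
  end
end

model = MiniModel.new
model.run_callback("some_check()")  # eval + binding
model.run_callback(:some_check)     # direct method call
puts model.checks
```

Both calls run the same method; only the string form pays for parsing the code and capturing the execution context each time it fires.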
  41. 41. Optimizing Rails: Partial Rendering
      Not too uncommon, right?
          <% for object in objects %>   # 1000 times
            <%= render :partial => 'object', :locals => { :object => object } %>
          <% end %>
      We create a View instance for each object here: 1000 of them! Why?
  42. 42. Optimizing Rails: Partial Rendering
      Template inlining to the rescue:
          <% for object in objects %>   # 1000 times
            <%= render :partial => 'object', :locals => { :object => object }, :inline => true %>
          <% end %>
      [Diagram: list.rhtml and repeated _object.rhtml partials]
  43. 43. Optimizing Rails: Partial Rendering
      Template Inliner plugin: http://github.com/acunote/template_inliner/
      Real-world effect of template inlining (rendering 300 objects, 5 partials for each object):
          without inlining: 0.89 sec
          with inlining:    0.75 sec
      1.2x
  52. 52. Optimizing Database
      How to optimize PostgreSQL: explain analyze, explain analyze, explain analyze...
  53. 53. Optimizing Database: PostgreSQL Tips
      EXPLAIN ANALYZE explains everything, but...
      ... run it also for the "cold" database state!
      Example: a complex query which works on 230 000 rows and does 9 subselects / joins:
          cold state: 28 sec
          hot state:  2.42 sec
      A database server restart doesn't help. You need to clear the disk cache (Linux):
          echo 3 | sudo tee /proc/sys/vm/drop_caches
  54. 54. Optimizing Database: PostgreSQL Tips
      Use any(array()) instead of in() to force a subselect and avoid a join
          explain analyze select * from issues where id in (select issue_id from tags_issues);
          QUERY PLAN
          ----------
          Merge IN Join (actual time=0.096..576.704 rows=55363 loops=1)
            Merge Cond: (issues.id = tags_issues.issue_id)
            -> Index Scan using issues_pkey on issues (actual time=0.027..270.557 rows=229991 loops=1)
            -> Index Scan using tags_issues_issue_id_key on tags_issues (actual time=0.051..73.903 rows=70052 loops=1)
          Total runtime: 605.274 ms
          explain analyze select * from issues where id = any( array( (select issue_id from tags_issues) ) );
          QUERY PLAN
          ----------
          Bitmap Heap Scan on issues (actual time=247.358..297.932 rows=55363 loops=1)
            Recheck Cond: (id = ANY ($0))
            InitPlan
              -> Seq Scan on tags_issues (actual time=0.017..51.291 rows=70052 loops=1)
            -> Bitmap Index Scan on issues_pkey (actual time=246.589..246.589 rows=70052 loops=1)
                 Index Cond: (id = ANY ($0))
          Total runtime: 325.205 ms
      2x!
  55. 55. Database Optimization: PostgreSQL Tips
      Push down conditions into subselects and joins. PostgreSQL often won't do that for you.
      Schema:
          issues: id serial, name varchar, org_id integer
          notes:  id serial, name varchar, issue_id integer, org_id integer
      Before:
          select *,
            ( select notes.author from notes
              where notes.issue_id = issues.id ) as note_authors
          from issues where org_id = 1
      After (the org_id condition is repeated inside the subselect):
          select *,
            ( select notes.author from notes
              where notes.issue_id = issues.id and notes.org_id = 1 ) as note_authors
          from issues where org_id = 1
  56. 56. What To Do?
      - Optimize for the development box: Ruby code, Rails code, database queries, alternative Ruby
      - Optimize for production: shared filesystems and databases, live debugging, load balancing
      - Optimize for the user: HTTP, Javascript, Internet Explorer
  64. 64. Alternative Ruby
      Everybody says "JRuby and Ruby 1.9 are faster". Is that true in production?
  65. 65. Alternative Ruby
      In short, YES!
      Acunote benchmarks:
                                      MRI    JRuby  1.9.1
          Date/Time Intensive Ops     1.79   0.67   0.62
          Rendering Intensive Ops     0.59   0.44   0.40
          Calculations Intensive Ops  2.36   1.79   1.79
          Database Intensive Ops      4.87   4.63   3.66
  66. 66. Alternative Ruby
      In short, YES!
      Acunote benchmarks, speedup relative to MRI:
                                      MRI    JRuby  1.9.1
          Date/Time Intensive Ops     1x     2.6x   2.9x
          Rendering Intensive Ops     1x     1.3x   1.5x
          Calculations Intensive Ops  1x     1.3x   1.3x
          Database Intensive Ops      1x     1x     1.3x
      JRuby: 1.55x faster. Ruby 1.9: 1.75x faster.
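The speedup figures follow directly from the timings on the previous slide: each one is the MRI time divided by the other runtime's time. A small sketch of that arithmetic, with the timings copied from the slide (the constant and method names are made up here):

```ruby
# Timings from the benchmark slide, in the order MRI, JRuby, Ruby 1.9.1.
TIMINGS = {
  "Date/Time"    => [1.79, 0.67, 0.62],
  "Rendering"    => [0.59, 0.44, 0.40],
  "Calculations" => [2.36, 1.79, 1.79],
  "Database"     => [4.87, 4.63, 3.66],
}

# Speedup relative to MRI: MRI time divided by the other runtime's time.
def speedups(timings)
  timings.transform_values do |mri, jruby, ruby19|
    { jruby: (mri / jruby).round(2), ruby19: (mri / ruby19).round(2) }
  end
end

speedups(TIMINGS).each do |name, s|
  puts format("%-13s JRuby %.2fx   1.9.1 %.2fx", name, s[:jruby], s[:ruby19])
end
```

The summary line on the slide matches the arithmetic mean of the per-row figures as printed there: (2.6 + 1.3 + 1.3 + 1) / 4 = 1.55 for JRuby and (2.9 + 1.5 + 1.3 + 1.3) / 4 = 1.75 for Ruby 1.9.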
  68. 68. Alternative Ruby
      What is faster? Acunote Copy Tasks benchmark:
                                    MRI    JRuby  1.9.1
          Request Time              5.52   4.45   3.24
          Template Rendering Time   0.35   0.21   0.21
          Database Time             0.70   1.32   0.69
          GC Time                   1.07   N/A    0.62
      Faster template rendering! Less GC! JDBC database driver performance issue with JRuby?
  69. 69. Alternative Ruby Why faster?
  70. 70. Alternative Ruby
      Things I usually see in the profiler after optimizing:
          %self  self  calls  name
           2.73  0.05    351  Range#each-1
           2.73  0.05  33822  Hash#[]=
           2.19  0.04      4  Acts::AdvancedTree::Tree#walk_tree
           2.19  0.04  44076  Hash#[]
           1.64  0.03   1966  Array#each-1
           1.64  0.03    378  Org#pricing_plan
           1.64  0.03   1743  Array#each
           1.09  0.02   1688  ActiveRecord::AttributeMethods#respond_to?
           1.09  0.02   1311  Hash#each
           1.09  0.02   6180  ActiveRecord::AttributeMethods#read_attribute_before_typecast
           1.09  0.02  13725  Fixnum#==
           1.09  0.02  46736  Array#[]
           1.09  0.02  15631  String#to_s
           1.09  0.02  24330  String#concat
           1.09  0.02    916  ActiveRecord::Associations#association_instance_get
           1.09  0.02    242  ActionView::Helpers::NumberHelper#number_with_precision
           1.09  0.02   7417  Fixnum#to_s
  71. 71. Alternative Ruby
      Number of method calls during one request:
          50 000 - Array
          35 000 - Hash
          25 000 - String
      Slow classes written in Ruby: Date, Rational
  72. 72. Alternative Ruby
      Alternative Rubies optimize mostly:
      - the cost of a function call
      - complex computations in pure Ruby
      - memory, by not keeping the source code AST
  78. 78. Alternative Ruby
      So, shall I use an alternative Ruby? Definitely yes!... but:
      - JRuby: if your application works with it (run requests hundreds of times to check)
      - Ruby 1.9: if all gems you need are ported
  87. 87. Optimizing For Shared Environment
      Issues we experienced deploying on Engine Yard:
      1) VPS is just too damn slow
      2) VPS may have too little memory to run the request!
      3) shared database server is a problem
      4) network filesystem may cause harm as well
  88. 88. Optimizing For Shared Environment
      VPS may have too little memory to run the request.
      Think 512M should be enough? Think again. We saw requests that took 1G of memory!
      Solutions:
      - buy more memory
      - optimize memory
      - set memory limits for mongrels (with monit)
  91. 91. Optimizing For Shared Environment You're competing for cache on a shared server: 1. two databases with equal load share the cache
  92. 92. Optimizing For Shared Environment You're competing for memory cache on a shared server: 2. one of the databases gets more load and wins the cache
  93. 93. Optimizing For Shared Environment
      As a result, your database can always be in a "cold" state, and you read data from disk, not from memory!
      Complex query which works on 230 000 rows and does 9 subselects / joins:
          from disk:   28 sec
          from memory: 2.42 sec
      Solutions:
      - optimize for the cold state
      - push down SQL conditions
      To reproduce the cold state on Linux:
          echo 3 | sudo tee /proc/sys/vm/drop_caches
  94. 94. Optimizing For Shared Environment
      fstat() is slow on network filesystem (GFS)
      Request to render the list of tasks in Acunote:
          on development box: 0.50 sec
          on production box:  0.50 - 2.50 sec
  95. 95. Optimizing For Shared Environment
      fstat() is slow on network filesystem (GFS)
      We couldn't figure out why until we ran strace. We used:
      a) filesystem store for fragment caching
      b) expire_fragment(regexp)
      The latter looked through all cache directories, even though we knew the cache was located in only one specific subdir.
  96. 96. Optimizing For Shared Environment
      fstat() is slow on network filesystem (GFS)
      Solution: memcached instead of the filesystem.
      If the filesystem is OK for you, here's a trick: http://blog.pluron.com/2008/07/hell-is-paved-w.html
  105. 105. Live Debugging
      To see what's wrong on a "live" application:
      - For Linux: strace and oprofile
      - For Mac and Solaris: dtrace
      - For Windows: uhm... about time to switch ;)
      To monitor for known problems:
      - monit
      - nagios
      - own scripts to analyze application logs
  114. 114. Load Balancing
      The problem of round-robin and fair load balancing: per-process queues
      [Diagram: Rails App 1, Rails App 2 and Rails App 3, each with its own queue of numbered requests]
  116. 116. Load Balancing
      Solution: the global queue (mod_rails / Passenger)
      [Diagram: one shared queue of requests 1 to 5 in front of Rails App 1, Rails App 2 and Rails App 3]
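The effect of per-process queues can be illustrated with a toy simulation. This is a sketch under simplifying assumptions, not a model of nginx or Passenger: all requests arrive at the same moment, durations are known up front, and the numbers in `DURATIONS` are invented (one slow request followed by fast ones).

```ruby
# Hypothetical request durations in seconds, in arrival order.
DURATIONS = [5.0, 0.1, 0.1, 0.1, 0.1, 0.1]
WORKERS = 3

# Round-robin: request i is bound to worker i % n on arrival and waits in
# that worker's private queue, even if other workers sit idle.
# Returns the finish time of each request.
def round_robin_finish_times(durations, workers)
  busy_until = Array.new(workers, 0.0)
  durations.each_with_index.map do |duration, i|
    busy_until[i % workers] += duration
  end
end

# Global queue: requests wait in one shared queue and each one goes to
# whichever worker becomes free first.
def global_queue_finish_times(durations, workers)
  busy_until = Array.new(workers, 0.0)
  durations.map do |duration|
    w = busy_until.each_with_index.min.last  # index of the earliest-free worker
    busy_until[w] += duration
  end
end

rr = round_robin_finish_times(DURATIONS, WORKERS)
gq = global_queue_finish_times(DURATIONS, WORKERS)
puts "round-robin:  #{rr.map { |t| t.round(1) }.inspect}"
puts "global queue: #{gq.map { |t| t.round(1) }.inspect}"
```

With round-robin, the fourth request lands behind the slow one and finishes only after it, although two workers are idle. With a global queue no fast request waits for the slow one.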
  117. 117. Load Balancing
      Dedicated queues for long-running requests
      [Diagram: nginx dedicated queues; a queue for long-running requests next to the regular per-process queues of Rails App 1, Rails App 2 and Rails App 3]
  118. 118. Load Balancing
      nginx configuration for dedicated queues:
          upstream mongrel {
            server 127.0.0.1:5000;
            server 127.0.0.1:5001;
          }
          upstream rss_mongrel {
            server 127.0.0.1:5002;
          }
          server {
            location / {
              location ~ ^/feeds/(rss|atom) {
                if (!-f $request_filename) {
                  proxy_pass http://rss_mongrel;
                  break;
                }
              }
              if (!-f $request_filename) {
                proxy_pass http://mongrel;
              }
            }
          }
  127. 127. Optimize For The User: HTTP
      Things to consider:
      - Gzip HTML, CSS and JS
      - Minify JS
      - Collect JS and CSS (javascript_include_tag :all, :cache => true)
      - Far future expires headers for JS, CSS, images
      - Sprites
      - Cache-Control: public
      - everything else YSlow tells you
      [Chart: 5% / 95%; labels: Network and Frontend, Backend]
  142. 142. Optimize Frontend: Javascript
      Things you don't want to hear from your users:
      "...Your server is slow..." said the user after clicking on the link to show a form with plain javascript (no AJAX)
  143. 143. Optimize Frontend: Javascript
      Known hotspots in Javascript:
      - eval()
      - all DOM operations: avoid if possible, for example
          - use element.className instead of element.readAttribute('class')
          - use element.id instead of element.readAttribute('id')
      - $$() selectors, especially attribute selectors: may be expensive, measure first
          - example: $$('#some .listing td a.popup[accesslink]')
          - use getElementsByTagName() and iterate results instead
      - element.style.* changes: change the class instead
      - $() and getElementById on large (~20000 elements) pages
  152. 152. Optimize Frontend: IE
      Slow things that are especially slow in IE:
      - $() and $$(), even on small pages
      - getElementsByName()
      - style switching
  153. 153. Optimize Frontend: IE
      Good things about IE:
      - profiler in IE8
      - fast in IE => fast everywhere else!
  154. 154. Keep It Fast!
      So, you've optimized your application. How to keep it fast?
  155. 155. Keep It Fast!
      - Measure, measure and measure...
      - Use profiler
      - Optimize CPU and memory
      - Performance regression tests
  156. 156. Keep It Fast: Measure
      Keep a set of benchmarks for most frequent user requests. For example:
          Benchmark Burndown 120             0.70 ± 0.00
          Benchmark Inc. Burndown 120        0.92 ± 0.01
          Benchmark Sprint 20 x (1+5) (C)    0.45 ± 0.00
          Benchmark Issues 100 (C)           0.34 ± 0.00
          Benchmark Prediction 120           0.56 ± 0.00
          Benchmark Progress 120             0.23 ± 0.00
          Benchmark Sprint 20 x (1+5)        0.93 ± 0.00
          Benchmark Timeline 5x100           0.11 ± 0.00
          Benchmark Signup                   0.77 ± 0.00
          Benchmark Export                   0.20 ± 0.00
          Benchmark Move Here 20/120         0.89 ± 0.00
          Benchmark Order By User            0.98 ± 0.00
          Benchmark Set Field (EP)           0.21 ± 0.00
          Benchmark Task Create + Tag        0.23 ± 0.00
          ... 30 more ...
  157. 157. Keep It Fast: Measure
      Benchmarks as a special kind of tests:
          class RenderingTest < ActionController::IntegrationTest
            def test_sprint_rendering
              login_with users(:user), "user"
              benchmark :title => "Sprint 20 x (1+5) (C)",
                        :route => "projects/1/sprints/3/show",
                        :assert_template => "tasks/index"
            end
          end
      Output:
          Benchmark Sprint 20 x (1+5) (C) 0.45 ± 0.00
  158. 158. Keep It Fast: Measure
      Benchmarks as a special kind of tests:
          def benchmark(options = {})
            (0..100).each do |i|
              GC.start
              # each measurement runs in a forked child process
              pid = fork do
                begin
                  out = File.open("values", "a")
                  ActiveRecord::Base.transaction do
                    elapsed_time = Benchmark.realtime do
                      request_method = options[:post] ? :post : :get
                      send(request_method, options[:route])
                    end
                    out.puts elapsed_time if i > 0   # the first run is not recorded
                    out.close
                    raise CustomTransactionError     # roll back what the request changed
                  end
                rescue CustomTransactionError
                  exit
                end
              end
              Process.waitpid pid
              ActiveRecord::Base.connection.reconnect!
            end
            values = File.read("values")
            print "#{mean(values).to_02f} ± #{sigma(values).to_02f}"
          end
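The slides call `mean` and `sigma` on the contents of the "values" file without showing them. A possible sketch of such helpers, assuming the file holds one elapsed time per line as written by the `benchmark` method; `parse_values` is a made-up name, `sigma` is taken here to be the population standard deviation, and `format` stands in for the author's `to_02f` helper, which is not shown either.

```ruby
# Turn the raw file contents ("0.44\n0.45\n...") into an array of floats.
def parse_values(text)
  text.split.map { |v| Float(v) }
end

# Arithmetic mean of the recorded timings.
def mean(text)
  values = parse_values(text)
  values.sum / values.size
end

# Population standard deviation of the recorded timings (an assumption;
# the slides do not say which estimator was used).
def sigma(text)
  values = parse_values(text)
  m = values.sum / values.size
  Math.sqrt(values.sum { |v| (v - m)**2 } / values.size)
end

sample = "0.44\n0.45\n0.46\n"
puts format("%.2f ± %.2f", mean(sample), sigma(sample))
```

Reporting the spread next to the mean is what makes a line such as "0.45 ± 0.00" trustworthy: a large sigma would indicate noisy measurements rather than a real regression.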
  159. 159. Keep It Fast: Query Testing
      Losing 10ms in a benchmark might seem OK.
      Except that sometimes it's not, because you're running one more SQL query.
  160. 160. Keep It Fast: Query Testing
          def test_queries
            queries = track_queries do
              get :index
            end
            # expected list first, actual list second
            assert_equal [ "Foo Load", "Bar Load", "Event Create" ], queries
          end
  161. 161. Keep It Fast: Query Testing
          module ActiveSupport
            class BufferedLogger
              attr_reader :tracked_queries
              def tracking=(val)
                @tracked_queries = []
                @tracking = val
              end
              # record the query name ("Foo Load") from each colorized log line
              def debug_with_tracking(message)
                @tracked_queries << $1 if @tracking && message =~ /3[56];1m(.* (Load|Create|Update|Destroy)) \(/
                debug_without_tracking(message)
              end
              alias_method_chain :debug, :tracking
            end
          end
          class ActiveSupport::TestCase
            def track_queries(&block)
              RAILS_DEFAULT_LOGGER.tracking = true
              yield
              result = RAILS_DEFAULT_LOGGER.tracked_queries
              RAILS_DEFAULT_LOGGER.tracking = false
              result
            end
          end
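How the regular expression picks query names out of the log can be tried outside Rails. The sketch below assumes the Rails 2.x development log format, in which the query name is wrapped in an ANSI color escape such as "\e[4;36;1m" and followed by the timing in parentheses; the sample log lines and the names `QUERY_NAME` and `tracked_query_names` are made up for illustration.

```ruby
# Matches the tail of the ANSI color code ("36;1m" or "35;1m"), then captures
# the query name up to the opening parenthesis of the timing.
QUERY_NAME = /3[56];1m(.* (Load|Create|Update|Destroy)) \(/

# Returns the query names found in the given log lines, in order.
def tracked_query_names(log_lines)
  log_lines.map { |line| line[QUERY_NAME, 1] }.compact
end

# Hypothetical log lines in the assumed format.
SAMPLE_LOG = [
  "  \e[4;36;1mFoo Load (0.3ms)\e[0m   \e[0;1mSELECT * FROM foos\e[0m",
  "  \e[4;35;1mEvent Create (0.5ms)\e[0m   \e[0mINSERT INTO events ...\e[0m",
  "Rendering foos/index",
]

p tracked_query_names(SAMPLE_LOG)
```

Because the test compares the whole ordered list of names, an extra query introduced by a code change fails the test even when the timing difference is too small to notice.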
  162. 162. Keep It Fast: Use Profiler
      Profiler will always tell you what's wrong:
          %self  total  self  child  calls  name
           8.39   0.54  0.23   0.31    602  Array#each_index
           7.30   0.41  0.20   0.21   1227  Integer#gcd
           6.20   0.49  0.17   0.32   5760  Timecell#date
           5.11   0.15  0.14   0.01      1  Magick::Image#to_blob
      gem install ruby-prof
      KCachegrind to visualize the results: http://kcachegrind.sourceforge.net
  163. 163. Keep It Fast: Use Profiler
  164. 164. Keep It Fast: Optimize CPU and Memory
      Memory profiler will explain the missing details. Example benchmark:
          request time:    5.52 sec
          consumed memory: 55M
          GC time:         1.07 sec
      Ruby runs GC after allocating 8M of memory or doing 10000 allocations.
      Simple math: 55 / 8 ≈ 6 GC calls. Each GC call takes 0.18 sec!
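The "simple math" on this slide, written out as a short calculation. The inputs are the figures from the slide; the method name `gc_breakdown` and the extra "share of the request" figure are additions for illustration.

```ruby
# Estimate the number of GC runs and the cost of each from the slide's figures.
def gc_breakdown(request_sec:, gc_sec:, allocated_mb:, mb_per_gc:)
  runs = allocated_mb / mb_per_gc   # integer division, rounded down as on the slide
  {
    runs: runs,
    sec_per_run: (gc_sec / runs).round(2),
    percent_of_request: (100 * gc_sec / request_sec).round,
  }
end

p gc_breakdown(request_sec: 5.52, gc_sec: 1.07, allocated_mb: 55, mb_per_gc: 8)
```

Read this way, roughly a fifth of the request time goes to garbage collection, which is why reducing allocations speeds up a request even when the CPU profile looks clean.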
  165. 165. Keep It Fast: Optimize CPU and Memory
      How to use the memory profiler:
      - Recompile Ruby with the GC patch: http://www.acunote.com/system/ruby186-p287-gc.patch
      - Reinstall ruby-prof
      - Use RUBY_PROF_MEASURE_MODE=memory when running ruby-prof
      Details: http://blog.pluron.com/2008/02/memory-profilin.html
  166. 166. Remember!
      - Measure, measure, measure... (with ruby-prof)
      - Optimize only what's slow
      - Optimize not only CPU, but memory
      - Optimize for user experience
      - Keep a set of performance regression tests
      - Monitor performance
  167. 167. Thank you! Rails performance articles and more: http://blog.pluron.com
