Performance Optimization of Rails Applications

  • 1. Advanced Performance Optimization of Rails Applications. Serge Smetana, RuPy 2009, www.acunote.com
  • 2. What Am I Optimizing? Acunote (www.acunote.com): online project management and Scrum software, a Ruby on Rails application since its inception in 2006
    • ~5300 companies
    • ~13000 users
    • Hosted on Engine Yard
    • Hosted on customers' servers
    • nginx + mongrel
    • PostgreSQL
  • 8. Performance Degradation Over Time
    [Chart: request time on the development box, in %, April 2008 to July 2008]
    What actually happens: $O(n^c)$. Best case: $O(\log n)$.
  • 9. Solutions? Throw Some Hardware at it!
  • 10. Solutions? Performance Optimization!
  • 11. What to optimize?
  • 12. What To Optimize? Development?
  • 13. What To Optimize? Development AND Production
  • 14. How to optimize?
  • 15. How To Optimize? Three rules of performance optimization
  • 16. Three Rules Of Performance Optimization 1. Measure!
  • 17. Three Rules Of Performance Optimization 2. Optimize only what's slow!
  • 18. Three Rules Of Performance Optimization 3. Optimize for the user!
  • 19. Things To Optimize
    • Development
    • Production
      • Shared filesystems and databases
      • Live debugging
      • Load balancing
    • Frontend
  • 27. Optimizing Ruby: Date Class
    What's wrong with Date?
      > puts Benchmark.realtime { 1000.times { Time.mktime(2009, 5, 6, 0, 0, 0) } }
      0.005
      > puts Benchmark.realtime { 1000.times { Date.civil(2009, 5, 6) } }
      0.080
    16x slower than Time! Why? The profile points at Rational arithmetic, which the pure-Ruby Date uses internally:
      %self  total  self  wait  child  calls  name
       7.23   0.66  0.18  0.00   0.48  18601  <Class::Rational>#reduce
       6.83   0.27  0.17  0.00   0.10   5782  <Class::Date>#jd_to_civil
       6.43   0.21  0.16  0.00   0.05  31528  Rational#initialize
       5.62   0.23  0.14  0.00   0.09  18601  Integer#gcd
  • 28. Optimizing Ruby: Date Class
    Fixing Date: Use C, Luke! The Date::Performance gem has Date partially rewritten in C by Ryan Tomayko (with patches by Alex Dymo in 0.4.7).
      > puts Benchmark.realtime { 1000.times { Time.mktime(2009, 5, 6, 0, 0, 0) } }
      0.005
      > puts Benchmark.realtime { 1000.times { Date.civil(2009, 5, 6) } }
      0.080
      > require 'date/performance'
      > puts Benchmark.realtime { 1000.times { Date.civil(2009, 5, 6) } }
      0.006
    Installing from source:
      git clone git://github.com/rtomayko/date-performance.git
      cd date-performance
      rake package:build
      cd dist && gem install date-performance-0.4.8.gem
  • 29. Optimizing Ruby: Date Class Real-world impact of Date::Performance: Before: 0.95 sec After: 0.65 sec 1.5x!
  • 30. Optimizing Ruby: Misc
    Use String#<< instead of +=
      > long_string = "foo" * 100000
      > Benchmark.realtime { long_string += "foo" }
      0.0003
      > Benchmark.realtime { long_string << "foo" }
      0.000004
    In theory: 75x; in practice: up to 70x.
    Avoid comparing BigDecimal with strings and integers (or, as measured here, floats); compare with a BigDecimal instead
      > n = BigDecimal("4.5")
      > Benchmark.realtime { 10000.times { n <=> 4.5 } }
      0.063
      > Benchmark.realtime { 10000.times { n <=> BigDecimal("4.5") } }
      0.014
    In theory: 4.5x; in practice: 1.15x.
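Why `<<` wins: `+=` is shorthand for `str = str + suffix`, so it allocates a new String, copies the whole 300,000-character buffer into it and rebinds the variable, while `<<` appends to the existing object. The sketch below makes that visible through object identity; the two helper names are made up for illustration, and the printed timings will vary by machine and Ruby version.

```ruby
require 'benchmark'

# Illustrative helper: returns a NEW string, because += means str = str + suffix (a full copy).
def append_with_plus(str, suffix)
  str += suffix
  str
end

# Illustrative helper: returns the SAME string, because << appends to the receiver in place.
def append_in_place(str, suffix)
  str << suffix
end

long_string = "foo" * 100_000
copy = append_with_plus(long_string, "foo")
same = append_in_place(long_string, "foo")
puts "+= kept the same object: #{copy.equal?(long_string)}"
puts "<< kept the same object: #{same.equal?(long_string)}"

puts Benchmark.realtime { 1_000.times { long_string += "foo" } }
puts Benchmark.realtime { 1_000.times { long_string << "foo" } }
```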
  • 39. Optimizing Rails: String Callbacks
    What can be wrong with this code?
      class Task < ActiveRecord::Base
        before_save "some_check()"
      end
      ...
      100.times { Task.create attributes }
    Kernel#binding is called to eval() the string callback. That duplicates your execution context in memory! More memory taken => more time spent in GC.
  • 40. Optimizing Rails: String Callbacks
    What to do:
      class Task < ActiveRecord::Base
        before_save :some_check
      end
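The two dispatch paths can be compared in plain Ruby. `MiniCallbacks`, `StringTask` and `SymbolTask` below are hypothetical and far simpler than ActiveRecord's callback chain. The string path here uses `instance_eval` rather than the `binding`/`eval` pair Rails used, but the overhead has the same source: the string has to be parsed and compiled on every save, while a symbol is just a method name handed to `send`.

```ruby
require 'benchmark'

# Hypothetical, minimal callback runner; not ActiveRecord's implementation.
module MiniCallbacks
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    def before_save_callbacks
      @before_save_callbacks ||= []
    end

    def before_save(callback)
      before_save_callbacks << callback
    end
  end

  def run_before_save
    self.class.before_save_callbacks.each do |callback|
      case callback
      when Symbol then send(callback)          # a plain method call
      when String then instance_eval(callback) # parsed and compiled again on every save
      end
    end
  end
end

class Task
  include MiniCallbacks
  attr_reader :checks

  def initialize
    @checks = 0
  end

  def some_check
    @checks += 1
  end
end

class StringTask < Task
  before_save "some_check()"
end

class SymbolTask < Task
  before_save :some_check
end

puts Benchmark.realtime { 10_000.times { StringTask.new.run_before_save } }
puts Benchmark.realtime { 10_000.times { SymbolTask.new.run_before_save } }
```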
  • 41. Optimizing Rails: Partial Rendering
    Not too uncommon, right?
      <% for object in objects %> <%# 1000 times %>
        <%= render :partial => 'object', :locals => { :object => object } %>
      <% end %>
    We create 1000 View instances here, one for each object! Why?
  • 42. Optimizing Rails: Partial Rendering
    Template inlining to the rescue:
      <% for object in objects %> <%# 1000 times %>
        <%= render :partial => 'object', :locals => { :object => object }, :inline => true %>
      <% end %>
    [Diagram: list.rhtml and its _object.rhtml partials, without and with inlining]
  • 43. Optimizing Rails: Partial Rendering
    Template Inliner plugin: http://github.com/acunote/template_inliner/
    Real-world effect of template inlining, rendering 300 objects with 5 partials each:
      without inlining: 0.89 sec
      with inlining: 0.75 sec
      1.2x
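The idea behind inlining can be sketched with Ruby's standard ERB library. This is not the Template Inliner plugin's code, and the template strings and method names are hypothetical: `render_with_partials` compiles and evaluates the partial once per object (roughly the per-object cost of a partial render), while `render_inlined` pastes the partial's source into the list template's loop, so the whole list is compiled once.

```ruby
require 'erb'

# Hypothetical templates standing in for list.rhtml and _object.rhtml.
PARTIAL = "<li><%= object[:name] %></li>\n"
LIST_HEADER = "<ul>\n"
LIST_FOOTER = "</ul>\n"

# Partial per object: a new ERB template is compiled and evaluated for every object.
def render_with_partials(objects)
  items = objects.map { |object| ERB.new(PARTIAL).result_with_hash(object: object) }
  LIST_HEADER + items.join + LIST_FOOTER
end

# Inlined partial: the partial's source sits inside the list template's loop,
# so the whole list is compiled once and evaluated in a single pass.
INLINED_LIST = ERB.new(
  LIST_HEADER + "<% objects.each do |object| %>" + PARTIAL + "<% end %>" + LIST_FOOTER
)

def render_inlined(objects)
  INLINED_LIST.result_with_hash(objects: objects)
end

objects = Array.new(3) { |i| { name: "Task #{i + 1}" } }
puts render_inlined(objects)
puts render_inlined(objects) == render_with_partials(objects)
```

Both functions produce identical HTML; only the amount of per-object compilation and setup differs.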
  • 52. Optimizing Database How to optimize PostgreSQL: explain analyze explain analyze explain analyze ...
  • 53. Optimizing Database: PostgreSQL Tips
    EXPLAIN ANALYZE explains everything, but... run it for the "cold" database state too!
    Example: a complex query that works on 230 000 rows and does 9 subselects/joins:
      cold state: 28 sec
      hot state: 2.42 sec
    Restarting the database server doesn't help, because the operating system's disk cache survives the restart. You need to clear the disk cache (Linux):
      sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
  • 54. Optimizing Database: PostgreSQL Tips
    Use = ANY(ARRAY(...)) instead of IN (...) to force a subselect and avoid a join
      explain analyze select * from issues where id in (select issue_id from tags_issues);
                                     QUERY PLAN
      ------------------------------------------------------------------------------
       Merge IN Join (actual time=0.096..576.704 rows=55363 loops=1)
         Merge Cond: (issues.id = tags_issues.issue_id)
         ->  Index Scan using issues_pkey on issues (actual time=0.027..270.557 rows=229991 loops=1)
         ->  Index Scan using tags_issues_issue_id_key on tags_issues (actual time=0.051..73.903 rows=70052 loops=1)
       Total runtime: 605.274 ms
      explain analyze select * from issues where id = any( array( (select issue_id from tags_issues) ) );
                                     QUERY PLAN
      ------------------------------------------------------------------------------
       Bitmap Heap Scan on issues (actual time=247.358..297.932 rows=55363 loops=1)
         Recheck Cond: (id = ANY ($0))
         InitPlan
           ->  Seq Scan on tags_issues (actual time=0.017..51.291 rows=70052 loops=1)
         ->  Bitmap Index Scan on issues_pkey (actual time=246.589..246.589 rows=70052 loops=1)
               Index Cond: (id = ANY ($0))
       Total runtime: 325.205 ms
    2x!
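From Rails code, the rewrite amounts to a different SQL fragment in the conditions. `any_array_condition` and `in_condition` are hypothetical helpers (not part of Rails) that build the two forms compared above from a column name and a subquery string. The commented `Issue.find` call only indicates where such a fragment would go in Rails 2.x-style code and is not exercised here.

```ruby
# Hypothetical helper: "column = ANY(ARRAY(subquery))", the form that forces a subselect.
# The subquery is interpolated verbatim, so it must be trusted SQL, never user input.
def any_array_condition(column, subquery)
  "#{column} = ANY(ARRAY(#{subquery}))"
end

# Hypothetical helper: the plain IN form, which PostgreSQL may turn into a join.
def in_condition(column, subquery)
  "#{column} IN (#{subquery})"
end

subquery = "SELECT issue_id FROM tags_issues"
puts in_condition("issues.id", subquery)
puts any_array_condition("issues.id", subquery)

# Sketch of use in a Rails 2.x-era model (not run here):
#   Issue.find(:all, :conditions => any_array_condition("issues.id", subquery))
```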
  • 55. Database Optimization: PostgreSQL Tips
    Push down conditions into subselects and joins; PostgreSQL often won't do that for you.
    Schema:
      Issues: id serial, name varchar, org_id integer
      Notes: id serial, name varchar, issue_id integer, org_id integer
    Before:
      select *, ( select notes.author from notes where notes.issue_id = issues.id ) as note_authors
      from issues where org_id = 1
    After:
      select *, ( select notes.author from notes where notes.issue_id = issues.id and notes.org_id = 1 ) as note_authors
      from issues where org_id = 1
  • 64. Alternative Ruby Everybody says "JRuby and Ruby 1.9 are faster". Is that true in production?
  • 65. Alternative Ruby In short, YES!
    = Acunote Benchmarks = (time)
                                  MRI    JRuby   1.9.1
      Date/Time Intensive Ops     1.79   0.67    0.62
      Rendering Intensive Ops     0.59   0.44    0.40
      Calculations Intensive Ops  2.36   1.79    1.79
      Database Intensive Ops      4.87   4.63    3.66
  • 66. Alternative Ruby In short, YES!
    = Acunote Benchmarks = (speedup relative to MRI)
                                  MRI    JRuby   1.9.1
      Date/Time Intensive Ops     1x     2.6x    2.9x
      Rendering Intensive Ops     1x     1.3x    1.5x
      Calculations Intensive Ops  1x     1.3x    1.3x
      Database Intensive Ops      1x     1x      1.3x
    JRuby: 1.55x faster
    Ruby 1.9: 1.75x faster
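Each speedup in the second table is the MRI time divided by the alternative Ruby's time, and the summary line is the plain average of the four speedups. The short sketch below recomputes them from the first table (the helper names are illustrative). Working from the unrounded ratios, Ruby 1.9.1 comes out at about 1.75x as on the slide. JRuby comes out nearer 1.6x, because the slide's per-row JRuby factors are rounded down (for example 2.67 appears as 2.6).

```ruby
# Times from the first "Acunote Benchmarks" table.
TIMES = {
  "Date/Time Intensive Ops"    => { mri: 1.79, jruby: 0.67, ruby19: 0.62 },
  "Rendering Intensive Ops"    => { mri: 0.59, jruby: 0.44, ruby19: 0.40 },
  "Calculations Intensive Ops" => { mri: 2.36, jruby: 1.79, ruby19: 1.79 },
  "Database Intensive Ops"     => { mri: 4.87, jruby: 4.63, ruby19: 3.66 }
}.freeze

# Illustrative helper: speedup of one implementation relative to MRI, per benchmark.
def speedups(times, impl)
  times.transform_values { |row| row[:mri] / row[impl] }
end

# Illustrative helper: plain arithmetic mean of the per-benchmark speedups.
def mean_speedup(times, impl)
  values = speedups(times, impl).values
  values.sum / values.size
end

%i[jruby ruby19].each do |impl|
  speedups(TIMES, impl).each { |name, s| puts format("%-28s %-7s %.2fx", name, impl, s) }
  puts format("%s average: %.2fx", impl, mean_speedup(TIMES, impl))
end
```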
  • 68. Alternative Ruby What is faster?
    Acunote Copy Tasks Benchmark
                                MRI    JRuby   1.9.1
      Request Time              5.52   4.45    3.24
      Template Rendering Time   0.35   0.21    0.21
      Database Time             0.70   1.32    0.69
      GC Time                   1.07   N/A     0.62
    Faster template rendering! Less GC! A JDBC database driver performance issue with JRuby?
  • 69. Alternative Ruby Why faster?
  • 70. Alternative Ruby
    Things I usually see in the profiler after optimizing:
      %self  self  calls  name
       2.73  0.05    351  Range#each-1
       2.73  0.05  33822  Hash#[]=
       2.19  0.04      4  Acts::AdvancedTree::Tree#walk_tree
       2.19  0.04  44076  Hash#[]
       1.64  0.03   1966  Array#each-1
       1.64  0.03    378  Org#pricing_plan
       1.64  0.03   1743  Array#each
       1.09  0.02   1688  ActiveRecord::AttributeMethods#respond_to?
       1.09  0.02   1311  Hash#each
       1.09  0.02   6180  ActiveRecord::AttributeMethods#read_attribute_before_typecast
       1.09  0.02  13725  Fixnum#==
       1.09  0.02  46736  Array#[]
       1.09  0.02  15631  String#to_s
       1.09  0.02  24330  String#concat
       1.09  0.02    916  ActiveRecord::Associations#association_instance_get
       1.09  0.02    242  ActionView::Helpers::NumberHelper#number_with_precision
       1.09  0.02   7417  Fixnum#to_s
  • 71. Alternative Ruby
    Number of method calls during one request:
      50 000 - Array
      35 000 - Hash
      25 000 - String
    Slow classes written in Ruby: Date, Rational
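Per-request call counts like these can be collected on MRI 2.0 and later with `TracePoint`, which did not exist in the Ruby 1.8 the talk used. A minimal sketch: `count_calls` and the `Sprint` class are hypothetical, and since core methods are written in C or in Ruby depending on the version, the hook listens to both `:call` and `:c_call` events.

```ruby
# Hypothetical helper: counts method calls made while the block runs, keyed by "Class#method".
# Listens to both Ruby-level (:call) and C-level (:c_call) calls.
def count_calls
  counts = Hash.new(0)
  trace = TracePoint.new(:call, :c_call) do |tp|
    counts["#{tp.defined_class}##{tp.method_id}"] += 1
  end
  trace.enable { yield }
  counts
end

# Hypothetical model standing in for request code.
class Sprint
  def initialize(tasks)
    @tasks = tasks
  end

  def total_estimate
    @tasks.sum { |task| estimate_of(task) }
  end

  private

  def estimate_of(task)
    task[:estimate]
  end
end

sprint = Sprint.new(Array.new(100) { |i| { estimate: i } })
counts = count_calls { sprint.total_estimate }
counts.sort_by { |_, n| -n }.first(5).each { |name, n| puts "#{n.to_s.rjust(6)}  #{name}" }
```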
  • 72. Alternative Ruby Alternative Rubies mostly optimize:
    • the cost of function calls
    • complex computations in pure Ruby
    • memory, by not keeping the source code AST
  • 78. Alternative Ruby So, shall I use an alternative Ruby? Definitely yes!... but:
    JRuby: if your application works with it (run requests hundreds of times to check)
    Ruby 1.9: if all the gems you need are ported
  • 87. Optimizing For Shared Environment Issues we experienced deploying on Engine Yard: 1) VPS is just too damn slow 2) VPS may have too little memory to run the request! 3) shared database server is a problem 4) network filesystem may cause harm as well
  • 88. Optimizing For Shared Environment VPS may have too little memory to run the request. Think 512M should be enough? Think again. We saw requests that took 1G of memory! Solutions:
    • buy more memory
    • optimize memory
    • set memory limits for mongrels (with monit)
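monit enforces such limits from outside the process; the check itself is simple. The sketch below reads a process's resident set size from Linux's `/proc/<pid>/status` (the `VmRSS:` line, reported in kB) and compares it with a limit. The helper names and the 512 MB figure are illustrative assumptions, not Acunote's setup, and the sketch only reports: restarting an oversized mongrel is left to monit or whatever supervises it.

```ruby
# Hypothetical helper: parses the VmRSS line of /proc/<pid>/status text. Returns kB or nil.
def rss_kb_from_status(status_text)
  line = status_text.lines.find { |l| l.start_with?("VmRSS:") }
  line && line.split[1].to_i
end

# Resident set size of a live process, or nil if it can't be read (e.g. not on Linux).
def rss_kb(pid)
  rss_kb_from_status(File.read("/proc/#{pid}/status"))
rescue Errno::ENOENT, Errno::EACCES
  nil
end

LIMIT_KB = 512 * 1024 # hypothetical per-mongrel limit: 512 MB

def over_limit?(pid, limit_kb = LIMIT_KB)
  rss = rss_kb(pid)
  !rss.nil? && rss > limit_kb
end

puts "this process: #{rss_kb(Process.pid).inspect} kB, over limit? #{over_limit?(Process.pid)}"
```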
  • 91. Optimizing For Shared Environment You're competing for cache on a shared server: 1. two databases with equal load share the cache
  • 92. Optimizing For Shared Environment You're competing for memory cache on a shared server: 2. one of the databases gets more load and wins the cache
  • 93. Optimizing For Shared Environment As a result, your database can always be in a "cold" state, and you read data from disk, not from memory!
    A complex query that works on 230 000 rows and does 9 subselects/joins:
      from disk: 28 sec
      from memory: 2.42 sec
    Solutions:
    • optimize for the cold state
    • push down SQL conditions
    • reproduce the cold state locally: sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
  • 94. Optimizing For Shared Environment fstat() is slow on network filesystem (GFS) Request to render list of tasks in Acunote: on development box: 0.50 sec on production box: 0.50 - 2.50 sec
  • 95. Optimizing For Shared Environment fstat() is slow on network filesystem (GFS). We couldn't figure out why until we ran strace. We used a) the filesystem store for fragment caching and b) expire_fragment(regexp). The latter looked through all cache directories, even though we knew the cache was located in only one specific subdirectory.
  • 96. Optimizing For Shared Environment fstat() is slow on network filesystem (GFS). Solution: use memcached instead of the filesystem store. If you'd rather stay on the filesystem store, here's a trick: http://blog.pluron.com/2008/07/hell-is-paved-w.html
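The `fstat()` storm came from matching the regexp against every file in every cache directory. When the cache layout is known, the scan can be confined to the one subdirectory that can hold matches. A sketch with hypothetical names (`expire_fragments_in`, the `views/projects/1` layout). It is not Rails' `expire_fragment`, and not necessarily the trick described in the linked post.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: deletes cached fragment files below cache_root/subdir whose path
# (relative to cache_root) matches pattern. Only that subdirectory is walked, so files
# elsewhere in the cache are never listed or stat()ed.
def expire_fragments_in(cache_root, subdir, pattern)
  base = File.join(cache_root, subdir)
  return [] unless File.directory?(base)

  Dir.glob(File.join(base, "**", "*")).sort.filter_map do |path|
    next unless File.file?(path)

    relative = path.delete_prefix("#{cache_root}/")
    next unless relative.match?(pattern)

    File.delete(path)
    relative
  end
end

# Usage against a throwaway cache directory:
Dir.mktmpdir do |root|
  %w[views/projects/1/tasks.cache views/projects/1/sprint.cache views/users/tasks.cache].each do |f|
    FileUtils.mkdir_p(File.dirname(File.join(root, f)))
    File.write(File.join(root, f), "cached html")
  end
  p expire_fragments_in(root, "views/projects/1", /tasks/) # only the project's tasks fragment
end
```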
  • 105. Live Debugging
    To see what's wrong on a "live" application:
    • Linux: strace and oprofile
    • Mac and Solaris: dtrace
    • Windows: uhm... about time to switch ;)
    To monitor for known problems:
    • monit
    • nagios
    • your own scripts to analyze application logs
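A minimal log-scanning script of the "own scripts" kind. It assumes the Rails 2.2/2.3-style production log line `Completed in 2345ms (View: 1200, DB: 300) | 200 OK [http://...]`; other Rails versions format this line differently, so the regular expression is an assumption to adjust. `slow_requests` and the 2-second threshold are hypothetical.

```ruby
# Assumed log format (Rails 2.2/2.3 style):
#   Completed in 2345ms (View: 1200, DB: 300) | 200 OK [http://example.com/projects/1/sprints/3]
COMPLETED = /\ACompleted in (\d+)ms .*\[(\S+)\]/

# Hypothetical helper: [[ms, url], ...] for requests slower than threshold_ms, slowest first.
def slow_requests(lines, threshold_ms = 2000)
  lines.filter_map do |line|
    m = COMPLETED.match(line)
    next unless m

    ms = m[1].to_i
    [ms, m[2]] if ms > threshold_ms
  end.sort_by { |ms, _| -ms }
end

# Run as: ruby slow_requests.rb log/production.log
if $PROGRAM_NAME == __FILE__ && ARGV[0]
  slow_requests(File.foreach(ARGV[0])).first(20).each do |ms, url|
    puts "#{ms}ms #{url}"
  end
end
```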
  • 114. Load Balancing The problem of round-robin and fair load balancing: per-process queues
    [Diagram: a separate request queue in front of each of Rails App 1, Rails App 2 and Rails App 3]
  • 116. Load Balancing Solution: the global queue (mod_rails / Passenger)
    [Diagram: a single global queue feeding Rails App 1, Rails App 2 and Rails App 3]
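Why a global queue helps can be shown with a tiny deterministic simulation, assuming all requests arrive at once and durations are in arbitrary units. The function names are hypothetical, and only round-robin is modeled here, not "fair" balancing. With round-robin, each request is pinned to a worker's queue up front. With a global queue, each request goes to whichever worker frees up first.

```ruby
# All requests arrive at time 0; durations are in arbitrary time units.

# Hypothetical: round-robin, request i is queued on worker i % n behind that worker's earlier requests.
def round_robin_finish_times(durations, workers)
  clocks = Array.new(workers, 0)
  durations.each_with_index.map do |duration, i|
    w = i % workers
    clocks[w] += duration
  end
end

# Hypothetical: global queue, each request (in arrival order) goes to the worker that frees up first.
def global_queue_finish_times(durations, workers)
  clocks = Array.new(workers, 0)
  durations.map do |duration|
    w = clocks.index(clocks.min)
    clocks[w] += duration
  end
end

durations = [10, 1, 1, 1, 1, 1] # one long request followed by short ones
rr = round_robin_finish_times(durations, 3)
gq = global_queue_finish_times(durations, 3)
puts "round-robin:  #{rr.inspect}, total #{rr.sum}"  # [10, 1, 1, 11, 2, 2], total 27
puts "global queue: #{gq.inspect}, total #{gq.sum}"  # [10, 1, 1, 2, 2, 3], total 19
```

The fourth request is short, yet under round-robin it finishes at 11 because it is stuck behind the long request; with the global queue it finishes at 2.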
  • 117. Load Balancing Dedicated queues for long-running requests
    [Diagram: nginx sends long-running requests to a dedicated queue and everything else to the regular per-process queues of Rails App 1, Rails App 2 and Rails App 3]
  • 118. Load Balancing nginx configuration for dedicated queues:
      upstream mongrel {
        server 127.0.0.1:5000;
        server 127.0.0.1:5001;
      }
      upstream rss_mongrel {
        server 127.0.0.1:5002;
      }
      server {
        location / {
          location ~ ^/feeds/(rss|atom) {
            if (!-f $request_filename) {
              proxy_pass http://rss_mongrel;
              break;
            }
          }
          if (!-f $request_filename) {
            proxy_pass http://mongrel;
          }
        }
      }
  • 127. Optimize For The User: HTTP Things to consider:
    • Gzip HTML, CSS and JS
    • Minify JS
    • Combine JS and CSS files (javascript_include_tag :all, :cache => true)
    • Far-future Expires headers for JS, CSS, images
    • Sprites
    • Cache-Control: public
    • everything else YSlow tells you
    [Chart: time split between "Network and Frontend" and "Backend" (5% and 95%)]
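Far-future caching only works if a changed file gets a new URL. A minimal sketch with hypothetical helper names: `versioned_asset_path` appends the file's modification time as a query string (similar in spirit to the asset timestamps Rails 2.x added to asset URLs), and `far_future_headers` builds one-year `Cache-Control: public` and `Expires` headers. One year is used because HTTP/1.1 (RFC 2616) says servers should not send `Expires` dates more than a year in the future.

```ruby
require 'time'

ONE_YEAR = 365 * 24 * 60 * 60 # seconds

# Hypothetical helper: a new mtime yields a new URL, so stale cached copies are never requested.
def versioned_asset_path(path, mtime)
  "#{path}?#{mtime.to_i}"
end

# Hypothetical helper: headers that let browsers and proxies cache the response for a year.
def far_future_headers(now = Time.now)
  {
    "Cache-Control" => "public, max-age=#{ONE_YEAR}",
    "Expires" => (now + ONE_YEAR).utc.httpdate
  }
end

puts versioned_asset_path("/javascripts/all.js", Time.utc(2009, 6, 1))
far_future_headers.each { |name, value| puts "#{name}: #{value}" }
```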
  • 142. Optimize Frontend: Javascript Things you don't want to hear from your users: "...Your server is slow..." said the user after clicking on the link to show a form with plain JavaScript (no AJAX)
  • 143. Optimize Frontend: Javascript Known hotspots in JavaScript:
    - eval()
    - all DOM operations: avoid them if possible, for example:
      - use element.className instead of element.readAttribute('class')
      - use element.id instead of element.readAttribute('id')
    - $$() selectors, especially attribute selectors, may be expensive; measure first:
      - $$('#some .listing td a.popup[accesslink]')
      - use getElementsByTagName() and iterate over the results instead
    - element.style.* changes: change the class instead
    - $() and getElementById on large (~20000 elements) pages
  • 152. Optimize Frontend: IE Things that are especially slow in IE:
    - $() and $$(), even on small pages
    - getElementsByName()
    - style switching
  • 153. Optimize Frontend: IE Good things about IE: the profiler in IE8, and if it's fast in IE, it's fast everywhere else!
  • 154. Keep It Fast! So, you've optimized your application. How to keep it fast?
  • 155. Keep It Fast!
    • Measure, measure and measure...
    • Use profiler
    • Optimize CPU and Memory
    • Performance Regression Tests
  • 156. Keep It Fast: Measure Keep a set of benchmarks for the most frequent user requests. For example:
      Benchmark Burndown 120               0.70 ± 0.00
      Benchmark Inc. Burndown 120          0.92 ± 0.01
      Benchmark Sprint 20 x (1+5) (C)      0.45 ± 0.00
      Benchmark Issues 100 (C)             0.34 ± 0.00
      Benchmark Prediction 120             0.56 ± 0.00
      Benchmark Progress 120               0.23 ± 0.00
      Benchmark Sprint 20 x (1+5)          0.93 ± 0.00
      Benchmark Timeline 5x100             0.11 ± 0.00
      Benchmark Signup                     0.77 ± 0.00
      Benchmark Export                     0.20 ± 0.00
      Benchmark Move Here 20/120           0.89 ± 0.00
      Benchmark Order By User              0.98 ± 0.00
      Benchmark Set Field (EP)             0.21 ± 0.00
      Benchmark Task Create + Tag          0.23 ± 0.00
      ... 30 more ...
  • 157. Keep It Fast: Measure Benchmarks as a special kind of tests:
      class RenderingTest < ActionController::IntegrationTest
        def test_sprint_rendering
          login_with users(:user), "user"
          benchmark :title => "Sprint 20 x (1+5) (C)",
                    :route => "projects/1/sprints/3/show",
                    :assert_template => "tasks/index"
        end
      end
    Output:
      Benchmark Sprint 20 x (1+5) (C)      0.45 ± 0.00
  • 158. Keep It Fast: Measure Benchmarks as a special kind of tests:
      class CustomTransactionError < StandardError; end
      def benchmark(options = {})
        File.delete("values") if File.exist?("values")  # don't mix in results from earlier runs
        (0..100).each do |i|
          GC.start
          pid = fork do
            begin
              out = File.open("values", "a")
              ActiveRecord::Base.transaction do
                elapsed_time = Benchmark.realtime do
                  request_method = options[:post] ? :post : :get
                  send(request_method, options[:route])
                end
                out.puts elapsed_time if i > 0  # run 0 is a warm-up and is not recorded
                out.close
                raise CustomTransactionError    # roll back whatever the request wrote
              end
            rescue CustomTransactionError
              exit
            end
          end
          Process.waitpid(pid)
          # the forked child used the parent's DB connection; reconnect before the next run
          ActiveRecord::Base.connection.reconnect!
        end
        values = File.readlines("values").map(&:to_f)
        print "#{format('%.2f', mean(values))} ± #{format('%.2f', sigma(values))} "
      end
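The `benchmark` method calls `mean` and `sigma`, which the slide does not define. A minimal version, assuming `sigma` is the sample standard deviation (n - 1 in the denominator); the slide does not say which variant was used.

```ruby
# Arithmetic mean of the recorded request times.
def mean(values)
  values.sum / values.size.to_f
end

# Sample standard deviation (assumed definition of "sigma"); 0.0 for fewer than two values.
def sigma(values)
  return 0.0 if values.size < 2

  m = mean(values)
  Math.sqrt(values.sum { |v| (v - m)**2 } / (values.size - 1))
end

times = [0.45, 0.46, 0.44, 0.45]
puts format("%.2f ± %.2f", mean(times), sigma(times))
```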
  • 159. Keep It Fast: Query Testing Losing 10ms in a benchmark might seem OK, except that sometimes it isn't: the loss may come from running one more SQL query.
  • 160. Keep It Fast: Query Testing
      def test_queries
        queries = track_queries do
          get :index
        end
        assert_equal ["Foo Load", "Bar Load", "Event Create"], queries
      end
  • 161. Keep It Fast: Query Testing
      module ActiveSupport
        class BufferedLogger
          attr_reader :tracked_queries
          def tracking=(val)
            @tracked_queries = []
            @tracking = val
          end
          # remember the name ("Foo Load", "Event Create", ...) of every colored SQL log line
          def debug_with_tracking(message)
            @tracked_queries << $1 if @tracking && message =~ /3[56];1m(.* (Load|Create|Update|Destroy)) \(/
            debug_without_tracking(message)
          end
          alias_method_chain :debug, :tracking
        end
      end
      class ActiveSupport::TestCase
        def track_queries(&block)
          RAILS_DEFAULT_LOGGER.tracking = true
          yield
          result = RAILS_DEFAULT_LOGGER.tracked_queries
          RAILS_DEFAULT_LOGGER.tracking = false
          result
        end
      end
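On Ruby 2.0 and later, `Module#prepend` does the same job as `alias_method_chain` without renaming methods. A self-contained sketch of the tracking idea: `SimpleLogger` and `QueryTracking` are hypothetical stand-ins for the Rails logger and the patch above, and the sample lines assume the colored `Foo Load (0.3ms)` format that the slide's regular expression targets. The capture is made non-greedy so it stops at the first ` (`.

```ruby
# Hypothetical stand-in for the Rails logger; only #debug matters here.
class SimpleLogger
  attr_reader :lines

  def initialize
    @lines = []
  end

  def debug(message)
    @lines << message
  end
end

# Prepended module: its #debug runs first, then calls the logger's own #debug via super.
module QueryTracking
  # Assumed log format: an ANSI-colored name such as "\e[4;36;1mTask Load (0.3ms)\e[0m".
  QUERY_NAME = /3[56];1m(.*? (?:Load|Create|Update|Destroy)) \(/

  def track_queries
    @tracked_queries = []
    @tracking = true
    yield
    @tracked_queries
  ensure
    @tracking = false
  end

  def debug(message)
    if @tracking && (match = QUERY_NAME.match(message))
      @tracked_queries << match[1]
    end
    super
  end
end

SimpleLogger.prepend(QueryTracking)

logger = SimpleLogger.new
queries = logger.track_queries do
  logger.debug("\e[4;36;1mFoo Load (0.3ms)\e[0m   \e[0;1mSELECT * FROM foos\e[0m")
  logger.debug("Rendering foos/index")
  logger.debug("\e[4;35;1mEvent Create (0.5ms)\e[0m   INSERT INTO events ...")
end
p queries
```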
  • 162. Keep It Fast: Use Profiler The profiler will always tell you what's wrong:
      %self  total  self  child  calls  name
       8.39   0.54  0.23   0.31    602  Array#each_index
       7.30   0.41  0.20   0.21   1227  Integer#gcd
       6.20   0.49  0.17   0.32   5760  Timecell#date
       5.11   0.15  0.14   0.01      1  Magick::Image#to_blob
    gem install ruby-prof
    KCachegrind to visualize the results: http://kcachegrind.sourceforge.net
  • 163. Keep It Fast: Use Profiler
  • 164. Keep It Fast: Optimize CPU and Memory The memory profiler will explain the missing details.
    Example benchmark:
      5.52 sec request time
      55M consumed memory
      1.07 sec GC time
    Ruby runs GC after allocating 8M of memory or doing 10000 allocations.
    Simple math: 55M / 8M ≈ 6 GC calls, so each GC call takes 1.07 sec / 6 ≈ 0.18 sec!
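On current MRI the same two numbers can be read without a patched interpreter: `GC.count` gives the number of collections so far and `GC::Profiler` accumulates time spent in GC. A sketch with a hypothetical `gc_cost` helper. Modern MRI triggers GC differently from the 8M / 10000-allocation thresholds described above for Ruby 1.8, so that arithmetic will not carry over directly.

```ruby
# Hypothetical helper: runs the block and returns
# [block result, GC runs during the block, seconds spent in GC during the block].
def gc_cost
  GC::Profiler.clear
  GC::Profiler.enable
  runs_before = GC.count
  result = yield
  [result, GC.count - runs_before, GC::Profiler.total_time]
ensure
  GC::Profiler.disable
end

_, runs, gc_seconds = gc_cost { 500_000.times.map { |i| "row #{i}" } }
per_run = runs.zero? ? 0.0 : gc_seconds / runs
puts format("GC runs: %d, GC time: %.3f s, per run: %.4f s", runs, gc_seconds, per_run)
```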
  • 165. Keep It Fast: Optimize CPU and Memory How to use the memory profiler:
    • Recompile Ruby with the GC patch: http://www.acunote.com/system/ruby186-p287-gc.patch
    • Reinstall ruby-prof
    • Use RUBY_PROF_MEASURE_MODE=memory when running ruby-prof
    More: http://blog.pluron.com/2008/02/memory-profilin.html
  • 166. Remember!
    • Measure, measure, measure... (with ruby-prof)
    • Optimize only what's slow
    • Optimize not only CPU, but memory
    • Optimize for user experience
    • Keep a set of performance regression tests
    • Monitor performance
  • 167. Thank you! Rails performance articles and more: http://blog.pluron.com