Lessons learnt in 2009
         Pratik Naik



                       Jobless
Lessons learnt in 2009
         Pratik Naik

                       Freelancer/
                       ActionRails

      ...
And some hobby apps


   http://planetrubyonrails.com



     http://tweetmuffler.com
• My first ever presentation.
• Usually don’t like any conferences
• But it’s Brazil!!
So if I screw up...




DONT Tweet FFS
Overview
•   Using Ruby Enterprise Edition

•   Testing

    •   Faster tests

    •   Factory v/s Fixtures

•   Security
...
This talk is targeted at
      Ruby Web Developers
      Not everything may apply to things outside the web




Also if th...
Lesson 1



Always use REE
What is REE ?
Ruby + COW GC
 And a bunch of other cool patches
Maintained by




(Photo acquired by the use of force)
                                                       (That was ju...
Who uses REE ?



   And many others
Who uses REE ?



   And many others

  In Production
Why should you use it
      for the
  development ?
Topmost Reason

Super Fast Tests
MRI - Ruby 1.8.6
    $ time rake

    real
 1m45.293s
    user
 0m54.341s
    sys 
 0m33.008s
REE - Ruby 1.8.6
    $ time rake

    real
 1m30.219s
    user
 0m40.290s
    sys 
 0m25.433s
That’s 15 seconds faster
       Completely Free

          YMMV
Only Catch
 Twitter’s GC Settings
.profile

   RUBY_HEAP_MIN_SLOTS=500000
   RUBY_HEAP_SLOTS_INCREMENT=250000
   RUBY_HEAP_SLOTS_GROWTH_FACTOR=1
   RUBY_GC_M...
Second Reason
ruby-prof & Rails Performance Tests
What are Rails Performance tests ?
      Integration Tests + ruby-prof
http://guides.rubyonrails.org/performance_testing.html
$ script/generate performance_test Home
       exists test/performance/
       create test/performance/home_test.rb



cla...
MRI




$ rake test:profile
HomeTest#test_homepage (29 ms warmup)
        wall_time: 25 ms
          memory: 0.00 KB
      ...
REE




$ rake test:profile
HomeTest#test_homepage (48 ms warmup)
        wall_time: 16 ms
          memory: 698.18 KB
    ...
Lesson 2
       Efficient Testing

• Faster Tests
• Factory
• More Integration Tests & Less Unit Tests
Faster Tests




        Tickle
http://github.com/lifo/tickle
$ script/plugin install git://github.com/lifo/tickle.git
Benchmarks
  ( they’re real )

$ time rake

real
 1m30.219s
user
 0m40.290s
sys 
 0m25.433s
Benchmarks
   ( they’re real )

$ time rake tickle

real
 0m55.691s
user
 0m37.532s
sys 
 0m22.563s
That’s another 35
            seconds
( On top of the 15 seconds initially saved by REE )
To get the Best of Tickle


• Experiment with the number of processes
• Create more test databases


          It’s all in...
Parallel Specs
http://github.com/jasonm/parallel_specs
Faster Tests




    fast_context
http://github.com/lifo/fast_context
Problem




That’s 5 DB Queries and 5 GET Requests
Solution




That’s 1 DB Query and 1 GET Request
   5x Faster w/ one word change
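
A rough plain-Ruby model of the two slides above, not fast_context's actual implementation: a regular context runs its setup (here a fake GET request) once per assertion, while the fast variant runs it once and shares the response. The names `run_context`, `run_fast_context` and `fake_get` are made up for this sketch.

```ruby
# Toy model of the "Problem" and "Solution" slides. All names are hypothetical.
requests = 0
fake_get = lambda do
  requests += 1 # stands in for 1 DB query + 1 GET request
  { :status => 200, :body => "<h1>Home</h1>" }
end

# Five assertions against the same page, like five `should` blocks.
assertions = [
  lambda { |r| raise "bad status" unless r[:status] == 200 },
  lambda { |r| raise "no body" if r[:body].empty? },
  lambda { |r| raise "no heading" unless r[:body].start_with?("<h1>") },
  lambda { |r| raise "wrong title" unless r[:body].include?("Home") },
  lambda { |r| raise "unclosed heading" unless r[:body].end_with?("</h1>") }
]

# Regular context: setup runs again before every assertion.
def run_context(setup, assertions)
  assertions.each { |assertion| assertion.call(setup.call) }
end

# Fast variant: setup runs once, every assertion sees the same response.
def run_fast_context(setup, assertions)
  response = setup.call
  assertions.each { |assertion| assertion.call(response) }
end

run_context(fake_get, assertions)
regular_requests = requests

requests = 0
run_fast_context(fake_get, assertions)
fast_requests = requests
```

Sharing one response is also where the catch comes from: the assertions are no longer isolated from each other.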
Catch ?

• Tests no longer atomic
• Developers should not need to care about
  atomicity of the tests
• It’s an optimizati...
Writing more effective tests in less time
Factory v/s Fixtures
★ Slow                   ★ Fast
★ Very easy to manage    ★ Hard to manage
★ Describes the data     ★ ...
Example : Fixtures



      What can go wrong ?
• Someone could change users(:lifo) to be no
    longer a ‘free’ account h...
Example : Factory



 What can go wrong ?


   Not much
Factory + Faker
Awesome Development Data
  (Clients and Designers Love it)
Writing Good Factories


• Should be able to loop
  10.times { Factory(:user) }


• No associations in the base Factory
  ...
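
A plain-Ruby sketch of the guidelines above, assuming a made-up `SketchFactory` module rather than any real factory library's API. A counter gives every record unique values so the factory can be called in a loop, the base factory builds no associations, and a separate variant adds items.

```ruby
# Hypothetical model and factory, for illustration only.
User = Struct.new(:login, :email, :items) do
  # Stand-in for model validations.
  def valid?
    !login.to_s.empty? && email.to_s.include?("@")
  end
end

module SketchFactory
  @counter = 0

  # A sequence: each call returns a new number, so attributes never collide.
  def self.next_number
    @counter += 1
  end

  # Base factory: valid on its own, no associations.
  def self.user(overrides = {})
    n = next_number
    attrs = {
      :login => "user#{n}",
      :email => "user#{n}@example.com",
      :items => []
    }.merge(overrides)
    User.new(attrs[:login], attrs[:email], attrs[:items])
  end

  # Variant that adds the associated items on top of the base factory.
  def self.user_with_items(count = 2)
    user(:items => Array.new(count) { |i| "item #{i + 1}" })
  end
end

users = Array.new(10) { SketchFactory.user }
user_with_items = SketchFactory.user_with_items
```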
Integration Tests > Unit Tests
Lesson 3
Improved Security
To know more




http://guides.rubyonrails.org/security.html
rails_xss
http://github.com/NZKoz/rails_xss
 By Michael Koziarski of http://therailsway.com/
rails_xss
 Without rails_xss                           With rails_xss



<%= “<script>alert(‘foo’)</script>” %>   <%= “<sc...
rails_xss

• Built in Rails 3
• Enabled by the rails_xss plugin in Rails
  2.3.next
• Requires Erubis
rails_xss
Introduces the concept of SafeBuffer




         ( * just the relevant bits )
rails_xss
>> buffer = ActionView::SafeBuffer.new
=> ""

>> buffer << "Hello ! "
=> "Hello ! "

>> buffer << "<script>"
=> ...
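
The behaviour in the irb session above can be approximated in a few lines of plain Ruby. This is a sketch of the idea only, not the rails_xss source: `SketchSafeBuffer` is a made-up class, and it uses `ERB::Util.html_escape` from the standard library for escaping.

```ruby
require "erb"

# Hypothetical stand-in for ActionView::SafeBuffer.
class SketchSafeBuffer < String
  # Appending escapes the value unless it is itself a trusted buffer.
  def <<(value)
    piece = value.is_a?(SketchSafeBuffer) ? value : ERB::Util.html_escape(value.to_s)
    super(piece)
  end
end

buffer = SketchSafeBuffer.new
buffer << "Hello ! "
buffer << "<script>"                              # plain String: escaped
buffer << SketchSafeBuffer.new("<b>trusted</b>")  # trusted buffer: appended as-is
# buffer is now "Hello ! &lt;script&gt;<b>trusted</b>"
```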
rails_xss

• Uses Erubis hooks to make <%= %> tags
  always return a SafeBuffer
• Modifies all the relevant Rails helper...
rails_xss
       When you don’t want to escape


<%= "<a href='#{foo_path}'>foo</a>".html_safe! %>
MessageVerifier
http://api.rubyonrails.org/classes/ActiveSupport/MessageVerifier.html
MessageVerifier

Secret                       Ruby Data




            Signed Message



(Derived from Cookie Session Stor...
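
The diagram above (secret plus Ruby data gives a signed message) can be sketched with the standard library alone. This is an illustration of the signing idea under assumptions, not ActiveSupport's code: `SketchVerifier` is a made-up class, and the `data--signature` token layout is modelled on the token shown in the talk.

```ruby
require "openssl"

# Hypothetical verifier: serialize, Base64-encode, then append an HMAC.
class SketchVerifier
  class InvalidSignature < StandardError; end

  def initialize(secret)
    @secret = secret
  end

  # Returns "<base64 data>--<hex HMAC>", safe to hand to the client.
  def generate(value)
    data = [Marshal.dump(value)].pack("m0")
    "#{data}--#{digest(data)}"
  end

  # Returns the original value, or raises if the token was tampered with.
  def verify(token)
    data, signature = token.to_s.split("--", 2)
    unless data && signature && secure_compare(signature, digest(data))
      raise InvalidSignature
    end
    # Only unmarshal after the signature has been checked.
    Marshal.load(data.unpack("m0").first)
  end

  private

  def digest(data)
    OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new("SHA1"), @secret, data)
  end

  # Constant-time comparison, so timing does not leak the signature.
  def secure_compare(a, b)
    return false unless a.bytesize == b.bytesize
    diff = 0
    a.bytes.zip(b.bytes) { |x, y| diff |= x ^ y }
    diff.zero?
  end
end

verifier = SketchVerifier.new("my super secret")
token = verifier.generate([1, "some salt"])
id, salt = verifier.verify(token)
```

Anyone can read the data in the token, since it is only encoded. What the secret buys is that nobody without it can change the data and still pass `verify`.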
MessageVerifier
>> verifier = ActiveSupport::MessageVerifier.new("my super secret")
=> #<ActiveSupport::MessageVerifier:0x2d55...
MessageVerifier

      Example use case
 “Remember Me” Functionality
MessageVerifier
    When you store the ‘remember me’ tokens in the db



• Extra column
• More maintenance
 Expiring tokens...
MessageVerifier
# User.rb
def remember_me_token
 User.remember_me_verifier.generate([self.id, self.salt])
end

# Controller ...
MessageVerifier

• Use a different secret for every use of
  MessageVerifier
  rake secret

• Make sure to use the ‘salt’ fo...
bcrypt
http://bcrypt-ruby.rubyforge.org
bcrypt
             Bcrypt                          MD5/SHA1



★ Designed for generating password ★ Designed for detectin...
bcrypt

• bcrypt-ruby gem by Coda Hale works great
• Removes the need for a ‘salt’ column by
  storing the salt in the encryp...
Lesson 4
Background processing
     with the DJ
      http://github.com/tobi/delayed_job
What is DJ ?
                                                                  DJ
                                        ...
How to use DJ ?




Minimal Example using Delayed::Job.enqueue
How to use DJ ?




  More practical example
How to use DJ ?




   Using send_later
How to use DJ ?




Using handle_asynchronously
Batch Processing w/ DJ
Batch Processing w/ DJ
              Tweetmuffler Requirement




That’s average 4-10 external calls per user. Every 2 minu...
Batch Processing w/ DJ
     Initial Implementation




          1 job/user
Batch Processing w/ DJ


      Problems with that ?
Batch Processing w/ DJ

     DID NOT SCALE

   • Too Slow
   • Way too much memory required
   • Too many workers required
Batch Processing w/ DJ

             Solution


     Fork based workers w/ REE
Batch Processing w/ DJ
Batch Processing w/ DJ

  Has scaled great so far

     • 10x faster
     • Uses 40% less memory
     • Just 1 worker need...
Batch Processing w/ DJ
General things to remember when forking w/ Ruby

     • Always reset the database
       connection...
Batch Processing w/ DJ
REE Specific things to remember when forking


   • Call GC.start before you fork
   • Call GC.copy_...
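
A minimal sketch of the forking advice above, assuming a Unix-like system where `fork` is available. It is not Tweetmuffler's worker code: `process_in_children` is a made-up helper, and the database reconnect is only a comment because the sketch has no database.

```ruby
# Hypothetical helper: runs the block for each batch in a forked child.
def process_in_children(batches)
  # REE only; other Rubies do not define this setter.
  GC.copy_on_write_friendly = true if GC.respond_to?(:copy_on_write_friendly=)

  children = batches.map do |batch|
    reader, writer = IO.pipe
    GC.start # collect garbage before forking
    pid = fork do
      status = 1
      begin
        reader.close
        # In a Rails app, reset the database connection here before any query.
        writer.write(Marshal.dump(yield(batch)))
        status = 0
      ensure
        writer.close unless writer.closed?
        exit!(status) # exit! skips at_exit handlers inherited from the parent
      end
    end
    writer.close # the parent keeps only the read end
    [pid, reader]
  end

  children.map do |pid, reader|
    data = reader.read # read before waiting, so a full pipe cannot block the child
    reader.close
    _, status = Process.wait2(pid)
    raise "child #{pid} failed" unless status.success?
    Marshal.load(data)
  end
end

totals = process_in_children([[1, 2], [3, 4]]) do |batch|
  batch.inject(0) { |sum, n| sum + n }
end
```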
Lesson 5
  Scaling
http://modporter.com




Scaling file uploads
Mod Porter
            What’s the problem ?


• Rails processes are resource intensive
• Multipart parsing for large files ...
Mod Porter
       How does mod_porter work ?

• mod_porter is an apache module built on
  top of libapreq
• libapreq does ...
Mod Porter
                    Apache Config File

<VirtualHost *:8080>
  ServerName actionrails.com
  DocumentRoot /Users/...
will_paginate
   Does not scale
will_paginate
        The common pattern




SELECT * FROM `posts` LIMIT 10,10
will_paginate
            Scaling Problems

• Large OFFSETs are harder to scale
• Problems become clear when you have more rows
  ...
How to scale Pagination?
Scalable Pagination




       Github
Scalable Pagination




       Twitter
Scalable Pagination
   What’s common with Github and Twitter ?

• Don’t show all the page links
• Don’t show the total cou...
Scalable Pagination
                         Page 1
page1 = SELECT * FROM `posts` WHERE id > 0 ORDER BY id ASC LIMIT 10
           ...
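
The page-by-page flow above can be tried without a database. In this sketch an in-memory array stands in for the `posts` table, and `page_after` is a made-up helper that mimics a `WHERE id > ? ORDER BY id ASC LIMIT ?` query.

```ruby
# Hypothetical in-memory stand-in for the posts table.
Post = Struct.new(:id, :title)
POSTS = (1..25).map { |i| Post.new(i, "Post #{i}") }

# Mimics: SELECT * FROM posts WHERE id > min_id ORDER BY id ASC LIMIT limit
def page_after(posts, min_id, limit = 10)
  posts.select { |post| post.id > min_id }.sort_by { |post| post.id }.first(limit)
end

page1 = page_after(POSTS, 0)
page2 = page_after(POSTS, page1.last.id) # the next page starts after the last id seen
page3 = page_after(POSTS, page2.last.id)
```

Each page depends only on the last id of the previous one, so no row is skipped over with OFFSET and no total count is needed.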
Scalable Pagination
                  Benefits ?

• Using no OFFSET is much faster
• Plays great with caching. No records e...
Source
http://www.scribd.com/doc/14683263/Efficient-Pagination-Using-MySQL
                         By the Yahoo folks
That’s all !
Thank you.
Blog http://m.onkey.org

        @lifo
Slides from my presentation at Railssummit 2009, Sao Paulo.

Published in: Technology

Lessons Learnt in 2009

  1. 1. Lessons learnt in 2009 Pratik Naik Jobless
  2. 2. Lessons learnt in 2009 Pratik Naik Freelancer/ ActionRails Blogger m.onkey.org Rails Core Team member
  3. 3. And some hobby apps http://planetrubyonrails.com http://tweetmuffler.com
  4. 4. • My first ever presentation. • Usually don’t like any conferences • But it’s Brazil!!
  5. 5. So if I screw up... DONT Tweet FFS
  6. 6. Overview • Using Ruby Enterprise Edition • Testing • Faster tests • Factory v/s Fixtures • Security • Auto escaping for XSS prevention • More MessageVerifier • Asynchronous job processing with DJ • How to use • Fork based batch processing • Scaling • Scaling file uploads with mod_porter • Better Pagination
  7. 7. This talk is targeted at Ruby Web Developers Not everything may apply to things outside the web Also if this doesn’t interest you, now is the time to go to the next room :-)
  8. 8. Lesson 1 Always use REE
  9. 9. What is REE ?
  10. 10. Ruby + COW GC And a bunch of other cool patches
  11. 11. Maintained by (Photo acquired by the use of force) (That was just googling) The Phusion Guys
  12. 12. Who uses REE ? And many others
  13. 13. Who uses REE ? And many others In Production
  14. 14. Why should you use it for the development ?
  15. 15. Topmost Reason Super Fast Tests
  16. 16. MRI - Ruby 1.8.6 $ time rake real 1m45.293s user 0m54.341s sys 0m33.008s
  17. 17. REE - Ruby 1.8.6 $ time rake real 1m30.219s user 0m40.290s sys 0m25.433s
  18. 18. That’s 15 seconds faster Completely Free YMMV
  19. 19. Only Catch Twitter’s GC Settings
  20. 20. .profile RUBY_HEAP_MIN_SLOTS=500000 RUBY_HEAP_SLOTS_INCREMENT=250000 RUBY_HEAP_SLOTS_GROWTH_FACTOR=1 RUBY_GC_MALLOC_LIMIT=50000000 http://blog.evanweaver.com/articles/2009/04/09/ruby-gc- tuning/
  21. 21. Second Reason ruby-prof & Rails Performance Tests
  22. 22. What are Rails Performance tests ? Integration Tests + ruby-prof
  23. 23. http://guides.rubyonrails.org/performance_testing.html
  24. 24. $ script/generate performance_test Home exists test/performance/ create test/performance/home_test.rb class HomeTest < ActionController::PerformanceTest def test_homepage get '/' end end
  25. 25. MRI $ rake test:profile HomeTest#test_homepage (29 ms warmup) wall_time: 25 ms memory: 0.00 KB objects: 0 Needs a custom installation with special Patches
  26. 26. REE $ rake test:profile HomeTest#test_homepage (48 ms warmup) wall_time: 16 ms memory: 698.18 KB objects: 21752 Just “works”
  27. 27. Lesson 2 Efficient Testing • Faster Tests • Factory • More Integration Tests & Less Unit Tests
  28. 28. Faster Tests Tickle http://github.com/lifo/tickle
  29. 29. $ script/plugin install git://github.com/lifo/tickle.git
  30. 30. Benchmarks ( they’re real ) $ time rake real 1m30.219s user 0m40.290s sys 0m25.433s
  31. 31. Benchmarks ( they’re real ) $ time rake tickle real 0m55.691s user 0m37.532s sys 0m22.563s
  32. 32. That’s another 35 seconds ( On top of the 15 seconds initially saved by REE )
  33. 33. To get the Best of Tickle • Experiment with the number of processes • Create more test databases It’s all in the README
  34. 34. Parallel Specs http://github.com/jasonm/parallel_specs
  35. 35. Faster Tests fast_context http://github.com/lifo/fast_context
  36. 36. Problem That’s 5 DB Queries and 5 GET Requests
  37. 37. Solution That’s 1 DB Query and 1 GET Request 5x Faster w/ one word change
  38. 38. Catch ? • Tests no longer atomic • Developers should not need to care about atomicity of the tests • It’s an optimization
  39. 39. Writing more effective tests in less time
  40. 40. Factory v/s Fixtures Factory: ★ Slow ★ Very easy to manage ★ Describes the data being tested ★ Hardly Breaks ★ Runs all the callbacks Fixtures: ★ Fast ★ Hard to manage ★ Doesn’t describe the data ★ Very Brittle ★ Does not run callbacks
  41. 41. Example : Fixtures What can go wrong ? • Someone could change users(:lifo) to be no longer a ‘free’ account holder • Someone could add more items to ‘lifo’ • Someone could remove lifo’s item! • ‘create_more_items’ could fail because ‘lifo’ failed validations
  42. 42. Example : Factory What can go wrong ? Not much
  43. 43. Factory + Faker Awesome Development Data (Clients and Designers Love it)
  44. 44. Writing Good Factories • Should be able to loop 10.times { Factory(:user) } • No associations in the base Factory Factory(:user) and Factory(:user_with_items) • Should pass the validations
  45. 45. Integration Tests > Unit Tests
  46. 46. Lesson 3 Improved Security
  47. 47. To know more http://guides.rubyonrails.org/security.html
  48. 48. rails_xss http://github.com/NZKoz/rails_xss By Michael Koziarski of http://therailsway.com/
  49. 49. rails_xss Without rails_xss: <%= “<script>alert(‘foo’)</script>” %> => <script>alert(‘foo’)</script> Must use h() explicitly With rails_xss: <%= “<script>alert(‘foo’)</script>” %> => &lt;script&gt;alert('foo')&lt;/script&gt; No need of h()
  50. 50. rails_xss • Built in Rails 3 • Enabled by the rails_xss plugin in Rails 2.3.next • Requires Erubis
  51. 51. rails_xss Introduces the concept of SafeBuffer ( * just the relevant bits )
  52. 52. rails_xss >> buffer = ActionView::SafeBuffer.new => "" >> buffer << "Hello ! " => "Hello ! " >> buffer << "<script>" => "Hello ! &lt;script&gt;"
  53. 53. rails_xss • Uses Erubis hooks to make <%= %> tags always return a SafeBuffer • Modifies all the relevant Rails helpers and marks them as html_safe!
  54. 54. rails_xss When you don’t want to escape <%= "<a href='#{foo_path}'>foo</a>".html_safe! %>
  55. 55. MessageVerifier http://api.rubyonrails.org/classes/ActiveSupport/MessageVerifier.html
  56. 56. MessageVerifier Secret Ruby Data Signed Message (Derived from Cookie Session Store)
  57. 57. MessageVerifier >> verifier = ActiveSupport::MessageVerifier.new("my super secret") => #<ActiveSupport::MessageVerifier:0x2d559ec @secret="my super secret", @digest="SHA1"> >> data = [1, 10.days.from_now.utc] => [1, Sat Oct 24 05:28:07 UTC 2009] >> token = verifier.generate(data) # Generate a token that is safe to distribute => "BAhbB2kGSXU6CVRpbWUNBWcbgO7HdXAGOh9AbWFyc2hhbF93aXRoX3V0Y19jb2Vy Y2lvblQ=--ff41cf5575006a2797cad49e6738361346292bfa" >> id, expiry_time = verifier.verify(token) # Get the data back => [1, Sat Oct 24 05:28:07 UTC 2009]
  58. 58. MessageVerifier Example use case “Remember Me” Functionality
  59. 59. MessageVerifier When you store the ‘remember me’ tokens in the db • Extra column • More maintenance Expiring tokens after every use or after password reset • Doesn’t play well with multiple browsers
  60. 60. MessageVerifier # User.rb def remember_me_token User.remember_me_verifier.generate([self.id, self.salt]) end # Controller - when user checks the ‘remember me’ def send_remember_cookie! cookies[:auth_token] = { :value => @current_user.remember_me_token, :expires => 20.years.from_now.utc } end
  61. 61. MessageVerifier • Use a different secret for every use of MessageVerifier rake secret • Make sure to use the ‘salt’ for generating the token, making sure the token expires on the password change
  62. 62. bcrypt http://bcrypt-ruby.rubyforge.org
  63. 63. bcrypt Bcrypt MD5/SHA1 ★ Designed for generating password ★ Designed for detecting data hash tampering ★ Meant to be “slow” ★ Meant to be super “fast”
  64. 64. bcrypt • bcrypt-ruby gem by Coda Hale works great • Removes the need for a ‘salt’ column by storing the salt in the encrypted password column • Allows you to increase the ‘cost factor’ as computers get faster
  65. 65. Lesson 4 Background processing with the DJ http://github.com/tobi/delayed_job
  66. 66. What is DJ ? Database backed asynchronous priority queue (Diagram: Webservers, Jobs Database, DJ Workers)
  67. 67. How to use DJ ? Minimal Example using Delayed::Job.enqueue
  68. 68. How to use DJ ? More practical example
  69. 69. How to use DJ ? Using send_later
  70. 70. How to use DJ ? Using handle_asynchronously
  71. 71. Batch Processing w/ DJ
  72. 72. Batch Processing w/ DJ Tweetmuffler Requirement That’s average 4-10 external calls per user. Every 2 minutes.
  73. 73. Batch Processing w/ DJ Initial Implementation 1 job/user
  74. 74. Batch Processing w/ DJ Problems with that ?
  75. 75. Batch Processing w/ DJ DID NOT SCALE • Too Slow • Way too much memory required • Too many workers required
  76. 76. Batch Processing w/ DJ Solution Fork based workers w/ REE
  77. 77. Batch Processing w/ DJ
  78. 78. Batch Processing w/ DJ Has scaled great so far • 10x faster • Uses 40% less memory • Just 1 worker needed
  79. 79. Batch Processing w/ DJ General things to remember when forking w/ Ruby • Always reset the database connections • Always reset any open file handles • Make sure the child calls exit! from an ensure block • Make sure mysql allows sufficient number of connections
  80. 80. Batch Processing w/ DJ REE Specific things to remember when forking • Call GC.start before you fork • Call GC.copy_on_write_friendly = true as early as possible. Possibly from the top of the Rakefile and environment.rb
  81. 81. Lesson 5 Scaling
  82. 82. http://modporter.com Scaling file uploads
  83. 83. Mod Porter What’s the problem ? • Rails processes are resource intensive • Multipart parsing for large files can get slower • Keeping a Rails process occupied for multipart parsing of large files can have serious scaling issues
  84. 84. Mod Porter How does mod_porter work ? • mod_porter is an apache module built on top of libapreq • libapreq does the heavy job of multipart parsing in a cheap little apache process • mod_porter sends those multipart files as tmpfile urls to the Rails app • mod_porter Rails plugin makes the whole thing transparent to the application
  85. 85. Mod Porter Apache Config File <VirtualHost *:8080> ServerName actionrails.com DocumentRoot /Users/actionrails/application/current/public Porter On PorterSharedSecret secret </VirtualHost> Rails Configuration class ApplicationController < ActionController::Base self.mod_porter_secret = "secret" end
  86. 86. will_paginate Does not scale
  87. 87. will_paginate The common pattern SELECT * FROM `posts` LIMIT 10,10
  88. 88. will_paginate Scaling Problems • Large OFFSETs are harder to scale • Problems become clear when you have more rows than the memory can hold • Very hard to cache • Extra COUNT queries
  89. 89. How to scale Pagination?
  90. 90. Scalable Pagination Github
  91. 91. Scalable Pagination Twitter
  92. 92. Scalable Pagination What’s common with Github and Twitter ? • Don’t show all the page links • Don’t show the total count • AJAX is much easier to scale when it comes to pagination • Pagination query does not use OFFSET, just LIMIT.
  93. 93. Scalable Pagination Page 1 page1 = SELECT * FROM `posts` WHERE id > 0 ORDER BY id ASC LIMIT 10 page2_min_id = page1.last.id Page 2 page2 = SELECT * FROM `posts` WHERE id > page2_min_id ORDER BY id ASC LIMIT 10
  94. 94. Scalable Pagination Benefits ? • Using no OFFSET is much faster • Plays great with caching. No records ever get repeated • A little less user friendly as you cannot show all the page numbers
  95. 95. Source http://www.scribd.com/doc/14683263/Efficient-Pagination-Using-MySQL By the Yahoo folks
  96. 96. That’s all !
  97. 97. Thank you. Blog http://m.onkey.org @lifo