Lessons learnt in 2009
Pratik Naik
Freelancer/
ActionRails
Blogger
m.onkey.org
Rails Core Team
member
And some hobby apps
http://planetrubyonrails.com
http://tweetmuffler.com
• My first ever presentation.
• Usually don’t like any conferences
• But it’s Brazil!!
So if I screw up...
DONT Tweet FFS
Overview
• Using Ruby Enterprise Edition
• Testing
• Faster tests
• Factory v/s Fixtures
• Security
• Auto escaping for XSS prevention
• More MessageVerifier
• Asynchronous job processing with DJ
• How to use
• Fork based batch processing
• Scaling
• Scaling file uploads with mod_porter
• Better Pagination
This talk is targeted at
Ruby Web Developers
Not everything may apply to things outside the web
Also if this doesn’t interest you, now is the time to go to the
next room :-)
Lesson 1
Always use REE
What is REE ?
Ruby + COW GC
And a bunch of other cool patches
Maintained by
(Photo acquired by the use of force)
(That was just googling)
The Phusion Guys
Who uses REE ?
And many others
In Production
Why should you use it
in development ?
Topmost Reason
Super Fast Tests
MRI - Ruby 1.8.6
$ time rake
real 1m45.293s
user 0m54.341s
sys  0m33.008s
REE - Ruby 1.8.6
$ time rake
real 1m30.219s
user 0m40.290s
sys  0m25.433s
$ script/generate performance_test Home
exists test/performance/
create test/performance/home_test.rb
class HomeTest < ActionController::PerformanceTest
  def test_homepage
    get '/'
  end
end
MRI
$ rake test:profile
HomeTest#test_homepage (29 ms warmup)
wall_time: 25 ms
memory: 0.00 KB
objects: 0
Needs a custom installation with special Patches
REE
$ rake test:profile
HomeTest#test_homepage (48 ms warmup)
wall_time: 16 ms
memory: 698.18 KB
objects: 21752
Just “works”
Lesson 2
Efficient Testing
• Faster Tests
• Factory
• More Integration Tests & Less Unit Tests
Solution
That’s 1 DB Query and 1 GET Request
5x Faster w/ one word change
Catch ?
• Tests no longer atomic
• Developers should not need to care about
atomicity of the tests
• It’s an optimization
Writing more effective tests in less time
Factory v/s Fixtures
Factory                              Fixtures
★ Slow                               ★ Fast
★ Very easy to manage                ★ Hard to manage
★ Describes the data being tested    ★ Doesn’t describe the data
★ Hardly Breaks                      ★ Very Brittle
★ Runs all the callbacks             ★ Does not run callbacks
Example : Fixtures
What can go wrong ?
• Someone could change users(:lifo) to be no
longer a ‘free’ account holder
• Someone could add more items to ‘lifo’
• Someone could remove lifo’s item!
• ‘create_more_items’ could fail because ‘lifo’
failed validations
Example : Factory
What can go wrong ?
Not much
Factory + Faker
Awesome Development Data
(Clients and Designers Love it)
Writing Good Factories
• Should be able to loop
10.times { Factory(:user) }
• No associations in the base Factory
Factory(:user) and Factory(:user_with_items)
• Should pass the validations
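The three rules above can be sketched in plain Ruby without any factory library. This is only an illustration: `User`, `UserFactory`, `build_with_items` and the toy `valid?` check are hypothetical stand-ins for a real model and its factory definitions, not factory_girl's API.

```ruby
# Hypothetical stand-in for a model with a toy validation.
User = Struct.new(:login, :email, :items) do
  def valid?
    !login.to_s.empty? && email.to_s.include?("@")
  end
end

module UserFactory
  @sequence = 0

  # Base factory: unique attributes on every call, no associations.
  def self.build(overrides = {})
    @sequence += 1
    defaults = {
      login: "user#{@sequence}",
      email: "user#{@sequence}@example.com",
      items: []
    }
    User.new(*defaults.merge(overrides).values_at(:login, :email, :items))
  end

  # Separate factory for the variant that needs associated records.
  def self.build_with_items(count = 2, overrides = {})
    user = build(overrides)
    count.times { |i| user.items << "item #{i + 1}" }
    user
  end
end

# Safe to loop: every user gets a distinct login and passes validation.
users = Array.new(10) { UserFactory.build }
puts users.map(&:login).uniq.size
puts users.all?(&:valid?)
```

Keeping the association in a second factory means a test only pays for the extra records when it actually needs them.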
Integration Tests > Unit Tests
Lesson 3
Improved Security
To know more
http://guides.rubyonrails.org/security.html
rails_xss
http://github.com/NZKoz/rails_xss
By Michael Koziarski of http://therailsway.com/
rails_xss
Without rails_xss
<%= "<script>alert('foo')</script>" %>
=> <script>alert('foo')</script>
Must use h() explicitly
With rails_xss
<%= "<script>alert('foo')</script>" %>
=> &lt;script&gt;alert('foo')&lt;/script&gt;
No need for h()
rails_xss
• Built into Rails 3
• Enabled by the rails_xss plugin in Rails
2.3.next
• Requires Erubis
rails_xss
Introduces the concept of SafeBuffer
( * just the relevant bits )
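A toy sketch of the idea, assuming nothing beyond plain Ruby: a string subclass that treats its own kind as trusted and escapes everything else on the way in. `ToySafeBuffer`, `escape_html` and `HTML_ESCAPES` are illustrative names, and this is not the rails_xss source.

```ruby
# Characters escaped by this toy helper (an assumption for the sketch).
HTML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

def escape_html(text)
  text.to_s.gsub(/[&<>"]/) { |char| HTML_ESCAPES[char] }
end

class ToySafeBuffer < String
  # Anything appended is escaped unless it is already a ToySafeBuffer.
  def <<(value)
    value = escape_html(value) unless value.is_a?(ToySafeBuffer)
    super(value)
  end
end

page = ToySafeBuffer.new("<p>")
page << "<script>alert('foo')</script>"   # untrusted string, gets escaped
page << ToySafeBuffer.new("</p>")          # trusted markup, left alone
puts page
```

The point of the design is that safety travels with the string, so the view no longer has to remember to call h().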
MessageVerifier
Secret Ruby Data
Signed Message
(Derived from Cookie Session Store)
MessageVerifier
>> verifier = ActiveSupport::MessageVerifier.new("my super secret")
=> #<ActiveSupport::MessageVerifier:0x2d559ec @secret="my super secret", @digest="SHA1">
>> data = [1, 10.days.from_now.utc]
=> [1, Sat Oct 24 05:28:07 UTC 2009]
>> token = verifier.generate(data) # Generate a token that is safe to distribute
=> "BAhbB2kGSXU6CVRpbWUNBWcbgO7HdXAGOh9AbWFyc2hhbF93aXRoX3V0Y19jb2VyY2lvblQ=--ff41cf5575006a2797cad49e6738361346292bfa"
>> id, expiry_time = verifier.verify(token) # Get the data back
=> [1, Sat Oct 24 05:28:07 UTC 2009]
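What generate and verify do can be sketched in plain Ruby: serialize the data, Base64-encode it, and append an HMAC of the encoded data computed with the secret. `ToyVerifier` below is a hypothetical class written for illustration under that assumption, not the ActiveSupport implementation.

```ruby
require "openssl"

class ToyVerifier
  class InvalidSignature < StandardError; end

  def initialize(secret)
    @secret = secret
  end

  # Returns "<base64 data>--<hex HMAC>"; the data is readable, only signed.
  def generate(value)
    data = [Marshal.dump(value)].pack("m0")
    "#{data}--#{digest(data)}"
  end

  # Checks the signature first and only then deserializes the data.
  def verify(token)
    data, signature = token.to_s.split("--", 2)
    unless data && signature && secure_compare(signature, digest(data))
      raise InvalidSignature
    end
    Marshal.load(data.unpack1("m0"))
  end

  private

  def digest(data)
    OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new("SHA1"), @secret, data)
  end

  # Constant-time comparison so timing does not leak the signature.
  def secure_compare(a, b)
    return false unless a.bytesize == b.bytesize
    a.bytes.zip(b.bytes).inject(0) { |acc, (x, y)| acc | (x ^ y) }.zero?
  end
end

verifier = ToyVerifier.new("my super secret")
token = verifier.generate([1, "2009-10-24"])
p verifier.verify(token)
```

Note that the message is signed, not encrypted: anyone holding the token can decode the data, they just cannot change it without the secret.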
MessageVerifier
Example use case
“Remember Me” Functionality
MessageVerifier
When you store the ‘remember me’ tokens in the db
• Extra column
• More maintenance
Expiring tokens after every use or after password reset
• Doesn’t play well with multiple
browsers
MessageVerifier
# User.rb
def remember_me_token
  User.remember_me_verifier.generate([self.id, self.salt])
end

# Controller - when user checks the 'remember me'
def send_remember_cookie!
  cookies[:auth_token] = {
    :value => @current_user.remember_me_token,
    :expires => 20.years.from_now.utc }
end
MessageVerifier
• Use a different secret for every use of
MessageVerifier
rake secret
• Include the ‘salt’ when generating the
token, so that the token expires when the
password changes
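The reading side of the ‘remember me’ cookie might look like the sketch below. It assumes the salt is regenerated whenever the password changes; `ToyUser`, `USERS`, `sign`, `unsign` and `user_from_remember_token` are hypothetical names, and the signing helpers are a plain-Ruby stand-in for a MessageVerifier with its own secret.

```ruby
require "openssl"

REMEMBER_ME_SECRET = "use the output of rake secret here" # placeholder value

ToyUser = Struct.new(:id, :salt)
USERS = { 1 => ToyUser.new(1, "salt-v1") } # stands in for the users table

def remember_me_hmac(data)
  OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new("SHA1"), REMEMBER_ME_SECRET, data)
end

def sign(value)
  data = [Marshal.dump(value)].pack("m0")
  "#{data}--#{remember_me_hmac(data)}"
end

# Returns the signed value, or nil when the token is malformed or forged.
def unsign(token)
  data, signature = token.to_s.split("--", 2)
  return nil unless data && signature
  expected = remember_me_hmac(data)
  return nil unless signature.bytesize == expected.bytesize
  # Constant-time comparison of the two signatures.
  same = signature.bytes.zip(expected.bytes).inject(0) { |acc, (x, y)| acc | (x ^ y) }.zero?
  same ? Marshal.load(data.unpack1("m0")) : nil
end

# The token is only honoured while the salt inside it is still current.
def user_from_remember_token(token)
  id, salt = unsign(token)
  user = USERS[id]
  user if user && user.salt == salt
end

token = sign([1, USERS[1].salt])
p user_from_remember_token(token)
USERS[1].salt = "salt-v2" # simulates a password change
p user_from_remember_token(token)
```

Nothing is stored per token, so several browsers can stay logged in at once, and a password change invalidates all of their tokens together.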
bcrypt
http://bcrypt-ruby.rubyforge.org
bcrypt
Bcrypt                                         MD5/SHA1
★ Designed for generating password hashes      ★ Designed for detecting data tampering
★ Meant to be “slow”                           ★ Meant to be super “fast”
bcrypt
• bcrypt-ruby gem by Coda Hale works great
• Removes the need for a separate ‘salt’ column
by storing the salt in the hashed password
column
• Allows you to increase the ‘cost factor’ as
the computers get faster
Lesson 4
Background processing
with the DJ
http://github.com/tobi/delayed_job
What is DJ ?
(Diagram: Webservers add Jobs to the Database; DJ Workers pick Jobs up from it)
Database backed asynchronous priority queue
How to use DJ ?
Minimal Example using Delayed::Job.enqueue
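A minimal sketch of the shape such a job takes: any object that responds to `perform` can be handed to `Delayed::Job.enqueue`. `GreetingJob` and `run_or_enqueue` are hypothetical names; the fallback branch exists only so the sketch also runs outside an app with delayed_job loaded.

```ruby
# Hypothetical job: a Struct keeps the arguments, #perform does the work.
GreetingJob = Struct.new(:name) do
  def perform
    "Hello, #{name}!"
  end
end

def run_or_enqueue(job)
  if defined?(Delayed::Job)
    Delayed::Job.enqueue(job) # stored in the jobs table, run later by a worker
  else
    job.perform               # standalone fallback: run immediately
  end
end

puts run_or_enqueue(GreetingJob.new("Brazil"))
```

Using a Struct keeps the job small and easy to serialize, which matters because the job is stored in the database until a worker picks it up.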
How to use DJ ?
More practical example
How to use DJ ?
Using send_later
How to use DJ ?
Using handle_asynchronously
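What `send_later` and `handle_asynchronously` do for the caller can be illustrated with a toy in-memory queue. Everything prefixed with `Toy`, the `Video` class and its methods are hypothetical; delayed_job itself stores jobs in the database and runs them in a separate worker process.

```ruby
TOY_QUEUE = [] # stands in for the jobs table

# Remembers an object, a method name and arguments until a worker runs it.
ToyPerformableMethod = Struct.new(:object, :method_name, :args) do
  def perform
    object.send(method_name, *args)
  end
end

module ToyMessageSending
  # Queue the call instead of making it now.
  def send_later(method_name, *args)
    TOY_QUEUE << ToyPerformableMethod.new(self, method_name, args)
  end
end

module ToyAsyncMacro
  # Keep the original under a new name and make the old name enqueue it.
  def handle_asynchronously(method_name)
    define_method("#{method_name}_without_send_later", instance_method(method_name))
    define_method(method_name) do |*args|
      send_later("#{method_name}_without_send_later", *args)
    end
  end
end

class Video
  include ToyMessageSending
  extend ToyAsyncMacro

  attr_reader :log

  def initialize
    @log = []
  end

  def notify(who)
    @log << "notified #{who}"
  end

  def encode
    @log << "encoded"
  end
  handle_asynchronously :encode
end

video = Video.new
video.send_later(:notify, "lifo") # explicit, per call
video.encode                      # implicit, every call is queued
TOY_QUEUE.shift.perform until TOY_QUEUE.empty? # what a worker would do
p video.log
```

`send_later` is a per-call decision, while `handle_asynchronously` makes a method asynchronous for every caller.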
Batch Processing w/ DJ
Tweetmuffler Requirement
That’s average 4-10 external calls per user. Every 2 minutes.
Batch Processing w/ DJ
Initial Implementation
1 job/user
Batch Processing w/ DJ
Problems with that ?
Batch Processing w/ DJ
DID NOT SCALE
• Too Slow
• Way too much memory required
• Too many workers required
Batch Processing w/ DJ
Solution
Fork based workers w/ REE
Batch Processing w/ DJ
Has scaled great so far
• 10x faster
• Uses 40% less memory
• Just 1 worker needed
Batch Processing w/ DJ
General things to remember when forking w/ Ruby
• Always reset the database
connections
• Always reset any open file handles
• Make sure the child calls exit! from
an ensure block
• Make sure mysql allows sufficient
number of connections
Batch Processing w/ DJ
REE Specific things to remember when forking
• Call GC.start before you fork
• Call GC.copy_on_write_friendly = true as
early as possible. Possibly from the top of
the Rakefile and environment.rb
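The two checklists above can be combined into one forking helper. This is a sketch for Unix-like systems, not the Tweetmuffler code: `process_in_children` is a hypothetical name, and the REE and ActiveRecord calls are guarded so the sketch also runs on other Rubies and outside Rails.

```ruby
# REE only: ask for a copy-on-write friendly GC as early as possible.
GC.copy_on_write_friendly = true if GC.respond_to?(:copy_on_write_friendly=)

# Runs the block once per batch, each in its own forked child, and returns
# true for every child that exited cleanly.
def process_in_children(batches)
  pids = batches.map do |batch|
    GC.start # collect garbage before forking so more memory stays shared
    fork do
      status = 1
      begin
        # Each child needs its own database connection (Rails apps only).
        ActiveRecord::Base.establish_connection if defined?(ActiveRecord::Base)
        yield batch
        status = 0
      ensure
        exit!(status) # leave without running the parent's at_exit hooks
      end
    end
  end
  pids.map { |pid| Process.wait2(pid).last.success? }
end

p process_in_children([[1, 2], [3, 4]]) { |batch| batch.sum }
```

Every child opens its own connection, which is why the database has to allow at least as many connections as there are children alive at once.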
Lesson 5
Scaling
http://modporter.com
Scaling file uploads
Mod Porter
What’s the problem ?
• Rails processes are resource intensive
• Multipart parsing for large files can get
slow
• Keeping a Rails process occupied for
multipart parsing of large files can have
serious scaling issues
Mod Porter
How does mod_porter work ?
• mod_porter is an apache module built on
top of libapreq
• libapreq does the heavy job of multipart
parsing in a cheap little apache process
• mod_porter sends those multipart files as
tmpfile urls to the Rails app
• mod_porter Rails plugin makes the whole
thing transparent to the application
Mod Porter
Apache Config File
<VirtualHost *:8080>
ServerName actionrails.com
DocumentRoot /Users/actionrails/application/current/public
Porter On
PorterSharedSecret secret
</VirtualHost>
Rails Configuration
class ApplicationController < ActionController::Base
self.mod_porter_secret = "secret"
end
will_paginate
Does not scale
will_paginate
The common pattern
SELECT * FROM `posts` LIMIT 10,10
will_paginate
Scaling Problems
• Large OFFSETs are hard to scale
• Problems become clear when you have more
rows than the memory can hold
• Very hard to cache
• Extra COUNT queries
How to scale Pagination?
Scalable Pagination
Github
Scalable Pagination
Twitter
Scalable Pagination
What’s common with Github and Twitter ?
• Don’t show all the page links
• Don’t show the total count
• AJAX is much easier to scale when it
comes to pagination
• Pagination query does not use OFFSET,
just LIMIT.
Scalable Pagination
Page 1
page1 = SELECT * FROM `posts` WHERE id > 0 ORDER BY id ASC LIMIT 10
page2_min_id = page1.last.id
Page 2
page2 = SELECT * FROM `posts` WHERE id > page2_min_id ORDER BY id ASC LIMIT 10
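The same idea as a runnable sketch over an in-memory list. `POSTS` and `page_after` are hypothetical stand-ins for the posts table and the query; in a Rails 2.3 app the equivalent finder would be along the lines of `Post.find(:all, :conditions => ["id > ?", last_seen_id], :order => "id ASC", :limit => 10)`.

```ruby
# In-memory stand-in for the posts table; ids have gaps, as in real tables.
POSTS = [1, 2, 5, 8, 9, 12, 15, 16, 20, 21, 22, 30].map { |id| { id: id } }

# Equivalent of: SELECT * FROM posts WHERE id > ? ORDER BY id ASC LIMIT ?
def page_after(last_seen_id, per_page)
  POSTS.select { |post| post[:id] > last_seen_id }
       .sort_by { |post| post[:id] }
       .first(per_page)
end

page1 = page_after(0, 5)
page2 = page_after(page1.last[:id], 5) # start after the last id already shown
p page1.map { |post| post[:id] }
p page2.map { |post| post[:id] }
```

Each page is identified by the last id the user has seen rather than by a page number, which is why no OFFSET and no COUNT query are needed.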
Scalable Pagination
Benefits ?
• Using no OFFSET is much faster
• Plays great with caching. No records ever
get repeated
• Trade-off: a little less user friendly, as you
cannot show all the page numbers
Source
http://www.scribd.com/doc/14683263/Efficient-Pagination-Using-MySQL
By the Yahoo folks