Scaling Rails with Ruby-Prof
Ben Hughes, RubyConf Kenya 2017
Airbnb’s Problem
Things were slow
A Journey into Ruby-Prof
And related concepts
Making it Accessible
It needs to be easy
Let’s talk about profilers
Me
____________________________
Tech Lead on Performance Engineering team
Joined Airbnb in 2012, have seen tremendous growth
Handled database scalability for a lot of that growth…
@schleyfox on twitter/github
… but pretty boring on both
Airbnb
Airbnb’s Problem
Growth
Response Time
Total Time
… and our growth isn’t linear
Slowness is a problem because….
EIRIEN HTTPS://FLIC.KR/P/EGYHMW
User Experience
____________________________
Waiting is frustrating
Google found +500ms reduced traffic/revenue by 20%
Amazon found +100ms cost 1% in sales
We see significant lift in bookings in experiments that improve real or perceived performance
CollegeDegrees360 https://flic.kr/p/cEJpCY
http://glinden.blogspot.com/2006/11/marissa-mayer-at-web-20.html
http://highscalability.com/latency-everywhere-and-it-costs-you-sales-how-crush-it
Server Growth
____________________________
$$$$$
Alex https://flic.kr/p/9v7Kgx
But also reliability and maintenance
It gets worse though
ERROR 1040: Too many connections
https://medium.com/airbnb-engineering/unlocking-horizontal-scalability-in-our-web-serving-tier-d907449cdbcf
Why So Slow?
http://www.highsnobiety.com/2016/01/28/heath-ledger-dark-knight-joker/
No longer the obvious
____________________________
Slow database queries
N+1 queries
Super obvious slow code
Better Tools
____________________________
Nelson Gauthier championed profiling
Stuck in old way of thinking
With our familiar tools
Required investment to get value out of new things
Shift in our thinking
It was the new API framework (in the parlor with the candlestick)
____________________________
We were switching over to a new internal framework for defining our API
Offered many great features around presentation, validation, security
Heavily aspect oriented and metaprogrammed
The execution was difficult to follow, therefore intuition was insufficient
The profiler tore down the artifice like tears to mascara
gate_attribute!
6.03%
read_authorization [:airbnb, :public], [
  :icon,
  :id,
  :is_safety_feature,
  :explore_url,
  :name,
  :tooltip,
  :is_present,
  :listing_id,
], lambda { can_view? }
id
This work was hugely influential
____________________________
Changed how we thought about building the API
Heavily shaped the roadmap
Projects that came out of this are still underway
We lucked out on timing
Also had a thing called SmartHash
Getting these profiles was hard
____________________________
Everything was a one-off
Artisanal
Hot-patching live servers
Hiding from unicorn’s process reaper
Portlandia, IFC (https://www.racked.com/2011/10/24/7746887/portlandia-is-making-people-not-want-to-put-a-bird-on-it)
Airbnb’s Performance Problem
____________________________
We saw that our response time was getting worse and that this was bad
Profiling ultimately revealed the causes of the degradation, which were in some specific parts of the new API
A Journey into Ruby-Prof
Ruby-Prof
____________________________
Is a tracing profiler for Ruby
Uses rb_add_event_hook to hook into the VM
Same function used internally by Kernel.set_trace_func and TracePoint
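Biased towards sections of Ruby compute: every traced call pays the hook overhead, so call-heavy Ruby code looks slower than it is, while time inside C functions and IO is true to time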
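To make the hook mechanism concrete, here is a minimal sketch that uses TracePoint, the public Ruby API over the same VM events, to log call and return events. It is not ruby-prof's implementation; `traced_work` and `record_events` are names made up for this example.

```ruby
# Sketch only: log VM call/return events with TracePoint.
# This is not how ruby-prof is implemented; it just shows the event stream.

def traced_work
  # Kernel#sleep is implemented in C, so it fires :c_call / :c_return
  sleep(0.01)
end

# Runs the block with tracing on and returns [event, class, method] triples.
def record_events
  events = []
  tracer = TracePoint.new(:call, :return, :c_call, :c_return) do |tp|
    # defined_class and method_id identify the method being entered or left
    events << [tp.event, tp.defined_class, tp.method_id]
  end
  tracer.enable
  begin
    yield
  ensure
    tracer.disable
  end
  events
end

record_events { traced_work }.each do |event, klass, method_id|
  puts format("%-9s %s#%s", event, klass, method_id)
end
```

The log also contains events for `TracePoint#enable` and `TracePoint#disable` themselves, which a real profiler would filter out.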
Let’s trace some code
def a
  5.times do
    [0].each do
      sleep(0.1)
    end
  end
  sleep(0.1)
end
def b
  [0,2,3].map do
    sleep(0.1)
  end
  sleep(0.1)
end
def c
  [1,2,3].select do
    [0].map do
      [0].each do
        sleep(0.1)
      end
    end
    false
  end
end
def d
  sleep(0.1)
end
def bar
  a
  b
  c
  d
end
CODE
def d
  sleep(0.1)
end
def bar
  a
  b
  c
  d
end
bar
MEASURER
gettimeofday(&tv, NULL)
Enter bar
STACK: bar
CALLING CONTEXTS: Object::bar
CALLING CONTEXT TREE: Object::bar
MEASURER > 1496351613307823
… skipping… (the calls to a, b, and c)
Enter d
STACK: bar, d
CALLING CONTEXTS: Object::bar, …, Object::bar;Object::d
CALLING CONTEXT TREE: Object::bar > Object::d
MEASURER > 1496351614653317
Enter sleep
STACK: bar, d, Kernel#sleep
CALLING CONTEXTS: Object::bar, …, Object::bar;Object::d, Object::bar;Object::d;Kernel::sleep
CALLING CONTEXT TREE: Object::bar > Object::d > Kernel::sleep
MEASURER > 1496351614653327
Return from sleep
MEASURER > 1496351614756966 - 1496351614653327 = 103639
CALLING CONTEXTS: Object::bar;Object::d;Kernel::sleep 103639
Return from d
MEASURER > 1496351614756976 - 1496351614653317 = 103659
CALLING CONTEXTS: Object::bar;Object::d 103659
Return from bar
MEASURER > 1496351614756986 - 1496351613307823 = 1449163
CALLING CONTEXTS: Object::bar 1449163
Result (calling context, total µs, self µs)
Object::bar 1449163 90
…
Object::bar;Object::d 103659 20
Object::bar;Object::d;Kernel::sleep 103639 103639
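The bookkeeping in this walkthrough is plain subtraction of timestamps taken on entry and exit. A small worked sketch, using the microsecond values from the MEASURER panel; the constant and method names are made up for this example.

```ruby
# Timestamps in microseconds, copied from the MEASURER panel above.
TIMESTAMPS = {
  enter_bar:   1_496_351_613_307_823,
  enter_d:     1_496_351_614_653_317,
  enter_sleep: 1_496_351_614_653_327,
  exit_sleep:  1_496_351_614_756_966,
  exit_d:      1_496_351_614_756_976,
  exit_bar:    1_496_351_614_756_986,
}.freeze

# Total time of a call = exit timestamp - entry timestamp.
def total_time(name, timestamps = TIMESTAMPS)
  timestamps.fetch(:"exit_#{name}") - timestamps.fetch(:"enter_#{name}")
end

# Self time = total time minus the total time of the direct children.
def self_time(name, children)
  total_time(name) - children.sum { |child| total_time(child) }
end

puts "sleep total: #{total_time(:sleep)}"       # 103639
puts "d total:     #{total_time(:d)}"           # 103659
puts "bar total:   #{total_time(:bar)}"         # 1449163
puts "d self:      #{self_time(:d, [:sleep])}"  # 20
```

The self time of bar cannot be recomputed this way, because the timestamps for a, b, and c were skipped in the walkthrough.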
Calling Context Trees
____________________________
Aggregates on calling context (call stack)
Identical methods called through different paths don’t group
Visualized through flame graphs
bar
  a
    Integer#times
      Array#each
        Kernel#sleep
    Kernel#sleep
  b
    Array#map
      Kernel#sleep
    Kernel#sleep
  c
    Array#select
      Array#map
        Array#each
          Kernel#sleep
  d
    Kernel#sleep
Flame Graph of the Tree
🔥 FLAME GRAPHS 🔥
http://www.brendangregg.com/flamegraphs.html
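The semicolon-separated calling contexts from the trace are already in the shape of the "folded stack" text that the flame graph tooling consumes: one line per stack, frames joined by `;`, then a space and a value. A minimal sketch, assuming the second number on each context in the walkthrough is self time in microseconds; `SELF_TIME_US` and `folded_stacks` are names made up for this example.

```ruby
# Self time per calling context, in microseconds (values from the trace walkthrough).
SELF_TIME_US = {
  ["Object::bar"] => 90,
  ["Object::bar", "Object::d"] => 20,
  ["Object::bar", "Object::d", "Kernel::sleep"] => 103_639,
}.freeze

# Turn {stack => value} into folded-stack lines: "frame;frame;frame value".
def folded_stacks(self_times)
  self_times.map { |stack, value| "#{stack.join(';')} #{value}" }
end

puts folded_stacks(SELF_TIME_US)
```

Self time is used rather than total time because a flame graph derives each frame's width by summing everything stacked on top of it.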
CCTs (cont.)
____________________________
Harder to see diffuse costs
E.g. from this it’s not obvious that a, b, c, d are slow because of the same method.
bar
  a
    Integer#times
      Array#each
        Kernel#sleep
    Kernel#sleep
  b
    Array#map
      Kernel#sleep
    Kernel#sleep
  c
    Array#select
      Array#map
        Array#each
          Kernel#sleep
  d
    Kernel#sleep
Call Graphs
____________________________
Grouped by method
Highlight common costs/opportunities
Can be powerfully explored in QCacheGrind
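bar → a, b, c, d
a → Integer#times, Kernel#sleep
b → Array#map, Kernel#sleep
c → Array#select
d → Kernel#sleep
Integer#times → Array#each
Array#select → Array#map
Array#map → Array#each, Kernel#sleep
Array#each → Kernel#sleep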
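Grouping by method is a fold over the calling contexts. The sketch below counts the sleep calls made by the example code (a, b, c, d) per calling context, then regroups them by method and by caller/callee edge. The counts are derived by reading the code, not measured; `SLEEP_CALLS` and the two method names are made up for this example.

```ruby
# Number of Kernel#sleep calls per calling context, read off the example code.
SLEEP_CALLS = {
  ["bar", "a", "Integer#times", "Array#each", "Kernel#sleep"] => 5,
  ["bar", "a", "Kernel#sleep"] => 1,
  ["bar", "b", "Array#map", "Kernel#sleep"] => 3,
  ["bar", "b", "Kernel#sleep"] => 1,
  ["bar", "c", "Array#select", "Array#map", "Array#each", "Kernel#sleep"] => 3,
  ["bar", "d", "Kernel#sleep"] => 1,
}.freeze

# Call-graph node view: sum per method, ignoring how we got there.
def calls_by_method(contexts)
  contexts.each_with_object(Hash.new(0)) do |(stack, calls), totals|
    totals[stack.last] += calls
  end
end

# Call-graph edge view: sum per [caller, callee] pair.
def calls_by_edge(contexts)
  contexts.each_with_object(Hash.new(0)) do |(stack, calls), edges|
    edges[[stack[-2], stack[-1]]] += calls
  end
end

p calls_by_method(SLEEP_CALLS)
p calls_by_edge(SLEEP_CALLS)
```

Six separate leaves in the calling context tree collapse into one Kernel#sleep node with 14 calls, which is what makes the shared cost visible.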
QCacheGrind
____________________________
Qt-only version of KDE’s KCacheGrind
… because Mac
With much right-clicking, you can find the answers
Simple right?
Choke Points
Method Elimination
profile = RubyProf::Profile.new
profile.exclude_methods!(Object, [:trampoline1, :trampoline2])
Our addition/optimization
Exclude common methods
(could also be handled by expanding calling contexts)
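To illustrate the effect of elimination on the data, here is a sketch that drops excluded frames from each calling context and merges contexts that become identical, so the excluded method's cost lands on its caller. This only models the idea on plain hashes; it is an assumption about the observable effect, not the code behind `exclude_methods!`. The contexts and costs are invented.

```ruby
# Remove excluded frames from every calling context and merge the results.
# contexts: { [frame, frame, ...] => cost }
def eliminate_methods(contexts, excluded)
  contexts.each_with_object(Hash.new(0)) do |(stack, cost), merged|
    # Array#- removes every occurrence of the excluded frames
    merged[stack - excluded] += cost
  end
end

# Invented example: trampoline1 sits between root and the real work.
contexts = {
  ["root"] => 1,
  ["root", "trampoline1"] => 2,
  ["root", "trampoline1", "kid_a"] => 30,
  ["root", "trampoline1", "kid_b"] => 50,
}

p eliminate_methods(contexts, ["trampoline1"])
```

After elimination, kid_a and kid_b appear as direct children of root, and the choke point through trampoline1 is gone.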
After Method Elimination
Kernel#public_send
Recursion
def leaf
end
def leaf_b
end
def kid_a
  leaf
end
def kid_b
  3.times do
    leaf_b
  end
end
def root
  kid_a
  kid_b
  kid_a
end
root
Recursion
def leaf
end
def leaf_b
end
def kid_a
  leaf
end
def kid_b
  3.times do
    leaf_b
  end
end
def root
  kid_a
  kid_b
  kid_a
end
2.times do
  root
end
Odd
103.15%?!?
Expand Calling Contexts
Can still lead to madness
Other strategies and groupings available. Tool support for interactive expansion would be 👌
Multiple Measures
____________________________
Measure many things in the same profile
Provides different views on the same execution
Wall time shows that something is slow; allocation counts can help explain why
And provide insight into things like GC
Process time is useful, but very slow on AWS :( <https://blog.packagecloud.io/eng/2017/03/08/system-calls-are-much-slower-on-ec2/>
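As a rough illustration of taking several measures over one execution, the sketch below records wall time, process CPU time, and allocated objects around a block using only core Ruby (`Process.clock_gettime` and `GC.stat`). It measures one block as a whole rather than per method as the profiler does; `measure` is a name made up for this example.

```ruby
# Sketch: three measures over the same block, using core Ruby only.
def measure
  allocations_before = GC.stat(:total_allocated_objects)
  wall_before = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  cpu_before = Process.clock_gettime(Process::CLOCK_PROCESS_CPUTIME_ID)

  yield

  {
    wall_s: Process.clock_gettime(Process::CLOCK_MONOTONIC) - wall_before,
    cpu_s: Process.clock_gettime(Process::CLOCK_PROCESS_CPUTIME_ID) - cpu_before,
    allocations: GC.stat(:total_allocated_objects) - allocations_before,
  }
end

# A sleep shows up in wall time but barely in CPU time;
# building strings shows up in the allocation count.
p measure { sleep(0.05) }
p measure { 1_000.times.map { "x" * 10 } }
```

Comparing the two results shows why one measure alone can mislead: the first block is slow without doing work, the second is fast but allocation-heavy.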
Deserializing Thrift was v expensive
Wall Time:
Deserializing Thrift was v expensive
Allocations:
Mystery Solved
fname, ftype, fid = iprot.read_field_begin
That’s an Array!
Mystery Solved
fname, ftype, fid = iprot.read_field_begin(ary)
With visibility from ruby-prof, even micro-optimizations become worthwhile!
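The allocation cost of returning three values can be checked in isolation. In this sketch, `read_field_fresh` and `read_field_into` are hypothetical stand-ins for the two shapes of `read_field_begin` shown above, not the Thrift library's code; it compares object allocations when each call returns a new Array versus filling one the caller passes in.

```ruby
# Count objects allocated while the block runs.
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# Hypothetical stand-in: returns a fresh Array on every call.
def read_field_fresh
  [:name, 11, 1]
end

# Hypothetical stand-in: fills a caller-supplied Array instead.
def read_field_into(ary)
  ary[0] = :name
  ary[1] = 11
  ary[2] = 1
  ary
end

def fresh_allocations(n)
  allocations { n.times { _fname, _ftype, _fid = read_field_fresh } }
end

def reused_allocations(n)
  ary = Array.new(3) # allocated once, outside the measured block
  allocations { n.times { _fname, _ftype, _fid = read_field_into(ary) } }
end

puts "fresh Array per call: #{fresh_allocations(10_000)} objects"
puts "reused Array:         #{reused_allocations(10_000)} objects"
```

The destructuring assignment itself does not allocate; the cost comes from building the returned Array, which is why passing a reusable one removes it.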
This change was
… where result is sometimes a Date
Make it Accessible
Getting profiles was hard!
… and specialized
I, myself, never even captured any
Too expensive to run in production
Created a benchmark of common actions
____________________________
Run against production data
Triggered on every deploy
Update profiled actions as usage changes
Balance of repeatability and low maintenance
Stores profiles in S3
Creates an historical record
Downloading profiles is easy
See changes over time
Spun off into
____________________________
A silhouette-runner, where developers provide a rack env, and get profiles back
Development mode integration
(Though, of course, dev and prod can be quite different)
Adopted internally to get massive wins on important pages
(Like the page where you book)
We have similar setups for Java (and soon JavaScript!)
And you can have it, too
airbnb-ruby-prof
____________________________
https://rubygems.org/gems/airbnb-ruby-prof / https://github.com/airbnb/ruby-prof
If you’re on Ruby >= 2.1, it should just work
… If you’re stuck on 1.9, talk to me about some patchsets
Airbnb’s Problem
Things were slow
A Journey into Ruby-Prof
And related concepts
Making it Accessible
It needs to be easy
We have talked about profilers
Questions?
Scaling Rails with Ruby-prof -- Ruby Conf Kenya 2017 by Ben Hughes
