How To Write Middleware In Ruby

How To Write
Middleware in Ruby
2016/12/02 RubyConf Taiwan Day 1
Satoshi Tagomori (@tagomoris)

Satoshi "Moris" Tagomori
(@tagomoris)
Fluentd, MessagePack-Ruby,
Norikra, ...
Treasure Data, Inc.

http://www.fluentd.org/
open source data collector for unified logging layer.

(Diagram: log collection before and after Fluentd)
Before: log files handled by ad hoc scripts (parsing, filtering, aggregation, Tweet fetching), syslog scripts, cron jobs for loading, and an rsync server
After (with Fluentd):
✓ Parse/Format data
✓ Buffering & Retries
✓ Load balancing
✓ Failover

Middleware? : Fluentd
• Long running daemon process
• Compatibility for API, behavior and configuration files
• Multi platform / environment support
• Linux, Mac and Windows(!)
• Baremetal servers, Virtual machines, Containers
• Many use cases
• Various data, Various data formats, Unexpected errors
• Various traffic - small to huge

• Ruby, JRuby?, Rubinius?
Middleware? Batches:
Minutes - Hours

Middleware? Providing APIs
and/or Client Libraries

Middleware?
Daily Development
& Deployment
Providing Client Tools

Middleware?
Make Your Application
Stable

Middleware?
Make Your Application
Fast and Scalable

Case studies from
development of Fluentd
• Platform: Linux, Mac and Windows
• Resource: Memory usage and malloc
• Resource and Stability: Handling JSON
• Stability: Threads and exceptions

Platforms:
Linux, Mac and Windows

Linux and Mac:
Thread/process scheduling
• Both are UNIX-like systems...
• Mac (development), Linux (production)
• Test code must run on both!
• CI services provide multi-environment support
• Fluentd uses Travis CI :D
• Travis CI provides "os" option: "linux" & "osx"
• Important tests to be written: Threading

require 'test/unit'
require 'socket'

class MyTest < ::Test::Unit::TestCase
  # Naive version: nothing waits for the server thread, so the assertion
  # can run before both payloads have been read.
  test 'client sends 2 data' do
    list = []
    thr = Thread.new do # Mock server
      TCPServer.open("127.0.0.1", 2048) do |server|
        while sock = server.accept
          list << sock.read.chomp
        end
      end
    end
    2.times do |i|
      TCPSocket.open("127.0.0.1", 2048) do |client|
        client.write "data #{i}"
      end
    end
    assert_equal(["data 0", "data 1"], list)
  end
end

Loaded suite example
Started
F
===========================================================================================
Failure: test: client sends 2 data(MyTest)
example.rb:22:in `block in <class:MyTest>'
19: end
20: end
21:
=> 22: assert_equal(["data 0", "data 1"], list)
23: end
24: end
<["data 0", "data 1"]> expected but was
<["data 0"]>
diff:
["data 0", "data 1"]
===========================================================================================
Finished in 0.007253 seconds.
-------------------------------------------------------------------------------------------
1 tests, 1 assertions, 1 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
0% passed
-------------------------------------------------------------------------------------------
137.87 tests/s, 137.87 assertions/s
Mac OS X (10.11.16)

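class MyTest < ::Test::Unit::TestCase
  # Second attempt: sleep before asserting.
  # (Lines not legible on the slide are carried over from the first version.)
  test 'client sends 2 data' do
    list = []
    thr = Thread.new do # Mock server
      TCPServer.open("127.0.0.1", 2048) do |server|
        listening = true
        while sock = server.accept
          list << sock.read.chomp
        end
      end
    end
    2.times do |i|
      TCPSocket.open("127.0.0.1", 2048) do |client|
        client.write "data #{i}"
      end
    end
    sleep 1
    assert_equal(["data 0", "data 1"], list)
  end
end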

Started
.
--------------------------------------------------------------------------------------------
100% passed
--------------------------------------------------------------------------------------------
Mac OS X (10.11.16)

Started
E
=================================================================================================
Error: test: client sends 2 data(MyTest): Errno::ECONNREFUSED: Connection refused - connect(2)
for "127.0.0.1" port 2048
example.rb:16:in `initialize'
example.rb:16:in `open'
example.rb:16:in `block (2 levels) in <class:MyTest>'
example.rb:15:in `times'
=================================================================================================
-------------------------------------------------------------------------------------------------
0% passed
-------------------------------------------------------------------------------------------------
Linux (Ubuntu 16.04)

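require 'socket'
require 'timeout'

class MyTest < ::Test::Unit::TestCase
  # Third attempt: wait for explicit conditions instead of guessing a sleep time.
  # (Lines not legible on the slide are carried over from the first version.)
  test 'client sends 2 data' do
    list = []
    listening = false
    thr = Thread.new do # Mock server
      TCPServer.open("127.0.0.1", 2048) do |server|
        listening = true
        while sock = server.accept
          list << sock.read.chomp
        end
      end
    end
    sleep 0.1 until listening # do not connect before the server is listening
    2.times do |i|
      TCPSocket.open("127.0.0.1", 2048) do |client|
        client.write "data #{i}"
      end
    end
    # do not assert before the server has read both payloads
    Timeout.timeout(3){ sleep 0.1 until list.size >= 2 }
    assert_equal(["data 0", "data 1"], list)
  end
end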
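
The two waits in the test above ("until the server is listening", "until both payloads have arrived") can be factored into one polling helper. The following is only a sketch under assumptions: `wait_until` is a hypothetical helper name (it is not part of test-unit or Fluentd), and port 0 is used so the OS picks a free port instead of the fixed 2048 from the slides.

```ruby
require 'socket'
require 'timeout'

# Hypothetical helper: poll until the block returns a truthy value,
# or raise Timeout::Error after `seconds`.
def wait_until(seconds = 3, interval = 0.05)
  Timeout.timeout(seconds) { sleep interval until yield }
end

list = []
port = nil

server_thread = Thread.new do
  # Port 0 asks the OS for any free port (an assumption of this sketch).
  TCPServer.open("127.0.0.1", 0) do |server|
    port = server.addr[1] # the port that was actually bound
    2.times do
      sock = server.accept
      list << sock.read.chomp
      sock.close
    end
  end
end

wait_until { port } # the server socket is bound and listening
2.times do |i|
  TCPSocket.open("127.0.0.1", port) { |client| client.write "data #{i}" }
end
wait_until { list.size >= 2 } # the server has read both payloads
server_thread.join
```

Waiting on a condition with a timeout keeps the test fast when everything works and still fails loudly (with `Timeout::Error`) when the server never comes up.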

*NIX and Windows:
fork-exec and spawn
• Windows: yet another thread scheduling behavior :(
• daemonize:
• double fork (or Process.daemon) on *nix
• spawn on Windows
• Executing another process:
• fork & exec on *nix
• spawn on Windows
• CI on Windows: AppVeyor
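
As an illustration of the "spawn" bullets above, here is a minimal sketch of launching a child process through `Process.spawn` rather than fork and exec, so one code path can serve both platform families. `run_child` is a hypothetical helper name chosen for this example, not something taken from Fluentd.

```ruby
require 'rbconfig'

# Hypothetical helper: run a Ruby one-liner in a child process without fork,
# and return its standard output together with its exit status.
def run_child(script)
  reader, writer = IO.pipe
  pid = Process.spawn(RbConfig.ruby, "-e", script, out: writer)
  writer.close           # the parent keeps only the read end
  output = reader.read   # returns once the child has closed its stdout
  reader.close
  _, status = Process.wait2(pid)
  [output, status]
end

output, status = run_child('print "hello from child"')
```

`RbConfig.ruby` is the path of the currently running interpreter, which avoids depending on whatever `ruby` happens to be first in `PATH`.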

Lesson 1:
Run Tests
on All Platforms Supported

Resource:
Memory usage and malloc

Memory Usage:
Object leak
• Temporary values are bound to leak in a
long-running process
• 1,000 objects / hour 
=> 8,760,000 objects / year
• Some solutions:
• In-process GC
• Storage with TTL
• (External storages: Redis, ...)
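module MyDaemon
  class Process
    def initialize
      @map = {} # hour_key => { key => value }
    end

    # Number of whole hours since the epoch; changes once per hour.
    def hour_key
      Time.now.to_i / 3600
    end

    # Store for the current hour.
    # (Expiry of past hours is added here as an assumption; without it,
    # the stores of old hours would accumulate forever.)
    def hourly_store
      current = hour_key
      @map.delete_if { |hour, _| hour < current }
      @map[current] ||= {}
    end

    def put(key, value)
      hourly_store[key] = value
    end

    def get(key)
      hourly_store[key]
    end

    # add # of data per hour
    def read_data(table_name, data)
      key = "records_of_#{table_name}"
      put(key, (get(key) || 0) + data.size) # first record of the hour starts from 0
    end
  end
end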

Lesson 2:
Make Sure to Collect Garbage

Resource and Stability:
Handling JSON

Formatting Data Into JSON
• Fluentd handles JSON in many use cases
• both parsing and generating
• it consumes much CPU time...
• JSON, Yajl and Oj
• JSON: ruby standard library
• Yajl (yajl-ruby): ruby binding of YAJL (SAX-based)
• Oj (oj): Optimized JSON

require 'json'; require 'yajl'; require 'oj'
Oj.default_options = {bigdecimal_load: :float, mode: :compat, use_to_json: true}

module MyDaemon
  # Thin wrapper that picks a JSON library once and exposes a common #dump.
  class Json
    def initialize(mode)
      klass = case mode
              when :json then JSON
              when :yajl then Yajl
              when :oj then Oj
              end
      @proc = klass.method(:dump)
    end

    def dump(data); @proc.call(data); end
  end
end

require 'benchmark'
N = 500_000
obj = {"message" => "a"*100, "100" => 100, "pi" => 3.14159, "true" => true}
Benchmark.bm{|x|
  x.report("json") {
    formatter = MyDaemon::Json.new(:json)
    N.times{ formatter.dump(obj) }
  }
  x.report("yajl") {
    formatter = MyDaemon::Json.new(:yajl)
    N.times{ formatter.dump(obj) }
  }
  x.report("oj") {
    formatter = MyDaemon::Json.new(:oj)
    N.times{ formatter.dump(obj) }
  }
}

$ ruby example2.rb
          user     system      total        real
json  3.870000   0.050000   3.920000 (  4.005429)
yajl  2.940000   0.030000   2.970000 (  2.998924)
oj    1.130000   0.020000   1.150000 (  1.152596)
# for 500_000 objects
Mac OS X (10.11.16)
Ruby 2.3.1
yajl-ruby 1.3.0
oj 2.18.0

Speed is not the only thing:
APIs for unstable I/O
• JSON and Oj have only ".load"
• it raises parse error for:
• incomplete JSON string
• additional bytes after JSON string
• Yajl has stream parser: very useful for servers
• method to feed input data
• callback for parsed objects

require 'oj'
Oj.load('{"message":"this is ') # Oj::ParseError
Oj.load('{"message":"this is a pen."}') # => Hash
Oj.load('{"message":"this is a pen."}{"messa"') # Oj::ParseError

require 'yajl'
parsed_objs = []
parser = Yajl::Parser.new
parser.on_parse_complete = ->(obj){ parsed_objs << obj }
parser << '{"message":"aaaaaaaaaaaaaaa'
parser << 'aaaaaaaaa"}{"message"' # on_parse_complete is called
parser << ':"bbbbbbbbb"'
parser << '}' # on_parse_complete is called again

require 'socket'
require 'oj'

# `port` and `call_method` are placeholders defined elsewhere.
TCPServer.open(port) do |server|
  while sock = server.accept
    begin
      buf = ""
      while input = sock.readpartial(1024)
        buf << input
        # can we feed this value to Oj.load ?
        begin
          obj = Oj.load(buf) # never succeeds if buf has 2 objects
          call_method(obj)
          buf = ""
        rescue Oj::ParseError
          # try with next input ...
        end
      end
    rescue EOFError
      sock.close rescue nil
    end
  end
end

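require 'socket'
require 'yajl'

# `port` and `call_method` are placeholders defined elsewhere.
TCPServer.open(port) do |server|
  while sock = server.accept
    begin
      parser = Yajl::Parser.new
      parser.on_parse_complete = ->(obj){ call_method(obj) }
      while input = sock.readpartial(1024)
        parser << input # the parser buffers incomplete input by itself
      end
    rescue EOFError
      sock.close rescue nil
    end
  end
end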

Lesson 3:
Choose
Fast/Well-Designed(/Stable)
Libraries

Stability:
Threads and Exceptions

Thread in Ruby
• GVL (GIL): Giant VM Lock (Global Interpreter Lock)
• Only one thread among many can run Ruby code at a time
• The Ruby VM can use only 1 CPU core
• A thread blocked in I/O is *not* running
• Threads in I/O release the lock, so I/O in many threads proceeds in parallel
• We can write network servers in Ruby!
(Diagram: threads in I/O alongside running threads)
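
A small sketch of the point about threads in I/O: blocked threads do not hold the lock, so their waits overlap. Here `sleep` stands in for a blocking I/O call (an assumption of this example; a real server would block in `accept` or `read`).

```ruby
require 'benchmark'

# Four threads, each blocked for 0.2 seconds.
# If the waits were serialized, this would take about 0.8 seconds.
elapsed = Benchmark.realtime do
  threads = 4.times.map { Thread.new { sleep 0.2 } }
  threads.each(&:join)
end

puts format("4 blocked threads finished in %.2f seconds", elapsed)
```

On an otherwise idle machine the measured time should be close to a single wait rather than the sum of all four.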

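require 'test/unit'
require 'socket'

class MyTestCase < ::Test::Unit::TestCase
  test 'sent data should be received' do
    received = []
    sent = []
    listening = false
    th1 = Thread.new do
      # address and port are assumed from the earlier example
      TCPServer.open("127.0.0.1", 2048) do |server|
        listening = true
        while sock = server.accepto # deliberate typo for "accept"
          received << sock.read
        end
      end
    end
    ["foo", "bar"].each do |str|
      begin
        TCPSocket.open("127.0.0.1", 2048) do |client|
          client.write(str)
        end
        sent << str
      rescue => e
        # ignore
      end
    end
    assert_equal sent, received
  end
end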

Loaded suite example7
Started
.
-------------------------------------------------------------------------------------------
100% passed
-------------------------------------------------------------------------------------------

# In the test above, both arrays are still empty when the assertion runs,
# so it passes although no data was sent or received:
assert_equal sent, received # [] == []

Thread in Ruby:
Methods for errors
• Threads die silently if any error is raised
• abort_on_exception
• re-raises a thread's error in the main thread if true
• needed to avoid false successes
(silent crashes)
• report_on_exception
• prints a warning for errors in threads if true (Ruby 2.4 feature)
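
The silent death described above can be seen in a few lines. This is a minimal sketch: with neither flag set, the error only surfaces if something joins the thread. `Thread.report_on_exception` is switched off here purely to keep the demo quiet on Ruby 2.5 and later, where it defaults to true.

```ruby
Thread.report_on_exception = false # keep the demo output quiet

worker = Thread.new { raise "boom" }
sleep 0.01 while worker.alive?

# The main thread is still running; the worker is simply gone.
status = worker.status # nil means "terminated with an exception"

# The error is re-raised only when the thread is joined.
message = begin
  worker.join
  nil
rescue => e
  e.message
end

puts "status=#{status.inspect} message=#{message.inspect}"
```

A long-running daemon rarely joins its worker threads, which is why such a failure can go unnoticed without one of the two flags.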

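received = []
sent = []
listening = false
th1 = Thread.new do
  Thread.current.abort_on_exception = true
  # address and port are assumed from the earlier example
  TCPServer.open("127.0.0.1", 2048) do |server|
    listening = true
    while sock = server.accepto # deliberate typo for "accept"
      received << sock.read
    end
  end
end
# ... the client loop is the same as in the previous version ...
assert_equal sent, received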

Loaded suite example7
Started
E
===========================================================================================
Error: test: sent data should be received(MyTestCase):
NoMethodError: undefined method `accepto' for #<TCPServer:(closed)>
Did you mean? accept
example7.rb:14:in `block (3 levels) in <class:MyTestCase>'
example7.rb:12:in `open'
example7.rb:12:in `block (2 levels) in <class:MyTestCase>'
===========================================================================================
-------------------------------------------------------------------------------------------
0% passed
-------------------------------------------------------------------------------------------

Thread in Ruby:
Process crash from errors in threads
• Middleware SHOULD NOT crash, as far as possible :)
• An error from a TCP connection MUST NOT crash the
whole process
• Many places can raise errors...
• Socket I/O, Executing commands
• Parsing HTTP requests, Parsing JSON (or other formats)
• The process
• should crash in tests, but
• should not in production

Thread in Ruby:
What your code needs for threads
• Set Thread#abort_on_exception = true
• for almost all threads...
• "rescue" all errors in threads
• to log these errors, and not to crash the whole process
• "raise" rescued errors again only in testing
• so that bugs make tests fail
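
The three rules above could be combined in one thread-creation helper. The following is a sketch only: `MyDaemon.thread_create` and its `testing:` keyword are hypothetical names for this example and are not Fluentd's actual API.

```ruby
require 'logger'
require 'stringio'

module MyDaemon
  # Hypothetical helper: every thread gets abort_on_exception, logs its
  # errors, and re-raises them only when running under tests.
  def self.thread_create(logger, testing: false)
    Thread.new do
      Thread.current.abort_on_exception = true
      begin
        yield
      rescue => e
        logger.error("unexpected error in thread: #{e.class}: #{e.message}")
        raise if testing # in tests, let the bug fail loudly
      end
    end
  end
end

# Usage in "production" mode: the error is logged and the process lives on.
log = StringIO.new
thread = MyDaemon.thread_create(Logger.new(log)) do
  raise IOError, "connection reset"
end
thread.join # returns normally because the error was rescued inside the thread
puts log.string
```

Note that `rescue => e` catches `StandardError` and its subclasses only, so fatal conditions such as `NoMemoryError` still terminate the thread.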

Lesson 4:
Handle Exceptions
in the Right Way

Wrap-up:
Writing Middleware is ...

Writing Middleware:
• Taking care of:
• various platforms and environments
• resource usage and stability
• Requires knowing about:
• Ruby's features
• Ruby VM's behavior
• Library implementations
• A different viewpoint from writing applications!

Write your code,
like middleware :D
Make it efficient & stable!
Thank you!
@tagomoris

Started
F
===========================================================================================
Failure: test: client sends 2 data(MyTest)
19: end
20: end
21:
=> 22: assert_equal(["data 0", "data 1"], list)
23: end
24: end
<["data 0", "data 1"]> expected but was
<["data 0", "data 1"]>
diff:
["data 0", "data 1"]
===========================================================================================
-------------------------------------------------------------------------------------------
0% passed
-------------------------------------------------------------------------------------------
Mac OS X (10.11.16)

Memory Usage:
Memory fragmentation
• High memory usage, low # of objects
• memory fragmentation?
• glibc malloc: weak at fine-grained memory allocation
and multithreading
• Switching to jemalloc via LD_PRELOAD
• FreeBSD standard malloc (available on Linux)
• fluentd's rpm/deb packages use jemalloc by default

abort_on_exception in detail
• It doesn't actually abort the whole process
• it just re-raises the error in the main thread
sleeping = false
Thread.abort_on_exception = true
Thread.new{ sleep 0.1 until sleeping ; raise "yay" }
begin
  sleeping = true
  sleep 5
rescue => e
  p(here: "rescue in main thread", error: e)
end
p "foo!"
$ ruby example.rb
{:here=>"rescue in main thread", :error=>#<RuntimeError: yay>}
"foo!"
