How To Write Middleware In Ruby

Talk at #rubyconftw day 1


  1. How To Write Middleware in Ruby
      2016/12/02, RubyConf Taiwan Day 1
      Satoshi Tagomori (@tagomoris)
  2. Satoshi "Moris" Tagomori (@tagomoris)
      • Fluentd, MessagePack-Ruby, Norikra, ...
      • Treasure Data, Inc.
  3. Fluentd (http://www.fluentd.org/): an open source data collector for a unified logging layer.
  4. [Diagram, Before/After. Before: LOG and FILE sources connected through many ad-hoc scripts (script to parse data, cron job for loading, filtering script, syslog script, tweet-fetching script, aggregation scripts, rsync server). After: ✓ Parse/Format data ✓ Buffering & Retries ✓ Load balancing ✓ Failover]
  5. Middleware?: Fluentd
      • Long-running daemon process
      • Compatibility of API, behavior and configuration files
      • Multi platform / environment support
          • Linux, Mac and Windows(!)
          • Baremetal servers, Virtual machines, Containers
      • Many use cases
          • Various data, Various data formats, Unexpected errors
          • Various traffic - small to huge
  6. Middleware?
      • Long-running daemon process
      • Compatibility of API, behavior and configuration files
      • Multi platform / environment support
          • Linux, Mac and Windows(!)
          • Ruby, JRuby?, Rubinius?
          • Baremetal servers, Virtual machines, Containers
      • Many use cases
          • Various data, Various data formats, Unexpected errors
          • Various traffic - small to huge
      Callout: Batches: Minutes - Hours
  7. Middleware? Same list as slide 6. Callout: Providing APIs and/or Client Libraries
  8. Middleware? Same list as slide 6. Callouts: Daily Development & Deployment; Providing Client Tools
  9. Middleware? Same list as slide 6. Callout: Make Your Application Stable
  10. Middleware? Same list as slide 6. Callout: Make Your Application Fast and Scalable
  11. Case studies from development of Fluentd
      • Platform: Linux, Mac and Windows
      • Resource: Memory usage and malloc
      • Resource and Stability: Handling JSON
      • Stability: Threads and exceptions
  12. Platforms: Linux, Mac and Windows
  13. Linux and Mac: Thread/process scheduling
      • Both are UNIX-like systems...
      • Mac (development), Linux (production)
      • Test code must run on both!
      • CI services provide multi-environment support
      • Fluentd uses Travis CI :D
      • Travis CI provides an "os" option: "linux" & "osx"
      • Important tests to be written: Threading
  14.
      require 'test/unit'
      require 'socket'

      class MyTest < ::Test::Unit::TestCase
        test 'client sends 2 data' do
          list = []
          thr = Thread.new do # Mock server
            TCPServer.open("127.0.0.1", 2048) do |server|
              while sock = server.accept
                list << sock.read.chomp
              end
            end
          end
          2.times do |i|
            TCPSocket.open("127.0.0.1", 2048) do |client|
              client.write "data #{i}"
            end
          end
          # asserts without waiting for the server thread
          assert_equal(["data 0", "data 1"], list)
        end
      end
  15. Mac OS X (10.11.16):
      Loaded suite example
      Started
      F
      ===========================================================================================
      Failure: test: client sends 2 data(MyTest)
      example.rb:22:in `block in <class:MyTest>'
           19: end
           20: end
           21:
        => 22: assert_equal(["data 0", "data 1"], list)
           23: end
           24: end
      <["data 0", "data 1"]> expected but was
      <["data 0"]>
      diff: ["data 0", "data 1"]
      ===========================================================================================
      Finished in 0.007253 seconds.
      -------------------------------------------------------------------------------------------
      1 tests, 1 assertions, 1 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
      0% passed
      -------------------------------------------------------------------------------------------
      137.87 tests/s, 137.87 assertions/s
  17.
      require 'test/unit'
      require 'socket'

      class MyTest < ::Test::Unit::TestCase
        test 'client sends 2 data' do
          list = []
          thr = Thread.new do # Mock server
            TCPServer.open("127.0.0.1", 2048) do |server|
              listening = true
              while sock = server.accept
                list << sock.read.chomp
              end
            end
          end
          2.times do |i|
            TCPSocket.open("127.0.0.1", 2048) do |client|
              client.write "data #{i}"
            end
          end
          sleep 1 # give the server thread time to receive the data
          assert_equal(["data 0", "data 1"], list)
        end
      end
  18. Mac OS X (10.11.16):
      Loaded suite example
      Started
      .
      Finished in 1.002745 seconds.
      --------------------------------------------------------------------------------------------
      1 tests, 1 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
      100% passed
      --------------------------------------------------------------------------------------------
      1.00 tests/s, 1.00 assertions/s
  19. Linux (Ubuntu 16.04):
      Loaded suite example
      Started
      E
      =================================================================================================
      Error: test: client sends 2 data(MyTest): Errno::ECONNREFUSED: Connection refused - connect(2) for "127.0.0.1" port 2048
      example.rb:16:in `initialize'
      example.rb:16:in `open'
      example.rb:16:in `block (2 levels) in <class:MyTest>'
      example.rb:15:in `times'
      example.rb:15:in `block in <class:MyTest>'
      =================================================================================================
      Finished in 0.005918197 seconds.
      -------------------------------------------------------------------------------------------------
      1 tests, 0 assertions, 0 failures, 1 errors, 0 pendings, 0 omissions, 0 notifications
      0% passed
      -------------------------------------------------------------------------------------------------
      168.97 tests/s, 0.00 assertions/s
  23.
      require 'test/unit'
      require 'socket'
      require 'timeout'

      class MyTest < ::Test::Unit::TestCase
        test 'client sends 2 data' do
          list = []
          listening = false
          thr = Thread.new do # Mock server
            TCPServer.open("127.0.0.1", 2048) do |server|
              listening = true
              while sock = server.accept
                list << sock.read.chomp
              end
            end
          end
          sleep 0.1 until listening # wait until the server socket is open
          2.times do |i|
            TCPSocket.open("127.0.0.1", 2048) do |client|
              client.write "data #{i}"
            end
          end
          # wait for both messages, but for at most 3 seconds
          Timeout.timeout(3){ sleep 0.1 until list.size >= 2 }
          assert_equal(["data 0", "data 1"], list)
        end
      end
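A possible variation on the slide 23 fix, not taken from the talk: instead of polling a `listening` flag with `sleep`, the server thread can push its port into a `Queue`, and the caller blocks on `Queue#pop` until the server is actually listening. Binding to port 0 lets the OS choose a free port, which avoids clashes on shared CI machines. `start_mock_server` is a hypothetical helper name, and the sketch runs outside Test::Unit so it can be tried directly.

```ruby
require 'socket'

# Hypothetical helper: starts a mock server on an OS-assigned port and
# returns the port only after the server socket is listening.
def start_mock_server(expected_clients)
  ready = Queue.new     # carries the port number once the socket listens
  received = Queue.new  # thread-safe store for what clients sent
  thread = Thread.new do
    Thread.current.abort_on_exception = true # surface bind errors in the caller
    TCPServer.open("127.0.0.1", 0) do |server| # port 0: let the OS pick a free port
      ready << server.addr[1]
      expected_clients.times do
        sock = server.accept
        received << sock.read.chomp # read until the client closes
        sock.close
      end
    end
  end
  [ready.pop, received, thread] # Queue#pop blocks until the port is pushed
end

port, received, server_thread = start_mock_server(2)
2.times do |i|
  TCPSocket.open("127.0.0.1", port) { |client| client.write "data #{i}" }
end
server_thread.join(3) # wait at most 3 seconds for the server to finish
list = Array.new(received.size) { received.pop }
p list
```

Because the server handles connections one at a time in accept order, the received list keeps the order in which the clients connected.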
  24. *NIX and Windows: fork-exec and spawn
      • Windows: different thread scheduling :(
      • Daemonize:
          • double fork (or Process.daemon) on *nix
          • spawn on Windows
      • Execute another process:
          • fork & exec on *nix
          • spawn on Windows
      • CI on Windows: AppVeyor
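A minimal sketch of the fork & exec vs. spawn split from slide 24, using the hypothetical helper name `run_command`. It relies on `Process.respond_to?(:fork)` returning false where `fork` is not implemented (as on Windows); daemonizing with `Process.daemon` is left out to keep the example short.

```ruby
require 'rbconfig'

# Runs an external command and returns its exit status.
def run_command(*cmd)
  pid =
    if Process.respond_to?(:fork)
      Process.fork { exec(*cmd) } # *nix: fork, then replace the child with cmd
    else
      Process.spawn(*cmd)         # Windows: no fork, so spawn directly
    end
  _, status = Process.wait2(pid)  # reap the child and get its Process::Status
  status.exitstatus
end

# Usage: start a child Ruby interpreter that exits with status 3.
puts run_command(RbConfig.ruby, "-e", "exit 3")
```

`Process.spawn` also works on *nix, so a project could use it everywhere; the branch only shows where the two platforms differ.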
  25. Lesson 1: Run Tests on All Supported Platforms
  26. Resource: Memory usage and malloc
  27. Memory Usage: Object leak
      • Temporary values tend to leak in a long-running process
          • 1,000 objects / hour => 8,760,000 objects / year
      • Some solutions:
          • In-process GC
          • Storage with TTL
          • (External storages: Redis, ...)

      module MyDaemon
        class Process
          def initialize
            @map = {}
          end

          def hour_key
            Time.now.to_i / 3600
          end

          def hourly_store
            @map[hour_key] ||= {} # a new Hash every hour; old ones are never removed
          end

          def put(key, value)
            hourly_store[key] = value
          end

          def get(key)
            hourly_store[key]
          end

          # add # of data per hour
          def read_data(table_name, data)
            key = "records_of_#{table_name}"
            put(key, (get(key) || 0) + data.size) # the counter starts as nil
          end
        end
      end
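One way to apply the "in-process GC" idea from slide 27 to the hourly store: keep only the newest few hour buckets and delete older ones whenever a value is added. The class name `HourlyCounter`, the `keep_hours` parameter and the injectable `clock` are assumptions for this sketch, not part of Fluentd.

```ruby
module MyDaemon
  # Hypothetical variant of the hourly store that drops old hour buckets,
  # so memory stays bounded no matter how long the daemon runs.
  class HourlyCounter
    # clock: returns epoch seconds; injectable so tests can fake time.
    def initialize(keep_hours: 2, clock: -> { Time.now.to_i })
      @map = {}
      @keep_hours = keep_hours
      @clock = clock
    end

    def add(key, n)
      store = (@map[hour_key] ||= {})
      store[key] = (store[key] || 0) + n
      expire_old_hours
    end

    def get(key)
      @map.fetch(hour_key, {})[key]
    end

    # Number of hour buckets currently held in memory.
    def bucket_count
      @map.size
    end

    private

    def hour_key
      @clock.call / 3600
    end

    # Delete every bucket older than the newest keep_hours hours.
    def expire_old_hours
      oldest_kept = hour_key - @keep_hours + 1
      @map.delete_if { |hour, _| hour < oldest_kept }
    end
  end
end

now = 0
counter = MyDaemon::HourlyCounter.new(keep_hours: 2, clock: -> { now })
10.times do |h|
  now = h * 3600 # simulate one call per hour for ten hours
  counter.add("records_of_access_log", 5)
end
p counter.bucket_count # only the last two hours are kept
p counter.get("records_of_access_log")
```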
  28. Lesson 2: Make Sure to Collect Garbage
  29. Resource and Stability: Handling JSON
  30. Formatting Data Into JSON
      • Fluentd handles JSON in many use cases
          • both parsing and generating
          • it consumes a lot of CPU time...
      • JSON, Yajl and Oj
          • JSON: Ruby standard library
          • Yajl (yajl-ruby): Ruby binding of YAJL (SAX-based)
          • Oj (oj): Optimized JSON
  31.
      require 'json'; require 'yajl'; require 'oj'
      Oj.default_options = {bigdecimal_load: :float, mode: :compat, use_to_json: true}

      module MyDaemon
        class Json
          def initialize(mode)
            klass = case mode
                    when :json then JSON
                    when :yajl then Yajl
                    when :oj   then Oj
                    else raise ArgumentError, "unknown mode: #{mode}"
                    end
            @proc = klass.method(:dump) # JSON.dump, Yajl.dump or Oj.dump
          end

          def dump(data); @proc.call(data); end
        end
      end

      require 'benchmark'
      N = 500_000
      obj = {"message" => "a"*100, "100" => 100, "pi" => 3.14159, "true" => true}

      Benchmark.bm{|x|
        x.report("json") {
          formatter = MyDaemon::Json.new(:json)
          N.times{ formatter.dump(obj) }
        }
        x.report("yajl") {
          formatter = MyDaemon::Json.new(:yajl)
          N.times{ formatter.dump(obj) }
        }
        x.report("oj") {
          formatter = MyDaemon::Json.new(:oj)
          N.times{ formatter.dump(obj) }
        }
      }
  32. Mac OS X (10.11.16), Ruby 2.3.1, yajl-ruby 1.3.0, oj 2.18.0:
      $ ruby example2.rb
                 user     system      total        real
      json   3.870000   0.050000   3.920000 (  4.005429)
      yajl   2.940000   0.030000   2.970000 (  2.998924)
      oj     1.130000   0.020000   1.150000 (  1.152596)
      # for 500_000 objects
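Since Oj was the fastest in this benchmark but is an extra gem, a daemon may want to prefer it only when it is installed. A minimal sketch, assuming a hypothetical `MyDaemon.json_dumper` helper that falls back to the standard `JSON` library when `require 'oj'` raises `LoadError`:

```ruby
require 'json'

module MyDaemon
  # Hypothetical helper: returns a lambda that turns an object into a JSON
  # string, using Oj when the gem is available and stdlib JSON otherwise.
  def self.json_dumper
    require 'oj'
    ->(obj) { Oj.dump(obj, mode: :compat) }
  rescue LoadError
    ->(obj) { JSON.generate(obj) }
  end
end

dump = MyDaemon.json_dumper
puts dump.call({"message" => "a", "pi" => 3.14159})
```

Choosing the backend once, at startup, keeps the per-record path free of branching, which is the same idea as `MyDaemon::Json` storing `klass.method(:dump)`.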
  33. Speed is not the only thing: APIs for unstable I/O
      • JSON and Oj have only ".load"
          • it raises a parse error for:
              • an incomplete JSON string
              • additional bytes after a JSON string
      • Yajl has a stream parser: very useful for servers
          • a method to feed input data
          • a callback for parsed objects
  34.
      require 'oj'
      Oj.load('{"message":"this is ')                        # Oj::ParseError
      Oj.load('{"message":"this is a pen."}')                # => Hash
      Oj.load('{"message":"this is a pen."}{"messa"')        # Oj::ParseError
  36.
      require 'yajl'
      parsed_objs = []
      parser = Yajl::Parser.new
      parser.on_parse_complete = ->(obj){ parsed_objs << obj }
      parser << '{"message":"aaaaaaaaaaaaaaa'
      parser << 'aaaaaaaaa"}{"message"'  # on_parse_complete is called
      parser << ':"bbbbbbbbb"'
      parser << '}'                      # on_parse_complete is called again
  37.
      require 'socket'
      require 'oj'

      # port and call_method are placeholders on the slide
      TCPServer.open(port) do |server|
        while sock = server.accept
          begin
            buf = ""
            while input = sock.readpartial(1024)
              buf << input
              # can we feed this value to Oj.load ?
              begin
                obj = Oj.load(buf) # never succeeds if buf has 2 objects
                call_method(obj)
                buf = ""
              rescue Oj::ParseError
                # try with next input ...
              end
            end
          rescue EOFError
            sock.close rescue nil
          end
        end
      end
  38.
      require 'socket'
      require 'yajl'

      # port and call_method are placeholders on the slide
      TCPServer.open(port) do |server|
        while sock = server.accept
          begin
            parser = Yajl::Parser.new
            parser.on_parse_complete = ->(obj){ call_method(obj) }
            while input = sock.readpartial(1024)
              parser << input # partial or multiple objects are both fine
            end
          rescue EOFError
            sock.close rescue nil
          end
        end
      end
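The Yajl server relies on a streaming parser. If a project cannot use one, a different technique (swapped in here, not what the talk recommends) is to frame each record with a newline and parse one complete line at a time with the standard library. This sketch assumes the peer sends newline-delimited JSON; `LineJsonReader` is a hypothetical class name.

```ruby
require 'json'

# Buffers arbitrary chunks and calls the block once per complete line,
# assuming exactly one JSON document per line.
class LineJsonReader
  def initialize(&on_record)
    @buffer = +""          # mutable string buffer
    @on_record = on_record
  end

  # Feed a chunk, e.g. the result of sock.readpartial(1024).
  def <<(chunk)
    @buffer << chunk
    while (newline = @buffer.index("\n"))
      line = @buffer.slice!(0..newline).chomp # remove one line from the buffer
      @on_record.call(JSON.parse(line)) unless line.strip.empty?
    end
    self
  end
end

records = []
reader = LineJsonReader.new { |obj| records << obj }
reader << '{"message":"aaa'      # incomplete: nothing parsed yet
reader << "aaa\"}\n{\"message\"" # first record completes here
reader << ":\"bbb\"}\n"          # second record completes here
p records
```

A malformed line still raises `JSON::ParserError` from `<<`, so a real server would rescue it per connection.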
  39. Lesson 3: Choose Fast/Well-Designed(/Stable) Libraries
  40. Stability: Threads and Exceptions
  41. Thread in Ruby
      • GVL (GIL): Giant VM Lock (Global Interpreter Lock)
          • Only one thread among many can run at a time
          • The Ruby VM can use only 1 CPU core
      • A thread waiting on I/O is *not* running
          • Threads waiting on I/O can proceed in parallel with the running thread
      • We can write network servers in Ruby!
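A small sketch of the last point on slide 41: `sleep` stands in for blocking I/O here, because both let a thread give up the GVL while it waits. Five threads that each wait 0.2 seconds finish in about 0.2 seconds in total rather than 1.0. `overlapped_wait` is a hypothetical helper name; CPU-bound Ruby code would not overlap this way.

```ruby
require 'benchmark'

# Starts `threads` threads that each block for `seconds`, waits for all of
# them, and returns the elapsed wall-clock time.
def overlapped_wait(threads:, seconds:)
  Benchmark.realtime do
    Array.new(threads) { Thread.new { sleep seconds } }.each(&:join)
  end
end

elapsed = overlapped_wait(threads: 5, seconds: 0.2)
puts format("5 waiting threads took %.2f seconds", elapsed)
```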
  42.
      require 'test/unit'
      require 'socket'

      class MyTestCase < ::Test::Unit::TestCase
        test 'sent data should be received' do
          received = []
          sent = []
          listening = false
          th1 = Thread.new do
            TCPServer.open("127.0.0.1", 2048) do |server|
              listening = true
              while sock = server.accepto # typo: NoMethodError kills this thread
                received << sock.read
              end
            end
          end
          sleep 0.1 until listening
          ["foo", "bar"].each do |str|
            begin
              TCPSocket.open("127.0.0.1", 2048) do |client|
                client.write str
              end
              sent << str
            rescue => e
              # ignore
            end
          end
          assert_equal sent, received
        end
      end
  43.
      Loaded suite example7
      Started
      .
      Finished in 0.104729 seconds.
      -------------------------------------------------------------------------------------------
      1 tests, 1 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
      100% passed
      -------------------------------------------------------------------------------------------
      9.55 tests/s, 9.55 assertions/s
  49. Same code as slide 42, with the final assertion annotated:
      assert_equal sent, received # [] == []
  50. Thread in Ruby: Methods for errors
      • Threads die silently if an error is raised in them
      • abort_on_exception
          • if true, an error raised in a thread is re-raised in the main thread
          • required to make sure not to create a false success (silent crash)
      • report_on_exception
          • if true, errors in threads are reported as warnings (Ruby 2.4 feature)
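To see what "die silently" means on slide 50, this sketch (plain Ruby, no test framework) lets a thread raise with `report_on_exception` turned off, a method available since Ruby 2.4. The main thread carries on unaware; the exception only surfaces when it calls `Thread#join`, and `Thread#status` returns `nil` for a thread that ended with an exception.

```ruby
worker = Thread.new do
  Thread.current.report_on_exception = false # keep this demo quiet
  raise "boom"
end

sleep 0.01 while worker.alive? # the main thread keeps running normally
p worker.status                # nil: terminated with an exception

begin
  worker.join                  # join re-raises the thread's exception here
rescue RuntimeError => e
  puts "caught in main thread: #{e.message}"
end
```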
  51.
      require 'test/unit'
      require 'socket'

      class MyTestCase < ::Test::Unit::TestCase
        test 'sent data should be received' do
          received = []
          sent = []
          listening = false
          th1 = Thread.new do
            Thread.current.abort_on_exception = true
            TCPServer.open("127.0.0.1", 2048) do |server|
              listening = true
              while sock = server.accepto # typo: now re-raised in the main thread
                received << sock.read
              end
            end
          end
          sleep 0.1 until listening
          ["foo", "bar"].each do |str|
            begin
              TCPSocket.open("127.0.0.1", 2048) do |client|
                client.write str
              end
              sent << str
            rescue => e
              # ignore
            end
          end
          assert_equal sent, received # [] == []
        end
      end
  52.
      Loaded suite example7
      Started
      E
      ===========================================================================================
      Error: test: sent data should be received(MyTestCase): NoMethodError: undefined method `accepto' for #<TCPServer:(closed)>
      Did you mean? accept
      example7.rb:14:in `block (3 levels) in <class:MyTestCase>'
      example7.rb:12:in `open'
      example7.rb:12:in `block (2 levels) in <class:MyTestCase>'
      ===========================================================================================
      Finished in 0.0046 seconds.
      -------------------------------------------------------------------------------------------
      1 tests, 0 assertions, 0 failures, 1 errors, 0 pendings, 0 omissions, 0 notifications
      0% passed
      -------------------------------------------------------------------------------------------
      217.39 tests/s, 0.00 assertions/s
  53. Thread in Ruby: Process crash from errors in threads
      • Middleware SHOULD NOT crash, as far as possible :)
          • An error from a TCP connection MUST NOT crash the whole process
      • Many places where errors can be raised...
          • Socket I/O, executing commands
          • Parsing HTTP requests, parsing JSON (or other formats)
      • The process
          • should crash in tests, but
          • should not in production
  54. Thread in Ruby: What your code needs for threads
      • Set Thread#abort_on_exception = true
          • for almost all threads...
      • "rescue" all errors in threads
          • to log these errors, and not to crash the whole process
      • "raise" rescued errors again only in testing
          • to make tests fail for bugs
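A sketch of the three rules on slide 54 as one helper, assuming the hypothetical names `MyDaemon.thread_create` and `MyDaemon.raise_in_threads` (a switch the test suite would turn on). `rescue => e` catches `StandardError` and its subclasses, which covers ordinary errors such as socket and parse failures.

```ruby
require 'logger'

module MyDaemon
  class << self
    # Hypothetical switch: false in production, set to true by the test suite.
    attr_accessor :raise_in_threads
  end
  self.raise_in_threads = false

  # Hypothetical helper: abort_on_exception on, errors rescued and logged,
  # and re-raised only when raise_in_threads is true.
  def self.thread_create(logger, name)
    Thread.new do
      Thread.current.abort_on_exception = true
      begin
        yield
      rescue => e
        logger.error("thread #{name} died: #{e.class}: #{e.message}")
        raise if MyDaemon.raise_in_threads # reaches the main thread via abort_on_exception
      end
    end
  end
end

# Production mode: the error is logged and the process keeps running.
logger = Logger.new($stderr)
MyDaemon.thread_create(logger, "input") { raise IOError, "socket closed" }.join
puts "still running"
```

With `MyDaemon.raise_in_threads = true`, the re-raised error is delivered to the main thread through `abort_on_exception`, which makes a test fail instead of passing silently.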
  55. Lesson 4: Handle Exceptions in the Right Way
  56. Wrap-up: Writing Middleware is ...
  57. Writing Middleware:
      • Taking care of:
          • various platforms and environments
          • resource usage and stability
      • Requires knowing about:
          • Ruby's features
          • Ruby VM's behavior
          • library implementations
      • A different viewpoint from writing applications!
  58. Write your code like middleware :D Make it efficient & stable! Thank you! @tagomoris
  59. Mac OS X (10.11.16):
      Loaded suite example
      Started
      F
      ===========================================================================================
      Failure: test: client sends 2 data(MyTest)
      example.rb:22:in `block in <class:MyTest>'
           19: end
           20: end
           21:
        => 22: assert_equal(["data 0", "data 1"], list)
           23: end
           24: end
      <["data 0", "data 1"]> expected but was
      <["data 0", "data 1"]>
      diff: ["data 0", "data 1"]
      ===========================================================================================
      Finished in 0.009425 seconds.
      -------------------------------------------------------------------------------------------
      1 tests, 1 assertions, 1 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
      0% passed
      -------------------------------------------------------------------------------------------
      106.10 tests/s, 106.10 assertions/s
  60. Memory Usage: Memory fragmentation
      • High memory usage, low # of objects
          • memory fragmentation?
      • glibc malloc: weak for fine-grained memory allocation and multi-threading
      • Switching to jemalloc by LD_PRELOAD
          • FreeBSD standard malloc (available on Linux)
          • fluentd's rpm/deb package uses jemalloc by default
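Slide 60's symptom (high memory usage, low number of objects) can be checked by comparing live Ruby objects with the process's resident set size. A minimal sketch: `GC.stat(:heap_live_slots)` counts live object slots, and RSS is read from `/proc/self/status`, which exists only on Linux, so `rss_kb` is `nil` elsewhere. If RSS keeps growing while the live-object count stays flat, fragmentation becomes a suspect. `memory_snapshot` is a hypothetical helper name.

```ruby
# Returns the number of live Ruby object slots and the process RSS in kB
# (RSS only on Linux, nil on other platforms).
def memory_snapshot
  rss_kb =
    if File.readable?("/proc/self/status")
      line = File.foreach("/proc/self/status").find { |l| l.start_with?("VmRSS:") }
      line && line.split[1].to_i # "VmRSS:   12345 kB" -> 12345
    end
  { live_objects: GC.stat(:heap_live_slots), rss_kb: rss_kb }
end

p memory_snapshot
```

Logging this snapshot periodically from a long-running daemon gives the two trends to compare before and after switching the allocator.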
  61. abort_on_exception in detail
      • It doesn't abort the whole process, actually
          • it just re-raises errors in the main thread

      sleeping = false
      Thread.abort_on_exception = true
      Thread.new{ sleep 0.1 until sleeping ; raise "yay" }
      begin
        sleeping = true
        sleep 5
      rescue => e
        p(here: "rescue in main thread", error: e)
      end
      p "foo!"

      $ ruby example.rb
      {:here=>"rescue in main thread", :error=>#<RuntimeError: yay>}
      "foo!"
