How To Write Middleware In Ruby

Talk at #rubyconftw day 1

Published in: Software

  1. How To Write Middleware in Ruby
     2016/12/02 RubyConf Taiwan Day 1
     Satoshi Tagomori (@tagomoris)
  2. Satoshi "Moris" Tagomori (@tagomoris)
     Fluentd, MessagePack-Ruby, Norikra, ...
     Treasure Data, Inc.
  3. http://www.fluentd.org/
     Open source data collector for unified logging layer.
  4. [Before/After diagram]
     Before: LOG and FILE sources handled by separate scripts: script to parse data, cron job for loading, filtering script, syslog script, Tweet-fetching script, aggregation scripts, rsync server
     After:
       ✓ Parse/Format data
       ✓ Buffering & Retries
       ✓ Load balancing
       ✓ Failover
  5. Middleware? : Fluentd
       • Long running daemon process
       • Compatibility for API, behavior and configuration files
       • Multi platform / environment support
           • Linux, Mac and Windows(!)
           • Baremetal servers, Virtual machines, Containers
       • Many use cases
           • Various data, Various data formats, Unexpected errors
           • Various traffic - small to huge
  6. Middleware? Batches: Minutes - Hours
       • Long running daemon process
       • Compatibility for API, behavior and configuration files
       • Multi platform / environment support
           • Linux, Mac and Windows(!)
           • Ruby, JRuby?, Rubinius?
           • Baremetal servers, Virtual machines, Containers
       • Many use cases
           • Various data, Various data formats, Unexpected errors
           • Various traffic - small to huge
  7. Middleware? Providing APIs and/or Client Libraries
  8. Middleware? Daily Development & Deployment, Providing Client Tools
  9. Middleware? Make Your Application Stable
  10. Middleware? Make Your Application Fast and Scalable
  11. Case studies from development of Fluentd
       • Platform: Linux, Mac and Windows
       • Resource: Memory usage and malloc
       • Resource and Stability: Handling JSON
       • Stability: Threads and exceptions
  12. Platforms: Linux, Mac and Windows
  13. Linux and Mac: Thread/process scheduling
       • Both are UNIX-like systems...
           • Mac (development), Linux (production)
       • Test code must run on both!
       • CI services provide multi-environment support
           • Fluentd uses Travis CI :D
           • Travis CI provides "os" option: "linux" & "osx"
       • Important tests to be written: Threading
  14.

        class MyTest < ::Test::Unit::TestCase
          test 'yay 1' do
            data = []
            thr = Thread.new do
              data << "line 1"
            end
            data << "line 2"
            assert_equal ["line 1", "line 2"], data
          end
        end

        class MyTest < ::Test::Unit::TestCase
          test 'client sends 2 data' do
            list = []
            thr = Thread.new do # Mock server
              TCPServer.open("127.0.0.1", 2048) do |server|
                while sock = server.accept
                  list << sock.read.chomp
                end
              end
            end
            2.times do |i|
              TCPSocket.open("127.0.0.1", 2048) do |client|
                client.write "data #{i}"
              end
            end
            assert_equal(["data 0", "data 1"], list)
          end
        end
  15. Mac OS X (10.11.16)

        Loaded suite example
        Started
        F
        ============================================================
        Failure: test: client sends 2 data(MyTest)
        example.rb:22:in `block in <class:MyTest>'
             19: end
             20: end
             21:
          => 22: assert_equal(["data 0", "data 1"], list)
             23: end
             24: end
        <["data 0", "data 1"]> expected but was
        <["data 0"]>

        diff:
          ["data 0", "data 1"]
        ============================================================
        Finished in 0.007253 seconds.
        ------------------------------------------------------------
        1 tests, 1 assertions, 1 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
        0% passed
        ------------------------------------------------------------
        137.87 tests/s, 137.87 assertions/s
  17.

        class MyTest < ::Test::Unit::TestCase
          test 'client sends 2 data' do
            list = []
            thr = Thread.new do # Mock server
              TCPServer.open("127.0.0.1", 2048) do |server|
                listening = true
                while sock = server.accept
                  list << sock.read.chomp
                end
              end
            end
            2.times do |i|
              TCPSocket.open("127.0.0.1", 2048) do |client|
                client.write "data #{i}"
              end
            end
            sleep 1
            assert_equal(["data 0", "data 1"], list)
          end
        end
  18. Mac OS X (10.11.16)

        Loaded suite example
        Started
        .
        Finished in 1.002745 seconds.
        ------------------------------------------------------------
        1 tests, 1 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
        100% passed
        ------------------------------------------------------------
        1.00 tests/s, 1.00 assertions/s
  19. Linux (Ubuntu 16.04)

        Loaded suite example
        Started
        E
        ============================================================
        Error: test: client sends 2 data(MyTest): Errno::ECONNREFUSED: Connection refused - connect(2) for "127.0.0.1" port 2048
        example.rb:16:in `initialize'
        example.rb:16:in `open'
        example.rb:16:in `block (2 levels) in <class:MyTest>'
        example.rb:15:in `times'
        example.rb:15:in `block in <class:MyTest>'
        ============================================================
        Finished in 0.005918197 seconds.
        ------------------------------------------------------------
        1 tests, 0 assertions, 0 failures, 1 errors, 0 pendings, 0 omissions, 0 notifications
        0% passed
        ------------------------------------------------------------
        168.97 tests/s, 0.00 assertions/s
  23.

        require 'socket'

        class MyTest < ::Test::Unit::TestCase
          test 'client sends 2 data' do
            list = []
            listening = false
            thr = Thread.new do # Mock server
              TCPServer.open("127.0.0.1", 2048) do |server|
                listening = true
                while sock = server.accept
                  list << sock.read.chomp
                end
              end
            end
            sleep 0.1 until listening
            2.times do |i|
              TCPSocket.open("127.0.0.1", 2048) do |client|
                client.write "data #{i}"
              end
            end
            require 'timeout'
            Timeout.timeout(3){ sleep 0.1 until list.size >= 2 }
            assert_equal(["data 0", "data 1"], list)
          end
        end
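The polling loops in the test above (`sleep 0.1 until ...`) can also be replaced with blocking queues. The following is a minimal sketch of that alternative, not code from the talk: the helper name `collect_from_clients` is hypothetical, and binding to port 0 (so the OS picks a free port) is an assumption made to avoid clashes with other tests.

```ruby
require 'socket'
require 'timeout'

# Hypothetical helper (not from the talk): starts a mock server, sends
# `count` messages to it, and returns what the server received.
def collect_from_clients(count)
  ready    = Queue.new   # carries the port number once the server is listening
  received = Queue.new   # carries each payload out of the server thread

  server_thread = Thread.new do
    TCPServer.open("127.0.0.1", 0) do |server|   # port 0: the OS picks a free port
      ready << server.addr[1]
      count.times do
        sock = server.accept
        received << sock.read.chomp
        sock.close
      end
    end
  end

  # Queue#pop blocks until the server thread has pushed the port.
  port = Timeout.timeout(3) { ready.pop }
  count.times do |i|
    TCPSocket.open("127.0.0.1", port) { |client| client.write "data #{i}" }
  end

  # Blocks until every payload has arrived, or fails after 3 seconds.
  list = Timeout.timeout(3) { Array.new(count) { received.pop } }
  server_thread.join
  list
end

result = collect_from_clients(2)
p result
```

Because the main thread blocks on `Queue#pop` instead of sleeping, the test neither depends on how the OS schedules the server thread nor waits longer than necessary.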
  24. *NIX and Windows: fork-exec and spawn
       • Windows: yet another thread scheduling behavior :(
       • Daemonize:
           • double fork (or Process.daemon) on *nix
           • spawn on Windows
       • Execute another process:
           • fork & exec on *nix
           • spawn on Windows
       • CI on Windows: AppVeyor
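As a rough illustration of the "spawn" side of this slide, here is a minimal sketch that starts a child process with `Process.spawn`, which Ruby provides on both *nix and Windows, instead of `fork` and `exec`. The helper name `run_child` and the one-line child script are assumptions made for the example; this is not how Fluentd launches its workers.

```ruby
require 'rbconfig'

# Hypothetical helper: runs a Ruby one-liner in a child process and
# returns its standard output together with its success flag.
def run_child(code)
  reader, writer = IO.pipe
  # RbConfig.ruby is the path of the running interpreter, so the child
  # uses the same Ruby as the parent.
  pid = Process.spawn(RbConfig.ruby, "-e", code, out: writer)
  writer.close              # the parent keeps only the read end
  output = reader.read      # returns once the child closes its stdout
  Process.wait(pid)         # reaps the child and sets $?
  reader.close
  [output, $?.success?]
end

output, ok = run_child('print "hello from child"')
puts "child said: #{output.inspect} (success: #{ok})"
```

Code written against `Process.spawn` has one path to test on every platform, whereas `fork` raises `NotImplementedError` on platforms that do not support it.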
  25. Lesson 1: Run Tests on All Supported Platforms
  26. Resource: Memory usage and malloc
  27. Memory Usage: Object leak
       • Temporary values inevitably leak in a long-running process
           • 1,000 objects / hour => 8,760,000 objects / year
       • Some solutions:
           • In-process GC
           • Storage with TTL
           • (External storages: Redis, ...)

        module MyDaemon
          class Process
            def initialize
              @map = {}
            end

            def hour_key
              Time.now.to_i / 3600
            end

            # One bucket per hour; buckets of past hours are never removed
            def hourly_store
              @map[hour_key] ||= {}
            end

            def put(key, value)
              hourly_store[key] = value
            end

            def get(key)
              hourly_store[key]
            end

            # add # of data per hour
            def read_data(table_name, data)
              key = "records_of_#{table_name}"
              put(key, (get(key) || 0) + data.size)
            end
          end
        end
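The hourly store above keeps a bucket for every hour it has ever seen. One way to apply the "storage with TTL" idea is to drop old buckets whenever a new hour starts. The sketch below assumes a hypothetical `HourlyStore` class with a `keep_hours` setting and an injectable clock (used only to make the demo deterministic); it is an illustration, not Fluentd's implementation.

```ruby
# Hypothetical class illustrating in-process expiry of hourly buckets.
class HourlyStore
  def initialize(keep_hours: 2, clock: -> { Time.now.to_i })
    @keep_hours = keep_hours   # number of most recent hourly buckets to keep
    @clock = clock             # returns the current time in epoch seconds
    @map = {}
  end

  def put(key, value)
    bucket[key] = value
  end

  def get(key)
    bucket[key]
  end

  def add(key, amount)
    put(key, (get(key) || 0) + amount)
  end

  def bucket_count
    @map.size
  end

  private

  def hour_key
    @clock.call / 3600
  end

  # Returns the bucket for the current hour. When a new hour starts,
  # buckets older than keep_hours are deleted before the new one is added.
  def bucket
    current = hour_key
    unless @map.key?(current)
      @map.delete_if { |hour, _| hour <= current - @keep_hours }
      @map[current] = {}
    end
    @map[current]
  end
end

# Simulate five consecutive hours with a fake clock.
now = 0
store = HourlyStore.new(keep_hours: 2, clock: -> { now })
5.times do |hour|
  now = hour * 3600
  store.add("records_of_access_log", 10)
end
puts store.bucket_count
```

Expiry happens only on the first access of a new hour, so the steady-state cost is a single hash lookup per call and memory stays bounded by `keep_hours` buckets.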
  28. Lesson 2: Make Sure to Collect Garbage
  29. Resource and Stability: Handling JSON
  30. Formatting Data Into JSON
       • Fluentd handles JSON in many use cases
           • both parsing and generating
           • it consumes a lot of CPU time...
       • JSON, Yajl and Oj
           • JSON: Ruby standard library
           • Yajl (yajl-ruby): Ruby binding of YAJL (SAX-based)
           • Oj (oj): Optimized JSON
  31.

        require 'json'; require 'yajl'; require 'oj'

        Oj.default_options = {bigdecimal_load: :float, mode: :compat, use_to_json: true}

        module MyDaemon
          class Json
            def initialize(mode)
              klass = case mode
                      when :json then JSON
                      when :yajl then Yajl
                      when :oj then Oj
                      end
              @proc = klass.method(:dump)
            end

            def dump(data); @proc.call(data); end
          end
        end

        require 'benchmark'

        N = 500_000
        obj = {"message" => "a"*100, "100" => 100, "pi" => 3.14159, "true" => true}

        Benchmark.bm{|x|
          x.report("json") {
            formatter = MyDaemon::Json.new(:json)
            N.times{ formatter.dump(obj) }
          }
          x.report("yajl") {
            formatter = MyDaemon::Json.new(:yajl)
            N.times{ formatter.dump(obj) }
          }
          x.report("oj") {
            formatter = MyDaemon::Json.new(:oj)
            N.times{ formatter.dump(obj) }
          }
        }
  32. Mac OS X (10.11.16), Ruby 2.3.1, yajl-ruby 1.3.0, oj 2.18.0

        $ ruby example2.rb
                   user     system      total        real
        json   3.870000   0.050000   3.920000 (  4.005429)
        yajl   2.940000   0.030000   2.970000 (  2.998924)
        oj     1.130000   0.020000   1.150000 (  1.152596)
        # for 500_000 objects
  33. Speed is not the only thing: APIs for unstable I/O
       • JSON and Oj have only ".load"
           • it raises a parse error for:
               • an incomplete JSON string
               • additional bytes after a JSON string
       • Yajl has a stream parser: very useful for servers
           • a method to feed input data
           • a callback for parsed objects
  34.

        require 'oj'

        Oj.load('{"message":"this is ') # Oj::ParseError
        Oj.load('{"message":"this is a pen."}') # => Hash
        Oj.load('{"message":"this is a pen."}{"messa"') # Oj::ParseError
  36.

        require 'yajl'

        parsed_objs = []
        parser = Yajl::Parser.new
        parser.on_parse_complete = ->(obj){ parsed_objs << obj }

        parser << '{"message":"aaaaaaaaaaaaaaa'
        parser << 'aaaaaaaaa"}{"message"' # on_parse_complete is called
        parser << ':"bbbbbbbbb"'
        parser << '}' # on_parse_complete is called again
  37.

        require 'socket'
        require 'oj'

        TCPServer.open(port) do |server|
          while sock = server.accept
            begin
              buf = ""
              while input = sock.readpartial(1024)
                buf << input
                # can we feed this value to Oj.load ?
                begin
                  obj = Oj.load(buf) # never succeeds if buf has 2 objects
                  call_method(obj)
                  buf = ""
                rescue Oj::ParseError
                  # try with next input ...
                end
              end
            rescue EOFError
              sock.close rescue nil
            end
          end
        end
  38.

        require 'socket'
        require 'yajl'

        TCPServer.open(port) do |server|
          while sock = server.accept
            begin
              parser = Yajl::Parser.new
              parser.on_parse_complete = ->(obj){ call_method(obj) }
              while input = sock.readpartial(1024)
                parser << input
              end
            rescue EOFError
              sock.close rescue nil
            end
          end
        end
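Where a streaming parser is not available, a different technique can give a similar feed-and-callback interface: frame each document with a newline and parse only complete lines with the standard library. This is newline-delimited framing, not stream parsing, and it assumes the sender ends every JSON document with "\n" and never emits a raw newline inside one. The class name `LineFramedJsonReader` is hypothetical.

```ruby
require 'json'

# Hypothetical reader: buffers arbitrary chunks and yields one parsed
# object per complete, newline-terminated JSON document.
class LineFramedJsonReader
  def initialize(&on_object)
    @buffer = +""            # unfrozen string that accumulates partial input
    @on_object = on_object
  end

  def <<(chunk)
    @buffer << chunk
    # slice! removes and returns the first complete line, or nil if the
    # buffer does not yet contain a newline.
    while (line = @buffer.slice!(/\A[^\n]*\n/))
      line.strip!
      next if line.empty?
      begin
        @on_object.call(JSON.parse(line))
      rescue JSON::ParserError => e
        warn "skipping malformed line: #{e.message}"
      end
    end
    self
  end
end

objects = []
reader = LineFramedJsonReader.new { |obj| objects << obj }

# Chunks split at arbitrary byte positions, as readpartial might return them.
reader << %({"message":"aaaaaaaaaaaaaaa)
reader << %(aaaaaaaaa"}\n{"message")
reader << %(:"bbbbbbbbb")
reader << %(}\n)

p objects.size
```

The trade-off is that the framing rule becomes part of the protocol: a client that omits the trailing newline leaves its last document sitting in the buffer, which a true stream parser would not require.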
  39. Lesson 3: Choose Fast/Well-Designed(/Stable) Libraries
  40. Stability: Threads and Exceptions
  41. Thread in Ruby
       • GVL (GIL): Giant VM Lock (Global Interpreter Lock)
           • Only one of many threads can run at a time
           • Ruby VM can use only 1 CPU core
       • A thread blocked in I/O is *not* running
           • Threads in I/O can wait in parallel
           • [Diagram labels: threads in I/O, running threads]
       • We can write network servers in Ruby!
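The claim that blocked threads overlap can be checked with a small timing experiment. In this sketch `sleep` stands in for a blocking I/O call, which is an assumption made to keep the example self-contained; like blocking I/O, `sleep` releases the GVL while it waits. The helper name `wait_in_threads` is hypothetical.

```ruby
require 'benchmark'

# Hypothetical helper: starts `count` threads that each block for
# `seconds`, and returns the wall-clock time until all have finished.
def wait_in_threads(count, seconds)
  Benchmark.realtime do
    threads = Array.new(count) { Thread.new { sleep seconds } }
    threads.each(&:join)
  end
end

elapsed = wait_in_threads(4, 0.2)
puts format("4 threads x 0.2s of waiting took %.2fs", elapsed)
```

If the waits ran one after another, the total would be about 0.8 seconds; because waiting threads do not hold the GVL, the measured time should stay close to a single wait.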
  42.

        class MyTestCase < ::Test::Unit::TestCase
          test 'sent data should be received' do
            received = []
            sent = []
            listening = false
            th1 = Thread.new do
              TCPServer.open("127.0.0.1", 2048) do |server|
                listening = true
                while sock = server.accepto
                  received << sock.read
                end
              end
            end
            sleep 0.1 until listening
            ["foo", "bar"].each do |str|
              begin
                TCPSocket.open("127.0.0.1", 2048) do |client|
                  client.write "data #{i}"
                end
                sent << str
              rescue => e
                # ignore
              end
            end
            assert_equal sent, received
          end
        end
  43.

        Loaded suite example7
        Started
        .
        Finished in 0.104729 seconds.
        ------------------------------------------------------------
        1 tests, 1 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
        100% passed
        ------------------------------------------------------------
        9.55 tests/s, 9.55 assertions/s
  49.

        class MyTestCase < ::Test::Unit::TestCase
          test 'sent data should be received' do
            received = []
            sent = []
            listening = false
            th1 = Thread.new do
              TCPServer.open("127.0.0.1", 2048) do |server|
                listening = true
                while sock = server.accepto
                  received << sock.read
                end
              end
            end
            sleep 0.1 until listening
            ["foo", "bar"].each do |str|
              begin
                TCPSocket.open("127.0.0.1", 2048) do |client|
                  client.write "data #{i}"
                end
                sent << str
              rescue => e
                # ignore
              end
            end
            assert_equal sent, received # [] == []
          end
        end
  50. Thread in Ruby: Methods for errors
       • Threads die silently if any error is raised
       • abort_on_exception
           • if true, errors raised in threads are re-raised in the main thread
           • required to avoid false successes (silent crashes)
       • report_on_exception
           • if true, errors raised in threads are printed as warnings (Ruby 2.4 feature)
  51.

        class MyTestCase < ::Test::Unit::TestCase
          test 'sent data should be received' do
            received = []
            sent = []
            listening = false
            th1 = Thread.new do
              Thread.current.abort_on_exception = true
              TCPServer.open("127.0.0.1", 2048) do |server|
                listening = true
                while sock = server.accepto
                  received << sock.read
                end
              end
            end
            sleep 0.1 until listening
            ["foo", "bar"].each do |str|
              begin
                TCPSocket.open("127.0.0.1", 2048) do |client|
                  client.write "data #{i}"
                end
                sent << str
              rescue => e
                # ignore
              end
            end
            assert_equal sent, received # [] == []
          end
        end
  52.

        Loaded suite example7
        Started
        E
        ============================================================
        Error: test: sent data should be received(MyTestCase): NoMethodError: undefined method `accepto' for #<TCPServer:(closed)>
        Did you mean?  accept
        example7.rb:14:in `block (3 levels) in <class:MyTestCase>'
        example7.rb:12:in `open'
        example7.rb:12:in `block (2 levels) in <class:MyTestCase>'
        ============================================================
        Finished in 0.0046 seconds.
        ------------------------------------------------------------
        1 tests, 0 assertions, 0 failures, 1 errors, 0 pendings, 0 omissions, 0 notifications
        0% passed
        ------------------------------------------------------------
        217.39 tests/s, 0.00 assertions/s

        sleeping = false
        Thread.abort_on_exception = true
        Thread.new{ sleep 0.1 until sleeping ; raise "yay" }
        begin
          sleeping = true
          sleep 5
        rescue => e
          p(here: "rescue in main thread", error: e)
        end
        p "foo!"
  53. Thread in Ruby: Process crash from errors in threads
       • Middleware SHOULD NOT crash, as far as possible :)
       • An error from a TCP connection MUST NOT crash the whole process
       • Many places can raise errors...
           • Socket I/O, Executing commands
           • Parsing HTTP requests, Parsing JSON (or other formats)
       • Process
           • should crash in tests, but
           • should not crash in production
  54. Thread in Ruby: What your code needs for threads
       • Set Thread#abort_on_exception = true
           • for almost all threads...
       • "rescue" all errors in threads
           • to log these errors, and not to crash the whole process
       • "raise" rescued errors again only in testing
           • to make tests fail on bugs
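The three points above can be combined into one small wrapper. This is a minimal sketch under stated assumptions: the module name `SafeThread`, its `start` method and the `RERAISE_THREAD_ERRORS` environment variable are all hypothetical and do not come from the talk or from Fluentd.

```ruby
require 'logger'
require 'stringio'

# Hypothetical wrapper: every error raised in the thread is rescued and
# logged; it is raised again only when `reraise` is true (for tests).
module SafeThread
  def self.start(name, logger:, reraise: ENV["RERAISE_THREAD_ERRORS"] == "1", &block)
    Thread.new do
      # In test mode a re-raised error also surfaces in the main thread.
      Thread.current.abort_on_exception = reraise
      begin
        block.call
      rescue => e
        logger.error("#{name}: #{e.class}: #{e.message}")
        raise if reraise
      end
    end
  end
end

# Production-style usage: the error is logged and the process keeps running.
log_output = StringIO.new
logger = Logger.new(log_output)

worker = SafeThread.start("worker", logger: logger, reraise: false) do
  raise "connection reset"
end
worker.join
puts log_output.string
```

A test suite would start the same threads with `reraise: true` (or with the environment variable set), so that a bug inside a thread fails the test instead of leaving an empty result that happens to pass.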
  55. Lesson 4: Handle Exceptions in the Right Way
  56. Wrap-up: Writing Middleware is ...
  57. Writing Middleware:
       • Taking care of:
           • various platforms and environments
           • resource usage and stability
       • Requires knowing about:
           • Ruby's features
           • Ruby VM's behavior
           • Library implementation
       • A different viewpoint from writing applications!
  58. Write your code, like middleware :D
      Make it efficient & stable!
      Thank you! @tagomoris
  59. Mac OS X (10.11.16)

        Loaded suite example
        Started
        F
        ============================================================
        Failure: test: client sends 2 data(MyTest)
        example.rb:22:in `block in <class:MyTest>'
             19: end
             20: end
             21:
          => 22: assert_equal(["data 0", "data 1"], list)
             23: end
             24: end
        <["data 0", "data 1"]> expected but was
        <["data 0", "data 1"]>

        diff:
          ["data 0", "data 1"]
        ============================================================
        Finished in 0.009425 seconds.
        ------------------------------------------------------------
        1 tests, 1 assertions, 1 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
        0% passed
        ------------------------------------------------------------
        106.10 tests/s, 106.10 assertions/s
  60. Memory Usage: Memory fragmentation
       • High memory usage, low # of objects
           • memory fragmentation?
       • glibc malloc: weak at fine-grained memory allocation and multi-threading
       • Switching to jemalloc via LD_PRELOAD
           • FreeBSD standard malloc (available on Linux)
           • fluentd's rpm/deb packages use jemalloc by default
  61. abort_on_exception in detail
       • It doesn't actually abort the whole process
           • it just re-raises errors in the main thread

        sleeping = false
        Thread.abort_on_exception = true
        Thread.new{ sleep 0.1 until sleeping ; raise "yay" }
        begin
          sleeping = true
          sleep 5
        rescue => e
          p(here: "rescue in main thread", error: e)
        end
        p "foo!"

        $ ruby example.rb
        {:here=>"rescue in main thread", :error=>#<RuntimeError: yay>}
        "foo!"
