Plugin-based software design with Ruby and RubyGems

Talk at RubyKaigi 2015.
Plugin architecture is a technique that brings extensibility to a program. Ruby has good language features for plugins, and RubyGems.org is an excellent platform for plugin distribution. However, building a plugin architecture is harder than writing code without one: it involves a plugin loader, packaging, a loosely-coupled API, and performance trade-offs. Loading two versions of a gem is still an unsolved challenge in Ruby, whereas the equivalent problem is solved in Java.
I have designed open-source software such as Fluentd and Embulk, which provide most of their functions through plugins. I will talk about their plugin-based architecture.

Published in: Software

  1. 1. Plugin-based software design with Ruby and RubyGems Sadayuki Furuhashi
 Founder & Software Architect RubyKaigi 2015
  2. 2. A little about me… Sadayuki Furuhashi github: @frsyuki Fluentd - Unified log collection infrastructure Embulk - Plugin-based parallel ETL Founder & Software Architect
  3. 3. It's like JSON. but fast and small. A little about me…
  4. 4. What’s Plugin Architecture?
  5. 5. Benefits of Plugin Architecture > Plugins bring many features > Plugins keep core software simple > Plugins are easy to test > Plugins build an active developer community
  6. 6. Benefits of Plugin Architecture > Plugins bring many features > Plugins keep core software simple > Plugins are easy to test > Plugins build an active developer community > “…if it’s designed well”.
  7. 7. How to design plugin architecture?
  8. 8. How to design → How did I design plugin architecture?
  9. 9. Today’s topic > Plugin Architecture Design Patterns > Plugin Architecture of Fluentd > Plugin Architecture of Embulk > Pitfalls & Challenges
  10. 10. Plugin Architecture
 Design Patterns
  11. 11. Plugin Architecture Design Patterns a) Traditional Extensible Software Architecture b) Plugin-based Software Architecture
  12. 12. Traditional Extensible Software Architecture Host Application Plugin Plugin Register plugins to extension points To add more extensibility, add more extension points.
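To make the traditional model concrete, here is a minimal Ruby sketch of a host application with a fixed set of extension points. The names `HostApplication`, `register`, `:before_save` and `:after_save` are hypothetical and chosen only for illustration; they do not come from the talk.

```ruby
# Hypothetical host application with a fixed set of extension points.
class HostApplication
  EXTENSION_POINTS = [:before_save, :after_save].freeze

  def initialize
    @hooks = Hash.new { |hash, point| hash[point] = [] }
  end

  # Plugins register a block at one of the predefined extension points.
  def register(point, &block)
    unless EXTENSION_POINTS.include?(point)
      raise ArgumentError, "unknown extension point: #{point}"
    end
    @hooks[point] << block
    self
  end

  # The host stays in control and calls plugins only at fixed places.
  def save(record)
    @hooks[:before_save].each { |hook| hook.call(record) }
    saved = record.merge(saved: true)
    @hooks[:after_save].each { |hook| hook.call(saved) }
    saved
  end
end

log = []
app = HostApplication.new
app.register(:before_save) { |record| log << "validating #{record[:id]}" }
app.register(:after_save)  { |record| log << "saved #{record[:id]}" }
result = app.save({ id: 1 })
```

In this sketch, supporting a new kind of extension means editing `EXTENSION_POINTS` and the host's own methods, which is the limitation the slide points at.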
  13. 13. Plugin-based software architecture Core Plugin Plugin Plugin Plugin Plugin Plugin Plugin Application
  14. 14. Plugin-based software architecture • Application as a network of plugins. > Plugins: provide features. > Core: framework to implement plugins. • More flexibility != More complexity. • The application must be designed to be modular. > It’s hard to design :( > Optimizing performance is difficult :( • A loosely-coupled API often makes performance worse.
  15. 15. Design Pattern 1: Dependency Injection Core class interface class interface interface class class Each component publishes an API. A component is an interface or a class.
  16. 16. Design Pattern 1: Dependency Injection Core class Plugin Plugin Plugin Plugin class Plugin When the application runs, a DI container replaces objects with plugins.
  17. 17. Replace classes with mocks for unit tests Design Pattern 1: Dependency Injection Core dummy dummy dummy dummy dummy Plugin dummy Testing the application
  18. 18. Dependency Injection (Java) public interface Store { void store(String data); } public class Module { @Inject Module(Store store) { store.store("data"); } } public class DummyStore implements Store { public void store(String data) { } } public class MainModule implements Module { public void configure( Binder binder) { binder.bind(Store.class) .to(DummyStore.class); } } interface → implementation
 mapping From source code, the implementation is a black box. It’s replaced at runtime.
  19. 19. Dependency Injection (Ruby) Ruby? (What’s a good way to use DI in Ruby?) (Please tell me if you know)
  20. 20. Dependency Injection (Ruby) class Module def initialize(store: DummyStore.new) store.store("data") end end class DummyStore def store(data) end end injector = Injector.new. bind(store: DBStore) object = injector.get(Module) class DBStore def initialize(db: DBM.new) @db = db end def store(data) @db.insert(data) end end injector = Injector.new. bind(store: DBStore). bind(db: SqliteDBImpl) object = injector.get(Module) I want to do this: Keyword arguments {:keyword => class} mapping
 at runtime
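One possible way to get the behaviour wished for on this slide is to read a constructor's keyword arguments through reflection. The sketch below is only an illustration, not an existing library: `Injector`, `Service` and `MemoryDB` are hypothetical names, and `Service` stands in for the slide's `Module` class because `Module` is already a Ruby core class.

```ruby
# Hypothetical injector: maps keyword argument names to classes at runtime.
class Injector
  def initialize
    @bindings = {}
  end

  # bind(store: DBStore) records a {:keyword => class} mapping.
  def bind(**mapping)
    @bindings.merge!(mapping)
    self
  end

  # Build an instance of klass, filling bound keyword arguments recursively.
  def get(klass)
    kwargs = {}
    klass.instance_method(:initialize).parameters.each do |kind, name|
      next unless kind == :key || kind == :keyreq
      if @bindings.key?(name)
        kwargs[name] = get(@bindings[name])
      elsif kind == :keyreq
        raise ArgumentError, "no binding for required keyword: #{name}"
      end
      # Unbound optional keywords fall back to the default in the signature.
    end
    klass.new(**kwargs)
  end
end

# Stand-in for a real database client (assumption for this example).
class MemoryDB
  attr_reader :rows
  def initialize; @rows = []; end
  def insert(data); @rows << data; end
end

class DummyStore
  def store(data); end
end

class DBStore
  attr_reader :db
  def initialize(db: MemoryDB.new); @db = db; end
  def store(data); @db.insert(data); end
end

class Service
  attr_reader :store
  def initialize(store: DummyStore.new); @store = store; end
end

default_service = Injector.new.get(Service)
injected = Injector.new.bind(store: DBStore).bind(db: MemoryDB).get(Service)
injected.store.store("data")
```

Every `get` call here builds a fresh object graph; a real container would probably also need scopes such as singletons, which this sketch leaves out.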
  21. 21. Design Pattern 2: Dynamic Plugin Loader Core Plugin Plugin Calls Plugin loader to load plugins Plugin Loader
  22. 22. Design Pattern 2: Dynamic Plugin Loader Core Plugin Plugin Plugins also call Plugin Loader. Plugins create an ecosystem. Plugin Loader Plugin Plugin
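The loader pattern on these two slides can be sketched in Ruby as a registry plus an on-demand `require`. Everything named here (`PluginLoader`, the `myapp/plugin/<category>_<name>` file convention, `MemoryOutput`, `CopyOutput`) is an assumption made for illustration, not the loader of any particular project.

```ruby
# Hypothetical plugin loader: a registry plus on-demand `require`.
module PluginLoader
  @registry = Hash.new { |hash, category| hash[category] = {} }

  # Plugins call this when their file is loaded.
  def self.register(category, name, klass)
    @registry[category][name.to_s] = klass
  end

  # Both the core and plugins call this to create plugins by name.
  def self.new_plugin(category, name, **options)
    klass = @registry[category][name.to_s] || load_plugin(category, name)
    klass.new(**options)
  end

  def self.load_plugin(category, name)
    # Assumed convention: myapp/plugin/<category>_<name>.rb on $LOAD_PATH.
    require "myapp/plugin/#{category}_#{name}"
    @registry[category].fetch(name.to_s)
  rescue LoadError, KeyError
    raise ArgumentError, "unknown #{category} plugin: #{name}"
  end
  private_class_method :load_plugin
end

class MemoryOutput
  attr_reader :events
  def initialize(**_options); @events = []; end
  def emit(event); @events << event; end
end
PluginLoader.register(:output, "memory", MemoryOutput)

# A plugin that itself asks the loader for other plugins.
class CopyOutput
  attr_reader :stores
  def initialize(stores:)
    @stores = stores.map { |name| PluginLoader.new_plugin(:output, name) }
  end

  def emit(event)
    @stores.each { |store| store.emit(event) }
  end
end
PluginLoader.register(:output, "copy", CopyOutput)

copy = PluginLoader.new_plugin(:output, "copy", stores: ["memory", "memory"])
copy.emit("hello")
```

Because `CopyOutput` goes through the same loader as the core, a new plugin becomes usable by existing plugins without any change to the core, which is what lets plugins form an ecosystem.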
  23. 23. Design Pattern 3: Combination Core class Plugin class Plugin Plugin class class Plugin Loader Plugin Plugin Plugin Plugin Plugin Dependency Injection + Plugin Loader
  24. 24. Plugin Architecture Design Patterns a) Traditional Extensible Software Architecture b) Plugin-based Software Architecture > Dependency Injection (DI) > Dynamic Plugin Loader > Combination of those There are trade-offs > Choose the best solution for each project
  25. 25. Plugin Architecture
 of Fluentd
  26. 26. What’s Fluentd? > Data collector for unified logging layer > Streaming data transfer based on JSON > Written in C & Ruby > Plugin Marketplace on RubyGems > http://www.fluentd.org/plugins > Working in production > http://www.fluentd.org/testimonials
  27. 27. Deployment of Fluentd
  28. 28. Deployment of Fluentd
  29. 29. The problems around log collection…
  30. 30. Solution: N × M → N + M plugins
  31. 31. # logs from a file <source> type tail path /var/log/httpd.log pos_file /tmp/pos_file format apache2 tag web.access </source> # logs from client libraries <source> type forward port 24224 </source> # store logs to ES and HDFS <match web.*> type copy <store> type elasticsearch logstash_format true </store> <store> type s3 bucket s3-event-archive </store> </match> <match metrics.*> type nagios host watch-server </match>
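The `<match web.*>` and `<match metrics.*>` sections above route events by tag. A heavily simplified Ruby sketch of that idea follows; `EventRouter` is a hypothetical class, and `File.fnmatch?` is used here only as a stand-in for Fluentd's own pattern matching, whose rules are richer.

```ruby
# Simplified tag router; not Fluentd's actual implementation.
class EventRouter
  def initialize
    @rules = []
  end

  # Register an output (any callable) for a tag pattern such as "web.*".
  def match(pattern, output)
    @rules << [pattern, output]
    self
  end

  # Deliver the event to the first output whose pattern matches the tag.
  def emit(tag, record)
    _pattern, output = @rules.find { |pattern, _| File.fnmatch?(pattern, tag) }
    return false unless output
    output.call(tag, record)
    true
  end
end

web_events = []
metric_events = []

router = EventRouter.new
router.match("web.*", ->(tag, record) { web_events << [tag, record] })
router.match("metrics.*", ->(tag, record) { metric_events << [tag, record] })

router.emit("web.access", { "path" => "/index.html" })
router.emit("metrics.cpu", { "load" => 0.7 })
unrouted = router.emit("audit.login", { "user" => "alice" })
```

The lambdas play the role of output plugins; in the configuration above, the `copy` output would in turn forward each event to its `<store>` plugins.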
  32. 32. Example: Simple forwarding
  33. 33. Example: HA & High performance - HA (fail over) - Load-balancing - Choice of at-most-once or at-least-once
  34. 34. Example: Realtime search + Batch Analytics combo All data Hot data
  35. 35. Fluentd Core Event
 Router Input Plugin Output Plugin Filter Plugin Buffer Plugin Output Plugin Input Plugin Plugin Architecture of Fluentd Plugin Loader
  36. 36. Fluentd Core Event
 Router Input Plugin Output Plugin Filter Plugin Buffer Plugin Output Plugin Input Plugin Plugin Marketplace using RubyGems.org $ gem install fluent-plugin-s3 Plugin Loader /gems/ RubyGems.org
  37. 37. Fluentd’s Plugin Architecture • Fluentd is a plugin-based event collector. > Fluentd core: takes care of message routing between plugins. > Plugins: do all other things! • 300+ plugins released on RubyGems.org • Fluentd loads plugins using Gem API.
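As a rough illustration of what loading plugins through the Gem API can look like, the sketch below searches installed gems with RubyGems' own `Gem::Specification` class. The helper names `newest_spec_providing` and `installed_plugin_gems` are hypothetical, and this is not presented as Fluentd's actual loader code.

```ruby
require "rubygems"

# Return the newest installed gem spec that ships `path`
# (for example "fluent/plugin/out_s3"), or nil if no gem provides it.
def newest_spec_providing(path)
  specs = Gem::Specification.find_all do |spec|
    spec.contains_requirable_file?(path)
  end
  specs.max_by(&:version)
end

# Names of installed gems that follow a "<prefix>..." naming convention.
def installed_plugin_gems(prefix)
  Gem::Specification.map(&:name)
                    .select { |name| name.start_with?(prefix) }
                    .uniq
                    .sort
end

plugin_gems = installed_plugin_gems("fluent-plugin-")
s3_spec = newest_spec_providing("fluent/plugin/out_s3")
# A loader could now `require` the file from s3_spec, if one was found.
```

Both results depend on which gems are installed on the machine, so `plugin_gems` may be empty and `s3_spec` may be `nil`.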
  38. 38. Plugin Architecture
 of Embulk
  39. 39. Embulk: Open-source Bulk Data Loader written in Java & JRuby
  40. 40. Amazon S3 MySQL FTP CSV Files Access Logs Salesforce.com Elasticsearch Cassandra Hive Redis Reliable framework :-) Parallel execution, transaction, auto guess, …and many by plugins.
  41. 41. Demo
  42. 42. Use case 1: Sync MySQL to Elasticsearch embulk-input-mysql embulk-filter-kuromoji embulk-output-elasticsearch MySQL kuromoji Elasticsearch
  43. 43. Use case 2: Load from S3 to Analytics embulk-parser-csv embulk-decoder-gzip embulk-input-s3 csv.gz on S3 Treasure Data BigQuery Redshift + + embulk-output-td embulk-output-bigquery embulk-output-redshift embulk-executor-mapreduce
  44. 44. Use case 3: Embulk as a Service at Treasure Data
  45. 45. Use case 3: Embulk as a Service at Treasure Data REST API to load/export data to/from Treasure Data
  46. 46. Input Output Embulk’s Plugin Architecture Embulk Core Executor Plugin Filter Filter Guess
  47. 47. Output Embulk’s Plugin Architecture Embulk Core Executor Plugin Filter Filter GuessFileInput Parser Decoder
  48. 48. Guess Embulk’s Plugin Architecture Embulk Core FileInput Executor Plugin Parser Decoder FileOutput Formatter Encoder Filter Filter
  49. 49. Embulk’s Plugin Architecture Embulk Core PluginManager Executor Plugin InjectedPluginSource ParserPlugin JRubyPluginLoader FormatterPlugin JRuby Plugin Loader Plugin FilterPlugin OutputPluginInputPlugin JRuby RuntimeJava Runtime
  50. 50. Plugin Marketplace using RubyGems.org Embulk Core PluginManager Executor Plugin InjectedPluginSource ParserPlugin FormatterPluginFilterPlugin OutputPluginInputPlugin JRuby RuntimeJava Runtime $ embulk gem install embulk-input-oracle /gems/ RubyGems.org JRubyPluginLoader JRuby Plugin Loader Plugin
  51. 51. Plugin Package Structure
 embulk-input-s3.gem
 +- build.gradle
 |
 +- src/main/java/org/embulk/input/s3 (Java source files)
 |   - S3FileInputPlugin.java
 |     AwsCredentials.java
 |
 +- classpath/
 |   - embulk-input-s3-0.2.6.jar (compiled jar file)
 |     aws-java-sdk-s3-1.10.33.jar (all dependent jar files)
 |     httpclient-4.3.6.jar
 |
 +- lib/embulk/input/
     - s3.rb (Ruby script to load the jar files)
  52. 52. Embulk Plugin Load Sequence Bundler.setup_environment Embulk::Runner = Embulk::Runner.new( .embulk.EmbulkEmbed::Bootstrap.new.initialize) Embulk::Runner.run(ARGV) Java JRuby Java org.embulk.cli.Main.main(String[] args) { org.jruby.Main.main( "embulk.jar!/embulk/command/embulk_bundle.rb", args); } org.embulk.exec.BulkLoader.run(…) org.embulk.plugin.PluginManager.newPlugin(…)
  53. 53. { jruby = org.jruby.embed.ScriptingContainer() rubyObj = jruby.runScriptlet("Embulk::Plugin") jruby.callMethod(rubyObj, "new_java_input", "s3") } Embulk Plugin Load Sequence def new_java_input(type) rubyPluginClass = lookup(:input, type) return rubyPluginClass.new_java end Java JRuby org.embulk.plugin.PluginManager.newPlugin(…)
  54. 54. Embulk Plugin Load Sequence def new_java jars = Dir["classpath/**/*.jar"] factory = org.embulk.embulk.plugin.PluginClassLoaderFactory.new classloader = factory.create(jars) return classloader.loadClass("org.embulk.input.s3.S3InputPlugin") end Java JRuby PluginClassLoaderFactory.create(URL[] jarPaths) { return new PluginClassLoader(jarPaths); }
  55. 55. Embulk • Embulk is a plugin-based parallel bulk data loader. • Guess plugins suggest which plugins are necessary and how to configure them. • Executor plugins run plugins in parallel. • Embulk core takes care of message passing between plugins. • Embulk loads plugins using JRuby and Gem API.
  56. 56. ./embulk.jar $ ./embulk.jar guess example.yml executable jar!
  57. 57. Header of embulk.jar : <<BAT @echo off setlocal set this=%~f0 set java_args= rem ... java %java_args% -jar %this% %args% exit /b %ERRORLEVEL% BAT # ... exec java $java_args -jar "$0" "$@" exit 127 PK...
  58. 58. embulk.jar is a shell script : <<BAT @echo off setlocal set this=%~f0 set java_args= rem ... java %java_args% -jar %this% %args% exit /b %ERRORLEVEL% BAT # ... exec java $java_args -jar "$0" "$@" exit 127 PK... argument of “:” command (heredoc). “:” is a command that does nothing. #!/bin/sh is optional. Empty first line means a shell script. java -jar $0 shell script exits here (following data is ignored)
  59. 59. embulk.jar is a bat file : <<BAT @echo off setlocal set this=%~f0 set java_args= rem ... java %java_args% -jar %this% %args% exit /b %ERRORLEVEL% BAT # ... exec java $java_args -jar "$0" "$@" exit 127 PK... .bat exits here (following lines are ignored) “:” means a comment-line
  60. 60. embulk.jar is a jar file : <<BAT @echo off setlocal set this=%~f0 set java_args= rem ... java %java_args% -jar %this% %args% exit /b %ERRORLEVEL% BAT # ... exec java $java_args -jar "$0" "$@" exit 127 PK... jar (zip) format ignores headers (file entries are in footer)
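The reason a zip reader tolerates a script header is that it locates the archive through the End of Central Directory (EOCD) record near the end of the file. The Ruby sketch below demonstrates this with the smallest valid zip, an empty archive consisting of just a 22-byte EOCD record. The header text is a shortened stand-in for the real one, and `find_eocd` is a hypothetical helper; a real reader would also have to handle archive comments and offsets.

```ruby
# Signature that starts the End of Central Directory record of a zip file.
EOCD_SIGNATURE = "PK\x05\x06".b

# Search backwards from the end of the data, the way zip readers
# locate an archive. Returns the byte offset of the record, or nil.
def find_eocd(bytes)
  bytes.b.rindex(EOCD_SIGNATURE)
end

# Shortened stand-in for the shell/bat header of embulk.jar.
header = ": <<BAT\n@echo off\nBAT\nexec java -jar \"$0\" \"$@\"\nexit 127\n".b

# Smallest valid zip: an empty archive is a 22-byte EOCD record
# (4-byte signature followed by 18 zero bytes).
empty_zip = EOCD_SIGNATURE + ("\x00".b * 18)

polyglot = header + empty_zip
offset = find_eocd(polyglot)
```

Here `offset` equals `header.bytesize`: the record is found even though arbitrary script text precedes it.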
  61. 61. Pitfalls & Challenges
  62. 62. Pitfalls & Challenges • Plugin version conflicts • Performance impact due to loosely-coupled API
  63. 63. Plugin Version Conflicts Embulk Core Java Runtime aws-sdk.jar v1.9 embulk-input-s3.jar Version conflicts! aws-sdk.jar v1.10 embulk-output-redshift.jar
  64. 64. Multiple Classloaders in JVM Embulk Core Java Runtime aws-sdk.jar v1.9 embulk-input-s3.jar Isolated environments aws-sdk.jar v1.10 embulk-output-redshift.jar Class Loader 1 Class Loader 2
  65. 65. Version conflicts in a JRuby Runtime Embulk Core Java Runtime httpclient 2.5.0 embulk-input-sfdc.gem Version conflicts! httpclient v2.6.0 embulk-input-marketo.gem JRuby Runtime
  66. 66. Java Runtime Multiple JRuby Runtime? Fluentd Core activerecord ~> 3.4 fluent-plugin-sql.gem Isolated environments? activerecord ~> 4.2 fluent-plugin-presto.gem ? Sub VM 1? Sub VM 2?
  67. 67. Version conflicts in Fluentd Fluentd Core CRuby Runtime activerecord ~> 3.4 fluent-plugin-sql.gem Version conflicts! activerecord ~> 4.2 fluent-plugin-presto.gem ?
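The conflict can be checked with RubyGems' own `Gem::Requirement` and `Gem::Version` classes. The two requirements below are the ones shown on the slide; the list of candidate versions is made up for illustration. Since RubyGems activates only one version of a given gem in a process, some single version would have to satisfy both plugins.

```ruby
require "rubygems"

sql_requirement    = Gem::Requirement.new("~> 3.4")  # >= 3.4, < 4.0
presto_requirement = Gem::Requirement.new("~> 4.2")  # >= 4.2, < 5.0

# Hypothetical list of available versions, for illustration only.
candidates = %w[3.4.0 3.4.2 4.2.0 4.2.5].map { |v| Gem::Version.new(v) }

# Versions that both plugins could share within one Ruby runtime.
usable = candidates.select do |version|
  sql_requirement.satisfied_by?(version) &&
    presto_requirement.satisfied_by?(version)
end
```

`usable` is empty: the two ranges do not overlap, so no single version can serve both plugins in one runtime.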
  68. 68. Challenges • Version conflict is not completely solved. • Java can use multiple ClassLoaders • I haven’t figured out how to do the same thing in Ruby • I don’t have clear ideas for solving the performance impact • Write more code to learn?
  69. 69. Wrapping Up
  70. 70. “How did I build Plugin Architecture?” • I built Fluentd using a dynamic plugin loader. • “Plugin calls Plugins” • Most features are provided by the ecosystem of plugins. • I built Embulk using a combination of: • Dependency Injection, • JRuby to implement a Dynamic Plugin Loader, • Java VM and nested ClassLoaders to load multiple versions of plugins. • But some problems are not solved yet: • Version conflicts in a Ruby VM. • Design patterns of plugins AND high performance.
  71. 71. What’s Next? • You build plugin-based software architecture! • And you’ll tell me how you did it :-) • I’m working on another project: a distributed workflow engine • Java VM + Python Thank You! Sadayuki Furuhashi
 Founder & Software Architect
