Designing scalable application: from umbrella project to distributed system - Anton Mishchuk

Elixir Club 8
Peremoga Space, Kyiv
21.10.2017

Published in: Technology

  1. 1. Designing scalable application: from umbrella project to distributed system Elixir Club 8, Kyiv, October 21, 2017
  2. 2. HELLO! I am Anton Mishchuk - Ruby/Elixir developer at Matic Insurance Services (we are hiring!) - A big fan of the Elixir programming language - Author and maintainer of the ESpec and Flowex libraries - github: antonmi 2
  3. 3. BEAM Applications [Diagram: one BEAM VM hosting three Applications, each drawn as an S node with several GS nodes] 3
  4. 4. Intro ○ Monolith or micro-services ○ BEAM approach ○ Umbrella project ○ Inter-service communication 4
  5. 5. Contents ○ “ML Tools” demo project ○ Three levels of code organization ○ Interfaces modules ○ Scaling to distributed system ○ Limiting concurrency 5
  6. 6. ML Tools ○ Stands for “Machine Learning Tools” ○ One can find it here: □ https://github.com/antonmi/ml_tools ○ Umbrella project with 4 applications: □ Datasets □ Models □ Utils □ Main 6
  7. 7. Umbrella project Monolith and microservices at the same time: ○ All applications are in one place ○ Easy to develop and test ○ Service-oriented architecture ○ Ready to scale 7
  8. 8. Code organization ○ Service level ○ Context level ○ Implementation level 8
  9. 9. Service level Main Datasets Models Utils 9
  10. 10. Context and implementation (datasets)
      datasets/
        lib/
          datasets/
            fetchers/
              fetchers.ex
              aws.ex
              kaggle.ex
            collections/
      ○ Datasets.Fetchers ○ Datasets.Fetchers.Aws ○ Datasets.Fetchers.Kaggle
  11. 11. Context and implementation (models)
      models/
        lib/
          models/
            lm/
              lm.ex
              engine.ex
            rf/
              rf.ex
              engine.ex
      ○ Models.Lm ○ Models.Lm.Engine ○ Models.Rf ○ Models.Rf.Engine
  12. 12. Context and implementation (utils)
      utils/
        lib/
          utils/
            pre_processing/
              pre_processing.ex
              filter.ex
              normalizer.ex
            visualization/
              ...
      ○ Utils.PreProcessing ○ Utils.PreProcessing.Filter ○ Utils.PreProcessing.Normalizer
  13. 13. Main application
      defp deps do
        [
          {:datasets, in_umbrella: true},
          {:models, in_umbrella: true},
          {:utils, in_umbrella: true}
        ]
      end
  14. 14. Main application
      defmodule Main.Zillow do
        def rf_fit do
          Datasets.Fetchers.zillow_data()
          |> Utils.PreProcessing.normalize_data()
          |> Models.Rf.fit_model()
        end
      end
  15. 15. What’s wrong with this code?
      defmodule Main.Zillow do
        def rf_fit do
          Datasets.Fetchers.zillow_data()
          |> Utils.PreProcessing.normalize_data()
          |> Models.Rf.fit_model()
        end
      end
  16. 16. Encapsulation problem ○ Modules, functions and that’s it! ○ Main application has access to all public functions in all the applications ○ Micro-services -> tightly coupled monolith 16
  17. 17. Interfaces Modules ○ Specific modules in the application that implement its public interface ○ Grouped by context ○ Other applications must use only functions from these modules 17
  18. 18. Just delegate
      defmodule Datasets.Interfaces.Fetchers do
        alias Datasets.Fetchers

        defdelegate zillow_data(), to: Fetchers
        defdelegate landsat_data(), to: Fetchers
      end
  19. 19. Use Interfaces only!
      def rf_fit do
        Datasets.Interfaces.Fetchers.zillow_data()
        |> Utils.Interfaces.PreProcessing.normalize_data()
        |> Models.Interfaces.Rf.fit_model()
      end
  20. 20. Interfaces Modules ○ It’s just a convention ○ But very important convention ○ It creates a good basis for future scaling 20
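The interface-module convention can be tried out in isolation. The sketch below is only an illustration: the `Demo.*` module names, the sample data, and `parse_row/1` are hypothetical stand-ins, not code from the ML Tools repository. It shows that a delegating module exposes exactly the functions it lists, even when the implementation module has other public functions.

```elixir
# Hypothetical stand-in for an implementation module inside the "datasets" app.
defmodule Demo.Datasets.Fetchers do
  # Intended to be reachable from other applications.
  def zillow_data, do: [%{price: 100}, %{price: 200}]

  # Public in the Elixir sense, but not meant for other applications.
  def parse_row(row), do: row
end

# The interface module: the only module other applications should call.
defmodule Demo.Datasets.Interfaces.Fetchers do
  defdelegate zillow_data(), to: Demo.Datasets.Fetchers
end

# Callers go through the interface and get the same result.
Demo.Datasets.Interfaces.Fetchers.zillow_data()
```

Nothing in the compiler enforces the rule, which is why the slides call it a convention; the benefit is that the list of `defdelegate` lines doubles as the application's documented public surface.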
  21. 21. Let’s scale! ○ We decide to run each of the applications on a different node ○ Modules from one application will not be accessible in another one ○ But we still want to keep all the things in one place 21
  22. 22. Interface applications Main Datasets Models Utils DatasetsInterface ModelsInterface UtilsInterface 22
  23. 23. Interface application models_interface/ config/ lib/ models_interface/ models_interface.ex lm.ex rf.ex mix.exs 23
  24. 24. Inter-Application Communication ○ RPC (for ‘models’ app) ○ Distributed tasks (for ‘datasets’ app) ○ HTTP (for ‘utils’ app) 24
  25. 25. RPC ○ Remote Procedure Call ○ :rpc.call(node, module, fun, args) ○ BEAM handles all the serialization and deserialization stuff 25
  26. 26. ModelsInterface.Rf
      defmodule ModelsInterface.Rf do
        def fit_model(data) do
          ModelsInterface.remote_call(Models.Interfaces.Rf, :fit_model, [data])
        end
      end
  27. 27. ModelsInterface
      defmodule ModelsInterface do
        def remote_call(module, fun, args, env \\ Mix.env()) do
          do_remote_call({module, fun, args}, env)
        end

        defp do_remote_call({module, fun, args}, :test) do
          apply(module, fun, args)
        end

        defp do_remote_call({module, fun, args}, _) do
          :rpc.call(remote_node(), module, fun, args)
        end
      end
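One detail worth keeping in mind with `:rpc.call/4` is that failures are not raised; they come back as a `{:badrpc, reason}` tuple in place of the result. The sketch below is an assumption about how a caller might handle that, not part of the talk's code: `RpcDemo` is a hypothetical module, and it targets `node()` (the local node) so it runs without any cluster setup.

```elixir
defmodule RpcDemo do
  # Wraps :rpc.call/4 and converts the {:badrpc, reason} tuple
  # into an explicit {:error, reason}, and any other value into {:ok, value}.
  def call(target_node, module, fun, args) do
    case :rpc.call(target_node, module, fun, args) do
      {:badrpc, reason} -> {:error, reason}
      result -> {:ok, result}
    end
  end
end

# node() is the local node, so no second node is needed to try this.
RpcDemo.call(node(), Enum, :sum, [[1, 2, 3]])

# A call that raises on the target node comes back as an error tuple.
RpcDemo.call(node(), :erlang, :error, [:boom])
```

The wrapper cannot distinguish a remote function that legitimately returns `{:badrpc, _}` from a real failure, which is an inherent limitation of this return convention.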
  28. 28. Quick refactoring ○ Replace Models.Interfaces with ModelsInterface and that’s it ○ You continue to develop at the same speed ○ Nothing to change in tests 28
  29. 29. What about tests
      defp do_remote_call({module, fun, args}, :test) do
        apply(module, fun, args)
      end

      defp do_remote_call({module, fun, args}, _) do
        :rpc.call(remote_node(), module, fun, args)
      end
  30. 30. What about tests
      defp deps do
        [
          ...
          {:models, in_umbrella: true, only: [:test]},
          {:models_interface, in_umbrella: true},
          ...
        ]
      end
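Switching on `Mix.env` at call time works under `mix`, but `Mix` is a build tool and is not shipped inside releases, so the same switch is sometimes driven by application config instead. The following is a minimal sketch of that variation under stated assumptions: the `:transport_demo` app name and the `:transport` and `:remote_node` keys are hypothetical, and the remote node defaults to `node()` so the example runs on a single machine.

```elixir
defmodule TransportDemo do
  # Picks the transport from application config at call time.
  # :transport_demo, :transport and :remote_node are hypothetical names.
  def remote_call(module, fun, args) do
    transport = Application.get_env(:transport_demo, :transport, :rpc)
    do_remote_call({module, fun, args}, transport)
  end

  # Used in tests: call the function in the current VM.
  defp do_remote_call({module, fun, args}, :local) do
    apply(module, fun, args)
  end

  # Used everywhere else: go through the distribution protocol.
  defp do_remote_call({module, fun, args}, :rpc) do
    target = Application.get_env(:transport_demo, :remote_node, node())
    :rpc.call(target, module, fun, args)
  end
end

# In config/test.exs this would be a config line; here it is set directly.
Application.put_env(:transport_demo, :transport, :local)
TransportDemo.remote_call(String, :upcase, ["zillow"])
```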
  31. 31. RPC cons ○ Synchronous calls from “local” process ○ Needs additional logic for asynchronous execution 31
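The "additional logic" for asynchronous execution can be quite small. One possible approach, sketched here as an assumption and not as the author's solution, is to run the blocking `:rpc.call/4` inside a `Task`, so the caller decides when to wait. `AsyncRpcDemo` is a hypothetical module and `node()` stands in for a remote node. Erlang's `:rpc` module also offers `async_call/4` with `yield/1` for the same purpose.

```elixir
defmodule AsyncRpcDemo do
  # Starts the blocking RPC in a separate process and returns a Task handle.
  def async_call(target_node, module, fun, args) do
    Task.async(fn -> :rpc.call(target_node, module, fun, args) end)
  end
end

task = AsyncRpcDemo.async_call(node(), Enum, :sum, [[1, 2, 3]])

# The caller is free to do other work here before collecting the result.
result = Task.await(task, 5_000)
IO.inspect(result)
```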
  32. 32. Distributed tasks ○ Built on top of Elixir Task ○ Tasks are spawned on a remote node ○ Supervisor controls the tasks 32
  33. 33. Start Supervisor
      defmodule Datasets.Application do
        ...
        def start(_type, _args) do
          children = [
            supervisor(Task.Supervisor, [[name: Datasets.Task.Supervisor]],
              [restart: :temporary, shutdown: 10000])
          ]
          ...
        end
      end
  34. 34. DatasetsInterface
      defmodule DatasetsInterface do
        ...
        defp do_spawn_task({module, fun, args}, _) do
          Task.Supervisor.async(remote_supervisor(), module, fun, args)
          |> Task.await()
        end

        defp remote_supervisor, do: # config data
      end
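`Task.Supervisor.async/4` accepts a `{name, node}` tuple as the supervisor, which is what lets the task run on another node while the caller awaits it locally. The sketch below shows the shape of such a call under a few assumptions: `DemoTaskSupervisor` and `DistTaskDemo` are hypothetical names, the supervisor is started directly instead of in an application's supervision tree, and the node is `node()` so the example runs without a cluster.

```elixir
# Normally this supervisor lives in the remote application's supervision tree;
# it is started inline here only to keep the example self-contained.
{:ok, _pid} = Task.Supervisor.start_link(name: DemoTaskSupervisor)

defmodule DistTaskDemo do
  # supervisor is a {name, node} tuple; the task is spawned on that node
  # and the result is sent back to the calling process.
  def run(supervisor, module, fun, args, timeout \\ 5_000) do
    supervisor
    |> Task.Supervisor.async(module, fun, args)
    |> Task.await(timeout)
  end
end

# Replace node() with a remote node name such as :"datasets@host" in a cluster.
DistTaskDemo.run({DemoTaskSupervisor, node()}, Enum, :sum, [[1, 2, 3]])
```

`Task.Supervisor.async/4` links the task to the caller, so a crash in the task also takes down the caller; `Task.Supervisor.async_nolink/4` is the variant to reach for when that is not wanted.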
  35. 35. Erlang Distribution Protocol Pros: ○ Battle-tested over decades ○ Free serialization / deserialization Cons: ○ Not very secure ○ Not very fast ○ BEAM specific 35
  36. 36. HTTP UtilsInterface
      defmodule UtilsInterface do
        ...
        defp do_remote_call({module, fun, args}, _) do
          {:ok, resp} = HTTPoison.post(remote_url(), serialize({module, fun, args}))
          deserialize(resp.body)
        end
        ...
      end
  37. 37. Plug in Utils app
      defmodule Utils.Interfaces.Plug do
        use Plug.Router
        ...
        post "/remote" do
          {:ok, body, conn} = Plug.Conn.read_body(conn)
          {module, fun, args} = deserialize(body)
          result = apply(module, fun, args)
          send_resp(conn, 200, serialize(result))
        end
      end
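The slides do not show `serialize` and `deserialize`. One plausible way to fill them in, offered here purely as an assumption, is Erlang's external term format. Because the endpoint applies whatever `{module, fun, args}` it receives, the sketch also adds an allowlist check before `apply/3`; the `WireDemo` module and the contents of `@allowed` are hypothetical.

```elixir
defmodule WireDemo do
  # Hypothetical allowlist of {module, function, arity} the endpoint may run.
  @allowed [{Enum, :sum, 1}, {String, :upcase, 1}]

  # Encode any term into the Erlang external term format.
  def serialize(term), do: :erlang.term_to_binary(term)

  # The :safe option rejects payloads that would create new atoms.
  def deserialize(binary), do: :erlang.binary_to_term(binary, [:safe])

  # Mirrors the body of the POST handler, minus the HTTP layer.
  def handle(body) do
    {module, fun, args} = deserialize(body)

    if {module, fun, length(args)} in @allowed do
      {:ok, serialize(apply(module, fun, args))}
    else
      {:error, :not_allowed}
    end
  end
end

request = WireDemo.serialize({Enum, :sum, [[1, 2, 3]]})
{:ok, response} = WireDemo.handle(request)
WireDemo.deserialize(response)
```

In a real handler the `{:error, :not_allowed}` branch would map to an HTTP error status instead of a 200 response.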
  38. 38. Limiting concurrency When do you need this? ○ Third-party API rate limits ○ Heavy calculations per request 38
  39. 39. poolboy in Models.Application
      def start(_type, _args) do
        pool_opts = [
          name: {:local, Models.Interface},
          worker_module: Models.Interfaces.Worker,
          size: 5,
          max_overflow: 5
        ]

        children = [
          :poolboy.child_spec(Models.Interface, pool_opts, [])
        ]
      end
  40. 40. Models.Interfaces.Worker
      defmodule Models.Interfaces.Worker do
        use GenServer
        ...
        def handle_call({module, fun, args}, _from, state) do
          result = apply(module, fun, args)
          {:reply, result, state}
        end
      end
  41. 41. Models.Interfaces.Rf
      defmodule Models.Interfaces.Rf do
        def fit_model(data) do
          with_poolboy({Models.Rf, :fit_model, [data]})
        end

        def with_poolboy(args) do
          worker = :poolboy.checkout(Models.Interface)
          result = GenServer.call(worker, args, :infinity)
          :poolboy.checkin(Models.Interface, worker)
          result
        end
      end
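For readers who want to see a concurrency limit in action without adding poolboy, the standard library's `Task.async_stream/3` with `:max_concurrency` gives a comparable cap when the work arrives as a batch. This is a different technique from the worker pool on the slides, shown only as a dependency-free sketch; `ConcurrencyDemo`, the sleep, and the squaring function are placeholders for a heavy computation.

```elixir
defmodule ConcurrencyDemo do
  # Runs a placeholder job for every input with at most `limit` jobs in flight.
  # Returns {results, peak}, where peak is the highest number of jobs
  # observed running at the same time.
  def run(inputs, limit) do
    # The agent state is {currently_running, peak_running}.
    {:ok, agent} = Agent.start_link(fn -> {0, 0} end)

    results =
      inputs
      |> Task.async_stream(
        fn x ->
          Agent.update(agent, fn {running, peak} ->
            {running + 1, max(running + 1, peak)}
          end)

          # Stand-in for a heavy calculation or a rate-limited API call.
          Process.sleep(20)
          Agent.update(agent, fn {running, peak} -> {running - 1, peak} end)
          x * x
        end,
        max_concurrency: limit,
        timeout: :infinity
      )
      |> Enum.map(fn {:ok, value} -> value end)

    {_running, peak} = Agent.get(agent, & &1)
    Agent.stop(agent)
    {results, peak}
  end
end

ConcurrencyDemo.run([1, 2, 3, 4, 5, 6], 2)
```

A pool such as poolboy remains the better fit when requests arrive independently from many callers, since the limit then has to be shared across processes.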
  42. 42. Conclusion ○ Start with micro-services from the very beginning ○ Think carefully about the app interface ○ Use EDP (Erlang Distribution Protocol) for communication when scaling ○ Keep “communication” code in a separate app 42
  43. 43. THANKS! Ask me questions! 43
