
GenStage and Flow - Jose Valim



Elixir Club 5

Published in: Software


  1. 1. @elixirlang / elixir-lang.org
  2. 2. GenStage & Flow github.com/elixir-lang/gen_stage
  3. 3. Prelude: From eager, to lazy, to concurrent, to distributed
  4. 4. Example problem: word counting
  5. 5. "roses are red\n violets are blue\n" %{"are" => 2, "blue" => 1, "red" => 1, "roses" => 1, "violets" => 1}
  6. 6. Eager
  7. 7. File.read!("source") "roses are red\n violets are blue\n"
  8. 8. File.read!("source") |> String.split("\n") ["roses are red", "violets are blue"]
  9. 9. File.read!("source") |> String.split("\n") |> Enum.flat_map(&String.split/1) ["roses", "are", "red", "violets", "are", "blue"]
  10. 10. File.read!("source") |> String.split("\n") |> Enum.flat_map(&String.split/1) |> Enum.reduce(%{}, fn word, map -> Map.update(map, word, 1, & &1 + 1) end) %{"are" => 2, "blue" => 1, "red" => 1, "roses" => 1, "violets" => 1}
  11. 11. Eager • Simple • Efficient for small collections • Inefficient for large collections with multiple passes
  12. 12. File.read!("really large file") |> String.split("\n") |> Enum.flat_map(&String.split/1)
  13. 13. Lazy
  14. 14. File.stream!("source", :line) #Stream<...>
  15. 15. File.stream!("source", :line) |> Stream.flat_map(&String.split/1) #Stream<...>
  16. 16. File.stream!("source", :line) |> Stream.flat_map(&String.split/1) |> Enum.reduce(%{}, fn word, map -> Map.update(map, word, 1, & &1 + 1) end) %{"are" => 2, "blue" => 1, "red" => 1, "roses" => 1, "violets" => 1}
  17. 17. Lazy • Folds computations, goes item by item • Less memory usage at the cost of computation • Allows us to work with large or infinite collections
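The eager and lazy variants above can be compared side by side in a minimal, self-contained sketch. It assumes the text is already in memory rather than read from "source"; the module name WordCount and its function names are hypothetical, and String.splitter/2 stands in for File.stream!/2 as the lazy source of lines.

```elixir
defmodule WordCount do
  # Eager: every step builds a full intermediate list before the next one runs.
  def eager(text) do
    text
    |> String.split("\n")
    |> Enum.flat_map(&String.split/1)
    |> Enum.reduce(%{}, &count/2)
  end

  # Lazy: lines are produced one at a time and folded into the map as they arrive.
  def lazy(text) do
    text
    |> String.splitter("\n")
    |> Stream.flat_map(&String.split/1)
    |> Enum.reduce(%{}, &count/2)
  end

  defp count(word, map), do: Map.update(map, word, 1, &(&1 + 1))
end

text = "roses are red\n violets are blue\n"
IO.inspect(WordCount.eager(text))
IO.inspect(WordCount.lazy(text))

# Laziness is also what makes an infinite collection usable:
# only the three requested items are ever computed.
evens = Stream.iterate(1, &(&1 + 1)) |> Stream.map(&(&1 * 2)) |> Enum.take(3)
IO.inspect(evens) # prints [2, 4, 6]
```

Both functions return the same map; the difference is only in how much intermediate data is held in memory at once.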
  18. 18. Concurrent
  19. 19. File.stream!("source", :line) #Stream<...>
  20. 20. File.stream!("source", :line) |> Flow.from_enumerable() |> Flow.flat_map(&String.split/1) |> Flow.partition() |> Flow.reduce(fn -> %{} end, fn word, map -> Map.update(map, word, 1, & &1 + 1) end) #Flow<...>
  21. 21. File.stream!("source", :line) |> Flow.from_enumerable() |> Flow.flat_map(&String.split/1) |> Flow.partition() |> Flow.reduce(fn -> %{} end, fn word, map -> Map.update(map, word, 1, & &1 + 1) end) |> Enum.into(%{}) %{"are" => 2, "blue" => 1, "red" => 1, "roses" => 1, "violets" => 1}
  22. 22. Flow • We give up ordering and process locality for concurrency • Tools for working with bounded and unbounded data
  23. 23. Flow • It is not magic! There is an overhead when data flows through processes • Requires volume and/or cpu/io-bound work to see benefits
  24. 24. Flow stats • ~1200 lines of code (LOC) • ~1300 lines of documentation (LOD)
  25. 25. Topics • 1200 LOC: How is Flow implemented? • 1300 LOD: How to reason about flows?
  26. 26. GenStage
  27. 27. GenStage • It is a new behaviour • Exchanges data between stages transparently with back-pressure • Breaks into producers, consumers and producer_consumers
  28. 28. GenStage [diagram: pipelines built from producer, producer_consumer and consumer stages]
  29. 29. GenStage: Demand-driven. A is the Producer, B is the Consumer. 1. consumer subscribes to producer 2. consumer sends demand 3. producer sends events [diagram: B subscribes to A; B asks 10; A sends max 10]
  30. 30. GenStage: Demand-driven [diagram: stages A, B and C; labels: Asks 10, Asks 10, Sends max 10, Sends max 10]
  31. 31. GenStage: Demand-driven • It is a message contract • It pushes back-pressure to the boundary • GenStage is one implementation of this contract
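The idea of demand as a message contract can be illustrated with two plain processes and no GenStage at all. This is a simplified sketch, not the actual GenStage protocol: the module name DemandSketch and the message shapes {:ask, n, from} and {:events, list} are assumptions made up for this illustration.

```elixir
defmodule DemandSketch do
  # Producer: a counter that never emits anything until it is asked,
  # and then emits at most `n` events per request.
  def producer(counter) do
    receive do
      {:ask, n, from} when n > 0 ->
        events = Enum.to_list(counter..(counter + n - 1))
        send(from, {:events, events})
        producer(counter + n)

      :stop ->
        :ok
    end
  end

  # Consumer: sends demand, waits for the events, and repeats `rounds` times.
  def consume(producer, demand, rounds) when rounds >= 1 do
    Enum.flat_map(1..rounds, fn _ ->
      send(producer, {:ask, demand, self()})

      receive do
        {:events, events} -> events
      after
        1_000 -> raise "producer did not answer"
      end
    end)
  end
end

producer = spawn(fn -> DemandSketch.producer(0) end)
events = DemandSketch.consume(producer, 5, 2)
send(producer, :stop)
IO.inspect(events) # prints [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Because the producer only works when asked, a slow consumer slows the whole chain down instead of having its mailbox flooded, which is the back-pressure the slide refers to.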
  32. 32. GenStage example
  33. 33. A: Counter (Producer), B: Printer (Consumer)
  34. 34. defmodule Producer do
            use GenStage

            def init(counter) do
              {:producer, counter}
            end

            def handle_demand(demand, counter) when demand > 0 do
              events = Enum.to_list(counter..counter+demand-1)
              {:noreply, events, counter + demand}
            end
          end
  35. 35. state | demand | handle_demand
          0     | 10     | {:noreply, [0, 1, …, 9], 10}
          10    | 5      | {:noreply, [10, 11, 12, 13, 14], 15}
          15    | 5      | {:noreply, [15, 16, 17, 18, 19], 20}
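The state/demand table can be reproduced without starting any process, because the callback is a pure function of the demand and the current state. The sketch below copies the body of handle_demand/2 into a hypothetical module, CounterLogic, that deliberately omits `use GenStage` so it runs with the standard library only.

```elixir
defmodule CounterLogic do
  # Same logic as the slide's handle_demand/2: emit `demand` consecutive
  # integers starting at `counter`, then advance the counter.
  def handle_demand(demand, counter) when demand > 0 do
    events = Enum.to_list(counter..(counter + demand - 1))
    {:noreply, events, counter + demand}
  end
end

# Thread the returned state into the next call, as GenStage would.
{:noreply, e1, s1} = CounterLogic.handle_demand(10, 0)
{:noreply, e2, s2} = CounterLogic.handle_demand(5, s1)
{:noreply, e3, s3} = CounterLogic.handle_demand(5, s2)

IO.inspect({e2, s2}) # prints {[10, 11, 12, 13, 14], 15}
IO.inspect({e3, s3}) # prints {[15, 16, 17, 18, 19], 20}
```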
  36. 36. defmodule Consumer do
            use GenStage

            def init(:ok) do
              {:consumer, :the_state_does_not_matter}
            end

            def handle_events(events, _from, state) do
              Process.sleep(1000)
              IO.inspect(events)
              {:noreply, [], state}
            end
          end
  37. 37. {:ok, counter} = GenStage.start_link(Producer, 0)
          {:ok, printer} = GenStage.start_link(Consumer, :ok)
          GenStage.sync_subscribe(printer, to: counter)
          (wait 1 second) [0, 1, 2, ..., 499] (500 events)
          (wait 1 second) [500, 501, 502, ..., 999] (500 events)
  38. 38. Subscribe options • max_demand: the maximum number of events to ask for (default 1000) • min_demand: when reached, ask for more events (default 500)
  39. 39. max_demand: 10, min_demand: 0 1. the consumer asks for 10 items 2. the consumer receives 10 items 3. the consumer processes 10 items 4. the consumer asks for 10 more 5. the consumer waits
  40. 40. max_demand: 10, min_demand: 5 1. the consumer asks for 10 items 2. the consumer receives 10 items 3. the consumer processes 5 of 10 items 4. the consumer asks for 5 more 5. the consumer processes the remaining 5
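The rhythm described in the two walkthroughs above can be written down as a small model. This is a simplification under one stated assumption: the consumer first asks for max_demand, then asks for max_demand - min_demand more each time it has processed that many events. DemandModel and trace/3 are hypothetical names, and the model ignores timing and the producer entirely.

```elixir
defmodule DemandModel do
  # Returns the sequence of steps a single consumer takes while working
  # through `batches` batches of events.
  def trace(max, min, batches) when max > min and min >= 0 and batches >= 1 do
    refill = max - min

    steps =
      Enum.flat_map(1..batches, fn _ ->
        [{:process, refill}, {:ask, refill}]
      end)

    [{:ask, max} | steps]
  end
end

IO.inspect(DemandModel.trace(10, 0, 1))
# prints [ask: 10, process: 10, ask: 10]

IO.inspect(DemandModel.trace(10, 5, 2))
# prints [ask: 10, process: 5, ask: 5, process: 5, ask: 5]
```

With min_demand: 0 the consumer idles after each batch until new events arrive; with min_demand: 5 it asks again while half of the batch is still waiting to be processed, so the producer can work ahead.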
  41. 41. GenStage dispatchers
  42. 42. Dispatchers • Per producer • Effectively receive the demand and send data • Allow a producer to dispatch to multiple consumers at once
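  43. 43. DemandDispatcher [diagram: prod emits 1,2,3,4,5; the four consumers receive 1 | 2,5 | 3 | 4]
  44. 44. BroadcastDispatcher [diagram: prod emits 1,2,3; each of the four consumers receives 1,2,3]
  45. 45. PartitionDispatcher with rem(event, 4) [diagram: prod emits 1,2,3,4,5,6; the four consumers receive 1,5 | 2,6 | 3 | 4]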
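The routing rules of the partition and broadcast diagrams can be checked with plain Enum calls. This is only an illustration of which consumer would receive which event, not a use of the GenStage dispatcher modules; the variable names are made up for the sketch. The DemandDispatcher is left out because its routing depends on which consumer asked, not on the event.

```elixir
events = [1, 2, 3, 4, 5, 6]
consumers = 4

# Partition-style routing: the destination is computed from the event itself,
# so equal events always land on the same consumer.
partitioned = Enum.group_by(events, &rem(&1, consumers))
IO.inspect(partitioned)
# prints %{0 => [4], 1 => [1, 5], 2 => [2, 6], 3 => [3]}

# Broadcast-style routing: every consumer receives every event.
broadcast = Map.new(0..(consumers - 1), fn consumer -> {consumer, events} end)
IO.inspect(broadcast[0] == broadcast[3]) # prints true
```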
  46. 46. http://bit.ly/genstage
  47. 47. Topics • 1200 LOC: How is Flow implemented? • Its core is an 80 LOC stage • 1300 LOD: How to reason about flows?
  48. 48. Flow
  49. 49. "roses are red\n violets are blue\n" %{"are" => 2, "blue" => 1, "red" => 1, "roses" => 1, "violets" => 1}
  50. 50. File.stream!("source", :line) |> Flow.from_enumerable() |> Flow.flat_map(&String.split/1) |> Flow.partition() |> Flow.reduce(fn -> %{} end, fn word, map -> Map.update(map, word, 1, & &1 + 1) end) #Flow<...>
  51. 51. File.stream!("source", :line) |> Flow.from_enumerable() |> Flow.flat_map(&String.split/1) |> Flow.partition() |> Flow.reduce(fn -> %{} end, fn word, map -> Map.update(map, word, 1, & &1 + 1) end) |> Enum.into(%{}) %{"are" => 2, "blue" => 1, "red" => 1, "roses" => 1, "violets" => 1}
  52. 52. File.stream!("source", :line) |> Flow.from_enumerable() [diagram: Producer]
  53. 53. File.stream!("source", :line) |> Flow.from_enumerable() |> Flow.flat_map(&String.split/1) [diagram: Producer, then Stage 1, Stage 2, Stage 3, Stage 4, connected by a DemandDispatcher]
  54. 54. [diagram: Producer sends "roses are red" and "violets are blue" to Stages 1-4; the receiving stages emit "roses", "are", "red" and "violets", "are", "blue"]
  55. 55. File.stream!("source", :line) |> Flow.from_enumerable() |> Flow.flat_map(&String.split/1) [diagram: Producer, then Stage 1, Stage 2, Stage 3, Stage 4]
  56. 56. File.stream!("source", :line) |> Flow.from_enumerable() |> Flow.flat_map(&String.split/1) |> Flow.reduce(fn -> %{} end, fn word, map -> Map.update(map, word, 1, & &1 + 1) end) [diagram: Producer, then Stage 1, Stage 2, Stage 3, Stage 4]
  57. 57. [diagram: Producer, then Stages 1-4; the stage that receives "roses are red" holds %{"are" => 1, "red" => 1, "roses" => 1}; the stage that receives "violets are blue" holds %{"are" => 1, "blue" => 1, "violets" => 1}]
  58. 58. [diagram: Producer, then Stages 1-4; two stages hold separate maps, %{"are" => 1, "red" => 1, "roses" => 1} and %{"are" => 1, "blue" => 1, "violets" => 1}]
  59. 59. File.stream!("source", :line) |> Flow.from_enumerable() |> Flow.flat_map(&String.split/1) |> Flow.partition() |> Flow.reduce(fn -> %{} end, fn word, map -> Map.update(map, word, 1, & &1 + 1) end) [diagram: Producer, then Stage 1, Stage 2, Stage 3, Stage 4, then Stage A, Stage B, Stage C, Stage D]
  60. 60. [diagram: Producer sends "roses are red" and "violets are blue" to Stages 1-4, which route each word to Stages A-D; "roses" reaches one stage, giving %{"roses" => 1}; every "are" reaches the same stage, whose state goes from %{"are" => 1} to %{"are" => 1, "red" => 1} to %{"are" => 2, "red" => 1}]
  61. 61. [diagram: Producer feeds Mapper 1, Mapper 2, Mapper 3, Mapper 4 through a DemandDispatcher; the mappers feed Reducer A, Reducer B, Reducer C, Reducer D through a PartitionDispatcher]
  62. 62. File.stream!("source", :line) |> Flow.from_enumerable() |> Flow.flat_map(&String.split/1) |> Flow.partition() |> Flow.reduce(fn -> %{} end, fn word, map -> Map.update(map, word, 1, & &1 + 1) end) |> Enum.into(%{}) • reduce/3 collects all data into maps • when it is done, the maps are streamed • into/2 collects the state into a map
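The map, partition, reduce, and collect shape of this pipeline can be sketched with the standard library alone. This is not how Flow is implemented (Flow runs every step in GenStage processes and the mapping is concurrent too); here only the reducers run concurrently, via Task.async_stream/2. PartitionSketch and its functions are hypothetical names, and :erlang.phash2/2 is an assumed choice of hash for assigning words to partitions.

```elixir
defmodule PartitionSketch do
  def word_count(lines, partitions \\ 4) do
    lines
    # "Mapper" step: split each line into words (sequential in this sketch).
    |> Enum.flat_map(&String.split/1)
    # "Partition" step: a given word always hashes to the same partition,
    # so no two reducers ever count the same word.
    |> Enum.group_by(&:erlang.phash2(&1, partitions))
    |> Map.values()
    # "Reducer" step: each partition is counted in its own task.
    |> Task.async_stream(&count/1)
    # "Collect" step: the per-partition maps have disjoint keys, so a plain
    # merge is enough to build the final result.
    |> Enum.reduce(%{}, fn {:ok, counts}, acc -> Map.merge(acc, counts) end)
  end

  defp count(words) do
    Enum.reduce(words, %{}, fn word, map -> Map.update(map, word, 1, &(&1 + 1)) end)
  end
end

result = PartitionSketch.word_count(["roses are red", "violets are blue"])
IO.inspect(result["are"]) # prints 2
```

Without the partition step, two reducers could each hold a partial count for "are", which is the situation shown in the earlier diagrams where two stages hold overlapping maps.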
  63. 63. Windows and triggers • If reduce/3 runs until all data is processed… what happens on unbounded data? • See Flow.Window documentation.
  64. 64. Postlude: From eager, to lazy, to concurrent, to distributed
  65. 65. Enum (eager) File.read!("source") |> String.split("\n") |> Enum.flat_map(&String.split/1) |> Enum.reduce(%{}, fn word, map -> Map.update(map, word, 1, & &1 + 1) end)
  66. 66. Stream (lazy) File.stream!("source", :line) |> Stream.flat_map(&String.split/1) |> Enum.reduce(%{}, fn word, map -> Map.update(map, word, 1, & &1 + 1) end)
  67. 67. Flow (concurrent) File.stream!("source", :line) |> Flow.from_enumerable() |> Flow.flat_map(&String.split/1) |> Flow.partition() |> Flow.reduce(fn -> %{} end, fn word, map -> Map.update(map, word, 1, & &1 + 1) end) |> Enum.into(%{})
  68. 68. Flow features • Provides map and reduce operations, partitions, flow merge, flow join • Configurable batch size (max & min demand) • Data windowing with triggers and watermarks
  69. 69. Distributed? • Flow API has feature parity with frameworks like Apache Spark • However, there are no distribution or execution guarantees
  70. 70. CHEN, Y., ALSPAUGH, S., AND KATZ, R. Interactive analytical processing in big data systems: a cross-industry study of MapReduce workloads. “small inputs are common in practice: 40–80% of Cloudera customers’ MapReduce jobs and 70% of jobs in a Facebook trace have ≤ 1GB of input”
  71. 71. GOG, I., SCHWARZKOPF, M., CROOKS, N., GROSVENOR, M. P., CLEMENT, A., AND HAND, S. Musketeer: all for one, one for all in data processing systems. “For between 40-80% of the jobs submitted to MapReduce systems, you’d be better off just running them on a single machine”
  72. 72. Distributed? • Single machine matters - try it! • The gap between concurrent and distributed in Elixir is small • Durability concerns will be tackled next
  73. 73. Thank-yous
  74. 74. Library authors • Give GenStage a try • Build your own producers • TIP: use @behaviour GenStage if you want to make it an optional dependency
  75. 75. Inspirations • Akka Streams - back pressure contract • Apache Spark - map reduce API • Apache Beam - windowing model • Microsoft Naiad - stage notifications
  76. 76. The team • The Elixir Core Team • Especially James Fish (fishcakez) • And Eric and Chris for therapy sessions
  77. 77. Built and designed at [company logo: consulting and software engineering]
  78. 78. [company logo: consulting and software engineering] Elixir coaching • Elixir design review • Custom development
  79. 79. @elixirlang / elixir-lang.org
