This document summarizes how to build concurrent and parallel workflows using Elixir Flow. It shows how to transition from an eager sequential implementation to a lazy stream-based one, then introduce concurrency using GenStage. Finally, it demonstrates partitioning work into parallel tasks across multiple cores to achieve a 5x speedup. Key aspects covered include producers, consumers, and instrumentation of parallel jobs. Optimizing the partitioning strategy further improved performance by an additional 15x.
4. Producer
defmodule GenstageExample.Producer do
  use GenStage

  def start_link(initial \\ 0) do
    GenStage.start_link(__MODULE__, initial, name: __MODULE__)
  end

  def init(counter), do: {:producer, counter}

  def handle_demand(demand, state) do
    events = Enum.to_list(state..(state + demand - 1))
    {:noreply, events, state + demand}
  end
end
5. Producer-Consumer
defmodule GenstageExample.ProducerConsumer do
  use GenStage
  require Integer

  def start_link do
    GenStage.start_link(__MODULE__, :state_doesnt_matter, name: __MODULE__)
  end

  def init(state) do
    {:producer_consumer, state, subscribe_to: [GenstageExample.Producer]}
  end

  def handle_events(events, _from, state) do
    # Integer.is_even/1 is a guard macro, so it can't be captured with &/1;
    # wrapping the call in &(...) works.
    numbers = Enum.filter(events, &Integer.is_even(&1))
    {:noreply, numbers, state}
  end
end
6. Consumer
defmodule GenstageExample.Consumer do
  use GenStage

  def start_link do
    GenStage.start_link(__MODULE__, :state_doesnt_matter)
  end

  def init(state) do
    {:consumer, state, subscribe_to: [GenstageExample.ProducerConsumer]}
  end

  def handle_events(events, _from, state) do
    for event <- events do
      IO.inspect({self(), event, state})
    end

    # As a consumer we never emit events
    {:noreply, [], state}
  end
end
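Not shown on the slides: the three stages have to be started, in order, before demand begins to flow. A minimal sketch (not from the talk, and assuming the :gen_stage hex package and a standard Application supervisor):

```elixir
defmodule GenstageExample.Application do
  use Application

  def start(_type, _args) do
    children = [
      %{id: Producer, start: {GenstageExample.Producer, :start_link, [0]}},
      %{id: ProducerConsumer, start: {GenstageExample.ProducerConsumer, :start_link, []}},
      %{id: Consumer, start: {GenstageExample.Consumer, :start_link, []}}
    ]

    # Order matters: each stage subscribes to the one before it on init,
    # so producers must already be running when their subscribers start.
    Supervisor.start_link(children, strategy: :one_for_one)
  end
end
```

Once the consumer subscribes, it demands events, the producer-consumer forwards that demand upstream, and numbers start flowing.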
7. What is Flow?
Flow allows developers to express computations on collections, similar to the Enum and Stream modules, although computations will be executed in parallel using multiple GenStages.
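As a quick taste, here's the word-count example from the Flow documentation (it assumes the :flow hex package; the file path is a placeholder):

```elixir
# Count words in a file: each Flow step runs across multiple GenStage
# processes, and Flow.partition/1 routes equal words to the same stage
# so the per-stage reduce maps can be merged safely.
File.stream!("path/to/some/file")
|> Flow.from_enumerable()
|> Flow.flat_map(&String.split(&1, " "))
|> Flow.partition()
|> Flow.reduce(fn -> %{} end, fn word, acc ->
  Map.update(acc, word, 1, &(&1 + 1))
end)
|> Enum.to_list()
```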
Hi everyone!
I’m going to be talking about how you can easily build concurrent, parallel workflows using Flow, and I’ll be going into some of the details of instrumenting and tuning a Flow… flow?
I’m from Precision Nutrition – we do online nutrition and fitness coaching.
We’re mostly an Ember+Rails shop, but we’re starting to use Elixir more and more.
We’ve had our URL shortener (get.pn) running on Elixir in production for over a year, and we’re in the process of extracting all financial/billing logic out of our app into a separate Elixir/Phoenix payment processing system.
Today I’ll be going over a basic intro on Flow but I’ll also get into a specific case study of where we’ve used Flow at PN.
Before we get into Flow, it helps to understand GenStage because Flow is built on top of it.
Here’s José Valim – the creator of Elixir – presenting on GenStage. This is going to be a tad dry/painful, so bear with me, because Flow will make this super awesome.
The gist is that you have several “stages”, with work moving from stage to stage, but rather than each stage pushing completed work to the next stage, it works in reverse.
Each stage requests batches of work from the previous stage. GenStage calls this “Demand”.
Each stage can be a producer, a consumer or a producer consumer.
Flow is a useful subset of the Enum and Stream API.
This is great because it lets you use comfortable, high-level abstractions to write parallel, concurrent code.
Moving forward, let’s use a real world example of where Flow is handy.
At its core, our software is a learning management system – filled with content.
We also have a simple link checker written in elixir that we use to periodically check every link in our content.
This talk is about how I built the fastest link checker ever… and you can too!
The fact that Enum, Stream and Flow share a nearly common API means that you can more or less develop as you would normally (with a few caveats) and then later “Flow your code”.
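A minimal sketch of what “Flowing your code” can look like (the Flow version assumes the :flow hex package; the pipeline itself is illustrative, not the talk’s actual code):

```elixir
# Eager version: plain Enum, single process.
sum_eager =
  1..1_000
  |> Enum.map(&(&1 * 2))
  |> Enum.sum()

# "Flowed" version: same shape, but the map step runs across
# multiple GenStage processes. Enum.sum/1 consumes the flow,
# since Flow implements the Enumerable protocol.
sum_flow =
  1..1_000
  |> Flow.from_enumerable()
  |> Flow.map(&(&1 * 2))
  |> Enum.sum()
```

The two pipelines differ only in `Flow.from_enumerable/1` and the module name on the map step.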
So – why does it take 23 minutes?
We can see that it does all of each step before proceeding to the next.
It burns through all the CPU-bound steps, then gets stuck slowly checking each link, one by one.
Another way to see this is to instrument the code. There’s a great article by Tymon Tobolski on this, along with a simple “Progress” module that you can use to dump out timing for each step.
Using this, I can just choose a few points in the code and drop in a progress call.
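The Progress module itself isn’t reproduced on the slides; a hypothetical minimal version of the idea (names and API are illustrative, not the module from the article) might look like:

```elixir
defmodule Progress do
  # Create a named, public ETS table to hold running counts per label.
  def start do
    :ets.new(:progress, [:named_table, :public])
  end

  # Bump the counter for a label (inserting {label, 0} if absent)
  # and print a timestamped tab-separated line for later graphing.
  def incr(label, count \\ 1) do
    :ets.update_counter(:progress, label, count, {label, 0})
    IO.puts("#{System.monotonic_time(:millisecond)}\t#{label}\t#{count}")
  end
end
```

Dropped into a pipeline as, say, `|> Stream.each(fn _ -> Progress.incr(:fetched) end)`, it emits one line per element that can be graphed afterwards.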
A few caveats – avoid dropping rows.
If we graph the output we can see a huge spike, and that explains what we’re seeing when we look at system resource utilization.
The next step is to go from eager to lazy.
The Stream API is very similar, so for the most part this just involves swapping out an import or changing Enum references to Stream, as well as adjusting the input and output steps.
Interestingly enough, this actually takes longer, because we’re still doing everything in a single process. All we’ve done is reduce memory use, because we aren’t holding everything in memory at once.
And we can see that instead of a spike, we have these nice curves.
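The eager-to-lazy swap can be sketched like this (an illustrative two-step pipeline, not the talk’s actual code; `check_link` is a stand-in for the real HTTP check):

```elixir
check_link = fn url -> {url, :ok} end
urls = [" https://example.com/a ", " https://example.com/b "]

# Eager: each Enum step runs to completion, building a full
# intermediate list before the next step starts.
eager =
  urls
  |> Enum.map(&String.trim/1)
  |> Enum.map(check_link)

# Lazy: Stream composes the steps; elements flow through one at a
# time, and nothing runs until the stream is finally consumed.
lazy =
  urls
  |> Stream.map(&String.trim/1)
  |> Stream.map(check_link)
  |> Enum.to_list()
```

Both produce the same result; the lazy version just never holds a full intermediate list between steps.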
Before we run it, I want to tell you about FlowViz! I published this early in the week; it basically takes Tymon’s work, makes it more performant by using delayed writes, and wires it up with gnuplot to give real-time performance plotting.