PSUG #52
Dataflow and simplified reactive
programming with Akka-streams



Stéphane Manciot
26/02/2015
Plan
○ Dataflow main features
○ Reactive Streams
○ Akka-streams - Pipeline Dataflow
○ Akka-streams - working with graphs
Dataflow - main features
Dataflow Nodes and data
○ A node is a processing element that takes inputs,
does some operation and returns the results on its
outputs
○ Data (referred to as tokens) flows between nodes
through arcs
○ The only way for a node to send and receive data is
through ports
○ A port is the connection point between an arc and a
node
Dataflow Arcs
○ Always connects one output port to one input port
○ Has a capacity : the maximum number of tokens
that an arc can hold at any one time
○ Node activation requirements :
○ at least one token waiting on the input arc
○ space available on the output arc
○ Reading data from the input arc frees space on
that arc
○ Arcs may contain zero or more tokens, as long as
their number stays below the arc’s capacity
Dataflow graph
○ Explicitly states the connections between nodes
○ Dataflow graph execution :
○ Node activation requirements are fulfilled
○ A token is consumed from the input arc
○ The node executes
○ If needed, a new token is pushed to the output
arc
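
To make these activation rules concrete, here is a toy model in plain Scala (an illustration added for this write-up, not from the deck; the Arc and Node names are hypothetical):

import scala.collection.mutable

// An arc is a bounded FIFO holding at most `capacity` tokens
final class Arc[T](capacity: Int) {
  private val tokens = mutable.Queue.empty[T]
  def hasToken: Boolean = tokens.nonEmpty
  def hasSpace: Boolean = tokens.size < capacity
  def put(t: T): Unit = { require(hasSpace); tokens.enqueue(t) }
  def take(): T = tokens.dequeue() // reading a token frees space on the arc
}

// A node may fire only when a token waits on its input arc
// and space is available on its output arc
final class Node[A, B](in: Arc[A], out: Arc[B], f: A => B) {
  def canFire: Boolean = in.hasToken && out.hasSpace
  def fire(): Unit = if (canFire) out.put(f(in.take()))
}

val a = new Arc[Int](capacity = 4)
val b = new Arc[Int](capacity = 4)
val double = new Node(a, b, (x: Int) => x * 2)
a.put(21)
double.fire() // consumes 21 from a, pushes 42 to b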
Dataflow features - push or pull data
○ Push :
○ nodes send tokens to other nodes whenever they
are available
○ the data producer is in control and initiates
transmissions
○ Pull :
○ the demand flows upstream
○ allows a node to lazily produce an output only
when needed
○ the data consumer is in control (both styles are
sketched below)
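
To illustrate the two styles (an added sketch, not from the deck): a lazy Iterator models pull, a plain callback models push:

// Pull : the consumer is in control; nothing is produced until next() is called
val pulled: Iterator[Int] = Iterator.from(1).map { n => println(s"producing $n"); n }
pulled.take(2).toList // produces exactly two tokens, on demand

// Push : the producer is in control and initiates transmissions
def pushTokens(onToken: Int => Unit): Unit = (1 to 2).foreach(onToken)
pushTokens(n => println(s"received $n"))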
Dataflow features - mutable or immutable
data
○ splits require copying mutable data
○ Immutable data is preferred any time parallel
computations exist
Dataflow features - multiple inputs and/or
outputs
○ multiple inputs
○ node “firing rule”
○ ALL of its inputs have tokens (zip)
○ ANY of its inputs have tokens waiting (merge)
○ activation preconditions
○ a firing rule for the node must match the
available input tokens
○ space for new token(s) is available on the
outputs
○ multiple outputs
Dataflow features - compound nodes
○ the ability to create new nodes by using other nodes
○ a smaller data flow program
○ the interior nodes
○ should not know of anything outside of the
compound node
○ must be able to connect to the ports of the parent
compound node
Dataflow features - arc joins and/or splits
○ Fan-in : merge, zip, concat
○ Fan-out : broadcast, balance, unzip (both patterns are sketched below)
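
As a preview of the graph DSL used later in this deck, here is a sketch of fan-out and fan-in against the akka-streams 1.0 milestone API (package paths varied between milestones, so treat the imports as indicative):

import akka.actor.ActorSystem
import akka.stream.ActorFlowMaterializer
import akka.stream.scaladsl._

implicit val system = ActorSystem("FanInFanOut")
implicit val materializer = ActorFlowMaterializer()

FlowGraph { implicit builder =>
  import FlowGraphImplicits._

  val broadcast = Broadcast[Int] // fan-out : copies every token to each output
  val merge = Merge[Int]         // fan-in : forwards tokens from any input

  Source(1 to 3) ~> broadcast
  broadcast ~> Flow[Int].map(_ * 10)  ~> merge
  broadcast ~> Flow[Int].map(_ + 100) ~> merge
  merge ~> Sink.foreach[Int](println)
}.run()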
Reactive Streams
Reactive Streams - back-pressure
○ How to handle data across an asynchronous
boundary ?
○ Blocking calls : queue with a bounded size
Reactive Streams - back-pressure
○ How to handle data across an asynchronous
boundary ?
○ The Push way
○ unbounded buffer : Out Of Memory error
○ bounded buffer with NACK
Reactive Streams - back-pressure
○ How to handle data across an asynchronous
boundary ?
○ The reactive way : non-blocking, non-dropping &
bounded
○ data items flow downstream
○ demand flows upstream
○ data items flow only when there is demand
Reactive Streams - dynamic push / pull model
○ “push” behaviour when Subscriber is faster
○ “pull” behaviour when Publisher is faster
○ switches automatically between those behaviours
at runtime
○ batching demand allows batching data (see the toy
model below)
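
A toy model of this switching (an added sketch, not the Reactive Streams API): the consumer grants demand in batches, and the producer emits freely while demand is outstanding (“push”) but waits when it runs out (“pull”):

// Demand is granted by the consumer, in batches
var demand = 0L
val batch = 4

def consumerGrant(): Unit = demand += batch // demand flows upstream

// The producer may only emit while demand is outstanding
def producerEmit(elem: Int): Boolean =
  if (demand > 0) { demand -= 1; println(s"emit $elem"); true }
  else false

consumerGrant() // initial batch of demand
(1 to 6).foreach { n =>
  if (!producerEmit(n)) { // fast producer exhausted demand : "pull" mode
    consumerGrant()       // the slower consumer grants a new batch
    producerEmit(n)
  }
}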
Reactive Streams - back-pressure
Reactive Streams - back-pressure
Reactive Streams - SPI
trait Publisher[T] {
  def subscribe(sub: Subscriber[T]): Unit
}

trait Subscription {
  def requestMore(n: Int): Unit
  def cancel(): Unit
}

trait Subscriber[T] {
  def onSubscribe(s: Subscription): Unit
  def onNext(elem: T): Unit
  def onError(thr: Throwable): Unit
  def onComplete(): Unit
}
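
For illustration only (an added sketch, not part of the deck), a trivial Subscriber against the draft SPI above; note that this deck predates Reactive Streams 1.0, where requestMore(n: Int) was renamed request(n: Long):

// A Subscriber that prints every element, requesting tokens one at a time
class PrintSubscriber[T] extends Subscriber[T] {
  private var subscription: Subscription = _

  def onSubscribe(s: Subscription): Unit = {
    subscription = s
    s.requestMore(1) // signal initial demand upstream
  }

  def onNext(elem: T): Unit = {
    println(elem)
    subscription.requestMore(1) // ask for the next element
  }

  def onError(thr: Throwable): Unit = thr.printStackTrace()

  def onComplete(): Unit = println("stream completed")
}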

Reactive Streams - asynchronous non-blocking protocol
Akka Streams - Pipeline Dataflow
Akka Streams - Core Concepts
○ a DSL implementation of Reactive Streams relying
on Akka actors
○ Stream : a process that moves and/or transforms
data
○ Element : the processing unit of streams
○ Processing Stage : building blocks that build up a
Flow or FlowGraph (map(), filter(), transform(),
junctions …)
Akka Streams - Pipeline Dataflow
○ Source : a processing stage with exactly one
output, emitting data elements when downstream
processing stages are ready to receive them
○ Sink : a processing stage with exactly one input,
requesting and accepting data elements
○ Flow : a processing stage with exactly one input
and output, which connects its up- and downstream
by moving / transforming the data elements flowing
through it
○ Runnable Flow : a Flow that has both ends
“attached” to a Source and Sink respectively
○ Materialized Flow : a Runnable Flow that has been run
Akka Streams - Pipeline Dataflow
/**
 * Construct the ActorSystem we will use in our application
 */
implicit lazy val system = ActorSystem("DataFlow")

implicit val _ = ActorFlowMaterializer()

val source = Source(1 to 10)

val sink = Sink.fold[Int, Int](0)(_ + _)

val transform = Flow[Int].map(_ + 1).filter(_ % 2 == 0)

// connect the Source to the Sink, obtaining a runnable flow
val runnable: RunnableFlow = source.via(transform).to(sink)

// materialize the flow
val materialized: MaterializedMap = runnable.run()

implicit val dispatcher = system.dispatcher

// retrieve the result of the folding process over the stream
val result: Future[Int] = materialized.get(sink)
result.foreach(println)
Akka Streams - Pipeline Dataflow
○ Processing stages are immutable (connecting them
returns a new processing stage)
val source = Source(1 to 10)
source.map(_ => 0) // has no effect on source, since it’s immutable
source.runWith(Sink.fold(0)(_ + _)) // 55
○ A stream can be materialized multiple times
// connect the Source to the Sink, obtaining a RunnableFlow
val sink = Sink.fold[Int, Int](0)(_ + _)
val runnable: RunnableFlow = Source(1 to 10).to(sink)
// get the materialized value of the FoldSink
val sum1: Future[Int] = runnable.run().get(sink)
val sum2: Future[Int] = runnable.run().get(sink)
// sum1 and sum2 are different Futures!
for (a <- sum1; b <- sum2) yield println(a + b) // 110
○ Processing stages preserve input order of elements
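
One caveat worth adding (our note, not the deck’s): this guarantee holds for stages like map and mapAsync, whereas mapAsyncUnordered, which the bulk-export example below relies on, deliberately trades ordering for throughput. A minimal sketch against the same milestone API, assuming the implicit ActorSystem, ActorFlowMaterializer and dispatcher from the earlier example are in scope:

import scala.concurrent.Future

// map preserves order : this always prints 2, 4, 6
Source(1 to 3).map(_ * 2).runWith(Sink.foreach[Int](println))

// mapAsyncUnordered emits results as their Futures complete,
// so output order may differ from input order
Source(1 to 3)
  .mapAsyncUnordered(n => Future { Thread.sleep((3 - n) * 50L); n })
  .runWith(Sink.foreach[Int](println)) // may print 3, 2, 1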
Akka Streams - working with graphs
Akka Streams - bulk export to es
val bulkSize: Int = 100

val source: Source[String] = Source(types.toList)

val sum: Sink[Int] = FoldSink[Int, Int](0)(_ + _)

val g: FlowGraph = FlowGraph { implicit builder =>
  import FlowGraphImplicits._

  val walk: Flow[String, Companies2Export] = Flow[String]
    .map(fileTree(_).toList)
    .mapConcat[Companies2Export](identity)

  val parse: Flow[Companies2Export, List[Company]] = Flow[Companies2Export]
    .transform(() => new LoggingStage("ExportService"))
    .map[List[Company]](f => parseFile(f.source, index, f.`type`).toList)

  val flatten: Flow[List[Company], Company] = Flow[List[Company]].mapConcat[Company](identity)

  val group: Flow[Company, Seq[Company]] = Flow[Company].grouped(bulkSize)

  val toJson: Flow[Seq[Company], String] = Flow[Seq[Company]]
    .map(_.map(_.toBulkIndex(index.name)).mkString(crlf) + crlf)

  val bulkIndex: Flow[String, EsBulkResponse] = Flow[String]
    .mapAsyncUnordered[EsBulkResponse](esBulk(index.name, _))

  val count: Flow[EsBulkResponse, Int] = Flow[EsBulkResponse].map[Int]((b) => {
    logger.debug(s"index ${b.items.size} companies within ${b.took} ms")
    b.items.size
  })

  // connect the graph
  source ~> walk ~> parse ~> flatten ~> group ~> toJson ~> bulkIndex ~> count ~> sum
}

val result: Future[Int] = g.runWith(sum)

result.foreach(c => logger.info(s"*** total companies indexed: $c"))

result.onComplete(_aliases(index.name, before))
Akka Streams - bulk export to es
Akka Streams - frequent item set
Akka Streams - frequent item set
val g: FlowGraph = FlowGraph { implicit builder =>
  import FlowGraphImplicits._

  val source = Source.single(sorted)

  import com.sksamuel.elastic4s.ElasticDsl._

  val toJson = Flow[List[Seq[Long]]].map(_.map(seq => {
    val now = Calendar.getInstance().getTime
    val uuid = seq.mkString("-")
    update(uuid)
      .in(s"${esInputStore(store)}/CartCombination")
      .upsert(
        "uuid" -> uuid,
        "combinations" -> seq.map(_.toString),
        "counter" -> 1,
        "dateCreated" -> now,
        "lastUpdated" -> now)
      .script("ctx._source.counter += count;ctx._source.lastUpdated = now")
      .params("count" -> 1, "now" -> now)
  }))

  val flatten = Flow[List[BulkCompatibleDefinition]].mapConcat[BulkCompatibleDefinition](identity)

  val count = Flow[BulkResponse].map[Int]((resp) => {
    val nb = resp.getItems.length
    logger.debug(s"index $nb combinations within ${resp.getTookInMillis} ms")
    nb
  })

  import EsClient._

  source ~> combinationsFlow ~> toJson ~> flatten ~> bulkBalancedFlow() ~> count ~> sum
}

val start = new Date().getTime
val result: Future[Int] = g.runWith(sum)

result.foreach(c => logger.debug(s"*** $c combinations indexed within ${new Date().getTime - start} ms"))
Akka Streams - frequent item set
val combinationsFlow = Flow() { implicit b =>
  import FlowGraphImplicits._

  val undefinedSource = UndefinedSource[Seq[Long]]
  val undefinedSink = UndefinedSink[List[Seq[Long]]]

  undefinedSource ~> Flow[Seq[Long]].map[List[Seq[Long]]](s => {
    val combinations: ListBuffer[Seq[Long]] = ListBuffer.empty
    1 to s.length foreach { i => combinations ++= s.combinations(i).toList }
    combinations.toList
  }) ~> undefinedSink

  (undefinedSource, undefinedSink)
}

def bulkBalancedFlow(bulkSize: Int = Settings.ElasticSearch.bulkSize, balanceSize: Int = 2) =
  Flow() { implicit b =>
    import FlowGraphImplicits._

    val in = UndefinedSource[BulkCompatibleDefinition]
    val group = Flow[BulkCompatibleDefinition].grouped(bulkSize)
    val bulkUpsert = Flow[Seq[BulkCompatibleDefinition]].map(bulk)
    val out = UndefinedSink[BulkResponse]

    if (balanceSize > 1) {
      val balance = Balance[Seq[BulkCompatibleDefinition]]
      val merge = Merge[BulkResponse]

      in ~> group ~> balance
      1 to balanceSize foreach { _ =>
        balance ~> bulkUpsert ~> merge
      }
      merge ~> out
    } else {
      in ~> group ~> bulkUpsert ~> out
    }

    (in, out)
  }
Akka Streams - frequent item set
Akka Streams - frequent item set
def fis(store: String, productId: String, frequency: Double = 0.2): Future[(Seq[String], Seq[String])] = {
  implicit val _ = ActorFlowMaterializer()

  import com.mogobiz.run.learning._

  val source = Source.single((productId, frequency))

  val extract = Flow[(Option[CartCombination], Option[CartCombination])].map((x) => {
    (x._1.map(_.combinations.toSeq).getOrElse(Seq.empty),
     x._2.map(_.combinations.toSeq).getOrElse(Seq.empty))
  })

  val exclude = Flow[(Seq[String], Seq[String])]
    .map(x => (x._1.filter(_ != productId), x._2.filter(_ != productId)))

  val head = Sink.head[(Seq[String], Seq[String])]

  val runnable: RunnableFlow = source
    .transform(() => new LoggingStage[(String, Double)]("Learning"))
    .via(frequentItemSets(store))
    .via(extract)
    .via(exclude)
    .to(head)

  runnable.run().get(head)
}

def frequentItemSets(store: String) = Flow() { implicit builder =>
  import FlowGraphImplicits._

  val undefinedSource = UndefinedSource[(String, Double)]
  val broadcast = Broadcast[Option[(String, Long)]]
  val zip = Zip[Option[CartCombination], Option[CartCombination]]
  val undefinedSink = UndefinedSink[(Option[CartCombination], Option[CartCombination])]

  undefinedSource ~> calculateSupportThreshold(store) ~> broadcast
  broadcast ~> loadMostFrequentItemSet(store) ~> zip.left
  broadcast ~> loadLargerFrequentItemSet(store) ~> zip.right
  zip.out ~> undefinedSink

  (undefinedSource, undefinedSink)
}
Akka Streams - frequent item set
def calculateSupportThreshold(store: String) = Flow[(String, Double)].map { tuple =>
  load[CartCombination](esInputStore(store), tuple._1).map { x =>
    Some(tuple._1, Math.ceil(x.counter * tuple._2).toLong)
  }.getOrElse(None)
}

def loadMostFrequentItemSet(store: String) = Flow[Option[(String, Long)]].map(_.map((x) =>
  searchAll[CartCombination](cartCombinations(store, x) sort {
    by field "counter" order SortOrder.DESC
  }).headOption).getOrElse(None))

def loadLargerFrequentItemSet(store: String) = Flow[Option[(String, Long)]].map(_.map((x) =>
  searchAll[CartCombination](cartCombinations(store, x) sort {
    by script "doc['combinations'].values.size()" order SortOrder.DESC
  }).headOption).getOrElse(None))