PSUG #52
Dataflow and simplified reactive
programming with Akka-streams



Stéphane Manciot
26/02/2015
Plan
○ Dataflow main features
○ Reactive Streams
○ Akka-streams - Pipeline Dataflow
○ Akka-streams - working with graphs
Dataflow - main features
Dataflow Nodes and data
○ A node is a processing element that takes inputs,
does some operation and returns the results on its
outputs
○ Data (referred to as tokens) flows between nodes
through arcs
○ The only way for a node to send and receive data is
through ports
○ A port is the connection point between an arc and a
node
Dataflow Arcs
○ Always connects one output port to one input port
○ Has a capacity: the maximum number of tokens
that an arc can hold at any one time
○ Node activation requirements:
  ○ at least one token waiting on the input arc
  ○ space available on the output arc
○ Reading data from the input arc frees space on
that arc
○ An arc may hold zero or more tokens, as long as
the count does not exceed the arc’s capacity
Dataflow graph
○ Explicitly states the connections between nodes
○ Dataflow graph execution:
  ○ Node activation requirements are fulfilled
  ○ A token is consumed from the input arc
  ○ The node executes
  ○ If needed, a new token is pushed to the output arc
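
The arc capacity, the activation requirements and the execution steps can be sketched in plain Scala, with no Akka dependency. The names `Arc`, `DataflowSketch`, `fire` and `runUntilBlocked` are hypothetical and exist only for this illustration; a real dataflow runtime would schedule nodes concurrently.

```scala
import scala.collection.mutable

// Hypothetical bounded arc: a FIFO that never holds more than `capacity` tokens.
final class Arc[T](val capacity: Int) {
  require(capacity > 0, "capacity must be positive")
  private val tokens = mutable.Queue.empty[T]

  def hasToken: Boolean = tokens.nonEmpty
  def hasSpace: Boolean = tokens.size < capacity

  // Refuses the token (returns false) rather than exceeding the capacity.
  def offer(token: T): Boolean =
    if (hasSpace) { tokens.enqueue(token); true } else false

  // Reading a token frees one slot on this arc.
  def poll(): Option[T] =
    if (hasToken) Some(tokens.dequeue()) else None
}

object DataflowSketch {
  // Activation requirements: a token on the input arc, space on the output arc.
  def canFire[A, B](in: Arc[A], out: Arc[B]): Boolean = in.hasToken && out.hasSpace

  // One execution step: consume a token, run the node, push the result.
  def fire[A, B](in: Arc[A], out: Arc[B])(node: A => B): Boolean =
    if (canFire(in, out)) {
      in.poll().foreach(token => out.offer(node(token)))
      true
    } else false

  // Fires the node until it is blocked; returns how many times it fired.
  def runUntilBlocked[A, B](in: Arc[A], out: Arc[B])(node: A => B): Int = {
    var firings = 0
    while (fire(in, out)(node)) firings += 1
    firings
  }
}
```

With three tokens waiting on the input arc and an output arc of capacity 2, the node fires twice and then stays blocked until a token is read from the output arc.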
Dataflow features - push or pull data
○ Push:
  ○ nodes send tokens to other nodes as soon as the tokens are available
  ○ the data producer is in control and initiates transmissions
○ Pull:
  ○ the demand flows upstream
  ○ allows a node to lazily produce an output only when needed
  ○ the data consumer is in control
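
The difference between the two models can be shown with plain Scala collections. This is only a sketch: `PushPullSketch`, `push`, `pull` and `demo` are hypothetical names, a callback stands in for a pushing producer and a lazy `Iterator` stands in for a pulled one.

```scala
object PushPullSketch {
  // Push: the producer is in control and calls the consumer for every token.
  def push[T](tokens: Seq[T])(consumer: T => Unit): Unit =
    tokens.foreach(consumer)

  // Pull: the consumer is in control; a token is only produced when next() is called.
  def pull[T](produce: Int => T): Iterator[T] =
    Iterator.from(0).map(produce)

  def demo(): (List[Int], List[Int], Int) = {
    val pushed = List.newBuilder[Int]
    push(Seq(1, 2, 3))(token => pushed += token)

    var produced = 0
    val source = pull { i => produced += 1; i * i }
    val pulled = source.take(2).toList // the consumer asks for two tokens only

    (pushed.result(), pulled, produced)
  }
}
```

In `demo`, the pulled source could produce tokens forever, but only the two requested squares are ever computed.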
Dataflow features - mutable or immutable
data
○ splits require copying mutable data
○ Immutable data is preferred any time parallel
computations exist
Dataflow features - multiple inputs and/or
outputs
○ multiple inputs
  ○ node “firing rule”
    ○ ALL of its inputs have tokens (zip)
    ○ ANY of its inputs have tokens waiting (merge)
  ○ activation preconditions
    ○ a firing rule for the node must match the available input tokens
    ○ space for new token(s) is available on the outputs
○ multiple outputs
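
The two firing rules can be sketched over plain mutable queues standing in for input arcs. `FiringRules`, `zipStep` and `mergeStep` are hypothetical names; the check for free space on the output is left out to keep the sketch short, and `mergeStep` simply prefers the first non-empty input, which is an assumption rather than a rule from the slides.

```scala
import scala.collection.mutable

object FiringRules {
  // "ALL" rule: the node fires only when every input holds a token (zip).
  def zipStep[A, B](left: mutable.Queue[A], right: mutable.Queue[B]): Option[(A, B)] =
    if (left.nonEmpty && right.nonEmpty) Some((left.dequeue(), right.dequeue()))
    else None

  // "ANY" rule: the node fires as soon as one input holds a token (merge).
  def mergeStep[A](inputs: Seq[mutable.Queue[A]]): Option[A] =
    inputs.find(_.nonEmpty).map(_.dequeue())
}
```

A zip node with tokens on only one input does not fire, while a merge node in the same situation does.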
Dataflow features - compound nodes
○ the ability to create new nodes by using other nodes
○ a smaller data flow program
○ the interior nodes
  ○ should not know of anything outside of the compound node
  ○ must be able to connect to the ports of the parent compound node
Dataflow features - arc joins and/or splits
○ Fan-in : merge, zip, concat
○ Fan-out : broadcast, balance, unzip
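
As a rough guide to what each junction does, here is a sketch on finite Scala sequences instead of streams. The object `Junctions` is hypothetical. Two simplifications are assumptions of this sketch: `balance` distributes round-robin, and `merge` alternates strictly between its inputs, whereas on real streams both depend on demand and arrival order.

```scala
object Junctions {
  // Fan-out: every output receives every token.
  def broadcast[T](in: Seq[T], outputs: Int): Seq[Seq[T]] =
    Seq.fill(outputs)(in)

  // Fan-out: each token goes to exactly one output.
  // Assumption: round-robin stands in for "whichever output has demand".
  def balance[T](in: Seq[T], outputs: Int): Seq[Seq[T]] =
    (0 until outputs).map { o =>
      in.zipWithIndex.collect { case (token, i) if i % outputs == o => token }
    }

  // Fan-out: a stream of pairs is split into two streams.
  def unzip[A, B](in: Seq[(A, B)]): (Seq[A], Seq[B]) = in.unzip

  // Fan-in: pairs up tokens, one from each input.
  def zip[A, B](left: Seq[A], right: Seq[B]): Seq[(A, B)] = left.zip(right)

  // Fan-in: all of the first input, then all of the second.
  def concat[T](first: Seq[T], second: Seq[T]): Seq[T] = first ++ second

  // Fan-in: tokens from both inputs interleaved.
  // Assumption: strict alternation stands in for a nondeterministic order.
  def merge[T](left: Seq[T], right: Seq[T]): Seq[T] = {
    val l = left.map(t => Option(t))
    val r = right.map(t => Option(t))
    l.zipAll(r, None, None).flatMap { case (a, b) => a.toList ++ b.toList }
  }
}
```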
Reactive Streams
Reactive Streams - back-pressure
○ How to handle data across an asynchronous
boundary?
○ Blocking calls : queue with a bounded size
Reactive Streams - back-pressure
○ How to handle data across an asynchronous
boundary?
○ The Push way
  ○ unbounded buffer: Out Of Memory error
  ○ bounded buffer with NACK
Reactive Streams - back-pressure
○ How to handle data across an asynchronous
boundary?
○ The reactive way: non-blocking, non-dropping &
bounded
  ○ data items flow downstream
  ○ demand flows upstream
  ○ data items flow only when there is demand
Reactive Streams - dynamic push / pull model
○ “push” behaviour when Subscriber is faster
○ “pull” behaviour when Publisher is faster
○ switches automatically between those behaviours at
runtime
○ batching demand allows batching data
Reactive Streams - SPI
trait Publisher[T] {
  def subscribe(sub: Subscriber[T]): Unit
}

trait Subscription {
  def requestMore(n: Int): Unit
  def cancel(): Unit
}

trait Subscriber[T] {
  def onSubscribe(s: Subscription): Unit
  def onNext(elem: T): Unit
  def onError(thr: Throwable): Unit
  def onComplete(): Unit
}
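
To see how demand drives these three traits, here is a minimal synchronous sketch. The traits are repeated inside the hypothetical object `SpiSketch` so that the block is self-contained; `SeqPublisher` and `BatchingSubscriber` are hypothetical names. A real implementation would be asynchronous and would have to follow the full Reactive Streams rules, which this sketch does not attempt.

```scala
import scala.collection.mutable.ListBuffer

object SpiSketch {
  // The three traits as shown on the slide.
  trait Publisher[T] { def subscribe(sub: Subscriber[T]): Unit }
  trait Subscription {
    def requestMore(n: Int): Unit
    def cancel(): Unit
  }
  trait Subscriber[T] {
    def onSubscribe(s: Subscription): Unit
    def onNext(elem: T): Unit
    def onError(thr: Throwable): Unit
    def onComplete(): Unit
  }

  // Hypothetical publisher: emits elements only while there is demand.
  final class SeqPublisher[T](elems: Seq[T]) extends Publisher[T] {
    def subscribe(sub: Subscriber[T]): Unit = {
      val it = elems.iterator
      var done = false
      sub.onSubscribe(new Subscription {
        def requestMore(n: Int): Unit = {
          var remaining = n
          while (!done && remaining > 0 && it.hasNext) {
            remaining -= 1
            sub.onNext(it.next())
          }
          if (!done && !it.hasNext) { done = true; sub.onComplete() }
        }
        def cancel(): Unit = { done = true }
      })
    }
  }

  // Hypothetical subscriber: signals demand in batches of `batch` elements.
  final class BatchingSubscriber[T](batch: Int) extends Subscriber[T] {
    val received = ListBuffer.empty[T]
    var requests = 0
    var completed = false
    var failure: Option[Throwable] = None
    private var subscription: Option[Subscription] = None
    private var outstanding = 0

    private def ask(): Unit = {
      requests += 1
      outstanding = batch
      subscription.foreach(_.requestMore(batch))
    }

    def onSubscribe(s: Subscription): Unit = { subscription = Some(s); ask() }
    def onNext(elem: T): Unit = {
      received += elem
      outstanding -= 1
      if (outstanding == 0) ask() // new demand only once the batch is used up
    }
    def onError(thr: Throwable): Unit = { failure = Some(thr) }
    def onComplete(): Unit = { completed = true }
  }
}
```

Subscribing a `BatchingSubscriber` with a batch size of 2 to a publisher of five elements results in three demand signals: data items flow downstream only after demand has flowed upstream.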

Reactive Streams - asynchronous non-blocking protocol
Akka Streams - Pipeline Dataflow
Akka Streams - Core Concepts
○ a DSL implementation of Reactive Streams relying
on Akka actors
○ Stream: a process that involves moving and/or transforming
data
○ Element : the processing unit of streams
○ Processing Stage : building blocks that build up a
Flow or FlowGraph (map(), filter(), transform(),
junctions …)
Akka Streams - Pipeline Dataflow
○ Source : a processing stage with exactly one
output, emitting data elements when downstream
processing stages are ready to receive them
○ Sink : a processing stage with exactly one input,
requesting and accepting data elements
○ Flow : a processing stage with exactly one input
and output, which connects its up- and downstream
by moving / transforming the data elements flowing
through it
○ Runnable Flow : a Flow that has both ends
“attached” to a Source and Sink respectively
○ Materialized Flow: a Runnable Flow that has been run
Akka Streams - Pipeline Dataflow
// imports assumed for the pre-1.0 akka-stream milestone API used on these slides
import akka.actor.ActorSystem
import akka.stream.ActorFlowMaterializer
import akka.stream.scaladsl._
import scala.concurrent.Future

/**
 * Construct the ActorSystem we will use in our application
 */
implicit lazy val system = ActorSystem("DataFlow")

// the materializer turns stream blueprints into running actors
implicit val materializer = ActorFlowMaterializer()

val source = Source(1 to 10)

val sink = Sink.fold[Int, Int](0)(_ + _)

val transform = Flow[Int].map(_ + 1).filter(_ % 2 == 0)

// connect the Source to the Sink, obtaining a runnable flow
val runnable: RunnableFlow = source.via(transform).to(sink)

// materialize the flow
val materialized: MaterializedMap = runnable.run()

implicit val dispatcher = system.dispatcher

// retrieve the result of the folding process over the stream
val result: Future[Int] = materialized.get(sink)
result.foreach(println) // 2 + 4 + 6 + 8 + 10 = 30
Akka Streams - Pipeline Dataflow
○ Processing stages are immutable (connecting them
returns a new processing stage)
val source = Source(1 to 10)
source.map(_ => 0) // has no effect on source, since it’s immutable
source.runWith(Sink.fold[Int, Int](0)(_ + _)) // 55
○ A stream can be materialized multiple times
// connect the Source to the Sink, obtaining a RunnableFlow
val sink = Sink.fold[Int, Int](0)(_ + _)
val runnable: RunnableFlow = Source(1 to 10).to(sink)

// get the materialized value of the FoldSink
val sum1: Future[Int] = runnable.run().get(sink)
val sum2: Future[Int] = runnable.run().get(sink)

// sum1 and sum2 are different Futures!
for (a <- sum1; b <- sum2) yield println(a + b) // 110
○ Processing stages preserve input order of elements
Akka Streams - working with graphs
Akka Streams - bulk export to es
val bulkSize: Int = 100

val source: Source[String] = Source(types.toList)

val sum: Sink[Int] = FoldSink[Int, Int](0)(_ + _)

val g: FlowGraph = FlowGraph { implicit builder =>
  import FlowGraphImplicits._

  val walk: Flow[String, Companies2Export] = Flow[String]
    .map(fileTree(_).toList)
    .mapConcat[Companies2Export](identity)

  val parse: Flow[Companies2Export, List[Company]] = Flow[Companies2Export]
    .transform(() => new LoggingStage("ExportService"))
    .map[List[Company]](f => parseFile(f.source, index, f.`type`).toList)

  val flatten: Flow[List[Company], Company] = Flow[List[Company]].mapConcat[Company](identity)

  val group: Flow[Company, Seq[Company]] = Flow[Company].grouped(bulkSize)

  val toJson: Flow[Seq[Company], String] = Flow[Seq[Company]]
    .map(_.map(_.toBulkIndex(index.name)).mkString(crlf) + crlf)

  val bulkIndex: Flow[String, EsBulkResponse] = Flow[String]
    .mapAsyncUnordered[EsBulkResponse](esBulk(index.name, _))

  val count: Flow[EsBulkResponse, Int] = Flow[EsBulkResponse].map[Int]((b) => {
    logger.debug(s"index ${b.items.size} companies within ${b.took} ms")
    b.items.size
  })

  // connect the graph
  source ~> walk ~> parse ~> flatten ~> group ~> toJson ~> bulkIndex ~> count ~> sum
}

val result: Future[Int] = g.runWith(sum)

result.foreach(c => logger.info(s"*** total companies indexed: $c"))

result.onComplete(_aliases(index.name, before))
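
The `group` and `toJson` stages above amount to cutting the documents into bulks and joining each bulk into one body. A sketch of that step on a plain sequence, with hypothetical names (`BulkSketch`, `bulkBodies`) and with the line separator taken as an assumption, since the value of `crlf` is not shown on the slide:

```scala
object BulkSketch {
  // Cuts the documents into bulks of at most `bulkSize` and renders each bulk
  // as one body in which every document is followed by the separator.
  // Assumption: "\n" as separator; the slide's `crlf` value is not shown.
  def bulkBodies(docs: Seq[String], bulkSize: Int, separator: String = "\n"): List[String] =
    docs.grouped(bulkSize).map(_.mkString(separator) + separator).toList
}
```

Three documents with a bulk size of 2 give two bodies: one with two documents and one with the remaining document.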
Akka Streams - frequent item set
val g: FlowGraph = FlowGraph { implicit builder =>
  import FlowGraphImplicits._

  val source = Source.single(sorted)

  import com.sksamuel.elastic4s.ElasticDsl._

  val toJson = Flow[List[Seq[Long]]].map(_.map(seq => {
    val now = Calendar.getInstance().getTime
    val uuid = seq.mkString("-")
    update(uuid)
      .in(s"${esInputStore(store)}/CartCombination")
      .upsert(
        "uuid" -> uuid,
        "combinations" -> seq.map(_.toString),
        "counter" -> 1,
        "dateCreated" -> now,
        "lastUpdated" -> now)
      .script("ctx._source.counter += count;ctx._source.lastUpdated = now")
      .params("count" -> 1, "now" -> now)
  }))

  val flatten = Flow[List[BulkCompatibleDefinition]].mapConcat[BulkCompatibleDefinition](identity)

  val count = Flow[BulkResponse].map[Int]((resp) => {
    val nb = resp.getItems.length
    logger.debug(s"index $nb combinations within ${resp.getTookInMillis} ms")
    nb
  })

  import EsClient._

  source ~> combinationsFlow ~> toJson ~> flatten ~> bulkBalancedFlow() ~> count ~> sum
}

val start = new Date().getTime
val result: Future[Int] = g.runWith(sum)

result.foreach(c => logger.debug(s"*** $c combinations indexed within ${new Date().getTime - start} ms"))
Akka Streams - frequent item set
val combinationsFlow = Flow() { implicit b =>
  import FlowGraphImplicits._

  val undefinedSource = UndefinedSource[Seq[Long]]
  val undefinedSink = UndefinedSink[List[Seq[Long]]]

  undefinedSource ~> Flow[Seq[Long]].map[List[Seq[Long]]](s => {
    val combinations: ListBuffer[Seq[Long]] = ListBuffer.empty
    1 to s.length foreach { i => combinations ++= s.combinations(i).toList }
    combinations.toList
  }) ~> undefinedSink

  (undefinedSource, undefinedSink)
}
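
The mapping inside `combinationsFlow` can be tried on its own, outside of any stream. The sketch below uses a hypothetical object `CombinationsSketch` and the same standard-library `combinations` call. For a cart of n distinct product ids it yields 2^n - 1 combinations, so the output grows quickly with the cart size.

```scala
object CombinationsSketch {
  // All non-empty combinations of a cart's product ids, smallest subsets first.
  def allCombinations(cart: Seq[Long]): List[Seq[Long]] =
    (1 to cart.length).flatMap(i => cart.combinations(i)).toList

  // The document id used for a combination in the graph above.
  def uuid(combination: Seq[Long]): String = combination.mkString("-")
}
```

For the cart `Seq(1L, 2L, 3L)` this gives the seven combinations 1, 2, 3, 1-2, 1-3, 2-3 and 1-2-3.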
def bulkBalancedFlow(bulkSize: Int = Settings.ElasticSearch.bulkSize, balanceSize: Int = 2) =
  Flow() { implicit b =>
    import FlowGraphImplicits._

    val in = UndefinedSource[BulkCompatibleDefinition]
    val group = Flow[BulkCompatibleDefinition].grouped(bulkSize)
    val bulkUpsert = Flow[Seq[BulkCompatibleDefinition]].map(bulk)
    val out = UndefinedSink[BulkResponse]

    if (balanceSize > 1) {
      val balance = Balance[Seq[BulkCompatibleDefinition]]
      val merge = Merge[BulkResponse]

      in ~> group ~> balance
      1 to balanceSize foreach { _ =>
        balance ~> bulkUpsert ~> merge
      }
      merge ~> out
    } else {
      in ~> group ~> bulkUpsert ~> out
    }

    (in, out)
  }
Akka Streams - frequent item set
def fis(store: String, productId: String, frequency: Double = 0.2): Future[(Seq[String], Seq[String])] = {
  implicit val materializer = ActorFlowMaterializer()

  import com.mogobiz.run.learning._

  val source = Source.single((productId, frequency))

  val extract = Flow[(Option[CartCombination], Option[CartCombination])].map((x) => {
    (
      x._1.map(_.combinations.toSeq).getOrElse(Seq.empty),
      x._2.map(_.combinations.toSeq).getOrElse(Seq.empty))
  })

  val exclude = Flow[(Seq[String], Seq[String])]
    .map(x => (x._1.filter(_ != productId), x._2.filter(_ != productId)))

  val head = Sink.head[(Seq[String], Seq[String])]

  val runnable: RunnableFlow = source
    .transform(() => new LoggingStage[(String, Double)]("Learning"))
    .via(frequentItemSets(store))
    .via(extract)
    .via(exclude)
    .to(head)

  runnable.run().get(head)
}

def frequentItemSets(store: String) = Flow() { implicit builder =>
  import FlowGraphImplicits._

  val undefinedSource = UndefinedSource[(String, Double)]
  val broadcast = Broadcast[Option[(String, Long)]]
  val zip = Zip[Option[CartCombination], Option[CartCombination]]
  val undefinedSink = UndefinedSink[(Option[CartCombination], Option[CartCombination])]

  undefinedSource ~> calculateSupportThreshold(store) ~> broadcast
  broadcast ~> loadMostFrequentItemSet(store) ~> zip.left
  broadcast ~> loadLargerFrequentItemSet(store) ~> zip.right
  zip.out ~> undefinedSink

  (undefinedSource, undefinedSink)
}
Akka Streams - frequent item set
// support threshold = counter of the product's own combination * frequency, rounded up
def calculateSupportThreshold(store: String) = Flow[(String, Double)].map { tuple =>
  load[CartCombination](esInputStore(store), tuple._1).map { x =>
    Some((tuple._1, Math.ceil(x.counter * tuple._2).toLong))
  }.getOrElse(None)
}

// most frequent item set: highest counter first
def loadMostFrequentItemSet(store: String) = Flow[Option[(String, Long)]].map(_.map((x) =>
  searchAll[CartCombination](cartCombinations(store, x) sort {
    by field "counter" order SortOrder.DESC
  }).headOption).getOrElse(None))

// larger item set: most combined items first
def loadLargerFrequentItemSet(store: String) = Flow[Option[(String, Long)]].map(_.map((x) =>
  searchAll[CartCombination](cartCombinations(store, x) sort {
    by script "doc['combinations'].values.size()" order SortOrder.DESC
  }).headOption).getOrElse(None))
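
The arithmetic in `calculateSupportThreshold` can be checked in isolation. The sketch below uses a hypothetical object `SupportSketch` and assumes the counter is an integral count of carts containing the product.

```scala
object SupportSketch {
  // Threshold = counter * frequency, rounded up to the next whole number.
  def supportThreshold(counter: Long, frequency: Double): Long =
    Math.ceil(counter * frequency).toLong
}
```

With the default frequency of 0.2, a product seen in 42 carts gets a threshold of 9, since 42 * 0.2 = 8.4 is rounded up.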




De Maven à SBT ScalaIO 2013De Maven à SBT ScalaIO 2013
De Maven à SBT ScalaIO 2013Stephane Manciot
 

More from Stephane Manciot (6)

Des principes de la démarche DevOps à sa mise en oeuvre
Des principes de la démarche DevOps à sa mise en oeuvreDes principes de la démarche DevOps à sa mise en oeuvre
Des principes de la démarche DevOps à sa mise en oeuvre
 
Packaging et déploiement d'une application avec Docker et Ansible @DevoxxFR 2015
Packaging et déploiement d'une application avec Docker et Ansible @DevoxxFR 2015Packaging et déploiement d'une application avec Docker et Ansible @DevoxxFR 2015
Packaging et déploiement d'une application avec Docker et Ansible @DevoxxFR 2015
 
DevOps avec Ansible et Docker
DevOps avec Ansible et DockerDevOps avec Ansible et Docker
DevOps avec Ansible et Docker
 
Docker / Ansible
Docker / AnsibleDocker / Ansible
Docker / Ansible
 
Ansible - Introduction
Ansible - IntroductionAnsible - Introduction
Ansible - Introduction
 
De Maven à SBT ScalaIO 2013
De Maven à SBT ScalaIO 2013De Maven à SBT ScalaIO 2013
De Maven à SBT ScalaIO 2013
 

Recently uploaded

Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 

Recently uploaded (20)

Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 

PSUG #52 Dataflow and simplified reactive programming with Akka-streams

  • 1. PSUG #52 Dataflow and simplified reactive programming with Akka-streams
 
 Stéphane Manciot 26/02/2015
  • 2. Plan ○ Dataflow main features ○ Reactive Streams ○ Akka-streams - Pipeline Dataflow ○ Akka-streams - working with graphs
  • 3. Dataflow - main features
  • 4. Dataflow Nodes and data ○ A node is a processing element that takes inputs, does some operation and returns the results on its outputs ○ Data (referred to as tokens) flows between nodes through arcs ○ The only way for a node to send and receive data is through ports ○ A port is the connection point between an arc and a node
  • 5. Dataflow Arcs ○ Always connects one output port to one input port ○ Has a capacity : the maximum number of tokens that an arc can hold at any one time ○ Node activation requirements : ○ at least one token waiting on the input arc ○ space available on the output arc ○ Reading data from the input arc frees space on that arc ○ An arc may hold zero or more tokens, as long as the count does not exceed the arc’s capacity
  • 6. Dataflow graph ○ Explicitly states the connections between nodes ○ Dataflow graph execution : ○ Node activation requirements are fulfilled ○ A token is consumed from the input arc ○ The node executes ○ If needed, a new token is pushed to the output arc
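The activation requirements and the execution steps above can be sketched in plain Scala. This is a minimal illustration, not part of any library: `Arc`, `Node`, `fire` and `ArcDemo` are hypothetical names, and the model is single-threaded.

```scala
import scala.collection.mutable

// Hypothetical model of a bounded arc: it never holds more than `capacity` tokens.
final class Arc[T](val capacity: Int) {
  private val tokens = mutable.Queue.empty[T]
  def hasToken: Boolean = tokens.nonEmpty
  def hasSpace: Boolean = tokens.size < capacity
  def size: Int = tokens.size
  // Returns false (and drops nothing) when the arc is already full.
  def push(t: T): Boolean = if (hasSpace) { tokens.enqueue(t); true } else false
  // Reading a token frees one slot on this arc.
  def pull(): T = tokens.dequeue()
}

// Hypothetical node with one input arc and one output arc.
final class Node[A, B](in: Arc[A], out: Arc[B])(f: A => B) {
  // Activation requirements: a token is waiting and the output arc has space.
  def canFire: Boolean = in.hasToken && out.hasSpace
  // Consume one token, execute, push the result. Returns false when not activated.
  def fire(): Boolean =
    if (canFire) { out.push(f(in.pull())); true } else false
}

object ArcDemo {
  val input  = new Arc[Int](capacity = 2)
  val output = new Arc[Int](capacity = 1)
  val double = new Node[Int, Int](input, output)(_ * 2)

  input.push(1)
  input.push(2)
  val firstFire: Boolean  = double.fire() // activated: token waiting, output has space
  val secondFire: Boolean = double.fire() // not activated: the output arc is full
}
```

The second call does not fire because the output arc is still full, which is the capacity rule acting as a simple form of back-pressure.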
  • 7. Dataflow features - push or pull data ○ Push : ○ nodes send tokens to other nodes as soon as the tokens are available ○ the data producer is in control and initiates transmissions ○ Pull : ○ the demand flows upstream ○ allows a node to lazily produce an output only when needed ○ the data consumer is in control
  • 8. Dataflow features - mutable or immutable data ○ splits require copying mutable data ○ Immutable data is preferred any time parallel computations exist
  • 9. Dataflow features - multiple inputs and/or outputs ○ multiple inputs ○ node “firing rule” ○ ALL of its inputs have tokens (zip) ○ ANY of its inputs have tokens waiting (merge) ○ activation preconditions ○ a firing rule for the node must match the available input tokens ○ space for new token(s) is available on the outputs ○ multiple outputs
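The two firing rules can be illustrated on in-memory queues. This is a sketch under stated assumptions: `FiringRules`, `zipStep` and `mergeStep` are hypothetical helpers, and the merge shown here arbitrarily prefers the left input.

```scala
import scala.collection.immutable.Queue

object FiringRules {
  // ALL rule (zip): fires only when both inputs hold a token.
  // Returns the emitted pair plus the remaining queues, or None if the node cannot fire.
  def zipStep[A, B](l: Queue[A], r: Queue[B]): Option[((A, B), Queue[A], Queue[B])] =
    (l.dequeueOption, r.dequeueOption) match {
      case (Some((a, l2)), Some((b, r2))) => Some(((a, b), l2, r2))
      case _                              => None
    }

  // ANY rule (merge): fires as soon as either input holds a token.
  def mergeStep[A](l: Queue[A], r: Queue[A]): Option[(A, Queue[A], Queue[A])] =
    l.dequeueOption.map { case (a, l2) => (a, l2, r) }
      .orElse(r.dequeueOption.map { case (a, r2) => (a, l, r2) })
}
```

With one empty input, `zipStep` returns `None` while `mergeStep` still emits the token from the other input.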
  • 10. Dataflow features - compound nodes ○ the ability to create new nodes by using other nodes ○ a smaller data flow program ○ the interior nodes ○ should not know of anything outside of the compound node ○ must be able to connect to the ports of the parent compound node
  • 11. Dataflow features - arc joins and/or splits ○ Fan-in : merge, zip, concat ○ Fan-out : broadcast, balance, unzip
  • 13. Reactive Streams - back-pressure ○ How to handle data across an asynchronous boundary? ○ Blocking calls : queue with a bounded size
  • 14. Reactive Streams - back-pressure ○ How to handle data across an asynchronous boundary? ○ The Push way ○ unbounded buffer : Out Of Memory error ○ bounded buffer with NACK
  • 15. Reactive Streams - back-pressure ○ How to handle data across an asynchronous boundary? ○ The reactive way : non-blocking, non-dropping & bounded ○ data items flow downstream ○ demand flows upstream ○ data items flow only when there is demand
  • 16. Reactive Streams - dynamic push / pull model ○ “push” behaviour when the Subscriber is faster ○ “pull” behaviour when the Publisher is faster ○ switches automatically between those behaviours at runtime ○ batching demand allows batching data
  • 17. Reactive Streams - back-pressure
  • 18. Reactive Streams - back-pressure
  • 19. Reactive Streams - SPI
 trait Publisher[T] {
   def subscribe(sub: Subscriber[T]): Unit
 }
 
 trait Subscription {
   // Reactive Streams 1.0 names this method request(n: Long)
   def requestMore(n: Int): Unit
   def cancel(): Unit
 }
 
 trait Subscriber[T] {
   def onSubscribe(s: Subscription): Unit
   def onNext(elem: T): Unit
   def onError(thr: Throwable): Unit
   def onComplete(): Unit
 }
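
To make the demand protocol concrete, here is a toy, single-threaded sketch in plain Scala. The three traits mirror the SPI above, except that the demand method uses the `request(n: Long)` signature of Reactive Streams 1.0. `RangePublisher`, `BatchingSubscriber` and `DemandDemo` are hypothetical names, and the sketch ignores most of the specification's rules (asynchrony, error handling, argument validation).

```scala
trait Subscription { def request(n: Long): Unit; def cancel(): Unit }
trait Subscriber[T] {
  def onSubscribe(s: Subscription): Unit
  def onNext(elem: T): Unit
  def onError(thr: Throwable): Unit
  def onComplete(): Unit
}
trait Publisher[T] { def subscribe(sub: Subscriber[T]): Unit }

// Emits from..to, but only while the subscriber has signalled demand.
final class RangePublisher(from: Int, to: Int) extends Publisher[Int] {
  def subscribe(sub: Subscriber[Int]): Unit = sub.onSubscribe(new Subscription {
    private var next = from
    private var demand = 0L
    private var done = false
    private var emitting = false // guards against re-entrant request() calls from onNext
    def request(n: Long): Unit = {
      demand += n
      if (!emitting) {
        emitting = true
        while (!done && demand > 0 && next <= to) {
          demand -= 1
          val elem = next
          next += 1
          sub.onNext(elem)
        }
        if (!done && next > to) { done = true; sub.onComplete() }
        emitting = false
      }
    }
    def cancel(): Unit = done = true
  })
}

// Requests `batch` elements at a time and asks again once the batch is consumed.
final class BatchingSubscriber(batch: Int) extends Subscriber[Int] {
  val received = scala.collection.mutable.ArrayBuffer.empty[Int]
  var requests = 0
  var completed = false
  private var subscription: Subscription = null
  private var outstanding = 0
  private def ask(): Unit = { requests += 1; outstanding = batch; subscription.request(batch.toLong) }
  def onSubscribe(s: Subscription): Unit = { subscription = s; ask() }
  def onNext(elem: Int): Unit = {
    received += elem
    outstanding -= 1
    if (outstanding == 0) ask()
  }
  def onError(thr: Throwable): Unit = throw thr
  def onComplete(): Unit = completed = true
}

object DemandDemo {
  val subscriber = new BatchingSubscriber(batch = 2)
  new RangePublisher(1, 5).subscribe(subscriber)
}
```

Five elements with a batch size of two need three demand signals: the publisher never emits more than was requested, and one demand signal covers a whole batch of elements.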

  • 20. Reactive Streams - asynchronous non-blocking protocol
  • 21. Akka Streams - Pipeline Dataflow
  • 22. Akka Streams - Core Concepts ○ a DSL implementation of Reactive Streams relying on Akka actors ○ Stream : a process that involves moving and/or transforming data ○ Element : the processing unit of streams ○ Processing Stage : building blocks that build up a Flow or FlowGraph (map(), filter(), transform(), junctions …)
  • 23. Akka Streams - Pipeline Dataflow ○ Source : a processing stage with exactly one output, emitting data elements when downstream processing stages are ready to receive them ○ Sink : a processing stage with exactly one input, requesting and accepting data elements ○ Flow : a processing stage with exactly one input and one output, which connects its up- and downstream by moving / transforming the data elements flowing through it ○ Runnable Flow : a Flow that has both ends “attached” to a Source and a Sink respectively ○ Materialized Flow : a Runnable Flow that has been run
  • 24. Akka Streams - Pipeline Dataflow
 // imports assume the akka-stream 1.0 milestone API used on this slide
 import akka.actor.ActorSystem
 import akka.stream.ActorFlowMaterializer
 import akka.stream.scaladsl._
 import scala.concurrent.Future
 
 /**
  * Construct the ActorSystem we will use in our application
  */
 implicit lazy val system = ActorSystem("DataFlow")
 
 // named so that the implicit is unambiguously in scope
 implicit val materializer = ActorFlowMaterializer()
 
 val source = Source(1 to 10)
 
 val sink = Sink.fold[Int, Int](0)(_ + _)
 
 val transform = Flow[Int].map(_ + 1).filter(_ % 2 == 0)
 
 // connect the Source to the Sink, obtaining a runnable flow
 val runnable: RunnableFlow = source.via(transform).to(sink)
 
 // materialize the flow
 val materialized: MaterializedMap = runnable.run()
 
 implicit val dispatcher = system.dispatcher
 
 // retrieve the result of the folding process over the stream
 val result: Future[Int] = materialized.get(sink)
 // 2 to 11 filtered to 2, 4, 6, 8, 10, so this prints 30
 result.foreach(println)
  • 25. Akka Streams - Pipeline Dataflow
 ○ Processing stages are immutable (connecting them returns a new processing stage)
 val source = Source(1 to 10)
 source.map(_ => 0) // has no effect on source, since it’s immutable
 source.runWith(Sink.fold[Int, Int](0)(_ + _)) // 55
 ○ A stream can be materialized multiple times
 // connect the Source to the Sink, obtaining a RunnableFlow
 val sink = Sink.fold[Int, Int](0)(_ + _)
 val runnable: RunnableFlow = Source(1 to 10).to(sink)
 // get the materialized value of the FoldSink
 val sum1: Future[Int] = runnable.run().get(sink)
 val sum2: Future[Int] = runnable.run().get(sink)
 // sum1 and sum2 are different Futures!
 for(a <- sum1; b <- sum2) yield println(a + b) // 110
 ○ Processing stages preserve input order of elements
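
The two properties above (immutable stages, repeatable materialization) do not depend on Akka and can be mimicked with a plain Scala sketch. `Blueprint` and `BlueprintDemo` are hypothetical names: the blueprint only describes how to produce elements, and each `runFold` call produces them afresh.

```scala
// Hypothetical stand-in for a stream description: nothing runs until runFold is called.
final case class Blueprint[A](elements: () => Iterator[A]) {
  // Returns a NEW blueprint; the receiver is left untouched.
  def map[B](f: A => B): Blueprint[B] = Blueprint(() => elements().map(f))
  // Each call walks a fresh iterator, so a blueprint can be run any number of times.
  def runFold[B](zero: B)(f: (B, A) => B): B = elements().foldLeft(zero)(f)
}

object BlueprintDemo {
  val source = Blueprint(() => (1 to 10).iterator)
  val zeros  = source.map(_ => 0)          // a separate blueprint, source is unchanged
  val sum1: Int = source.runFold(0)(_ + _) // first run
  val sum2: Int = source.runFold(0)(_ + _) // second, independent run
  val zeroSum: Int = zeros.runFold(0)(_ + _)
}
```

Both runs of `source` yield 55, while the mapped blueprint yields 0, matching the behaviour described on the slide.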
  • 26. Akka Streams - working with graphs
  • 27. Akka Streams - bulk export to es
  • 28. Akka Streams - bulk export to es
 val bulkSize: Int = 100
 
 val source: Source[String] = Source(types.toList)
 
 val sum: Sink[Int] = FoldSink[Int, Int](0)(_ + _)
 
 val g:FlowGraph = FlowGraph{ implicit builder =>
 import FlowGraphImplicits._
 
 val walk: Flow[String, Companies2Export] = Flow[String]
 .map(fileTree(_).toList)
 .mapConcat[Companies2Export](identity)
 
 val parse: Flow[Companies2Export, List[Company]] = Flow[Companies2Export]
 .transform(() => new LoggingStage("ExportService"))
 .map[List[Company]](f => parseFile(f.source, index, f.`type`).toList)
 
 val flatten: Flow[List[Company], Company] = Flow[List[Company]].mapConcat[Company](identity)
 
 val group: Flow[Company, Seq[Company]] = Flow[Company].grouped(bulkSize)
 
 val toJson: Flow[Seq[Company], String] = Flow[Seq[Company]]
 .map(_.map(_.toBulkIndex(index.name)).mkString(crlf) + crlf)
 
 val bulkIndex: Flow[String, EsBulkResponse] = Flow[String]
 .mapAsyncUnordered[EsBulkResponse](esBulk(index.name, _))
 
 val count: Flow[EsBulkResponse, Int] = Flow[EsBulkResponse].map[Int]((b)=>{
 logger.debug(s"index ${b.items.size} companies within ${b.took} ms")
 b.items.size
 })
 
 // connect the graph
 source ~> walk ~> parse ~> flatten ~> group ~> toJson ~> bulkIndex ~> count ~> sum
 }
 
 val result: Future[Int] = g.runWith(sum)
 
 result.foreach(c => logger.info(s"*** total companies indexed: $c"))
 
 result.onComplete(_aliases(index.name, before))
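
The `group` and `toJson` steps of this pipeline can be tried out on plain collections. The sketch below is an illustration only: `GroupedDemo` and `bulkBody` are hypothetical names, integers stand in for `Company` values, and the line separator is passed explicitly because the value of `crlf` is not shown on the slide.

```scala
object GroupedDemo {
  val bulkSize = 100

  // 250 stand-in documents are split into batches of at most bulkSize;
  // the last batch holds the remainder.
  val batchSizes: List[Int] =
    (1 to 250).iterator.grouped(bulkSize).map(_.size).toList

  // One action per line, each line terminated by the separator,
  // following the mkString(crlf) + crlf shape used in toJson.
  def bulkBody(lines: Seq[String], separator: String): String =
    lines.mkString(separator) + separator
}
```

Grouping before indexing means one bulk request per batch of up to `bulkSize` documents instead of one request per document.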
  • 29. Akka Streams - frequent item set
  • 30. Akka Streams - frequent item set
 val g: FlowGraph = FlowGraph { implicit builder =>
 import FlowGraphImplicits._
 
 val source = Source.single(sorted)
 
 import com.sksamuel.elastic4s.ElasticDsl._
 
 val toJson = Flow[List[Seq[Long]]].map(_.map(seq => {
 val now = Calendar.getInstance().getTime
 val uuid = seq.mkString("-")
 update(uuid)
 .in(s"${esInputStore(store)}/CartCombination")
 .upsert(
 "uuid" -> uuid,
 "combinations" -> seq.map(_.toString),
 "counter" -> 1,
 "dateCreated" -> now,
 "lastUpdated" -> now)
 .script("ctx._source.counter += count;ctx._source.lastUpdated = now")
 .params("count" -> 1, "now" -> now)
 }))
 
 val flatten = Flow[List[BulkCompatibleDefinition]].mapConcat[BulkCompatibleDefinition](identity)
 
 val count = Flow[BulkResponse].map[Int]((resp)=>{
 val nb = resp.getItems.length
 logger.debug(s"index $nb combinations within ${resp.getTookInMillis} ms")
 nb
 })
 
 import EsClient._
 
 source ~> combinationsFlow ~> toJson ~> flatten ~> bulkBalancedFlow() ~> count ~> sum
 }
 
 val start = new Date().getTime
 val result: Future[Int] = g.runWith(sum)
 
 result.foreach(c => logger.debug(s"*** $c combinations indexed within ${new Date().getTime - start} ms"))
  • 31. Akka Streams - frequent item set
 val combinationsFlow = Flow() { implicit b =>
 import FlowGraphImplicits._
 
 val undefinedSource = UndefinedSource[Seq[Long]]
 val undefinedSink = UndefinedSink[List[Seq[Long]]]
 
 undefinedSource ~> Flow[Seq[Long]].map[List[Seq[Long]]](s => {
 val combinations : ListBuffer[Seq[Long]] = ListBuffer.empty
 1 to s.length foreach {i => combinations ++= s.combinations(i).toList}
 combinations.toList
 }) ~> undefinedSink
 
 (undefinedSource, undefinedSink)
 }
 
 def bulkBalancedFlow(bulkSize: Int = Settings.ElasticSearch.bulkSize, balanceSize: Int = 2) =
 Flow(){implicit b =>
 import FlowGraphImplicits._
 
 val in = UndefinedSource[BulkCompatibleDefinition]
 val group = Flow[BulkCompatibleDefinition].grouped(bulkSize)
 val bulkUpsert = Flow[Seq[BulkCompatibleDefinition]].map(bulk)
 val out = UndefinedSink[BulkResponse]
 
 if(balanceSize > 1){
 
 val balance = Balance[Seq[BulkCompatibleDefinition]]
 val merge = Merge[BulkResponse]
 
 in ~> group ~> balance
 1 to balanceSize foreach { _ =>
 balance ~> bulkUpsert ~> merge
 }
 merge ~> out
 
 }
 else{
 in ~> group ~> bulkUpsert ~> out
 }
 
 (in, out)
 }
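
What the map stage inside `combinationsFlow` computes can be checked with the standard library alone. This is a sketch, not the author's code: `CombinationsDemo` and `allCombinations` are hypothetical names, and `flatMap` replaces the `ListBuffer` used on the slide.

```scala
object CombinationsDemo {
  // Every non-empty combination of the cart's product ids, from size 1 up to the full cart.
  def allCombinations(cart: Seq[Long]): List[Seq[Long]] =
    (1 to cart.length).toList.flatMap(i => cart.combinations(i).toList)

  val result: List[Seq[Long]] = allCombinations(Seq(1L, 2L, 3L))
}
```

A cart of n distinct products yields 2^n - 1 combinations (7 for the three-item cart here), so the number of upserts grows quickly with cart size. That growth is one reason to group the upserts into bulk requests and balance them across several workers.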
  • 32. Akka Streams - frequent item set
  • 33. Akka Streams - frequent item set
 def fis(store: String, productId: String, frequency: Double = 0.2): Future[(Seq[String], Seq[String])] = {
 implicit val materializer = ActorFlowMaterializer()
 
 import com.mogobiz.run.learning._
 
 val source = Source.single((productId, frequency))
 
 val extract = Flow[(Option[CartCombination], Option[CartCombination])].map((x) => {
 (
 x._1.map(_.combinations.toSeq).getOrElse(Seq.empty),
 x._2.map(_.combinations.toSeq).getOrElse(Seq.empty))
 })
 
 val exclude = Flow[(Seq[String], Seq[String])]
 .map(x => (x._1.filter(_ != productId), x._2.filter(_ != productId)))
 
 val head = Sink.head[(Seq[String], Seq[String])]
 
 val runnable:RunnableFlow = source
 .transform(() => new LoggingStage[(String, Double)]("Learning"))
 .via(frequentItemSets(store))
 .via(extract)
 .via(exclude)
 .to(head)
 
 runnable.run().get(head)
 }
 
 def frequentItemSets(store: String) = Flow() { implicit builder =>
 import FlowGraphImplicits._
 
 val undefinedSource = UndefinedSource[(String, Double)]
 val broadcast = Broadcast[Option[(String, Long)]]
 val zip = Zip[Option[CartCombination], Option[CartCombination]]
 val undefinedSink = UndefinedSink[(Option[CartCombination], Option[CartCombination])]
 
 undefinedSource ~> calculateSupportThreshold(store) ~> broadcast
 broadcast ~> loadMostFrequentItemSet(store) ~> zip.left
 broadcast ~> loadLargerFrequentItemSet(store) ~> zip.right
 zip.out ~> undefinedSink
 
 (undefinedSource, undefinedSink)
 }
  • 34. Akka Streams - frequent item set
 def calculateSupportThreshold(store: String) = Flow[(String, Double)].map { tuple =>
 load[CartCombination](esInputStore(store), tuple._1).map { x =>
 Some(tuple._1, Math.ceil(x.counter * tuple._2).toLong)
 }.getOrElse(None)
 }
 
 def loadMostFrequentItemSet(store: String) = Flow[Option[(String, Long)]].map(_.map((x) =>
 searchAll[CartCombination](cartCombinations(store, x) sort {
 by field "counter" order SortOrder.DESC
 }).headOption).getOrElse(None))
 
 def loadLargerFrequentItemSet(store: String) = Flow[Option[(String, Long)]].map(_.map((x) =>
 searchAll[CartCombination](cartCombinations(store, x) sort {
 by script "doc['combinations'].values.size()" order SortOrder.DESC
 }).headOption).getOrElse(None))
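
The arithmetic inside `calculateSupportThreshold` is worth a worked example. The sketch below isolates it; `SupportThresholdDemo` and `supportThreshold` are hypothetical names, and reading the result as the minimum counter a combination must reach is an assumption, since the `cartCombinations` query is not shown on the slides.

```scala
object SupportThresholdDemo {
  // Same expression as on the slide: Math.ceil(x.counter * tuple._2).toLong
  def supportThreshold(counter: Long, frequency: Double): Long =
    math.ceil(counter * frequency).toLong

  // A product seen in 42 carts, with the default frequency of 0.2 from fis:
  // 42 * 0.2 = 8.4, rounded up to 9.
  val threshold: Long = supportThreshold(counter = 42L, frequency = 0.2)
}
```

Rounding up rather than down keeps the threshold at 1 or more for any product that has been seen at least once with a positive frequency.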