
Large volume data analysis on the Typesafe Reactive Platform


Large volume data analysis on the Typesafe Reactive Platform. Scala Days 2015 Amsterdam slides


  1. Martin Zapletal @zapletal_martin, Cake Solutions @cakesolutions
  2. ● Increasing importance of data analytics
     ● Current state:
       ○ Destructive updates
       ○ Analytics tools with poor scalability and integration
       ○ Manual processes
       ○ Slow iterations
       ○ Not suitable for large amounts of data
  3. ● Shared memory, disk, shared nothing, threads, mutexes, transactional memory, message passing, CSP, actors, futures, coroutines, evented, dataflow, ...
     "We can think of two reasons for using distributed machine learning: because you have to (so much data), or because you want to (hoping it will be faster). Only the first reason is good." - Zygmunt Z
     (Chart: elapsed times for 20 PageRank iterations [1, 2])
  4. ● Microsoft's data centers: an average failure rate of 5.2 devices per day and 40.8 links per day, with a median time to repair of approximately five minutes (and a maximum of one week).
     ● A new Google cluster over one year: five rack issues with 40-80 machines seeing 50 percent packet loss; eight network maintenance events (four of which might cause ~30-minute random connectivity losses); three router failures (resulting in the need to pull traffic immediately for an hour).
     ● CENIC: 500 isolating network partitions, with median durations of 2.7 and 32 minutes and 95th percentiles of 19.9 minutes and 3.7 days, for software and hardware problems respectively. [3]
  5. ● A partition separated a MongoDB primary from its 2 secondaries. 2 hours later the old primary rejoined and rolled back everything on the new primary.
     ● A network partition isolated the Redis primary from all secondaries. Every API call caused the billing system to recharge customer credit cards automatically, resulting in 1.1 percent of customers being overbilled over a period of 40 minutes.
     ● A partition caused inconsistency in the MySQL database. Because foreign key relationships were not consistent, GitHub showed private repositories on the wrong users' dashboards and incorrectly routed some newly created repositories.
     ● For several seconds, Elasticsearch is happy to believe two nodes in the same cluster are both primaries, will accept writes on both of those nodes, and later discard the writes to one side.
     ● RabbitMQ lost ~35% of acknowledged writes under those conditions.
     ● Redis threw away 56% of the writes it told us succeeded.
     ● In Riak, last-write-wins resulted in dropping 30-70% of writes, even with the strongest consistency settings.
     ● MongoDB "strictly consistent" reads see stale versions of documents, but they can also return garbage data from writes that never should have occurred. [4]
  6. ● Complementary approaches
     ● Distributed data processing frameworks: Apache Spark won the Daytona GraySort 100TB benchmark
     ● Distributed databases
  7. ● Whole lifecycle of data
     ● Data processing - Futures, Akka, Akka Cluster, Reactive Streams, Spark, ...
     ● Data stores
     ● Integration
     ● Distributed computing primitives
     ● Cluster managers and task schedulers
     ● Deployment, configuration management and DevOps
     ● Data analytics and machine learning
  8. ACID / Mutable State
  9. (Diagrams of common data platform architectures [5]:
     ● Batch-Pipeline: Flume feeds HDFS ("all your data"), Sqoop loads a serving DB, Hive and Impala query it, Oozie orchestrates
     ● CQRS: clients send commands to one DB; queries are answered from denormalised / precomputed views in another
     ● Kappa architecture: Kafka holds all your data; a stream processor maintains views that clients query
     ● Lambda architecture: a batch layer and a fast stream layer feed a serving layer that answers queries)
  10. (Diagram: the Google data infrastructure stack [6])
  11. ● Basic building block of neural networks: a = f(Σ(y * w) + b)
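
      A minimal sketch of this building block in plain Scala. The sigmoid used as f is an assumption here, standing in for the Neuron.sigmoid referenced on later slides:

      // Sigmoid activation function (assumed; the slides reference Neuron.sigmoid).
      def sigmoid(x: Double): Double = 1.0 / (1.0 + math.exp(-x))

      // a = f(Σ(y * w) + b): weighted sum of the inputs plus a bias, passed through f.
      def neuron(weights: Seq[Double], inputs: Seq[Double], bias: Double)
                (f: Double => Double): Double =
        f(weights.zip(inputs).map { case (w, y) => w * y }.sum + bias)

      // Example: a three-input neuron with the weights and bias used on later slides.
      val a = neuron(Seq(0.3, 0.3, 0.3), Seq(1.0, 0.0, 1.0), 0.2)(sigmoid)
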
  12. ● Multi Layer Perceptron (Feed Forward Neural Network)
      ● Network training
        ○ Many "optimal" solutions
        ○ Optimization and training techniques - LBFGS, backpropagation, batch and online gradient descent, Downpour SGD, Sandblaster LBFGS
        ○ New methods for large networks - deep learning
      ● We will only need forward propagation
  13. (Diagram: a network trained for XOR; the figures show the learned weights -10.895, 1.195, -24.584, -1.159, 7.360, -40.119, 1.991, 35.369, -24.687, -53.197, -8.627, -57.122, 2.616, 61.488, -52.985, -22.904, -67.173, 22.172, -53.706, 27.098, -0.375, and an example input (1, 0) with output 0.999595)
      Output 2.613296075440797E-4 for input Vector(0, 0)
      Output 0.9989222606269823 for input Vector(0, 1)
      Output 0.9995952194411893 for input Vector(1, 0)
      Output 4.0074182099155245E-7 for input Vector(1, 1)
  14. trait HasInput {
        var input: Node = _
        def addInput(i: Node): Unit = input = i
      }

      trait HasOutput {
        var output: Node = _
        def addOutput(o: Node): Unit = output = o
      }

      class Edge extends HasInput with HasOutput {
        var weight: Double = 0.3
        def run(in: Input) = output.run(WeightedInput(in.feature, weight))
      }
  15. class Perceptron extends Neuron {
        override var activationFunction: Double => Double = Neuron.sigmoid
        override var bias: Double = 0.2

        var inputs: Seq[Edge] = Seq()
        var outputs: Seq[Edge] = Seq()
        var weightsT: Seq[Double] = Vector()
        var featuresT: Seq[Double] = Vector()

        private def allInputsAvailable(w: Seq[Double], f: Seq[Double], in: Seq[Edge]) =
          w.length == in.length && f.length == in.length

        override def run(in: WeightedInput): Unit = {
          featuresT = featuresT :+ in.feature
          weightsT = weightsT :+ in.weight

          if (allInputsAvailable(weightsT, featuresT, inputs)) {
            val activation = activationFunction(weightsT.zip(featuresT).map(x => x._1 * x._2).sum + bias)
            featuresT = Vector()
            weightsT = Vector()
            outputs.foreach(_.run(Input(activation)))
          }
        }
      }
  16. val hiddenLayer1 = new Perceptron()
      val edgei1h1 = new Edge()
      edgei1h1.addInput(inputLayer1)
      edgei1h1.addOutput(hiddenLayer1)
      hiddenLayer1.addInputs(Seq(edgei1h1, edgei2h1, edgei3h1))
      hiddenLayer1.addOutputs(Seq(edgeh1o1))

      Source.fromFile("src/main/resources/data2.csv")
        .getLines()
        .foreach { l =>
          val splits = l.split(",")
          inputLayer1.run(WeightedInput(splits(0).toDouble, 1))
          inputLayer2.run(WeightedInput(splits(1).toDouble, 1))
          inputLayer3.run(WeightedInput(splits(2).toDouble, 1))
        }

      (Sample rows from data2.csv, with values such as 0, 0.0001, 0.0002)
  17. Output 0 with result 0.6294598811729977 in 14:49:14.971
      Output 1 with result 0.6294629986168121 in 14:49:14.975
      Output 2 with result 0.6294661160218618 in 14:49:14.976
      Output 3 with result 0.6294692333881344 in 14:49:14.976
      Output 4 with result 0.6294723507156179 in 14:49:14.977
      Output 5 with result 0.6294754680043 in 14:49:14.978
      Output 6 with result 0.6294785852541688 in 14:49:14.978
      Output 7 with result 0.6294817024652116 in 14:49:14.979
      Output 8 with result 0.6294848196374169 in 14:49:14.979
      Output 9 with result 0.6294879367707719 in 14:49:14.980
      Output 10 with result 0.6294910538652648 in 14:49:14.980
      Output 11 with result 0.6294941709208833 in 14:49:14.981
      Output 12 with result 0.6294972879376152 in 14:49:14.981
      Output 13 with result 0.6295004049154483 in 14:49:14.982
      Output 14 with result 0.6295035218543708 in 14:49:14.982
      Output 15 with result 0.6295066387543699 in 14:49:14.983
      Output 16 with result 0.6295097556154339 in 14:49:14.983
      Output 17 with result 0.6295128724375503 in 14:49:14.983
      Output 18 with result 0.6295159892207073 in 14:49:14.984
      Output 19 with result 0.6295191059648922 in 14:49:14.984
      Output 20 with result 0.6295222226700935 in 14:49:14.985
      ...
  18. Source.fromFile("src/main/resources/data2.csv")
        .getLines()
        .toList
        .par
        .foreach { l =>
          ...
        }

      Output 0 with result 0.6615020337700888 in 12:15:53.564
      Output 0 with result 0.6622847063345205 in 12:15:53.564

      (Note the two different results for Output 0: parallel execution races on the neurons' shared mutable state.)
  19. object Perceptron {
        def activation(w: Vector[Double], f: Vector[Double], bias: Double,
                       activationFunction: Double => Double) =
          activationFunction(w.zip(f).map(x => x._1 * x._2).sum + bias)
      }

      object Network {
        def feedForward(features: Vector[Double],
                        network: Seq[Vector[Vector[Double] => Double]]): Vector[Double] =
          network.foldLeft(features)((b, a) => a.map(_(b)))
      }

      val network = Seq[Vector[Vector[Double] => Double]](
        Vector(
          Perceptron.activation(Vector(0.3, 0.3, 0.3), _, 0.2, Neuron.sigmoid),
          Perceptron.activation(Vector(0.3, 0.3, 0.3), _, 0.2, Neuron.sigmoid)),
        Vector(Perceptron.activation(Vector(0.3, 0.3, 0.3), _, 0.2, Neuron.sigmoid))
      )

      Network.feedForward(Vector(splits(0).toDouble, splits(1).toDouble, splits(2).toDouble), network)
  20. ● Actor framework for truly concurrent and distributed systems
      ● Thread-safe mutable state
      ● Actors can send messages, create new actors, and change their behaviour
      ● Multiple options for expressing a neural network
  21. def props() = Props(behaviour)

      def behaviour =
        addInput(addOutput(feedForward(_, _, 0.2, sigmoid, Vector(), Vector()), _))

      private def allInputsAvailable(w: Vector[Double], f: Vector[Double], in: Seq[ActorRef[Nothing]]) =
        w.length == in.length && f.length == in.length

      def feedForward(
          inputs: Seq[ActorRef[Nothing]],
          outputs: Seq[ActorRef[Input]],
          bias: Double,
          activationFunction: Double => Double,
          weightsT: Vector[Double],
          featuresT: Vector[Double]): Behavior[NodeMessage] =
        Partial[NodeMessage] {
          case WeightedInput(f, w) =>
            val featuresTplusOne = featuresT :+ f
            val weightsTplusOne = weightsT :+ w

            if (allInputsAvailable(featuresTplusOne, weightsTplusOne, inputs)) {
              val activation = activationFunction(weightsTplusOne.zip(featuresTplusOne).map(x => x._1 * x._2).sum + bias)
              outputs.foreach(_ ! Input(activation))
              feedForward(inputs, outputs, bias, activationFunction, Vector(), Vector())
            } else {
              feedForward(inputs, outputs, bias, activationFunction, weightsTplusOne, featuresTplusOne)
            }
        }
  22. Activation 0.5498414227985574 using features Vector(0.0, 0.0, 1.0E-4)
      Activation 0.549856273704096 using features Vector(0.0, 1.0E-4, 2.0E-4)
      Activation 0.5498711245207856 using features Vector(1.0E-4, 2.0E-4, 2.0E-4)
      Activation 0.6294619594716266 using features Vector(0.5498414227985574, 0.549856273704096)
      Activation 0.5498859752486001 using features Vector(3.0E-4, 4.0E-4, 0.0)
      Activation 0.5499082511736245 using features Vector(3.0E-4, 4.0E-4, 3.0E-4)
      Activation 0.6294661160203068 using features Vector(0.5498711245207856, 0.5498859752486001)
      Activation 0.5499453772705898 using features Vector(4.0E-4, 5.0E-4, 6.0E-4)
      Activation 0.549893400579171 using features Vector(1.0E-4, 5.0E-4, 2.0E-4)
      Activation 0.5499453772705898 using features Vector(7.0E-4, 0.0, 8.0E-4)
      Activation 0.6294692333865788 using features Vector(0.5499082511736245, 0.549893400579171)
      Activation 0.5499231016791383 using features Vector(1.0E-4, 9.0E-4, 2.0E-4)
      Activation 0.5499231016791383 using features Vector(3.0E-4, 4.0E-4, 5.0E-4)
      Activation 0.549967652661781 using features Vector(6.0E-4, 5.0E-4, 7.0E-4)
      Activation 0.5500047778685057 using features Vector(6.0E-4, 8.0E-4, 9.0E-4)
      Activation 0.5500196277952787 using features Vector(7.0E-4, 8.0E-4, 0.001)
      Activation 0.5500716018368813 using features Vector(9.0E-4, 0.0011, 0.0012)
      Activation 0.5501458485714356 using features Vector(0.0013, 0.0014, 0.0015)
      Activation 0.5501532731220817 using features Vector(0.0016, 0.001, 0.0017)
  23. ● A sequential program always has a single total order of operations
      ● A distributed system gives no ordering guarantees by default
      ● Akka guarantees only that, for a given pair of actors, messages sent directly from the first to the second are not received out of order (the guarantee is not transitive)
  24. ● At-most-once: messages may be lost
      ● At-least-once: messages may be duplicated, but not lost
      ● Exactly-once: requires acknowledgement; cf. the Two Generals' Problem [8]
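
      At-least-once can be built from persisted sends, acknowledgements and redelivery, as Akka Persistence's AtLeastOnceDelivery trait does. A sketch; the message and event names (Msg, Confirm, MsgSent, MsgConfirmed) are illustrative, not from the talk:

      import akka.actor.ActorPath
      import akka.persistence.{AtLeastOnceDelivery, PersistentActor}

      case class Msg(deliveryId: Long, payload: String)
      case class Confirm(deliveryId: Long)
      case class MsgSent(payload: String)
      case class MsgConfirmed(deliveryId: Long)

      class Sender(destination: ActorPath) extends PersistentActor with AtLeastOnceDelivery {
        override def persistenceId: String = "sender-1"

        override def receiveCommand: Receive = {
          case payload: String =>
            // Persist the intent first, then deliver; redelivery repeats until confirmed.
            persist(MsgSent(payload)) { e =>
              deliver(destination)(deliveryId => Msg(deliveryId, e.payload))
            }
          case Confirm(deliveryId) =>
            persist(MsgConfirmed(deliveryId))(e => confirmDelivery(e.deliveryId))
        }

        override def receiveRecover: Receive = {
          case MsgSent(payload)         => deliver(destination)(id => Msg(id, payload))
          case MsgConfirmed(deliveryId) => confirmDelivery(deliveryId)
        }
      }

      The receiver must acknowledge with Confirm and deduplicate by deliveryId if exactly-once processing is wanted on top.
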
  25. (Diagram: messages processed in the order 1, 4, 7, 2, 3, 5, 6, 8, 9, 10, 11 - delivery order differs from send order)
  26. (Diagram: with lost or duplicated messages the receiver's state is uncertain: ?, ? + 1, ? + 2, ...)
  27. Output 76 with result 0.6298492571946717 in 2015-05-21 17:26:56.504
      Output 77 with result 0.6298357692712147 in 2015-05-21 17:26:56.504
      Output 78 with result 0.6298679316729997 in 2015-05-21 17:26:56.504
      Output 79 with result 0.6298674125610421 in 2015-05-21 17:26:56.504
      Output 80 with result 0.6298866035455875 in 2015-05-21 17:26:56.504
      Output 81 with result 0.6298959406028078 in 2015-05-21 17:26:56.504
      Output 82 with result 0.6299052760580531 in 2015-05-21 17:26:56.504
      Output 83 with result 0.6299057948796922 in 2015-05-21 17:26:56.505
      Output 84 with result 0.6299094252583786 in 2015-05-21 17:26:56.505
      Output 85 with result 0.6299332807659122 in 2015-05-21 17:26:56.505
      Output 86 with result 0.6299426148804811 in 2015-05-21 17:26:56.505
      Output 87 with result 0.6299462447313531 in 2015-05-21 17:26:56.505
      Output 88 with result 0.6299612820238325 in 2015-05-21 17:26:56.505
      [INFO] [05/21/2015 17:26:56.504] [akka-akka.actor.default-dispatcher-13] [akka://akka/user/hiddenLayer1] Message [Node$WeightedInput] from Actor[akka://akka/deadLetters] to Actor[akka://akka/user/hiddenLayer1#162015581] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
      Output 89 with result 0.6299706150403437 in 2015-05-21 17:26:56.505
      Output 90 with result 0.6299799476901787 in 2015-05-21 17:26:56.506
      [INFO] [05/21/2015 17:26:56.504] [akka-akka.actor.default-dispatcher-13] [akka://akka/user/hiddenLayer1] Message [Node$WeightedInput] from Actor[akka://akka/deadLetters] to Actor[akka://akka/user/hiddenLayer1#162015581] was not delivered. [2] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
      Output 91 with result 0.6299809846419518 in 2015-05-21 17:26:56.506
      Output 92 with result 0.6299986118617977 in 2015-05-21 17:26:56.506
  28. ● Model parallelism
      ● Actor creation: manual or via Cluster Sharding

      val idExtractor: ShardRegion.IdExtractor = {
        case i: AddInputs => (i.recipientId.toString, i) // recipientId is a hypothetical field; the slide elides the body
      }

      val shardResolver: ShardRegion.ShardResolver = {
        case i: AddInputs => (math.abs(i.recipientId.hashCode) % 100).toString // hypothetical shard allocation
      }

      (Diagram: the model partitioned across Machine1-Machine4 [9])
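
      Registering the sharded entry type might look like this, a sketch against the Akka 2.3-era Cluster Sharding API (assuming an ActorSystem named system):

      // Start a shard region hosting Perceptron entries on this cluster node.
      val perceptronRegion: ActorRef = ClusterSharding(system).start(
        Perceptron.shardName,    // typeName
        Some(Props[Perceptron]), // entryProps (None on proxy-only nodes)
        idExtractor,
        shardResolver)

      // Messages go to the region, which routes them to the right entity,
      // creating it on demand:
      // perceptronRegion ! AddInputs(...)
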
  29. class Perceptron() extends Actor with Neuron {
        ...
        override def receive = run orElse addInput orElse addOutput

        val shardRegion = ClusterSharding(context.system).shardRegion(Edge.shardName)

        def run: Receive = {
          case WeightedInput(_, f, w) =>
            featuresT = featuresT :+ f
            weightsT = weightsT :+ w

            if (allInputsAvailable(weightsT, featuresT, inputs)) {
              val activation = activationFunction(weightsT.zip(featuresT).map(x => x._1 * x._2).sum + bias)
              featuresT = Vector()
              weightsT = Vector()
              outputs.foreach(o => shardRegion ! Input(o, activation))
            }
        }
      }
  30. Output 18853 with result 0.6445355972059068 in 17:33:12.248
      Output 18854 with result 0.6392081778097862 in 17:33:12.248
      Output 18855 with result 0.6476549338361918 in 17:33:12.248
      Output 18856 with result 0.6413832367161323 in 17:33:12.248
      [17:33:12.353] [ClusterSystem-akka.actor.default-dispatcher-21] [Cluster(akka://ClusterSystem)] Cluster Node [akka.tcp://ClusterSystem@127.0.0.1:2551] - Leader is removing unreachable node [akka.tcp://ClusterSystem@127.0.0.1:54495]
      [17:33:12.388] [ClusterSystem-akka.actor.default-dispatcher-22] [akka.tcp://ClusterSystem@127.0.0.1:2551/user/sharding/PerceptronCoordinator] Member removed [akka.tcp://ClusterSystem@127.0.0.1:54495]
      [17:33:12.388] [ClusterSystem-akka.actor.default-dispatcher-35] [akka.tcp://ClusterSystem@127.0.0.1:2551/user/sharding/EdgeCoordinator] Member removed [akka.tcp://ClusterSystem@127.0.0.1:54495]
      [17:33:12.415] [ClusterSystem-akka.actor.default-dispatcher-18] [akka://ClusterSystem/user/sharding/Edge/e-2-1-3-1] null java.lang.NullPointerException
      [17:33:12.436] [ClusterSystem-akka.actor.default-dispatcher-2] [akka://ClusterSystem/user/sharding/Edge/e-2-1-3-1] null java.lang.NullPointerException
      [17:33:12.436] [ClusterSystem-akka.actor.default-dispatcher-2] [akka://ClusterSystem/user/sharding/Edge/e-2-1-3-1] null java.lang.NullPointerException
  31. class Edge extends PersistentActor with HasInput with HasOutput {
        override def persistenceId: String = self.path.name

        var weight: Double = 0.3

        override def receiveCommand: Receive = run orElse addInput orElse addOutput
        override def receiveRecover: Receive = recover orElse addInputRecover orElse addOutputRecover

        val shardRegion = ClusterSharding(context.system).shardRegion(Perceptron.shardName)

        def run: Receive = {
          case Input(r, f) =>
            shardRegion ! WeightedInput(output, f, weight)
          case UpdateWeightCommand(r, w) =>
            persist(UpdatedWeightEvent(r, w)) { event => weight = event.weight }
        }

        def recover: Receive = {
          case UpdatedWeightEvent(_, w) => weight = w
        }
      }
  32. ● Data parallelism
      (Diagram: the same computation applied to independent partitions of the data)
  33. "ElasticSearch gives up on partition tolerance. It means, if enough nodes fail, the cluster state turns red and ES does not proceed to operate on that index. ES is not giving up on availability. Every request will be responded to, either true (with result) or false (error)."
      ● Synchronous and asynchronous replication
      ● Availability and consistency during partition [4]
  34. (Diagram: replicas r0, r1, r2; r0 records a write with Clock: (r0 -> 1), Value: x)
  35. (Diagram: the write (r0 -> 1), x has replicated to r0, r1 and r2; r2 then writes (r2 -> 1), y)
  36. (Diagram: all replicas converge to (r0 -> 1, r2 -> 1), y - the second write causally follows the first)
  37. (Diagram: versions (r0 -> 1), x and (r2 -> 1), y have concurrent vector clocks on r0, r1, r2 - a conflict)
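
      A minimal vector clock sketch in plain Scala showing why (r0 -> 1), x and (r2 -> 1), y conflict. This only illustrates the comparison rule; it is not Eventuate's VectorTime:

      case class VClock(entries: Map[String, Long]) {
        // Advance this replica's component by one.
        def increment(replica: String): VClock =
          VClock(entries.updated(replica, entries.getOrElse(replica, 0L) + 1))

        // Pointwise maximum: the clock of a merged version.
        def merge(that: VClock): VClock =
          VClock((entries.keySet ++ that.entries.keySet).map { k =>
            k -> math.max(entries.getOrElse(k, 0L), that.entries.getOrElse(k, 0L))
          }.toMap)

        // This clock happened before (or equals) the other.
        def before(that: VClock): Boolean =
          entries.forall { case (k, v) => v <= that.entries.getOrElse(k, 0L) }

        // Neither happened before the other: concurrent updates, i.e. a conflict.
        def conflicts(that: VClock): Boolean = !before(that) && !that.before(this)
      }

      val cx = VClock(Map("r0" -> 1L)) // (r0 -> 1), x
      val cy = VClock(Map("r2" -> 1L)) // (r2 -> 1), y
      cx.conflicts(cy)                 // true: concurrent writes
      cx.merge(cy)                     // (r0 -> 1, r2 -> 1), the clock after resolution
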
  38. class Edge(
          override val aggregateId: Option[String],
          override val replicaId: String,
          override val eventLog: ActorRef) extends EventsourcedActor with HasInput with HasOutput {

        var weight: Double = 0.3

        override def onCommand: Receive = run orElse addInput orElse addOutput

        private var versionedState: ConcurrentVersions[Double, Double] =
          ConcurrentVersions(0.3, (s, a) => a)

        ...

        override def onEvent: Receive = {
          case UpdatedWeightEvent(w) =>
            versionedState = versionedState.update(w, lastVectorTimestamp, lastEmitterReplicaId)

            if (versionedState.conflict) {
              val conflictingVersions = versionedState.all
              val avg = conflictingVersions.map(_.value).sum / conflictingVersions.size
              val newTimestamp = conflictingVersions.map(_.updateTimestamp).foldLeft(VectorTime())(_.merge(_))
              versionedState.update(avg, newTimestamp, replicaId)
              versionedState = versionedState.resolve(newTimestamp)
              weight = versionedState.all.head.value
            } else {
              weight = versionedState.all.head.value
            }
        }
      }
  39. ● Replica r0 - update weight to 0, 1, 2
      ● Replica r1 - 3, 4, 5
      ● Replica r2 - 6, 7, 8

      Conflicting versions on replica 0
        value 4.0 vector clock VectorTime(r1 -> 1)
        value 7.0 vector clock VectorTime(r2 -> 1)
      Conflicting versions on replica 0 resolved
        value 5.5 vector clock VectorTime(r1 -> 1, r2 -> 1)
      Conflicting versions on replica 0
        value 5.5 vector clock VectorTime(r1 -> 1, r2 -> 1)
        value 0.0 vector clock VectorTime(r0 -> 1)
      Conflicting versions on replica 0 resolved
        value 2.75 vector clock VectorTime(r1 -> 1, r2 -> 1, r0 -> 1)
      Conflicting versions on replica 0
        value 2.75 vector clock VectorTime(r1 -> 1, r2 -> 1, r0 -> 1)
        value 3.0 vector clock VectorTime(r1 -> 2)
      Conflicting versions on replica 0 resolved
        value 2.875 vector clock VectorTime(r1 -> 2, r2 -> 1, r0 -> 1)
      Conflicting versions on replica 0
        value 5.0 vector clock VectorTime(1-e1 -> 5, r2 -> 1, r0 -> 1)
        value 6.0 vector clock VectorTime(r2 -> 3, 1-e1 -> 1)
      Conflicting versions on replica 0 resolved
        value 5.5 vector clock VectorTime(1-e1 -> 5, r2 -> 3, r0 -> 1)
  40. class Edge extends Actor with HasInput with HasOutput {
        var weight: Double = 0.3

        val replicator = DataReplication(context.system).replicator
        implicit val cluster = Cluster(context.system)

        replicator ! Subscribe(self.path.name, self)

        override def receive: Receive = run orElse addInput orElse addOutput

        def run: Receive = {
          ...
          case UpdateWeight(w) =>
            replicator ! Update(self.path.name, GCounter(), WriteLocal)(_ + w)
          case Changed(key, GCounter(mergedWeight)) if key == self.path.name =>
            weight = mergedWeight
        }
      }
  41. ● Replica r0 - update weight to 0, 1, 2
      ● Replica r1 - 3, 4, 5
      ● Replica r2 - 6, 7, 8

      Weight on replica r2 changed to 21
      Weight on replica r0 changed to 3
      Weight on replica r1 changed to 12
      Weight on replica r2 changed to 24
      Weight on replica r0 changed to 36
      Weight on replica r1 changed to 15
      Weight on replica r2 changed to 36
      Weight on replica r1 changed to 36

      (Note that the replicas converge on 36 = 3 + 12 + 21, the sum of all updates: a grow-only counter accumulates every increment rather than keeping any single written weight.)
  42. ● Publisher and subscriber
      ● Source[Circle].map(_.toSquare).filter(_.color == blue)
      ● Lazy topology definition
      (Diagram: Publisher -> toSquare -> color == blue -> Subscriber, with backpressure flowing upstream)
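
      A runnable sketch of the same pipeline shape with the Akka Streams 1.0-era API, plain integers standing in for the slide's Circle/Square types:

      import akka.actor.ActorSystem
      import akka.stream.ActorMaterializer
      import akka.stream.scaladsl.Source

      implicit val system = ActorSystem("streams")
      implicit val materializer = ActorMaterializer()

      // The topology is only a description until it is materialized by run*;
      // demand flows upstream from the subscriber (backpressure).
      Source(1 to 10)
        .map(_ * 2)         // stands in for .map(_.toSquare)
        .filter(_ % 3 == 0) // stands in for .filter(_.color == blue)
        .runForeach(println)
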
  43. input ~> network(topology, weights) ~> zipWithIndex ~> formatPrintSink

      def buildLayer(
          layer: Int,
          input: Outlet[DenseMatrix[Double]],
          topology: Array[Int],
          weights: DenseMatrix[Double]): Outlet[DenseMatrix[Double]] = {
        val currentLayer = builder.add(hiddenLayer(layer))
        input ~> currentLayer.in0
        hiddenLayerWeights(topology, layer, weights) ~> currentLayer.in1

        if (layer < topology.length - 1) buildLayer(layer + 1, currentLayer.out, topology, weights)
        else currentLayer.out
      }
  44. def hiddenLayer(layer: Int) = {
        def feedForward(features: DenseMatrix[Double], weightMatrices: DenseMatrix[Double]) = {
          val bias = 0.2
          val activation: DenseMatrix[Double] = weightMatrices * features
          activation(::, *) :+= bias
          sigmoid.inPlace(activation)
          activation
        }

        FlowGraph.partial() { implicit builder: FlowGraph.Builder[Unit] =>
          import akka.stream.scaladsl.FlowGraph.Implicits._

          val zipInputAndWeights = builder.add(Zip[DenseMatrix[Double], DenseMatrix[Double]]())
          val feedForwardFlow = builder.add(Flow[(DenseMatrix[Double], DenseMatrix[Double])]
            .map(x => feedForward(x._1, x._2)))

          zipInputAndWeights.out ~> feedForwardFlow

          new FanInShape2(zipInputAndWeights.in0, zipInputAndWeights.in1, feedForwardFlow.outlet)
        }
      }
  45. (Diagram: within a layer, feature vector n is zipped with the network weights for that layer and feedForward computes the activation, producing feature vector n+1; a final ZipWithIndex pairs each output with its index)
  46. (Visualisation: back pressure in Reactive Streams / Akka Streams [10])
  47. ● In-memory dataflow distributed data processing framework, for streaming and batch
      ● Distributes computation using a higher-level API
      ● Moves computation to data
      ● Fault tolerant
      ● Caching
      ● Transformations
        ○ Lazy, they form the DAG (see the sketch after this slide)
        ○ map, filter, flatMap, mapPartitions, mapPartitionsWithIndex, sample, union, intersection, distinct, groupByKey, reduceByKey, sortByKey, join, cogroup, repartition, cartesian, glom, ...
      ● Actions
        ○ Execute the DAG, retrieve the result
        ○ reduce, collect, count, first, take, foreach, saveAs…, min, max, ...
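
      For example, a sketch of the lazy evaluation model, assuming a SparkContext sc is in scope:

      // Transformations only record lineage in the DAG; nothing runs here.
      val rdd = sc.parallelize(1 to 1000000)
      val evens = rdd.filter(_ % 2 == 0).map(_ * 2L)

      // The action triggers execution of the DAG and returns a result to the driver.
      val total = evens.count()
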
  48. (DAG: textFile -> map -> map -> reduceByKey -> collect)

      sc.textFile("counts")
        .map(line => line.split("\t"))
        .map(word => (word(0), word(1).toInt))
        .reduceByKey(_ + _)
        .collect() [11]
  49. ● Accumulators
        ○ Processes can only add
        ○ Associative, commutative operation
        ○ Only the driver program can read the value
        ○ Exactly-once semantics only guaranteed for actions

      object DoubleAccumulatorParam extends AccumulatorParam[Double] {
        def zero(initialValue: Double): Double = 0
        def addInPlace(d1: Double, d2: Double): Double = d1 + d2
      }
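
      A usage sketch for the AccumulatorParam above (Spark 1.x accumulator API, assuming a SparkContext sc):

      // Create the accumulator on the driver using the custom param.
      val sum = sc.accumulator(0.0)(DoubleAccumulatorParam)

      // Tasks may only add to it; the merged value is readable on the driver.
      sc.parallelize(1 to 100).foreach(i => sum += i.toDouble)
      println(sum.value) // 5050.0
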
  50. def forwardRun(
          topology: Array[Int],
          data: DenseMatrix[Double],
          weightMatrices: Array[DenseMatrix[Double]]): DenseMatrix[Double] = {
        val bias = 0.2
        val outArray = new Array[DenseMatrix[Double]](topology.size)
        val blas = BLAS.getInstance()
        outArray(0) = data

        for (i <- 1 until topology.size) {
          val weights = hiddenLayerWeights(topology, i, weightMatrices)
          val outputCurrent = new DenseMatrix[Double](weights.rows, data.cols)
          val outputPrevious = outArray(i - 1)

          blas.dgemm("N", "N", outputCurrent.rows, outputCurrent.cols, weights.cols, 1.0,
            weights.data, weights.offset, weights.majorStride,
            outputPrevious.data, outputPrevious.offset, outputPrevious.majorStride,
            1.0, outputCurrent.data, outputCurrent.offset, outputCurrent.rows)

          outArray(i) = outputCurrent
          outArray(i)(::, *) :+= bias
          sigmoid.inPlace(outArray(i))
        }

        outArray(topology.size - 1)
      }
  51. val sc = new SparkContext("local", "Neural Network")

      val result = sc.textFile("src/main/resources/data.csv", 3)
        .map { l =>
          val splits = l.split(",")
          val features = splits.map(_.toDouble)
          new DenseMatrix(3, 1, Array(features(0), features(1), features(2)))
        }
        .map(in => forwardRun(topology, in, weights))
  52. (Diagram: the data is partitioned; feedForward runs on each partition in parallel and collect() gathers the results)
  53. val sqlContext = new org.apache.spark.sql.SQLContext(sc)
      import sqlContext.implicits._

      val resultDF = result.toDF("result")

      resultDF
        .filter(resultDF("result") > "String")
        .select(resultDF("result") + "String")
      // StructType(StructField(result,DoubleType,true))

      resultDF.registerTempTable("results")

      val filtered3 = sqlContext.sql(
        "SELECT result + \"String\" " +
        "FROM (" +
          "SELECT result " +
          "FROM results) r " +
        "WHERE r.result >= \"String\"")
  54. ● Multiple phases
      ● Catalyst [12]
  55. object PushPredicateThroughProject extends Rule[LogicalPlan] {
        def apply(plan: LogicalPlan): LogicalPlan = plan transform {
          case filter @ Filter(condition, project @ Project(fields, grandChild)) =>
            val sourceAliases = fields.collect {
              case a @ Alias(c, _) => (a.toAttribute: Attribute) -> c
            }.toMap
            project.copy(child = filter.copy(
              replaceAlias(condition, sourceAliases),
              grandChild))
        }
      }

      case Divide(e1, e2) =>
        val eval1 = expressionEvaluator(e1)
        val eval2 = expressionEvaluator(e2)

        eval1.code ++ eval2.code ++
        q"""
          var $nullTerm = false
          var $primitiveTerm: ${termForType(e1.dataType)} = 0

          if (${eval1.nullTerm} || ${eval2.nullTerm}) {
            $nullTerm = true
          } else if (${eval2.primitiveTerm} == 0)
            $nullTerm = true
          else {
            $primitiveTerm = ${eval1.primitiveTerm} / ${eval2.primitiveTerm}
          }
        """.children
  56. === Result of Batch Resolution ===
      === Result of Batch Remove SubQueries ===
      === Result of Batch ConstantFolding ===
      === Result of Batch Filter Pushdown ===

      == Parsed Logical Plan ==
      'Project [('result + String) AS c0#2]
       'Filter ('r.result >= String)
        'Subquery r
         'Project ['result]
          'UnresolvedRelation [results], None

      == Analyzed Logical Plan ==
      Project [(CAST(result#1, DoubleType) + CAST(String, DoubleType)) AS c0#2]
       Filter (CAST(result#1, DoubleType) >= CAST(String, DoubleType))
        Subquery r
         Project [result#1]
          Subquery results
           Project [_1#0 AS result#1]
            LogicalRDD [_1#0], MapPartitionsRDD[5] at map at SQLContext.scala:394

      == Optimized Logical Plan ==
      LocalRelation [c0#2], []

      == Physical Plan ==
      LocalTableScan [c0#2], []
  57. case class Person(age: Int, height: Double)
      val people = sc.parallelize((0 to 100).map(x => Person(x, x)))

      people
        .map(p => Person(p.age, p.height * 2.54))
        .filter(_.age < 35)

      people
        .filter(_.age < 35)
        .map(p => Person(p.age, p.height * 2.54))

      people
        .map(p => Person(p.age, p.height * 2.54))
        .filter(_.height < 170)

      people
        .filter(_.height < 170)
        .map(p => Person(p.age, p.height * 2.54))

      (The first two pipelines are equivalent, since the filter reads only age, which the map preserves, so the predicate can be pushed down; the last two differ, because the map rescales height before the filter reads it.)
  58. 1) Choose the best combination of tools for a given use case.
      2) Understand the internals of the selected tools.
      3) The environment is often fully asynchronous and distributed.
  59. ● Jobs at www.cakesolutions.net/careers
      ● Code at https://github.com/zapletal-martin/reactive-deep-learning
      ● Twitter @zapletal_martin
  60. [1] http://www.csie.ntu.edu.tw/~cjlin/talks/twdatasci_cjlin.pdf
      [2] http://blog.acolyer.org/2015/06/05/scalability-but-at-what-cost/
      [3] https://queue.acm.org/detail.cfm?id=2655736
      [4] https://aphyr.com/
      [5] http://www.benstopford.com/2015/04/28/elements-of-scale-composing-and-scaling-data-platforms/
      [6] http://malteschwarzkopf.de/research/assets/google-stack.pdf
      [7] http://malteschwarzkopf.de/research/assets/facebook-stack.pdf
      [8] http://en.wikipedia.org/wiki/Two_Generals%27_Problem
      [9] http://static.googleusercontent.com/media/research.google.com/en/us/archive/large_deep_networks_nips2012.pdf
      [10] http://www.smartjava.org/content/visualizing-back-pressure-and-reactive-streams-akka-streams-statsd-grafana-and-influxdb
      [11] http://www.slideshare.net/LisaHua/spark-overview-37479609
      [12] https://ogirardot.wordpress.com/2015/05/29/rdds-are-the-new-bytecode-of-apache-spark/
