Streaming data to S3 using akka-streams
Mikhail Girkin
Software Engineer
GILT
HBC Digital
@mike_girkin
The problem
● Several big (hundreds of MB) database result sets
● Served as JSON files
● The service was constantly OOM-ing, even on a 32 GB instance
Akka-streams
● A library from the Akka toolbox
● Built on top of the actor framework
● Handles streams and their specifics, without exposing
the actors themselves
A bit on akka-streams - Source
● The input of the data in the stream
● Has the output channel to feed data into the stream
SQLSource
A bit on akka-streams - Sink
● The final point of the data in the stream
● Has the input channel to receive the data from the stream
S3 object
Another bit on akka-streams - Flow
● The transformation procedure of the stream
● Takes data from the input, applies some computation to it,
and passes the resulting data to the output
Serialization
Basic stream operations
● via
Source via Flow => Source
Flow via Flow => Flow
● to
Flow to Sink => Sink
Source to Sink => RunnableGraph
Declaration is not execution!
Stream description is just a declaration, so:
val s = Source[Int](Range(1, 100).toList)
  .via(
    Flow[Int].map(x => x + 10)
  ).to(
    Sink.foreach(println)
  )
will not execute until you call
s.run()
The skeleton
Get data -> serialize -> send to S3
def run(): Future[Long] = {
  val cn = getConnection()
  val stream = (cn: Connection) =>
    dataSource.streamList(cn)            // Source[Item, _] - get data from the DB
      .via(serializeFlow)                // Flow[Item, Byte, _] - serialize
      .toMat(s3UploaderSink)(Keep.right) // Sink[Byte, Future[Long]] - upload to S3
  val countFuture = stream(cn).run()     // needs an implicit Materializer in scope
  countFuture.onComplete { _ =>          // needs an implicit ExecutionContext in scope
    cn.close()                           // release the connection on success or failure
  }
  countFuture
}
Serialize in the stream
● We deal with a single collection
● All items have the same type
val serializeFlow = Flow[Item]
  .map(x => serializeItem(x)) // serializeItem: Item => String
  .intersperse("[", ",", "]") // sort of mkString for streams
  .mapConcat[Byte] {
    x => x.getBytes("UTF-8").toIndexedSeq // explicit charset instead of the platform default
  }
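To make the output of this flow concrete, here is a minimal plain-Scala sketch that mimics the same steps on an Iterator rather than on an akka-streams Flow. Item, its fields, and the body of serializeItem are hypothetical stand-ins, since the slides do not show them; the intersperse helper is a hand-written analogue, not the akka-streams operator.

```scala
object SerializeSketch {
  // Hypothetical stand-ins: the slides do not show Item or serializeItem.
  final case class Item(id: Int, name: String)
  def serializeItem(x: Item): String = s"""{"id":${x.id},"name":"${x.name}"}"""

  // Lazy Iterator analogue of intersperse(start, sep, end).
  def intersperse(items: Iterator[String], start: String, sep: String, end: String): Iterator[String] = {
    val body = items.zipWithIndex.flatMap {
      case (s, 0) => Iterator(s)      // first element: no separator before it
      case (s, _) => Iterator(sep, s) // later elements: separator first
    }
    Iterator(start) ++ body ++ Iterator(end)
  }

  // Items in, bytes of one JSON array out, produced element by element.
  def serialize(items: Iterator[Item]): Iterator[Byte] =
    intersperse(items.map(serializeItem), "[", ",", "]")
      .flatMap(_.getBytes("UTF-8").iterator)
}
```

Because every step is lazy, only one serialized item needs to be in memory at a time, which is the property the stream version relies on.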
S3 multipart upload API
● Allows uploading a file in separate chunks
● Allows uploading chunks in parallel
● Uploaded chunks have no TTL (by default)
Simplified methods:
1. initialize(bucket, filename) => uploadId
2. uploadChunk(uploadId, partNumber, content) => hashSum
3. complete()
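The three calls can be modelled with a small in-memory stand-in, sketched below. InMemoryS3 and its get method are hypothetical and are not the AWS SDK API; the sketch only illustrates that parts are keyed by part number, can arrive in any order, and become an object when the upload is completed. In real S3 the value returned per part is an ETag, and completing the upload requires sending the list of part numbers and ETags back.

```scala
import java.security.MessageDigest

object MultipartModel {
  // Hypothetical in-memory stand-in for an S3 client; not the AWS SDK API.
  final class InMemoryS3 {
    private var uploads = Map.empty[String, Map[Int, Array[Byte]]]
    private var keys    = Map.empty[String, String]
    private var objects = Map.empty[String, Array[Byte]]

    // 1. initialize: returns an id that ties the later calls together
    def initialize(bucket: String, filename: String): String = {
      val uploadId = s"upload-${keys.size + 1}"
      uploads = uploads.updated(uploadId, Map.empty[Int, Array[Byte]])
      keys = keys.updated(uploadId, s"$bucket/$filename")
      uploadId
    }

    // 2. uploadChunk: parts may arrive in any order; returns an MD5 hex digest
    def uploadChunk(uploadId: String, partNumber: Int, content: Array[Byte]): String = {
      uploads = uploads.updated(uploadId, uploads(uploadId).updated(partNumber, content))
      MessageDigest.getInstance("MD5").digest(content).map("%02x".format(_)).mkString
    }

    // 3. complete: concatenates the parts by part number and publishes the object
    def complete(uploadId: String): Long = {
      val bytes = uploads(uploadId).toSeq.sortBy(_._1).flatMap(_._2.toSeq).toArray
      objects = objects.updated(keys(uploadId), bytes)
      uploads = uploads - uploadId
      bytes.length.toLong
    }

    def get(key: String): Option[Array[Byte]] = objects.get(key)
  }
}
```

Until complete is called nothing is visible under the target key, which is why abandoned uploads keep their chunks around when no lifecycle rule cleans them up.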
Let's create an S3 sink!
● SinkA = Flow to SinkB
[Diagram: S3 upload sink = S3 upload flow to Sink.head (first value received)]
S3 upload sink
Flow[Byte]
  .grouped(chunkSize) // split the stream into chunks
  .zip(Source.fromIterator(() => Iterator.from(1))) // number the chunks, starting from 1
  .fold[MultipartUploader]( // fold over the uploader state
    initUploader() // initial value: the uploader
  ) {
    case (uploader, (data, chunkNumber)) => // reduce step: returns the uploader (!)
      uploader.uploadChunk(chunkNumber, data.toArray)
  }.map {
    uploader => uploader.complete() // close the uploader on completion
  }
  .toMat(Sink.head)(Keep.right) // keep Sink.head's Future; a plain .to would discard it
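The chunk, number, fold sequence above can be tried without akka-streams or S3. The sketch below runs the same three steps over a plain Iterator; Uploader is a hypothetical immutable stand-in for MultipartUploader, and returning the total byte count from complete is an assumption made for illustration.

```scala
object ChunkedUploadSketch {
  // Hypothetical immutable uploader state: remembers the parts it has "uploaded".
  final case class Uploader(parts: Vector[(Int, Array[Byte])] = Vector.empty) {
    // Returns a new state instead of mutating, so it can be threaded through a fold.
    def uploadChunk(partNumber: Int, data: Array[Byte]): Uploader =
      copy(parts = parts :+ (partNumber -> data))

    // Assumption: completion reports the total number of bytes uploaded.
    def complete(): Long = parts.map(_._2.length.toLong).sum
  }

  def upload(bytes: Iterator[Byte], chunkSize: Int): (Uploader, Long) = {
    val finished = bytes
      .grouped(chunkSize)      // split the byte stream into chunks
      .zip(Iterator.from(1))   // number the chunks, starting from 1
      .foldLeft(Uploader()) { case (uploader, (data, n)) =>
        uploader.uploadChunk(n, data.toArray) // each step returns the next state
      }
    (finished, finished.complete())
  }
}
```

For example, uploading the seven bytes of "abcdefg" with a chunk size of 3 yields parts 1, 2 and 3 holding "abc", "def" and "g". Real S3 requires every part except the last to be at least 5 MiB, so chunkSize would need to respect that.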
SQL Source
Anorm provides an akka-stream SQL source
libraryDependencies ++= Seq(
  "com.typesafe.play" %% "anorm-akka" % "version",
  "com.typesafe.akka" %% "akka-stream" % "version")
AkkaStream.source(SQL"SELECT * FROM Test",
  SqlParser.scalar[String], ColumnAliaser.empty): Source[String]
Brings minimal transitive dependencies (!)
Road to production
● Retries in case of S3 errors/failures
● Handle possible failures during stream execution (e.g. a
failure talking to the DB)
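As one possible shape for the retry item, here is a minimal plain-Scala sketch of a bounded retry around a single synchronous call such as a chunk upload. RetrySketch and retry are hypothetical names, not part of akka-streams or any S3 client, and the sketch deliberately omits backoff and any distinction between retryable and fatal errors.

```scala
import scala.annotation.tailrec
import scala.util.{Failure, Try}

object RetrySketch {
  // Runs op up to maxAttempts times; returns the first success or the last failure.
  @tailrec
  def retry[A](maxAttempts: Int)(op: () => A): Try[A] =
    Try(op()) match {
      case Failure(_) if maxAttempts > 1 => retry(maxAttempts - 1)(op) // try again
      case result                        => result                     // success, or out of attempts
    }
}
```

A chunk upload could then be wrapped as retry(3)(() => uploader.uploadChunk(n, data)), with the final Failure propagated so the stream fails and the connection is still closed.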
200 OK

 
Launch Your Streaming Platforms in Minutes
Launch Your Streaming Platforms in MinutesLaunch Your Streaming Platforms in Minutes
Launch Your Streaming Platforms in Minutes
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
 

Streaming data to S3 using akka-streams

  • 1. Streaming data to S3 using akka-streams. Mikhail Girkin, Software Engineer, GILT, HBC Digital, @mike_girkin
  • 2. The problem
      ● Several big (hundreds of MB) database result sets
      ● Served as JSON files
      ● The service kept running out of memory (OOM), even on a 32 GB instance
  • 3. Akka-streams
      ● A library from the Akka toolbox
      ● Built on top of the actor framework
      ● Handles streams and their specifics without exposing the actors themselves
  • 4. A bit on akka-streams - Source
      ● The point where data enters the stream
      ● Has an output channel that feeds data into the stream
      In this talk: SQLSource
  • 5. A bit on akka-streams - Sink
      ● The final point of the data in the stream
      ● Has an input channel that receives data from the stream
      In this talk: S3 object
  • 6. Another bit on akka-streams - Flow
      ● The transformation step of the stream
      ● Takes data from its input, applies some computation to it, and passes the result to its output
      In this talk: Serialization
  • 7. Basic stream operations
      ● via
          Source via Flow => Source
          Flow via Flow => Flow
      ● to
          Flow to Sink => Sink
          Source to Sink => RunnableGraph (a complete stream that can be run)
  • 8. Declaration is not execution!
      A stream description is just a declaration, so:
      val s = Source[Int](Range(1, 100).toList)
        .via(
          Flow[Int].map(x => x + 10)
        ).to(
          Sink.foreach(println)
        )
      will not execute until you call
      s.run()  // requires an implicit Materializer in scope
  • 9. The skeleton
      Get data -> serialize -> send to S3
      def run(): Future[Long] = {
        val cn = getConnection()
        val stream = (cn: Connection) =>
          dataSource.streamList(cn)             // Source[Item] - get data from the DB
            .via(serializeFlow)                 // Flow[Item, Byte] - serialize
            .toMat(s3UploaderSink)(Keep.right)  // Sink[Byte] - upload to S3
        val countFuture = stream(cn).run()
        countFuture.onComplete { _ =>
          cn.close()                            // release the connection on success or failure
        }
        countFuture
      }
  • 10. Serialize in the stream
      ● We deal with a single collection
      ● All items have the same type
      val serializeFlow = Flow[Item]
        .map(x => serializeItem(x))   // serializeItem: Item => String
        .intersperse("[", ",", "]")   // a sort of mkString for streams
        .mapConcat[Byte] {
          x => x.getBytes("UTF-8").toIndexedSeq  // explicit charset instead of the platform default
        }
  • 11. S3 multipart upload API
      ● Allows uploading a file in separate chunks
      ● Allows uploading chunks in parallel
      ● Uploaded chunks have no TTL (by default)
      Simplified methods:
      1. initialize(bucket, filename) => uploadId
      2. uploadChunk(uploadId, partNumber, content) => hashSum
      3. complete()
  • 12. Let's create an S3 Sink!
      ● SinkA = Flow to SinkB
      ● S3 upload sink = S3 upload flow to Sink.head (first value received)
  • 13. S3 upload sink
      Flow[Byte]
        .grouped(chunkSize)                                // split the stream into chunks
        .zip(Source.fromIterator(() => Iterator.from(1)))  // number the chunks, starting at 1
        .fold[MultipartUploader](                          // fold over the uploader state
          initUploader()                                   // initial value: the uploader
        ) {
          case (uploader, (data, chunkNumber)) =>          // reduce step returns the uploader (!)
            uploader.uploadChunk(chunkNumber, data.toArray)
        }
        .map { uploader =>
          uploader.complete()                              // close the uploader on completion
        }
        .toMat(Sink.head)(Keep.right)                      // plain .to would discard Sink.head's Future
  • 14. SQL Source
      Anorm provides an akka-stream SQL source:
      libraryDependencies ++= Seq(
        "com.typesafe.play" %% "anorm-akka" % "version",
        "com.typesafe.akka" %% "akka-stream" % "version")
      AkkaStream.source(SQL"SELECT * FROM Test", SqlParser.scalar[String], ColumnAliaser.empty): Source[String]
      Brings minimal transitive dependencies (!)
  • 15. Road to production
      ● Retry on S3 errors and failures
      ● Handle possible problems during stream execution (e.g. a failure talking to the DB)
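
The retry item on the last slide can be illustrated without any S3 or Akka dependency. What follows is a minimal sketch under stated assumptions, not the implementation from the talk: `retry`, `FlakyUploader` and the format of the returned tag are hypothetical names introduced here, and the simulated failure stands in for a real S3 client call.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._
import scala.util.control.NonFatal
import java.util.concurrent.atomic.AtomicInteger

object RetrySketch {
  // Hypothetical helper: run `op` up to `maxAttempts` times in total.
  // Once the attempts are exhausted, the last failure is propagated.
  def retry[T](maxAttempts: Int)(op: () => Future[T])(
      implicit ec: ExecutionContext): Future[T] =
    op().recoverWith {
      case NonFatal(_) if maxAttempts > 1 => retry(maxAttempts - 1)(op)
    }

  // Stand-in for an S3 part upload (assumption: not a real client).
  // The first `failuresBeforeSuccess` calls fail, later calls succeed.
  final class FlakyUploader(failuresBeforeSuccess: Int) {
    private val calls = new AtomicInteger(0)

    // Number of upload attempts made so far.
    def attempts: Int = calls.get()

    def uploadChunk(partNumber: Int, data: Array[Byte])(
        implicit ec: ExecutionContext): Future[String] =
      Future {
        if (calls.incrementAndGet() <= failuresBeforeSuccess)
          throw new RuntimeException(s"simulated S3 failure on part $partNumber")
        s"etag-$partNumber-${data.length}" // made-up tag format
      }
  }
}
```

Wrapping only the per-chunk upload call in such a helper keeps the retry local to one part, so a transient failure does not force the whole stream to restart. A production version would likely also want a delay between attempts and a filter on which errors are worth retrying.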