Streaming Data with
scalaz-stream
Gary Coady
gcoady@gilt.com
Contents
• Why do we want streaming APIs?
• Introduction to scalaz-stream
• Use case: Server-Sent Events implementation
Why do we want
streaming APIs?
Information with
Indeterminate/unbounded size
• Lines from a text file
• Bytes from a binary file
• Chunks of data from a TCP connection
• TCP connections
• Data from Kinesis or SQS or SNS or Kafka or…
• Data from an API with paged implementation
“Dangerous” Choices
• scala.collection.Iterable

Provides an iterator to step through items in
sequence
• scala.collection.immutable.Stream

Lazily evaluated, possibly infinite list of values
Do The Right Thing
• Safe setup and cleanup
• Constant memory usage
• Constant stack usage
• Refactor with confidence
• Composable
• Back-pressure
What is scalaz-stream
• Creates co-data
• Safe resource management
• Referential transparency
• Controlled asynchronous effects
[Diagram labels: User code; Process.await; “Waiting” for callback; User code; Callback]
sealed trait Process[+F[_], +O]   // F: effect, O: output

case class Halt(cause: Cause) extends Process[Nothing, Nothing]

case class Emit[+O](seq: Seq[O]) extends Process[Nothing, O]

case class Await[+F[_], A, +O](
  req: F[A],
  rcv: (EarlyCause \/ A) => Process[F, O]
) extends Process[F, O]
Composition Options
Process1[I, O]
  - Stateful transducer, converts I => O (with state)
  - Combine with “pipe”
Channel[F[_], I, O]
  - Takes I values, runs function I => F[O]
  - Combine with “through” or “observe”.
Sink[F[_], I]
  - Takes I values, runs function I => F[Unit]
  - Add with “to”.
Implementing
Server-sent Events (SSE)
This specification defines an API for
opening an HTTP connection for
receiving push notifications from a
server in the form of DOM events.
case class SSEEvent(eventName: Option[String], data: String)
Example streams

data: This is the first message.

data: This is the second message, it
data: has two lines.

data: This is the third message.

event: add
data: 73857293

event: remove
data: 2153

event: add
data: 113411
We want this type:
  Process[Task, SSEEvent]
“A potentially infinite stream of SSE event messages”
async.boundedQueue[A]
• Items added to queue are removed in same order
• Connect different asynchronous domains
• Methods:
    def enqueueOne(a: A): Task[Unit]
    def dequeue: Process[Task, A]
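As a rough analogy only, the sketch below uses the JDK's ArrayBlockingQueue rather than scalaz-stream's async.boundedQueue. It illustrates the two properties a bounded queue brings: first-in, first-out ordering, and a fixed capacity that pushes back on a producer that gets ahead of the consumer. The object and method names (BoundedQueueSketch, offerAll, drain) are made up for illustration.

```scala
import java.util.concurrent.ArrayBlockingQueue

object BoundedQueueSketch {
  // Offers each item without blocking and returns the items that were
  // rejected because the queue was already full.
  def offerAll[A <: AnyRef](queue: ArrayBlockingQueue[A], items: List[A]): List[A] =
    items.filterNot(item => queue.offer(item))

  // Removes everything currently queued, in FIFO order.
  // poll() returns null once the queue is empty.
  def drain[A <: AnyRef](queue: ArrayBlockingQueue[A]): List[A] =
    Iterator.continually(queue.poll()).takeWhile(_ != null).toList
}
```

With a capacity of 2, offering "a", "b", "c" leaves "c" rejected, and draining returns "a" then "b". A blocking put, or a Task that only completes once there is room, turns that rejection into back-pressure.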
HTTP Client
Implementation
• Use Apache AsyncHTTPClient
• Hook into onBodyPartReceived callback
• Use async.boundedQueue to convert chunks into
stream
def httpRequest(client: AsyncHttpClient, url: String):
    Process[Task, ByteVector] = {
  val contentQueue = async.boundedQueue[ByteVector](10)
  val req = client.prepareGet(url)
  req.execute(new AsyncCompletionHandler[Unit] {
    override def onBodyPartReceived(content: HttpResponseBodyPart) = {
      contentQueue.enqueueOne(
        ByteVector(content.getBodyByteBuffer)
      ).run
      super.onBodyPartReceived(content)
    }
  })
  contentQueue.dequeue
}
How to terminate
stream?
req.execute(new AsyncCompletionHandler[Unit] {
  ...
  override def onCompleted(r: Response): Unit = {
    logger.debug("Request completed")
    contentQueue.close.run
  }
  ...
})
How to terminate
stream with errors?
req.execute(new AsyncCompletionHandler[Unit] {
  ...
  override def onThrowable(t: Throwable): Unit = {
    logger.debug("Request failed with error", t)
    contentQueue.fail(t).run
  }
  ...
})
Process[Task, ByteVector]
Process[Task, SSEEvent]
Process[Task, Underpants]
Step 1
Step 2
Step 3
• Split at line endings
• Convert ByteVector into UTF-8 Strings
• Partition by SSE “tag” (“data”, “id”, “event”, …)
• Emit accumulated SSE data when blank line found
• Split at line endings
    ByteVector => Seq[ByteVector]
• Convert ByteVector into UTF-8 Strings
    ByteVector => String
• Partition by SSE “tag” (“data”, “id”, “event”, …)
    String => SSEMessage
• Emit accumulated SSE data when blank line found
    SSEMessage => SSEEvent
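The last two steps above can be sketched in plain Scala, without scalaz-stream, to show the accumulate-until-blank-line logic on its own. This is a simplified illustration under stated assumptions, not the talk's implementation. It works on an already-decoded List[String], folds the intermediate SSEMessage step into one pass, and handles only the "event" and "data" fields. The names SseParsingSketch and parseEvents are hypothetical.

```scala
object SseParsingSketch {
  final case class SSEEvent(eventName: Option[String], data: String)

  // Walks the lines once. "event:" and "data:" fields accumulate until a
  // blank line, which emits one event if any data was collected.
  def parseEvents(lines: List[String]): List[SSEEvent] = {
    // State: (pending event name, pending data lines reversed, finished events reversed)
    val start = (Option.empty[String], List.empty[String], List.empty[SSEEvent])
    val (_, _, done) = lines.foldLeft(start) {
      case ((name, data, out), line) =>
        if (line.isEmpty) {
          // Blank line: dispatch if there is data, then reset the buffers.
          if (data.isEmpty) (None, Nil, out)
          else (None, Nil, SSEEvent(name, data.reverse.mkString("\n")) :: out)
        } else {
          line.split(":", 2) match {
            case Array("event", v) => (Some(v.stripPrefix(" ")), data, out)
            case Array("data", v)  => (name, v.stripPrefix(" ") :: data, out)
            case _                 => (name, data, out) // other fields and comments are ignored here
          }
        }
    }
    done.reverse
  }
}
```

In a streaming setting the same state machine would live inside a stateful transducer, so events are emitted as soon as each blank line arrives instead of after the whole input is read.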
Handling Network Errors
• If a network error occurs:
• Sleep a while
• Set up the connection again and keep going
• Append the same Process definition again!
def sseStream: Process[Task, SSEEvent] = {
  httpRequest(client, url)
    .pipe(splitLines)
    .pipe(emitMessages)
    .pipe(emitEvents)
    .partialAttempt {
      case e: ConnectException => retryRequest
      case e: TimeoutException => retryRequest
    }
    .map(_.merge)
}

def retryRequest: Process[Task, SSEEvent] = {
  time.sleep(retryTime) ++ sseStream
}
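The same "sleep, then start the same stream again" idea can be illustrated with a plain Scala Iterator. This is only an analogy, assuming a source that can be reopened from scratch. It does not use scalaz-stream, and the names RetrySketch, resilient, open and retryMillis are hypothetical.

```scala
import java.net.ConnectException
import java.util.concurrent.TimeoutException

object RetrySketch {
  // Wraps a re-openable source. When the current source throws a connection
  // or timeout error, it waits retryMillis and continues with a fresh source.
  def resilient[A](retryMillis: Long)(open: () => Iterator[A]): Iterator[A] =
    new Iterator[A] {
      private var current: Option[Iterator[A]] = None // None means (re)connect needed
      private var buffered: Option[A] = None
      private var finished = false

      def hasNext: Boolean = {
        while (buffered.isEmpty && !finished) {
          try {
            val src = current.getOrElse {
              val opened = open()
              current = Some(opened)
              opened
            }
            if (src.hasNext) buffered = Some(src.next()) else finished = true
          } catch {
            case _: ConnectException | _: TimeoutException =>
              Thread.sleep(retryMillis)
              current = None // drop the failed source; the next loop turn reopens
          }
        }
        buffered.isDefined
      }

      def next(): A = {
        if (!hasNext) throw new NoSuchElementException("stream ended")
        val a = buffered.get
        buffered = None
        a
      }
    }
}
```

Unlike the Process version, this sketch blocks the calling thread while it sleeps and relies on mutable state. Avoiding both is part of what the Process definition above provides.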
Usage
sseStream(client, url) pipe jsonToString to io.stdOutLines
Questions?

More Related Content

What's hot

Introduction to Structured Streaming
Introduction to Structured StreamingIntroduction to Structured Streaming
Introduction to Structured StreamingKnoldus Inc.
 
Distributed Real-Time Stream Processing: Why and How 2.0
Distributed Real-Time Stream Processing:  Why and How 2.0Distributed Real-Time Stream Processing:  Why and How 2.0
Distributed Real-Time Stream Processing: Why and How 2.0Petr Zapletal
 
Building Scalable and Extendable Data Pipeline for Call of Duty Games: Lesson...
Building Scalable and Extendable Data Pipeline for Call of Duty Games: Lesson...Building Scalable and Extendable Data Pipeline for Call of Duty Games: Lesson...
Building Scalable and Extendable Data Pipeline for Call of Duty Games: Lesson...Yaroslav Tkachenko
 
Multi dimension aggregations using spark and dataframes
Multi dimension aggregations using spark and dataframesMulti dimension aggregations using spark and dataframes
Multi dimension aggregations using spark and dataframesRomi Kuntsman
 
"How about no grep and zabbix?". ELK based alerts and metrics.
"How about no grep and zabbix?". ELK based alerts and metrics."How about no grep and zabbix?". ELK based alerts and metrics.
"How about no grep and zabbix?". ELK based alerts and metrics.Vladimir Pavkin
 
Stream processing - Apache flink
Stream processing - Apache flinkStream processing - Apache flink
Stream processing - Apache flinkRenato Guimaraes
 
Arbitrary Stateful Aggregations using Structured Streaming in Apache Spark
Arbitrary Stateful Aggregations using Structured Streaming in Apache SparkArbitrary Stateful Aggregations using Structured Streaming in Apache Spark
Arbitrary Stateful Aggregations using Structured Streaming in Apache SparkDatabricks
 
Flink Forward SF 2017: David Hardwick, Sean Hester & David Brelloch - Dynami...
Flink Forward SF 2017: David Hardwick, Sean Hester & David Brelloch -  Dynami...Flink Forward SF 2017: David Hardwick, Sean Hester & David Brelloch -  Dynami...
Flink Forward SF 2017: David Hardwick, Sean Hester & David Brelloch - Dynami...Flink Forward
 
Introduction to Akka-Streams
Introduction to Akka-StreamsIntroduction to Akka-Streams
Introduction to Akka-Streamsdmantula
 
RMLL 2014 - LDAP Synchronization Connector
RMLL 2014 - LDAP Synchronization ConnectorRMLL 2014 - LDAP Synchronization Connector
RMLL 2014 - LDAP Synchronization ConnectorClément OUDOT
 
Introduction of Blockchain @ Airtel Payment Bank
Introduction of Blockchain @ Airtel Payment BankIntroduction of Blockchain @ Airtel Payment Bank
Introduction of Blockchain @ Airtel Payment BankRajesh Kumar
 
Developing a Real-time Engine with Akka, Cassandra, and Spray
Developing a Real-time Engine with Akka, Cassandra, and SprayDeveloping a Real-time Engine with Akka, Cassandra, and Spray
Developing a Real-time Engine with Akka, Cassandra, and SprayJacob Park
 
Ldap Synchronization Connector @ 2011.RMLL
Ldap Synchronization Connector @ 2011.RMLLLdap Synchronization Connector @ 2011.RMLL
Ldap Synchronization Connector @ 2011.RMLLsbahloul
 
Akka Microservices Architecture And Design
Akka Microservices Architecture And DesignAkka Microservices Architecture And Design
Akka Microservices Architecture And DesignYaroslav Tkachenko
 
Distributed Stream Processing - Spark Summit East 2017
Distributed Stream Processing - Spark Summit East 2017Distributed Stream Processing - Spark Summit East 2017
Distributed Stream Processing - Spark Summit East 2017Petr Zapletal
 
Data Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageData Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageSATOSHI TAGOMORI
 

What's hot (20)

Introduction to Structured Streaming
Introduction to Structured StreamingIntroduction to Structured Streaming
Introduction to Structured Streaming
 
Javantura v3 - Logs – the missing gold mine – Franjo Žilić
Javantura v3 - Logs – the missing gold mine – Franjo ŽilićJavantura v3 - Logs – the missing gold mine – Franjo Žilić
Javantura v3 - Logs – the missing gold mine – Franjo Žilić
 
Spark streaming: Best Practices
Spark streaming: Best PracticesSpark streaming: Best Practices
Spark streaming: Best Practices
 
Distributed Real-Time Stream Processing: Why and How 2.0
Distributed Real-Time Stream Processing:  Why and How 2.0Distributed Real-Time Stream Processing:  Why and How 2.0
Distributed Real-Time Stream Processing: Why and How 2.0
 
Building Scalable and Extendable Data Pipeline for Call of Duty Games: Lesson...
Building Scalable and Extendable Data Pipeline for Call of Duty Games: Lesson...Building Scalable and Extendable Data Pipeline for Call of Duty Games: Lesson...
Building Scalable and Extendable Data Pipeline for Call of Duty Games: Lesson...
 
Multi dimension aggregations using spark and dataframes
Multi dimension aggregations using spark and dataframesMulti dimension aggregations using spark and dataframes
Multi dimension aggregations using spark and dataframes
 
ADO.NETObjects
ADO.NETObjectsADO.NETObjects
ADO.NETObjects
 
"How about no grep and zabbix?". ELK based alerts and metrics.
"How about no grep and zabbix?". ELK based alerts and metrics."How about no grep and zabbix?". ELK based alerts and metrics.
"How about no grep and zabbix?". ELK based alerts and metrics.
 
Stream processing - Apache flink
Stream processing - Apache flinkStream processing - Apache flink
Stream processing - Apache flink
 
Arbitrary Stateful Aggregations using Structured Streaming in Apache Spark
Arbitrary Stateful Aggregations using Structured Streaming in Apache SparkArbitrary Stateful Aggregations using Structured Streaming in Apache Spark
Arbitrary Stateful Aggregations using Structured Streaming in Apache Spark
 
Flink Forward SF 2017: David Hardwick, Sean Hester & David Brelloch - Dynami...
Flink Forward SF 2017: David Hardwick, Sean Hester & David Brelloch -  Dynami...Flink Forward SF 2017: David Hardwick, Sean Hester & David Brelloch -  Dynami...
Flink Forward SF 2017: David Hardwick, Sean Hester & David Brelloch - Dynami...
 
Introduction to Akka-Streams
Introduction to Akka-StreamsIntroduction to Akka-Streams
Introduction to Akka-Streams
 
RMLL 2014 - LDAP Synchronization Connector
RMLL 2014 - LDAP Synchronization ConnectorRMLL 2014 - LDAP Synchronization Connector
RMLL 2014 - LDAP Synchronization Connector
 
Introduction of Blockchain @ Airtel Payment Bank
Introduction of Blockchain @ Airtel Payment BankIntroduction of Blockchain @ Airtel Payment Bank
Introduction of Blockchain @ Airtel Payment Bank
 
Developing a Real-time Engine with Akka, Cassandra, and Spray
Developing a Real-time Engine with Akka, Cassandra, and SprayDeveloping a Real-time Engine with Akka, Cassandra, and Spray
Developing a Real-time Engine with Akka, Cassandra, and Spray
 
Ldap Synchronization Connector @ 2011.RMLL
Ldap Synchronization Connector @ 2011.RMLLLdap Synchronization Connector @ 2011.RMLL
Ldap Synchronization Connector @ 2011.RMLL
 
Akka Microservices Architecture And Design
Akka Microservices Architecture And DesignAkka Microservices Architecture And Design
Akka Microservices Architecture And Design
 
Distributed Stream Processing - Spark Summit East 2017
Distributed Stream Processing - Spark Summit East 2017Distributed Stream Processing - Spark Summit East 2017
Distributed Stream Processing - Spark Summit East 2017
 
Data Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageData Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby Usage
 
Akka streams
Akka streamsAkka streams
Akka streams
 

Viewers also liked

Unsucking Error Handling with Futures
Unsucking Error Handling with FuturesUnsucking Error Handling with Futures
Unsucking Error Handling with FuturesGaryCoady
 
Contents page analysis
Contents page analysisContents page analysis
Contents page analysisandreidanca
 
Http4s, Doobie and Circe: The Functional Web Stack
Http4s, Doobie and Circe: The Functional Web StackHttp4s, Doobie and Circe: The Functional Web Stack
Http4s, Doobie and Circe: The Functional Web StackGaryCoady
 
AMR Medicion de Agua Potable "Medidores Ultrasonicos"
AMR Medicion de Agua Potable "Medidores Ultrasonicos"AMR Medicion de Agua Potable "Medidores Ultrasonicos"
AMR Medicion de Agua Potable "Medidores Ultrasonicos"Wilmer Troconis
 
Unit 3 - Egyptian art and architecture
Unit 3 - Egyptian art and architectureUnit 3 - Egyptian art and architecture
Unit 3 - Egyptian art and architectureJaimeAlonsoEdu
 
Custom deployments with sbt-native-packager
Custom deployments with sbt-native-packagerCustom deployments with sbt-native-packager
Custom deployments with sbt-native-packagerGaryCoady
 
Puntos mes de Septiembre
Puntos mes de SeptiembrePuntos mes de Septiembre
Puntos mes de SeptiembreRPCard
 
Amber CV Feb 2017
Amber CV Feb 2017Amber CV Feb 2017
Amber CV Feb 2017Amber Leis
 
Thrust Bearing
Thrust BearingThrust Bearing
Thrust Bearingalexcostea
 
Unit 1- Carolingian art
Unit 1- Carolingian artUnit 1- Carolingian art
Unit 1- Carolingian artJaimeAlonsoEdu
 
Unit 3 - Romanesque art
Unit 3 - Romanesque artUnit 3 - Romanesque art
Unit 3 - Romanesque artJaimeAlonsoEdu
 

Viewers also liked (14)

Unsucking Error Handling with Futures
Unsucking Error Handling with FuturesUnsucking Error Handling with Futures
Unsucking Error Handling with Futures
 
Contents page analysis
Contents page analysisContents page analysis
Contents page analysis
 
Wilmer2015.01.30
Wilmer2015.01.30Wilmer2015.01.30
Wilmer2015.01.30
 
Http4s, Doobie and Circe: The Functional Web Stack
Http4s, Doobie and Circe: The Functional Web StackHttp4s, Doobie and Circe: The Functional Web Stack
Http4s, Doobie and Circe: The Functional Web Stack
 
AMR Medicion de Agua Potable "Medidores Ultrasonicos"
AMR Medicion de Agua Potable "Medidores Ultrasonicos"AMR Medicion de Agua Potable "Medidores Ultrasonicos"
AMR Medicion de Agua Potable "Medidores Ultrasonicos"
 
Unit 3 - Egyptian art and architecture
Unit 3 - Egyptian art and architectureUnit 3 - Egyptian art and architecture
Unit 3 - Egyptian art and architecture
 
Custom deployments with sbt-native-packager
Custom deployments with sbt-native-packagerCustom deployments with sbt-native-packager
Custom deployments with sbt-native-packager
 
Puntos mes de Septiembre
Puntos mes de SeptiembrePuntos mes de Septiembre
Puntos mes de Septiembre
 
Amber CV Feb 2017
Amber CV Feb 2017Amber CV Feb 2017
Amber CV Feb 2017
 
Wilmer CV
Wilmer CVWilmer CV
Wilmer CV
 
Thrust Bearing
Thrust BearingThrust Bearing
Thrust Bearing
 
Unit 1- Carolingian art
Unit 1- Carolingian artUnit 1- Carolingian art
Unit 1- Carolingian art
 
Unit 3 - Romanesque art
Unit 3 - Romanesque artUnit 3 - Romanesque art
Unit 3 - Romanesque art
 
Unit 2 - Islamic art
Unit 2 - Islamic artUnit 2 - Islamic art
Unit 2 - Islamic art
 

Similar to Streaming Data with scalaz-stream

Building Eventing Systems for Microservice Architecture
Building Eventing Systems for Microservice Architecture  Building Eventing Systems for Microservice Architecture
Building Eventing Systems for Microservice Architecture Yaroslav Tkachenko
 
Actors or Not: Async Event Architectures
Actors or Not: Async Event ArchitecturesActors or Not: Async Event Architectures
Actors or Not: Async Event ArchitecturesYaroslav Tkachenko
 
Parallel Processing
Parallel ProcessingParallel Processing
Parallel ProcessingRTigger
 
Working with data using Azure Functions.pdf
Working with data using Azure Functions.pdfWorking with data using Azure Functions.pdf
Working with data using Azure Functions.pdfStephanie Locke
 
Building Continuous Application with Structured Streaming and Real-Time Data ...
Building Continuous Application with Structured Streaming and Real-Time Data ...Building Continuous Application with Structured Streaming and Real-Time Data ...
Building Continuous Application with Structured Streaming and Real-Time Data ...Databricks
 
Streaming Operational Data with MariaDB MaxScale
Streaming Operational Data with MariaDB MaxScaleStreaming Operational Data with MariaDB MaxScale
Streaming Operational Data with MariaDB MaxScaleMariaDB plc
 
Apache Flink @ Tel Aviv / Herzliya Meetup
Apache Flink @ Tel Aviv / Herzliya MeetupApache Flink @ Tel Aviv / Herzliya Meetup
Apache Flink @ Tel Aviv / Herzliya MeetupRobert Metzger
 
How we evolved data pipeline at Celtra and what we learned along the way
How we evolved data pipeline at Celtra and what we learned along the wayHow we evolved data pipeline at Celtra and what we learned along the way
How we evolved data pipeline at Celtra and what we learned along the wayGrega Kespret
 
Using akka streams to access s3 objects
Using akka streams to access s3 objectsUsing akka streams to access s3 objects
Using akka streams to access s3 objectsMikhail Girkin
 
DjangoCon 2010 Scaling Disqus
DjangoCon 2010 Scaling DisqusDjangoCon 2010 Scaling Disqus
DjangoCon 2010 Scaling Disquszeeg
 
Marmagna desai
Marmagna desaiMarmagna desai
Marmagna desaijmsthakur
 
Stream and Batch Processing in the Cloud with Data Microservices
Stream and Batch Processing in the Cloud with Data MicroservicesStream and Batch Processing in the Cloud with Data Microservices
Stream and Batch Processing in the Cloud with Data Microservicesmarius_bogoevici
 
Nyc big datagenomics-pizarroa-sept2017
Nyc big datagenomics-pizarroa-sept2017Nyc big datagenomics-pizarroa-sept2017
Nyc big datagenomics-pizarroa-sept2017delagoya
 
Boundary Front end tech talk: how it works
Boundary Front end tech talk: how it worksBoundary Front end tech talk: how it works
Boundary Front end tech talk: how it worksBoundary
 
Kafka Summit SF 2017 - Kafka Stream Processing for Everyone with KSQL
Kafka Summit SF 2017 - Kafka Stream Processing for Everyone with KSQLKafka Summit SF 2017 - Kafka Stream Processing for Everyone with KSQL
Kafka Summit SF 2017 - Kafka Stream Processing for Everyone with KSQLconfluent
 
Scaling asp.net websites to millions of users
Scaling asp.net websites to millions of usersScaling asp.net websites to millions of users
Scaling asp.net websites to millions of usersoazabir
 
Prezo tooracleteam (2)
Prezo tooracleteam (2)Prezo tooracleteam (2)
Prezo tooracleteam (2)Sharma Podila
 

Similar to Streaming Data with scalaz-stream (20)

Building Eventing Systems for Microservice Architecture
Building Eventing Systems for Microservice Architecture  Building Eventing Systems for Microservice Architecture
Building Eventing Systems for Microservice Architecture
 
Actors or Not: Async Event Architectures
Actors or Not: Async Event ArchitecturesActors or Not: Async Event Architectures
Actors or Not: Async Event Architectures
 
Parallel Processing
Parallel ProcessingParallel Processing
Parallel Processing
 
Working with data using Azure Functions.pdf
Working with data using Azure Functions.pdfWorking with data using Azure Functions.pdf
Working with data using Azure Functions.pdf
 
Windows 8 Apps and the Outside World
Windows 8 Apps and the Outside WorldWindows 8 Apps and the Outside World
Windows 8 Apps and the Outside World
 
Building Continuous Application with Structured Streaming and Real-Time Data ...
Building Continuous Application with Structured Streaming and Real-Time Data ...Building Continuous Application with Structured Streaming and Real-Time Data ...
Building Continuous Application with Structured Streaming and Real-Time Data ...
 
Streaming Operational Data with MariaDB MaxScale
Streaming Operational Data with MariaDB MaxScaleStreaming Operational Data with MariaDB MaxScale
Streaming Operational Data with MariaDB MaxScale
 
Apache Flink @ Tel Aviv / Herzliya Meetup
Apache Flink @ Tel Aviv / Herzliya MeetupApache Flink @ Tel Aviv / Herzliya Meetup
Apache Flink @ Tel Aviv / Herzliya Meetup
 
How we evolved data pipeline at Celtra and what we learned along the way
How we evolved data pipeline at Celtra and what we learned along the wayHow we evolved data pipeline at Celtra and what we learned along the way
How we evolved data pipeline at Celtra and what we learned along the way
 
Using akka streams to access s3 objects
Using akka streams to access s3 objectsUsing akka streams to access s3 objects
Using akka streams to access s3 objects
 
DjangoCon 2010 Scaling Disqus
DjangoCon 2010 Scaling DisqusDjangoCon 2010 Scaling Disqus
DjangoCon 2010 Scaling Disqus
 
Marmagna desai
Marmagna desaiMarmagna desai
Marmagna desai
 
Stream and Batch Processing in the Cloud with Data Microservices
Stream and Batch Processing in the Cloud with Data MicroservicesStream and Batch Processing in the Cloud with Data Microservices
Stream and Batch Processing in the Cloud with Data Microservices
 
Nyc big datagenomics-pizarroa-sept2017
Nyc big datagenomics-pizarroa-sept2017Nyc big datagenomics-pizarroa-sept2017
Nyc big datagenomics-pizarroa-sept2017
 
Boundary Front end tech talk: how it works
Boundary Front end tech talk: how it worksBoundary Front end tech talk: how it works
Boundary Front end tech talk: how it works
 
Kafka Summit SF 2017 - Kafka Stream Processing for Everyone with KSQL
Kafka Summit SF 2017 - Kafka Stream Processing for Everyone with KSQLKafka Summit SF 2017 - Kafka Stream Processing for Everyone with KSQL
Kafka Summit SF 2017 - Kafka Stream Processing for Everyone with KSQL
 
Scaling asp.net websites to millions of users
Scaling asp.net websites to millions of usersScaling asp.net websites to millions of users
Scaling asp.net websites to millions of users
 
AWS IoT Deep Dive
AWS IoT Deep DiveAWS IoT Deep Dive
AWS IoT Deep Dive
 
Dapper performance
Dapper performanceDapper performance
Dapper performance
 
Prezo tooracleteam (2)
Prezo tooracleteam (2)Prezo tooracleteam (2)
Prezo tooracleteam (2)
 

Recently uploaded

Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...caitlingebhard1
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMKumar Satyam
 
Modernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using BallerinaModernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using BallerinaWSO2
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
Simplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxSimplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxMarkSteadman7
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Navigating Identity and Access Management in the Modern Enterprise
Navigating Identity and Access Management in the Modern EnterpriseNavigating Identity and Access Management in the Modern Enterprise
Navigating Identity and Access Management in the Modern EnterpriseWSO2
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....rightmanforbloodline
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
How to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cfHow to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cfdanishmna97
 
The Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightThe Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightSafe Software
 

Recently uploaded (20)

Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDM
 
Modernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using BallerinaModernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using Ballerina
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 

Streaming Data with scalaz-stream

  • 2. Contents
      • Why do we want streaming APIs?
      • Introduction to scalaz-stream
      • Use case: Server-Sent Events implementation
  • 3. Why do we want streaming APIs?
  • 4. Information with indeterminate/unbounded size
      • Lines from a text file
      • Bytes from a binary file
      • Chunks of data from a TCP connection
      • TCP connections
      • Data from Kinesis or SQS or SNS or Kafka or…
      • Data from an API with paged implementation
  • 5. “Dangerous” Choices
      • scala.collection.Iterable: provides an iterator to step through items in sequence
      • scala.collection.immutable.Stream: lazily evaluated, possibly infinite list of values
  • 6. Do The Right Thing
      • Safe setup and cleanup
      • Constant memory usage
      • Constant stack usage
      • Refactor with confidence
      • Composable
      • Back-pressure
  • 7. What is scalaz-stream
      • Creates co-data
      • Safe resource management
      • Referential transparency
      • Controlled asynchronous effects
  • 9. sealed trait Process[+F[_], +O]
      (F is the effect type, O is the output type)
  • 10. case class Halt(cause: Cause) extends Process[Nothing, Nothing]
  • 11. case class Emit[+O](seq: Seq[O]) extends Process[Nothing, O]
  • 12. case class Await[+F[_], A, +O](
        req: F[A],
        rcv: (EarlyCause \/ A) => Process[F, O]
      ) extends Process[F, O]
  • 13. Composition Options
      Process1[I, O]
        - Stateful transducer, converts I => O (with state)
        - Combine with “pipe”
      Channel[F[_], I, O]
        - Takes I values, runs function I => F[O]
        - Combine with “through” or “observe”
      Sink[F[_], I]
        - Takes I values, runs function I => F[Unit]
        - Add with “to”
  • 14. Implementing Server-sent Events (SSE)
      “This specification defines an API for opening an HTTP connection for receiving push notifications from a server in the form of DOM events.”
  • 15. Example streams
      case class SSEEvent(eventName: Option[String], data: String)

      Stream with data fields only:
        data: This is the first message.

        data: This is the second message, it
        data: has two lines.

        data: This is the third message.

      Stream with named events:
        event: add
        data: 73857293

        event: remove
        data: 2153

        event: add
        data: 113411
  • 16. We want this type:
      Process[Task, SSEEvent]
      “A potentially infinite stream of SSE event messages”
  • 17. async.boundedQueue[A]
      • Items added to queue are removed in same order
      • Connect different asynchronous domains
      • Methods:
          def enqueueOne(a: A): Task[Unit]
          def dequeue: Process[Task, A]
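
      The idea of a bounded queue connecting a producer and a consumer can be illustrated without scalaz-stream. The following is only an analogy built on java.util.concurrent.ArrayBlockingQueue, not the async.boundedQueue API; the names BoundedQueueSketch and transfer are made up for this sketch. It shows the two properties listed above: items come out in insertion order, and a full queue makes the producer wait (back-pressure).

```scala
import java.util.concurrent.ArrayBlockingQueue

object BoundedQueueSketch {
  // Hypothetical helper: a producer thread pushes `items` through a queue
  // holding at most `capacity` elements while the calling thread drains it.
  def transfer(items: Seq[Int], capacity: Int): Vector[Int] = {
    // None marks the end of the stream (an assumption of this sketch).
    val queue = new ArrayBlockingQueue[Option[Int]](capacity)

    val producer = new Thread(new Runnable {
      def run(): Unit = {
        items.foreach(i => queue.put(Some(i))) // put blocks while the queue is full
        queue.put(None)
      }
    })
    producer.start()

    val received = Vector.newBuilder[Int]
    var done = false
    while (!done) {
      queue.take() match { // take blocks while the queue is empty
        case Some(i) => received += i
        case None    => done = true
      }
    }
    producer.join()
    received.result()
  }
}
```

      In the scalaz-stream version, enqueueOne plays the role of put and dequeue plays the role of the draining loop, with the waiting expressed through Task rather than blocked threads.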
  • 18. HTTP Client Implementation
      • Use AsyncHttpClient
      • Hook into onBodyPartReceived callback
      • Use async.boundedQueue to convert chunks into stream
  • 19. def httpRequest(client: AsyncHttpClient, url: String):
          Process[Task, ByteVector] = {
        // Bounded queue (capacity 10) between the client's callbacks and the stream
        val contentQueue = async.boundedQueue[ByteVector](10)
        val req = client.prepareGet(url)
        req.execute(new AsyncCompletionHandler[Unit] {
          override def onBodyPartReceived(content: HttpResponseBodyPart) = {
            // Enqueue each body chunk; .run blocks until the chunk has been enqueued
            contentQueue.enqueueOne(
              ByteVector(content.getBodyByteBuffer)
            ).run
            super.onBodyPartReceived(content)
          }
        })
        // The queued chunks, exposed as a stream
        contentQueue.dequeue
      }
  • 21. req.execute(new AsyncCompletionHandler[Unit] {
        ...
        override def onCompleted(r: Response): Unit = {
          logger.debug("Request completed")
          contentQueue.close.run
        }
        ...
      }
  • 22. How to terminate stream with errors?
  • 23. req.execute(new AsyncCompletionHandler[Unit] {
        ...
        override def onThrowable(t: Throwable): Unit = {
          logger.debug("Request failed with error", t)
          contentQueue.fail(t).run
        }
        ...
      }
  • 25. • Split at line endings
      • Convert ByteVector into UTF-8 Strings
      • Partition by SSE “tag” (“data”, “id”, “event”, …)
      • Emit accumulated SSE data when blank line found
  • 26. • Split at line endings
          ByteVector => Seq[ByteVector]
      • Convert ByteVector into UTF-8 Strings
          ByteVector => String
      • Partition by SSE “tag” (“data”, “id”, “event”, …)
          String => SSEMessage
      • Emit accumulated SSE data when blank line found
          SSEMessage => SSEEvent
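
      The last two steps can be sketched in plain Scala, working on lines that are already split and decoded. This is a minimal sketch, not the deck's implementation: SSEEvent matches the case class shown earlier, but the shape of SSEMessage and the names SseSketch, parseLine and events are assumptions, since the slides do not define them. Only the “event” and “data” tags are handled.

```scala
object SseSketch {
  case class SSEEvent(eventName: Option[String], data: String)
  // Hypothetical shape: one "tag: value" line of the stream.
  case class SSEMessage(tag: String, value: String)

  // String => SSEMessage. Blank lines and comment lines (starting with ':')
  // carry no field, so they yield None.
  def parseLine(line: String): Option[SSEMessage] =
    if (line.isEmpty || line.startsWith(":")) None
    else {
      val idx = line.indexOf(':')
      if (idx < 0) Some(SSEMessage(line, ""))
      else {
        val raw = line.substring(idx + 1)
        // A single space after the colon is not part of the value.
        val value = if (raw.startsWith(" ")) raw.substring(1) else raw
        Some(SSEMessage(line.substring(0, idx), value))
      }
    }

  // SSEMessage => SSEEvent: accumulate fields, emit when a blank line is found.
  def events(lines: Seq[String]): Vector[SSEEvent] = {
    val out = Vector.newBuilder[SSEEvent]
    var eventName: Option[String] = None
    var data = Vector.empty[String]
    lines.foreach { line =>
      if (line.isEmpty) {
        if (data.nonEmpty) out += SSEEvent(eventName, data.mkString("\n"))
        eventName = None
        data = Vector.empty
      } else parseLine(line).foreach {
        case SSEMessage("event", v) => eventName = Some(v)
        case SSEMessage("data", v)  => data = data :+ v
        case _                      => () // other tags are ignored in this sketch
      }
    }
    out.result()
  }
}
```

      Fields that are not followed by a blank line are never emitted, which mirrors the "emit when blank line found" rule. In the stream version, the mutable accumulator would become the state of a Process1.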
  • 27. Handling Network Errors
      • If a network error occurs:
          • Sleep a while
          • Set up the connection again and keep going
      • Append the same Process definition again!
  • 28. def sseStream: Process[Task, SSEEvent] = {
        httpRequest(client, url)
          .pipe(splitLines)
          .pipe(emitMessages)
          .pipe(emitEvents)
          .partialAttempt {
            case e: ConnectException => retryRequest
            case e: TimeoutException => retryRequest
          }
          .map(_.merge)
      }

      def retryRequest: Process[Task, SSEEvent] = {
        time.sleep(retryTime) ++ sseStream
      }
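
      The "sleep, then set up the connection again" idea can also be shown with a plain Iterator. This is a rough sketch under stated assumptions, not scalaz-stream code: RetrySketch and resilient are made-up names, connect stands for whatever opens a fresh stream, and only a ConnectException raised while connecting or while checking for the next item triggers a retry.

```scala
import java.net.ConnectException

object RetrySketch {
  // Hypothetical helper: `connect` opens a fresh stream each time it is called.
  def resilient[A](retryMillis: Long)(connect: () => Iterator[A]): Iterator[A] =
    new Iterator[A] {
      private var current: Iterator[A] = Iterator.empty
      private var connected = false

      def hasNext: Boolean = {
        var result: Option[Boolean] = None
        while (result.isEmpty) {
          try {
            if (!connected) { current = connect(); connected = true }
            result = Some(current.hasNext)
          } catch {
            case _: ConnectException =>
              // Drop the failed connection, sleep a while, then try again.
              connected = false
              Thread.sleep(retryMillis)
          }
        }
        result.get
      }

      def next(): A =
        if (hasNext) current.next()
        else throw new NoSuchElementException("stream ended")
    }
}
```

      The loop does explicitly what the recursive definition does by appending the stream to itself after a sleep: each failure leads to a new connection attempt, with no limit on the number of retries.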
  • 29. Usage
      sseStream(client, url) pipe jsonToString to io.stdOutLines