Stream Processing In Go

In this presentation for GopherCon India 2016, we discuss a generalized idea for easily composing any stream of data by implementing only a one-method interface for data collections.

  1. Stream Processing In Go
     Khosrow Afroozeh, Sunil Sayyaparaju
  2. Streams are the Norm
     ● Need for Business Analytics generates endless streams of data
     ● Horizontal Scaling adds to the number of streams
     ● Stream variety is on the rise
     ● Streams need to be composed and co-processed
  3. Stream
     ● Arrays
     ● Slices
     ● Channels
     ● Buffers
     ● Files
     ● Database Queries
     ● ...
  4. Stream Elements
     No generics in Go, so stream elements are boxed objects: interface{}
     ● There is no type safety for generic stream processing.
     ● Not a big deal really; schemaless data sources return interfaces anyway.
     ● It can be easily managed by runtime type checking in the first step of the pipeline.
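
The runtime type check mentioned on this slide could look roughly like the sketch below. The helper names `asInt` and `sumInts` are hypothetical and are not taken from the talk; the point is only that a type assertion in the first stage lets later stages assume a concrete type.

```go
package main

import "fmt"

// asInt is a hypothetical first pipeline stage: it unboxes an interface{}
// element and reports whether the element actually held an int.
func asInt(v interface{}) (int, bool) {
	n, ok := v.(int)
	return n, ok
}

// sumInts skips every element that is not an int and sums the rest.
func sumInts(stream []interface{}) int {
	total := 0
	for _, v := range stream {
		if n, ok := asInt(v); ok {
			total += n
		}
	}
	return total
}

func main() {
	// A schemaless source may mix types; only 1, 3 and 5 are ints here.
	stream := []interface{}{1, "two", 3, 4.5, 5}
	fmt.Println(sumInts(stream)) // prints 9
}
```

Elements of the wrong type are silently dropped in this sketch; a real pipeline might instead report them as errors.
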
  5. Classic Collections
  6. Traditional Compositions 1
     stream 1 <Record>, stream 2 <Cloud>
     stream1.Join(stream2).Filter(...)
     API Interface Problem
  7. Traditional Compositions 2
     stream 1 <Record>, stream 2 <Cloud>
     Join(stream1, stream2) → stream3 → Filter(stream3, ...)
     Lots of Gophers Needed for Pipelining, Signature Problem Still Unsolved
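
To illustrate why the function-style composition needs "lots of gophers", here is a minimal channel-based sketch in which every stage runs in its own goroutine. The names `source`, `filterStage`, `mapStage` and `collect` are assumptions made for this example, not the API shown in the talk.

```go
package main

import "fmt"

// source emits the given values on a channel from its own goroutine.
func source(vals ...interface{}) <-chan interface{} {
	out := make(chan interface{})
	go func() {
		defer close(out)
		for _, v := range vals {
			out <- v
		}
	}()
	return out
}

// filterStage starts one goroutine that forwards only elements accepted by f.
func filterStage(in <-chan interface{}, f func(interface{}) bool) <-chan interface{} {
	out := make(chan interface{})
	go func() {
		defer close(out)
		for v := range in {
			if f(v) {
				out <- v
			}
		}
	}()
	return out
}

// mapStage starts one goroutine that applies m to every element.
func mapStage(in <-chan interface{}, m func(interface{}) interface{}) <-chan interface{} {
	out := make(chan interface{})
	go func() {
		defer close(out)
		for v := range in {
			out <- m(v)
		}
	}()
	return out
}

// collect drains a channel into a slice.
func collect(in <-chan interface{}) []interface{} {
	var res []interface{}
	for v := range in {
		res = append(res, v)
	}
	return res
}

func main() {
	isEven := func(v interface{}) bool { return v.(int)%2 == 0 }
	double := func(v interface{}) interface{} { return v.(int) * 2 }
	// Three goroutines and three channels for a two-step pipeline.
	fmt.Println(collect(mapStage(filterStage(source(1, 2, 3, 4), isEven), double))) // prints [4 8]
}
```

Each additional step adds another goroutine and another channel hand-off per element, which is the cost the later slides aim to avoid.
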
  8. Problem
     ● Don’t want to code [1] unless absolutely necessary
     ● Don’t want to repeat ourselves
     ● More code leads to more maintenance and testing
     [1] Not on company hours at least! YMMV.
  9. Abstraction Goals
     ● Data processing should be decoupled from data structures.
     ● Compositions should happen on data, not data structures.
     Note: <T> denotes type. This is not valid Go code.
     Note: f and m are functions, e.g.:
       f(value interface{}) bool
       m(value interface{}) interface{}
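
One way to read these goals is sketched below: a collection implements a single method, and the processing code is written once against that method. The interface `Stream`, its method `Each`, the adapters `SliceStream` and `ChanStream`, and the helper `filterMap` are all hypothetical names chosen for this sketch; the interface actually used in the talk is not part of this transcript.

```go
package main

import "fmt"

// Stream is a hypothetical one-method interface: Each feeds elements to
// yield until the data runs out or yield returns false.
type Stream interface {
	Each(yield func(v interface{}) bool)
}

// SliceStream adapts a slice to Stream.
type SliceStream []interface{}

func (s SliceStream) Each(yield func(v interface{}) bool) {
	for _, v := range s {
		if !yield(v) {
			return
		}
	}
}

// ChanStream adapts a receive-only channel to Stream.
type ChanStream <-chan interface{}

func (c ChanStream) Each(yield func(v interface{}) bool) {
	for v := range c {
		if !yield(v) {
			return
		}
	}
}

// filterMap keeps elements accepted by f, transforms them with m, and
// collects the results. It knows nothing about slices or channels.
func filterMap(s Stream, f func(interface{}) bool, m func(interface{}) interface{}) []interface{} {
	var out []interface{}
	s.Each(func(v interface{}) bool {
		if f(v) {
			out = append(out, m(v))
		}
		return true
	})
	return out
}

func main() {
	isEven := func(v interface{}) bool { return v.(int)%2 == 0 }
	square := func(v interface{}) interface{} { return v.(int) * v.(int) }

	ch := make(chan interface{}, 4)
	for i := 1; i <= 4; i++ {
		ch <- i
	}
	close(ch)

	fmt.Println(filterMap(SliceStream{1, 2, 3, 4}, isEven, square)) // prints [4 16]
	fmt.Println(filterMap(ChanStream(ch), isEven, square))          // prints [4 16]
}
```

The same `f` and `m` run unchanged over a slice and over a channel, which is the decoupling the slide asks for.
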
  10. Abstraction Goals Cont’d
      ● Data should not be transported during transformation, unless necessary.
  11. Transducers [1]
      [1] Idea inspired by Clojure. Fair enough, they got inspired by channels ;)
  12. Transducers Impl.
  13. Reducer
      ● Responsible for chaining of the pipeline: stream → t1 → t2 → … → tn → reducer → result
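
The slide code for the implementation is not included in this transcript. As a rough illustration of the chain stream → t1 → t2 → reducer → result, here is a sketch in the style of Clojure transducers; the types `ReduceFn` and `Transducer` and the functions `Filter`, `Map`, `Compose` and `Transduce` are assumed names, not the authors' API.

```go
package main

import "fmt"

// ReduceFn folds one element into an accumulator.
type ReduceFn func(acc, v interface{}) interface{}

// Transducer wraps a reducing function with one processing step.
type Transducer func(next ReduceFn) ReduceFn

// Filter passes an element on to next only when f accepts it.
func Filter(f func(interface{}) bool) Transducer {
	return func(next ReduceFn) ReduceFn {
		return func(acc, v interface{}) interface{} {
			if f(v) {
				return next(acc, v)
			}
			return acc
		}
	}
}

// Map transforms each element with m before passing it on to next.
func Map(m func(interface{}) interface{}) Transducer {
	return func(next ReduceFn) ReduceFn {
		return func(acc, v interface{}) interface{} {
			return next(acc, m(v))
		}
	}
}

// Compose chains transducers so that elements flow through ts[0] first.
func Compose(ts ...Transducer) Transducer {
	return func(next ReduceFn) ReduceFn {
		for i := len(ts) - 1; i >= 0; i-- {
			next = ts[i](next)
		}
		return next
	}
}

// Transduce pushes every element through the pipeline into the reducer.
func Transduce(stream []interface{}, t Transducer, reducer ReduceFn, init interface{}) interface{} {
	step := t(reducer)
	acc := init
	for _, v := range stream {
		acc = step(acc, v)
	}
	return acc
}

func main() {
	isEven := func(v interface{}) bool { return v.(int)%2 == 0 }
	square := func(v interface{}) interface{} { return v.(int) * v.(int) }
	sum := func(acc, v interface{}) interface{} { return acc.(int) + v.(int) }

	pipeline := Compose(Filter(isEven), Map(square))
	fmt.Println(Transduce([]interface{}{1, 2, 3, 4}, pipeline, sum, 0)) // prints 20
}
```

The whole pipeline runs as nested function calls in a single goroutine, and no intermediate collection is built between the steps.
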
  14. Transducers Impl. Example
  15. Transduction
      ● Flush is used when some function in the chain would like to eject the operation.
      ● When all the data in the stream has been processed or a flush has been requested, the method Complete() is called to capture the state in the stateful reducers.
      The functions in the chain call each other: f, m => m(f(val))
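
The sketch below shows one possible reading of flush and Complete(), treating a flush as a request to stop feeding the pipeline early. Only the method name `Complete` comes from the slide; the `Reducer` interface, its `Step` method, and the types `sumReducer` and `takeN` are assumptions made for illustration.

```go
package main

import "fmt"

// Reducer is a hypothetical stateful sink. Step consumes one element and
// returns true to request a flush; Complete returns the captured state.
type Reducer interface {
	Step(v interface{}) (flush bool)
	Complete() interface{}
}

// sumReducer keeps a running total as its state.
type sumReducer struct{ total int }

func (s *sumReducer) Step(v interface{}) bool {
	s.total += v.(int)
	return false
}

func (s *sumReducer) Complete() interface{} { return s.total }

// takeN forwards at most n elements to next, then requests a flush.
type takeN struct {
	n, seen int
	next    Reducer
}

func (t *takeN) Step(v interface{}) bool {
	if t.seen >= t.n {
		return true
	}
	t.seen++
	return t.next.Step(v) || t.seen >= t.n
}

func (t *takeN) Complete() interface{} { return t.next.Complete() }

// run feeds the stream until it is exhausted or a flush is requested,
// then calls Complete to collect the result.
func run(stream []interface{}, r Reducer) interface{} {
	for _, v := range stream {
		if r.Step(v) {
			break
		}
	}
	return r.Complete()
}

func main() {
	r := &takeN{n: 3, next: &sumReducer{}}
	fmt.Println(run([]interface{}{1, 2, 3, 4, 5}, r)) // prints 6
}
```

Here `takeN` requests the flush after the third element, so the elements 4 and 5 are never read from the stream.
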
  16. Example
  17. Observations
      ● Cons
        – No compile-time type safety
        – Tricky to parallelize
      ● Pros
        – Fewer goroutines for long pipelines
        – Fewer synchronizations for channels
        – Potentially uses less memory
        – Decoupled processing logic from data structures
        – Better compositions
        – More readable
  18. Thank You
      Khosrow Afroozeh:
      ● @parshua
      ● khosrow@aerospike.com
      Sunil Sayyaparaju:
      ● sunil@aerospike.com
