Reactive Streams: Handling Data-Flow the Reactive Way

28,937 views

Published on

Building on the success of Reactive Extensions—first in Rx.NET and now in RxJava—we are taking Observers and Observables to the next level: by adding the capability of handling back-pressure between asynchronous execution stages we enable the distribution of stream processing across a cluster of potentially thousands of nodes. The project defines the common interfaces for interoperable stream implementations on the JVM and is the result of a collaboration between Twitter, Netflix, Pivotal, RedHat and Typesafe. In this presentation I introduce the guiding principles behind its design and show examples using the actor-based implementation in Akka.

Published in: Software, Technology

Reactive Streams: Handling Data-Flow the Reactive Way

  1. 1. Dr. Roland Kuhn Akka Tech Lead @rolandkuhn Reactive Streams Handling Data-Flows the Reactive Way
  2. 2. Introduction: Streams
  3. 3. What is a Stream? • ephemeral flow of data • focused on describing transformation • possibly unbounded in size 3
  4. 4. Common uses of Streams • bulk data transfer • real-time data sources • batch processing of large data sets • monitoring and analytics 4
  5. 5. What is special about Reactive Streams?
  6. 6. Reactive  Applications The Four Reactive Traits 6 http://reactivemanifesto.org/
  7. 7. Needed: Asynchrony • Resilience demands it: • encapsulation • isolation • Scalability demands it: • distribution across nodes • distribution across cores 7
  8. 8. Many Kinds of Async Boundaries 8
  9. 9. Many Kinds of Async Boundaries • between different applications 8
  10. 10. Many Kinds of Async Boundaries • between different applications • between network nodes 8
  11. 11. Many Kinds of Async Boundaries • between different applications • between network nodes • between CPUs 8
  12. 12. Many Kinds of Async Boundaries • between different applications • between network nodes • between CPUs • between threads 8
  13. 13. Many Kinds of Async Boundaries • between different applications • between network nodes • between CPUs • between threads • between actors 8
  14. 14. The Problem: ! Getting Data across an Async Boundary
  15. 15. Possible Solutions 10
  16. 16. Possible Solutions • the Traditional way: blocking calls 10
  17. 17. Possible Solutions 10
  18. 18. Possible Solutions ! • the Push way: buffering and/or dropping 11
  19. 19. Possible Solutions 11
  20. 20. Possible Solutions ! ! • the Reactive way:
 non-blocking & non-dropping & bounded 12
  21. 21. How do we achieve that?
  22. 22. Supply and Demand • data items flow downstream • demand flows upstream • data items flow only when there is demand • recipient is in control of incoming data rate • data in flight is bounded by signaled demand 14 Publisher Subscriber data demand
  23. 23. Dynamic Push–Pull • “push” behavior when consumer is faster • “pull” behavior when producer is faster • switches automatically between these • batching demand allows batching data 15 Publisher Subscriber data demand
  24. 24. Explicit Demand: Tailored Flow Control 16 demand data splitting the data means merging the demand
  25. 25. Explicit Demand: Tailored Flow Control 17 merging the data means splitting the demand
  26. 26. Reactive Streams • asynchronous non-blocking data flow • asynchronous non-blocking demand flow • minimal coordination and contention • message passing allows for distribution • across applications • across nodes • across CPUs • across threads • across actors 18
  27. 27. Are Streams Collections?
  28. 28. What is a Collection? 20
  29. 29. What is a Collection? • Oxford Dictionary: • “a group of things or people” 20
  30. 30. What is a Collection? • Oxford Dictionary: • “a group of things or people” • wikipedia: • “a grouping of some variable number of data items” 20
  31. 31. What is a Collection? • Oxford Dictionary: • “a group of things or people” • wikipedia: • “a grouping of some variable number of data items” • backbone.js: • “collections are simply an ordered set of models” 20
  32. 32. What is a Collection? • Oxford Dictionary: • “a group of things or people” • wikipedia: • “a grouping of some variable number of data items” • backbone.js: • “collections are simply an ordered set of models” • java.util.Collection: • definite size, provides an iterator, query membership 20
  33. 33. User Expectations • an Iterator is expected to visit all elements
 (especially with immutable collections) • x.head + x.tail == x • the contents does not depend on who is processing the collection • the contents does not depend on when the processing happens
 (especially with immutable collections) 21
  34. 34. Streams have Unexpected Properties • the observed sequence depends on • … when the observer subscribed to the stream • … whether the observer can process fast enough • … whether the streams flows fast enough 22
  35. 35. Streams are not Collections! • java.util.stream:
 Stream is not derived from Collection
 “Streams differ from Coll’s in several ways” • no storage • functional in nature • laziness seeking • possibly unbounded • consumable 23
  36. 36. Streams are not Collections! • a collection can be streamed • a stream observer can create a collection • … but saying that a Stream is just a lazy Collection evokes the wrong associations 24
  37. 37. So, Reactive Streams: why not just java.util.stream.Stream?
  38. 38. Java 8 Stream 26 import java.util.stream.*; ! // get some stream final Stream<Integer> s = Stream.of(1, 2, 3); // describe transformation final Stream<String> s2 = s.map(i -> "a" + i); // make a pull collection s2.iterator(); // or alternatively push it somewhere s2.forEach(i -> System.out.println(i)); // (need to pick one, Stream is consumable)
  39. 39. Java 8 Stream • provides a DSL for describing transformation • introduces staged computation
 (but does not allow reuse) • prescribes an eager model of execution • offers either push or pull, chosen statically 27
  40. 40. What about RxJava?
  41. 41. RxJava 29 import rx.Observable; import rx.Observable.*; ! // get some stream source final Observable<Integer> obs = range(1, 3); // describe transformation final Observable<String> obs2 = obs.map(i -> "b" + i); // and use it twice obs2.subscribe(i -> System.out.println(i)); obs2.filter(i -> i.equals("b2")) .subscribe(i -> System.out.println(i));
  42. 42. RxJava • implements pure “push” model • includes extensive DSL for transformations • only allows blocking for back pressure • currently uses unbounded buffering for crossing an async boundary • work on distributed Observables sparked participation in Reactive Streams 30
  43. 43. The Reactive Streams Project
  44. 44. Participants • Engineers from • Netflix • Oracle • Pivotal • Red Hat • Twitter • Typesafe • Individuals like Doug Lea and Todd Montgomery 32
  45. 45. The Motivation • all participants had the same basic problem • all are building tools for their community • a common solution benefits everybody • interoperability to make best use of efforts • e.g. use Reactor data store driver with Akka transformation pipeline and Rx monitoring to drive a vert.x REST API (purely made up, at this point) 33 see also Jon Brisbin’s post on “Tribalism as a Force for Good”
  46. 46. Recipe for Success • minimal interfaces • rigorous specification of semantics • full TCK for verification of implementation • complete freedom for many idiomatic APIs 34
  47. 47. The Meat 35 trait Publisher[T] { def subscribe(sub: Subscriber[T]): Unit } trait Subscription { def requestMore(n: Int): Unit def cancel(): Unit } trait Subscriber[T] { def onSubscribe(s: Subscription): Unit def onNext(elem: T): Unit def onError(thr: Throwable): Unit def onComplete(): Unit }
  48. 48. The Sauce • all calls on Subscriber must dispatch async • all calls on Subscription must not block • Publisher is just there to create Subscriptions 36
  49. 49. How does it Connect? 37 SubscriberPublisher subscribe onSubscribeSubscription requestMore onNext
  50. 50. Akka Streams
  51. 51. Akka Streams • powered by Akka Actors • execution • distribution • resilience • type-safe streaming through Actors with bounded buffering 39
  52. 52. Basic Akka Example 40 implicit val system = ActorSystem("Sys") val mat = FlowMaterializer(...) ! Flow(text.split("s").toVector). map(word => word.toUpperCase). foreach(tranformed => println(tranformed)). onComplete(mat) { case Success(_) => system.shutdown() case Failure(e) => println("Failure: " + e.getMessage) system.shutdown() }
  53. 53. More Advanced Akka Example 41 val s = Source.fromFile("log.txt", "utf-8") Flow(s.getLines()). groupBy { case LogLevelPattern(level) => level case other => "OTHER" }. foreach { case (level, producer) => val out = new PrintWriter(level+".txt") Flow(producer). foreach(line => out.println(line)). onComplete(mat)(_ => Try(out.close())) }. onComplete(mat)(_ => Try(s.close()))
  54. 54. Java 8 Example 42 final ActorSystem system = ActorSystem.create("Sys"); final MaterializerSettings settings = MaterializerSettings.create(); final FlowMaterializer materializer = FlowMaterializer.create(settings, system); ! final String[] lookup = { "a", "b", "c", "d", "e", "f" }; final Iterable<Integer> input = Arrays.asList(0, 1, 2, 3, 4, 5); ! Flow.create(input).drop(2).take(3). // leave 2, 3, 4 map(elem -> lookup[elem]). // translate to "c","d","e" filter(elem -> !elem.equals("c")). // filter out the "c" grouped(2). // make into a list mapConcat(list -> list). // flatten the list fold("", (acc, elem) -> acc + elem). // accumulate into "de" foreach(elem -> System.out.println(elem)). // print it consume(materializer);
  55. 55. Akka TCP Echo Server 43 val future = IO(StreamTcp) ? Bind(addr, ...) future.onComplete { case Success(server: TcpServerBinding) => println("listen at "+server.localAddress) ! Flow(server.connectionStream).foreach{ c => println("client from "+c.remoteAddress) c.inputStream.produceTo(c.outputStream) }.consume(mat) ! case Failure(ex) => println("cannot bind: "+ex) system.shutdown() }
  56. 56. Akka HTTP Sketch (NOT REAL CODE) 44 val future = IO(Http) ? RequestChannelSetup(...) future.onSuccess { case ch: HttpRequestChannel => val p = ch.processor[Int] val body = Flow(new File(...)).toProducer(mat) Flow(HttpRequest(method = POST, uri = "http://ex.amp.le/42", entity = body) -> 42) .produceTo(mat, p) Flow(p).map { case (resp, token) => val out = FileSink(new File(...)) Flow(resp.entity.data).produceTo(mat, out) }.consume(mat) }
  57. 57. Closing Remarks
  58. 58. Current State • Early Preview is available:
 "org.reactivestreams" % "reactive-streams-spi" % "0.2"
 "com.typesafe.akka" %% "akka-stream-experimental" % "0.3" • check out the Activator template
 "Akka Streams with Scala!"
 (https://github.com/typesafehub/activator-akka-stream-scala) 46
  59. 59. Next Steps • we work towards inclusion in future JDK • try it out and give feedback! • http://reactive-streams.org/ • https://github.com/reactive-streams 47
  60. 60. ©Typesafe 2014 – All Rights Reserved

×