
Christian Kreutzfeldt – Static vs Dynamic Stream Processing


Flink Forward 2015

Published in: Technology

  1. STATIC VS DYNAMIC STREAM PROCESSING Christian Kreutzfeldt @mnxfst
  2. 1. Introduction
     2. Stream Processing - First Encounter
     3. Increasing number of Use Cases
     4. Arising Implementation Issues
     5. Requirements for Stream Processing Framework
     6. Way to SPQR (+ short demo)
     7. Way to Apache Flink (extension points + short demo)
     8. Future (hope to come)
     9. Q&A
  3. Christian Kreutzfeldt (@mnxfst): Senior Software Developer & Architect at Otto Group Business Intelligence Department; Tech Lead “Real-Time Stream Processing”; Computer Science at University of Luebeck
  4. Multichannel Retail: w/ catalogue business, e-commerce and over-the-counter retail. Services: covering the entire portfolio of retail services across the value-added chain. Financial Services: providing retail-related financial services across the value-added chain. World’s Second-Largest Online Retailer in End-Consumer Business; Europe’s Largest Online Retailer in End-Consumer Fashion & Lifestyle Business
  5. Otto Group Business Intelligence Department: driven by data, inspired by our customers. BI Strategy: definition of business intelligence strategy. Consulting: talent recruitment & training, networking & consulting. Business Development: evaluation & impl. of data driven business models. Data Pool: maintaining & providing data pools. SaaS Products: software-as-a-service solutions
  6. Otto Group Business Intelligence Department: dedicated to open source. SPQR: stream processing framework. Schedoscope: scheduling framework for painfree agile development of your datahub. Palladium: framework for developing real-world machine learning solutions. Follow us on github.com/ottogroup
  7. Stream Processing: first steps w/ unified tracking. [Diagram: Unified Tracking]
  8. Stream Processing: prevent quality problems. [Diagram: Unified Tracking; Tagging Template (×4)]
  9. Stream Processing: prevent quality problems. [Diagram: Unified Tracking; Tagging Template (×4); Event Stream; Event Validator] akka-based real stream processing
  10. Stream Processing: developing project ideas. customer sessions; search sessions; user-agent identification; dynamic profile selection; dynamic stream queries
  11. Stream Processing: software development issues. resource intensive use-case implementation; required ops support for topology deployment and monitoring; rather static implementations than highly flexible ones; highly time consuming. Static Topologies (Queries), Dynamic Data, Highly Flexible Context. Image: Umberto Salvagnin, https://www.flickr.com/photos/kaibara/4688161016 (cc by 2.0)
  12. Stream Processing: requirements to ease the pain. unified runtime environment; operations support; support for multiple sources and sinks; real stream processing; easy-to-extend; steep learning curve
  13. Stream Processing: working w/ data the business way. no-code topology definition (the SQL way); self dependent, immediate deployments; consistent monitoring (behavior / result retrieval); adjustment through re-deployments. Dynamic Topologies (Queries), Dynamic Data, Highly Flexible Context
  14. Stream Processing: framework decision. unified runtime environment; operations support; support for multiple sources and sinks; real stream processing; easy-to-extend; steep learning curve. SPQR (spooker): no-code topology definition; self dependent deployments; consistent monitoring; immediate deployments; short feedback circuit
  15. SPQR concepts. library deployment: independent library deployments into node repositories for later use. zero-code topologies: configuration based pipeline descriptions. ad hoc queries: support for ad hoc queries, immediate adjustments and short feedback circuits. https://github.com/ottogroup/spqr
  16. SPQR architecture
  17. DEMO
  18. Dynamic Stream Processing: importance for (business) acceptance. Topics: no-code topology definition; self dependent deployments; consistent monitoring; immediate deployments; short feedback circuit. Notes: steep learning curve, focus on functionality instead of implementation, better representation; no or less ops support, shorter time-to-execution, independence from tech teams, easier to use; short feedback circuit, easier to adjust; support people to try out new ideas, get more people to work with data streams; choose the representation defined by the topology author as the foundation for monitoring, so that topology author and ops team have a common understanding
  19. Dynamic Stream Processing: from SPQR to Apache Flink - it’s all there. [Diagram label: akka] Image: Martin Grandjean, http://www.martingrandjean.ch/wp-content/uploads/2013/10/Graphe3.png (cc by-sa 3.0)
  20. Dynamic Stream Processing: variety of ways to interact with Apache Flink. A variety of message types (request/response) is available to interact with the job manager / cluster:
      ● RequestNumberRegisteredTaskManager
      ● RequestTotalNumberOfSlots
      ● SubmitJob
      ● CancelJob
      ● RequestPartitionState
      ● RequestJobStatus
      ● RequestRunningJobs
      ● RequestRunningJobsStatus
      ● RequestJob
      ● RequestRegisteredTaskManagers
      ● RequestStackTrace
      ● RequestJobManagerStatus
      ● AccumulatorMessage (RequestAccumulatorResultsStringified,...)
      ● ...
      Image: Martin Grandjean, http://www.martingrandjean.ch/wp-content/uploads/2013/10/Graphe3.png (cc by-sa 3.0)
  21. Apache Flink: short feedback circuit & consistent monitoring (impl). [Diagram (akka): FlinkMetricsCollector spawns RunningJobsManager, which queries the JobManager; a JobMetricsCollector is spawned for each job and queries the JobManager] Image: Martin Grandjean, http://www.martingrandjean.ch/wp-content/uploads/2013/10/Graphe3.png (cc by-sa 3.0)
  22. Apache Flink: short feedback circuit & consistent monitoring (impl), akka
      JobMetricsCollector (reply: AccumulatorResultsFound):
          public void preStart() throws Exception {
              // Starting immediately, send the accumulator request for this job
              // to the remote JobManager every 5 seconds; replies come back to this actor.
              context().system().scheduler().schedule(
                  FiniteDuration.Zero(),
                  FiniteDuration.apply(5, TimeUnit.SECONDS),
                  this.remoteJobManagerRef,
                  new RequestAccumulatorResults(this.jobId),
                  context().dispatcher(),
                  getSelf()
              );
          }
      RunningJobsManager (receive RunningJobsStatus, extract job identifier, start job metrics collector):
          public void preStart() throws Exception {
              // Starting immediately, ask the remote JobManager for the status
              // of all running jobs every 5 seconds; replies come back to this actor.
              context().system().scheduler().schedule(
                  FiniteDuration.Zero(),
                  FiniteDuration.apply(5, TimeUnit.SECONDS),
                  this.remoteJobManagerRef,
                  JobManagerMessages.getRequestRunningJobsStatus(),
                  context().dispatcher(),
                  getSelf()
              );
          }
      Image: Martin Grandjean, http://www.martingrandjean.ch/wp-content/uploads/2013/10/Graphe3.png (cc by-sa 3.0)
  23. Apache Flink: metrics retrieval through accumulators. DEMO
  24. Apache Flink: how to move on. [Diagram: deploy; metrics; under construction] https://nifi.apache.org/
  25. Apache Flink: topology definition & deployments (integration points). Requirements: no-code topology definition; self dependent deployments; immediate deployments. Integration point annotations: "expects code"; "requires far too many framework modifications"; "the place to be". [Diagram label: akka] Image: Martin Grandjean, http://www.martingrandjean.ch/wp-content/uploads/2013/10/Graphe3.png (cc by-sa 3.0)
  26. Apache Flink: relevance. Static Data / Static Queries; Static Data / Dynamic Queries; Dynamic Data / Static Queries; Dynamic Data / Dynamic Queries; SQL. [Diagram: metrics; deploy] https://nifi.apache.org/
  27. Apache Flink: Apache Zeppelin points the right direction. Static Data / Static Queries; Static Data / Dynamic Queries; Dynamic Data / Static Queries; Dynamic Data / Dynamic Queries; SQL. [Diagram: metrics; deploy] https://nifi.apache.org/
  28. We are hiring! http://www.ottogroup.com/en/karriere/
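
The "configuration based pipeline descriptions" and "zero-code topologies" named on slide 15 can be illustrated with a minimal Java sketch. It does not reproduce SPQR's descriptor format: the step names (`contains`, `uppercase`, `prefix`), the `name=argument` syntax and the class `PipelineSketch` are assumptions made up for illustration. The only point is that the topology is assembled from data, so changing the description changes the query without touching code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: step names and the "name=argument" syntax are invented
// for illustration and are not SPQR's actual pipeline description format.
final class PipelineSketch {

    // Translates one textual step description into a stream transformation.
    static Function<Stream<String>, Stream<String>> parseStep(String spec) {
        String[] parts = spec.split("=", 2);
        String name = parts[0].trim();
        String arg = parts.length > 1 ? parts[1].trim() : "";
        switch (name) {
            case "contains":
                return s -> s.filter(e -> e.contains(arg));
            case "uppercase":
                return s -> s.map(e -> e.toUpperCase(Locale.ROOT));
            case "prefix":
                return s -> s.map(e -> arg + e);
            default:
                throw new IllegalArgumentException("unknown step: " + name);
        }
    }

    // Chains the configured steps in order; no step is hard-coded in the topology.
    static Function<Stream<String>, Stream<String>> build(List<String> description) {
        Function<Stream<String>, Stream<String>> pipeline = Function.identity();
        for (String spec : description) {
            pipeline = pipeline.andThen(parseStep(spec));
        }
        return pipeline;
    }

    // Applies a description to a finite batch of events (a stand-in for a stream source).
    static List<String> run(List<String> description, List<String> events) {
        return build(description).apply(events.stream()).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> description = Arrays.asList("contains=search", "uppercase", "prefix=evt:");
        List<String> events = Arrays.asList("search shoes", "view product", "search jackets");
        System.out.println(run(description, events));
    }
}
```

Swapping `contains=search` for another step in the description list is the "adjustment through re-deployment" idea in miniature: the code stays fixed while the description varies.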

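
The RunningJobsManager steps listed on slide 22 (receive RunningJobsStatus, extract job identifier, start job metrics collector) can be sketched without Akka or Flink on the classpath. This is a plain-Java stand-in, not the speaker's implementation: `RunningJobsTracker`, `onRunningJobs` and the `Supplier` that replaces the remote JobManager are hypothetical names, a `ScheduledExecutorService` substitutes for the Akka scheduler, and dropping finished jobs is an added assumption the slide does not mention.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical stand-in for the RunningJobsManager actor; all names are illustrative.
final class RunningJobsTracker {

    // Identifiers of jobs that currently have a (simulated) metrics collector.
    private final Set<String> tracked = new HashSet<>();

    // Handles one status snapshot: tracks every job not seen before and returns
    // those new identifiers in sorted order. Jobs missing from the snapshot are
    // dropped (an assumption, not shown on the slide).
    synchronized List<String> onRunningJobs(Set<String> runningJobIds) {
        List<String> started = new ArrayList<>();
        for (String jobId : new TreeSet<>(runningJobIds)) {
            if (tracked.add(jobId)) {
                started.add(jobId); // here the real actor would start a collector
            }
        }
        tracked.retainAll(runningJobIds);
        return started;
    }

    synchronized int trackedJobs() {
        return tracked.size();
    }

    // Polls the status source at a fixed rate with no initial delay,
    // mirroring the scheduled request message on the slide.
    static ScheduledExecutorService poll(RunningJobsTracker tracker,
                                         Supplier<Set<String>> statusSource,
                                         long periodMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(
                () -> tracker.onRunningJobs(statusSource.get()),
                0, periodMillis, TimeUnit.MILLISECONDS);
        return scheduler;
    }

    public static void main(String[] args) throws InterruptedException {
        RunningJobsTracker tracker = new RunningJobsTracker();
        Set<String> running = new TreeSet<>(Arrays.asList("job-a", "job-b"));
        ScheduledExecutorService scheduler = poll(tracker, () -> running, 100);
        Thread.sleep(350);
        scheduler.shutdownNow();
        System.out.println("tracked jobs: " + tracker.trackedJobs());
    }
}
```

Keeping the snapshot handling in `onRunningJobs` separate from the scheduling makes the bookkeeping testable without timers, which is roughly what the actor's message handler gives the original design.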