Streaming analytics manager

Develop and deploy streaming analytics applications visually. SAM provides bindings for streaming engines and multiple sources/sinks, a rich set of streaming operators, and operational lifecycle management. Streaming Analytics Manager makes it easy to develop and monitor streaming applications. It also provides analytics on the data being processed by those applications.

  1. Streaming Analytics Manager (SAM)
     Sriharsha Chintalapani, Arun Mahadevan
     © Hortonworks Inc. 2011 – 2016. All Rights Reserved
  2. History of Streaming at Hortonworks
     • Introduced Storm as the stream processing engine in HDP 2.1 (late 2013)
     • First to ship Apache Kafka as an enterprise messaging queue (early 2014)
     • Added several improvements and features to Apache Storm; Yahoo! is running 2,400 Storm nodes
     • Added security and other critical features/improvements to Apache Kafka
     • Learned a lot from shipping Storm and Kafka over the past 3 years
     • The vision and implementation of the Registry and Streaming Analytics Manager are based on those lessons
  3. Why Schema Registry
     • Streaming applications are usually fronted by a queue such as Kafka, Kinesis, EventHub, etc.
     • Data in messaging queues is a byte payload with no schema associated with it
     • Streaming application developers usually look at the data flowing through and define their processing of that data
     • Any schema change to this data means developers have to update their code to process the new format
     • Support both programmatic schema creation and managed schemas
  4. Why Schema Registry
     Diagram: Producer → Kafka / Kinesis / EventHub → Consumer (Storm, Spark Streaming, others…), with the data travelling as a bytes payload
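The two slides above describe the core problem: consumers receive raw bytes and have no schema to interpret them with. Below is a minimal sketch of the registry idea. It assumes a hypothetical in-memory `SchemaRegistry` and a made-up wire format of one magic byte, a 4-byte big-endian schema id and a JSON body. This is not the Hortonworks Schema Registry API or its serialization format.

```python
import json
import struct
from dataclasses import dataclass, field

MAGIC_BYTE = 0                 # hypothetical format marker
HEADER = struct.Struct(">bI")  # hypothetical header: magic byte + 4-byte big-endian schema id
TYPES = {"int": int, "string": str}


@dataclass
class SchemaRegistry:
    """Hypothetical in-memory registry: schema id -> {field name: type name}."""
    schemas: dict = field(default_factory=dict)

    def register(self, schema: dict) -> int:
        # Reuse the existing id if an identical schema is already registered.
        for schema_id, existing in self.schemas.items():
            if existing == schema:
                return schema_id
        schema_id = len(self.schemas) + 1
        self.schemas[schema_id] = schema
        return schema_id

    def get(self, schema_id: int) -> dict:
        return self.schemas[schema_id]


def serialize(registry: SchemaRegistry, schema: dict, record: dict) -> bytes:
    """Producer side: validate the record, then prefix the payload with its schema id."""
    for name, type_name in schema.items():
        if not isinstance(record.get(name), TYPES[type_name]):
            raise ValueError(f"field {name!r} is not a {type_name}")
    schema_id = registry.register(schema)
    return HEADER.pack(MAGIC_BYTE, schema_id) + json.dumps(record).encode("utf-8")


def deserialize(registry: SchemaRegistry, payload: bytes) -> tuple:
    """Consumer side: read the schema id from the header and look the schema up."""
    magic, schema_id = HEADER.unpack_from(payload)
    if magic != MAGIC_BYTE:
        raise ValueError("unknown payload format")
    record = json.loads(payload[HEADER.size:].decode("utf-8"))
    return registry.get(schema_id), record
```

The consumer finds the schema through the id carried in each message. A producer can therefore move to a new schema version without the consumer hard-coding the field layout.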
  5. Streaming Analytics Manager
     • What is it?
       • A platform to design, develop, deploy and manage streaming analytics applications in minutes using a visual drag-and-drop paradigm
       • Allows you to do event correlation, context enrichment, complex pattern matching, analytical aggregations, and alerts/notifications when insights are discovered
       • Agnostic to the underlying streaming engine; can support multiple streaming engines (e.g., Storm, Spark Streaming, Flink)
       • Extensibility is a first-class citizen (add sources, processors and sinks as needed)
     • Guiding principle: build complex streaming applications easily, with minimal code
  6. Complexities in building streaming applications
     • New streaming engines and APIs
     • Implementing windowing, joins and state management is hard
     • Interacting with external services such as HBase, Hive, HDFS, etc.
     • Deploying with all the necessary configuration files
     • Operating the streaming application, including monitoring and metrics
     • Debugging streaming applications
     • Securing a streaming application cluster with the right configuration is a pain
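The slide lists windowing among the hard parts. The sketch below shows only the simplest case, a tumbling (fixed, non-overlapping) event-time window that sums a value per key. It assumes events are hypothetical `(timestamp_ms, key, value)` tuples that fit in memory. It ignores late data, watermarks, sliding windows and fault-tolerant state, which are the parts a real streaming engine has to handle.

```python
from collections import defaultdict
from typing import Iterable


def tumbling_window_sum(events: Iterable[tuple], window_ms: int) -> dict:
    """Sum values per (window_start, key) over fixed, non-overlapping windows.

    Each event is a (timestamp_ms, key, value) tuple. An event belongs to the
    window starting at timestamp_ms rounded down to a multiple of window_ms.
    """
    if window_ms <= 0:
        raise ValueError("window_ms must be positive")
    totals = defaultdict(float)
    for timestamp_ms, key, value in events:
        window_start = timestamp_ms - (timestamp_ms % window_ms)
        totals[(window_start, key)] += value
    return dict(totals)


events = [(0, "truck-1", 10), (4_000, "truck-1", 5), (6_000, "truck-1", 7), (1_000, "truck-2", 3)]
result = tumbling_window_sum(events, window_ms=5_000)
```

With 5-second windows, truck-1 gets one total for [0 s, 5 s) and a separate total for [5 s, 10 s).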
  7. Key challenges that SAM is trying to solve
     • Building streaming applications requires specialized skill sets that most enterprise organizations don’t have today
     • Streaming applications require a considerable amount of programming, testing and tuning before they are deployed to production, which takes significant time
     • Key streaming primitives such as joining/splitting streams, aggregating over a window of time, and pattern matching are difficult to implement
     • People prefer not to write code to build complex streaming applications
     • No open source project today truly solves all of the above challenges
     • People care less about which streaming engine powers their applications than about having the challenges above addressed without being forced into vendor lock-in
  8. Streaming Analytics Manager Components and User Personas
     Diagram: a distributed streaming computation engine (different streaming engines that power the higher-level services used to build streaming applications) and the user personas: App Developer, Business Analyst, Operations
  9. Streaming Analytics powered by Druid and Superset
     • What is Stream Insight?
       • A tool for business analysts to perform descriptive analytics on streaming data and insights, using the sophisticated UI provided by Superset
       • Tooling to create time-series and real-time analytics dashboards, charts and graphs, with rich, customizable visualizations of the data
  10. (No text on this slide)
  11. Demo
  12. Architecture
  13. SAM Architecture
     Diagram components:
       • SAM UI, calling a REST API served by a web server (Jetty)
       • Storage Manager (backed by a DB), Topology Actions service, Topology DAG Builder, Topology Lifecycle Manager
       • Environment Service with service pools from Ambari (cluster manager)
       • Schema Registry, accessed through the SR client
       • Runners that translate the SAM DAG into a streaming engine topology (Storm; Flink, Spark); the Storm runner deploys the DAG via Flux
       • Streaming computation engines (Storm)
  14. Topology lifecycle
     • States: Initial, DAG Constructed, Extra artifacts set up, Deployed, Suspended, Deployment Failed
     • Actions: Deploy, Kill, Suspend, Resume, Re-deploy
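The slide names the lifecycle states and actions but not every arrow between them. The sketch below is a small state machine over those states. The transitions are an ASSUMPTION:
- Deploy runs from Initial, passes through DAG Constructed and Extra artifacts set up, and ends in Deployed.
- Any failure during deploy lands in Deployment Failed, from which Re-deploy is allowed.
- Suspend and Resume toggle between Deployed and Suspended.
- Kill returns to Initial.

The class and method names are hypothetical.

```python
from enum import Enum
from typing import Callable


class State(Enum):
    INITIAL = "Initial"
    DAG_CONSTRUCTED = "DAG Constructed"
    ARTIFACTS_SET_UP = "Extra artifacts set up"
    DEPLOYED = "Deployed"
    SUSPENDED = "Suspended"
    DEPLOYMENT_FAILED = "Deployment Failed"


# ASSUMED transitions for the single-step actions: (action, current state) -> next state.
# Deploy / Re-deploy are handled in deploy() because they pass through several states.
TRANSITIONS = {
    ("suspend", State.DEPLOYED): State.SUSPENDED,
    ("resume", State.SUSPENDED): State.DEPLOYED,
    ("kill", State.DEPLOYED): State.INITIAL,
    ("kill", State.SUSPENDED): State.INITIAL,
}


class TopologyLifecycle:
    """Hypothetical lifecycle tracker for one topology."""

    def __init__(self) -> None:
        self.state = State.INITIAL
        self.history = [self.state]

    def _move(self, new_state: State) -> None:
        self.state = new_state
        self.history.append(new_state)

    def deploy(self, build_dag: Callable, set_up_artifacts: Callable, submit: Callable) -> bool:
        """Deploy from Initial, or re-deploy after a failure; returns True on success."""
        if self.state not in (State.INITIAL, State.DEPLOYMENT_FAILED):
            raise RuntimeError(f"cannot deploy from {self.state.value}")
        try:
            build_dag()
            self._move(State.DAG_CONSTRUCTED)
            set_up_artifacts()
            self._move(State.ARTIFACTS_SET_UP)
            submit()
            self._move(State.DEPLOYED)
        except Exception:
            # Any failure in the deploy steps lands in Deployment Failed.
            self._move(State.DEPLOYMENT_FAILED)
            return False
        return True

    def apply(self, action: str) -> None:
        """Apply suspend, resume or kill if the current state allows it."""
        next_state = TRANSITIONS.get((action, self.state))
        if next_state is None:
            raise RuntimeError(f"cannot {action} from {self.state.value}")
        self._move(next_state)
```

The transition table makes illegal actions, such as resuming a topology that is already running, fail loudly instead of silently.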
  15. Topology DAG
     Diagram: a Source, Processor 1, Processor 2, Sink 1 and Sink 2 connected by edges. Each edge carries a named stream (Stream 1, Stream 2), and each stream has a schema of typed fields, e.g. Fields: [ “a”: Int, “b”: String … ]
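The slide shows a DAG whose edges carry named streams with typed fields. Below is a minimal Python sketch of such a structure. The class names (`Stream`, `Component`, `Edge`, `TopologyDag`) are hypothetical and are not SAM's actual Java classes. The sketch rejects edges that use a stream the upstream component does not declare. It also computes a topological order (Kahn's algorithm) that a runner could walk.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Stream:
    stream_id: str
    fields: dict  # field name -> type name, e.g. {"a": "Int", "b": "String"}


@dataclass
class Component:
    name: str
    kind: str  # "source", "processor" or "sink"
    output_streams: list = field(default_factory=list)


@dataclass
class Edge:
    src: str
    dst: str
    stream_id: str


class TopologyDag:
    def __init__(self) -> None:
        self.components = {}
        self.edges = []

    def add(self, component: Component) -> None:
        self.components[component.name] = component

    def connect(self, src: str, dst: str, stream_id: str) -> None:
        source, target = self.components[src], self.components[dst]
        if source.kind == "sink" or target.kind == "source":
            raise ValueError("sinks cannot emit and sources cannot receive")
        if stream_id not in {s.stream_id for s in source.output_streams}:
            raise ValueError(f"{src} does not emit {stream_id}")
        self.edges.append(Edge(src, dst, stream_id))

    def topological_order(self) -> list:
        """Kahn's algorithm: each component appears after all of its upstream components."""
        indegree = {name: 0 for name in self.components}
        for edge in self.edges:
            indegree[edge.dst] += 1
        ready = deque(name for name, degree in indegree.items() if degree == 0)
        order = []
        while ready:
            name = ready.popleft()
            order.append(name)
            for edge in self.edges:
                if edge.src == name:
                    indegree[edge.dst] -= 1
                    if indegree[edge.dst] == 0:
                        ready.append(edge.dst)
        if len(order) != len(self.components):
            raise ValueError("topology contains a cycle")
        return order
```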
  16. Runner implements TopologyActions
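The slide names a TopologyActions interface that each runner implements, but it does not list the interface's methods. The sketch below assumes methods that mirror the lifecycle actions (deploy, kill, suspend, resume) plus a status query. It is written in Python rather than SAM's Java, and the toy `InMemoryActions` runner and its status strings are hypothetical.

```python
from abc import ABC, abstractmethod


class TopologyActions(ABC):
    """Hypothetical engine-specific operations a runner exposes to SAM."""

    @abstractmethod
    def deploy(self, topology_name: str, dag) -> None: ...

    @abstractmethod
    def kill(self, topology_name: str) -> None: ...

    @abstractmethod
    def suspend(self, topology_name: str) -> None: ...

    @abstractmethod
    def resume(self, topology_name: str) -> None: ...

    @abstractmethod
    def status(self, topology_name: str) -> str: ...


class InMemoryActions(TopologyActions):
    """Toy runner that only records state; a real runner would call the engine's APIs."""

    def __init__(self) -> None:
        self.running = {}  # topology name -> illustrative status string

    def deploy(self, topology_name, dag):
        self.running[topology_name] = "ACTIVE"

    def kill(self, topology_name):
        self.running.pop(topology_name, None)

    def suspend(self, topology_name):
        self._require(topology_name)
        self.running[topology_name] = "INACTIVE"

    def resume(self, topology_name):
        self._require(topology_name)
        self.running[topology_name] = "ACTIVE"

    def status(self, topology_name):
        return self.running.get(topology_name, "NOT_RUNNING")

    def _require(self, topology_name):
        if topology_name not in self.running:
            raise KeyError(f"{topology_name} is not deployed")
```

Keeping engine calls behind one interface is what lets the rest of SAM stay agnostic to the streaming engine.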
  17. Runner implements TopologyDAGVisitor
  18. Storm runner example
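The two slides above say that a runner walks the SAM DAG with a visitor and, for Storm, turns it into an engine topology. Below is a minimal sketch of that visitor pattern. It builds a dict laid out like a Storm Flux definition (`name`, `spouts`, `bolts`, `streams` with `from`/`to`/`grouping`). The node classes, the `FluxVisitor` name, the SHUFFLE grouping and the `example.*` class names are hypothetical. This is not the Storm runner's actual code.

```python
from dataclasses import dataclass


@dataclass
class Source:
    name: str
    class_name: str

    def accept(self, visitor):
        visitor.visit_source(self)


@dataclass
class Processor:
    name: str
    class_name: str

    def accept(self, visitor):
        visitor.visit_processor(self)


@dataclass
class Sink:
    name: str
    class_name: str

    def accept(self, visitor):
        visitor.visit_sink(self)


@dataclass
class Link:
    src: str
    dst: str
    stream_id: str

    def accept(self, visitor):
        visitor.visit_link(self)


class FluxVisitor:
    """Hypothetical visitor that builds a Flux-shaped topology definition."""

    def __init__(self, topology_name: str, parallelism: int = 1) -> None:
        self.parallelism = parallelism
        self.flux = {"name": topology_name, "spouts": [], "bolts": [], "streams": []}

    def _component(self, node) -> dict:
        return {"id": node.name, "className": node.class_name, "parallelism": self.parallelism}

    def visit_source(self, source):
        # SAM sources map to Storm spouts.
        self.flux["spouts"].append(self._component(source))

    def visit_processor(self, processor):
        # Processors and sinks both map to Storm bolts.
        self.flux["bolts"].append(self._component(processor))

    def visit_sink(self, sink):
        self.flux["bolts"].append(self._component(sink))

    def visit_link(self, link):
        # Each DAG edge becomes a Flux stream subscribed to the named stream id.
        self.flux["streams"].append({
            "from": link.src,
            "to": link.dst,
            "grouping": {"type": "SHUFFLE", "streamId": link.stream_id},
        })


nodes = [
    Source("kafka", "example.KafkaSpout"),
    Processor("filter", "example.FilterBolt"),
    Sink("hdfs", "example.HdfsBolt"),
    Link("kafka", "filter", "default"),
    Link("filter", "hdfs", "default"),
]
visitor = FluxVisitor("demo")
for node in nodes:
    node.accept(visitor)
```

A Flink or Spark runner could supply its own visitor over the same DAG, which is the point of separating traversal from translation.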
  19. SDK
  20. Extensibility with SAM SDK
     • Custom Processor: allows users to write their own business logic
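The slide does not show SAM's actual custom processor interface, which is Java. The sketch below assumes a hypothetical `CustomProcessor` base class: `initialize` receives the user's configuration once, and `process` turns one input event into zero or more output events. `SpeedingFilter` and its `speed_limit` setting are made-up business logic for illustration.

```python
from abc import ABC, abstractmethod


class CustomProcessor(ABC):
    """Hypothetical contract: configure once, then process one event at a time."""

    def __init__(self) -> None:
        self.config = {}

    def initialize(self, config: dict) -> None:
        self.config = config

    @abstractmethod
    def process(self, event: dict) -> list:
        """Return zero or more output events for one input event."""


class SpeedingFilter(CustomProcessor):
    """Example business logic: emit an alert event when speed exceeds a configured limit."""

    def process(self, event):
        limit = self.config.get("speed_limit", 80)
        if event.get("speed", 0) > limit:
            return [{**event, "alert": "speeding"}]
        return []
```

Returning a list lets one processor act as a filter (empty list), a mapper (one event) or a splitter (several events).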
  21. Extensibility with SAM SDK
     • Multi-lang support (upcoming)
  22. Extensibility with SAM SDK
     • UDAFs: compute aggregates within a window
     • Built-in functions: STDDEV, STDDEVP, VARIANCE, VARIANCEP, MEAN, MIN, MAX, SUM, COUNT, UPPER, LOWER, INITCAP, SUBSTRING, CHAR_LENGTH, CONCAT
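A UDAF accumulates state over the events in a window and produces one result when the window closes. The sketch below shows what a VARIANCE/STDDEV-style aggregate might look like under a hypothetical add-then-result contract, which is not SAM's actual Java UDAF interface. It uses Welford's online algorithm so the whole window never has to be kept in memory. Mapping the `P` suffix (STDDEVP, VARIANCEP) to the population variant is an ASSUMPTION borrowed from SQL conventions.

```python
import math


class VarianceAggregate:
    """Hypothetical UDAF using Welford's online algorithm.

    sample=True divides by n - 1 (assumed VARIANCE/STDDEV);
    sample=False divides by n (assumed VARIANCEP/STDDEVP).
    """

    def __init__(self, sample: bool = True) -> None:
        self.sample = sample
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, value: float) -> "VarianceAggregate":
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)
        return self

    def variance(self) -> float:
        denominator = self.count - 1 if self.sample else self.count
        if denominator <= 0:
            raise ValueError("not enough values in the window")
        return self.m2 / denominator

    def stddev(self) -> float:
        return math.sqrt(self.variance())


def aggregate_window(values, sample: bool = True) -> float:
    """Feed every value of one window into a fresh aggregate and return its stddev."""
    agg = VarianceAggregate(sample)
    for value in values:
        agg.add(value)
    return agg.stddev()
```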
  23. Extensibility with SAM SDK
     • UDFs: perform simple transformations
     • Built-in functions: STDDEV, STDDEVP, VARIANCE, VARIANCEP, MEAN, MIN, MAX, SUM, COUNT, UPPER, LOWER, INITCAP, SUBSTRING, CHAR_LENGTH, CONCAT
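Unlike a UDAF, a UDF maps one value to one value per event. The sketch below uses a hypothetical name-to-function registry (`UDFS`, the `udf` decorator and `apply_udf` are made up for illustration). It includes a simplified INITCAP that treats only spaces as word separators, which may differ from SAM's built-in semantics.

```python
from typing import Callable, Dict

# Hypothetical registry mapping expression-level function names to callables.
UDFS: Dict[str, Callable] = {}


def udf(name: str):
    """Decorator that registers a scalar function under an expression-level name."""
    def register(fn: Callable) -> Callable:
        UDFS[name.upper()] = fn
        return fn
    return register


@udf("INITCAP")
def initcap(text: str) -> str:
    # Capitalize the first letter of each space-separated word and lower-case the rest.
    return " ".join(word[:1].upper() + word[1:].lower() for word in text.split(" "))


@udf("CHAR_LENGTH")
def char_length(text: str) -> int:
    return len(text)


def apply_udf(name: str, event: dict, field: str, output_field: str) -> dict:
    """Apply one registered UDF to a field and return a new event with the result added."""
    return {**event, output_field: UDFS[name.upper()](event[field])}
```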
  24. Extensibility with SAM SDK
     • Notifier: sends notifications such as email or SMS, or more complex ones that invoke external APIs
     • Built-in notifiers: Email (more in the future…)
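A notifier turns an alert event into an outgoing message. The sketch below assumes a hypothetical `Notifier` base class and an `EmailNotifier` that builds a standard-library `email.message.EmailMessage`. Delivery goes through a pluggable `transport` callable, which is also hypothetical. In a real setup that could be something like `smtplib.SMTP(...).send_message`; here the test simply collects the messages.

```python
from abc import ABC, abstractmethod
from email.message import EmailMessage
from typing import Callable, List


class Notifier(ABC):
    """Hypothetical notifier contract: turn an alert event into an outgoing notification."""

    @abstractmethod
    def notify(self, event: dict) -> None: ...


class EmailNotifier(Notifier):
    """Builds one email per alert; sending is delegated to the transport callable."""

    def __init__(self, sender: str, recipients: List[str], subject_template: str,
                 transport: Callable[[EmailMessage], None]) -> None:
        self.sender = sender
        self.recipients = recipients
        self.subject_template = subject_template  # e.g. "Alert: {driver} speeding"
        self.transport = transport

    def notify(self, event: dict) -> None:
        msg = EmailMessage()
        msg["From"] = self.sender
        msg["To"] = ", ".join(self.recipients)
        msg["Subject"] = self.subject_template.format(**event)
        # Body: one "key: value" line per event field, sorted by key.
        msg.set_content("\n".join(f"{key}: {value}" for key, value in sorted(event.items())))
        self.transport(msg)
```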
  25. The current release: 0.5
     • Manual service pool registration without requiring Ambari
     • Test mode to easily try out a streaming app
     • Kerberos and delegation-token-based authentication
     • Authorization support with RBAC and permissions
     • New sources, processors and sinks
     Upcoming…
     • Extending token-based authentication to other components
     • Support for state management in SAM
     • Support for other streaming engines: Flink, Spark Streaming
  26. Try it out!
     • It’s open source under the Apache License: https://github.com/hortonworks/streamline
     • Apache incubation soon
     • SAM 0.5 is out!
     • https://groups.google.com/forum/#!forum/streamline-users
     • Contributions are welcome!