1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Streaming Analytics Manager
Streaming Analytics Manager
• What is it?
– A platform to design, develop, deploy, and manage streaming analytics applications in minutes using a drag-and-drop visual paradigm
– Supports event correlation, context enrichment, complex pattern matching, analytical aggregations, and alerts/notifications when insights are discovered
– Agnostic to the underlying streaming engine; can support multiple streaming engines (e.g., Storm, Spark Streaming, Flink)
– Extensibility is a first-class citizen (add sinks, processors, and sources as needed)
• Guiding Principle
– Build complex streaming applications easily with minimal code
Complexities in building streaming applications
• New streaming engines and APIs keep appearing
• Implementing windowing, joins, and state management is hard
• Interacting with external services such as HBase, Hive, and HDFS
• Deploying with all the necessary configuration files
• Operating the streaming application, including monitoring and metrics
• Debugging streaming applications
• Securing a streaming application cluster with the right configuration is painful
Key challenges that SAM is trying to solve
• Building streaming applications requires specialized skill sets that most enterprise organizations don’t have today
• Streaming applications require a considerable amount of programming, testing, and tuning before they can be deployed to production, which takes significant time
• Key streaming primitives such as joining/splitting streams, aggregations over a window of time, and pattern matching are difficult to implement
• People prefer not to write code to build complex streaming applications
• No true open source project today solves all of the above challenges
• People care less about which streaming engine powers their streaming applications, as long as the challenges above are addressed and they are not forced into vendor lock-in
SAM’s Value Proposition
• A platform with a graphical programming paradigm that lets users focus on business logic and easily build and deploy complex streaming applications
• Makes it easier for users to import other service configurations and use them in streaming applications
• Provides an abstraction over the streaming engine used. The abstraction makes it possible to plug in open source streaming engines (Storm, Spark Streaming, Flink, etc.)
• Decouples the schema from the streaming application via integration with Schema Registry
• Provides operational metrics to monitor streaming applications via pluggable metrics storage, e.g. Ambari, OpenTSDB
• Streaming Insights: visualizes the data being processed by the streaming application
SAM’s Key Capabilities
• Building streaming apps using the following primitives
– Connecting to Streams
– Transformations
– Filtering and Routing
– Joining Streams
– Forking Streams
– Aggregations over Windows
– Rules Engine
– Notifications / Alerts
– Streaming Analytics
• Deploying and monitoring streaming apps
– Deploying the streaming app on supported streaming engines
– Monitoring the streaming app with metrics
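As a rough illustration of the "Aggregations over Windows" primitive listed above, the sketch below groups events into fixed-size (tumbling) time windows and averages one field per key. It is plain Python, not SAM code: the event fields (`ts`, `driver`, `speed`) and the function name `tumbling_average` are made up for this example.

```python
from collections import defaultdict


def tumbling_average(events, window_seconds, key_field, value_field):
    """Average value_field per (window start, key) over tumbling windows.

    events: iterable of dicts, each with a numeric 'ts' field in seconds.
    Returns a dict mapping (window_start, key) to the average value.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for event in events:
        # Align the timestamp to the start of the window it falls in.
        window_start = (event["ts"] // window_seconds) * window_seconds
        bucket = (window_start, event[key_field])
        sums[bucket] += event[value_field]
        counts[bucket] += 1
    return {bucket: sums[bucket] / counts[bucket] for bucket in sums}


# Hypothetical events: two drivers reporting speed readings.
events = [
    {"ts": 1, "driver": "d1", "speed": 60},
    {"ts": 4, "driver": "d1", "speed": 80},
    {"ts": 11, "driver": "d1", "speed": 90},
    {"ts": 12, "driver": "d2", "speed": 50},
]
averages = tumbling_average(events, 10, "driver", "speed")
```

A real streaming engine evaluates windows incrementally as events arrive and has to deal with late or out-of-order data; the batch-style loop here only shows the grouping logic.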
Streaming Analytics Manager Components and User Personas
• Distributed Streaming Computation Engine: the different streaming engines that power the higher-level services used to build stream applications
• User personas: App Developer, Business Analyst, Operations
SAM’s Service Pools and Environments
[Diagram labels: Stream App 1, Stream App 2]
• Service Pool
– A pool of services that can be used to create different environments
• Environment
– Consists of a set of services you choose from one or more service pools
• Stream App
– The environment is then associated with a stream application, which uses the services in that environment for its configuration
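The relationship between service pools, environments, and stream apps can be pictured as a small data model. The sketch below is an illustration only: the class names, the service names (`STORM`, `KAFKA`), and the configuration keys and hosts are assumptions made for the example, not SAM's actual model or API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Service:
    name: str      # illustrative service name, e.g. "STORM"
    config: tuple  # immutable (key, value) pairs


@dataclass
class ServicePool:
    name: str
    services: dict = field(default_factory=dict)

    def register(self, service):
        self.services[service.name] = service


@dataclass
class Environment:
    name: str
    services: dict = field(default_factory=dict)

    def add_from(self, pool, service_name):
        # Pick one service out of a pool; services from several pools can be mixed.
        self.services[service_name] = pool.services[service_name]


@dataclass
class StreamApp:
    name: str
    environment: Environment

    def config_for(self, service_name):
        # The app resolves service configuration through its environment.
        return dict(self.environment.services[service_name].config)


# Two hypothetical pools, each contributing one service to the environment.
prod_pool = ServicePool("prod-cluster")
prod_pool.register(Service("STORM", (("nimbus.host", "nimbus.example.com"),)))
shared_pool = ServicePool("shared-cluster")
shared_pool.register(Service("KAFKA", (("bootstrap.servers", "kafka.example.com:9092"),)))

env = Environment("production")
env.add_from(prod_pool, "STORM")
env.add_from(shared_pool, "KAFKA")
app = StreamApp("Stream App 1", env)
```

Because the app only talks to its environment, pointing the same app at a different cluster means swapping the environment rather than editing the app.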
Streaming Analytics powered by Druid and Superset
• What is Stream Insight?
– A tool for business analysts to perform descriptive analytics on streaming data and insights, using the sophisticated UI provided by Superset
– Tooling to create time-series and real-time analytics dashboards, charts, and graphs, and to build rich, customizable visualizations of the data
Architecture
SAM Architecture
[Architecture diagram. Labels shown: SAM UI; REST API; Web server (Jetty); Storage Manager; DB; Service Pools; Environ Service; Schema Registry; SR Client; Topology actions service; Topology DAG Builder; Topology Lifecycle Manager; Runners (translate SAM DAG to Streaming Engine topology): Storm, Flink, Spark; Flux; Deploy DAG; Ambari (cluster manager); Streaming computation Engines (Storm)]
Topology lifecycle
[State diagram. States: Initial, DAG Constructed, Extra artifacts set up, Deployed, Suspended, Deployment Failed. Transitions: Deploy, Kill, Suspend, Resume, Re-deploy]
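The lifecycle can be read as a small state machine. The sketch below uses the state and action names from the slide, but the arrows of the diagram did not survive in the text, so which action leads from which state to which is an assumption inferred from the names. The class and its methods are hypothetical, not SAM's Topology Lifecycle Manager.

```python
class Topology:
    # ASSUMPTION: a deploy walks through these states in order.
    DEPLOY_STEPS = ("DAG Constructed", "Extra artifacts set up", "Deployed")

    def __init__(self):
        self.state = "Initial"
        self.history = [self.state]

    def _move(self, new_state):
        self.state = new_state
        self.history.append(new_state)

    def _require(self, action, *allowed):
        if self.state not in allowed:
            raise ValueError(f"cannot {action} from state {self.state!r}")

    def deploy(self, fail_at=None):
        # Also serves as "re-deploy" when starting from Deployment Failed.
        self._require("deploy", "Initial", "Deployment Failed")
        for step in self.DEPLOY_STEPS:
            if step == fail_at:  # simulate a failure while reaching this step
                self._move("Deployment Failed")
                return
            self._move(step)

    def suspend(self):
        self._require("suspend", "Deployed")
        self._move("Suspended")

    def resume(self):
        self._require("resume", "Suspended")
        self._move("Deployed")

    def kill(self):
        self._require("kill", "Deployed", "Suspended")
        self._move("Initial")


topology = Topology()
topology.deploy(fail_at="Extra artifacts set up")  # first attempt fails
topology.deploy()                                   # re-deploy succeeds
topology.suspend()
topology.resume()
topology.kill()
```

Rejecting actions that are not valid in the current state (for example, suspending a topology that was never deployed) is the main thing such a state machine buys over a plain status field.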
Topology DAG
[DAG diagram. Nodes: Source, Processor 1, Processor 2, Sink 1, Sink 2. Each edge carries a stream (Stream 1, Stream 2). A stream has a schema, e.g. Fields: [“a”: Int, “b”: String, …]]
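A topology DAG of this shape can be sketched as components connected by edges, where each edge carries a stream with a field schema. In the sketch below the class names are hypothetical and the wiring (which component feeds which) is assumed, since the arrows of the diagram are not recoverable from the text. The fields "a" and "b" come from the slide; the schema of Stream 2 is left empty because the slide does not show it.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass(frozen=True)
class Stream:
    name: str
    fields: tuple  # ((field name, type name), ...)


class TopologyDag:
    def __init__(self):
        self.edges = []  # (from component, to component, Stream)

    def add_edge(self, source, target, stream):
        self.edges.append((source, target, stream))

    def order(self):
        # Map each component to the set of components it depends on.
        predecessors = {}
        for source, target, _ in self.edges:
            predecessors.setdefault(source, set())
            predecessors.setdefault(target, set()).add(source)
        return list(TopologicalSorter(predecessors).static_order())

    def inputs_of(self, component):
        return [stream for _, target, stream in self.edges if target == component]


stream_1 = Stream("Stream 1", (("a", "Int"), ("b", "String")))
stream_2 = Stream("Stream 2", ())  # schema not shown on the slide

dag = TopologyDag()
# ASSUMED wiring for illustration.
dag.add_edge("Source", "Processor 1", stream_1)
dag.add_edge("Source", "Processor 2", stream_1)
dag.add_edge("Processor 1", "Sink 1", stream_1)
dag.add_edge("Processor 2", "Sink 2", stream_2)
order = dag.order()
```

Keeping the schema on the stream rather than on the component is what lets a downstream processor or sink be validated against the fields it will actually receive.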
Runner implements - Topology Actions
Runner implements - TopologyDAGVisitor
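The code shown on this slide was not preserved in the text. As a generic illustration of the visitor idea, the sketch below lets each DAG component dispatch to a visitor method, and a runner-specific visitor translates components into engine terms. Every class and method name here is hypothetical and should not be read as the actual TopologyDAGVisitor interface; the "spout" and "bolt" labels simply borrow Storm's vocabulary.

```python
class Component:
    def __init__(self, name):
        self.name = name

    def accept(self, visitor):
        raise NotImplementedError


class Source(Component):
    def accept(self, visitor):
        visitor.visit_source(self)


class Processor(Component):
    def accept(self, visitor):
        visitor.visit_processor(self)


class Sink(Component):
    def accept(self, visitor):
        visitor.visit_sink(self)


class ToyEngineVisitor:
    """Hypothetical visitor that records an engine-specific plan."""

    def __init__(self):
        self.lines = []

    def visit_source(self, source):
        self.lines.append(f"spout: {source.name}")

    def visit_processor(self, processor):
        self.lines.append(f"bolt: {processor.name}")

    def visit_sink(self, sink):
        self.lines.append(f"bolt (sink): {sink.name}")


def translate(components, visitor):
    # Each component picks the matching visit_* method (double dispatch).
    for component in components:
        component.accept(visitor)
    return visitor.lines


plan = translate(
    [Source("Source"), Processor("Processor 1"), Sink("Sink 1")],
    ToyEngineVisitor(),
)
```

Supporting another engine then means writing another visitor, while the DAG classes stay untouched.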
Storm runner example
SDK
Extensibility with SAM SDK
• Custom Processor: allows users to write their own business logic
Extensibility with SAM SDK
• Multi-lang support (upcoming)
Extensibility with SAM SDK
• UDAFs (user-defined aggregate functions): compute aggregates within a window
Built-in functions
• STDDEV
• STDDEVP
• VARIANCE
• VARIANCEP
• MEAN
• MIN
• MAX
• SUM
• COUNT
• UPPER
• LOWER
• INITCAP
• SUBSTRING
• CHAR_LENGTH
• CONCAT
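To show what an aggregate function over a window involves, the sketch below keeps running state, is fed one value at a time, and produces a result when the window closes. The `add`/`result` shape is a hypothetical interface, not the SAM SDK's. It also assumes that STDDEV means the sample standard deviation and STDDEVP the population one, following the usual SQL and spreadsheet convention; the slide does not say.

```python
import math


class StddevAggregate:
    """Running standard deviation using Welford's online algorithm."""

    def __init__(self, population=False):
        self.population = population  # True: STDDEVP-style, False: STDDEV-style
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared differences from the running mean

    def add(self, value):
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    def result(self):
        divisor = self.count if self.population else self.count - 1
        if divisor <= 0:
            return None  # not enough values in the window
        return math.sqrt(self.m2 / divisor)


def aggregate(window, aggregator):
    # Feed every value in the window, then read the final result.
    for value in window:
        aggregator.add(value)
    return aggregator.result()


window = [2, 4, 4, 4, 5, 5, 7, 9]
population_stddev = aggregate(window, StddevAggregate(population=True))
sample_stddev = aggregate(window, StddevAggregate())
```

Updating the state per value, instead of buffering the whole window, keeps memory use constant however many events fall into a window.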
Extensibility with SAM SDK
• UDFs (user-defined functions): perform simple transformations
Built-in functions
• STDDEV
• STDDEVP
• VARIANCE
• VARIANCEP
• MEAN
• MIN
• MAX
• SUM
• COUNT
• UPPER
• LOWER
• INITCAP
• SUBSTRING
• CHAR_LENGTH
• CONCAT
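In contrast to an aggregate, a UDF transforms the fields of a single event. The sketch below registers a few functions under the names from the list and applies them to an event. The semantics are assumed from the SQL functions of the same names, and `apply_udf` and the event fields are hypothetical, not the SAM SDK.

```python
def initcap(text):
    # Capitalize the first letter of each space-separated word, lowercase the rest.
    return " ".join(word[:1].upper() + word[1:].lower() for word in text.split(" "))


# Name-to-callable registry; the behavior of each entry is an assumption.
UDFS = {
    "UPPER": str.upper,
    "LOWER": str.lower,
    "INITCAP": initcap,
    "CHAR_LENGTH": len,
    "CONCAT": lambda *parts: "".join(parts),
}


def apply_udf(event, name, input_fields, output_field):
    """Return a copy of event with output_field set to UDF(input fields)."""
    args = [event[field_name] for field_name in input_fields]
    enriched = dict(event)
    enriched[output_field] = UDFS[name](*args)
    return enriched


event = {"first": "ada", "last": "LOVELACE"}
event = apply_udf(event, "INITCAP", ["first"], "first_cap")
event = apply_udf(event, "CONCAT", ["first_cap", "last"], "full")
event = apply_udf(event, "CHAR_LENGTH", ["full"], "full_length")
```

Returning a new dict rather than mutating the input keeps each transformation step independent, which makes steps easy to chain and test.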
Extensibility with SAM SDK
• Notifier: sends notifications such as email or SMS, or more complex ones that invoke external APIs
Built-in notifiers
• Email
• More in future…
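A notifier can be thought of as a small interface with one method that turns the fields of a triggering event into an outgoing message. The sketch below is a hypothetical shape, not the SAM SDK's Notifier interface. It builds the email with Python's standard `email.message.EmailMessage` and stores it in an outbox instead of sending it; the addresses and field names are invented.

```python
from abc import ABC, abstractmethod
from email.message import EmailMessage


class Notifier(ABC):
    @abstractmethod
    def notify(self, fields):
        """Deliver a notification built from the event fields."""


class EmailNotifier(Notifier):
    def __init__(self, sender, recipient):
        self.sender = sender
        self.recipient = recipient
        self.outbox = []  # messages are collected here instead of being sent

    def notify(self, fields):
        message = EmailMessage()
        message["From"] = self.sender
        message["To"] = self.recipient
        message["Subject"] = f"Alert: {fields['rule']}"
        body = "\n".join(f"{key}: {value}" for key, value in sorted(fields.items()))
        message.set_content(body)
        # A real notifier would hand the message to an SMTP client here,
        # for example smtplib.SMTP(...).send_message(message).
        self.outbox.append(message)
        return message


notifier = EmailNotifier("sam@example.com", "ops@example.com")
message = notifier.notify({"rule": "speeding", "driver": "d1", "speed": 95})
```

An SMS or webhook notifier would implement the same `notify` method, so the rule that raises the alert does not need to know how it is delivered.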
The current release – 0.5
• Manual service pool registration that does not require Ambari
• Test mode to easily test out the streaming app
• Kerberos and delegation-token-based authentication
• Authorization support with RBAC + permissions
• New sources, processors, and sinks
Upcoming…
• Extending token-based authentication to other components
• Support for state management in SAM
• Support for other streaming engines: Flink, Spark Streaming
Demo
Try it out!
• It’s open source under the Apache License
• https://github.com/hortonworks/streamline
• Apache incubation soon
• SAM 0.5 is out!
• https://groups.google.com/forum/#!forum/streamline-users
• Contributions are welcome!

It’s Finally Here! Building Complex Streaming Analytics Apps in under 10 min without writing any code
