© Hortonworks Inc. 2011–2018. All rights reserved.
SAM - Streaming analytics made easy
Arun Iyer, Hortonworks
aiyer@hortonworks.com
DataWorks Summit, San Jose 2018
Agenda
• Overview of Streaming Analytics Manager (SAM)
• SAM Architecture and SDK
• Look at some of the new features
• Demo
• Q&A
Streaming Analytics Manager (SAM)
What is it?
• A tool that helps users build and deploy complex streaming analytics apps through a GUI, without writing a lot of code
• Open source, Apache licensed
• https://github.com/hortonworks/streamline
Key design principles
• Build stream analytics apps without specialized skill sets
• Support multiple underlying streaming engines (Storm, Spark Streaming, Flink)
• Extensibility – provide an SDK to plug in custom sources/sinks/processors/UDFs
• Schema is a first-class citizen
Schema management
• Streaming app developers need a well-defined schema to define their business logic (filtering, aggregations, transformations, etc.) on the incoming data
• Typically the schema and (de)serialization logic are hard-coded into the streaming app
• Any change to the schema breaks the system
• There is very little reuse of the schema across different components
Schema Registry
What is it?
• A shared repository of schemas that allows applications to flexibly interact with each other
• Avoids attaching a schema to every piece of data
• Defines relationships between schema versions and compatibility policies
• Consumers and producers can evolve at different rates
• Open source, Apache licensed
• https://github.com/hortonworks/registry
[Figure: SAM integration]
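To make "compatibility policies" concrete, here is a minimal sketch of a backward-compatibility check between two schema versions. It is not the Schema Registry API: schemas are plain dicts, the field names are made up, and the rule is deliberately simplified (added fields need a default, existing fields keep their type).

```python
# Illustrative only: plain dicts stand in for registered schema versions.
def is_backward_compatible(old_fields, new_fields):
    """Return True if a reader on the new version can read data written with the old one.

    Simplified rule (an assumption for this sketch): every field added in the
    new version must carry a default, and fields present in both versions
    must keep the same type. Removed fields are allowed.
    """
    for name, spec in new_fields.items():
        if name not in old_fields:
            if "default" not in spec:
                return False
        elif old_fields[name]["type"] != spec["type"]:
            return False
    return True


# Hypothetical schema versions for a truck speed event.
v1 = {"driverId": {"type": "int"}, "speed": {"type": "int"}}
v2 = {**v1, "route": {"type": "string", "default": "unknown"}}  # adds a field with a default
v3 = {**v1, "route": {"type": "string"}}                        # adds a field without a default

print(is_backward_compatible(v1, v2))
print(is_backward_compatible(v1, v3))
```

With a check like this enforced when a new version is registered, producers can move to the new version while consumers upgrade later.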
Streaming Analytics Manager (SAM) – Components and user personas
• App Developer
• Business Analyst
• Operations
SAM is all about doing real-time analytics on the stream
• Real-time descriptive analytics: What is happening right now?
• Real-time predictive analytics: What could happen now/soon?
• Real-time prescriptive analytics: What should we do right now?
Real-Time Prescriptive Analytics
• Question: What should we do right now?
• Context: It is rainy, the driver has been on the road for 12 hours, and he has had 30 high-speed alerts over a 3-minute window in the last 2 hours.
• Answer: Dispatch a radio call to the driver to slow down
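The scenario above boils down to a windowed count plus a rule. A minimal sketch follows, assuming timestamps in seconds and reading the thresholds (rain, 12 hours, 30 alerts, 3-minute window) straight off the slide; the class and function names are made up for illustration and are not SAM components.

```python
from collections import deque


class SpeedingAlertWindow:
    """Keeps the timestamps (in seconds) of alerts that fall inside a sliding window."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.timestamps = deque()

    def add(self, ts):
        self.timestamps.append(ts)
        self._evict(ts)

    def _evict(self, now):
        # Drop alerts that are older than the window relative to `now`.
        while self.timestamps and self.timestamps[0] <= now - self.window_seconds:
            self.timestamps.popleft()

    def count(self, now):
        self._evict(now)
        return len(self.timestamps)


def recommend_action(is_raining, hours_on_road, alerts_in_window):
    """Rule from the slide: rain + 12 hours on the road + 30 alerts => radio call."""
    if is_raining and hours_on_road >= 12 and alerts_in_window >= 30:
        return "dispatch radio call: slow down"
    return "no action"


window = SpeedingAlertWindow(window_seconds=180)
for ts in range(0, 180, 6):  # one alert every 6 seconds, 30 alerts in total
    window.add(ts)

print(recommend_action(True, 12, window.count(now=174)))
```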
Real-Time Predictive Analytics
• Question: There are no violation events, but what might happen that I need to worry about?
• My data science team has a model that can predict that based on:
  • Weather
  • Roads
  • Driver HR info, such as driver certification status and wagePlan
  • Driver timesheet info, such as hours and miles logged over the last week
Real-Time Predictive Analytics
1. Model Registry: Export the Spark MLlib model and import it into HDF's Model Registry
2. Enrich with Features: Use SAM's enrich/custom processors to enrich the event with the features required for the model
3. Transform/Normalize: Use SAM's projection/custom processors to transform/normalize the streaming event and the features required for the model
4. Score Model: Use SAM's PMML processor to score the model for each stream event with its required features
5. Alert / Notify / Action: Use SAM's rule and notification processors to alert, notify and take action using the results of the model
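The enrich, normalize, score and alert steps can be sketched as a chain of plain functions. Everything here is an assumption made for illustration: the lookup tables, field names, coefficients and threshold are invented, and a hand-written logistic function stands in for a PMML model loaded from a model registry.

```python
import math

# Hypothetical lookup tables standing in for external feature sources.
DRIVER_FEATURES = {11: {"certified": False, "hours_logged": 70, "miles_logged": 3300}}
WEATHER = {"route-7": {"foggy": True, "rainy": False}}


def enrich(event):
    """Enrich: attach the features the model needs to the raw event."""
    return {**event, **DRIVER_FEATURES[event["driverId"]], **WEATHER[event["route"]]}


def normalize(event):
    """Transform/normalize: turn the enriched event into a numeric feature vector."""
    return [
        1.0 if event["certified"] else 0.0,
        event["hours_logged"] / 100.0,
        event["miles_logged"] / 5000.0,
        1.0 if event["foggy"] else 0.0,
        1.0 if event["rainy"] else 0.0,
    ]


# Made-up logistic-regression coefficients; a real app would score an imported model.
WEIGHTS = [-2.0, 1.5, 1.0, 1.2, 0.8]
BIAS = -1.0


def score(features):
    """Score: probability of a violation according to the toy model."""
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, features))
    return 1.0 / (1.0 + math.exp(-z))


def alert(event, probability, threshold=0.5):
    """Alert: emit a message only when the predicted risk crosses the threshold."""
    if probability >= threshold:
        return f"ALERT driver {event['driverId']}: predicted violation risk {probability:.2f}"
    return None


def run_pipeline(event):
    enriched = enrich(event)
    return alert(enriched, score(normalize(enriched)))


print(run_pipeline({"driverId": 11, "route": "route-7"}))
```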
Real-Time Descriptive Analytics for Business Analysts
• A tool to create time-series and real-time analytics dashboards, charts and graphs
• 30+ visualization charts out of the box, with customization capability
• Druid is the analytics engine that powers the Stream Insight module
SAM Architecture
[Architecture diagram. Components shown:]
• SAM UI, REST API, web server (Jetty)
• Storage Manager, DB
• Topology actions service, Topology DAG Builder, Topology Lifecycle Manager
• Runners (translate SAM DAG to streaming engine topology): Storm, Flink, Spark
• Deploy DAG to the streaming computation engines (Storm)
• Environ Service, Service Pools, Ambari (cluster manager)
• Schema Registry, SR Client
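The runner idea, one engine-neutral DAG translated into each engine's vocabulary, can be sketched as follows. This is not SAM's internal code: the class names, the component names and the dict-based DAG are assumptions, and the output is just a list of labels rather than a deployable topology.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Engine-neutral description of an app: each component maps to its upstream components.
dag = {
    "kafka-source": set(),
    "filter-rule": {"kafka-source"},
    "window-aggregate": {"filter-rule"},
    "notification-sink": {"window-aggregate"},
}


class Runner:
    """Translates the engine-neutral DAG into one engine's terms."""

    def translate(self, dag):
        # Visit components so that every upstream component comes first.
        order = TopologicalSorter(dag).static_order()
        return [f"{self.kind(name, dag)}:{name}" for name in order]

    def kind(self, name, dag):
        raise NotImplementedError


class StormRunner(Runner):
    def kind(self, name, dag):
        # In Storm, sources are spouts and everything downstream is a bolt.
        return "spout" if not dag[name] else "bolt"


class FlinkRunner(Runner):
    def kind(self, name, dag):
        if not dag[name]:
            return "source"
        has_downstream = any(name in upstream for upstream in dag.values())
        return "operator" if has_downstream else "sink"


print(StormRunner().translate(dag))
print(FlinkRunner().translate(dag))
```

The app definition stays the same; only the runner changes when the target engine changes.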
Extensibility with SAM SDK
• Custom Processor - allows users to write their own business logic
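A rough sketch of what a custom processor looks like in spirit: configure once, handle each event, emit zero or more output events. The base class, method names and the unit-conversion logic below are hypothetical and written in Python for brevity; they are not the SAM SDK interfaces.

```python
class CustomProcessor:
    """Hypothetical lifecycle: configure once, process each event, clean up."""

    def initialize(self, config):
        pass

    def process(self, event):
        """Return a list of output events (possibly empty)."""
        raise NotImplementedError

    def cleanup(self):
        pass


class SpeedUnitConverter(CustomProcessor):
    """Example business logic: add a km/h field next to an mph reading."""

    def initialize(self, config):
        self.factor = config.get("factor", 1.609344)

    def process(self, event):
        if "speed_mph" not in event:
            return []  # drop events that lack the input field
        out = dict(event)
        out["speed_kmh"] = round(event["speed_mph"] * self.factor, 1)
        return [out]


processor = SpeedUnitConverter()
processor.initialize({})
print(processor.process({"driverId": 11, "speed_mph": 60}))
processor.cleanup()
```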
Extensibility with SAM SDK
• Multi-lang support
Extensibility with SAM SDK
• UDAFs - compute aggregates within a window
Built-in functions: STDDEV, STDDEVP, VARIANCE, VARIANCEP, MEAN, MIN, MAX, SUM, COUNT, UPPER, LOWER, INITCAP, SUBSTRING, CHAR_LENGTH, CONCAT
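An aggregate function over a window is typically written as three steps: create an empty aggregate, fold each value in, and produce the result. The sketch below uses that init/add/result shape as an assumption (it is not copied from the SDK) and computes variance with Welford's algorithm. It also assumes, from the naming, that the "P" variants are population statistics and the plain ones are sample statistics.

```python
class Variance:
    """Running variance via Welford's algorithm; the aggregate is (count, mean, m2)."""

    def init(self):
        return (0, 0.0, 0.0)

    def add(self, aggregate, value):
        count, mean, m2 = aggregate
        count += 1
        delta = value - mean
        mean += delta / count
        m2 += delta * (value - mean)
        return (count, mean, m2)

    def result(self, aggregate, population=False):
        count, _, m2 = aggregate
        divisor = count if population else count - 1
        return m2 / divisor if divisor > 0 else None


def aggregate_window(udaf, values, **kwargs):
    """Fold all values of one window through the aggregate function."""
    aggregate = udaf.init()
    for value in values:
        aggregate = udaf.add(aggregate, value)
    return udaf.result(aggregate, **kwargs)


speeds_in_window = [2, 4, 4, 4, 5, 5, 7, 9]
print(aggregate_window(Variance(), speeds_in_window))                   # sample variance
print(aggregate_window(Variance(), speeds_in_window, population=True))  # population variance
```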
Extensibility with SAM SDK
• UDFs - perform simple transformations
Built-in functions: STDDEV, STDDEVP, VARIANCE, VARIANCEP, MEAN, MIN, MAX, SUM, COUNT, UPPER, LOWER, INITCAP, SUBSTRING, CHAR_LENGTH, CONCAT
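Unlike an aggregate, a UDF maps the fields of a single event to a new value. A small sketch, assuming INITCAP capitalizes the first letter of each word and CONCAT joins its arguments; the exact behavior of SAM's built-ins may differ, and `apply_udfs` is a made-up helper.

```python
def initcap(text):
    """Capitalize the first letter of each whitespace-separated word."""
    return " ".join(word[:1].upper() + word[1:].lower() for word in text.split())


def concat(*parts):
    """Join all arguments into one string."""
    return "".join(str(part) for part in parts)


def apply_udfs(event, projections):
    """Add one output field per projection; each projection is a function of the event."""
    return {**event, **{field: fn(event) for field, fn in projections.items()}}


event = {"first": "aRUN", "last": "iyer"}
projected = apply_udfs(
    event,
    {"name": lambda e: initcap(concat(e["first"], " ", e["last"]))},
)
print(projected)
```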
Extensibility with SAM SDK
• Notifier - sends notifications such as email or SMS, or more complex ones that can invoke external APIs
Built-in notifiers
• Email
• More in future…
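A notifier can be pictured as a single `notify` hook with interchangeable implementations. In this sketch the interface, the class names and the webhook URL are all hypothetical; the webhook variant takes an injected `post` callable so that no particular HTTP client or endpoint is implied.

```python
class Notifier:
    """Hypothetical interface: receive a notification payload and deliver it."""

    def notify(self, notification):
        raise NotImplementedError


class InMemoryNotifier(Notifier):
    """Stand-in for an email or SMS notifier; just records what would be sent."""

    def __init__(self):
        self.sent = []

    def notify(self, notification):
        self.sent.append(notification)


class WebhookNotifier(Notifier):
    """Invokes an external API through a caller-supplied `post(url, payload)` function."""

    def __init__(self, url, post):
        self.url = url
        self.post = post

    def notify(self, notification):
        self.post(self.url, notification)


calls = []
notifiers = [
    InMemoryNotifier(),
    WebhookNotifier("https://example.invalid/hook", lambda url, body: calls.append((url, body))),
]
for notifier in notifiers:
    notifier.notify({"driverId": 11, "message": "slow down"})
print(notifiers[0].sent, calls)
```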
New features in v0.6.0 (and HDF 3.1)
Streaming Analytics Manager – v0.6.0
• Test Mode
• Operations Module
  • Event sampling
  • Log search
  • Monitoring improvements
• Other improvements
  • Custom processor enhancements
  • New UDFs
  • Kafka 1.0 support
  • Oracle 11/12 support for SAM metadata storage
SAM Test mode
Allows developers to test a SAM app locally without deploying it to a cluster
• Create a named test case
• Mock out the sources of the app and configure test data for each test source
• Validate the test data using the schema configured in the Schema Registry for each source
• Execute the test case and see visually what the data looks like at each component/processor
• Download the output of the test
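The test-mode workflow above (mock the source, validate the test data against the schema, run it through the app, inspect the data at each component) can be sketched as below. The schema format, processor names and helper functions are assumptions made for illustration and do not reflect how SAM implements test mode.

```python
import json


def validate(record, schema):
    """schema maps field name -> expected Python type."""
    return set(record) == set(schema) and all(
        isinstance(record[field], expected) for field, expected in schema.items()
    )


def run_test_case(test_data, schema, processors):
    """Feed mocked source data through the processors, recording output at each step."""
    bad = [record for record in test_data if not validate(record, schema)]
    if bad:
        raise ValueError(f"{len(bad)} test record(s) do not match the schema")
    trace = {"source": list(test_data)}
    events = list(test_data)
    for name, fn in processors:
        # Each processor returns a list of output events for one input event.
        events = [out for event in events for out in fn(event)]
        trace[name] = events
    return trace


schema = {"driverId": int, "speed": int}
test_data = [{"driverId": 1, "speed": 85}, {"driverId": 2, "speed": 60}]
processors = [
    ("speeding-filter", lambda e: [e] if e["speed"] > 80 else []),
    ("add-flag", lambda e: [{**e, "alert": True}]),
]

trace = run_test_case(test_data, schema, processors)
print(json.dumps(trace, indent=2))  # the per-component output, ready to save to a file
```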
SAM Test mode
[Screenshots: test case and results]
SAM Test mode – Using the REST APIs to integrate with CI pipeline
SAM Operations module
A visual approach to the common tasks that operations staff and developers perform after deploying an app, to aid troubleshooting
• Monitoring the application, troubleshooting and identifying performance issues
• Troubleshooting an application through log search
• Troubleshooting an application through event sampling
SAM Operations module – Log search
SAM Operations module – Event sampling
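The general idea behind event sampling is to keep only a configurable percentage of the events flowing through a component so they can be inspected without storing everything. A minimal sketch, assuming simple random sampling; the function name and parameters are made up, and SAM's actual sampling strategy is not described in these slides.

```python
import random


def sample_events(events, percentage, seed=None):
    """Keep roughly `percentage` percent of the events, chosen at random."""
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    return [event for event in events if rng.random() * 100 < percentage]


events = [{"eventId": i, "speed": 60 + i % 40} for i in range(1000)]
sampled = sample_events(events, percentage=10, seed=42)
print(len(sampled), "of", len(events), "events kept for inspection")
```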
To conclude
How does SAM add value?
• Develop streaming analytics apps without writing a lot of complex code
• Be agnostic of the underlying streaming engine
• Simplify the environment and complex configuration management
• Test, tune and bring apps to production faster
• Monitor, debug and troubleshoot streaming analytics applications quickly
Try it out!
• It's open source under the Apache License
  • https://github.com/hortonworks/streamline
• Latest release
  • 0.6.0 (log search, event sampling, test mode and more)
  • HDF 3.1 / 3.2
• https://groups.google.com/forum/#!forum/streamline-users
• Contributions are welcome!