1 © Hortonworks Inc. 2011–2018. All rights reserved.
SAM - Streaming analytics made easy
Arun Iyer, Hortonworks
aiyer@hortonworks.com
DataWorks Summit, San Jose 2018
Agenda
• Overview of Streaming Analytics Manager (SAM)
• SAM Architecture and SDK
• Look at some of the new features
• Demo
• Q&A
Overview
Streaming Analytics Manager (SAM)
What is it?
• A GUI-based tool that helps users build and deploy complex streaming analytics apps without writing a lot of code
• Open source under the Apache License
• https://github.com/hortonworks/streamline
Key design principles
• Build stream analytics apps without specialized skill sets
• Support multiple underlying streaming engines (Storm, Spark Streaming, Flink)
• Extensibility: provide an SDK to plug in custom sources, sinks, processors and UDFs
• Schema is a first-class citizen
Schema management
• Streaming app developers need a well-defined schema to express their business logic (filtering, aggregations, transformations, etc.) on the incoming data
• Typically, the schema and (de)serialization logic are hard-coded into the streaming app
• Any change to the schema breaks the system
• There is very little reuse of the schema across different components
Schema Registry
What is it?
• A shared repository of schemas that allows applications to interact with each other flexibly
• Avoids attaching the schema to every piece of data
• Defines relationships between schema versions and compatibility policies
• Lets consumers and producers evolve at different rates
• Open source under the Apache License
• https://github.com/hortonworks/registry
[Figure: SAM integration]
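The idea of versioned schemas guarded by a compatibility policy can be sketched in a few lines of Python. This is an illustrative in-memory model only, not the Hortonworks Schema Registry API: the class `InMemorySchemaRegistry`, the dict-based schema format and the simplified "backward" rule (a new field needs a default, an existing field must keep its type) are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

# Hypothetical schema format: field name -> {"type": ..., "default": ... (optional)}
Schema = Dict[str, Dict[str, Any]]


def is_backward_compatible(old: Schema, new: Schema) -> bool:
    """Simplified rule: a reader using `new` can still read data written with `old`."""
    for name, spec in new.items():
        if name in old:
            if old[name]["type"] != spec["type"]:
                return False  # the type of an existing field changed
        elif "default" not in spec:
            return False  # a new field without a default cannot be filled in
    return True


@dataclass
class InMemorySchemaRegistry:
    """Toy registry: keeps every version per schema name and enforces compatibility."""
    versions: Dict[str, List[Schema]] = field(default_factory=dict)

    def register(self, name: str, schema: Schema) -> int:
        history = self.versions.setdefault(name, [])
        if history and not is_backward_compatible(history[-1], schema):
            raise ValueError(f"schema {name!r} is not backward compatible")
        history.append(schema)
        return len(history)  # 1-based version number

    def get(self, name: str, version: Optional[int] = None) -> Schema:
        history = self.versions[name]
        return history[-1] if version is None else history[version - 1]


registry = InMemorySchemaRegistry()
v1 = registry.register("truck_events", {
    "driverId": {"type": "int"},
    "speed": {"type": "int"},
})
# Adding a field with a default is allowed under the simplified rule.
v2 = registry.register("truck_events", {
    "driverId": {"type": "int"},
    "speed": {"type": "int"},
    "route": {"type": "string", "default": ""},
})
```

Because producers and consumers look schemas up by name and version, events only need to carry a small version reference rather than the full schema.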
Streaming Analytics Manager (SAM) – Components and user personas
• App Developer
• Business Analyst
• Operations
SAM is all about doing real-time analytics on the stream
• Real-time prescriptive analytics: What should we do right now?
• Real-time predictive analytics: What could happen now/soon?
• Real-time descriptive analytics: What is happening right now?
Real-Time Prescriptive Analytics
• Question: What should we do right now?
• Context: It is raining, the driver has been on the road for 12 hours, and he has triggered 30 high-speed alerts within a 3-minute window in the last 2 hours.
• Answer: Dispatch a radio call to the driver to slow down.
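A rule like the one above could be expressed roughly as follows. This is a plain-Python sketch rather than SAM's rule processor: the function names, the use of timestamps in seconds and the choice of `>=` for each threshold are assumptions; only the thresholds themselves (12 hours, 30 alerts, 3 minutes, 2 hours) come from the slide.

```python
from typing import List


def max_alerts_in_window(alert_times: List[float], window_s: float) -> int:
    """Largest number of alerts that fall inside any window of `window_s` seconds."""
    times = sorted(alert_times)
    best = 0
    start = 0
    for end, t in enumerate(times):
        # Shrink the window from the left until it spans at most window_s seconds.
        while t - times[start] > window_s:
            start += 1
        best = max(best, end - start + 1)
    return best


def should_dispatch_radio_call(
    is_raining: bool,
    hours_on_road: float,
    alert_times: List[float],
    now: float,
    *,
    max_hours: float = 12,
    min_alerts: int = 30,
    window_s: float = 3 * 60,
    lookback_s: float = 2 * 60 * 60,
) -> bool:
    """Combine weather, time on the road and recent speeding alerts into one decision."""
    recent = [t for t in alert_times if now - lookback_s <= t <= now]
    return (
        is_raining
        and hours_on_road >= max_hours
        and max_alerts_in_window(recent, window_s) >= min_alerts
    )
```

The sliding-window helper is kept separate so the same count could feed other rules, for example a lower threshold that only raises a warning.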
Real-Time Predictive Analytics
• Question: There are no violation events, but what might happen that I need to worry about?
• My data science team has a model that can predict this based on:
  - Weather
  - Roads
  - Driver HR info, such as driver certification status and wagePlan
  - Driver timesheet info, such as hours and miles logged over the last week
Real-Time Predictive Analytics
1. Model Registry: Export the Spark MLlib model and import it into HDF's Model Registry
2. Enrich with Features: Use SAM's enrich/custom processors to enrich the event with the features required for the model
3. Transform/Normalize: Use SAM's projection/custom processors to transform/normalize the streaming event and the features required for the model
4. Score Model: Use SAM's PMML processor to score the model for each stream event with its required features
5. Alert / Notify / Action: Use SAM's rule and notification processors to alert, notify and take action using the results of the model
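The enrich, normalize, score and alert steps can be pictured as a chain of small functions. The sketch below is only an analogy in plain Python: the lookup tables, the feature names, the `foggy` weather flag and the logistic model with made-up coefficients are all hypothetical stand-ins, and no PMML model or SAM processor is involved.

```python
import math
from typing import Any, Dict, Optional

# Hypothetical lookup tables standing in for the HR and timesheet stores.
DRIVER_HR = {
    11: {"certified": False, "wage_plan": "miles"},
    12: {"certified": True, "wage_plan": "hours"},
}
DRIVER_TIMESHEET = {
    11: {"hours_logged": 70, "miles_logged": 3300},
    12: {"hours_logged": 30, "miles_logged": 1000},
}

# Made-up coefficients; a real deployment would score an exported model instead.
WEIGHTS = {"certified": -1.5, "paid_by_miles": 0.8, "hours_scaled": 1.2,
           "miles_scaled": 0.3, "foggy": 1.0}
BIAS = -2.0


def enrich(event: Dict[str, Any]) -> Dict[str, Any]:
    """Enrich step: attach the features the model needs to the raw event."""
    driver = event["driver_id"]
    return {**event, **DRIVER_HR[driver], **DRIVER_TIMESHEET[driver]}


def normalize(event: Dict[str, Any]) -> Dict[str, float]:
    """Transform/normalize step: turn features into the numeric form the model expects."""
    return {
        "certified": 1.0 if event["certified"] else 0.0,
        "paid_by_miles": 1.0 if event["wage_plan"] == "miles" else 0.0,
        "hours_scaled": event["hours_logged"] / 100.0,
        "miles_scaled": event["miles_logged"] / 1000.0,
        "foggy": 1.0 if event["foggy"] else 0.0,
    }


def score(features: Dict[str, float]) -> float:
    """Score step: logistic score between 0 and 1, read as a violation probability."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def alert(event: Dict[str, Any], probability: float,
          threshold: float = 0.5) -> Optional[str]:
    """Alert step: produce a message only when the score crosses the threshold."""
    if probability >= threshold:
        return f"Driver {event['driver_id']}: predicted violation risk {probability:.2f}"
    return None


def process(event: Dict[str, Any]) -> Optional[str]:
    enriched = enrich(event)
    return alert(enriched, score(normalize(enriched)))
```

Keeping each step a separate function mirrors the slide's point that each stage maps to its own processor on the canvas.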
Real-Time Descriptive Analytics for Business Analysts
• A tool to create time-series and real-time analytics dashboards, charts and graphs
• 30+ visualization charts out of the box, with customization capability
• Druid is the analytics engine that powers the Stream Insight module
Architecture
SAM Architecture
[Architecture diagram. Components shown:]
• SAM UI
• REST API
• Web server (Jetty)
• Storage Manager, DB
• Topology actions service
• Topology DAG Builder
• Topology Lifecycle Manager
• Runners (translate the SAM DAG to a streaming engine topology): Storm, Flink, Spark
• Environ Service, Service Pools
• Ambari (cluster manager)
• Schema Registry, SR Client
• Streaming computation engines (Storm)
• Deploy DAG
Extensibility with SAM SDK
• Custom Processor - allows users to write their own business logic
Extensibility with SAM SDK
• Multi-lang support
Extensibility with SAM SDK
• UDAFs - compute aggregates within a window
Built-in functions
  - STDDEV
  - STDDEVP
  - VARIANCE
  - VARIANCEP
  - MEAN
  - MIN
  - MAX
  - SUM
  - COUNT
  - UPPER
  - LOWER
  - INITCAP
  - SUBSTRING
  - CHAR_LENGTH
  - CONCAT
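To make "aggregates within a window" concrete, here is a small Python sketch that applies the aggregate names from the list to count-based tumbling windows. It is not the SAM SDK: the window type is an assumption, and so is the reading of the "P" variants as population statistics and the plain variants as sample statistics, which follows common SQL usage.

```python
import statistics
from typing import Callable, Dict, Iterable, Iterator, List

# Assumed semantics: STDDEV/VARIANCE are sample statistics,
# STDDEVP/VARIANCEP are population statistics.
AGGREGATES: Dict[str, Callable[[List[float]], float]] = {
    "STDDEV": statistics.stdev,
    "STDDEVP": statistics.pstdev,
    "VARIANCE": statistics.variance,
    "VARIANCEP": statistics.pvariance,
    "MEAN": statistics.mean,
    "MIN": min,
    "MAX": max,
    "SUM": sum,
    "COUNT": len,
}


def tumbling_windows(values: Iterable[float], size: int) -> Iterator[List[float]]:
    """Split a stream into consecutive, non-overlapping windows of `size` events."""
    window: List[float] = []
    for value in values:
        window.append(value)
        if len(window) == size:
            yield window
            window = []
    if window:
        yield window  # emit the final, possibly partial, window


def aggregate(values: Iterable[float], size: int, function: str) -> List[float]:
    """Apply one named aggregate to every window.

    Note: the sample statistics need at least two values per window.
    """
    return [AGGREGATES[function](w) for w in tumbling_windows(values, size)]


speeds = [60, 70, 80, 90, 100, 110]
mean_per_window = aggregate(speeds, 3, "MEAN")
max_per_window = aggregate(speeds, 3, "MAX")
```

Each aggregate reduces a whole window to one value, which is what separates it from the per-event string functions at the end of the list.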
Extensibility with SAM SDK
• UDFs - perform simple transformations
Built-in functions
  - STDDEV
  - STDDEVP
  - VARIANCE
  - VARIANCEP
  - MEAN
  - MIN
  - MAX
  - SUM
  - COUNT
  - UPPER
  - LOWER
  - INITCAP
  - SUBSTRING
  - CHAR_LENGTH
  - CONCAT
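The string functions in the list are per-event transformations. A rough Python equivalent is sketched below; the exact SAM semantics are not given in the slides, so the 1-based `SUBSTRING(start, length)` convention (borrowed from SQL), the word-splitting in `INITCAP` and the event field names are assumptions.

```python
from typing import Any, Dict


def upper(s: str) -> str:
    return s.upper()


def lower(s: str) -> str:
    return s.lower()


def initcap(s: str) -> str:
    """Capitalize the first letter of each space-separated word, lowercase the rest."""
    return " ".join(word[:1].upper() + word[1:].lower() for word in s.split(" "))


def substring(s: str, start: int, length: int) -> str:
    """Assumed SQL-style convention: `start` is 1-based."""
    return s[start - 1:start - 1 + length]


def char_length(s: str) -> int:
    return len(s)


def concat(*parts: str) -> str:
    return "".join(parts)


def project(event: Dict[str, Any]) -> Dict[str, Any]:
    """Build new fields from an event, as a projection step might."""
    return {
        "driver": initcap(event["driver_name"]),
        "route_code": upper(substring(event["route"], 1, 3)),
        "label": concat(lower(event["city"]), "-", str(event["truck_id"])),
        "name_length": char_length(event["driver_name"]),
    }
```

Unlike the window aggregates, each of these functions sees one event at a time and keeps no state.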
Extensibility with SAM SDK
• Notifier - sends notifications such as email or SMS, or more complex ones that invoke external APIs
Built-in notifiers
  - Email
  - More in future…
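A pluggable notifier can be pictured as a small interface with one implementation per channel. The Python sketch below is hypothetical and does not reproduce the SAM SDK's notifier interface: the class names, the `notify` method and the event fields are invented for illustration, and the email is only built with the standard library, not sent.

```python
from abc import ABC, abstractmethod
from email.message import EmailMessage
from typing import Any, Callable, Dict, List


class Notifier(ABC):
    """Hypothetical plug-in contract: turn an event into an outgoing notification."""

    @abstractmethod
    def notify(self, event: Dict[str, Any]) -> None:
        ...


class EmailNotifier(Notifier):
    def __init__(self, sender: str, recipient: str) -> None:
        self.sender = sender
        self.recipient = recipient
        self.outbox: List[EmailMessage] = []  # stands in for an SMTP connection

    def notify(self, event: Dict[str, Any]) -> None:
        message = EmailMessage()
        message["From"] = self.sender
        message["To"] = self.recipient
        message["Subject"] = f"Alert for driver {event['driver_id']}"
        message.set_content(f"Speed {event['speed']} exceeded the limit.")
        self.outbox.append(message)


class CallbackNotifier(Notifier):
    """A more complex notifier would call an external API; here it calls a function."""

    def __init__(self, callback: Callable[[Dict[str, Any]], None]) -> None:
        self.callback = callback

    def notify(self, event: Dict[str, Any]) -> None:
        self.callback(event)
```

With a common interface, the rest of the pipeline can hand events to any notifier without knowing which channel is behind it.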
New features in v0.6.0 (and HDF 3.1)
Streaming Analytics Manager – v0.6.0
• Test mode
• Operations module
  - Event sampling
  - Log search
  - Monitoring improvements
• Other improvements
  - Custom processor enhancements
  - New UDFs
  - Kafka 1.0 support
  - Oracle 11/12 support for SAM metadata storage
SAM Test mode
Allows developers to test a SAM app locally without deploying it to a cluster
• Create a named test case
• Mock out the sources of the app and configure test data for each test source
• Validate the test data against the schema configured in the Schema Registry for each source
• Execute the test case and see visually what the data looks like at each component/processor
• Download the output of the test
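The test-mode workflow (mock the source, validate the test data against a schema, run it through the app, inspect the output of every component) can be imitated in plain Python. This sketch is an analogy only and does not use SAM or its REST API: the schema format, the stage functions and the names `run_test_case`, `STAGES` and `SPEED_LIMIT` are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

Event = Dict[str, Any]
Stage = Tuple[str, Callable[[List[Event]], List[Event]]]

SCHEMA = {"driver_id": int, "speed": int}  # hypothetical source schema
SPEED_LIMIT = 80  # hypothetical threshold used by the stages below


def validate(events: List[Event], schema: Dict[str, type]) -> None:
    """Reject test data whose fields or types do not match the schema."""
    for event in events:
        if set(event) != set(schema):
            raise ValueError(f"fields {sorted(event)} do not match {sorted(schema)}")
        for name, expected in schema.items():
            if not isinstance(event[name], expected):
                raise ValueError(f"field {name!r} is not of type {expected.__name__}")


def run_test_case(test_data: List[Event], stages: List[Stage]) -> Dict[str, List[Event]]:
    """Feed mock source data through each stage and record every stage's output."""
    validate(test_data, SCHEMA)
    results: Dict[str, List[Event]] = {"source": list(test_data)}
    current = list(test_data)
    for stage_name, stage in stages:
        current = stage(current)
        results[stage_name] = current
    return results


def filter_speeding(events: List[Event]) -> List[Event]:
    return [e for e in events if e["speed"] > SPEED_LIMIT]


def add_excess(events: List[Event]) -> List[Event]:
    return [{**e, "over_by": e["speed"] - SPEED_LIMIT} for e in events]


STAGES: List[Stage] = [("filter_speeding", filter_speeding), ("add_excess", add_excess)]
```

Recording the output per stage is what makes it possible to see where an event was dropped or transformed, rather than only checking the final result.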
SAM Test mode
[Screenshots: test case and results]
SAM Test mode – Using the REST APIs to integrate with CI pipeline
SAM Operations module
A visual approach to the common tasks that operations staff and developers perform after deploying an app, to aid troubleshooting:
• Monitoring the application, troubleshooting and identifying performance issues
• Troubleshooting an application through log search
• Troubleshooting an application through event sampling
SAM Operations module
SAM Operations module – Log search
SAM Operations module – Event sampling
Demo
To conclude
How does SAM add value?
• Develop streaming analytics apps without writing a lot of complex code
• Be agnostic of the underlying streaming engine
• Simplify the environment and complex configuration management
• Test, tune and bring apps to production faster
• Monitor, debug and troubleshoot streaming analytics applications quickly
Try it out!
• It's open source under the Apache License
  - https://github.com/hortonworks/streamline
• Latest release
  - 0.6.0 (log search, event sampling, test mode and more)
  - HDF 3.1 / 3.2
• User group: https://groups.google.com/forum/#!forum/streamline-users
• Contributions are welcome!
Thank you

More Related Content

What's hot

Manage democratization of the data - Data Replication in Hadoop
Manage democratization of the data - Data Replication in HadoopManage democratization of the data - Data Replication in Hadoop
Manage democratization of the data - Data Replication in Hadoop
DataWorks Summit
 
An elastic batch-and stream-processing stack with Pravega and Apache Flink
An elastic batch-and stream-processing stack with Pravega and Apache FlinkAn elastic batch-and stream-processing stack with Pravega and Apache Flink
An elastic batch-and stream-processing stack with Pravega and Apache Flink
DataWorks Summit
 
Achieving a 360-degree view of manufacturing via open source industrial data ...
Achieving a 360-degree view of manufacturing via open source industrial data ...Achieving a 360-degree view of manufacturing via open source industrial data ...
Achieving a 360-degree view of manufacturing via open source industrial data ...
DataWorks Summit
 
KPN ETL Factory (KETL) - Automated Code generation using Metadata to build Da...
KPN ETL Factory (KETL) - Automated Code generation using Metadata to build Da...KPN ETL Factory (KETL) - Automated Code generation using Metadata to build Da...
KPN ETL Factory (KETL) - Automated Code generation using Metadata to build Da...
DataWorks Summit
 

What's hot (20)

Present and future of unified, portable and efficient data processing with Ap...
Present and future of unified, portable and efficient data processing with Ap...Present and future of unified, portable and efficient data processing with Ap...
Present and future of unified, portable and efficient data processing with Ap...
 
Manage democratization of the data - Data Replication in Hadoop
Manage democratization of the data - Data Replication in HadoopManage democratization of the data - Data Replication in Hadoop
Manage democratization of the data - Data Replication in Hadoop
 
What's new in apache hive
What's new in apache hive What's new in apache hive
What's new in apache hive
 
Present and future of unified, portable, and efficient data processing with A...
Present and future of unified, portable, and efficient data processing with A...Present and future of unified, portable, and efficient data processing with A...
Present and future of unified, portable, and efficient data processing with A...
 
An elastic batch-and stream-processing stack with Pravega and Apache Flink
An elastic batch-and stream-processing stack with Pravega and Apache FlinkAn elastic batch-and stream-processing stack with Pravega and Apache Flink
An elastic batch-and stream-processing stack with Pravega and Apache Flink
 
Graphene – Microsoft SCOPE on Tez
Graphene – Microsoft SCOPE on Tez Graphene – Microsoft SCOPE on Tez
Graphene – Microsoft SCOPE on Tez
 
Apache Spark Crash Course
Apache Spark Crash CourseApache Spark Crash Course
Apache Spark Crash Course
 
The Next Generation of Data Processing and Open Source
The Next Generation of Data Processing and Open SourceThe Next Generation of Data Processing and Open Source
The Next Generation of Data Processing and Open Source
 
Accelerating TensorFlow with RDMA for high-performance deep learning
Accelerating TensorFlow with RDMA for high-performance deep learningAccelerating TensorFlow with RDMA for high-performance deep learning
Accelerating TensorFlow with RDMA for high-performance deep learning
 
Apache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, ScaleApache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, Scale
 
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
 
Transactional operations in Apache Hive: present and future
Transactional operations in Apache Hive: present and futureTransactional operations in Apache Hive: present and future
Transactional operations in Apache Hive: present and future
 
Solving Cybersecurity at Scale
Solving Cybersecurity at ScaleSolving Cybersecurity at Scale
Solving Cybersecurity at Scale
 
Open source computer vision with TensorFlow, Apache MiniFi, Apache NiFi, Open...
Open source computer vision with TensorFlow, Apache MiniFi, Apache NiFi, Open...Open source computer vision with TensorFlow, Apache MiniFi, Apache NiFi, Open...
Open source computer vision with TensorFlow, Apache MiniFi, Apache NiFi, Open...
 
Achieving a 360-degree view of manufacturing via open source industrial data ...
Achieving a 360-degree view of manufacturing via open source industrial data ...Achieving a 360-degree view of manufacturing via open source industrial data ...
Achieving a 360-degree view of manufacturing via open source industrial data ...
 
Productionizing Spark ML pipelines with the portable format for analytics
Productionizing Spark ML pipelines with the portable format for analyticsProductionizing Spark ML pipelines with the portable format for analytics
Productionizing Spark ML pipelines with the portable format for analytics
 
KPN ETL Factory (KETL) - Automated Code generation using Metadata to build Da...
KPN ETL Factory (KETL) - Automated Code generation using Metadata to build Da...KPN ETL Factory (KETL) - Automated Code generation using Metadata to build Da...
KPN ETL Factory (KETL) - Automated Code generation using Metadata to build Da...
 
Apache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureApache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and Future
 
Analysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data AnalyticsAnalysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data Analytics
 
Visualizing Big Data in Realtime
Visualizing Big Data in RealtimeVisualizing Big Data in Realtime
Visualizing Big Data in Realtime
 

Similar to SAM—streaming analytics made easy

Streamline - Stream Analytics for Everyone
Streamline - Stream Analytics for EveryoneStreamline - Stream Analytics for Everyone
Streamline - Stream Analytics for Everyone
DataWorks Summit/Hadoop Summit
 
SAM - Streaming Analytics Made Easy
SAM - Streaming Analytics Made EasySAM - Streaming Analytics Made Easy
SAM - Streaming Analytics Made Easy
DataWorks Summit
 
The Four Hats of Load and Performance Testing with special guest Mentora
The Four Hats of Load and Performance Testing with special guest MentoraThe Four Hats of Load and Performance Testing with special guest Mentora
The Four Hats of Load and Performance Testing with special guest Mentora
SOASTA
 
The Four Hats of Load and Performance Testing with special guest Mentora
The Four Hats of Load and Performance Testing with special guest MentoraThe Four Hats of Load and Performance Testing with special guest Mentora
The Four Hats of Load and Performance Testing with special guest Mentora
SOASTA
 

Similar to SAM—streaming analytics made easy (20)

Next gen tooling for building streaming analytics apps: code-less development...
Next gen tooling for building streaming analytics apps: code-less development...Next gen tooling for building streaming analytics apps: code-less development...
Next gen tooling for building streaming analytics apps: code-less development...
 
Next Generation Tooling for building streaming analytics app
Next Generation Tooling for building streaming analytics appNext Generation Tooling for building streaming analytics app
Next Generation Tooling for building streaming analytics app
 
Unlocking insights in streaming data
Unlocking insights in streaming dataUnlocking insights in streaming data
Unlocking insights in streaming data
 
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
 
Streamline - Stream Analytics for Everyone
Streamline - Stream Analytics for EveryoneStreamline - Stream Analytics for Everyone
Streamline - Stream Analytics for Everyone
 
SAM - Streaming Analytics Made Easy
SAM - Streaming Analytics Made EasySAM - Streaming Analytics Made Easy
SAM - Streaming Analytics Made Easy
 
Streaming analytics manager
Streaming analytics managerStreaming analytics manager
Streaming analytics manager
 
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
 
Schema Registry & Stream Analytics Manager
Schema Registry  & Stream Analytics ManagerSchema Registry  & Stream Analytics Manager
Schema Registry & Stream Analytics Manager
 
Hadoop Operations - Past, Present, and Future
Hadoop Operations - Past, Present, and FutureHadoop Operations - Past, Present, and Future
Hadoop Operations - Past, Present, and Future
 
What's new in Ambari
What's new in AmbariWhat's new in Ambari
What's new in Ambari
 
Vnv kumar performance testing
Vnv kumar performance testingVnv kumar performance testing
Vnv kumar performance testing
 
Advanced technologies and techniques for debugging HPC applications
Advanced technologies and techniques for debugging HPC applicationsAdvanced technologies and techniques for debugging HPC applications
Advanced technologies and techniques for debugging HPC applications
 
Add Apache Web Server to your Unified Monitoring Toolkit
Add Apache Web Server to your Unified Monitoring ToolkitAdd Apache Web Server to your Unified Monitoring Toolkit
Add Apache Web Server to your Unified Monitoring Toolkit
 
Performance testing - Accenture
Performance testing - AccenturePerformance testing - Accenture
Performance testing - Accenture
 
Introducing RTView Enterprise Monitor 1.5
Introducing RTView Enterprise Monitor 1.5 Introducing RTView Enterprise Monitor 1.5
Introducing RTView Enterprise Monitor 1.5
 
The Four Hats of Load and Performance Testing with special guest Mentora
The Four Hats of Load and Performance Testing with special guest MentoraThe Four Hats of Load and Performance Testing with special guest Mentora
The Four Hats of Load and Performance Testing with special guest Mentora
 
The Four Hats of Load and Performance Testing with special guest Mentora
The Four Hats of Load and Performance Testing with special guest MentoraThe Four Hats of Load and Performance Testing with special guest Mentora
The Four Hats of Load and Performance Testing with special guest Mentora
 
Data in the Cloud Crash Course
Data in the Cloud Crash CourseData in the Cloud Crash Course
Data in the Cloud Crash Course
 
HDF 3.1 : An Introduction to New Features
HDF 3.1 : An Introduction to New FeaturesHDF 3.1 : An Introduction to New Features
HDF 3.1 : An Introduction to New Features
 

More from DataWorks Summit

HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
DataWorks Summit
 

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Recently uploaded (20)

Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 

SAM—streaming analytics made easy

  • 1. 1 © Hortonworks Inc. 2011–2018. All rights reserved. SAM - Streaming analytics made easy Arun Iyer, Hortonworks aiyer@hortonworks.com DataWorks Summit, San Jose 2018
  • 2. 2 © Hortonworks Inc. 2011–2018. All rights reserved. Agenda • Overview of Streaming Analytics Manager (SAM) • SAM Architecture and SDK • Look at some of the new features • Demo • Q&A
  • 3. 3 © Hortonworks Inc. 2011–2018. All rights reserved. Overview
  • 4. 4 © Hortonworks Inc. 2011–2018. All rights reserved. • A tool that helps users build and deploy complex streaming analytics apps without writing a lot of code using GUI • Open Source ASF Licensed • https://github.com/hortonworks/streamline What is it ? Streaming Analytics Manager (SAM) Key design principles • Build stream analytics apps w/o specialized skillsets. • Support multiple underlining streaming engine (Storm, Spark Streaming, Flink) • Extensibility – Provide SDK to plug in custom sources/sinks/processors/UDFs • Schema is a first class citizen
  • 5. 5 © Hortonworks Inc. 2011–2018. All rights reserved. Schema management • A well defined schema is required for Streaming app developers to define their business logic (like filtering, aggregations, transformations etc.) on the incoming data. • Typically the schema and (de)serialization logic is hard coded into the streaming app • Any changes to the schema breaks the system • There is very little re-use of the schema across different components
  • 6. Schema Registry What is it? • A shared repository of schemas that allows applications to flexibly interact with each other • Avoids attaching a schema to every piece of data • Defines relationships between schema versions and compatibility policies • Consumers and producers can evolve at different rates • Open source, ASF licensed • https://github.com/hortonworks/registry SAM Integration
  • 7. Streaming Analytics Manager (SAM) – Components and user personas • App Developer • Business Analyst • Operations
  • 8. SAM is all about doing real-time analytics on the stream • Real-Time Prescriptive Analytics: What should we do right now? • Real-Time Predictive Analytics: What could happen now/soon? • Real-Time Descriptive Analytics: What is happening right now?
  • 9. Real-Time Prescriptive Analytics • Question: What should we do right now? • Context: It is rainy, the driver has been on the road for 12 hours, and he has 30 high-speed alerts over a 3-minute window in the last 2 hours. • Answer: Dispatch a radio call to the driver to slow down
  • 10. Real-Time Predictive Analytics • Question: There are no violation events, but what might happen that I need to be worried about? • My data science team has a model that can predict that based on: • Weather • Roads • Driver HR info such as driver certification status and wagePlan • Driver timesheet info such as hours and miles logged over the last week
  • 11. Real-Time Predictive Analytics • (1) Model Registry: Export the Spark MLlib model and import it into HDF's Model Registry • (2) Enrich with Features: Use SAM's enrich/custom processors to enrich the event with the features required for the model • (3) Transform/Normalize: Use SAM's projection/custom processors to transform/normalize the streaming event and the features required for the model • (4) Score Model: Use SAM's PMML processor to score the model for each stream event with its required features • (5) Alert / Notify / Action: Use SAM's rule and notification processors to alert, notify and take action using the results of the model
  • 12. Real-Time Descriptive Analytics for Business Analysts • A tool to create time-series and real-time analytics dashboards, charts and graphs • 30+ visualization charts out of the box, with customization capability • Druid is the analytics engine that powers the Stream Insight module
  • 13. Architecture
  • 14. SAM Architecture (diagram; component labels only) • SAM UI • REST API • Web server (Jetty) • Storage Manager • DB • Topology actions service • Topology DAG Builder • Topology Lifecycle Manager • Runners (translate SAM DAG to streaming engine topology): Storm, Flink, Spark • Deploy DAG • Streaming computation engines (Storm) • Environ Service • Service Pools • Ambari (cluster manager) • Schema Registry • SR Client
  • 15. Extensibility with SAM SDK • Custom Processor: allows users to write their own business logic
  • 16. Extensibility with SAM SDK • Multi-lang support
  • 17. Extensibility with SAM SDK • UDAFs: compute aggregates within a window • Built-in functions: STDDEV, STDDEVP, VARIANCE, VARIANCEP, MEAN, MIN, MAX, SUM, COUNT, UPPER, LOWER, INITCAP, SUBSTRING, CHAR_LENGTH, CONCAT
  • 18. Extensibility with SAM SDK • UDFs: perform simple transformations • Built-in functions: STDDEV, STDDEVP, VARIANCE, VARIANCEP, MEAN, MIN, MAX, SUM, COUNT, UPPER, LOWER, INITCAP, SUBSTRING, CHAR_LENGTH, CONCAT
  • 19. Extensibility with SAM SDK • Notifier: sends notifications such as email or SMS, or more complex ones that invoke external APIs • Built-in notifiers: Email (more in future…)
  • 20. New features in v0.6.0 (and HDF 3.1)
  • 21. Streaming Analytics Manager – v0.6.0 • Test Mode • Operations Module: event sampling, log search, monitoring improvements • Other improvements: custom processor enhancements, new UDFs, Kafka 1.0 support, Oracle 11/12 support for SAM metadata storage
  • 22. SAM Test mode: allows developers to test a SAM app locally without deploying to a cluster • Create a named test case • Mock out the sources of the app and configure test data for each test source • Validate the test data using the configured schema in the Schema Registry for each source • Execute the test case and visually see what the data looks like at each component/processor • Download the output of the test
  • 23. SAM Test mode: test case results
  • 24. SAM Test mode – Using the REST APIs to integrate with CI pipeline
  • 25. SAM Operations module: a visual approach to common tasks performed by operations and developers after deploying an app, to aid troubleshooting • Monitoring the application, troubleshooting and identifying performance issues • Troubleshooting an application through Log Search • Troubleshooting an application through Event Sampling
  • 26. SAM Operations module
  • 27. SAM Operations module – Log search
  • 28. SAM Operations module – Event sampling
  • 29. Demo
  • 30. To conclude: how SAM adds value • Develop streaming analytics apps without writing a lot of complex code • Be agnostic of the underlying streaming engine • Simplify the environment and complex configuration management • Test, tune and bring apps to production faster • Monitor, debug and troubleshoot streaming analytics applications quickly
  • 31. Try it out! • It's open source under the Apache License • https://github.com/hortonworks/streamline • Latest release: 0.6.0 (log search, event sampling, test mode and more), in HDF 3.1 / 3.2 • https://groups.google.com/forum/#!forum/streamline-users • Contributions are welcome!
  • 32. Thank you
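
Slide 6 mentions compatibility policies between schema versions. The following is a minimal sketch of what a backward-compatibility check could look like, not the Schema Registry's actual API. The dict-based schema format, the function name `is_backward_compatible`, and the field names (`driverId`, `speed`, `route`) are all hypothetical. It follows the Avro-style rule that a reader on the new schema can read old data if every added field has a default and shared fields keep their types.

```python
from typing import Any, Dict

# Hypothetical schema format: field name -> {"type": ..., optional "default": ...}
Schema = Dict[str, Dict[str, Any]]


def is_backward_compatible(old: Schema, new: Schema) -> bool:
    """Return True if a reader using `new` can read data written with `old`."""
    for name, spec in new.items():
        if name in old:
            # A field present in both versions must keep its type.
            if old[name]["type"] != spec["type"]:
                return False
        elif "default" not in spec:
            # A field added in `new` is absent from old data,
            # so the reader needs a default to fill it in.
            return False
    # Fields dropped in `new` are simply ignored by the reader.
    return True


# Illustrative schema versions (hypothetical field names).
v1: Schema = {"driverId": {"type": "int"}, "speed": {"type": "double"}}
v2: Schema = {**v1, "route": {"type": "string", "default": "unknown"}}
v3: Schema = {**v1, "route": {"type": "string"}}

print(is_backward_compatible(v1, v2))
print(is_backward_compatible(v1, v3))
```

Under this rule, a consumer can move to `v2` while producers are still writing `v1`, which is one way consumers and producers can evolve at different rates.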
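
Slide 17 describes UDAFs as computing aggregates within a window. The sketch below shows that idea in plain Python and does not use the SAM SDK. The helper names `tumbling_windows` and `aggregate` and the sample speed values are hypothetical. It assumes the common convention that STDDEV is the sample standard deviation and STDDEVP the population standard deviation.

```python
from statistics import mean, pstdev, stdev
from typing import Dict, Iterable, Iterator, List


def tumbling_windows(values: Iterable[float], size: int) -> Iterator[List[float]]:
    """Group a stream into consecutive, non-overlapping windows of `size` events."""
    window: List[float] = []
    for v in values:
        window.append(v)
        if len(window) == size:
            yield window
            window = []
    # A trailing partial window is dropped in this sketch.


def aggregate(window: List[float]) -> Dict[str, float]:
    """Compute a few of the aggregates named on the slide for one window."""
    return {
        "COUNT": len(window),
        "SUM": sum(window),
        "MIN": min(window),
        "MAX": max(window),
        "MEAN": mean(window),
        "STDDEV": stdev(window),    # sample: divides by n - 1 (assumed convention)
        "STDDEVP": pstdev(window),  # population: divides by n (assumed convention)
    }


# Hypothetical speed readings for one driver.
speeds = [60.0, 62.0, 64.0, 66.0, 90.0, 90.0, 90.0, 90.0]
results = [aggregate(w) for w in tumbling_windows(speeds, 4)]
for r in results:
    print(r)
```

This uses a count-based tumbling window to stay short; a time-based or sliding window would change only how `tumbling_windows` groups events, not the aggregate step.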