1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Streaming Analytics Manager - SAM
Priyank Shah
History of Streaming at Hortonworks
 Introduced Storm as the stream processing engine in HDP-2.1 (late 2013)
 First to ship Apache Kafka as an enterprise messaging queue (early 2014)
 Added several improvements and features to Apache Storm; Yahoo! runs 2,400 Storm nodes
 Added security and other critical features and improvements to Apache Kafka
 Many lessons learned from shipping Storm & Kafka over the past 3 years
 The vision and implementation of Schema Registry & Streaming Analytics Manager are based on those lessons
Why Schema Registry
 Streaming applications are usually fronted by a queue such as Kafka, Kinesis, or EventHub
 Data in messaging queues is just byte payloads, with no schema associated with it
 Streaming application developers usually inspect the data flowing through the queue and define their processing logic around it
 Any schema change to this data means developers have to update their code to process the new format
 Support both programmatic schema creation and managed schemas
Why Schema Registry
[Figure: a producer writes byte payloads into Kafka, Kinesis, or EventHub; consumers such as Storm, Spark Streaming, and others read those byte payloads with no schema attached.]
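To make the problem concrete, the following Java sketch tags each byte payload with the id of the schema that produced it, so a consumer can look the schema up instead of inferring it from the bytes. `InMemorySchemaRegistry`, `TaggedPayload`, and the 4-byte id prefix are hypothetical illustrations, not the Schema Registry's actual API or wire format.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory registry for illustration only; not the Schema Registry API.
class InMemorySchemaRegistry {
    private final List<String> schemas = new ArrayList<>();

    // Registers a schema text and returns its id; identical text gets the same id back.
    synchronized int register(String schemaText) {
        int existing = schemas.indexOf(schemaText);
        if (existing >= 0) {
            return existing;
        }
        schemas.add(schemaText);
        return schemas.size() - 1;
    }

    synchronized String lookup(int schemaId) {
        return schemas.get(schemaId);
    }
}

// Hypothetical wire format: a 4-byte big-endian schema id followed by the raw payload bytes.
final class TaggedPayload {
    private TaggedPayload() {}

    static byte[] encode(int schemaId, byte[] body) {
        return ByteBuffer.allocate(Integer.BYTES + body.length).putInt(schemaId).put(body).array();
    }

    static int schemaIdOf(byte[] message) {
        return ByteBuffer.wrap(message).getInt();
    }

    static byte[] bodyOf(byte[] message) {
        byte[] body = new byte[message.length - Integer.BYTES];
        ByteBuffer.wrap(message, Integer.BYTES, body.length).get(body);
        return body;
    }
}
```

With the id travelling alongside the bytes, a producer can move to a new schema version without silently breaking consumers: they resolve the id and handle each version explicitly.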
Streaming Analytics Manager
 What is it?
• A platform to design, develop, deploy, and manage streaming analytics applications in minutes using a drag-and-drop visual paradigm
• Enables event correlation, context enrichment, complex pattern matching, analytical aggregations, and alerts/notifications when insights are discovered
• Agnostic to the underlying streaming engine and designed to support multiple engines (e.g., Storm, Spark Streaming, Flink)
• Extensibility is a first-class citizen (add sources, processors, and sinks as needed)
 Guiding Principle
– Build complex streaming applications easily with minimum code
Complexities in building streaming applications
 New streaming engines and APIs keep emerging
 Implementing windowing, joins, and state management is hard
 Interacting with external services such as HBase, Hive, and HDFS
 Deploying with all the necessary configuration files
 Operating the streaming application, including monitoring and metrics
 Debugging streaming applications
 Securing a streaming application cluster with the right configurations is a pain
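As an illustration of why windowing is hard to get right, here is a minimal Java sketch of a tumbling-window counter. It assumes each event carries an epoch-millisecond timestamp, and it deliberately ignores late data, watermarks, and fault-tolerant state, which are exactly the parts a streaming engine (or a tool like SAM) has to handle. `TumblingWindowCounter` is a hypothetical name.

```java
import java.util.Map;
import java.util.TreeMap;

// Minimal tumbling-window counter: no late-data handling, watermarks, or persistent state.
class TumblingWindowCounter {
    private final long windowMillis;
    private final TreeMap<Long, Long> counts = new TreeMap<>();

    TumblingWindowCounter(long windowMillis) {
        if (windowMillis <= 0) {
            throw new IllegalArgumentException("windowMillis must be positive");
        }
        this.windowMillis = windowMillis;
    }

    // Assigns the event to the window [start, start + windowMillis) that contains its timestamp.
    void add(long timestampMillis) {
        long start = Math.floorDiv(timestampMillis, windowMillis) * windowMillis;
        counts.merge(start, 1L, Long::sum);
    }

    // Snapshot of window start -> event count, ordered by window start.
    Map<Long, Long> counts() {
        return new TreeMap<>(counts);
    }
}
```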
Key challenges that SAM is trying to solve
 Building streaming applications requires specialized skill sets that most enterprises don’t have today
 Streaming applications require a considerable amount of programming, testing, and tuning before deployment to production, which takes significant time
 Key streaming primitives, such as joining/splitting streams, aggregating over a time window, and pattern matching, are difficult to implement
 People prefer not to write code to build complex streaming applications
 No open source project today truly solves all of the above challenges
 People care less about which streaming engine powers their applications, as long as the challenges above are addressed and they are not forced into vendor lock-in
Streaming Analytics Manager Components and User Personas
[Figure: the App Developer, Business Analyst, and Operations personas working on top of a distributed streaming computation engine (the different streaming engines that power the higher-level services used to build streaming applications).]
Streaming Analytics powered by Druid and Superset
 What is Stream Insight?
 Gives business analysts a tool for descriptive analytics on streaming data and insights, using the sophisticated UI provided by Superset
 Tooling to create time-series and real-time analytics dashboards, charts, and graphs, with rich, customizable data visualizations
Demo
Architecture
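SAM Architecture
[Figure: SAM architecture. Components: SAM UI; REST API served by a Jetty web server; Storage Manager backed by a DB; Topology Actions service; Topology DAG Builder; Topology Lifecycle Manager; Runners that translate the SAM DAG into a streaming engine topology (Storm, with Flink and Spark shown alongside), where the Storm runner deploys the DAG via Flux; Environment Service and Service Pools, with Ambari as the cluster manager; Schema Registry accessed through the SR Client; streaming computation engines (Storm).]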
Topology lifecycle
[Figure: topology lifecycle diagram. States: Initial, DAG Constructed, Extra Artifacts Set Up, Deployed, Suspended, Deployment Failed. Actions: Deploy, Suspend, Resume, Kill, Re-deploy.]
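The lifecycle can be read as a small state machine. The Java sketch below encodes one plausible reading of the diagram; the transition rules (for example, that Kill returns a topology to Initial and that Re-deploy is only allowed after a failed deployment) are assumptions, and `TopologyLifecycle` is a hypothetical class, not SAM's Topology Lifecycle Manager.

```java
import java.util.ArrayList;
import java.util.List;

// States taken from the lifecycle diagram.
enum TopologyState { INITIAL, DAG_CONSTRUCTED, ARTIFACTS_SET_UP, DEPLOYED, SUSPENDED, DEPLOYMENT_FAILED }

// Hypothetical lifecycle tracker; the allowed transitions are an assumed reading of the diagram.
class TopologyLifecycle {
    private TopologyState state = TopologyState.INITIAL;
    private final List<TopologyState> history = new ArrayList<>(List.of(TopologyState.INITIAL));

    TopologyState state() { return state; }

    List<TopologyState> history() { return List.copyOf(history); }

    // Deploy builds the DAG, sets up extra artifacts, then either deploys or fails.
    // `succeeds` stands in for the streaming engine's response.
    void deploy(boolean succeeds) {
        require(state == TopologyState.INITIAL, "deploy");
        runDeployment(succeeds);
    }

    // In this sketch, Re-deploy is only allowed after a failed deployment.
    void redeploy(boolean succeeds) {
        require(state == TopologyState.DEPLOYMENT_FAILED, "re-deploy");
        runDeployment(succeeds);
    }

    void suspend() {
        require(state == TopologyState.DEPLOYED, "suspend");
        moveTo(TopologyState.SUSPENDED);
    }

    void resume() {
        require(state == TopologyState.SUSPENDED, "resume");
        moveTo(TopologyState.DEPLOYED);
    }

    // Kill is accepted from either running state and returns the topology to Initial.
    void kill() {
        require(state == TopologyState.DEPLOYED || state == TopologyState.SUSPENDED, "kill");
        moveTo(TopologyState.INITIAL);
    }

    private void runDeployment(boolean succeeds) {
        moveTo(TopologyState.DAG_CONSTRUCTED);
        moveTo(TopologyState.ARTIFACTS_SET_UP);
        moveTo(succeeds ? TopologyState.DEPLOYED : TopologyState.DEPLOYMENT_FAILED);
    }

    private void moveTo(TopologyState next) {
        state = next;
        history.add(next);
    }

    private void require(boolean allowed, String action) {
        if (!allowed) {
            throw new IllegalStateException("cannot " + action + " from " + state);
        }
    }
}
```

Rejecting illegal actions with an exception keeps a UI or REST layer from, say, resuming a topology that was never suspended.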
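Topology DAG
[Figure: a topology DAG with a Source, Processor 1, Processor 2, Sink 1, and Sink 2 connected by edges. Each edge carries one or more streams (Stream 1, Stream 2), and each stream declares its fields, e.g. Fields: [“a”: Int, “b”: String, …].]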
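A minimal Java model of this DAG follows. It assumes a Source/Processor/Sink component kind, named output streams with typed fields, and edges that carry one or more of the upstream component's streams. All type names (`StreamDef`, `Component`, `Edge`, `TopologyDag`) are hypothetical and only meant to show the shape of the data; records require Java 16+.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

enum FieldType { INT, STRING }

enum ComponentKind { SOURCE, PROCESSOR, SINK }

// A named stream and its schema (field name -> type), e.g. Stream 1 with fields a: Int, b: String.
record StreamDef(String id, Map<String, FieldType> fields) {}

// A DAG vertex: a source, processor, or sink plus the streams it emits.
record Component(String id, ComponentKind kind, List<StreamDef> outputStreams) {}

// A DAG edge: carries one or more of the upstream component's streams to a downstream component.
record Edge(String from, String to, List<String> streamIds) {}

class TopologyDag {
    private final Map<String, Component> components = new LinkedHashMap<>();
    private final List<Edge> edges = new ArrayList<>();

    void add(Component component) {
        components.put(component.id(), component);
    }

    // Validates direction and that every carried stream is declared by the upstream component.
    void connect(String from, String to, String... streamIds) {
        Component src = components.get(from);
        Component dst = components.get(to);
        if (src == null || dst == null) {
            throw new IllegalArgumentException("unknown component: " + from + " or " + to);
        }
        if (src.kind() == ComponentKind.SINK || dst.kind() == ComponentKind.SOURCE) {
            throw new IllegalArgumentException("edges must flow from sources toward sinks");
        }
        for (String streamId : streamIds) {
            boolean declared = src.outputStreams().stream().anyMatch(s -> s.id().equals(streamId));
            if (!declared) {
                throw new IllegalArgumentException(from + " does not emit " + streamId);
            }
        }
        edges.add(new Edge(from, to, List.of(streamIds)));
    }

    List<Edge> edgesFrom(String componentId) {
        return edges.stream().filter(e -> e.from().equals(componentId)).toList();
    }
}
```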
Runners implement TopologyActions
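Conceptually, a runner exposes the topology actions (deploy, suspend, resume, kill, status) for one engine. The sketch below uses a hypothetical `RunnerActions` interface with assumed method names and a toy in-memory implementation; SAM's real `TopologyActions` interface is not reproduced here, and its signatures may differ.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical engine-facing contract; method names and signatures are assumptions.
interface RunnerActions {
    void deploy(String topologyId, String engineArtifact);
    void suspend(String topologyId);
    void resume(String topologyId);
    void kill(String topologyId);
    String status(String topologyId);
}

// Toy runner that only tracks status in memory; a real runner would call the engine's client APIs.
class InMemoryRunner implements RunnerActions {
    private final Map<String, String> statuses = new HashMap<>();

    @Override
    public void deploy(String topologyId, String engineArtifact) {
        if (engineArtifact == null || engineArtifact.isBlank()) {
            throw new IllegalArgumentException("nothing to deploy for " + topologyId);
        }
        statuses.put(topologyId, "ACTIVE");
    }

    @Override
    public void suspend(String topologyId) { transition(topologyId, "ACTIVE", "INACTIVE"); }

    @Override
    public void resume(String topologyId) { transition(topologyId, "INACTIVE", "ACTIVE"); }

    @Override
    public void kill(String topologyId) {
        if (statuses.remove(topologyId) == null) {
            throw new IllegalStateException(topologyId + " is not deployed");
        }
    }

    @Override
    public String status(String topologyId) {
        return statuses.getOrDefault(topologyId, "NOT_DEPLOYED");
    }

    // Moves a topology from the expected status to the next one, or fails loudly.
    private void transition(String topologyId, String expected, String next) {
        String current = status(topologyId);
        if (!current.equals(expected)) {
            throw new IllegalStateException(topologyId + " is " + current);
        }
        statuses.put(topologyId, next);
    }
}
```

Keeping the contract engine-neutral is what would let the same UI drive a Storm runner today and other engines later.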
Runners implement TopologyDAGVisitor
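A visitor lets each runner walk the same DAG and produce engine-specific output without the DAG classes knowing about any engine. This Java sketch shows the double-dispatch pattern with hypothetical `DagVisitor`, `SourceNode`, `ProcessorNode`, and `SinkNode` types; it is not SAM's `TopologyDAGVisitor` API, and the spout/bolt mapping is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical visitor over DAG components; one visit overload per component kind.
interface DagVisitor {
    void visit(SourceNode source);
    void visit(ProcessorNode processor);
    void visit(SinkNode sink);
}

interface DagNode {
    String id();

    // Double dispatch: each node calls the visit overload that matches its own type.
    void accept(DagVisitor visitor);
}

record SourceNode(String id) implements DagNode {
    public void accept(DagVisitor visitor) { visitor.visit(this); }
}

record ProcessorNode(String id) implements DagNode {
    public void accept(DagVisitor visitor) { visitor.visit(this); }
}

record SinkNode(String id) implements DagNode {
    public void accept(DagVisitor visitor) { visitor.visit(this); }
}

// One visitor per engine: this one records the Storm primitive each node would become (assumed mapping).
class StormNamingVisitor implements DagVisitor {
    final List<String> plan = new ArrayList<>();

    public void visit(SourceNode source) { plan.add("spout:" + source.id()); }
    public void visit(ProcessorNode processor) { plan.add("bolt:" + processor.id()); }
    public void visit(SinkNode sink) { plan.add("bolt:" + sink.id()); }
}
```

Adding a Flink or Spark runner would then mean writing another `DagVisitor`, leaving the node types untouched.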
Storm runner example
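The architecture slide shows the Storm runner deploying the DAG through Flux, Storm's YAML-based topology definition format. As a rough sketch, assuming a simple linear chain of components with shuffle grouping between them, a runner could emit a Flux-style definition like the one below. `FluxYamlWriter` and the `com.example` class names are hypothetical placeholders.

```java
import java.util.List;

// Hypothetical sketch: emit a Flux-style YAML definition for a linear chain of components.
// Class names are placeholders; a real runner would map each SAM component to a concrete spout/bolt.
class FluxYamlWriter {
    record Node(String id, String className, int parallelism, boolean isSpout) {}

    static String write(String topologyName, List<Node> chain) {
        StringBuilder spouts = new StringBuilder("spouts:\n");
        StringBuilder bolts = new StringBuilder("bolts:\n");
        StringBuilder streams = new StringBuilder("streams:\n");
        for (int i = 0; i < chain.size(); i++) {
            Node n = chain.get(i);
            StringBuilder target = n.isSpout() ? spouts : bolts;
            target.append("  - id: \"").append(n.id()).append("\"\n")
                  .append("    className: \"").append(n.className()).append("\"\n")
                  .append("    parallelism: ").append(n.parallelism()).append('\n');
            if (i > 0) {
                // Connect each component to its predecessor using shuffle grouping.
                streams.append("  - from: \"").append(chain.get(i - 1).id()).append("\"\n")
                       .append("    to: \"").append(n.id()).append("\"\n")
                       .append("    grouping:\n")
                       .append("      type: SHUFFLE\n");
            }
        }
        return "name: \"" + topologyName + "\"\n" + spouts + bolts + streams;
    }
}
```

A real translation would also need to handle fan-out, multiple streams per edge, and engine configuration; the linear chain only keeps the example short.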
SDK
Extensibility with SAM SDK
 Custom Processor - allows users to write their own business logic
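A custom processor boils down to a function from one input event to zero or more output events. The Java sketch below assumes a hypothetical `CustomProcessor` interface with events represented as `Map<String, Object>`; SAM's actual SDK types and method names may differ. `SpeedingFilter` and its `speed` field are illustrative only.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical custom-processor contract; SAM's real SDK interface may differ.
interface CustomProcessor {
    // Receives one event (field name -> value) and returns zero or more output events.
    List<Map<String, Object>> process(Map<String, Object> event);
}

// Example business logic: drop events at or below a speed limit and tag the rest as violations.
class SpeedingFilter implements CustomProcessor {
    private final double limit;

    SpeedingFilter(double limit) {
        this.limit = limit;
    }

    @Override
    public List<Map<String, Object>> process(Map<String, Object> event) {
        Object speed = event.get("speed");
        if (!(speed instanceof Number n) || n.doubleValue() <= limit) {
            return List.of(); // filtered out: missing, non-numeric, or within the limit
        }
        Map<String, Object> out = new HashMap<>(event);
        out.put("violation", "SPEEDING");
        return List.of(out);
    }
}
```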
Extensibility with SAM SDK
 Multi-lang support (upcoming)
Extensibility with SAM SDK
 UADFs - compute aggregates within a window
Built-in aggregate functions
 STDDEV
 STDDEVP
 VARIANCE
 VARIANCEP
 MEAN
 MIN
 MAX
 SUM
 COUNT
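Window aggregates are commonly written as an init/add/result triple so the engine can fold events into an accumulator as they arrive. The Java sketch below implements a population variance (in the spirit of the built-in VARIANCEP) against a hypothetical `AggregateFunction` interface; the interface shape is an assumption, not SAM's published UDAF signature.

```java
// Hypothetical aggregate-function contract: A = accumulator, V = input value, R = result.
interface AggregateFunction<A, V, R> {
    A init();
    A add(A aggregate, V value);
    R result(A aggregate);
}

// Running count, sum, and sum of squares: enough state for a population variance.
record Moments(long count, double sum, double sumSq) {}

class PopulationVariance implements AggregateFunction<Moments, Double, Double> {
    public Moments init() {
        return new Moments(0, 0.0, 0.0);
    }

    public Moments add(Moments m, Double value) {
        return new Moments(m.count() + 1, m.sum() + value, m.sumSq() + value * value);
    }

    // Var = E[x^2] - (E[x])^2; adequate for a sketch, though Welford's method is more numerically stable.
    public Double result(Moments m) {
        if (m.count() == 0) {
            return Double.NaN;
        }
        double mean = m.sum() / m.count();
        return m.sumSq() / m.count() - mean * mean;
    }
}
```

Because the accumulator is a small immutable record, the engine only has to keep one `Moments` per window rather than every event in it.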
Extensibility with SAM SDK
 UDFs - perform simple transformations
Built-in functions
 UPPER
 LOWER
 INITCAP
 SUBSTRING
 CHAR_LENGTH
 CONCAT
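A UDF maps a single value to a single value. This Java sketch implements an INITCAP-style transformation behind a hypothetical `ScalarFunction` interface; the interface is an assumption, and the word-boundary rule (any character that is not a letter or digit starts a new word) is one reasonable choice rather than SAM's documented behavior.

```java
// Hypothetical single-argument UDF contract; SAM's real UDF SDK may differ.
interface ScalarFunction<I, O> {
    O evaluate(I input);
}

// INITCAP-style transformation: upper-case the first letter of each word, lower-case the rest.
class InitCap implements ScalarFunction<String, String> {
    @Override
    public String evaluate(String input) {
        if (input == null) {
            return null; // propagate nulls, as SQL-style functions usually do
        }
        StringBuilder out = new StringBuilder(input.length());
        boolean startOfWord = true;
        for (char c : input.toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                out.append(startOfWord ? Character.toUpperCase(c) : Character.toLowerCase(c));
                startOfWord = false;
            } else {
                out.append(c);
                startOfWord = true;
            }
        }
        return out.toString();
    }
}
```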
Extensibility with SAM SDK
 Notifiers - send notifications such as email or SMS, or more complex ones that invoke external APIs
Built in notifiers
 Email
 More in future…
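A notifier takes an event and delivers a message over some transport. The Java sketch below assumes a hypothetical `Notifier` interface and a `TemplateNotifier` that fills `${field}` placeholders from the event; the `sent` list stands in for a real email or SMS transport, which is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical notifier contract; SAM's real SDK interface may differ.
interface Notifier {
    void send(String recipient, Map<String, Object> event);
}

// Renders ${field} placeholders from the event and hands the message to a (stand-in) transport.
class TemplateNotifier implements Notifier {
    private final String template;
    final List<String> sent = new ArrayList<>(); // stands in for an email/SMS/HTTP transport

    TemplateNotifier(String template) {
        this.template = template;
    }

    @Override
    public void send(String recipient, Map<String, Object> event) {
        String body = template;
        for (Map.Entry<String, Object> e : event.entrySet()) {
            // String.replace performs a literal (non-regex) substitution.
            body = body.replace("${" + e.getKey() + "}", String.valueOf(e.getValue()));
        }
        sent.add(recipient + ": " + body);
    }
}
```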
The current release – 0.5
 Manual service pool registration, without requiring Ambari
 Test mode to easily test out the streaming app
 Kerberos and delegation-token-based authentication
 Authorization support with RBAC + permissions
 New sources, processors and sinks
Upcoming…
 Extending token-based authentication to other components
 Support for state management in SAM
 Support for other streaming engines – Flink, Spark Streaming
Try it out!
 It's open source under the Apache License
 https://github.com/hortonworks/streamline
 Apache incubation soon
 SAM 0.5 is out!
 https://groups.google.com/forum/#!forum/streamline-users
 Contributions are welcome!

Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 


SAM - Streaming Analytics Made Easy

  • 1. 1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Streaming Analytics Manager - SAM Priyank Shah
  • 2. History of Streaming at Hortonworks
      - Introduced Apache Storm as the stream processing engine in HDP 2.1 (late 2013)
      - First to ship Apache Kafka as an enterprise messaging queue (early 2014)
      - Added several improvements and features to Apache Storm; Yahoo! runs 2,400 nodes of Storm
      - Added security and other critical features and improvements to Apache Kafka
      - Learned a lot from shipping Storm and Kafka over the past 3 years
      - Designed and built Schema Registry and Streaming Analytics Manager based on those learnings
  • 3. Why Schema Registry
      - Streaming applications are usually fronted by a queue such as Kafka, Kinesis or EventHub
      - Data in messaging queues is a byte payload with no schema attached to it
      - Streaming application developers usually inspect the data flowing through and define their processing around it
      - Any schema change in this data means developers have to update their code to process the new format
      - A registry should support both programmatic schema creation and managed schemas
  • 4. Why Schema Registry
      [Diagram: a Producer writes byte payloads to a queue (Kafka, Kinesis, EventHub); a Consumer (Storm, Spark Streaming, Others…) reads the same byte payloads.]
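Slides 3 and 4 describe the problem: the queue only carries bytes, so producers and consumers must agree on a format out of band. A common remedy, sketched below under our own assumptions, is to register each schema version centrally and prefix every message with the version it was written with, so a consumer can look up the matching schema before decoding. `InMemorySchemaRegistry` and `Envelope` are hypothetical names; this is not the Hortonworks Schema Registry client API, which also persists schemas and checks compatibility.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory registry: schema name -> ordered list of schema texts.
// A real registry would persist versions and check compatibility on register.
class InMemorySchemaRegistry {
    private final Map<String, List<String>> versions = new HashMap<>();

    // Registers a schema text and returns its 1-based version number;
    // registering identical text again returns the existing version.
    synchronized int register(String name, String schemaText) {
        List<String> list = versions.computeIfAbsent(name, k -> new ArrayList<>());
        int existing = list.indexOf(schemaText);
        if (existing >= 0) {
            return existing + 1;
        }
        list.add(schemaText);
        return list.size();
    }

    synchronized String get(String name, int version) {
        List<String> list = versions.get(name);
        if (list == null || version < 1 || version > list.size()) {
            throw new IllegalArgumentException("unknown schema " + name + " v" + version);
        }
        return list.get(version - 1);
    }
}

// Hypothetical wire format: a 4-byte big-endian schema version, then the payload.
final class Envelope {
    private Envelope() {}

    static byte[] wrap(int version, byte[] payload) {
        return ByteBuffer.allocate(Integer.BYTES + payload.length).putInt(version).put(payload).array();
    }

    static int version(byte[] message) {
        return ByteBuffer.wrap(message).getInt();
    }

    static byte[] payload(byte[] message) {
        byte[] out = new byte[message.length - Integer.BYTES];
        System.arraycopy(message, Integer.BYTES, out, 0, out.length);
        return out;
    }
}
```

A consumer reads `Envelope.version(message)`, fetches that schema with `get`, and only then decodes `Envelope.payload(message)`. Registering a new version does not break consumers of older messages, because each message names its own version.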
  • 5. Streaming Analytics Manager
      - What is it?
          • A platform to design, develop, deploy and manage streaming analytics applications in minutes, using a drag-and-drop visual paradigm
          • Supports event correlation, context enrichment, complex pattern matching, analytical aggregations, and alerts/notifications when insights are discovered
          • Agnostic to the underlying streaming engine; can support multiple streaming engines (e.g., Storm, Spark Streaming, Flink)
          • Extensibility is a first-class citizen (add sources, processors and sinks as needed)
      - Guiding principle: build complex streaming applications easily, with minimal code
  • 6. Complexities in building streaming applications
      - New streaming engines and APIs
      - Implementing windowing, joins and state management is hard
      - Interacting with external services such as HBase, Hive and HDFS
      - Deploying with all the necessary configuration files
      - Operating the streaming application, including monitoring and metrics
      - Debugging streaming applications
      - Securing a streaming application cluster with the right configuration is a pain
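To make the windowing point concrete, here is a minimal tumbling-window sum: fixed, non-overlapping windows keyed by event time in milliseconds. It is a hypothetical sketch, not SAM or Storm code, and it deliberately leaves out late and out-of-order events, watermarks and fault-tolerant state, which are exactly the parts that make production windowing hard.

```java
import java.util.Map;
import java.util.TreeMap;

// Minimal tumbling-window aggregator: each event lands in the window
// [k * size, (k + 1) * size) that contains its timestamp.
// Ignores late data, watermarks and state recovery on failure.
class TumblingWindowSum {
    private final long sizeMillis;
    // window start (ms) -> running sum; TreeMap keeps windows in time order
    private final TreeMap<Long, Double> sums = new TreeMap<>();

    TumblingWindowSum(long sizeMillis) {
        if (sizeMillis <= 0) throw new IllegalArgumentException("window size must be positive");
        this.sizeMillis = sizeMillis;
    }

    void add(long timestampMillis, double value) {
        // floorDiv keeps negative timestamps in the correct window
        long windowStart = Math.floorDiv(timestampMillis, sizeMillis) * sizeMillis;
        sums.merge(windowStart, value, Double::sum);
    }

    // Emits and removes every window that ends at or before upToMillis.
    Map<Long, Double> closeWindowsUpTo(long upToMillis) {
        Map<Long, Double> closed = new TreeMap<>();
        while (!sums.isEmpty() && sums.firstKey() + sizeMillis <= upToMillis) {
            Map.Entry<Long, Double> e = sums.pollFirstEntry();
            closed.put(e.getKey(), e.getValue());
        }
        return closed;
    }
}
```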
  • 7. Key challenges that SAM is trying to solve
      - Building streaming applications requires specialized skill sets that most enterprise organizations don't have today
      - Streaming applications require considerable programming, testing and tuning before they go to production, which takes significant time
      - Key streaming primitives, such as joining and splitting streams, aggregating over a time window, and pattern matching, are difficult to implement
      - People would rather not write code to build complex streaming applications
      - No truly open source project today solves all of the above challenges
      - People care less about the engine powering their streaming applications than about having the challenges above addressed without being forced into vendor lock-in
  • 8. Streaming Analytics Manager Components and User Personas
      [Diagram: user personas App Developer, Business Analyst and Operations; underneath, a distributed streaming computation engine (different streaming engines that power the higher-level services used to build streaming applications).]
  • 9. Streaming Analytics powered by Druid and Superset
      - What is Stream Insight?
          • A tool for business analysts to perform descriptive analytics on streaming data and insights, using the rich UI provided by Superset
          • Tooling to create time-series and real-time analytics dashboards, charts and graphs, with rich, customizable visualizations of the data
  • 10. [No text on this slide]
  • 11. Demo
  • 12. Architecture
  • 13. SAM Architecture
      [Diagram components: SAM UI; REST API served by a web server (Jetty); Storage Manager with a DB; Topology actions service; Topology DAG Builder; Topology Lifecycle Manager; runners that translate the SAM DAG to a streaming engine topology and deploy the DAG (Storm via Flux; Flink; Spark); streaming computation engines (Storm); Ambari (cluster manager); Service Pools; Environ Service; Schema Registry with an SR Client.]
  • 14. Topology lifecycle
      [Diagram: states Initial, DAG Constructed, Extra artifacts set up, Deployed, Suspended and Deployment Failed, connected by the actions Deploy, Kill, Suspend, Resume and Re-deploy.]
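The lifecycle slide lists states and actions, but the transcript does not preserve the arrows between them. The sketch below is one reading of the diagram, expressed as a small Java state machine. The class names and the exact transition rules (for example, that Kill returns a topology to INITIAL and that Re-deploy is a deploy from DEPLOYMENT_FAILED) are our assumptions, not SAM's implementation.

```java
// States named after the topology lifecycle slide.
enum TopologyState {
    INITIAL, DAG_CONSTRUCTED, EXTRA_ARTIFACTS_SET_UP, DEPLOYED, SUSPENDED, DEPLOYMENT_FAILED
}

// Hypothetical lifecycle manager; transitions are an assumed reading of the diagram.
class TopologyLifecycle {
    // One deployment step; throwing any exception marks the deployment as failed.
    @FunctionalInterface
    interface Step { void run() throws Exception; }

    private TopologyState state = TopologyState.INITIAL;

    TopologyState state() { return state; }

    // Deploy from INITIAL, or re-deploy after DEPLOYMENT_FAILED, walking the build steps in order.
    void deploy(Step buildDag, Step setUpArtifacts, Step submit) {
        require(state == TopologyState.INITIAL || state == TopologyState.DEPLOYMENT_FAILED, "deploy");
        try {
            buildDag.run();
            state = TopologyState.DAG_CONSTRUCTED;
            setUpArtifacts.run();
            state = TopologyState.EXTRA_ARTIFACTS_SET_UP;
            submit.run();
            state = TopologyState.DEPLOYED;
        } catch (Exception e) {
            state = TopologyState.DEPLOYMENT_FAILED;
        }
    }

    void suspend() {
        require(state == TopologyState.DEPLOYED, "suspend");
        state = TopologyState.SUSPENDED;
    }

    void resume() {
        require(state == TopologyState.SUSPENDED, "resume");
        state = TopologyState.DEPLOYED;
    }

    // Assumption: killing a running or suspended topology returns it to INITIAL.
    void kill() {
        require(state == TopologyState.DEPLOYED || state == TopologyState.SUSPENDED, "kill");
        state = TopologyState.INITIAL;
    }

    private void require(boolean allowed, String action) {
        if (!allowed) throw new IllegalStateException("cannot " + action + " from " + state);
    }
}
```

Encoding the allowed transitions in one place makes invalid requests, such as resuming a topology that was never deployed, fail loudly instead of leaving the engine and the UI out of sync.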
  • 15. Topology DAG
      [Diagram: components Source, Processor 1, Processor 2, Sink 1 and Sink 2 joined by edges; each edge carries a stream (Stream 1, Stream 2) with a typed field list, e.g. Fields: [ "a": Int, "b": String … ].]
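The DAG slide models a topology as sources, processors and sinks joined by edges, where each edge carries a stream with a typed field list. A minimal sketch of that data structure follows; `TopologyDag` and its methods are hypothetical, not SAM's classes. It also computes a topological order with Kahn's algorithm, which is one way a runner could decide the order in which to wire components, and it rejects cycles.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical SAM-style topology DAG: components are sources, processors or
// sinks; each edge carries one named stream whose schema maps field -> type.
class TopologyDag {
    enum Kind { SOURCE, PROCESSOR, SINK }

    static final class Stream {
        final String id;
        final Map<String, String> fields;
        Stream(String id, Map<String, String> fields) {
            this.id = id;
            this.fields = Collections.unmodifiableMap(new LinkedHashMap<>(fields));
        }
    }

    static final class Edge {
        final String from, to;
        final Stream stream;
        Edge(String from, String to, Stream stream) { this.from = from; this.to = to; this.stream = stream; }
    }

    private final Map<String, Kind> components = new LinkedHashMap<>();
    private final List<Edge> edges = new ArrayList<>();

    TopologyDag add(String name, Kind kind) {
        components.put(name, kind);
        return this;
    }

    TopologyDag connect(String from, String to, Stream stream) {
        Kind f = components.get(from), t = components.get(to);
        if (f == null || t == null) throw new IllegalArgumentException("unknown component");
        if (f == Kind.SINK || t == Kind.SOURCE)
            throw new IllegalArgumentException("sinks cannot emit and sources cannot receive");
        edges.add(new Edge(from, to, stream));
        return this;
    }

    // Kahn's algorithm: every edge points from an earlier to a later entry.
    List<String> topologicalOrder() {
        Map<String, Integer> inDegree = new LinkedHashMap<>();
        for (String c : components.keySet()) inDegree.put(c, 0);
        for (Edge e : edges) inDegree.merge(e.to, 1, Integer::sum);
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> en : inDegree.entrySet())
            if (en.getValue() == 0) ready.add(en.getKey());
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String c = ready.poll();
            order.add(c);
            for (Edge e : edges)
                if (e.from.equals(c) && inDegree.merge(e.to, -1, Integer::sum) == 0) ready.add(e.to);
        }
        if (order.size() != components.size()) throw new IllegalStateException("cycle detected");
        return order;
    }
}
```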
  • 16. Runner implements: Topology Actions
  • 17. Runner implements: TopologyDAGVisitor
  • 18. Storm runner example
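Slides 16 to 18 say a runner implements topology actions and a TopologyDAGVisitor, with Storm as the example. The sketch below illustrates the visitor idea with hypothetical node and visitor types (SAM's real interfaces differ): each node calls back the visitor method for its own type, and a Storm-flavoured visitor maps sources to spouts and processors and sinks to bolts. It only records a textual plan and makes no Storm API calls.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical visitor over DAG nodes (double dispatch).
interface DagVisitor {
    void visit(SourceNode node);
    void visit(ProcessorNode node);
    void visit(SinkNode node);
}

abstract class DagNode {
    final String name;
    DagNode(String name) { this.name = name; }
    abstract void accept(DagVisitor visitor);
}

final class SourceNode extends DagNode {
    SourceNode(String name) { super(name); }
    void accept(DagVisitor v) { v.visit(this); }
}

final class ProcessorNode extends DagNode {
    ProcessorNode(String name) { super(name); }
    void accept(DagVisitor v) { v.visit(this); }
}

final class SinkNode extends DagNode {
    SinkNode(String name) { super(name); }
    void accept(DagVisitor v) { v.visit(this); }
}

// Storm-flavoured translation: sources become spouts; processors and sinks
// become bolts. Records a plan only; no Storm classes are used.
class StormPlanVisitor implements DagVisitor {
    private final List<String> plan = new ArrayList<>();

    @Override public void visit(SourceNode n) { plan.add("spout:" + n.name); }
    @Override public void visit(ProcessorNode n) { plan.add("bolt:" + n.name); }
    @Override public void visit(SinkNode n) { plan.add("bolt:" + n.name); }

    List<String> plan() { return plan; }
}
```

Supporting another engine then means writing another visitor over the same DAG, which is how an engine-agnostic design can leave the DAG model untouched.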
  • 19. SDK
  • 20. Extensibility with SAM SDK
      - Custom Processor: lets users write their own business logic
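A custom processor is a plug-in that receives an event and emits zero or more events. The `CustomProcessor` interface below is hypothetical, with events represented as plain maps; the actual SAM SDK interface and event type differ. The example flags events whose `speed` exceeds a configured `speedLimit` (both field names are illustrative).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plug-in contract for user business logic (not the SAM SDK API).
interface CustomProcessor {
    void initialize(Map<String, Object> config);
    List<Map<String, Object>> process(Map<String, Object> event);
    default void cleanup() {}
}

// Example logic: add a boolean "speeding" field based on a configured limit.
class SpeedingFlagProcessor implements CustomProcessor {
    private double limit;

    @Override
    public void initialize(Map<String, Object> config) {
        // Falls back to an arbitrary default of 80 when no limit is configured.
        Object v = config.getOrDefault("speedLimit", 80);
        this.limit = ((Number) v).doubleValue();
    }

    @Override
    public List<Map<String, Object>> process(Map<String, Object> event) {
        Object speed = event.get("speed");
        if (!(speed instanceof Number)) {
            return List.of(); // drop events without a numeric speed field
        }
        Map<String, Object> out = new HashMap<>(event); // copy; never mutate the input event
        out.put("speeding", ((Number) speed).doubleValue() > limit);
        return List.of(out);
    }
}
```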
  • 21. Extensibility with SAM SDK
      - Multi-lang support (upcoming)
  • 22. Extensibility with SAM SDK
      - UDAFs: compute aggregates within a window
      - Built-in functions: STDDEV, STDDEVP, VARIANCE, VARIANCEP, MEAN, MIN, MAX, SUM, COUNT, UPPER, LOWER, INITCAP, SUBSTRING, CHAR_LENGTH, CONCAT
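A UDAF folds the values in a window into a single result. The slide does not show the SDK contract, so the `Aggregate` interface below (create an accumulator, add each value, produce a result) is our assumption. The example computes standard deviation with Welford's online algorithm, which avoids the cancellation error of the naive sum-of-squares formula. It follows the common convention that the P suffix (STDDEVP) means population and the plain form means sample, which we have not confirmed for SAM.

```java
// Hypothetical windowed-aggregate contract: init, fold each value, finish.
interface Aggregate<A, V, R> {
    A init();
    A add(A accumulator, V value);
    R result(A accumulator);
}

// Running state for Welford's online algorithm.
final class Moments {
    long count;
    double mean;
    double m2; // sum of squared deviations from the running mean
}

// Standard deviation; population = true divides by n, otherwise by n - 1.
class StdDev implements Aggregate<Moments, Double, Double> {
    private final boolean population;

    StdDev(boolean population) { this.population = population; }

    @Override
    public Moments init() { return new Moments(); }

    @Override
    public Moments add(Moments m, Double x) {
        m.count++;
        double delta = x - m.mean;
        m.mean += delta / m.count;
        m.m2 += delta * (x - m.mean); // uses the updated mean
        return m;
    }

    @Override
    public Double result(Moments m) {
        long denom = population ? m.count : m.count - 1;
        if (denom <= 0) return Double.NaN; // undefined for too few values
        return Math.sqrt(m.m2 / denom);
    }
}
```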
  • 23. Extensibility with SAM SDK
      - UDFs: perform simple transformations
      - Built-in functions: STDDEV, STDDEVP, VARIANCE, VARIANCEP, MEAN, MIN, MAX, SUM, COUNT, UPPER, LOWER, INITCAP, SUBSTRING, CHAR_LENGTH, CONCAT
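A UDF, by contrast, maps one input to one output and keeps no state across events. The single-method `Udf` interface below is hypothetical; the example implements an INITCAP-style function using one reasonable definition of a word boundary (any non-alphanumeric character).

```java
// Hypothetical single-argument UDF contract: a pure per-event transformation.
@FunctionalInterface
interface Udf<I, O> {
    O evaluate(I input);
}

// INITCAP-style: upper-case the first letter of each word, lower-case the rest.
class InitCap implements Udf<String, String> {
    @Override
    public String evaluate(String input) {
        if (input == null) return null; // propagate SQL-style nulls
        StringBuilder sb = new StringBuilder(input.length());
        boolean startOfWord = true;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            sb.append(startOfWord ? Character.toUpperCase(c) : Character.toLowerCase(c));
            // any non-alphanumeric character starts a new word
            startOfWord = !Character.isLetterOrDigit(c);
        }
        return sb.toString();
    }
}
```

Because `Udf` has a single abstract method, simple functions can also be written as lambdas, for example `Udf<String, String> upper = s -> s.toUpperCase();`.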
  • 24. Extensibility with SAM SDK
      - Notifier: sends notifications such as email or SMS, or more complex ones that invoke external APIs
      - Built-in notifiers: Email (more in the future…)
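A notifier turns an event that triggered a rule into an outbound message. The `Notifier` interface and `TemplateEmailNotifier` below are hypothetical: the notifier fills `${field}` placeholders from the event and passes the rendered text to an injected transport instead of sending real mail, so SMTP, SMS or REST delivery could be swapped in without touching the formatting logic.

```java
import java.util.Map;
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical notifier contract (not the SAM SDK API).
interface Notifier {
    void send(Map<String, Object> event);
}

// Email-style notifier: renders ${field} placeholders from the event and hands
// the message to `transport`; missing fields render as "null".
class TemplateEmailNotifier implements Notifier {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\}");
    private final String to;
    private final String bodyTemplate;
    private final Consumer<String> transport;

    TemplateEmailNotifier(String to, String bodyTemplate, Consumer<String> transport) {
        this.to = to;
        this.bodyTemplate = bodyTemplate;
        this.transport = transport;
    }

    @Override
    public void send(Map<String, Object> event) {
        Matcher m = PLACEHOLDER.matcher(bodyTemplate);
        StringBuilder body = new StringBuilder();
        while (m.find()) {
            Object value = event.get(m.group(1));
            // quoteReplacement stops '$' or '\' in values being read as group refs
            m.appendReplacement(body, Matcher.quoteReplacement(String.valueOf(value)));
        }
        m.appendTail(body);
        transport.accept("To: " + to + "\n\n" + body);
    }
}
```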
  • 25. The current release: 0.5
      - Manual service pool registration that does not require Ambari
      - Test mode to easily try out a streaming app
      - Kerberos and delegation-token-based authentication
      - Authorization support with RBAC and permissions
      - New sources, processors and sinks
      - Upcoming…
          • Extending token-based authentication to other components
          • State management support in SAM
          • Support for other streaming engines: Flink, Spark Streaming
  • 26. Try it out!
      - It's open source under the Apache License
      - https://github.com/hortonworks/streamline
      - Apache incubation soon
      - SAM 0.5 is out!
      - https://groups.google.com/forum/#!forum/streamline-users
      - Contributions are welcome!