© Hortonworks Inc. 2011 – 2017. All Rights Reserved
Future of Data Boston
Agenda
 Networking, Food and drink
 Announcements
 Main Presentation
– Unlocking Insights in Streaming Data with Open Source
 Question and Answer
 Networking and Wrap up
Announcements
 Thanks to our sponsors
– Hortonworks
– Pivotal Labs
 What topics would you like to hear about?
About Carolyn Duby
 Big Data Solutions Architect
 High performance data intensive systems
 Data science
 ScB ScM Computer Science, Brown University
 LinkedIn: https://www.linkedin.com/in/carolynduby/
 Twitter: @carolynduby GitHub: carolynduby
 Hortonworks
– Innovation through data
– Enterprise ready, 100% open source, modern data platforms
– Engineering, Technical Support, Professional Services, Training
Streaming Analytics
 Streaming data is valuable
– Make decisions in real time
– Gain new understanding of business
– Detect/resolve/predict/warn of conditions
– Recommend at the right moment
Not very easy
 Streams flow in at variable rates
– High and low points
– Data streams arrive with different latency
– Often can’t control input rate
 Lots of different choices of libraries
– Storm, Spark Streaming, Samza, Flink…. Oh my!
 Complex time series analytics
– Windowing
– Joining streams
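To make the windowing idea concrete, here is a minimal Python sketch of a tumbling (fixed, non-overlapping) window that averages a value per key. The `(timestamp, key, value)` event layout, the 60-second window size, and the function name are assumptions made for illustration; none of them come from Storm, Spark Streaming, Samza, or Flink.

```python
from collections import defaultdict


def tumbling_window_avg(events, window_seconds=60):
    """Group (timestamp, key, value) events into fixed windows and average per key."""
    sums = defaultdict(lambda: [0.0, 0])  # (window_start, key) -> [total, count]
    for ts, key, value in events:
        # Every timestamp maps to exactly one window: the one that starts
        # at the largest multiple of window_seconds not exceeding it.
        window_start = ts - (ts % window_seconds)
        bucket = sums[(window_start, key)]
        bucket[0] += value
        bucket[1] += 1
    return {k: total / count for k, (total, count) in sums.items()}


# Hypothetical speed readings: (seconds since start, driver, speed)
events = [
    (0, "driver-1", 60.0),
    (30, "driver-1", 70.0),
    (61, "driver-1", 80.0),
    (45, "driver-2", 55.0),
]
averages = tumbling_window_avg(events)
```

A real engine also has to decide what to do with late or out-of-order events, which this sketch ignores by processing a complete list.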
Components
 Schema registry
– Avro schemas for streaming data
 Model registry
– Register machine learning models
 Streaming Analytics Manager
– Build and monitor streaming applications
 Superset
– Visualize streaming time series data
 Druid
– Store and aggregate time series data
 Streaming Substrate – Kafka, Storm, HDFS
Reference Architecture: Real-time Streaming Analytics
Trucking company with a large fleet of international trucks
A truck generates millions of events for a given route; an event could be:
 ‘Normal’ events: starting / stopping of the vehicle
 ‘Violation’ events: speeding, excessive acceleration and braking, unsafe tail distance
 ‘Speed’ events: the speed of a driver, reported every minute
The company uses an application that monitors truck locations and violations from the truck/driver in real time
Analysts query a broad history to understand if today’s violations are part of a larger problem with specific routes, trucks, or drivers (Route? Truck? Driver?)
Real-time Analytics Architecture with HDF
 Sensor Sources: Truck Sensors
 Flow Management Clusters: NiFi Ingress Gateway and Egress Gateway, Site to Site Protocol
 Cloud Instances in Different Geo Locations: China (IBM), Germany (Azure), US (Amazon)
 Event Broker Cluster
 Stream Analytics Cluster: Ingest Streams, Generate Insights
 Real-time Apps & Exploration Platform: Real-Time Apps
 Centralized Schema Repository
Enterprise Services: Schema Registry
#1: Schema Registry: The Problem & Solution Defined
Problem Statement:
 No centralized store exists to manage schemas for event data. A schema has to be hardcoded, passed with the data, or inferred, and producers and consumers cannot evolve at different rates.
Solution:
 A new component in the HDF platform: Hortonworks Schema Registry
 A shared repository of schemas that lets applications interact flexibly with each other by saving or retrieving schemas for the data they need to access
Why does this matter?
 Meets governance and operations requirements with a centralized registry to manage event schemas.
 Provides a reusable schema, avoids attaching a schema to every payload, and allows consumers and producers to evolve at different rates
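The idea of referencing a shared schema instead of shipping it with every payload can be sketched in a few lines of Python. Everything named here is hypothetical: `InMemorySchemaRegistry`, `encode`, `decode`, the `truck_events` schema name, and the field names are illustrations only and are not the Hortonworks Schema Registry API. The sketch uses JSON envelopes and Avro-style record definitions for readability; real Avro uses its own binary encoding and resolution rules.

```python
import json


class InMemorySchemaRegistry:
    """Hypothetical stand-in for a schema registry; not the Hortonworks API."""

    def __init__(self):
        self._versions = {}  # schema name -> list of schema dicts

    def register(self, name, schema):
        """Store a new schema version and return its 1-based version number."""
        versions = self._versions.setdefault(name, [])
        versions.append(schema)
        return len(versions)

    def get(self, name, version):
        return self._versions[name][version - 1]


def encode(name, version, record):
    # The payload carries only a schema reference, never the schema itself.
    return json.dumps({"schema": name, "version": version, "data": record})


def decode(registry, message, reader_schema):
    """Read a message with the consumer's schema, filling defaults for new fields."""
    envelope = json.loads(message)
    writer_schema = registry.get(envelope["schema"], envelope["version"])
    writer_fields = {f["name"] for f in writer_schema["fields"]}
    result = {}
    for field in reader_schema["fields"]:
        if field["name"] in writer_fields:
            result[field["name"]] = envelope["data"][field["name"]]
        else:
            result[field["name"]] = field["default"]
    return result


schema_v1 = {"type": "record", "name": "TruckEvent", "fields": [
    {"name": "driverId", "type": "int"},
    {"name": "eventType", "type": "string"},
]}
# Version 2 adds a field with a default so that older messages stay readable.
schema_v2 = {"type": "record", "name": "TruckEvent", "fields": [
    {"name": "driverId", "type": "int"},
    {"name": "eventType", "type": "string"},
    {"name": "speed", "type": "int", "default": 0},
]}

registry = InMemorySchemaRegistry()
v1 = registry.register("truck_events", schema_v1)
v2 = registry.register("truck_events", schema_v2)

# A producer still on v1 and a consumer already on v2 can interoperate.
message = encode("truck_events", v1, {"driverId": 11, "eventType": "Normal"})
decoded = decode(registry, message, schema_v2)
```

The producer and consumer agree only on the registry and the schema name, which is what lets them move to new versions at different times.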
How Schema Registry Works with the Rest of the Platform
 NiFi Processors for Schema Registry
– Schema aware flow management (record readers/writers)
– Live schema reference / automated schema conversion
 Streaming Analytics Manager processors for Schema Registry
– For example: look up the schema of a Kafka topic
– Context/schema aware user experience shortens time-to-market when building stream apps
 Atlas integration with Schema Registry (in a future release)
– Just like Atlas pulls schema info from Hive MetaStore, Atlas can capture schema, format and semantic metadata from events in HDF via the registry.
Schema Registry Demo
Stream Processing
Why SAM?
Make it a delightful experience to build streaming analytics applications.
Provide the same experience for streaming analytics that developers
have today with Apache NiFi/MiNiFi.
Stream Processing – Introducing Streaming Analytics Manager (SAM)
Streaming Analytics Manager
 Design, develop, deploy and manage streaming analytics apps with drag-and-drop ease
– Build streaming analytics applications that do event correlation, context enrichment, complex pattern matching, analytical aggregations and creation of alerts/notifications when insights are discovered.
– Supports multiple streaming substrates/engines (e.g., Storm, Spark Streaming)
– Extensibility is a first-class citizen (add custom sinks, processors, spouts, etc.)
SAM’s 3 Modules for 3 Different Personas in the Enterprise
Stream Ops Module for IT Operations
 Service Pool Abstraction
 Create and manage different environments in which individual streaming applications will be built
 Environments consist of services such as HDFS, Kafka, and Storm from different service pools
 Save time and reduce operational overhead with the same drag-and-drop paradigm as the Stream Builder module
 SAM takes away the complexity of deploying secure streaming analytics on a Kerberized cluster
Stream Builder Module for App Developers
 Builder components, shown on the canvas palette, are the building blocks used by the app developer to build streaming apps.
 Drag and drop to build a working streaming application without writing a single line of code.
 4 Types of Components: Sources, Processors, Sinks and Custom
SAM is All about Doing Real-Time Analytics on the Stream
 Real-Time Prescriptive Analytics: What should we do right now?
 Real-Time Predictive Analytics: What could happen now/soon?
 Real-Time Descriptive Analytics: What is happening right now?
Real-Time Prescriptive Analytics
 Question: What should we do right now?
 Context: It is rainy, the driver has been on the road for 12 hours, and he has 30 high-speed alerts over a 3-minute window in the last 2 hours.
 Answer: Dispatch a radio call to the driver to slow down
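The context above can be read as a rule over a sliding window. The following Python sketch is one possible interpretation, not SAM's rule engine: it assumes the rule fires when it is raining, the driver has been on the road at least 12 hours, and some 3-minute window within the last 2 hours holds at least 30 speeding alerts. The function names and the use of timestamps in seconds are assumptions.

```python
from collections import deque


def max_alerts_in_window(alert_times, window_seconds):
    """Largest number of alerts that fall inside any sliding window of the given length."""
    window = deque()
    best = 0
    for ts in sorted(alert_times):
        window.append(ts)
        # Drop alerts that are too old to share a window with the newest one.
        while ts - window[0] >= window_seconds:
            window.popleft()
        best = max(best, len(window))
    return best


def should_dispatch_call(now, alert_times, is_raining, hours_on_road):
    """Assumed rule: rain + 12 hours driving + a burst of 30 alerts in 3 minutes."""
    recent = [ts for ts in alert_times if now - 2 * 3600 <= ts <= now]
    burst = max_alerts_in_window(recent, window_seconds=180) >= 30
    return is_raining and hours_on_road >= 12 and burst


now = 10_000
# Hypothetical alert timestamps: 30 alerts, 5 seconds apart.
alerts = [9_000 + 5 * i for i in range(30)]
decision = should_dispatch_call(now, alerts, is_raining=True, hours_on_road=12.5)
```

Keeping the window logic separate from the rule makes it easy to change the thresholds without touching the counting code.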
Real-Time Predictive Analytics
 Question: There are no violation events, but what might happen that I need to worry about?
 My data science team has a model that can predict violations based on
– Weather
– Roads
– Driver HR info like driver certification status, wagePlan
– Driver timesheet info like hours and miles logged over the last week
Real-Time Predictive Analytics
1 Model Registry: Export the Spark MLlib model and import it into HDF’s Model Registry
2 Enrich with Features: Use SAM’s enrich/custom processors to enrich the event with the features required for the model
3 Transform/Normalize: Use SAM’s projection/custom processors to transform/normalize the streaming event and the features required for the model
4 Score Model: Use SAM’s PMML processor to score the model for each stream event with its required features
5 Alert / Notify / Action: Use SAM’s rule and notification processors to alert, notify and take action using the results of the model
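The enrich, normalize, score, and alert steps can be sketched as plain Python functions to show how data moves between them. This is only an illustration of the flow: the lookup tables, feature names (apart from `wagePlan`, which appears earlier in the deck), scaling constants, and logistic-regression weights are all made up, and the scoring function is a stand-in for a PMML model rather than SAM's PMML processor.

```python
import math

# Hypothetical lookup tables standing in for HR, timesheet, and weather sources.
DRIVER_HR = {11: {"certified": True, "wagePlan": "miles"}}
DRIVER_TIMESHEET = {11: {"hours_last_week": 70, "miles_last_week": 3300}}
WEATHER_BY_ROUTE = {"route-17": {"foggy": True, "rainy": False, "windy": False}}


def enrich(event):
    """Enrich step: attach the features the model needs to the raw event."""
    enriched = dict(event)
    enriched.update(DRIVER_HR[event["driverId"]])
    enriched.update(DRIVER_TIMESHEET[event["driverId"]])
    enriched.update(WEATHER_BY_ROUTE[event["routeId"]])
    return enriched


def normalize(enriched):
    """Transform/normalize step: build a numeric feature vector."""
    return [
        1.0 if enriched["certified"] else 0.0,
        1.0 if enriched["wagePlan"] == "miles" else 0.0,
        enriched["hours_last_week"] / 100.0,   # assumed scaling
        enriched["miles_last_week"] / 5000.0,  # assumed scaling
        1.0 if enriched["foggy"] else 0.0,
        1.0 if enriched["rainy"] else 0.0,
        1.0 if enriched["windy"] else 0.0,
    ]


# Made-up coefficients; a real deployment would load an exported model instead.
WEIGHTS = [-1.0, 0.5, 2.0, 1.0, 1.5, 1.0, 0.5]
INTERCEPT = -2.0


def score(features):
    """Score step: logistic regression probability of a violation."""
    z = INTERCEPT + sum(w * x for w, x in zip(WEIGHTS, features))
    return 1.0 / (1.0 + math.exp(-z))


def alert(event, probability, threshold=0.5):
    """Alert step: return a message when the predicted risk crosses the threshold."""
    if probability >= threshold:
        return (f"Driver {event['driverId']} on {event['routeId']}: "
                f"predicted violation risk {probability:.2f}")
    return None


event = {"driverId": 11, "routeId": "route-17", "eventType": "Normal"}
probability = score(normalize(enrich(event)))
message = alert(event, probability)
```

Each function takes the previous step's output as its only input, which mirrors how the processors are chained on the stream.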
Real-Time Prescriptive Analytics for Business Analysts
 A tool to create time-series and real-time analytics dashboards, charts and graphs
 30+ visualization charts out of the box with customization capability
 Druid is the Analytics Engine that powers the Stream Insight Module.
Druid
What Is Druid?
Druid is a distributed, real-time, column-oriented datastore
designed to quickly ingest and index large amounts of data
and make it available for real-time query.
Features:
• Streaming Data Ingestion
• Sub-Second Queries
• Merge Historical and Real-Time Data
• Approximate Computation
TECHNICAL PREVIEW: 2.6.2
GA: 2.6.3
Cool stuff you can do with Druid (and pretty much nothing else)
 Real Time Ingest and Query At Scale
– Scale: 100m+ events per second with highly concurrent queries.
– Stream data from Kafka to Druid and query it as it arrives.
 Use cases:
– Real-time bidding / market making.
– Real-time analytics on clickstream data.
– IoT monitoring applications.
– Real-time dashboards and KPI tracking.
 Learn how PayPal hypercharged self-service analytics:
– https://www.slideshare.net/anilmadan902/paypal-business-intelligence-and-real-time-analytics
Cool stuff you can do with Druid (and pretty much nothing else)
 Datasketches: Fast, multi-dimensional approximate set intersections at high scale.
– Question: Is a CPM of $7.00 a good deal for my iOS app, popular among middle-aged users in the US?
– Decision Support: How many iOS users, in the US, age range 30-45 visited in the last week?
 Use cases:
– Targeted advertising / offer management.
– Personalized recommendations.
– Anything aimed at a segment or an individual.
 Learn how Nielsen Marketing Cloud takes Micro Targeting to the next level with Druid:
– https://www.slideshare.net/ItaiYaffe/using-druid-for-interactive-count-distinct-queries-at-scale
[Venn diagram: Age Range, Country, Mobile Platforms. Small intersection: not worth it!]
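To show how an approximate set intersection can work in principle, here is a small Python sketch of a K-Minimum-Values (KMV) sketch, a simple member of the family of techniques used for this kind of estimate. It is written from scratch for illustration and is not the DataSketches library or Druid's implementation; the class name, the choice of SHA-256, and the example user ID ranges are assumptions.

```python
import hashlib


def hash01(item):
    """Map an item to a pseudo-random float in [0, 1)."""
    digest = hashlib.sha256(str(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


class KMVSketch:
    """Keeps only the k smallest hash values seen for a set."""

    def __init__(self, k, hashes=()):
        self.k = k
        self.hashes = sorted(set(hashes))[:k]

    @classmethod
    def from_items(cls, items, k=256):
        return cls(k, (hash01(item) for item in items))

    def estimate(self):
        """Estimate the number of distinct items."""
        if len(self.hashes) < self.k:
            return float(len(self.hashes))  # sketch is not full, so the count is exact
        return (self.k - 1) / self.hashes[-1]

    def union(self, other):
        return KMVSketch(min(self.k, other.k), self.hashes + other.hashes)


def intersection_estimate(a, b):
    """Estimate |A and B| as (estimated Jaccard similarity) * (estimated |A or B|)."""
    union = a.union(b)
    if not union.hashes:
        return 0.0
    set_a, set_b = set(a.hashes), set(b.hashes)
    shared = sum(1 for h in union.hashes if h in set_a and h in set_b)
    return (shared / len(union.hashes)) * union.estimate()


# Hypothetical user IDs: 6,000 iOS users and 6,000 US users, 2,000 of them in both.
ios_users = KMVSketch.from_items(range(0, 6000), k=1024)
us_users = KMVSketch.from_items(range(4000, 10000), k=1024)
estimate = intersection_estimate(ios_users, us_users)
```

Each sketch stores at most `k` numbers no matter how many users it summarizes, so the estimate trades a small error for a large saving in memory and query time. A third dimension such as age range would need a further intersection step, which this sketch leaves out.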
Streaming Analytics Manager Demo
Extensibility: SAM Software Development Kit
Extensibility with SAM SDK
 Custom Processor - allows users to write their own business logic
Extensibility with SAM SDK
 Multi-lang support (upcoming)
Extensibility with SAM SDK
 UDAFs - compute aggregates within a window
Built-in functions:
 STDDEV
 STDDEVP
 VARIANCE
 VARIANCEP
 MEAN
 MIN
 MAX
 SUM
 COUNT
 UPPER
 LOWER
 INITCAP
 SUBSTRING
 CHAR_LENGTH
 CONCAT
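The paired names in the list (STDDEV/STDDEVP, VARIANCE/VARIANCEP) suggest sample and population variants. The Python sketch below computes both over one window of values using the standard `statistics` module. The mapping of the unsuffixed names to the sample form and the `P` names to the population form is an assumption based on common naming conventions, not something the slides confirm, and `aggregate_window` is a hypothetical helper rather than part of the SAM SDK.

```python
import statistics


def aggregate_window(values):
    """Compute window aggregates; the sample/population mapping is assumed."""
    return {
        "STDDEV": statistics.stdev(values),       # sample: divides by n - 1
        "STDDEVP": statistics.pstdev(values),     # population: divides by n
        "VARIANCE": statistics.variance(values),  # sample
        "VARIANCEP": statistics.pvariance(values),  # population
        "MEAN": statistics.mean(values),
        "MIN": min(values),
        "MAX": max(values),
        "SUM": sum(values),
        "COUNT": len(values),
    }


# Hypothetical speed readings collected in one window.
speeds = [60.0, 62.0, 64.0, 66.0, 68.0]
result = aggregate_window(speeds)
```

The two forms differ most on small windows, so it is worth checking which one a given function implements before comparing results across tools.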
Extensibility with SAM SDK
 UDFs - perform simple transformations
Built-in functions:
 STDDEV
 STDDEVP
 VARIANCE
 VARIANCEP
 MEAN
 MIN
 MAX
 SUM
 COUNT
 UPPER
 LOWER
 INITCAP
 SUBSTRING
 CHAR_LENGTH
 CONCAT
Extensibility with SAM SDK
 Notifier - sends notifications such as email or SMS, or more complex ones that invoke external APIs
Built-in notifiers:
 Email
 More in future…
Questions?
Learn More about Streaming Analytics Manager (SAM)
 Download the tutorial – https://hortonworks.com/tutorial
– Real-Time Event Processing in NiFi, SAM, Schema Registry and Superset
 Blogs – https://hortonworks.com/blog
– Hortonworks Thoughts on Building A Successful Streaming Analytics Platform
 Hortonworks Community – https://community.hortonworks.com
 GitHub - https://github.com/hortonworks/streamline

costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 

Unlocking insights in streaming data

  • 1. Future of Data Boston. © Hortonworks Inc. 2011 – 2017. All Rights Reserved.
  • 2. Agenda
      • Networking, food and drink
      • Announcements
      • Main presentation
          – Unlocking Insights in Streaming Data with Open Source
      • Question and answer
      • Networking and wrap-up
  • 3. Announcements
      • Thanks to our sponsors
          – Hortonworks
          – Pivotal Labs
      • What topics would you like to hear about?
  • 4. About Carolyn Duby
      • Big Data Solutions Architect
      • High-performance, data-intensive systems
      • Data science
      • ScB, ScM Computer Science, Brown University
      • LinkedIn: https://www.linkedin.com/in/carolynduby/
      • Twitter: @carolynduby; GitHub: carolynduby
      • Hortonworks
          – Innovation through data
          – Enterprise-ready, 100% open source, modern data platforms
          – Engineering, Technical Support, Professional Services, Training
  • 5. Streaming Analytics
      • Streaming data is valuable
          – Make decisions in real time
          – Gain new understanding of the business
          – Detect, resolve, predict, and warn of conditions
          – Recommend at the right moment
  • 6. Not very easy
      • Streams flow in at variable rates
          – High and low points
          – Data streams arrive with different latencies
          – Often can't control the input rate
      • Many libraries to choose from
          – Storm, Spark Streaming, Samza, Flink… Oh my!
      • Complex time series analytics
          – Windowing
          – Joining streams
  • 7. Components
      • Schema registry
          – Avro schemas for streaming data
      • Model registry
          – Register machine learning models
      • Streaming Analytics Manager
          – Build and monitor streaming applications
      • Superset
          – Visualize streaming time series data
      • Druid
          – Store and aggregate time series data
      • Streaming Substrate – Kafka, Storm, HDFS
  • 9. Reference Architecture: Real-time Streaming Analytics
  • 10. Trucking company with a large fleet of international trucks
      • A truck generates millions of events for a given route; an event could be:
          – "Normal" events: starting and stopping of the vehicle
          – "Violation" events: speeding, excessive acceleration and braking, unsafe tail distance
          – "Speed" events: the driver's speed, reported every minute
      • The company uses an application that monitors truck locations and violations from the truck and driver in real time
      • Analysts query a broad history to understand whether today's violations are part of a larger problem with specific routes, trucks, or drivers (Route? Truck? Driver?)
  • 11. Real-time Analytics Architecture with HDF
      • [Architecture diagram. Labeled components: truck sensor sources; flow management clusters with ingress and egress gateways; NiFi Site-to-Site protocol; cloud instances in different geo locations (China on IBM, Germany on Azure, US on Amazon); event broker cluster; stream analytics cluster (ingest streams, generate insights, real-time apps); real-time apps and exploration platform; centralized schema repository]
  • 12. Enterprise Services: Schema Registry
  • 13. #1: Schema Registry: The Problem & Solution Defined
      • Problem statement:
          – There is no centralized store to manage schemas for event data. The schema has to be hardcoded, passed with the data, or inferred. Producers and consumers cannot evolve at different rates.
      • Solution:
          – A new component in the HDF platform: Hortonworks Schema Registry
          – A shared repository of schemas that lets applications interact flexibly with each other by saving or retrieving schemas for the data they need to access
      • Why does this matter?
          – Meets governance and operations requirements with a centralized registry for managing event schemas
          – Provides a reusable schema, avoids attaching a schema to every payload, and allows consumers and producers to evolve at different rates
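The idea on slide 13 (ship a schema reference instead of the schema, and let producers and consumers evolve separately) can be sketched as follows. This is a minimal illustration, not the Hortonworks Schema Registry API: the `SchemaRegistry` class, the `encode`/`decode` helpers, the JSON envelope, and the `truck_speed_event` schema are all hypothetical, and the schemas are Avro-style dictionaries rather than real Avro.

```python
import json


class SchemaRegistry:
    """Hypothetical in-memory stand-in for a central schema registry."""

    def __init__(self):
        self._versions = {}  # schema name -> list of schemas; index + 1 is the version

    def register(self, name, schema):
        """Store a new version of the schema and return its version number."""
        versions = self._versions.setdefault(name, [])
        versions.append(schema)
        return len(versions)

    def get(self, name, version):
        return self._versions[name][version - 1]


def encode(name, version, record):
    """Producer side: send a schema reference with the payload, not the schema."""
    return json.dumps({"schema": name, "version": version, "data": record})


def decode(registry, message, reader_schema):
    """Consumer side: fetch the writer schema, then project onto the reader schema."""
    envelope = json.loads(message)
    writer_schema = registry.get(envelope["schema"], envelope["version"])
    writer_fields = {field["name"] for field in writer_schema["fields"]}
    record = {}
    for field in reader_schema["fields"]:
        if field["name"] in writer_fields:
            record[field["name"]] = envelope["data"][field["name"]]
        else:
            # Field was added after the writer's version: fall back to its default.
            record[field["name"]] = field["default"]
    return record


SPEED_V1 = {"type": "record", "name": "TruckSpeedEvent", "fields": [
    {"name": "driverId", "type": "int"},
    {"name": "speed", "type": "int"},
]}
SPEED_V2 = {"type": "record", "name": "TruckSpeedEvent", "fields": [
    {"name": "driverId", "type": "int"},
    {"name": "speed", "type": "int"},
    {"name": "route", "type": "string", "default": "unknown"},
]}

registry = SchemaRegistry()
v1 = registry.register("truck_speed_event", SPEED_V1)
v2 = registry.register("truck_speed_event", SPEED_V2)

# An old producer (v1) read by a new consumer (v2), and the reverse.
old_to_new = decode(registry, encode("truck_speed_event", v1, {"driverId": 11, "speed": 72}), SPEED_V2)
new_to_old = decode(registry, encode("truck_speed_event", v2, {"driverId": 11, "speed": 72, "route": "R1"}), SPEED_V1)
```

In this sketch a newly added field needs a default, otherwise an older writer's record cannot be projected onto the newer reader schema; that is the same constraint Avro schema resolution places on added fields.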
  • 14. How Schema Registry Works with the Rest of the Platform
      • NiFi processors for Schema Registry
          – Schema-aware flow management (record readers/writers)
          – Live schema reference / automated schema conversion
      • Streaming Analytics Manager processors for Schema Registry
          – For example: look up the schema of a Kafka topic
          – A context- and schema-aware user experience shortens the time to market for stream apps
      • Atlas integration with Schema Registry (in a future release)
          – Just as Atlas pulls schema info from the Hive Metastore, it can capture schema, format, and semantic metadata from events in HDF via the registry.
  • 15. Schema Registry Demo
  • 16. Stream Processing
  • 17. Why SAM?
      • Make it a delightful experience to build streaming analytics applications.
      • Provide the same experience for streaming analytics that developers have today with Apache NiFi/MiNiFi.
  • 18. Stream Processing – Introducing Streaming Analytics Manager (SAM)
      • Design, develop, deploy, and manage streaming analytics apps with drag-and-drop ease
          – Build streaming analytics applications that do event correlation, context enrichment, complex pattern matching, and analytical aggregations, and that create alerts/notifications when insights are discovered
          – Supports multiple streaming substrates/engines (e.g., Storm, Spark Streaming)
          – Extensibility is a first-class citizen (add custom sinks, processors, spouts, etc.)
  • 19. SAM's 3 Modules for 3 Different Personas in the Enterprise
  • 20. Stream Ops Module for IT Operations
      • Service pool abstraction
      • Create and manage the different environments in which individual streaming applications will be built
      • Environments consist of services such as HDFS, Kafka, and Storm from different service pools
      • Save time and reduce operational overhead with the same drag-and-drop paradigm as the Stream Builder module
      • SAM takes away the complexity of deploying secure streaming analytics on a Kerberized cluster
  • 21. Stream Builder Module for App Developers
      • Builder components, shown on the canvas palette, are the building blocks the app developer uses to build streaming apps.
      • Drag and drop to build a working streaming application without writing a single line of code.
      • Four types of components: sources, processors, sinks, and custom
  • 22. SAM Is All about Doing Real-Time Analytics on the Stream
      • Real-time descriptive analytics: What is happening right now?
      • Real-time predictive analytics: What could happen now or soon?
      • Real-time prescriptive analytics: What should we do right now?
  • 23. Real-Time Prescriptive Analytics
      • Question: What should we do right now?
      • Context: It is rainy, the driver has been on the road for 12 hours, and he has had 30 high-speed alerts over a 3-minute window in the last 2 hours.
      • Answer: Dispatch a radio call to the driver to slow down
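The rule on slide 23 combines a windowed count with contextual facts. A rough sketch of that logic in plain Python, assuming alert timestamps arrive in order and are expressed in seconds; `SlidingWindowCounter` and `should_dispatch_call` are illustrative names, not SAM components, and the thresholds simply mirror the numbers on the slide:

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events whose timestamps fall within the last `window_seconds`."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def add(self, timestamp):
        """Record an event and return the number of events in the current window."""
        self._timestamps.append(timestamp)
        # Evict events that fell out of the window ending at `timestamp`.
        while self._timestamps[0] <= timestamp - self.window_seconds:
            self._timestamps.popleft()
        return len(self._timestamps)


def should_dispatch_call(alerts_in_window, raining, hours_on_road,
                         alert_threshold=30, max_hours=12):
    """Prescriptive rule: call the driver only when all three conditions hold."""
    return raining and hours_on_road >= max_hours and alerts_in_window >= alert_threshold


counter = SlidingWindowCounter(window_seconds=180)  # 3-minute window
alerts_in_window = 0
for t in range(0, 150, 5):  # 30 speeding alerts, 5 seconds apart
    alerts_in_window = counter.add(t)

dispatch = should_dispatch_call(alerts_in_window, raining=True, hours_on_road=12)
```

The sketch leaves out the slide's outer "in the last 2 hours" condition; a second, longer window over the rule's own firings would be one way to model it.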
  • 24. Real-Time Predictive Analytics
      • Question: There are no violation events, but what might happen that I need to worry about?
      • My data science team has a model that can predict that based on:
          – Weather
          – Roads
          – Driver HR info such as certification status and wagePlan
          – Driver timesheet info such as hours and miles logged over the last week
  • 25. Real-Time Predictive Analytics
      1) Model Registry: Export the Spark MLlib model and import it into HDF's Model Registry
      2) Enrich with Features: Use SAM's enrich/custom processors to enrich the event with the features required for the model
      3) Transform/Normalize: Use SAM's projection/custom processors to transform/normalize the streaming event and the features required for the model
      4) Score Model: Use SAM's PMML processor to score the model for each stream event with its required features
      5) Alert / Notify / Action: Use SAM's rule and notification processors to alert, notify, and take action using the results of the model
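Steps 2 through 5 on slide 25 form a small per-event pipeline: enrich, normalize, score, alert. The following sketch mimics that shape in plain Python. Everything in it is an assumption made for illustration: the lookup tables, the feature names, and the hand-written logistic function with made-up coefficients, which stands in for the PMML model that the talk loads from the Model Registry (step 1 is not shown).

```python
import math

# Hypothetical lookup tables standing in for the enrichment sources.
DRIVER_INFO = {
    11: {"certified": False, "hoursLogged": 70},
    12: {"certified": True, "hoursLogged": 35},
}
ROUTE_WEATHER = {"route-a": {"foggy": True}, "route-b": {"foggy": False}}

# Made-up coefficients; a real deployment would score an exported model instead.
WEIGHTS = {"bias": -2.0, "foggy": 1.5, "hours": 2.0, "certified": -1.0}
MAX_WEEKLY_HOURS = 70.0


def enrich(event):
    """Step 2: attach driver and weather features to the raw event."""
    enriched = dict(event)
    enriched.update(DRIVER_INFO[event["driverId"]])
    enriched.update(ROUTE_WEATHER[event["route"]])
    return enriched


def normalize(enriched):
    """Step 3: turn the enriched event into numeric features in [0, 1]."""
    return {
        "foggy": 1.0 if enriched["foggy"] else 0.0,
        "hours": min(enriched["hoursLogged"] / MAX_WEEKLY_HOURS, 1.0),
        "certified": 1.0 if enriched["certified"] else 0.0,
    }


def score(features):
    """Step 4: logistic score, a placeholder for the PMML model."""
    z = WEIGHTS["bias"] + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def process(event, threshold=0.5):
    """Step 5: flag the event when the predicted probability crosses the threshold."""
    probability = score(normalize(enrich(event)))
    return {"driverId": event["driverId"], "probability": probability,
            "alert": probability >= threshold}


results = [
    process({"driverId": 11, "route": "route-a"}),
    process({"driverId": 12, "route": "route-b"}),
]
alerted_drivers = [r["driverId"] for r in results if r["alert"]]
```

Keeping each stage a separate function mirrors the slide's separation into processors, so any one stage can be swapped without touching the others.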
  • 26. Real-Time Prescriptive Analytics for Business Analysts
      • A tool to create time-series and real-time analytics dashboards, charts, and graphs
      • 30+ visualization charts out of the box, with customization capability
      • Druid is the analytics engine that powers the Stream Insight module.
  • 27. Druid
  • 28. What Is Druid?
      • Druid is a distributed, real-time, column-oriented datastore designed to quickly ingest and index large amounts of data and make it available for real-time query.
      • Features:
          – Streaming data ingestion
          – Sub-second queries
          – Merge historical and real-time data
          – Approximate computation
      • Technical preview: 2.6.2; GA: 2.6.3
  • 29. Cool stuff you can do with Druid (and pretty much nothing else)
      • Real-time ingest and query at scale
          – Scale: 100m+ events per second with highly concurrent queries
          – Stream data from Kafka to Druid and query it as it arrives
      • Use cases:
          – Real-time bidding / market making
          – Real-time analytics on clickstream data
          – IoT monitoring applications
          – Real-time dashboards and KPI tracking
      • Learn how PayPal hypercharged self-service analytics:
          – https://www.slideshare.net/anilmadan902/paypal-business-intelligence-and-real-time-analytics
  • 30. Cool stuff you can do with Druid (and pretty much nothing else)
      • DataSketches: fast, multi-dimensional approximate set intersections at high scale
          – Question: Is a CPM of $7.00 a good deal for my iOS app, popular among middle-aged users in the US?
          – Decision support: How many iOS users in the US, aged 30-45, visited in the last week?
      • Use cases:
          – Targeted advertising / offer management
          – Personalized recommendations
          – Anything aimed at a segment or an individual
      • Learn how Nielsen Marketing Cloud takes micro-targeting to the next level with Druid:
          – https://www.slideshare.net/ItaiYaffe/using-druid-for-interactive-count-distinct-queries-at-scale
      • [Venn diagram of Age Range, Country, and Mobile Platforms. Small intersection: not worth it!]
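To give a feel for the approximate set intersection mentioned on slide 30, here is a simplified K-minimum-values (theta-style) sketch in plain Python. It is not the DataSketches library or Druid's implementation; the `KmvSketch` class, the `hash01` helper, the SHA-1 based hashing, and the synthetic user-id ranges are all assumptions chosen to keep the example self-contained.

```python
import hashlib


def hash01(item):
    """Map an item to a pseudo-uniform float in [0, 1)."""
    digest = hashlib.sha1(str(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


class KmvSketch:
    """Keeps only the k smallest hash values of a set."""

    def __init__(self, items, k=1024):
        hashes = sorted({hash01(item) for item in items})
        self.retained = set(hashes[:k])
        # theta: every hash below this value was kept; 1.0 means nothing was discarded.
        self.theta = hashes[k] if len(hashes) > k else 1.0

    def estimate(self):
        """Estimated number of distinct items in the set."""
        return len(self.retained) / self.theta


def intersection_estimate(*sketches):
    """Estimated size of the intersection of the sketched sets."""
    theta = min(sketch.theta for sketch in sketches)
    common = set.intersection(*(sketch.retained for sketch in sketches))
    # Below the smallest theta every sketch is complete, so scale up the matches found there.
    return len([h for h in common if h < theta]) / theta


# Synthetic user ids: the three segments overlap on ids 15000..19999 (5,000 users).
ios_users = KmvSketch(range(0, 20000))
us_users = KmvSketch(range(10000, 30000))
age_30_45 = KmvSketch(range(15000, 35000))

approx = intersection_estimate(ios_users, us_users, age_30_45)
```

Each sketch stores at most 1,024 hash values regardless of how many users it summarizes, which is what makes answering the "how many users are in all three segments" question cheap. The trade-off is accuracy: the smaller the intersection relative to the sets, the larger the relative error.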
  • 31. Streaming Analytics Manager Demo
  • 32. Extensibility: SAM Software Development Kit
  • 33. Extensibility with SAM SDK
      • Custom Processor: allows users to write their own business logic
  • 34. Extensibility with SAM SDK
      • Multi-lang support (upcoming)
  • 35. Extensibility with SAM SDK
      • UDAFs: compute aggregates within a window
      • Built-in functions: STDDEV, STDDEVP, VARIANCE, VARIANCEP, MEAN, MIN, MAX, SUM, COUNT, UPPER, LOWER, INITCAP, SUBSTRING, CHAR_LENGTH, CONCAT
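What "compute aggregates within a window" on slide 35 means can be shown with a small tumbling-window example in plain Python. The function names, the `(timestamp, speed)` event shape, and the 60-second window are illustrative assumptions; the sketch does not use the SAM SDK, and it assumes (from the naming convention only) that STDDEVP denotes the population standard deviation.

```python
import statistics
from collections import defaultdict


def tumbling_windows(events, window_seconds):
    """Group (timestamp, value) pairs into fixed, non-overlapping windows."""
    windows = defaultdict(list)
    for timestamp, value in events:
        window_start = timestamp // window_seconds * window_seconds
        windows[window_start].append(value)
    return dict(sorted(windows.items()))


def aggregate(values):
    """Apply a few aggregate functions to the values of one window."""
    return {
        "COUNT": len(values),
        "MIN": min(values),
        "MAX": max(values),
        "MEAN": statistics.mean(values),
        "STDDEVP": statistics.pstdev(values),  # population standard deviation
    }


# (timestamp in seconds, truck speed) pairs
speed_events = [(0, 60), (20, 70), (40, 80), (65, 90), (110, 100)]

per_window = {start: aggregate(values)
              for start, values in tumbling_windows(speed_events, 60).items()}
```

Here the first window (0 to 59 seconds) holds three readings and the second (60 to 119 seconds) holds two, so each aggregate is computed over a different number of events.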
  • 36. Extensibility with SAM SDK
      • UDFs: perform simple transformations
      • Built-in functions: STDDEV, STDDEVP, VARIANCE, VARIANCEP, MEAN, MIN, MAX, SUM, COUNT, UPPER, LOWER, INITCAP, SUBSTRING, CHAR_LENGTH, CONCAT
  • 37. Extensibility with SAM SDK
      • Notifier: sends notifications such as email or SMS, or more complex ones that invoke external APIs
      • Built-in notifiers:
          – Email
          – More in future…
  • 38. Questions?
  • 39. Learn More about Streaming Analytics Manager (SAM)
      • Download the tutorial: https://hortonworks.com/tutorial
          – Real-Time Event Processing in NiFi, SAM, Schema Registry and Superset
      • Blogs: https://hortonworks.com/blog
          – Hortonworks Thoughts on Building a Successful Streaming Analytics Platform
      • Hortonworks Community: https://community.hortonworks.com
      • GitHub: https://github.com/hortonworks/streamline

Editor's Notes

  1. TALK TRACK Hello, my name is [NAME] and I want to thank you for taking time to speak with me today. Hortonworks Powers the Future of Data: data-in-motion, data-at-rest, and Modern Data Applications. Today, I’ll tell you how we do that and how you can transform your business by managing your data with Hortonworks Connected Data platforms. [NEXT SLIDE]