SlideShare a Scribd company logo
1 of 28
1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
HDF 3.1 Streaming Webinar
George Vetticaden
VP of Emerging Products
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Reference Architecture: Real-time
Streaming Analytics
3 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Trucking company w/ large fleet of international trucks
A truck generates millions of events for a given route;
an event could be:
 'Normal' events: starting / stopping of the vehicle
 ‘Violation’ events: speeding, excessive acceleration and
breaking, unsafe tail distance
 ‘Speed’ Events: The speed of a driver that comes in every
minute.
Company uses an application that monitors truck
locations and violations from the truck/driver in real-
time
Route?
Truck?
Driver?
Analysts query a broad
history to understand if
today’s violations are
part of a larger problem
with specific routes,
trucks, or drivers
4 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Event Broker Cluster
Sensor Sources
Truck Sensors
Truck Sensors
Truck Sensors
Truck Sensors
Real-time Analytics Architecture with HDF
Flow Management
Clusters
Ingress
Gateway
Nifi
Site to Site
Protocol
Egress
Gateway
Cloud Instance in
Different Geo
Locations
China Cloud Instance
(IBM)
Germany Cloud Instance
(Azure)
US Cloud Instance
(Amazon)
Stream Analytics Cluster
Ingest
Streams
Generate
Insights
Real-Time Apps
Real-time
Apps &
Exploration Platform
Centralized Schema
Repository
5 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Event Broker Cluster
Sensor Sources
Truck Sensors
Truck Sensors
Truck Sensors
Truck Sensors
Implement the Flow App with HDF
Flow Management
Clusters
Ingress
Gateway
Nifi
Site to Site
Protocol
Egress
Gateway
Cloud Instance in
Different Geo
Locations
China Cloud Instance
(IBM)
Germany Cloud Instance
(Azure)
US Cloud Instance
(Amazon)
Stream Analytics Cluster
Ingest
Streams
Generate
Insights
Real-Time Apps
Real-time
Apps &
Exploration Platform
Centralized Schema
Repository
6 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Implementing the Flow Requirements with Apache NiFi and Kafka
7 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Apache Kafka 1.0
Key Highlights
 Kafka 1.0 was the most asked for
feature/component in HDF Kafka 1.0
 Key Features introduced in Kaka
0.11/1.0 included
– Kafka Message Header support
– Transactional Support
– Performance improvements
 These features are critical for
customers who are building
streaming apps
 Customer didn’t’ just want support
for Kafka 1.0 but want full HDF
integration with Kafka 1.0 including
Nifi, Ambari, Ranger, and Atlas
Integration
Kafka 1.0 - Ambari
 Ambari support for Kafka 1.0
– Install, configure, manage & monitor Kafka 1.0
multi-node secure clusters.
Kafka 1.0 - Ranger
 Ranger ACL support for Kafka 1.0
– Support for resource and tag based access
control for Kafka 1.0
Kafka 1.0 - NiFi & SAM
 New Nifi Kafka 1.0 Processors
– Kafka header & transaction support
 SAM Source/Sink for Kafka 1.0
Kafka 1.0 - Atlas
 Lineage of Kafka Topics
– Who are all the consumers and producers
8 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
HDF Ref App Data Sources: TruckGeoEvent and TruckSpeedEvent Streams
eventType
truckId
driverId
driverName longitude
eventTime routeId
eventSource
2018-01-22 15:00:58.493|1516654858493| truck_geo_event |40| 23 | G Vetticaden | 1090292248 | Peoria to Ceder Rapids Route 2 | Lane Departure| 40.7 | -89.52 |1|
route
latitude
correlationId
speed
truckId
driverId
driverName
eventTime routeIdeventSource
2018-01-22 15:00:58.673|1516654858673| truck_speed_event|40| 23 | G Vetticaden| 1090292248 | Peoria to Ceder Rapids Route 2 | 73 |
route
 Each Truck emits different event stream
– Truck Geo Event
– Truck Speed Event
Truck Geo Event:
Truck Speed Event:
eventTimeLong
eventTimeLong
9 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Common Flow Management Requirements
Flow Requirement # Requirement Description
Req. #1 Edge deployed data collection service needs to capture data from the two
sensors and stream to an IOT gateway powered by Apache Kafka.
Req. #2 The ingestion service will deliver events in CSV format from each sensor to a
Kafka topic (call it raw-all_truck_events_csv) in a secure cluster
Req. #3 Metadata headers need to be sent with each event like the schema key that
identifies the schema for the event in a centralized schema registry store.
Req. #4 Consumers of this raw sensor data need to inspect the meta headers to
lookup the schema and do routing, filtering, and enrichment.
Req. #5 Producers need to publish the enriched streams to their own respective Kafka
topics for consumption for downstream analytics (let's call the Kafka topics for
the two streams: truck_events_avro, truck_speed_events_avro)
Req. #6 Agility is key. Developer should be able to create consumers and producers
quickly in a code-less approach, preferably UI driven.
10 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Demo Flow Management App
11 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Event Broker Cluster
Sensor Sources
Truck Sensors
Truck Sensors
Truck Sensors
Truck Sensors
Implement the Streaming App with HDF
Flow Management
Clusters
Ingress
Gateway
Nifi
Site to Site
Protocol
Egress
Gateway
Cloud Instance in
Different Geo
Locations
China Cloud Instance
(IBM)
Germany Cloud Instance
(Azure)
US Cloud Instance
(Amazon)
Stream Analytics Cluster
Ingest
Streams
Generate
Insights
Real-Time Apps
Real-time
Apps &
Exploration Platform
Centralized Schema
Repository
12 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Common Streaming Analytics Requirements
Streaming Analytics
Requirement #
Requirement Description
Req. #1 Create streams consuming from the two Kafka topics that NiFi delivered the
enriched geo and speed streams to.
Req. #2 Join the streams of the Geo and Speed sensors over a time based aggregation
window.
Req. #3 Apply rules on the stream to filter on events of interest.
Req. #4 Enrich the stream with features required for a machine learning (ML) model.
The enrichment entails performing lookups for driver HR info, hours/miles
driven in the past week, weather info.
Req. #5 Normalize the events in the stream to feed into the PMML model.
Req. #6 Execute a predictive logistical regression model on the stream built with
Spark ML to predict if a driver is in danger going to commit a violation.
Req. #7 Alert and feed into real-time dashboard if model predicted a violation.
13 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Implementing the Streaming Analytics Requirements with SAM
14 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
SAM’s Value Proposition
 Build and deploy complex stream analytics applications without writing any code
 Only open source tool in the market with graphical programming paradigm
 Speed time-to-market for complex stream apps
 Build stream analytics apps without specialized skillsets.
 Decouple data schema from the streaming application itself
 Support multiple underlining streaming engine
15 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Who Uses SAM?
16 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
SAM is All about Doing Real-Time Analytics on the Stream
Real-Time
Prescriptive
Analytics
Real-Time Analytics
Real-Time
Predictive
Analytics
Real-Time
Descriptive
Analytics
What should we do
right now?
What could happen
now/soon?
What is happening
right now?
17 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Demo Real-Time Descriptive Analytics
18 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Predictive Analytics with SAM
Real-Time
Prescriptive
Analytics
Real-Time Analytics
Real-Time
Predictive
Analytics
Real-Time
Descriptive
Analytics
What should we do
right now?
What could happen
now/soon?
What is happening
right now?
19 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Predictive Analytics
20 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Real-Time Predictive Analytics
 Question: No violation events but what might happen that I need to be worried about?
 My data science team has a model that can predict that based on
– Weather
– Roads
– Driver HR info like driver certification status, wagePlan
– Driver timesheet info like hours, and miles logged over the last week
21 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Building the Predictive Model on HDP
Identify suitable ML algorithms to train a model – we will
use classification algorithms as we have labeled events
data
2
Transform enriched events data to a format that is
friendly to Spark MLlib – many ML libs expect
training data in a certain format
3
Train a logistic classification Spark model on YARN, with
above events as training input, and iterate to fine tune
generated model
4
Explore small subset of events to identify predictive
features and make a hypothesis. E.g. hypothesis: “foggy
weather causes driver violations”
1
22 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Logistical Regression Model
23 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Scoring the Predictive Model on HDF
Use SAM’s enrich/custom processors to enrich the event
with the features required for the model6
Enrich with Features
Use SAM’s projection/custom processors to
transform/normalize the streaming event and the
features required for the model
7
Transform/Normalize
Use SAM’s PMML processor to score the model for each
stream event with its required features8
Score Model
Use SAM’s rule and notification processors to alert,
notify and take action using the results of the model9
Alert / Notify / Action
Export the Spark Mllib model and import into the HDF’s
Model Registry
5 Model
Registry
24 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Demo Predictive Analytics with SAM
25 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Atlas Integration
26 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Event Broker Cluster
Sensor Sources
Truck Sensors
Truck Sensors
Truck Sensors
Truck Sensors
Where did the data come from?
Where did it go? Who Produced/Consumed?
Flow Management
Clusters
Ingress
Gateway
Nifi
Site to Site
Protocol
Egress
Gateway
Cloud Instance in
Different Geo
Locations
China Cloud Instance
(IBM)
Germany Cloud Instance
(Azure)
US Cloud Instance
(Amazon)
Stream Analytics Cluster
Ingest
Streams
Generate
Insights
Real-Time Apps
Real-time
Apps &
Exploration Platform
Centralized Schema
Repository
27 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Problem Statement:
 Common ref architectures use all the HDF components including Nifi for Flow and Storm/SAM for
streaming analytics with Schema registry as the glue.
 Hence, as data flows through these different components in HDF, governance requirements such as
lineage/provenance, chain of custody, security, audit are key requirements for every large enterprise.
Solution:
 With HDF 3.1, Flow and Streaming components will be integrated with Atlas:
– Nifi is integrated with Atlas so that Atlas contains meta information of Nifi Data Flows including source data and target
systems of the flow
– SAM/Storm is integrated with Atlas that contains meta information about SAM App topologies including source data
and target systems
Why Should You Care?
 Atlas Integration with HDF components allows enterprise to meet governance requirements allowing
users to track data as it travels across the data-in-motion platform (HDF) and into the data-at-rest
platform (HDP).
Apache Atlas Integration for Flow and Streaming Components
28 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Demo Atlas

More Related Content

What's hot

Hortonworks - How Hadoop makes the successful Retailer.
Hortonworks - How Hadoop makes the successful Retailer. Hortonworks - How Hadoop makes the successful Retailer.
Hortonworks - How Hadoop makes the successful Retailer. Mats Johansson
 
Risk listening: monitoring for profitable growth
Risk listening: monitoring for profitable growthRisk listening: monitoring for profitable growth
Risk listening: monitoring for profitable growthDataWorks Summit
 
EDW Optimization: A Modern Twist on an Old Favorite
EDW Optimization: A Modern Twist on an Old FavoriteEDW Optimization: A Modern Twist on an Old Favorite
EDW Optimization: A Modern Twist on an Old FavoriteHortonworks
 
Overcoming the AI hype — and what enterprises should really focus on
Overcoming the AI hype — and what enterprises should really focus onOvercoming the AI hype — and what enterprises should really focus on
Overcoming the AI hype — and what enterprises should really focus onDataWorks Summit
 
Streamline Apache Hadoop Operations with Apache Ambari and SmartSense
Streamline Apache Hadoop Operations with Apache Ambari and SmartSenseStreamline Apache Hadoop Operations with Apache Ambari and SmartSense
Streamline Apache Hadoop Operations with Apache Ambari and SmartSenseHortonworks
 
Pivotal - Advanced Analytics for Telecommunications
Pivotal - Advanced Analytics for Telecommunications Pivotal - Advanced Analytics for Telecommunications
Pivotal - Advanced Analytics for Telecommunications Hortonworks
 
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseUsing Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseDataWorks Summit
 
Spark Summit EMEA - Arun Murthy's Keynote
Spark Summit EMEA - Arun Murthy's KeynoteSpark Summit EMEA - Arun Murthy's Keynote
Spark Summit EMEA - Arun Murthy's KeynoteHortonworks
 
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUGReal-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUGskumpf
 
How Customers are Optimizing their EDW for Fast, Secure, and Effective Insights
How Customers are Optimizing their EDW for Fast, Secure, and Effective InsightsHow Customers are Optimizing their EDW for Fast, Secure, and Effective Insights
How Customers are Optimizing their EDW for Fast, Secure, and Effective InsightsHortonworks
 
What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?DataWorks Summit
 
Hortonworks Open Connected Data Platforms for IoT and Predictive Big Data Ana...
Hortonworks Open Connected Data Platforms for IoT and Predictive Big Data Ana...Hortonworks Open Connected Data Platforms for IoT and Predictive Big Data Ana...
Hortonworks Open Connected Data Platforms for IoT and Predictive Big Data Ana...DataWorks Summit
 
Open Source Data Management for Industry 4.0
Open Source Data Management for Industry 4.0Open Source Data Management for Industry 4.0
Open Source Data Management for Industry 4.0DataWorks Summit
 
Unlocking insights in streaming data
Unlocking insights in streaming dataUnlocking insights in streaming data
Unlocking insights in streaming dataCarolyn Duby
 
Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?DataWorks Summit
 
Enterprise data science at scale
Enterprise data science at scaleEnterprise data science at scale
Enterprise data science at scaleCarolyn Duby
 
Big Traffic, Big Trouble: Big Data - Tokyo
Big Traffic, Big Trouble: Big Data - TokyoBig Traffic, Big Trouble: Big Data - Tokyo
Big Traffic, Big Trouble: Big Data - TokyoDataWorks Summit
 
Hortonworks DataFlow & Apache Nifi @Oslo Hadoop Big Data
Hortonworks DataFlow & Apache Nifi @Oslo Hadoop Big DataHortonworks DataFlow & Apache Nifi @Oslo Hadoop Big Data
Hortonworks DataFlow & Apache Nifi @Oslo Hadoop Big DataMats Johansson
 

What's hot (20)

Hortonworks - How Hadoop makes the successful Retailer.
Hortonworks - How Hadoop makes the successful Retailer. Hortonworks - How Hadoop makes the successful Retailer.
Hortonworks - How Hadoop makes the successful Retailer.
 
Risk listening: monitoring for profitable growth
Risk listening: monitoring for profitable growthRisk listening: monitoring for profitable growth
Risk listening: monitoring for profitable growth
 
EDW Optimization: A Modern Twist on an Old Favorite
EDW Optimization: A Modern Twist on an Old FavoriteEDW Optimization: A Modern Twist on an Old Favorite
EDW Optimization: A Modern Twist on an Old Favorite
 
Overcoming the AI hype — and what enterprises should really focus on
Overcoming the AI hype — and what enterprises should really focus onOvercoming the AI hype — and what enterprises should really focus on
Overcoming the AI hype — and what enterprises should really focus on
 
Streamline Apache Hadoop Operations with Apache Ambari and SmartSense
Streamline Apache Hadoop Operations with Apache Ambari and SmartSenseStreamline Apache Hadoop Operations with Apache Ambari and SmartSense
Streamline Apache Hadoop Operations with Apache Ambari and SmartSense
 
Pivotal - Advanced Analytics for Telecommunications
Pivotal - Advanced Analytics for Telecommunications Pivotal - Advanced Analytics for Telecommunications
Pivotal - Advanced Analytics for Telecommunications
 
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseUsing Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
 
Spark Summit EMEA - Arun Murthy's Keynote
Spark Summit EMEA - Arun Murthy's KeynoteSpark Summit EMEA - Arun Murthy's Keynote
Spark Summit EMEA - Arun Murthy's Keynote
 
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUGReal-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
 
How Customers are Optimizing their EDW for Fast, Secure, and Effective Insights
How Customers are Optimizing their EDW for Fast, Secure, and Effective InsightsHow Customers are Optimizing their EDW for Fast, Secure, and Effective Insights
How Customers are Optimizing their EDW for Fast, Secure, and Effective Insights
 
What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?
 
Hortonworks Open Connected Data Platforms for IoT and Predictive Big Data Ana...
Hortonworks Open Connected Data Platforms for IoT and Predictive Big Data Ana...Hortonworks Open Connected Data Platforms for IoT and Predictive Big Data Ana...
Hortonworks Open Connected Data Platforms for IoT and Predictive Big Data Ana...
 
Open Source Data Management for Industry 4.0
Open Source Data Management for Industry 4.0Open Source Data Management for Industry 4.0
Open Source Data Management for Industry 4.0
 
Deep Learning 101
Deep Learning 101Deep Learning 101
Deep Learning 101
 
Unlocking insights in streaming data
Unlocking insights in streaming dataUnlocking insights in streaming data
Unlocking insights in streaming data
 
Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?
 
Enterprise data science at scale
Enterprise data science at scaleEnterprise data science at scale
Enterprise data science at scale
 
Big Traffic, Big Trouble: Big Data - Tokyo
Big Traffic, Big Trouble: Big Data - TokyoBig Traffic, Big Trouble: Big Data - Tokyo
Big Traffic, Big Trouble: Big Data - Tokyo
 
Hortonworks DataFlow & Apache Nifi @Oslo Hadoop Big Data
Hortonworks DataFlow & Apache Nifi @Oslo Hadoop Big DataHortonworks DataFlow & Apache Nifi @Oslo Hadoop Big Data
Hortonworks DataFlow & Apache Nifi @Oslo Hadoop Big Data
 
Falcon Meetup
Falcon Meetup Falcon Meetup
Falcon Meetup
 

Similar to HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features

Next gen tooling for building streaming analytics apps: code-less development...
Next gen tooling for building streaming analytics apps: code-less development...Next gen tooling for building streaming analytics apps: code-less development...
Next gen tooling for building streaming analytics apps: code-less development...DataWorks Summit
 
Next Generation Tooling for building streaming analytics app
Next Generation Tooling for building streaming analytics appNext Generation Tooling for building streaming analytics app
Next Generation Tooling for building streaming analytics appgvetticaden
 
SAM - Streaming Analytics Made Easy
SAM - Streaming Analytics Made EasySAM - Streaming Analytics Made Easy
SAM - Streaming Analytics Made EasyDataWorks Summit
 
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...DataWorks Summit
 
Schema Registry & Stream Analytics Manager
Schema Registry  & Stream Analytics ManagerSchema Registry  & Stream Analytics Manager
Schema Registry & Stream Analytics ManagerSriharsha Chintalapani
 
SAM—streaming analytics made easy
SAM—streaming analytics made easySAM—streaming analytics made easy
SAM—streaming analytics made easyDataWorks Summit
 
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks
 
Trucking demo w Spark ML - Paul Hargis - Hortonworks
Trucking demo w Spark ML - Paul Hargis - HortonworksTrucking demo w Spark ML - Paul Hargis - Hortonworks
Trucking demo w Spark ML - Paul Hargis - HortonworksKelly Kohlleffel
 
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerCuring Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerHortonworks
 
Openshift YARN - strata 2014
Openshift YARN - strata 2014Openshift YARN - strata 2014
Openshift YARN - strata 2014Hortonworks
 
APIs and Services for Fleet Management - Talks given @ APIDays Berlin and Ba...
APIs and Services for  Fleet Management - Talks given @ APIDays Berlin and Ba...APIs and Services for  Fleet Management - Talks given @ APIDays Berlin and Ba...
APIs and Services for Fleet Management - Talks given @ APIDays Berlin and Ba...Toralf Richter
 
Vnv kumar performance testing
Vnv kumar performance testingVnv kumar performance testing
Vnv kumar performance testingVinay Kumar
 
Make Streaming Analytics work for you: The Devil is in the Details
Make Streaming Analytics work for you: The Devil is in the DetailsMake Streaming Analytics work for you: The Devil is in the Details
Make Streaming Analytics work for you: The Devil is in the DetailsDataWorks Summit/Hadoop Summit
 
Linux Akraino Blueprint
Linux Akraino BlueprintLinux Akraino Blueprint
Linux Akraino BlueprintLiz Warner
 
What's New in the Winter '16 Release (4.2)
What's New in the Winter '16 Release (4.2)What's New in the Winter '16 Release (4.2)
What's New in the Winter '16 Release (4.2)AppDynamics
 
Resume_Sandip_Mohod_Java_9_plus_years_exp
Resume_Sandip_Mohod_Java_9_plus_years_expResume_Sandip_Mohod_Java_9_plus_years_exp
Resume_Sandip_Mohod_Java_9_plus_years_expSandip Mohod
 
Internet of things Crash Course Workshop
Internet of things Crash Course WorkshopInternet of things Crash Course Workshop
Internet of things Crash Course WorkshopDataWorks Summit
 

Similar to HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features (20)

Next gen tooling for building streaming analytics apps: code-less development...
Next gen tooling for building streaming analytics apps: code-less development...Next gen tooling for building streaming analytics apps: code-less development...
Next gen tooling for building streaming analytics apps: code-less development...
 
Next Generation Tooling for building streaming analytics app
Next Generation Tooling for building streaming analytics appNext Generation Tooling for building streaming analytics app
Next Generation Tooling for building streaming analytics app
 
Streamline - Stream Analytics for Everyone
Streamline - Stream Analytics for EveryoneStreamline - Stream Analytics for Everyone
Streamline - Stream Analytics for Everyone
 
SAM - Streaming Analytics Made Easy
SAM - Streaming Analytics Made EasySAM - Streaming Analytics Made Easy
SAM - Streaming Analytics Made Easy
 
Streaming analytics manager
Streaming analytics managerStreaming analytics manager
Streaming analytics manager
 
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
 
Schema Registry & Stream Analytics Manager
Schema Registry  & Stream Analytics ManagerSchema Registry  & Stream Analytics Manager
Schema Registry & Stream Analytics Manager
 
SAM—streaming analytics made easy
SAM—streaming analytics made easySAM—streaming analytics made easy
SAM—streaming analytics made easy
 
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
 
Trucking demo w Spark ML - Paul Hargis - Hortonworks
Trucking demo w Spark ML - Paul Hargis - HortonworksTrucking demo w Spark ML - Paul Hargis - Hortonworks
Trucking demo w Spark ML - Paul Hargis - Hortonworks
 
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerCuring Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
 
Openshift YARN - strata 2014
Openshift YARN - strata 2014Openshift YARN - strata 2014
Openshift YARN - strata 2014
 
APIs and Services for Fleet Management - Talks given @ APIDays Berlin and Ba...
APIs and Services for  Fleet Management - Talks given @ APIDays Berlin and Ba...APIs and Services for  Fleet Management - Talks given @ APIDays Berlin and Ba...
APIs and Services for Fleet Management - Talks given @ APIDays Berlin and Ba...
 
Hadoop Summit Tokyo Apache NiFi Crash Course
Hadoop Summit Tokyo Apache NiFi Crash CourseHadoop Summit Tokyo Apache NiFi Crash Course
Hadoop Summit Tokyo Apache NiFi Crash Course
 
Vnv kumar performance testing
Vnv kumar performance testingVnv kumar performance testing
Vnv kumar performance testing
 
Make Streaming Analytics work for you: The Devil is in the Details
Make Streaming Analytics work for you: The Devil is in the DetailsMake Streaming Analytics work for you: The Devil is in the Details
Make Streaming Analytics work for you: The Devil is in the Details
 
Linux Akraino Blueprint
Linux Akraino BlueprintLinux Akraino Blueprint
Linux Akraino Blueprint
 
What's New in the Winter '16 Release (4.2)
What's New in the Winter '16 Release (4.2)What's New in the Winter '16 Release (4.2)
What's New in the Winter '16 Release (4.2)
 
Resume_Sandip_Mohod_Java_9_plus_years_exp
Resume_Sandip_Mohod_Java_9_plus_years_expResume_Sandip_Mohod_Java_9_plus_years_exp
Resume_Sandip_Mohod_Java_9_plus_years_exp
 
Internet of things Crash Course Workshop
Internet of things Crash Course WorkshopInternet of things Crash Course Workshop
Internet of things Crash Course Workshop
 

More from Hortonworks

IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT StrategyIoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT StrategyHortonworks
 
Getting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with CloudbreakGetting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with CloudbreakHortonworks
 
Johns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log EventsJohns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log EventsHortonworks
 
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysCatch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysHortonworks
 
HDF 3.2 - What's New
HDF 3.2 - What's NewHDF 3.2 - What's New
HDF 3.2 - What's NewHortonworks
 
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsInterpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsHortonworks
 
IBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeIBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeHortonworks
 
Premier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidPremier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidHortonworks
 
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATATIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATAHortonworks
 
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Hortonworks
 
Making Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseMaking Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseHortonworks
 
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationWebinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationHortonworks
 
Driving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementDriving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementHortonworks
 
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks
 
4 Essential Steps for Managing Sensitive Data
4 Essential Steps for Managing Sensitive Data4 Essential Steps for Managing Sensitive Data
4 Essential Steps for Managing Sensitive DataHortonworks
 
5 Steps to Create a Company Culture that Embraces the Power of Data
5 Steps to Create a Company Culture that Embraces the Power of Data5 Steps to Create a Company Culture that Embraces the Power of Data
5 Steps to Create a Company Culture that Embraces the Power of DataHortonworks
 
Exploring the Heated-and Completely Unnecessary- Data Lake Debate
Exploring the Heated-and Completely Unnecessary- Data Lake DebateExploring the Heated-and Completely Unnecessary- Data Lake Debate
Exploring the Heated-and Completely Unnecessary- Data Lake DebateHortonworks
 
Sprint's Data Modernization Journey
Sprint's Data Modernization JourneySprint's Data Modernization Journey
Sprint's Data Modernization JourneyHortonworks
 
Enterprise Data Science at Scale Meetup - IBM and Hortonworks - Oct 2017
Enterprise Data Science at Scale Meetup - IBM and Hortonworks - Oct 2017 Enterprise Data Science at Scale Meetup - IBM and Hortonworks - Oct 2017
Enterprise Data Science at Scale Meetup - IBM and Hortonworks - Oct 2017 Hortonworks
 
Benefits of Transferring Real-Time Data to Hadoop at Scale
Benefits of Transferring Real-Time Data to Hadoop at ScaleBenefits of Transferring Real-Time Data to Hadoop at Scale
Benefits of Transferring Real-Time Data to Hadoop at ScaleHortonworks
 

More from Hortonworks (20)

IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT StrategyIoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
 
Getting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with CloudbreakGetting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with Cloudbreak
 
Johns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log EventsJohns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log Events
 
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysCatch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
 
HDF 3.2 - What's New
HDF 3.2 - What's NewHDF 3.2 - What's New
HDF 3.2 - What's New
 
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsInterpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
 
IBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeIBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data Landscape
 
Premier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidPremier Inside-Out: Apache Druid
Premier Inside-Out: Apache Druid
 
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATATIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
 
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
 
Making Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseMaking Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with Ease
 
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationWebinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
 
Driving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementDriving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data Management
 
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
 
4 Essential Steps for Managing Sensitive Data
4 Essential Steps for Managing Sensitive Data4 Essential Steps for Managing Sensitive Data
4 Essential Steps for Managing Sensitive Data
 
5 Steps to Create a Company Culture that Embraces the Power of Data
5 Steps to Create a Company Culture that Embraces the Power of Data5 Steps to Create a Company Culture that Embraces the Power of Data
5 Steps to Create a Company Culture that Embraces the Power of Data
 
Exploring the Heated-and Completely Unnecessary- Data Lake Debate
Exploring the Heated-and Completely Unnecessary- Data Lake DebateExploring the Heated-and Completely Unnecessary- Data Lake Debate
Exploring the Heated-and Completely Unnecessary- Data Lake Debate
 
Sprint's Data Modernization Journey
Sprint's Data Modernization JourneySprint's Data Modernization Journey
Sprint's Data Modernization Journey
 
Enterprise Data Science at Scale Meetup - IBM and Hortonworks - Oct 2017
Enterprise Data Science at Scale Meetup - IBM and Hortonworks - Oct 2017 Enterprise Data Science at Scale Meetup - IBM and Hortonworks - Oct 2017
Enterprise Data Science at Scale Meetup - IBM and Hortonworks - Oct 2017
 
Benefits of Transferring Real-Time Data to Hadoop at Scale
Benefits of Transferring Real-Time Data to Hadoop at ScaleBenefits of Transferring Real-Time Data to Hadoop at Scale
Benefits of Transferring Real-Time Data to Hadoop at Scale
 

Recently uploaded

(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 

Recently uploaded (20)

(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 

HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features

  • 1. 1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved HDF 3.1 Streaming Webinar George Vetticaden VP of Emerging Products
  • 2. 2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Reference Architecture: Real-time Streaming Analytics
  • 3. 3 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Trucking company w/ large fleet of international trucks A truck generates millions of events for a given route; an event could be:  'Normal' events: starting / stopping of the vehicle  ‘Violation’ events: speeding, excessive acceleration and breaking, unsafe tail distance  ‘Speed’ Events: The speed of a driver that comes in every minute. Company uses an application that monitors truck locations and violations from the truck/driver in real- time Route? Truck? Driver? Analysts query a broad history to understand if today’s violations are part of a larger problem with specific routes, trucks, or drivers
  • 4. 4 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Event Broker Cluster Sensor Sources Truck Sensors Truck Sensors Truck Sensors Truck Sensors Real-time Analytics Architecture with HDF Flow Management Clusters Ingress Gateway Nifi Site to Site Protocol Egress Gateway Cloud Instance in Different Geo Locations China Cloud Instance (IBM) Germany Cloud Instance (Azure) US Cloud Instance (Amazon) Stream Analytics Cluster Ingest Streams Generate Insights Real-Time Apps Real-time Apps & Exploration Platform Centralized Schema Repository
  • 5. 5 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Event Broker Cluster Sensor Sources Truck Sensors Truck Sensors Truck Sensors Truck Sensors Implement the Flow App with HDF Flow Management Clusters Ingress Gateway Nifi Site to Site Protocol Egress Gateway Cloud Instance in Different Geo Locations China Cloud Instance (IBM) Germany Cloud Instance (Azure) US Cloud Instance (Amazon) Stream Analytics Cluster Ingest Streams Generate Insights Real-Time Apps Real-time Apps & Exploration Platform Centralized Schema Repository
  • 6. 6 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Implementing the Flow Requirements with Apache NiFi and Kafka
  • 7. 7 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Apache Kafka 1.0 Key Highlights  Kafka 1.0 was the most asked for feature/component in HDF Kafka 1.0  Key Features introduced in Kaka 0.11/1.0 included – Kafka Message Header support – Transactional Support – Performance improvements  These features are critical for customers who are building streaming apps  Customer didn’t’ just want support for Kafka 1.0 but want full HDF integration with Kafka 1.0 including Nifi, Ambari, Ranger, and Atlas Integration Kafka 1.0 - Ambari  Ambari support for Kafka 1.0 – Install, configure, manage & monitor Kafka 1.0 multi-node secure clusters. Kafka 1.0 - Ranger  Ranger ACL support for Kafka 1.0 – Support for resource and tag based access control for Kafka 1.0 Kafka 1.0 - NiFi & SAM  New Nifi Kafka 1.0 Processors – Kafka header & transaction support  SAM Source/Sink for Kafka 1.0 Kafka 1.0 - Atlas  Lineage of Kafka Topics – Who are all the consumers and producers
  • 8. 8 © Hortonworks Inc. 2011 – 2016. All Rights Reserved HDF Ref App Data Sources: TruckGeoEvent and TruckSpeedEvent Streams eventType truckId driverId driverName longitude eventTime routeId eventSource 2018-01-22 15:00:58.493|1516654858493| truck_geo_event |40| 23 | G Vetticaden | 1090292248 | Peoria to Ceder Rapids Route 2 | Lane Departure| 40.7 | -89.52 |1| route latitude correlationId speed truckId driverId driverName eventTime routeIdeventSource 2018-01-22 15:00:58.673|1516654858673| truck_speed_event|40| 23 | G Vetticaden| 1090292248 | Peoria to Ceder Rapids Route 2 | 73 | route  Each Truck emits different event stream – Truck Geo Event – Truck Speed Event Truck Geo Event: Truck Speed Event: eventTimeLong eventTimeLong
  • 9. 9 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Common Flow Management Requirements Flow Requirement # Requirement Description Req. #1 Edge deployed data collection service needs to capture data from the two sensors and stream to an IOT gateway powered by Apache Kafka. Req. #2 The ingestion service will deliver events in CSV format from each sensor to a Kafka topic (call it raw-all_truck_events_csv) in a secure cluster Req. #3 Metadata headers need to be sent with each event like the schema key that identifies the schema for the event in a centralized schema registry store. Req. #4 Consumers of this raw sensor data need to inspect the meta headers to lookup the schema and do routing, filtering, and enrichment. Req. #5 Producers need to publish the enriched streams to their own respective Kafka topics for consumption for downstream analytics (let's call the Kafka topics for the two streams: truck_events_avro, truck_speed_events_avro) Req. #6 Agility is key. Developer should be able to create consumers and producers quickly in a code-less approach, preferably UI driven.
  • 10. 10 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Demo Flow Management App
  • 11. 11 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Event Broker Cluster Sensor Sources Truck Sensors Truck Sensors Truck Sensors Truck Sensors Implement the Streaming App with HDF Flow Management Clusters Ingress Gateway Nifi Site to Site Protocol Egress Gateway Cloud Instance in Different Geo Locations China Cloud Instance (IBM) Germany Cloud Instance (Azure) US Cloud Instance (Amazon) Stream Analytics Cluster Ingest Streams Generate Insights Real-Time Apps Real-time Apps & Exploration Platform Centralized Schema Repository
  • 12. 12 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Common Streaming Analytics Requirements Streaming Analytics Requirement # Requirement Description Req. #1 Create streams consuming from the two Kafka topics that NiFi delivered the enriched geo and speed streams to. Req. #2 Join the streams of the Geo and Speed sensors over a time based aggregation window. Req. #3 Apply rules on the stream to filter on events of interest. Req. #4 Enrich the stream with features required for a machine learning (ML) model. The enrichment entails performing lookups for driver HR info, hours/miles driven in the past week, weather info. Req. #5 Normalize the events in the stream to feed into the PMML model. Req. #6 Execute a predictive logistical regression model on the stream built with Spark ML to predict if a driver is in danger going to commit a violation. Req. #7 Alert and feed into real-time dashboard if model predicted a violation.
  • 13. 13 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Implementing the Streaming Analytics Requirements with SAM
  • 14. 14 © Hortonworks Inc. 2011 – 2016. All Rights Reserved SAM’s Value Proposition  Build and deploy complex stream analytics applications without writing any code  Only open source tool in the market with graphical programming paradigm  Speed time-to-market for complex stream apps  Build stream analytics apps without specialized skillsets.  Decouple data schema from the streaming application itself  Support multiple underlining streaming engine
  • 15. 15 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Who Uses SAM?
  • 16. 16 © Hortonworks Inc. 2011 – 2016. All Rights Reserved SAM is All about Doing Real-Time Analytics on the Stream Real-Time Prescriptive Analytics Real-Time Analytics Real-Time Predictive Analytics Real-Time Descriptive Analytics What should we do right now? What could happen now/soon? What is happening right now?
  • 17. 17 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Demo Real-Time Descriptive Analytics
  • 18. 18 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Predictive Analytics with SAM Real-Time Prescriptive Analytics Real-Time Analytics Real-Time Predictive Analytics Real-Time Descriptive Analytics What should we do right now? What could happen now/soon? What is happening right now?
  • 19. 19 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Predictive Analytics
  • 20. 20 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Real-Time Predictive Analytics  Question: No violation events but what might happen that I need to be worried about?  My data science team has a model that can predict that based on – Weather – Roads – Driver HR info like driver certification status, wagePlan – Driver timesheet info like hours, and miles logged over the last week
  • 21. 21 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Building the Predictive Model on HDP Identify suitable ML algorithms to train a model – we will use classification algorithms as we have labeled events data 2 Transform enriched events data to a format that is friendly to Spark MLlib – many ML libs expect training data in a certain format 3 Train a logistic classification Spark model on YARN, with above events as training input, and iterate to fine tune generated model 4 Explore small subset of events to identify predictive features and make a hypothesis. E.g. hypothesis: “foggy weather causes driver violations” 1
  • 22. 22 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Logistical Regression Model
  • 23. 23 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Scoring the Predictive Model on HDF Use SAM’s enrich/custom processors to enrich the event with the features required for the model6 Enrich with Features Use SAM’s projection/custom processors to transform/normalize the streaming event and the features required for the model 7 Transform/Normalize Use SAM’s PMML processor to score the model for each stream event with its required features8 Score Model Use SAM’s rule and notification processors to alert, notify and take action using the results of the model9 Alert / Notify / Action Export the Spark Mllib model and import into the HDF’s Model Registry 5 Model Registry
  • 24. 24 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Demo Predictive Analytics with SAM
  • 25. 25 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Atlas Integration
  • 26. 26 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Event Broker Cluster Sensor Sources Truck Sensors Truck Sensors Truck Sensors Truck Sensors Where did the data come from? Where did it go? Who Produced/Consumed? Flow Management Clusters Ingress Gateway Nifi Site to Site Protocol Egress Gateway Cloud Instance in Different Geo Locations China Cloud Instance (IBM) Germany Cloud Instance (Azure) US Cloud Instance (Amazon) Stream Analytics Cluster Ingest Streams Generate Insights Real-Time Apps Real-time Apps & Exploration Platform Centralized Schema Repository
  • 27. 27 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Problem Statement:  Common ref architectures use all the HDF components including Nifi for Flow and Storm/SAM for streaming analytics with Schema registry as the glue.  Hence, as data flows through these different components in HDF, governance requirements such as lineage/provenance, chain of custody, security, audit are key requirements for every large enterprise. Solution:  With HDF 3.1, Flow and Streaming components will be integrated with Atlas: – Nifi is integrated with Atlas so that Atlas contains meta information of Nifi Data Flows including source data and target systems of the flow – SAM/Storm is integrated with Atlas that contains meta information about SAM App topologies including source data and target systems Why Should You Care?  Atlas Integration with HDF components allows enterprise to meet governance requirements allowing users to track data as it travels across the data-in-motion platform (HDF) and into the data-at-rest platform (HDP). Apache Atlas Integration for Flow and Streaming Components
  • 28. 28 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Demo Atlas

Editor's Notes

  1. A tool using a graphically programming paradigm with drag and drop interface allowing customers to build and deploy complex stream applications without writing any code Bring complex stream apps to market considerably faster Allows any organization to to build stream analytics apps without specialized skillsets. The tool provides abstractions on the streaming engine used. The abstraction provides the ability to plugin in open source streaming engines (Storm, Spark, Flink, etc..) The only open source tool in the market that provides graphically programming paradigm to build streaming apps. Customers less concerned about what open source stream engine is used. Decouple Schema from the Stream App via integration with Schema Registry Combination of Flow Management’s Data Flow Management with Stream Processing’s Stream Builder provides a truly differentiated solution in the market to build end to end stream analytical apps