SlideShare a Scribd company logo
1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
HDF 3.1 Streaming Webinar
George Vetticaden
VP of Emerging Products
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Reference Architecture: Real-time
Streaming Analytics
3 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Trucking company w/ large fleet of international trucks
A truck generates millions of events for a given route;
an event could be:
 'Normal' events: starting / stopping of the vehicle
 ‘Violation’ events: speeding, excessive acceleration and
breaking, unsafe tail distance
 ‘Speed’ Events: The speed of a driver that comes in every
minute.
Company uses an application that monitors truck
locations and violations from the truck/driver in real-
time
Route?
Truck?
Driver?
Analysts query a broad
history to understand if
today’s violations are
part of a larger problem
with specific routes,
trucks, or drivers
4 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Event Broker Cluster
Sensor Sources
Truck Sensors
Truck Sensors
Truck Sensors
Truck Sensors
Real-time Analytics Architecture with HDF
Flow Management
Clusters
Ingress
Gateway
Nifi
Site to Site
Protocol
Egress
Gateway
Cloud Instance in
Different Geo
Locations
China Cloud Instance
(IBM)
Germany Cloud Instance
(Azure)
US Cloud Instance
(Amazon)
Stream Analytics Cluster
Ingest
Streams
Generate
Insights
Real-Time Apps
Real-time
Apps &
Exploration Platform
Centralized Schema
Repository
5 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Event Broker Cluster
Sensor Sources
Truck Sensors
Truck Sensors
Truck Sensors
Truck Sensors
Implement the Flow App with HDF
Flow Management
Clusters
Ingress
Gateway
Nifi
Site to Site
Protocol
Egress
Gateway
Cloud Instance in
Different Geo
Locations
China Cloud Instance
(IBM)
Germany Cloud Instance
(Azure)
US Cloud Instance
(Amazon)
Stream Analytics Cluster
Ingest
Streams
Generate
Insights
Real-Time Apps
Real-time
Apps &
Exploration Platform
Centralized Schema
Repository
6 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Implementing the Flow Requirements with Apache NiFi and Kafka
7 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Apache Kafka 1.0
Key Highlights
 Kafka 1.0 was the most asked for
feature/component in HDF Kafka 1.0
 Key Features introduced in Kaka
0.11/1.0 included
– Kafka Message Header support
– Transactional Support
– Performance improvements
 These features are critical for
customers who are building
streaming apps
 Customer didn’t’ just want support
for Kafka 1.0 but want full HDF
integration with Kafka 1.0 including
Nifi, Ambari, Ranger, and Atlas
Integration
Kafka 1.0 - Ambari
 Ambari support for Kafka 1.0
– Install, configure, manage & monitor Kafka 1.0
multi-node secure clusters.
Kafka 1.0 - Ranger
 Ranger ACL support for Kafka 1.0
– Support for resource and tag based access
control for Kafka 1.0
Kafka 1.0 - NiFi & SAM
 New Nifi Kafka 1.0 Processors
– Kafka header & transaction support
 SAM Source/Sink for Kafka 1.0
Kafka 1.0 - Atlas
 Lineage of Kafka Topics
– Who are all the consumers and producers
8 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
HDF Ref App Data Sources: TruckGeoEvent and TruckSpeedEvent Streams
eventType
truckId
driverId
driverName longitude
eventTime routeId
eventSource
2018-01-22 15:00:58.493|1516654858493| truck_geo_event |40| 23 | G Vetticaden | 1090292248 | Peoria to Ceder Rapids Route 2 | Lane Departure| 40.7 | -89.52 |1|
route
latitude
correlationId
speed
truckId
driverId
driverName
eventTime routeIdeventSource
2018-01-22 15:00:58.673|1516654858673| truck_speed_event|40| 23 | G Vetticaden| 1090292248 | Peoria to Ceder Rapids Route 2 | 73 |
route
 Each Truck emits different event stream
– Truck Geo Event
– Truck Speed Event
Truck Geo Event:
Truck Speed Event:
eventTimeLong
eventTimeLong
9 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Common Flow Management Requirements
Flow Requirement # Requirement Description
Req. #1 Edge deployed data collection service needs to capture data from the two
sensors and stream to an IOT gateway powered by Apache Kafka.
Req. #2 The ingestion service will deliver events in CSV format from each sensor to a
Kafka topic (call it raw-all_truck_events_csv) in a secure cluster
Req. #3 Metadata headers need to be sent with each event like the schema key that
identifies the schema for the event in a centralized schema registry store.
Req. #4 Consumers of this raw sensor data need to inspect the meta headers to
lookup the schema and do routing, filtering, and enrichment.
Req. #5 Producers need to publish the enriched streams to their own respective Kafka
topics for consumption for downstream analytics (let's call the Kafka topics for
the two streams: truck_events_avro, truck_speed_events_avro)
Req. #6 Agility is key. Developer should be able to create consumers and producers
quickly in a code-less approach, preferably UI driven.
10 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Demo Flow Management App
11 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Event Broker Cluster
Sensor Sources
Truck Sensors
Truck Sensors
Truck Sensors
Truck Sensors
Implement the Streaming App with HDF
Flow Management
Clusters
Ingress
Gateway
Nifi
Site to Site
Protocol
Egress
Gateway
Cloud Instance in
Different Geo
Locations
China Cloud Instance
(IBM)
Germany Cloud Instance
(Azure)
US Cloud Instance
(Amazon)
Stream Analytics Cluster
Ingest
Streams
Generate
Insights
Real-Time Apps
Real-time
Apps &
Exploration Platform
Centralized Schema
Repository
12 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Common Streaming Analytics Requirements
Streaming Analytics
Requirement #
Requirement Description
Req. #1 Create streams consuming from the two Kafka topics that NiFi delivered the
enriched geo and speed streams to.
Req. #2 Join the streams of the Geo and Speed sensors over a time based aggregation
window.
Req. #3 Apply rules on the stream to filter on events of interest.
Req. #4 Enrich the stream with features required for a machine learning (ML) model.
The enrichment entails performing lookups for driver HR info, hours/miles
driven in the past week, weather info.
Req. #5 Normalize the events in the stream to feed into the PMML model.
Req. #6 Execute a predictive logistical regression model on the stream built with
Spark ML to predict if a driver is in danger going to commit a violation.
Req. #7 Alert and feed into real-time dashboard if model predicted a violation.
13 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Implementing the Streaming Analytics Requirements with SAM
14 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
SAM’s Value Proposition
 Build and deploy complex stream analytics applications without writing any code
 Only open source tool in the market with graphical programming paradigm
 Speed time-to-market for complex stream apps
 Build stream analytics apps without specialized skillsets.
 Decouple data schema from the streaming application itself
 Support multiple underlining streaming engine
15 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Who Uses SAM?
16 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
SAM is All about Doing Real-Time Analytics on the Stream
Real-Time
Prescriptive
Analytics
Real-Time Analytics
Real-Time
Predictive
Analytics
Real-Time
Descriptive
Analytics
What should we do
right now?
What could happen
now/soon?
What is happening
right now?
17 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Demo Real-Time Descriptive Analytics
18 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Predictive Analytics with SAM
Real-Time
Prescriptive
Analytics
Real-Time Analytics
Real-Time
Predictive
Analytics
Real-Time
Descriptive
Analytics
What should we do
right now?
What could happen
now/soon?
What is happening
right now?
19 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Predictive Analytics
20 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Real-Time Predictive Analytics
 Question: No violation events but what might happen that I need to be worried about?
 My data science team has a model that can predict that based on
– Weather
– Roads
– Driver HR info like driver certification status, wagePlan
– Driver timesheet info like hours, and miles logged over the last week
21 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Building the Predictive Model on HDP
Identify suitable ML algorithms to train a model – we will
use classification algorithms as we have labeled events
data
2
Transform enriched events data to a format that is
friendly to Spark MLlib – many ML libs expect
training data in a certain format
3
Train a logistic classification Spark model on YARN, with
above events as training input, and iterate to fine tune
generated model
4
Explore small subset of events to identify predictive
features and make a hypothesis. E.g. hypothesis: “foggy
weather causes driver violations”
1
22 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Logistical Regression Model
23 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Scoring the Predictive Model on HDF
Use SAM’s enrich/custom processors to enrich the event
with the features required for the model6
Enrich with Features
Use SAM’s projection/custom processors to
transform/normalize the streaming event and the
features required for the model
7
Transform/Normalize
Use SAM’s PMML processor to score the model for each
stream event with its required features8
Score Model
Use SAM’s rule and notification processors to alert,
notify and take action using the results of the model9
Alert / Notify / Action
Export the Spark Mllib model and import into the HDF’s
Model Registry
5 Model
Registry
24 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Demo Predictive Analytics with SAM
25 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Atlas Integration
26 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Event Broker Cluster
Sensor Sources
Truck Sensors
Truck Sensors
Truck Sensors
Truck Sensors
Where did the data come from?
Where did it go? Who Produced/Consumed?
Flow Management
Clusters
Ingress
Gateway
Nifi
Site to Site
Protocol
Egress
Gateway
Cloud Instance in
Different Geo
Locations
China Cloud Instance
(IBM)
Germany Cloud Instance
(Azure)
US Cloud Instance
(Amazon)
Stream Analytics Cluster
Ingest
Streams
Generate
Insights
Real-Time Apps
Real-time
Apps &
Exploration Platform
Centralized Schema
Repository
27 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Problem Statement:
 Common ref architectures use all the HDF components including Nifi for Flow and Storm/SAM for
streaming analytics with Schema registry as the glue.
 Hence, as data flows through these different components in HDF, governance requirements such as
lineage/provenance, chain of custody, security, audit are key requirements for every large enterprise.
Solution:
 With HDF 3.1, Flow and Streaming components will be integrated with Atlas:
– Nifi is integrated with Atlas so that Atlas contains meta information of Nifi Data Flows including source data and target
systems of the flow
– SAM/Storm is integrated with Atlas that contains meta information about SAM App topologies including source data
and target systems
Why Should You Care?
 Atlas Integration with HDF components allows enterprise to meet governance requirements allowing
users to track data as it travels across the data-in-motion platform (HDF) and into the data-at-rest
platform (HDP).
Apache Atlas Integration for Flow and Streaming Components
28 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Demo Atlas

More Related Content

What's hot

Hortonworks - How Hadoop makes the successful Retailer.
Hortonworks - How Hadoop makes the successful Retailer. Hortonworks - How Hadoop makes the successful Retailer.
Hortonworks - How Hadoop makes the successful Retailer.
Mats Johansson
 
Risk listening: monitoring for profitable growth
Risk listening: monitoring for profitable growthRisk listening: monitoring for profitable growth
Risk listening: monitoring for profitable growth
DataWorks Summit
 
EDW Optimization: A Modern Twist on an Old Favorite
EDW Optimization: A Modern Twist on an Old FavoriteEDW Optimization: A Modern Twist on an Old Favorite
EDW Optimization: A Modern Twist on an Old Favorite
Hortonworks
 
Overcoming the AI hype — and what enterprises should really focus on
Overcoming the AI hype — and what enterprises should really focus onOvercoming the AI hype — and what enterprises should really focus on
Overcoming the AI hype — and what enterprises should really focus on
DataWorks Summit
 
Streamline Apache Hadoop Operations with Apache Ambari and SmartSense
Streamline Apache Hadoop Operations with Apache Ambari and SmartSenseStreamline Apache Hadoop Operations with Apache Ambari and SmartSense
Streamline Apache Hadoop Operations with Apache Ambari and SmartSense
Hortonworks
 
Pivotal - Advanced Analytics for Telecommunications
Pivotal - Advanced Analytics for Telecommunications Pivotal - Advanced Analytics for Telecommunications
Pivotal - Advanced Analytics for Telecommunications
Hortonworks
 
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseUsing Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
DataWorks Summit
 
Spark Summit EMEA - Arun Murthy's Keynote
Spark Summit EMEA - Arun Murthy's KeynoteSpark Summit EMEA - Arun Murthy's Keynote
Spark Summit EMEA - Arun Murthy's Keynote
Hortonworks
 
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUGReal-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
skumpf
 
How Customers are Optimizing their EDW for Fast, Secure, and Effective Insights
How Customers are Optimizing their EDW for Fast, Secure, and Effective InsightsHow Customers are Optimizing their EDW for Fast, Secure, and Effective Insights
How Customers are Optimizing their EDW for Fast, Secure, and Effective Insights
Hortonworks
 
What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?
DataWorks Summit
 
Hortonworks Open Connected Data Platforms for IoT and Predictive Big Data Ana...
Hortonworks Open Connected Data Platforms for IoT and Predictive Big Data Ana...Hortonworks Open Connected Data Platforms for IoT and Predictive Big Data Ana...
Hortonworks Open Connected Data Platforms for IoT and Predictive Big Data Ana...
DataWorks Summit
 
Open Source Data Management for Industry 4.0
Open Source Data Management for Industry 4.0Open Source Data Management for Industry 4.0
Open Source Data Management for Industry 4.0
DataWorks Summit
 
Deep Learning 101
Deep Learning 101Deep Learning 101
Deep Learning 101
DataWorks Summit
 
Unlocking insights in streaming data
Unlocking insights in streaming dataUnlocking insights in streaming data
Unlocking insights in streaming data
Carolyn Duby
 
Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?
DataWorks Summit
 
Enterprise data science at scale
Enterprise data science at scaleEnterprise data science at scale
Enterprise data science at scale
Carolyn Duby
 
Big Traffic, Big Trouble: Big Data - Tokyo
Big Traffic, Big Trouble: Big Data - TokyoBig Traffic, Big Trouble: Big Data - Tokyo
Big Traffic, Big Trouble: Big Data - Tokyo
DataWorks Summit
 
Hortonworks DataFlow & Apache Nifi @Oslo Hadoop Big Data
Hortonworks DataFlow & Apache Nifi @Oslo Hadoop Big DataHortonworks DataFlow & Apache Nifi @Oslo Hadoop Big Data
Hortonworks DataFlow & Apache Nifi @Oslo Hadoop Big Data
Mats Johansson
 
Falcon Meetup
Falcon Meetup Falcon Meetup
Falcon Meetup
Hortonworks
 

What's hot (20)

Hortonworks - How Hadoop makes the successful Retailer.
Hortonworks - How Hadoop makes the successful Retailer. Hortonworks - How Hadoop makes the successful Retailer.
Hortonworks - How Hadoop makes the successful Retailer.
 
Risk listening: monitoring for profitable growth
Risk listening: monitoring for profitable growthRisk listening: monitoring for profitable growth
Risk listening: monitoring for profitable growth
 
EDW Optimization: A Modern Twist on an Old Favorite
EDW Optimization: A Modern Twist on an Old FavoriteEDW Optimization: A Modern Twist on an Old Favorite
EDW Optimization: A Modern Twist on an Old Favorite
 
Overcoming the AI hype — and what enterprises should really focus on
Overcoming the AI hype — and what enterprises should really focus onOvercoming the AI hype — and what enterprises should really focus on
Overcoming the AI hype — and what enterprises should really focus on
 
Streamline Apache Hadoop Operations with Apache Ambari and SmartSense
Streamline Apache Hadoop Operations with Apache Ambari and SmartSenseStreamline Apache Hadoop Operations with Apache Ambari and SmartSense
Streamline Apache Hadoop Operations with Apache Ambari and SmartSense
 
Pivotal - Advanced Analytics for Telecommunications
Pivotal - Advanced Analytics for Telecommunications Pivotal - Advanced Analytics for Telecommunications
Pivotal - Advanced Analytics for Telecommunications
 
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseUsing Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
 
Spark Summit EMEA - Arun Murthy's Keynote
Spark Summit EMEA - Arun Murthy's KeynoteSpark Summit EMEA - Arun Murthy's Keynote
Spark Summit EMEA - Arun Murthy's Keynote
 
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUGReal-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
 
How Customers are Optimizing their EDW for Fast, Secure, and Effective Insights
How Customers are Optimizing their EDW for Fast, Secure, and Effective InsightsHow Customers are Optimizing their EDW for Fast, Secure, and Effective Insights
How Customers are Optimizing their EDW for Fast, Secure, and Effective Insights
 
What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?
 
Hortonworks Open Connected Data Platforms for IoT and Predictive Big Data Ana...
Hortonworks Open Connected Data Platforms for IoT and Predictive Big Data Ana...Hortonworks Open Connected Data Platforms for IoT and Predictive Big Data Ana...
Hortonworks Open Connected Data Platforms for IoT and Predictive Big Data Ana...
 
Open Source Data Management for Industry 4.0
Open Source Data Management for Industry 4.0Open Source Data Management for Industry 4.0
Open Source Data Management for Industry 4.0
 
Deep Learning 101
Deep Learning 101Deep Learning 101
Deep Learning 101
 
Unlocking insights in streaming data
Unlocking insights in streaming dataUnlocking insights in streaming data
Unlocking insights in streaming data
 
Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?
 
Enterprise data science at scale
Enterprise data science at scaleEnterprise data science at scale
Enterprise data science at scale
 
Big Traffic, Big Trouble: Big Data - Tokyo
Big Traffic, Big Trouble: Big Data - TokyoBig Traffic, Big Trouble: Big Data - Tokyo
Big Traffic, Big Trouble: Big Data - Tokyo
 
Hortonworks DataFlow & Apache Nifi @Oslo Hadoop Big Data
Hortonworks DataFlow & Apache Nifi @Oslo Hadoop Big DataHortonworks DataFlow & Apache Nifi @Oslo Hadoop Big Data
Hortonworks DataFlow & Apache Nifi @Oslo Hadoop Big Data
 
Falcon Meetup
Falcon Meetup Falcon Meetup
Falcon Meetup
 

Similar to HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features

Next gen tooling for building streaming analytics apps: code-less development...
Next gen tooling for building streaming analytics apps: code-less development...Next gen tooling for building streaming analytics apps: code-less development...
Next gen tooling for building streaming analytics apps: code-less development...
DataWorks Summit
 
Next Generation Tooling for building streaming analytics app
Next Generation Tooling for building streaming analytics appNext Generation Tooling for building streaming analytics app
Next Generation Tooling for building streaming analytics app
gvetticaden
 
Streamline - Stream Analytics for Everyone
Streamline - Stream Analytics for EveryoneStreamline - Stream Analytics for Everyone
Streamline - Stream Analytics for Everyone
DataWorks Summit/Hadoop Summit
 
SAM - Streaming Analytics Made Easy
SAM - Streaming Analytics Made EasySAM - Streaming Analytics Made Easy
SAM - Streaming Analytics Made Easy
DataWorks Summit
 
Streaming analytics manager
Streaming analytics managerStreaming analytics manager
Streaming analytics manager
Sriharsha Chintalapani
 
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
DataWorks Summit
 
Schema Registry & Stream Analytics Manager
Schema Registry  & Stream Analytics ManagerSchema Registry  & Stream Analytics Manager
Schema Registry & Stream Analytics Manager
Sriharsha Chintalapani
 
SAM—streaming analytics made easy
SAM—streaming analytics made easySAM—streaming analytics made easy
SAM—streaming analytics made easy
DataWorks Summit
 
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks
 
Trucking demo w Spark ML - Paul Hargis - Hortonworks
Trucking demo w Spark ML - Paul Hargis - HortonworksTrucking demo w Spark ML - Paul Hargis - Hortonworks
Trucking demo w Spark ML - Paul Hargis - Hortonworks
Kelly Kohlleffel
 
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerCuring Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
Hortonworks
 
Openshift YARN - strata 2014
Openshift YARN - strata 2014Openshift YARN - strata 2014
Openshift YARN - strata 2014
Hortonworks
 
APIs and Services for Fleet Management - Talks given @ APIDays Berlin and Ba...
APIs and Services for  Fleet Management - Talks given @ APIDays Berlin and Ba...APIs and Services for  Fleet Management - Talks given @ APIDays Berlin and Ba...
APIs and Services for Fleet Management - Talks given @ APIDays Berlin and Ba...
Toralf Richter
 
Hadoop Summit Tokyo Apache NiFi Crash Course
Hadoop Summit Tokyo Apache NiFi Crash CourseHadoop Summit Tokyo Apache NiFi Crash Course
Hadoop Summit Tokyo Apache NiFi Crash Course
DataWorks Summit/Hadoop Summit
 
Vnv kumar performance testing
Vnv kumar performance testingVnv kumar performance testing
Vnv kumar performance testing
Vinay Kumar
 
Make Streaming Analytics work for you: The Devil is in the Details
Make Streaming Analytics work for you: The Devil is in the DetailsMake Streaming Analytics work for you: The Devil is in the Details
Make Streaming Analytics work for you: The Devil is in the Details
DataWorks Summit/Hadoop Summit
 
Linux Akraino Blueprint
Linux Akraino BlueprintLinux Akraino Blueprint
Linux Akraino Blueprint
Liz Warner
 
What's New in the Winter '16 Release (4.2)
What's New in the Winter '16 Release (4.2)What's New in the Winter '16 Release (4.2)
What's New in the Winter '16 Release (4.2)
AppDynamics
 
Resume_Sandip_Mohod_Java_9_plus_years_exp
Resume_Sandip_Mohod_Java_9_plus_years_expResume_Sandip_Mohod_Java_9_plus_years_exp
Resume_Sandip_Mohod_Java_9_plus_years_exp
Sandip Mohod
 
Internet of things Crash Course Workshop
Internet of things Crash Course WorkshopInternet of things Crash Course Workshop
Internet of things Crash Course Workshop
DataWorks Summit
 

Similar to HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features (20)

Next gen tooling for building streaming analytics apps: code-less development...
Next gen tooling for building streaming analytics apps: code-less development...Next gen tooling for building streaming analytics apps: code-less development...
Next gen tooling for building streaming analytics apps: code-less development...
 
Next Generation Tooling for building streaming analytics app
Next Generation Tooling for building streaming analytics appNext Generation Tooling for building streaming analytics app
Next Generation Tooling for building streaming analytics app
 
Streamline - Stream Analytics for Everyone
Streamline - Stream Analytics for EveryoneStreamline - Stream Analytics for Everyone
Streamline - Stream Analytics for Everyone
 
SAM - Streaming Analytics Made Easy
SAM - Streaming Analytics Made EasySAM - Streaming Analytics Made Easy
SAM - Streaming Analytics Made Easy
 
Streaming analytics manager
Streaming analytics managerStreaming analytics manager
Streaming analytics manager
 
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
 
Schema Registry & Stream Analytics Manager
Schema Registry  & Stream Analytics ManagerSchema Registry  & Stream Analytics Manager
Schema Registry & Stream Analytics Manager
 
SAM—streaming analytics made easy
SAM—streaming analytics made easySAM—streaming analytics made easy
SAM—streaming analytics made easy
 
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
 
Trucking demo w Spark ML - Paul Hargis - Hortonworks
Trucking demo w Spark ML - Paul Hargis - HortonworksTrucking demo w Spark ML - Paul Hargis - Hortonworks
Trucking demo w Spark ML - Paul Hargis - Hortonworks
 
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerCuring Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
 
Openshift YARN - strata 2014
Openshift YARN - strata 2014Openshift YARN - strata 2014
Openshift YARN - strata 2014
 
APIs and Services for Fleet Management - Talks given @ APIDays Berlin and Ba...
APIs and Services for  Fleet Management - Talks given @ APIDays Berlin and Ba...APIs and Services for  Fleet Management - Talks given @ APIDays Berlin and Ba...
APIs and Services for Fleet Management - Talks given @ APIDays Berlin and Ba...
 
Hadoop Summit Tokyo Apache NiFi Crash Course
Hadoop Summit Tokyo Apache NiFi Crash CourseHadoop Summit Tokyo Apache NiFi Crash Course
Hadoop Summit Tokyo Apache NiFi Crash Course
 
Vnv kumar performance testing
Vnv kumar performance testingVnv kumar performance testing
Vnv kumar performance testing
 
Make Streaming Analytics work for you: The Devil is in the Details
Make Streaming Analytics work for you: The Devil is in the DetailsMake Streaming Analytics work for you: The Devil is in the Details
Make Streaming Analytics work for you: The Devil is in the Details
 
Linux Akraino Blueprint
Linux Akraino BlueprintLinux Akraino Blueprint
Linux Akraino Blueprint
 
What's New in the Winter '16 Release (4.2)
What's New in the Winter '16 Release (4.2)What's New in the Winter '16 Release (4.2)
What's New in the Winter '16 Release (4.2)
 
Resume_Sandip_Mohod_Java_9_plus_years_exp
Resume_Sandip_Mohod_Java_9_plus_years_expResume_Sandip_Mohod_Java_9_plus_years_exp
Resume_Sandip_Mohod_Java_9_plus_years_exp
 
Internet of things Crash Course Workshop
Internet of things Crash Course WorkshopInternet of things Crash Course Workshop
Internet of things Crash Course Workshop
 

More from Hortonworks

IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT StrategyIoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
Hortonworks
 
Getting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with CloudbreakGetting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with Cloudbreak
Hortonworks
 
Johns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log EventsJohns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log Events
Hortonworks
 
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysCatch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Hortonworks
 
HDF 3.2 - What's New
HDF 3.2 - What's NewHDF 3.2 - What's New
HDF 3.2 - What's New
Hortonworks
 
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsInterpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
Hortonworks
 
IBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeIBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data Landscape
Hortonworks
 
Premier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidPremier Inside-Out: Apache Druid
Premier Inside-Out: Apache Druid
Hortonworks
 
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATATIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
Hortonworks
 
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Hortonworks
 
Making Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseMaking Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with Ease
Hortonworks
 
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationWebinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
Hortonworks
 
Driving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementDriving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data Management
Hortonworks
 
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks
 
4 Essential Steps for Managing Sensitive Data
4 Essential Steps for Managing Sensitive Data4 Essential Steps for Managing Sensitive Data
4 Essential Steps for Managing Sensitive Data
Hortonworks
 
5 Steps to Create a Company Culture that Embraces the Power of Data
5 Steps to Create a Company Culture that Embraces the Power of Data5 Steps to Create a Company Culture that Embraces the Power of Data
5 Steps to Create a Company Culture that Embraces the Power of Data
Hortonworks
 
Exploring the Heated-and Completely Unnecessary- Data Lake Debate
Exploring the Heated-and Completely Unnecessary- Data Lake DebateExploring the Heated-and Completely Unnecessary- Data Lake Debate
Exploring the Heated-and Completely Unnecessary- Data Lake Debate
Hortonworks
 
Sprint's Data Modernization Journey
Sprint's Data Modernization JourneySprint's Data Modernization Journey
Sprint's Data Modernization Journey
Hortonworks
 
Enterprise Data Science at Scale Meetup - IBM and Hortonworks - Oct 2017
Enterprise Data Science at Scale Meetup - IBM and Hortonworks - Oct 2017 Enterprise Data Science at Scale Meetup - IBM and Hortonworks - Oct 2017
Enterprise Data Science at Scale Meetup - IBM and Hortonworks - Oct 2017
Hortonworks
 
Benefits of Transferring Real-Time Data to Hadoop at Scale
Benefits of Transferring Real-Time Data to Hadoop at ScaleBenefits of Transferring Real-Time Data to Hadoop at Scale
Benefits of Transferring Real-Time Data to Hadoop at Scale
Hortonworks
 

More from Hortonworks (20)

IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT StrategyIoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
 
Getting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with CloudbreakGetting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with Cloudbreak
 
Johns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log EventsJohns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log Events
 
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysCatch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
 
HDF 3.2 - What's New
HDF 3.2 - What's NewHDF 3.2 - What's New
HDF 3.2 - What's New
 
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsInterpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
 
IBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeIBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data Landscape
 
Premier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidPremier Inside-Out: Apache Druid
Premier Inside-Out: Apache Druid
 
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATATIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
 
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
 
Making Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseMaking Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with Ease
 
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationWebinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
 
Driving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementDriving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data Management
 
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
 
4 Essential Steps for Managing Sensitive Data
4 Essential Steps for Managing Sensitive Data4 Essential Steps for Managing Sensitive Data
4 Essential Steps for Managing Sensitive Data
 
5 Steps to Create a Company Culture that Embraces the Power of Data
5 Steps to Create a Company Culture that Embraces the Power of Data5 Steps to Create a Company Culture that Embraces the Power of Data
5 Steps to Create a Company Culture that Embraces the Power of Data
 
Exploring the Heated-and Completely Unnecessary- Data Lake Debate
Exploring the Heated-and Completely Unnecessary- Data Lake DebateExploring the Heated-and Completely Unnecessary- Data Lake Debate
Exploring the Heated-and Completely Unnecessary- Data Lake Debate
 
Sprint's Data Modernization Journey
Sprint's Data Modernization JourneySprint's Data Modernization Journey
Sprint's Data Modernization Journey
 
Enterprise Data Science at Scale Meetup - IBM and Hortonworks - Oct 2017
Enterprise Data Science at Scale Meetup - IBM and Hortonworks - Oct 2017 Enterprise Data Science at Scale Meetup - IBM and Hortonworks - Oct 2017
Enterprise Data Science at Scale Meetup - IBM and Hortonworks - Oct 2017
 
Benefits of Transferring Real-Time Data to Hadoop at Scale
Benefits of Transferring Real-Time Data to Hadoop at ScaleBenefits of Transferring Real-Time Data to Hadoop at Scale
Benefits of Transferring Real-Time Data to Hadoop at Scale
 

Recently uploaded

“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
Edge AI and Vision Alliance
 
GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
Javier Junquera
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
MichaelKnudsen27
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Alpen-Adria-Universität
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
Miro Wengner
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
AstuteBusiness
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
Antonios Katsarakis
 
The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
operationspcvita
 
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
Edge AI and Vision Alliance
 
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
Fwdays
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Tosin Akinosho
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsConnector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
DianaGray10
 
Harnessing the Power of NLP and Knowledge Graphs for Opioid Research
Harnessing the Power of NLP and Knowledge Graphs for Opioid ResearchHarnessing the Power of NLP and Knowledge Graphs for Opioid Research
Harnessing the Power of NLP and Knowledge Graphs for Opioid Research
Neo4j
 
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyFreshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
ScyllaDB
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
Ivanti
 
AppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSFAppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSF
Ajin Abraham
 
Mutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented ChatbotsMutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented Chatbots
Pablo Gómez Abajo
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
ssuserfac0301
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 

Recently uploaded (20)

“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
 
GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
 
The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
 
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
 
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsConnector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
 
Harnessing the Power of NLP and Knowledge Graphs for Opioid Research
Harnessing the Power of NLP and Knowledge Graphs for Opioid ResearchHarnessing the Power of NLP and Knowledge Graphs for Opioid Research
Harnessing the Power of NLP and Knowledge Graphs for Opioid Research
 
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyFreshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
 
AppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSFAppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSF
 
Mutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented ChatbotsMutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented Chatbots
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 

HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features

  • 1. 1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved HDF 3.1 Streaming Webinar George Vetticaden VP of Emerging Products
  • 2. 2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Reference Architecture: Real-time Streaming Analytics
  • 3. 3 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Trucking company w/ large fleet of international trucks A truck generates millions of events for a given route; an event could be:  'Normal' events: starting / stopping of the vehicle  ‘Violation’ events: speeding, excessive acceleration and breaking, unsafe tail distance  ‘Speed’ Events: The speed of a driver that comes in every minute. Company uses an application that monitors truck locations and violations from the truck/driver in real- time Route? Truck? Driver? Analysts query a broad history to understand if today’s violations are part of a larger problem with specific routes, trucks, or drivers
  • 4. 4 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Event Broker Cluster Sensor Sources Truck Sensors Truck Sensors Truck Sensors Truck Sensors Real-time Analytics Architecture with HDF Flow Management Clusters Ingress Gateway Nifi Site to Site Protocol Egress Gateway Cloud Instance in Different Geo Locations China Cloud Instance (IBM) Germany Cloud Instance (Azure) US Cloud Instance (Amazon) Stream Analytics Cluster Ingest Streams Generate Insights Real-Time Apps Real-time Apps & Exploration Platform Centralized Schema Repository
  • 5. 5 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Event Broker Cluster Sensor Sources Truck Sensors Truck Sensors Truck Sensors Truck Sensors Implement the Flow App with HDF Flow Management Clusters Ingress Gateway Nifi Site to Site Protocol Egress Gateway Cloud Instance in Different Geo Locations China Cloud Instance (IBM) Germany Cloud Instance (Azure) US Cloud Instance (Amazon) Stream Analytics Cluster Ingest Streams Generate Insights Real-Time Apps Real-time Apps & Exploration Platform Centralized Schema Repository
  • 6. 6 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Implementing the Flow Requirements with Apache NiFi and Kafka
  • 7. 7 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Apache Kafka 1.0 Key Highlights  Kafka 1.0 was the most asked for feature/component in HDF Kafka 1.0  Key Features introduced in Kaka 0.11/1.0 included – Kafka Message Header support – Transactional Support – Performance improvements  These features are critical for customers who are building streaming apps  Customer didn’t’ just want support for Kafka 1.0 but want full HDF integration with Kafka 1.0 including Nifi, Ambari, Ranger, and Atlas Integration Kafka 1.0 - Ambari  Ambari support for Kafka 1.0 – Install, configure, manage & monitor Kafka 1.0 multi-node secure clusters. Kafka 1.0 - Ranger  Ranger ACL support for Kafka 1.0 – Support for resource and tag based access control for Kafka 1.0 Kafka 1.0 - NiFi & SAM  New Nifi Kafka 1.0 Processors – Kafka header & transaction support  SAM Source/Sink for Kafka 1.0 Kafka 1.0 - Atlas  Lineage of Kafka Topics – Who are all the consumers and producers
  • 8. 8 © Hortonworks Inc. 2011 – 2016. All Rights Reserved HDF Ref App Data Sources: TruckGeoEvent and TruckSpeedEvent Streams eventType truckId driverId driverName longitude eventTime routeId eventSource 2018-01-22 15:00:58.493|1516654858493| truck_geo_event |40| 23 | G Vetticaden | 1090292248 | Peoria to Ceder Rapids Route 2 | Lane Departure| 40.7 | -89.52 |1| route latitude correlationId speed truckId driverId driverName eventTime routeIdeventSource 2018-01-22 15:00:58.673|1516654858673| truck_speed_event|40| 23 | G Vetticaden| 1090292248 | Peoria to Ceder Rapids Route 2 | 73 | route  Each Truck emits different event stream – Truck Geo Event – Truck Speed Event Truck Geo Event: Truck Speed Event: eventTimeLong eventTimeLong
  • 9. 9 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Common Flow Management Requirements Flow Requirement # Requirement Description Req. #1 Edge deployed data collection service needs to capture data from the two sensors and stream to an IOT gateway powered by Apache Kafka. Req. #2 The ingestion service will deliver events in CSV format from each sensor to a Kafka topic (call it raw-all_truck_events_csv) in a secure cluster Req. #3 Metadata headers need to be sent with each event like the schema key that identifies the schema for the event in a centralized schema registry store. Req. #4 Consumers of this raw sensor data need to inspect the meta headers to lookup the schema and do routing, filtering, and enrichment. Req. #5 Producers need to publish the enriched streams to their own respective Kafka topics for consumption for downstream analytics (let's call the Kafka topics for the two streams: truck_events_avro, truck_speed_events_avro) Req. #6 Agility is key. Developer should be able to create consumers and producers quickly in a code-less approach, preferably UI driven.
  • 10. 10 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Demo Flow Management App
  • 11. 11 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Event Broker Cluster Sensor Sources Truck Sensors Truck Sensors Truck Sensors Truck Sensors Implement the Streaming App with HDF Flow Management Clusters Ingress Gateway Nifi Site to Site Protocol Egress Gateway Cloud Instance in Different Geo Locations China Cloud Instance (IBM) Germany Cloud Instance (Azure) US Cloud Instance (Amazon) Stream Analytics Cluster Ingest Streams Generate Insights Real-Time Apps Real-time Apps & Exploration Platform Centralized Schema Repository
  • 12. 12 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Common Streaming Analytics Requirements Streaming Analytics Requirement # Requirement Description Req. #1 Create streams consuming from the two Kafka topics that NiFi delivered the enriched geo and speed streams to. Req. #2 Join the streams of the Geo and Speed sensors over a time based aggregation window. Req. #3 Apply rules on the stream to filter on events of interest. Req. #4 Enrich the stream with features required for a machine learning (ML) model. The enrichment entails performing lookups for driver HR info, hours/miles driven in the past week, weather info. Req. #5 Normalize the events in the stream to feed into the PMML model. Req. #6 Execute a predictive logistical regression model on the stream built with Spark ML to predict if a driver is in danger going to commit a violation. Req. #7 Alert and feed into real-time dashboard if model predicted a violation.
  • 13. 13 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Implementing the Streaming Analytics Requirements with SAM
  • 14. 14 © Hortonworks Inc. 2011 – 2016. All Rights Reserved SAM’s Value Proposition  Build and deploy complex stream analytics applications without writing any code  Only open source tool in the market with graphical programming paradigm  Speed time-to-market for complex stream apps  Build stream analytics apps without specialized skillsets.  Decouple data schema from the streaming application itself  Support multiple underlining streaming engine
  • 15. 15 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Who Uses SAM?
  • 16. 16 © Hortonworks Inc. 2011 – 2016. All Rights Reserved SAM is All about Doing Real-Time Analytics on the Stream Real-Time Prescriptive Analytics Real-Time Analytics Real-Time Predictive Analytics Real-Time Descriptive Analytics What should we do right now? What could happen now/soon? What is happening right now?
  • 17. 17 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Demo Real-Time Descriptive Analytics
  • 18. 18 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Predictive Analytics with SAM Real-Time Prescriptive Analytics Real-Time Analytics Real-Time Predictive Analytics Real-Time Descriptive Analytics What should we do right now? What could happen now/soon? What is happening right now?
  • 19. 19 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Predictive Analytics
  • 20. 20 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Real-Time Predictive Analytics  Question: No violation events but what might happen that I need to be worried about?  My data science team has a model that can predict that based on – Weather – Roads – Driver HR info like driver certification status, wagePlan – Driver timesheet info like hours, and miles logged over the last week
  • 21. 21 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Building the Predictive Model on HDP Identify suitable ML algorithms to train a model – we will use classification algorithms as we have labeled events data 2 Transform enriched events data to a format that is friendly to Spark MLlib – many ML libs expect training data in a certain format 3 Train a logistic classification Spark model on YARN, with above events as training input, and iterate to fine tune generated model 4 Explore small subset of events to identify predictive features and make a hypothesis. E.g. hypothesis: “foggy weather causes driver violations” 1
  • 22. 22 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Logistical Regression Model
  • 23. 23 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Scoring the Predictive Model on HDF Use SAM’s enrich/custom processors to enrich the event with the features required for the model6 Enrich with Features Use SAM’s projection/custom processors to transform/normalize the streaming event and the features required for the model 7 Transform/Normalize Use SAM’s PMML processor to score the model for each stream event with its required features8 Score Model Use SAM’s rule and notification processors to alert, notify and take action using the results of the model9 Alert / Notify / Action Export the Spark Mllib model and import into the HDF’s Model Registry 5 Model Registry
  • 24. 24 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Demo Predictive Analytics with SAM
  • 25. 25 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Atlas Integration
  • 26. 26 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Event Broker Cluster Sensor Sources Truck Sensors Truck Sensors Truck Sensors Truck Sensors Where did the data come from? Where did it go? Who Produced/Consumed? Flow Management Clusters Ingress Gateway Nifi Site to Site Protocol Egress Gateway Cloud Instance in Different Geo Locations China Cloud Instance (IBM) Germany Cloud Instance (Azure) US Cloud Instance (Amazon) Stream Analytics Cluster Ingest Streams Generate Insights Real-Time Apps Real-time Apps & Exploration Platform Centralized Schema Repository
  • 27. 27 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Problem Statement:  Common ref architectures use all the HDF components including Nifi for Flow and Storm/SAM for streaming analytics with Schema registry as the glue.  Hence, as data flows through these different components in HDF, governance requirements such as lineage/provenance, chain of custody, security, audit are key requirements for every large enterprise. Solution:  With HDF 3.1, Flow and Streaming components will be integrated with Atlas: – Nifi is integrated with Atlas so that Atlas contains meta information of Nifi Data Flows including source data and target systems of the flow – SAM/Storm is integrated with Atlas that contains meta information about SAM App topologies including source data and target systems Why Should You Care?  Atlas Integration with HDF components allows enterprise to meet governance requirements allowing users to track data as it travels across the data-in-motion platform (HDF) and into the data-at-rest platform (HDP). Apache Atlas Integration for Flow and Streaming Components
  • 28. 28 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Demo Atlas

Editor's Notes

  1. A tool using a graphically programming paradigm with drag and drop interface allowing customers to build and deploy complex stream applications without writing any code Bring complex stream apps to market considerably faster Allows any organization to to build stream analytics apps without specialized skillsets. The tool provides abstractions on the streaming engine used. The abstraction provides the ability to plugin in open source streaming engines (Storm, Spark, Flink, etc..) The only open source tool in the market that provides graphically programming paradigm to build streaming apps. Customers less concerned about what open source stream engine is used. Decouple Schema from the Stream App via integration with Schema Registry Combination of Flow Management’s Data Flow Management with Stream Processing’s Stream Builder provides a truly differentiated solution in the market to build end to end stream analytical apps