SlideShare a Scribd company logo
1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
HDF 3.1 Streaming Webinar
George Vetticaden
VP of Emerging Products
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Reference Architecture: Real-time
Streaming Analytics
3 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Trucking company w/ large fleet of international trucks
A truck generates millions of events for a given route;
an event could be:
 'Normal' events: starting / stopping of the vehicle
 ‘Violation’ events: speeding, excessive acceleration and
breaking, unsafe tail distance
 ‘Speed’ Events: The speed of a driver that comes in every
minute.
Company uses an application that monitors truck
locations and violations from the truck/driver in real-
time
Route?
Truck?
Driver?
Analysts query a broad
history to understand if
today’s violations are
part of a larger problem
with specific routes,
trucks, or drivers
4 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Event Broker Cluster
Sensor Sources
Truck Sensors
Truck Sensors
Truck Sensors
Truck Sensors
Real-time Analytics Architecture with HDF
Flow Management
Clusters
Ingress
Gateway
Nifi
Site to Site
Protocol
Egress
Gateway
Cloud Instance in
Different Geo
Locations
China Cloud Instance
(IBM)
Germany Cloud Instance
(Azure)
US Cloud Instance
(Amazon)
Stream Analytics Cluster
Ingest
Streams
Generate
Insights
Real-Time Apps
Real-time
Apps &
Exploration Platform
Centralized Schema
Repository
5 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Event Broker Cluster
Sensor Sources
Truck Sensors
Truck Sensors
Truck Sensors
Truck Sensors
Implement the Flow App with HDF
Flow Management
Clusters
Ingress
Gateway
Nifi
Site to Site
Protocol
Egress
Gateway
Cloud Instance in
Different Geo
Locations
China Cloud Instance
(IBM)
Germany Cloud Instance
(Azure)
US Cloud Instance
(Amazon)
Stream Analytics Cluster
Ingest
Streams
Generate
Insights
Real-Time Apps
Real-time
Apps &
Exploration Platform
Centralized Schema
Repository
6 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Implementing the Flow Requirements with Apache NiFi and Kafka
7 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Apache Kafka 1.0
Key Highlights
 Kafka 1.0 was the most asked for
feature/component in HDF Kafka 1.0
 Key Features introduced in Kaka
0.11/1.0 included
– Kafka Message Header support
– Transactional Support
– Performance improvements
 These features are critical for
customers who are building
streaming apps
 Customer didn’t’ just want support
for Kafka 1.0 but want full HDF
integration with Kafka 1.0 including
Nifi, Ambari, Ranger, and Atlas
Integration
Kafka 1.0 - Ambari
 Ambari support for Kafka 1.0
– Install, configure, manage & monitor Kafka 1.0
multi-node secure clusters.
Kafka 1.0 - Ranger
 Ranger ACL support for Kafka 1.0
– Support for resource and tag based access
control for Kafka 1.0
Kafka 1.0 - NiFi & SAM
 New Nifi Kafka 1.0 Processors
– Kafka header & transaction support
 SAM Source/Sink for Kafka 1.0
Kafka 1.0 - Atlas
 Lineage of Kafka Topics
– Who are all the consumers and producers
8 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
HDF Ref App Data Sources: TruckGeoEvent and TruckSpeedEvent Streams
eventType
truckId
driverId
driverName longitude
eventTime routeId
eventSource
2018-01-22 15:00:58.493|1516654858493| truck_geo_event |40| 23 | G Vetticaden | 1090292248 | Peoria to Ceder Rapids Route 2 | Lane Departure| 40.7 | -89.52 |1|
route
latitude
correlationId
speed
truckId
driverId
driverName
eventTime routeIdeventSource
2018-01-22 15:00:58.673|1516654858673| truck_speed_event|40| 23 | G Vetticaden| 1090292248 | Peoria to Ceder Rapids Route 2 | 73 |
route
 Each Truck emits different event stream
– Truck Geo Event
– Truck Speed Event
Truck Geo Event:
Truck Speed Event:
eventTimeLong
eventTimeLong
9 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Common Flow Management Requirements
Flow Requirement # Requirement Description
Req. #1 Edge deployed data collection service needs to capture data from the two
sensors and stream to an IOT gateway powered by Apache Kafka.
Req. #2 The ingestion service will deliver events in CSV format from each sensor to a
Kafka topic (call it raw-all_truck_events_csv) in a secure cluster
Req. #3 Metadata headers need to be sent with each event like the schema key that
identifies the schema for the event in a centralized schema registry store.
Req. #4 Consumers of this raw sensor data need to inspect the meta headers to
lookup the schema and do routing, filtering, and enrichment.
Req. #5 Producers need to publish the enriched streams to their own respective Kafka
topics for consumption for downstream analytics (let's call the Kafka topics for
the two streams: truck_events_avro, truck_speed_events_avro)
Req. #6 Agility is key. Developer should be able to create consumers and producers
quickly in a code-less approach, preferably UI driven.
10 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Demo Flow Management App
11 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Event Broker Cluster
Sensor Sources
Truck Sensors
Truck Sensors
Truck Sensors
Truck Sensors
Implement the Streaming App with HDF
Flow Management
Clusters
Ingress
Gateway
Nifi
Site to Site
Protocol
Egress
Gateway
Cloud Instance in
Different Geo
Locations
China Cloud Instance
(IBM)
Germany Cloud Instance
(Azure)
US Cloud Instance
(Amazon)
Stream Analytics Cluster
Ingest
Streams
Generate
Insights
Real-Time Apps
Real-time
Apps &
Exploration Platform
Centralized Schema
Repository
12 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Common Streaming Analytics Requirements
Streaming Analytics
Requirement #
Requirement Description
Req. #1 Create streams consuming from the two Kafka topics that NiFi delivered the
enriched geo and speed streams to.
Req. #2 Join the streams of the Geo and Speed sensors over a time based aggregation
window.
Req. #3 Apply rules on the stream to filter on events of interest.
Req. #4 Enrich the stream with features required for a machine learning (ML) model.
The enrichment entails performing lookups for driver HR info, hours/miles
driven in the past week, weather info.
Req. #5 Normalize the events in the stream to feed into the PMML model.
Req. #6 Execute a predictive logistical regression model on the stream built with
Spark ML to predict if a driver is in danger going to commit a violation.
Req. #7 Alert and feed into real-time dashboard if model predicted a violation.
13 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Implementing the Streaming Analytics Requirements with SAM
14 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
SAM’s Value Proposition
 Build and deploy complex stream analytics applications without writing any code
 Only open source tool in the market with graphical programming paradigm
 Speed time-to-market for complex stream apps
 Build stream analytics apps without specialized skillsets.
 Decouple data schema from the streaming application itself
 Support multiple underlining streaming engine
15 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Who Uses SAM?
16 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
SAM is All about Doing Real-Time Analytics on the Stream
Real-Time
Prescriptive
Analytics
Real-Time Analytics
Real-Time
Predictive
Analytics
Real-Time
Descriptive
Analytics
What should we do
right now?
What could happen
now/soon?
What is happening
right now?
17 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Demo Real-Time Descriptive Analytics
18 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Predictive Analytics with SAM
Real-Time
Prescriptive
Analytics
Real-Time Analytics
Real-Time
Predictive
Analytics
Real-Time
Descriptive
Analytics
What should we do
right now?
What could happen
now/soon?
What is happening
right now?
19 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Predictive Analytics
20 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Real-Time Predictive Analytics
 Question: No violation events but what might happen that I need to be worried about?
 My data science team has a model that can predict that based on
– Weather
– Roads
– Driver HR info like driver certification status, wagePlan
– Driver timesheet info like hours, and miles logged over the last week
21 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Building the Predictive Model on HDP
Identify suitable ML algorithms to train a model – we will
use classification algorithms as we have labeled events
data
2
Transform enriched events data to a format that is
friendly to Spark MLlib – many ML libs expect
training data in a certain format
3
Train a logistic classification Spark model on YARN, with
above events as training input, and iterate to fine tune
generated model
4
Explore small subset of events to identify predictive
features and make a hypothesis. E.g. hypothesis: “foggy
weather causes driver violations”
1
22 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Logistical Regression Model
23 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Scoring the Predictive Model on HDF
Use SAM’s enrich/custom processors to enrich the event
with the features required for the model6
Enrich with Features
Use SAM’s projection/custom processors to
transform/normalize the streaming event and the
features required for the model
7
Transform/Normalize
Use SAM’s PMML processor to score the model for each
stream event with its required features8
Score Model
Use SAM’s rule and notification processors to alert,
notify and take action using the results of the model9
Alert / Notify / Action
Export the Spark Mllib model and import into the HDF’s
Model Registry
5 Model
Registry
24 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Demo Predictive Analytics with SAM
25 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Atlas Integration
26 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Event Broker Cluster
Sensor Sources
Truck Sensors
Truck Sensors
Truck Sensors
Truck Sensors
Where did the data come from?
Where did it go? Who Produced/Consumed?
Flow Management
Clusters
Ingress
Gateway
Nifi
Site to Site
Protocol
Egress
Gateway
Cloud Instance in
Different Geo
Locations
China Cloud Instance
(IBM)
Germany Cloud Instance
(Azure)
US Cloud Instance
(Amazon)
Stream Analytics Cluster
Ingest
Streams
Generate
Insights
Real-Time Apps
Real-time
Apps &
Exploration Platform
Centralized Schema
Repository
27 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Problem Statement:
 Common ref architectures use all the HDF components including Nifi for Flow and Storm/SAM for
streaming analytics with Schema registry as the glue.
 Hence, as data flows through these different components in HDF, governance requirements such as
lineage/provenance, chain of custody, security, audit are key requirements for every large enterprise.
Solution:
 With HDF 3.1, Flow and Streaming components will be integrated with Atlas:
– Nifi is integrated with Atlas so that Atlas contains meta information of Nifi Data Flows including source data and target
systems of the flow
– SAM/Storm is integrated with Atlas that contains meta information about SAM App topologies including source data
and target systems
Why Should You Care?
 Atlas Integration with HDF components allows enterprise to meet governance requirements allowing
users to track data as it travels across the data-in-motion platform (HDF) and into the data-at-rest
platform (HDP).
Apache Atlas Integration for Flow and Streaming Components
28 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Demo Atlas

More Related Content

What's hot

Hortonworks - How Hadoop makes the successful Retailer.
Hortonworks - How Hadoop makes the successful Retailer. Hortonworks - How Hadoop makes the successful Retailer.
Hortonworks - How Hadoop makes the successful Retailer.
Mats Johansson
 
Risk listening: monitoring for profitable growth
Risk listening: monitoring for profitable growthRisk listening: monitoring for profitable growth
Risk listening: monitoring for profitable growth
DataWorks Summit
 
EDW Optimization: A Modern Twist on an Old Favorite
EDW Optimization: A Modern Twist on an Old FavoriteEDW Optimization: A Modern Twist on an Old Favorite
EDW Optimization: A Modern Twist on an Old Favorite
Hortonworks
 
Overcoming the AI hype — and what enterprises should really focus on
Overcoming the AI hype — and what enterprises should really focus onOvercoming the AI hype — and what enterprises should really focus on
Overcoming the AI hype — and what enterprises should really focus on
DataWorks Summit
 
Streamline Apache Hadoop Operations with Apache Ambari and SmartSense
Streamline Apache Hadoop Operations with Apache Ambari and SmartSenseStreamline Apache Hadoop Operations with Apache Ambari and SmartSense
Streamline Apache Hadoop Operations with Apache Ambari and SmartSense
Hortonworks
 
Pivotal - Advanced Analytics for Telecommunications
Pivotal - Advanced Analytics for Telecommunications Pivotal - Advanced Analytics for Telecommunications
Pivotal - Advanced Analytics for Telecommunications
Hortonworks
 
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseUsing Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
DataWorks Summit
 
Spark Summit EMEA - Arun Murthy's Keynote
Spark Summit EMEA - Arun Murthy's KeynoteSpark Summit EMEA - Arun Murthy's Keynote
Spark Summit EMEA - Arun Murthy's Keynote
Hortonworks
 
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUGReal-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUGskumpf
 
How Customers are Optimizing their EDW for Fast, Secure, and Effective Insights
How Customers are Optimizing their EDW for Fast, Secure, and Effective InsightsHow Customers are Optimizing their EDW for Fast, Secure, and Effective Insights
How Customers are Optimizing their EDW for Fast, Secure, and Effective Insights
Hortonworks
 
What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?
DataWorks Summit
 
Hortonworks Open Connected Data Platforms for IoT and Predictive Big Data Ana...
Hortonworks Open Connected Data Platforms for IoT and Predictive Big Data Ana...Hortonworks Open Connected Data Platforms for IoT and Predictive Big Data Ana...
Hortonworks Open Connected Data Platforms for IoT and Predictive Big Data Ana...
DataWorks Summit
 
Open Source Data Management for Industry 4.0
Open Source Data Management for Industry 4.0Open Source Data Management for Industry 4.0
Open Source Data Management for Industry 4.0
DataWorks Summit
 
Deep Learning 101
Deep Learning 101Deep Learning 101
Deep Learning 101
DataWorks Summit
 
Unlocking insights in streaming data
Unlocking insights in streaming dataUnlocking insights in streaming data
Unlocking insights in streaming data
Carolyn Duby
 
Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?
DataWorks Summit
 
Enterprise data science at scale
Enterprise data science at scaleEnterprise data science at scale
Enterprise data science at scale
Carolyn Duby
 
Big Traffic, Big Trouble: Big Data - Tokyo
Big Traffic, Big Trouble: Big Data - TokyoBig Traffic, Big Trouble: Big Data - Tokyo
Big Traffic, Big Trouble: Big Data - Tokyo
DataWorks Summit
 
Hortonworks DataFlow & Apache Nifi @Oslo Hadoop Big Data
Hortonworks DataFlow & Apache Nifi @Oslo Hadoop Big DataHortonworks DataFlow & Apache Nifi @Oslo Hadoop Big Data
Hortonworks DataFlow & Apache Nifi @Oslo Hadoop Big Data
Mats Johansson
 
Falcon Meetup
Falcon Meetup Falcon Meetup
Falcon Meetup
Hortonworks
 

What's hot (20)

Hortonworks - How Hadoop makes the successful Retailer.
Hortonworks - How Hadoop makes the successful Retailer. Hortonworks - How Hadoop makes the successful Retailer.
Hortonworks - How Hadoop makes the successful Retailer.
 
Risk listening: monitoring for profitable growth
Risk listening: monitoring for profitable growthRisk listening: monitoring for profitable growth
Risk listening: monitoring for profitable growth
 
EDW Optimization: A Modern Twist on an Old Favorite
EDW Optimization: A Modern Twist on an Old FavoriteEDW Optimization: A Modern Twist on an Old Favorite
EDW Optimization: A Modern Twist on an Old Favorite
 
Overcoming the AI hype — and what enterprises should really focus on
Overcoming the AI hype — and what enterprises should really focus onOvercoming the AI hype — and what enterprises should really focus on
Overcoming the AI hype — and what enterprises should really focus on
 
Streamline Apache Hadoop Operations with Apache Ambari and SmartSense
Streamline Apache Hadoop Operations with Apache Ambari and SmartSenseStreamline Apache Hadoop Operations with Apache Ambari and SmartSense
Streamline Apache Hadoop Operations with Apache Ambari and SmartSense
 
Pivotal - Advanced Analytics for Telecommunications
Pivotal - Advanced Analytics for Telecommunications Pivotal - Advanced Analytics for Telecommunications
Pivotal - Advanced Analytics for Telecommunications
 
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseUsing Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
 
Spark Summit EMEA - Arun Murthy's Keynote
Spark Summit EMEA - Arun Murthy's KeynoteSpark Summit EMEA - Arun Murthy's Keynote
Spark Summit EMEA - Arun Murthy's Keynote
 
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUGReal-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
 
How Customers are Optimizing their EDW for Fast, Secure, and Effective Insights
How Customers are Optimizing their EDW for Fast, Secure, and Effective InsightsHow Customers are Optimizing their EDW for Fast, Secure, and Effective Insights
How Customers are Optimizing their EDW for Fast, Secure, and Effective Insights
 
What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?
 
Hortonworks Open Connected Data Platforms for IoT and Predictive Big Data Ana...
Hortonworks Open Connected Data Platforms for IoT and Predictive Big Data Ana...Hortonworks Open Connected Data Platforms for IoT and Predictive Big Data Ana...
Hortonworks Open Connected Data Platforms for IoT and Predictive Big Data Ana...
 
Open Source Data Management for Industry 4.0
Open Source Data Management for Industry 4.0Open Source Data Management for Industry 4.0
Open Source Data Management for Industry 4.0
 
Deep Learning 101
Deep Learning 101Deep Learning 101
Deep Learning 101
 
Unlocking insights in streaming data
Unlocking insights in streaming dataUnlocking insights in streaming data
Unlocking insights in streaming data
 
Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?
 
Enterprise data science at scale
Enterprise data science at scaleEnterprise data science at scale
Enterprise data science at scale
 
Big Traffic, Big Trouble: Big Data - Tokyo
Big Traffic, Big Trouble: Big Data - TokyoBig Traffic, Big Trouble: Big Data - Tokyo
Big Traffic, Big Trouble: Big Data - Tokyo
 
Hortonworks DataFlow & Apache Nifi @Oslo Hadoop Big Data
Hortonworks DataFlow & Apache Nifi @Oslo Hadoop Big DataHortonworks DataFlow & Apache Nifi @Oslo Hadoop Big Data
Hortonworks DataFlow & Apache Nifi @Oslo Hadoop Big Data
 
Falcon Meetup
Falcon Meetup Falcon Meetup
Falcon Meetup
 

Similar to HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features

Next gen tooling for building streaming analytics apps: code-less development...
Next gen tooling for building streaming analytics apps: code-less development...Next gen tooling for building streaming analytics apps: code-less development...
Next gen tooling for building streaming analytics apps: code-less development...
DataWorks Summit
 
Next Generation Tooling for building streaming analytics app
Next Generation Tooling for building streaming analytics appNext Generation Tooling for building streaming analytics app
Next Generation Tooling for building streaming analytics app
gvetticaden
 
Streamline - Stream Analytics for Everyone
Streamline - Stream Analytics for EveryoneStreamline - Stream Analytics for Everyone
Streamline - Stream Analytics for Everyone
DataWorks Summit/Hadoop Summit
 
SAM - Streaming Analytics Made Easy
SAM - Streaming Analytics Made EasySAM - Streaming Analytics Made Easy
SAM - Streaming Analytics Made Easy
DataWorks Summit
 
Streaming analytics manager
Streaming analytics managerStreaming analytics manager
Streaming analytics manager
Sriharsha Chintalapani
 
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
DataWorks Summit
 
Schema Registry & Stream Analytics Manager
Schema Registry  & Stream Analytics ManagerSchema Registry  & Stream Analytics Manager
Schema Registry & Stream Analytics Manager
Sriharsha Chintalapani
 
SAM—streaming analytics made easy
SAM—streaming analytics made easySAM—streaming analytics made easy
SAM—streaming analytics made easy
DataWorks Summit
 
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks
 
Trucking demo w Spark ML - Paul Hargis - Hortonworks
Trucking demo w Spark ML - Paul Hargis - HortonworksTrucking demo w Spark ML - Paul Hargis - Hortonworks
Trucking demo w Spark ML - Paul Hargis - Hortonworks
Kelly Kohlleffel
 
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerCuring Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
Hortonworks
 
Openshift YARN - strata 2014
Openshift YARN - strata 2014Openshift YARN - strata 2014
Openshift YARN - strata 2014
Hortonworks
 
APIs and Services for Fleet Management - Talks given @ APIDays Berlin and Ba...
APIs and Services for  Fleet Management - Talks given @ APIDays Berlin and Ba...APIs and Services for  Fleet Management - Talks given @ APIDays Berlin and Ba...
APIs and Services for Fleet Management - Talks given @ APIDays Berlin and Ba...
Toralf Richter
 
Hadoop Summit Tokyo Apache NiFi Crash Course
Hadoop Summit Tokyo Apache NiFi Crash CourseHadoop Summit Tokyo Apache NiFi Crash Course
Hadoop Summit Tokyo Apache NiFi Crash Course
DataWorks Summit/Hadoop Summit
 
Vnv kumar performance testing
Vnv kumar performance testingVnv kumar performance testing
Vnv kumar performance testing
Vinay Kumar
 
Make Streaming Analytics work for you: The Devil is in the Details
Make Streaming Analytics work for you: The Devil is in the DetailsMake Streaming Analytics work for you: The Devil is in the Details
Make Streaming Analytics work for you: The Devil is in the Details
DataWorks Summit/Hadoop Summit
 
Linux Akraino Blueprint
Linux Akraino BlueprintLinux Akraino Blueprint
Linux Akraino Blueprint
Liz Warner
 
What's New in the Winter '16 Release (4.2)
What's New in the Winter '16 Release (4.2)What's New in the Winter '16 Release (4.2)
What's New in the Winter '16 Release (4.2)
AppDynamics
 
Resume_Sandip_Mohod_Java_9_plus_years_exp
Resume_Sandip_Mohod_Java_9_plus_years_expResume_Sandip_Mohod_Java_9_plus_years_exp
Resume_Sandip_Mohod_Java_9_plus_years_expSandip Mohod
 
Internet of things Crash Course Workshop
Internet of things Crash Course WorkshopInternet of things Crash Course Workshop
Internet of things Crash Course Workshop
DataWorks Summit
 

Similar to HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features (20)

Next gen tooling for building streaming analytics apps: code-less development...
Next gen tooling for building streaming analytics apps: code-less development...Next gen tooling for building streaming analytics apps: code-less development...
Next gen tooling for building streaming analytics apps: code-less development...
 
Next Generation Tooling for building streaming analytics app
Next Generation Tooling for building streaming analytics appNext Generation Tooling for building streaming analytics app
Next Generation Tooling for building streaming analytics app
 
Streamline - Stream Analytics for Everyone
Streamline - Stream Analytics for EveryoneStreamline - Stream Analytics for Everyone
Streamline - Stream Analytics for Everyone
 
SAM - Streaming Analytics Made Easy
SAM - Streaming Analytics Made EasySAM - Streaming Analytics Made Easy
SAM - Streaming Analytics Made Easy
 
Streaming analytics manager
Streaming analytics managerStreaming analytics manager
Streaming analytics manager
 
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
 
Schema Registry & Stream Analytics Manager
Schema Registry  & Stream Analytics ManagerSchema Registry  & Stream Analytics Manager
Schema Registry & Stream Analytics Manager
 
SAM—streaming analytics made easy
SAM—streaming analytics made easySAM—streaming analytics made easy
SAM—streaming analytics made easy
 
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
 
Trucking demo w Spark ML - Paul Hargis - Hortonworks
Trucking demo w Spark ML - Paul Hargis - HortonworksTrucking demo w Spark ML - Paul Hargis - Hortonworks
Trucking demo w Spark ML - Paul Hargis - Hortonworks
 
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerCuring Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
 
Openshift YARN - strata 2014
Openshift YARN - strata 2014Openshift YARN - strata 2014
Openshift YARN - strata 2014
 
APIs and Services for Fleet Management - Talks given @ APIDays Berlin and Ba...
APIs and Services for  Fleet Management - Talks given @ APIDays Berlin and Ba...APIs and Services for  Fleet Management - Talks given @ APIDays Berlin and Ba...
APIs and Services for Fleet Management - Talks given @ APIDays Berlin and Ba...
 
Hadoop Summit Tokyo Apache NiFi Crash Course
Hadoop Summit Tokyo Apache NiFi Crash CourseHadoop Summit Tokyo Apache NiFi Crash Course
Hadoop Summit Tokyo Apache NiFi Crash Course
 
Vnv kumar performance testing
Vnv kumar performance testingVnv kumar performance testing
Vnv kumar performance testing
 
Make Streaming Analytics work for you: The Devil is in the Details
Make Streaming Analytics work for you: The Devil is in the DetailsMake Streaming Analytics work for you: The Devil is in the Details
Make Streaming Analytics work for you: The Devil is in the Details
 
Linux Akraino Blueprint
Linux Akraino BlueprintLinux Akraino Blueprint
Linux Akraino Blueprint
 
What's New in the Winter '16 Release (4.2)
What's New in the Winter '16 Release (4.2)What's New in the Winter '16 Release (4.2)
What's New in the Winter '16 Release (4.2)
 
Resume_Sandip_Mohod_Java_9_plus_years_exp
Resume_Sandip_Mohod_Java_9_plus_years_expResume_Sandip_Mohod_Java_9_plus_years_exp
Resume_Sandip_Mohod_Java_9_plus_years_exp
 
Internet of things Crash Course Workshop
Internet of things Crash Course WorkshopInternet of things Crash Course Workshop
Internet of things Crash Course Workshop
 

More from Hortonworks

IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT StrategyIoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
Hortonworks
 
Getting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with CloudbreakGetting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with Cloudbreak
Hortonworks
 
Johns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log EventsJohns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log Events
Hortonworks
 
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysCatch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Hortonworks
 
HDF 3.2 - What's New
HDF 3.2 - What's NewHDF 3.2 - What's New
HDF 3.2 - What's New
Hortonworks
 
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsInterpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
Hortonworks
 
IBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeIBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data Landscape
Hortonworks
 
Premier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidPremier Inside-Out: Apache Druid
Premier Inside-Out: Apache Druid
Hortonworks
 
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATATIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
Hortonworks
 
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Hortonworks
 
Making Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseMaking Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with Ease
Hortonworks
 
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationWebinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
Hortonworks
 
Driving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementDriving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data Management
Hortonworks
 
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks
 
4 Essential Steps for Managing Sensitive Data
4 Essential Steps for Managing Sensitive Data4 Essential Steps for Managing Sensitive Data
4 Essential Steps for Managing Sensitive Data
Hortonworks
 
5 Steps to Create a Company Culture that Embraces the Power of Data
5 Steps to Create a Company Culture that Embraces the Power of Data5 Steps to Create a Company Culture that Embraces the Power of Data
5 Steps to Create a Company Culture that Embraces the Power of Data
Hortonworks
 
Exploring the Heated-and Completely Unnecessary- Data Lake Debate
Exploring the Heated-and Completely Unnecessary- Data Lake DebateExploring the Heated-and Completely Unnecessary- Data Lake Debate
Exploring the Heated-and Completely Unnecessary- Data Lake Debate
Hortonworks
 
Sprint's Data Modernization Journey
Sprint's Data Modernization JourneySprint's Data Modernization Journey
Sprint's Data Modernization Journey
Hortonworks
 
Enterprise Data Science at Scale Meetup - IBM and Hortonworks - Oct 2017
Enterprise Data Science at Scale Meetup - IBM and Hortonworks - Oct 2017 Enterprise Data Science at Scale Meetup - IBM and Hortonworks - Oct 2017
Enterprise Data Science at Scale Meetup - IBM and Hortonworks - Oct 2017
Hortonworks
 
Benefits of Transferring Real-Time Data to Hadoop at Scale
Benefits of Transferring Real-Time Data to Hadoop at ScaleBenefits of Transferring Real-Time Data to Hadoop at Scale
Benefits of Transferring Real-Time Data to Hadoop at Scale
Hortonworks
 

More from Hortonworks (20)

IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT StrategyIoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
 
Getting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with CloudbreakGetting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with Cloudbreak
 
Johns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log EventsJohns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log Events
 
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysCatch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
 
HDF 3.2 - What's New
HDF 3.2 - What's NewHDF 3.2 - What's New
HDF 3.2 - What's New
 
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsInterpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
 
IBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeIBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data Landscape
 
Premier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidPremier Inside-Out: Apache Druid
Premier Inside-Out: Apache Druid
 
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATATIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
 
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
 
Making Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseMaking Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with Ease
 
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationWebinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
 
Driving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementDriving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data Management
 
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
 
4 Essential Steps for Managing Sensitive Data
4 Essential Steps for Managing Sensitive Data4 Essential Steps for Managing Sensitive Data
4 Essential Steps for Managing Sensitive Data
 
5 Steps to Create a Company Culture that Embraces the Power of Data
5 Steps to Create a Company Culture that Embraces the Power of Data5 Steps to Create a Company Culture that Embraces the Power of Data
5 Steps to Create a Company Culture that Embraces the Power of Data
 
Exploring the Heated-and Completely Unnecessary- Data Lake Debate
Exploring the Heated-and Completely Unnecessary- Data Lake DebateExploring the Heated-and Completely Unnecessary- Data Lake Debate
Exploring the Heated-and Completely Unnecessary- Data Lake Debate
 
Sprint's Data Modernization Journey
Sprint's Data Modernization JourneySprint's Data Modernization Journey
Sprint's Data Modernization Journey
 
Enterprise Data Science at Scale Meetup - IBM and Hortonworks - Oct 2017
Enterprise Data Science at Scale Meetup - IBM and Hortonworks - Oct 2017 Enterprise Data Science at Scale Meetup - IBM and Hortonworks - Oct 2017
Enterprise Data Science at Scale Meetup - IBM and Hortonworks - Oct 2017
 
Benefits of Transferring Real-Time Data to Hadoop at Scale
Benefits of Transferring Real-Time Data to Hadoop at ScaleBenefits of Transferring Real-Time Data to Hadoop at Scale
Benefits of Transferring Real-Time Data to Hadoop at Scale
 

Recently uploaded

FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 

Recently uploaded (20)

FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 

HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features

  • 1. 1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved HDF 3.1 Streaming Webinar George Vetticaden VP of Emerging Products
  • 2. 2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Reference Architecture: Real-time Streaming Analytics
  • 3. 3 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Trucking company w/ large fleet of international trucks A truck generates millions of events for a given route; an event could be:  'Normal' events: starting / stopping of the vehicle  ‘Violation’ events: speeding, excessive acceleration and breaking, unsafe tail distance  ‘Speed’ Events: The speed of a driver that comes in every minute. Company uses an application that monitors truck locations and violations from the truck/driver in real- time Route? Truck? Driver? Analysts query a broad history to understand if today’s violations are part of a larger problem with specific routes, trucks, or drivers
  • 4. 4 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Event Broker Cluster Sensor Sources Truck Sensors Truck Sensors Truck Sensors Truck Sensors Real-time Analytics Architecture with HDF Flow Management Clusters Ingress Gateway Nifi Site to Site Protocol Egress Gateway Cloud Instance in Different Geo Locations China Cloud Instance (IBM) Germany Cloud Instance (Azure) US Cloud Instance (Amazon) Stream Analytics Cluster Ingest Streams Generate Insights Real-Time Apps Real-time Apps & Exploration Platform Centralized Schema Repository
  • 5. 5 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Event Broker Cluster Sensor Sources Truck Sensors Truck Sensors Truck Sensors Truck Sensors Implement the Flow App with HDF Flow Management Clusters Ingress Gateway Nifi Site to Site Protocol Egress Gateway Cloud Instance in Different Geo Locations China Cloud Instance (IBM) Germany Cloud Instance (Azure) US Cloud Instance (Amazon) Stream Analytics Cluster Ingest Streams Generate Insights Real-Time Apps Real-time Apps & Exploration Platform Centralized Schema Repository
  • 6. 6 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Implementing the Flow Requirements with Apache NiFi and Kafka
  • 7. 7 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Apache Kafka 1.0 Key Highlights  Kafka 1.0 was the most asked for feature/component in HDF Kafka 1.0  Key Features introduced in Kaka 0.11/1.0 included – Kafka Message Header support – Transactional Support – Performance improvements  These features are critical for customers who are building streaming apps  Customer didn’t’ just want support for Kafka 1.0 but want full HDF integration with Kafka 1.0 including Nifi, Ambari, Ranger, and Atlas Integration Kafka 1.0 - Ambari  Ambari support for Kafka 1.0 – Install, configure, manage & monitor Kafka 1.0 multi-node secure clusters. Kafka 1.0 - Ranger  Ranger ACL support for Kafka 1.0 – Support for resource and tag based access control for Kafka 1.0 Kafka 1.0 - NiFi & SAM  New Nifi Kafka 1.0 Processors – Kafka header & transaction support  SAM Source/Sink for Kafka 1.0 Kafka 1.0 - Atlas  Lineage of Kafka Topics – Who are all the consumers and producers
  • 8. 8 © Hortonworks Inc. 2011 – 2016. All Rights Reserved HDF Ref App Data Sources: TruckGeoEvent and TruckSpeedEvent Streams eventType truckId driverId driverName longitude eventTime routeId eventSource 2018-01-22 15:00:58.493|1516654858493| truck_geo_event |40| 23 | G Vetticaden | 1090292248 | Peoria to Ceder Rapids Route 2 | Lane Departure| 40.7 | -89.52 |1| route latitude correlationId speed truckId driverId driverName eventTime routeIdeventSource 2018-01-22 15:00:58.673|1516654858673| truck_speed_event|40| 23 | G Vetticaden| 1090292248 | Peoria to Ceder Rapids Route 2 | 73 | route  Each Truck emits different event stream – Truck Geo Event – Truck Speed Event Truck Geo Event: Truck Speed Event: eventTimeLong eventTimeLong
  • 9. 9 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Common Flow Management Requirements Flow Requirement # Requirement Description Req. #1 Edge deployed data collection service needs to capture data from the two sensors and stream to an IOT gateway powered by Apache Kafka. Req. #2 The ingestion service will deliver events in CSV format from each sensor to a Kafka topic (call it raw-all_truck_events_csv) in a secure cluster Req. #3 Metadata headers need to be sent with each event like the schema key that identifies the schema for the event in a centralized schema registry store. Req. #4 Consumers of this raw sensor data need to inspect the meta headers to lookup the schema and do routing, filtering, and enrichment. Req. #5 Producers need to publish the enriched streams to their own respective Kafka topics for consumption for downstream analytics (let's call the Kafka topics for the two streams: truck_events_avro, truck_speed_events_avro) Req. #6 Agility is key. Developer should be able to create consumers and producers quickly in a code-less approach, preferably UI driven.
  • 10. 10 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Demo Flow Management App
  • 11. 11 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Event Broker Cluster Sensor Sources Truck Sensors Truck Sensors Truck Sensors Truck Sensors Implement the Streaming App with HDF Flow Management Clusters Ingress Gateway Nifi Site to Site Protocol Egress Gateway Cloud Instance in Different Geo Locations China Cloud Instance (IBM) Germany Cloud Instance (Azure) US Cloud Instance (Amazon) Stream Analytics Cluster Ingest Streams Generate Insights Real-Time Apps Real-time Apps & Exploration Platform Centralized Schema Repository
  • 12. 12 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Common Streaming Analytics Requirements Streaming Analytics Requirement # Requirement Description Req. #1 Create streams consuming from the two Kafka topics that NiFi delivered the enriched geo and speed streams to. Req. #2 Join the streams of the Geo and Speed sensors over a time based aggregation window. Req. #3 Apply rules on the stream to filter on events of interest. Req. #4 Enrich the stream with features required for a machine learning (ML) model. The enrichment entails performing lookups for driver HR info, hours/miles driven in the past week, weather info. Req. #5 Normalize the events in the stream to feed into the PMML model. Req. #6 Execute a predictive logistical regression model on the stream built with Spark ML to predict if a driver is in danger going to commit a violation. Req. #7 Alert and feed into real-time dashboard if model predicted a violation.
  • 13. 13 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Implementing the Streaming Analytics Requirements with SAM
  • 14. 14 © Hortonworks Inc. 2011 – 2016. All Rights Reserved SAM’s Value Proposition  Build and deploy complex stream analytics applications without writing any code  Only open source tool in the market with graphical programming paradigm  Speed time-to-market for complex stream apps  Build stream analytics apps without specialized skillsets.  Decouple data schema from the streaming application itself  Support multiple underlining streaming engine
  • 15. 15 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Who Uses SAM?
  • 16. 16 © Hortonworks Inc. 2011 – 2016. All Rights Reserved SAM is All about Doing Real-Time Analytics on the Stream Real-Time Prescriptive Analytics Real-Time Analytics Real-Time Predictive Analytics Real-Time Descriptive Analytics What should we do right now? What could happen now/soon? What is happening right now?
  • 17. 17 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Demo Real-Time Descriptive Analytics
  • 18. 18 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Predictive Analytics with SAM Real-Time Prescriptive Analytics Real-Time Analytics Real-Time Predictive Analytics Real-Time Descriptive Analytics What should we do right now? What could happen now/soon? What is happening right now?
  • 19. 19 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Predictive Analytics
  • 20. 20 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Real-Time Predictive Analytics  Question: No violation events but what might happen that I need to be worried about?  My data science team has a model that can predict that based on – Weather – Roads – Driver HR info like driver certification status, wagePlan – Driver timesheet info like hours, and miles logged over the last week
  • 21. 21 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Building the Predictive Model on HDP Identify suitable ML algorithms to train a model – we will use classification algorithms as we have labeled events data 2 Transform enriched events data to a format that is friendly to Spark MLlib – many ML libs expect training data in a certain format 3 Train a logistic classification Spark model on YARN, with above events as training input, and iterate to fine tune generated model 4 Explore small subset of events to identify predictive features and make a hypothesis. E.g. hypothesis: “foggy weather causes driver violations” 1
  • 22. 22 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Logistical Regression Model
  • 23. 23 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Scoring the Predictive Model on HDF Use SAM’s enrich/custom processors to enrich the event with the features required for the model6 Enrich with Features Use SAM’s projection/custom processors to transform/normalize the streaming event and the features required for the model 7 Transform/Normalize Use SAM’s PMML processor to score the model for each stream event with its required features8 Score Model Use SAM’s rule and notification processors to alert, notify and take action using the results of the model9 Alert / Notify / Action Export the Spark Mllib model and import into the HDF’s Model Registry 5 Model Registry
  • 24. 24 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Demo Predictive Analytics with SAM
  • 25. 25 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Atlas Integration
  • 26. 26 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Event Broker Cluster Sensor Sources Truck Sensors Truck Sensors Truck Sensors Truck Sensors Where did the data come from? Where did it go? Who Produced/Consumed? Flow Management Clusters Ingress Gateway Nifi Site to Site Protocol Egress Gateway Cloud Instance in Different Geo Locations China Cloud Instance (IBM) Germany Cloud Instance (Azure) US Cloud Instance (Amazon) Stream Analytics Cluster Ingest Streams Generate Insights Real-Time Apps Real-time Apps & Exploration Platform Centralized Schema Repository
  • 27. 27 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Problem Statement:  Common ref architectures use all the HDF components including Nifi for Flow and Storm/SAM for streaming analytics with Schema registry as the glue.  Hence, as data flows through these different components in HDF, governance requirements such as lineage/provenance, chain of custody, security, audit are key requirements for every large enterprise. Solution:  With HDF 3.1, Flow and Streaming components will be integrated with Atlas: – Nifi is integrated with Atlas so that Atlas contains meta information of Nifi Data Flows including source data and target systems of the flow – SAM/Storm is integrated with Atlas that contains meta information about SAM App topologies including source data and target systems Why Should You Care?  Atlas Integration with HDF components allows enterprise to meet governance requirements allowing users to track data as it travels across the data-in-motion platform (HDF) and into the data-at-rest platform (HDP). Apache Atlas Integration for Flow and Streaming Components
  • 28. 28 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Demo Atlas

Editor's Notes

  1. A tool using a graphically programming paradigm with drag and drop interface allowing customers to build and deploy complex stream applications without writing any code Bring complex stream apps to market considerably faster Allows any organization to to build stream analytics apps without specialized skillsets. The tool provides abstractions on the streaming engine used. The abstraction provides the ability to plugin in open source streaming engines (Storm, Spark, Flink, etc..) The only open source tool in the market that provides graphically programming paradigm to build streaming apps. Customers less concerned about what open source stream engine is used. Decouple Schema from the Stream App via integration with Schema Registry Combination of Flow Management’s Data Flow Management with Stream Processing’s Stream Builder provides a truly differentiated solution in the market to build end to end stream analytical apps