The document discusses Hortonworks DataFlow (HDF), which is a platform for data in motion. HDF allows users to collect data at the edge, route and process streaming data with Apache NiFi and Kafka, and analyze, visualize, predict and prescribe outcomes from the data using HDF platform services. The HDF platform provides scalable stream processing, security, data provenance, and management capabilities for data in motion applications across the enterprise.
The world’s largest enterprises run their infrastructure on Oracle, DB2 and SQL Server, and their critical business operations on SAP applications. Organisations need this data to be available in real time to conduct the necessary analytics. However, delivering this heterogeneous data at the speed it’s required can be a huge challenge because of complex underlying data models and structures, and legacy manual processes that are prone to errors and delays.
Unlock these silos of data and enable the new advanced analytics platforms by attending this session.
Find out how to:
• Overcome common challenges faced by enterprises trying to access their SAP data
• Integrate SAP data in real time with change data capture (CDC) technology
• Use Attunity Replicate for SAP to stream SAP data into Kafka, as other organisations are already doing
Data science holds tremendous potential for organizations to uncover new insights and drivers of revenue and profitability. Big Data has brought the promise of doing data science at scale to enterprises; however, this promise also comes with challenges for data scientists to continuously learn and collaborate. Data scientists have many tools at their disposal: notebooks like Jupyter and Apache Zeppelin, IDEs such as RStudio, languages like R, Python and Scala, and frameworks like Apache Spark. Given all these choices, how do you best collaborate to build your model and then work through the development lifecycle to deploy it from test into production?
In this session, learn the attributes of a modern data science platform that empowers data scientists to build models using all the data in their data lake and fosters continuous learning and collaboration. We will show a demo of DSX with HDP, focusing on integration, security, and model deployment and management.
Speakers:
Sriram Srinivasan, Senior Technical Staff Member, Analytics Platform Architect, IBM
Vikram Murali, Program Director, Data Science and Machine Learning, IBM
Prior to 2014, Walgreens had traditional enterprise data warehouse systems that had reached their capacity limits. Over the last three years we have evolved, learned lessons, and experienced successes and failures. Our initial adoption of Hadoop came from the need to run complex analytics which simply did not scale on MPP RDBMS. Our business data demands were rapidly increasing, and the concomitant 8 to 12 week extract, transform, and load turnaround cycles were not an acceptable delivery timeframe in the retail space. A self-service model where data lands on a distributed platform, schema is applied where necessary, and processing happens at scale was a necessary paradigm for enabling business value. Our journey started with a single use case and has now evolved into an enterprise data hub. We will discuss the following points: the evolution of our infrastructure profile, streamlining the hardware provisioning cycle, and our hybrid deployment model (on premise and cloud); operations, how SmartSense has helped us proactively tune our cluster, and which operational tests we use for benchmarking the cluster; monitoring, how we monitor and the tools required for enterprise-grade monitoring; security and governance, how we progressed from non-compliance to enterprise grade using Ranger, Knox, Kerberos, HP Voltage, encryption at rest, and many other services; third-party integration with HDP, what we learned and how we overcame the challenges; and lastly, how we approach our disaster recovery strategy, what is driving the need for DR, and the key capabilities required.
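As a minimal sketch of the schema-on-read, self-service pattern described above (paths, table and column names are hypothetical, not Walgreens' actual setup), a table can be declared over data that has already landed on the cluster and queried immediately, with no lengthy ETL cycle:

```python
# Minimal schema-on-read sketch: raw files land in the distributed store
# untouched, and a schema is applied only when a table is declared over
# them (paths and columns are hypothetical).
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("schema-on-read-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Declare a table over data already landed at /landing/retail/sales;
# analysts can query it without waiting for a long ETL turnaround.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS sales_raw (
        store_id STRING,
        sku STRING,
        sale_amount DOUBLE
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION '/landing/retail/sales'
""")
spark.sql("SELECT store_id, SUM(sale_amount) AS total FROM sales_raw GROUP BY store_id").show()
```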
Insights into Real World Data Management Challenges (DataWorks Summit)
Data is your most valuable business asset, and it is also your biggest challenge. This challenge and opportunity means we continually face significant roadblocks on the way to becoming a data-driven organisation. From the management of data, to a fast-moving set of open source frameworks, to limited industry skills and mounting time and cost pressures, our challenge in data is big.
We all want and need a “fit for purpose” approach to the management of data, especially Big Data, and overcoming the ongoing challenges around the ‘3Vs’ means we get to focus on the most important V: ‘Value’. Come along and join the discussion on how Oracle Big Data Cloud provides value in the management of data and supports your move toward becoming a data-driven organisation.
Speaker
Noble Raveendran, Principal Consultant, Oracle
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In... (DataWorks Summit)
Progressive Insurance is well known for its innovative use of data to better serve its customers, and the important role that Hortonworks Data Platform has played in that transformation. However, as with most things worth doing, the path to the Data Lake was not without its challenges. In this session, I’ll share our top use cases for Hadoop – including telematics and display ads, how a skills shortage turned supporting these applications into a nightmare, and how – and why – we now use Syncsort DMX-h to accelerate enterprise adoption by making it quick and easy (or faster and easier) to populate the data lake – and keep it up to date – with data from across the enterprise. I’ll discuss the different approaches we tried, the benefits of using a tool vs. open source, and how we created our Hadoop Ingestor app using Syncsort DMX-h.
Big SQL: Powerful SQL Optimization - Re-Imagined for Open Source (DataWorks Summit)
Let's be honest - there are some pretty amazing capabilities locked in proprietary SQL engines which have had decades of R&D baked into them. At this session, learn how IBM, working with the Apache community, has unlocked the value of its SQL optimizer for Hive, HBase, ObjectStore, and Spark - helping customers avoid lock-in while providing the best performance, concurrency and scalability for complex, analytical SQL workloads. You'll also learn how the SQL engine was extended and integrated with Ambari, Ranger, YARN/Slider and HBase. We share the results of this project, which has enabled running all 99 TPC-DS queries at a world-record 100 TB scale factor.
Hadoop and Spark are big data frameworks used to extract useful insights from data, and big data projects span a variety of scenarios, from ingestion and data prep through data management, processing, analysis and visualization. Each step requires specialized toolsets to be productive. In this talk I will share examples of solutions in the Big Data ecosystem, such as Cask, StreamSets, Datameer, AtScale and Dataiku on Microsoft’s Azure HDInsight, that simplify your Big Data solutions. Azure HDInsight is a cloud Spark and Hadoop service for the enterprise, and running these tools on it lets you take advantage of all the benefits of HDInsight, giving you the best of both worlds. Join this session for practical information that will enable faster time to insights for you and your business.
Securing your Big Data Environments in the Cloud (DataWorks Summit)
Big Data tools are becoming a critical part of enterprise architectures, and as such, securing the data, at rest and in motion, is a necessity. This is even more true when you’re implementing these solutions in the cloud and the data doesn't reside within the confines of your trusted data center. There is also a fine balance between implementing enterprise-grade security and maintaining performance given the overheads of encryption and/or identity management.
This session is designed to tackle these challenges head on and explain the various options available in the cloud. The focal points are the implementation of tools like Ranger and Knox for cloud deployments, but we also pay attention to the security features offered in the cloud that complement this process and secure the data in unprecedented ways.
Cloud security and open source security tools are a powerful combination when it comes to securing your data lake.
In 2015/16 Worldpay deployed its Enterprise Data Platform - a highly secure cluster used for analysis of over 65 billion card transactions and the subject of last year's Hadoop Summit keynote in Dublin. A year on, we are now rapidly expanding our platform with true multi-tenancy. For our first tenant we have built and deployed the analytics and reporting for our central platforms. Our second tenant deploys 'decision engines' into our core business systems; these allow Worldpay to make decisions, derived from machine learning, on how we authorise and route payments traffic and how these affect the consumer, merchant and other business partners. We are also developing other tenants for systems management and security. This talk will look at what it means to truly have a single enterprise data lake and multiple tenants that share that data, and will look forward to how we will extend the platform in 2017 with Hadoop 3.
Addressing Enterprise Customer Pain Points with a Data Driven Architecture (DataWorks Summit)
Customers implementing Big Data analytics projects in enterprise environments driven by line-of-business applications face three critical issues: managing complexity, data movement and replication, and cloud integration. In this session you will learn about the characteristics of these pain points and how designing and implementing a data-driven approach enables enterprises to implement quickly and efficiently with a future-proof hybrid cloud architecture.
Empowering you with Democratized Data Access, Data Science and Machine Learning (DataWorks Summit)
Data science, with its specialized tools and knowledge, has been the forte of data scientists. However, it is not easy even for data scientists to get access to data that could sit in different data stores across the organization. To unleash the power of data and gain valuable insights, machine learning needs to be made easily consumable by various stakeholders, and access to data made simpler. As an organization's data volumes continue to grow, delivering these insights in real time is a complex challenge to solve.
This session will provide an overview of an approach to building a scalable solution where machine and deep learning and access to data are made much more consumable and simpler by the fastest SQL-on-Hadoop engine on the planet, a rich data scientist toolset, and an infrastructure that can deliver the responsiveness needed for production environments.
Speakers:
Pandit Prasad, Program Director, IBM
Ashutosh Mate, Global Senior Solutions Architect, IBM
Enterprise large scale graph analytics and computing based on distributed graph... (DataWorks Summit)
Graph approaches to structuring and analyzing data have been a significant area of interest. Graphs are well-suited to expressing complex interconnections and clusters of highly related entities.
Large-scale graph analytics research has grown fast in recent years, and leveraging the Hadoop 2 ecosystem for graphs is a good approach: enterprise graph computing requires both storing large graphs and computing against them quickly. On the OLTP side, which allows users to query the graph in real time, HBase as a distributed NoSQL database can be the backend storage for persisting a large property graph, with vertices and edges stored as key-value pairs; it also provides high reliability, scalability and fault tolerance for the data, while Solr as the distributed index makes queries more efficient and Titan itself handles caching and transactions. On the OLAP side, TinkerPop's hadoop-gremlin SparkGraphComputer is used to process the whole graph, analyzing every vertex and edge, with the cluster-computing platform handling large, distributed, in-memory graph datasets.
A graph database built on HBase/Solr, combined with graph computing and analysis on Spark, is powerful for discovering valuable information about relationships in complex, large data, and represents a significant business opportunity for the enterprise. It can help graph data analytics in a wide range of domains such as social networking, recommendation engines, advertisement optimization, knowledge representation, health care, education, and security.
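The storage stack above is configured on the JVM side; as a rough sketch of what an OLTP-style query against such a property graph can look like from Python, assuming the HBase-backed graph is exposed through a TinkerPop Gremlin Server endpoint (host, labels and property names here are hypothetical):

```python
# Rough sketch of a real-time neighborhood query against a property graph
# served by a TinkerPop Gremlin Server; host, labels and property names
# are hypothetical, not a specific production deployment.
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

conn = DriverRemoteConnection("ws://gremlin-server.example.com:8182/gremlin", "g")
g = traversal().withRemote(conn)

# Who does 'alice' know? A small traversal that the backend resolves
# against the key-value encoded vertices and edges.
friends = (g.V().has("person", "name", "alice")
             .out("knows")
             .values("name")
             .toList())
print(friends)

conn.close()
```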
How Apache Spark and Apache Hadoop are being used to keep banking regulators ... (DataWorks Summit)
The global financial crisis showed that traditional IT systems at banks were ill-equipped to monitor and manage a risk landscape that was changing daily. The sheer amount of data that needed to be crunched meant that many banks were a day or more behind in calculating, understanding and reporting their risk positions. Post crisis, a review by banking regulators led to new legislation, BCBS 239: Principles for effective risk data aggregation and risk reporting, which requires banks to meet more stringent timeliness requirements in their ability to aggregate and report on their quickly changing risk positions, or risk fines running into the millions of dollars. To meet these new requirements, banks have been forced to rethink their traditional IT architectures, which are unable to cope with the sheer volume of risk data, and are instead turning to Apache Hadoop and Apache Spark to build the next generation of risk systems. In this talk you will discover how some of the leading banks in the world are leveraging Apache Hadoop and Apache Spark to meet the BCBS 239 regulation.
Speaker
Kunal Taneja
Verizon Centralizes Data into a Data Lake in Real Time for Analytics (DataWorks Summit)
Verizon – Global Technology Services (GTS) was challenged by a multi-tier, labor-intensive process when trying to migrate data from disparate sources into a data lake to create financial reports and business insights. Join this session to learn more about how Verizon:
• Easily accessed data from multiple sources including SAP data
• Ingested data into major targets including Hadoop
• Achieved real-time insights from data leveraging change data capture (CDC) technology
• Reduced costs and labor
GeoWave: Open Source Geospatial/Temporal/N-dimensional Indexing for Accumulo,... (DataWorks Summit)
GeoWave is an open-source library that connects geospatial software with distributed computing frameworks. GeoWave leverages the scalability of a distributed key-value store for effective storage, retrieval, and analysis of massive geospatial datasets. It uses a space filling curve to preserve locality between multi-dimensional objects and the single dimensional sort order imposed by key-value stores. What this means to a user is that distributed spatial and spatial-temporal retrieval and analysis can be effectively accomplished at a massive scale.
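As a toy illustration of the idea in the previous paragraph (a plain Z-order / Morton code, not GeoWave's actual index implementation), interleaving the bits of two coordinates produces a single sortable key in which nearby points tend to stay close together in the key-value store's sort order:

```python
# Toy space filling curve: interleave the bits of x and y (Morton / Z-order
# code) so that two dimensions collapse into one sortable key while roughly
# preserving spatial locality. This is only an illustration of the concept.
def morton_key(x: int, y: int, bits: int = 16) -> int:
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)       # even bit positions come from x
        key |= ((y >> i) & 1) << (2 * i + 1)   # odd bit positions come from y
    return key

# Nearby grid cells map to nearby keys, so a key range scan approximates
# a spatial window query; a distant point lands far away in key space.
for point in [(10, 10), (10, 11), (11, 10), (200, 300)]:
    print(point, morton_key(*point))
```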
At its core, GeoWave solves the problem of multi-dimensional indexing, and particularly extends this capability to spatial/temporal use cases. GeoWave supports raster, vector, and point cloud data, and provides common spatial algorithms that can be extended to create deep analytic capabilities. It also performs fast subsampling via distributed rendering that integrates with GeoServer, so that a user can interactively visualize data at map scale regardless of density.
Our goal in presenting GeoWave to the Hadoop Summit is to introduce it to the big data community. We will present GeoWave at a moderate level of detail, to include a short demonstration, and hopefully answer any questions regarding maturity, suitability and implementation details.
Hadoop Reporting and Analysis - Jaspersoft (Hortonworks)
Hadoop is deployed for a variety of uses, including web analytics, fraud detection, security monitoring, healthcare, environmental analysis, social media monitoring, and other purposes.
It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea... (DataWorks Summit)
The business and technology teams within a health insurer must align the company’s central data platform with its data strategy. That requires substantial organizational alignment. Hear the firsthand perspective from Health Care Service Corporation (HCSC), the largest customer-owned health insurance company in the United States. The speaker will cover how they integrated membership information, regulatory compliance, and the general ledger, to improve overall healthcare management. At HCSC, the strong alignment between executive leadership, business portfolio direction, architectural strategy, technology delivery, and program management have helped create leading-edge capabilities which help the company respond nimbly to a quickly evolving healthcare industry.
Top Trends in Building Data Lakes for Machine Learning and AI (Holden Ackerman)
Presentation by Ashish Thusoo, Co-Founder & CEO at Qubole, exploring the big data industry trends in moving from data warehouses to cloud-based data lakes. This presentation will cover how companies today are seeing a significant rise in the success of their big data projects by moving to the cloud to iteratively build more cost-effective data pipelines and new products with ML and AI.
It also uncovers how services like AWS, Google, Oracle, and Microsoft Azure provide the storage and compute infrastructure to build self-service data platforms that can enable all teams and new products to scale iteratively.
Verizon Centralizes Data into a Data Lake in Real Time for Analytics (Hortonworks)
Verizon Global Technology Services (GTS) was challenged by a multi-tier, labor-intensive process when trying to migrate data from disparate sources into a data lake to create financial reports and business insights.
View the webinar on-demand here: https://hortonworks.com/webinar/verizon-centralizes-data-into-data-lake/
With its large install base in production, the Storm 1.x line has proven itself as a stable and reliable workhorse that scales well horizontally. Much has been learnt from evolving the 1.x line that we can now leverage to build the next generation execution engine. Under the STORM-2284 umbrella, we are working hard to bring you this new engine which is being redesigned at a fundamental level for Storm 2.0. The goal is to dramatically improve performance and enhance Storm's abilities without breaking compatibility.
This improved vertical scaling will help meet the needs of the growing user base by delivering more performance with less hardware.
In this talk, we will take an in-depth look at the existing and proposed designs for Storm's threading model and the messaging subsystem. We will also do a quick run-down of the major proposed improvements and share some early results from the work in progress.
Speaker
Roshan Naik, Senior MTS, Hortonworks
The Power of Intelligent Flows: Real-Time IoT Botnet Classification with Apac... (DataWorks Summit)
The last 5 years have been marked by an explosion of Internet-connected devices. From cars to solar power, from TVs to juice makers, modern life is filled with interconnected smart devices.
But while those ubiquitous devices enhance the interaction with the technology that surrounds us, the lifecycle management of IoT firmware and poor security design choices still present a significant threat to our daily lives.
Despite the ascent of threats like the Mirai botnet, the amount of published research around how to programmatically detect new IoTs in the wild has been somewhat limited.
In this presentation we introduce Data Engineering in the context of cyber security, discuss why it is important to move away from the view that security log pipelines are enrichment and indicator matching tools, and push the boundaries of “Simple Event Processing” to demonstrate how Apache NiFi and Apache MiNiFi’s feature rich dataflows can be used to dynamically identify new IoT botnet activities in the wild.
Speakers
Andre Fucs De Miranda, Independent Consultant, Fluenda
Andy LoPresto, Sr. Member of Technical Staff, Hortonworks
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (DataWorks Summit)
Apache Metron is a streaming cybersecurity application built on Apache Storm and Hadoop. One of its core missions is to enable advanced analytics through machine learning and data science for its users. Because of the relative immaturity of data science platform infrastructure that is integrated into Hadoop and oriented to streaming analytics applications, we have been forced to create the requisite platform components out of necessity, utilizing many pieces of the Hadoop ecosystem.
In this talk, we will speak about the Metron analytics architecture and how it utilizes a custom data science model deployment and autodiscovery service that is tightly integrated with Hadoop via YARN and ZooKeeper. We will discuss how we interact with the models deployed there via a custom domain-specific language that can query models as data streams past. We will also discuss the full-stack data science tooling that has been created to enable data science at scale on an advanced analytics streaming application.
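As a loose sketch of what a deployed model behind such a service can look like (the route, payload fields and scoring logic below are hypothetical illustrations, not Metron's actual MaaS contract), a REST-served model can be as small as:

```python
# Loose sketch of a REST-served scoring model of the kind a model-as-a-service
# layer could deploy and discover. The route, payload fields and "model"
# logic here are hypothetical placeholders.
from flask import Flask, jsonify, request

app = Flask(__name__)

def score_domain(domain: str) -> float:
    # Placeholder heuristic standing in for a trained model:
    # long, digit-heavy domains score as more suspicious.
    digits = sum(ch.isdigit() for ch in domain)
    return min(1.0, (len(domain) + 3 * digits) / 40.0)

@app.route("/apply", methods=["POST"])
def apply_model():
    payload = request.get_json(force=True)
    return jsonify({"is_malicious_score": score_domain(payload.get("domain", ""))})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```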
Speaker:
Casey Stella, Principal Software Engineer/Data Scientist, Hortonworks
How Big Data and Deep Learning are Revolutionizing AML and Financial Crime De... (DataWorks Summit)
Banks, payment providers and capital markets firms are under intense regulatory mandate to process huge amounts of transaction-related data from both traditional and non-traditional sources. Compliance teams need to constantly analyze data in motion (wires, fund transfers, banking transactions) and data at rest (years' worth of historical data) for the actionable intelligence required for Suspicious Activity Reports: to discover illegal activity and provide detailed reporting to authorities. Annual estimates of global money laundering flows range anywhere from $1 trillion to $2 trillion - almost 5% of global GDP. Almost all of this is laundered via retail and merchant banks, payment networks, securities and futures firms, casino services and clubs, and so on - which explains why annual AML-related fines on banking organizations run into the billions and are increasing every year. However, the number of SARs (Suspicious Activity Reports) filed by banking institutions is much higher as a category compared with the numbers filed by these other businesses. In this presentation we will discuss the business imperatives, value drivers and the woeful inadequacy of current technology architectures and approaches in tackling AML. We will then pivot to a deep dive on how Big Data and predictive analytics can ease and solve these vexing challenges that banking executives are grappling with globally.
Speaker
Sanjay Kumar, GM Industry Solutions - Telecom and FS, Hortonworks
Introduction: This workshop will provide a hands-on introduction to basic Machine Learning techniques with Spark ML using a Sandbox on students’ personal machines.
Format: A short introductory lecture on select important supervised and unsupervised Machine Learning techniques followed by a demo, lab exercises and a Q&A session. The lecture will be followed by lab time to work through the lab exercises and ask questions.
Objective: To provide a quick, hands-on introduction to Machine Learning with Spark ML. In the lab, you will use the following components: Apache Zeppelin (a “Modern Data Science Toolbox”) and Apache Spark. You will learn how to analyze the data, structure the data, train Machine Learning models and apply them to answer real-world questions (a short PySpark sketch of this flow follows the speaker listing below).
Pre-requisites: Registrants must bring a laptop that can run the Hortonworks Data Cloud.
Speaker:
Robert Hryniewicz, Developer Advocate, Hortonworks
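Below is a minimal sketch of the kind of supervised Spark ML flow the lab walks through; the CSV path, column names and label are hypothetical, not the actual lab dataset.

```python
# Minimal Spark ML sketch: assemble features and fit a classifier.
# Assumes a CSV with numeric columns dep_hour, distance and a 0/1
# label column named delayed (all hypothetical).
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("spark-ml-intro-sketch").getOrCreate()

df = spark.read.csv("/tmp/flight_delays.csv", header=True, inferSchema=True)
train, test = df.randomSplit([0.8, 0.2], seed=42)

assembler = VectorAssembler(inputCols=["dep_hour", "distance"], outputCol="features")
lr = LogisticRegression(featuresCol="features", labelCol="delayed")
model = Pipeline(stages=[assembler, lr]).fit(train)

# Apply the trained model to held-out data and inspect a few predictions.
model.transform(test).select("delayed", "prediction", "probability").show(5)
```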
DataWorks Summit 2017 - Sydney Keynote
Madhu Kochar, Vice President, Analytics Product Development and Client Success, IBM
Data science holds the promise of transforming businesses and disrupting entire industries. However, many organizations struggle to deploy and scale key technologies such as machine learning and deep learning. IBM will share how it is making data science accessible to all by simplifying the use of a range of open source technologies and data sources, including high performing and open architectures geared for cognitive workloads.
Spark plays an important role in helping data scientists solve all kinds of problems, especially since the release of SparkR, which provides very friendly APIs for traditional data scientists. However, processing various data sizes, data formats and models leads to different application patterns compared with traditional R. In this talk, we will illustrate practical experience using SparkR to solve some typical data science problems, such as performance improvements for SparkR and native R interoperation, how to efficiently load data from HBase, which is a very common data source, how to schedule a large-scale machine learning job composed of multiple single-machine R jobs, how to tune performance for jobs triggered by many different users, and how to use SparkR in a cloud-based environment. Finally, we will briefly introduce the community efforts in progress on SparkR for the coming releases.
Speakers:
Yanbo Liang, Software Engineer, Hortonworks
Casey Stella, Principal Software Engineer/Data Scientist, Hortonworks
Driving in the Desert - Running Your HDP Cluster with Helion, Openstack, and ... (DataWorks Summit)
DataWorks Summit 2017 - Sydney
Alejandro Tesch, Cloud Evangelist, Asia Pacific and Japan, HPE
Big Data is a hot topic for most organisations today as they race to convert vast amounts of data into useful information that can be leveraged to make critical decisions and recommendations within very limited time windows. There is a widely accepted talent gap when it comes to creating and managing Hadoop clusters; even for experts, it can take hours (or days) to get a fully functional Hadoop farm up and running. The HDP Ambari plugin for Sahara addresses most of these challenges by facilitating the deployment of Hortonworks Hadoop clusters and providing a set of open APIs to facilitate data analytics tasks in your own cloud. In this presentation we will cover why it makes sense to run your data analytics cluster in your cloud, and we will demonstrate basic Sahara / Ambari functionality.
The Apache Way describes the community patterns and style of governance that all projects at the Apache Software Foundation are guided by. With a span of more than 20 years, and now more than 300 projects, the Apache Way has helped to establish long lasting, diverse communities of volunteers who collaborate to build software used by millions of users worldwide.
In this talk, I’ll outline the underlying principles of the Apache Way, what this means for projects and their ecosystems, and how the Apache Software Foundation is structured to support such a large number of projects.
Speaker
Brett Porter, Director, Apache Software Foundation
Data Guarantees and Fault Tolerance in Streaming Systems (DataWorks Summit)
Does your big data streaming pipeline have a hole in its pocket? Streaming involves gathering data, processing it and delivering the results to the intended destinations in real time. Glitches at any stage can cause data loss unless the products employed in the pipeline provide the necessary guarantees and are configured properly to deliver on those guarantees.
Real-time stream processing brings unique challenges with respect to data handling guarantees and fault tolerance. Each streaming product comes with a unique approach to tackling these problems. When assembling a streaming pipeline, it is important to understand this critical topic for proper selection and configuration of the individual components of the pipeline. Determining whether you are missing records in your data lake can be an expensive exercise, and tracking down the cause to prevent it from recurring can be extremely difficult.
To help you build reliable streaming pipelines, this talk will give you a better understanding of the problems involved in real-time streaming, the kinds of guarantees involved, and how they are handled in popular open source products such as Storm, Flink, Kafka, Hive Streaming APIs and Flume.
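For example, on the Kafka leg of such a pipeline, producer-side delivery guarantees are largely a matter of configuration. A minimal sketch with the confluent-kafka Python client (assuming a 1.0+ client; the broker address and topic are hypothetical):

```python
# Sketch of producer-side delivery guarantees with the confluent-kafka
# client: acks=all plus idempotence means retries cannot lose or duplicate
# records on the producer side (broker address and topic are hypothetical).
from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "broker1:9092",
    "acks": "all",                 # wait for all in-sync replicas to acknowledge
    "enable.idempotence": True,    # producer retries cannot create duplicates
})

def report_delivery(err, msg):
    # Surface failed deliveries instead of silently losing records.
    if err is not None:
        print(f"delivery failed: {err}")

producer.produce("risk-events", value=b'{"id": 1}', on_delivery=report_delivery)
producer.flush()                   # block until outstanding sends complete
```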
Speaker
Roshan Naik, Senior MTS, Hortonworks
Introduction: This workshop will provide a hands-on introduction to Apache Hadoop using the HDP Sandbox on students’ personal machines.
Format: A short introductory lecture about Apache Hadoop and a few key additional Apache projects in the extended ecosystem used in the lab, followed by a demo, lab exercises and a Q&A session.
Objective: To provide a quick, hands-on introduction to Hadoop. This lab will use the following Hadoop components: HDFS, YARN, Apache Pig, Apache Hive, Apache Spark, and Apache Ambari User Views. You will learn how to move data into HDFS, explore the data, clean the data, issue SQL queries and then build a report with Apache Zeppelin (a short sketch of this flow follows the speaker listing below).
Pre-requisites: Registrants must bring a laptop and have the Hortonworks Sandbox installed.
Speaker:
Rafael Coss, Data Community Developer Advocate, Hortonworks
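Below is a short sketch of the lab flow referenced in the objective above; the file names, paths and columns are hypothetical, not the actual lab dataset.

```python
# Sketch of the lab flow: copy a local file into HDFS, then clean and
# query it with Spark SQL (file names and columns are hypothetical).
import subprocess
from pyspark.sql import SparkSession

# Step 1: land the raw file in HDFS (equivalent to using the Ambari Files view).
subprocess.run(["hdfs", "dfs", "-mkdir", "-p", "/user/lab/trucks"], check=True)
subprocess.run(["hdfs", "dfs", "-put", "-f", "trucks.csv", "/user/lab/trucks/"], check=True)

spark = SparkSession.builder.appName("hadoop-crash-course-sketch").getOrCreate()

# Step 2: explore and clean - drop rows with missing mileage or fuel readings.
trucks = spark.read.csv("/user/lab/trucks/trucks.csv", header=True, inferSchema=True)
clean = trucks.dropna(subset=["miles", "gas"])

# Step 3: issue the kind of SQL query that would later be charted in Zeppelin.
clean.createOrReplaceTempView("trucks")
spark.sql("SELECT driverid, SUM(miles) / SUM(gas) AS mpg FROM trucks GROUP BY driverid").show()
```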
DataWorks Summit 2017 - Sydney Keynote
Scott Gnau, Chief Technology Officer, Hortonworks
Data has become the most valuable asset for every enterprise. As businesses undergo data transformation, leading organizations are turning to data science and machine learning to drive more business value out of their data. In this talk, Scott will examine the trends and the key requirements needed to evolve to next-generation analytics and operations.
The Future of Data in Telecom and the Rise of Connected Communities (DataWorks Summit)
The Telecom Industry has been at the heart of the new Digital Transformation Era with the focal points of the smart phone, digital content, and social media allowing us to interact with the new world. This digital consumer era is overlapping with the connected world of smart cities, connected car, smart homes, connected medical devices, and the industrial internet of things. This is giving rise to the new connected communities, member networks or marketplaces sharing data and insights across these new functional ecosystems. Service providers are being challenged to better manage the relationship and the experiences of their customers, build and leverage smart networks, and drive or facilitate these new digital services and connected communities.
Join Sanjay Kumar, General Manager of the Telecom Industry at Hortonworks, as he facilitates an interactive session with your peers and other executives from service providers and operators to discuss the future of data in telecom and the rise of connected communities. Share your experiences and hear from other telecom service providers on their journey with big data and the transformation to becoming a data-driven organization.
During this interactive session, areas of discussion will include:
- Big data and real-time analytics for enhancing customer experience and targeted advertising
- Analytics driven next generation OSS and the Smart Network
- Connected communities that leverage blockchain with big data
In the age of the connected world, hear from industry leaders about the role of the service provider and their relationship with their customers and connected communities across this new digital ecosystem.
Speaker
Sanjay Kumar, General Manager, Telecom & FS, Hortonworks
Introduction: This workshop will provide a hands-on introduction to Apache Spark using the HDP Sandbox on students’ personal machines.
Format: A short introductory lecture about Apache Spark components used in the lab followed by a demo, lab exercises and a Q&A session. The lecture will be followed by lab time to work through the lab exercises and ask questions.
Objective: To provide a quick, hands-on introduction to Apache Spark. This lab will use the following Spark and Apache Hadoop components: Spark, Spark SQL, Apache Hadoop HDFS, Apache Hadoop YARN, Apache ORC, and Apache Ambari User Views. You will learn how to move data into HDFS using Spark APIs, create Apache Hive tables, explore the data with Spark and Spark SQL, transform the data and then issue some SQL queries (a short sketch of this flow follows the speaker listing below).
Pre-requisites: Registrants must bring a laptop that can run the Hortonworks Data Cloud.
Speaker:
Robert Hryniewicz, Developer Advocate, Hortonworks
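Below is a short sketch of the flow described in the objective above; the CSV path, table and column names are hypothetical, not the actual lab dataset.

```python
# Sketch of the lab flow: ingest a CSV with Spark, store it as an ORC-backed
# Hive table, then query it with Spark SQL (paths and columns are hypothetical).
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("spark-crash-course-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Move raw data into HDFS-backed storage via the DataFrame API.
geo = spark.read.csv("/tmp/geolocation.csv", header=True, inferSchema=True)
geo.write.format("orc").mode("overwrite").saveAsTable("geolocation")

# Transform and query with Spark SQL.
spark.sql("""
    SELECT driverid, COUNT(*) AS risky_events
    FROM geolocation
    WHERE event != 'normal'
    GROUP BY driverid
    ORDER BY risky_events DESC
""").show(10)
```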
Apache Zeppelin has become a popular way to unlock the value of the data lake, thanks to its user interface and appeal to business users. These business users ask their IT departments for access to Zeppelin. Enterprise IT departments want to help their business users, but they have several enterprise concerns, such as security, integration with their corporate LDAP/AD, scalability in a multi-user environment, and integration with Ranger and Kerberos. This session will walk through these enterprise concerns and how they can be handled with Zeppelin.
Speaker
Simon Elliston Ball, Director Product Management, Cyber Security, Hortonworks
Using Apache Hadoop and related technologies as a data warehouse has been an area of interest since the early days of Hadoop. In recent years Hive has made great strides towards enabling data warehousing by expanding its SQL coverage, adding transactions, and enabling sub-second queries with LLAP. But data warehousing requires more than a full powered SQL engine. Security, governance, data movement, workload management, monitoring, and user tools are required as well. These functions are being addressed by other Apache projects such as Ranger, Atlas, Falcon, Ambari, and Zeppelin. This talk will examine how these projects can be assembled to build a data warehousing solution. It will also discuss features and performance work going on in Hive and the other projects that will enable more data warehousing use cases. These include use cases like data ingestion using merge, support for OLAP cubing queries via Hive’s integration with Druid, expanded SQL coverage, replication of data between data warehouses, advanced access control options, data discovery, and user tools to manage, monitor, and query the warehouse.
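As a hedged sketch of the merge-based ingestion mentioned above (the host, database, table and column names are hypothetical, and the target must already be a transactional, bucketed ORC table), a batch of changed records can be applied to a Hive ACID table through PyHive:

```python
# Hedged sketch of applying a change batch to a Hive ACID table with MERGE
# via PyHive; host, schema, table and column names are hypothetical.
from pyhive import hive

conn = hive.Connection(host="hiveserver2.example.com", port=10000, username="etl")
cur = conn.cursor()

# Delete rows flagged for deletion, update the rest of the matches,
# and insert brand-new customer records in a single statement.
cur.execute("""
    MERGE INTO warehouse.customers AS t
    USING staging.customer_changes AS s
    ON t.customer_id = s.customer_id
    WHEN MATCHED AND s.op = 'D' THEN DELETE
    WHEN MATCHED THEN UPDATE SET email = s.email, city = s.city
    WHEN NOT MATCHED THEN INSERT VALUES (s.customer_id, s.email, s.city)
""")

cur.close()
conn.close()
```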
Speaker
Alan Gates, Co-founder, Hortonworks
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFi (DataWorks Summit)
Apache NiFi provided a revolutionary data flow management system with a broad range of integrations with existing data production, consumption, and analysis ecosystems, all covered with robust data delivery and provenance infrastructure. Now learn about the follow-on project which expands the reach of NiFi to the edge, Apache MiNiFi. MiNiFi is a lightweight application which can be deployed on hardware orders of magnitude smaller and less powerful than the existing standard data collection platforms. With both a JVM compatible and native agent, MiNiFi allows data collection in brand new environments — sensors with tiny footprints, distributed systems with intermittent or restricted bandwidth, and even disposable or ephemeral hardware. Not only can this data be prioritized and have some initial analysis performed at the edge, it can be encrypted and secured immediately. Local governance and regulatory policies can be applied across geopolitical boundaries to conform with legal requirements. And all of this configuration can be done from central command & control using an existing NiFi with the trusted and stable UI data flow managers already love.
Expected prior knowledge / intended audience: developers and data flow managers should have passing knowledge of Apache NiFi as a platform for routing, transforming, and delivering data through systems (a brief overview will be provided). The talk will focus on extending the data collection, routing, provenance, and governance capabilities of NiFi to IoT/edge integration via MiNiFi.
Speaker
Andy LoPresto, Sr. Member of Technical Staff, Hortonworks
Performance Update: When Apache ORC Met Apache Spark (DataWorks Summit)
Apache Spark 1.4 introduced support for Apache ORC. However, it initially did not take advantage of the full power of ORC: for instance, it was slow because ORC vectorization was not used, and predicate push-down was not supported on DATE types. Recently the Apache Spark community has started to use the latest Apache ORC, which includes new enhancements that address these limitations. In this talk, we show the results of integrating the latest Apache ORC and Apache Spark. We will also review the latest enhancements and the roadmap.
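As a hedged sketch of what this looks like from the Spark side (assuming Spark 2.3 or later, where the native vectorized ORC reader is available; the path and column are hypothetical):

```python
# Sketch of reading ORC with the newer reader, vectorization and predicate
# push-down enabled; configuration keys assume Spark 2.3+.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = (SparkSession.builder
         .appName("orc-spark-sketch")
         .config("spark.sql.orc.impl", "native")               # use the newer ORC reader
         .config("spark.sql.orc.enableVectorizedReader", "true")
         .config("spark.sql.orc.filterPushdown", "true")       # push predicates into ORC
         .getOrCreate())

events = spark.read.orc("/warehouse/events_orc")

# With push-down enabled, this filter can skip ORC stripes whose min/max
# statistics rule them out, instead of scanning the whole dataset.
recent = events.filter(col("event_date") >= "2017-01-01")
print(recent.count())
```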
Speakers:
Owen O'Malley, Co-founder & Technical Fellow, Hortonworks
Dongjoon Hyun, Staff Software Engineer, Hortonworks
Pivotal HD and Spring for Apache Hadoop (marklpollack)
In this webinar we introduce the concepts of Hadoop and dive into some details unique to the Pivotal HD distribution, namely HAWQ, which brings ANSI-compliant SQL to Hadoop.
We also introduce the Spring for Apache Hadoop project, which simplifies developing Hadoop applications by providing a unified configuration model and easy-to-use APIs for HDFS, MapReduce, Pig, Hive, and HBase. It also provides integration with other Spring ecosystem projects such as Spring Integration and Spring Batch, enabling you to develop solutions for big data ingest/export and Hadoop workflow orchestration. The new Spring XD umbrella project is also introduced.
Process and Visualize Your Data with Revolution R, Hadoop and GoogleVisHortonworks
In this session, attendees will learn how to use R in the distributed environment of Hadoop using the rmr package. Additionally, the R package googleVis will be used to show how application development teams can incorporate the power of R and the power of Google Chart Tools into their applications quickly and easily. The result is a rich custom data visualization with far less coding than would otherwise be required. The session will begin by discussing R basics and then move to concrete examples of statistical analysis on data sets. This will be accompanied by an application development example showing custom visualization of the analysis using googleVis. The application development example will show a browser-based app both kicking off the data set analysis using R and visualizing the result. Visualization examples will use both googleVis and basic Google Chart Tools. Attendees will leave the session with a concrete example of how to incorporate R into their existing application development practices and how to use Hadoop and its ecosystem to build custom visualizations.
10 Amazing Things To Do With a Hadoop-Based Data LakeVMware Tanzu
Greg Chase, Director of Product Marketing, presents "10 Amazing Things to Do With a Hadoop-Based Data Lake" at the Strata Conference + Hadoop World 2014 in NYC.
Hortonworks Data In Motion Webinar Series Pt. 2Hortonworks
Learn how Hortonworks DataFlow (HDF), powered by Apache NiFi, MiNiFi, Kafka and Storm, and its associated HDF Certification Program make it easier and faster to integrate different systems. Includes highlights of the latest partner integrations from HPE, SAS, Attunity, Impetus Technologies, Kepware and Midfin Systems.
Watch the webinar on-demand: http://hortonworks.com/webinar/make-big-data-ecosystem-work-better/
HDF Partner certification program: http://hortonworks.com/partners/product-integration-certification/#hdf-integration
Big Data is a hot topic that has captured the attention of the IT industry globally. It is a popular term used to describe the exponential growth and availability of data, both structured and unstructured. Big data may be as important to business – and society – as the Internet has become. More accurate analyses can lead to more confident decision making, and better decisions can mean greater operational efficiency, cost reduction and reduced risk.
This presentation focuses on the why, what and how of big data as we explore some of Microsoft's big data solutions, the HDInsight Azure service and Power BI, providing insights into the world of big data.
Legacy ERP architectures offer an incredibly efficient means of operational resource management, but extracting business insights from them is a real challenge. ERP systems built up over the past 30 years, such as SAP, can be hard to interact with, especially at the source database level: the initial translation of business logic and hierarchies requires significant customization, as does merging those changes into analytical applications. Overall, designing self-service reporting with business-level context can be quite cumbersome; for an example platform like SAP, which contains pre-packaged modules (MM, SD, PP, etc.), it means integrating these systems into a series of pre-built analytics.
This session covers the orchestration and integration of a wide range of open source technologies with commercial CDC and reporting solutions into a reference solution that mimics several real customer scenarios living on relational platforms today. Key considerations of extracting from the operational system of record, especially merging multiple systems in different time zones, will be addressed, along with the integration concerns of an analytics Hadoop platform, using Hive ACID and MERGE as well as flattening techniques for dimensional models. Customers are often limited in the range of data their ERP can retain, with older data offloaded to secondary systems or cold archives. With this approach that limitation goes away, and the opportunities expand to real-time reporting across all of history and new use cases built on advanced machine learning methods.
Speakers
Jordan Martz, Director of Tech Solutions, Attunity
David Freriks, Technology Evangelist, Qlik
Discover HDP 2.1: Apache Solr for Hadoop SearchHortonworks
Apache Solr is the open source platform for searching data stored in Hadoop. Solr powers search on many of the world's largest Internet sites, enabling powerful full-text search and near real-time indexing. Whether users search for tabular, text, geo-location or sensor data in Hadoop, they find it quickly with Apache Solr. Hortonworks Data Platform 2.1 includes Apache Solr.
In this deck from their 30-minute webinar, Rohit Bakhshi, Hortonworks product manager, and Paul Codding, Hortonworks solution engineer, describe how Solr works within HDP's YARN-based architecture.
Supporting Financial Services with a More Flexible Approach to Big DataHortonworks
Financial services companies can reap tremendous benefits from 'Big Data' and they have moved quickly to deploy it. But these companies also place heavy demands on 'Big Data' infrastructure for flexibility, reliability and performance. In this webinar, Hortonworks joins WANDisco to look at three examples of using 'Big Data' to get a more comprehensive view of customer behavior and activity in the banking and insurance industries. Then we'll pull out the common threads from these examples, and see how a flexible next-generation Hadoop architecture lets you get a step up on improving your business performance. Join us to learn:
How to leverage data from across an entire global enterprise
How to analyze a wide variety of structured and unstructured data to get quick, meaningful answers to critical questions
What industry leaders have put in place
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...Hortonworks
Hortonworks Data Platform 2.2 includes HDFS for data storage. In this 30-minute webinar, we discussed data storage innovations, including heterogeneous storage, encryption, and operational security enhancements.
These slides from the Discover HDP 2.2 Webinar Series: Data Storage Innovations in HDFS explore heterogeneous storage, data encryption and operational security.
Ever wonder what Hadoop might look like in 12 months, 24 months or longer? Apache Hadoop MapReduce has undergone a complete overhaul to emerge as Apache Hadoop YARN, a generic compute fabric that supports MapReduce and other application paradigms. As a result, Hadoop looks very different than it did 12 months ago. This talk will take you through some ideas for YARN itself and the myriad ways it is moving the needle for MapReduce, Pig, Hive, Cascading and other data-processing tools in the Hadoop ecosystem.
Chicago Data Summit: Cloudera's Distribution including Apache Hadoop & Cloude...Cloudera, Inc.
This session will discuss what's new in the recently released CDH3 and Enterprise 3.5 products. We'll review how usage of Hadoop has been evolving in the enterprise and how CDH3 and Enterprise 3.5 meet these new challenges with advances in functionality, performance, security and manageability.
Introduction: This workshop will provide a hands-on introduction to Machine Learning (ML) with an overview of Deep Learning (DL).
Format: An introductory lecture on several supervised and unsupervised ML techniques, followed by a light introduction to DL and a short discussion of the current state of the art. Several Python code samples using the scikit-learn library will be introduced that users will be able to run in the Cloudera Data Science Workbench (CDSW).
Objective: To provide a quick, hands-on introduction to ML with Python's scikit-learn library. The environment in CDSW is interactive, and the step-by-step guide will walk you through setting up your environment, exploring datasets, and training and evaluating models on popular datasets. By the end of the crash course, attendees will have a high-level understanding of popular ML algorithms and the current state of DL, what problems they can solve, and will walk away with basic hands-on experience training and evaluating ML models.
Prerequisites: For the hands-on portion, registrants must bring a laptop with a Chrome or Firefox web browser. These labs will be done in the cloud, so no installation is needed. Everyone will be able to register and start using CDSW after the introductory lecture concludes (about one hour in). Basic knowledge of Python is highly recommended.
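For a flavor of the kind of scikit-learn samples used in the workshop (the actual lab notebooks are not reproduced here), a minimal supervised-learning example on a built-in dataset looks like this:

```python
# Minimal supervised-learning example in the spirit of the workshop: train and
# evaluate a classifier on a popular built-in dataset with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
print("Test accuracy:", accuracy_score(y_test, predictions))
```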
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
In a world with a myriad of distributed storage systems to choose from, the majority of Apache HBase clusters still rely on Apache HDFS. Theoretically, any distributed file system could be used by HBase. One major reason HDFS is predominantly used is the specific durability requirements of HBase's write-ahead log (WAL), which HDFS guarantees correctly. However, with sufficient effort, HBase's use of HDFS for WALs can be replaced.
This talk will cover the design of a "Log Service" which can be embedded inside of HBase that provides a sufficient level of durability that HBase requires for WALs. Apache Ratis (incubating) is a library-implementation of the RAFT consensus protocol in Java and is used to build this Log Service. We will cover the design choices of the Ratis Log Service, comparing and contrasting it to other log-based systems that exist today. Next, we'll cover how the Log Service "fits" into HBase and the necessary changes to HBase which enable this. Finally, we'll discuss how the Log Service can simplify the operational burden of HBase.
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
Utilizing Apache NiFi, we read various open data REST APIs and camera feeds to ingest crime and related data in real time, streaming it into HBase and Phoenix tables. HBase makes an excellent storage option for our real-time time-series data sources. We can immediately query our data with Apache Zeppelin against Phoenix tables as well as Hive external tables over HBase.
Apache Phoenix tables also make a great option since we can easily put microservices on top of them for application usage. I have an example Spring Boot application that reads from our Philadelphia crime table for front-end web applications as well as RESTful APIs.
Apache NiFi makes it easy to push records with schemas to HBase and insert into Phoenix SQL tables.
Resources:
https://community.hortonworks.com/articles/54947/reading-opendata-json-and-storing-into-phoenix-tab.html
https://community.hortonworks.com/articles/56642/creating-a-spring-boot-java-8-microservice-to-read.html
https://community.hortonworks.com/articles/64122/incrementally-streaming-rdbms-data-to-your-hadoop.html
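The linked articles use NiFi and a Spring Boot microservice; as an alternative, hedged sketch of querying a Phoenix table directly from Python through the Phoenix Query Server, with a hypothetical URL, table, and columns:

```python
# Illustrative sketch: query a Phoenix table through the Phoenix Query Server
# using the phoenixdb driver. The URL and table/column names are hypothetical.
import phoenixdb

conn = phoenixdb.connect("http://phoenix-queryserver.example.com:8765/", autocommit=True)
cursor = conn.cursor()

# Pull the most recent incidents for a given district.
cursor.execute(
    "SELECT incident_id, dc_dist, dispatch_time, text_general_code "
    "FROM PHILLY_CRIME WHERE dc_dist = ? ORDER BY dispatch_time DESC LIMIT 10",
    ["22"],
)
for row in cursor.fetchall():
    print(row)
conn.close()
```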
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
While HBase is the most logical answer for use cases requiring random, real-time read/write access to Big Data, it is not trivial to design applications that make the most of it, nor is it the simplest system to operate. Because it depends on and integrates with other components from the Hadoop ecosystem (ZooKeeper, HDFS, Spark, Hive, etc.) and external systems (Kerberos, LDAP), and because its distributed nature requires a "Swiss clockwork" infrastructure, many variables must be considered when investigating anomalies or even outages. Adding to the equation, HBase is still an evolving product, with different release versions in use today, some of which carry genuine software bugs. In this presentation, we'll go through the most common HBase issues faced by different organisations, describing the identified causes and resolution actions from my last five years supporting HBase for our heterogeneous customer base.
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
LocationTech GeoMesa enables spatial and spatiotemporal indexing and queries for HBase and Accumulo. In this talk, after an overview of GeoMesa’s capabilities in the Cloudera ecosystem, we will dive into how GeoMesa leverages Accumulo’s Iterator interface and HBase’s Filter and Coprocessor interfaces. The goal will be to discuss both what spatial operations can be pushed down into the distributed database and also how the GeoMesa codebase is organized to allow for consistent use across the two database systems.
OCLC has been using HBase since 2012 to enable single-search-box access to over a billion items from your library and the world’s library collection. This talk will provide an overview of how HBase is structured to provide this information, some of the challenges OCLC has encountered scaling to support the world catalog, and how they have overcome them.
Many individuals/organizations have a desire to utilize NoSQL technology, but often lack an understanding of how the underlying functional bits can be utilized to enable their use case. This situation can result in drastic increases in the desire to put the SQL back in NoSQL.
Since the initial commit, Apache Accumulo has provided a number of examples to help jumpstart comprehension of how some of these bits function as well as potentially help tease out an understanding of how they might be applied to a NoSQL friendly use case. One very relatable example demonstrates how Accumulo could be used to emulate a filesystem (dirlist).
In this session we will walk through the dirlist implementation. Attendees should come away with an understanding of the supporting table designs, a simple text search supporting a single wildcard (on file/directory names), and how the dirlist elements work together to accomplish its feature set. Attendees should (hopefully) also come away with a justification for sometimes keeping the SQL out of NoSQL.
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
Data serves as the platform for decision-making at Uber. To facilitate data driven decisions, many datasets at Uber are ingested in a Hadoop Data Lake and exposed to querying via Hive. Analytical queries joining various datasets are run to better understand business data at Uber.
Data ingestion, at its most basic form, is about organizing data to balance efficient reading and writing of newer data. Data organization for efficient reading involves factoring in query patterns to partition data so that read amplification stays low. Data organization for efficient writing involves factoring in the nature of the input data - whether it is append-only or updatable.
At Uber we ingest terabytes of data into many critical tables, such as trips, that are updatable. These tables are a fundamental part of Uber's data-driven solutions and act as the source of truth for all the analytical use cases across the entire company. Datasets such as trips constantly receive updates to the data in addition to inserts. To ingest such datasets we need a critical component that is responsible for bookkeeping information about the data layout and annotating each incoming change with the location in HDFS where that data should be written. This component is called Global Indexing. Without it, all records get treated as inserts and re-written to HDFS instead of being updated, which leads to duplication of data, breaking data correctness and user queries. This component is key to scaling our jobs, which now handle more than 500 billion writes a day in our current ingestion systems, and it needs strong consistency and high throughput for index writes and reads.
At Uber, we have chosen HBase as the backing store for the Global Indexing component, which is critical in allowing us to scale our jobs to more than 500 billion writes a day in our current ingestion systems. In this talk, we will discuss data@Uber, expound on why we built the global index using Apache HBase, and explain how this helps scale out our cluster usage. We'll give details on why we chose HBase over other storage systems, how and why we came up with a creative solution to automatically load HFiles directly to the backend (circumventing the normal write path when bootstrapping our ingestion tables to avoid QPS constraints), as well as other lessons we learned bringing this system into production at the scale of data that Uber encounters daily.
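The abstract does not publish Uber's actual schema; as a loose conceptual sketch only, a global-index lookup and update against HBase from Python (with an invented table layout and key format) might look like the following:

```python
# Conceptual sketch of a global-index lookup in HBase using happybase.
# Table name, column family, and key layout are hypothetical, not Uber's actual schema.
import happybase

connection = happybase.Connection("hbase-gateway.example.com")
index = connection.table("trips_global_index")

def locate_record(record_key: bytes):
    """Return the HDFS file/partition where this record currently lives, or None."""
    row = index.row(record_key, columns=[b"loc:file", b"loc:partition"])
    if not row:
        return None  # unseen key: treat the incoming change as an insert
    return row[b"loc:file"], row[b"loc:partition"]

def register_record(record_key: bytes, file_id: bytes, partition: bytes):
    """Record (or update) the layout location for a key after it is written."""
    index.put(record_key, {b"loc:file": file_id, b"loc:partition": partition})

print(locate_record(b"trip#2019-06-01#abc123"))
```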
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
Recently, Apache Phoenix has been integrated with Apache (incubator) Omid transaction processing service, to provide ultra-high system throughput with ultra-low latency overhead. Phoenix has been shown to scale beyond 0.5M transactions per second with sub-5ms latency for short transactions on industry-standard hardware. On the other hand, Omid has been extended to support secondary indexes, multi-snapshot SQL queries, and massive-write transactions.
These innovative features make Phoenix an excellent choice for translytics applications, which allow converged transaction processing and analytics. We share the story of building the next-gen data tier for advertising platforms at Verizon Media that exploits Phoenix and Omid to support multi-feed real-time ingestion and AI pipelines in one place, and discuss the lessons learned.
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
Cybersecurity requires an organization to collect data, analyze it, and alert on cyber anomalies in near real time. This is a challenging endeavor when considering the variety of data sources that need to be collected and analyzed: everything from application logs, network events, authentication systems, IoT devices, business events, cloud service logs, and more needs to be taken into consideration. In addition, multiple data formats need to be transformed and conformed so they can be understood by both humans and ML/AI algorithms.
To solve this problem, the Aetna Global Security team developed the Unified Data Platform based on Apache NiFi, which allows them to remain agile and adapt to new security threats and the onboarding of new technologies in the Aetna environment. The platform currently has over 60 different data flows with 95% doing real-time ETL and handles over 20 billion events per day. In this session learn from Aetna’s experience building an edge to AI high-speed data pipeline with Apache NiFi.
In the healthcare sector, data security, governance, and quality are crucial for maintaining patient privacy and ensuring the highest standards of care. At Florida Blue, the leading health insurer of Florida serving over five million members, there is a multifaceted network of care providers, business users, sales agents, and other divisions relying on the same datasets to derive critical information for multiple applications across the enterprise. However, maintaining consistent data governance and security for protected health information and other extended data attributes has always been a complex challenge that did not easily accommodate the wide range of needs for Florida Blue’s many business units. Using Apache Ranger, we developed a federated Identity & Access Management (IAM) approach that allows each tenant to have their own IAM mechanism. All user groups and roles are propagated across the federation in order to determine users’ data entitlement and access authorization; this applies to all stages of the system, from the broadest tenant levels down to specific data rows and columns. We also enabled audit attributes to ensure data quality by documenting data sources, reasons for data collection, date and time of data collection, and more. In this discussion, we will outline our implementation approach, review the results, and highlight our “lessons learned.”
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
Presto, an open source distributed SQL engine, is widely recognized for its low-latency queries, high concurrency, and native ability to query multiple data sources. Proven at scale in a variety of use cases at Airbnb, Bloomberg, Comcast, Facebook, FINRA, LinkedIn, Lyft, Netflix, Twitter, and Uber, in the last few years Presto experienced an unprecedented growth in popularity in both on-premises and cloud deployments over Object Stores, HDFS, NoSQL and RDBMS data stores.
With the ever-growing list of connectors to new data sources such as Azure Blob Storage, Elasticsearch, Netflix Iceberg, Apache Kudu, and Apache Pulsar, the recently introduced Cost-Based Optimizer in Presto must account for heterogeneous inputs with differing and often incomplete data statistics. This talk will explore this topic in detail, discuss the best use cases for Presto across several industries, and present recent Presto advancements such as geospatial analytics at scale and the project roadmap going forward.
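As a small, hedged illustration of querying Presto from Python (the coordinator host, catalog, and table are placeholders, not anything tied to the talk), one option is the PyHive Presto client:

```python
# Illustrative Presto query from Python using PyHive's presto module.
# Coordinator host, catalog, schema, and table names are hypothetical.
from pyhive import presto

conn = presto.connect(
    host="presto-coordinator.example.com",
    port=8080,
    username="analyst",
    catalog="hive",
    schema="web",
)
cursor = conn.cursor()
cursor.execute("""
    SELECT page, count(*) AS views
    FROM pageviews
    WHERE view_date >= DATE '2019-01-01'
    GROUP BY page
    ORDER BY views DESC
    LIMIT 10
""")
for page, views in cursor.fetchall():
    print(page, views)
```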
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
Specialized tools for machine learning development and model governance are becoming essential. MLflow is an open source platform for managing the machine learning lifecycle. Just by adding a few lines of code to the function or script that trains their model, data scientists can log parameters, metrics, artifacts (plots, miscellaneous files, etc.) and a deployable packaging of the ML model. Every time that function or script is run, the results are logged automatically as a byproduct of those added lines, even if the person doing the training run makes no special effort to record the results. MLflow application programming interfaces (APIs) are available for the Python, R and Java programming languages, and MLflow sports a language-agnostic REST API as well. Over a relatively short time period, MLflow has garnered more than 3,300 stars on GitHub, almost 500,000 monthly downloads and 80 contributors from more than 40 companies. Most significantly, more than 200 companies are now using MLflow. We will demo the MLflow Tracking, Project and Model components with Azure Machine Learning (AML) Services and show you how easy it is to get started with MLflow on-prem or in the cloud.
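A minimal sketch of what those "few lines of code" can look like in a training script (the dataset and model below are illustrative, not taken from the demo):

```python
# Minimal MLflow tracking sketch: log a parameter, a metric, and the trained model.
# The dataset and model choice are illustrative.
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

with mlflow.start_run():
    C = 0.5
    model = LogisticRegression(C=C, max_iter=1000).fit(X_train, y_train)

    mlflow.log_param("C", C)                                          # hyperparameter
    mlflow.log_metric("test_accuracy", model.score(X_test, y_test))   # evaluation metric
    mlflow.sklearn.log_model(model, "model")                          # deployable model artifact
```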
Extending Twitter's Data Platform to Google CloudDataWorks Summit
Twitter's Data Platform is built using multiple complex open source and in-house projects to support data analytics on hundreds of petabytes of data. Our platform supports storage, compute, data ingestion, discovery and management, plus various tools and libraries that help users with both batch and real-time analytics. Our Data Platform operates on multiple clusters across different data centers to help thousands of users discover valuable insights. As we scaled our Data Platform to multiple clusters, we also evaluated various cloud vendors to support use cases outside of our data centers. In this talk we share our architecture and how we extend our data platform to use the cloud as another data center. We walk through our evaluation process, discuss the challenges we faced supporting data analytics at Twitter scale in the cloud, and present our current solution. Extending Twitter's Data Platform to the cloud was a complex task, which we dive into deeply in this presentation.
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
At Comcast, our team has been architecting a customer experience platform which is able to react to near-real-time events and interactions and deliver appropriate and timely communications to customers. By combining the low latency capabilities of Apache Flink and the dataflow capabilities of Apache NiFi we are able to process events at high volume to trigger, enrich, filter, and act/communicate to enhance customer experiences. Apache Flink and Apache NiFi complement each other with their strengths in event streaming and correlation, state management, command-and-control, parallelism, development methodology, and interoperability with surrounding technologies. We will trace our journey from starting with Apache NiFi over three years ago and our more recent introduction of Apache Flink into our platform stack to handle more complex scenarios. In this presentation we will compare and contrast which business and technical use cases are best suited to which platform and explore different ways to integrate the two platforms into a single solution.
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
Companies are increasingly moving to the cloud to store and process data. One of the challenges companies have is in securing data across hybrid environments with easy way to centrally manage policies. In this session, we will talk through how companies can use Apache Ranger to protect access to data both in on-premise as well as in cloud environments. We will go into details into the challenges of hybrid environment and how Ranger can solve it. We will also talk through how companies can further enhance the security by leveraging Ranger to anonymize or tokenize data while moving into the cloud and de-anonymize dynamically using Apache Hive, Apache Spark or when accessing data from cloud storage systems. We will also deep dive into the Ranger’s integration with AWS S3, AWS Redshift and other cloud native systems. We will wrap it up with an end to end demo showing how policies can be created in Ranger and used to manage access to data in different systems, anonymize or de-anonymize data and track where data is flowing.
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
Advanced Big Data Processing frameworks have been proposed to harness the fast data transmission capability of Remote Direct Memory Access (RDMA) over high-speed networks such as InfiniBand, RoCEv1, RoCEv2, iWARP, and OmniPath. However, with the introduction of the Non-Volatile Memory (NVM) and NVM express (NVMe) based SSD, these designs along with the default Big Data processing models need to be re-assessed to discover the possibilities of further enhanced performance. In this talk, we will present, NRCIO, a high-performance communication runtime for non-volatile memory over modern network interconnects that can be leveraged by existing Big Data processing middleware. We will show the performance of non-volatile memory-aware RDMA communication protocols using our proposed runtime and demonstrate its benefits by incorporating it into a high-performance in-memory key-value store, Apache Hadoop, Tez, Spark, and TensorFlow. Evaluation results illustrate that NRCIO can achieve up to 3.65x performance improvement for representative Big Data processing workloads on modern data centers.
Background: Some early applications of Computer Vision in Retail arose from e-commerce use cases - but increasingly, it is being used in physical stores in a variety of new and exciting ways, such as:
● Optimizing merchandising execution, in-stocks and sell-thru
● Enhancing operational efficiencies and enabling real-time customer engagement
● Enhancing loss prevention capabilities and response time
● Creating frictionless experiences for shoppers
Abstract: This talk will cover the use of Computer Vision in Retail, the implications to the broader Consumer Goods industry and share business drivers, use cases and benefits that are unfolding as an integral component in the remaking of an age-old industry.
We will also take a ‘peek under the hood’ of Computer Vision and Deep Learning, sharing technology design principles and skill set profiles to consider before starting your CV journey.
Deep learning has matured considerably in the past few years to produce human or superhuman abilities in a variety of computer vision paradigms. We will discuss ways to recognize these paradigms in retail settings, collect and organize data to create actionable outcomes with the new insights and applications that deep learning enables.
We will cover the basics of object detection, then move into the advanced processing of images, describing possible ways that a retail store of the near future could operate: identifying various storefront situations by attaching a deep learning system to a camera stream, such as item stock levels on shelves, a shelf in need of organization, or perhaps a wandering customer in need of assistance.
We will also cover how to use a computer vision system to automatically track customer purchases to enable a streamlined checkout process, and how deep learning can power plausible wardrobe suggestions based on what a customer is currently wearing or purchasing.
Finally, we will cover the various technologies that are powering these applications today: deep learning tools for research and development, production tools to distribute that intelligence to the entire inventory of cameras situated around a retail location, and tools for exploring and understanding the new data streams produced by the computer vision systems.
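As a small, hedged illustration of the object-detection building block discussed above, the sketch below runs a generic pretrained COCO detector on a single frame; it is not a retail-specific system, and the image path is hypothetical.

```python
# Illustrative object detection on a single camera frame with a pretrained model.
# This is a generic COCO-trained detector, not a retail-specific system.
import torch
import torchvision
from torchvision import transforms
from PIL import Image

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()

frame = Image.open("shelf_frame.jpg").convert("RGB")   # hypothetical camera frame
tensor = transforms.ToTensor()(frame)

with torch.no_grad():
    detections = model([tensor])[0]

# Keep confident detections; labels are COCO class indices (e.g. person, bottle).
for box, label, score in zip(detections["boxes"], detections["labels"], detections["scores"]):
    if score > 0.8:
        print(int(label), float(score), [round(float(v), 1) for v in box])
```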
By the end of this talk, attendees should understand the impact Computer Vision and Deep Learning are having in the Consumer Goods industry, key use cases, techniques and key considerations leaders are exploring and implementing today.
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
Whole-genome shotgun-based next-generation transcriptomics and metagenomics studies often generate 100 to 1,000 gigabytes (GB) of sequence data derived from tens of thousands of different genes or microbial species. De novo assembly of these data requires a solution that both scales with data size and optimizes for individual genes or genomes. Here we developed an Apache Spark-based scalable sequence clustering application, SparkReadClust (SpaRC), that partitions the reads based on their molecule of origin to enable downstream assembly optimization. SpaRC produces high clustering performance on transcriptomics and metagenomics test datasets from both short-read and long-read sequencing technologies. It achieved near-linear scalability with respect to input data size and the number of compute nodes. SpaRC can run on different cloud computing environments without modification while delivering similar performance. In summary, our results suggest SpaRC provides a scalable solution for clustering billions of reads from next-generation sequencing experiments, and that Apache Spark represents a cost-effective solution with rapid development/deployment cycles for similar big data genomics problems.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Search and Society: Reimagining Information Access for Radical FuturesBhaskar Mitra
The field of Information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build, inspired by diverse, explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies need to be explicitly articulated, and we need to develop theories of change in the context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to release software to market, along with traditionally slow and manual security checks, has caused gaps in continuous security as an important piece of the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface of their application supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
"Impact of front-end architecture on development cost", Viktor TurskyiFwdays
I have heard many times that architecture is not important for the front-end. Also, many times I have seen how developers implement features on the front-end just following the standard rules for a framework and think that this is enough to successfully launch the project, and then the project fails. How to prevent this and what approach to choose? I have launched dozens of complex projects and during the talk we will analyze which approaches have worked for me and which have not.
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology pushes into IT, I have been wondering, as an "infrastructure container Kubernetes guy", how this fancy AI technology gets managed from an infrastructure operations point of view. Is it possible to apply our lovely cloud native principles as well? What benefits could both technologies bring to each other?
Let me take these questions and provide you with a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premise strategy we may need to apply it to our own infrastructure and make it work from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and of what could benefit or limit your AI use cases in an enterprise environment. An interactive demo will give you some insights into the approaches I already have working for real.
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details on how to best design a sturdy architecture within ODC.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Speakers:
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
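For a quick taste of that Python binding ahead of the workshop (this is a generic getting-started sketch, not the workshop notebook), loading a bundled example network and running an AC power flow could look like this:

```python
# Quick taste of the PowSyBl Python binding: load a bundled example network
# and run an AC power flow. This is not the workshop notebook itself.
import pypowsybl as pp

network = pp.network.create_ieee14()          # bundled IEEE 14-bus test network
results = pp.loadflow.run_ac(network)         # AC power flow
print(results[0].status)                      # convergence status of the main component

# Inspect bus voltages computed by the load flow.
print(network.get_buses()[["v_mag", "v_angle"]].head())
```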
TALKING POINTS
Welcome to DataWorks Summit – Why name change?
Thank community for their support
Market & community growth perspective?
Approaching 6 years of Hwx
Scott intro
I'm going to spend a little bit of time this morning walking you through the journey that we've taken over the last four years with the data in motion concept.
I’ll spend a few minutes after that walking you through the details of what we mean when we say data in motion. Many of you will hear quite a few buzzwords over the next few days and I want to make sure they land on a technical level.
And then we're going to spend the bulk of the time with me showing you in detail something we're very excited about - streaming analytics manager.
In 2014 we came out and gave a keynote where we described how you can use HDP to do real time visualization of data.
At that time there was a big shift happening where people understood what they could do in HDP, specifically to analyze data at scales they never could before. But they were starting to have much more interest in lower latency analysis.
We presented how you can do this type of processing and analysis, however we totally hand waved on how the data gets there in the first place.
Then in 2015 we came out and described how to enhance the data as it lands into the cluster. As it's arriving enrich it and make it more useful for analysis and visualization. Again still very much hand waving on how the data gets there in the first place.
In 2016 we wanted to expand our view and we wanted to help customers and companies get much better at how they collect the data, drive it throughout the enterprise and deliver it to the cluster. This time though we hand waved about how you do stream processing on the data.
This year we are very excited that we could come out and tell a truly balanced end to end story and show solutions for how you can collect, process, visualize and understand data in motion all the way through and that's what HDF 3.0 is all about.
Data in motion across the enterprise starts at the edge. For some of you the edge maybe planes, trains and automobiles for others it's traditional enterprise assets; servers, workstations, laptops, network devices and so on.
For us it's wherever the data life cycle begins, right at the first moment that it's created. From the very first observation we want to help you command and manage the data all the way through to the next hop, whether that's a regional gateway, a core data center or the cloud. Full end-to-end processing of the data along its journey.
Then to be able to do stream processing as it arrives and provide powerful visualizations allowing you to interact with the data all while it's in motion.
All of this in time for you to extract the maximum value from the data. As we all know, in many cases the value of data is perishable and diminishes over time, so we want to help you operate on it in time.
As you think about this problem from end to end, there are a few cross-cutting concerns that we have to address all the way through the chain.
First and foremost is security, as we move to the edge we step out of the cozy confines of the enterprise where we have LDAP, Active Directory, Kerberos and other technologies which often aren't available to us outside of the walls of the enterprise.
We need to think about end to end security and we have to be able to shape shift so that we can use the best and most effective techniques all the way through.
We also want to make sure that you understand the origin and attribution of every piece of data. Everywhere that it comes from and everywhere that it goes. Total lineage. Not just once it lands in the cluster but from the very moment it's created to all the systems that use it. Understanding latencies is also a very important part of that story and a big piece of the governance story.
Finally, a very critical part of our story is command and control. Building these things at scale is very difficult. Specifically, it's difficult because once you get it set up, you as an organization want to make changes to it. It's not static; at least if you're doing it right it's not static, and you need to be agile. Having a really powerful command and control mechanism to allow that agility is a critical part of this story.
Let me set the stage for the demo that I’m going to show you.
Let’s imagine we are a transportation company that has a fleet of trucks driving around and we want to gather data from sensors on the vehicles.
To do that we are going to use MiNiFi, which is a sub-project of Apache NiFi that we make available through HDF.
With MiNiFi we'll acquire the data from a variety of sensors, do the initial analysis so we can understand the relative value of the data and prioritize it. Then we will use the most effective and most appropriate communication mechanism available to us. Maybe that's activating an LTE signal to send out really critical time sensitive data or maybe for data that has less time sensitivity we buffer it and send it out when we have a WiFi signal.
The data then arrives at a more regional or core location, and here we use technologies like Apache NiFi and Apache Kafka.
Perhaps we want to normalize the events, add customer reference data to them, tokenize them, or further enrich them.
We can then syndicate the events for a wide range of use cases as well as drive them into systems like Hadoop and Spark.
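A stripped-down, hypothetical sketch of that enrich-and-syndicate step using plain Kafka clients from Python (topics, fields, and the reference lookup are invented for illustration; the actual demo uses NiFi and Kafka):

```python
# Stripped-down sketch of consuming vehicle events from Kafka, enriching them with
# reference data, and publishing them to a downstream topic. Topics, fields, and the
# lookup table are hypothetical.
import json
from kafka import KafkaConsumer, KafkaProducer

CUSTOMER_REFERENCE = {"truck-17": {"fleet": "northeast", "owner": "acme-logistics"}}

consumer = KafkaConsumer(
    "raw-truck-events",
    bootstrap_servers="kafka.example.com:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
producer = KafkaProducer(
    bootstrap_servers="kafka.example.com:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

for message in consumer:
    event = message.value
    event.update(CUSTOMER_REFERENCE.get(event.get("vehicle_id"), {}))  # enrich
    producer.send("enriched-truck-events", event)                      # syndicate
```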
Now I'm very excited to be able to tell you what we can do with that data while it's still moving, while it's still very fresh, using the Streaming Analytics Manager.
With SAM we can do various things such as:
perform window based processing
temporal spatial correlation
evaluate models
aggregate
enrich
all with a schema applied to the data.
Having a schema is critical when building a Streaming application, it allows you to see and understand how your data evolves.
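To make "window-based processing" concrete, here is a purely conceptual plain-Python illustration of a tumbling-window aggregation over made-up vehicle events; SAM builds the equivalent visually rather than in code.

```python
# Conceptual illustration of tumbling-window aggregation over a stream of events.
# SAM expresses this visually; events and fields here are made up.
from collections import defaultdict

WINDOW_SECONDS = 60

events = [
    {"vehicle_id": "truck-17", "ts": 3,  "speed": 61},
    {"vehicle_id": "truck-17", "ts": 45, "speed": 72},
    {"vehicle_id": "truck-09", "ts": 70, "speed": 55},
]

# Group each event into the 60-second window that contains its timestamp.
windows = defaultdict(list)
for event in events:
    window_start = (event["ts"] // WINDOW_SECONDS) * WINDOW_SECONDS
    windows[(event["vehicle_id"], window_start)].append(event["speed"])

# Emit one aggregate per (vehicle, window).
for (vehicle, start), speeds in sorted(windows.items()):
    print(vehicle, f"[{start}s, {start + WINDOW_SECONDS}s)", "avg speed:", sum(speeds) / len(speeds))
```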
With SAM we're really targeting app developers, business analysts and operations teams so that we can help them maximize their individual user experience as necessary for their skill set and also give them a way to easily collaborate around a unified platform and do a better job than they ever have before.
Let me now introduce you to SAM.
We’ve articulated an end to end GA vision for flow management
You can see where we’re heading
Closed loop processing
Fully Governed
Support for multiple processing environments
Incorporate machine learning and AI to optimize what is collected, minimize time to insight, etc.
The path ahead is truly exciting and we look forward to working with all of you on this journey.
One of the things that I really love about all of this is not only the ease with which developers can build streaming applications, but also the power of allowing all the different actors to collaborate. I think you would agree we have a pretty important story and vision here about how we're going to continue to bring these things together and further enhance the experience.
In closing I want you to think about the fact that your enterprise is not composed of various big, cool clusters of distributed systems; it is one big logical distributed computing system, and we want to help you drive that and maximize the value of data all the way through.
Again we do that through providing solutions all the way from the edge to the core with flow management, stream processing and enterprise services. Enabling you to bring all of it together semantically through the registries and through a development and deployment experience that helps you harness the power of your data.
Thank you all very much for your time.