HDF Powered by Apache NiFi
Intro
Milind Pandit
Solutions Engineer
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Agenda HDF 2.0: Flow Management
– NiFi basics
– NiFi use cases
– NiFi demos
Simplistic View of Enterprise Data Flow
Data Flow
Process and Analyze Data
Acquire Data
Store Data
Interacting with different business partners and customers
Realistic View of Enterprise Data Flow
Connected Data Platforms
Stream Processing
Flow Management
Enterprise Services
At the edge
Security
Visualization
On premises In the cloud
Registries/Catalogs Governance (Security/Compliance) Operations
HDF 2.0 – Data in Motion Platform
Hortonworks DataFlow (HDF)
• Constrained
• High-latency
• Localized context
• Hybrid – cloud/on-premises
• Low-latency
• Global context
SOURCES
REGIONAL INFRASTRUCTURE
CORE INFRASTRUCTURE
• Visual Command and Control: for agile and immediate creation, configuration, and control of dataflows
• Data Lineage (Provenance): ensures trust of your data
• Data Prioritization: because not all data is of equal importance
• Data Buffering/Back-Pressure: since not all senders/receivers/connections work perfectly all the time
• Control Latency vs Throughput: adapt to different situations with different requirements
• Secure Control Plane/Data Plane: security of data and data access
• Scale-out Clustering: scalability
• Extensibility: ecosystem flexibility and growth
Apache NiFi: Designed for 8 challenges of global enterprise dataflow
Apache NiFi: Three key concepts
• Manage the flow of information
• Data Provenance
• Secure the control plane and data plane
Apache NiFi – Key Features
• Guaranteed delivery
• Data buffering
- Backpressure
- Pressure release
• Prioritized queuing
• Flow specific QoS
- Latency vs. throughput
- Loss tolerance
• Data provenance
• Recovery/recording a rolling log of fine-grained history
• Visual command and control
• Flow templates
• Pluggable/multi-role security
• Designed for extension
• Clustering
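The buffering features listed above can be illustrated with a toy queue. This is a minimal sketch, not NiFi code: `BoundedQueue`, `max_items`, and `max_age_seconds` are hypothetical names. It assumes that "backpressure" means refusing new items once a size limit is reached, and that "pressure release" means aging off items that have waited too long.

```python
import time
from collections import deque
from dataclasses import dataclass, field


@dataclass
class BoundedQueue:
    """Toy connection queue with a back-pressure limit and an expiration age."""
    max_items: int           # back-pressure threshold (hypothetical name)
    max_age_seconds: float   # pressure-release age (hypothetical name)
    _items: deque = field(default_factory=deque)

    def offer(self, item, now=None):
        """Enqueue an item; return False when the queue is full (back pressure)."""
        now = time.monotonic() if now is None else now
        if len(self._items) >= self.max_items:
            return False  # the upstream producer should pause
        self._items.append((now, item))
        return True

    def poll(self, now=None):
        """Return the oldest unexpired item, or None; expired items are dropped."""
        now = time.monotonic() if now is None else now
        while self._items:
            enqueued_at, item = self._items.popleft()
            if now - enqueued_at <= self.max_age_seconds:
                return item
            # Otherwise the item has aged off (pressure release).
        return None


queue = BoundedQueue(max_items=2, max_age_seconds=10.0)
accepted = [queue.offer(name, now=0.0) for name in ("a", "b", "c")]
fresh = queue.poll(now=5.0)     # "a" has waited 5 s and is still valid
expired = queue.poll(now=20.0)  # "b" has waited 20 s and is aged off
```

The explicit `now` argument is only there to make the example deterministic; a real flow would rely on the clock.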
Common Apache NiFi Use Cases
Predictive Analytics
Ensure the highest value data is captured and available for analysis
Compliance
Gain full transparency into provenance and flow of data
IoT Optimization
Secure, Prioritize, Enrich and Trace data at the edge
Fraud Detection
Move sales transaction data in real time to analyze on demand
Big Data Ingest
Easily and efficiently ingest data into Hadoop
Value Resources
Gain visibility into how data sources are used to determine value
What is Apache NiFi used for?
• Reliable and secure transfer of data between systems
• Delivery of data from sources to analytic platforms
• Enrichment and preparation of data:
– Conversion between formats
– Extraction/Parsing
– Routing decisions
What is Apache NiFi NOT used for?
• Distributed Computation
• Complex Event Processing
• Joins / Complex Rolling Window Operations
Use Cases for Apache NiFi
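As an illustration of "conversion between formats", the sketch below turns CSV text into JSON records using only the Python standard library. It is a stand-in for the idea, not a NiFi processor; the function name and the sample data are assumptions made for this example.

```python
import csv
import io
import json


def csv_to_json_lines(csv_text):
    """Convert CSV text with a header row into a list of JSON strings."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [json.dumps(row, sort_keys=True) for row in reader]


# Hypothetical sample input: one header row and two data rows.
sample = "device,heart_rate\nwatch-1,62\nwatch-2,97\n"
records = csv_to_json_lines(sample)
```

Each CSV row becomes one self-describing record, which is the kind of preparation step a flow would perform before handing data to an analytic platform.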
FlowFile
• Unit of data moving through the system
• Content + Attributes (key/value pairs)
Processor
• Performs the work, can access FlowFiles
Connection
• Links between processors
• Queues that can be dynamically prioritized
Terminology
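The three terms above can be modelled in a few lines. This is a simplified sketch with hypothetical class and attribute names, not the NiFi API: a FlowFile is content plus attributes, a processor is a function that does work on a FlowFile, and a connection is a queue whose ordering comes from a pluggable prioritizer.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FlowFile:
    content: bytes
    attributes: dict = field(default_factory=dict)  # key/value pairs


class Connection:
    """Queue between processors; the lowest priority value is dequeued first."""

    def __init__(self, prioritizer):
        self._prioritizer = prioritizer
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps FIFO order

    def put(self, flowfile):
        entry = (self._prioritizer(flowfile), next(self._counter), flowfile)
        heapq.heappush(self._heap, entry)

    def take(self):
        return heapq.heappop(self._heap)[2] if self._heap else None


def uppercase_processor(flowfile):
    """A processor performs the work: new content plus an extra attribute."""
    attrs = dict(flowfile.attributes, processed="true")
    return FlowFile(flowfile.content.upper(), attrs)


# The "priority" attribute is an assumption made for this example.
conn = Connection(prioritizer=lambda ff: int(ff.attributes.get("priority", "9")))
conn.put(FlowFile(b"routine reading", {"priority": "5"}))
conn.put(FlowFile(b"urgent alert", {"priority": "1"}))
first = uppercase_processor(conn.take())
```

Although the routine reading was queued first, the urgent alert is taken first because the prioritizer ranks it lower.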
HTTP Data FlowFile
HTTP/1.1 200 OK
Date: Sun, 10 Oct 2010 23:26:07 GMT
Server: Apache/2.2.8 (CentOS) OpenSSL/0.9.8g
Last-Modified: Sun, 26 Sep 2010 22:04:35 GMT
Content-Type: text/html
Hello world XXXXXXXXXXXXXXXXXXXXXXXXXXXX
Key: 'entryDate' Value: 'Fri Jun 17 17:15:04 EDT 2016'
Key: 'fileSize' Value: '23609'
Key: 'filename' Value: '15650246997242'
Key: 'path' Value: './'
0101010101110101010101010101 (Binary)
Header
Content
Analogy: FlowFiles are like HTTP Data
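To make the analogy concrete, the sketch below splits a raw HTTP response into headers and body, which map loosely onto FlowFile attributes and content. It is a hand-rolled illustration for well-formed input, not a full HTTP parser, and the variable names are chosen only for this example.

```python
def split_http_message(raw):
    """Split a raw HTTP response into (status_line, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status_line, header_lines = lines[0], lines[1:]
    headers = {}
    for line in header_lines:
        # Split on the first colon only, so values may contain colons.
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return status_line, headers, body


raw_response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Date: Sun, 10 Oct 2010 23:26:07 GMT\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"Hello world"
)
# Headers play the role of attributes; the body plays the role of content.
status, attributes, content = split_http_message(raw_response)
```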
1. Drag and drop processors to build a flow
2. Start, stop, and configure components in real time
3. View errors and corresponding error messages
4. View statistics and health of data flow
5. Create templates of common processors and connections
Create, Run, View, Start, Stop, Change, and Fix Dataflows in Real-Time
Apache NiFi Demo: Tail Logs, Route on Content, Buffer in Kafka,
Deliver to HDFS
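The "route on content" step of the demo can be sketched as pattern matching over log lines. This is an illustration in plain Python, not the demo's actual configuration: the route names, patterns, and sample lines are assumptions, and the Kafka and HDFS steps are not shown.

```python
import re

# Hypothetical routes: relationship name -> pattern that selects it.
ROUTES = {
    "errors": re.compile(r"\bERROR\b"),
    "warnings": re.compile(r"\bWARN\b"),
}


def route_on_content(line):
    """Return the first route whose pattern matches, else 'unmatched'."""
    for relationship, pattern in ROUTES.items():
        if pattern.search(line):
            return relationship
    return "unmatched"


log_lines = [
    "2016-06-17 17:15:04 INFO  Started listener",
    "2016-06-17 17:15:09 ERROR Connection refused",
    "2016-06-17 17:15:12 WARN  Queue is 80% full",
]
routed = {}
for line in log_lines:
    routed.setdefault(route_on_content(line), []).append(line)
```

Each key in `routed` would correspond to a separate downstream connection in a real flow.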
What is Data Provenance and Why is it Important?
BEGIN
END
LINEAGE
IT and Cloud Operators
• Understand traceability, lineage
• Enable recovery and replay
Compliance Regulations
• Provide an audit trail
• Remediation capabilities
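A rough sketch of how lineage can be traced from recorded events follows. The event structure is a simplification invented for this example (`ProvenanceEvent`, `parent_id`, and the IDs are hypothetical); it only shows the idea of walking from a delivered item back to where it began.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProvenanceEvent:
    event_type: str                  # illustrative labels, e.g. "RECEIVE"
    flowfile_id: str
    parent_id: Optional[str] = None  # set when an item is derived from another


def lineage(events, flowfile_id):
    """Follow parent links back to the origin; return IDs from BEGIN to END."""
    parents = {e.flowfile_id: e.parent_id for e in events if e.parent_id is not None}
    chain = [flowfile_id]
    while chain[-1] in parents:
        chain.append(parents[chain[-1]])
    return list(reversed(chain))


events = [
    ProvenanceEvent("RECEIVE", "ff-1"),
    ProvenanceEvent("FORK", "ff-2", parent_id="ff-1"),
    ProvenanceEvent("FORK", "ff-3", parent_id="ff-2"),
    ProvenanceEvent("SEND", "ff-3"),
]
trace = lineage(events, "ff-3")
```

An audit trail is then a matter of listing the events for every ID in `trace`.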
Provenance Enables Easy Access and Traceability of Changes
Need Fine-Grained Security and Compliance?
Security
• Secured authentication
• Enterprise authorization services – entitlements change often
• Encrypted content, encrypted communications
• People and systems with different roles require different access levels
• Tagged/classified data
Repositories - Pass by reference
Repositories – Copy on Write
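The two repository slides name the ideas without spelling them out, so the sketch below is an interpretation under stated assumptions: content is stored once in an append-only store, FlowFiles carry only a reference (a "claim") to it, and changing content writes a new claim instead of editing the old one. `ContentRepository` and the dictionary layout are hypothetical.

```python
class ContentRepository:
    """Append-only store: content is written once and never modified in place."""

    def __init__(self):
        self._claims = []

    def write(self, data):
        self._claims.append(bytes(data))
        return len(self._claims) - 1  # the claim ID acts as the reference

    def read(self, claim_id):
        return self._claims[claim_id]


repo = ContentRepository()
original = {"claim": repo.write(b"hello world"), "attributes": {"filename": "a.txt"}}

# Pass by reference: cloning copies the pointer and attributes, not the bytes.
clone = dict(original, attributes=dict(original["attributes"]))
shared_before_write = clone["claim"] == original["claim"]

# Copy on write: new content becomes a new claim; the original is untouched.
clone["claim"] = repo.write(repo.read(clone["claim"]).upper())
```

Because the original bytes are never overwritten, earlier versions stay available, which is what makes replay possible.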
NiFi Architecture
Edge Intelligence with Apache MiNiFi
• Guaranteed delivery
• Data buffering
‒ Backpressure
‒ Pressure release
• Prioritized queuing
• Flow specific QoS
‒ Latency vs. throughput
‒ Loss tolerance
• Data provenance
• Recovery / recording a rolling log of fine-grained history
• Designed for extension
Different from Apache NiFi
• Design and Deploy
• Warm re-deploys
Key Features
NiFi vs. MiNiFi Java Agent
NiFi Framework
Components
MiNiFi
NiFi Framework
User Interface
Components
NiFi
Example: Company X provides alerting services when a user's resting heart rate is higher than a threshold
Real-Time Insights Require DataFlow Mgmt and Stream Processing
Acquire Data: Company X Cloud Instance 1
Acquire Data: Company X Cloud Instance 2
Acquire Data: Company X Cloud Instance 3
Acquire Data Across Cloud Instances
Parse, Filter, Validate, Enrich and Route
Core Data Center
Analytics/Pattern Match
Data Store
Alerts
Dashboards/Visualization
Legend: Flow Management, Stream Processing
Data in Motion Needs Dataflow Management and Stream Processing
• Acquire data from various wearable devices' cloud instances
• Move data from customer cloud instances to the on-premises instance
• Perform intelligent routing and filtering of data. The routing and filtering rules will often be changed at run-time.
• Deliver the data to various downstream systems. New downstream apps will always appear, and the data should be fed to them when they come online.
• Parse the device data into a standardized format that downstream systems can understand
• Enrich the data with contextual information including patient/customer info (age, sex, etc.)
• Recognize the pattern when the resting heart rate exceeds a certain threshold (the insight), and then create an alert/notification.
• Run an outlier detection model on the streaming heart rate data that comes in. If the score is above a certain threshold, alert on the heart rate.
Flow
Management
(NiFi, MiNiFi)
Stream
Processing
(Storm, Kafka)
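The two stream-processing steps, a threshold alert and an outlier score, can be sketched on a small batch of readings. The slide does not say which outlier model is used, so a z-score stands in here purely as an assumption; the threshold values and sample readings are also hypothetical. A real streaming job would compute this over windows, not over a fixed list.

```python
from statistics import mean, pstdev

RESTING_THRESHOLD_BPM = 100  # hypothetical alert threshold
OUTLIER_SCORE_LIMIT = 2.0    # hypothetical z-score cut-off


def threshold_alerts(readings, threshold=RESTING_THRESHOLD_BPM):
    """Return the readings that exceed the resting heart rate threshold."""
    return [r for r in readings if r > threshold]


def outlier_scores(readings):
    """Score each reading by its distance from the mean in standard deviations."""
    mu, sigma = mean(readings), pstdev(readings)
    if sigma == 0:
        return [0.0 for _ in readings]
    return [abs(r - mu) / sigma for r in readings]


readings = [60, 62, 61, 59, 63, 120]
alerts = threshold_alerts(readings)
scores = outlier_scores(readings)
flagged = [r for r, s in zip(readings, scores) if s > OUTLIER_SCORE_LIMIT]
```

In this sample both checks single out the same reading; in practice they answer different questions, since a value can be unusual for one person without crossing a fixed threshold.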
Data in Motion (Cloud)
Data in Motion (on-premises)
Data at Rest (on-premises)
Edge Data
Data in Motion
Edge Analytics
Data at Rest (Cloud)
Edge Data
Data at Rest (on-premises)
Closed Loop Analytics
Machine Learning
Deep Historical Analysis
The Future of Data
Architectural Transformation Enabled By Connected Data Platforms
On Prem, Cloud
Use Cases for Data in Motion
Use Cases for Data-in-Motion Using DataFlow Mgmt
• Data Ingestion
• Edge Intelligence
• First Mile Problem
• Physical Data Movement
• Simple event processing such as Route, Filter, Enrich, Transform, etc.
When Only DataFlow Management is Required
Use Cases for Data-in-Motion Using DataFlow Mgmt and Stream Processing
• Flow Management to deliver data for Stream Processing
• PLUS: Complex pattern matching on unbounded streams of data.
When Both DataFlow Management and Stream Processing Are Required
Flow management
DATA IN MOTION, DATA AT REST
IoT Data Sources AWS
Azure
Google Cloud
Hadoop
NiFi
Kafka
Storm
Others…
Diagram nodes: NiFi (×4), MiNiFi (×7)
HDF 2.0: Data-in-Motion Platform
Enterprise Services
Ambari Ranger Other services
Flow management + Stream Processing
New Stream Processing Features HDF 2.0
• New Storm Connectors
• Storm-Kafka Spout using new client APIs
• Storm Distributed Log Search
• Storm Dynamic Worker Profiling
• Kafka Grafana Integration
• Storm Grafana Integration
• Improved Nimbus HA
• Storm Automatic Back Pressure
• Storm Distributed cache
• Storm Windowing and State Management
• Storm Performance improvements
• Improved Kafka SASL
• Storm Topology Event inspector
• Storm Resource Aware Scheduling
• Storm Dynamic Log Levels
• Pacemaker Storm Daemon
• Kafka Rack Awareness
Developer Productivity, Enterprise Readiness, Operational Simplicity
More Information, Resources
Hortonworks Community Connection:
Data Ingestion and Streaming
https://community.hortonworks.com
• Partnerworks: http://hortonworks.com/partners/
• HDF Certification: http://hortonworks.com/partners/product-integration-certification/
• Webinars: http://hortonworks.com/events-webcasts/
• Sandbox: http://hortonworks.com/events-webcasts/
• HDF: http://hortonworks.com/hdf/
• HDP: http://hortonworks.com/hdp/

More Related Content

What's hot

MiNiFi 0.0.1 MeetUp talk
MiNiFi 0.0.1 MeetUp talkMiNiFi 0.0.1 MeetUp talk
MiNiFi 0.0.1 MeetUp talkJoe Percivall
 
Real-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFiReal-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFiManish Gupta
 
Data ingestion and distribution with apache NiFi
Data ingestion and distribution with apache NiFiData ingestion and distribution with apache NiFi
Data ingestion and distribution with apache NiFiLev Brailovskiy
 
Running Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration OptionsRunning Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration OptionsTimothy Spann
 
Apache NiFi: Ingesting Enterprise Data At Scale
Apache NiFi:   Ingesting Enterprise Data At Scale Apache NiFi:   Ingesting Enterprise Data At Scale
Apache NiFi: Ingesting Enterprise Data At Scale Timothy Spann
 
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San JoseDataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San JoseAldrin Piri
 
Introduction to data flow management using apache nifi
Introduction to data flow management using apache nifiIntroduction to data flow management using apache nifi
Introduction to data flow management using apache nifiAnshuman Ghosh
 
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFi
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFiIntelligently Collecting Data at the Edge - Intro to Apache MiNiFi
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFiDataWorks Summit
 
Webinar Series Part 5 New Features of HDF 5
Webinar Series Part 5 New Features of HDF 5Webinar Series Part 5 New Features of HDF 5
Webinar Series Part 5 New Features of HDF 5Hortonworks
 
AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)
AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)
AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)Kay Lerch
 
Log Analytics Optimization
Log Analytics OptimizationLog Analytics Optimization
Log Analytics OptimizationHortonworks
 
Integrating NiFi and Flink
Integrating NiFi and FlinkIntegrating NiFi and Flink
Integrating NiFi and FlinkBryan Bende
 
Scaling real time streaming architectures with HDF and Dell EMC Isilon
Scaling real time streaming architectures with HDF and Dell EMC IsilonScaling real time streaming architectures with HDF and Dell EMC Isilon
Scaling real time streaming architectures with HDF and Dell EMC IsilonHortonworks
 
NiFi Best Practices for the Enterprise
NiFi Best Practices for the EnterpriseNiFi Best Practices for the Enterprise
NiFi Best Practices for the EnterpriseGregory Keys
 

What's hot (19)

MiNiFi 0.0.1 MeetUp talk
MiNiFi 0.0.1 MeetUp talkMiNiFi 0.0.1 MeetUp talk
MiNiFi 0.0.1 MeetUp talk
 
Integrating Apache Spark and NiFi for Data Lakes
Integrating Apache Spark and NiFi for Data LakesIntegrating Apache Spark and NiFi for Data Lakes
Integrating Apache Spark and NiFi for Data Lakes
 
The Elephant in the Clouds
The Elephant in the CloudsThe Elephant in the Clouds
The Elephant in the Clouds
 
Real-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFiReal-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFi
 
Data ingestion and distribution with apache NiFi
Data ingestion and distribution with apache NiFiData ingestion and distribution with apache NiFi
Data ingestion and distribution with apache NiFi
 
Running Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration OptionsRunning Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration Options
 
Apache NiFi: Ingesting Enterprise Data At Scale
Apache NiFi:   Ingesting Enterprise Data At Scale Apache NiFi:   Ingesting Enterprise Data At Scale
Apache NiFi: Ingesting Enterprise Data At Scale
 
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San JoseDataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose
 
Introduction to data flow management using apache nifi
Introduction to data flow management using apache nifiIntroduction to data flow management using apache nifi
Introduction to data flow management using apache nifi
 
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFi
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFiIntelligently Collecting Data at the Edge - Intro to Apache MiNiFi
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFi
 
Apache NiFi Crash Course Intro
Apache NiFi Crash Course IntroApache NiFi Crash Course Intro
Apache NiFi Crash Course Intro
 
Webinar Series Part 5 New Features of HDF 5
Webinar Series Part 5 New Features of HDF 5Webinar Series Part 5 New Features of HDF 5
Webinar Series Part 5 New Features of HDF 5
 
AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)
AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)
AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)
 
Dataflow with Apache NiFi
Dataflow with Apache NiFiDataflow with Apache NiFi
Dataflow with Apache NiFi
 
Log Analytics Optimization
Log Analytics OptimizationLog Analytics Optimization
Log Analytics Optimization
 
Integrating NiFi and Flink
Integrating NiFi and FlinkIntegrating NiFi and Flink
Integrating NiFi and Flink
 
Dataflow with Apache NiFi - Crash Course - HS16SJ
Dataflow with Apache NiFi - Crash Course - HS16SJDataflow with Apache NiFi - Crash Course - HS16SJ
Dataflow with Apache NiFi - Crash Course - HS16SJ
 
Scaling real time streaming architectures with HDF and Dell EMC Isilon
Scaling real time streaming architectures with HDF and Dell EMC IsilonScaling real time streaming architectures with HDF and Dell EMC Isilon
Scaling real time streaming architectures with HDF and Dell EMC Isilon
 
NiFi Best Practices for the Enterprise
NiFi Best Practices for the EnterpriseNiFi Best Practices for the Enterprise
NiFi Best Practices for the Enterprise
 

Viewers also liked

Apache NiFi- MiNiFi meetup Slides
Apache NiFi- MiNiFi meetup SlidesApache NiFi- MiNiFi meetup Slides
Apache NiFi- MiNiFi meetup SlidesIsheeta Sanghi
 
Building large scale applications in yarn with apache twill
Building large scale applications in yarn with apache twillBuilding large scale applications in yarn with apache twill
Building large scale applications in yarn with apache twillHenry Saputra
 
Harnessing the power of YARN with Apache Twill
Harnessing the power of YARN with Apache TwillHarnessing the power of YARN with Apache Twill
Harnessing the power of YARN with Apache TwillTerence Yim
 
Apache NiFi Toronto Meetup
Apache NiFi Toronto MeetupApache NiFi Toronto Meetup
Apache NiFi Toronto MeetupHortonworks
 
Beyond Messaging Enterprise Dataflow powered by Apache NiFi
Beyond Messaging Enterprise Dataflow powered by Apache NiFiBeyond Messaging Enterprise Dataflow powered by Apache NiFi
Beyond Messaging Enterprise Dataflow powered by Apache NiFiIsheeta Sanghi
 
Integrating Apache NiFi and Apache Apex
Integrating Apache NiFi and Apache Apex Integrating Apache NiFi and Apache Apex
Integrating Apache NiFi and Apache Apex Apache Apex
 
Apache NiFi in the Hadoop Ecosystem
Apache NiFi in the Hadoop EcosystemApache NiFi in the Hadoop Ecosystem
Apache NiFi in the Hadoop EcosystemBryan Bende
 
Kafka含むデータ処理フローを NiFiで構築するさまを実演する5分間
Kafka含むデータ処理フローを NiFiで構築するさまを実演する5分間Kafka含むデータ処理フローを NiFiで構築するさまを実演する5分間
Kafka含むデータ処理フローを NiFiで構築するさまを実演する5分間Koji Kawamura
 
Apache NiFiで、楽して、つながる、広がる IoTプロジェクト
Apache NiFiで、楽して、つながる、広がる IoTプロジェクトApache NiFiで、楽して、つながる、広がる IoTプロジェクト
Apache NiFiで、楽して、つながる、広がる IoTプロジェクトKoji Kawamura
 
Apache NiFi 1.0 in Nutshell
Apache NiFi 1.0 in NutshellApache NiFi 1.0 in Nutshell
Apache NiFi 1.0 in NutshellKoji Kawamura
 
Design a Dataflow in 7 minutes with Apache NiFi/HDF
Design a Dataflow in 7 minutes with Apache NiFi/HDFDesign a Dataflow in 7 minutes with Apache NiFi/HDF
Design a Dataflow in 7 minutes with Apache NiFi/HDFHortonworks
 
HDF: Hortonworks DataFlow: Technical Workshop
HDF: Hortonworks DataFlow: Technical WorkshopHDF: Hortonworks DataFlow: Technical Workshop
HDF: Hortonworks DataFlow: Technical WorkshopHortonworks
 
Taking DataFlow Management to the Edge with Apache NiFi/MiNiFi
Taking DataFlow Management to the Edge with Apache NiFi/MiNiFiTaking DataFlow Management to the Edge with Apache NiFi/MiNiFi
Taking DataFlow Management to the Edge with Apache NiFi/MiNiFiBryan Bende
 
Apache Hadoop YARN, NameNode HA, HDFS Federation
Apache Hadoop YARN, NameNode HA, HDFS FederationApache Hadoop YARN, NameNode HA, HDFS Federation
Apache Hadoop YARN, NameNode HA, HDFS FederationAdam Kawa
 
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...Hortonworks
 

Viewers also liked (20)

Apache NiFi- MiNiFi meetup Slides
Apache NiFi- MiNiFi meetup SlidesApache NiFi- MiNiFi meetup Slides
Apache NiFi- MiNiFi meetup Slides
 
Building large scale applications in yarn with apache twill
Building large scale applications in yarn with apache twillBuilding large scale applications in yarn with apache twill
Building large scale applications in yarn with apache twill
 
Harnessing the power of YARN with Apache Twill
Harnessing the power of YARN with Apache TwillHarnessing the power of YARN with Apache Twill
Harnessing the power of YARN with Apache Twill
 
Apache NiFi Toronto Meetup
Apache NiFi Toronto MeetupApache NiFi Toronto Meetup
Apache NiFi Toronto Meetup
 
Beyond Messaging Enterprise Dataflow powered by Apache NiFi
Beyond Messaging Enterprise Dataflow powered by Apache NiFiBeyond Messaging Enterprise Dataflow powered by Apache NiFi
Beyond Messaging Enterprise Dataflow powered by Apache NiFi
 
Integrating Apache NiFi and Apache Apex
Integrating Apache NiFi and Apache Apex Integrating Apache NiFi and Apache Apex
Integrating Apache NiFi and Apache Apex
 
Apache NiFi in the Hadoop Ecosystem
Apache NiFi in the Hadoop EcosystemApache NiFi in the Hadoop Ecosystem
Apache NiFi in the Hadoop Ecosystem
 
Apache NiFi in the Hadoop Ecosystem
Apache NiFi in the Hadoop Ecosystem Apache NiFi in the Hadoop Ecosystem
Apache NiFi in the Hadoop Ecosystem
 
Intro to Spark & Zeppelin - Crash Course - HS16SJ
Intro to Spark & Zeppelin - Crash Course - HS16SJIntro to Spark & Zeppelin - Crash Course - HS16SJ
Intro to Spark & Zeppelin - Crash Course - HS16SJ
 
Kafka含むデータ処理フローを NiFiで構築するさまを実演する5分間
Kafka含むデータ処理フローを NiFiで構築するさまを実演する5分間Kafka含むデータ処理フローを NiFiで構築するさまを実演する5分間
Kafka含むデータ処理フローを NiFiで構築するさまを実演する5分間
 
Apache NiFiで、楽して、つながる、広がる IoTプロジェクト
Apache NiFiで、楽して、つながる、広がる IoTプロジェクトApache NiFiで、楽して、つながる、広がる IoTプロジェクト
Apache NiFiで、楽して、つながる、広がる IoTプロジェクト
 
Apache NiFi 1.0 in Nutshell
Apache NiFi 1.0 in NutshellApache NiFi 1.0 in Nutshell
Apache NiFi 1.0 in Nutshell
 
Design a Dataflow in 7 minutes with Apache NiFi/HDF
Design a Dataflow in 7 minutes with Apache NiFi/HDFDesign a Dataflow in 7 minutes with Apache NiFi/HDF
Design a Dataflow in 7 minutes with Apache NiFi/HDF
 
HDF: Hortonworks DataFlow: Technical Workshop
HDF: Hortonworks DataFlow: Technical WorkshopHDF: Hortonworks DataFlow: Technical Workshop
HDF: Hortonworks DataFlow: Technical Workshop
 
Apache Hadoop Crash Course - HS16SJ
Apache Hadoop Crash Course - HS16SJApache Hadoop Crash Course - HS16SJ
Apache Hadoop Crash Course - HS16SJ
 
Taking DataFlow Management to the Edge with Apache NiFi/MiNiFi
Taking DataFlow Management to the Edge with Apache NiFi/MiNiFiTaking DataFlow Management to the Edge with Apache NiFi/MiNiFi
Taking DataFlow Management to the Edge with Apache NiFi/MiNiFi
 
Hiveを高速化するLLAP
Hiveを高速化するLLAPHiveを高速化するLLAP
Hiveを高速化するLLAP
 
Apache Hadoop YARN, NameNode HA, HDFS Federation
Apache Hadoop YARN, NameNode HA, HDFS FederationApache Hadoop YARN, NameNode HA, HDFS Federation
Apache Hadoop YARN, NameNode HA, HDFS Federation
 
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...
 
Hadoop Summit Tokyo HDP Sandbox Workshop
Hadoop Summit Tokyo HDP Sandbox Workshop Hadoop Summit Tokyo HDP Sandbox Workshop
Hadoop Summit Tokyo HDP Sandbox Workshop
 

Similar to HDF Powered by Apache NiFi Introduction

Harnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFI
Harnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFIHarnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFI
Harnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFIHaimo Liu
 
Hortonworks Data in Motion Webinar Series - Part 1
Hortonworks Data in Motion Webinar Series - Part 1Hortonworks Data in Motion Webinar Series - Part 1
Hortonworks Data in Motion Webinar Series - Part 1Hortonworks
 
Hortonworks Data In Motion Webinar Series Pt. 2
Hortonworks Data In Motion Webinar Series Pt. 2Hortonworks Data In Motion Webinar Series Pt. 2
Hortonworks Data In Motion Webinar Series Pt. 2Hortonworks
 
Apache NiFi - Flow Based Programming Meetup
Apache NiFi - Flow Based Programming MeetupApache NiFi - Flow Based Programming Meetup
Apache NiFi - Flow Based Programming MeetupJoseph Witt
 
Future of Data New Jersey - HDF 3.0 Deep Dive
Future of Data New Jersey - HDF 3.0 Deep DiveFuture of Data New Jersey - HDF 3.0 Deep Dive
Future of Data New Jersey - HDF 3.0 Deep DiveAldrin Piri
 
NJ Hadoop Meetup - Apache NiFi Deep Dive
NJ Hadoop Meetup - Apache NiFi Deep DiveNJ Hadoop Meetup - Apache NiFi Deep Dive
NJ Hadoop Meetup - Apache NiFi Deep DiveBryan Bende
 
Apache NiFi Crash Course - San Jose Hadoop Summit
Apache NiFi Crash Course - San Jose Hadoop SummitApache NiFi Crash Course - San Jose Hadoop Summit
Apache NiFi Crash Course - San Jose Hadoop SummitAldrin Piri
 
Data in Motion - Data at Rest - Hortonworks a Modern Architecture
Data in Motion - Data at Rest - Hortonworks a Modern ArchitectureData in Motion - Data at Rest - Hortonworks a Modern Architecture
Data in Motion - Data at Rest - Hortonworks a Modern ArchitectureMats Johansson
 
BigData Techcon - Beyond Messaging with Apache NiFi
BigData Techcon - Beyond Messaging with Apache NiFiBigData Techcon - Beyond Messaging with Apache NiFi
BigData Techcon - Beyond Messaging with Apache NiFiAldrin Piri
 
Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFi
Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFiData at Scales and the Values of Starting Small with Apache NiFi & MiNiFi
Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFiAldrin Piri
 
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseUsing Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseDataWorks Summit
 
Curing the Kafka blindness—Streams Messaging Manager
Curing the Kafka blindness—Streams Messaging ManagerCuring the Kafka blindness—Streams Messaging Manager
Curing the Kafka blindness—Streams Messaging ManagerDataWorks Summit
 
Data Con LA 2018 - Streaming and IoT by Pat Alwell
Data Con LA 2018 - Streaming and IoT by Pat AlwellData Con LA 2018 - Streaming and IoT by Pat Alwell
Data Con LA 2018 - Streaming and IoT by Pat AlwellData Con LA
 
Apache NiFi + Tensorflow + Hadoop: Big Data AI サンドイッチの作り方
Apache NiFi + Tensorflow + Hadoop:Big Data AI サンドイッチの作り方Apache NiFi + Tensorflow + Hadoop:Big Data AI サンドイッチの作り方
Apache NiFi + Tensorflow + Hadoop: Big Data AI サンドイッチの作り方HortonworksJapan
 
Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017alanfgates
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseDataWorks Summit
 
State of the Apache NiFi Ecosystem & Community
State of the Apache NiFi Ecosystem & CommunityState of the Apache NiFi Ecosystem & Community
State of the Apache NiFi Ecosystem & CommunityAccumulo Summit
 
Connecting the Drops with Apache NiFi & Apache MiNiFi
Connecting the Drops with Apache NiFi & Apache MiNiFiConnecting the Drops with Apache NiFi & Apache MiNiFi
Connecting the Drops with Apache NiFi & Apache MiNiFiDataWorks Summit
 

Similar to HDF Powered by Apache NiFi Introduction (20)

Harnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFI
Harnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFIHarnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFI
Harnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFI
 
Hortonworks Data in Motion Webinar Series - Part 1
Hortonworks Data in Motion Webinar Series - Part 1Hortonworks Data in Motion Webinar Series - Part 1
Hortonworks Data in Motion Webinar Series - Part 1
 
Hortonworks Data In Motion Webinar Series Pt. 2
Hortonworks Data In Motion Webinar Series Pt. 2Hortonworks Data In Motion Webinar Series Pt. 2
Hortonworks Data In Motion Webinar Series Pt. 2
 
Apache NiFi - Flow Based Programming Meetup
Apache NiFi - Flow Based Programming MeetupApache NiFi - Flow Based Programming Meetup
Apache NiFi - Flow Based Programming Meetup
 
Future of Data New Jersey - HDF 3.0 Deep Dive
Future of Data New Jersey - HDF 3.0 Deep DiveFuture of Data New Jersey - HDF 3.0 Deep Dive
Future of Data New Jersey - HDF 3.0 Deep Dive
 
NJ Hadoop Meetup - Apache NiFi Deep Dive
NJ Hadoop Meetup - Apache NiFi Deep DiveNJ Hadoop Meetup - Apache NiFi Deep Dive
NJ Hadoop Meetup - Apache NiFi Deep Dive
 
Apache NiFi Crash Course - San Jose Hadoop Summit
Apache NiFi Crash Course - San Jose Hadoop SummitApache NiFi Crash Course - San Jose Hadoop Summit
Apache NiFi Crash Course - San Jose Hadoop Summit
 
Hadoop Summit Tokyo Apache NiFi Crash Course
Hadoop Summit Tokyo Apache NiFi Crash CourseHadoop Summit Tokyo Apache NiFi Crash Course
Hadoop Summit Tokyo Apache NiFi Crash Course
 
Data in Motion - Data at Rest - Hortonworks a Modern Architecture
Data in Motion - Data at Rest - Hortonworks a Modern ArchitectureData in Motion - Data at Rest - Hortonworks a Modern Architecture
Data in Motion - Data at Rest - Hortonworks a Modern Architecture
 
BigData Techcon - Beyond Messaging with Apache NiFi
BigData Techcon - Beyond Messaging with Apache NiFiBigData Techcon - Beyond Messaging with Apache NiFi
BigData Techcon - Beyond Messaging with Apache NiFi
 
Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFi
Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFiData at Scales and the Values of Starting Small with Apache NiFi & MiNiFi
Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFi
 
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseUsing Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
 
Curing the Kafka blindness—Streams Messaging Manager
Curing the Kafka blindness—Streams Messaging ManagerCuring the Kafka blindness—Streams Messaging Manager
Curing the Kafka blindness—Streams Messaging Manager
 
Data Con LA 2018 - Streaming and IoT by Pat Alwell
Data Con LA 2018 - Streaming and IoT by Pat AlwellData Con LA 2018 - Streaming and IoT by Pat Alwell
Data Con LA 2018 - Streaming and IoT by Pat Alwell
 
Apache NiFi + Tensorflow + Hadoop: Big Data AI サンドイッチの作り方
Apache NiFi + Tensorflow + Hadoop:Big Data AI サンドイッチの作り方Apache NiFi + Tensorflow + Hadoop:Big Data AI サンドイッチの作り方
Apache NiFi + Tensorflow + Hadoop: Big Data AI サンドイッチの作り方
 
Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data Warehouse
 
State of the Apache NiFi Ecosystem & Community
State of the Apache NiFi Ecosystem & CommunityState of the Apache NiFi Ecosystem & Community
State of the Apache NiFi Ecosystem & Community
 
Connecting the Drops with Apache NiFi & Apache MiNiFi
Connecting the Drops with Apache NiFi & Apache MiNiFiConnecting the Drops with Apache NiFi & Apache MiNiFi
Connecting the Drops with Apache NiFi & Apache MiNiFi
 
Apache Nifi Crash Course
Apache Nifi Crash CourseApache Nifi Crash Course
Apache Nifi Crash Course
 


HDF Powered by Apache NiFi Introduction

  • 1. HDF Powered by Apache NiFi Intro Milind Pandit Solutions Engineer
  • 2. Agenda: HDF 2.0: Flow Management: NiFi basics; NiFi use cases; NiFi demos
  • 3. Simplistic View of Enterprise Data Flow. Diagram labels: Data Flow; Process and Analyze Data; Acquire Data; Store Data
  • 4. Realistic View of Enterprise Data Flow: Interacting with different business partners and customers
  • 5. Connected Data Platforms
  • 6. HDF 2.0 – Data in Motion Platform. Diagram labels: Stream Processing; Flow Management; Enterprise Services; At the edge; Security; Visualization; On premises; In the cloud; Registries/Catalogs; Governance (Security/Compliance); Operations
  • 7. Hortonworks DataFlow (HDF). Diagram labels: Constrained; High-latency; Localized context; Hybrid – cloud/on-premises; Low-latency; Global context. Tiers: Sources; Regional Infrastructure; Core Infrastructure
  • 8. Apache NiFi: Designed for 8 challenges of global enterprise dataflow. Visual Command and Control: for agile and immediate creation, configuration, control of dataflows. Data Lineage (Provenance): ensures trust of your data. Data Prioritization: because not all data is of equal importance. Data Buffering/Back-Pressure: since not all senders/receivers/connections work perfectly all the time. Control Latency vs Throughput: adapt to different situations with different requirements. Secure Control Plane/Data Plane: security of data, and data access. Scale-out Clustering: scalability. Extensibility: ecosystem flexibility and growth.
  • 9. Apache NiFi: Three key concepts: Manage the flow of information; Data Provenance; Secure the control plane and data plane
  • 10. Apache NiFi – Key Features: Guaranteed delivery; Data buffering (backpressure, pressure release); Prioritized queuing; Flow-specific QoS (latency vs. throughput, loss tolerance); Data provenance; Recovery/recording a rolling log of fine-grained history; Visual command and control; Flow templates; Pluggable/multi-role security; Designed for extension; Clustering
  • 11. Common Apache NiFi Use Cases. Predictive Analytics: ensure the highest value data is captured and available for analysis. Compliance: gain full transparency into provenance and flow of data. IoT Optimization: secure, prioritize, enrich and trace data at the edge. Fraud Detection: move sales transaction data in real time to analyze on demand. Big Data Ingest: easily and efficiently ingest data into Hadoop. Value Resources: gain visibility into how data sources are used to determine value.
  • 12. Use Cases for Apache NiFi. What is Apache NiFi used for? Reliable and secure transfer of data between systems; delivery of data from sources to analytic platforms; enrichment and preparation of data (conversion between formats, extraction/parsing, routing decisions). What is Apache NiFi NOT used for? Distributed computation; complex event processing; joins / complex rolling window operations.
  • 13. Terminology. FlowFile: unit of data moving through the system; content + attributes (key/value pairs). Processor: performs the work, can access FlowFiles. Connection: links between processors; queues that can be dynamically prioritized.
  • 14. Analogy: FlowFiles are like HTTP Data. HTTP data has a header (HTTP/1.1 200 OK; Date: Sun, 10 Oct 2010 23:26:07 GMT; Server: Apache/2.2.8 (CentOS) OpenSSL/0.9.8g; Last-Modified: Sun, 26 Sep 2010 22:04:35 GMT; Content-Type: text/html) and content (Hello world). A FlowFile has a header of attributes ('entryDate' = 'Fri Jun 17 17:15:04 EDT 2016'; 'fileSize' = '23609'; 'filename' = '15650246997242'; 'path' = './') and content (binary).
  • 15. Create, Run, View, Start, Stop, Change, Fix Dataflows in Real-Time: (1) Drag and drop processors to build a flow; (2) Start, stop, and configure components in real time; (3) View errors and corresponding error messages; (4) View statistics and health of data flow; (5) Create templates of common processors & connections
  • 16. Apache NiFi Demo: Tail Logs, Route on Content, Buffer in Kafka, Deliver to HDFS
  • 17. What is Data Provenance and Why is it Important? Diagram labels: BEGIN; END; LINEAGE. IT and Cloud Operators: understand traceability, lineage; enable recovery and replay. Compliance Regulations: provide an audit trail; remediation capabilities.
  • 18. Provenance Enables Easy Access and Traceability of Changes
  • 19. Need Fine-Grained Security and Compliance? Security: secured authentication; enterprise authorization services (entitlements change often); encrypted content, encrypted communications; people and systems with different roles require different access levels; tagged/classified data.
  • 20. Repositories – Pass by reference
  • 21. Repositories – Copy on Write
  • 22. NiFi Architecture
  • 23. Edge Intelligence with Apache MiNiFi. Key Features: guaranteed delivery; data buffering (backpressure, pressure release); prioritized queuing; flow-specific QoS (latency vs. throughput, loss tolerance); data provenance; recovery / recording a rolling log of fine-grained history; designed for extension. Different from Apache NiFi: design and deploy; warm re-deploys.
  • 24. NiFi vs. MiNiFi Java Agent. MiNiFi: NiFi Framework; Components. NiFi: NiFi Framework; User Interface; Components.
  • 25. Real-Time Insights Require DataFlow Mgmt and Stream Processing. Example: Company X provides alerting services when a user's resting heart rate is higher than a threshold. Diagram: Acquire Data from Company X Cloud Instances 1, 2 and 3 (Acquire Data Across Cloud Instances); Parse, Filter, Validate, Enrich and Route; Core Data Center: Analytics/Pattern Match, Data Store, Alerts, Dashboards/Visualization. Legend: Flow Management; Stream Processing.
  • 26. Data in Motion Needs Dataflow Management and Stream Processing. Acquire data from the cloud instances of various wearable devices; move data from customer cloud instances to the on-premises instance; perform intelligent routing and filtering of data (the routing and filtering rules will often change at run time); deliver the data to various downstream systems (new downstream apps will keep appearing, and data should be fed to each one when it comes online); parse the device data into a standardized format that downstream systems can understand; enrich the data with contextual information, including patient/customer info (age, sex, etc.); recognize the pattern when the resting heart rate exceeds a certain threshold (the insight), and then create an alert/notification; run an outlier detection model on the incoming streaming heart rate and, if the score is above a certain threshold, alert on the heart rate. Legend: Flow Management (NiFi, MiNiFi); Stream Processing (Storm, Kafka).
  • 27. The Future of Data: Architectural Transformation Enabled By Connected Data Platforms. Diagram labels: Data in Motion (Cloud); Data in Motion (on-premises); Data at Rest (on-premises); Edge Data; Data in Motion; Edge Analytics; Data at Rest (Cloud); Edge Data; Data at Rest (on-premises); Closed Loop Analytics; Machine Learning; Deep Historical Analysis; On Prem; Cloud
  • 28. Use Cases for Data in Motion. When Only DataFlow Management is Required (Use Cases for Data-in-Motion Using DataFlow Mgmt): data ingestion; edge intelligence; first mile problem; physical data movement; simple event processing such as route, filter, enrich, transform, etc. When Both DataFlow Management and Stream Processing Are Required (Use Cases for Data-in-Motion Using DataFlow Mgmt and Stream Processing): flow management to deliver data for stream processing, PLUS complex pattern matching on unbounded streams of data.
  • 29. HDF 2.0: Data-in-Motion Platform. Diagram labels: Flow management; Flow management + Stream Processing; DATA IN MOTION; DATA AT REST; IoT Data Sources; AWS; Azure; Google Cloud; Hadoop; NiFi; MiNiFi; Kafka; Storm; Others…; Enterprise Services: Ambari, Ranger, other services
  • 30. New Stream Processing Features HDF 2.0 (grouped under Developer Productivity, Enterprise Readiness, Operational Simplicity): New Storm Connectors; Storm-Kafka Spout using new client APIs; Storm Distributed Log Search; Storm Dynamic Worker Profiling; Kafka Grafana Integration; Storm Grafana Integration; Improved Nimbus HA; Storm Automatic Back Pressure; Storm Distributed Cache; Storm Windowing and State Management; Storm Performance Improvements; Improved Kafka SASL; Storm Topology Event Inspector; Storm Resource Aware Scheduling; Storm Dynamic Log Levels; Pacemaker Storm Daemon; Kafka Rack Awareness
  • 31. More Information, Resources. Hortonworks Community Connection (Data Ingestion and Streaming): https://community.hortonworks.com; Partnerworks: http://hortonworks.com/partners/; HDF Certification: http://hortonworks.com/partners/product-integration-certification/; Webinars: http://hortonworks.com/events-webcasts/; Sandbox: http://hortonworks.com/events-webcasts/; HDF: http://hortonworks.com/hdf/; HDP: http://hortonworks.com/hdp/
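
The terminology in slides 10 and 13 (FlowFile, Connection, prioritized queuing, backpressure) can be illustrated with a small sketch. This is a toy Python model and not NiFi's API (NiFi itself is written in Java): the class names `FlowFile` and `Connection`, the `priority` attribute, and the `max_queued` threshold are all assumptions made for illustration. In NiFi, backpressure stops the upstream component from being scheduled; here that is simplified to `offer` returning `False`.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Unit of data: opaque content plus key/value attributes."""
    content: bytes
    attributes: dict = field(default_factory=dict)


class Connection:
    """Hypothetical queue between two processors, with a prioritizer
    and an object-count backpressure threshold."""

    def __init__(self, max_queued=2):
        self.max_queued = max_queued   # assumed threshold, for illustration only
        self._heap = []
        self._seq = itertools.count()  # tie-breaker: FIFO order within one priority

    def backpressure_engaged(self):
        return len(self._heap) >= self.max_queued

    def offer(self, flowfile):
        """Enqueue unless the threshold is reached (the upstream side should then pause)."""
        if self.backpressure_engaged():
            return False
        priority = int(flowfile.attributes.get("priority", 5))  # lower number = sooner
        heapq.heappush(self._heap, (priority, next(self._seq), flowfile))
        return True

    def poll(self):
        """Dequeue the highest-priority FlowFile, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


conn = Connection(max_queued=2)
conn.offer(FlowFile(b"routine", {"filename": "a.log", "priority": "9"}))
conn.offer(FlowFile(b"urgent", {"filename": "b.log", "priority": "1"}))
accepted = conn.offer(FlowFile(b"late", {"filename": "c.log"}))  # queue is full
first = conn.poll()                                              # the "urgent" FlowFile
```

The third `offer` is refused because the queue already holds two FlowFiles, and `poll` hands out the lower-numbered priority first even though it arrived second.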

Editor's Notes

  1. Hortonworks: Powering the Future of Data
  2. TALK TRACK Hortonworks DataFlow is powered by Apache NiFi, Kafka, and Storm, all key components of any streaming data architecture. MiNiFi/NiFi: dynamic, configurable data pipelines. Kafka: to adapt to differing rates of data creation and delivery. Storm: for real-time streaming analytics to create immediate insights at a massive scale. Only Hortonworks offers all of this as part of a Connected Data Platform that optimizes for delivery into HDP (HDFS, Hive, Spark, HBase, etc.). There are scenarios where NiFi will provide all that you need, but you will notice the orange and blue horizontal triangles provide a continuum of capability from edge to core, which indicates varying degrees of need for the different products.
  3. Focus on the predictive analytics case: use the uptake/cat/etc. case, but generified.
  4. Introduce the architecture of NiFi, describe major system components, and describe the single node and clustering models. For each component describe its available (and potential) deployment models (relate it to Hadoop). Focus on the two deployment models (single node and cluster); roughly think of this as 'edge' vs 'data center'.
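
The repository slides in the transcript (20, "Pass by reference", and 21, "Copy on Write") carry only their titles. A minimal Python sketch of the idea follows, assuming an append-only content store addressed by claim ids; `ContentRepository`, `route`, `transform`, and the dictionary layout of a FlowFile are hypothetical names chosen for illustration, not NiFi classes.

```python
import itertools


class ContentRepository:
    """Hypothetical append-only store: content is written once and
    referred to by a claim id afterwards."""

    def __init__(self):
        self._claims = {}
        self._ids = itertools.count(1)

    def write(self, data: bytes) -> int:
        claim_id = next(self._ids)
        self._claims[claim_id] = bytes(data)
        return claim_id

    def read(self, claim_id: int) -> bytes:
        return self._claims[claim_id]


def route(flowfile: dict) -> dict:
    """Pass by reference: only attributes change, the content claim is shared."""
    return {**flowfile, "attributes": {**flowfile["attributes"], "routed": "true"}}


def transform(repo: ContentRepository, flowfile: dict) -> dict:
    """Copy on write: modified content goes to a new claim, the old claim is untouched."""
    new_claim = repo.write(repo.read(flowfile["claim"]).upper())
    return {**flowfile, "claim": new_claim}


repo = ContentRepository()
original = {"claim": repo.write(b"hello world"), "attributes": {"filename": "a.txt"}}
routed = route(original)              # same claim id, new attribute
transformed = transform(repo, routed)  # new claim id, old content still readable
```

Because the original claim is never overwritten, the earlier content stays available, which is what makes the "recovery and replay" point on the provenance slide possible in this model.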