Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flows and IoT apps using Apache NiFi - Dhruv Kumar, Senior Solutions Architect - Hortonworks

Connecting enterprise systems has always been a tough task. Modern IoT applications have made it harder, because legacy systems must now be integrated with new high-velocity data streams. Various patterns such as messaging and REST have been proposed, but they require rearchitecting the integration layer, which is extremely arduous. In this talk we will show you how to use Apache NiFi to solve your data integration, movement, and ingestion problems. We will then examine how Apache NiFi can be used, in conjunction with other stream processing and messaging frameworks, to build durable, scalable, and responsive IoT apps.

Published in: Technology

  1. Apache NiFi and Stream Processing. Dhruv Kumar, Sr. Solutions Architect. © Hortonworks Inc. 2011 – 2016. All Rights Reserved
  2. Simplistic View of Enterprise Data Flow. Diagram labels: Acquire Data, Store Data, Process and Analyze Data, Dataflow.
  3. Realistic View of Enterprise Data Flow. Diagram annotated with seven "?" marks.
  4. Basics of Connecting Systems
      For every connection between a producer (P1) and a consumer (C1), these must agree:
        1. Protocol
        2. Format
        3. Schema
        4. Priority
        5. Size of event
        6. Frequency of event
        7. Authorization access
        8. Relevance
  5. Apache NiFi: The three key concepts
      • Manage the flow of information
      • Data Provenance
      • Secure the control plane and data plane
  6. Visual Command & Control
      • Drag and drop processors to build a flow
      • Start, stop, and configure components in real time
      • View errors and corresponding error messages
      • View statistics and health of data flow
      • Create templates of common processors and connections
  7. Apache NiFi – Key Features
      • Guaranteed delivery
      • Data buffering
          - Backpressure
          - Pressure release
      • Prioritized queuing
      • Flow specific QoS
          - Latency vs. throughput
          - Loss tolerance
      • Data provenance
      • Recovery/recording a rolling log of fine-grained history
      • Visual command and control
      • Flow templates
      • Pluggable/multi-role security
      • Designed for extension
      • Clustering
  8. Brief history of the Apache NiFi Community
      Timeline (matured at NSA 2006-2014):
      • 2006: code developed at NSA
      • December 2014: code available open source under ASL v2
      • July 2015: achieved TLP status in just 7 months
      • Today
      In 11 months…
      • Dev mailing list: 182 subscribers producing ~100 emails/week
      • Users mailing list*: 165 subscribers producing ~40 emails/week (*only 5 months old)
      • 55 code contributors
      • 125 pull requests via GitHub
      • 1170 JIRAs filed
      • 6 releases, targeting a 6-8 week release cycle
      • 153 new in last two months
      • Committers (with more in pipeline); 13 PMC Members
      • Affiliations: Hortonworks, Twitter, Cloudera, US Government, Defense Contractors, etc.
  9. Flow Based Programming (FBP)
      FBP Term | NiFi Term | Description
      Information Packet | FlowFile | Each object moving through the system.
      Black Box | FlowFile Processor | Performs the work, doing some combination of data routing, transformation, or mediation between systems.
      Bounded Buffer | Connection | The linkage between processors, acting as queues and allowing various processes to interact at differing rates.
      Scheduler | Flow Controller | Maintains the knowledge of how processes are connected, and manages the threads and allocations thereof which all processes use.
      Subnet | Process Group | A set of processes and their connections, which can receive and send data via ports. A process group allows creation of an entirely new component simply by composition of its components.
  10. Architecture
      Diagram labels:
      • Master, NiFi Cluster Manager (NCM): OS/Host, JVM, Web Server, NiFi Cluster Manager – Request Replicator
      • Slaves, NiFi Nodes (the same node layout is shown three times): OS/Host, JVM, Web Server, Flow Controller, Processor 1, Extension N, Local Storage, FlowFile Repository, Content Repository, Provenance Repository
  11. Apache NiFi’s uses are many…
      What is Apache NiFi used for?
      • Reliable and secure transfer of data between systems
      • Delivery of data from sources to analytic platforms
      • Enrichment and preparation of data:
          – Conversion between formats
          – Extraction/Parsing
          – Routing decisions
      What is Apache NiFi NOT used for?
      • Distributed Computation
      • Complex Event Processing
      • Joins / Complex Rolling Window Operations
  12. HDF Powered by Apache NiFi Addresses Modern Data Flow Challenges
      • Collect: Bring Together (Logs, Files, Feeds, Sensors). Aggregate all IoAT data from sensors, geo-location devices, machines, logs, files, and feeds via a highly secure lightweight agent.
      • Conduct: Mediate the Data Flow (Deliver, Secure, Govern, Audit). Mediate point-to-point and bi-directional data flows, delivering data reliably to real-time applications and storage platforms such as HDP.
      • Curate: Gain Insights (Parse, Filter, Transform, Fork, Clone). Parse, filter, join, transform, fork, and clone data in motion to empower analytics and perishable insights.
  13. HDP + HDF Create Modern Data Apps
      Diagram labels: DATA AT REST, HDF, DATA IN MOTION, ACTIONABLE INTELLIGENCE, MODERN DATA APPS
      • Real-Time Cyber Security protects systems with superior threat detection
      • Smart Manufacturing dramatically improves yields by managing more variables in greater detail
      • Connected, Autonomous Cars drive themselves and improve road safety
      • Future Farming optimizes soil, seeds and equipment to measured conditions on each square foot
      • Automatic Recommendation Engines match products to preferences in milliseconds
  14. Streaming Architectures
  15. Drive Data to Core for Analysis
      • Drive data from sources to central data center for analysis
      • Tiered collection approach at various locations, think regional data centers
      Diagram labels: MiNiFi, MiNiFi, Edge, Edge, NiFi, Core, Stream Processing, Batch Analytics
  16. Dynamically Adjusting Data Flows
      • Push contents back to core NiFi
      • Push results back to edge locations/devices to change behavior
      Diagram labels: MiNiFi, MiNiFi, Edge, Edge, NiFi, Core, Stream Processing, Batch Analytics
  17. Hortonworks DataFlow Reference Architecture
      • Tiered processing framework
      • Bi-directional communication
      • Data prioritization
      • Interactive command & control in the center, design & deploy on the edge
      Diagram labels, by tier:
      • Retail Store: Gateway Server, MiNiFi, Mobile, Client Libraries, Freezer, Client Libraries, Server Cluster, NiFi, Register, MiNiFi
      • Regional Center: NiFi, NiFi, Kafka
      • Core Data Center: Server Cluster, NiFi, NiFi, NiFi, Others, Storm, Kafka, Spark/Flink/etc., AWS, Azure, Google Cloud, DB, Data WH
  18. Hortonworks DataFlow Reference Architecture
      • Campaign management: coupons/promotions/etc.
      • Location based services
      Diagram labels, by tier:
      • Retail Store: Gateway Server, MiNiFi, Mobile, Client Libraries, Freezer, Client Libraries, Server Cluster, NiFi, Register, MiNiFi
      • Regional Center: NiFi, NiFi, Kafka, Storm
      • Core Data Center: Server Cluster, NiFi, NiFi, NiFi, Others, Kafka, Spark/Flink/etc., AWS, Azure, Google Cloud, DB, Data WH
  19. Hortonworks DataFlow Reference Architecture
      • Transaction processing
      • Fraud detection
      Diagram labels, by tier:
      • Retail Store: Gateway Server, MiNiFi, Mobile, Client Libraries, Freezer, Client Libraries, Server Cluster, NiFi, Register, MiNiFi
      • Regional Center: NiFi, NiFi, Kafka, Storm
      • Core Data Center: Server Cluster, NiFi, NiFi, NiFi, Others, Kafka, Spark/Flink/etc., AWS, Azure, Google Cloud, DB, Data WH
  20. Hortonworks DataFlow Reference Architecture
      • Complex processing and cloud computing
      • Historical data analytics based on nightly updates
      Diagram labels, by tier:
      • Retail Store: Gateway Server, MiNiFi, Mobile, Client Libraries, Freezer, Client Libraries, Server Cluster, NiFi, Register, MiNiFi
      • Regional Center: NiFi, NiFi, Kafka, Storm
      • Core Data Center: Server Cluster, NiFi, NiFi, NiFi, Others, Kafka, Spark/Flink/etc., AWS, Azure, Google Cloud, DB, Data WH
  21. Apache NiFi vs Kafka
      NiFi: good for data traceability and flow management
      • Interactive command and control – real time operational visibility
      • Data provenance – real time visual chain of custody
      • Low scripting maintenance
      ⚠ Requires adding/removing processors according to consumer-side updates
      Kafka: good for a large number of consumers and dynamic consumer-side updates
      • Low latency
      • Great data durability
      • Supports a large number of producers/consumers
      ⚠ Not optimized to manage dataflows (prioritization, enrichment, protocols, formats, event level authorizations, objects with various sizes, etc.)
  22. Apache NiFi vs Storm
      NiFi: good for data traceability, flow management, and enrichment
      • Data provenance – real time visual chain of custody
      • Security – end-to-end secure routing with event level authorization
      • Simple event processing
      ⚠ Scaling model only allows processor-level workload to be distributed evenly across worker nodes
      Storm: good for streaming analytics
      • Complex event processing
      • Flexible scaling model, allowing workload distribution to be specified on demand at the bolt level
      ⚠ Not designed to manage data flows
  23. In a nutshell…
      Diagram labels: NiFi, Hadoop, HDFS, HBase, Hive, SOLR, YARN, Storm, Service Management / Workflow, SIEM, Spark, Raw Network Stream, Network Metadata Stream, Data Stores, Syslog, Raw Application Logs, Other Streaming Telemetry
  24. Key Tenets of Lambda Architecture
      • Batch Layer
          - Manages master data
          - Immutable, append-only set of raw data
          - Cleanse, Normalize & Pre-Compute Batch Views
          - Advanced Statistical Calculations
      • Speed Layer
          - Real Time Event Stream Processing
          - Computes Real-Time Views
      • Serving Layer
          - Low-latency, ad-hoc query
          - Reporting, BI & Dashboard
      Diagram: Fundamental Principles of Streaming Architectures (HDP and HDF). Labels: New Data Stream, Store, Pre-Compute Views, Process Streams, Incremental Views, Business View, Business View, Query, SPEED LAYER, BATCH LAYER, SERVING LAYER
  25. Detailed Reference Architecture for IoT Applications
      Diagram labels (repeats omitted): Storm/Spark Streaming, Storm, HDF, Flume, Sink to HDFS, Transform, Interactive UI Framework, Hive, HDFS, SOURCE DATA (Server logs, Application Logs, Firewall Logs, CRM/ERP, Sensor), Kafka, Stream to HDF, Forward to Storm, Real Time Storage, Spark-ML, Pig, Alerts, Bolt to HDFS, Dashboard, Silk, JMS, Hive Server, HiveServer, Reporting, BI Tools, High Speed Ingest, Real-Time, Batch, Interactive, Machine Learning, Models, Spark, SQOOP, Iterative ML, HBase/Phoenix, HBase, Event Enrichment, Spark-Thrift
  26. Demo!
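
Slide 7 lists backpressure and prioritized queuing among NiFi's key features. As a rough illustration of the idea only, and not of how NiFi implements it, the Python sketch below models a connection as a bounded priority queue. The `Connection` class, the `backpressure_threshold` parameter, and the item names are hypothetical and do not come from the slides or from NiFi's API.

```python
import heapq
import itertools


class Connection:
    """Hypothetical bounded, prioritized queue between two processors."""

    def __init__(self, backpressure_threshold):
        # Assumed knob: maximum number of queued items before upstream is refused.
        self.backpressure_threshold = backpressure_threshold
        self._heap = []
        # Monotonic counter breaks ties so equal priorities stay first-in, first-out.
        self._order = itertools.count()

    def is_full(self):
        return len(self._heap) >= self.backpressure_threshold

    def offer(self, priority, item):
        """Enqueue the item unless backpressure is engaged; return True if accepted."""
        if self.is_full():
            return False
        heapq.heappush(self._heap, (priority, next(self._order), item))
        return True

    def poll(self):
        """Return the queued item with the lowest priority number, or None if empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


conn = Connection(backpressure_threshold=3)
offers = [(5, "bulk-log"), (1, "alert"), (5, "bulk-log-2"), (1, "alert-2")]
accepted = [conn.offer(priority, name) for priority, name in offers]
drained = [conn.poll() for _ in range(3)]
print(accepted)
print(drained)
```

The fourth offer is refused because the queue has reached its threshold, which is the signal an upstream component would use to slow down. Draining returns the high-priority alert before the bulk items that arrived earlier.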

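The flow-based programming vocabulary on slide 9 (FlowFile, Processor, Connection, Flow Controller) can be made concrete with a toy model. The Python sketch below is an assumption-laden illustration: the class names mirror the slide's terms, but the fields, methods, and single-threaded scheduling loop are invented for this example and are not NiFi's Java API.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Information packet: content plus key/value attributes."""
    content: bytes
    attributes: dict = field(default_factory=dict)


class Processor:
    """Black box: applies a function to each FlowFile taken from its inbound queue."""

    def __init__(self, name, fn):
        self.name = name
        self.fn = fn
        self.inbox = deque()    # stands in for the inbound Connection (bounded buffer)
        self.downstream = None  # next Processor, wired up by the FlowController

    def run_once(self):
        if not self.inbox:
            return False
        result = self.fn(self.inbox.popleft())
        if result is not None and self.downstream is not None:
            self.downstream.inbox.append(result)
        return True


class FlowController:
    """Scheduler: knows how processors are connected and drives them until idle."""

    def __init__(self, processors):
        self.processors = processors
        for upstream, downstream in zip(processors, processors[1:]):
            upstream.downstream = downstream

    def run(self):
        progressed = True
        while progressed:
            # Build a list so every processor gets a turn on each pass.
            progressed = any([p.run_once() for p in self.processors])


def tag_source(flowfile):
    flowfile.attributes["source"] = "sensor"
    return flowfile


def uppercase(flowfile):
    flowfile.content = flowfile.content.upper()
    return flowfile


delivered = []


def deliver(flowfile):
    delivered.append(flowfile)
    return None  # end of the flow, nothing to pass on


flow = FlowController([
    Processor("tag", tag_source),
    Processor("upper", uppercase),
    Processor("deliver", deliver),
])
flow.processors[0].inbox.append(FlowFile(b"temp=4c"))
flow.run()
print(delivered[0])
```

A Process Group would correspond to wrapping a `FlowController` so that it can itself be used as one processor in a larger flow; that step is left out to keep the sketch short.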
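
Slide 24 describes the Lambda architecture as a batch layer over immutable master data, a speed layer that computes real-time views, and a serving layer that answers queries. A minimal Python sketch of that division of labor follows, assuming a simple per-store event count as the "view". The event shape, the function names, and the `SpeedLayer` class are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Batch layer: immutable, append-only master data (events are only ever added).
master_data = []


def append_events(events):
    master_data.extend(events)


def compute_batch_view(events):
    """Pre-compute a view (event count per store) from the full master data set."""
    return Counter(event["store"] for event in events)


class SpeedLayer:
    """Keeps an incremental view of events that arrived after the last batch run."""

    def __init__(self):
        self.view = Counter()

    def on_event(self, event):
        self.view[event["store"]] += 1


def query(batch_view, speed_view, store):
    """Serving layer: merge the batch view with the real-time view."""
    return batch_view[store] + speed_view[store]


# Historical events, covered by the batch run.
append_events([{"store": "LA"}, {"store": "LA"}, {"store": "SF"}])
batch_view = compute_batch_view(master_data)

# New events arrive after the batch run: stored in master data and seen by the speed layer.
speed = SpeedLayer()
for event in [{"store": "LA"}, {"store": "NY"}]:
    append_events([event])
    speed.on_event(event)

print(query(batch_view, speed.view, "LA"))
print(query(batch_view, speed.view, "NY"))
```

When the next batch run recomputes the view over all of `master_data`, the speed layer's view would be reset, so each event is counted once by exactly one of the two layers.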