SlideShare a Scribd company logo
Data at Scales &
the Values of
Starting Small
Aldrin Piri - @aldrinpiri
DataWorks Summit 2017 – Munich
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Key: 'Apache NiFi’
Value: 'PMC Member'
Key: 'Work’
Value: ’Sr. Member of Technical Staff @ Hortonworks'
Key: 'Working with NiFi Since’
Value: '2010’
3 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Agenda
Apache NiFi: A Primer
Apache MiNiFi
Architecture
Apache NiFi: The Ecosystem
Community
4 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Byte Scales for Data
SI Prefix
- 10
0
kilo 10
3
mega 10
6
giga 10
9
tera 10
12
peta 10
15
exa 10
18
zetta 10
21
yotta 10
24
“Big Data”
”everything else”
Greek
for
5 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
The Problem at Hand
Producers A.K.A Things
Anything
AND
Everything
Internet!
Consumers
• User
• Storage
• System
• …More Things
6 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Use Case: Courier Service
Physical Store
Gateway
Server
Mobile Devices
Registers
Server Cluster
Distribution Center
Kafka
Core Data Center at HQ
Server Cluster
Others
Storm / Spark /
Flink / Apex
Kafka
Storm / Spark / Flink / Apex
On Delivery Routes
Trucks Deliverers
Delivery Truck: Creative Stall, https://thenounproject.com/creativestall/
Deliverer: Rigo Peter, https://thenounproject.com/rigo/
Cash Register: Sergey Patutin, https://thenounproject.com/bdesign.by/
Hand Scanner: Eric Pearson, https://thenounproject.com/epearson001/
NiFi NiFi NiFi NiFi NiFi NiFi
Gathering data from disparate sources
NiFi
7 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Apache NiFi: A Primer
Key Features and Principles
• Guaranteed delivery
• Data buffering
- Backpressure
- Pressure release
• Prioritized queuing
• Flow specific QoS
- Latency vs. throughput
- Loss tolerance
• Data provenance
• Recovery/recording
a rolling log of fine-
grained history
• Visual command and
control
• Flow templates
• Pluggable/multi-role
security
• Designed for extension
• Clustering
8 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
NiFi is based on Flow Based Programming (FBP)
FBP Term NiFi Term Description
Information
Packet
FlowFile Each object moving through the system.
Black Box FlowFile
Processor
Performs the work, doing some combination of data routing, transformation,
or mediation between systems.
Bounded
Buffer
Connection The linkage between processors, acting as queues and allowing various
processes to interact at differing rates.
Scheduler Flow
Controller
Maintains the knowledge of how processes are connected, and manages the
threads and allocations thereof which all processes use.
Subnet Process
Group
A set of processes and their connections, which can receive and send data via
ports. A process group allows creation of entirely new component simply by
composition of its components.
9 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
FlowFiles are like HTTP data
HTTP Data FlowFile
HTTP/1.1 200 OK
Date: Sun, 10 Oct 2010 23:26:07 GMT
Server: Apache/2.2.8 (CentOS) OpenSSL/0.9.8g
Last-Modified: Sun, 26 Sep 2010 22:04:35 GMT
ETag: "45b6-834-49130cc1182c0"
Accept-Ranges: bytes
Content-Length: 13
Connection: close
Content-Type: text/html
Hello world!
Standard FlowFile Attributes
Key: 'entryDate’ Value: 'Fri Jun 17 17:15:04 EDT 2016'
Key: 'lineageStartDate’ Value: 'Fri Jun 17 17:15:04 EDT 2016'
Key: 'fileSize’ Value: '23609'
FlowFile Attribute Map Content
Key: 'filename’ Value: '15650246997242'
Key: 'path’ Value: './’
Binary Content *
Header
Content
10 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
FlowFiles & Data Agnosticism
 NiFi is data agnostic!
 But, NiFi was designed understanding that users
can care about specifics and provides tooling
to interact with specific formats, protocols, etc.
ISO 8601 - http://xkcd.com/1179/
Robustness principle
Be conservative in what you do,
be liberal in what you accept from others“
11 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
12 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Apache MiNiFi
 NiFi lives in the data center. Give it an
enterprise server or a cluster of them.
 MiNiFi lives as close to where data is born
and is a guest on that device or system
“Let me get the key parts of NiFi close to where data begins and provide bidirectional
data transfer"
13 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Apache MiNiFi
 Limited computing capability
 Limited power/network
 Restricted software library/platform availability
 No UI
 Physically inaccessible
 Not frequently updated
 Competing standards/protocols
 Scalability
 Privacy & Security
Realities of computing outside the comforts of the data center
14 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
MiNiFi: Precedent from NiFi
 Provides the semantics between two NiFi components across network boundaries
– A custom protocol for inter-NiFi communication
– Secure, Extensible, Load Balanced & Scalable Delivery to Cluster
 Extracted out to a client library which powers integration into popular frameworks like
Apache Spark, Apache Storm, Apache Flink, and Apache Apex
 Attributes and the FlowFile format maintained
A quick look at NiFi Site to Site
https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#site-to-site
15 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
MiNiFi: Precedent from NiFi
 Fine-grained, event level access of interactions with FlowFiles
– CREATE, RECEIVE, FETCH, SEND, DOWNLOAD, DROP, EXPIRE, FORK, JOIN …
 Captures the associated attributes/metadata at the time of the event
 A map of a FlowFile’s journey and how they relate to other FlowFiles in a system
– MiNiFi enables us to get more and further illuminate the map of data processing
A deeper dive into provenance
http://nifi.apache.org/docs/nifi-docs/html/user-guide.html#data-provenance
16 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
MiNiFi: Precedent from NiFi
RECEIVE event
17 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Apache MiNiFi
 The feedback loop is longer and not
guaranteed
– Removal of Web Server and UI
 Declarative configuration
– Lends itself well to CM processes
– Extensible interface to support varying formats
• Currently provided in YAML
 Reduced set of bundled components
Departures from NiFi in getting the right fit
18 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Apache MiNiFi: Scoping
 Go small: Java – Write once, run anywhere*
– Feature parity and reuse of core NiFi libraries
 Go smaller: C++ – Write once**, run anywhere
 Go smallest: Write n-many times, run anywhere
Language libraries to support tagging, FlowFile format, Site to Site protocol, and
provenance generation without a processing framework
– Mobile: Android & iOS
– Language SDKs
Provide all the key principles of NiFi in varying, smaller footprints
WHAT IS THIS!?
A NiFi FOR ANTS!?!
20 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
MiNiFi: Use Case - Connected Car
 Outside vehicle’s
network firewall
 On telematics layer
VEHICLE NETWORK FIREWALL
TRANSMIT
EXECUTE FILTER
PRIORITIZE
PARSE
LISTEN
ROUTE
21 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Connecting the Drops
SOURCES
REGIONAL
INFRASTRUCTURE
CORE
INFRASTRUCTURE
22 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Use Case: Courier Service with Apache NiFi & MiNiFi
Physical Store
Gateway
Server
Mobile Devices
Registers
Server Cluster
Distribution Center
Kafka
Core Data Center at HQ
Server Cluster
Others
Storm / Spark /
Flink / Apex
Kafka
Storm / Spark / Flink / Apex
On Delivery Routes
Trucks Deliverers
Delivery Truck: Creative Stall, https://thenounproject.com/creativestall/
Deliverer: Rigo Peter, https://thenounproject.com/rigo/
Cash Register: Sergey Patutin, https://thenounproject.com/bdesign.by/
Hand Scanner: Eric Pearson, https://thenounproject.com/epearson001/
Client
Libraries
Client
Libraries
MiNiFi
MiNiFi
NiFi NiFi NiFi NiFi NiFi NiFi
Client
Libraries
Gathering data from disparate sources
23 © Hortonworks Inc. 2011 – 2016. All Rights Reserved© Hortonworks Inc. 2011 – 2017. All Rights ReservedX
Data Provenance
▪ Constrained
▪ High-latency
▪ Localized context
▪ Hybrid – cloud/on-premises
▪ Low-latency
▪ Global context
Origin – attribution
Replay – recovery
Evolution of topologies
Long retention
Types of Lineage
• Event
• Configuration
24 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Apache NiFi: The Ecosystem
 Site-to-Site in MiNiFi instances provides machine-to-machine (M2M) communication
– Data arrives to NiFi in a transparent manner allowing integration to existing flows
 Similar attention to extensibility in both Java and C++ clients allows agents to fit the
needs of your organization
 Reduced footprint allows NiFi functionality to aid in production of high fidelity data,
more closely attributable and tracked from where it is generated
We’ve provided a framework to extend the reach of data ingest
25 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Apache NiFi: The Ecosystem
 Enter the MiNiFi Command & Control
– Provide tooling to map the UX of
interactive command and control in NiFi
to the design and deploy approach of
MiNiFi
But more instances complicate my operational management!
https://cwiki.apache.org/confluence/display/MINIFI/MiNiFi+Command+and+Control
26 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Apache NiFi: The Ecosystem
 Configuration Management of Flows & Versioning
– The evolution of templates to better support SDLC functions
– https://cwiki.apache.org/confluence/display/NIFI/Configuration+Management+of+Flows
 Extension Repositories
– Publish & Share extension bundles (NARs)
– https://cwiki.apache.org/confluence/display/NIFI/Extension+Repositories+%28aka+Extension+Regis
try%29+for+Dynamically-loaded+Extensions
 Variable Registry
– Initial framework support & file-based implementation
– https://cwiki.apache.org/confluence/display/NIFI/Variable+Registry
Building on efforts for reusable components in the community
27 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Why Apache NiFi & MiNiFi?
 Moving data is multifaceted in its challenges and these are present in different contexts
at varying scopes
– Think of our courier example and organizations like it: inter vs intra, domestically, internationally
 Provide common tooling and extensions that are commonly needed but be flexible for
extension
– Leverage existing libraries and expansive Java ecosystem for functionality
– Allow organizations to integrate with their existing infrastructure
 Empower folks managing your infrastructure to make changes and reason about issues
that are occurring
– Data Provenance to show context and data’s journey
– User Interface/Experience a key component
28 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Learn more and join us!
Apache NiFi site
https://nifi.apache.org
Subproject MiNiFi site
https://nifi.apache.org/minifi/
Subscribe to and collaborate at
dev@nifi.apache.org
users@nifi.apache.org
Submit Ideas or Issues
https://issues.apache.org/jira/browse/NIFI
https://issues.apache.org/jira/browse/MINIFI
Follow us on Twitter
@apachenifi
29 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Apache NiFi Crash Course
Thursday, 6 April
11:15 AM – 1:45PM, Room 12
• Learn more about NiFi, the community, and work through a hands-on lab
• Seats available on a first come, first served basis
• Make sure you are in possession of the latest version of VirtualBox
• More details: https://tinyurl.com/nifi-cc-munich17
30 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Learn, Share at Birds of a Feather
IOT, STREAMING & DATA FLOW
Thursday, April 6
5:50 pm, Room 5
31 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Thank You

More Related Content

What's hot

Running Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration OptionsRunning Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration Options
Timothy Spann
 
Apache NiFi: latest developments for flow management at scale
Apache NiFi: latest developments for flow management at scaleApache NiFi: latest developments for flow management at scale
Apache NiFi: latest developments for flow management at scale
Abdelkrim Hadjidj
 
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San JoseDataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose
Aldrin Piri
 
Introduction to data flow management using apache nifi
Introduction to data flow management using apache nifiIntroduction to data flow management using apache nifi
Introduction to data flow management using apache nifi
Anshuman Ghosh
 
BigData Techcon - Beyond Messaging with Apache NiFi
BigData Techcon - Beyond Messaging with Apache NiFiBigData Techcon - Beyond Messaging with Apache NiFi
BigData Techcon - Beyond Messaging with Apache NiFi
Aldrin Piri
 
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFi
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFiIntelligently Collecting Data at the Edge - Intro to Apache MiNiFi
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFi
DataWorks Summit
 
Apache NiFi in the Hadoop Ecosystem
Apache NiFi in the Hadoop EcosystemApache NiFi in the Hadoop Ecosystem
Apache NiFi in the Hadoop Ecosystem
Bryan Bende
 
Harnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFI
Harnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFIHarnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFI
Harnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFI
Haimo Liu
 
Dataflow Management From Edge to Core with Apache NiFi
Dataflow Management From Edge to Core with Apache NiFiDataflow Management From Edge to Core with Apache NiFi
Dataflow Management From Edge to Core with Apache NiFi
DataWorks Summit
 
Hadoop Summit Tokyo Apache NiFi Crash Course
Hadoop Summit Tokyo Apache NiFi Crash CourseHadoop Summit Tokyo Apache NiFi Crash Course
Hadoop Summit Tokyo Apache NiFi Crash Course
DataWorks Summit/Hadoop Summit
 
The First Mile - Edge and IoT Data Collection With Apache Nifi and MiniFi
The First Mile - Edge and IoT Data Collection With Apache Nifi and MiniFiThe First Mile - Edge and IoT Data Collection With Apache Nifi and MiniFi
The First Mile - Edge and IoT Data Collection With Apache Nifi and MiniFi
DataWorks Summit
 
Apache NiFi Crash Course Intro
Apache NiFi Crash Course IntroApache NiFi Crash Course Intro
Apache NiFi Crash Course Intro
DataWorks Summit/Hadoop Summit
 
Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...
Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...
Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...
Data Con LA
 
Introduction to Apache NiFi 1.11.4
Introduction to Apache NiFi 1.11.4Introduction to Apache NiFi 1.11.4
Introduction to Apache NiFi 1.11.4
Timothy Spann
 
The Avant-garde of Apache NiFi
The Avant-garde of Apache NiFiThe Avant-garde of Apache NiFi
The Avant-garde of Apache NiFi
DataWorks Summit/Hadoop Summit
 
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...
Hortonworks
 
Apache NiFi User Guide
Apache NiFi User GuideApache NiFi User Guide
Apache NiFi User Guide
Deon Huang
 
Nifi workshop
Nifi workshopNifi workshop
Nifi workshop
Yifeng Jiang
 
Connecting the Drops with Apache NiFi & Apache MiNiFi
Connecting the Drops with Apache NiFi & Apache MiNiFiConnecting the Drops with Apache NiFi & Apache MiNiFi
Connecting the Drops with Apache NiFi & Apache MiNiFi
DataWorks Summit
 
Apache NiFi Meetup - Princeton NJ 2016
Apache NiFi Meetup - Princeton NJ 2016Apache NiFi Meetup - Princeton NJ 2016
Apache NiFi Meetup - Princeton NJ 2016
Timothy Spann
 

What's hot (20)

Running Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration OptionsRunning Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration Options
 
Apache NiFi: latest developments for flow management at scale
Apache NiFi: latest developments for flow management at scaleApache NiFi: latest developments for flow management at scale
Apache NiFi: latest developments for flow management at scale
 
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San JoseDataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose
 
Introduction to data flow management using apache nifi
Introduction to data flow management using apache nifiIntroduction to data flow management using apache nifi
Introduction to data flow management using apache nifi
 
BigData Techcon - Beyond Messaging with Apache NiFi
BigData Techcon - Beyond Messaging with Apache NiFiBigData Techcon - Beyond Messaging with Apache NiFi
BigData Techcon - Beyond Messaging with Apache NiFi
 
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFi
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFiIntelligently Collecting Data at the Edge - Intro to Apache MiNiFi
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFi
 
Apache NiFi in the Hadoop Ecosystem
Apache NiFi in the Hadoop EcosystemApache NiFi in the Hadoop Ecosystem
Apache NiFi in the Hadoop Ecosystem
 
Harnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFI
Harnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFIHarnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFI
Harnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFI
 
Dataflow Management From Edge to Core with Apache NiFi
Dataflow Management From Edge to Core with Apache NiFiDataflow Management From Edge to Core with Apache NiFi
Dataflow Management From Edge to Core with Apache NiFi
 
Hadoop Summit Tokyo Apache NiFi Crash Course
Hadoop Summit Tokyo Apache NiFi Crash CourseHadoop Summit Tokyo Apache NiFi Crash Course
Hadoop Summit Tokyo Apache NiFi Crash Course
 
The First Mile - Edge and IoT Data Collection With Apache Nifi and MiniFi
The First Mile - Edge and IoT Data Collection With Apache Nifi and MiniFiThe First Mile - Edge and IoT Data Collection With Apache Nifi and MiniFi
The First Mile - Edge and IoT Data Collection With Apache Nifi and MiniFi
 
Apache NiFi Crash Course Intro
Apache NiFi Crash Course IntroApache NiFi Crash Course Intro
Apache NiFi Crash Course Intro
 
Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...
Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...
Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...
 
Introduction to Apache NiFi 1.11.4
Introduction to Apache NiFi 1.11.4Introduction to Apache NiFi 1.11.4
Introduction to Apache NiFi 1.11.4
 
The Avant-garde of Apache NiFi
The Avant-garde of Apache NiFiThe Avant-garde of Apache NiFi
The Avant-garde of Apache NiFi
 
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...
 
Apache NiFi User Guide
Apache NiFi User GuideApache NiFi User Guide
Apache NiFi User Guide
 
Nifi workshop
Nifi workshopNifi workshop
Nifi workshop
 
Connecting the Drops with Apache NiFi & Apache MiNiFi
Connecting the Drops with Apache NiFi & Apache MiNiFiConnecting the Drops with Apache NiFi & Apache MiNiFi
Connecting the Drops with Apache NiFi & Apache MiNiFi
 
Apache NiFi Meetup - Princeton NJ 2016
Apache NiFi Meetup - Princeton NJ 2016Apache NiFi Meetup - Princeton NJ 2016
Apache NiFi Meetup - Princeton NJ 2016
 

Similar to Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFi

Dataflow Management From Edge to Core with Apache NiFi
Dataflow Management From Edge to Core with Apache NiFiDataflow Management From Edge to Core with Apache NiFi
Dataflow Management From Edge to Core with Apache NiFi
DataWorks Summit
 
Intelligently Collecting Data at the Edge – Intro to Apache MiNiFi
Intelligently Collecting Data at the Edge – Intro to Apache MiNiFiIntelligently Collecting Data at the Edge – Intro to Apache MiNiFi
Intelligently Collecting Data at the Edge – Intro to Apache MiNiFi
DataWorks Summit
 
Hortonworks Data in Motion Webinar Series - Part 1
Hortonworks Data in Motion Webinar Series - Part 1Hortonworks Data in Motion Webinar Series - Part 1
Hortonworks Data in Motion Webinar Series - Part 1
Hortonworks
 
Apache NiFi - Flow Based Programming Meetup
Apache NiFi - Flow Based Programming MeetupApache NiFi - Flow Based Programming Meetup
Apache NiFi - Flow Based Programming Meetup
Joseph Witt
 
Mission to NARs with Apache NiFi
Mission to NARs with Apache NiFiMission to NARs with Apache NiFi
Mission to NARs with Apache NiFi
Hortonworks
 
HDF Powered by Apache NiFi Introduction
HDF Powered by Apache NiFi IntroductionHDF Powered by Apache NiFi Introduction
HDF Powered by Apache NiFi Introduction
Milind Pandit
 
The Avant-garde of Apache NiFi
The Avant-garde of Apache NiFiThe Avant-garde of Apache NiFi
The Avant-garde of Apache NiFi
Joe Percivall
 
State of the Apache NiFi Ecosystem & Community
State of the Apache NiFi Ecosystem & CommunityState of the Apache NiFi Ecosystem & Community
State of the Apache NiFi Ecosystem & Community
Accumulo Summit
 
The First Mile – Edge and IoT Data Collection with Apache NiFi and MiNiFi
The First Mile – Edge and IoT Data Collection with Apache NiFi and MiNiFiThe First Mile – Edge and IoT Data Collection with Apache NiFi and MiNiFi
The First Mile – Edge and IoT Data Collection with Apache NiFi and MiNiFi
DataWorks Summit
 
IoT Edge Processing with Apache NiFi and MiniFi and Apache MXNet for IoT NY 2018
IoT Edge Processing with Apache NiFi and MiniFi and Apache MXNet for IoT NY 2018IoT Edge Processing with Apache NiFi and MiniFi and Apache MXNet for IoT NY 2018
IoT Edge Processing with Apache NiFi and MiniFi and Apache MXNet for IoT NY 2018
Timothy Spann
 
The First Mile -- Edge and IoT Data Collection with Apache NiFi and MiNiFi
The First Mile -- Edge and IoT Data Collection with Apache NiFi and MiNiFiThe First Mile -- Edge and IoT Data Collection with Apache NiFi and MiNiFi
The First Mile -- Edge and IoT Data Collection with Apache NiFi and MiNiFi
DataWorks Summit
 
Apache Nifi Crash Course
Apache Nifi Crash CourseApache Nifi Crash Course
Apache Nifi Crash Course
DataWorks Summit
 
Data Con LA 2018 - Streaming and IoT by Pat Alwell
Data Con LA 2018 - Streaming and IoT by Pat AlwellData Con LA 2018 - Streaming and IoT by Pat Alwell
Data Con LA 2018 - Streaming and IoT by Pat Alwell
Data Con LA
 
Apache Zeppelin + LIvy: Bringing Multi Tenancy to Interactive Data Analysis
Apache Zeppelin + LIvy: Bringing Multi Tenancy to Interactive Data AnalysisApache Zeppelin + LIvy: Bringing Multi Tenancy to Interactive Data Analysis
Apache Zeppelin + LIvy: Bringing Multi Tenancy to Interactive Data Analysis
DataWorks Summit/Hadoop Summit
 
NJ Hadoop Meetup - Apache NiFi Deep Dive
NJ Hadoop Meetup - Apache NiFi Deep DiveNJ Hadoop Meetup - Apache NiFi Deep Dive
NJ Hadoop Meetup - Apache NiFi Deep Dive
Bryan Bende
 
Apache NiFi + Tensorflow + Hadoop: Big Data AI サンドイッチの作り方
Apache NiFi + Tensorflow + Hadoop:Big Data AI サンドイッチの作り方Apache NiFi + Tensorflow + Hadoop:Big Data AI サンドイッチの作り方
Apache NiFi + Tensorflow + Hadoop: Big Data AI サンドイッチの作り方
HortonworksJapan
 
Webinar Series Part 5 New Features of HDF 5
Webinar Series Part 5 New Features of HDF 5Webinar Series Part 5 New Features of HDF 5
Webinar Series Part 5 New Features of HDF 5
Hortonworks
 
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseUsing Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
DataWorks Summit
 
Curing the Kafka blindness—Streams Messaging Manager
Curing the Kafka blindness—Streams Messaging ManagerCuring the Kafka blindness—Streams Messaging Manager
Curing the Kafka blindness—Streams Messaging Manager
DataWorks Summit
 
IoT with Apache MXNet and Apache NiFi and MiniFi
IoT with Apache MXNet and Apache NiFi and MiniFiIoT with Apache MXNet and Apache NiFi and MiniFi
IoT with Apache MXNet and Apache NiFi and MiniFi
DataWorks Summit
 

Similar to Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFi (20)

Dataflow Management From Edge to Core with Apache NiFi
Dataflow Management From Edge to Core with Apache NiFiDataflow Management From Edge to Core with Apache NiFi
Dataflow Management From Edge to Core with Apache NiFi
 
Intelligently Collecting Data at the Edge – Intro to Apache MiNiFi
Intelligently Collecting Data at the Edge – Intro to Apache MiNiFiIntelligently Collecting Data at the Edge – Intro to Apache MiNiFi
Intelligently Collecting Data at the Edge – Intro to Apache MiNiFi
 
Hortonworks Data in Motion Webinar Series - Part 1
Hortonworks Data in Motion Webinar Series - Part 1Hortonworks Data in Motion Webinar Series - Part 1
Hortonworks Data in Motion Webinar Series - Part 1
 
Apache NiFi - Flow Based Programming Meetup
Apache NiFi - Flow Based Programming MeetupApache NiFi - Flow Based Programming Meetup
Apache NiFi - Flow Based Programming Meetup
 
Mission to NARs with Apache NiFi
Mission to NARs with Apache NiFiMission to NARs with Apache NiFi
Mission to NARs with Apache NiFi
 
HDF Powered by Apache NiFi Introduction
HDF Powered by Apache NiFi IntroductionHDF Powered by Apache NiFi Introduction
HDF Powered by Apache NiFi Introduction
 
The Avant-garde of Apache NiFi
The Avant-garde of Apache NiFiThe Avant-garde of Apache NiFi
The Avant-garde of Apache NiFi
 
State of the Apache NiFi Ecosystem & Community
State of the Apache NiFi Ecosystem & CommunityState of the Apache NiFi Ecosystem & Community
State of the Apache NiFi Ecosystem & Community
 
The First Mile – Edge and IoT Data Collection with Apache NiFi and MiNiFi
The First Mile – Edge and IoT Data Collection with Apache NiFi and MiNiFiThe First Mile – Edge and IoT Data Collection with Apache NiFi and MiNiFi
The First Mile – Edge and IoT Data Collection with Apache NiFi and MiNiFi
 
IoT Edge Processing with Apache NiFi and MiniFi and Apache MXNet for IoT NY 2018
IoT Edge Processing with Apache NiFi and MiniFi and Apache MXNet for IoT NY 2018IoT Edge Processing with Apache NiFi and MiniFi and Apache MXNet for IoT NY 2018
IoT Edge Processing with Apache NiFi and MiniFi and Apache MXNet for IoT NY 2018
 
The First Mile -- Edge and IoT Data Collection with Apache NiFi and MiNiFi
The First Mile -- Edge and IoT Data Collection with Apache NiFi and MiNiFiThe First Mile -- Edge and IoT Data Collection with Apache NiFi and MiNiFi
The First Mile -- Edge and IoT Data Collection with Apache NiFi and MiNiFi
 
Apache Nifi Crash Course
Apache Nifi Crash CourseApache Nifi Crash Course
Apache Nifi Crash Course
 
Data Con LA 2018 - Streaming and IoT by Pat Alwell
Data Con LA 2018 - Streaming and IoT by Pat AlwellData Con LA 2018 - Streaming and IoT by Pat Alwell
Data Con LA 2018 - Streaming and IoT by Pat Alwell
 
Apache Zeppelin + LIvy: Bringing Multi Tenancy to Interactive Data Analysis
Apache Zeppelin + LIvy: Bringing Multi Tenancy to Interactive Data AnalysisApache Zeppelin + LIvy: Bringing Multi Tenancy to Interactive Data Analysis
Apache Zeppelin + LIvy: Bringing Multi Tenancy to Interactive Data Analysis
 
NJ Hadoop Meetup - Apache NiFi Deep Dive
NJ Hadoop Meetup - Apache NiFi Deep DiveNJ Hadoop Meetup - Apache NiFi Deep Dive
NJ Hadoop Meetup - Apache NiFi Deep Dive
 
Apache NiFi + Tensorflow + Hadoop: Big Data AI サンドイッチの作り方
Apache NiFi + Tensorflow + Hadoop:Big Data AI サンドイッチの作り方Apache NiFi + Tensorflow + Hadoop:Big Data AI サンドイッチの作り方
Apache NiFi + Tensorflow + Hadoop: Big Data AI サンドイッチの作り方
 
Webinar Series Part 5 New Features of HDF 5
Webinar Series Part 5 New Features of HDF 5Webinar Series Part 5 New Features of HDF 5
Webinar Series Part 5 New Features of HDF 5
 
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseUsing Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
 
Curing the Kafka blindness—Streams Messaging Manager
Curing the Kafka blindness—Streams Messaging ManagerCuring the Kafka blindness—Streams Messaging Manager
Curing the Kafka blindness—Streams Messaging Manager
 
IoT with Apache MXNet and Apache NiFi and MiniFi
IoT with Apache MXNet and Apache NiFi and MiniFiIoT with Apache MXNet and Apache NiFi and MiniFi
IoT with Apache MXNet and Apache NiFi and MiniFi
 

Recently uploaded

【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
pchutichetpong
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
ewymefz
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
ocavb
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Boston Institute of Analytics
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
ArpitMalhotra16
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 

Recently uploaded (20)

【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 

Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFi

  • 1. Data at Scales & the Values of Starting Small Aldrin Piri - @aldrinpiri DataWorks Summit 2017 – Munich
  • 2. 2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Key: 'Apache NiFi’ Value: 'PMC Member' Key: 'Work’ Value: ’Sr. Member of Technical Staff @ Hortonworks' Key: 'Working with NiFi Since’ Value: '2010’
  • 3. 3 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Agenda Apache NiFi: A Primer Apache MiNiFi Architecture Apache NiFi: The Ecosystem Community
  • 4. 4 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Byte Scales for Data SI Prefix - 10 0 kilo 10 3 mega 10 6 giga 10 9 tera 10 12 peta 10 15 exa 10 18 zetta 10 21 yotta 10 24 “Big Data” ”everything else” Greek for
  • 5. 5 © Hortonworks Inc. 2011 – 2016. All Rights Reserved The Problem at Hand Producers A.K.A Things Anything AND Everything Internet! Consumers • User • Storage • System • …More Things
  • 6. 6 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Use Case: Courier Service Physical Store Gateway Server Mobile Devices Registers Server Cluster Distribution Center Kafka Core Data Center at HQ Server Cluster Others Storm / Spark / Flink / Apex Kafka Storm / Spark / Flink / Apex On Delivery Routes Trucks Deliverers Delivery Truck: Creative Stall, https://thenounproject.com/creativestall/ Deliverer: Rigo Peter, https://thenounproject.com/rigo/ Cash Register: Sergey Patutin, https://thenounproject.com/bdesign.by/ Hand Scanner: Eric Pearson, https://thenounproject.com/epearson001/ NiFi NiFi NiFi NiFi NiFi NiFi Gathering data from disparate sources NiFi
  • 7. 7 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Apache NiFi: A Primer Key Features and Principles • Guaranteed delivery • Data buffering - Backpressure - Pressure release • Prioritized queuing • Flow specific QoS - Latency vs. throughput - Loss tolerance • Data provenance • Recovery/recording a rolling log of fine- grained history • Visual command and control • Flow templates • Pluggable/multi-role security • Designed for extension • Clustering
  • 8. 8 © Hortonworks Inc. 2011 – 2016. All Rights Reserved NiFi is based on Flow Based Programming (FBP) FBP Term NiFi Term Description Information Packet FlowFile Each object moving through the system. Black Box FlowFile Processor Performs the work, doing some combination of data routing, transformation, or mediation between systems. Bounded Buffer Connection The linkage between processors, acting as queues and allowing various processes to interact at differing rates. Scheduler Flow Controller Maintains the knowledge of how processes are connected, and manages the threads and allocations thereof which all processes use. Subnet Process Group A set of processes and their connections, which can receive and send data via ports. A process group allows creation of entirely new component simply by composition of its components.
  • 9. 9 © Hortonworks Inc. 2011 – 2016. All Rights Reserved FlowFiles are like HTTP data HTTP Data FlowFile HTTP/1.1 200 OK Date: Sun, 10 Oct 2010 23:26:07 GMT Server: Apache/2.2.8 (CentOS) OpenSSL/0.9.8g Last-Modified: Sun, 26 Sep 2010 22:04:35 GMT ETag: "45b6-834-49130cc1182c0" Accept-Ranges: bytes Content-Length: 13 Connection: close Content-Type: text/html Hello world! Standard FlowFile Attributes Key: 'entryDate’ Value: 'Fri Jun 17 17:15:04 EDT 2016' Key: 'lineageStartDate’ Value: 'Fri Jun 17 17:15:04 EDT 2016' Key: 'fileSize’ Value: '23609' FlowFile Attribute Map Content Key: 'filename’ Value: '15650246997242' Key: 'path’ Value: './’ Binary Content * Header Content
  • 10. 10 © Hortonworks Inc. 2011 – 2016. All Rights Reserved FlowFiles & Data Agnosticism  NiFi is data agnostic!  But, NiFi was designed understanding that users can care about specifics and provides tooling to interact with specific formats, protocols, etc. ISO 8601 - http://xkcd.com/1179/ Robustness principle Be conservative in what you do, be liberal in what you accept from others“
  • 11. 11 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
  • 12. 12 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Apache MiNiFi  NiFi lives in the data center. Give it an enterprise server or a cluster of them.  MiNiFi lives as close to where data is born and is a guest on that device or system “Let me get the key parts of NiFi close to where data begins and provide bidirectional data transfer"
  • 13. 13 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Apache MiNiFi  Limited computing capability  Limited power/network  Restricted software library/platform availability  No UI  Physically inaccessible  Not frequently updated  Competing standards/protocols  Scalability  Privacy & Security Realities of computing outside the comforts of the data center
  • 14. 14 © Hortonworks Inc. 2011 – 2016. All Rights Reserved MiNiFi: Precedent from NiFi  Provides the semantics between two NiFi components across network boundaries – A custom protocol for inter-NiFi communication – Secure, Extensible, Load Balanced & Scalable Delivery to Cluster  Extracted out to a client library which powers integration into popular frameworks like Apache Spark, Apache Storm, Apache Flink, and Apache Apex  Attributes and the FlowFile format maintained A quick look at NiFi Site to Site https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#site-to-site
  • 15. 15 © Hortonworks Inc. 2011 – 2016. All Rights Reserved MiNiFi: Precedent from NiFi  Fine-grained, event level access of interactions with FlowFiles – CREATE, RECEIVE, FETCH, SEND, DOWNLOAD, DROP, EXPIRE, FORK, JOIN …  Captures the associated attributes/metadata at the time of the event  A map of a FlowFile’s journey and how they relate to other FlowFiles in a system – MiNiFi enables us to get more and further illuminate the map of data processing A deeper dive into provenance http://nifi.apache.org/docs/nifi-docs/html/user-guide.html#data-provenance
  • 16. 16 © Hortonworks Inc. 2011 – 2016. All Rights Reserved MiNiFi: Precedent from NiFi RECEIVE event
  • 17. 17 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Apache MiNiFi  The feedback loop is longer and not guaranteed – Removal of Web Server and UI  Declarative configuration – Lends itself well to CM processes – Extensible interface to support varying formats • Currently provided in YAML  Reduced set of bundled components Departures from NiFi in getting the right fit
  • 18. 18 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Apache MiNiFi: Scoping  Go small: Java – Write once, run anywhere* – Feature parity and reuse of core NiFi libraries  Go smaller: C++ – Write once**, run anywhere  Go smallest: Write n-many times, run anywhere Language libraries to support tagging, FlowFile format, Site to Site protocol, and provenance generation without a processing framework – Mobile: Android & iOS – Language SDKs Provide all the key principles of NiFi in varying, smaller footprints
  • 19. WHAT IS THIS!? A NiFi FOR ANTS!?!
  • 20. 20 © Hortonworks Inc. 2011 – 2016. All Rights Reserved MiNiFi: Use Case - Connected Car  Outside vehicle’s network firewall  On telematics layer VEHICLE NETWORK FIREWALL TRANSMIT EXECUTE FILTER PRIORITIZE PARSE LISTEN ROUTE
  • 21. 21 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Connecting the Drops SOURCES REGIONAL INFRASTRUCTURE CORE INFRASTRUCTURE
  • 22. 22 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Use Case: Courier Service with Apache NiFi & MiNiFi Physical Store Gateway Server Mobile Devices Registers Server Cluster Distribution Center Kafka Core Data Center at HQ Server Cluster Others Storm / Spark / Flink / Apex Kafka Storm / Spark / Flink / Apex On Delivery Routes Trucks Deliverers Delivery Truck: Creative Stall, https://thenounproject.com/creativestall/ Deliverer: Rigo Peter, https://thenounproject.com/rigo/ Cash Register: Sergey Patutin, https://thenounproject.com/bdesign.by/ Hand Scanner: Eric Pearson, https://thenounproject.com/epearson001/ Client Libraries Client Libraries MiNiFi MiNiFi NiFi NiFi NiFi NiFi NiFi NiFi Client Libraries Gathering data from disparate sources
  • 23. 23 © Hortonworks Inc. 2011 – 2016. All Rights Reserved© Hortonworks Inc. 2011 – 2017. All Rights ReservedX Data Provenance ▪ Constrained ▪ High-latency ▪ Localized context ▪ Hybrid – cloud/on-premises ▪ Low-latency ▪ Global context Origin – attribution Replay – recovery Evolution of topologies Long retention Types of Lineage • Event • Configuration
  • 24. 24 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Apache NiFi: The Ecosystem  Site-to-Site in MiNiFi instances provides machine-to-machine (M2M) communication – Data arrives to NiFi in a transparent manner allowing integration to existing flows  Similar attention to extensibility in both Java and C++ clients allows agents to fit the needs of your organization  Reduced footprint allows NiFi functionality to aid in production of high fidelity data, more closely attributable and tracked from where it is generated We’ve provided a framework to extend the reach of data ingest
  • 25. 25 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Apache NiFi: The Ecosystem  Enter the MiNiFi Command & Control – Provide tooling to map the UX of interactive command and control in NiFi to the design and deploy approach of MiNiFi But more instances complicate my operational management! https://cwiki.apache.org/confluence/display/MINIFI/MiNiFi+Command+and+Control
  • 26. 26 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Apache NiFi: The Ecosystem  Configuration Management of Flows & Versioning – The evolution of templates to better support SDLC functions – https://cwiki.apache.org/confluence/display/NIFI/Configuration+Management+of+Flows  Extension Repositories – Publish & Share extension bundles (NARs) – https://cwiki.apache.org/confluence/display/NIFI/Extension+Repositories+%28aka+Extension+Regis try%29+for+Dynamically-loaded+Extensions  Variable Registry – Initial framework support & file-based implementation – https://cwiki.apache.org/confluence/display/NIFI/Variable+Registry Building on efforts for reusable components in the community
  • 27. 27 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Why Apache NiFi & MiNiFi?  Moving data is multifaceted in its challenges and these are present in different contexts at varying scopes – Think of our courier example and organizations like it: inter vs intra, domestically, internationally  Provide common tooling and extensions that are commonly needed but be flexible for extension – Leverage existing libraries and expansive Java ecosystem for functionality – Allow organizations to integrate with their existing infrastructure  Empower folks managing your infrastructure to make changes and reason about issues that are occurring – Data Provenance to show context and data’s journey – User Interface/Experience a key component
  • 28. 28 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Learn more and join us! Apache NiFi site https://nifi.apache.org Subproject MiNiFi site https://nifi.apache.org/minifi/ Subscribe to and collaborate at dev@nifi.apache.org users@nifi.apache.org Submit Ideas or Issues https://issues.apache.org/jira/browse/NIFI https://issues.apache.org/jira/browse/MINIFI Follow us on Twitter @apachenifi
  • 29. 29 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Apache NiFi Crash Course Thursday, 6 April 11:15 AM – 1:45PM, Room 12 • Learn more about NiFi, the community, and work through a hands-on lab • Seats available on a first come, first served basis • Make sure you are in possession of the latest version of VirtualBox • More details: https://tinyurl.com/nifi-cc-munich17
  • 30. 30 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Learn, Share at Birds of a Feather IOT, STREAMING & DATA FLOW Thursday, April 6 5:50 pm, Room 5
  • 31. 31 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Thank You