© Hortonworks Inc. 2011 – 2017. All Rights Reserved
Timothy Spann
2017 Future of Data – Princeton Meetup
Hosted by TRAC Intermodal
Apache NiFi: Ingesting Enterprise Data @ Scale
DATAWORKS SUMMIT/HADOOP SUMMIT
JUNE 13–15, 2017
San Jose McEnery Convention Center
REGISTER NOW AND SAVE $1,000
dataworkssummit.com
Agenda
• Apache NiFi: RDBMS, EDI, JSON, CSV, Sensors
• EDI
• https://community.hortonworks.com/content/kbentry/59975/ingesting-edi-into-hdfs-using-hdf-20.html
• https://github.com/tspannhw/EnterpriseNIFI
HDF 2.1 – Data in Motion Platform
[Diagram labels: Flow Management; Flow Management + Stream Processing; Data in Motion; Data at Rest; IoT Data Sources; AWS; Azure; Google Cloud; Hadoop; NiFi; Kafka; Storm; Others…; MiNiFi; Enterprise Services: Ambari, Ranger, other services]
Actionable Insights Architecture
[Diagram labels: Ingestion; Simple Event Processing; Engine; Complex Event Processing; Destination; Data Bus; Build Predictive Model From Historical Data; Deploy Predictive Model For Real-time Insights; Perishable Insights; Historical Insights]
Actionable Intelligence Transforms Industrial, Transportation & Utilities
Data sources:
• Asset Data
• Customer Surveys
• Weather & Environmental
• Service Fleet GPS Data
• Smart Meter Streams
• Commodity Prices
• Social Media
• GIS Data
• SCADA
• Outage Histories
• CIS Records
• EDW
Use cases:
• Revenue Protection
• Single View of Customer
• Predictive Equipment Maintenance
• Conservation Voltage Reduction
• Commodity Trading
What is Apache NiFi?
• Created to address the challenges of global enterprise dataflow
• Key features:
– Visual Command and Control
– Data Lineage (Provenance)
– Data Prioritization
– Data Buffering/Back-Pressure
– Control Latency vs. Throughput
– Secure Control Plane / Data Plane
– Scale Out Clustering
– Extensibility
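
To make the buffering/back-pressure feature listed above concrete, here is a minimal Python sketch. It is a toy model, not NiFi code or the NiFi API: the class `BoundedConnection`, the function `run_producer`, and the threshold of 3 are all hypothetical names and values chosen for illustration. The idea it shows is that a queue with a threshold stops accepting work when full, so the upstream producer pauses until a consumer drains something.

```python
from collections import deque


class BoundedConnection:
    """Toy queue with an object-count back-pressure threshold (hypothetical model)."""

    def __init__(self, threshold):
        self.threshold = threshold
        self._queue = deque()

    def is_full(self):
        # Back-pressure is engaged once the queue reaches the threshold.
        return len(self._queue) >= self.threshold

    def offer(self, item):
        """Enqueue item unless back-pressure is engaged; return True if accepted."""
        if self.is_full():
            return False
        self._queue.append(item)
        return True

    def poll(self):
        """Remove and return the oldest item, or None if the queue is empty."""
        return self._queue.popleft() if self._queue else None


def run_producer(source, connection):
    """Push items until the source is drained or back-pressure engages.

    Returns the items that could not be delivered yet.
    """
    pending = list(source)
    while pending and connection.offer(pending[0]):
        pending.pop(0)
    return pending


conn = BoundedConnection(threshold=3)
leftover = run_producer(["a", "b", "c", "d", "e"], conn)  # stops after three items
drained = conn.poll()                                     # consumer frees one slot
leftover = run_producer(leftover, conn)                   # producer resumes
```

The producer never drops data; it simply holds what it could not deliver, which is the buffering half of the feature.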
Apache NiFi
What is Apache NiFi used for?
• Reliable and secure transfer of data between systems
• Delivery of data from sources to analytic platforms
• Enrichment and preparation of data:
– Conversion between formats
– Extraction/Parsing
– Routing decisions
What is Apache NiFi NOT used for?
• Distributed Computation
• Complex Event Processing
• Complex Rolling Window Operations
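
The enrichment and preparation steps listed above (conversion between formats, extraction, routing decisions) can be sketched in plain Python. This is only an illustration under stated assumptions: in NiFi these steps are normally configured as processors (for example EvaluateJsonPath and RouteOnAttribute) rather than hand-written, and the function names, the sample sensor data, and the 80-degree routing rule below are all hypothetical.

```python
import csv
import io
import json


def csv_to_json_records(csv_text):
    """Conversion: turn CSV text with a header row into a list of JSON strings."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [json.dumps(row, sort_keys=True) for row in reader]


def extract_attributes(record_json, fields):
    """Extraction: pull selected fields out of the content into an attribute dict."""
    record = json.loads(record_json)
    return {name: record[name] for name in fields if name in record}


def route(attributes):
    """Routing: pick a destination name from attributes (illustrative rule only)."""
    return "alerts" if float(attributes.get("temperature", "0")) > 80.0 else "archive"


# Hypothetical sensor readings; CSV values are parsed as strings.
sample = "sensor_id,temperature\ns1,72.5\ns2,91.0\n"

routed = {}
for record in csv_to_json_records(sample):
    attrs = extract_attributes(record, ["sensor_id", "temperature"])
    routed.setdefault(route(attrs), []).append(attrs["sensor_id"])
```

Each step works on one record at a time and needs no state shared across records, which is consistent with the slide's point that NiFi is not meant for distributed computation or rolling-window operations.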
NiFi Terminology
FlowFile
• Unit of data moving through the system
• Content + Attributes (key/value pairs)
Processor
• Performs the work, can access FlowFiles
Connection
• Links between processors
• Queues that can be dynamically prioritized
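
The three terms above can be modeled in a few lines of Python. This is a conceptual sketch only, not NiFi's Java API: the `FlowFile` dataclass, the `Connection` class, `uppercase_processor`, and the `priority` attribute are hypothetical names introduced for illustration.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Unit of data: content plus key/value attributes."""
    content: bytes
    attributes: dict = field(default_factory=dict)


class Connection:
    """Queue linking two processors; the lowest prioritizer value is dequeued first."""

    def __init__(self, prioritizer):
        self._prioritizer = prioritizer
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps arrival order

    def put(self, flowfile):
        entry = (self._prioritizer(flowfile), next(self._counter), flowfile)
        heapq.heappush(self._heap, entry)

    def get(self):
        return heapq.heappop(self._heap)[2] if self._heap else None

    def set_prioritizer(self, prioritizer):
        """Swap the ordering rule and re-sort what is already queued."""
        self._prioritizer = prioritizer
        self._heap = [(prioritizer(ff), n, ff) for _, n, ff in self._heap]
        heapq.heapify(self._heap)


def uppercase_processor(flowfile):
    """Processor: reads a FlowFile and emits a new one with changed content and attributes."""
    attrs = dict(flowfile.attributes, processed="true")
    return FlowFile(flowfile.content.upper(), attrs)


conn = Connection(prioritizer=lambda ff: int(ff.attributes.get("priority", "9")))
conn.put(FlowFile(b"routine reading", {"priority": "5"}))
conn.put(FlowFile(b"urgent alarm", {"priority": "1"}))
first = uppercase_processor(conn.get())  # the priority-1 FlowFile is dequeued first
```

Calling `set_prioritizer` on a live connection reorders the FlowFiles already waiting, which is one way to read "dynamically prioritized".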
Contact:
Timothy Spann @PaaSDeV
www.meetup.com/futureofdata-princeton
community.hortonworks.com/users/9304/tspann.html
Hortonworks Community Connection
Read access for everyone, join to participate and be recognized
• Full Q&A Platform (like StackOverflow)
• Knowledge Base Articles
• Code Samples and Repositories
Community Engagement
Participate now at: community.hortonworks.com
• 4,000+ Registered Users
• 10,000+ Answers
• 15,000+ Technical Assets
One Website!

Apache NiFi: Ingesting Enterprise Data At Scale