1 ©	Hortonworks	Inc.	2011–2018.	All	rights	reserved
Unlock	Value	from	Big	Data	with
Apache	NiFi and	Streaming	CDC
Mark	Payne,	Sr.	Member	of	Technical	Staff,	Hortonworks
Jordan	Martz,	Director,	Technology	Solutions,	Attunity
Jordan Martz, Director, Technology Solutions, Attunity
Mark Payne, Sr. Member of Technical Staff, NiFi PMC, Hortonworks
Agenda
• Apache	NiFi – What	and	Why
• Features	of	Apache	NiFi
• Demo
• Use	Cases	of	Apache	NiFi
• Streaming	CDC	with	NiFi and	Attunity
Apache NiFi – What and Why
What Is It?
• Tool for getting the right data to the right place(s), in the right format(s), at the right time.
• Listen, Fetch Data
• Split, Aggregate Data
• Route, Transform Data
• Push Data
• Drag & Drop Dataflow
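The verbs on this slide (fetch, split, route, transform, push) can be pictured as a small pipeline. The following is a minimal plain-Python sketch of that idea, not NiFi code: NiFi flows are assembled in its drag-and-drop UI, and every function name and the "LEVEL message" record format below are hypothetical.

```python
import json

def split_lines(payload: str) -> list[str]:
    """Split: break one incoming payload into individual records."""
    return [line for line in payload.splitlines() if line.strip()]

def route(record: str) -> str:
    """Route: pick a destination name based on the record's content."""
    return "errors" if "ERROR" in record else "events"

def transform(record: str) -> str:
    """Transform: convert a 'LEVEL message' line (assumed format) to JSON."""
    level, _, message = record.partition(" ")
    return json.dumps({"level": level, "message": message})

def run_flow(payload: str) -> dict[str, list[str]]:
    """Push: deliver each transformed record to its routed destination."""
    destinations: dict[str, list[str]] = {"errors": [], "events": []}
    for record in split_lines(payload):
        destinations[route(record)].append(transform(record))
    return destinations

# The "fetched" payload is hard-coded here; a real flow would listen or poll.
result = run_flow("INFO started\nERROR disk full\n\nINFO stopped\n")
```

Each stage is a separate function so that stages can be swapped or reordered independently, which is roughly what connecting components on a canvas gives you without writing code.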
Why – A Brief History
• Even with a large budget, nothing on the market met the need
• Typical Approaches to DataFlow
• Messaging Frameworks (e.g., Kafka, JMS)
• Scripts
• ESBs
• Shortcomings of These Approaches
• Visualization
• Maintainability
• Monitoring and Operations
• Data Traceability
Features	of	Apache	NiFi
Drag & Drop UI
• Quickly build out DataFlow by dragging components
• Real-Time Command & Control
• Real-Time Monitoring
Operations First
• Visualize and Monitor performance, behavior in flow
• Bulletins provide immediate insight
• Inline documentation
• Start and Stop components individually or at group level
• Visualize DataFlow at the enterprise level, not only the “pipeline” level
Data Provenance
• Fine-grained data lineage
• Immutable data stores
• See data before and after each event
• Enables compliance use cases
• Enables debugging, understanding
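One way to picture "fine-grained lineage" over "immutable" records with "data before and after each event" is an append-only event log. This is a minimal sketch under stated assumptions: `ProvenanceEvent` and `ProvenanceLog` are hypothetical names invented for illustration and do not reflect NiFi's actual provenance repository or API.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)  # frozen: a recorded event cannot be modified afterwards
class ProvenanceEvent:
    record_id: str   # which piece of data the event is about
    component: str   # which step handled it
    before: bytes    # content going into the step
    after: bytes     # content coming out of the step

class ProvenanceLog:
    """Append-only log: events are added, never edited or removed."""

    def __init__(self) -> None:
        self._events: list[ProvenanceEvent] = []

    def apply(self, record_id: str, component: str,
              step: Callable[[bytes], bytes], content: bytes) -> bytes:
        """Run one step and record the content before and after it."""
        result = step(content)
        self._events.append(ProvenanceEvent(record_id, component, content, result))
        return result

    def lineage(self, record_id: str) -> list[ProvenanceEvent]:
        """Return every event for one record, in the order it happened."""
        return [e for e in self._events if e.record_id == record_id]

log = ProvenanceLog()
data = log.apply("rec-1", "Uppercase", bytes.upper, b"hello")
data = log.apply("rec-1", "AddNewline", lambda b: b + b"\n", data)
history = log.lineage("rec-1")
```

Walking `history` answers the debugging and compliance questions on the slide: which component touched the record, and what the content looked like on each side of that step.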
Data Agnostic
• Data of any format, any schema
• Or with no schema
• Structured, unstructured, or semi-structured
• Data of any size
Demo	Time
NiFi Use	Cases
Insurance	Industry	- Actionable	Intelligence	with	Attunity	Replicate
Use cases: Risk & Underwriting Analysis; Usage-Based Insurance; Claims Analytics; New Product Development; Cyber Risk Analytics
Data sources: Catastrophic Event Data; Customer Onboarding Data; Seismic Data; Biometrics Data; Usage-Based Driver Data; Cyber Threat Metadata; Drones & Aerial Imagery; Claims Docs, Notes & Diaries; Weather & Environment; Underwriting Analysis; Policy Histories; Photos
Insurance Industry - Business use cases
Emerging Tech, Real-time data and the Connected World
• Smart Cities & Buildings
• Smart Factories / Commercial
• Connected Life / Health / Medicine
• IoT / Robotics
• Telematics
• Shared Economy
• Smart Homes
• Cyber / AI / Analytics
Attunity Enables Transformation in Insurance
Insurance
Risk & underwriting analysis; usage-based insurance; claims analytics; new product development; and cyber-risk analytics
• NiFi: Usage-Based Driver Data, Weather & Environment, Drones & Aerial Imagery, Seismic Data, Biometrics Data, Cyber Threat Metadata, Catastrophic Event Data, Photos
• Replicate: Underwriting Analysis, Customer Onboarding Data, Policy Histories, Claims Docs, Notes & Diaries
Actionable	Intelligence	Makes	Healthcare	Precise	and	Personal
Use cases: Single View of Patient; Real-Time Vital Sign Monitoring; Billing & Reimbursements; EMR Optimization; Supply Chain Optimization
Data sources: Patient Records; Lab Data; Pharmacy Data; Patient Locations; Wearables; Intra-Network Data; Sensor Data; Claims Data; Social Media; Physician Notes; Patient Satisfaction Data; Clinical (EMR) Data
Attunity	Enables	Transformational	Hadoop	Analytics	in	Healthcare
Healthcare
SINGLE VIEW OF PATIENT, REAL-TIME VITAL SIGN MONITORING, BILLING & REIMBURSEMENTS, EMR OPTIMIZATION, SUPPLY CHAIN OPTIMIZATION
• NiFi: Wearables, Social Media, Sensor Data, Pharmacy Data, Physician Notes, Patient Locations, Intra-Network Data
• Replicate: Claims Data, Clinical (EMR) Data, Patient Records, Patient Satisfaction Data, Lab Data
Telecommunications
Attunity Replicate Completes the 360-degree Views of Telecom Customers
Telecommunications
SINGLE VIEW OF THE CUSTOMER; CHURN REDUCTION; CDR ANALYSIS; NETWORK OPTIMIZATION; and DYNAMIC BANDWIDTH ALLOCATION
• NiFi: Server Logs, Social Media, Clickstream, Cyber Threat Metadata, Sensor Data, Voice-to-Text
• Replicate: ERP System Data, CRM Records, Call Detail Records, Billing Data, Subscriber Profiles, Product Catalogs
Actionable	Intelligence	Powers	Modern	Manufacturing
Use cases: Preventative Maintenance; Supply Chain Optimization; Yield Maximization; Quality Control; Recall Avoidance
Data sources: Defect Testing Data; Product Designs; MES Systems; RFID Streams; SCADA Systems; Shop Floor Sensors; ERP Systems; Supplier Receipts; Machine Data; Assembly Line Sensors; Data Historians; Work Orders
Data Drives the Connected Car – Must include SAP SCM, etc.
INSURANCE COMPANIES; GOVERNMENT AGENCIES; INFOTAINMENT PROVIDERS; SOFTWARE COMPANIES; AUTO MAKERS
Insurance Premiums; Warranties; Recalls; Pricing Models; Design Innovation; Autonomous Driving; Connected City; Infotainment; Sensors; Scheduled Maintenance; Predictive Maintenance; Route Optimization
Attunity Replicate Enables Global Manufacturing and Automotive Transformation
Manufacturing
Preventative Maintenance, Supply Chain Optimization, Yield Maximization, Quality Control, and Recall Avoidance
• Apache NiFi: Machine Data, Defect Testing Data, Assembly Line Sensors, RFID Streams, Shop Floor Sensors
• Attunity Replicate: SCADA Systems, Work Orders, Supplier Receipts, ERP Systems, MES Systems, Data Historians, Product Designs
Oil	&	Gas	- Industry	Data	Trends
Use cases: Real-Time Monitoring; Single View of Operations; Predictive Maintenance; Archive & Analytics; Unstructured Data Classification
Data sources: ERP Data; Engineering Notes; IoT Gateway Data; Video; WITSML Data; Weather & Environment; Vehicle GPS Data; GIS Data; SCADA Systems; Field Comments; Production Histories; G&G Data
Attunity	Replicate	Pumps	More	Relevant	Data	for	Oil	&	Gas
Oil	&	Gas
REAL-TIME MONITORING, SINGLE VIEW OF OPERATIONS, PREDICTIVE MAINTENANCE, ARCHIVE & ANALYTICS, UNSTRUCTURED DATA CLASSIFICATION
• NiFi: WITSML Data, SCADA Systems, Vehicle GPS Data, Video, Production Histories, Weather & Environment
• Replicate: G&G Data, GIS Data, ERP Data, Field Comments, SCADA (Engineering Notes)
Actionable	Intelligence	Transforms	Energy	and	Utilities
Use cases: Revenue Protection; Single View of Customer; Predictive Equipment Maintenance; Conservation Voltage Reduction; Commodity Trading
Data sources: Asset Data; Customer Surveys; Weather & Environmental; Service Fleet GPS Data; Smart Meter Streams; Commodity Prices; Social Media; GIS Data; SCADA; Outage Histories; CIS Records; EDW
Attunity Replicate Provides More Relevant Data for Energy and Utilities
Utilities
REVENUE PROTECTION; SINGLE VIEW OF CUSTOMER; PREDICTIVE EQUIPMENT MAINTENANCE; CONSERVATION VOLTAGE REDUCTION; COMMODITY TRADING
• Apache NiFi: smart meter streams, GIS data, social media, weather & environmental, CIS records, customer surveys, commodity prices
• Attunity Replicate: SCADA, EDW, asset data, outage histories
Actionable	Intelligence	Powers	Today’s	Financial	Services
Use cases: Digital Customer 360; Risk Data Aggregation; Anti-Money Laundering; Fraud Detection; Trade Surveillance
Data sources: OFAC Lists; Credit Records; ATM Streams; Transactions & Wires; Stock Tickers; Trade Settlements; Mobile App Data; Trade Data; Web Logs; Banker Notes; Demographic Data; Customer Transaction Data
Attunity Replicate Connects You to Your Money and Integrates with Key Market Resources
Financial Services
DIGITAL CUSTOMER 360; RISK DATA AGGREGATION; ANTI-MONEY LAUNDERING; FRAUD DETECTION; TRADE SURVEILLANCE
• Apache NiFi: web logs, trade data, mobile app data, ATM streams, OFAC lists, transactions & wires, demographic data, trade settlements
• Attunity Replicate: customer transaction data, OFAC lists, banker notes, credit records, stock tickers
Streaming CDC with NiFi and Attunity Replicate
ATTUNITY REPLICATE
Automated,	real-time	data	delivery	software
32© 2018 Attunity 32© 2017 Attunity
Attunity Replicate Architecture
Sources: Hadoop, RDBMS, Data Warehouse, Files, Mainframe
Capture and delivery modes: CDC, Batch, Incremental
Processing components: Transfer, In-Memory, Filter, Transform, File Channel, Persistent Store
Targets: Hadoop, RDBMS, Data Warehouse, Streaming, Files
Universal Platform Coverage – Sources
Database: Oracle, SQL Server, DB2 iSeries, DB2 z/OS, DB2 LUW, MySQL, PostgreSQL, Sybase ASE, Informix
EDW: Exadata, Teradata, Netezza, Vertica, Pivotal
Hadoop: Hortonworks, Cloudera, MapR
Mainframe: DB2 for z/OS, IMS/DB, VSAM
SAP: ECC on Oracle, ECC on SQL, ECC on DB2, ECC on HANA, S4 HANA
Cloud: AWS RDS, Amazon Aurora, Salesforce
Other Legacy: SQL/MP, Enscribe, RMS
Flat Files: Delimited (e.g., CSV, TSV)
Universal Platform Coverage – Targets
Database: Oracle, SQL Server, DB2 LUW, MySQL, PostgreSQL, Sybase ASE, Informix
EDW: Microsoft PDW, Exadata, Teradata, Netezza, Vertica, Sybase IQ, Amazon Redshift, Actian Vector, SAP HANA
Cloud: Amazon RDS, Amazon Redshift, Amazon EMR, Amazon S3, Amazon Aurora, Google Cloud SQL, Azure SQL DW, Azure SQL DB, Snowflake
Hadoop: Hortonworks, Cloudera, MapR, Amazon EMR, HDInsight *
Streaming: Azure Event Hubs *, MapR-ES *, Kafka
SAP: HANA
Flat Files: Delimited (e.g., CSV, TSV)
MODERN DATA INGEST
• CDC (log-based) for high performance, low latency and low impact
• Single platform for all key enterprise systems
• Hive-optimized for HDP and Stream-optimized for HDF
• Point-and-Click with NO coding and NO agents
Diagram: sources (cloud, on-prem; warehouse, mainframe, RDBMS, SAP), change data capture, metadata, Hive-optimized and stream-optimized delivery
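Log-based CDC reads committed changes from the source's transaction log and replays them on the target, rather than re-querying the source tables. The sketch below illustrates only the replay half, under stated assumptions: the `ChangeEvent` shape (op, key, row) and the in-memory `replica` dictionary are hypothetical and are not Attunity Replicate's event format or API.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ChangeEvent:
    """Hypothetical change record, as it might be decoded from a change log."""
    op: str                      # "insert", "update", or "delete"
    key: int                     # primary key of the affected row
    row: Optional[dict] = None   # new row image; None for deletes

def apply_changes(target: dict[int, dict], events: list[ChangeEvent]) -> dict[int, dict]:
    """Replay change events in log order against a keyed target."""
    for event in events:
        if event.op in ("insert", "update"):
            # Upsert: the event carries the full new row image.
            target[event.key] = dict(event.row or {})
        elif event.op == "delete":
            target.pop(event.key, None)
        else:
            raise ValueError(f"unknown operation: {event.op}")
    return target

replica = {1: {"name": "Ada"}}
change_log = [
    ChangeEvent("insert", 2, {"name": "Grace"}),
    ChangeEvent("update", 1, {"name": "Ada L."}),
    ChangeEvent("delete", 2),
]
apply_changes(replica, change_log)
```

Because every operation, including the delete, arrives as an explicit event, the target converges on the source state without scanning source tables; order of application matters, so events are replayed in log order.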
SAP DATA INGEST
• Unlock and decode SAP application data
• Real-time and continuous ingest with CDC
• Native agent, SAP certified
• All core and industry-specific SAP ECC modules
All the standard SAP ECC modules (FI, CO, MM, PM, SD, PM, HR, …)
All industry-specific solutions (e.g., IS-Utilities, IS-OIL, …)
SAP sources: SRM, ERP, BW, HR, GTS, CRM, EWM, TM, SCM, EM, any industry solution
Diagram: SAP, native agent, change data capture, metadata, Hive-optimized and stream-optimized delivery
RAPID ODS WITH HIVE LLAP
• Automates creation of analytics-ready Hive dataset
• Reconciles source data and metadata updates
• Transformation processing pushed down to Hive
Diagram: sources (cloud, on-prem; warehouse, mainframe, RDBMS, SAP), change data capture, metadata, Hive-optimized and stream-optimized delivery, transform & update in Hive via HQL
EDW OFFLOAD WITH USAGE PROFILING
• Identify cold data to be moved from EDW
• Perform impact analysis based on user activity
• Automatically generate & execute replication tasks
Diagram: EDW sources (Teradata, Exadata, Netezza, DB2; cloud, on-prem), EDW usage & analytics, offload tasks, change data capture, metadata, Hive-optimized and stream-optimized delivery
DATA PLANE SERVICES WITH ATTUNITY
• Simple batch ingest (easy, +metadata)
• Streaming CDC ingest (for HDF, cloud)
• High volume offload from EDW (e.g. Teradata)
• Metadata replication (with DDL capture)
Packaging:
• ISV service
• Co-branded service
• HWX service
Diagram: EDW ETL & Update; Ingest & Stream with CDC; Data Heat Map
BIG DATA INTEGRATION MATURITY MODEL
Levels (from Manual to Automated): Level 1 Sandbox; Level 2 Opportunistic; Level 3 Workgroup; Level 4 Enterprise; Level 5 Transformative
Style:
• Bulk data transfer
• Manual change data capture
• Non-invasive CDC via change logs
• Automatically generate target schemas, process DML, and respond to source DDL changes
• Hybrid deployments; publish to multiple streams; Microservices API
Capabilities:
• Programmatic, resource intensive
• System resource intensive; inflexible and brittle; people intensive change management
• Non-invasive, agentless, automated movement, flexible
• Real-time analytic availability; Lambda architecture; fully automated
• Resilient; high-availability; single console management for global deployments
Product Examples:
• Sqoop
• Sqoop with database time stamps, triggers and Change Tables; or Query-based CDC
• Attunity Replicate
• Attunity Enterprise Manager
• Attunity Visibility
• Attunity Compose for Hive
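The "manual change data capture" style with database time stamps can be sketched as a polling query against a last-modified column. This is a minimal illustration using Python's built-in `sqlite3`, not Sqoop or any Attunity product; the `customers` table, its `updated_at` column, and the integer timestamps are all hypothetical. It also shows one well-known reason the approach is brittle: a hard delete leaves no row behind for the query to find.

```python
import sqlite3

# Hypothetical source table with a last-modified column (integer ticks).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ada", 100), (2, "Grace", 100)],
)

def poll_changes(conn, last_seen):
    """Return rows modified after last_seen, plus the new high-water mark."""
    rows = conn.execute(
        "SELECT id, name, updated_at FROM customers "
        "WHERE updated_at > ? ORDER BY updated_at, id",
        (last_seen,),
    ).fetchall()
    new_mark = max((r[2] for r in rows), default=last_seen)
    return rows, new_mark

initial, mark = poll_changes(conn, 0)  # first poll picks up both rows

# Source activity between polls: one update and one hard delete.
conn.execute("UPDATE customers SET name = 'Ada L.', updated_at = 200 WHERE id = 1")
conn.execute("DELETE FROM customers WHERE id = 2")

changes, mark = poll_changes(conn, mark)  # sees the update, not the delete
```

The second poll returns only the updated row, so a target fed by this query would keep customer 2 indefinitely. Each poll also runs a query against the source table itself, which is the kind of source load that reading changes from the log avoids.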
Download Apache NiFi for Dummies Today!
www.Attunity.com/nifibook
Questions?
Thank	you
For	more	information,	go	to:	www.Attunity.com/nifibook
