Introducing #ApacheNiFi
Saptak Sen [@saptak]
Technical Product Manager, Hortonworks
© Hortonworks Inc. 2011 – 2015. All Rights Reserved
#seascale
Agenda
• New Data Sources and the Rise of the Internet of Anything
• Introducing: Hortonworks DataFlow powered by Apache NiFi
• Key concepts, architecture, and use cases
• Demo
• Q&A
IoAT Data Grows Faster Than We Consume It
Much of the new data exists in-flight, between systems and devices, as part of the Internet of Anything.
The Opportunity
Unlock transformational business value from a full fidelity of data and analytics for all data.
NEW: Internet of Anything
• Sensors and machines
• Geolocation
• Server logs
• Clickstream
• Social media
• Files & emails
TRADITIONAL: Traditional Data Sources
• ERP, CRM, SCM
Interconnectedness Demands User Centricity
Changes Organizations into Data Companies
Hortonworks Data Platform
for rich historical insights from data-at-rest
NEW Hortonworks DataFlow
for securely collecting, conducting, and curating data-in-motion while ALSO driving value for data-at-rest analytics and use cases
Source: Gartner - Architecture Options for Big Data Analytics on Hadoop, July 2015
Simplistic View of IoAT & Data Flow
The Data Flow Thing
Process and
Analyze Data
Acquire Data
Store Data
Global interactions with customers, business partners, and things
spanning different volume, velocity, bandwidth, and latency needs
Realistic View of IoAT and Data Flow
Meeting IoAT Edge Requirements
GATHER, DELIVER, PRIORITIZE
Track from the edge through to the datacenter
• Small Footprints: operate with very little power
• Limited Bandwidth: can create high latency
• Data Availability: exceeds transmission bandwidth
• Data Must Be Secured: throughout its journey
Hortonworks Acquires Onyara
Turn Internet of Anything Data Into Actionable Insights
• Onyara is the creator of and key contributor to Apache NiFi,
an open source solution for processing and distributing data.
• Over the past 8 years, Onyara engineers developed the U.S.
government software project called “Niagara Files”, the
precursor to Apache NiFi.
• Apache NiFi was made available as an Apache Incubator
project through the NSA Technology Transfer Program in the
Fall of 2014.
NEW Hortonworks DataFlow offering will
securely and easily collect, conduct and curate
any data, from anything, anywhere.
The IoAT Data Flow
Introducing Hortonworks DataFlow powered by Apache NiFi
[Diagram labels: Internet of Anything; Hortonworks DataFlow powered by Apache NiFi (Perishable Insights); Hortonworks Data Platform powered by Apache Hadoop (Historical Insights); Enrich Context; Store Data and Metadata]
Hortonworks DataFlow and the Hortonworks Data Platform
deliver the industry’s most complete solution for management of Big Data.
Apache NiFi: Three key concepts
• Manage the flow of information
• Data Provenance
• Secure the control plane and data plane
Apache NiFi – Key Features
• Guaranteed delivery
• Data buffering
- Backpressure
- Pressure release
• Prioritized queuing
• Flow specific QoS
- Latency vs. throughput
- Loss tolerance
• Data provenance
• Recovery/recording: a rolling log of fine-grained history
• Visual command and
control
• Flow templates
• Pluggable/multi-role
security
• Designed for extension
• Clustering
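Two of the features listed above, backpressure and prioritized queuing, can be illustrated with a toy model. The sketch below is not NiFi code: the `Connection` class, its `offer`/`poll` methods, and the `backpressure_threshold` parameter are hypothetical names chosen for illustration. It assumes a connection is a bounded queue that refuses new items once a threshold is reached and hands out the most urgent item first.

```python
import heapq
import itertools


class Connection:
    """Toy bounded queue between two processors (hypothetical, not NiFi's API)."""

    def __init__(self, backpressure_threshold):
        self.backpressure_threshold = backpressure_threshold
        self._heap = []
        # Monotonic counter: keeps first-in-first-out order among equal priorities.
        self._order = itertools.count()

    def is_full(self):
        return len(self._heap) >= self.backpressure_threshold

    def offer(self, item, priority):
        """Enqueue an item; return False (backpressure) once the threshold is reached."""
        if self.is_full():
            return False
        heapq.heappush(self._heap, (priority, next(self._order), item))
        return True

    def poll(self):
        """Return the most urgent item (lowest priority number), or None if empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


conn = Connection(backpressure_threshold=2)
accepted = [
    conn.offer("routine reading", priority=5),
    conn.offer("alarm", priority=1),
    conn.offer("late reading", priority=5),  # refused: the queue is already full
]
drained = [conn.poll(), conn.poll(), conn.poll()]
```

In this model the upstream side sees `False` from `offer` and has to wait or shed load, which is the essence of backpressure; a real deployment would configure thresholds and prioritizers on the connection rather than write code like this.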
Common Apache NiFi Use Cases
Predictive Analytics
Ensure the highest value data is captured and available for analysis
Compliance
Gain full transparency into provenance and flow of data
IoT Optimization
Secure, Prioritize, Enrich and Trace data at the edge
Fraud Detection
Move sales transaction data in real time to analyze on demand
Big Data Ingest
Easily and efficiently ingest data into Hadoop
Value Resources
Gain visibility into how data sources are used to determine value
Architecture
[Diagram: single-node and cluster deployments]
Single node: an OS/Host runs a JVM containing the Flow Controller and Web Server, with Processor 1 through Extension N; the FlowFile Repository, Content Repository, and Provenance Repository sit on Local Storage.
Cluster: a Master, the NiFi Cluster Manager (NCM), runs on its own OS/Host and JVM with a Web Server and Request Replicator; the Slaves are NiFi Nodes, each laid out like the single node.
High Availability: Control plane vs Data plane…
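The division of labor among the three repositories named in the architecture slide can be sketched as follows. This is a simplified model under stated assumptions, not NiFi's implementation: the `Node` and `FlowFile` classes, their method names, and the use of a SHA-256 digest as the content key are all hypothetical. The point it illustrates is that a FlowFile carries attributes plus a pointer to its content, while the bytes themselves and the event history live in separate stores.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    attributes: dict
    content_claim: str  # key into the content repository, not the bytes themselves


@dataclass
class Node:
    """Toy single node with three in-memory 'repositories' (illustrative only)."""

    flowfile_repo: dict = field(default_factory=dict)    # id -> FlowFile
    content_repo: dict = field(default_factory=dict)     # claim -> bytes
    provenance_repo: list = field(default_factory=list)  # append-only event log

    def create(self, flowfile_id, content, attributes):
        # Assumption for this sketch: content is keyed by its SHA-256 digest.
        claim = hashlib.sha256(content).hexdigest()
        self.content_repo[claim] = content
        self.flowfile_repo[flowfile_id] = FlowFile(dict(attributes), claim)
        self.provenance_repo.append(("CREATE", flowfile_id))

    def update_attribute(self, flowfile_id, key, value):
        # Only the small attribute map changes; the content is left untouched.
        self.flowfile_repo[flowfile_id].attributes[key] = value
        self.provenance_repo.append(("ATTRIBUTES_MODIFIED", flowfile_id))

    def read(self, flowfile_id):
        return self.content_repo[self.flowfile_repo[flowfile_id].content_claim]


node = Node()
node.create("ff-1", b"sensor reading", {"filename": "reading.txt"})
node.update_attribute("ff-1", "destination", "hdfs")
```

Keeping attributes separate from content means routing decisions can touch small metadata records without copying payloads, which is one plausible reading of why the slide draws the repositories as distinct boxes.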
HDF – Powered by Apache NiFi
Add processor for data intake
1 Drag and drop processor icon from the top menu
Choose the specific processor
2 Choose one of the processors – currently 90 available – designed for extension
Example: Pick Twitter Processor
Configure the processor
3 Select processor and choose option to Configure
4 Adjust parameters as required
Another processor for data output
5 Drag and drop processor icon from the top menu
6 Example: choose PutHDFS processor
Configure second processor
7 Configure 2nd processor
Connect processors, configure connection
8
Click Start to begin processing
9
See processors update with real-time changes
10 As data flows, the GUI updates in real time.
Dynamically adjust and tune data flow as needed
11 Adjust and tune the dataflow as needed, in real time. Data can also be replicated for testing and comparison.
Understand the data path with Data Provenance
14 Select Data Provenance
Trace lineage of a particular piece of data
15
Icon for Data Lineage
Every change to data is tracked: processing, views
16
Provenance event is tracked
Updates as changes happen
17 Updates as data flows
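The lineage view shown in these steps can be thought of as a walk over recorded provenance events. The sketch below is a minimal model, assuming each event names the piece of data it concerns and, where that data was derived from another, its parent. `ProvenanceEvent`, `lineage`, the event type strings, and the "split step" component are hypothetical names for illustration; only the Twitter and PutHDFS processors come from the demo.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProvenanceEvent:
    event_type: str
    flowfile_id: str
    component: str
    parent_id: Optional[str] = None  # set when this data was derived from another


def lineage(events, flowfile_id):
    """Follow parent links back to the origin, then return matching events oldest-first."""
    parents = {e.flowfile_id: e.parent_id for e in events if e.parent_id}
    wanted = set()
    current = flowfile_id
    while current is not None:
        wanted.add(current)
        current = parents.get(current)
    # The events list is assumed to be in recording order, so filtering preserves time order.
    return [e for e in events if e.flowfile_id in wanted]


events = [
    ProvenanceEvent("RECEIVE", "ff-1", "Twitter processor"),
    ProvenanceEvent("FORK", "ff-2", "split step", parent_id="ff-1"),
    ProvenanceEvent("RECEIVE", "ff-9", "unrelated source"),
    ProvenanceEvent("SEND", "ff-2", "PutHDFS"),
]
trail = lineage(events, "ff-2")
trail_types = [e.event_type for e in trail]
```

Asking for the lineage of `ff-2` returns its own events plus those of its ancestor `ff-1`, and leaves out the unrelated `ff-9`, which mirrors what the lineage graph in the UI conveys.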
Easily access and trace changes to dataflow
Audit trail of Hortonworks DataFlow User Actions
Operations: Planned
Q & A

Introduction to Apache NiFi - Seattle Scalability Meetup

Editor's Notes

  • #4 TALK TRACK The emergence and explosion of Internet of Anything data puts tremendous pressure on existing platforms. Exponential growth: as of 2014 there was an estimated 4 ZB of data across the cybersphere, and that is expected to grow to 44 ZB by 2020, with 85% of this growth coming from newer types of data from sources like sensors and machines, geolocation tracking devices, server logs, clickstreams, social media, or emails and shared files. Variable structures: the incoming data is often unstructured, or its structure changes too frequently for reliable schema creation at time of ingest. Low value per unit, but high in aggregate: the incoming data can have little or no value as individual records or small groups of records, but at high volumes and with longer retention horizons, the enterprise can find previously unknown patterns. Advanced analytic applications turn these new insights into business value. This insight is transforming business outcomes in every major industry, but to participate in that transformation, companies must first ingest that new data into an analytic platform. [NEXT SLIDE]
  • #8 TALK TRACK The IoAT data edges create specific data flow requirements that Hortonworks DataFlow satisfies: edges with small footprints operate with very little power; limited bandwidth and high latency are commonplace; data availability often exceeds transmission bandwidth; data must be secured throughout its journey. [NEXT SLIDE]
  • #9 What is the announcement? Hortonworks has signed a definitive agreement to acquire Onyara, including the Onyara products and the team of engineers developing and supporting them. The new Hortonworks DataFlow powered by Apache NiFi, an open source project based on technology that has been in development at the NSA as “Niagara Files” for the last 8 years, is complementary to the Hortonworks Data Platform. With this acquisition, customers will be able to securely and easily collect, conduct and curate any type of data from any origin with the new Hortonworks DataFlow offering. Traditional data at rest as well as real-time data in motion can now be blended to provide historical and perishable insights for predictive analytics. What is the rationale behind the acquisition? As more and more data is generated from every possible source (machines, sensors, IoT, streaming, social, etc.), Hortonworks capitalized on the opportunity to acquire key technology to augment and complement the Hortonworks Data Platform. Onyara, a spin-out of the NSA Technology Transfer Program, has contributed to and developed Apache NiFi over the last 8 years and has created a compelling set of tools to collect, conduct, and curate data. The new Hortonworks DataFlow powered by Apache NiFi provides the ability for more data to be delivered into the Hortonworks Data Platform and delivers full-fidelity analytics on all data for every Hortonworks customer. Onyara’s employees, technology and products are complementary to Hortonworks’. With this acquisition, Hortonworks will be positioned as a leader in IoAT and Big Data with Hortonworks DataFlow and the Hortonworks Data Platform.
  • #13 Focus on the predictive analytics case: use the uptake/cat/etc. case, but generified.
  • #14 Introduce the architecture of NiFi, describe the major system components, and describe the single-node and clustering models. For each component, describe its available (and potential) deployment models (relate it to Hadoop). Focus on the two deployment models (single node and cluster); roughly think of these as ‘edge’ vs. ‘data center’.
  • #36 Questions?