1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Mission to NARs with Apache NiFi
Aldrin Piri - @aldrinpiri
ApacheCon Big Data 2016
12 May 2016
Tutorial Resources
https://github.com/apiri/nifi-mission-to-nars-workshop
Agenda
• Start with a dataflow… but we can do better!
• Do better with the NiFi framework and a custom processor
• Extension Points: Processors, Controller Services, Reporting Tasks
• Process Session & Process Context
• How the API ties to the NiFi repositories
• Testing isn’t that bad!
• Share with templates!
Adding new functionality and development approach
• Extending the platform is about leveraging the expansive Java ecosystem and existing code
  – Make use of open source projects and provided libraries for targeted systems and services
  – Reuse existing proprietary or closed source libraries and wrap their functionality in the framework
• The test framework provides a powerful means of testing extensions in isolation, as they would work in a live instance
• Deployment is as simple as copying the created NAR to the lib directory of your instance(s)
Minimal Dependencies Needed
• Java Development Kit, version 1.7 or later
• Maven, version 3.1.0+
Boilerplate Code is provided via Maven Archetype
• Support for creating bundles for the major extension points, Processors and Controller Services
  – Processor Bundle
  – Controller Service Bundle
What is a NAR?
NAR == NiFi ARchive
Consider it to be an OSGi-lite package
– Bundles the developed code to provide extensions and their dependencies
– Allows extension classloader isolation, easing the versioning issues that can be pervasive when interacting with a wide variety of systems, services, and formats
[Figure: NAR Bundle Structure]
How long does it take to create an extension?
• Incorporating functionality from an existing library
  – Create a bundle
  – Include a dependency on the library
  – Design the user experience
    • Properties – How can this extension be configured? What are valid values for user input?
    • Relationships – How will data move to the next stage of its processing?
  – Wrap the core classes of the library in the framework and implement onTrigger
    • ProcessSession abstracts interactions with the backing repositories and handles unit-of-work sessions
    • ProcessContext allows access to defined properties, which the framework has validated
  – Test
  – Deploy
For the majority of cases, development time is measured in hours*
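The unit-of-work idea behind ProcessSession can be sketched in plain Java. The block below is a toy model only: `ToySession`, `ToyUppercaseProcessor`, the "Max Length" property, and the "success"/"failure" relationship names are all hypothetical stand-ins, not NiFi API classes. It assumes FlowFiles are simple strings, and shows transfers being staged until commit, with a rollback returning the FlowFile to its input queue.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy processor: upper-cases short payloads, routes long ones to "failure".
// Hypothetical names throughout; none of these classes are the NiFi API.
class ToyUppercaseProcessor {
    static final String SUCCESS = "success";
    static final String FAILURE = "failure";

    // One call handles at most one FlowFile as a single unit of work.
    static void onTrigger(Map<String, String> properties, ToySession session) {
        String flowFile = session.get();
        if (flowFile == null) {
            return; // nothing queued
        }
        try {
            // Stand-in for reading a configured property from the context.
            int maxLength = Integer.parseInt(properties.get("Max Length"));
            if (flowFile.length() > maxLength) {
                session.transfer(flowFile, FAILURE);
            } else {
                session.transfer(flowFile.toUpperCase(), SUCCESS);
            }
            session.commit();
        } catch (RuntimeException e) {
            session.rollback(); // the FlowFile goes back to the input queue
        }
    }

    public static void main(String[] args) {
        Deque<String> queue = new ArrayDeque<>(List.of("hello", "a very long payload"));
        Map<String, List<String>> outputs = new LinkedHashMap<>();
        onTrigger(Map.of("Max Length", "10"), new ToySession(queue, outputs));
        onTrigger(Map.of("Max Length", "10"), new ToySession(queue, outputs));
        System.out.println(outputs);
    }
}

// Toy session: stages work and applies it only on commit.
class ToySession {
    private final Deque<String> inputQueue;                // FlowFiles waiting upstream
    private final Map<String, List<String>> outputs;       // relationship -> committed FlowFiles
    private final List<String> taken = new ArrayList<>();  // pulled during this unit of work
    private final Map<String, List<String>> staged = new LinkedHashMap<>(); // uncommitted transfers

    ToySession(Deque<String> inputQueue, Map<String, List<String>> outputs) {
        this.inputQueue = inputQueue;
        this.outputs = outputs;
    }

    String get() {
        String flowFile = inputQueue.poll();
        if (flowFile != null) {
            taken.add(flowFile);
        }
        return flowFile;
    }

    void transfer(String flowFile, String relationship) {
        staged.computeIfAbsent(relationship, k -> new ArrayList<>()).add(flowFile);
    }

    void commit() {
        staged.forEach((rel, files) ->
                outputs.computeIfAbsent(rel, k -> new ArrayList<>()).addAll(files));
        staged.clear();
        taken.clear();
    }

    void rollback() {
        // Re-queue in the original order and discard staged transfers.
        for (int i = taken.size() - 1; i >= 0; i--) {
            inputQueue.addFirst(taken.get(i));
        }
        staged.clear();
        taken.clear();
    }
}
```

In this toy, an unparseable "Max Length" value triggers the rollback path. In NiFi itself the framework validates properties before a processor runs, so a real onTrigger would more typically roll back on failures in the wrapped library.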
How long does it really take to create an extension?
• Increased development effort may be needed for handling specific protocols
  – Driven by manual management of sessions when there are resources with their own lifecycles beyond the sole onTrigger method
  – Common for protocol “Listeners”
For the majority of cases, development time is still measured in hours
Behind the Scenes
Architecture
NiFi Architecture – Repositories – Pass by reference
        FlowFile    Content    Provenance
BEFORE  F1 → C1     C1         P1 F1 – Create
AFTER   F1 → C1     C1         P1 F1 – Create
        F2 → C1                P2 F1 – Route
                               P3 F2 – Clone (F1)
NiFi Architecture – Repositories – Copy on Write
        FlowFile     Content           Provenance
BEFORE  F1 → C1      C1                P1 F1 – CREATE
AFTER   F1 → C1      C1 (plaintext)    P1 F1 – CREATE
        F1.1 → C2    C2 (encrypted)    P2 F1.1 – MODIFY
Quick (and dirty?) Prototyping
Prototype Dataflows Using Existing Binaries/Applications
• ExecuteProcess – Acts as a source processor, creating FlowFiles containing data written to STDOUT by the target application
• ExecuteStreamCommand – Provides the content of FlowFiles to an external application via STDIN and creates FlowFiles containing the data written to STDOUT
Processors allow making external calls to applications and programs outside of the JVM
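The STDIN/STDOUT hand-off described for ExecuteStreamCommand can be approximated with the JDK's own `ProcessBuilder`. This is a rough sketch of the general mechanism, not the processor's implementation: the class name `StreamCommandSketch` and the method `pipeThrough` are hypothetical, and the example assumes a POSIX-like system where `sort` is on the PATH and a Java 9+ runtime.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Sketch of piping bytes through an external command; not NiFi code.
class StreamCommandSketch {

    // Feeds `input` to the command's STDIN and returns everything it writes to STDOUT.
    static byte[] pipeThrough(byte[] input, String... command) {
        try {
            Process process = new ProcessBuilder(command)
                    .redirectError(ProcessBuilder.Redirect.DISCARD)
                    .start();

            // Write on a separate thread so a full pipe buffer cannot deadlock us.
            Thread writer = new Thread(() -> {
                try (OutputStream stdin = process.getOutputStream()) {
                    stdin.write(input);
                } catch (IOException ignored) {
                    // The command may exit without reading all of its input.
                }
            });
            writer.start();

            byte[] output;
            try (InputStream stdout = process.getInputStream()) {
                output = stdout.readAllBytes();
            }
            writer.join();

            int exitCode = process.waitFor();
            if (exitCode != 0) {
                throw new IllegalStateException("command exited with status " + exitCode);
            }
            return output;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for command", e);
        }
    }

    public static void main(String[] args) {
        byte[] sorted = pipeThrough("b\na\nc\n".getBytes(StandardCharsets.UTF_8), "sort");
        System.out.print(new String(sorted, StandardCharsets.UTF_8));
    }
}
```

Dropping the writer thread and passing no input gives the ExecuteProcess shape, where the command's STDOUT alone becomes the new content.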
Increased Flexibility of Prototyping via Scripting Languages
• ExecuteScript – Acts as a source processor, creating FlowFiles containing data from a referenced script
• InvokeScriptedProcessor – Provides access to the core framework API for interacting with NiFi like a native Java processor
Processors allow using JVM-friendly interpreted languages
Resources
Developer Guide
– http://nifi.apache.org/developer-guide.html
Apache NiFi Maven Archetypes
– https://cwiki.apache.org/confluence/display/NIFI/Maven+Projects+for+Extensions
Mission to NARs with Apache NiFi sample bundle
– https://github.com/apiri/nifi-mission-to-nars-workshop
Thanks for hanging out!
