SlideShare a Scribd company logo
1 of 65
Download to read offline
© Cloudera, Inc. All rights reserved.
BYOP: Custom Processor Development with Apache NiFi
Andy LoPresto | @yolopey
Cloudera Data in Motion Security
Dataworks Summit Barcelona | March 21, 2019
© Cloudera, Inc. All rights reserved.
Agenda
• Apache NiFi out of the box
• Extending NiFi functionality
• Custom processor development
• Testing
• Deploying
• Best practices
• Takeaways
• Additional resources
• Questions
2
* All slides provided online https://github.com/alopresto/slides
© Cloudera, Inc. All rights reserved. 3
Gauging Audience Familiarity With NiFi
“What’s a NeeFee?”
No experience with dataflow
No experience with NiFi
“I can pick this up pretty quickly”
Some experience with dataflow
Some experience with NiFi
“I refactored the Ambari
integration endpoint to allow
for mutual authentication TLS
during my coffee break”
Forgotten more about NiFi
than most of us will ever know
© Cloudera, Inc. All rights reserved.
Apache NiFi
© Cloudera, Inc. All rights reserved. 5
One Minute Intro to the NiFi Ecosystem
MiNiFi (JVM/C++)
• Agent class
• Runs on minimal hardware
• Interacts with NiFi
• Simple event processing
NiFi Registry
• Standalone application
• Handles asset
management
• Flow versioning &
deployment
• Extension registry
NiFi
• Server/DC class
• GUI & REST API
• Interacts with
hundreds of services
© Cloudera, Inc. All rights reserved. 6
User Interface
© Cloudera, Inc. All rights reserved.
Deeper Ecosystem Integration: 286+ Processors, 61 Controller Services
7
All Apache project logos are trademarks of the ASF and the respective projects.
FTP
SFTP
TCP
UDP
HTTP
HL7
HTML
Image
XML
JSON
Hash
Merge
Duplicate
Extract
Split
Parse Records
Encrypt
Tail
Evaluate
Execute
Fetch
GeoEnrich
Scan
Replace
Convert
Transform
Route Content
Route Context
Route Text
Control Rate
Convert Records
Distribute Load
Generate Table Fetch
Jolt Transform JSON
Prioritize Delivery
© Cloudera, Inc. All rights reserved.
Extending NiFi functionality
© Cloudera, Inc. All rights reserved.
With ~300 processors, what else is left?
• Custom/proprietary protocols
• Read serial interface
• USB devices, GPIO on Raspberry Pi
• Transactional operations
• Experimentation
• Commercial API/SDK
• Legacy code
9
© Cloudera, Inc. All rights reserved.
Capabilities of custom development
• ExecuteProcess /
ExecuteStreamCommand
• ExecuteScript /
InvokeScriptedProcessor
• ExecuteGroovyScript
• Custom Processor
10
Performance
Effort / Complexity
Process Script Custom Processor
© Cloudera, Inc. All rights reserved.
ExecuteStreamCommand
• Pipes flowfile
content to STDIN
• Populates outgoing
flowfile content
from STDOUT
• Can also direct to
named attribute
11
© Cloudera, Inc. All rights reserved.
ExecuteProcess
• Executes arbitrary
process and
captures output
• No incoming flowfile
12
© Cloudera, Inc. All rights reserved.
ExecuteScript
• Executes arbitrary code in a
JSR-223 compatible language
• Clojure
• Groovy
• Ruby
• Python (Jython - no C libs)
• Lua
• Javascript
• Source is inline or file system
13
© Cloudera, Inc. All rights reserved.
ExecuteGroovyScript
• Community-contributed
• Groovy only
• Allows for friendlier
syntax
• Allows for Controller
Service direct
reference
14
© Cloudera, Inc. All rights reserved.
InvokeScriptedProcessor
• “ExecuteScript = onTrigger()”
• Allows for additional methods
• Not compiled for each flowfile
• Supports custom relationships
• Improves performance
15
© Cloudera, Inc. All rights reserved.
Custom processor development
© Cloudera, Inc. All rights reserved.
Custom Processor
• High performance
• Reusability & convenience
• Versioning
17
© Cloudera, Inc. All rights reserved.
Using the Maven Archetype
• Prompts for metadata input
• groupId
• artifactId
• version
• artifactBaseName
• package
• Generates a Maven project structure

18
© Cloudera, Inc. All rights reserved.
Results of the Maven Archetype
• MyProcessor.java
• Source code for processor
• org.apache.nifi.processor.Processor
• Manifest file listing implementations
• MyProcessorTest.java
• JUnit test
• pom.xml
• Maven Project Object Model file
19
© Cloudera, Inc. All rights reserved.
Metadata
• @Tags
• Helpful for discovery
• @CapabilityDescription
• Shown in documentation
• Annotations for behavior
• @SeeAlso
• References similar/complementary
processors
• @Reads/WritesAttributes
• Listed in documentation
20
© Cloudera, Inc. All rights reserved.
Selected other annotations
• @EventDriven
• Indicates not based on timer
• @SideEffectFree
• Indicates safe to run w/o side effects
• @SupportsBatching
• Indicates multiple flowfiles
processed together
• @InputRequirement
• Can’t be the first processor in a flow
• @SystemResourceConsideration
• Warns users of implications
• @Experimental
• Self-explanatory
• @Restricted
• Controls access
21
© Cloudera, Inc. All rights reserved.
Implementing the code
• Add property descriptors and relationships
• Implement onTrigger() method
• This is where the logic is done
• Reads properties & attributes
• Configures external services
• Executes data transformation
• Transfers flowfile
• Add custom StreamCallback class
• Handles “behavior” of reversing String
• “32” lines of code
22
© Cloudera, Inc. All rights reserved.
Testing
© Cloudera, Inc. All rights reserved.
Write the tests (should be first)
• Test-driven development (TDD)
• Helpful TestRunner class
• Handles enqueueing arbitrary
flowfile content & attributes
• Allows intelligent assertions
24
© Cloudera, Inc. All rights reserved. 25
We’ll come back to this…
© Cloudera, Inc. All rights reserved.
Deploying
© Cloudera, Inc. All rights reserved.
Build the Maven project
• mvn clean install
• Builds the project
• Compiles source code
• Compiles tests
• Runs tests
• Outputs target file(s)
27
© Cloudera, Inc. All rights reserved.
Copy the NAR
• Copy into NiFi’s library
• $NIFI_HOME/lib
• NiFi will load extension
automatically during
startup
28
© Cloudera, Inc. All rights reserved.
Add the processor
• Can filter by source group
• Can search by tags or
processor name
29
© Cloudera, Inc. All rights reserved.
Configure the processor
• Feed data with
GenerateFlowFile
30
© Cloudera, Inc. All rights reserved.
Smoke test
• List queue on the success
relationship
• View attribute
• “reversed” - true
• View content
31
© Cloudera, Inc. All rights reserved.
NAR loading
• NiFi now hot-loads custom
NARs in $NIFI_HOME/
extensions
• Not good for SDLC
• Doesn’t reload matching
NAR
• Could use build script to
append timestamp to
filename and version
32
© Cloudera, Inc. All rights reserved.
Testing (continued)
© Cloudera, Inc. All rights reserved.
Testing approaches
• Unit testing
• Integration testing
• In-NiFi testing
34
© Cloudera, Inc. All rights reserved.
Unit testing
• Regression testing
• Supports refactoring
• Supports new feature development
• Extract behavior to service class
• Tests for logic and for “processor” behavior
• Combinations of properties & attributes/content
• Use the mock classes
• MockFlowFile - allows for internal assertions
• MockProcessSession - allows for flowfile operation (create, transfer, clone, etc.)
• MockProcessContext - allows for property and controller service interaction
35
© Cloudera, Inc. All rights reserved.
Groovy unit testing
• Easy mocking
• Map coercion
• Metaclass overrides
• Sparser syntax; less boilerplate
• Spock for BDD
36
© Cloudera, Inc. All rights reserved.
Integration testing
• Provide external services for testing
• Docker containers are popular
• Easy to spin up & delete
• Example
• MongoDBLookupServiceIT
37
© Cloudera, Inc. All rights reserved.
Testing inside NiFi
• Create reusable process group
which handles test fixtures
• Use GenerateFlowFile
• Call external APIs
• Read from test data on file
system
• Read from test database
• Connect output port to “flow
under test”
38
© Cloudera, Inc. All rights reserved.
Other capabilities
© Cloudera, Inc. All rights reserved.
Custom User Interface
• Sometimes generic
property descriptors
don’t make sense
• Niels Basjes’
logparser project
with 100+ boolean
flags that are auto-
detected
40
https://github.com/nielsbasjes/logparser
© Cloudera, Inc. All rights reserved.
Custom User Interface
• Allows for custom UX
• Validation, feedback,
wizards, etc.
41
© Cloudera, Inc. All rights reserved.
Custom User Interface
• Code in nifi-
processor-custom-ui
• Java classes for
objects & logic
• CSS/JS for
presentation
42
© Cloudera, Inc. All rights reserved.
Custom Content Viewer
• Special content viewers
can be registered
• Image (default)
• Video
• Visualization
43
© Cloudera, Inc. All rights reserved.
Custom Content Viewer
• Code in nifi-format-
viewer
• Java classes for
objects & logic
• CSS/JSP for
presentation
44
© Cloudera, Inc. All rights reserved.
Best practices
© Cloudera, Inc. All rights reserved.
ExecuteScript vs. InvokeScriptedProcessor
• ES easier for rapid development
• ISP more performant (should always be used in “production” flows)
• Allows custom relationships
46
© Cloudera, Inc. All rights reserved.
Module organization
• API module
• If using controller services;
separate the API and
implementation into separate
modules to allow use & extension
• Can be co-located with processors for
simple CS implementations
47
© Cloudera, Inc. All rights reserved.
Module configuration
• Check archetype version
• Check nifi-api version
• Inherits from nifi-nar-bundle by default
• If building outside of NiFi, can be removed
48
© Cloudera, Inc. All rights reserved.
Processor structure (con’t)
• Initialize relationships and property descriptors in init()
• Large allowable values collections can use builder
patterns or Enum
• Don’t connect to external services in onScheduled()
• Use onTrigger()
• Handle dynamic properties
• Custom validation (see StandardValidators)
49
© Cloudera, Inc. All rights reserved.
Other processor concerns
• Ensure good documentation
• Handle flowfile content in streaming manner
• Generate proper provenance events for interaction with external systems
• Send, fetch, etc.
• Define appropriate relationships
50
© Cloudera, Inc. All rights reserved.
Remember to list processors in manifest
• Most common issue when not seeing
processor in NiFi
• src/main/resources/META-INF/
services/
• org.apache.nifi.processor.Processor
51
© Cloudera, Inc. All rights reserved.
Takeaways
© Cloudera, Inc. All rights reserved.
Dynamic vs. compiled processors
53
Processor type Pros Cons
ExecuteScript/ExecuteGroovyScript • Very fast development cycle
• Easy to use
• Source code maintained in version
control if stored internally
• Poor performance
• Source code not maintained if located
on file system
• Limited capabilities
InvokeScriptedProcessor • Fast development cycle
• Complex scripting capabilities
• Better performance than E(G)S
• Performance still far below compiled
processor
• Same limitations on code control and
interaction with internal framework
Custom Processor • Full first-class interaction with
framework API
• Robust testing mechanisms
• High performance
• Reusability
• Versioning
• Ease-of-use (drag & drop)
• Slow development cycle
• Requires boilerplate code
© Cloudera, Inc. All rights reserved.
Performance testing
• GenerateFlowFile
• Filled back pressure of queues
• Comparing
• STDIN -> rev -> STDOUT
• Custom processor
• Groovy script
54
© Cloudera, Inc. All rights reserved.
Performance results
55
Total Task Time (µs)
Total Execution Time (µs)
0M 1.5M 3M 4.5M 6M
ExecuteGroovyScript ReverseText
© Cloudera, Inc. All rights reserved.
Performance results
56
© Cloudera, Inc. All rights reserved.
Notes on performance
• Very simple operation (JVM optimizing)
• External service calls, loops, etc. will have bigger impact
• Multi-threading / time-series dependent data can influence
• Streaming from file system will impact
57
© Cloudera, Inc. All rights reserved.
Additional resources
© Cloudera, Inc. All rights reserved.
Other ideas
• Build custom processor with any JVM-compatible language
• Community accepts contributions in Java only
• Host custom processor bundles on individual repos
• Extension registry coming soon
• PRs accepted recently
59
© Cloudera, Inc. All rights reserved.
Community resources
• Apache NiFi Contributor Guide
• Instructions on cloning repo; building; contributing code; checkstyle guide
• Apache NiFi Developer Guide
• Processor API; lifecycle; documenting; common patterns; errors; testing
• Walkthroughs
• Matt Burgess - funnifi.blogspot.com
• Bryan Bende - bryanbende.com
60
© Cloudera, Inc. All rights reserved.
New Announcements
• NiFi 1.9.1 — 16 Mar 2019 (167+ Jiras)
• New processors (Kudu, Slack)
• Hot-loading NARs
• Security HTTP headers
• Record processors automatically infer schema
• MiNiFi C++ 0.6.0 — Voting now…
• MiNiFi Java 0.5.0 — 7 July 2018
• NiFi Registry 0.3.0 — 25 Sept 2018
61
© Cloudera, Inc. All rights reserved. 62
Learn more and join us
Apache NiFi site
https://nifi.apache.org
Subproject MiNiFi site
https://nifi.apache.org/minifi/
Subscribe to and collaborate at
dev@nifi.apache.org
users@nifi.apache.org
Submit Ideas or Issues
https://issues.apache.org/jira/browse/NIFI
Follow us on Twitter
@apachenifi
© Cloudera, Inc. All rights reserved. 63
NiFi at Dataworks Summit Barcelona
Title Date Speakers Room
Cloudera Data in Motion Meetup - Future of
Data Barcelona (free)
Tuesday
3/19
1800 - 2000 Tim Spann, Dan Chaffelson, Andy LoPresto 127-128
Apache NiFi Crash Course
Wednesday
3/20
1400 - 1600 Nathan Gough, Andy LoPresto 111
Data Acquisition Automation for NiFi in a Hybrid
Cloud environment – the Path towards DataOps
1450 - 1530 Arda Basar, Ivan Georgiev 129-130
IoT, Streaming, and Dataflow Birds of a Feather 1600 - 1730
Tim Spann, Dan Chaffelson, Purnima Reddy
Kuchikulla, Andy LoPresto
129-130
Addressing Challenges with IoT Edge
Management
Thursday
3/21
1100 - 1140 Dinesh Chandrasekhar 127-128
Intelligently Collecting Data at the Edge – Intro
to Apache MiNiFi
1150 - 1230 Andy LoPresto 127-128
Edge to AI: Analytics from Edge to Cloud with
Efficient Movement of Machine Data
1400 - 1440 Tim Spann 129-130
BYOP: Custom Processor Development with
Apache NiFi
1600 - 1640 Andy LoPresto 131-132
Apache Deep Learning 201 1650 - 1730 Tim Spann 122-123
Platform for the Research and Analysis of
Cybernetic Threats
1650 - 1730 Monica Franceschini 124-125
© Cloudera, Inc. All rights reserved.
Questions
© Cloudera, Inc. All rights reserved.
THANK YOU
alopresto@cloudera.com | alopresto@apache.org | @yolopey
github.com/alopresto/slides

More Related Content

What's hot

Containers Docker Kind Kubernetes Istio
Containers Docker Kind Kubernetes IstioContainers Docker Kind Kubernetes Istio
Containers Docker Kind Kubernetes IstioAraf Karsh Hamid
 
OpenTelemetry For Operators
OpenTelemetry For OperatorsOpenTelemetry For Operators
OpenTelemetry For OperatorsKevin Brockhoff
 
Kubernetes Networking
Kubernetes NetworkingKubernetes Networking
Kubernetes NetworkingCJ Cullen
 
Best Practices for API Management
Best Practices for API Management Best Practices for API Management
Best Practices for API Management WSO2
 
Azure API Management
Azure API ManagementAzure API Management
Azure API ManagementDaniel Toomey
 
Choose the Right Container Storage for Kubernetes
Choose the Right Container Storage for KubernetesChoose the Right Container Storage for Kubernetes
Choose the Right Container Storage for KubernetesYusuf Hadiwinata Sutandar
 
Introduction to Apache NiFi 1.11.4
Introduction to Apache NiFi 1.11.4Introduction to Apache NiFi 1.11.4
Introduction to Apache NiFi 1.11.4Timothy Spann
 
OpenTelemetry For Developers
OpenTelemetry For DevelopersOpenTelemetry For Developers
OpenTelemetry For DevelopersKevin Brockhoff
 
Real-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFiReal-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFiManish Gupta
 
Kubernetes Secrets Management on Production with Demo
Kubernetes Secrets Management on Production with DemoKubernetes Secrets Management on Production with Demo
Kubernetes Secrets Management on Production with DemoOpsta
 
Big Data Redis Mongodb Dynamodb Sharding
Big Data Redis Mongodb Dynamodb ShardingBig Data Redis Mongodb Dynamodb Sharding
Big Data Redis Mongodb Dynamodb ShardingAraf Karsh Hamid
 
Data Streaming with Apache Kafka & MongoDB
Data Streaming with Apache Kafka & MongoDBData Streaming with Apache Kafka & MongoDB
Data Streaming with Apache Kafka & MongoDBconfluent
 
Microservices, Apache Kafka, Node, Dapr and more - Part Two (Fontys Hogeschoo...
Microservices, Apache Kafka, Node, Dapr and more - Part Two (Fontys Hogeschoo...Microservices, Apache Kafka, Node, Dapr and more - Part Two (Fontys Hogeschoo...
Microservices, Apache Kafka, Node, Dapr and more - Part Two (Fontys Hogeschoo...Lucas Jellema
 
Apache Flink, AWS Kinesis, Analytics
Apache Flink, AWS Kinesis, Analytics Apache Flink, AWS Kinesis, Analytics
Apache Flink, AWS Kinesis, Analytics Araf Karsh Hamid
 

What's hot (20)

Containers Docker Kind Kubernetes Istio
Containers Docker Kind Kubernetes IstioContainers Docker Kind Kubernetes Istio
Containers Docker Kind Kubernetes Istio
 
OpenTelemetry For Operators
OpenTelemetry For OperatorsOpenTelemetry For Operators
OpenTelemetry For Operators
 
Netflix conductor
Netflix conductorNetflix conductor
Netflix conductor
 
Kubernetes Networking
Kubernetes NetworkingKubernetes Networking
Kubernetes Networking
 
Elastic-Engineering
Elastic-EngineeringElastic-Engineering
Elastic-Engineering
 
Best Practices for API Management
Best Practices for API Management Best Practices for API Management
Best Practices for API Management
 
Nifi
NifiNifi
Nifi
 
Azure API Management
Azure API ManagementAzure API Management
Azure API Management
 
Choose the Right Container Storage for Kubernetes
Choose the Right Container Storage for KubernetesChoose the Right Container Storage for Kubernetes
Choose the Right Container Storage for Kubernetes
 
Introduction to Apache NiFi 1.11.4
Introduction to Apache NiFi 1.11.4Introduction to Apache NiFi 1.11.4
Introduction to Apache NiFi 1.11.4
 
OpenTelemetry For Developers
OpenTelemetry For DevelopersOpenTelemetry For Developers
OpenTelemetry For Developers
 
Airflow and supervisor
Airflow and supervisorAirflow and supervisor
Airflow and supervisor
 
Real-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFiReal-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFi
 
Elk
Elk Elk
Elk
 
Kubernetes Secrets Management on Production with Demo
Kubernetes Secrets Management on Production with DemoKubernetes Secrets Management on Production with Demo
Kubernetes Secrets Management on Production with Demo
 
Big Data Redis Mongodb Dynamodb Sharding
Big Data Redis Mongodb Dynamodb ShardingBig Data Redis Mongodb Dynamodb Sharding
Big Data Redis Mongodb Dynamodb Sharding
 
Terraform
TerraformTerraform
Terraform
 
Data Streaming with Apache Kafka & MongoDB
Data Streaming with Apache Kafka & MongoDBData Streaming with Apache Kafka & MongoDB
Data Streaming with Apache Kafka & MongoDB
 
Microservices, Apache Kafka, Node, Dapr and more - Part Two (Fontys Hogeschoo...
Microservices, Apache Kafka, Node, Dapr and more - Part Two (Fontys Hogeschoo...Microservices, Apache Kafka, Node, Dapr and more - Part Two (Fontys Hogeschoo...
Microservices, Apache Kafka, Node, Dapr and more - Part Two (Fontys Hogeschoo...
 
Apache Flink, AWS Kinesis, Analytics
Apache Flink, AWS Kinesis, Analytics Apache Flink, AWS Kinesis, Analytics
Apache Flink, AWS Kinesis, Analytics
 

Similar to BYOP: Custom Processor Development with Apache NiFi

12 Factor App Methodology
12 Factor App Methodology12 Factor App Methodology
12 Factor App Methodologylaeshin park
 
Considerations for Operating an OpenStack Cloud
Considerations for Operating an OpenStack CloudConsiderations for Operating an OpenStack Cloud
Considerations for Operating an OpenStack CloudAll Things Open
 
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...Cloudera, Inc.
 
Cloud Platforms for Java
Cloud Platforms for JavaCloud Platforms for Java
Cloud Platforms for Java3Pillar Global
 
Zure Azure PaaS Zero to Hero - DevOps training day
Zure Azure PaaS Zero to Hero - DevOps training dayZure Azure PaaS Zero to Hero - DevOps training day
Zure Azure PaaS Zero to Hero - DevOps training dayOkko Oulasvirta
 
Road to NODES - Handling Neo4j Data with Apache Hop
Road to NODES - Handling Neo4j Data with Apache HopRoad to NODES - Handling Neo4j Data with Apache Hop
Road to NODES - Handling Neo4j Data with Apache HopNeo4j
 
BSidesDFW2022-PurpleTeam_Cloud_Identity.pptx
BSidesDFW2022-PurpleTeam_Cloud_Identity.pptxBSidesDFW2022-PurpleTeam_Cloud_Identity.pptx
BSidesDFW2022-PurpleTeam_Cloud_Identity.pptxJasonOstrom1
 
Apache NiFi SDLC Improvements
Apache NiFi SDLC ImprovementsApache NiFi SDLC Improvements
Apache NiFi SDLC ImprovementsBryan Bende
 
Rightscale webinar-key-design-considerations-private-hybrid-clouds
Rightscale webinar-key-design-considerations-private-hybrid-cloudsRightscale webinar-key-design-considerations-private-hybrid-clouds
Rightscale webinar-key-design-considerations-private-hybrid-cloudsRightScale
 
OSDC 2018 | Highly Available Cloud Foundry on Kubernetes by Cornelius Schumacher
OSDC 2018 | Highly Available Cloud Foundry on Kubernetes by Cornelius SchumacherOSDC 2018 | Highly Available Cloud Foundry on Kubernetes by Cornelius Schumacher
OSDC 2018 | Highly Available Cloud Foundry on Kubernetes by Cornelius SchumacherNETWAYS
 
Continuous Integration with Amazon ECS and Docker
Continuous Integration with Amazon ECS and DockerContinuous Integration with Amazon ECS and Docker
Continuous Integration with Amazon ECS and DockerAmazon Web Services
 
Netflix oss season 2 episode 1 - meetup Lightning talks
Netflix oss   season 2 episode 1 - meetup Lightning talksNetflix oss   season 2 episode 1 - meetup Lightning talks
Netflix oss season 2 episode 1 - meetup Lightning talksRuslan Meshenberg
 
Benefits of an Open environment with Wakanda
Benefits of an Open environment with WakandaBenefits of an Open environment with Wakanda
Benefits of an Open environment with WakandaAlexandre Morgaut
 
Threat Modeling the CI/CD Pipeline to Improve Software Supply Chain Security ...
Threat Modeling the CI/CD Pipeline to Improve Software Supply Chain Security ...Threat Modeling the CI/CD Pipeline to Improve Software Supply Chain Security ...
Threat Modeling the CI/CD Pipeline to Improve Software Supply Chain Security ...Denim Group
 
Why you should be using Aegir: The Drupal-oriented hosting system
Why you should be using Aegir: The Drupal-oriented hosting systemWhy you should be using Aegir: The Drupal-oriented hosting system
Why you should be using Aegir: The Drupal-oriented hosting systemSeth Viebrock
 
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...Data Con LA
 
Tokyo Azure Meetup #7 - Introduction to Serverless Architectures with Azure F...
Tokyo Azure Meetup #7 - Introduction to Serverless Architectures with Azure F...Tokyo Azure Meetup #7 - Introduction to Serverless Architectures with Azure F...
Tokyo Azure Meetup #7 - Introduction to Serverless Architectures with Azure F...Tokyo Azure Meetup
 

Similar to BYOP: Custom Processor Development with Apache NiFi (20)

12 Factor App Methodology
12 Factor App Methodology12 Factor App Methodology
12 Factor App Methodology
 
Security for devs
Security for devsSecurity for devs
Security for devs
 
Considerations for Operating an OpenStack Cloud
Considerations for Operating an OpenStack CloudConsiderations for Operating an OpenStack Cloud
Considerations for Operating an OpenStack Cloud
 
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
 
Power of Azure Devops
Power of Azure DevopsPower of Azure Devops
Power of Azure Devops
 
Cloud Platforms for Java
Cloud Platforms for JavaCloud Platforms for Java
Cloud Platforms for Java
 
Zure Azure PaaS Zero to Hero - DevOps training day
Zure Azure PaaS Zero to Hero - DevOps training dayZure Azure PaaS Zero to Hero - DevOps training day
Zure Azure PaaS Zero to Hero - DevOps training day
 
Road to NODES - Handling Neo4j Data with Apache Hop
Road to NODES - Handling Neo4j Data with Apache HopRoad to NODES - Handling Neo4j Data with Apache Hop
Road to NODES - Handling Neo4j Data with Apache Hop
 
BSidesDFW2022-PurpleTeam_Cloud_Identity.pptx
BSidesDFW2022-PurpleTeam_Cloud_Identity.pptxBSidesDFW2022-PurpleTeam_Cloud_Identity.pptx
BSidesDFW2022-PurpleTeam_Cloud_Identity.pptx
 
Apache NiFi SDLC Improvements
Apache NiFi SDLC ImprovementsApache NiFi SDLC Improvements
Apache NiFi SDLC Improvements
 
Rightscale webinar-key-design-considerations-private-hybrid-clouds
Rightscale webinar-key-design-considerations-private-hybrid-cloudsRightscale webinar-key-design-considerations-private-hybrid-clouds
Rightscale webinar-key-design-considerations-private-hybrid-clouds
 
OSDC 2018 | Highly Available Cloud Foundry on Kubernetes by Cornelius Schumacher
OSDC 2018 | Highly Available Cloud Foundry on Kubernetes by Cornelius SchumacherOSDC 2018 | Highly Available Cloud Foundry on Kubernetes by Cornelius Schumacher
OSDC 2018 | Highly Available Cloud Foundry on Kubernetes by Cornelius Schumacher
 
Continuous Integration with Amazon ECS and Docker
Continuous Integration with Amazon ECS and DockerContinuous Integration with Amazon ECS and Docker
Continuous Integration with Amazon ECS and Docker
 
Netflix oss season 2 episode 1 - meetup Lightning talks
Netflix oss   season 2 episode 1 - meetup Lightning talksNetflix oss   season 2 episode 1 - meetup Lightning talks
Netflix oss season 2 episode 1 - meetup Lightning talks
 
Benefits of an Open environment with Wakanda
Benefits of an Open environment with WakandaBenefits of an Open environment with Wakanda
Benefits of an Open environment with Wakanda
 
Avoiding cloud lock-in
Avoiding cloud lock-inAvoiding cloud lock-in
Avoiding cloud lock-in
 
Threat Modeling the CI/CD Pipeline to Improve Software Supply Chain Security ...
Threat Modeling the CI/CD Pipeline to Improve Software Supply Chain Security ...Threat Modeling the CI/CD Pipeline to Improve Software Supply Chain Security ...
Threat Modeling the CI/CD Pipeline to Improve Software Supply Chain Security ...
 
Why you should be using Aegir: The Drupal-oriented hosting system
Why you should be using Aegir: The Drupal-oriented hosting systemWhy you should be using Aegir: The Drupal-oriented hosting system
Why you should be using Aegir: The Drupal-oriented hosting system
 
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
 
Tokyo Azure Meetup #7 - Introduction to Serverless Architectures with Azure F...
Tokyo Azure Meetup #7 - Introduction to Serverless Architectures with Azure F...Tokyo Azure Meetup #7 - Introduction to Serverless Architectures with Azure F...
Tokyo Azure Meetup #7 - Introduction to Serverless Architectures with Azure F...
 

More from DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 

Recently uploaded (20)

"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 

BYOP: Custom Processor Development with Apache NiFi

  • 1. © Cloudera, Inc. All rights reserved. BYOP: Custom Processor Development with Apache NiFi Andy LoPresto | @yolopey Cloudera Data in Motion Security Dataworks Summit Barcelona | March 21, 2019
  • 2. © Cloudera, Inc. All rights reserved. Agenda • Apache NiFi out of the box • Extending NiFi functionality • Custom processor development • Testing • Deploying • Best practices • Takeaways • Additional resources • Questions 2 * All slides provided online https://github.com/alopresto/slides
  • 3. © Cloudera, Inc. All rights reserved. 3 Gauging Audience Familiarity With NiFi “What’s a NeeFee?” No experience with dataflow No experience with NiFi “I can pick this up pretty quickly” Some experience with dataflow Some experience with NiFi “I refactored the Ambari integration endpoint to allow for mutual authentication TLS during my coffee break” Forgotten more about NiFi than most of us will ever know
  • 4. © Cloudera, Inc. All rights reserved. Apache NiFi
  • 5. © Cloudera, Inc. All rights reserved. 5 One Minute Intro to the NiFi Ecosystem MiNiFi (JVM/C++) • Agent class • Runs on minimal hardware • Interacts with NiFi • Simple event processing NiFi Registry • Standalone application • Handles asset management • Flow versioning & deployment • Extension registry NiFi • Server/DC class • GUI & REST API • Interacts with hundreds of services
  • 6. © Cloudera, Inc. All rights reserved. 6 User Interface
  • 7. © Cloudera, Inc. All rights reserved. Deeper Ecosystem Integration: 286+ Processors, 61 Controller Services 7 All Apache project logos are trademarks of the ASF and the respective projects. FTP SFTP TCP UDP HTTP HL7 HTML Image XML JSON Hash Merge Duplicate Extract Split Parse Records Encrypt Tail Evaluate Execute Fetch GeoEnrich Scan Replace Convert Transform Route Content Route Context Route Text Control Rate Convert Records Distribute Load Generate Table Fetch Jolt Transform JSON Prioritize Delivery
  • 8. © Cloudera, Inc. All rights reserved. Extending NiFi functionality
  • 9. © Cloudera, Inc. All rights reserved. With ~300 processors, what else is left? • Custom/proprietary protocols • Read serial interface • USB devices, GPIO on Raspberry Pi • Transactional operations • Experimentation • Commercial API/SDK • Legacy code 9
  • 10. © Cloudera, Inc. All rights reserved. Capabilities of custom development • ExecuteProcess / ExecuteStreamCommand • ExecuteScript / InvokeScriptedProcessor • ExecuteGroovyScript • Custom Processor 10 Performance Effort / Complexity Process Script Custom Processor
  • 11. © Cloudera, Inc. All rights reserved. ExecuteStreamCommand • Pipes flowfile content to STDIN • Populates outgoing flowfile content from STDOUT • Can also direct to named attribute 11
  • 12. © Cloudera, Inc. All rights reserved. ExecuteProcess • Executes arbitrary process and captures output • No incoming flowfile 12
  • 13. © Cloudera, Inc. All rights reserved. ExecuteScript • Executes arbitrary code in a JSR-223 compatible language • Clojure • Groovy • Ruby • Python (Jython - no C libs) • Lua • Javascript • Source is inline or file system 13
  • 14. © Cloudera, Inc. All rights reserved. ExecuteGroovyScript • Community-contributed • Groovy only • Allows for friendlier syntax • Allows for Controller Service direct reference 14
  • 15. © Cloudera, Inc. All rights reserved. InvokeScriptedProcessor • “ExecuteScript = onTrigger()” • Allows for additional methods • Not compiled for each flowfile • Supports custom relationships • Improves performance 15
  • 16. © Cloudera, Inc. All rights reserved. Custom processor development
  • 17. © Cloudera, Inc. All rights reserved. Custom Processor • High performance • Reusability & convenience • Versioning 17
  • 18. © Cloudera, Inc. All rights reserved. Using the Maven Archetype • Prompts for metadata input • groupId • artifactId • version • artifactBaseName • package • Generates a Maven project structure
 18
  • 19. © Cloudera, Inc. All rights reserved. Results of the Maven Archetype • MyProcessor.java • Source code for processor • org.apache.nifi.processor.Processor • Manifest file listing implementations • MyProcessorTest.java • JUnit test • pom.xml • Maven Project Object Model file 19
  • 20. © Cloudera, Inc. All rights reserved. Metadata • @Tags • Helpful for discovery • @CapabilityDescription • Shown in documentation • Annotations for behavior • @SeeAlso • References similar/complementary processors • @Reads/WritesAttributes • Listed in documentation 20
  • 21. © Cloudera, Inc. All rights reserved. Selected other annotations • @EventDriven • Indicates not based on timer • @SideEffectFree • Indicates safe to run w/o side effects • @SupportsBatching • Indicates multiple flowfiles processed together • @InputRequirement • Can’t be the first processor in a flow • @SystemResourceConsideration • Warns users of implications • @Experimental • Self-explanatory • @Restricted • Controls access 21
  • 22. © Cloudera, Inc. All rights reserved. Implementing the code • Add property descriptors and relationships • Implement onTrigger() method • This is where the logic is done • Reads properties & attributes • Configures external services • Executes data transformation • Transfers flowfile • Add custom StreamCallback class • Handles “behavior” of reversing String • “32” lines of code 22
  • 23. © Cloudera, Inc. All rights reserved. Testing
  • 24. © Cloudera, Inc. All rights reserved. Write the tests (should be first) • Test-driven development (TDD) • Helpful TestRunner class • Handles enqueueing arbitrary flowfile content & attributes • Allows intelligent assertions 24
  • 25. © Cloudera, Inc. All rights reserved. 25 We’ll come back to this…
  • 26. © Cloudera, Inc. All rights reserved. Deploying
  • 27. © Cloudera, Inc. All rights reserved. Build the Maven project • mvn clean install • Builds the project • Compiles source code • Compiles tests • Runs tests • Outputs target file(s) 27
  • 28. © Cloudera, Inc. All rights reserved. Copy the NAR • Copy into NiFi’s library • $NIFI_HOME/lib • NiFi will load extension automatically during startup 28
  • 29. © Cloudera, Inc. All rights reserved. Add the processor • Can filter by source group • Can search by tags or processor name 29
  • 30. © Cloudera, Inc. All rights reserved. Configure the processor • Feed data with GenerateFlowFile 30
  • 31. © Cloudera, Inc. All rights reserved. Smoke test • List queue on the success relationship • View attribute • “reversed” - true • View content 31
  • 32. © Cloudera, Inc. All rights reserved. NAR loading • NiFi now hot-loads custom NARs in $NIFI_HOME/ extensions • Not good for SDLC • Doesn’t reload matching NAR • Could use build script to append timestamp to filename and version 32
  • 33. © Cloudera, Inc. All rights reserved. Testing (continued)
  • 34. © Cloudera, Inc. All rights reserved. Testing approaches • Unit testing • Integration testing • In-NiFi testing 34
  • 35. © Cloudera, Inc. All rights reserved. Unit testing • Regression testing • Supports refactoring • Supports new feature development • Extract behavior to service class • Tests for logic and for “processor” behavior • Combinations of properties & attributes/content • Use the mock classes • MockFlowFile - allows for internal assertions • MockProcessSession - allows for flowfile operation (create, transfer, clone, etc.) • MockProcessContext - allows for property and controller service interaction 35
  • 36. © Cloudera, Inc. All rights reserved. Groovy unit testing • Easy mocking • Map coercion • Metaclass overrides • Sparser syntax; less boilerplate • Spock for BDD 36
  • 37. © Cloudera, Inc. All rights reserved. Integration testing • Provide external services for testing • Docker containers are popular • Easy to spin up & delete • Example • MongoDBLookupServiceIT 37
  • 38. © Cloudera, Inc. All rights reserved. Testing inside NiFi • Create reusable process group which handles test fixtures • Use GenerateFlowFile • Call external APIs • Read from test data on file system • Read from test database • Connect output port to “flow under test” 38
  • 39. © Cloudera, Inc. All rights reserved. Other capabilities
  • 40. © Cloudera, Inc. All rights reserved. Custom User Interface • Sometimes generic property descriptors don’t make sense • Niels Basjes’ logparser project with 100+ boolean flags that are auto- detected 40 https://github.com/nielsbasjes/logparser
  • 41. © Cloudera, Inc. All rights reserved. Custom User Interface • Allows for custom UX • Validation, feedback, wizards, etc. 41
  • 42. © Cloudera, Inc. All rights reserved. Custom User Interface • Code in nifi- processor-custom-ui • Java classes for objects & logic • CSS/JS for presentation 42
  • 43. © Cloudera, Inc. All rights reserved. Custom Content Viewer • Special content viewers can be registered • Image (default) • Video • Visualization 43
  • 44. © Cloudera, Inc. All rights reserved. Custom Content Viewer • Code in nifi-format- viewer • Java classes for objects & logic • CSS/JSP for presentation 44
  • 45. © Cloudera, Inc. All rights reserved. Best practices
  • 46. © Cloudera, Inc. All rights reserved. ExecuteScript vs. InvokeScriptedProcessor • ES easier for rapid development • ISP more performant (should always be used in “production” flows) • Allows custom relationships 46
  • 47. © Cloudera, Inc. All rights reserved. Module organization • API module • If using controller services; separate the API and implementation into separate modules to allow use & extension • Can be co-located with processors for simple CS implementations 47
  • 48. © Cloudera, Inc. All rights reserved. Module configuration • Check archetype version • Check nifi-api version • Inherits from nifi-nar-bundle by default • If building outside of NiFi, can be removed 48
  • 49. © Cloudera, Inc. All rights reserved. Processor structure (con’t) • Initialize relationships and property descriptors in init() • Large allowable values collections can use builder patterns or Enum • Don’t connect to external services in onScheduled() • Use onTrigger() • Handle dynamic properties • Custom validation (see StandardValidators) 49
  • 50. © Cloudera, Inc. All rights reserved. Other processor concerns • Ensure good documentation • Handle flowfile content in streaming manner • Generate proper provenance events for interaction with external systems • Send, fetch, etc. • Define appropriate relationships 50
  • 51. © Cloudera, Inc. All rights reserved. Remember to list processors in manifest • Most common issue when not seeing processor in NiFi • src/main/resources/META-INF/ services/ • org.apache.nifi.processor.Processor 51
  • 52. © Cloudera, Inc. All rights reserved. Takeaways
  • 53. © Cloudera, Inc. All rights reserved. Dynamic vs. compiled processors 53 Processor type Pros Cons ExecuteScript/ExecuteGroovyScript • Very fast development cycle • Easy to use • Source code maintained in version control if stored internally • Poor performance • Source code not maintained if located on file system • Limited capabilities InvokeScriptedProcessor • Fast development cycle • Complex scripting capabilities • Better performance than E(G)S • Performance still far below compiled processor • Same limitations on code control and interaction with internal framework Custom Processor • Full first-class interaction with framework API • Robust testing mechanisms • High performance • Reusability • Versioning • Ease-of-use (drag & drop) • Slow development cycle • Requires boilerplate code
  • 54. © Cloudera, Inc. All rights reserved. Performance testing • GenerateFlowFile • Filled back pressure of queues • Comparing • STDIN -> rev -> STDOUT • Custom processor • Groovy script 54
  • 55. © Cloudera, Inc. All rights reserved. Performance results 55 Total Task Time (µs) Total Execution Time (µs) 0M 1.5M 3M 4.5M 6M ExecuteGroovyScript ReverseText
  • 56. © Cloudera, Inc. All rights reserved. Performance results 56
  • 57. © Cloudera, Inc. All rights reserved. Notes on performance • Very simple operation (JVM optimizing) • External service calls, loops, etc. will have bigger impact • Multi-threading / time-series dependent data can influence • Streaming from file system will impact 57
  • 58. © Cloudera, Inc. All rights reserved. Additional resources
  • 59. © Cloudera, Inc. All rights reserved. Other ideas • Build custom processor with any JVM-compatible language • Community accepts contributions in Java only • Host custom processor bundles on individual repos • Extension registry coming soon • PRs accepted recently 59
  • 60. © Cloudera, Inc. All rights reserved. Community resources • Apache NiFi Contributor Guide • Instructions on cloning repo; building; contributing code; checkstyle guide • Apache NiFi Developer Guide • Processor API; lifecycle; documenting; common patterns; errors; testing • Walkthroughs • Matt Burgess - funnifi.blogspot.com • Bryan Bende - bryanbende.com 60
  • 61. © Cloudera, Inc. All rights reserved. New Announcements • NiFi 1.9.1 — 16 Mar 2019 (167+ Jiras) • New processors (Kudu, Slack) • Hot-loading NARs • Security HTTP headers • Record processors automatically infer schema • MiNiFi C++ 0.6.0 — Voting now… • MiNiFi Java 0.5.0 — 7 July 2018 • NiFi Registry 0.3.0 — 25 Sept 2018 61
  • 62. © Cloudera, Inc. All rights reserved. 62 Learn more and join us Apache NiFi site https://nifi.apache.org Subproject MiNiFi site https://nifi.apache.org/minifi/ Subscribe to and collaborate at dev@nifi.apache.org users@nifi.apache.org Submit Ideas or Issues https://issues.apache.org/jira/browse/NIFI Follow us on Twitter @apachenifi
  • 63. © Cloudera, Inc. All rights reserved. 63 NiFi at Dataworks Summit Barcelona Title Date Speakers Room Cloudera Data in Motion Meetup - Future of Data Barcelona (free) Tuesday 3/19 1800 - 2000 Tim Spann, Dan Chaffelson, Andy LoPresto 127-128 Apache NiFi Crash Course Wednesday 3/20 1400 - 1600 Nathan Gough, Andy LoPresto 111 Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path towards DataOps 1450 - 1530 Arda Basar, Ivan Georgiev 129-130 IoT, Streaming, and Dataflow Birds of a Feather 1600 - 1730 Tim Spann, Dan Chaffelson, Purnima Reddy Kuchikulla, Andy LoPresto 129-130 Addressing Challenges with IoT Edge Management Thursday 3/21 1100 - 1140 Dinesh Chandrasekhar 127-128 Intelligently Collecting Data at the Edge – Intro to Apache MiNiFi 1150 - 1230 Andy LoPresto 127-128 Edge to AI: Analytics from Edge to Cloud with Efficient Movement of Machine Data 1400 - 1440 Tim Spann 129-130 BYOP: Custom Processor Development with Apache NiFi 1600 - 1640 Andy LoPresto 131-132 Apache Deep Learning 201 1650 - 1730 Tim Spann 122-123 Platform for the Research and Analysis of Cybernetic Threats 1650 - 1730 Monica Franceschini 124-125
  • 64. © Cloudera, Inc. All rights reserved. Questions
  • 65. © Cloudera, Inc. All rights reserved. THANK YOU alopresto@cloudera.com | alopresto@apache.org | @yolopey github.com/alopresto/slides