NIFI DEVELOPER GUIDE
Presenter Deon Huang
2017/7/7
Agenda
• NiFi REST API
• NiFi In Depth
• NiFi Developer Guide
• Custom Processor
• Contribution Sharing
NiFi REST API
• The REST API provides programmatic access to command and control a
NiFi instance in real time.
• Start and stop processors, monitor queues, query provenance data, and
more.
NiFi REST API
What happened?
NiFi REST API
We sent a REST request to the NiFi instance
NiFi REST API
Request URL
Component ID
Request body we actually send
NiFi REST API
• Every component in NiFi has a unique ID.
• Every operation on a component is a REST request to the NiFi instance.
• Most operations need to specify a component ID.
• https://<NiFi instance URL>/nifi-api/process-groups/015d1045-0b88-1db2-da38-cb71ac006792/process-groups
Parts labelled on the slide: NiFi Instance URL, REST API Usage, REST Path, Unique Component ID
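The URL pattern above can be assembled programmatically. The sketch below is a minimal illustration using only the JDK; the class name, method name, and the host nifi.example.com are hypothetical and not part of NiFi.

```java
import java.net.URI;

// Hypothetical helper (not part of NiFi): assembles the URL pattern shown on the slide.
final class NifiUrlSketch {

    // instanceUrl is the base URL of the NiFi instance; parentGroupId is the
    // unique component ID of the process group being addressed.
    static URI childProcessGroups(String instanceUrl, String parentGroupId) {
        // Tolerate a trailing slash on the instance URL.
        String base = instanceUrl.endsWith("/")
                ? instanceUrl.substring(0, instanceUrl.length() - 1)
                : instanceUrl;
        // <instance URL> + /nifi-api + /process-groups/<component ID>/process-groups
        return URI.create(base + "/nifi-api/process-groups/" + parentGroupId + "/process-groups");
    }

    public static void main(String[] args) {
        // nifi.example.com is a placeholder host.
        System.out.println(childProcessGroups(
                "https://nifi.example.com:8443",
                "015d1045-0b88-1db2-da38-cb71ac006792"));
    }
}
```

Keeping the component ID as a parameter reflects the point of the slide: the same path serves any process group once its ID is known.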
NiFi REST API
• RevisionDTO
NiFi REST API
• RevisionDTO: identifies the version of the component as seen by the client
• ProcessGroupDTO: component body of a ProcessGroup
• PositionDTO: position on the canvas
• All DTO and Entity classes are provided:
<dependency>
<groupId>org.apache.nifi</groupId>
<artifactId>nifi-client-dto</artifactId>
<version>1.1.2</version>
</dependency>
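To show how the three DTOs fit together in a request body, here is a minimal sketch that builds the JSON by hand with the JDK only. The class and method names are hypothetical, and the JSON shape (a revision, plus a component holding a name and a position) is an assumption to check against the REST API documentation for your NiFi version; real code would use the classes from nifi-client-dto instead.

```java
import java.util.Locale;

// Hypothetical sketch: hand-built JSON standing in for RevisionDTO,
// ProcessGroupDTO and PositionDTO from nifi-client-dto.
final class ProcessGroupRequestSketch {

    // clientVersion: the revision the client last saw (0 for a new component).
    // name, x, y: the process group name and its position on the canvas.
    static String createBody(long clientVersion, String name, double x, double y) {
        return String.format(Locale.ROOT,
                "{\"revision\":{\"version\":%d},"
                        + "\"component\":{\"name\":\"%s\",\"position\":{\"x\":%.1f,\"y\":%.1f}}}",
                clientVersion, escape(name), x, y);
    }

    // Minimal escaping for backslashes and double quotes inside the name.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        System.out.println(createBody(0, "demo-group", 100.0, 200.0));
    }
}
```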
REST API Recap
• Every component in NiFi has a unique ID.
• Every operation on a component is a REST request to the NiFi instance.
• Most operations need to specify a component ID.
NiFi in Depth
• Repositories
• Life of FlowFile
FlowFile Mechanism in Depth
NiFi Architecture
Attribute
1. HashMap in JVM
2. WAL in FlowFile Repository
Content
Immutable on disk
NiFi in Depth
• FlowFiles are the heart of NiFi and its flow-based design.
• A FlowFile is a data record that consists of a pointer to its content and its
attributes, and is associated with provenance events.
• Attributes are key/value pairs that act as metadata for the FlowFile.
• Content is the actual data of the file.
• Provenance is a record of what has happened to the FlowFile.
NiFi in Depth
• Repositories are immutable.
• The benefits of this are many, including: a substantial reduction in the storage
space required for typical complex processing graphs, natural
replay capability, taking advantage of OS caching, fewer random
read/write performance hits, and a design that is easy to reason about.
• All three repositories are directories on local storage used to persist data.
NiFi in Depth
• The FlowFile repository contains metadata for all current FlowFiles in the
flow
• The Content Repository holds the content for current and past FlowFiles
• The Provenance Repository holds the history of FlowFiles
NiFi in Depth
• FlowFiles are held in a Map in JVM memory.
• FlowFile metadata includes
- Attributes
- A pointer to the actual content of the FlowFile
- State (which Connection/Queue the FlowFile belongs in)
• The FlowFile Repository acts as NiFi's "Write-Ahead Log".
• Each change happens as a transactional unit of work.
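The write-ahead idea can be illustrated with a toy model: every change is appended to a log before it is applied in memory, and recovery replays the log on top of the last snapshot. This is a simplified sketch with hypothetical names, not NiFi's actual implementation, and it keeps the "log" in memory where a real one would write to disk.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy write-ahead log (illustration only, not NiFi's implementation).
final class WalSketch {
    // State of the FlowFile map at the last checkpoint.
    private Map<String, String> snapshot = new HashMap<>();
    // Changes since the last checkpoint: {flowFileId, queue}; a null queue means "dropped".
    private final List<String[]> log = new ArrayList<>();
    // Live view: FlowFile ID mapped to the queue it currently sits in.
    private final Map<String, String> live = new HashMap<>();

    // Append the change to the log first, then apply it in memory.
    void update(String flowFileId, String queue) {
        log.add(new String[] {flowFileId, queue});
        apply(live, flowFileId, queue);
    }

    // Checkpoint: take a fresh snapshot and truncate the log.
    void checkpoint() {
        snapshot = new HashMap<>(live);
        log.clear();
    }

    // Recovery: start from the snapshot and replay every logged change in order.
    Map<String, String> recover() {
        Map<String, String> restored = new HashMap<>(snapshot);
        for (String[] entry : log) {
            apply(restored, entry[0], entry[1]);
        }
        return restored;
    }

    private static void apply(Map<String, String> map, String id, String queue) {
        if (queue == null) {
            map.remove(id);
        } else {
            map.put(id, queue);
        }
    }
}
```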
NiFi in Depth
• NiFi recovers a FlowFile by restoring a snapshot of the FlowFile.
• A snapshot is taken automatically and periodically by the system.
• A new base checkpoint is computed by serializing the FlowFile map to disk
with the filename '.partial'.
• Step-by-step WAL in NiFi
https://cwiki.apache.org/confluence/display/NIFI/NiFi%27s+Write-Ahead+Log+Implementation
Content Repository
• The largest of the repositories; it uses immutability and copy-on-write to maximize
speed and thread-safety.
• Resource Claims are Java objects that point to specific files on disk.
• The FlowFile has a "Content Claim" object containing
- a reference to the Resource Claim
- the offset of the content within the file
- the length of the content
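The relationship between the two claim types can be sketched as follows. The classes are hypothetical stand-ins, not NiFi's own: a byte array plays the role of the file on disk, and a content claim selects one slice of it by offset and length.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-ins for Resource Claim and Content Claim (illustration only).
final class ContentClaimSketch {

    // In NiFi a Resource Claim points to a file on disk; here a byte array plays that file.
    static final class ResourceClaim {
        final byte[] file;

        ResourceClaim(byte[] file) {
            this.file = file;
        }
    }

    // A Content Claim locates one FlowFile's content inside the shared file.
    static final class ContentClaim {
        final ResourceClaim resource;
        final int offset;
        final int length;

        ContentClaim(ResourceClaim resource, int offset, int length) {
            this.resource = resource;
            this.offset = offset;
            this.length = length;
        }

        // Copy out only the bytes that belong to this claim.
        byte[] read() {
            return Arrays.copyOfRange(resource.file, offset, offset + length);
        }
    }

    public static void main(String[] args) {
        ResourceClaim file = new ResourceClaim("hello,world".getBytes(StandardCharsets.UTF_8));
        ContentClaim second = new ContentClaim(file, 6, 5);
        System.out.println(new String(second.read(), StandardCharsets.UTF_8));
    }
}
```

Several content claims can share one resource claim, which is why many small FlowFiles do not each need their own file.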
Provenance Repository
• Holds the history of each FlowFile and provides data lineage (chain of custody).
• When a provenance event is created, it copies all of the FlowFile's
attributes, its content pointer, and its state to one location in the
Provenance Repository.
• Provenance Repository design decisions
https://cwiki.apache.org/confluence/display/NIFI/Persistent+Provenance+Repository+Design
Provenance Repository
• Provenance Event
-CLONE
-ATTRIBUTES_MODIFIED
-CONTENT_MODIFIED
-CREATE
-DROP
-EXPIRE
-FORK
-JOIN
-ROUTE
…
Repositories Recap
• The FlowFile repository contains metadata for all current FlowFiles in the
flow
• The Content Repository holds the content for current and past FlowFiles
• The Provenance Repository holds the history of FlowFiles
• Best practice
- Analyze contents of FlowFile as few times as possible
- Extract key information into attributes
- Updating the FlowFile Repository is much faster than updating the Content Repository
Life of FlowFile
• Data Ingress → Pass by Reference → Copy-On-Write → Data Egress
• An important aspect of flow-based programming is the resource-constrained
relationships between the black boxes.
• A FlowFile is routed from one processor to another simply by passing a
reference to the FlowFile.
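Pass by reference and copy-on-write can be illustrated with an immutable record type. In this sketch (hypothetical class, not NiFi's FlowFile), changing an attribute produces a new object that still shares the same content, while changing the content produces a new object with new content and leaves the original untouched.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical immutable FlowFile model (illustration only).
final class FlowFileSketch {
    final Map<String, String> attributes; // unmodifiable view
    final byte[] content;                 // never written to after construction

    FlowFileSketch(Map<String, String> attributes, byte[] content) {
        this.attributes = Collections.unmodifiableMap(new HashMap<>(attributes));
        this.content = content;
    }

    // Attribute update: new metadata, same content reference (nothing is copied).
    FlowFileSketch withAttribute(String key, String value) {
        Map<String, String> copy = new HashMap<>(attributes);
        copy.put(key, value);
        return new FlowFileSketch(copy, content);
    }

    // Content update: copy-on-write, the old content stays as it was.
    FlowFileSketch withContent(byte[] newContent) {
        return new FlowFileSketch(attributes, newContent.clone());
    }
}
```

Because the original object never changes, any earlier reference to it remains valid, which is what makes replay from a past point possible.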
Pass by Reference
Funnels
Copy On Write
Update Attribute
Data Egress
• Eventually a FlowFile is "DROPPED": it is no longer being processed and is
available for deletion.
• It remains in the FlowFile Repository until the next repository checkpoint (24
hours default), when all old content claims are released.
• Periodically, the Content Repository asks the Resource Claim Manager which
Resource Claims can be cleaned up.
Developer Guide
• Processor
• Reporting Task
• ControllerService
• FlowFilePrioritizer
• AuthorityProvider
Supporting API
• ProcessSession
• ProcessContext
• PropertyDescriptor
• Validator
• ValidationContext
• PropertyValue
• Relationship
• StateManager
• ComponentLog
Processor Life Cycle
• Processor Initialization →
• Exposing Processor’s Relationships →
• Exposing Processor Properties →
• Validating Processor Properties →
• Triggered and Performing the Work →
• ProcessSession finishes
Component Life Cycle
• @OnAdded →
• @OnEnabled →
• @OnRemoved →
• @OnScheduled →
• @OnUnscheduled →
• @OnStopped →
• @OnShutdown
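The annotations above mark methods that the framework calls at each lifecycle transition. The sketch below shows that mechanism with the JDK only: the two annotations are hypothetical stand-ins that borrow the names of NiFi's annotations, and the `fire` method is an assumed simplification of what a framework does, not NiFi's code.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

final class LifecycleSketch {
    // Stand-ins for the NiFi annotations of the same names (not the real ones).
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD) @interface OnScheduled {}
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD) @interface OnStopped {}

    // A component declares what should happen at each transition.
    static final class DemoProcessor {
        final List<String> calls = new ArrayList<>();

        @OnScheduled
        public void openResources() {
            calls.add("scheduled");
        }

        @OnStopped
        public void closeResources() {
            calls.add("stopped");
        }
    }

    // Framework side: find every public method carrying the annotation and invoke it.
    static void fire(Object component, Class<? extends Annotation> event) {
        for (Method method : component.getClass().getMethods()) {
            if (method.isAnnotationPresent(event)) {
                try {
                    method.invoke(component);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("Lifecycle method failed: " + method.getName(), e);
                }
            }
        }
    }
}
```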
Common Processor Patterns
• Data Ingress
• Data Egress
• Route Based on Content
• Route Based on Attribute
• Split Content
• Update Attributes Based on Content
• Enrich/Modify Content
Error Handling
• A ProcessException signals a known failure; any other Exception is treated as
an unexpected failure. In both cases the framework rolls back the session.
• Don't catch general Exception or Throwable.
• Penalization vs. yielding
Session rollback
• ProcessSession provides transactionality.
• Call commit() or rollback() to end the session.
• Best practice is to keep it simple.
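What transactionality means here can be shown with a toy session: nothing reaches the output until commit(), and rollback() puts everything taken from the input back in its original order. The class is a hypothetical sketch, not NiFi's ProcessSession, and FlowFiles are reduced to plain strings.

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy session (illustration only): all-or-nothing movement from input to output.
final class SessionSketch {
    private final Deque<String> input;
    private final List<String> output;
    private final List<String> taken = new ArrayList<>();  // pulled from input, not yet committed
    private final List<String> staged = new ArrayList<>(); // transferred, not yet committed

    SessionSketch(Deque<String> input, List<String> output) {
        this.input = input;
        this.output = output;
    }

    // Take the next item from the input queue, or null if the queue is empty.
    String get() {
        String item = input.poll();
        if (item != null) {
            taken.add(item);
        }
        return item;
    }

    // Stage a result; it becomes visible only on commit.
    void transfer(String item) {
        staged.add(item);
    }

    // Make all staged results visible and forget the inputs.
    void commit() {
        output.addAll(staged);
        taken.clear();
        staged.clear();
    }

    // Discard staged results and return the inputs to the head of the queue in order.
    void rollback() {
        for (int i = taken.size() - 1; i >= 0; i--) {
            input.addFirst(taken.get(i));
        }
        taken.clear();
        staged.clear();
    }
}
```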
Testing
• NiFi provides a mock framework for Processor testing.
Use the TestRunner interface.
• 1 - Add a ControllerService if needed
runner.addControllerService()
• 2 - Set property values
Map<String, String> attributes = new HashMap<>();
attributes.put("property name", "property value");
• 3 - Enqueue FlowFiles
runner.enqueue("Select ….".getBytes(), attributes);
• 4 - Run the processor
runner.run();
runner.assertAllFlowFilesTransferred(Success, 1);
Recap Developer guide
• Understand life cycle of Processor
• Understand supporting component API
• Understand processor general pattern
• Understand how to handle process failure
• Understand how to test processor
Contribution preparation
• NiFi Contributor Guide
https://cwiki.apache.org/confluence/display/NIFI/Contributor+Guide
• Git Feature Branch Workflow
https://www.atlassian.com/git/tutorials/comparing-workflows
• How to Write a Git Commit Message
https://chris.beams.io/posts/git-commit/
Contribution feedback
• Don't leave trailing whitespace.
• Follow the GitHub pull request procedure.
• Commit titles start with the JIRA issue ID, e.g. NIFI-2829.
• The open-source CI fails all the time; don't panic.
• Stay patient and humble when receiving reviewers' feedback.
Contribution feedback
• When dealing with time zone problems,
consider that the code may be built in a different time zone.
• Since Java 1.8, the standard library provides great support for dealing
with time issues.
https://docs.oracle.com/javase/8/docs/api/java/time/package-summary.html
https://magiclen.org/java-8-date-time-api/
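A small java.time sketch of the point above: when the zone is named explicitly, the formatted result no longer depends on the default time zone of the machine that builds or runs the code. The class and method names are hypothetical.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

final class TimeZoneSketch {
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm", Locale.ROOT);

    // Format an instant in an explicitly named zone instead of the JVM default.
    static String format(Instant instant, String zoneId) {
        return ZonedDateTime.ofInstant(instant, ZoneId.of(zoneId)).format(FORMAT);
    }

    public static void main(String[] args) {
        Instant instant = Instant.parse("2017-07-07T00:00:00Z");
        System.out.println(format(instant, "UTC"));                 // 2017-07-07 00:00
        System.out.println(format(instant, "Asia/Taipei"));         // 2017-07-07 08:00
        System.out.println(format(instant, "America/Los_Angeles")); // 2017-07-06 17:00
    }
}
```

Tests that compare formatted timestamps are a common source of builds that pass in one time zone and fail in another; pinning the zone in both the code and the test avoids that.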
Reference
• Official Apache NiFi
https://nifi.apache.org/
• All Micron nifi instance
http://nifi.micron.com/
• Hortonworks forum
