Learning the Basics of Apache NiFi for IoT (OSS Europe 2020)

Timothy Spann
Principal DataFlow Field Engineer
Cloudera
#ossummit @PaasDev
#ossummit #lfelc
Speaker - Timothy Spann
Principal DataFlow Field Engineer
@PaasDev
DZone Zone Leader and Big Data MVB
Princeton NJ Future of Data Meetup
https://github.com/tspannhw
https://www.datainmotion.dev/
Future of Data - Princeton (Global via YouTube)
https://www.meetup.com/futureofdata-princeton/
From Big Data to AI to Streaming to Containers to Cloud to Analytics to Cloud Storage to Fast Data to Machine Learning to Microservices to ...
BASICS of APACHE NIFI
• What Is NiFi: a general overview of capabilities
• Grand Tour: navigating the Apache NiFi canvas
• Example IoT Flows: examples of processing IoT data from edge to consumption
IoT REFERENCE ARCHITECTURE
[Diagram. Labels: sensors; Apache NiFi (data flow apps powered by NiFi); Apache Kafka (data syndication service by Kafka, Kafka topic "iot"); storage layer; Apache Impala; deep learning and machine learning model execution; REST]
End to End Logs Pipeline
[Diagram. Labels: routers, databases, firewalls; logs, events, errors, aggregates, alerts, other data; ETL; analytics; enterprise analysis; real-time analytics; complexity reduction]
VISUALIZATION
Apache Hue: SQL and query editor and performance diagnostics tool for the Cloudera Data Platform
What is Apache NiFi?
Apache NiFi High Level Capabilities
• Scale horizontally and vertically
  • Scale your data flow to millions of events per second
  • Ingest TB to PB of data per day
• Adapt to your flow requirements
  • Back pressure and dynamic prioritization
  • Loss tolerant vs. guaranteed delivery
  • Low latency vs. high throughput
• Secure
  • SSL, HTTPS, SFTP, etc.
  • Governance and data provenance
• Extensible
  • Build your own processors and controller services (providers)
  • Integrate with external systems (security, monitoring, governance, etc.)
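Back pressure, listed among the capabilities above, can be pictured as a queue that stops accepting work once a threshold is reached. The following is a minimal Python sketch of that idea and not NiFi's implementation: the class and method names are hypothetical, and the 10,000-object threshold is assumed here to mirror NiFi's default connection setting.

```python
from collections import deque


class BoundedConnection:
    """Hypothetical stand-in for a connection between two processors."""

    def __init__(self, object_threshold=10_000):
        # Number of queued items at which back pressure engages (assumed default).
        self.object_threshold = object_threshold
        self._queue = deque()

    def is_full(self):
        """Return True once the queue has reached its threshold."""
        return len(self._queue) >= self.object_threshold

    def offer(self, item):
        """Try to enqueue; return False when back pressure is applied."""
        if self.is_full():
            return False
        self._queue.append(item)
        return True

    def poll(self):
        """Remove and return the oldest item, or None when empty."""
        return self._queue.popleft() if self._queue else None
```

In this simplification the upstream caller checks the return value of `offer` and waits; draining the queue with `poll` releases the pressure.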
FLOWFILES ARE LIKE HTTP DATA
HTTP Data
Header:
HTTP/1.1 200 OK
Date: Sun, 10 Oct 2010 23:26:07 GMT
Server: Apache/2.2.8 (CentOS) OpenSSL/0.9.8g
Last-Modified: Sun, 26 Sep 2010 22:04:35 GMT
ETag: "45b6-834-49130cc1182c0"
Accept-Ranges: bytes
Content-Length: 13
Connection: close
Content-Type: text/html
Content:
Hello world!
FlowFile
Header:
Standard FlowFile Attributes
Key: 'entryDate' Value: 'Fri Jun 17 17:15:04 EDT 2016'
Key: 'lineageStartDate' Value: 'Fri Jun 17 17:15:04 EDT 2016'
Key: 'fileSize' Value: '23609'
FlowFile Attribute Map Content
Key: 'filename' Value: '15650246997242'
Key: 'path' Value: './'
Content:
Binary Content *
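The header/content analogy above can be sketched in a few lines of Python. This is an illustration only: `FlowFileSketch`, `from_http_response`, and the `http.*` attribute names are hypothetical and are not part of NiFi's API. The sketch splits a raw HTTP response into a string-to-string attribute map (the "header") and opaque bytes (the "content").

```python
from dataclasses import dataclass, field


@dataclass
class FlowFileSketch:
    """Hypothetical model: a map of string attributes plus binary content."""
    attributes: dict = field(default_factory=dict)
    content: bytes = b""


def from_http_response(raw: bytes) -> FlowFileSketch:
    """Turn HTTP headers into attributes and the HTTP body into content."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    attributes = {"http.status.line": lines[0]}
    for line in lines[1:]:
        # Split on the first colon only, so values such as dates stay intact.
        name, _, value = line.partition(":")
        attributes["http.header." + name.strip().lower()] = value.strip()
    attributes["fileSize"] = str(len(body))
    return FlowFileSketch(attributes=attributes, content=body)
```

Processors that only route or filter can work from `attributes` alone and never need to read `content`.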
Apache NiFi
Enable easy ingestion, routing, management, and delivery of any data anywhere (edge, cloud, data center) to any downstream system, with built-in end-to-end security and provenance.
• Over 300 Prebuilt Processors
• Easy to build your own
• Parse, Enrich & Apply Schema
• Filter, Split, Merge & Route
• Throttle & Backpressure
• Guaranteed Delivery
• Full data provenance
• Ecosystem integration
Advanced tooling to industrialize flow development (Flow Development Life Cycle)
[Diagram. Sources and destinations: FTP, SFTP, HL7, UDP, XML, HTTP, EMAIL, HTML, IMAGE, SYSLOG. Processing steps: HASH, MERGE, EXTRACT, DUPLICATE, SPLIT, ROUTE TEXT, ROUTE CONTENT, ROUTE CONTEXT, CONTROL RATE, DISTRIBUTE LOAD, GEOENRICH, SCAN, REPLACE, TRANSLATE, CONVERT, ENCRYPT, TAIL, EVALUATE, EXECUTE]
Provenance/Lineage
Prioritization
• Configure a prioritizer per connection
• Determine what is important for your data: time based, arrival order, importance of a data set
• Funnel many connections down to a single connection to prioritize across data sets
• Develop your own prioritizer if needed
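The prioritization bullets above can be illustrated with a small Python sketch of a single queue that releases items by importance and then by arrival order. It is a toy model under stated assumptions, not NiFi code: the class name, the `priority` attribute, the default of 100, and the rule that a lower number is released first are all choices made for this example.

```python
import heapq
from itertools import count


class PrioritizedConnection:
    """Toy queue ordered by a numeric 'priority' attribute, then arrival order."""

    def __init__(self):
        self._heap = []
        self._arrival = count()  # unique tie-breaker that preserves arrival order

    def enqueue(self, attributes, content):
        # Hypothetical default priority for items that do not declare one.
        priority = int(attributes.get("priority", 100))
        heapq.heappush(self._heap, (priority, next(self._arrival), attributes, content))

    def dequeue(self):
        """Return the (attributes, content) pair with the lowest priority number."""
        _, _, attributes, content = heapq.heappop(self._heap)
        return attributes, content

    def __len__(self):
        return len(self._heap)
```

Feeding several upstream sources into one `PrioritizedConnection` plays the role of the funnel: urgent items from any source overtake routine ones.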
SQL BASED ROUTING WITH NiFi’s QueryRecord Processor
• QueryRecord processor: executes a SQL statement against records and writes the results to the FlowFile content.
• CSVReader: looks up the schema from the Schema Registry (SR) and converts CSV records into ProcessRecords.
• SQL execution via Apache Calcite: executes the configured SQL against the ProcessRecords for routing.
• CSVRecordSetWriter: converts the query result from ProcessRecords back into CSV for the FlowFile content.
Route (for example, geo and speed streams) using standard SQL instead of complex regular expressions.
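The read, query, write sequence above can be approximated outside NiFi to show the idea of routing with SQL. The sketch below swaps in Python's built-in `sqlite3` for Apache Calcite, so SQL dialect details may differ; the column names, sample rows, and route names are hypothetical. The table is named `flowfile` on the assumption that QueryRecord queries address the incoming records by that name.

```python
import csv
import io
import sqlite3

# Hypothetical sensor records; in NiFi these would arrive as FlowFile content.
CSV_INPUT = """driver_id,speed,latitude,longitude
11,72,40.35,-74.65
12,48,40.36,-74.66
13,91,40.37,-74.67
"""

# Hypothetical routes: each name maps to one SQL statement over the records.
ROUTES = {
    "speeding": "SELECT * FROM flowfile WHERE speed > 65",
    "normal": "SELECT * FROM flowfile WHERE speed <= 65",
}


def route_records(csv_text, routes):
    """Read CSV, run each route's SQL, and return CSV text per route."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE flowfile "
        "(driver_id INTEGER, speed INTEGER, latitude REAL, longitude REAL)"
    )
    conn.executemany(
        "INSERT INTO flowfile VALUES (:driver_id, :speed, :latitude, :longitude)",
        rows,
    )
    results = {}
    for name, sql in routes.items():
        cursor = conn.execute(sql)
        out = io.StringIO()
        writer = csv.writer(out, lineterminator="\n")
        writer.writerow([col[0] for col in cursor.description])
        writer.writerows(cursor.fetchall())
        results[name] = out.getvalue()
    conn.close()
    return results
```

Calling `route_records(CSV_INPUT, ROUTES)` returns one CSV document per route, which corresponds to one outgoing relationship per query.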
STATELESS ENGINE
• Granular containers per flow
• Flows from NiFi Registry
https://www.datainmotion.dev/2019/11/exploring-apache-nifi-110-parameters.html
bin/nifi.sh stateless RunFromRegistry Continuous --file kafka.json
https://github.com/apache/nifi/blob/ea1becac4fc519c54b8b4d21773e68f8da364755/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-stateless/README.md
STATELESS ENGINE
• See also Parameters
• Docker
• YARN
• Kubernetes (K8s)
• Stateful NiFi clusters
• Apache OpenWhisk (FaaS)
https://www.datainmotion.dev/2019/11/exploring-apache-nifi-110-parameters.html
{
  "registryUrl": "http://tspann-mbp15-hw14277:18080",
  "bucketId": "140b30f0-5a47-4747-9021-19d4fde7f993",
  "flowId": "0540e1fd-c7ca-46fb-9296-e37632021945",
  "ssl": {
    "keystoreFile": "",
    "keystorePass": "",
    "keyPass": "",
    "keystoreType": "",
    "truststoreFile": "/Library/Java/JavaVirtualMachines/amazon-corretto-11.jdk/Contents/Home/lib/security/cacerts",
    "truststorePass": "changeit",
    "truststoreType": "JKS"
  },
  "parameters": {
    "broker": "4.317.852.100:9092",
    "topic": "iot",
    "group_id": "nifi-stateless-kafka-consumer",
    "DestinationDirectory": "/tmp/nifistateless/output2/",
    "output_dir": "/Users/tspann/Documents/nifi-1.10.0-SNAPSHOT/logs/output"
  }
}
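A configuration file like the one above can be sanity-checked before it is handed to the stateless runner. The Python sketch below is one possible check, not part of NiFi: `load_stateless_config` is a hypothetical helper, and treating `registryUrl`, `bucketId`, and `flowId` as mandatory is an assumption based only on the keys the example uses to locate the flow. The sample values are placeholders.

```python
import json

# Keys the example uses to locate a flow; assumed mandatory for this sketch.
REQUIRED_KEYS = ("registryUrl", "bucketId", "flowId")

# Placeholder configuration with hypothetical values.
SAMPLE_CONFIG = """
{
  "registryUrl": "http://localhost:18080",
  "bucketId": "example-bucket-id",
  "flowId": "example-flow-id",
  "parameters": {"topic": "iot"}
}
"""


def load_stateless_config(text):
    """Parse the JSON text and fail early if a locating key is absent."""
    config = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError("missing keys: " + ", ".join(missing))
    # Parameters are optional here; default to an empty mapping.
    config.setdefault("parameters", {})
    return config
```

For example, `load_stateless_config(SAMPLE_CONFIG)["parameters"]["topic"]` gives `"iot"`, while a file without `flowId` raises `ValueError`.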
https://github.com/tspannhw/stateless-examples
PARAMETER CONTEXT
• Parameters
• Parameter Context
https://www.datainmotion.dev/2019/11/exploring-apache-nifi-110-parameters.html
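A parameter context can be thought of as a named set of values that property fields refer to by name. The Python sketch below imitates that substitution, assuming the `#{name}` reference syntax; `resolve` and the sample context are hypothetical, and escaping rules and sensitive parameters are deliberately left out.

```python
import re

# Matches references of the assumed form #{name}.
PARAM_REF = re.compile(r"#\{([A-Za-z0-9_.\- ]+)\}")


def resolve(value, context):
    """Replace each #{name} in value with the matching entry from context."""
    def substitute(match):
        name = match.group(1)
        if name not in context:
            raise KeyError("parameter %r is not defined in the context" % name)
        return context[name]

    return PARAM_REF.sub(substitute, value)


# Hypothetical context; the same flow could be bound to another context
# (for example, one per environment) without editing the flow itself.
DEV_CONTEXT = {"broker": "localhost:9092", "topic": "iot"}
```

With this context, `resolve("/data/#{topic}/out", DEV_CONTEXT)` yields `/data/iot/out`.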
RETRYFLOWFILE
• Configurable Retries
• Maximum number of retries
• Penalties
• When to Fail
• Reuse Mode
https://medium.com/@abdelkrim.hadjidj/apache-nifi-1-10-series-simplifying-error-handling-7de86f130acd
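The retry options above amount to keeping a counter on each FlowFile and choosing a route from it. The Python sketch below shows that decision logic under stated assumptions; it is not the processor's source. The function name, the `flowfile.retries` attribute name, and the three route names are used here for illustration only.

```python
def retry_decision(attributes, maximum_retries=3, retry_attribute="flowfile.retries"):
    """Return (route, attributes) based on how often the item was retried."""
    raw = attributes.get(retry_attribute, "0")
    try:
        retries = int(raw)
    except ValueError:
        # A non-numeric counter cannot be trusted; treat it as a failure.
        return "failure", attributes
    if retries >= maximum_retries:
        return "retries_exceeded", attributes
    updated = dict(attributes)  # leave the caller's mapping untouched
    updated[retry_attribute] = str(retries + 1)
    return "retry", updated
```

Looping the `retry` route back to the failing step, and sending `retries_exceeded` to an error handler, gives a bounded retry loop without extra bookkeeping in the flow.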
BACKPRESSURE PREDICTION
OrdinaryLeastSquares
SimpleRegression
Enable analytics feature
http://lonnifi.blogspot.com/2019/11/back-pressure-prediction-deep-dive.html?es_id=5233333939
https://youtu.be/Tt8TSlHu7PE
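Ordinary least squares on a single variable is enough to show how a back pressure prediction can work: fit a line through recent queue sizes and extrapolate to the threshold. The Python sketch below is a worked illustration with hypothetical function names and sample data; it does not reproduce how NiFi's analytics feature is implemented, and the 10,000-object threshold in the usage note is an assumed value.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def seconds_until_backpressure(samples, threshold):
    """Estimate seconds from the last sample until the queue hits threshold.

    samples is a list of (seconds, queued_object_count) pairs.
    Returns None when the queue is not growing.
    """
    xs = [t for t, _ in samples]
    ys = [c for _, c in samples]
    slope, intercept = fit_line(xs, ys)
    if slope <= 0:
        return None
    time_at_threshold = (threshold - intercept) / slope
    return max(0.0, time_at_threshold - xs[-1])
```

For a queue sampled at 1,000, 2,000, and 3,000 objects one minute apart, `seconds_until_backpressure([(0, 1000), (60, 2000), (120, 3000)], 10_000)` extrapolates the same growth rate forward to the threshold.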
PARQUET READER AND WRITER
• Native Record Processors for Apache Parquet files
• CSV <-> Parquet
• XML <-> Parquet
• Avro <-> Parquet
• JSON <-> Parquet
• More...
https://www.datainmotion.dev/2019/11/exploring-apache-nifi-110-parameters.html
https://www.datainmotion.dev/2019/10/migrating-apache-flume-flows-to-apache_7.html
MANY OTHER FEATURES
• Prometheus Reporting Task
• Experimental encrypted content repository
• PublishKafka partition support
• Toolkit module to generate and build Swagger
• GeoEnrichIPRecord Processor
• Command Line Diagnostics
• RocksDB FlowFile Repository
• PutBigQueryStreaming Processor
• Enhanced DevOps and CI/CD
ELT/ETL Lookup Services
• DatabaseRecordLookupService
• KuduLookupService
• HBase_2_ListLookupService
Scalable and distributed architecture
NiFi Flow Registry
Example of NiFi Transformations
Data enrichment
Enrich events by adding a classification based on the host, using a reference lookup table from a CSV file.
[ {
  "time" : "7845800765",
  "host" : "web-...",
  "sourcetype" : "cpu_resource_usage",
  "source" : "...",
  "index" : "_metrics",
  "meta" : "...",
  "event" : "..."}}",
  "classification" : "internal"
},
...
[ {
  "time" : "7845800765",
  "host" : "web-...",
  "sourcetype" : "cpu_resource_usage",
  "source" : "...",
  "index" : "_metrics",
  "meta" : "...",
  "event" : "..."}}",
  "classification" : null
},
...
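The enrichment step above, a lookup keyed on the host, can be sketched in plain Python. The host names and classifications in the lookup table are hypothetical, as are the helper names; in NiFi this would be done with a lookup service and a record processor rather than hand-written code.

```python
import csv
import io

# Hypothetical reference table, standing in for the CSV lookup file.
LOOKUP_CSV = """host,classification
web-01,internal
web-02,external
"""


def load_lookup(csv_text):
    """Build a host -> classification mapping from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["host"]: row["classification"] for row in reader}


def enrich(events, lookup):
    """Return copies of the events with a 'classification' field added.

    Hosts that are missing from the lookup table get None, which
    serializes to null in JSON.
    """
    return [
        {**event, "classification": lookup.get(event["host"])}
        for event in events
    ]
```

An event whose host appears in the table gains its classification, and an unknown host ends up with a null classification.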
INGEST RDBMS TABLES
https://community.cloudera.com/t5/Community-Articles/Incrementally-Streaming-RDBMS-Data-to-Your-Hadoop-DataLake/ta-p/247927
https://community.cloudera.com/t5/Community-Articles/Ingesting-Golden-Gate-Records-From-Apache-Kafka-and/ta-p/247557
https://community.cloudera.com/t5/Community-Articles/Incremental-Fetch-in-NiFi-with-QueryDatabaseTable/ta-p/247073
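Incremental fetch, the subject of the articles linked above, rests on one idea: remember the largest value seen in a monotonically increasing column and ask only for rows beyond it. The Python sketch below demonstrates that idea with the built-in `sqlite3` module. The table, column, and function names are hypothetical, and this is not how QueryDatabaseTable is implemented internally.

```python
import sqlite3


def fetch_new_rows(conn, table, max_value_column, last_seen):
    """Return (rows, new_last_seen) for rows whose column exceeds last_seen.

    table and max_value_column are trusted identifiers in this sketch and
    must never come from user input, since they are placed into the SQL text.
    """
    sql = "SELECT * FROM {t} WHERE {c} > ? ORDER BY {c}".format(
        t=table, c=max_value_column
    )
    cursor = conn.execute(sql, (last_seen,))
    columns = [col[0] for col in cursor.description]
    rows = [dict(zip(columns, record)) for record in cursor.fetchall()]
    if rows:
        # Rows are ordered, so the last one carries the new high-water mark.
        last_seen = rows[-1][max_value_column]
    return rows, last_seen
```

Each call returns only the rows added since the previous call, provided the caller stores the returned high-water mark between runs.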
EXAMPLE IoT Flows
IoT Reference Architecture
[Diagram repeated from the IoT reference architecture slide earlier in the deck]
Best Practices
https://www.datainmotion.dev/2020/06/no-more-spaghetti-flows.html
● Reduce, reuse, recycle: use Parameters to reuse common modules.
● Put flows and reusable chunks into separate Process Groups.
● Write custom processors if you need new or specialized features.
● Use Cloudera-supported NiFi processors.
● Use Record processors everywhere.
Cloudera Communities
Got questions? Leverage community.cloudera.com
Join our meetup: www.meetup.com/pro/futureofdata
ShapeBlue196 views
"Running students' code in isolation. The hard way", Yurii Holiuk by Fwdays
"Running students' code in isolation. The hard way", Yurii Holiuk "Running students' code in isolation. The hard way", Yurii Holiuk
"Running students' code in isolation. The hard way", Yurii Holiuk
Fwdays36 views
2FA and OAuth2 in CloudStack - Andrija Panić - ShapeBlue by ShapeBlue
2FA and OAuth2 in CloudStack - Andrija Panić - ShapeBlue2FA and OAuth2 in CloudStack - Andrija Panić - ShapeBlue
2FA and OAuth2 in CloudStack - Andrija Panić - ShapeBlue
ShapeBlue152 views
Import Export Virtual Machine for KVM Hypervisor - Ayush Pandey - University ... by ShapeBlue
Import Export Virtual Machine for KVM Hypervisor - Ayush Pandey - University ...Import Export Virtual Machine for KVM Hypervisor - Ayush Pandey - University ...
Import Export Virtual Machine for KVM Hypervisor - Ayush Pandey - University ...
ShapeBlue120 views
NTGapps NTG LowCode Platform by Mustafa Kuğu
NTGapps NTG LowCode Platform NTGapps NTG LowCode Platform
NTGapps NTG LowCode Platform
Mustafa Kuğu437 views
Updates on the LINSTOR Driver for CloudStack - Rene Peinthor - LINBIT by ShapeBlue
Updates on the LINSTOR Driver for CloudStack - Rene Peinthor - LINBITUpdates on the LINSTOR Driver for CloudStack - Rene Peinthor - LINBIT
Updates on the LINSTOR Driver for CloudStack - Rene Peinthor - LINBIT
ShapeBlue208 views
The Role of Patterns in the Era of Large Language Models by Yunyao Li
The Role of Patterns in the Era of Large Language ModelsThe Role of Patterns in the Era of Large Language Models
The Role of Patterns in the Era of Large Language Models
Yunyao Li91 views
Future of AR - Facebook Presentation by Rob McCarty
Future of AR - Facebook PresentationFuture of AR - Facebook Presentation
Future of AR - Facebook Presentation
Rob McCarty65 views
DRaaS using Snapshot copy and destination selection (DRaaS) - Alexandre Matti... by ShapeBlue
DRaaS using Snapshot copy and destination selection (DRaaS) - Alexandre Matti...DRaaS using Snapshot copy and destination selection (DRaaS) - Alexandre Matti...
DRaaS using Snapshot copy and destination selection (DRaaS) - Alexandre Matti...
ShapeBlue141 views

Learning the basics of Apache NiFi for iot OSS Europe 2020

  • 1. Learning the Basics of Apache NiFi for IoT Timothy Spann Principal DataFlow Field Engineer Cloudera #ossummit @PaasDev
  • 2. #ossummit #lfelc Speaker - Timothy Spann Principal DataFlow Field Engineer @PaasDev DZone Zone Leader and Big Data MVB Princeton NJ Future of Data Meetup https://github.com/tspannhw https://www.datainmotion.dev/
  • 3. #ossummit #lfelc Future of Data - Princeton (Global via YouTube) @PaasDev https://www.meetup.com/futureofdata-princeton/ From Big Data to AI to Streaming to Containers to Cloud to Analytics to Cloud Storage to Fast Data to Machine Learning to Microservices to ...
  • 4. #ossummit #lfelc BASICS of APACHE NIFI • What Is NiFi: A general overview of capabilities • Grand Tour: Navigating the Apache NiFi canvas • Example IoT Flows: Examples of processing IoT data from edge to consumption
  • 6. #ossummit #lfelc STORAGE LAYER sensors IoT REFERENCE ARCHITECTURE Apache NiFi Apache Kafka DATA SYNDICATION SERVICE BY KAFKA Kafka Topic iot DATA FLOW APPS POWERED BY NIFI Apache Impala Deep Learning & Machine Learning MODEL EXECUTION REST
  • 7. #ossummit #lfelc End to End Logs Pipeline Routers Databases Firewalls Logs Logs Errors Aggregates Alerts Other data ETL Analytics Enterprise Analysis Real Time Analytics Complexity Reduction Events
  • 8. #ossummit #lfelc Apache Hue VISUALIZATION SQL and Query Editor & Performance Diagnostics Tool for the Cloudera Data Platform
  • 11. #ossummit #lfelc Apache NiFi High Level Capabilities • Scale horizontally and vertically • Scale your data flow to millions of events per second • Ingest TB to PB of data per day • Adapt to your flow requirements • Back pressure & Dynamic prioritization • Loss tolerant vs guaranteed delivery • Low latency vs high throughput • Secure • SSL, HTTPS, SFTP, etc. • Governance and data provenance • Extensible • Build your own processors and Controller services (providers) • Integrate with external systems (Security, Monitoring, Governance, etc.)
  • 12. #ossummit #lfelc FLOW FILES ARE LIKE HTTP DATA HTTP Data FlowFile HTTP/1.1 200 OK Date: Sun, 10 Oct 2010 23:26:07 GMT Server: Apache/2.2.8 (CentOS) OpenSSL/0.9.8g Last-Modified: Sun, 26 Sep 2010 22:04:35 GMT ETag: "45b6-834-49130cc1182c0" Accept-Ranges: bytes Content-Length: 13 Connection: close Content-Type: text/html Hello world! Standard FlowFile Attributes Key: 'entryDate' Value: 'Fri Jun 17 17:15:04 EDT 2016' Key: 'lineageStartDate' Value: 'Fri Jun 17 17:15:04 EDT 2016' Key: 'fileSize' Value: '23609' FlowFile Attribute Map Content Key: 'filename' Value: '15650246997242' Key: 'path' Value: './' Binary Content * Header Content
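
The HTTP analogy on this slide can be sketched in a few lines of Python. This is only an illustration, not NiFi's implementation: `FlowFileSketch`, `from_http_response`, and the `http.status.line` attribute name are hypothetical names introduced here. The idea it shows is that headers map to a key/value attribute map while the body stays opaque binary content.

```python
from dataclasses import dataclass, field


@dataclass
class FlowFileSketch:
    """Illustrative stand-in for a FlowFile: attribute map plus opaque content."""
    attributes: dict = field(default_factory=dict)  # plays the role of HTTP headers
    content: bytes = b""                            # plays the role of the HTTP body


def from_http_response(raw: bytes) -> FlowFileSketch:
    """Split a raw HTTP response into header-like attributes and body content."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    attributes = {"http.status.line": lines[0]}  # hypothetical attribute name
    for line in lines[1:]:
        key, _, value = line.partition(":")
        attributes[key.strip().lower()] = value.strip()
    # fileSize mirrors the standard attribute shown on the slide
    attributes["fileSize"] = str(len(body))
    return FlowFileSketch(attributes=attributes, content=body)


raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"Hello world!\n"
)
ff = from_http_response(raw)
print(ff.attributes["content-type"], ff.attributes["fileSize"], ff.content)
```

Routing decisions can then be made on the attribute map alone, without reading the content bytes.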
  • 13. #ossummit #lfelc Apache NiFi Enable easy ingestion, routing, management and delivery of any data anywhere (Edge, cloud, data center) to any downstream system with built-in end-to-end security and provenance • Over 300 Prebuilt Processors • Easy to build your own • Parse, Enrich & Apply Schema • Filter, Split, Merge & Route • Throttle & Backpressure • Guaranteed Delivery • Full data provenance • Eco-system integration Advanced tooling to industrialize flow development (Flow Development Life Cycle) FTP SFTP HL7 UDP XML HTTP EMAIL HTML IMAGE SYSLOG FTP SFTP HL7 UDP XML HTTP EMAIL HTML IMAGE SYSLOG HASH MERGE EXTRACT DUPLICATE SPLIT ROUTE TEXT ROUTE CONTENT ROUTE CONTEXT CONTROL RATE DISTRIBUTE LOAD GEOENRICH SCAN REPLACE TRANSLATE CONVERT ENCRYPT TALL EVALUATE EXECUTE
  • 15. #ossummit #lfelc Prioritization • Configure a prioritizer per connection • Determine what is important for your data – time based, arrival order, importance of a data set • Funnel many connections down to a single connection to prioritize across data sets • Develop your own prioritizer if needed
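
A per-connection prioritizer can be pictured as a priority queue. The sketch below is a simplified model, not NiFi code: it assumes each queued item carries a `priority` attribute where a lower number means "process sooner", and it breaks ties by arrival order. The names `enqueue` and `dequeue` are hypothetical.

```python
import heapq
import itertools

_arrival = itertools.count()  # monotonically increasing arrival sequence


def enqueue(queue, flowfile):
    """Queue a flowfile (a plain dict here) by its 'priority' attribute."""
    # Items without a priority attribute sort last (an assumption of this sketch).
    priority = int(flowfile.get("priority", 1_000_000))
    heapq.heappush(queue, (priority, next(_arrival), flowfile))


def dequeue(queue):
    """Return the most important flowfile; ties go to the earliest arrival."""
    return heapq.heappop(queue)[2]


connection = []
enqueue(connection, {"filename": "bulk-logs", "priority": "9"})
enqueue(connection, {"filename": "sensor-alert", "priority": "1"})
enqueue(connection, {"filename": "telemetry"})
enqueue(connection, {"filename": "second-alert", "priority": "1"})

order = [dequeue(connection)["filename"] for _ in range(4)]
print(order)
```

Funnelling several connections into one queue like this is what lets a single prioritizer rank items across data sets.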
  • 16. #ossummit #lfelc SQL BASED ROUTING WITH NiFi's QueryRecord Processor • QueryRecord Processor: Executes a SQL statement against records and writes the results to the flow file content. • CSVReader: Looks up the schema from the Schema Registry (SR) and converts CSV records into ProcessRecords • SQL execution via Apache Calcite: Executes the configured SQL against the ProcessRecords for routing • CSVRecordSetWriter: Converts the result of the query from ProcessRecords back into CSV for the flow file content • Do routing (e.g. routing geo and speed streams) using standard SQL as opposed to complex regular expressions.
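
The read, query, write pattern described on this slide can be imitated outside NiFi. The sketch below is an analogy only: it uses Python's `sqlite3` instead of Apache Calcite, and the sample columns (`device_id`, `speed`, `latitude`, `longitude`), the route names, and the threshold of 65 are all made up for illustration. Each named SQL statement plays the role of one outgoing route.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for the CSV content of one flow file.
CSV_TEXT = """device_id,speed,latitude,longitude
truck-1,72,40.35,-74.65
truck-2,48,40.36,-74.66
truck-3,95,40.37,-74.60
"""

# One SQL statement per route; the table is named FLOWFILE as in QueryRecord.
ROUTES = {
    "speeding": "SELECT device_id, speed FROM FLOWFILE WHERE speed > 65",
    "geo": "SELECT device_id, latitude, longitude FROM FLOWFILE",
}


def route_with_sql(csv_text, routes):
    """Read CSV records, run each route's SQL, and write each result as CSV."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))  # the "reader" step
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE FLOWFILE "
        "(device_id TEXT, speed INTEGER, latitude REAL, longitude REAL)"
    )
    conn.executemany(
        "INSERT INTO FLOWFILE VALUES (:device_id, :speed, :latitude, :longitude)",
        rows,
    )
    results = {}
    for name, sql in routes.items():  # the "query" step
        cursor = conn.execute(sql)
        out = io.StringIO()
        writer = csv.writer(out, lineterminator="\n")  # the "writer" step
        writer.writerow([col[0] for col in cursor.description])
        writer.writerows(cursor.fetchall())
        results[name] = out.getvalue()
    conn.close()
    return results


routed = route_with_sql(CSV_TEXT, ROUTES)
print(routed["speeding"])
```

The same input produces one output per route, so a record can appear in both the speed stream and the geo stream.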
  • 17. #ossummit #lfelc STATELESS ENGINE • Granular containers per flow • Flows From NiFi Registry https://www.datainmotion.dev/2019/11/exploring-apache-nifi-110-parameters.html bin/nifi.sh stateless RunFromRegistry Continuous --file kafka.json https://github.com/apache/nifi/blob/ea1becac4fc519c54b8b4d21773e68f8da364755/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-stateless/README.md
  • 18. #ossummit #lfelc STATELESS ENGINE • See also Parameters • Docker • YARN • Kubernetes (K8) • Stateful NiFi clusters • Apache OpenWhisk (FaaS) https://www.datainmotion.dev/2019/11/exploring-apache-nifi-110-parameters.html {"registryUrl": "http://tspann-mbp15-hw14277:18080", "bucketId": "140b30f0-5a47-4747-9021-19d4fde7f993", "flowId": "0540e1fd-c7ca-46fb-9296-e37632021945", "ssl": { "keystoreFile": "","keystorePass": "","keyPass": "","keystoreType": "", "truststoreFile": "/Library/Java/JavaVirtualMachines/amazon-corretto-11.jdk/Contents/Home/lib/security/cacerts", "truststorePass": "changeit", "truststoreType": "JKS" }, "parameters": { "broker" : "4.317.852.100:9092", "topic" : "iot", "group_id" : "nifi-stateless-kafka-consumer", "DestinationDirectory" : "/tmp/nifistateless/output2/", "output_dir": "/Users/tspann/Documents/nifi-1.10.0-SNAPSHOT/logs/output" } } https://github.com/tspannhw/stateless-examples
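
Before handing a file like `kafka.json` to the stateless runner, it can help to sanity-check it. The sketch below is a small pre-flight check, not part of NiFi: the key names come from the JSON on this slide, but treating `registryUrl`, `bucketId`, and `flowId` as mandatory is an assumption of this example, and the sample values are placeholders.

```python
import json

# Assumption: these three keys identify the flow in NiFi Registry.
REQUIRED_KEYS = ("registryUrl", "bucketId", "flowId")


def missing_keys(config_text):
    """Return the required keys that are absent or empty in the config JSON."""
    config = json.loads(config_text)
    return [key for key in REQUIRED_KEYS if not config.get(key)]


# Placeholder config with hypothetical values; flowId is deliberately left out.
sample = json.dumps({
    "registryUrl": "http://localhost:18080",
    "bucketId": "example-bucket-id",
    "parameters": {"topic": "iot"},
})

print(missing_keys(sample))
```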
  • 19. #ossummit #lfelc PARAMETER CONTEXT • Parameters • Parameter Context https://www.datainmotion.dev/2019/11/exploring-apache-nifi-110-parameters.html
  • 20. #ossummit #lfelc PARAMETERS • Parameters • Parameter Context https://www.datainmotion.dev/2019/11/exploring-apache-nifi-110-parameters.html
  • 21. #ossummit #lfelc RETRYFLOWFILE • Configurable Retries • Maximum # • Penalties • When to Fail • Reuse Mode https://medium.com/@abdelkrim.hadjidj/apache-nifi-1-10-series-simplifying-error-handling-7de86f130acd
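
The retry bookkeeping listed on this slide can be modelled as a counter stored in an attribute. The sketch below is a simplified model rather than the processor's actual code: the attribute name `flowfile.retries`, the limit of 3, and the outcome names `retry`, `retries_exceeded`, and `failure` are assumptions chosen for illustration, and penalties and reuse modes are left out.

```python
def retry_decision(attributes, max_retries=3, retry_attribute="flowfile.retries"):
    """Return (outcome, updated_attributes) for one pass through the retry step."""
    updated = dict(attributes)
    raw = updated.get(retry_attribute, "0")
    if not raw.isdigit():
        # "When to Fail": a non-numeric counter cannot be trusted.
        return "failure", updated
    count = int(raw) + 1
    if count > max_retries:
        # Limit reached: drop the counter and stop retrying.
        updated.pop(retry_attribute, None)
        return "retries_exceeded", updated
    updated[retry_attribute] = str(count)
    return "retry", updated


attrs = {"filename": "reading.json"}
outcomes = []
for _ in range(4):
    outcome, attrs = retry_decision(attrs)
    outcomes.append(outcome)
print(outcomes)
```

Keeping the counter on the item itself means the loop back to the failing step needs no external state.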
  • 22. #ossummit #lfelc BACKPRESSURE PREDICTION OrdinaryLeastSquares SimpleRegression Enable analytics feature http://lonnifi.blogspot.com/2019/11/back-pressure-prediction-deep-dive.html?es_id=5233333939 https://youtu.be/Tt8TSlHu7PE
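
The ordinary least squares idea named on this slide can be sketched in plain Python: fit a straight line to recent queue sizes and extrapolate to the back pressure threshold. This is an illustration of the technique only, not NiFi's analytics code. The function names, the sample observations, and the threshold of 10,000 queued objects are assumptions of this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def seconds_until_threshold(times, counts, threshold):
    """Estimate seconds from the last observation until the queue hits threshold.

    Returns None when the queue is flat or draining, since the fitted line
    never reaches the threshold in that case.
    """
    slope, intercept = fit_line(times, counts)
    if slope <= 0:
        return None
    time_at_threshold = (threshold - intercept) / slope
    return max(0.0, time_at_threshold - times[-1])


# Hypothetical observations: queue size sampled once a minute.
times = [0, 60, 120, 180]            # seconds
counts = [1000, 2500, 4000, 5500]    # queued objects
eta = seconds_until_threshold(times, counts, threshold=10_000)
print(eta)
```

A linear fit is the simplest possible model; it will mislead on bursty queues, which is one reason to treat such predictions as estimates.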
  • 23. #ossummit #lfelc PARQUET READER AND WRITER • Native Record Processors for Apache Parquet Files! • CSV <-> Parquet • XML <-> Parquet • AVRO <-> Parquet • JSON <-> Parquet • More... https://www.datainmotion.dev/2019/11/exploring-apache-nifi-110-parameters.html https://www.datainmotion.dev/2019/10/migrating-apache-flume-flows-to-apache_7.html
  • 24. #ossummit #lfelc MANY OTHER FEATURES • Prometheus Reporting Task • Experimental Encrypted content repository • PublishKafka Partition Support • Toolkit module to generate and build Swagger • GeoEnrichIPRecord Processor • Command Line Diagnostics • RocksDB FlowFile Repository • PutBigQueryStreaming Processor • Enhanced DevOps and CD/CI ELT/ETL Lookup Services • DatabaseRecordLookupService • KuduLookupService • HBase_2_ListLookupService
  • 25. #ossummit #lfelc Scalable and distributed architecture
  • 27. #ossummit #lfelc Example of NiFi Transformations Data enrichment Enrich events by adding the classification based on the host Use reference lookup table from a CSV file [ { "time" : "7845800765", "host" : "web-...", "sourcetype" : "cpu_resource_usage", "source" : "...", "index" : "_metrics", "meta" : "...", "event" : "..."}}", "classification" : internal }, ... [ { "time" : "7845800765", "host" : "web-...", "sourcetype" : "cpu_resource_usage", "source" : "...", "index" : "_metrics", "meta" : "...", "event" : "..."}}", "classification" : null }, ...
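
The enrichment step on this slide amounts to a keyed lookup against a reference table. The sketch below shows that logic in plain Python as an illustration; it is not a NiFi lookup service. The host names `web-01` and `web-02`, the lookup rows, and the function names are hypothetical, and hosts missing from the table get a `None` classification, matching the `null` case on the slide.

```python
import csv
import io

# Hypothetical reference table mapping host to classification.
LOOKUP_CSV = """host,classification
web-01,internal
web-02,external
"""


def load_lookup(csv_text):
    """Build a host -> classification dictionary from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["host"]: row["classification"] for row in reader}


def enrich(events, lookup):
    """Return copies of the events with a 'classification' field added."""
    return [{**event, "classification": lookup.get(event["host"])} for event in events]


events = [
    {"time": "7845800765", "host": "web-01", "sourcetype": "cpu_resource_usage"},
    {"time": "7845800765", "host": "db-07", "sourcetype": "cpu_resource_usage"},
]
enriched = enrich(events, load_lookup(LOOKUP_CSV))
print([e["classification"] for e in enriched])
```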
  • 28. #ossummit #lfelc INGEST RDBMS TABLES https://community.cloudera.com/t5/Community-Articles/Incrementally-Streaming-RDBMS-Data-to-Your-Hadoop-DataLake/ta-p/247927 https://community.cloudera.com/t5/Community-Articles/Ingesting-Golden-Gate-Records-From-Apache-Kafka-and/ta-p/247557 https://community.cloudera.com/t5/Community-Articles/Incremental-Fetch-in-NiFi-with-QueryDatabaseTable/ta-p/247073
  • 30. #ossummit #lfelc IoT Reference Architecture STORAGE LAYER sensors Apache NiFi Apache Kafka DATA SYNDICATION SERVICE BY KAFKA Kafka Topic iot DATA FLOW APPS POWERED BY NIFI Apache Impala Deep Learning & Machine Learning MODEL EXECUTION REST
  • 31. #ossummit #lfelc Best Practices https://www.datainmotion.dev/2020/06/no-more-spaghetti-flows.html ● Reduce, Reuse, Recycle. Use Parameters to reuse common modules. ● Put flows, reusable chunks into separate Process Groups. ● Write custom processors if you need new or specialized features ● Use Cloudera supported NiFi Processors ● Use Record Processors everywhere
  • 32. #ossummit #lfelc Cloudera Communities Got questions? Leverage community.cloudera.com Join our meetup: www.meetup/pro/futureofdata