My talk at yesterday's AWS User Group meetup in Berlin introduced the audience to the concepts and features of Apache NiFi and to its capabilities for integrating with AWS IoT.
2. Agenda
1. Why Apache NiFi?
2. That's Apache NiFi - Exploring the UI
3. AWS Integration Capabilities
4. AWS IoT – Basics (Recap)
5. Apache NiFi and AWS IoT
Apache NiFi & AWS | Kay Lerch
3. Why Apache NiFi
A brief overview of data processing and analysis
4. A brief overview of data processing and analysis
Stone age: no tooling at all
Diagram components: data producers (IoAT), data consumers. Annotations: potential bottleneck, integration challenges, unreliable delivery, left alone with analytic challenges.
5. A brief overview of data processing and analysis
Bronze age: invent the wheel (event broker) for reliable (message) transportation
Diagram components: data producers, event broker, data consumers. Annotations: limited durability, hidden complexities, left alone with analytic challenges.
6. A brief overview of data processing and analysis
Industrialization: stores for massive production of durable yet unstructured information
Diagram components: data producers, event broker, (big) data stores, data consumers. Annotations: hidden complexities, ingestion challenges, realtime lag, data processing challenges, data analysis challenges, data security challenges, left alone with analytic challenges.
7. A brief overview of data processing and analysis
Digital age: realtime processing and analysis of (streaming) data
Diagram components: data producers, event broker, (big) data stores, (realtime) data processing & analytics, data consumers. Annotations: hidden complexities, ingestion challenges, data analysis challenges, integration challenges, data security challenges.
8. A brief overview of data processing and analysis
That's quite a lot of tooling and technology …
Diagram components: data producers, event broker, (big) data stores, (realtime) data processing & analytics, data consumers. Annotations: hidden complexities, ingestion challenges, data analysis challenges, integration challenges, data security challenges.
9. A brief overview of data processing and analysis
That's quite a lot of tooling and technology …
Diagram components: data producers, event broker, (big) data stores, (realtime) data processing & analytics, data consumers. Annotations: hidden complexities, ingestion challenges, data analysis challenges, integration challenges, data security challenges.
If you want …
a (realtime) big picture of your dataflows
a way to trace the lineage of each data element
the flexibility to change things on the fly
to prioritize data
to overcome the challenges of integrating a variety of technologies with one overarching solution
to enforce security and compliance along dataflows
to rely on extensibility driven by the open-source community
to satisfy all those needs and keep your tools
to get rid of only those tools that are focused on moving data, without making concessions on overall performance
… then you might love:
11. That's Apache NiFi
in one page
Apache NiFi supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic.
Web-based user interface: seamless experience between design, control, feedback, and monitoring
Highly configurable: loss tolerant vs. guaranteed delivery, low latency vs. high throughput, dynamic prioritization, flows can be modified at runtime, back pressure
Data provenance: track dataflow from beginning to end
Designed for extension: build your own processors and more; enables rapid development and effective testing
Secure: SSL, SSH, HTTPS, encrypted content, etc.; pluggable role-based authentication/authorization
Source: https://nifi.apache.org/
12. That's Apache NiFi
Look and feel
Go to NiFi's interface and understand:
Processors
Templates
Concept of back pressure
Concept of data prioritization
Provenance graph
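The two queueing concepts from the demo, back pressure and data prioritization, can be pictured with a toy model. This is only a sketch in plain Python, not NiFi's API: the `Connection` class, its `object_threshold` parameter and the numeric priorities are hypothetical names chosen for illustration. In NiFi itself the upstream processor is simply no longer scheduled while a connection's threshold is exceeded; here a `False` return value stands in for that behaviour.

```python
import heapq
import itertools


class Connection:
    """Toy model of a queue between two processing steps (not NiFi's API)."""

    def __init__(self, object_threshold):
        self.object_threshold = object_threshold  # hypothetical back pressure limit
        self._heap = []
        self._order = itertools.count()  # tie-breaker: FIFO within one priority

    def is_full(self):
        return len(self._heap) >= self.object_threshold

    def offer(self, item, priority):
        """Enqueue unless the threshold is reached; False signals back pressure."""
        if self.is_full():
            return False
        heapq.heappush(self._heap, (priority, next(self._order), item))
        return True

    def poll(self):
        """Return the queued item with the lowest priority number, or None."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


conn = Connection(object_threshold=2)
accepted = [
    conn.offer("bulk log", priority=5),
    conn.offer("alarm", priority=1),
    conn.offer("another log", priority=5),  # queue is full: back pressure
]
first_out = conn.poll()  # the alarm overtakes the earlier bulk log
```

The producer is expected to retry later when `offer` returns `False`, which is the essence of back pressure: a slow consumer throttles its upstream instead of being flooded.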
13. NiFi Clustered Architecture
Diagram: a NiFi Cluster Manager (NCM), a primary node and a slave node, each running in its own JVM. The NCM shows a web server (REST API, admin UI), a flow controller and the cluster manager. Each node shows a web server (REST API, admin UI), a flow controller with processors (processor 1, processor 2, an isolated processor) and controller services (1 to n), plus FlowFile, content and provenance repositories. Labelled interactions: heartbeat, report change, sync state, and leader election via an embedded Apache ZooKeeper.
17. AWS IoT
The Shadow
Diagram: a thing reports its state to AWS IoT over TLS 1.2, governed by a policy; the shadow mirrors that state. AWS services get the reported state from the shadow or set a desired state in it. The shadow propagates the desired state; the thing receives it and fulfills it. A rule subscribes to particular messages and routes them to some AWS resource.
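The relationship between reported and desired state can be illustrated with a small sketch. It assumes the shadow document layout described in the AWS IoT documentation (a `state` object with `desired` and `reported` sections); the function name `shadow_delta` and the attributes `led` and `interval` are hypothetical. The function computes which desired attributes the thing has not yet fulfilled, which is roughly the information the shadow propagates to the thing.

```python
def shadow_delta(desired, reported):
    """Return the desired attributes whose values differ from the reported ones."""
    delta = {}
    for key, wanted in desired.items():
        if key not in reported:
            delta[key] = wanted
        elif isinstance(wanted, dict) and isinstance(reported[key], dict):
            nested = shadow_delta(wanted, reported[key])  # compare nested objects
            if nested:
                delta[key] = nested
        elif wanted != reported[key]:
            delta[key] = wanted
    return delta


# Hypothetical shadow document of a lamp-like thing.
shadow = {
    "state": {
        "desired": {"led": "on", "interval": 30},
        "reported": {"led": "off", "interval": 30},
    }
}
delta = shadow_delta(shadow["state"]["desired"], shadow["state"]["reported"])
```

Once the thing has switched the LED on and reported that state, the same call returns an empty dictionary: the desired state is fulfilled.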
18. AWS IoT
MQTT topics
Diagram: messages exchanged between a thing and its shadow in AWS IoT, by topic:
get: request state
get/accepted: get shadow state
get/rejected: get error
update: update state
update/accepted: confirmation
update/rejected: get error
update/delta: changed state
Thing topics name pattern: $aws/things/thing_name/...
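A minimal sketch of how a client might assemble these topic names. The helper `shadow_topic` and the thing name `lamp` are hypothetical; the `shadow/` path segment is not spelled out on the slide and is taken here as an assumption from the AWS IoT documentation on reserved shadow topics.

```python
# Responses that may follow each shadow operation, as listed on the slide.
SHADOW_RESPONSES = {
    "get": ("accepted", "rejected"),
    "update": ("accepted", "rejected", "delta"),
}


def shadow_topic(thing_name, operation, response=None):
    """Build a shadow topic such as $aws/things/<thing>/shadow/update/delta."""
    if operation not in SHADOW_RESPONSES:
        raise ValueError(f"unknown shadow operation: {operation}")
    # Assumption: shadow topics live under the 'shadow/' segment.
    topic = f"$aws/things/{thing_name}/shadow/{operation}"
    if response is not None:
        if response not in SHADOW_RESPONSES[operation]:
            raise ValueError(f"'{response}' is not a response to '{operation}'")
        topic += f"/{response}"
    return topic


request_topic = shadow_topic("lamp", "get")
delta_topic = shadow_topic("lamp", "update", "delta")
```

Validating the operation and response up front keeps a typo from silently subscribing to a topic that never receives messages.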
19. Apache NiFi & AWS IoT
New processors
20. Apache NiFi & AWS IoT
Where NiFi comes in
If the managed services you want to integrate with your "things" run on AWS, you are good to go => thing rules
If not, you need either an MQTT client (=> live data) or an application that talks to the managed AWS API (for shadow data)
AWS announced MQTT over WebSockets in January 2016
Which means you are no longer limited to MQTT connections authenticated with TLS client certificates
Establish a durable connection to the AWS IoT endpoint
Then talk MQTT over WebSockets to subscribe or publish to the thing topics
AWS service limit on connection duration: 300 seconds
You need a way to reconnect your client in order to keep your MQTT subscriptions
NiFi processors have the potential to become MQTT clients
21. Apache NiFi & AWS IoT
GetIOTMqtt – an MQTT client
Diagram: the thing updates its state on the shadow's update topic in AWS IoT. GetIOTMqtt (1) establishes a connection, (2) subscribes and (3) receives the state, which it emits as a flow file.
22. Apache NiFi & AWS IoT
GetIOTMqtt – Reconnect accordingly
First of all: I don't want to wait for the auto-termination; I want to act ahead of it
AWS IoT does not support persistent client sessions
Therefore:
If I disconnect and then reconnect, there is a short gap in which I may miss a message
If I reconnect and then disconnect, there is a short overlap in which I may receive messages twice
Fortunately, the client has already accepted one of these effects anyway through its choice of quality-of-service level
If a subscription is requested with QoS=0 ("at most once" message delivery)
=> disconnect, then reconnect
=> maybe message loss
=> that's fine
If a subscription is requested with QoS=1 ("at least once" message delivery)
=> reconnect, then disconnect
=> maybe duplicate messages
=> that's fine
QoS=2 ("exactly once" message delivery) is not supported by AWS IoT
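Diagram: two session timelines. Close, then connect: each session is closed before the next one connects, leaving a gap with potential message loss at every handover. Connect, then close: each new session (1, 2, 3) connects before the previous one is closed, creating an overlap with potential duplicates at every handover.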
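The reconnect rule above can be sketched in a few lines. This is an illustration only, not the code of the GetIOTMqtt processor: the function names, the step labels and the `safety_margin` of 30 seconds are hypothetical, while the 300-second limit and the QoS rules come from the slides.

```python
def reconnect_steps(qos):
    """Order of operations when renewing the session for a given QoS level."""
    if qos == 0:
        # "At most once": a short gap (possible message loss) is acceptable.
        return ["close old session", "connect new session"]
    if qos == 1:
        # "At least once": a short overlap (possible duplicates) is acceptable.
        return ["connect new session", "close old session"]
    raise ValueError("AWS IoT supports only QoS 0 and QoS 1")


def seconds_until_reconnect(connection_limit=300, safety_margin=30):
    """Renew the session ahead of the service limit instead of waiting for it."""
    if not 0 <= safety_margin < connection_limit:
        raise ValueError("safety_margin must be in [0, connection_limit)")
    return connection_limit - safety_margin


plan_qos0 = reconnect_steps(0)
plan_qos1 = reconnect_steps(1)
renew_after = seconds_until_reconnect()
```

A scheduler would run the chosen steps every `renew_after` seconds, so the client acts before the service terminates the connection.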
23. Apache NiFi & AWS IoT
GetIOTMqtt – Configuration
24. Apache NiFi & AWS IoT
GetIOTMqtt – Live demo
25. Apache NiFi & AWS IoT
GetIOTMqtt – Live demo
26. Apache NiFi & AWS IoT
PutIOTMqtt – instruct a "thing" (but bypass the shadow)
Diagram: PutIOTMqtt receives a flow file, (1) establishes a connection to AWS IoT and (2) publishes the state to the update/delta topic, which updates the state of the thing without going through the shadow. Flow files enter and leave the processor.
27. Apache NiFi & AWS IoT
PutIOTMqtt – Configuration
28. Apache NiFi & AWS IoT
PutIOTMqtt – Live demo
29. Apache NiFi & AWS IoT
GetIOTShadow – constantly check last reported state
Diagram: the thing reports its state to the shadow via the update topic in AWS IoT. GetIOTShadow requests the shadow. Flow files enter and leave the processor.
30. Apache NiFi & AWS IoT
GetIOTShadow – Configuration
31. Apache NiFi & AWS IoT
PutIOTShadow – instruct a "thing" over its shadow
Diagram: PutIOTShadow updates the shadow in AWS IoT with a desired state, which reaches the thing via the update/delta topic. Flow files enter and leave the processor.
32. Apache NiFi & AWS IoT
PutIOTShadow – Configuration
33. More to come
MiNiFi (lightweight agents as data collectors)
Variable registry
Improvements to HA / cluster management
Multi-tenancy
More processors
Extension registry (choose NARs from a central repository)
34. www.immobilienscout24.de
Thanks for your attention. Any questions?
Contact:
Immobilien Scout GmbH
Andreasstraße 10
10243 Berlin
Kay Lerch
Phone +49 30 24 301-1149
kay.lerch@immobilienscout24.de
Editor's Notes
Event brokers: Kafka, RabbitMQ, SNS, TIBCO
(Big) data stores: HBase, HDFS, Cassandra, S3, DynamoDB, Redshift, RDS