FIWARE Global Summit - Real-time Processing of Historic Context Information using Apache Flink
1. FIWARE Cosmos
Real-time Processing of Historic Context Information using Apache Flink
Sonsoles López (slopez@dit.upm.es)
Andres Muñoz (jamunoz@dit.upm.es)
Joaquin Salvachua (jsalvachua@dit.upm.es)
Universidad Politécnica de Madrid
@sonsoleslp, @jsalvachua, @FIWARE
2. What is Cosmos?
The Cosmos Generic Enabler simplifies Big Data analysis of context data
and integrates with some of the many popular Big Data platforms.
3. Old Cosmos Platform
Features
✔ Batch Processing (only)
✔ HDFS for file storage
✔ MapReduce Jobs (only)
χ NO direct connection with Orion
χ NO direct ingestion of data
4. New Cosmos Platform
Features
✔ Batch Processing
✔ Stream Processing (Real-time)
✔ Direct data ingestion
✔ Direct connection with Orion
✔ Multiple Sinks
[Architecture diagram: the Orion Context Broker feeds data directly into COSMOS, which writes to multiple sinks (DB, HDFS, web services); Orion interfaces with the Internet of Things (IoT), robots and third-party systems]
8. OrionSource
Receives data from the Orion Context Broker on a given port.
The received data is a stream of NgsiEvent objects:
val eventStream = env.addSource(new OrionSource(9001))
9. OrionSink
Sends data back to the Orion Context Broker.
Takes a stream of OrionSinkObjects as input, where each object has:
● content: message content as a String (JSON must be stringified)
● url: URL to which the message should be sent
● contentType: HTTP content type of the message (JSON, Plain)
● method: HTTP method of the message (POST, PUT, PATCH)
OrionSink.addSink( processedDataStream )
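A sketch of building that stream before handing it to the sink, assuming a processed stream of `Temp_Node(id, temperature)` records; the entity URL and payload shape are illustrative, not prescribed by the connector:

```scala
import org.fiware.cosmos.orion.flink.connector.{ContentType, HTTPMethod, OrionSink, OrionSinkObject}

// Wrap each processed record in an OrionSinkObject addressed to its entity
val sinkStream = processedDataStream.map(node =>
  new OrionSinkObject(
    s"""{ "temperature_avg": { "value": ${node.temperature}, "type": "Float" } }""",
    s"http://localhost:1026/v2/entities/${node.id}/attrs", // Orion NGSIv2 endpoint (assumed host/port)
    ContentType.JSON,
    HTTPMethod.POST))

OrionSink.addSink(sinkStream)
```

Each window result is thus written back as an attribute update on the originating entity.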
11. def main(args: Array[String]): Unit = {
val env = StreamExecutionEnvironment.getExecutionEnvironment
// Create Orion Source. Receive notifications on port 9001
val eventStream = env.addSource(new OrionSource(9001))
// Process event stream
val processedDataStream = eventStream
.flatMap(event => event.entities)
.map(entity => {
val temp = entity.attrs("temperature").value.asInstanceOf[Number].floatValue()
new Temp_Node( entity.id, temp)
})
.keyBy("id")
.timeWindow(Time.seconds(10))
.aggregate(new Average)
// print the results with a single thread, rather than in parallel
processedDataStream.print().setParallelism(1)
env.execute("Socket Window NgsiEvent")
}
Demo: Average temperature for each entity
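The demo aggregates with `new Average`, which the deck does not show. A minimal sketch of what it could look like using Flink's `AggregateFunction`, assuming `Temp_Node(id, temperature)` is a case class; the accumulator layout is an assumption:

```scala
import org.apache.flink.api.common.functions.AggregateFunction

case class Temp_Node(id: String, temperature: Float)

// Accumulator: (entity id, running sum of temperatures, count of readings)
class Average extends AggregateFunction[Temp_Node, (String, Float, Int), Temp_Node] {
  override def createAccumulator(): (String, Float, Int) = ("", 0f, 0)

  // Fold one reading into the accumulator; the stream is keyed by id,
  // so every reading in a window carries the same entity id
  override def add(value: Temp_Node, acc: (String, Float, Int)): (String, Float, Int) =
    (value.id, acc._2 + value.temperature, acc._3 + 1)

  // Emit the average for the window as another Temp_Node
  override def getResult(acc: (String, Float, Int)): Temp_Node =
    Temp_Node(acc._1, acc._2 / acc._3)

  // Combine partial accumulators (required for session/merging windows)
  override def merge(a: (String, Float, Int), b: (String, Float, Int)): (String, Float, Int) =
    (if (a._1.nonEmpty) a._1 else b._1, a._2 + b._2, a._3 + b._3)
}
```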
12. def main(args: Array[String]): Unit = {
val env = StreamExecutionEnvironment.getExecutionEnvironment
// Create Orion Source. Receive notifications on port 9001
val eventStream = env.addSource(new OrionSource(9001))
// Process event stream
val processedDataStream = eventStream
.flatMap(event => event.entities)
.map(entity => {
val temp = entity.attrs("temperature").value.asInstanceOf[Number].floatValue()
new Temp_Node( entity.id, temp)
})
.timeWindowAll(Time.seconds(10))
.max("temperature")
// print the results with a single thread, rather than in parallel
processedDataStream.print().setParallelism(1)
env.execute("Socket Window NgsiEvent")
}
Demo: Maximum temperature overall
14. Data Usage Control
[Architecture diagram: a Data Provider defines access/usage control policies (ODRL) through a PDP/PAP (IDM Keyrock); a PXP/PDP enforces the policy rules on stored, "real-time" and shared data as it moves through storage systems and processing engines; usage control makes ongoing decisions from data-processing engine traces on the Data Consumer side]
https://github.com/ging/fiware-usage-control
15. Roadmap
● Short term
ー Connector for Spark and examples (alpha version: finishing up!)
ー Step-by-step tutorials for both Flink and Spark
● Medium term
ー Support for NGSI-LD
ー Custom Docker images
● Long term
ー Apache Atlas and Ranger