BLOCK-BASED REAL-TIME BIG-
DATA PROCESSING FOR SMART
CITIES
Introducing the
ALMANAC Data
Fusion Language
Dario Bonino, Rizzo F., Pastrone C., Carvajal Soto J.A., Ahlsen
M., Axling M.
AGENDA
 Background
 Goal statement
 Basic Principles
 Block-based Data Fusion Language
 Templates
 Wildcard Template binding
 Preliminary Results
 Conclusions & Future Works
BACKGROUND
Context in which the
work is deployed and
basic assumptions
SMART CITIES
 Data intensive
 Huge number of sensors deployed
 (>>1000 sensors / devices)
 Moderately high sampling frequencies
 (depending on the application)
 Several actors involved
 City administrators
 Service providers
 Utilities
 Citizens
DATA HANDLING
Technology capable of
 Handling >> 1000 datapoints/s in real time
 E.g. Complex Event Processing (CEP)
 Storing and handling >> 100k datapoints*
 E.g. NoSQL and/or time-series databases
Requires programming by experts
select avg(cast(waterMeter.resultValue, double))
from Observation.win:time(600 sec).std:unique(datastream.id) as waterMeter
where (waterMeter.sensor.metadata like '%FlowSensor'
or waterMeter.resultType='Flo')
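To make the query above easier to read for someone who does not know the CEP language, the following Python sketch approximates what it computes: an average over the most recent value of each data stream, restricted to a 600-second sliding window. The class name and the sample stream ids are hypothetical, events are assumed to arrive in timestamp order, and the `where` filter on flow sensors is left out for brevity. It is not the engine's actual implementation.

```python
from collections import deque


class WindowedUniqueAverage:
    """Average of the latest value per stream over a sliding time window."""

    def __init__(self, window_seconds=600.0):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, stream_id, value), oldest first

    def push(self, timestamp, stream_id, value):
        """Add one observation and return the current average."""
        self.events.append((timestamp, stream_id, float(value)))
        # Drop observations that fell out of the time window.
        while self.events and self.events[0][0] <= timestamp - self.window_seconds:
            self.events.popleft()
        # Keep only the most recent value of each stream (later entries win).
        latest = {sid: val for _, sid, val in self.events}
        return sum(latest.values()) / len(latest)


avg = WindowedUniqueAverage(window_seconds=600)
avg.push(0, "meter-1", 10.0)
avg.push(60, "meter-2", 20.0)
result = avg.push(120, "meter-1", 30.0)  # meter-1 now contributes 30.0
```

Even this small sketch requires reasoning about windows and uniqueness, which is the kind of expertise the slide refers to.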
DATA HANDLING
However
 Knowledge about “relevant” processing is owned by non-experts
 Utility operators
 City administrators (low/mid level)
 It is often unfeasible/not scalable to
 Set up dedicated positions for CEP experts to address
Smart City processing needs
GOAL
Enable non-experts to
handle CEP in Smart
Cities
GOAL
Enable non-experts to
 Define Complex Event Processing queries with
NO knowledge of any CEP language
 And, to some extent, with no knowledge of the
underlying theory
 Effectively support Smart City deployments
 Huge number of data streams to process
 The same “kind” of query may be applied to
many (or all) city sensors
 e.g., data smoothing (moving average)
 A single query might involve a number of data streams
not known a priori
BASIC PRINCIPLES
Block-based CEP
BLOCK-BASED CEP (FOR NON-EXPERTS)
Based on previous work (SpChains)
 Processing block
 Single, parametrized query
 Cascading
 Allows simple blocks to be concatenated to obtain
complex queries
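The block and cascading ideas can be illustrated with a minimal Python sketch. It does not use the SpChains API: `moving_average`, `threshold` and `cascade` are hypothetical helpers, each block is modeled as a plain function with one parameter, and cascading simply feeds the output of one block into the next.

```python
from collections import deque


def moving_average(size):
    """Return a block that averages the last `size` samples."""
    window = deque(maxlen=size)

    def block(value):
        window.append(value)
        return sum(window) / len(window)

    return block


def threshold(limit):
    """Return a block that emits True while the input is above `limit`."""
    def block(value):
        return value > limit

    return block


def cascade(*blocks):
    """Concatenate blocks: the output of each one feeds the next."""
    def chain(value):
        for block in blocks:
            value = block(value)
        return value

    return chain


# Smooth a fill-level stream, then raise an alarm above 80 %.
chain = cascade(moving_average(3), threshold(80))
alarms = [chain(v) for v in [70, 85, 90, 95]]
```

The user only picks blocks and sets their parameters; the query logic stays hidden inside each block.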
BLOCK-BASED CEP (FOR NON-EXPERTS)
Pros
 Easy to understand
 Easy to compose complex
queries
 Almost no knowledge of CEP
needed
 Good performance
 > 20k events/s for complex (>3 stages)
queries
 Peak performance around 150k events/s
Cons
 Applies to named streams only
 Streams must be known a priori
 Streams must be “manually” selected
 Difficult to scale to smart city
scenarios
BLOCK-BASED DATA
FUSION LANGUAGE
Extend SpChains to
support Smart City
CEP
BLOCK-BASED DATA FUSION
LANGUAGE
Based on SpChains, introduces 2
extensions for smart city scenarios
 Template
 Special class of processing chain with “free” input /
output (source/drain)
 Wild-card template binding
 Flexible algorithm to bind data sources and
processing chain inputs
 Carried out when queries are deployed
 Based on data stream metadata
 Declarative description of “allowed streams”
TEMPLATE
Free parameters are filled in when
the template is instantiated
Can be seen as a way to define custom blocks
TEMPLATE
{
"id": "bad smell template",
"blocks": [
{
"id": "Th1_genid",
"function": "threshold",
"params": [
{"name": "threshold", "value": "80", "uom": "%"},
{"name": "mode", "value": "rising"}]
},
{
"id": "Th2_genid",
"function": "threshold",
"params": [
{"name": "threshold", "value": "35", "uom": "Celsius"},
{"name": "mode", "value": "rising"}]
},
{
"id": "And_genid",
"function": "and"
}
],
"connections": [
{
"from": {"blockId": "Th1_genid", "ioId": "out"},
"to": {"blockId": "And_genid", "ioId": "in1"}
},
{
"from": {"blockId": "Th2_genid", "ioId": "out"},
"to": {"blockId": "And_genid", "ioId": "in2"}
}],
"inputs": [
{"blockId": "Th1_genid", "port": "in", "ioId": "$inLevel"},
{"blockId": "Th2_genid", "port": "in", "ioId": "$inTemperature"}],
"outputs": [
{"blockId": "And_genid", "port": "out", "ioId": "$smell"}]
}
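As an illustration of how the free parameters of a template might be filled at instantiation time, here is a short Python sketch. It rests on two assumptions that the slides do not spell out: that the `_genid` suffix marks ids to be made unique per instance, and that a leading `$` marks a free parameter. The `instantiate` function, the shortened template and the stream id are hypothetical.

```python
import json

# Shortened, hypothetical template in the same style as the slide.
TEMPLATE = json.loads("""
{
  "blocks": [{"id": "Th1_genid", "function": "threshold"},
             {"id": "And_genid", "function": "and"}],
  "connections": [{"from": {"blockId": "Th1_genid", "ioId": "out"},
                   "to": {"blockId": "And_genid", "ioId": "in1"}}],
  "inputs": [{"blockId": "Th1_genid", "port": "in", "ioId": "$inLevel"}]
}
""")


def instantiate(node, instance_id, free_params):
    """Return a copy of `node` with placeholders replaced.

    Assumption: "_genid" marks ids that must be unique per instance, and
    "$name" marks a free parameter filled from `free_params`.
    """
    if isinstance(node, dict):
        return {k: instantiate(v, instance_id, free_params) for k, v in node.items()}
    if isinstance(node, list):
        return [instantiate(v, instance_id, free_params) for v in node]
    if isinstance(node, str):
        if node.startswith("$"):
            return free_params[node[1:]]
        if node.endswith("_genid"):
            return node[: -len("genid")] + instance_id
    return node


chain = instantiate(TEMPLATE, "bin42", {"inLevel": "bin42/fillLevel"})
```

Each call yields an independent chain, so one template can be reused for many sensors.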
WILD-CARD TEMPLATE
BINDING
Defines bindings for chain
inputs (templates) at
deployment time
Two versions
 Simple
 Single source queries
 Full
 Multiple source queries
Declarative definition in
the “bindings” section of a
chain specification
"bindings": [{
"fromSources": [{
"sourceType": "smartcity:WasteBin",
"dataStream": [
{
"streamType": "smartcity:Temperature",
"ioId": "inTemperature_genid"
},
{
"streamType": "smartcity:FillLevel",
"ioId": "inLevel_genid"}
]
}],
"toDrains": [
{
"drainId": "badsmell36754",
"ioId": "smell_genid"
}]
}]
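A single-source binding of this kind could be resolved roughly as in the Python sketch below. The source catalogue, its field names and the `bind_simple` function are hypothetical; the sketch only assumes that each source exposes its type and the types of its streams as metadata, and that a source matches when it offers every requested stream type.

```python
BINDING = {
    "sourceType": "smartcity:WasteBin",
    "dataStream": [
        {"streamType": "smartcity:Temperature", "ioId": "inTemperature"},
        {"streamType": "smartcity:FillLevel", "ioId": "inLevel"},
    ],
}

# Hypothetical catalogue of pre-defined sources and their stream metadata.
SOURCES = [
    {"id": "bin1", "type": "smartcity:WasteBin",
     "streams": {"smartcity:Temperature": "bin1/temp",
                 "smartcity:FillLevel": "bin1/level"}},
    {"id": "bin2", "type": "smartcity:WasteBin",
     "streams": {"smartcity:FillLevel": "bin2/level"}},
    {"id": "lamp1", "type": "smartcity:StreetLamp",
     "streams": {"smartcity:Temperature": "lamp1/temp"}},
]


def bind_simple(binding, sources):
    """Yield (source id, {ioId: stream id}) for every fully matching source."""
    for source in sources:
        if source["type"] != binding["sourceType"]:
            continue
        streams = source["streams"]
        wanted = binding["dataStream"]
        # Direct metadata match: every requested stream type must be present.
        if all(w["streamType"] in streams for w in wanted):
            yield source["id"], {w["ioId"]: streams[w["streamType"]] for w in wanted}


instances = list(bind_simple(BINDING, SOURCES))
```

Here only `bin1` yields a chain instance: `bin2` lacks a temperature stream and `lamp1` has the wrong source type.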
SIMPLE WILD-CARD
TEMPLATE BINDING
 Only relies on direct metadata match (stream
metadata)
 Assumes that all streams are available for
processing (sources are pre-defined)
 Applies to templates with one source only
FULL WILD-CARD TEMPLATE
BINDING
Works for templates involving multiple
sources
Selects the sources involved on the basis of
a set of constraints
 In our vision, such constraints will, for
example, be defined as SPARQL (or DL)
queries over the data streams' semantic
metadata
SELECT ?S1 ?S2 WHERE {
?S1 rdf:type smartcity:WasteBin .
?S2 rdf:type smartcity:TemperatureSensor .
?S1 geo:hasGeometry ?S1geo . ?S1geo ogc:asWKT ?S1WKT .
?S2 geo:hasGeometry ?S2geo . ?S2geo ogc:asWKT ?S2WKT .
FILTER (ogcf:distance(?S1WKT, ?S2WKT, my:Meter) < 5)}
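The intent of the constraint above (pair each waste bin with temperature sensors closer than 5 meters) can be sketched in plain Python, without a SPARQL engine. The catalogue, the `select_pairs` function and the use of planar x/y coordinates in meters are assumptions made for illustration only; a real deployment would evaluate the query over the semantic metadata.

```python
from math import hypot

# Hypothetical source catalogue; x/y are planar coordinates in meters.
SOURCES = [
    {"id": "bin1", "type": "smartcity:WasteBin", "x": 0.0, "y": 0.0},
    {"id": "bin2", "type": "smartcity:WasteBin", "x": 50.0, "y": 0.0},
    {"id": "temp1", "type": "smartcity:TemperatureSensor", "x": 3.0, "y": 3.0},
    {"id": "temp2", "type": "smartcity:TemperatureSensor", "x": 49.0, "y": 1.0},
]


def select_pairs(sources, type1, type2, max_distance):
    """Return (s1, s2) id pairs of the given types closer than max_distance."""
    firsts = [s for s in sources if s["type"] == type1]
    seconds = [s for s in sources if s["type"] == type2]
    return [
        (a["id"], b["id"])
        for a in firsts
        for b in seconds
        if hypot(a["x"] - b["x"], a["y"] - b["y"]) < max_distance
    ]


pairs = select_pairs(SOURCES, "smartcity:WasteBin",
                     "smartcity:TemperatureSensor", 5.0)
```

Each resulting pair would then feed one instance of a multi-source template.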
RESULTS & FUTURE
WORKS
What next?
PRELIMINARY RESULTS
Defined
 Completely new JSON syntax for block
definitions
 Completely new JSON syntax for template
definitions
 Completely new JSON syntax for wildcard
template binding
Implemented
 RESTful service to
 Create/update/delete CEP queries, including template
chains and their wildcard instantiation
FUTURE WORKS
Implement SPARQL-based full wildcard
instantiation
Test wildcard generation algorithm
 Performance
 Overhead
 Possible missed matches / improvements
User interface
 Design graphical composition environment
(e.g., similar to Node-RED)
 User study
QUESTIONS?
Block-based Real-time Big-Data Processing for Smart Cities
Dario Bonino, bonino@ismb.it
The project is co-funded by the European Commission under grant
agreement 609081.
This presentation reflects solely the views of its authors. The
European Commission is not liable for any use that may be made of
the information contained therein.
