Deployment Model
•  Most existing systems adopt a centralized solution
•  When distributed processing is allowed, ...
Deployment model
•  Automatic distribution of processing introduces the operator placement problem
•  Given a se...
Operator placement
•  The operator placement problem is still open
–  Several proposals
•  Often adopting technique...
More on deployment model
•  Operator placement is only part of the problem
•  Other issues
–  How to build the n...
Deployment model and dynamics
•  How to cope with mobile nodes?
–  Mobile sinks and sources…
–  …but also mobile ...
Interaction Model
•  It is interesting to study the characteristics of the interactions among the main components of ...
Interaction Model
[Figure: Sources, IFP Engine, Sinks]
•  Push
•  Pull
Observation Model
•  Push
•  Pull
Forwarding Mode...
Time Model
•  Relationship between information items and passing of time
•  Ability of an IFP system to associate ...
Stream-Only Time Model
•  Used in original DSMSs
•  Timestamps may be present or not
•  When present, they are...
Causal Time Model
•  Each item has a label reflecting some kind of causal relationship
•  Partial order
•  E.g. R...
Absolute Time Model
•  Information items have an associated timestamp
•  Defining a single point in time w.r.t. a...
Interval Time Model
•  Used for events to include “duration”
–  SnoopIB, Cayuga, NextCEP, …
•  At a first sight, i...
Interval Time Model
•  Which is the immediate successor of A?
–  Choose according to end time only: B
•  But i...
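The end-time criterion above can be made concrete with a small Python sketch. The `Interval` type and the example events are assumptions. So is the second criterion, "the first event that starts after A has ended", which is added only for comparison; the slide itself names only the end-time rule.

```python
from typing import List, NamedTuple, Optional


class Interval(NamedTuple):
    """An event with a duration (hypothetical shape: name, start, end)."""
    name: str
    start: float
    end: float


def successor_by_end(a: Interval, events: List[Interval]) -> Optional[Interval]:
    """Immediate successor of `a` when events are ordered by end time only."""
    later = [e for e in events if e.end > a.end]
    return min(later, key=lambda e: e.end, default=None)


def successor_after_completion(a: Interval, events: List[Interval]) -> Optional[Interval]:
    """Assumed alternative: the first event that starts once `a` has ended."""
    later = [e for e in events if e.start >= a.end]
    return min(later, key=lambda e: e.start, default=None)


if __name__ == "__main__":
    a, b, c = Interval("A", 1, 5), Interval("B", 3, 6), Interval("C", 5.5, 8)
    print(successor_by_end(a, [a, b, c]).name)            # B
    print(successor_after_completion(a, [a, b, c]).name)  # C
```

With end times alone, B counts as A's successor even though it started while A was still running. A start-based rule picks C instead.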
Interval Time Model
•  “What is “Next” in event processing?” by White et al.
–  Proposes a number of desired pro...
Data Model
•  Studies how the different systems
–  Represent single data items
–  Organize them into data flows...
Nature of Items
•  The meaning we associate with information items
–  Generic data
–  Event notifications
•  Deeply...
Nature of Items
CQL/Stream (Generic Data)
Select IStream(*)
From F1[Rows 5],
F2[Range 1 Minute]
Where F1.A = F2....
Format of Items
•  How information is represented
•  Influences the way items are processed
–  E.g., Relational ...
Support for Uncertainty
•  Ability to associate a degree of uncertainty to information items
–  To the content o...
Data Flows
•  Homogeneous
–  Each flow contains data with the same format and “kind”
•  E.g. Tuples with identic...
Rule Model
•  Rules are much more complex entities than data items
•  Large number of different approaches
–  Al...
Transforming Rules
•  Do not present an explicit distinction between detection and production
•  Define an execution...
Detecting Rules
•  Present an explicit distinction between detection and production
•  Usually, the detection is ba...
Support for Uncertainty
•  Two orthogonal aspects
–  Support for uncertain input
•  Allows rules to deal with/...
Language Model
•  Specify operations to
–  Filter
–  Join
–  Aggregate
•  input flows …
•  … to produce one or...
•  Following the rule model, we define two classes of languages:
–  Transforming languages
•  Declarative langua...
Imperative Languages
Aurora (Boxes & Arrows Model)
G. Cugola and A. Margara - http://www.streamreasoning.org/co...
Hybrid Languages
Oracle CEP
Detecting Languages
TESLA / T-Rex
Define Fire(area: string, measuredTemp: double)
From Smoke(area=$a) and last
Tem...
Language Model
•  Different syntaxes / constructs / operators
•  Comparison of language semantics and expressive...
Language Model
•  Single-Item operators
–  Selection operators
•  Filter items according to their content
–  E...
Language Model
•  Logic Operators
–  Conjunction
–  Disjunction
–  Repetition
–  Negation
•  Explicitly present i...
Language Model
•  Logic Operators
–  Conjunction
–  Disjunction
–  Repetition
–  Negation
•  Some logic operators...
Language Model
•  Logic Operators
–  Conjunction
–  Disjunction
–  Repetition
–  Negation
•  Traditionally, logic ...
Language Model
•  Sequences
–  Similar to logic operators
–  Based on timing relations among items
•  Present i...
Language Model
•  Sequences
–  Similar to logic operators
–  Based on timing relations among items
•  Traditiona...
Language Model
•  Iterations
–  Express possibly unbounded sequences of items …
–  … satisfying an iterating conditio...
Language Model
•  Logic operators, sequences, and iterations traditionally not offered by transforming languages...
Language Model
•  Windows
–  Kind:
•  Logical (Time-Based)
•  Physical (Count-Based)
•  User-Defined
[Figure: Logical ...]
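A minimal Python sketch can contrast the first two window kinds listed above. It assumes items are `(timestamp, payload)` pairs that arrive in timestamp order, and that a time-based window covers the half-open interval (now − range, now]. Both function names are hypothetical and are not taken from any surveyed system. User-defined windows are not modeled.

```python
from typing import List, Tuple

Item = Tuple[float, str]  # (timestamp, payload); assumed item shape, sorted by timestamp


def logical_window(items: List[Item], now: float, range_s: float) -> List[Item]:
    """Time-based window: every item whose timestamp falls in (now - range_s, now]."""
    return [it for it in items if now - range_s < it[0] <= now]


def physical_window(items: List[Item], now: float, rows: int) -> List[Item]:
    """Count-based window: the last `rows` items that arrived up to `now`."""
    arrived = [it for it in items if it[0] <= now]
    return arrived[max(0, len(arrived) - rows):]


if __name__ == "__main__":
    stream = [(1, "a"), (2, "b"), (3, "c"), (10, "d"), (11, "e")]
    print(logical_window(stream, now=11, range_s=5))  # [(10, 'd'), (11, 'e')]
    print(physical_window(stream, now=11, rows=3))    # [(3, 'c'), (10, 'd'), (11, 'e')]
```

The logical window's size depends on the arrival rate. The physical window always holds the same number of items, however far back in time they reach.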
Language Model
•  Windows are used to limit the scope of blocking operators
•  They are generally available in d...
Language Model
•  Windows movement
–  Fixed: do not move at all
–  Landmark: have a fixed lower bound, while the...
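The window movements above can be sketched in Python as follows. This sketch makes three assumptions:
- As in the usual definition, a landmark window's upper bound follows the current time.
- A sliding window, which these slides mention elsewhere, is included for comparison.
- Bounds are closed intervals.

`window_bounds` and `contents` are hypothetical helpers.

```python
from typing import List, Optional, Tuple


def window_bounds(kind: str, now: float, start: float = 0.0,
                  end: Optional[float] = None, size: float = 0.0) -> Tuple[float, float]:
    """Closed interval [lo, hi] covered by a window of the given kind at time `now`."""
    if kind == "fixed":      # both bounds are constant
        if end is None:
            raise ValueError("a fixed window needs an explicit end")
        return (start, end)
    if kind == "landmark":   # fixed lower bound; upper bound assumed to follow `now`
        return (start, now)
    if kind == "sliding":    # both bounds move together with `now`
        return (now - size, now)
    raise ValueError(f"unknown window kind: {kind}")


def contents(timestamps: List[float], bounds: Tuple[float, float]) -> List[float]:
    """Timestamps that fall inside the window."""
    lo, hi = bounds
    return [ts for ts in timestamps if lo <= ts <= hi]


if __name__ == "__main__":
    arrivals = [0, 5, 10, 15, 20]
    print(contents(arrivals, window_bounds("fixed", now=20, start=0, end=10)))  # [0, 5, 10]
    print(contents(arrivals, window_bounds("landmark", now=20, start=0)))       # [0, 5, 10, 15, 20]
    print(contents(arrivals, window_bounds("sliding", now=20, size=8)))         # [15, 20]
```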
Language Model
•  Flow management operators
–  Required by declarative and imperative languages to merge, split, ...
Language Model
•  Parameterization
–  Allows the binding of different information items based on their content
–  ...
Language Model
Aggregates
•  Scope: Detection Aggregates, Production Aggregates
•  Definition: Predefined, User...
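The difference in scope can be illustrated with a hypothetical Python sketch. A detection aggregate takes part in deciding whether a rule fires; a production aggregate is only computed to fill in the output. The rule names, the threshold of 50, and the use of `statistics.mean` are assumptions, not constructs of any surveyed language.

```python
from statistics import mean
from typing import Dict, List, Optional


def detection_aggregate_rule(temps: List[float], threshold: float = 50.0) -> Optional[str]:
    """The aggregate is part of the condition: fire only if the mean exceeds `threshold`."""
    if temps and mean(temps) > threshold:
        return "Fire"
    return None


def production_aggregate_rule(temps: List[float], smoke: bool) -> Optional[Dict[str, object]]:
    """The aggregate is part of the action: fire on smoke, report the mean temperature."""
    if smoke and temps:
        return {"type": "Fire", "avgTemp": mean(temps)}
    return None


if __name__ == "__main__":
    print(detection_aggregate_rule([40, 60, 80]))         # Fire
    print(production_aggregate_rule([40, 60, 80], True))  # {'type': 'Fire', 'avgTemp': 60}
```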
Credits
•  These slides are partially based on "A modeling framework for DSMS and CEP" by G. Cugola and A. Marga...
Semantic Approach to Big Data and Event Processing
Thank you!
Any questions?
Emanuele Della Valle
DEIB - Politecn...

Mastering the Velocity Dimension of Big Data



Prof Emanuele Della Valle - DEIB Politecnico di Milano

Published in: Data & Analytics


  1. BigData - Semantic Approach to Big Data and Event Processing
     Mastering the Velocity Dimension of Big Data
     Emanuele Della Valle
     DEIB - Politecnico di Milano
     @manudellavalle - emanuele.dellavalle@polimi.it - http://emanueledellavalle.org
  2. Agenda
     •  It's a streaming world
     •  Mastering the velocity dimension with information flow processing
     •  A modeling framework for DSMS and CEP
        –  Functional model
        –  Processing model
        –  Deployment model
        –  Interaction model
        –  Data model
        –  Time model
        –  Rule model
        –  Language model
     @manudellavalle - http://emanueledellavalle.org - 7/10/2015
  3. It's a streaming world …
     […]
     •  Financial markets
     •  Sensor networks
     •  Social networks
     •  Generate data streams!
  4. … looking for reactive answers
     […]
     •  Based on the last seconds of transactions, what shall I buy/sell now?
     •  Shall I keep drilling based on the last sensor observations?
     •  What are the top hashtags in the last few minutes?
     •  Require continuous processing and reactive answers
  5. Other domains
     •  Intrusion detection
     •  Fraud Detection
     •  Emergency Response Services
     •  Transportation and Logistics
     •  Supply Chain Optimization
     •  System monitoring
     •  Click inspection
     •  ...
  6. Mastering the Velocity dimension with Information Flow Processing (IFP) solutions
  7. Paradigm Shifts Enabled 4/4
     Leverage data as it is captured
     [source: Marc Andrews, 2014]
  9. IFP - Gartner
     The Gartner hype cycle
  10. IFP - Forrester
      Forrester's top 15 emerging tech to watch: Now to 2018
  11. Is there a market beyond hype?
      •  Complex Event Processing Market Worth $3,322M by 2018 (2014 Report by MarketsandMarkets)
      •  Major players include:
         –  Microsoft
         –  IBM
         –  Oracle
         –  SAP
         –  Tibco
         –  ....
  12. DSMS/CEP State of the Art
      [source: The Forrester Wave: Big Data Streaming Analytics Platforms, Q3 2014]
  13. DSMS/CEP State of the Art
      •  Informatica: Vibe data stream
         –  https://www.informatica.com/products/data-integration/real-time-integration/vibe-data-stream.html
      •  SAP: Event Stream Processor
         –  http://www.sap.com/pc/tech/database/software/sybase-complex-event-processing/index.html
      •  Software AG: Intelligent Business Operations
         –  http://www.softwareag.com/corporate/products/apama_webmethods/
      •  SQL Stream: blaze
         –  http://www.sqlstream.com/blaze/
      •  Tibco Complex Event Processing
         –  http://www.tibco.com/products/event-processing/complex-event-processing/
      •  Vitria Operational Intelligence
         –  http://www.vitria.com/products/operational-intelligence
  14. DSMS/CEP State of the Art
      •  Gianpaolo Cugola, Alessandro Margara: Processing flows of information: From data stream to complex event processing. ACM Comput. Surv. 44(3): 15 (2012)
      •  Content
         –  Types of models compared
            •  Functional and processing
            •  Deployment and interactions
            •  Data, Time, and Rule
            •  Language
         –  # of systems surveyed:
            •  Academic: 24
            •  Industrial: 9
            •  Total: 33
         –  To learn more:
            •  http://home.dei.polimi.it/margara/papers/survey.pdf
  15. Information Flow Processing
      •  The IFP engine processes incoming flows of information according to a set of processing rules
         –  Processing is “on line”
      •  Sources produce the incoming information flows, sinks consume the results of processing, rule managers add or remove rules
      •  Information flows are composed of information items
         –  Items belonging to the same flow are neither necessarily ordered nor of the same kind
      •  Processing involves filtering, combining, and aggregating flows, item by item as they enter the engine
      [Figure: Sources, Information Flows, IFP Engine with Rules, Information Flows, Sinks; Rule managers]
  16. IFP: A bit of history of two approaches
      [Figure: Traditional DBMS, Active DBMS, DSMS, Event-based Systems, CEP]
  17. From Passive to Active DBMSs
      •  Standard DBMSs
         –  Purely passive: Human-active database-passive (HADP)
         –  Execution happens only when asked by clients (through queries)
      •  Active DBMSs
         –  The reactive behavior moves (in part) from the application to the DB layer…
         –  …which executes Event Condition Action (ECA) rules
  18. Active DBMSs
      •  As a DBMS extension
         –  Rules may only refer to the internal state of the DB
      •  Closed DB applications
         –  Rules may support the semantics of the application, but external sources of events are not allowed
         –  But events may come from external sources …
      •  Open DB applications
         –  Events may come from external sources
  19. Data Stream Management Systems (DSMS)
      •  Data streams are (unbounded) sequences of time-varying data elements
      •  Represent:
         –  an (almost) “continuous” flow of information
         –  with the recent information being more relevant as it describes the current state of a dynamic system
      [Figure: data elements along a time axis]
  20. Data Stream Management Systems (DSMS)
      •  The nature of streams requires a paradigmatic change*
         –  from persistent data
            •  one-time semantics
         –  to transient data
            •  continuous semantics
      * This paradigmatic change first arose in the DB community in the late '90s
  21. Continuous Semantics
      •  Continuous queries registered over streams that are observed through windows
      [Figure: input streams, observed through a window by a Registered Continuous Query, produce streams of answers; Dynamic System]
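Continuous semantics can be sketched in Python as a query that is registered once and re-evaluated over a time-based sliding window, so the output is itself a stream of answers. This is a minimal sketch under some assumptions:
- Items are `(timestamp, value)` pairs arriving in timestamp order.
- The query is re-evaluated on every arrival. Real DSMSs may instead evaluate on clock ticks.
- The query computes an average.

`ContinuousQuery` and `run_query` are hypothetical names.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

Item = Tuple[float, float]  # (timestamp in seconds, value); assumed item shape


class ContinuousQuery:
    """A query registered once and re-evaluated whenever a new item arrives.

    The input is observed through a time-based sliding window covering
    (now - range_s, now]; every evaluation appends one answer to the output stream.
    """

    def __init__(self, range_s: float, fn: Callable[[List[float]], float]) -> None:
        self.range_s = range_s
        self.fn = fn
        self.window: Deque[Item] = deque()
        self.answers: List[Tuple[float, float]] = []

    def push(self, ts: float, value: float) -> None:
        self.window.append((ts, value))
        # Evict items that are no longer inside (ts - range_s, ts].
        while self.window and self.window[0][0] <= ts - self.range_s:
            self.window.popleft()
        self.answers.append((ts, self.fn([v for _, v in self.window])))


def run_query(stream: List[Item], range_s: float) -> List[Tuple[float, float]]:
    """Register an 'average over the window' query and feed it the whole stream."""
    query = ContinuousQuery(range_s, fn=lambda vals: sum(vals) / len(vals))
    for ts, value in stream:
        query.push(ts, value)
    return query.answers


if __name__ == "__main__":
    readings = [(0, 10.0), (20, 20.0), (50, 30.0), (70, 40.0), (130, 50.0)]
    print(run_query(readings, range_s=60))
    # [(0, 10.0), (20, 15.0), (50, 20.0), (70, 30.0), (130, 50.0)]
```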
  22. Event-based systems
      •  Components collaborate by exchanging information about occurrent events. In particular:
         –  Components publish notifications about the events they observe, or
         –  they subscribe to the events they are interested in being notified about
      •  Communication is:
         –  Purely message based
         –  Asynchronous
         –  Multicast
         –  Implicit
         –  Anonymous
      [Figure: subscriptions topic=fire* & place=*, topic=* & place=1st floor, topic=fire alarm & place=*; notifications “fire alarm at 1st floor” and “fire training at 1st floor” delivered to the matching subscribers]
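The subscriptions in the figure can be replayed with a toy content-based broker in Python. Publishers never name receivers; the broker routes each notification to whichever subscriptions match, which makes communication implicit and anonymous. The `Broker` class and the subscriber names `s1`–`s3` are hypothetical. The use of `fnmatch` for the `*` wildcards is an assumption; real pub/sub systems define their own subscription languages.

```python
from fnmatch import fnmatchcase
from typing import Dict, List


class Broker:
    """Toy content-based publish/subscribe broker.

    A subscription maps attribute names to shell-style patterns ("fire*", "*");
    a notification goes to every subscriber whose patterns all match it.
    """

    def __init__(self) -> None:
        self.subscriptions: Dict[str, Dict[str, str]] = {}

    def subscribe(self, subscriber: str, **patterns: str) -> None:
        self.subscriptions[subscriber] = patterns

    def publish(self, **notification: str) -> List[str]:
        """Return the subscribers this notification is delivered to."""
        return [
            name
            for name, patterns in self.subscriptions.items()
            if all(attr in notification and fnmatchcase(notification[attr], pattern)
                   for attr, pattern in patterns.items())
        ]


def demo() -> Dict[str, List[str]]:
    broker = Broker()
    broker.subscribe("s1", topic="fire*", place="*")
    broker.subscribe("s2", topic="*", place="1st floor")
    broker.subscribe("s3", topic="fire alarm", place="*")
    return {
        "fire alarm": broker.publish(topic="fire alarm", place="1st floor"),
        "fire training": broker.publish(topic="fire training", place="1st floor"),
    }


if __name__ == "__main__":
    print(demo())  # {'fire alarm': ['s1', 's2', 's3'], 'fire training': ['s1', 's2']}
```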
  23. Complex Event Processing (CEP)
      •  CEP systems add the ability to deploy rules that describe how composite events can be generated from primitive (or composite) ones
      •  Typical CEP rules search for sequences of events
         –  Raise C if A→B
      •  Time is a key aspect in CEP
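A sketch of the "Raise C if A→B" rule in Python, with time as part of the condition: C is raised when B follows A by at most `within` time units. This sketch assumes three things:
- Events are `(type, timestamp)` pairs arriving in timestamp order.
- At most one C is raised per B.
- A events are never consumed.

The processing model slides discuss exactly these last two choices. `detect_sequence` is a hypothetical name.

```python
from typing import Iterable, List, Tuple

Event = Tuple[str, float]  # (event type, timestamp); assumed shape


def detect_sequence(events: Iterable[Event], first: str, then: str,
                    composite: str, within: float) -> List[Event]:
    """Raise `composite` whenever `then` follows `first` by at most `within`."""
    waiting: List[float] = []  # timestamps of `first` events that can still match
    raised: List[Event] = []
    for etype, ts in events:  # events are assumed to arrive in timestamp order
        waiting = [t for t in waiting if ts - t <= within]  # the time constraint
        if etype == first:
            waiting.append(ts)
        elif etype == then and waiting:
            raised.append((composite, ts))
    return raised


if __name__ == "__main__":
    stream = [("A", 0), ("B", 5), ("B", 30), ("A", 40), ("B", 45)]
    print(detect_sequence(stream, "A", "B", "C", within=10))  # [('C', 5), ('C', 45)]
```

The B at time 30 raises nothing, because the only earlier A is 30 time units old.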
  24. The current situation
      •  Back in 2007 CEP was already a hot topic…
      •  … but having a good grasp of the area was rather hard
      •  As observed by Opher Etzion, the area was looking like the “Tower of Babel”
         –  Event Processing and the Babylon Tower – Event process thinking blog – Sept. 8, 2007
      [Figure: word cloud: event, data, stream, message, flow, publish, subscribe, notify, advertise, pattern, sequence, primitive, complex, composite, join, send, receive, middleware, system, application, protocol, routing, network, query, rule, condition, action]
      G. Cugola and A. Margara - http://www.streamreasoning.org/courses/scep2015 - 7/10/2015
  25. The current situation
      •  Several communities were contributing to the area…
      •  … each bringing its own expertise and vocabulary…
      •  …but often working in isolation
      [Figure: the same vocabulary, surrounded by the contributing communities: ad-hoc tools (intrusion det., …), data management & databases, DEBS, process modeling & automation, DBMSs, application servers, middleware systems; Researchers and Tool vendors]
  26. The current situation
      •  That was 2007. What about today?
      •  Things did not change much
         –  From the “Event Process Thinking” blog: [Which is] the relation between event processing and data stream management?
            1.  They are aliases -- stream is just a collection of events, likewise, an event is just a member in a stream, and the functionality is the same
            2.  Stream management is a subset of event processing -- there are different ways to do event processing, streams is one of them
            3.  Event processing is a subset of stream management -- event streams is just one type of stream, but there are voice stream, video stream, …
            4.  Event processing and stream management are distinct and there is no overlapping between them
      •  At the same time tool vendors are building tools that try to combine ¿ juxtapose ? different approaches
  27. The goal of the survey
      •  Define a modeling framework to
         –  compare different systems in a precise way
         –  compare different approaches in a precise way
         –  help people coming from different areas communicate and compare their work with others
         –  isolate the open issues from those already solved
         –  precisely identify the challenges
         –  isolate the best part of the various approaches
         –  … finding a way to combine them
      G. Cugola, A. Margara: "Processing Flows of Information: From Data Stream to Complex Event Processing". ACM Computing Surveys, 44(3), ACM Press, June 2012
  28. The Information Flow Processing domain
      •  The IFP engine processes incoming flows of information according to a set of processing rules
      •  Sources produce the incoming information flows, sinks consume the results of processing, rule managers add or remove rules
      •  Information flows are composed of information items
         –  Items belonging to the same flow are neither necessarily ordered nor of the same kind
         –  Processing involves filtering, combining, and aggregating flows, item by item as they enter the engine
      [Figure: Sources, Information Flows, IFP Engine with Rules, Information Flows, Sinks; Rule managers]
  29. One framework, several models
      •  Different models to capture different viewpoints
         –  Functional model
         –  Processing model
         –  Deployment model
         –  Interaction model
         –  Time model
         –  Data model
         –  Rule model
         –  Language model
  30. Functional model
      [Figure: Receiver, Clock, Decider with History and Rules, Knowledge base, Producer receiving A + Seq, Forwarder]
      •  Receiver: implements the transport protocol to move information items along the net; acts as a demultiplexer
      •  Forwarder: implements the transport protocol to move information items along the net; acts as a multiplexer
  31. A short digression
      •  We assume rules can be (logically) decomposed into two parts: C → A
         –  C is the condition
         –  A is the action
      •  Example (in CQL):
         Select IStream(Count(*))
         From F1 [Range 1 Minute]
         Where F1.A > 0
      •  This way we can split processing into two phases:
         –  The detection phase determines the items that trigger the rule
         –  The production phase uses those items to produce the output of the rule
      [Figure: the functional model annotated with condition and action]
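The two phases can be separated explicitly in Python for the CQL example. Detection keeps the items of `F1` in the last minute whose `A` is positive; production counts them. This is a sketch under several assumptions:
- Items are dicts with hypothetical `ts` and `A` keys.
- The rule is re-evaluated on each arrival. A real DSMS also expires items as time passes without new arrivals.
- `IStream` is approximated as "emit the count only when it differs from the previous one", which matches CQL's insert-stream for a single-row aggregate.

```python
from typing import Dict, List, Optional, Tuple

Item = Dict[str, float]  # an F1 item, e.g. {"ts": 12.0, "A": 3.0}; assumed shape
RANGE_S = 60.0           # From F1 [Range 1 Minute]


def detect(window: List[Item]) -> List[Item]:
    """Condition C (detection phase): the items in the window with A > 0."""
    return [item for item in window if item["A"] > 0]


def produce(selected: List[Item]) -> int:
    """Action A (production phase): Count(*) over the items selected by detection."""
    return len(selected)


def run(stream: List[Item]) -> List[Tuple[float, int]]:
    """Re-evaluate the rule on every arrival and emit results IStream-style."""
    window: List[Item] = []
    last: Optional[int] = None
    emitted: List[Tuple[float, int]] = []
    for item in stream:
        now = item["ts"]
        window = [i for i in window if i["ts"] > now - RANGE_S] + [item]
        count = produce(detect(window))
        if count != last:  # IStream: only results that were not there before
            emitted.append((now, count))
            last = count
    return emitted


if __name__ == "__main__":
    f1 = [{"ts": 0.0, "A": 1.0}, {"ts": 10.0, "A": -2.0}, {"ts": 20.0, "A": 5.0},
          {"ts": 70.0, "A": 0.0}, {"ts": 75.0, "A": 3.0}]
    print(run(f1))  # [(0.0, 1), (20.0, 2), (70.0, 1), (75.0, 2)]
```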
  32. Functional model
      [Figure: Receiver, Clock, Decider with History and Rules, Knowledge base, Producer receiving A + Seq, Forwarder]
      •  Decider: implements the detection phase; accumulates partial results into the History; when a rule fires, passes its action part and the triggering items to the Producer
      •  Producer: implements the production phase; uses the items in Seq as stated in action A
      •  Rules: some systems allow rules to be added or removed at processing time
      •  Knowledge base: some systems allow rules to combine flowing items with items previously stored in a (read-only) storage
      •  Looping flow exiting the Producer: if present, models the ability of performing recursive processing, building hierarchies of items
      •  Clock: optional component; periodically creates special information items holding the current time; its presence models the ability of performing periodic processing of inputs
  33. Functional model: Considerations
      •  The detection-production cycle
         –  Fired by a new item I entering the engine through the Receiver
            •  Including those periodically produced by the Clock, if present
         –  Detection phase: evaluates all the rules to find those enabled
            •  Using item I plus the data in the Knowledge base, if present
            •  The item I can be accumulated into the History for partially enabled rules
            •  The action part of the enabled rules together with the triggering items (A+Seq) is passed to the Producer
         –  Production phase: produces the output items
            •  Combining the items that triggered the rule with data present in the Knowledge base, if present
            •  New items are sent to subscribed sinks (through the Forwarder)…
            •  …but they could also be sent internally to be processed again (recursive processing)
      •  In some systems the action part of fired rules may also change the set of deployed rules
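The detection-production cycle can be sketched in Python with one engine and two rules:
- A hypothetical Fire rule (Smoke after a Temp above 50 in the same area) keeps Temp items in its History while it is only partially enabled.
- A hypothetical Evacuate rule reacts to Fire items.

The second rule fires only because produced items are fed back into the engine, which is the looping flow out of the Producer. The Knowledge base and the Clock are left out. All class, function, and field names are assumptions made for illustration.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict, List, Optional, Set

Item = Dict[str, Any]
Condition = Callable[[Item, List[Item]], Optional[List[Item]]]
Action = Callable[[List[Item]], Item]


class Rule:
    """A rule split into a condition (detection) and an action (production)."""

    def __init__(self, watch: Set[str], condition: Condition, action: Action) -> None:
        self.watch = watch          # item types worth keeping in this rule's History
        self.condition = condition  # returns Seq when the rule is enabled, else None
        self.action = action
        self.history: List[Item] = []


class Engine:
    def __init__(self, rules: List[Rule], recursive: bool = True) -> None:
        self.rules = rules
        self.recursive = recursive
        self.forwarded: List[Item] = []  # stands in for the Forwarder and its sinks

    def receive(self, item: Item) -> None:
        """Run one detection-production cycle per item (plus the recursive ones)."""
        pending: Deque[Item] = deque([item])
        while pending:
            current = pending.popleft()
            for rule in self.rules:
                seq = rule.condition(current, rule.history)  # detection phase
                if seq is None:
                    if current["type"] in rule.watch:
                        rule.history.append(current)         # partially enabled rule
                    continue
                produced = rule.action(seq)                  # production phase
                self.forwarded.append(produced)
                if self.recursive:
                    pending.append(produced)                 # looping flow


def smoke_after_hot_temp(item: Item, history: List[Item]) -> Optional[List[Item]]:
    if item["type"] != "Smoke":
        return None
    for old in history:
        if old["type"] == "Temp" and old["area"] == item["area"] and old["val"] > 50:
            return [old, item]
    return None


def make_rules() -> List[Rule]:
    fire = Rule({"Temp"}, smoke_after_hot_temp,
                lambda seq: {"type": "Fire", "area": seq[-1]["area"]})
    evacuate = Rule(set(),
                    lambda item, _history: [item] if item["type"] == "Fire" else None,
                    lambda seq: {"type": "Evacuate", "area": seq[0]["area"]})
    return [fire, evacuate]


def simulate(recursive: bool) -> List[Item]:
    engine = Engine(make_rules(), recursive=recursive)
    engine.receive({"type": "Temp", "area": "A1", "val": 60})
    engine.receive({"type": "Smoke", "area": "A1"})
    return engine.forwarded


if __name__ == "__main__":
    print(simulate(recursive=True))   # Fire, then Evacuate (recursive processing)
    print(simulate(recursive=False))  # Fire only
```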
  34. Functional model: Considerations
      •  The maximum length of Seq is a key aspect
         –  1 ≈ PubSub
         –  Bounded ⇒
            •  CQL-like languages without time-based windows
            •  Pattern-based languages without a Kleene+ operator
      •  Other key aspects that impact expressiveness
         –  Presence of the Clock
            •  Models the ability to process rules periodically
            •  Available in almost half of the systems reviewed
            •  Most Active DBMSs and DSMSs but few CEP systems (see next slide)
  35. Functional model: Considerations (see previous slide)
         –  Presence of the Knowledge base
            •  Only available in systems coming from the database community
         –  Presence of the looping flow exiting the Producer
            •  Models the ability of performing recursive processing
            •  Half of the CEP systems have it
            •  All Active DBMSs but very few DSMSs have it
               –  They have nested rules
         –  Support for dynamic rule change
            •  Few systems support it
            •  Can be implemented externally…
               –  Through sinks also acting as rule managers
            •  …but we think it is nice to have it internally
  36. The semantics of processing
      •  What determines the output of each detection-production cycle?
         –  The new item entering the engine
         –  The set of deployed rules
         –  The items stored in the History
         –  The content of the Knowledge Base
      •  Is this enough?
      •  Example (in Padres and CQL):
         –  Smoke && Temp>50
         –  Select IStream(Smoke.area)
            From Smoke[Rows 30 Slide 10], Temp[Rows 50 Slide 5]
            Where Smoke.area = Temp.area AND Temp.value > 50
  37. Processing model
      •  Three policies affect the behavior of the system
         –  The selection policy
         –  The consumption policy
         –  The load shedding policy
  38. Selection policy
      •  Determines if a rule fires once or multiple times, and which items are actually selected from the History
      •  Example: [Figure: the Receiver feeds A0, A1, then B into a Decider evaluating A ∧ B; with a single selection policy the rule fires once (A0 B → R or A1 B → R), with a multiple selection policy it fires for both (A0 B → R and A1 B → R)]
  39. Selection policy: Considerations
      •  Most systems adopt a multiple selection policy
         –  It is simpler to implement
         –  Is it adequate?
      •  Example: Alert fire when smoke and high temperature are detected within a short time frame
         –  If 10 sensors read high temperature and immediately afterward one detects smoke, I would like to receive a single alert, not 10
      •  A few systems allow this policy to be programmed…
      •  …some of them on a per-rule basis
         –  E.g., Amit, T-Rex
  40. Selection policy: The TESLA case
      •  TESLA (Trio-based Event Specification Language): the T-Rex language
         –  A rule language for CEP that tries to combine expressiveness and efficiency
         –  Has formally defined semantics
            •  Expressed in Trio, a Metric Temporal Logic (see DEBS 2010)
      •  Allows rule managers to choose their own selection policy on a per-rule basis
         –  Example: Multiple selection
            define Fire(area: string, measuredTemp: double)
            from Smoke(area=$a) and each Temp(area=$a and val>50) within 1min. from Smoke
            where area=Smoke.area and measuredTemp=Temp.val
         –  Example: Single selection
            define Fire(area: string, measuredTemp: double)
            from Smoke(area=$a) and last Temp(area=$a and val>50) within 1min. from Smoke
            where area=Smoke.area and measuredTemp=Temp.val
      •  Alternatively you may use:
         –  first…within
         –  n-first…within
         –  n-last…within
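The TESLA selection options can be mimicked in Python and applied to the scenario of the previous slide: 10 hot readings, then smoke. This is a minimal sketch, not T-Rex's implementation. It assumes Temp events are `(timestamp, area, val)` tuples and that "within 1min. from Smoke" covers the closed interval [smoke − 60 s, smoke]. `select_temps` and `fire` are hypothetical names.

```python
from typing import Any, List, Tuple

Temp = Tuple[float, str, float]  # (timestamp, area, val); assumed shape


def select_temps(temps: List[Temp], smoke_ts: float, area: str,
                 policy: str = "each", n: int = 1, within: float = 60.0) -> List[Temp]:
    """Temp events that a Smoke event at `smoke_ts` is combined with."""
    candidates = sorted(
        (t for t in temps
         if t[1] == area and t[2] > 50 and smoke_ts - within <= t[0] <= smoke_ts),
        key=lambda t: t[0],
    )
    if policy == "each":            # multiple selection: every candidate
        return candidates
    if policy == "first":
        policy, n = "n-first", 1
    elif policy == "last":          # single selection used on the slide
        policy, n = "n-last", 1
    if policy == "n-first":
        return candidates[:n]
    if policy == "n-last":
        return candidates[max(0, len(candidates) - n):]
    raise ValueError(f"unknown selection policy: {policy}")


def fire(temps: List[Temp], smoke_ts: float, area: str,
         **policy: Any) -> List[Tuple[str, str, float]]:
    """One Fire(area, measuredTemp) per selected Temp event."""
    return [("Fire", area, t[2]) for t in select_temps(temps, smoke_ts, area, **policy)]


if __name__ == "__main__":
    readings = [(float(i), "A1", 51.0 + i) for i in range(10)]  # 10 hot sensors
    print(len(fire(readings, 10.0, "A1", policy="each")))  # 10
    print(fire(readings, 10.0, "A1", policy="last"))       # [('Fire', 'A1', 60.0)]
```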
  41. Consumption policy
      •  Determines how the History changes after a rule fires ⇒ what happens when new items enter the Decider
      •  Example: [Figure: the Receiver feeds A and then B into a Decider evaluating A ∧ B; the outcome A B → R is shown under the zero and the selected consumption policies]
  42. Consumption policy: Considerations
      •  Most systems couple a multiple selection policy with a zero consumption policy
         –  This is the common case with DSMSs, which use (sliding) windows to select relevant events
      •  Example (in CQL):
         Select IStream(Smoke.area)
         From Smoke[Range 1 min], Temp[Range 1 min]
         Where Smoke.area = Temp.area AND Temp.val > 50
      •  The systems that allow the selection policy to be programmed often allow the consumption policy to be programmed, too
         –  E.g., Amit, T-Rex
  43. Consumption policy: The TESLA case • Zero consumption policy – define Fire(area: string, measuredTemp: double) from Smoke(area=$a) and each Temp(area=$a and value>50) within 1min. from Smoke where area=Smoke.area and measuredTemp=Temp.value • Selected consumption policy – define Fire(area: string, measuredTemp: double) from Smoke(area=$a) and each Temp(area=$a and value>50) within 1min. from Smoke where area=Smoke.area and measuredTemp=Temp.value consuming Temp [Figure: four Temp (T) events followed by Smoke (S) events; under zero consumption each S produces four Fire! notifications, reusing the same T events; under the selected consumption policy the T events used by a Fire! are consumed and do not trigger further notifications]
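A minimal Python sketch of the two consumption policies, assuming "each" selection as in the rules above. The `replay` function and the (kind, time) encoding are hypothetical; "T" stands for a Temp event above 50 and "S" for a Smoke event, all in the same area.

```python
def replay(events, consume=False, window=60.0):
    """Replay (kind, time) pairs in time order and return, for each Smoke
    event, how many Fire! notifications it generates under "each" selection."""
    temps = []   # Temp timestamps still available in the history
    fires = []
    for kind, time in events:
        if kind == "T":
            temps.append(time)
            continue
        # Smoke: select every Temp in the window preceding it
        in_window = [t for t in temps if time - window <= t < time]
        fires.append(len(in_window))
        if consume:
            # "consuming Temp": selected items are removed from the history
            temps = [t for t in temps if not (time - window <= t < time)]
    return fires

events = [("T", 0), ("T", 1), ("T", 2), ("T", 3), ("S", 4), ("S", 5)]
print(replay(events))                # [4, 4]  zero consumption
print(replay(events, consume=True))  # [4, 0]  Temp consumed
```

Under zero consumption the second Smoke event reuses the same four Temp events; once they are consumed, it produces nothing.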
  44. Load shedding policy • Problem: how to manage bursts of input data • It may seem a system issue – i.e., an issue to be solved in the Receiver • But it strongly impacts the results produced – i.e., the “semantics” of the rules • Accordingly, some systems allow load shedding to be specified on a per-rule basis – e.g., Aurora allows rules to specify the expected QoS and sheds input to stay within the limits of the available resources – Conceptually, the issue is addressed in the Decider [Figure: functional model of an IFP engine: Receiver, Decider with its History and Rules, Producer with the Knowledge base, Forwarder, and Clock]
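Why shedding changes the "semantics" of rules can be shown with a deliberately simple policy. This sketch uses tail drop on a bounded buffer, which is not Aurora's QoS-driven shedding. The `SheddingReceiver` class and the burst of readings are hypothetical.

```python
from collections import deque

class SheddingReceiver:
    """Hypothetical receiver with a bounded buffer: items that arrive while
    the buffer is full are dropped (tail drop)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = deque()
        self.dropped = 0

    def offer(self, item):
        if len(self.buffer) >= self.capacity:
            self.dropped += 1
            return False
        self.buffer.append(item)
        return True

    def drain(self):
        items = list(self.buffer)
        self.buffer.clear()
        return items

# A burst of 10 Temp readings arrives before the Decider can consume them.
receiver = SheddingReceiver(capacity=4)
for value in [40, 42, 44, 46, 52, 55, 58, 60, 61, 63]:
    receiver.offer(value)
hot = [v for v in receiver.drain() if v > 50]
print(receiver.dropped, hot)  # 6 []
```

Every reading above 50 is dropped, so a rule such as Temp(value>50) never fires. A per-rule policy, for example dropping low readings first, would preserve that rule's result.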
  45. Deployment model • IFP applications may include a large number of sources and sinks – Possibly dispersed over a wide geographical area • It becomes important to consider the deployment architecture of the engine – How the components of the functional model can be distributed to achieve scalability
  46. Deployment model [Figure: deployment models (centralized vs. distributed, the latter either clustered or networked): sources send information flows to the IFP engine, which processes them according to the rules defined by rule managers and delivers information flows to sinks]
  47. Deployment Model • Most existing systems adopt a centralized solution • When distributed processing is allowed, it is usually based on clustered solutions • A few systems have recognized the importance of networked deployment for some applications – E.g. Microsoft StreamInsight (part of SQL Server) • Filtering near sources • Aggregation and correlation in-network • Analytics and historical data in a centralized server/cluster • In most cases, deployment/configuration is not automatic
  48. Deployment model • Automatic distribution of processing introduces the operator placement problem • Given a set of rules (composed of operators) and a set of nodes – How to split the processing load – How to assign operators to available nodes • In other words (Event Processing in Action) – Given an event processing network – How to map it onto the physical network of nodes
  49. Operator placement • The operator placement problem is still open – Several proposals • Often adopting techniques from Operational Research – Difficult to compare solutions and results • Even in its simplest form the problem is NP-hard • More in the "operator placement problem" lecture of the PhD course on "Stream and Complex Event Processing" offered by Politecnico di Milano in 2015: http://www.streamreasoning.org/TR/2015/scep/corso_dott_ifp_operatorPlacement_2015.pdf
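Because the problem is NP-hard, practical solutions rely on heuristics. Below is a sketch of one simple greedy heuristic: place the heaviest operators first, each on the currently least-loaded node. It is not one of the specific proposals mentioned above. It ignores network latency and bandwidth, and the operator names and load estimates are hypothetical.

```python
import heapq

def greedy_placement(operators, nodes):
    """Assign operators to nodes, largest estimated load first, always
    choosing the currently least-loaded node. Returns (placement, loads)."""
    heap = [(0.0, node) for node in nodes]   # (current load, node name)
    heapq.heapify(heap)
    placement = {}
    for op, load in sorted(operators.items(), key=lambda kv: kv[1], reverse=True):
        used, node = heapq.heappop(heap)
        placement[op] = node
        heapq.heappush(heap, (used + load, node))
    loads = {node: used for used, node in sorted(heap, key=lambda e: e[1])}
    return placement, loads

ops = {"filter": 1.0, "join": 4.0, "aggregate": 3.0, "project": 2.0}
placement, loads = greedy_placement(ops, ["n1", "n2"])
print(placement)  # {'join': 'n1', 'aggregate': 'n2', 'project': 'n2', 'filter': 'n1'}
print(loads)      # {'n1': 5.0, 'n2': 5.0}
```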
  50. More on deployment model • Operator placement is only part of the problem • Other issues – How to build the network of nodes? – How to maintain it? – How to gather the information required to solve the operator placement problem? – How to actually “place” the operators? – How to “replace” them when the situation changes? • New rules added, old rules removed… • …new sources/sinks
  51. Deployment model and dynamics • How to cope with mobile nodes? – Mobile sinks and sources… – …but also mobile “processors” • The issue is relevant – We live in a mobile world • Very few proposals • A lot of work in the area of pure publish/subscribe – Several works published in DEBS, not to mention other major conferences/journals • Can we reuse some of this work?
  52. Interaction Model • It is interesting to study the characteristics of the interactions among the main components of an IFP system – Who starts the communication? [Figure: sources, IFP engine, sinks]
  53. Interaction Model [Figure: sources, IFP engine, sinks] • Observation model (sources → IFP engine): push or pull • Forwarding model (among the processing nodes of the IFP engine): push or pull • Notification model (IFP engine → sinks): push or pull
  54. Time Model • Relationship between information items and the passing of time • Ability of an IFP system to associate some kind of happened-before (ordering) relationship to information items • We identified 4 classes: 1. Stream-only 2. Causal 3. Absolute 4. Interval
  55. Stream-Only Time Model • Used in original DSMSs • Timestamps may or may not be present • When present, they are used only to order items before they enter the engine; then they are forgotten • They are not exposed to the language – With the exception of windowing constructs • Ordering in output streams is conceptually separate from the ordering in input streams • CQL/Stream: Select DStream(*) From F1[Rows 5], F2[Range 1 Minute] Where F1.A = F2.A [Figure: CQL operator classes: stream-to-relation (S2R) from streams to relational tables, relation-to-relation (R2R) among tables, relation-to-stream (R2S) back to streams]
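The relation-to-stream step (R2S) in the figure is where CQL's IStream, DStream, and RStream operators act. Below is a minimal Python sketch of their meaning. It uses sets for simplicity, whereas CQL relations are bags. The `relation_to_stream` function and the snapshots are hypothetical.

```python
def relation_to_stream(snapshots, op):
    """Turn a sequence of relation snapshots R(0), R(1), ... into a stream of
    (tau, tuple) pairs. Sets are used for simplicity; CQL relations are bags."""
    stream = []
    previous = set()
    for tau, current in enumerate(snapshots):
        if op == "IStream":      # tuples inserted at tau
            emitted = current - previous
        elif op == "DStream":    # tuples deleted at tau
            emitted = previous - current
        elif op == "RStream":    # whole relation at every tau
            emitted = set(current)
        else:
            raise ValueError(f"unknown operator: {op}")
        stream.extend((tau, t) for t in sorted(emitted))
        previous = set(current)
    return stream

snapshots = [{"x"}, {"x", "y"}, {"y", "z"}]
print(relation_to_stream(snapshots, "IStream"))  # [(0, 'x'), (1, 'y'), (2, 'z')]
print(relation_to_stream(snapshots, "DStream"))  # [(2, 'x')]
```

Only the position τ in the snapshot sequence survives in the output, which matches the idea that input timestamps are forgotten once items enter the engine.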
  56. Causal Time Model • Each item has a label reflecting some kind of causal relationship • Partial order • E.g. Rapide – An event is causally ordered after all events that led to its occurrence • Gigascope: Select count(*) From A, B Where A.a-1 <= B.b and A.a+1 > B.b (A.a and B.b increase monotonically)
  57. Absolute Time Model • Information items have an associated timestamp • Defining a single point in time w.r.t. a (logically) unique clock – Total order • Timestamps are fully exposed to the language • Information items can be timestamped at the source or when entering the engine • TESLA/T-Rex: Define Fire(area: string, measuredTemp: double) From Smoke(area=$a) and last Temp(area=$a and value>45) within 5 min. from Smoke Where area=Smoke.area and measuredTemp=Temp.value
  58. Interval Time Model • Used for events that have a “duration” – SnoopIB, Cayuga, NextCEP, … • At first sight, it is a simple extension of the absolute time model – Timestamps with two values: start time and end time • However, it opens many issues – What is the successor of an event? – What is the timestamp associated to a composite event?
  59. Interval Time Model • Which is the immediate successor of A? – Choose according to end time only: B • But it started before A! – Exclude B: C, D • Both of them? • Which of them? – No other event strictly between A and its successor: C, D, E • Seems a natural definition • Unfortunately we lose associativity! – X→(Y→Z) ≠ (X→Y)→Z • May impede some rule rewriting for processing optimizations [Figure: intervals of events A, B, C, D, E on a time line]
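Two of the candidate definitions of "immediate successor" can be written down directly and compared. Because the figure is not available, the intervals below are hypothetical. They were chosen so that the answers match the ones on the slide (B by end time, C, D, E with nothing in between). The function names are also hypothetical.

```python
from typing import NamedTuple

class Interval(NamedTuple):
    name: str
    start: float
    end: float

def successor_by_end(a, events):
    """Immediate successor(s) of a, comparing end times only."""
    later = [e for e in events if e.end > a.end]
    if not later:
        return []
    first_end = min(e.end for e in later)
    return [e.name for e in later if e.end == first_end]

def successor_nothing_between(a, events):
    """Events that start after a ends, with no other event lying strictly
    between a and them."""
    after = [x for x in events if x.start > a.end]
    return [x.name for x in after
            if not any(y.start > a.end and y.end < x.start for y in events)]

# Hypothetical intervals (not the ones in the slide's figure).
A = Interval("A", 0, 2)
events = [A, Interval("B", -1, 3), Interval("C", 3, 5),
          Interval("D", 4, 6), Interval("E", 4.5, 8)]
print(successor_by_end(A, events))           # ['B'], although B started before A
print(successor_nothing_between(A, events))  # ['C', 'D', 'E']
```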
  60. Interval Time Model • “What is ‘Next’ in event processing?” by White et al. – Proposes a number of desired properties to be satisfied by the “Next” function – There is one model that satisfies them all • Complete History – It is not sufficient to encode timestamps using a pair of values – Timestamps of composite events must embed the timestamps of all the events that led to their occurrence – Possibly, timestamps of unbounded size • In case of unbounded Seq
  61. Data Model • Studies how the different systems – Represent single data items – Organize them into data flows [Figure: taxonomy of the data model: data items, classified by nature (generic data, event notifications), format (records, tuples, objects, …), and support for uncertainty; data flows, either homogeneous or heterogeneous]
  62. Nature of Items • The meaning we associate to information items – Generic data – Event notifications • Deeply influences several other aspects of an IFP system – Time model (above all!) – Rule language – Semantics of processing • A legacy of the heterogeneous backgrounds of the different communities
  63. Nature of Items • CQL/Stream (generic data): Select IStream(*) From F1[Rows 5], F2[Range 1 Minute] Where F1.A = F2.A • TESLA/T-Rex (event notifications): Define Fire (area: string, measuredTemp: double) From Smoke(area=$a) and last Temp(area=$a and value>45) within 5 min. from Smoke Where area=Smoke.area and measuredTemp=Temp.value
  64. Format of Items • How information is represented • Influences the way items are processed – E.g., the relational model requires tuples
  65. Support for Uncertainty • Ability to associate a degree of uncertainty to information items – To the content of items • Imprecise temperature reading – To the presence of an item (occurrence of an event) • Spurious RFID reading • When present, probabilistic information is usually exploited in rules during processing
  66. Data Flows • Homogeneous – Each flow contains data with the same format and “kind” • E.g. tuples with identical structure – Often associated with “database-like” rule languages • Heterogeneous – Information flows are seen as channels connecting sources, processors, and sinks – Each channel may transport items of different kinds and formats
  67. Rule Model • Rules are much more complex entities than data items • Large number of different approaches – Already observed in the previous slides • Looking back at our functional model, we classify them into two macro classes – Transforming rules – Detecting rules [Figure: taxonomy of the rule model: type of rules (transforming rules, detecting rules) and support for uncertainty]
  68. Transforming Rules • Do not present an explicit distinction between detection and production • Define an execution plan combining primitive operators • Each operator transforms one or more input flows into one or more output flows • The execution plan can be defined – explicitly (e.g., through a graphical notation) – implicitly (using a high-level language) • Often used with homogeneous information flows – To take advantage of the predefined structure of input and output
  69. Detecting Rules • Present an explicit distinction between detection and production • Usually, detection is based on a logical predicate that captures patterns of interest in the history of received items
  70. Support for Uncertainty • Two orthogonal aspects – Support for uncertain input • Allows rules to deal with/reason about uncertain input data – Support for uncertain output • Allows rules to associate a degree of uncertainty to the output produced
  71. Language Model • Specify operations to – Filter – Join – Aggregate • input flows… • …to produce one or more output flows • Following the rule model, we define two classes of languages: – Transforming languages • Declarative languages • Imperative languages – Detecting languages • Pattern-based
  72. Language Model: Declarative languages (transforming) • Specify the expected result rather than the desired execution flow • Usually derive from relational languages – Relational algebra / SQL • CQL/Stream: Select IStream(*) From F1[Rows 5], F2[Rows 10] Where F1.A = F2.A
  73. Language Model: Imperative languages (transforming) • Specify the desired execution flow • Starting from primitive operators – Can be user-defined • Usually adopt a graphical notation
  74. Imperative Languages: Aurora (Boxes & Arrows Model)
  75. Hybrid Languages: Oracle CEP
  76. Language Model: Detecting languages (pattern-based) • Specify a firing condition as a pattern • Select a portion of incoming flows through – Logic operators – Content / timing constraints • The action uses selected items to produce new knowledge
  77. Detecting Languages • TESLA / T-Rex: Define Fire(area: string, measuredTemp: double) From Smoke(area=$a) and last Temp(area=$a and value>45) within 5 min. from Smoke Where area=Smoke.area and measuredTemp=Temp.value
  78. Language Model • Different syntaxes / constructs / operators • Comparing the semantics and expressiveness of languages is still an open issue • Our approach: – Review all operators encountered in the analysis of systems – Specify the classes of languages adopting them – Try to capture some semantic relationships • Among operators
  79. Language Model • Single-item operators – Selection operators • Filter items according to their content – Elaboration operators • Projection – Extracts a part of the content of an item • Renaming – Changes the name of a field in languages based on records or tuples • Present in all languages • Defined as primitive operators in imperative languages • Declarative languages inherit selection, projection, and renaming from relational algebra • Example: Select RStream (I.Price as HighPrice) From Items[Rows 1] as I Where I.Price > 100 (projection: I.Price; renaming: as HighPrice; selection: Where I.Price > 100)
  80. Language Model • Single-item operators (selection; elaboration: projection and renaming) in pattern-based languages – Selection inside the condition part (pattern) – Elaboration as part of the action • Example: Define ExpensiveItem (highPrice: double) From Item(price>100) Where highPrice = price (selection: price>100; projection and renaming: highPrice = price)
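Both examples apply the same three single-item operators. The Python sketch below applies them item by item to a plain list, ignoring windows and the RStream operator. The field names come from the slides, while the function name and the sample items are hypothetical.

```python
def select_project_rename(items, min_price=100):
    """Selection (Price > min_price), projection (keep only Price) and
    renaming (Price -> HighPrice), applied item by item."""
    return [{"HighPrice": item["Price"]}          # projection + renaming
            for item in items
            if item["Price"] > min_price]         # selection

items = [{"Name": "pen", "Price": 2}, {"Name": "tv", "Price": 450}]
print(select_project_rename(items))  # [{'HighPrice': 450}]
```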
  81. Language Model • Logic operators – Conjunction – Disjunction – Repetition – Negation • Explicitly present in pattern-based languages • PADRES: (A & B) || (C & D) (conjunction: &; disjunction: ||)
  82. Language Model • Logic operators (conjunction, disjunction, repetition, negation) • Some logic operators are blocking – They express patterns whose validity cannot be decided in a bounded amount of time • E.g., negation – Used in conjunction with windows • Example: Define Fire() From Smoke(area=$a) and not Rain(area=$a) within 10 min from Smoke (negation: not Rain; window: within 10 min from Smoke)
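The window is what makes the negation decidable. For each Smoke event, the engine only has to inspect the preceding 10 minutes for a Rain event in the same area. A minimal Python sketch of that check follows. The tuple layout, timestamps in seconds, and the function name are assumptions.

```python
def fires_without_rain(events, window=600):
    """events: (time_in_seconds, kind, area) tuples sorted by time.
    A Smoke event fires unless a Rain event in the same area occurred in the
    `window` seconds before it."""
    fired = []
    for i, (t, kind, area) in enumerate(events):
        if kind != "Smoke":
            continue
        rained = any(k == "Rain" and a == area and t - window <= tr < t
                     for tr, k, a in events[:i])
        if not rained:
            fired.append((t, area))
    return fired

events = [(0, "Rain", "x"), (300, "Smoke", "x"),
          (300, "Smoke", "y"), (900, "Smoke", "x")]
print(fires_without_rain(events))  # [(300, 'y'), (900, 'x')]
```

Without the bound, the absence of Rain could never be confirmed, because a later Rain event might always arrive.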
  83. Language Model • Logic operators (conjunction, disjunction, repetition, negation) • Traditionally, logic operators were not explicitly offered by declarative and imperative languages • However, they could be expressed as transformations of input flows • Example: Select IStream (F1.A, F2.B) From F1 [Rows 10], F2 [Rows 20] (conjunction of A and B)
  84. Language Model • Sequences – Similar to logic operators – Based on timing relations among items • Present in almost all pattern-based languages • Example: Define Fire() From Smoke(area=$a) and last Temp(area=$a and value>45) within 5 min. from Smoke (time-bounded sequence)
  85. Language Model • Sequences – Similar to logic operators – Based on timing relations among items • Traditionally, transforming languages did not provide sequences explicitly • They could be expressed with an explicit reference to timestamps – If present inside items • Example: Select IStream (F1.A, F2.B) From F1 [Rows 10], F2 [Rows 20] Where F1.timestamp < F2.timestamp (imposes timestamp order)
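The two CQL queries above (the conjunction and its timestamp-ordered variant) can be mimicked by a join over count-based windows. This sketch evaluates the join once, on the final window contents, and ignores IStream's incremental semantics. The function name and the sample items are hypothetical.

```python
from collections import deque

def join_windows(f1, f2, rows1=10, rows2=20, ordered=False):
    """Evaluate, at one instant, a CQL-like join of F1[Rows rows1] and
    F2[Rows rows2]. Without a predicate the join expresses the conjunction
    of A and B; with ordered=True it keeps only pairs where the F1 item
    precedes the F2 item (a sequence)."""
    w1 = deque(f1, maxlen=rows1)   # last rows1 items of F1
    w2 = deque(f2, maxlen=rows2)   # last rows2 items of F2
    return [(x["A"], y["B"]) for x in w1 for y in w2
            if not ordered or x["timestamp"] < y["timestamp"]]

f1 = [{"A": "a1", "timestamp": 1}, {"A": "a2", "timestamp": 5}]
f2 = [{"B": "b1", "timestamp": 3}]
print(join_windows(f1, f2))                # [('a1', 'b1'), ('a2', 'b1')]
print(join_windows(f1, f2, ordered=True))  # [('a1', 'b1')]
```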
  86. Language Model • Iterations – Express possibly unbounded sequences of items… – …satisfying an iterating condition • Implicitly define an ordering among items • SASE+: PATTERN SEQ(Alert a, Shipment+ b[ ]) WHERE skip_till_any_match(a, b[ ]) { a.type = 'contaminated' and b[1].from = a.site and b[i].from = b[i-1].to } WITHIN 3 hours (iteration: Kleene +)
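Below is a rough Python reading of the SASE+ query. Starting from the contaminated site, it follows every chain of shipments in which each shipment leaves from where the previous one arrived. Skip-till-any-match means any intermediate shipment may be skipped. This is not SASE+'s evaluation algorithm. The dictionary fields mirror the query, but the data, the function name, and the choice to report every chain found (not only maximal ones) are assumptions.

```python
def contamination_chains(alert, shipments, horizon=3 * 3600):
    """Enumerate shipment chains matching a SASE+-like Kleene+ pattern under
    skip-till-any-match: the first shipment leaves the contaminated site,
    each next one leaves from where the previous one arrived, and all occur
    within `horizon` seconds after the alert."""
    if alert["type"] != "contaminated":
        return []
    candidates = sorted((s for s in shipments
                         if alert["time"] < s["time"] <= alert["time"] + horizon),
                        key=lambda s: s["time"])
    matches = []

    def extend(chain, next_index):
        matches.append([s["id"] for s in chain])   # report each chain found
        for j in range(next_index, len(candidates)):
            if candidates[j]["from"] == chain[-1]["to"]:
                extend(chain + [candidates[j]], j + 1)   # items in between are skipped

    for i, s in enumerate(candidates):
        if s["from"] == alert["site"]:
            extend([s], i + 1)
    return matches

alert = {"type": "contaminated", "site": "P", "time": 0}
shipments = [
    {"id": "s1", "from": "P", "to": "Q", "time": 10},
    {"id": "s2", "from": "Q", "to": "R", "time": 20},
    {"id": "s3", "from": "Q", "to": "S", "time": 30},
    {"id": "s4", "from": "X", "to": "Y", "time": 40},
]
print(contamination_chains(alert, shipments))
# [['s1'], ['s1', 's2'], ['s1', 's3']]
```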
  87. Language Model • Logic operators, sequences, and iterations were traditionally not offered by transforming languages • And now? – Current trend: • Embed patterns inside declarative languages • Especially adopted in commercial systems • Esper: Select A.price From pattern [every (A -> (B or C))] Where A.price > 100
  88. Language Model • Windows – Kind: • Logical (time-based) • Physical (count-based) • User-defined • Logical: Select IStream(Count(*)) From F1[Range 1 Minute] • Physical: Select IStream(Count(*)) From F1[Rows 50 Slide 10]
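The two window kinds differ only in what bounds their content: elapsed time or number of items. Below is a minimal Python sketch evaluated at a single instant. The half-open interval (now - seconds, now] is an assumption about boundary handling, and the function names and items are hypothetical.

```python
def range_window(items, now, seconds):
    """Logical (time-based) window: items with timestamp in (now - seconds, now]."""
    return [item for item in items if now - seconds < item[0] <= now]

def rows_window(items, n):
    """Physical (count-based) window: the last n items received."""
    return items[-n:] if n > 0 else []

# (timestamp in seconds, value)
items = [(0, "a"), (30, "b"), (61, "c"), (90, "d")]
print(len(range_window(items, now=90, seconds=60)))  # 2  (c, d)
print(len(rows_window(items, 3)))                    # 3  (b, c, d)
```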
  89. Language Model • Windows are used to limit the scope of blocking operators • They are generally available in declarative and imperative languages • They are not present in all pattern-based languages – Some of them do not include blocking operators – Some of them “embed” windows inside operators • Making them non-blocking • CEDR: EVENT Test-Rule WHEN UNLESS(A, B, 12 hours) WHERE A.a < B.b
  90. Language Model • Window movement – Fixed: do not move at all – Landmark: have a fixed lower bound, while the upper bound advances every time a new information item enters the system • E.g., all items since 1/1/2013 – Sliding: have a fixed size; both lower and upper bounds advance when new items enter the system – Pane: both the lower and the upper bounds move by k elements, as k elements enter the system • k is smaller than the window size – Tumble: same as above • k is greater than or equal to the window size
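For count-based windows, the sliding, pane, and tumble movements differ only in the step k by which both bounds advance, while a landmark window keeps its lower bound fixed. The Python sketch below enumerates the windows over a finite list. A real engine would emit them incrementally as items arrive. The function names are hypothetical.

```python
def count_windows(items, size, k):
    """Windows of `size` items whose bounds advance by k items at a time:
    k == 1 -> sliding, 1 < k < size -> pane, k >= size -> tumble."""
    return [items[i:i + size] for i in range(0, len(items) - size + 1, k)]

def landmark_windows(items, start=0):
    """Fixed lower bound at `start`; the upper bound grows with each item."""
    return [items[start:i + 1] for i in range(start, len(items))]

data = list(range(6))
print(count_windows(data, 3, 1))  # [[0, 1, 2], [1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(count_windows(data, 3, 2))  # [[0, 1, 2], [2, 3, 4]]
print(count_windows(data, 3, 3))  # [[0, 1, 2], [3, 4, 5]]
print(landmark_windows([0, 1, 2]))  # [[0], [0, 1], [0, 1, 2]]
```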
  91. Language Model • Flow management operators – Required by declarative and imperative languages to merge, split, organize, and process incoming flows of information [Figure: flow management operators: Join; Bag operators (Union, Except, Intersect, Remove-duplicates); Duplicate; Group By; Order By; Flow Creation]
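Python's `collections.Counter` provides multiset arithmetic, which is a convenient way to illustrate the bag operators on the contents of two flows. The semantics shown are those of SQL's UNION ALL, EXCEPT ALL, and INTERSECT ALL. This is an assumption, since individual engines may define these operators differently. The sample bags are hypothetical.

```python
from collections import Counter

left = Counter(["a", "a", "b"])
right = Counter(["a", "c"])

union = left + right        # Union (bag semantics): a:3, b:1, c:1
except_ = left - right      # Except: a:1, b:1
intersect = left & right    # Intersect: a:1
distinct = set(left)        # Remove-duplicates: {'a', 'b'}

print(sorted(union.elements()))      # ['a', 'a', 'a', 'b', 'c']
print(sorted(except_.elements()))    # ['a', 'b']
print(sorted(intersect.elements()))  # ['a']
```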
  92. Language Model • Parameterization – Allows the binding of different information items based on their content – Offered implicitly by declarative and imperative languages • Through a combination of join and selection – Offered as an explicit operator in pattern-based languages • CQL/Stream: Select IStream (F1.A, F2.B) From F1 [Rows 10], F2 [Rows 20] Where F1.A > F2.B (Cartesian product followed by selection) • TESLA/T-Rex: Define Fire() From Smoke(area=$a) and last Temp(area=$a and value>45) within 5 min. from Smoke (explicit parameter $a)
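The two styles of parameterization compute the same bindings. The explicit parameter $a constrains the search directly, while the declarative form builds a Cartesian product and then filters it. Below is a minimal Python sketch comparing them. It ignores the time window and the "last" selection, and the function names and sample data are hypothetical.

```python
def explicit_parameter(smokes, temps):
    """TESLA-style: bind $a from each Smoke, then use it to constrain Temp."""
    result = []
    for smoke in smokes:
        a = smoke["area"]                      # $a
        result += [(a, t["value"]) for t in temps
                   if t["area"] == a and t["value"] > 45]
    return result

def product_then_selection(smokes, temps):
    """CQL-style: Cartesian product of the two inputs, then a selection."""
    product = [(s, t) for s in smokes for t in temps]
    return [(s["area"], t["value"]) for s, t in product
            if s["area"] == t["area"] and t["value"] > 45]

smokes = [{"area": "x"}, {"area": "y"}]
temps = [{"area": "x", "value": 50}, {"area": "y", "value": 40},
         {"area": "x", "value": 47}]
print(explicit_parameter(smokes, temps))  # [('x', 50), ('x', 47)]
```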
  93. Language Model • Aggregates – Scope: detection aggregates, production aggregates – Definition: predefined, user-defined • Detection aggregate: Define Fire(area: string, measuredTemp: double) From Smoke(area=$a) and 45 < Avg(Temp(area=$a).value within 5 min. from Smoke) Where area=Smoke.area and measuredTemp=Temp.value • Production aggregate: Define Fire(area: string, measuredTemp: double) From Smoke(area=$a) and last Temp(area=$a and value>45) within 5 min. from Smoke Where area=Smoke.area and measuredTemp=Avg(Temp(area=$a).value within 1 hour from Smoke)
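The two rules use an average in different places. In the first rule, the 5-minute average decides whether Fire is detected. In the second, a single Temp reading detects Fire and the 1-hour average only fills in measuredTemp. Below is a minimal Python sketch of that difference. The (time, area, value) layout, timestamps in seconds, and the function names are assumptions.

```python
from statistics import mean

def in_area(temps, area, start, end):
    """Values of Temp events in `area` with start <= time < end."""
    return [v for (time, a, v) in temps if a == area and start <= time < end]

def detection_aggregate(smoke_time, area, temps):
    """Rule 1: the average over the last 5 minutes decides whether Fire is detected."""
    recent = in_area(temps, area, smoke_time - 300, smoke_time)
    return bool(recent) and mean(recent) > 45

def production_aggregate(smoke_time, area, temps):
    """Rule 2: a Temp > 45 in the last 5 minutes detects Fire; the 1-hour
    average only computes measuredTemp in the produced event."""
    recent = in_area(temps, area, smoke_time - 300, smoke_time)
    if not any(v > 45 for v in recent):
        return None
    hour = in_area(temps, area, smoke_time - 3600, smoke_time)
    return {"area": area, "measuredTemp": mean(hour)}

temps = [(0, "x", 40), (3500, "x", 50), (3550, "x", 48)]  # (time in s, area, value)
print(detection_aggregate(3600, "x", temps))   # True (average of 50 and 48 is 49)
print(production_aggregate(3600, "x", temps))  # {'area': 'x', 'measuredTemp': 46}
```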
  94. Credits • These slides are partially based on "A modeling framework for DSMS and CEP" by G. Cugola and A. Margara, presented in the PhD course on "Stream and complex event processing" offered by Politecnico di Milano in 2015. – http://www.streamreasoning.org/courses/scep2015 7/10/2015 @manudellavalle - http://emanueledellavalle.org
  95. Semantic Approach to Big Data and Event Processing • Thank you! Any questions? • Emanuele Della Valle, DEIB - Politecnico di Milano, @manudellavalle, emanuele.dellavalle@polimi.it, http://emanueledellavalle.org
