Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
BigData
Semantic Approach to
Big Data and Event Processing
Stream	Reasoning:	
mastering		
the	velocity	and	variety	dimensi...
BigData
It's	a	streaming	world	…	
•  Off-shore	oil	operaIons	
•  Smart	CiIes	
•  Global	Contact	Center	
•  Social	networks	...
BigData
…	looking	for	reacIve	answers	…	
3	
•  What	is	the	expected	Ime	to	failure	when	that	
turbine's	barring	starts	to	...
BigData
…	and	more	conflicIng	requirements	1/8	
A	system	able	to	answer	those	queries	must	be	able	to		
•  handle	massive	d...
BigData
…	conflicIng	requirements	2/8	
A	system	able	to	answer	those	queries	must	be	able	to		
•  process	data	streams	on	t...
BigData
…	conflicIng	requirements	3/8	
A	system	able	to	answer	those	queries	must	be	able	to		
•  cope	with	heterogeneous	d...
BigData
…	conflicIng	requirements	4/8	
A	system	able	to	answer	those	queries	must	be	able	to		
•  cope	with	incomplete	data...
BigData
…	conflicIng	requirements	5/8	
A	system	able	to	answer	those	queries	must	be	able	to		
•  cope	with	noisy	data					...
BigData
…	conflicIng	requirements	6/8	
A	system	able	to	answer	those	queries	must	be	able	to		
•  provide	reac:ve	answers		...
BigData
…	conflicIng	requirements	7/8	
A	system	able	to	answer	those	queries	must	be	able	to		
•  support	fine-grained	infor...
BigData
…	conflicIng	requirements	8/8	
A	system	able	to	answer	those	queries	must	be	able	to		
•  integrate	complex	domain	...
BigData
Challenges	
A	system	able	to	answer	those	queries	must	be	able	to		
•  handle	massive	datasets 	 	 	 	 	 	 	x		
• ...
BigData
Challenges	
•  Volume	+	Velocity	+	Variety	=	hard	deal	
8/10/2015	 @manudellavalle		-		hBp://emanueledellavalle.or...
BigData
A	good	reason	to	embrace	it!	
•  Volume	+	Velocity	+	Variety	=>	high	value	
	
8/10/2015	 @manudellavalle		-		hBp:/...
BigData
From	challenges	to	opportuniIes	
•  Formally	data	streams	are	:		
–  unbounded	sequences	of	Ime-varying	data	eleme...
BigData
State-of-the-art:	DSMS	and	CEP		
•  A	paradigma:c	change!	
•  ConInuous	queries	registered	over	streams	that	
are	...
BigData
DSMS	and	CEP	vs.	requirements	
Requirement
DSMS
CEP
massive datasets
data streams
heterogeneous dataset
incomplete...
BigData
State of the art: Ontology Based Data Access	
•  Given	ontology	O	and	query	Q,	use	O	to	rewrite	Q	
as	Q’	so	that,	...
BigData
DSMS,	CEP	and	OBDA	vs.	requirements	
Requirement
DSMS
CEP
OBDA
massive datasets
data streams
heterogeneous dataset...
BigData
Stream	Reasoning	
•  Research	quesIon	
–  is	it	possible	to	make	sense	in	real	:me	of		
mul:ple,	heterogeneous,	gi...
BigData
Sub-research	quesIons	
1.  Is	it	possible	extend	the	Seman:c	Web	stack		
in	order	to	represent	heterogeneous	data	...
BigData
Sub-research	quesIons	
1.  Is	it	possible	extend	the	Seman:c	Web	stack		
in	order	to	represent	heterogeneous	data	...
BigData
Model	data	stream	at	semanIc	level					1/2		
•  State	of	the	art:	RDF	
–  It	allows	to	make	statements	about	resou...
BigData
Model	data	stream	at	semanIc	level					2/2		
•  State	of	the	art:	RDF	
•  ContribuIon:	RDF	Stream	
–  Unbound	sequ...
BigData
Adding	conInuous	semanIcs	to	SPARQL
Who	are	the	opinion	makers?	i.e.,	the	users	who	are	
likely	to	influence	the	be...
BigData
Adding	conInuous	semanIcs	to	SPARQL
Who	are	the	opinion	makers?	i.e.,	the	users	who	are	
likely	to	influence	the	be...
BigDatadiscusses	 discusses	 discusses	
discusses	 discusses	
discusses	
discusses	
Defining	conInuous	deducIve	reasoning	
...
BigData
Finding	
•  The	Seman:c	Web	stack	can	be	extended	so	to	
incorporate	streaming	data	as	a	first	class	ciIzen	
–  RDF...
BigData
Sub-research	quesIons	
1.  Is	it	possible	extend	the	Seman:c	Web	stack		
in	order	to	represent	heterogeneous	data	...
BigData
OpImize	querying	for	reacIve	answers	
•  C-SPARQL	engine	Ime	window-based	selecIon	outperforms							
							SPARQ...
BigData
Is	it	an	hard	problem?	
•  In	the	database	community	it	is	known	as	the		
incremental	view	maintenance	problem	
• ...
BigData
Is	it	an	hard	problem?	
1000	
	
	
	
100	
	
	
	
10	
	
	
	
0	
rematerialize	the	view	
aver	each	update	
0%								2%...
BigData
Is	it	an	hard	problem?	
1000	
	
	
	
100	
	
	
	
10	
	
	
	
0	
rematerialize	the	view	
aver	each	update	
0%								2%...
BigData
What	is	the	target	behaviour?	
1000	
	
	
	
100	
	
	
	
10	
	
	
	
0	
rematerialize	the	view	
aver	each	update	
incre...
BigData
ObservaIon	
•  DRed	works	with	random	inserIons	and	deleIons	
•  In	a	streaming	sebng,	when	a	triple	enters	the	wi...
BigData
OpImized	Stream	Reasoning	(IMaRS)	
•  Idea:		
–  add	an	expira:on	:me	to	each	triple	and		
–  use	an	hash	table	to...
BigData
OpImize	reasoning	for	reacIve	answers	
•  Incremental	Reasoning	on	RDF	streams	(IMaRS):	new	reasoning	
algorithm	o...
BigData
OpImize	reasoning	for	reacIve	answers	
•  comparison	of	the	average	Ime	needed	to	answer	
a	C-SPARQL	query,	when	2...
BigData
Finding	
•  Stream	Reasoning	task	is	feasible	and	the	very	nature	of	
streaming	data	offers	opportuniIes	to	op:mise...
BigData
Sub-research	quesIons	
1.  Is	it	possible	extend	the	Seman:c	Web	stack		
in	order	to	represent	heterogeneous	data	...
BigData
Cope	with	the	noisy	and	incomplete	data	
•  "Noise"	is	reduced	using	DSMS	techniques	
•  Deduc:ve	stream	reasoning...
BigData
Findings	
•  A	combina:on	of	deduc:ve	and	induc:ve	stream	
reasoning	techniques	can	cope	with	incomplete	and	
nois...
BigData
Sub-research	quesIons	
1.  Is	it	possible	extend	the	Seman:c	Web	stack		
in	order	to	represent	heterogeneous	data	...
BigData
PraIcal	cases	
•  10+	deployments	in	Sensor	Networks	&	Social	media	analyIcs,	e.g.					
BOTTARI
Winner of Semantic...
BigData
Findings	
•  There	are	applicaIon	domains	where	Stream	Reasoning	
offers	an	adequate	soluIon	
45	8/10/2015	 @manude...
BigData
Findings	
1.  The	Seman:c	Web	stack	can	be	extended	so	to	incorporate	
streaming	data	as	a	first	class	ciIzen	
–  R...
BigData
Open	issues	
1.  The	Seman:c	Web	stack	can	be	extended		
–  "NavigaIng	the	Chasm	between	the	Scylla	of	PracIcal	Ap...
BigData
AdverIsements	:-P	
•  Check	out	my	PhD	thesis	
– hBp://dare.ubvu.vu.nl/handle/1871/53293		
– Chapter	1:	IntroducIo...
BigData
Semantic Approach to
Big Data and Event Processing
Thank	you!	
Any	QuesIon?	
Emanuele	Della	Valle	
DEIB	-	Politecn...
Upcoming SlideShare
Loading in …5
×

Stream Reasoning: mastering the velocity and variety dimensions of Big Data at once

223 views

Published on

Stream Reasoning: mastering the velocity and variety dimensions of Big Data at once
Prof Emanuele Della Valle - DEIB Politecnico di Milano

Published in: Data & Analytics
  • Be the first to comment

  • Be the first to like this

Stream Reasoning: mastering the velocity and variety dimensions of Big Data at once

  1. 1. BigData Semantic Approach to Big Data and Event Processing Stream Reasoning: mastering the velocity and variety dimensions of Big Data at once Emanuele Della Valle DEIB - Politecnico di Milano @manudellavalle emanuele.dellavalle@polimi.it hBp://emanueledellavalle.org
  2. 2. BigData It's a streaming world … •  Off-shore oil operaIons •  Smart CiIes •  Global Contact Center •  Social networks •  Generate data streams! 2 E. Della Valle, S. Ceri, F. van Harmelen, D. Fensel It's a Streaming World! Reasoning upon Rapidly Changing Informa:on. IEEE Intelligent Systems 24(6): 83-89 (2009) 8/10/2015 @manudellavalle - hBp://emanueledellavalle.org
  3. 3. BigData … looking for reacIve answers … 3 •  What is the expected Ime to failure when that turbine's barring starts to vibrate as detected in the last 10 minutes? •  Is public transportaIon where the people are? •  Who are the best available agents to route all these unexpected contacts about the tariff plan launched yesterday? •  Who is driving the discussion about the top 10 emerging topics ? •  Require conInuous processing and reacIve answer 8/10/2015 @manudellavalle - hBp://emanueledellavalle.org
  4. 4. BigData … and more conflicIng requirements 1/8 A system able to answer those queries must be able to •  handle massive datasets –  A typical oil producIon placorm is equipped with about 400.000 sensors –  Telecom data is the most pervasive data source in urban are, in Milano there are 1.8 million mobile users –  A global contact centre of a Telecom operator counts 500 millions of clients –  Facebook alone has 1.1 billion of acIve users 4 8/10/2015 @manudellavalle - hBp://emanueledellavalle.org
  5. 5. BigData … conflicIng requirements 2/8 A system able to answer those queries must be able to •  process data streams on the fly –  The sensors on typical oil producIon placorm generates 10,000 observaIons per minute with peaks of 100,000 o/m –  The mobile users in Milano generates 20,000 call/sms/data connecIons per minute with peaks of 80,000 c/m –  A global contact centre receives 10,000 contacts per minute with peaks of 30,000 c/m –  Facebook, as of May 2013, observes 3 millions "I like" per minute 5 8/10/2015 @manudellavalle - hBp://emanueledellavalle.org
  6. 6. BigData … conflicIng requirements 3/8 A system able to answer those queries must be able to •  cope with heterogeneous dataset –  The sensors on typical oil producIon have been deployed over 10 years by 10s of different producers –  Tens of data sources are normally needed to make sense of an urban phenomena –  A global contact centre consists in 100s of offices owned by different subsidiary companies engaged yearly –  Each social network has its own data model, APIs, … 6 8/10/2015 @manudellavalle - hBp://emanueledellavalle.org
  7. 7. BigData … conflicIng requirements 4/8 A system able to answer those queries must be able to •  cope with incomplete data –  10s of sensors and networking links broke down daily –  Coverage is incomplete –  Only standard cases are covered by fully machine processable data records 100s of contacts per minute are manage ad-hoc –  Conversa:ons happen outside the social networks, too! 7 8/10/2015 @manudellavalle - hBp://emanueledellavalle.org
  8. 8. BigData … conflicIng requirements 5/8 A system able to answer those queries must be able to •  cope with noisy data –  Sensor out-of-opera:ng range –  Faulty sensors –  Agents misunderstand, get :red, … –  Irony, sarcasm, … 8 8/10/2015 @manudellavalle - hBp://emanueledellavalle.org
  9. 9. BigData … conflicIng requirements 6/8 A system able to answer those queries must be able to •  provide reac:ve answers –  detecIon of dangerous situaIons must occur within minutes –  recommendaIons to ciIzens must be performed in few seconds –  rouIng a contact through each step of the decision tree must take less than a second –  Search autocompleIng may need to be updated every few minutes 9 8/10/2015 @manudellavalle - hBp://emanueledellavalle.org
  10. 10. BigData … conflicIng requirements 7/8 A system able to answer those queries must be able to •  support fine-grained informa:on access –  IdenIfy a turbine among thousands –  Locate a bus among thousands –  Contact an agent among thousands –  IdenIfy an opinion maker among thousands of influencers for a topic 10 8/10/2015 @manudellavalle - hBp://emanueledellavalle.org
  11. 11. BigData … conflicIng requirements 8/8 A system able to answer those queries must be able to •  integrate complex domain models of –  opera:onal and control process –  various city aspects –  contact management, contract types, agent skills, contactor profiles, … –  topics, user profiles, … 11 8/10/2015 @manudellavalle - hBp://emanueledellavalle.org
  12. 12. BigData Challenges A system able to answer those queries must be able to •  handle massive datasets x •  process data streams on the fly x •  cope with heterogeneous datasets x •  cope with incomplete data x x •  cope with noisy data x •  provide reac:ve answers x •  support fine-grained access x x •  integrate complex domain models x 12 Volume' Velocity' Variety' Veracity' In Big Data terms 8/10/2015 @manudellavalle - hBp://emanueledellavalle.org
  13. 13. BigData Challenges •  Volume + Velocity + Variety = hard deal 8/10/2015 @manudellavalle - hBp://emanueledellavalle.org 13 Volume months days hours min. sec. ms. Volume ZB EB PB TB GB MB KB Variety
  14. 14. BigData A good reason to embrace it! •  Volume + Velocity + Variety => high value 8/10/2015 @manudellavalle - hBp://emanueledellavalle.org 14 Value ms. sec. min. hours days months years velocity Variety
  15. 15. BigData From challenges to opportuniIes •  Formally data streams are : –  unbounded sequences of Ime-varying data elements •  Less formally, in many applicaIon domains, they are: –  a “conInuous” flow of informaIon –  where recent informa:on is more relevant as it describes the current state of a dynamic system •  OpportuniIes –  Forget old enough informa:on –  Exploit the implicit ordering (by recency) in the data time 15 8/10/2015 @manudellavalle - hBp://emanueledellavalle.org
  16. 16. BigData State-of-the-art: DSMS and CEP •  A paradigma:c change! •  ConInuous queries registered over streams that are observed trough windows window input streams streams of answerRegistered ConInuous Query Dynamic System 16 8/10/2015 @manudellavalle - hBp://emanueledellavalle.org
  17. 17. BigData DSMS and CEP vs. requirements Requirement DSMS CEP massive datasets data streams heterogeneous dataset incomplete data noisy data reactive answers fine-grained information access complex domain models 17 ✗ ✗ ✗ 8/10/2015 @manudellavalle - hBp://emanueledellavalle.org
  18. 18. BigData State of the art: Ontology Based Data Access •  Given ontology O and query Q, use O to rewrite Q as Q’ so that, for any set of ground facts A contained in mulIple databases: –  answer(Q,O,A) = answer(Q’,!,A) The answer of the query Q using the ontology O for any set of ground facts A is equal to answer of a query Q’ without considering the ontology O •  Use mapping M to map Q’ to mulIple SQL queries to the various databases Rewrite O Q Q’ Map SQL M answer A 8/10/2015 @manudellavalle - hBp://emanueledellavalle.org 18
  19. 19. BigData DSMS, CEP and OBDA vs. requirements Requirement DSMS CEP OBDA massive datasets data streams heterogeneous dataset incomplete data noisy data reactive answers fine-grained information access complex domain models 19 ✗ ✗ ✗ ✗ ✗ ✗ 8/10/2015 @manudellavalle - hBp://emanueledellavalle.org
  20. 20. BigData Stream Reasoning •  Research quesIon –  is it possible to make sense in real :me of mul:ple, heterogeneous, gigan:c and inevitably noisy and incomplete data streams in order to support the decision processes of extremely large numbers of concurrent users? •  Proposed approach 20 Complexity Raw Stream Processing SemanIc Streams DL-Lite DL AbstracIon SelecIon InterpretaIon Reasoning Querying Re-wriIng Change Frequency PTIME NEXPTIME 104 Hz 1 Hz Complexity vs. Dynamics AC0 H. Stuckenschmidt, S. Ceri, E. Della Valle, F. van Harmelen: Towards Expressive Stream Reasoning. Proceedings of the Dagstuhl Seminar on SemanIc Aspects of Sensor Networks, 2010. 8/10/2015 @manudellavalle - hBp://emanueledellavalle.org
  21. 21. BigData Sub-research quesIons 1.  Is it possible extend the Seman:c Web stack in order to represent heterogeneous data streams, conInuous queries, and conInuous reasoning tasks? 2.  Does the ordered nature of data streams and the possibility to forget old enough informaIon allow to op:mize con:nuous querying and con:nuous reasoning tasks so to provide reac:ve answers to large number of concurrent users without forsaking correctness or completeness? 3.  Can SemanIc Web and Machine Learning technologies be jointly employed to cope with the noisy and incomplete nature of data streams? 4.  Are there prac:cal cases where processing data stream at semanIc level is the best choice? 21 8/10/2015 @manudellavalle - hBp://emanueledellavalle.org
  22. 22. BigData Sub-research quesIons 1.  Is it possible extend the Seman:c Web stack in order to represent heterogeneous data streams, conInuous queries, and conInuous reasoning tasks? 2.  Does the ordered nature of data streams and the possibility to forget old enough informaIon allow to op:mize con:nuous querying and con:nuous reasoning tasks so to provide reac:ve answers to large number of concurrent users without forsaking correctness or completeness? 3.  Can SemanIc Web and Machine Learning technologies be jointly employed to cope with the noisy and incomplete nature of data streams? 4.  Are there prac:cal cases where processing data stream at semanIc level is the best choice? 22 8/10/2015 @manudellavalle - hBp://emanueledellavalle.org
  23. 23. BigData Model data stream at semanIc level 1/2 •  State of the art: RDF –  It allows to make statements about resources in the form of subject-predicate-object expressions •  In RDF terminology triples •  E.g. @BarakObama posts "Four more years" –  A collecIon of RDF statements represents a labelled, directed graph •  In RDF terminology a graph •  E.g., the tweet above by Barak Obama is connected to –  800,000+ twiBer user profiles via retweets –  300,000+ twiBer user profiles favorite –  … •  ContribuIon: RDF stream subject predicate object 23 8/10/2015 @manudellavalle - hBp://emanueledellavalle.org
  24. 24. BigData Model data stream at semanIc level 2/2 •  State of the art: RDF •  ContribuIon: RDF Stream –  Unbound sequence of :me-varying triples –  each represented by a pair made of an RDF triple and its Imestamp … @BarakObama posts "Four more years", 8:16PM 6 Nov 2012 @Alice posts "RT: Four more years", 8:17PM 6 Nov 2012 … D.F. Barbieri, D. Braga, S. Ceri, E. Della Valle, M. Grossniklaus: Querying RDF streams with C-SPARQL. SIGMOD Record 39(1): 20-26 (2010) 24 subject predicate object timestamp 8/10/2015 @manudellavalle - hBp://emanueledellavalle.org
  25. 25. BigData Adding conInuous semanIcs to SPARQL Who are the opinion makers? i.e., the users who are likely to influence the behavior their followers REGISTER STREAM OpinionMakers COMPUTED EVERY 5m AS CONSTRUCT { ?opinionMaker sd:about ?resource } FROM STREAM <http://…> [RANGE 30m STEP 5m] WHERE { ?opinionMaker ?opinion ?resource . ?follower sioc:follows ?opinionMaker. ?follower ?opinion ?resource. FILTER ( cs:timestamp(?follower) > cs:timestamp(?opinionMaker) ) } HAVING ( COUNT(DISTINCT ?follower) > 3 ) 25 8/10/2015 @manudellavalle - hBp://emanueledellavalle.org
  26. 26. BigData Adding conInuous semanIcs to SPARQL Who are the opinion makers? i.e., the users who are likely to influence the behavior their followers REGISTER STREAM OpinionMakers COMPUTED EVERY 5m AS CONSTRUCT { ?opinionMaker sd:about ?resource } FROM STREAM <http://…> [RANGE 30m STEP 5m] WHERE { ?opinionMaker ?opinion ?resource . ?follower sioc:follows ?opinionMaker. ?follower ?opinion ?resource. FILTER ( cs:timestamp(?follower) > cs:timestamp(?opinionMaker) ) } HAVING ( COUNT(DISTINCT ?follower) > 3 ) Query registra:on (for con:nuous execu:on) FROM STREAM clause WINDOW RDF Stream added as new ouput format Buil:n to access :mestamps 26 8/10/2015 @manudellavalle - hBp://emanueledellavalle.org
  27. 27. BigDatadiscusses discusses discusses discusses discusses discusses discusses Defining conInuous deducIve reasoning What impact has been my micropost p1 creating in the last hour? Let’s count the number of microposts that discuss it … REGISTER STREAM ImpactMeter AS SELECT (count(?p) AS ?impact) FROM STREAM <http://…/fb> [RANGE 60m STEP 10m] WHERE { :Alice posts [ sr:discusses ?p ] } p1 p3 p5 p8 p2 p4 p7 p6 7! 27 Transitive property Alice posts p1 . 8/10/2015 @manudellavalle - hBp://emanueledellavalle.org
  28. 28. BigData Finding •  The Seman:c Web stack can be extended so to incorporate streaming data as a first class ciIzen –  RDF stream data model –  Con:nuous SPARQL syntax and semanIcs –  Con:nuous deduc:ve reasoning semanIcs 28 8/10/2015 @manudellavalle - hBp://emanueledellavalle.org
  29. 29. BigData Sub-research quesIons 1.  Is it possible extend the Seman:c Web stack in order to represent heterogeneous data streams, conInuous queries, and conInuous reasoning tasks? 2.  Does the ordered nature of data streams and the possibility to forget old enough informaIon allow to op:mize con:nuous querying and con:nuous reasoning tasks so to provide reac:ve answers to large number of concurrent users without forsaking correctness or completeness? 3.  Can SemanIc Web and Machine Learning technologies be jointly employed to cope with the noisy and incomplete nature of data streams? 4.  Are there prac:cal cases where processing data stream at semanIc level is the best choice? 29 8/10/2015 @manudellavalle - hBp://emanueledellavalle.org
  30. 30. BigData OpImize querying for reacIve answers •  C-SPARQL engine Ime window-based selecIon outperforms SPARQL filter-based selecIon (Jena-ARQ) D. Barbieri, D. Braga, S. Ceri, E. Della Valle, Y. Huang, V. Tresp, A.Reunger, H. Wermser: DeducIve and InducIve Stream Reasoning for SemanIc Social Media AnalyIcs IEEE Intelligent Systems, 30 Aug. 2010. 30 Our In-memory RDF stream processing engine 8/10/2015 @manudellavalle - hBp://emanueledellavalle.org
  31. 31. BigData Is it an hard problem? •  In the database community it is known as the incremental view maintenance problem •  The state of the art soluIon is the DRed algorithm –  Overes<ma<on of dele<on: OveresImates deleIons by compuIng all direct consequences of a deleIon. –  Rederiva<on: Prunes those esImated deleIons for which alternaIve derivaIons (via some other facts in the program) exist. –  Inser<on: Adds the new derivaIons that are consequences of inserIons •  DeleIon are twice as expensive as inserIons A B C E A C A B C E A C 31 8/10/2015 @manudellavalle - hBp://emanueledellavalle.org
  32. 32. BigData Is it an hard problem? 1000 100 10 0 rematerialize the view aver each update 0% 2% 4% 6% 8% 10% 12% 14% % of deleIons w.r.t the size of the DB ms Break-even point In a normal DB incremental view maintenance is convenient in most of the cases: •  Lot of inserts, few deletions •  Small % of deletions w.r.t. the size of the DB incremental view maintenance 8/10/2015 @manudellavalle - hBp://emanueledellavalle.org 32
  33. 33. BigData Is it an hard problem? 1000 100 10 0 rematerialize the view aver each update 0% 2% 4% 6% 8% 10% 12% 14% % of deleIons w.r.t the content of the window ms Break-even point In a streaming setting •  The number of inserts are equals to the number of deletions •  the data exiting the windows causes large % of deletions w.r.t. the size of the window incremental view maintenance 8/10/2015 @manudellavalle - hBp://emanueledellavalle.org 33
  34. 34. BigData What is the target behaviour? 1000 100 10 0 rematerialize the view aver each update incremental view maintenance 0% 2% 4% 6% 8% 10% 12% 14% ms Break-even point Target behaviour % of deleIons w.r.t the content of the window 8/10/2015 @manudellavalle - hBp://emanueledellavalle.org 34
  35. 35. BigData ObservaIon •  DRed works with random inserIons and deleIons •  In a streaming sebng, when a triple enters the window, given the size of the window, the reasoner knows already when it will be deleted! •  E.g., –  if the window is 40 minutes long, and, –  it is 10:00, the triple(s) entering now –  will exit on 10:40. •  Conclusion –  dele:ons are predictable Time Enter window Exit window Explicitly in window Infer win 10:00 A!B 10:10 B!C 10:20 A!E 10:30 E!C 10:40 A!B 10:50 B!C 11:00 A!E A B A B C A A B C E A A B C E A A C E A A B C E A C E 35 8/10/2015 @manudellavalle - hBp://emanueledellavalle.org
  36. 36. BigData OpImized Stream Reasoning (IMaRS) •  Idea: –  add an expira:on :me to each triple and –  use an hash table to index triples by their expiraIon Ime •  The algorithm 1.  deletes expired triples 2.  Adds the new derivaIons that are consequences of inserIons annota:ng each inferred triple with an expira:on :me (the min of those of the triple it is derived from), and 3.  when mul:ple deriva:ons occur, for each mulIple derivaIon, it keeps the max expiraIon Ime. 36 8/10/2015 @manudellavalle - hBp://emanueledellavalle.org
  37. 37. BigData OpImize reasoning for reacIve answers •  Incremental Reasoning on RDF streams (IMaRS): new reasoning algorithm opImized for reacIve query answering D.F. Barbieri, D. Braga, S.Ceri, E. Della Valle, M. Grossniklaus: Incremental Reasoning on Streams and Rich Background Knowledge. ESWC (1) 2010: 1-15 D. Dell'Aglio, E. Della Valle: Incremental Reasoning on RDF Streams. In A.Harth, K.Hose, R.Schenkel (Eds.) Linked Data Management, CRC Press 2014, ISBN 9781466582408 37 !  Re-materialize after each window slide !  Use DRed !  IMaRS % of deletions w.r.t. the content of the window 8/10/2015 @manudellavalle - hBp://emanueledellavalle.org
  38. 38. BigData OpImize reasoning for reacIve answers •  comparison of the average Ime needed to answer a C-SPARQL query, when 2% of the content exits the window each Ime it slides, using –  A backward reasoner on the window content –  DRed + standard SPARQL on the materializaIon –  IMaRS + standard SPARQL on the materializaIon 38 8/10/2015 @manudellavalle - hBp://emanueledellavalle.org
  39. 39. BigData Finding •  Stream Reasoning task is feasible and the very nature of streaming data offers opportuniIes to op:mise reasoning tasks where data is ordered by recency and can be forgoBen aver a while –  C-SPARQL Engine prototype –  IMaRS conInuous incremental reasoning algorithm 39 8/10/2015 @manudellavalle - hBp://emanueledellavalle.org
  40. 40. BigData Sub-research quesIons 1.  Is it possible extend the Seman:c Web stack in order to represent heterogeneous data streams, conInuous queries, and conInuous reasoning tasks? 2.  Does the ordered nature of data streams and the possibility to forget old enough informaIon allow to op:mize con:nuous querying and con:nuous reasoning tasks so to provide reac:ve answers to large number of concurrent users without forsaking correctness or completeness? 3.  Can SemanIc Web and Machine Learning technologies be jointly employed to cope with the noisy and incomplete nature of data streams? 4.  Are there prac:cal cases where processing data stream at semanIc level is the best choice? 40 8/10/2015 @manudellavalle - hBp://emanueledellavalle.org
  41. 41. BigData Cope with the noisy and incomplete data •  "Noise" is reduced using DSMS techniques •  Deduc:ve stream reasoning copes with incompleteness deducing implicit facts •  Induc:ve stream reasoning copes with "irrepairable" incompleteness inducing missing facts D.F. Barbieri, D. Braga, S. Ceri, E. Della Valle, Y. Huang, V. Tresp, A. Reunger, H. Wermser: DeducIve and InducIve Stream Reasoning for SemanIc Social Media AnalyIcs. IEEE Intelligent Systems 25(6): 32-41 (2010) 41 8/10/2015 @manudellavalle - hBp://emanueledellavalle.org
  42. 42. BigData Findings •  A combina:on of deduc:ve and induc:ve stream reasoning techniques can cope with incomplete and noisy data 42 8/10/2015 @manudellavalle - hBp://emanueledellavalle.org
  43. 43. BigData Sub-research quesIons 1.  Is it possible extend the Seman:c Web stack in order to represent heterogeneous data streams, conInuous queries, and conInuous reasoning tasks? 2.  Does the ordered nature of data streams and the possibility to forget old enough informaIon allow to op:mize con:nuous querying and con:nuous reasoning tasks so to provide reac:ve answers to large number of concurrent users without forsaking correctness or completeness? 3.  Can SemanIc Web and Machine Learning technologies be jointly employed to cope with the noisy and incomplete nature of data streams? 4.  Are there prac:cal cases where processing data stream at semanIc level is the best choice? 43 8/10/2015 @manudellavalle - hBp://emanueledellavalle.org
  44. 44. BigData PraIcal cases •  10+ deployments in Sensor Networks & Social media analyIcs, e.g. BOTTARI Winner of Semantic Web Challenge 2011 City Data Fusion Winner of IBM faculty award 2013 M. Balduini, I. Celino, D. Dell’Aglio, E. Della Valle, Y. Huang, T. Lee, S.-H. Kim, V. Tresp: BOTTARI: An augmented reality mobile applicaIon to deliver personalized and locaIon-based recommendaIons by conInuous analysis of social media streams. J. Web Sem. 16: 33-41 (2012) 44 Social Listener M.Balduini, E.Della Valle, M.Azzi, R.Larcher, F.Antonelli, and P.Ciuccarelli: CitySensing: Fusing City Data for Visual Storytelling. IEEE MulIMedia 22(3): 44-53 (2015) 8/10/2015 @manudellavalle - hBp://emanueledellavalle.org
  45. 45. BigData Findings •  There are applicaIon domains where Stream Reasoning offers an adequate soluIon 45 8/10/2015 @manudellavalle - hBp://emanueledellavalle.org
  46. 46. BigData Findings 1.  The Seman:c Web stack can be extended so to incorporate streaming data as a first class ciIzen –  RDF stream data model –  Con:nuous SPARQL syntax and semanIcs –  Con:nuous deduc:ve reasoning semanIcs 2.  Stream Reasoning task is feasible and the very nature of streaming data offers opportuniIes to op:mise reasoning tasks where data is ordered by recency and can be forgoBen aver a while –  IMaRS conInuous incremental reasoning algorithm –  C-SPARQL Engine prototype 3.  A combinaIon of deduc:ve and induc:ve stream reasoning techniques can cope with incomplete and noisy data 4.  There are applica:on domains where Stream Reasoning offers an adequate soluIon 46 8/10/2015 @manudellavalle - hBp://emanueledellavalle.org
  47. 47. BigData Open issues 1.  The Seman:c Web stack can be extended –  "NavigaIng the Chasm between the Scylla of PracIcal ApplicaIons and the Charybdis of TheoreIcal Approaches" A. Bernstein, 2015 2.  Stream Reasoning task is feasible –  It's Ime to start removing assumpIons •  knowledge does not change •  background data does not change –  OBDA for SQL ≠ OBDA for conInuous querying 3.  Stream reasoning can cope with incomplete and noisy data –  Theory is needed! 4.  There are applica:on domains where Stream Reasoning offers an adequate soluIon –  Rigorous quanItaIve comparaIve research is needed 47 8/10/2015 @manudellavalle - hBp://emanueledellavalle.org
  48. 48. BigData AdverIsements :-P •  Check out my PhD thesis – hBp://dare.ubvu.vu.nl/handle/1871/53293 – Chapter 1: IntroducIon •  The content of this presentaIon – Chapter 8: conclusions •  A review of stream reasoning approaches updated in spring 2015 •  Put an "I like" to Stream Reasoning on Facebook – hBps://www.facebook.com/streamreasoning @manudellavalle - hBp://emanueledellavalle.org 488/10/2015
  49. 49. BigData Semantic Approach to Big Data and Event Processing Thank you! Any QuesIon? Emanuele Della Valle DEIB - Politecnico di Milano emanuele.dellavalle@polimi.it hBp://emanueledellavalle.org

×