Provenance in Databases
and Scientific Workflows
Bertram Ludäscher
ludaesch@illinois.edu
31st Brazilian Symposium on Databases
October 4-7, 2016, Salvador, Bahia
Outline of the Tutorial
A “Tour de Provenance”
• Part I: Provenance in Scientific Workflows
– Alta Vista: Provenance everywhere!
– Provenance & Scientific Workflows
– Provenance Models and Standards (not so much)
– Provenance Tools
• Example & Demo: YesWorkflow
• Part II: Provenance in Databases
– Foundations of provenance in databases
– Why-, How-, and Why-Not provenance
Provenance	@	SBBD'16
Types of Data Provenance
• Black-box
– know (next to) nothing at compile-time
– at runtime: keep some data lineage
– most provenance work in the workflow (WF) sense uses this
• White-box
– statically (compile-time) analyzable
– q(Y1,Y2) :- p(X1,X2), r(X1,Y1), s(X2,Y2)
– most provenance work in the database (DB) sense uses this
• Grey-box
– can “look inside” (some) black boxes
– … e.g. because they have subworkflows
– … or FP signatures: A :: t1, t2 → t3, t4
– … or semantic annotations (semantic types)
[Figure: example components f, A (ports t1, t2, t3, t4), and q (variables X1, X2, Y1, Y2)]
6th Stop: Provenance in Databases
• Some key questions:
– Why is tuple t in the answer to query q(D)?
– Which set of tuples L in D does t depend on? i.e., what is the lineage L of t?
– How was t derived from its lineage L?
• Also:
– Where in D do the values in t come from?
– Why is t’ not in q(D)?
• … fasten your seatbelts …
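To make the why/lineage questions concrete, here is a minimal Python sketch (our illustration, not code from the tutorial). It evaluates the white-box query q(Y1,Y2) :- p(X1,X2), r(X1,Y1), s(X2,Y2) by nested loops over a hypothetical toy database D and records, for each answer t, every witness (the set of input facts used by one derivation). The lineage L of t is the union of its witnesses. The data and the names `eval_with_witnesses` and `lineage` are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy database D (illustrative data, not from the tutorial).
D = {
    "p": {("x1", "x2"), ("x3", "x2")},
    "r": {("x1", "y1"), ("x3", "y1")},
    "s": {("x2", "y2")},
}


def eval_with_witnesses(db):
    """Evaluate q(Y1,Y2) :- p(X1,X2), r(X1,Y1), s(X2,Y2).

    Returns a dict mapping each answer tuple to the set of its witnesses;
    a witness is the frozenset of input facts used by one derivation.
    """
    witnesses = defaultdict(set)
    for x1, x2 in db["p"]:
        for rx1, y1 in db["r"]:
            if rx1 != x1:  # join condition on X1
                continue
            for sx2, y2 in db["s"]:
                if sx2 != x2:  # join condition on X2
                    continue
                used = frozenset({("p", (x1, x2)), ("r", (x1, y1)), ("s", (x2, y2))})
                witnesses[(y1, y2)].add(used)
    return dict(witnesses)


def lineage(witness_set):
    """Lineage L of a tuple: all input facts occurring in any of its witnesses."""
    return set().union(*witness_set)


why = eval_with_witnesses(D)
for t, ws in why.items():
    print(t, "derivations:", len(ws), "lineage:", sorted(lineage(ws)))
```

Here the single answer (y1, y2) has two derivations, and its lineage contains all five input facts. The “how” question goes one step further and records how the witnesses combine, which is what the provenance polynomials later in this part capture.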
Provenance in Databases
Land of many different provenance species:
Why? How? Where?
Later: Why-Not? How many? How long?
Provenance in Databases
(fine-grained, white-box)
Compare with:
Provenance in Scientific Workflows
• Some key questions:
– What is the lineage/trace T of a data product (output) y_i, where
(y_1, …, y_n) = execute(W, x, p)?
• … given a workflow/script W with inputs x and parameters p
• … i.e., find the subset of x, p, and (program slices of) W on which a specific y_i depends!
– How can we store and query the provenance (trace) graph effectively and efficiently?
• Regular Path Queries (RPQs), Lowest Common Ancestor (LCA)
• Temporal Query Languages (e.g. Past-Temporal Logic)
• other graph queries
– What is the difference between traces T1 and T2?
– Does the trace (retrospective provenance) match the workflow (prospective provenance)?
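The first question can be phrased as a graph query. Here is a minimal Python sketch, assuming a hypothetical trace stored as a dict in which each node lists the nodes it was directly derived from: data products point to the step that generated them, and steps point to the data they used. The lineage of y_i is then upstream reachability, i.e. the answer to a Kleene-plus regular path query over derivation edges. All node names and edges are invented for illustration.

```python
from collections import deque

# Hypothetical retrospective-provenance trace: node -> nodes it was derived from.
# Data products point to their generating step; steps point to the data they used.
TRACE = {
    "y1": ["step_B"],
    "y2": ["step_C"],
    "step_B": ["tmp", "p"],
    "step_C": ["x2"],
    "tmp": ["step_A"],
    "step_A": ["x1"],
}


def upstream(trace, node):
    """All nodes reachable from node by following derivation edges backwards
    (breadth-first search, i.e. a transitive-closure / Kleene-plus path query)."""
    seen, queue = set(), deque([node])
    while queue:
        for parent in trace.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


INPUTS_AND_PARAMS = {"x1", "x2", "p"}
lin = upstream(TRACE, "y1")
print(sorted(lin))                      # the trace slice that y1 depends on
print(sorted(lin & INPUTS_AND_PARAMS))  # the inputs/parameters y1 depends on
```

In this toy trace y1 does not depend on x2, so a change to x2 would not require re-running the steps that produced y1.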
Provenance in (Scientific) Workflows
(“Coarse-grained”, “Black-box”)
What people do with “provenance”
• Which ones are “workflows” and which are “databases”?
– Result validation
– Result debugging (science vs. workflow logic)
– Reproducibility and Repeatability
– Explanation (derivations, traces, proof trees)
– Runtime monitoring
• Profiling, benchmarking
– Performance Optimization (“smart re-run”)
– Fault-tolerance, crash-recovery
– Database view maintenance (e.g. data warehousing)
– Workflow design
Database Provenance: Some Pioneers …
Cui (PhD 2001), Widom:
TODS’00, VLDB’03
Database Provenance: Some Pioneers
Buneman et al.
ICDT 2001
(citations: 1000+)
Provenance Semirings: The Great Database Provenance Unification*!
TJ Green et al.: PODS’07, SIGMOD Record’12
*Restrictions apply: positive queries only …
7th Stop: Provenance Polynomials
One Semiring to Rule them all!
(Theory strikes!)
Green, Karvounarakis, Tannen. Provenance semirings, PODS, 2007
Example: Go from X to Y in 3 hops!
(a = CS, b = NCSA, c = iSchool)
• Database: hop(X,Y) := [Figure: graph with edges a→a (p), a→b (q), b→a (r), b→c (s)]
• Query: 3hop(X,Y) :- hop(X,Z1), hop(Z1,Z2), hop(Z2,Y).
Note: Cannot go from c to a in 3 hops!
[Figure: 3hop output graph with edges annotated ppp+pqr+qrp, ppq+qrq, pqs, ppr+qrr, rpq, rqs]
hop(a,a, p).
hop(a,b, q).
hop(b,a, r).
hop(b,c, s).
3hop(a,a, p^3+2pqr).
3hop(a,b, p^2q+q^2r).
…
3hop(a,c, pqs).
thop(S,T) :- hop(S,U), hop(U,V), hop(V,T).
Input hop(S,T):
hop(a,a).
hop(a,b).
hop(b,a).
hop(b,c).
Output thop(S,T):
thop(a,a).
thop(a,b).
thop(a,c).
thop(b,a).
thop(b,b).
thop(b,c).
[Figure: input graph and output graph over nodes a, b, c]
thop(S,T, P1*P2*P3) :- hop(S,U, P1), hop(U,V, P2), hop(V,T, P3).
[Figure: annotated input graph with edges a→a (p), a→b (q), b→a (r), b→c (s), and the annotated output graph (ppp+pqr+qrp, ppq+qrq, pqs, ppr+qrr, rpq, rqs)]
hop(a,a, p).
hop(a,b, q).
hop(b,a, r).
hop(b,c, s).
thop(a,a, p^3+2pqr).
thop(a,b, p^2q+q^2r).
thop(a,c, pqs).
thop(b,a, p^2r+r^2q).
thop(b,b, rpq).
thop(b,c, rqs).
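The annotated rule can be run directly. The Python sketch below uses one possible encoding, which is our assumption and not the tutorial's code: a polynomial in N[X] is a `Counter` that maps each monomial (a sorted tuple of variables) to its coefficient. A join multiplies the three edge annotations, and alternative derivations of the same thop(S,T) are added. It reproduces the annotations listed above.

```python
from collections import Counter
from itertools import product

# Annotated input relation hop(S, T, P), as on the slide.
HOP = {("a", "a"): "p", ("a", "b"): "q", ("b", "a"): "r", ("b", "c"): "s"}


def thop_polynomials(hop):
    """thop(S,T, P1*P2*P3) :- hop(S,U,P1), hop(U,V,P2), hop(V,T,P3).

    Each derivation contributes the monomial P1*P2*P3; derivations of the same
    (S, T) are summed. Monomials are sorted tuples, so p*q*r == q*r*p.
    """
    result = {}
    for (s, u), (u2, v), (v2, t) in product(hop, repeat=3):
        if u == u2 and v == v2:  # join conditions on U and V
            monomial = tuple(sorted((hop[(s, u)], hop[(u, v)], hop[(v, t)])))
            result.setdefault((s, t), Counter())[monomial] += 1
    return result


def show(poly):
    """Render e.g. {('p','p','p'): 1, ('p','q','r'): 2} as 'p^3 + 2pqr'."""
    terms = []
    for mono, coeff in sorted(poly.items()):
        powers = Counter(mono)
        body = "".join(v if k == 1 else f"{v}^{k}" for v, k in sorted(powers.items()))
        terms.append(body if coeff == 1 else f"{coeff}{body}")
    return " + ".join(terms)


for key, poly in sorted(thop_polynomials(HOP).items()):
    print(key, show(poly))
```

Because monomials are normalized by sorting, thop(b,a) prints as p^2r + qr^2, which is the same polynomial as the slide's p^2r + r^2q.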
Input: hop
Three-Hop Query: thop(S,T) :- hop(S,U), hop(U,V), hop(V,T).
Output: thop
[Figure: input graph and output graph over nodes a, b, c]
Annotated Input: hop
Rewritten Three-Hop Query:
thop(S,T, P) :- hop(S,U, P1), hop(U,V, P2), hop(V,T, P3), P = P1*P2*P3.
Annotated Output: thop
[Figure: annotated input graph (edges p, q, r, s) and annotated output graph (ppp+pqr+qrp, ppq+qrq, pqs, ppr+qrr, rpq, rqs)]
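The rewritten query uses “times” only to join tuples and “plus” only to combine alternative derivations, so the same rule can be evaluated in any commutative semiring. The Python sketch below is an illustration under stated assumptions: the `Semiring` class and the edge values (counts, costs, trust flags) are hypothetical. The counting semiring gives the number of 3-hop paths, the tropical semiring (min, +) gives the cheapest 3-hop path, and the Boolean semiring gives plain set semantics.

```python
from dataclasses import dataclass
from itertools import product
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    plus: Callable[[Any, Any], Any]   # combines alternative derivations
    times: Callable[[Any, Any], Any]  # combines joined tuples
    zero: Any                         # neutral element of plus


COUNTING = Semiring(lambda x, y: x + y, lambda x, y: x * y, 0)
TROPICAL = Semiring(min, lambda x, y: x + y, float("inf"))
BOOLEAN = Semiring(lambda x, y: x or y, lambda x, y: x and y, False)

EDGES = [("a", "a"), ("a", "b"), ("b", "a"), ("b", "c")]


def thop(hop, k):
    """thop(S,T,P) :- hop(S,U,P1), hop(U,V,P2), hop(V,T,P3), P = P1*P2*P3,
    with the P's of all derivations of (S,T) combined by k.plus."""
    out = {}
    for (s, u), (u2, v), (v2, t) in product(hop, repeat=3):
        if u == u2 and v == v2:
            p = k.times(k.times(hop[(s, u)], hop[(u, v)]), hop[(v, t)])
            out[(s, t)] = k.plus(out.get((s, t), k.zero), p)
    return out


counts = thop({e: 1 for e in EDGES}, COUNTING)              # every edge counted once
costs = thop(dict(zip(EDGES, [5, 1, 1, 2])), TROPICAL)      # hypothetical edge costs
trusted = thop(dict(zip(EDGES, [True, True, True, False])), BOOLEAN)  # b->c untrusted
print(counts[("a", "a")], costs[("a", "a")], trusted.get(("a", "c")))
```

These values agree with evaluating the polynomial p^3 + 2pqr of thop(a,a) in each semiring. With p = q = r = 1 it gives 1 + 2 = 3. In (min, +) with p = 5 and q = r = 1 it gives min(3·5, 5+1+1) = 7. This is the sense in which the polynomial is the most general annotation.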
Provenance Polynomials
“Mein Schatz!” (“My precious!”)
p^3 + 2pqr
p^3 + pqr    p + 2pqr
p + pqr
pqr
p + pqr
p
[Figure: annotated 3-hop output graph (ppp+pqr+qrp, ppq+qrq, pqs, ppr+qrr, rpq, rqs)]
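The coarser expressions on this slide can be read as forgetful maps applied to p^3 + 2pqr. This assumes the slide depicts the usual hierarchy of provenance semirings from Green et al.: drop coefficients, drop exponents, drop both, keep only the contributing variables (lineage), and finally apply absorption (p + pqr = p). Below is a small Python sketch of those maps. The encoding (a monomial as a tuple of (variable, exponent) pairs) and the function names are our own assumptions.

```python
from collections import Counter

# p^3 + 2pqr as {monomial: coefficient}; a monomial is a tuple of (variable, exponent).
POLY = {(("p", 3),): 1, (("p", 1), ("q", 1), ("r", 1)): 2}


def drop_coefficients(poly):
    """Keep each monomial once (all coefficients become 1)."""
    return {mono: 1 for mono in poly}


def drop_exponents(poly):
    """Set every exponent to 1, adding up monomials that become equal."""
    out = Counter()
    for mono, coeff in poly.items():
        out[tuple((v, 1) for v, _ in mono)] += coeff
    return dict(out)


def why_sets(poly):
    """Drop both: a set of witnesses, each witness a set of variables."""
    return {frozenset(v for v, _ in mono) for mono in poly}


def lineage(poly):
    """Every variable that occurs anywhere in the polynomial."""
    return set().union(*why_sets(poly))


def absorb(witnesses):
    """Keep minimal witnesses only: p + pqr = p because {p} is a subset of {p, q, r}."""
    return {w for w in witnesses if not any(o < w for o in witnesses)}


print(drop_coefficients(POLY))  # represents p^3 + pqr
print(drop_exponents(POLY))     # represents p + 2pqr
print(why_sets(POLY))           # represents p + pqr
print(sorted(lineage(POLY)))    # represents the variable set {p, q, r}
print(absorb(why_sets(POLY)))   # represents p
```

Each step loses information, which is why the full polynomial, from which all the others can be computed, sits at the top.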
8th Stop: The Negation & Why-Not Problem
• Provenance Semirings work well for:
– Positive Queries (e.g., RA+)
• Challenges: Handling of
– set difference (~ negation)
– Why-Not provenance
– Missing Answer provenance
• A fresh look at provenance!
• … using an old idea: Game semantics!
– for query evaluation
Query evaluation game
EDB: e(a,b), e(b,b)
tc(X,Y) :- e(X,Y).    # (1)--e(X,Y)-->(2)
tc(X,Y) :-            # (1)--exists:Z-->(3)
    e(X,Z),           # (3)-->(4)--e(X,Z)-->(5)
    tc(Z,Y).          # (3)--X:=Z-->(1)
[Figure: game diagram with positions 1–5 and moves e(X,Y), exists:Z, e(X,Z), X := Z, and its instantiated move graph with positions such as 1:(a,b), 3:(a,b,b), 4:(a,a)]
Provenance’12 @ Dagstuhl, with JanVdB, TJ Green
Flum, Kubierschky, Ludäscher, Total and partial well-founded Datalog coincide, ICDT-The-Bag-1997, Delphi, Greece
Eureka!
[Figure: left, game diagram for tc(X,Y) :- e(X,Y) and tc(X,Y) :- e(X,Z), tc(Z,Y); right, instantiated move graph for EDB e(a,b), e(b,b)]
Flum, Kubierschky, Ludäscher, Total and partial well-founded Datalog coincide, ICDT-The-Bag-1997, Delphi, Greece
Eureka moment:
1. query evaluation = evaluation game (argument about truth in a database)
2. provenance = winning strategies (justified/winning arguments)
9th Stop: A Game
[Figure: move graph over positions a, b, c, d, e, f, g, h, k, l, m, n]
Solving the Game
[Figure: the game graph, partially solved]
All successors won ⇒ position lost
Some successor lost ⇒ position won
Solving the Game
[Figure: the game graph, partially solved]
All leaves (dead-ends) are immediately lost!
Solving the Game
[Figure: the game graph, partially solved]
X is won if there exists a move to a lost Y
Solving the Game
[Figure: the game graph, partially solved]
X is lost if all moves lead to a won Y
Solving the Game
[Figure: the game graph, partially solved]
Repeat until no change ⇒ drawn positions remain
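The rules on the last few slides translate directly into an iterate-until-no-change procedure. Below is a minimal Python sketch. The move graph is hypothetical, because the slide's graph survives only as a picture, and the function name is illustrative.

```python
def solve(moves):
    """Label every position 'won', 'lost', or 'drawn' by iterating the slide's rules.

    moves: dict mapping a position to the list of positions reachable in one move.
    """
    positions = set(moves) | {y for ys in moves.values() for y in ys}
    status = {}
    changed = True
    while changed:  # repeat until no change
        changed = False
        for x in sorted(positions):
            if x in status:
                continue
            succ = moves.get(x, [])
            if any(status.get(y) == "lost" for y in succ):
                status[x] = "won"   # some successor lost => position won
                changed = True
            elif all(status.get(y) == "won" for y in succ):
                status[x] = "lost"  # all successors won (vacuously true for leaves)
                changed = True
    return {x: status.get(x, "drawn") for x in positions}  # the rest are drawn


# Hypothetical move graph: a chain, a fork, a cycle k <-> l with an exit to e,
# and a cycle m <-> n whose only exit (n -> b) leads to a won position.
MOVES = {
    "a": ["b"], "b": ["c"], "d": ["a", "e"], "e": [],
    "k": ["l"], "l": ["k", "e"],
    "m": ["n"], "n": ["m", "b"],
}
for pos, s in sorted(solve(MOVES).items()):
    print(pos, s)
```

Neither m nor n can force a win or is forced to lose, so both remain drawn. The cycle k ↔ l, by contrast, is resolved because l can escape to the lost leaf e.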
10th Stop: Game Provenance
[Figure: solved game graph over positions a–h and k–n; positions are annotated with the numbers 1, 2, 3, and ∞]
• Game can be solved in time linear in |Move|
• One rule to rule them all!
win(X) :- move(X,Y), not win(Y)
• node color ⇒ edge color
– good vs. bad moves
• good moves = natural, new notion of provenance!
Aside: Games ~ Argumentation Frameworks
win(X) :- move(X,Y), not win(Y)
def(X) :- attacks(Y,X), not def(Y)
Eureka!
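The linear-time bound can be met by propagating backwards from the leaves (retrograde analysis). The sketch below is our illustration, not the tutorial's implementation. It also computes a length per position, which we assume is what the numbers in the figure denote: the number of moves until the game ends under optimal play, with ∞ for drawn positions. It then derives each edge color from the colors of its endpoints: winning (won → lost), delaying (lost → won), drawing (drawn → drawn), and bad otherwise. The move graph is hypothetical.

```python
import math
from collections import defaultdict, deque


def solve_linear(moves):
    """Retrograde analysis in O(|positions| + |moves|); assumes no duplicate moves.

    Returns (status, length): status is 'won'/'lost'/'drawn'; length is the number
    of moves to the end of the game under optimal play (math.inf if drawn).
    """
    positions = set(moves) | {y for ys in moves.values() for y in ys}
    preds = defaultdict(list)
    for x, ys in moves.items():
        for y in ys:
            preds[y].append(x)
    open_moves = {x: len(moves.get(x, [])) for x in positions}  # successors not yet won
    status, length, queue = {}, {}, deque()
    for x in positions:
        if open_moves[x] == 0:  # leaves are lost immediately
            status[x], length[x] = "lost", 0
            queue.append(x)
    while queue:  # FIFO order processes positions by nondecreasing length
        y = queue.popleft()
        for x in preds[y]:
            if x in status:
                continue
            if status[y] == "lost":  # one move to a lost position suffices
                status[x], length[x] = "won", length[y] + 1
                queue.append(x)
            else:  # y is won: x has one escape fewer
                open_moves[x] -= 1
                if open_moves[x] == 0:
                    status[x], length[x] = "lost", length[y] + 1
                    queue.append(x)
    for x in positions:
        if x not in status:
            status[x], length[x] = "drawn", math.inf
    return status, length


MOVE_TYPE = {("won", "lost"): "winning", ("lost", "won"): "delaying",
             ("drawn", "drawn"): "drawing"}


def classify(moves, status):
    """Edge color from node colors; every other combination is a bad move."""
    return {(x, y): MOVE_TYPE.get((status[x], status[y]), "bad")
            for x, ys in moves.items() for y in ys}


MOVES = {"a": ["b"], "b": ["c"], "d": ["a", "e"], "e": [],
         "k": ["l"], "l": ["k", "e"], "m": ["n"], "n": ["m", "b"]}
status, length = solve_linear(MOVES)
for edge, kind in sorted(classify(MOVES, status).items()):
    print(edge, kind)
```

The won/lost/drawn labels coincide with the three-valued well-founded model of win(X) :- move(X,Y), not win(Y) on the same move graph.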
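Game Provenance
[Figure: solved game graph over positions a–h and k–n, with colored moves]
Move types (row: status of the position moved from; column: status of the position moved to; W = won, D = drawn, L = lost):
from \ to | W | D | L
W | bad | bad | winning
D | bad | drawing | n/a
L | delaying | n/a | n/a
Extracting Provenance (G = winning, R = delaying, Y = drawing moves):
✓ Why/how win(x)?
• [x] –G.(R.G)*→ [y]
✓ Why-not win(x)?
• [x] –(R.G)*→ [y]
• [x] –(Y+)→ [y]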
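Once the moves are colored, provenance extraction is a regular path query. Here is a minimal Python sketch, assuming the colors are already known. The colored game below is hypothetical and hard-coded. The sketch runs the pattern G.(R.G)* (for a won x) or (R.G)* (for a lost x) as a search over (position, automaton state) pairs and returns every move on a matching path. In a solved game every lost position reached this way has only delaying moves and every won one has a winning move, so the matched paths can always be completed. The drawn case (Y+) works the same way with a one-state automaton.

```python
from collections import deque

# Hypothetical solved game: (from, to) -> move color.
# 'G' = winning (won -> lost), 'R' = delaying (lost -> won), 'Y' = drawing, 'B' = bad.
COLOR = {
    ("d", "a"): "G", ("d", "e"): "G", ("a", "b"): "R", ("b", "c"): "G",
    ("k", "l"): "R", ("l", "k"): "G", ("l", "e"): "G",
    ("m", "n"): "Y", ("n", "m"): "Y", ("n", "b"): "B",
}

# Two-state automaton: 'expect_G' must take a G move next, 'expect_R' an R move.
NEXT_STATE = {("expect_G", "G"): "expect_R", ("expect_R", "R"): "expect_G"}


def provenance_edges(color, start, state):
    """Moves on paths from start matching the automaton. Use state='expect_G'
    for G.(R.G)* (why win(start)) and 'expect_R' for (R.G)* (why-not win(start))."""
    out = {}
    for (x, y), c in color.items():
        out.setdefault(x, []).append((y, c))
    seen, queue, edges = {(start, state)}, deque([(start, state)]), set()
    while queue:
        x, s = queue.popleft()
        for y, c in out.get(x, []):
            nxt = NEXT_STATE.get((s, c))
            if nxt is None:  # move does not fit the pattern (e.g. bad moves)
                continue
            edges.add((x, y))
            if (y, nxt) not in seen:
                seen.add((y, nxt))
                queue.append((y, nxt))
    return edges


print(sorted(provenance_edges(COLOR, "d", "expect_G")))  # why win(d)
print(sorted(provenance_edges(COLOR, "a", "expect_R")))  # why-not win(a)
```

Bad moves such as n → b never match the pattern, so they are discarded, as the slides prescribe.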
Game Provenance
[Figure: solved game graph over positions a–h and k–n, with colored moves]
Extracting Provenance:
✓ Why/how win(x)?
• [x] –G.(R.G)*→ [y]
✓ Why-not win(x)?
• [x] –(R.G)*→ [y]
• [x] –(Y+)→ [y]
• Next: play a query evaluation game
• ⇒ new why-(not) provenance via games!
11th Stop: Provenance (or Query Evaluation) Games Construction
“SLD-resolution game”
Next (Example):
A(X) :- B(X,Y,Z) … not C(X,Y) …
Eureka!
Translation: Q(I) ⇒ G_Q(I)
[Figure (a): Game template for Q_ABC: A(X) :- B(X, Y), ¬C(Y), with positions such as A(X), r2(X, Y), g_2^1(X, Y), g_2^2(Y), B(X, Y), ¬B(X, Y), rB(X, Y), C(X), rC(X) and moves such as ∃Y and X := Y.]
[Figure (b): Instantiated Q_ABC game on I = {B(a, b), B(b, a), C(a)}.]
Solve G_Q(I) ⇒ Provenance!
[Figure (b): Instantiated Q_ABC game on I = {B(a, b), B(b, a), C(a)}.]
[Figure (c): Solved game: lost positions are (dark) red; won positions are (light) green. Provenance edges (= good moves) are solid. Bad moves are dashed and not part of the provenance. A(a) is true (A(b) is false) as it is won (lost) in the solved game; the game provenance explains why (why-not).]
Figure 3: Provenance game for Q_ABC. The well-founded model of win(X) :- M(X, Y), ¬win(Y), applied to move graph M, solves the game.
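Here is a minimal Python sketch of the translation Q(I) ⇒ G_Q(I) for Q_ABC: A(X) :- B(X,Y), ¬C(Y) over I = {B(a,b), B(b,a), C(a)}, followed by solving the resulting game. The node layout is a simplification we assume for illustration, loosely following the figure's node names. A tuple node moves to its rule instances (one per ∃Y choice), and a rule instance moves to its goals. A positive goal moves to a ¬B node that moves to B. A negated goal moves directly to C. A true EDB tuple moves to its fact node, which is a dead end. The solver is the iterate-until-no-change procedure from the game slides.

```python
from itertools import product

DOM = ["a", "b"]
B = {("a", "b"), ("b", "a")}
C = {("a",)}


def build_game(dom, b, c):
    """Move graph for A(X) :- B(X,Y), not C(Y); node names are illustrative."""
    moves = {}

    def add(x, y):
        moves.setdefault(x, []).append(y)
        moves.setdefault(y, [])  # make sure every node exists, even dead ends

    for x, y in product(dom, repeat=2):
        add(f"A({x})", f"r2({x},{y})")        # exists Y: pick a rule instance
        add(f"r2({x},{y})", f"g1({x},{y})")   # goal 1: B(X,Y)
        add(f"r2({x},{y})", f"g2({y})")       # goal 2: not C(Y)
        add(f"g1({x},{y})", f"notB({x},{y})")
        add(f"notB({x},{y})", f"B({x},{y})")
        if (x, y) in b:                       # conditional edge: only if B(x,y) is in I
            add(f"B({x},{y})", f"rB({x},{y})")  # fact node: a dead end
    for y in dom:
        add(f"g2({y})", f"C({y})")            # negated goal moves straight to C(Y)
        if (y,) in c:
            add(f"C({y})", f"rC({y})")
    return moves


def solve(moves):
    """Leaves lost; won if some move reaches a lost node; lost if all moves reach won nodes."""
    status, changed = {}, True
    while changed:
        changed = False
        for x, succ in moves.items():
            if x in status:
                continue
            if any(status.get(y) == "lost" for y in succ):
                status[x], changed = "won", True
            elif all(status.get(y) == "won" for y in succ):
                status[x], changed = "lost", True
    return {x: status.get(x, "drawn") for x in moves}


game = build_game(DOM, B, C)
solved = solve(game)
print("A(a):", solved["A(a)"], " A(b):", solved["A(b)"])
```

Won tuple nodes are true and lost ones are false. A(a) is won via the lost rule instance r2(a,b), and A(b) is lost because both of its rule instances are won. For this nonrecursive query no position is drawn.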
Happy End (1 of 3)
[Figure 1 (a): input I, a graph over a, b, c with edges p, q, r, s]
(b) ... annotated:
hop
a | a | p
a | b | q
b | a | r
b | c | s
(c) 3hop with provenance:
3hop
a | a | p^3 + 2pqr
a | b | p^2q + q^2r
a | c | pqs
b | a | p^2r + qr^2
b | b | pqr
b | c | qrs
[Figure 1 (d): The game provenance of 3hop(a, a) ... (e) ... is p^3 + 2pqr, shown as a tree of + and × nodes with leaves p, q, r.]
Figure 1: Each edge hop(x, y) in the input graph I in (a) is annotated
Provenance Game on G_Q(I) = Provenance Polynomials
… for positive queries!
Yes!
Happy End (2 of 3)
… but also works for Why-Not provenance & non-monotonic queries (i.e., Q can have negation)!!
Here: not 3hop(c,a) – can’t go back from GSLIS (c) to CS (a)
[Figure 2: Why-not provenance for 3hop(c, a) using provenance games. The lost node 3hop(c, a) has nine rule instances r1(c, a, z1, z2), one for each choice of z1, z2 ∈ {a, b, c}.]
… g_1^i in the body of r1, thus claiming that g_1^i is false and hence that the r1 instance doesn’t derive t. The first player can counter and demonstrate that g_1^i is true by selecting a rule instance or fact as evidence for g_1^i. The game proceeds in rounds until some player cannot move and thus loses (the opponent wins). In [KLZ13] it …
Happy End (2 of 3)
5 leaf nodes ~ 5 missing (“hypothetical”) edges
Insert those ⇒ 3hop(c,a) will be true!
[Figure 2: Why-not provenance for 3hop(c, a) using provenance games.]
… g_1^i in the body of r1, thus claiming that g_1^i is false and hence that the r1 instance doesn’t derive t. The first player can counter and demonstrate that g_1^i is true by selecting a rule instance or fact as evidence for g_1^i. The game proceeds in rounds until some player cannot move and thus loses (the opponent wins). In [KLZ13] it was shown how the provenance of a tuple t can be obtained via a regular path query over a solved game graph like the one in Fig. 1d: e.g., p^3 + 2pqr for 3hop(a, a) is represented by a solved game as shown in Fig. 1e: for positive queries, solved games represent semiring provenance by noting that won (green) and lost (red) positions correspond to “+” and “×” operations, respectively (leaves represent input annotations, here: p, q, r, s) [KLZ13].
[Figure 7: Input graph I with five additional, hypothetical edges (dashed), labeled t, u, v, w, and x.]
… These missing edges correspond to the failed leaf nodes in Fig. 2. The table in Fig. 6 contains the why-not provenance, with different combinations of missing edges as preconditions for a derivation of 3hop(c, a).
[Cropped paper excerpt (right margin cut off): constraint game construction for Q_ABC and constraint provenance games. In its example, the constraint node R1: X ≠ a, X ≠ b, Z1 = a, Z2 = a, Y = a corresponds to the hypothetical 3hop path c –t→ a –p→ a –p→ a. For an invented value such as X = d, it explains why the rule firing d → a → a → a is not successful: the first goal of the rule fails. The excerpt concludes that constraint provenance games do not suffer from the same problems as their fully-grounded counterparts: provenance can be queried for any imaginable tuple, including one not in the active domain, and remains correct as the active domain grows.]
Figure 6 (table):
r1(X, Y, Z1, Z2) [Fig. 2] | X → Z1 → Z2 → Y [Fig. 7] | Why-not provenance
r1(c, a, a, a) | c –t→ a –p→ a –p→ a | t ⇒ t·p·p
r1(c, a, a, b) | c –t→ a –q→ b –r→ a | t ⇒ t·q·r
r1(c, a, a, c) | c –t→ a –u→ c –t→ a | t, u ⇒ t·u·t
r1(c, a, c, a) | c –v→ c –t→ a –p→ a | t, v ⇒ v·t·p
r1(c, a, b, c) | c –w→ b –s→ c –t→ a | t, w ⇒ w·s·t
r1(c, a, c, c) | c –v→ c –v→ c –t→ a | t, v ⇒ v·v·t
r1(c, a, c, b) | c –v→ c –w→ b –r→ a | v, w ⇒ v·w·r
r1(c, a, b, a) | c –w→ b –r→ a –p→ a | w ⇒ w·r·p
r1(c, a, b, b) | c –w→ b –x→ b –r→ a | w, x ⇒ w·x·r
Figure 6: The nine r1-instances in the first column correspond to … in Fig. 2 from left to right. The 3hop-path is shown in the second column …
⇒ What-If provenance!
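The table can be recomputed mechanically. The Python sketch below is our illustration. It takes the labels t, u, v, w, x of the hypothetical edges from Fig. 7 and the table, and enumerates the rule instances r1(c, a, Z1, Z2) of 3hop(X,Y) :- hop(X,Z1), hop(Z1,Z2), hop(Z2,Y). For each instance it lists which hop edges are missing, and it then checks the what-if claim that inserting the missing edges makes 3hop(c, a) true.

```python
from itertools import product

# Existing edges of I with their annotations.
HOP = {("a", "a"): "p", ("a", "b"): "q", ("b", "a"): "r", ("b", "c"): "s"}
# Hypothetical (missing) edges, labeled as in Fig. 7 and Fig. 6.
MISSING = {("c", "a"): "t", ("a", "c"): "u", ("c", "c"): "v", ("c", "b"): "w", ("b", "b"): "x"}
DOM = ["a", "b", "c"]


def why_not(x, y, hop, labels):
    """For each r1(x, y, z1, z2), return the instance, its missing edges, and its path monomial."""
    rows = []
    for z1, z2 in product(DOM, repeat=2):
        path = [(x, z1), (z1, z2), (z2, y)]
        missing = sorted({labels[e] for e in path if e not in hop})
        monomial = "·".join(hop.get(e) or labels[e] for e in path)
        rows.append((f"r1({x}, {y}, {z1}, {z2})", missing, monomial))
    return rows


def derivable(x, y, hop):
    """Is 3hop(x, y) true over the edge set hop?"""
    return any((x, z1) in hop and (z1, z2) in hop and (z2, y) in hop
               for z1, z2 in product(DOM, repeat=2))


rows = why_not("c", "a", HOP, MISSING)
for instance, missing, monomial in rows:
    print(instance, ", ".join(missing), "=>", monomial)

needed = {e for e, lbl in MISSING.items() if any(lbl in m for _, m, _ in rows)}
print(derivable("c", "a", HOP), derivable("c", "a", {**HOP, **dict.fromkeys(needed, "?")}))
```

All five hypothetical edges occur in some row. Inserting a single one, such as t (c → a), already suffices, because r1(c, a, a, a) then uses only existing edges.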
Are there more ways to fail?
[Figure 3 (b), (c): Instantiated and solved Q_ABC game on I = {B(a, b), B(b, a), C(a)}. Lost positions are (dark) red; won positions are (light) green; provenance edges (= good moves) are solid, bad moves are dashed.]
[Figure 4: Altered subgraph of Fig. 3c after adding c to the active domain: A(b) gains a third branch ∃c via r2(b, c).]
[Figure 5 (cropped): constraint provenance game for Q_ABC.]
Two branches that explain why-not A(b)
Adding a new constant c to the domain ⇒ new why-not answer!
Happy End (3 of 3) … sort of …
[Figure 3 (b), (c): Instantiated and solved Q_ABC game on I = {B(a, b), B(b, a), C(a)}. Figure 3: Provenance game for Q_ABC. The well-founded model of win(X) :- M(X, Y), ¬win(Y), applied to move graph M, solves the game.]
… the new binding for X; a condition “B(X, Y)” means that a move is possible only if B(X, Y) is true in I for the current X, Y values. Given database I, a template can be instantiated yielding a game graph G_Q(I) as in Fig. 3b. Note how template variables (e.g., Y) have been replaced by domain values (a or b), and that conditional edges (e.g., labeled “C(X)”) became unconditional edges (e.g., C(a) → rC(a)) or no edge at all (e.g., from C(b)), depending on whether or not the condition holds in I. To extract why(-not) provenance from a game graph G_Q(I) as in Fig. 3b, we need to solve the game first, i.e., determine which positions are won (light …
[Figure 5: Constraint provenance game for Q_ABC. Unlike in Figure 3, nodes may represent finite or infinite sets here; node labels are constraints such as A: x1 = a, ¬A: x1 = b, and ¬A: x1 ≠ a, x1 ≠ b.]
… G_Q(I) thus consists only of edges that are matched by the regular path queries (g.r)+ and r.(g.r)*, i.e., alternating sequences of green (winning) and red (delaying) moves [KLZ13].
3. Constraint Provenance Games
Consider the solved game graph of Fig. 3c. If the value c were added to the active domain, the provenance would be incomplete: e.g., to explain why-not A(b) there are two ∃a, ∃b branches emanating from A(b). However, with c in the active domain there is a third ∃c branch via r2(b, c): see Fig. 4. We show that a modified …
[Figure 4: Altered subgraph of Fig. 3c after adding c to the active domain.]
Why-not provenance complete only for adom(I) = { a, b }!
Constraint why-not provenance also captures new constants, i.e., for an unlimited domain D = { a, b, c, … }
⇒ Constraint Provenance answer is domain independent! (sort of)
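Here is a minimal Python sketch of the domain-independence idea, under a simplifying assumption we add here. Q_ABC mentions no constants, so a fresh value (the placeholders `_new1`, `_new2` below are hypothetical, one per rule variable) behaves like every value outside adom(I) = {a, b}. The sketch evaluates A over adom(I) plus the fresh values and reads the fresh case back as a disequality constraint. This yields answers written in the same constraint notation as the node labels of Fig. 5. It is not the constraint-game construction itself, only a check of the answers that construction should agree with.

```python
ADOM = ["a", "b"]
FRESH = ["_new1", "_new2"]  # hypothetical stand-ins for values outside adom(I)
B = {("a", "b"), ("b", "a")}
C = {"a"}


def a_holds(x, dom):
    """A(X) :- B(X,Y), not C(Y), with Y ranging over dom."""
    return any((x, y) in B and y not in C for y in dom)


def constraint_answer(adom, fresh):
    """Describe A and not-A as constraints on x1, including values outside adom."""
    dom = adom + fresh  # one fresh value per rule variable (X and Y)
    out = [("A" if a_holds(v, dom) else "¬A", f"x1 = {v}") for v in adom]
    # Assumption: every value outside adom behaves like fresh[0], since Q_ABC has no constants.
    outside = " ∧ ".join(f"x1 ≠ {v}" for v in adom)
    out.append(("A" if a_holds(fresh[0], dom) else "¬A", outside))
    return out


for head, constraint in constraint_answer(ADOM, FRESH):
    print(f"{head}: {constraint}")
```

The third answer, ¬A for all x1 with x1 ≠ a ∧ x1 ≠ b, covers c and every other new constant at once. A provenance restricted to adom(I) = {a, b} cannot express this.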
Why-Not: The Full Story Emerges…
(sort of…)
[Figure 9: constraint provenance game for the why-not provenance of 3hop(c, a); its nodes are constraints such as R1: X ≠ a, X ≠ b, Z1 = c, Z2 = b, Y = a and ¬hop: x1 ≠ a, x1 ≠ b, x2 = c.]
Figure 9: The why-not provenance of 3hop(c, a). The provenance is represented in the failure of the claim that 3hop(c, a) is in the answer. This is argued over the Boolean expression defining 3hop(x, y). A move from the source node to a child represents the choice of a Boolean expression that is sufficient to …
[Figure 2: Why-not provenance for 3hop(c, a) using provenance games.]
Why-Not Provenance and the Many Ways to Fail. Since games are inherently symmetric (one player’s win is the opponent’s loss and vice versa), the approach yields an elegant provenance model that unifies why and why-not provenance. Consider the (dark, red) node 3hop(c, a) in Fig. 2. The color coding indicates that the posi- …
Constraint Provenance Games. We propose to solve the problem of domain dependence by modifying provenance games so that they can handle certain infinite relations that can be … represented. For example, in addition to the finitely many … why 3hop(c, a) fails over the active domain adom(I), there are infinitely many others if we consider new constants d, e, … outside of adom(I). For example, let relation R = {a, b} have two tuples R(a) and R(b). If we want to know why-not R(c), we just … c ∉ R. But we could also return a more general answer for why-not R(x) and say that ¬R(x) is true for all x with x ≠ a ∧ x ≠ b (not just for x = c). This approach is inspired by Chan’s Constructive Negation [Cha88], a form of constraint logic programming […]. The key idea is to represent (potentially infinite) relations … constraints, i.e., Boolean combinations of equalities x = c and disequalities x ≠ c.
Overview and Contributions. Section 2 briefly explains how first-order queries are translated into games and how provenance is extracted from solved games. In Section 3 we describe the construction of constraint provenance games; additional details and examples are contained in the appendix. Our main contributions are: (i) game provenance provides a uniform treatment of why and why-not provenance for first-order logic (= relational algebra with difference); (ii) for positive queries, the approach captures the most informative semiring provenance [GKT07, KG12]; (iii) we … a constraint provenance framework which yields domain independent provenance expressions, extending prior results [KLZ13]; (iv) we implemented a prototype of constraint provenance games …
[Cropped paper excerpt: B. Constraint Game Construction. For the query Q_ABC, each ground tuple such as B(a, b) is replaced by a constraint such as B: x1 = a, x2 = b (a conjunction); the subgraph for EDB predicates is created first, and the rest of the game is built iteratively, similar to query evaluation.]
A. Why-Not 3hop(c, a) Dissected. Consider the input graph in Fig. 1a and its why-not provenance for 3hop(c, a) in Fig. 2. The graph encodes the reasons why 3hop(c, a) is not in the answer. Moving from the lost 3hop(c, a) node in Fig. 2, there are nine possible rule instantiations r1(c, a, z1, z2), each of which represents a reason why there is no 3hop(c, a) path via intermediate nodes z1, z2 ∈ {a, b, c}. To better understand these explanations, consider the input graph in Fig. 7. It contains the original database instance I plus a number of hypothetical edges (dotted), with labels t, u, v, w, and x. These missing edges correspond to the failed leaf nodes in Fig. 2. The table in Fig. 6 contains the why-not provenance, with different combinations of missing edges as preconditions for a derivation of 3hop(c, a).
[Figure 7: Input graph I with five additional, hypothetical edges (dashed).]
5 missing edges
9 minimal combinations
+ … ?
Constraints imply 15 disjoint relations over key variables X, Z1, Z2, Y
Provenance Games: Summary
• (1) Game Provenance
– The win-move game has a natural why and why-not provenance “built-in”
• “good” and “bad” moves
• ⇒ discard bad moves ⇒ game provenance
• (2) Provenance Games
– Query evaluation is also a game!
– Game provenance can be applied to the query evaluation game
⇒ uniform why + why-not provenance
• (3) Constraint Provenance
– Domain independent (some infinite domains OK)
– Implemented as a prototype
• (4) Future Work
– Make theory practical!
• e.g. implement in Boris Glavic’s Perm or GProM system
– Theoretical properties
– Relation to Argumentation Frameworks
– Clarify relationship to monus semirings (Floris Geerts et al.)
– Higher-order reasons!
Why-Not: so many answers, so little time
• The crux of current why-not approaches:
– Enumerate all ways that could/might have worked, but failed…
• Idea
⇒ abstract those many, many explanations!
TaPP’15
Conclusions
• Provenance is an important, active, and broad area of research in databases and scientific workflows
– Both in specialized (TaPP, IPAW) and mainstream venues (SBBD, VLDB, SIGMOD, EDBT, ICDE, PODS, ICDT, …)
• There are (still) many deeply technical and practical challenges:
– Efficient capture, management, and use of provenance
– Models, semantics, query languages
– Provenance … for others? Or provenance for self!
– Interdisciplinary work; cross-fertilization: databases, workflows, programming languages, security, …, various scientific communities (bioinformatics, …)
• … oh, and it’s also a lot of fun!
– Interested in joining?
– ludaesch@illinois.edu
Why-Not Provenance References
• Köhler, Sven, Bertram Ludäscher, and Daniel Zinn. “First-order provenance games.” In Search of Elegance in the Theory and Practice of Computation (Peter Buneman Festschrift), LNCS 8000. Springer, Berlin Heidelberg, 2013.
• Riddle, Sean, Sven Köhler, and Bertram Ludäscher. “Towards constraint provenance games.” 6th USENIX Workshop on the Theory and Practice of Provenance (TaPP 2014).
• Glavic, Boris, Sven Köhler, Sean Riddle, and Bertram Ludäscher. “Towards constraint-based explanations for answers and non-answers.” 7th USENIX Workshop on the Theory and Practice of Provenance (TaPP 2015).
Other References (Part I, II)
• (coming soon)
Provenance in Databases and Scientific Workflows: Part II (Databases)

  • 2. Outline of the Tutorial: A “Tour de Provenance”
    • Part I: Provenance in Scientific Workflows
      – Alta Vista: Provenance everywhere!
      – Provenance & Scientific Workflows
      – Provenance Models and Standards (not so much)
      – Provenance Tools
    • Example & Demo: YesWorkflow
    • Part II: Provenance in Databases
      – Foundations of provenance in databases
      – Why-, How-, and Why-Not provenance
  • 3. Types of Data Provenance
    • Black-box
      – know (next to) nothing at compile-time
      – at runtime: keep some data lineage
      – most provenance work sensu WF (workflows) uses this
    • White-box
      – statically (compile-time) analyzable
      – q(Y1,Y2) :- p(X1,X2), r(X1,Y1), s(X2,Y2)
      – most provenance work sensu DB (databases) uses this
    • Grey-box
      – can “look inside” (some) black boxes
      – … e.g., because they have subworkflows
      – … or FP signatures: A :: t1, t2 → t3, t4
      – … or semantic annotations (semantic types)
    [Figure: boxes f (black-box), A (grey-box; t1, t2 → t3, t4), and q (white-box; X1, X2 → Y1, Y2)]
  • 4. 6th Stop: Provenance in Databases
    • Some key questions:
      – Why is tuple t in the answer to query q(D)?
      – Which set of tuples L in D does t depend on, i.e., what is the lineage L of t?
      – How was t derived from its lineage L?
    • Also:
      – Where in D do the values in t come from?
      – Why is t′ not in q(D)?
    • … fasten your seatbelts …
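To make the why/lineage questions concrete, here is a minimal Python sketch (not the tutorial's own tooling) that evaluates the white-box query q(Y1,Y2) :- p(X1,X2), r(X1,Y1), s(X2,Y2) from slide 3 by nested loops and records, for every answer t, the set of input facts used by each derivation (its witnesses). The lineage L of t is then taken as the union of those witnesses. The relation contents and the function names are hypothetical.

```python
from collections import defaultdict

def q_with_witnesses(p, r, s):
    """Evaluate q(Y1,Y2) :- p(X1,X2), r(X1,Y1), s(X2,Y2) by nested loops.

    Returns {answer tuple: set of witnesses}, where each witness is the
    frozenset of input facts used by one derivation of that answer."""
    witnesses = defaultdict(set)
    for x1, x2 in p:
        for rx1, y1 in r:
            if rx1 != x1:          # join on X1
                continue
            for sx2, y2 in s:
                if sx2 != x2:      # join on X2
                    continue
                witnesses[(y1, y2)].add(
                    frozenset({("p", x1, x2), ("r", x1, y1), ("s", x2, y2)}))
    return dict(witnesses)

def lineage(witness_set):
    """Lineage of an answer: every input fact that appears in some witness."""
    return set().union(*witness_set)

# Hypothetical input relations (facts are tuples of constants).
p = {(1, 2), (3, 2)}
r = {(1, "u"), (3, "u"), (4, "z")}
s = {(2, "w")}

why = q_with_witnesses(p, r, s)
for answer, ws in why.items():
    print(answer, len(ws), "witness(es); lineage:", sorted(lineage(ws), key=str))
```

Here the single answer ("u", "w") has two witnesses (one via X1 = 1, one via X1 = 3), and r(4, "z") is not in its lineage because it joins with nothing.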
  • 6. Provenance in Databases (fine-grained, white-box)
  • 7. Compare with: Provenance in Scientific Workflows
    • Some key questions:
      – What is the lineage/trace T of data product (output) yi, where (y1, …, yn) = execute(W, x, p)?
        • … given workflow/script W with inputs x and parameters p
        • … i.e., find the subset of x, p, and (program slices of) W on which a specific yi depends!
      – How can we store and query the provenance (trace) graph effectively and efficiently?
        • Regular Path Queries (RPQs), Lowest Common Ancestor (LCA)
        • Temporal Query Languages (e.g., Past-Temporal Logic)
        • other graph queries
      – What is the difference between traces T1 and T2?
      – Does the trace (retrospective provenance) match the workflow (prospective provenance)?
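The first question above (which inputs, parameters, and steps a given output yi depends on) boils down to backward reachability over the recorded trace. Below is a minimal sketch, assuming a trace stored as two hypothetical dictionaries loosely modeled on W3C PROV's used / wasGeneratedBy relations; the step and data names are made up, and this is not YesWorkflow's API.

```python
# Hypothetical retrospective trace: which step read which data items ("used")
# and which step wrote each data item ("generated_by").
used = {
    "clean": ["x1", "p"],
    "fit": ["c", "x2"],
    "plot": ["c"],
}
generated_by = {"c": "clean", "y1": "fit", "y2": "plot"}

def lineage(target, used, generated_by):
    """Backward closure over the trace: the inputs, parameters, intermediate
    data, and steps on which `target` depends."""
    data, steps = set(), set()
    stack = [target]
    while stack:
        item = stack.pop()
        step = generated_by.get(item)
        if step is None or step in steps:   # raw input/parameter, or step already expanded
            continue
        steps.add(step)
        for source in used[step]:
            if source not in data:
                data.add(source)
                stack.append(source)
    return data, steps

print(lineage("y1", used, generated_by))
print(lineage("y2", used, generated_by))
```

In this toy trace, y2 depends on x1 and p (through c) but not on x2, which is exactly the kind of "subset of x, p" the slide asks for.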
  • 8. Provenance in (Scientific) Workflows (“Coarse-grained”, “Black-box”)
  • 9. What people do with “provenance”
    • Which one is “workflows” vs. “databases”?
      – Result validation
      – Result debugging (science vs. workflow logic)
      – Reproducibility and Repeatability
      – Explanation (derivations, traces, proof trees)
      – Runtime monitoring
        • Profiling, benchmarking
      – Performance Optimization (“smart re-run”)
      – Fault-tolerance, crash-recovery
      – Database view maintenance (e.g., data warehousing)
      – Workflow design
  • 14. Example: Go from X to Y in 3 hops! (a = CS, b = NCSA, c = iSchool)
    • Database: hop(X,Y) := the edges of the graph a → a (p), a → b (q), b → a (r), b → c (s)
    • Query: 3hop(X,Y) :- hop(X,Z1), hop(Z1,Z2), hop(Z2,Y).
    Note: Cannot go from c to a in 3 hops!
    [Figure: three-hop paths with their edge labels: a → a: ppp + pqr + qrp; a → b: ppq + qrq; a → c: pqs; b → a: ppr + qrr; b → b: rpq; b → c: rqs]
    hop(a,a, p). hop(a,b, q). hop(b,a, r). hop(b,c, s).
    3hop(a,a, p³ + 2pqr). 3hop(a,b, p²q + q²r). … 3hop(a,c, pqs).
  • 16. thop(S,T, P1*P2*P3) :- hop(S,U, P1), hop(U,V, P2), hop(V,T, P3).
    (alternative derivations of the same thop(S,T) tuple are summed)
    [Figure: input graph hop(S,T) and output graph thop(S,T), with the three-hop path labels from slide 14]
    hop(a,a, p). hop(a,b, q). hop(b,a, r). hop(b,c, s).
    thop(a,a, p³ + 2pqr). thop(a,b, p²q + q²r). thop(a,c, pqs).
    thop(b,a, p²r + r²q). thop(b,b, rpq). thop(b,c, rqs).
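The annotated rule above can be checked mechanically. This is a minimal Python sketch (our own encoding, not the tutorial's code): each derivation of thop(S,T) contributes the product of its three edge tokens, and derivations of the same (S, T) are added up, giving a polynomial with natural-number coefficients. Monomials are represented as sorted (token, exponent) tuples; the helper names are hypothetical.

```python
from collections import Counter
from itertools import product

# Annotated EDB from the slides: hop(X, Y, P) with one provenance token per edge.
HOP = {("a", "a"): "p", ("a", "b"): "q", ("b", "a"): "r", ("b", "c"): "s"}

def monomial(*tokens):
    """Monomial as a sorted tuple of (token, exponent), e.g. p*q*p -> (("p", 2), ("q", 1))."""
    return tuple(sorted(Counter(tokens).items()))

def three_hop(hop):
    """thop(S,T, P1*P2*P3) :- hop(S,U,P1), hop(U,V,P2), hop(V,T,P3).

    Each derivation contributes the product of its three edge tokens; the
    derivations of the same (S, T) are summed, giving a polynomial
    represented as Counter({monomial: coefficient})."""
    result = {}
    for (s, u), (u2, v), (v2, t) in product(hop, repeat=3):
        if u == u2 and v == v2:  # join on the shared variables U and V
            poly = result.setdefault((s, t), Counter())
            poly[monomial(hop[s, u], hop[u, v], hop[v, t])] += 1
    return result

def show(poly):
    """Render a polynomial as text, with terms in sorted monomial order."""
    terms = []
    for mono, coeff in sorted(poly.items()):
        body = "".join(tok if exp == 1 else f"{tok}^{exp}" for tok, exp in mono)
        terms.append(body if coeff == 1 else f"{coeff}{body}")
    return " + ".join(terms)

polys = three_hop(HOP)
for pair in sorted(polys):
    print(pair, show(polys[pair]))
```

For (a, a) this prints 2pqr + p^3, i.e., p³ + 2pqr, and there is no entry with source c, matching "cannot go from c to a in 3 hops".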
  • 17. thop(S,T) :- hop(S,U), hop(U,V), hop(V,T).
    [Figure: Input (hop over a, b, c) → Three-Hop Query → Output (thop over a, b, c)]
  • 19. Provenance Polynomials: „Mein Schatz!” (“My precious!”)
    [Figure: the provenance polynomial of 3hop(a,a), p³ + 2pqr, and its coarser forms p³ + pqr, p + 2pqr, p + pqr, pqr, and p; three-hop path labels as on slide 14]
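Each coarser form on the slide can be obtained from p³ + 2pqr by forgetting information. The sketch below is our illustration: the semiring names in the comments (B[X], Trio(X), Why(X), PosBool(X), Lin(X)) follow Green's provenance hierarchy as we understand it, while the slide itself lists only the polynomials. The last function plugs numbers into the polynomial (counting semiring), so with every token set to 1 it returns the number of derivations.

```python
from collections import Counter

# N[X] polynomial: {monomial: coefficient}; monomial: sorted tuple of (token, exponent).
POLY = Counter({(("p", 3),): 1, (("p", 1), ("q", 1), ("r", 1)): 2})  # p^3 + 2pqr

def drop_coefficients(poly):          # B[X]: p^3 + pqr
    return {mono: 1 for mono in poly}

def drop_exponents(poly):             # Trio(X): p + 2pqr
    out = Counter()
    for mono, coeff in poly.items():
        out[tuple((tok, 1) for tok, _ in mono)] += coeff
    return out

def why(poly):                        # Why(X): set of witness sets, p + pqr
    return {frozenset(tok for tok, _ in mono) for mono in poly}

def pos_bool(poly):                   # PosBool(X): minimal witnesses only (absorption), p
    witnesses = why(poly)
    return {w for w in witnesses if not any(other < w for other in witnesses)}

def lineage(poly):                    # Lin(X): every contributing token, pqr
    return set().union(*why(poly))

def evaluate(poly, value):            # counting semiring: substitute numbers for tokens
    total = 0
    for mono, coeff in poly.items():
        term = coeff
        for tok, exp in mono:
            term *= value[tok] ** exp
        total += term
    return total

print(drop_exponents(POLY), why(POLY), pos_bool(POLY), lineage(POLY))
print(evaluate(POLY, {"p": 1, "q": 1, "r": 1}))  # 3 derivations of 3hop(a,a)
```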
  • 20. 8th Stop: The Negation & Why-Not Problem
    • Provenance semirings work well for:
      – positive queries (e.g., RA+)
    • Challenges: handling of
      – set difference (~ negation)
      – Why-Not provenance
      – Missing-Answer provenance
    • A fresh look at provenance!
    • … using an old idea: game semantics!
      – for query evaluation
  • 21. Query evaluation game
    EDB: e(a,b), e(b,b)   [graph: a → b, b → b]
    tc(X,Y) :- e(X,Y).    # (1)--e(X,Y)-->(2)
    tc(X,Y) :-            # (1)--exists:Z-->(3)
        e(X,Z),           # (3)->(4)-e(X,Z)->(5)
        tc(Z,Y).          # (3)--X:=Z-->(1)
    [Figure: game diagram with positions 1–5 and moves e(X,Y), exists:Z, e(X,Z), X := Z; instantiated move graph with positions such as 1:(a,b), 3:(a,b,b), 4:(b,b), 5:(a,b)]
    Provenance’12 @ Dagstuhl with JanVdB, TJ Green
    Flum, Kubierschky, Ludäscher: Total and partial well-founded Datalog coincide, ICDT-The-Bag-1997, Delphi, Greece
    Eureka!
  • 22. [Figure: “Game diagram” and “Instantiated move graph” for the tc program and EDB e(a,b), e(b,b), as on slide 21]
    Flum, Kubierschky, Ludäscher: Total and partial well-founded Datalog coincide, ICDT-The-Bag-1997, Delphi, Greece
    Eureka moment:
      1. query evaluation = evaluation game (argument about truth in a database)
      2. provenance = winning strategies (justified/winning arguments)
  • 23. 9th Stop: A Game
    [Figure: move graph on positions a, b, c, d, e, f, g, h, k, l, m, n]
  • 24. Solving the Game
    • All successors won ⇒ position lost
    • Some successor lost ⇒ position won
  • 25. Solving the Game
    • All leaves (dead ends) are immediately lost!
  • 26. Solving the Game
    • X is won if there exists a move to a lost Y
  • 27. Solving the Game
    • X is lost if all moves lead to a won Y
  • 28. Solving the Game
    • Repeat until no change => drawn positions remain
  • 29. 10th Stop: Game Provenance
    [Figure: solved game on positions a–h and k–n; nodes labeled 1, 2, 3, or ∞]
    • Game can be solved in time linear in |Move|
    • One rule to rule them all!
      win(X) :- move(X,Y), not win(Y)
    • node color ⇒ edge color
      – good vs. bad moves
    • good moves = natural, new notion of provenance!
    Aside: Games ~ Argumentation Frameworks
      win(X) :- move(X,Y), not win(Y)
      def(X) :- attacks(Y,X), not def(Y)
    Eureka!
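The "linear in |Move|" claim can be illustrated with backward induction (retrograde analysis), which applies the rules of slides 24–28 from the dead ends outward: each move is looked at only when its target becomes won or lost. This is a sketch assuming positions are hashable labels and the move graph is finite; the example graph is made up, since the slide's graph survives only as a figure.

```python
from collections import defaultdict, deque

def solve(moves):
    """Solve win(X) :- move(X,Y), not win(Y) by backward induction.

    Returns {position: "won" | "lost" | "drawn"} in O(|positions| + |moves|)."""
    positions = set()
    preds = defaultdict(list)
    out_degree = defaultdict(int)
    for x, y in moves:
        positions.update((x, y))
        preds[y].append(x)
        out_degree[x] += 1
    status = {}
    # All leaves (dead ends) are immediately lost.
    queue = deque(pos for pos in positions if out_degree[pos] == 0)
    for pos in queue:
        status[pos] = "lost"
    remaining = dict(out_degree)  # successors not yet known to be won
    while queue:
        y = queue.popleft()
        for x in preds[y]:
            if x in status:
                continue
            if status[y] == "lost":
                status[x] = "won"          # some successor lost => won
                queue.append(x)
            else:
                remaining[x] -= 1
                if remaining[x] == 0:      # all successors won => lost
                    status[x] = "lost"
                    queue.append(x)
    for pos in positions:
        status.setdefault(pos, "drawn")    # unlabeled at the end => drawn
    return status

# Hypothetical move graph: a chain a -> b -> c -> d, a 2-cycle e <-> f, and g.
moves = [("a", "b"), ("b", "c"), ("c", "d"), ("e", "f"), ("f", "e"), ("g", "e"), ("g", "d")]
print(solve(moves))
```

Here d is a dead end (lost), so c and g are won, b is lost, a is won, and e, f stay drawn because neither player can force the cycle to end.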
  • 30. Game Provenance
    Move types (from \ to):
                W          D          L
      W         bad        bad        winning
      D         bad        drawing    n/a
      L         delaying   n/a        n/a
    [Figure: solved game from slide 29 with nodes labeled 1, 2, 3, or ∞]
    Extracting Provenance:
    ✓ Why/how win(x)?
      • [x] –G.(R.G)*–> [y]
    ✓ Why-not win(x)?
      • [x] –(R.G)*–> [y]
      • [x] –(Y+)–> [y]
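The move-type table and the path expressions can be combined into a small extraction routine: label positions, classify every move by the (source, target) status pair as in the table, then collect everything reachable from x through good (non-bad) moves. From a won x this follows G.(R.G)*, from a lost x (R.G)*, and from a drawn x Y+. The sketch below is our illustration with a deliberately simple (quadratic) fixpoint solver and a made-up move graph; the function names are hypothetical.

```python
from collections import defaultdict

def solve(moves):
    """Label positions won/lost/drawn by iterating the slides' rules to a fixpoint."""
    succ = defaultdict(set)
    positions = set()
    for x, y in moves:
        succ[x].add(y)
        positions.update((x, y))
    status = {}
    changed = True
    while changed:
        changed = False
        for pos in positions:
            if pos in status:
                continue
            if any(status.get(nxt) == "lost" for nxt in succ[pos]):
                status[pos] = "won"
                changed = True
            elif all(status.get(nxt) == "won" for nxt in succ[pos]):
                status[pos] = "lost"       # includes dead ends (no successors)
                changed = True
    return {pos: status.get(pos, "drawn") for pos in positions}

# Good moves by (status of source, status of target); every other combination
# that can occur (won->won, won->drawn, drawn->won) is a bad move.
GOOD_MOVES = {
    ("won", "lost"): "winning",
    ("lost", "won"): "delaying",
    ("drawn", "drawn"): "drawing",
}

def classify(moves, status):
    return {(x, y): GOOD_MOVES.get((status[x], status[y]), "bad") for x, y in moves}

def provenance(x, types):
    """All good moves reachable from x via good moves only."""
    good = defaultdict(list)
    for (u, v), kind in types.items():
        if kind != "bad":
            good[u].append(v)
    seen, stack, edges = {x}, [x], set()
    while stack:
        u = stack.pop()
        for v in good[u]:
            edges.add((u, v, types[u, v]))
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return edges

# Hypothetical move graph.
moves = [("a", "b"), ("a", "e"), ("b", "c"), ("c", "d"), ("e", "f"), ("f", "e")]
status = solve(moves)
types = classify(moves, status)
print(provenance("a", types))   # why win(a)
print(provenance("b", types))   # why-not win(b)
```

The move a → e leads to a drawn position and is therefore bad, so it does not appear in the provenance of win(a).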
  • 31. Game Provenance
    [Figure: solved game from slide 29 with nodes labeled 1, 2, 3, or ∞]
    Extracting Provenance:
    ✓ Why/how win(x)?
      • [x] –G.(R.G)*–> [y]
    ✓ Why-not win(x)?
      • [x] –(R.G)*–> [y]
      • [x] –(Y+)–> [y]
    • Next: play a query evaluation game
    • => new why-(not) provenance via games!
  • 33. Translation: Q(I) => G_Q(I)
    [Figure: (a) Game template for Q_ABC: A(X) :- B(X, Y), ¬C(Y).
     (b) Instantiated Q_ABC game on I = {B(a, b), B(b, a), C(a)}.]
  • 34. Solve G_Q(I) => Provenance!
    [Figure 3 (b): Instantiated Q_ABC game on I = {B(a, b), B(b, a), C(a)}.
     (c) Solved game: lost positions are (dark) red; won positions are (light) green. Provenance edges (= good moves) are solid. Bad moves are dashed and not part of the provenance. A(a) is true (A(b) is false) as it is won (lost) in the solved game; the game provenance explains why (why-not).]
    Figure 3: Provenance game for Q_ABC. The well-founded model of win(X) :- M(X, Y), ¬win(Y), applied to move graph M, solves the game.
  • 35. Happy End (1 of 3)
    (a) input I: edges a → a (p), a → b (q), b → a (r), b → c (s)
    (b) … annotated:
        hop
        a  a  p
        a  b  q
        b  a  r
        b  c  s
    (c) 3hop with provenance:
        3hop
        a  a  p³ + 2pqr
        a  b  p²q + q²r
        a  c  pqs
        b  a  p²r + qr²
        b  b  pqr
        b  c  qrs
    (d) The game provenance of 3hop(a, a) …
    (e) … is p³ + 2pqr. [Figure: solved game for 3hop(a, a) whose won and lost positions read as + and × nodes over the leaves p, q, r]
    Figure 1: Each edge hop(x, y) in the input graph I in (a) is annotated …
    Provenance Game on G_Q(I) = Provenance Polynomials … for positive queries! Yes!
  • 36. Happy End (2 of 3)
    … but also works for Why-Not provenance & non-monotonic queries (i.e., Q can have negation)!!
    Here: not 3hop(c,a), i.e., you can’t go back from GSLIS (c) to CS (a)
    [Figure 2: Why-not provenance for 3hop(c, a) using provenance games.]
    … g_1^i in the body of r1, thus claiming that g_1^i is false and hence that the r1 instance doesn’t derive t. The first player can counter and demonstrate that g_1^i is true by selecting a rule instance or fact as evidence for g_1^i. The game proceeds in rounds until some player cannot move and thus loses (the opponent wins).
  • 37. Happy End (2 of 3)
    5 leaf nodes ~ 5 missing (“hypothetical”) edges
    Insert those => 3hop(c,a) will be true!
    [Figure 2: Why-not provenance for 3hop(c, a) using provenance games.]
    … In [KLZ13] it was shown how the provenance of a tuple t can be obtained via a regular path query over a solved game graph like the one in Fig. 1d: e.g., p³ + 2pqr for 3hop(a, a) is represented by a solved game as shown in Fig. 1e: for positive queries, solved games represent semiring provenance by noting that won (green) and lost (red) positions correspond to “+” and “×” operations, respectively (leaves represent input annotations, here: p, q, r, s) [KLZ13].
    [Figure: I with five additional, hypothetical edges (dashed), with labels t, u, v, w, and x]
    … Any constraint that has a variable that is only disequality-constrained represents an infinite set of firings. Consider the node R1 : X ≠ a, X ≠ b, Z1 = a, Z2 = a, Y = a. This corresponds to the (hypothetical) 3hop path c t a p a p a and the situation in which the edge t exists (see first row of Fig. 6). However, it explains why the rule firing d → a → a → a is not successful. The explanation is the failure of the first goal of the rule. In the case of X = c, it represents that there are no outgoing edges from c; in the case of X = d or any other invented value this is trivially true. This shows that constraint provenance games do not suffer the same problems as their fully-grounded counterparts. Provenance can be queried for any imaginable tuple, including one not in the active domain, and the provenance presented is still correct in the presence of a growing active domain.
      r1(X, Y, Z1, Z2)   X → Z1 → Z2 → Y   Why Not ⇒ Provenance
      r1(c, a, a, a)     c t a p a p a     t ⇒ t·p·p
      r1(c, a, a, b)     c t a q b r a     t ⇒ t·q·r
      r1(c, a, a, c)     c t a u c t a     t, u ⇒ t·u·t
      r1(c, a, c, a)     c v c t a p a     t, v ⇒ v·t·p
      r1(c, a, b, c)     c w b s c t a     t, w ⇒ w·s·t
      r1(c, a, c, c)     c v c v c t a     t, v ⇒ v·v·t
      r1(c, a, c, b)     c v c w b r a     v, w ⇒ v·w·r
      r1(c, a, b, a)     c w b r a p a     w ⇒ w·r·p
      r1(c, a, b, b)     c w b x b r a     w, x ⇒ w·x·r
    Figure 6: The nine r1-instances in the first column correspond to … in Fig. 2 from left to right. The 3hop-path is shown in the second column …
    => What-If provenance!
  • 38. Are there more ways to fail?
    [Figure 3: Provenance game for QABC: A(X) :- B(X, Y), ¬C(Y). (b) Instantiated QABC game on I = {B(a, b), B(b, a), C(a)}. (c) Solved game: lost positions are (dark) red; won positions are (light) green. Provenance edges (= good moves) are solid. Bad moves are dashed and not part of the provenance. A(a) is true (A(b) is false) as it is won (lost) in the solved game; the game provenance explains why (why-not). The well-founded model of win(X) :- M(X, Y), ¬win(Y), applied to move graph M, solves the game.]
    [Figure 4: Altered subgraph of Fig. 3c after adding c to the active domain.]
    [Figure 5: Constraint provenance game for QABC; nodes may represent finite or infinite sets.]
    – Two branches (∃a, ∃b) explain why-not A(b)
    – Adding a new constant c to the domain => new why-not answer!
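To see why the ground why-not answer for A(b) depends on the domain, here is a minimal sketch for QABC on I = {B(a, b), B(b, a), C(a)}. It is a simplification under stated assumptions, not the game construction itself: for each candidate Y it lists the goal(s) that fail, one branch per Y, mirroring the ∃a and ∃b branches below A(b). The function name `why_not_A` is hypothetical.

```python
# Instance I from Figure 3b.
B = {("a", "b"), ("b", "a")}
C = {"a"}


def why_not_A(x, domain):
    """Why-not A(x) for QABC: A(X) :- B(X, Y), ¬C(Y), over a given domain.

    Returns one branch per candidate Y with the failed goal(s),
    or None if some Y derives A(x) (then A(x) is true).
    """
    branches = {}
    for y in domain:
        failed = []
        if (x, y) not in B:
            failed.append(f"¬B({x}, {y})")  # first goal B(x, y) is false
        if y in C:
            failed.append(f"C({y})")        # second goal ¬C(y) is false
        if not failed:
            return None
        branches[y] = failed
    return branches


print(why_not_A("b", ["a", "b"]))       # two branches
print(why_not_A("b", ["a", "b", "c"]))  # a third branch appears for c
```

Over {a, b}, A(b) fails because C(a) holds (Y = a) and B(b, b) is false (Y = b); adding c adds a third branch with B(b, c) false, as in Figure 4.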
  • 39. Happy End (3 of 3)… sort of …
    [Figure 3: Game template for QABC: A(X) :- B(X, Y), ¬C(Y); (b) instantiated QABC game on I = {B(a, b), B(b, a), C(a)}; (c) solved game: lost positions are (dark) red, won positions are (light) green; provenance edges (= good moves) are solid, bad moves are dashed.]
    [Figure 4: Altered subgraph of Fig. 3c after adding c to the active domain.]
    [Figure 5: Constraint provenance game for QABC. Unlike in Figure 3, nodes may represent finite or infinite sets here.]
    [Excerpt: A condition "B(X, Y)" means that a move is possible only if B(X, Y) is true in I for the current X, Y values. Given database I, a template can be instantiated yielding a game graph GQ(I) as in Fig. 3b. Note how template variables (e.g., Y) have been replaced by domain values (a or b), and that conditional edges (e.g., labeled "C(X)") became unconditional edges (e.g., C(a) → rC(a)) or no edge at all (e.g., from C(b)), depending on whether or not the condition holds in I. To extract why(-not) provenance from a game graph GQ(I) as in Fig. 3b, we need to solve the game first, i.e., determine which positions are won [...]. GQ(I) thus consists only of edges that are matched by the regular path queries (g.r)⁺ and r.(g.r)*, i.e., alternating sequences of green (winning) and red (delaying) moves [KLZ13].]
    [Excerpt, Section 3 "Constraint Provenance Games": Consider the solved game graph of Fig. 3c. If the value c were added to the active domain, the provenance would be incomplete, e.g., to explain why-not A(b) there are two ∃a, ∃b branches emanating from A(b). However, with c in the active domain there is a third ∃c branch via r2(b, c): see Fig. 4.]
    – Why-not provenance complete only for adom(I) = { a, b }!
    – Constraint why-not provenance also captures new constants, i.e., for an unlimited domain D = { a, b, c, … }
    – => Constraint provenance answer is domain independent! (sort of)
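A constraint-based answer avoids recomputing branches as the domain grows. The sketch below is a simplified illustration under stated assumptions, not the constraint game construction of the paper: X is fixed, each constant of I gets a ground branch, and one disequality branch covers every other value of Y, because B is finite and B(x, Y) is therefore false for all such Y. The names `why_not_A_constraint` and `branch_for` are hypothetical.

```python
# Instance I from Figure 3b.
B = {("a", "b"), ("b", "a")}
C = {"a"}


def why_not_A_constraint(x):
    """Why-not A(x) for A(X) :- B(X, Y), ¬C(Y) over an unbounded domain.

    Constants of I get one ground branch each; a single disequality
    constraint covers every other Y, since B(x, Y) is false for all of them
    (C(Y) is false too, so ¬C(Y) holds and only the first goal fails).
    Returns None if A(x) is true.
    """
    constants = sorted({v for edge in B for v in edge} | C)
    branches = []
    for y in constants:
        failed = []
        if (x, y) not in B:
            failed.append(f"¬B({x}, {y})")
        if y in C:
            failed.append(f"C({y})")
        if not failed:
            return None  # Y = y derives A(x)
        branches.append((f"Y = {y}", failed))
    others = " ∧ ".join(f"Y ≠ {c}" for c in constants)
    branches.append((others, [f"¬B({x}, Y)"]))
    return branches


def branch_for(branches, y):
    """Return the branch that explains the failure for a concrete value y."""
    for constraint, failed in branches:
        if constraint == f"Y = {y}":
            return constraint, failed
    return branches[-1]  # y is not a constant of I: the disequality branch


explanation = why_not_A_constraint("b")
for constraint, failed in explanation:
    print(constraint, failed)
print(branch_for(explanation, "c"))  # a new constant needs no new branch
```

New constants such as c or d fall into the existing disequality branch, so the answer stays the same as the domain grows; the sketch also answers why-not A(d) for a constant outside adom(I).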
  • 40. Why-Not: The Full Story Emerges… (sort of…)
    [Figure 9: The why-not provenance of 3hop(c, a). The provenance is represented in the failure of the claim that 3hop(c, a) is in the answer. This is argued over the Boolean expression defining 3hop(x, y). A move from the source node to a child represents the choice of a Boolean expression that is sufficient to [...]. Nodes carry constraints, e.g., R1: X ≠ a, X ≠ b, Z1 = a, Z2 = a, Y = a and ¬hop: x1 ≠ a, x1 ≠ b, x2 = c.]
    [Figure 2: Why-not provenance for 3hop(c, a) using provenance games.]
    [Figure 7: Input graph I with five additional, hypothetical edges (dashed).]
    – 5 missing edges
    – 9 minimal combinations
    – + … ?
    – Constraints imply 15 disjoint relations over key variables X, Z1, Z2, Y
    [Excerpt, "Why-Not Provenance and the Many Ways to Fail": Since games are inherently symmetric (one player's win is the opponent's loss and vice versa), the approach yields an elegant provenance model that unifies why and why-not provenance. Consider the (dark, red) node 3hop(c, a) in Fig. 2. [...]]
    [Excerpt, "Constraint Provenance Games": We propose to solve the problem of domain dependence by modifying provenance games so that they can handle certain infinite relations that can be represented. For example, in addition to the finitely many reasons why 3hop(c, a) fails over the active domain adom(I), there are infinitely many others if we consider new constants d, e, ... outside of adom(I). For example, let relation R = {a, b} have two tuples R(a) and R(b). If we want to know why-not R(c), the answer is simply c ∉ R. But we could also return a more general answer for why-not R(x) and say that ¬R(x) is true for all x with x ≠ a ∧ x ≠ b (not just for x = c). This approach is inspired by Chan's Constructive Negation [Cha88], a form of constraint logic programming [...]. The key idea is to represent (potentially infinite) relations via constraints, i.e., Boolean combinations of equalities x = c and disequalities x ≠ c.]
    [Excerpt, contributions: (i) game provenance provides a uniform treatment of why and why-not provenance for first-order logic (= relational algebra with difference); (ii) for positive queries, the approach captures semiring provenance [GKT07, KG12]; (iii) a constraint provenance framework yields domain-independent provenance expressions, extending prior results [KLZ13]; (iv) a prototype of constraint provenance games was implemented.]
    [Excerpt, "A. Why-Not 3hop(c, a) Dissected": Consider the input graph in Fig. 1a and its why-not provenance for 3hop(c, a) in Fig. 2. Moving from the lost 3hop(c, a) node in Fig. 2, there are nine possible rule instantiations r1(c, a, z1, z2), each of which represents a reason why there is no 3hop(c, a) path via intermediate nodes z1, z2 ∈ {a, b, c}. The input graph in Fig. 7 contains the original database instance I plus a number of hypothetical edges (dotted), with labels t, u, v, w, and x; these missing edges correspond to the failed leaf nodes in Fig. 2. The table in Fig. 6 contains the why-not provenance, with different combinations of missing edges as preconditions for a derivation of 3hop(c, a).]
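The R = {a, b} example can be made concrete with a small constraint representation in the spirit of constructive negation. This is a simplified, hypothetical sketch for a single variable, not Chan's method: a constraint is either a finite set (a disjunction of equalities x = c) or a co-finite set (a conjunction of disequalities x ≠ c), and both forms are closed under complement and intersection. The class name `Constraint` and the edge set `HOP` (directions inferred from the Figure 6 paths) are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraint:
    """A possibly infinite set of constants for one variable x.

    negated=False: x = c1 ∨ x = c2 ∨ ...  (exactly the finite set `values`)
    negated=True:  x ≠ c1 ∧ x ≠ c2 ∧ ...  (every constant except `values`)
    """
    values: frozenset
    negated: bool = False

    def holds(self, c):
        return (c in self.values) != self.negated

    def complement(self):
        return Constraint(self.values, not self.negated)

    def intersect(self, other):
        if not self.negated and not other.negated:
            return Constraint(self.values & other.values)
        if not self.negated:  # finite ∩ co-finite
            return Constraint(self.values - other.values)
        if not other.negated:  # co-finite ∩ finite
            return Constraint(other.values - self.values)
        return Constraint(self.values | other.values, True)  # co-finite ∩ co-finite

    def __str__(self):
        if not self.values:
            return "true" if self.negated else "false"
        op, sep = ("≠", " ∧ ") if self.negated else ("=", " ∨ ")
        return sep.join(f"x {op} {c}" for c in sorted(self.values))


# R = {a, b}: the why-not answer for R(x) generalizes from x = c to all x outside R.
R = Constraint(frozenset({"a", "b"}))
not_R = R.complement()
print(not_R)  # x ≠ a ∧ x ≠ b

# Hop edges of I (directions inferred from Figure 6; an assumption).
HOP = {("a", "a"), ("a", "b"), ("b", "a"), ("b", "c")}
no_outgoing = Constraint(frozenset(src for src, _ in HOP)).complement()
print(no_outgoing, no_outgoing.holds("c"), no_outgoing.holds("d"))
```

The last lines show that "X has no outgoing hop edge" is exactly x ≠ a ∧ x ≠ b, the same disequalities that appear in the R1 nodes of Figure 9.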
  • 41. Provenance Games: Summary
    • (1) Game Provenance
      – The win-move game has a natural why and why-not provenance "built-in"
        • "good" and "bad" moves
        • => discard bad moves => game provenance
    • (2) Provenance Games
      – Query evaluation is also a game!
      – Game provenance can be applied to the query evaluation game => uniform why + why-not provenance
    • (3) Constraint Provenance
      – Domain independent (some infinite domains OK)
      – Prototype implementation
    • (4) Future Work
      – Make theory practical!
        • e.g., implement in Boris Glavic's Perm or GProM system
      – Theoretical properties
      – Relation to Argumentation Frameworks
      – Clarify relationship to monus semirings (Floris Geerts et al.)
      – Higher-order reasons!
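Point (1) can be illustrated with a minimal Python sketch of solving a win-move game and discarding bad moves. Assumptions: moves are given as pairs of hashable positions; the solver uses standard backward induction rather than a Datalog engine, producing the won/lost/drawn labeling that corresponds to the true/false/undefined values of the well-founded model of win(X) :- M(X, Y), ¬win(Y); and `good_moves` keeps only won → lost and lost → won moves, matching the solid provenance edges described for the solved game in Figure 3c. Moves among drawn positions are not treated here, and both function names are hypothetical.

```python
from collections import defaultdict, deque


def solve_win_move(moves):
    """Label each position of move graph M as 'won', 'lost', or 'drawn'.

    A position with no moves is lost; a position with a move to a lost
    position is won; a position whose moves all lead to won positions is
    lost; every position left undetermined is drawn.
    """
    succ, pred, positions = defaultdict(set), defaultdict(set), set()
    for x, y in moves:
        succ[x].add(y)
        pred[y].add(x)
        positions.update((x, y))
    open_moves = {p: len(succ[p]) for p in positions}
    label = {p: "lost" for p in positions if not succ[p]}
    queue = deque(label)
    while queue:
        y = queue.popleft()
        for x in pred[y]:
            if x in label:
                continue
            if label[y] == "lost":
                label[x] = "won"  # x can move to a lost position
                queue.append(x)
            else:
                open_moves[x] -= 1  # one more move of x leads to a won position
                if open_moves[x] == 0:
                    label[x] = "lost"
                    queue.append(x)
    return {p: label.get(p, "drawn") for p in positions}


def good_moves(moves, label):
    """Keep won -> lost and lost -> won moves; drop the rest (bad moves)."""
    return {(x, y) for x, y in moves
            if (label[x], label[y]) in {("won", "lost"), ("lost", "won")}}


moves = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "e"), ("e", "f")]
labels = solve_win_move(moves)
print(labels)
print(good_moves(moves, labels))
```

In the example, the move a → b is discarded as a bad move: a is won, but only because of its move to the lost position c.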
  • 43. Conclusions
    • Provenance is an important, active, and broad area of research in databases and scientific workflows
      – Both in specialized (TaPP, IPAW) and mainstream venues (SBBD, VLDB, SIGMOD, EDBT, ICDE, PODS, ICDT, ...)
    • There are (still) many deeply technical and practical challenges:
      – Efficient capture, management, and use of provenance
      – Models, semantics, query languages
      – Provenance .. for others? Or provenance for self!
      – Interdisciplinary work; cross-fertilization: databases, workflows, programming languages, security, …, various scientific communities (bioinformatics, ...)
    • … oh, and it's also a lot of fun!
      – Interested in joining?
      – Ludaesch@illinois.edu
  • 44. Why-Not Provenance References
    • Köhler, Sven, Bertram Ludäscher, and Daniel Zinn. "First-order provenance games." In Search of Elegance in the Theory and Practice of Computation. Peter Buneman Festschrift, LNCS 8000. Springer Berlin Heidelberg, 2013.
    • Riddle, Sean, Sven Köhler, and Bertram Ludäscher. "Towards constraint provenance games." 6th USENIX Workshop on the Theory and Practice of Provenance (TaPP 2014).
    • Glavic, Boris, Sven Köhler, Sean Riddle, and Bertram Ludäscher. "Towards constraint-based explanations for answers and non-answers." 7th USENIX Workshop on the Theory and Practice of Provenance (TaPP 2015).