"PWE: Datalog & ASP for the Rest of Us" discusses using the Possible Worlds Explorer (PWE) to make Datalog and Answer Set Programming (ASP) more accessible to non-experts. It covers topics such as using provenance to explain query results, capturing rule firings to track provenance, representing provenance as a graph, using states to track derivation rounds, and declarative profiling of Datalog programs. The presentation advocates for tools like PWE that wrap Datalog/ASP engines, combining them with the Python ecosystem and allowing interactive use in Jupyter notebooks. This makes the languages more approachable and helps users build on existing work by experimenting further.
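The rule-firing idea at the heart of the provenance discussion is easy to show in miniature. Below is a hedged Python sketch, not PWE's actual implementation: a naive Datalog evaluation of transitive closure that records a (rule, body, head) triple for every rule firing, so a derived fact can be explained afterwards. The edge facts, rule names, and the tc predicate are all illustrative.

```python
# Minimal sketch (not PWE's code): naive Datalog evaluation of transitive
# closure that records one "rule firing" per derivation step, the kind of
# provenance the talk uses to explain query results.
edges = {("a", "b"), ("b", "c"), ("c", "d")}
facts = {("edge", e) for e in edges}
firings = []  # (rule_name, body_facts, derived_fact)

def step(facts):
    new = set()
    # tc(X, Y) :- edge(X, Y).
    for (x, y) in [f[1] for f in facts if f[0] == "edge"]:
        head = ("tc", (x, y))
        if head not in facts:
            new.add(head)
            firings.append(("tc_base", [("edge", (x, y))], head))
    # tc(X, Z) :- tc(X, Y), edge(Y, Z).
    for (x, y) in [f[1] for f in facts if f[0] == "tc"]:
        for (y2, z) in [f[1] for f in facts if f[0] == "edge"]:
            if y == y2:
                head = ("tc", (x, z))
                if head not in facts:
                    new.add(head)
                    firings.append(("tc_step", [("tc", (x, y)), ("edge", (y, z))], head))
    return new

while True:  # run to fixpoint, one derivation round at a time
    new = step(facts)
    if not new:
        break
    facts |= new

# Why is tc(a, d) true? The recorded firings give a one-step explanation,
# and following them recursively yields the full provenance graph.
for rule, body, head in firings:
    if head == ("tc", ("a", "d")):
        print(rule, body, "=>", head)
```

Tracking the round in which each fact is first derived (the "states" mentioned above) would only require tagging each entry of firings with the iteration number.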
TASK-DECOMPOSITION BASED ANOMALY DETECTION OF MASSIVE AND HIGH-VOLATILITY SES... - ijdpsjournal
The Science Information Network (SINET) is the Japanese academic backbone network for more than 800 universities and research institutions. SINET traffic is characteristically enormous and highly variable. In this paper, we present a task-decomposition based anomaly detection of the massive and high-volatility session data of SINET. Three main features are discussed: task scheduling, traffic discrimination, and histogramming. We adopt a task-decomposition based dynamic scheduling method to handle SINET's massive session data stream. In the experiment, we analysed SINET traffic from 2/27 to 3/8 and detected some anomalies by LSTM-based time-series data processing.
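The abstract gives no model details, so the following is only a plausible sketch of LSTM-based time-series anomaly detection, with Keras as an assumed stack: an LSTM learns to predict the next value of a traffic series, and points whose prediction error exceeds a threshold are flagged. Window size, layer width, and the 3-sigma threshold are all assumptions.

```python
# Hedged sketch of LSTM-based anomaly detection on a traffic series:
# forecast the next value, then flag points with large prediction error.
import numpy as np
import tensorflow as tf

series = np.sin(np.linspace(0, 60, 600)) + 0.05 * np.random.randn(600)
series[400] += 3.0  # injected anomaly standing in for a traffic spike

win = 20  # look-back window (an assumption)
X = np.stack([series[i:i + win] for i in range(len(series) - win)])[..., None]
y = series[win:]

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, input_shape=(win, 1)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=32, verbose=0)

errors = np.abs(model.predict(X, verbose=0).ravel() - y)
threshold = errors.mean() + 3 * errors.std()  # 3-sigma rule (an assumption)
print("anomalous indices:", np.where(errors > threshold)[0] + win)
```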
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORI ALGORITHM FOR HANDLING VOLUMIN... - acijjournal
Apriori is one of the key algorithms for generating frequent itemsets. Analysing frequent itemsets is a crucial step in analysing structured data and in finding association relationships between items, and it stands as an elementary foundation for supervised learning, which encompasses classifier and feature extraction methods. Applying this algorithm is crucial to understanding the behaviour of structured data. Most structured data in the scientific domain is voluminous, and processing such data requires state-of-the-art computing machines; setting up such an infrastructure is expensive. Hence a distributed environment, such as a clustered setup, is employed for tackling such scenarios. The Apache Hadoop distribution is one such cluster framework, distributing voluminous data across a number of nodes. This paper focuses on the map/reduce design and implementation of the Apriori algorithm for structured data analysis.
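To make the map/reduce framing concrete, here is a hedged pure-Python sketch of one Apriori counting pass expressed as mapper and reducer functions. It is illustrative, not the paper's Hadoop code, and it omits Apriori's candidate-pruning step for brevity: the mapper emits (itemset, 1) for every k-itemset in a transaction, and the reducer sums counts and applies the support threshold.

```python
# Illustrative map/reduce formulation of Apriori's counting pass.
from collections import defaultdict
from itertools import combinations

transactions = [{"a", "b", "c"}, {"a", "c"}, {"a", "d"}, {"b", "c"}]
min_support = 2

def mapper(transaction, k):
    # emit (candidate k-itemset, 1) for each candidate in the transaction
    for candidate in combinations(sorted(transaction), k):
        yield candidate, 1

def reducer(pairs):
    # sum per-candidate counts and keep only the frequent ones
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return {c: n for c, n in counts.items() if n >= min_support}

k = 1
while True:
    pairs = (p for t in transactions for p in mapper(t, k))
    frequent_k = reducer(pairs)
    if not frequent_k:
        break
    print(f"frequent {k}-itemsets:", frequent_k)
    k += 1
```

On Hadoop, the mapper and reducer would run as distributed tasks over database partitions, with one MapReduce job per level k.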
A New Architecture for Group Replication in Data Grid - Editor IJCATR
Nowadays, grid systems are a vital technology for running high-performance programs and solving large-scale problems in science, engineering and business. In grid systems, heterogeneous computational resources and data are shared between independent organizations that are scattered geographically. A data grid is a type of grid that relates computational and storage resources. Data replication is an efficient way to obtain high performance and high availability in a data grid by storing numerous replicas at different locations, e.g., grid sites. In this research, we propose a new architecture for dynamic group data replication. In our architecture, we add two components to the OptorSim architecture: a Group Replication Management (GRM) component and a Management of Popular Files Group (MPFG) component. OptorSim was developed by the European DataGrid project to evaluate replication algorithms. Using this architecture, groups of popular files are replicated to grid sites at the end of each predefined time interval.
A comparative study in dynamic job scheduling approaches in grid computing en... - ijgca
Grid computing is one of the most interesting research areas for present and future computing strategy and methodology. The dramatic increase in the complexity of scientific applications, and of some non-scientific applications, increases the need for distributed systems in general and grid computing specifically. One of the main challenges in a grid computing environment is how jobs (tasks) are handled. Job scheduling is the activity of scheduling the submitted jobs in the grid environment, and there are many approaches to it. This paper provides an experimental study of different approaches to grid computing job scheduling. The approaches covered are "4-Levels/RMFF" and our previously published approach "X-Levels/XD-Binary Tree". First, an introduction to grid computing and job scheduling techniques is provided. Then the existing approaches are described. After that, experiments and results give a practical evaluation of these approaches from different perspectives. The comparative study concludes that overall average task waiting time improves by approximately 30% with the X-Levels/XD-Binary Tree approach compared to the 4-Levels/RMFF approach.
MAP REDUCE BASED ON CLOAK DHT DATA REPLICATION EVALUATION - ijdms
Distributed databases and data replication are effective ways to increase the accessibility and reliability of unstructured, semi-structured and structured data in order to extract new knowledge. Replication offers better performance and greater availability of data. With the advent of Big Data, new storage and processing challenges are emerging. To meet them, Hadoop and DHTs compete in the storage domain, and MapReduce and others in distributed processing, each with their strengths and weaknesses. We propose an analysis of the circular and radial replication mechanisms of the CLOAK DHT and evaluate their performance through a comparative study of simulation data. The results show that radial replication is better for storage, while circular replication gives better search results.
International Journal of Engineering and Science Invention (IJESI) - inventionjournals
International Journal of Engineering and Science Invention (IJESI) is an international journal intended for professionals and researchers in all fields of computer science and electronics. IJESI publishes research articles and reviews across the whole field of Engineering, Science and Technology, including new teaching methods, assessment, validation and the impact of new technologies, and it will continue to provide information on the latest trends and developments in this ever-expanding subject. Papers are selected through double peer review to ensure originality, relevance, and readability. The articles published in our journal can be accessed online.
Effective Sparse Matrix Representation for the GPU Architectures - IJCSEA Journal
General-purpose computation on the graphics processing unit (GPU) is prominent in the current high-performance computing era. Porting or accelerating data-parallel applications onto the GPU yields a baseline performance improvement because of the increased number of computational units, and better performance can be achieved with application-specific fine tuning for the architecture under consideration. One widely used, computation-intensive kernel is sparse matrix-vector multiplication (SpMV) in sparse-matrix based applications. Most existing sparse matrix data formats were developed for the central processing unit (CPU) or multi-core processors. This paper presents a new sparse matrix representation designed for the graphics processor architecture that, for the class of applications fitting the proposed format, gives 2x to 5x performance improvement over CSR (compressed sparse row format), 2x to 54x over COO (coordinate format), and 3x to 10x over the CSR vector format. It also gives 10% to 133% improvement in the memory transfer (of only the sparse matrix access information) between CPU and GPU. The paper details the new format and its requirements, with complete experimentation details and comparison results.
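A worked example clarifies the two baseline formats the paper compares against. The sketch below builds CSR arrays from COO triples and runs a scalar sparse matrix-vector multiply over them; the paper's proposed GPU format is not reproduced here, only the baselines.

```python
# COO (parallel row/col/value arrays) to CSR (values + column indices
# sorted by row, plus row pointers), then a scalar SpMV y = A @ x.
import numpy as np

# COO triples for a 3x4 matrix with 5 nonzeros
rows = np.array([0, 0, 1, 2, 2])
cols = np.array([0, 2, 1, 0, 3])
vals = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

order = np.argsort(rows, kind="stable")       # group nonzeros by row
csr_vals, csr_cols = vals[order], cols[order]
row_ptr = np.zeros(3 + 1, dtype=int)          # row_ptr[r]:row_ptr[r+1] spans row r
np.add.at(row_ptr, rows + 1, 1)
row_ptr = np.cumsum(row_ptr)

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.zeros(3)
for r in range(3):
    for i in range(row_ptr[r], row_ptr[r + 1]):
        y[r] += csr_vals[i] * x[csr_cols[i]]
print(y)  # [ 70.  60. 240.]
```

On a GPU, the layout of csr_vals and the assignment of rows to threads is exactly what format designs like the paper's aim to optimize.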
A BRIEF REVIEW ALONG WITH A NEW PROPOSED APPROACH OF DATA DE-DUPLICATION - cscpconf
Storage capacity and data volume have recently been growing in parallel, and at any instant the data may exceed the storage capacity. A good RDBMS should reduce redundancies as far as possible to maintain consistency and control storage cost. Moreover, a huge database with replicated copies wastes space that could be used for other purposes. The first aim should be to apply data de-duplication techniques in the RDBMS setting, checking access time complexity along with space complexity. Here, different data de-duplication approaches are discussed. Finally, based on the drawbacks of those approaches, a new approach involving the row id, column id and domain-key constraint of an RDBMS is theoretically illustrated. Though this model may appear tedious and unpromising, for a large database with many tables containing many lengthy fields it can be shown to reduce space complexity drastically at the same access speed.
Distributed Algorithm for Frequent Pattern Mining using Hadoop Map Reduce Fram... - idescitation
With the rapid growth of information technology and of many business applications, mining frequent patterns and finding associations among them requires handling large and distributed databases. As the FP-tree is considered the most compact data structure for holding data patterns in memory, there have been efforts to parallelize and distribute it to handle large databases; however, it incurs a lot of communication overhead during mining. In this paper, a parallel and distributed frequent pattern mining algorithm using the Hadoop MapReduce framework is proposed, which shows strong performance on large databases. The proposed algorithm partitions the database in such a way that it works independently at each local node, generating local frequent patterns by sharing a global frequent-pattern header table. The local frequent patterns are merged at the final stage. This reduces the total communication overhead during both structure construction and pattern mining. The itemset count is also taken into consideration, reducing processor idle time. The Hadoop MapReduce framework is used effectively in all steps of the algorithm. Experiments carried out on a PC cluster with 5 computing nodes show execution-time efficiency compared to other algorithms, and the experimental results show that the proposed algorithm efficiently handles scalability for very large databases.
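The partition-and-merge idea can be shown in a greatly simplified form. The hedged Python sketch below is not the paper's FP-tree code: it builds the shared header table of globally frequent items in one pass, lets each partition count patterns locally against that table, and merges the local counts at the end.

```python
# Simplified sketch of the partition/local-mine/merge structure.
from collections import Counter
from itertools import combinations

partitions = [
    [{"a", "b"}, {"a", "c"}],       # node 1's share of the database
    [{"a", "b", "c"}, {"b", "c"}],  # node 2's share
]
min_support = 2

# Pass 1 (global): the shared header table of frequent single items
item_counts = Counter(i for part in partitions for t in part for i in t)
header = {i for i, n in item_counts.items() if n >= min_support}

# Pass 2 (local, independent per node): count itemsets over header items
def mine_local(part):
    local = Counter()
    for t in part:
        kept = sorted(t & header)
        for k in range(1, len(kept) + 1):
            local.update(combinations(kept, k))
    return local

# Merge stage: sum local counts, then apply the global support threshold
merged = Counter()
for part in partitions:
    merged.update(mine_local(part))
print({p: n for p, n in merged.items() if n >= min_support})
```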
IJERA (International Journal of Engineering Research and Applications) is an international online, ... peer-reviewed journal. For more details or to submit your article, please visit www.ijera.com
Multi-Granularity User-Friendly List Locking Protocol for XML Repetitive Data - Waqas Tariq
We propose a list data sharing model, which utilizes semantics expressed in the DTD for concurrency control of shared XML trees. In this model, tree-updating actions such as inserting and/or deleting subtrees are allowed only for repetitive parts. The proposed model guarantees that the resulting XML tree is valid even when tree update actions are applied concurrently. In addition, we propose a new multi-granularity locking mechanism called the list locking protocol. This protocol locks the (index) list of repetitive child nodes and thus allows updates to the descendants while a child node's subtree is being deleted or inserted. The protocol is expected to be more accessible and to produce fewer locking objects on XML data compared to other methods. Moreover, the prototype system shows that list locking is well suited to the user interface of shared XML clients by enabling/disabling the corresponding edit operation controls.
Implementation of p pic algorithm in map reduce to handle big data - eSAT Publishing House
IJRET: International Journal of Research in Engineering and Technology is an international peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together scientists, academicians, field engineers, scholars and students of related fields of Engineering and Technology.
Converting UML Class Diagrams into Temporal Object Relational DataBase - IJECEIAES
A number of active researchers and experts are engaged in developing and implementing new mechanisms and features in time-varying database management systems (TVDBMS) in response to the demands of the modern business environment. Time-varying data management has mostly been considered with either attribute or tuple timestamping schemas. Our main approach is to offer a better solution to the stated limitations of existing work by providing nonprocedural data definitions and temporal data queries through a conversion that is as technically complete as possible, allowing all conceptual details of the UML class specifications to be easily realized and shared from a conception and design point of view. This paper contributes a logical design schema represented by UML class diagrams, which are handled by stereotypes to express a temporal object-relational database with attribute timestamping.
Over time, machine learning inference workloads have become more and more demanding in terms of latency and throughput, with multiple models deployed in the same system. This scenario leaves large room for runtime and memory optimizations, which current systems fail to exploit because they treat ML models and tasks as black boxes.
In contrast, Pretzel adopts a white-box description of ML models, which allows the framework to perform optimizations over deployed models and running tasks, saving memory and increasing overall system performance. In this talk we will show the motivations behind Pretzel, its current design, and possible future developments.
SURVEY ON SCHEDULING AND ALLOCATION IN HIGH LEVEL SYNTHESIS - cscpconf
This paper presents a detailed survey of the scheduling and allocation techniques in High Level Synthesis (HLS) found in the research literature, along with the methodologies and techniques reported there for improving speed, (silicon) area and power in High Level Synthesis.
SCIENTIFIC WORKFLOW CLUSTERING BASED ON MOTIF DISCOVERY - ijcseit
In this paper, clustering of scientific workflows is investigated. It proposes encoding workflows as sets of embedded workflow motifs: common patterns of workflow steps and relationships are replaced with indices. Motifs are defined as small functional units that occur much more frequently than expected; they can reveal hidden relationships while keeping as much underlying information as possible. To obtain a good estimate of the distances between observed workflows, this work formulates scientific workflow clustering with set descriptors instead of vector-based descriptors, and uses k-means as a popular clustering algorithm. However, one of the biggest limitations of k-means is that the number of clusters, K, must be specified before the algorithm is applied. To address this problem, a method based on the shuffled frog leaping algorithm (SFLA) is proposed. Simulation results show that the proposed method is better than PSO and GA algorithms at selecting K.
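The switch from vector to set descriptors is the key move, so here is a hedged illustration: a tiny k-medoids-style loop over workflows represented as sets of motif indices, using Jaccard distance. The paper's SFLA-based selection of K is not reproduced; K is fixed, and the data and seeds are invented.

```python
# Clustering set descriptors with Jaccard distance (k-medoids style).
workflows = [
    {1, 2, 3}, {1, 2, 4}, {1, 2, 3, 4},  # motif-index sets, one per workflow
    {7, 8}, {7, 8, 9}, {8, 9},
]

def jaccard_dist(a, b):
    return 1.0 - len(a & b) / len(a | b)

K = 2
medoids = [workflows[0], workflows[3]]  # seeds; K fixed for this sketch
for _ in range(10):
    clusters = [[] for _ in range(K)]
    for w in workflows:
        nearest = min(range(K), key=lambda i: jaccard_dist(w, medoids[i]))
        clusters[nearest].append(w)
    # the new medoid minimizes total distance within its cluster
    medoids = [
        min(c, key=lambda m: sum(jaccard_dist(m, w) for w in c)) if c else medoids[i]
        for i, c in enumerate(clusters)
    ]
print(clusters)
```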
Novel Database-Centric Framework for Incremental Information Extraction - ijsrd.com
Information extraction (IE) is an active research area that seeks techniques to uncover information from large collections of text. IE is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents; in most cases this involves processing human-language texts by means of natural language processing (NLP). Recent activities in document processing, such as automatic annotation and content extraction, can be seen as information extraction. Many applications call for methods to automatically extract structured information from unstructured natural language text, and due to the inherent challenges of natural language processing, most existing methods tend to be domain specific. This project presents a new paradigm for information extraction: the intermediate output of each text-processing component is stored, so that only an improved component has to be re-deployed over the entire corpus. Extraction is then performed on both the previously processed data from the unchanged components and the updated data generated by the improved component. Such incremental extraction can yield a tremendous reduction in processing time. There is also a mechanism to generate extraction queries from both labeled and unlabeled data; query generation is critical so that casual users can specify their information needs without learning the query language.
The International Journal of Engineering & Science is aimed at providing a platform for researchers, engineers, scientists, or educators to publish their original research results, to exchange new ideas, to disseminate information in innovative designs, engineering experiences and technological skills. It is also the Journal's objective to promote engineering and technology education. All papers submitted to the Journal will be blind peer-reviewed. Only original articles will be published.
STORAGE GROWING FORECAST WITH BACULA BACKUP SOFTWARE CATALOG DATA MINING - csandit
Backup software information is a potential source for data mining: not only the unstructured stored data from all the backed-up servers, but also the backup job metadata, which is stored in a database known as the catalog. Mining this database, in particular, could be used to improve backup quality, automation and reliability, predict bottlenecks, identify risks and failure trends, and provide specific report information that cannot be fetched from closed-format proprietary backup software databases. Ignoring such a data mining project can be costly, with lots of unnecessary human intervention, uncoordinated work and pitfalls, such as backup service disruption caused by insufficient planning. The specific goal of this practical paper is to use Knowledge Discovery in Databases, time series, stochastic models and R scripts to predict backup storage data growth. This project could not be done with traditional closed-format proprietary solutions, since it is generally impossible to read their database data from third-party software because of deliberate vendor lock-in. Nevertheless, it is very feasible with Bacula, currently the third most popular backup software worldwide, and open source. The paper focuses on the backup storage demand prediction problem using the most popular prediction algorithms; among them, the Holt-Winters model had the highest success rate for the tested data sets.
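For concreteness, here is what the forecasting step with the best-performing model (Holt-Winters) might look like using statsmodels on synthetic data; in the paper the input series would be storage volumes mined from Bacula's catalog, and the synthetic trend and weekly seasonality below are assumptions.

```python
# Holt-Winters forecast of backup storage demand (synthetic stand-in data).
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

days = pd.date_range("2024-01-01", periods=120, freq="D")
volume = 100 + 0.5 * np.arange(120) + 10 * np.sin(2 * np.pi * np.arange(120) / 7)
series = pd.Series(volume, index=days)  # daily backup volume in GB

model = ExponentialSmoothing(series, trend="add", seasonal="add", seasonal_periods=7)
fit = model.fit()
print(fit.forecast(30))  # predicted demand for the next 30 days
```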
Towards an Infrastructure for Enabling Systematic Development and Research of... - Rafael Ferreira da Silva
Presentation held at the 17th IEEE eScience Conference
Scientific workflows have been used almost universally across scientific domains, and have underpinned some of the most significant discoveries of the past several decades. Many of these workflows have high computational, storage, and/or communication demands, and thus must execute on a wide range of large-scale platforms, from large clouds to upcoming exascale high-performance computing (HPC) platforms. These executions must be managed using some software infrastructure. Due to the popularity of workflows, workflow management systems (WMSs) have been developed to provide abstractions for creating and executing workflows conveniently, efficiently, and portably. While these efforts are all worthwhile, there are now hundreds of independent WMSs, many of which are moribund. As a result, the WMS landscape is segmented and presents significant barriers to entry due to the hundreds of seemingly comparable, yet incompatible, systems that exist. Consequently, many teams, small and large, still elect to build their own custom workflow solution rather than adopt, or build upon, existing WMSs. This current state of the WMS landscape negatively impacts workflow users, developers, and researchers. In this talk, I will provide a view of the state of the art and some of my previous research and technical contributions, and identify crucial research challenges in the workflow community.
An octa core processor with shared memory and message-passing - eSAT Journals
Abstract: In this era of fast, high-performance computing, efficient optimizations are needed both in the processor architecture and in the memory hierarchy. Every day, advances in communication and multimedia applications push the number of cores in the main processor upward: dual-core, quad-core, octa-core and so on. But to enhance the overall performance of a multiprocessor chip, there are stringent requirements on inter-core synchronization. Thus, an MPSoC with 8 cores supporting both message-passing and shared-memory inter-core communication mechanisms is implemented on a Virtex 5 LX110T FPGA. Each core is based on the MIPS III (microprocessor without interlocked pipelined stages) ISA, handles only integer instructions, and has a six-stage pipeline with a data hazard detection unit and forwarding logic. The eight processing cores and one central shared-memory core are interconnected using a 3x3 2-D mesh topology network-on-chip (NoC) with a virtual channel router. The router is four-stage pipelined, supports the DOR X-Y routing algorithm, and uses round-robin arbitration. To verify the functionality of the fully synthesized multi-core processor, a matrix multiplication operation is mapped onto it; the multiplications and additions for each element of the resultant matrix are partitioned and scheduled across the eight cores to get maximum throughput. All processor design code is written in Verilog HDL. Keywords: MPSoC, message-passing, shared memory, MIPS, ISA, wormhole router, network-on-chip, SIMD, data level parallelism, 2-D Mesh, virtual channel
This talk will examine issues of workflow execution, in particular using the Pegasus Workflow Management System, on distributed resources and how these resources can be provisioned ahead of the workflow execution. Pegasus was designed, implemented and supported to provide abstractions that enable scientists to focus on structuring their computations without worrying about the details of the target cyberinfrastructure. To support these workflow abstractions Pegasus provides automation capabilities that seamlessly map workflows onto target resources, sparing scientists the overhead of managing the data flow, job scheduling, fault recovery and adaptation of their applications. In some cases, it is beneficial to provision the resources ahead of the workflow execution, enabling the re-use of resources across workflow tasks. The talk will examine the benefits of resource provisioning for workflow execution.
Benchmarking open source deep learning frameworks - IJECEIAES
Deep Learning (DL) is one of the hottest fields in machine learning. To foster the growth of DL, several open source frameworks have appeared, providing implementations of the most common DL algorithms. These frameworks vary in the algorithms they support and in the quality of their implementations. The purpose of this work is to provide a qualitative and quantitative comparison among three such frameworks: TensorFlow, Theano and CNTK. To ensure that our study is as comprehensive as possible, we consider multiple benchmark datasets from different fields (image processing, NLP, etc.) and measure the performance of the frameworks' implementations of different DL algorithms. For most of our experiments, we find that CNTK's implementations are superior to the others under consideration.
Reconciling Conflicting Data Curation Actions: Transparency Through Argument... - Bertram Ludäscher
Yilin Xia (yilinx2@illinois.edu),
Shawn Bowers (bowers@gonzaga.edu),
Lan Li (lanl2@illinois.edu), and
Bertram Ludäscher (ludaesch@illinois.edu)
Presented at IDCC-2024 in Edinburgh.
ABSTRACT. We propose a new approach for modeling and reconciling conflicting data cleaning actions. Such conflicts arise naturally in collaborative data curation settings where multiple experts work independently and then aim to put their efforts together to improve and accelerate data cleaning. The key idea of our approach is to model conflicting updates as a formal argumentation framework (AF). Such argumentation frameworks can be automatically analyzed and solved by translating them to a logic program P_AF whose declarative semantics yield a transparent solution with many desirable properties, e.g., uncontroversial updates are accepted, unjustified ones are rejected, and the remaining ambiguities are exposed and presented to users for further analysis. After motivating the problem, we introduce our approach and illustrate it with a detailed running example introducing both well-founded and stable semantics to help understand the AF solutions. We have begun to develop open source tools and Jupyter notebooks that demonstrate the practicality of our approach. In future work we plan to develop a toolkit for conflict resolution that can be used in conjunction with OpenRefine, a popular interactive data cleaning tool.
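The core move, solving an argumentation framework through a logic-program-style fixpoint, fits in a few lines. The sketch below is a minimal illustration with invented update names, not the paper's encoding: it computes the grounded (well-founded) extension, so uncontroversial updates come out accepted, defeated ones rejected, and mutual conflicts remain undecided for user review.

```python
# Grounded extension of an argumentation framework by naive fixpoint.
attacks = {("u2", "u1"), ("u3", "u2"), ("u4", "u5"), ("u5", "u4")}
args = {a for edge in attacks for a in edge}

accepted, rejected = set(), set()
changed = True
while changed:
    changed = False
    for a in args - accepted - rejected:
        attackers = {x for (x, y) in attacks if y == a}
        if attackers <= rejected:      # every attacker defeated: accept
            accepted.add(a); changed = True
        elif attackers & accepted:     # attacked by an accepted argument: reject
            rejected.add(a); changed = True

print("accepted:", accepted)                     # {'u3', 'u1'}
print("rejected:", rejected)                     # {'u2'}
print("undecided:", args - accepted - rejected)  # {'u4', 'u5'}: shown to users
```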
Games, Queries, and Argumentation Frameworks: Time for a Family Reunion - Bertram Ludäscher
Research Seminar Talk (online) at KRR@UP (Uni Potsdam) on Dec 6, 2023, loosely based on a paper with the same title at the 7th Workshop on Advances in Argumentation in Artificial Intelligence (AI3)
Games, Queries, and Argumentation Frameworks: Time for a Family Reunion! - Bertram Ludäscher
7th Workshop on Advances in Argumentation in Artificial Intelligence (AI3) at
AIxIA 2023: 22nd International Conference of the Italian Association for Artificial Intelligence.
Presentation of a paper by Bertram Ludäscher, Shawn Bowers, and Yilin Xia, given virtually on November 9, 2023.
[Flashback] Integration of Active and Deductive Database Rules - Bertram Ludäscher
Slides of my PhD defense at the University of Freiburg, 1998.
Statelog and similar state-oriented extensions of Datalog have seen renewed interest subsequently, e.g., see
[Hel10] Hellerstein, J.M., 2010. The declarative imperative: experiences and conjectures in distributed logic. ACM SIGMOD Record, 39(1), pp.5-19.
[AMC+11] Alvaro, P., Marczak, W.R., Conway, N., Hellerstein, J.M., Maier, D. and Sears, R., 2011. Dedalus: Datalog in time and space. In Datalog Reloaded: First International Workshop, Datalog 2010, Oxford, UK, March 16-19, 2010. Revised Selected Papers (pp. 262-281). Springer.
Computational Reproducibility vs. Transparency: Is It FAIR Enough? - Bertram Ludäscher
Keynote at CLIR Workshop (Webinar): Toward Open, Reproducible, and Reusable Research. February 10, 2021. https://reusableresearch.com/
ABSTRACT. The “reproducibility crisis” has resulted in much interest in methods and tools to improve computational reproducibility. FAIR data principles (data should be findable, accessible, interoperable, and reusable) are also being adapted and evolved to apply to other artifacts, notably computational analyses (scientific workflows, Jupyter notebooks, etc.). The current focus on computational reproducibility of scripts and other computational workflows sometimes overshadows a somewhat neglected and arguably more important issue: transparency of data analysis, including data wrangling and cleaning. In this talk I will ask the question: What information is gained by conducting a reproducibility experiment? This leads to a simple model (PRIMAD) that aims to answer this question by sorting out different scenarios. Finally, I will present some features of Whole-Tale, a computational platform for reproducible and transparent computational experiments.
By Michael Gryk and Bertram Ludäscher. Presented at 2020 JCDL-SIGCM Workshop, August 1, 2020.
ABSTRACT. Conceptual models can serve multiple purposes: communication of information between stakeholders, information abstraction and generalization, and information organization for archival and retrieval. An ongoing research question is how to formally define the fit-for-purpose of a conceptual model as well as to define metrics or tests to determine whether a given model faithfully supports a designated purpose.
This paper summarizes preliminary investigations in this area by presenting toy problems along with different conceptual models for the system under study. It is argued that the different models are adequate in supporting a sophisticated query and yet they adopt different normalization schemes and will differ in expressiveness depending on the implied purpose of the models. As the subtitle suggests, this work is intended to be primarily exploratory as to the constraints a formal system would require in defining the “usefulness”, “expressiveness” and “equivalence” of conceptual models.
From Research Objects to Reproducible Science Tales - Bertram Ludäscher
University of Southampton. Electronics & Computer Science. Research Seminar (Invited Talk).
TITLE: From Research Objects to Reproducible Science Tales
ABSTRACT. Rumor has it that there is a reproducibility crisis in science. Or maybe there are multiple crises? What do we mean by reproducibility and replicability anyways? In this talk I will first make an attempt at sorting out some of the terminological confusion in this area, focusing on computational aspects. The PRIMAD model is another attempt to describe different aspects of reproducibility studies by focusing on the "delta" between those studies and the original study. In addition to these more theoretical investigations, I will discuss practical efforts to create more reproducible and more transparent computational platforms such as the one developed by the Whole-Tale project: here 'tales' are executable research objects that may combine data, code, runtime environments, and narratives (i.e., the traditional "science story"). I will conclude with some thoughts about the remaining challenges and opportunities to bridge the large conceptual gaps that continue to exist despite the recognition of problems of reproducibility and transparency in science.
ABOUT the Speaker. Bertram Ludäscher is a professor at the School of Information Sciences at the University of Illinois, Urbana-Champaign and a faculty affiliate with the National Center for Supercomputing Applications (NCSA) and the Department of Computer Science at Illinois. Until 2014 he was a professor at the Department of Computer Science at the University of California, Davis. His research interests range from practical questions in scientific data and workflow management, to database theory and knowledge representation and reasoning. Prior to his faculty appointments, he was a research scientist at the San Diego Supercomputer Center (SDSC) and an adjunct faculty at the CSE Department at UC San Diego. He received his M.S. (Dipl.-Inform.) in computer science from the University of Karlsruhe (now K.I.T.), and his PhD (Dr. rer. nat.) from the University of Freiburg, in Germany.
Deduktive Datenbanken & Logische Programme: Eine kleine Zeitreise - Bertram Ludäscher
Deductive Databases & Logic Programs: Back to the Future!
Colloquium talk on the occasion of the retirement of Prof. Dr. Georg Lausen, May 10th, 2019, Universität Freiburg, Germany
Dissecting Reproducibility: A case study with ecological niche models in th... - Bertram Ludäscher
Bertram Ludäscher and Santiago Núñez-Corrales.
Presentation at "Research Synthesis in the Hierarchy of Hypotheses" Workshop, October 10-12, 2018.
Schloss Herrenhausen, Hannover, Germany.
Incremental Recomputation: Those who cannot remember the past are condemned ... - Bertram Ludäscher
Talk given at "Problems and techniques for Incremental Re-computation: provenance and beyond".
A workshop co-organized with Provenance Week 2018
King's College London, 12th and 13th July, 2018
Organizers: Paolo Missier (Newcastle University), Tanu Malik (DePaul University), Jacek Cala (Newcastle University)
Abstract: Incremental recomputation has applications, e.g., in databases and workflow systems. Methods and algorithms for recomputation depend on the underlying model of computation (MoC) and model of provenance (MoP). This relation is explored with some examples from databases and workflow systems.
Validation and Inference of Schema-Level Workflow Data-Dependency Annotations - Bertram Ludäscher
Presentation slides of paper by Shawn Bowers, Timothy McPhillips, and Bertram Ludäscher, given by Shawn at Provenance and Annotation of Data and Processes - 7th International Provenance and Annotation Workshop, IPAW 2018, King's College London, UK, July 9-10, 2018.
The paper won the IPAW best paper award: https://twitter.com/kbelhajj/status/1017082775856467968
ABSTRACT. An advantage of scientific workflow systems is their ability to collect runtime provenance information as an execution trace. Traces include the computation steps invoked as part of the workflow run along with the corresponding data consumed and produced by each workflow step. The information captured by a trace is used to infer "lineage" relationships among data items, which can help answer provenance queries to find workflow inputs that were involved in producing specific workflow outputs. Determining lineage relationships, however, requires an understanding of the dependency patterns that exist between each workflow step's inputs and outputs, and this information is often under-specified or generally assumed by workflow systems. For instance, most approaches assume all outputs depend on all inputs, which can lead to lineage "false positives". In prior work, we defined annotations for specifying detailed dependency relationships between inputs and outputs of computation steps. These annotations are used to define corresponding rules for inferring fine-grained data dependencies from a trace. In this paper, we extend our previous work by considering the impact of dependency annotations on workflow specifications. In particular, we provide a reasoning framework to ensure the set of dependency annotations on a workflow specification is consistent. The framework can also infer a complete set of annotations given a partially annotated workflow. Finally, we describe an implementation of the reasoning framework using answer-set programming.
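A toy trace shows why the annotations matter. The hedged sketch below is not the paper's answer-set encoding: under the default all-outputs-depend-on-all-inputs rule, the lineage of log.txt drags in artifacts it never used, and a single per-output annotation removes the false positives. All file and step names are invented.

```python
# Lineage inference from a trace, with and without dependency annotations.
trace = [
    ("split",   {"ins": ["raw.csv"],                 "outs": ["train.csv", "test.csv"]}),
    ("analyze", {"ins": ["train.csv", "config.yml"], "outs": ["model.bin", "log.txt"]}),
]
# Annotation: log.txt is produced from config.yml alone, not train.csv.
annotations = {("analyze", "log.txt"): ["config.yml"]}

def lineage(artifact, annotated):
    deps = set()
    for step, io in trace:
        if artifact in io["outs"]:
            ins = annotations.get((step, artifact), io["ins"]) if annotated else io["ins"]
            for i in ins:
                deps.add(i)
                deps |= lineage(i, annotated)  # follow producers transitively
    return deps

print(lineage("log.txt", annotated=False))  # {'train.csv', 'config.yml', 'raw.csv'}
print(lineage("log.txt", annotated=True))   # {'config.yml'}: false positives gone
```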
An ontology-driven framework for data transformation in scientific workflows - Bertram Ludäscher
Presentation given by Bertram at the Data Integration in the Life Sciences (DILS) Workshop in Leipzig, Germany, 2004.
Reference:
Bowers, Shawn, and Bertram Ludäscher. "An ontology-driven framework for data transformation in scientific workflows." In International Workshop on Data Integration in the Life Sciences (DILS), pp. 1-16. Springer, 2004.
So this isn't new -- but still relevant :-)
ABSTRACT. Ecologists spend considerable effort integrating heterogeneous data for statistical analyses and simulations, for example, to run and test predictive models. Our research is focused on reducing this effort by providing data integration and transformation tools, allowing researchers to focus on "real science," that is, discovering new knowledge through analysis and modeling. This paper defines a generic framework for transforming heterogeneous data within scientific workflows. Our approach relies on a formalized ontology, which serves as a simple, unstructured global schema. In the framework, inputs and outputs of services within scientific workflows can have structural types and separate semantic types (expressions of the target ontology). In addition, a registration mapping can be defined to relate input and output structural types to their corresponding semantic types. Using registration mappings, appropriate data transformations can then be generated for each desired service composition. Here, we describe our proposed framework and an initial implementation for services that consume and produce XML data.
From Provenance Standards and Tools to Queries and Actionable ProvenanceBertram Ludäscher
Presentation given at AGU 2017, New Orleans.
Session IN42C: Research Integrity, Reproducible Science, and Quantifying Return on Investment in Data and Software Management II: Focus on Challenges with Provenance, Reuse, and Citation.
Title: From Provenance Standards and Tools to Queries and Actionable Provenance (Invited)
Abstract. The W3C PROV standard provides a minimal core for sharing retrospective provenance information for scientific workflows and scripts. PROV extensions such as DataONE’s ProvONE model are necessary for linking runtime observables in retrospective provenance records with conceptual-level prospective provenance information, i.e., workflow (or dataflow) graphs. Runtime provenance recorders, such as DataONE’s RunManager for R, or noWorkflow for Python capture retrospective provenance automatically. YesWorkflow (YW) is a toolkit that allows researchers to declare high-level prospective provenance models of scripts via simple inline comments (YW-annotations), revealing the computational modules and dataflow dependencies in the script. By combining and linking both forms of provenance, important queries and use cases can be supported that neither provenance model can afford on its own.
We present existing and emerging provenance tools developed for the DataONE and SKOPE (Synthesizing Knowledge of Past Environments) projects. We show how the different tools can be used individually and in combination to model, capture, share, query, and visualize provenance information. We also present challenges and opportunities for making provenance information more immediately actionable for the researchers who create it in the first place. We argue that such a shift towards “provenance-for-self” is necessary to accelerate the creation, sharing, and use of provenance in support of transparent, reproducible computational and data science.
Techniques to optimize the pagerank algorithm usually fall into two categories. One is to try reducing the work per iteration, and the other is to try reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, with the same in-links, helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before pagerank computation to improve performance. Final ranks of chain nodes can be easily calculated. This could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the pagerank of each strongly connected component can be computed in topological order. This could help reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in pagerank computation. The combination of all of the above methods is the STICD algorithm. [sticd] For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Learn SQL from basic queries to Advanced queriesmanishkhaire30
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
The Building Blocks of QuestDB, a Time Series Databasejavier ramirez
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever growing datasets while keeping performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review some of the changes we have made over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdfEnterprise Wired
In this guide, we'll explore the key considerations and features to look for when choosing a Trusted analytics platform that meets your organization's needs and delivers actionable intelligence you can trust.
Analysis insight about a Flyball dog competition team's performanceroli9797
Insights from my analysis of a Flyball dog competition team's performance over the last year. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
Global Situational Awareness of A.I. and where it's headedvikram sood
You can see the future first in San Francisco.
Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. American big business is gearing up to pour trillions of dollars into a long-unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent; from the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum.
The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace college graduates. By the end of the decade, they will be smarter than you or I; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be unleashed, and before long, The Project will be on. If we're lucky, we'll be in an all-out race with the CCP; if we're unlucky, an all-out war.
Everyone is now talking about AI, but few have the faintest glimmer of what is about to hit them. Nvidia analysts still think 2024 might be close to the peak. Mainstream pundits are stuck on the wilful blindness of “it’s just predicting the next word”. They see only hype and business-as-usual; at most they entertain another internet-scale technological change.
Before long, the world will wake up. But right now, there are perhaps a few hundred people, most of them in San Francisco and the AI labs, that have situational awareness. Through whatever peculiar forces of fate, I have found myself amongst them. A few years ago, these people were derided as crazy—but they trusted the trendlines, which allowed them to correctly predict the AI advances of the past few years. Whether these people are also right about the next few years remains to be seen. But these are very smart people—the smartest people I have ever met—and they are the ones building this technology. Perhaps they will be an odd footnote in history, or perhaps they will go down in history like Szilard and Oppenheimer and Teller. If they are seeing the future even close to correctly, we are in for a wild ride.
Let me tell you what we see.
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...sameer shah
"Join us for STATATHON, a dynamic 2-day event dedicated to exploring statistical knowledge and its real-world applications. From theory to practice, participants engage in intensive learning sessions, workshops, and challenges, fostering a deeper understanding of statistical methodologies and their significance in various fields."
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will come present about related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup was formerly the Milvus Meetup, and is sponsored by Zilliz, maintainers of Milvus.
Possible Worlds Explorer: Datalog & Answer Set Programming for the Rest of Us
1. Possible Worlds Explorer (PWE):
Datalog & Answer Set Programming
for the Rest of Us
Sahil Gupta
Jessica Yi-Yun Cheng
Bertram Ludäscher
PHILADELPHIA LOGIC WEEK
June 3-7, 2019
2. Intros should come first (“provenance”)
• Memory Lane (& Quiz): SLD-CNF …
• [Cha88] Chan, D., Constructive Negation Based on the Completed Database. 5th ICLP, 1988
• … F-Logic (Datalog + OO) … Flip … Flora(-2) …
• … Statelog (Datalog + States) ….
• Scientific Workflow Design for Mere Mortals (“Kepler”)
• … Datalog as a Lingua Franca for Querying Provenance …
• … Declarative Debugging for Mere Mortals …
2
Fig. 4: Workflow W (top) vs Trace T (bottom): Traces are associated to workflows, guaranteeing structural consistency; workflow-level (firing or data) constraints induce temporal constraints ≤f and ≤d on traces.
…and in which way, they can be stateful; how they consume their inputs, produce their outputs; and so on. As a result, different systems use different models of provenance (MoPs), with different temporal semantics. Thus, instead of "hard-wiring" a fixed temporal semantics to a particular graph-based MoP, we again use logic constraints to obtain a "customizable" temporal semantics. (3) We illustrate this concept by providing firing constraints at the workflow level, which induce temporal constraints ≤f at the level of traces (cf. Figure 4). These temporal-constraint generating rules can be chosen to conform to the temporal axioms in [15], or to accom…
Fig. 1. A phylogenetics workflow implemented in the Kepler system. Kepler workflows are built from actors (boxes) that perform computational tasks. Users can select actors from component libraries (panel on the left) and connect them on the canvas to form a workflow graph (center/right). Connections specify dataflow between actors. Configuration parameters can also be provided (top center), e.g., the location of input data and the initial jumble seed value are given. A director (top left corner on the canvas) is a special component, specifying a model of computation and controlling its execution.
…become broadly adopted as a technology for assembling and automating analyses, these systems must provide scientists concrete and demonstrable advantages, both over general-purpose scripting languages and more focused scientific computing environments currently occupying the tool-integration niche.

Scientific workflow systems. Existing scientific workflow systems generally share a number of common goals and characteristics [17] that differentiate them from tool-integration approaches based on scripting languages and other platforms with tool-automation features. One of the most significant differences is that whereas scripting approaches are largely based on imperative languages, scientific workflow systems are typically based on dataflow languages [23,17] in which workflows are represented as directed graphs, with nodes denoting computational steps (or actors), and connections representing data dependencies (and data flow) between steps. Many systems (e.g., [3,27,29,33]) allow workflows to be created and edited using graphical interfaces (see Fig. 1 for an example in Kepler). The dataflow paradigm is well-suited for supporting modular workflow design and facilitating reuse of components [23,25,27,5]. Many workflow systems (e.g., [33,27]) further allow workflows to be used as actors in other workflows, thus providing workflow authors an abstraction mechanism for hiding implementation details and facilitating even more reuse.

One advantage of workflow systems that derives from this dataflow-orientation is the ease with which data produced by one actor can be routed to multiple downstream actors. While the flow of data to multiple receivers is often difficult to describe clearly in plain text, the dataflow approach makes explicit this detailed routing of data. For instance, in Fig. 1 it is clear that data can flow directly from Refine alignment only to Iterate over seeds. The result is that scientific workflows can be more declarative about the interactions between actors than scripts, where the flow of data between components is typically hidden within (often complex) code. The downside of this approach is that if taken too far, specifications of complex scientific workflows can become a confusing tangle of actors and wires unless the workflow specification language provides additional, more sophisticated means for declaring how data is to be routed (as comad does, see below as well as [30,6]).

Other notable advantages of scientific workflow systems over traditional approaches are their potential for transparently optimizing workflow performance and automatically recording data and process provenance. Unlike most scripting language implementations, scientific workflow systems often provide capabilities for executing workflow tasks concurrently where data dependencies between tasks allow, either in an "assembly-line" fashion with actors connected in a linear pipeline performing their tasks simultaneously, or in parallel with multiple such pipelines operating at the same time (e.g., over multiple input data sets or via explicit branches in the workflow specification) [43,34,30]. Many scientific workflow systems also can record, store, and query data and process dependencies that result during one or more workflow runs, enabling scientists to later investigate the data and processes used to derive results and to examine intermediate data products [38,31].

While these and other advantages of systems designed specifically to automate scientific workflows help to position these technologies as viable alternatives to traditional approaches based on scripting languages and the like, much is yet required to achieve the vision of putting workflow automation fully into the hands of "mere mortals" [17]. Much remains to be done to realize the vision of scientists untrained in programming and relatively ignorant of the details of information technology rapidly composing, deploying, executing, monitoring, and reviewing the results of scientific workflows without assistance from information-technology experts.

Contributions and paper outline. In this paper we describe key aspects of scientific workflow systems that can help broader-scale adoption of workflow technology by scientists, and demonstrate how these properties can be realized by a novel and generic workflow modeling paradigm that extends existing dataflow computation models. In Section 2, we present what we see as important desiderata for scientific workflow systems from a workflow modeling and design perspective. In Section 3, we describe our main contribution, the collection oriented modeling and design (comad) framework, for delivering on the expectations described in Section 2. Our framework is especially suited for cases where data is nested in structure and computational steps can be pipelined (which is often true, e.g., in bioinformatics). The comad framework provides an assembly-line style computation approach that…
PWE: Datalog & ASP for the Rest of Us
3. From past Provenance … to the Future!
• Time flies: “… for Mere Mortals” => “for the Rest of Us”
• If Datalog & ASP are so great, why don’t more people use it?
– MA: “one generation has to die …” ?
• Alt-answer: “Be a teacher!” (Tim Minchin) + Use Tools!
3PWE: Datalog & ASP for the Rest of Us
4. Human Cycles vs Machine Cycles
• Where is the semantics (e.g., in the “Semantic Web”)?
• Ask a DB-theory/LP person: …
– "A query is a question about a concept"
– Google it => 1 hit (Bing it => millions of “hits” ..)
• Datalog & ASP occupy a sweet spot …
– … between conceptual modeling & computational thinking
– … optimizing human cycles!
– cf. Brains & Brawns (Molham Aref’s keynote)
4PWE: Datalog & ASP for the Rest of Us
5. Motivation for PWE
• Datalog & ASP for a larger community?
– … meet users (novices) where they are!
– … plus: a “logic lab” for DBLP gurus, teachers, …
• Ideas:
– Wrap existing engines (dlv, clingo, … XSB … <yours> ..)
– Allow easy combination with the Python ecosystem!
• … meet users where they are!
– … inside of Jupyter (and deployed in the cloud…)
• It isn’t that hard
– … with the right people … :-)
PWE: Datalog & ASP for the Rest of Us 5
6. Partial Recall (Datalog 2.0 Vienna 2012)
PWE: Datalog & ASP for the Rest of Us 6
Pop Quiz: Why/how come tc(a,b) ?
• Why/how is (a,b) in the transitive closure tc of e?
• What about ?-tc(e,X) vs ?-tc(X,e)?
7. e/2 cycles => tc/2 SLD(NF) issues
Prolog’s SLD-NF resolution does not seem to work for declarative/naïve tc/2 rules
=> What’s happening anyways?
7PWE: Datalog & ASP for the Rest of Us
8. Explaining Derivations via Provenance
PWE: Datalog & ASP for the Rest of Us 8
[r1] tc(X,Y) :- e(X,Y)
[r2] tc(X,Y) :- e(X,Z), tc(Z,Y)
A firing [F] → (H) is called unfounded if all derivations of F require H as an assumption!
Here tc(a,b) has (at least) two different derivations, neither of which is unfounded.
However, [r2] → tc(c,b) is unfounded: the firing of r2 depends on tc(b,b), which can only be derived by already assuming the desired conclusion tc(c,b)!
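For reference, here is a minimal, self-contained clingo encoding consistent with the example discussed above (the EDB facts are an assumption, reconstructed from the slide's figure):

% assumed example EDB: edge a->b plus a 2-cycle between b and c
e(a,b). e(b,c). e(c,b).
tc(X,Y) :- e(X,Y).          % [r1]
tc(X,Y) :- e(X,Z), tc(Z,Y). % [r2]

clingo terminates on this program and returns the full transitive closure in its unique answer set, whereas Prolog's SLD(-NF) resolution can loop on recursive goals over the cyclic e/2 (the issue raised on slide 7).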
9. 9PWE: Datalog & ASP for the Rest of Us
Step 1: Capturing Rule Firings (“F-trick”)
• Capture rule firings and keep "witness info" (existential variables)
– no premature projections in the rule head, please!
• Example. Instead of a given rule …
tc(X,Y) :- e(X,Z), tc(Z,Y).
… we rather use these two rules, keeping witnesses Z around:
fire2(X,Z,Y) :- e(X,Z), tc(Z,Y).
tc(X,Y) :- fire2(X,Z,Y).
(Example rule firings shown as a table in the slide.)
This is the “secret sauce” in Orchestra, provenance polynomials (Val’s TaPP Keynote), …
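A runnable sketch of the F-trick on the same example (the EDB is assumed as before; the fire1/2 rule for [r1] is added here for symmetry):

e(a,b). e(b,c). e(c,b).          % assumed example EDB
fire1(X,Y)   :- e(X,Y).          % instrumented [r1]
tc(X,Y)      :- fire1(X,Y).
fire2(X,Z,Y) :- e(X,Z), tc(Z,Y). % instrumented [r2]: witness Z is kept
tc(X,Y)      :- fire2(X,Z,Y).
#show tc/2.
#show fire1/2.
#show fire2/3.

The fire1/fire2 atoms in the answer set are exactly the rule firings, i.e., the raw material for the provenance graph built next.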
10. 10PWE: Datalog & ASP for the Rest of Us
Step 2: Graph Transformation (“G-trick”)
• Reify provenance atoms & firings in a labeled graph g/3
• Example for N = 2 subgoals and 1 head atom …
fire2(X,Z,Y) :- e(X,Z), tc(Z,Y). % two in-edges
tc(X,Y) :- fire2(X,Z,Y). % one out-edge
… generates N+1 “reification rules” (Skolems are safe):
g( e(X,Z), in, skfire2(X,Z,Y) ) :- fire2(X,Z,Y).
g( tc(Z,Y), in, skfire2(X,Z,Y) ) :- fire2(X,Z,Y).
g( skfire2(X,Z,Y), out, tc(X,Y) ) :- fire2(X,Z,Y).
(Figure: example instance generated by these rules; the firing node fire2(a,b,d) has in-edges from e(a,b) and tc(b,d), and an out-edge to tc(a,d).)
This is the “secret sauce” in Frame-Logic, RDF, …
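Once firings are reified in g/3, extracting the provenance subgraph of a single goal atom is itself a small Datalog program. A sketch, assuming the g/3 facts produced by the reification rules above; goal/1, reach/1, and gPruned/3 are illustrative helper names, with tc(a,b) as an example goal:

goal(tc(a,b)).                              % assumed example goal
reach(A) :- goal(A).
reach(F) :- g(F, out, A), reach(A).         % firings producing a reached atom
reach(A) :- g(A, in, F), reach(F).          % atoms feeding a reached firing
gPruned(A,L,B) :- g(A,L,B), reach(A), reach(B).
#show gPruned/3.

This is the same graph-pruning idea used later (slide 22) to reproduce figures in Python.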
11. 11PWE: Datalog & ASP for the Rest of Us
Step 3: Using Statelog (“S-Trick”)
• Use Statelog to keep a record of firing rounds:
– Add a state (= stage) argument to provenance rules and graph relations
– EDB facts are derived in state 0.
– Subsequently: extract the earliest round for firings and IDB facts
• Example:
rin : firer(S1, X) :- B1(S, X1), … , Bn(S, Xn), next(S, S1).
rout : H(S, Y) :- firer(S, X).
(Figure: stateful provenance graph over the EDB e(a,b), e(b,c), e(c,b): r1 firings in round [1] derive tc(a,b) and tc(c,b), an r2 firing in round [2] derives tc(b,b), and further r2 firings appear in round [3].)
This is the “secret sauce” in Statelog, Datalog1S, …
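A concrete clingo rendering of the S-trick, sketched under some assumptions (the round bound maxs, the chain EDB, and the name firstRound/2 are illustrative only):

#const maxs = 10.                 % assumed bound on derivation rounds
next(S,S+1) :- S = 0..maxs-1.
e(a,b,0). e(b,c,0). e(c,b,0).     % EDB facts hold in state 0 (example)
e(X,Y,S1) :- e(X,Y,S), next(S,S1).                  % EDB persists across states
fire1(X,Y,S1)   :- e(X,Y,S), next(S,S1).            % stateful [r1] firings
fire2(X,Z,Y,S1) :- e(X,Z,S), tc(Z,Y,S), next(S,S1). % stateful [r2] firings
tc(X,Y,S) :- fire1(X,Y,S).
tc(X,Y,S) :- fire2(X,Z,Y,S).
% earliest round in which each tc fact is derived
firstRound(tc(X,Y),R) :- tc(X,Y,_), R = #min{ S : tc(X,Y,S) }.
#show firstRound/2.

firstRound/2 recovers exactly the round annotations [1], [2], … shown in the figures.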
12. 12PWE: Datalog & ASP for the Rest of Us
How long (does it take) Provenance!
• These definitions are recursive but well-founded
• The numbers can be easily obtained via Statelog
This is the “secret sauce” behind declarative profiling …
13. 13PWE: Datalog & ASP for the Rest of Us
Declarative Profiling
• Number of Facts:
derived(H) :- g(_, out, H).
derivedHeadCount(C) :- C = #count{ H : derived(H) }.
• Number of Firings:
firing(F) :- g(F, out, _).
firingCount(C) :- C = #count{ F : firing(F) }.
(Figure: derivation-round profiles of tc over the chain EDB e(a,b), e(b,c), e(c,d), e(d,e). (a) The right-recursive program needs four rounds, deriving tc(a,e) only in round [4]. (b) The doubly-recursive variant converges in fewer rounds, e.g., tc(a,c), tc(b,d), tc(c,e) already appear in round [2].)
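With the state argument from the S-trick, the same counting idiom yields a per-round profile. A sketch, assuming the stateful fire2/4 relation and the maxs constant from the S-trick sketch above (firingsInRound/2 is an illustrative name):

round(0..maxs).
firingsInRound(S,C) :- round(S), C = #count{ X,Z,Y : fire2(X,Z,Y,S) }.

Plotting firingsInRound/2 per program variant makes the difference between, e.g., the right-recursive and doubly-recursive tc programs immediately visible.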
14. … from a Vienna Datalog 2.0 paper
... but where is the code?
Can I reproduce the results?
Work with the examples?
Build on them? Extend them?
… try new things???
PWE: Datalog & ASP for the Rest of Us 14
15. ASP + PWE: Possible Worlds Explorer
15
https://github.com/idaks/PW-explorer
https://github.com/idaks/PWE-demos
PWE: Datalog & ASP for the Rest of Us
16. 16PWE: Datalog & ASP for the Rest of Us
PWE/Python visualization of input
graph e/2 (solid edges) and output
graph tc/2 (dashed edges)
… the F-trick (= firing rules =
provenance capture)
… compute via clingo
17. 17PWE: Datalog & ASP for the Rest of Us
… the G-trick (reify as a graph
using Skolem terms)
18. 18PWE: Datalog & ASP for the Rest of Us
… the S-trick
(Statelog encoding)
19. graph/4 = Firing + Graph + Statelog
• Et Voilà: Rule Firings captured, reified as a Graph,
derivations through States!
• It's all relational! :-)
19PWE: Datalog & ASP for the Rest of Us
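Putting the three tricks together, one plausible reading of the slide's graph/4 is sketched below (an illustration under the assumptions of the earlier sketches, not necessarily the exact PWE encoding):

% reify stateful firings into a labeled graph with a round argument
graph(e(X,Z),         in,  skfire2(X,Z,Y), S) :- fire2(X,Z,Y,S).
graph(tc(Z,Y),        in,  skfire2(X,Z,Y), S) :- fire2(X,Z,Y,S).
graph(skfire2(X,Z,Y), out, tc(X,Y),        S) :- fire2(X,Z,Y,S).

Every node, edge, and round then lives in a single relation, ready to be loaded into Pandas or Graphviz.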
20. Let’s mix in some Python and Graphviz
PWE: Datalog & ASP for the Rest of Us 20
21. …now reproducing a figure from [KLS12]!
PWE: Datalog & ASP for the Rest of Us 21
22. … and another one via graph pruning
… in Python
PWE: Datalog & ASP for the Rest of Us 22
23. Answer Set Programming: a superpower for “doing semantics”
• ASP = DB+LP+KR+SAT
• Reasoning spectrum: …queries … constraint solving
• … OWL/DL, FO, SQL, Datalog, ..., ASP, ...
• ... occupying a “sweet spot”
• ... but needs GTD extensions:
• PWE = ASP + Python + Jupyter
https://github.com/idaks/PWE-demos
23
PWE: Datalog & ASP for the Rest of Us
24. Datalog .. ASP: Hitting KR&R Sweet Spots
24
Variations on FOL + Recursion + Negation = S/I/W/P/…-Datalog … ASP …
Many Results from Theory
Getting Things Done with Jupyter notebooks & Python
RPQ: similar
Unique 3-valued Model vs Set of Stable Models
PWE: Datalog & ASP for the Rest of Us
25.
tc(X,Y) :- e(X,Y)  # (1)--e(X,Y)-->(2)
tc(X,Y) :-         # (1)--exists:Z-->(3)
  e(X,Z),          # (3)->(4)-e(X,Z)->(5)
  tc(Z,Y).         # (3)--X:=Z-->(1)
EDB: e(a,b), e(b,b)
(Figures: the game diagram of the tc rules with positions (1)-(5), and the instantiated move graph over the EDB nodes a, b.)
Flum, Kubierschky, Ludäscher: Total and partial well-founded Datalog coincide. ICDT 1997, Delphi, Greece.
25
Eureka moment:
1. query evaluation = evaluation game (argument about truth in a database)
2. provenance = winning strategies (justified/winning arguments)
PWE: Datalog & ASP for the Rest of Us
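The game view can be condensed into one textbook rule (a standard encoding, not taken from the slides): a position is won iff some move leads to a position that is not won; under the well-founded semantics, true/false/undefined atoms correspond to won/lost/drawn positions.

move(a,b). move(b,a). move(b,c).   % example move graph; c has no moves
win(X) :- move(X,Y), not win(Y).

Here c is lost (no moves), b is won (it can move to the lost c), and a is lost (its only move reaches the won b). In the tc evaluation game, the provenance of an answer is exactly such a winning strategy.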
26. Reproducing some TaPP’12 Graph Queries:
Datalog as a Lingua Franca for Querying … Provenance
PWE: Datalog & ASP for the Rest of Us 26
28. Visualized in PWE via Python under the hood!
PWE: Datalog & ASP for the Rest of Us 28
29. … for a few Python LOCs more …
(growing the target audience)
PWE: Datalog & ASP for the Rest of Us 29
30. … we get highlighting of the LCAs!
PWE: Datalog & ASP for the Rest of Us 30
31. “Boring” (ASCII) answer sets become
informative Timeline Visualization
(Here: IC Checking & Repair rules!)
PWE: Datalog & ASP for the Rest of Us 31
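For flavor, a generic IC-checking-and-repair pattern in ASP (an illustrative sketch, not the actual rules behind this slide): guess deletions, re-check the constraint, and minimize the repair; each answer set is then one possible world.

advisor(a,b). advisor(b,a).                   % assumed example data violating the IC
{ del(X,Y) } :- advisor(X,Y).                 % repair choice: delete tuples
advisorR(X,Y) :- advisor(X,Y), not del(X,Y).  % repaired relation
:- advisorR(X,Y), advisorR(Y,X), X != Y.      % IC: no mutual advising
#minimize{ 1,X,Y : del(X,Y) }.                % prefer minimal repairs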
32. … visualizing clusters of PWs (answer sets) …
PWE: Datalog & ASP for the Rest of Us 32
… easily plug in different
ranking/distance/similarity functions!
33. … to discover additional structure!
• … discover similar (here:
isomorphic) solutions
• … and display them!
PWE: Datalog & ASP for the Rest of Us 33
34. One more thing …
… time allowing!
PWE: Datalog & ASP for the Rest of Us 34
35. 35
(Euler diagrams shown in the slide.) The five RCC-5 base relations:
• Congruence: X == Y
• Inclusion: X > Y
• Inverse Inclusion: X < Y
• Overlap: X >< Y
• Disjointness: X ! Y
Origins:
Euler diagrams ...
... limited FO reasoning
... RCC-5++ reasoning
Application: Geo-Taxonomy Alignment
The secret sauce inside: Moved from FO reasoner to … qualitative reasoning
(RCC-5) to … Answer Set Programming (ASP) + some more secret sauce
Taxonomy Alignment Problem
PWE: Datalog & ASP for the Rest of Us
36. • Euler/X & LeanEuler projects
employ qualitative reasoning
(RCC-5), implemented in ASP to
align, merge taxonomies, debug
alignments, etc.
36
Reasoning with Incomplete Knowledge:
Exploring Possible Worlds
PWE: Datalog & ASP for the Rest of Us
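The possible-worlds core of such an alignment can be sketched in ASP as well (a toy illustration only; the actual Euler/X encoding adds RCC-5 composition axioms and much more): guess exactly one base relation per concept pair and constrain the guess with user-supplied articulations. The predicate and taxon names below are all hypothetical.

rel(eq). rel(lt). rel(gt). rel(ov). rel(dj).       % ==, <, >, ><, !
tax1(x1). tax2(y1). tax2(y2).                      % assumed example taxa
1 { art(X,Y,R) : rel(R) } 1 :- tax1(X), tax2(Y).   % exactly one relation per pair
:- not art(x1,y1,gt).                              % assumed articulation: x1 > y1

Each answer set is one possible world of the alignment; an inconsistent set of articulations yields no answer sets at all.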
37. Summary & Conclusions
• Possible Worlds Explorer (PWE):
– loosely coupling (= wrapping) Datalog & ASP systems
• DLV, clingo, …, XSB, … , <you-name-it>
– … with Python
– … and Jupyter notebooks
=> where the users are!
=> leveraging Python, Pandas, … analytics and visualization!
• Datalog & ASP for the rest of us!
– … and for LP / DB-Theory gurus :-)
• Work in progress
– join or fork: https://github.com/idaks/PW-explorer
– or talk to us to get started: ludaesch@Illinois.edu
PWE: Datalog & ASP for the Rest of Us 37
39. Some Partial Provenance ...
• [Cha88] Chan, D.: Constructive Negation Based on the Completed Database. 5th ICLP, Seattle, 1988
• [KLW95] Kifer, M., Lausen, G., Wu, J.: Logical Foundations of Object-Oriented and Frame-Based Languages. JACM 42(4), 1995, 741–843.
• [LLM98] Lausen, G., Ludäscher, B., May, W.: On Active Deductive Databases: The Statelog Approach. Transactions and Change in Logic Databases, LNCS 1472, 1998, 69–106.
• [MBZL09] McPhillips, T., Bowers, S., Zinn, D., Ludäscher, B.: Scientific Workflow Design for Mere Mortals. Future Generation Computer Systems 25(5), 2009, 541–551.
• [DKBL12] Dey, S., Köhler, S., Bowers, S., Ludäscher, B.: Datalog as a Lingua Franca for Provenance Querying and Reasoning. TaPP, Boston, 2012.
• [KLS12] Köhler, S., Ludäscher, B., Smaragdakis, Y.: Declarative Datalog Debugging for Mere Mortals. Datalog 2.0: Datalog in Academia & Industry, LNCS 7494, Vienna, 2012, 111–122.
• [CFSY17] Cheng, Y.-Y., Franz, N., Schneider, J., Yu, S., Rodenhausen, T., Ludäscher, B.: Agreeing to disagree: Reconciling conflicting taxonomic views using a logic-based approach. Association for Information Science and Technology 54(1), 2017, 46–56.
• [GCL19] Gupta, S., Cheng, Y.-Y., Ludäscher, B.: Possible Worlds Explorer: Datalog & Answer Set Programming for the Rest of Us. Datalog 2.0: 3rd Workshop on the Resurgence of Datalog in Academia & Industry, Philadelphia, 2019, 44–55.
PWE: Datalog & ASP for the Rest of Us 39