Learning Timed Automata with Cypher

Gábor Szárnyas, Anna Gujgiczer, Márton Elekes
4th openCypher Implementers Meeting

Sicco Verwer, Mathijs de Weerdt, Cees Witteveen:
An algorithm for learning real-time automata.
Benelearn 2007

STATE MACHINE
[Figure: example state machine with the states Off, Stop, Continue, Prepare and Go; transitions are labelled switchPhase and onOff]

AUTOMATON
■ States: accepting/rejecting
■ Run: (finite) event sequence
■ Accepted run: ends in an accepting state
■ Modelled behaviour: accepted runs
[Figure: the example state machine as an automaton; the state marked + is accepting, the states marked - are rejecting]
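To make the definition concrete, here is a minimal Java sketch (Java being the host language used with the queries later in this deck). The class name `Automaton`, the string encoding of states and events, and the choice to reject a run that hits an undefined transition are assumptions for illustration; this is not the authors' implementation.

```java
import java.util.*;

/** Hypothetical sketch: a deterministic automaton whose states are accepting or rejecting. */
final class Automaton {
    private final Map<String, Map<String, String>> delta = new HashMap<>(); // state -> event -> next state
    private final Set<String> accepting = new HashSet<>();
    private final String initial;

    Automaton(String initial) {
        this.initial = initial;
    }

    void addTransition(String from, String event, String to) {
        delta.computeIfAbsent(from, k -> new HashMap<>()).put(event, to);
    }

    void markAccepting(String state) {
        accepting.add(state);
    }

    /** A run (finite event sequence) is accepted if it ends in an accepting state. */
    boolean accepts(List<String> run) {
        String current = initial;
        for (String event : run) {
            current = delta.getOrDefault(current, Map.of()).get(event);
            if (current == null) {
                return false; // no transition for this event: treated as rejected (assumption)
            }
        }
        return accepting.contains(current);
    }
}
```

Every state not marked accepting counts as rejecting here, so the modelled behaviour is exactly the set of runs for which `accepts` returns true.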

TIMED AUTOMATON
■ Time spent in a state
■ Guard condition: an interval
[Figure: the same automaton with interval guards on its transitions: [3,5], [30,35], [1,2] and [30,35], and [0,∞] (any time) on the others]
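A timed automaton adds a guard to each transition: the event may only fire if the time spent in the current state lies in the guard interval. The sketch below assumes integer time, inclusive bounds, and `Integer.MAX_VALUE` standing in for ∞ in a guard such as [0,∞]; `TimedAutomaton` and its methods are hypothetical names.

```java
import java.util.*;

/** Hypothetical sketch: a timed automaton whose guards are intervals on the time spent in a state. */
final class TimedAutomaton {
    /** Fires on `event` if the delay lies in [min, max] (inclusive). */
    private static final class Edge {
        final String event;
        final int min;
        final int max;
        final String target;

        Edge(String event, int min, int max, String target) {
            this.event = event;
            this.min = min;
            this.max = max;
            this.target = target;
        }
    }

    private final Map<String, List<Edge>> edges = new HashMap<>();
    private final Set<String> accepting = new HashSet<>();
    private final String initial;

    TimedAutomaton(String initial) {
        this.initial = initial;
    }

    void addTransition(String from, String event, int min, int max, String to) {
        edges.computeIfAbsent(from, k -> new ArrayList<>()).add(new Edge(event, min, max, to));
    }

    void markAccepting(String state) {
        accepting.add(state);
    }

    /** events.get(i) happens delays.get(i) time units after entering the current state. */
    boolean accepts(List<String> events, List<Integer> delays) {
        String current = initial;
        for (int i = 0; i < events.size(); i++) {
            int d = delays.get(i);
            String next = null;
            for (Edge e : edges.getOrDefault(current, List.of())) {
                if (e.event.equals(events.get(i)) && e.min <= d && d <= e.max) {
                    next = e.target; // guard satisfied
                    break;
                }
            }
            if (next == null) {
                return false; // no transition whose guard accepts this delay
            }
            current = next;
        }
        return accepting.contains(current);
    }
}
```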

AUTOMATON LEARNING
■ System: unknown internal implementation (black box)
o System Under Learning
■ Goal: determine the internal operation of the system (~reverse engineering)
o Hypothesis Model
■ Approach: observation → execution trajectories → automaton
■ Application: monitor synthesis, testing
[Figure: System Under Learning → Trajectory of Runs → Learning Algorithm → Hypothesis Model]

AUTOMATON LEARNING: THE BASICS
[Figure: observed runs with timed events a,1, a,5, a,3, a,7 and a,2, each labelled + or -, and automata whose transitions carry interval guards such as a,[1,2], a,[1,5] and a,[3,5]]
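A common starting point for this kind of learning is a tree of the observed runs. The sketch assumes runs are given as parallel lists of symbols and timestamps with a +/- label; it groups runs by their event symbols, records the observed times per transition, and flags a state as inconsistent (+/-) when both accepted and rejected runs end there. `TimedPrefixTree` is a hypothetical name, and the data structure used by the authors' tool may differ.

```java
import java.util.*;

/** Hypothetical sketch: a prefix tree of labelled timed runs, keyed by event symbol. */
final class TimedPrefixTree {
    static final class Node {
        final Map<String, Node> children = new HashMap<>();
        final Map<String, List<Integer>> delays = new HashMap<>(); // observed times per outgoing symbol
        int positive; // runs ending here that were accepted (+)
        int negative; // runs ending here that were rejected (-)

        /** Shown as +/- : both accepted and rejected runs end in this state. */
        boolean inconsistent() {
            return positive > 0 && negative > 0;
        }
    }

    final Node root = new Node();

    /** Adds one run such as (a,1)(a,5): symbols.get(i) was observed at time times.get(i). */
    void addRun(List<String> symbols, List<Integer> times, boolean accepted) {
        Node current = root;
        for (int i = 0; i < symbols.size(); i++) {
            String s = symbols.get(i);
            current.delays.computeIfAbsent(s, k -> new ArrayList<>()).add(times.get(i));
            current = current.children.computeIfAbsent(s, k -> new Node());
        }
        if (accepted) {
            current.positive++;
        } else {
            current.negative++;
        }
    }
}
```

For example, adding (a,1) as + and (a,5) as - leaves the state reached by `a` marked +/-, which is the kind of state the split operation targets.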

AUTOMATON LEARNING: AN ALGORITHM
■ Repeating three operations
o Split inconsistent states
o Merge similar states
o Colour to finalise a state
[Figure: split step on transitions a,[1,5], a,[1,2] and a,[3,5]; states are marked +, - or +/- (inconsistent)]
[Figure: merge step on a small automaton with a and b transitions]
■ Selecting an operation
o Using scoring metrics based on the result of the operation
■ Operations have to be executed (tentatively) to compute their scores
[Figure: Select operation → Split / Merge / Colour]

INTERESTING QUERIES
■ How to select a “longest path”
o i.e. a maximal path that cannot be extended further
■ Merge
o Transitive reachability for the states to be merged
■ Split: categorisation/copying subtrees
o Merge might result in multiple OrigNext edges
o Which one to use for categorisation?

WHERE NOT «PATTERN»
■ Finding the beginning of a path
MATCH (u)-[:Next*]->(v) WHERE NOT ()-[:Next]->(u)
■ Finding the end of a path
MATCH (u)-[:Next*]->(v) WHERE NOT (v)-[:Next]->()
■ These two can be used to select a complete path
MATCH (u)-[:Next*]->(v)
WHERE NOT ()-[:Next]->(u)
  AND NOT (v)-[:Next]->()
[Figure: a chain of :Next edges from u to v]
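For comparison, the same two conditions can be written imperatively: a maximal path starts at a node without an incoming `:Next` edge and ends at a node without an outgoing one. This Java sketch assumes the `:Next` edges are given as a successor map and that the graph is acyclic (Cypher's variable-length matching does not need that assumption because it never repeats a relationship within a pattern); `MaximalPaths` is a hypothetical name.

```java
import java.util.*;

/** Hypothetical sketch: maximal paths over :Next edges, mirroring the two WHERE NOT conditions. */
final class MaximalPaths {
    /** next maps each node to its successors; the graph is assumed to be acyclic. */
    static List<List<String>> find(Map<String, List<String>> next) {
        Set<String> nodes = new LinkedHashSet<>(next.keySet());
        Set<String> hasIncoming = new HashSet<>();
        for (List<String> targets : next.values()) {
            nodes.addAll(targets);
            hasIncoming.addAll(targets);
        }
        List<List<String>> result = new ArrayList<>();
        for (String u : nodes) {
            if (!hasIncoming.contains(u)) { // WHERE NOT ()-[:Next]->(u)
                List<String> path = new ArrayList<>();
                path.add(u);
                extend(u, next, path, result);
            }
        }
        return result;
    }

    private static void extend(String v, Map<String, List<String>> next,
                               List<String> path, List<List<String>> result) {
        List<String> successors = next.getOrDefault(v, List.of());
        if (successors.isEmpty()) {   // AND NOT (v)-[:Next]->()
            if (path.size() > 1) {    // [:Next*] requires at least one edge
                result.add(new ArrayList<>(path));
            }
            return;
        }
        for (String w : successors) {
            path.add(w);
            extend(w, next, path, result);
            path.remove(path.size() - 1); // backtrack
        }
    }
}
```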

MERGE
[Figure: two states connected by an :IndexPair edge, with their :Next successors marked + and -]
■ Select states to merge
■ Calculate scoring metrics
■ Filter inconsistencies
■ Return result

MERGE
■ Problem:
o Inconsistencies can be transitive
■ Solution:
o Variable length path
[Figure: states connected by :IndexPair edges and their :Next successors, marked + and -]
// Find critical state pairs
MATCH (s1:IndexedMerge)-[:IndexPair*]-(s2:IndexedMerge),
      s1p = (s1)-[:Next*0..]->(s12:State),
      s2p = (s2)-[:Next*0..]->(s22:State)
WHERE id(s1) > id(s2) AND length(s1p) = length(s2p)
  AND NOT (s12)-[:IndexPair*0..]-(s22)
// Paths of the same length with the same event sequences and guards
WITH s12, s22,
     [s1r IN relationships(s1p) | s1r.symbol] AS s1ss, [s2r IN relationships(s2p) | s2r.symbol] AS s2ss,
     [s1r IN relationships(s1p) | s1r.Tmin] AS s1mins, [s2r IN relationships(s2p) | s2r.Tmin] AS s2mins,
     [s1r IN relationships(s1p) | s1r.Tmax] AS s1maxs, [s2r IN relationships(s2p) | s2r.Tmax] AS s2maxs
WHERE s1ss = s2ss AND s1mins = s2mins AND s1maxs = s2maxs
// Mark newly found index pairs
WITH collect(DISTINCT [s12, s22]) AS pairs
MATCH ()-[ip:IndexPair]->()
WITH max(ip.index) + 1 AS nextIndex, pairs
UNWIND range(0, size(pairs) - 1) AS idx
WITH pairs[idx][0] AS s12, pairs[idx][1] AS s22, idx + nextIndex AS edgeIndex
CREATE (s12)-[:IndexPair {index: edgeIndex}]->(s22)
SET s12:IndexedMerge, s22:IndexedMerge
■ Repeatedly called from Java code until a fixed point is reached, i.e. until no new index pairs are created:
while (session.run(createIndexPairs, mergeMap).consume().counters().relationshipsCreated() > 0) { }
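The fixed point that the Java loop computes can also be sketched without a database. The version below uses a swapped-in technique, not the authors' Cypher-based code: it keeps the paired states in a union-find structure and repeatedly pairs the targets of equally labelled transitions leaving states that are already (transitively) paired, until a pass adds nothing new. It assumes states are numbered 0..n-1 and that a label such as `a,[1,5]` encodes symbol, Tmin and Tmax as one string; `MergeClosure` is a hypothetical name.

```java
import java.util.*;

/** Hypothetical sketch: fixed-point closure of merged state pairs using union-find. */
final class MergeClosure {
    private final int[] parent;

    MergeClosure(int stateCount) {
        parent = new int[stateCount];
        for (int i = 0; i < stateCount; i++) {
            parent[i] = i;
        }
    }

    int find(int s) {
        while (parent[s] != s) {
            parent[s] = parent[parent[s]]; // path halving
            s = parent[s];
        }
        return s;
    }

    /** Pairs two states; returns false if they were already (transitively) paired. */
    boolean union(int a, int b) {
        int ra = find(a);
        int rb = find(b);
        if (ra == rb) {
            return false;
        }
        parent[Math.max(ra, rb)] = Math.min(ra, rb);
        return true;
    }

    /** out.get(s) maps a transition label to its target state; repeats passes until nothing changes. */
    void closeUnder(List<Map<String, Integer>> out) {
        boolean changed = true;
        while (changed) {
            changed = false;
            for (int p = 0; p < out.size(); p++) {
                for (int q = p + 1; q < out.size(); q++) {
                    if (find(p) != find(q)) {
                        continue; // only states that are already paired, possibly transitively
                    }
                    for (Map.Entry<String, Integer> e : out.get(p).entrySet()) {
                        Integer target = out.get(q).get(e.getKey()); // same symbol and guard
                        if (target != null && union(e.getValue(), target)) {
                            changed = true; // newly found pair
                        }
                    }
                }
            }
        }
    }
}
```

Unlike the Cypher query, which follows whole paths of equal length in one call, this sketch advances one transition per pass; both should stop at the same fixed point.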

SPLIT
■ Splits an edge
o Original transition, e.g. [1,6]
■ Based on timestamp: t (in the example, t = 5)
■ The two endpoints…
o Smaller: < t
o Larger: ≥ t
■ Results: [1,4] and [5,6]
[Figure: the transition [1,6] with :Origin edges into the original runs (timestamps 1, 6 and 2); the affected states are marked +, - and +/-]
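With integer timestamps the split rule is easy to state in code. The sketch assumes inclusive integer bounds, so splitting [Tmin, Tmax] at t gives [Tmin, t-1] for the smaller part and [t, Tmax] for the larger one; `GuardSplit` is a hypothetical name. Splitting [1,6] at 5 reproduces the result on the slide.

```java
import java.util.*;

/** Hypothetical sketch: splitting a guard interval at timestamp t (integer time, inclusive bounds). */
final class GuardSplit {
    /** Splits [tmin, tmax] at t into {smaller: [tmin, t-1], larger: [t, tmax]}. */
    static int[][] split(int tmin, int tmax, int t) {
        if (t <= tmin || t > tmax) {
            throw new IllegalArgumentException("need tmin < t <= tmax so that both parts are non-empty");
        }
        return new int[][] {{tmin, t - 1}, {t, tmax}};
    }

    /** Sorts observed times into the smaller (< t) and the larger (>= t) part. */
    static List<List<Integer>> categorise(List<Integer> times, int t) {
        List<Integer> smaller = new ArrayList<>();
        List<Integer> larger = new ArrayList<>();
        for (int time : times) {
            if (time < t) {
                smaller.add(time);
            } else {
                larger.add(time);
            }
        }
        return List.of(smaller, larger);
    }
}
```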

SPLIT
■ Problem
o Node a is a result of an earlier merge operation
o The subtree regenerated from node b should belong to the…
• Larger subtree because of 6
• Smaller subtree because of 1
■ Solution
o “Shortest” path to node a
o Based on the last number: 6
[Figure: nodes a and b with :Origin edges into the original runs; transition [1,6] with timestamps 1 and 6]

SPLIT
MATCH (r:Red)-[n:Next]->(b:Blue)
WITH r, b, n.Tmax AS Tmax, n.Tmin AS Tmin, n.symbol AS symbol
UNWIND range(Tmin, Tmax - 1) AS t
MATCH (b)-[:Next*0..]->(s1:State)-[:Origin]->(so1:OrigState),
      trace1 = (so1)<-[:OrigNext*0..]-(redOrigNext:OrigState),
      (redOrigNext)<-[no1:OrigNext]-(redOrig:OrigState)<-[:Origin]-(r)
WHERE none(x IN nodes(trace1) WHERE (x)<-[:Origin]-(r))
WITH r, b, t, Tmin, Tmax, symbol, s1, no1, so1
// same pattern for a second original state so2
MATCH (b)-[:Next*0..]->(s2:State)-[:Origin]->(so2:OrigState),
      trace2 = (so2)<-[:OrigNext*0..]-(redOrigNext:OrigState),
      (redOrigNext)<-[no2:OrigNext]-(redOrig:OrigState)<-[:Origin]-(r)
WHERE none(x IN nodes(trace2) WHERE (x)<-[:Origin]-(r))
WITH t, r, b, s1, s2, toInteger(no1.time) > t AS cat1, toInteger(no2.time) > t AS cat2, Tmin, Tmax, symbol, so1, so2
WHERE s1 = s2
  AND cat1 <> cat2
  AND id(so1) < id(so2)
WITH r, b, t, Tmin, Tmax, symbol,
     [coalesce(so1.accepting, false), coalesce(so1.rejecting, false)] AS so1ar,
     [coalesce(so2.accepting, false), coalesce(so2.rejecting, false)] AS so2ar
WITH r, b, t, Tmin, Tmax, symbol,
     CASE
       WHEN so1ar[0] <> so2ar[0] AND so1ar[1] <> so2ar[1] THEN 1
       WHEN so1ar = so2ar AND (so1ar = [false, true] OR so1ar = [true, false]) THEN -1 // exactly one flag is true
       ELSE 0
     END AS score
WITH r, b, t, sum(score) AS metric, Tmin, Tmax, symbol
ORDER BY metric DESC
LIMIT 1
WITH r, b, t, metric, Tmin, Tmax, symbol
// original states of the larger part (time > t) for the best t
MATCH (b)-[:Next*0..]->(:State)-[:Origin]->(so1:OrigState),
      trace1 = (so1)<-[:OrigNext*0..]-(redOrigNext:OrigState),
      (redOrigNext)<-[no1:OrigNext]-(:OrigState)<-[:Origin]-(r)
WHERE none(x IN nodes(trace1) WHERE (x)<-[:Origin]-(r)) AND toInteger(no1.time) > t
WITH r, b, t, metric, Tmin, Tmax, symbol, collect(so1) AS so1s
// original states of the smaller part (time <= t)
MATCH (b)-[:Next*0..]->(:State)-[:Origin]->(so2:OrigState),
      trace2 = (so2)<-[:OrigNext*0..]-(redOrigNext:OrigState),
      (redOrigNext)<-[no2:OrigNext]-(:OrigState)<-[:Origin]-(r)
WHERE none(x IN nodes(trace2) WHERE (x)<-[:Origin]-(r)) AND toInteger(no2.time) <= t
RETURN r, b, t, metric, Tmin, Tmax, symbol, so1s, collect(so2) AS so2s
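The scoring part of the query (the `CASE` expression followed by `sum(score)`, `ORDER BY metric DESC` and `LIMIT 1`) can be restated in plain Java to make the metric easier to follow. Each scored pair consists of two original states that currently belong to the same state but would end up on different sides of a split at t. This is an illustrative re-statement with a hypothetical class name (`SplitScore`); missing flags are assumed to be passed as `false`, as `coalesce` does in the query, and ties are broken by the smallest t, whereas `LIMIT 1` leaves the choice among equal metrics unspecified.

```java
import java.util.*;

/** Hypothetical re-statement of the split metric computed by the Cypher query. */
final class SplitScore {
    /** Score of one pair of original states that a split at t would separate. */
    static int pairScore(boolean acc1, boolean rej1, boolean acc2, boolean rej2) {
        if (acc1 != acc2 && rej1 != rej2) {
            return 1;  // both flags differ, e.g. + versus -: separating them is rewarded
        }
        if (acc1 == acc2 && rej1 == rej2 && acc1 != rej1) {
            return -1; // same definite label (exactly one flag true): separating them is penalised
        }
        return 0;      // otherwise no evidence either way
    }

    /** sum(score) per candidate t, then the t with the highest metric (smallest t on ties). */
    static OptionalInt bestTime(Map<Integer, List<Integer>> scoresByTime) {
        OptionalInt best = OptionalInt.empty();
        int bestMetric = Integer.MIN_VALUE;
        for (Map.Entry<Integer, List<Integer>> e : new TreeMap<>(scoresByTime).entrySet()) {
            int metric = 0;
            for (int score : e.getValue()) {
                metric += score;
            }
            if (!best.isPresent() || metric > bestMetric) {
                best = OptionalInt.of(e.getKey());
                bestMetric = metric;
            }
        }
        return best;
    }
}
```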

SPLIT
■ Solution
[Figure: node a and state s2 with :Origin edges into the original runs; the trace leads from b back through an to ao; transition [1,6] with timestamps 1 and 6]
MATCH
  (a)-[:Next*0..]->(s2:State)-[:Origin]->(b:OrigState),
  trace = (b)<-[:OrigNext*0..]-(an:OrigState),
  (an)<-[no:OrigNext]-(ao:OrigState)<-[:Origin]-(a)
WHERE none(x IN nodes(trace) WHERE (x)<-[:Origin]-(a))
o MATCH: the split candidate in the original runs
o WHERE: the shortest possible path, i.e. no node of the trace is an original state of node a, so a is reached only through the last, direct step from ao to an
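The idea behind this `MATCH ... WHERE none(...)` pair, following an original run backwards from b only until the first step that leaves an original state of a and taking the number on that step, can be sketched in Java as follows. The maps standing in for the `:OrigNext` edges and their `time` property, the assumption that every original state has at most one predecessor and that runs are acyclic, and the name `OriginTrace` are all hypothetical.

```java
import java.util.*;

/** Hypothetical sketch: walk an original run backwards from b to the step that leaves an original of a. */
final class OriginTrace {
    /**
     * @param b          original state where the trace starts
     * @param originsOfA original states of the (merged) node a
     * @param prev       predecessor of each original state along its run (reversed OrigNext edges)
     * @param time       timestamp on the OrigNext edge entering each original state
     * @return timestamp of the last step (ao)-[no:OrigNext]->(an), if such a step exists
     */
    static OptionalInt lastStepTime(String b, Set<String> originsOfA,
                                    Map<String, String> prev, Map<String, Integer> time) {
        if (originsOfA.contains(b)) {
            return OptionalInt.empty(); // the trace itself must not contain an original of a
        }
        String an = b;
        while (true) {
            String ao = prev.get(an);
            if (ao == null) {
                return OptionalInt.empty(); // start of the run reached without meeting a
            }
            if (originsOfA.contains(ao)) {
                return OptionalInt.of(time.get(an)); // shortest trace: use this last number
            }
            an = ao; // extend the trace by one step
        }
    }
}
```

With a merged node a whose originals are reached at times 1 and then 6, the walk stops at the step labelled 6, which matches "Based on the last number: 6".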

ADVANTAGES OF NEO4J AND CYPHER
■ Visualisation
■ Flexible data model
o New labels, properties
o No need to define the schema upfront
■ Cypher: high-level declarative graph query language

WORKFLOW
■ Neo4j web browser UI
MATCH (n)
RETURN n
We should combine transformation and visualisation

CHAINING QUERIES
■ Chaining queries to write a complex step of the algorithm with a single Cypher query
MATCH (node:Blue)
REMOVE node:Blue
SET node:Red
RETURN *
Result (node): :Red {prop1: "val1"}
MATCH (node:Blue)
REMOVE node:Blue
SET node:Red
WITH 1 AS dummy
RETURN *
Result (dummy): 1, 1
MATCH (node:Blue)
REMOVE node:Blue
SET node:Red
WITH 1 AS dummy
MATCH (n)
RETURN *
Result (n, dummy): :Red {prop1: "val1"} 1; :Green {value: 13} 1
■ Cartesian product → all rows are displayed twice
MATCH (node:Blue)
REMOVE node:Blue
SET node:Red
WITH count(*) AS dummy
RETURN *
Result (dummy): 2
MATCH (node:Blue)
REMOVE node:Blue
SET node:Red
WITH count(*) AS dummy
MATCH (n)
RETURN *
■ Exactly 1 result (from count(*)) → the second query will not produce unnecessary duplicates
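Why the count(*) trick works follows from Cypher's row-based evaluation: every clause consumes a list of rows and produces a new one. The toy Java model below (not Neo4j code; `RowModel` is a hypothetical name) shows two updated nodes followed by `MATCH (n)` over three nodes: `WITH 1 AS dummy` keeps two rows, so the match yields six, while `WITH count(*) AS dummy` collapses them into a single row, so the match yields three.

```java
import java.util.*;

/** Toy model, not Neo4j code: each clause maps the incoming list of rows to an outgoing list of rows. */
final class RowModel {
    /** MATCH (n): every incoming row is combined with every node, i.e. a Cartesian product. */
    static List<List<String>> matchAll(List<List<String>> rows, List<String> nodes) {
        List<List<String>> out = new ArrayList<>();
        for (List<String> row : rows) {
            for (String n : nodes) {
                List<String> extended = new ArrayList<>(row);
                extended.add(n);
                out.add(extended);
            }
        }
        return out;
    }

    /** WITH 1 AS dummy: still one row per incoming row. */
    static List<List<String>> withConstant(List<List<String>> rows) {
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            out.add(List.of("1"));
        }
        return out;
    }

    /** WITH count(*) AS dummy: aggregation without grouping keys yields exactly one row. */
    static List<List<String>> withCount(List<List<String>> rows) {
        return List.of(List.of(String.valueOf(rows.size())));
    }
}
```

Because an aggregation without grouping keys always yields one row, the trick also works when the first part matched nothing: `count(*)` is then 0 and the chained query still runs once.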

RUNNING AND VISUALISING THE ALGORITHM
...
MATCH (n)
RETURN n

EXECUTING QUERIES
Passing information between queries:
■ Node labels
■ Nodes with temporary information
■ Passing nodes/edges as parameters (only supported in embedded mode)
o Parameters: redNode, num
WITH $redNode AS redNode
MATCH (g:Green {num: $num})
CREATE (redNode)-[:Edge]->(g)
o Exactly 1 result

LOOPS
■ Many loop-like problems can be solved using lists:
o List comprehensions: [x IN xs WHERE condition | f(x)]
o Reduce: reduce(acc = "", x IN list | acc + x.prop)
■ However, loops may still be necessary:
o Run queries from Java code
o The loop condition checks whether the condition query returns any rows
while (gds.execute(conditionQuery).hasNext()) {
    gds.execute(step1Query);
}
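Since the host code is Java, it may help to see the two list idioms next to their Java counterparts. The concrete condition and function (`x % 2 = 0`, `x * 10`) and the representation of an entity as a `Map` with a `prop` key are assumptions for illustration; `ListIdioms` is a hypothetical name.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Java counterparts of two Cypher list idioms (illustrative only). */
final class ListIdioms {
    /** Cypher: [x IN xs WHERE x % 2 = 0 | x * 10] */
    static List<Integer> comprehension(List<Integer> xs) {
        return xs.stream()
                 .filter(x -> x % 2 == 0) // WHERE condition
                 .map(x -> x * 10)        // | f(x)
                 .collect(Collectors.toList());
    }

    /** Cypher: reduce(acc = "", x IN list | acc + x.prop), each map standing for an entity. */
    static String reduceProps(List<Map<String, String>> list) {
        String acc = "";                  // acc = ""
        for (Map<String, String> x : list) {
            acc = acc + x.get("prop");    // acc + x.prop
        }
        return acc;
    }
}
```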

IMPORT-EXPORT
■ Save current state of the algorithm
■ APOC* GraphML import-export
*Awesome Procedures on Cypher
// save
CALL apoc.export.graphml.all('my.graphml',
  {storeNodeIds:true, readLabels:true, useTypes:true})
// load
MATCH (n) // delete all
DETACH DELETE n
WITH count(*) AS dummy //-----------------
CALL apoc.import.graphml('my.graphml', {readLabels: true})
YIELD nodes, relationships
WITH count(*) AS dummy //-----------------
MATCH (n) RETURN n // show all

BACKTRACKING
■ In many cases, we need to restore a previous state and continue from that one
o Next step is based on scoring metrics
■ Solutions:
o Export → re-import
o Use transactions:
• Abort transaction
• Only applicable to one level of backtracking
[Figure: candidate next steps tried one after the other from the same state, with metrics 5, 12 and 8]
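The backtracking pattern, trying every candidate from the same saved state and continuing with the best one, can be sketched independently of Neo4j. In this hypothetical sketch, `copy` stands for whatever snapshot mechanism is used (export/re-import, or a transaction that is aborted afterwards), and a higher metric is assumed to be better, as in the split query's `ORDER BY metric DESC`; with the metrics 5, 12 and 8 from the slide it selects the second candidate. `Backtracking` is a hypothetical name.

```java
import java.util.List;
import java.util.function.ToIntFunction;
import java.util.function.UnaryOperator;

/** Hypothetical sketch: evaluate candidate operations from the same saved state and keep the best. */
final class Backtracking {
    /**
     * @param state      current state, never modified here
     * @param copy       makes an independent snapshot of a state
     * @param candidates operations that modify a state in place and return their metric
     * @return index of the candidate with the highest metric, or -1 if there are none
     */
    static <S> int bestCandidate(S state, UnaryOperator<S> copy, List<ToIntFunction<S>> candidates) {
        int best = -1;
        int bestMetric = Integer.MIN_VALUE;
        for (int i = 0; i < candidates.size(); i++) {
            S trial = copy.apply(state);                      // "save": work on a snapshot
            int metric = candidates.get(i).applyAsInt(trial);
            if (best == -1 || metric > bestMetric) {
                best = i;
                bestMetric = metric;
            }
            // "restore": the snapshot is dropped, the next candidate starts from `state` again
        }
        return best;
    }
}
```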

DEBUGGING
■ Save and play back the states that occurred during execution
o A counter for the states is stored in a node
o Saved query to step back
■ For complex errors
o Formulate an error pattern in Cypher
MATCH (node:Faulty)
RETURN node
o Use this to define a breakpoint in the code
if (gds.execute(query).hasNext()) {
    System.out.println("error pattern matched"); // breakpoint
}
o Load the state where the error occurred
o Debug step by step
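Saving every intermediate state also makes it possible to search the history for the first state that matches the error pattern, which is the state to load before debugging step by step. The sketch keeps snapshots in memory and takes the error pattern as a predicate; in the setup described here, the snapshots would presumably be GraphML exports and the predicate the `MATCH (node:Faulty)` query. `StateRecorder` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch: record every intermediate state and find where an error pattern first appears. */
final class StateRecorder<S> {
    private final List<S> states = new ArrayList<>();

    /** Saves a snapshot and returns its step number (the state counter). */
    int save(S snapshot) {
        states.add(snapshot);
        return states.size() - 1;
    }

    /** Plays back the state saved at the given step. */
    S load(int step) {
        return states.get(step);
    }

    /** First step whose state matches the error pattern, or -1 if none does. */
    int firstFaultyStep(Predicate<S> errorPattern) {
        for (int i = 0; i < states.size(); i++) {
            if (errorPattern.test(states.get(i))) {
                return i;
            }
        }
        return -1;
    }
}
```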

PSEUDOCODE
Sicco Verwer: Efficient identification of timed automata. PhD dissertation

EXPRESSIVE POWER OF CYPHER
■ Most fixed-size graph queries can be decided in P-time
■ Checking whether two disjoint paths exist is NP-complete
■ Cypher is Turing-complete
o reduce
■ Cypher 9 (current version)
■ Cypher 10 (multiple graph support)
o This would simplify a lot of our queries.
N. Francis et al., Cypher: An Evolving Query Language for Property Graphs, SIGMOD 2018

SUMMARY
■ Neo4j prototype
o Rapid development
o Lots of APOC procedures for specific problems (e.g. mergeNodes)
■ Issues
o APOC bug (reported)
o Visualisation glitches
o Lack of fixed-point algorithms
■ For graph algorithms, we recommend
o creating a prototype in Neo4j,
o once the algorithm is understood, rewriting it in a general-purpose programming language.

OUR TEAM
Co-advisor: Rebeka Farkas, Thanks to: Oszkár Semeráth, Tamás Tóth, András Vörös
