Efficient Tabling of Structured Data Using Indexing and Program Transformation

Transcript

  • 1. Efficient Tabling of Structured Data Using Indexing and Program Transformation. Christian Theil Have and Henning Christiansen. Research group PLIS: Programming, Logic and Intelligent Systems, Department of Communication, Business and Information Technologies, Roskilde University, P.O. Box 260, DK-4000 Roskilde, Denmark. PADL in Philadelphia, January 23, 2012.
  • 2. Outline. 1. Motivation and background: the trouble with tabling of structured data. 2. A workaround implemented in Prolog, with examples: Edit Distance; Hidden Markov Model in PRISM. 3. An automatic program transformation.
  • 3. Motivation and background: Motivation. In the LoSt project we explore the use of probabilistic logic programming (with PRISM) for biological sequence analysis. We had problems analyzing very long sequences and identified the culprit: tabling of structured data. The work is inspired by earlier work by Christiansen and Gallagher, which addresses a similar problem related to tabling of non-discriminating arguments: Henning Christiansen, John P. Gallagher. Non-discriminating Arguments and their Uses. ICLP 2009.
  • 4. The trouble with tabling of structured data. Tabling in logic programming is an established technique which can give a significant speed-up of program execution, makes it easier to write efficient programs in a declarative style, and is similar to memoization in functional programming. The idea: the system maintains a table of calls and their answers; when a new call is entered, it checks whether the call is already stored in the table and, if so, uses the previously found solution. Tabling is included in several recognized Prolog systems such as B-Prolog, YAP and XSB.
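The memoization effect described on this slide can be illustrated with a small sketch. The predicate fib/2 is a hypothetical example that does not appear in the talk; the only system feature assumed is the `:- table` directive, which B-Prolog, YAP and XSB all accept.

```prolog
% Hypothetical example (not from the talk): Fibonacci numbers.
% Without the table directive the number of calls grows exponentially
% with N; with it, each distinct call fib(K, _) is evaluated once and
% later calls are answered from the table.
:- table fib/2.

fib(0, 0).
fib(1, 1).
fib(N, F) :-
    N > 1,
    N1 is N - 1,
    N2 is N - 2,
    fib(N1, F1),
    fib(N2, F2),
    F is F1 + F2.
```

The query fib(30, F) then yields F = 832040 after only a linear number of table entries. Note that the tabled argument here is a small integer, so the table lookup itself is cheap; the slides that follow concern the case where the argument is a large structure.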
  • 5. The trouble with tabling of structured data. An innocent-looking call: last([1,2,3,4,5],X). The predicate last/2 traverses a list to find the last element:
        last([X],X).
        last([_|L],X) :- last(L,X).
    Without tabling, the time and space complexity is O(n). The call generates the subcalls last([2,3,4,5],X), last([3,4,5],X), last([4,5],X) and last([5],X). If we table last/2, the call table stores every one of these calls together with its list argument, so the cost becomes n + (n−1) + (n−2) + ... + 1 ≈ O(n²)!
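The quadratic estimate can be spelled out. When last/2 is tabled and called with a list of length n, the call table holds one entry per recursive call, and the k-th of these entries carries a list of length k. Assuming each entry costs time and space proportional to the length of its argument (the naive copying case), the total is:

```latex
\[
  \sum_{k=1}^{n} k \;=\; n + (n-1) + \dots + 1 \;=\; \frac{n(n+1)}{2} \;=\; O(n^2)
\]
```

For the five-element list on the slide this gives 5 + 4 + 3 + 2 + 1 = 15 stored list cells, against 5 cells for the untabled traversal.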
  • 6. Wait a minute... Tabling systems do employ some advanced techniques to avoid the expensive copying, which may reduce memory consumption and/or time complexity. For instance, B-Prolog uses hashing of goals; XSB uses a trie data structure; YAP uses a trie structure, which is refined into a so-called global trie that applies a sharing strategy for common subterms whenever possible. These techniques may reduce space consumption, but if there is no sharing between the tables and the actual arguments of an active call, each execution of a call may involve a full traversal (naive copying) of its arguments.
  • 7. Benchmarking last/2 for B-Prolog, Yap and XSB. To investigate, we benchmarked the tabled version of last/2 for each of the major Prolog tabling engines. To further investigate whether performance is data dependent, we benchmarked with both repeated data and random data.
  • 8. Space usage, tabled last/2 (1). [Figure b): space usage (kilobytes) against list length (N) for XSB, B-Prolog and Yap, each with random data and with repeated data.]
  • 9. Space usage, tabled last/2 (2). [Figure d): space usage expanded for the four lower curves, B-Prolog (random data), XSB (repeated data), B-Prolog (repeated data) and Yap (repeated data); space usage (kilobytes) against list length (N).]
  • 10. Time usage, tabled last/2 (1). [Figure a): time usage (seconds) against list length (N) for XSB, B-Prolog and Yap, each with random data and with repeated data.]
  • 11. Time usage, tabled last/2 (2). [Figure c): time usage expanded for the three lower curves, B-Prolog (random data), XSB (repeated data) and Yap (repeated data); time usage (seconds) against list length (N).]
  • 12. A workaround implemented in Prolog. We describe a workaround giving O(1) time and space complexity for table lookups for programs with arbitrarily large ground structured data as input arguments. A term is represented as a set of facts. A subterm is referenced by a unique integer serving as an abstract pointer. Matching related to tabling is done solely by comparison of such pointers.
  • 13. An abstract data type. The representation is given by the following predicates, which together can be understood as an abstract data type.
        store_term(+ground-term, -pointer): the ground-term is any ground term, and the pointer returned is a unique reference (an integer) for that term.
        retrieve_term(+pointer, ?functor, ?arg-pointers-list): returns the functor and a list of pointers to representations of the substructures of the term represented by pointer.
        full_retrieve_term(+pointer, ?ground-term): returns the term represented by pointer.
  • 14. ADT properties. Property 1: for any ground term s, the query store_term(s, P), full_retrieve_term(P, S) must assign to the variable S a value identical to s. Property 2: for any ground term s of the form f(s1,...,sn), the query store_term(s, P), retrieve_term(P, F, Ss) must assign to the variable F the symbol f, and to Ss a list of ground values [p1,...,pn] such that the additional queries full_retrieve_term(pi, Si), i = 1,...,n, assign to the variables Si values identical to si.
  • 15. ADT example. The following call converts the term f(a,g(b)) into its internal representation and returns a pointer value in the variable P:
        store_term(f(a,g(b)),P).
    After this, the following sequence of calls will succeed:
        retrieve_term(P,f,[P1,P2]),
        retrieve_term(P1,a,[]),
        retrieve_term(P2,g,[P21]),
        retrieve_term(P21,b,[]),
        full_retrieve_term(P,f(a,g(b))).
  • 16. Implementation with assert. One possible way of implementing the predicates introduced above is to have store_term/2 assert facts for the retrieve_term/3 predicate, using increasing integers as pointers. Example: the call store_term(f(a,g(b)),P) considered in the previous example may assign the value 100 to P and, as a side effect, assert the following facts:
        retrieve_term(100,f,[101,102]).
        retrieve_term(101,a,[]).
        retrieve_term(102,g,[103]).
        retrieve_term(103,b,[]).
    Notice that Prolog's indexing on first arguments ensures a constant lookup time.
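A minimal sketch of such an assert-based implementation might look as follows. It is an illustration under assumptions, not the authors' code: the counter next_pointer/1 and the helpers fresh_pointer/1, store_args/2 and full_retrieve_args/2 are hypothetical names, pointers start at 0 rather than 100, and no attempt is made to share identical subterms.

```prolog
:- dynamic retrieve_term/3.
:- dynamic next_pointer/1.

% Hypothetical counter holding the next unused pointer value.
next_pointer(0).

% fresh_pointer(-P): hand out consecutive integers as pointers.
fresh_pointer(P) :-
    retract(next_pointer(P)),
    P1 is P + 1,
    assertz(next_pointer(P1)).

% store_term(+GroundTerm, -Pointer): assert one retrieve_term/3 fact
% for the term itself and, recursively, one for each subterm.
store_term(Term, P) :-
    Term =.. [F|Args],
    fresh_pointer(P),
    store_args(Args, ArgPtrs),
    assertz(retrieve_term(P, F, ArgPtrs)).

store_args([], []).
store_args([A|As], [P|Ps]) :-
    store_term(A, P),
    store_args(As, Ps).

% full_retrieve_term(+Pointer, ?GroundTerm): rebuild the whole term
% by following the argument pointers.
full_retrieve_term(P, Term) :-
    retrieve_term(P, F, ArgPtrs),
    full_retrieve_args(ArgPtrs, Args),
    Term =.. [F|Args].

full_retrieve_args([], []).
full_retrieve_args([P|Ps], [T|Ts]) :-
    full_retrieve_term(P, T),
    full_retrieve_args(Ps, Ts).
```

With this sketch, store_term(f(a,g(b)),P) numbers the nodes in the same order as the slide's example (f first, then a, g and b), and the sequence of retrieve_term/3 calls from the ADT example succeeds.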
  • 17. Another level of abstraction: lookup_pattern/2. Finally we introduce a utility predicate which may simplify the use of the representation in application programs. It uses a special kind of terms, called patterns, which are not necessarily ground and which may contain subterms of the form lazy(variable). lookup_pattern(+pointer, +pattern): the pattern is matched recursively against the term represented by the pointer p in the following way:
        – lookup_pattern(p, X), where X is a variable, is treated as full_retrieve_term(p, X).
        – lookup_pattern(p, lazy(X)) unifies X with p.
        – For any other pattern =.. [F,X1,...,Xn] we call retrieve_term(p, F, [P1,...,Pn]) followed by lookup_pattern(Pi, Xi), i = 1,...,n.
  • 18. A lookup_pattern/2 example. Continuing the previous example, after the call store_term(f(a,g(b)),P) has bound P to 100, the call
        lookup_pattern(100, f(X,lazy(Y)))
    leads to X=a and Y=102. The lookup_pattern/2 predicate will be used to simplify the automatic program transformation introduced later. Further efficiency can be gained by compiling it out for each specific pattern (i.e. replacing it with calls to retrieve_term/3).
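The three matching rules for lookup_pattern/2 translate almost directly into Prolog. The sketch below is one possible reading of them, not the authors' implementation: the helper lookup_args/2 is a hypothetical name, the retrieve_term/3 facts are the ones from the assert example, and a stored term whose own functor is lazy/1 is not handled.

```prolog
% Facts as asserted in the example for store_term(f(a,g(b)),P).
retrieve_term(100, f, [101,102]).
retrieve_term(101, a, []).
retrieve_term(102, g, [103]).
retrieve_term(103, b, []).

% full_retrieve_term(+Pointer, ?GroundTerm): rebuild the whole term.
full_retrieve_term(P, Term) :-
    retrieve_term(P, F, ArgPtrs),
    full_retrieve_args(ArgPtrs, Args),
    Term =.. [F|Args].

full_retrieve_args([], []).
full_retrieve_args([P|Ps], [T|Ts]) :-
    full_retrieve_term(P, T),
    full_retrieve_args(Ps, Ts).

% lookup_pattern(+Pointer, +Pattern)
lookup_pattern(P, X) :-
    var(X), !,                    % plain variable: fetch the full term
    full_retrieve_term(P, X).
lookup_pattern(P, lazy(X)) :- !,  % lazy variable: hand back the pointer
    X = P.
lookup_pattern(P, Pattern) :-     % any other pattern: match one level
    Pattern =.. [F|Xs],
    retrieve_term(P, F, Ps),
    lookup_args(Ps, Xs).

lookup_args([], []).
lookup_args([P|Ps], [X|Xs]) :-
    lookup_pattern(P, X),
    lookup_args(Ps, Xs).
```

The query lookup_pattern(100, f(X,lazy(Y))) then gives X = a and Y = 102, as on the slide: the first argument is retrieved in full, while the second is left as a pointer that can be passed on to a tabled call.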
  • 19. Examples. We consider applying the workaround to two example programs: Edit Distance implemented in Prolog, and Hidden Markov Models in PRISM. For each of these we measure the impact (in time and space) of a transformed version using the workaround.
  • 20. Implementation of store_term/2 and retrieve_term/2. The programs have been transformed manually for these experiments, based on the pointer-based representation previously introduced but simplified slightly for lists. For store_term/2 and retrieve_term/2 we use the following simple implementation:
        % store_term(+List, +Idx): store List, using Idx as the pointer to its first cell.
        store_term([],Idx) :-
            assert(retrieve_term(Idx,[])).
        store_term([X|Xs],Idx) :-
            Idx1 is Idx + 1,
            assert(retrieve_term(Idx,[X,Idx1])),
            store_term(Xs,Idx1).
  • 21. Edit Distance. Calculate the minimum number of edits (insertions, deletions and replacements) needed to transform one list into another. We use a minimal edit-distance algorithm written in Prolog, which depends on tabling for any non-trivial problem. The theoretical best time complexity of edit distance has been proven to be O(N²). We measure for various problem sizes.
  • 22. Edit distance: implementation in Prolog.
        :- table edit/3.

        edit([],[],0).
        edit([],[Y|Ys],Dist) :-
            edit([],Ys,Dist1),
            Dist is 1 + Dist1.
        edit([X|Xs],[],Dist) :-
            edit(Xs,[],Dist1),
            Dist is 1 + Dist1.
        edit([X|Xs],[Y|Ys],Dist) :-
            edit([X|Xs],Ys,InsDist),
            edit(Xs,[Y|Ys],DelDist),
            edit(Xs,Ys,TailDist),
            (X==Y ->
                Dist = TailDist
            ;
                % Minimum of insertion, deletion or substitution
                sort([InsDist,DelDist,TailDist],[MinDist|_]),
                Dist is 1 + MinDist).
  • 23. Edit distance, transformed (1).
    Original version:
        edit([],[],0).
        edit([],[Y|Ys],Dist) :-
            edit([],Ys,Dist1),
            Dist is 1 + Dist1.
        edit([X|Xs],[],Dist) :-
            edit(Xs,[],Dist1),
            Dist is 1 + Dist1.
    Transformed version:
        edit(XIdx,YIdx,0) :-
            retrieve_term(XIdx,[]),
            retrieve_term(YIdx,[]).
        edit(XIdx,YIdx,Dist) :-
            retrieve_term(XIdx,[]),
            retrieve_term(YIdx,[_,YIdxNext]),
            edit(XIdx,YIdxNext,Dist1),
            Dist is Dist1 + 1.
        edit(XIdx,YIdx,Dist) :-
            retrieve_term(YIdx,[]),
            retrieve_term(XIdx,[_,XIdxNext]),
            edit(XIdxNext,YIdx,Dist1),
            Dist is Dist1 + 1.
    Continued on next slide...
  • 24. Edit distance, transformed (2).
    Original version:
        edit([X|Xs],[Y|Ys],Dist) :-
            edit([X|Xs],Ys,InsDist),
            edit(Xs,[Y|Ys],DelDist),
            edit(Xs,Ys,TailDist),
            (X==Y ->
                Dist = TailDist
            ;
                sort([InsDist,DelDist,TailDist],[MinDist|_]),
                Dist is 1 + MinDist).
    Transformed version:
        edit(XIdx,YIdx,Dist) :-
            retrieve_term(XIdx,[X,NextXIdx]),
            retrieve_term(YIdx,[Y,NextYIdx]),
            edit(XIdx,NextYIdx,InsDist),
            edit(NextXIdx,YIdx,DelDist),
            edit(NextXIdx,NextYIdx,TailDist),
            (X==Y ->
                Dist = TailDist
            ;
                sort([InsDist,DelDist,TailDist],[MinDist|_]),
                Dist is 1 + MinDist).
  • 25. Edit distance: benchmarking results (time). [Figure: time usage (seconds) against list length (N) for XSB 3.3, B-Prolog 7.5#4 and Yap 6.2.1, and for the curves labelled index(XSB), index(B-Prolog) and index(Yap).]
  • 26. Edit distance: benchmarking results (space). [Figure: space usage (kilobytes) against list length (N) for XSB 3.3, B-Prolog 7.5#4 and Yap 6.2.1, and for the curves labelled index(XSB), index(B-Prolog) and index(Yap).]
  • 27. PRISM (PRogramming In Statistical Modelling). Extends Prolog (B-Prolog) with special goals representing random variables (msws). Semantics: probabilistic Herbrand models. Supports various probabilistic inferences. Relies heavily on tabling for the efficiency of the probabilistic inferences.
  • 28. A Hidden Markov Model in PRISM.
        values(init,[s0,s1]).
        values(out(_),[a,b]).
        values(tr(_),[s0,s1]).

        hmm(L) :-
            msw(init,S),
            hmm(S,L).
        hmm(_,[]).
        hmm(S,[Ob|Y]) :-
            msw(out(S),Ob),
            msw(tr(S),Next),
            hmm(Next,Y).
    [Figure: state diagram of the model, with the states init, s0 and s1, transition probabilities, and emission probabilities for the symbols a and b.]
  • 29. Transformed Hidden Markov Model. We only need to consider the recursive predicate, hmm/2.
    Original version:
        hmm(_,[]).
        hmm(S,[Ob|Y]) :-
            msw(out(S),Ob),
            msw(tr(S),Next),
            hmm(Next,Y).
    Transformed version:
        hmm(S,ObsPtr) :-
            retrieve_term(ObsPtr,[]).
        hmm(S,ObsPtr) :-
            retrieve_term(ObsPtr,[Ob,Y]),
            msw(out(S),Ob),
            msw(tr(S),Next),
            hmm(Next,Y).
  • 30. Hidden Markov Model: benchmarking results (time). [Figure: two panels, a) running time with indexed lookup and b) running time without indexed lookup; running time (seconds) against sequence length.]
  • 31. An automatic program transformation. We introduce an automatic transformation from a tabled program to an efficient version using our approach. To support the transformation, the user must declare modes stating which predicate arguments should be indexed:
        table_index_mode(hmm(+)).
        table_index_mode(hmm(-,+)).
    Each clause whose head is covered by a table mode declaration is transformed, and all other clauses are left untouched.
  • 32. Transformation of HMM in PRISM. The transformation moves any term appearing in an indexed position in the head of a clause into a call to the lookup_pattern predicate, which is added to the body. Variables in such terms are marked lazy when they do not occur in any non-indexed argument inside the clause.
    Original program:
        hmm(_,[]).
        hmm(S,[Ob|Y]) :-
            msw(out(S),Ob),
            msw(tr(S),Next),
            hmm(Next,Y).
    Transformed program:
        hmm(S,ObsPtr) :-
            lookup_pattern(ObsPtr,[]).
        hmm(S,ObsPtr) :-
            lookup_pattern(ObsPtr,[Ob | lazy(Y)]),
            msw(out(S),Ob),
            msw(tr(S),Next),
            hmm(Next,Y).
  • 33. Transformation algorithm. Each clause whose head predicate is covered by a table mode declaration is transformed using the following procedure (Algorithm 1), where Vi is a new variable:
        for each clause H :- B in the original program do
            if there is a table_index_mode(M) matching H then
                for each argument Hi ∈ H, Mi ∈ M do
                    if Mi = '+' then
                        Hi ← MarkLazy(Hi, B)
                        B ← (lookup_pattern(Vi, Hi), B)
                        Hi ← Vi
                    end
                end
            end
        end
  • 34. Transformation algorithm: MarkLazy. MarkLazy is defined as:
        MarkLazy(Hi, B):
            PotentialLazy = variables in all goals G ∈ B where G has a table index mode declaration
            NonLazy = variables in all goals G ∈ B where G has no table index mode declaration
            Lazy = PotentialLazy \ NonLazy
            for each variable V ∈ Hi do
                if V ∈ Lazy then V ← lazy(V)
            end
    Algorithm 1: Program transformation.
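As a rough illustration of Algorithm 1, the following Prolog sketch transforms a single clause. It is an assumption-laden reading of the pseudocode rather than the authors' implementation: all helper names (transform_clause/2, goal_modes/2, body_goals/2, lazy_vars/2 and so on) are hypothetical, the mode declarations are stored as plain facts, clause bodies are assumed to be plain conjunctions, and a fact has to be passed in as (H :- true).

```prolog
% Mode declarations as in the talk, stored here as ordinary facts.
table_index_mode(hmm(+)).
table_index_mode(hmm(-, +)).

% transform_clause(+Clause, -NewClause): Clause has the form (H :- B).
% Clauses whose head has no mode declaration are returned unchanged.
transform_clause((H :- B), (H1 :- B1)) :-
    goal_modes(H, Modes), !,
    body_goals(B, Goals),
    lazy_vars(Goals, Lazy),
    H =.. [Name|Args],
    transform_args(Args, Modes, Lazy, Args1, B, B1),
    H1 =.. [Name|Args1].
transform_clause(Clause, Clause).

% goal_modes(+Goal, -Modes): Modes is the list of declared argument modes.
goal_modes(Goal, Modes) :-
    nonvar(Goal),
    functor(Goal, Name, Arity),
    functor(Decl, Name, Arity),
    table_index_mode(Decl),
    Decl =.. [_|Modes].

% Each '+' argument is replaced by a fresh variable A1, and a
% lookup_pattern/2 call for the marked term is prepended to the body.
transform_args([], [], _, [], B, B).
transform_args([A|As], [M|Ms], Lazy, [A1|As1], B0, B) :-
    (   M == (+)
    ->  mark_lazy(A, Lazy, Pattern),
        B1 = (lookup_pattern(A1, Pattern), B0)
    ;   A1 = A,
        B1 = B0
    ),
    transform_args(As, Ms, Lazy, As1, B1, B).

% body_goals(+Body, -Goals): flatten a conjunction into a list of goals.
body_goals(B, [B]) :- var(B), !.
body_goals((A, B), Gs) :- !,
    body_goals(A, As),
    body_goals(B, Bs),
    app(As, Bs, Gs).
body_goals(G, [G]).

app([], Ys, Ys).
app([X|Xs], Ys, [X|Zs]) :- app(Xs, Ys, Zs).

% lazy_vars(+Goals, -Lazy): Lazy = PotentialLazy \ NonLazy.
lazy_vars(Goals, Lazy) :-
    split_goals(Goals, Indexed, Plain),
    term_variables(Indexed, Potential),
    term_variables(Plain, NonLazy),
    subtract_eq(Potential, NonLazy, Lazy).

split_goals([], [], []).
split_goals([G|Gs], Indexed, Plain) :-
    (   goal_modes(G, _)
    ->  Indexed = [G|Indexed1], Plain = Plain1
    ;   Indexed = Indexed1, Plain = [G|Plain1]
    ),
    split_goals(Gs, Indexed1, Plain1).

subtract_eq([], _, []).
subtract_eq([X|Xs], Ys, Zs) :-
    (   member_eq(X, Ys) -> Zs = Zs1 ; Zs = [X|Zs1] ),
    subtract_eq(Xs, Ys, Zs1).

% member_eq(+X, +List): membership by identity (==), not unification.
member_eq(X, [Y|Ys]) :-
    (   X == Y -> true ; member_eq(X, Ys) ).

% mark_lazy(+Term, +Lazy, -Marked): wrap every lazy variable in lazy/1.
mark_lazy(T, Lazy, M) :-
    var(T), !,
    (   member_eq(T, Lazy) -> M = lazy(T) ; M = T ).
mark_lazy(T, Lazy, M) :-
    T =.. [F|As],
    mark_args(As, Lazy, Ms),
    M =.. [F|Ms].

mark_args([], _, []).
mark_args([A|As], Lazy, [M|Ms]) :-
    mark_lazy(A, Lazy, M),
    mark_args(As, Lazy, Ms).
```

Applied to the recursive hmm/2 clause, the sketch yields a head hmm(S,V) and a body starting with lookup_pattern(V,[Ob|lazy(Y)]), matching the transformed program shown for the HMM: Y occurs only in the indexed goal hmm(Next,Y) and is marked lazy, whereas Ob also occurs in an msw/2 goal and is not.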
  • 35. Conclusions. All Prolog implementations handle tabling of structured data inefficiently. We presented a Prolog-based program transformation that ensures O(1) time and space complexity of tabled lookups. The transformation is data invariant and works with all the existing tabling systems. The transformation makes it possible to scale to much larger problem instances. Some limitations and problems to be solved: the approach only applies to ground input arguments, and abstract pointers in the heads of clauses circumvent the usual pattern-based indexing (constant time overhead).
  • 36. The road ahead... Our program transformation should be seen as a workaround until such optimizations find their way into the tabling systems. We hope that Prolog implementors will pick up on this and integrate such optimizations directly in the tabling systems, so that the user does not need to transform the program and need not worry about the underlying tabled representation and its implicit complexity.