ISWC 2012 "Efficient execution of top-k SPARQL queries"
 


Slides of my talk on efficient execution of top-k queries in SPARQL at ISWC 2012 in Boston. Including some bonus back-up slides :)


    Presentation Transcript

    • Efficient Execution of top-k SPARQL queries
      • Sara Magliacane (VU University Amsterdam)
      • Alessandro Bozzon (Politecnico di Milano)
      • Emanuele Della Valle (Politecnico di Milano)
    • Outline
      • Introduction
        • What are top-k queries?
        • Why do we need to optimize them?
      • Our approach:
        • A rank-aware SPARQL algebra
        • A rank-aware execution model
        • Three planning strategies
      • Evaluation
    • What is a top-k query?
      • A query that returns
        1. a limited number of results k
        2. ordered by a scoring function that combines several criteria
    • Rankings, rankings everywhere…
    • Why do we need to optimize them?
      • A very intuitive and simplified example:
        • Top 3 largest countries (by both area and population)
    • The standard way: the materialize-then-sort scheme
      • Inputs: countries by area (242) and countries by population (242)
        1. Compute all the 242 join combinations
        2. Sort all the 242 join combinations
        3. Fetch the 3 best results
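The three steps of materialize-then-sort can be sketched in a few lines of Python. This is only an illustration: the country labels and the normalized scores are made up, and the combined score is assumed to be a plain sum of the two criteria.

```python
from itertools import product

# Hypothetical, already-normalized scores in [0, 1]; not real statistics.
area = {"A": 1.0, "B": 0.6, "C": 0.5, "D": 0.1}
population = {"A": 0.1, "B": 0.9, "C": 0.7, "D": 0.2}


def materialize_then_sort(area, population, k):
    # 1. Compute every join combination (rows that agree on the country).
    joined = [
        (country_a, area_score + pop_score)
        for (country_a, area_score), (country_p, pop_score)
        in product(area.items(), population.items())
        if country_a == country_p
    ]
    # 2. Sort all of them by the combined score (assumed to be a sum).
    joined.sort(key=lambda row: row[1], reverse=True)
    # 3. Only now fetch the k best results.
    return joined[:k]


top3 = materialize_then_sort(area, population, 3)
```

Every combination is computed and sorted even though only k of them are returned, which is the cost the rest of the talk tries to avoid.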
    • Can we make it more efficient?
      • Can we exploit the available sorted access by area and by population?
      • Inputs: countries by area (7) and countries by population (9)
        1. Order the combinations incrementally using partial orders
        2. Fetch the 3 best results
    • The split-and-interleave scheme
      • The intuition of the previous example can be formalized with the split-and-interleave scheme from RDBMS [Li2005, Hwang2007, Ilyas2004, Ilyas2008]
        1. Split the evaluation of the scoring function into single criteria
        2. Interleave them with other operators
        3. Use partial orders to construct the final order incrementally
      • Standard assumptions:
        • Monotone scoring function
        • Each criterion is evaluated as a number in [0,1] (normalization)
      • Optimized for the case of fast sorted access for each criterion
    • No free lunch…
      • Figure (labels only partly recoverable from the transcript): time against the number of desired results k, comparing standard SPARQL with split-and-interleave (SPARQL-Rank). Annotations: "Improvement", "Orders of magnitude", "Too much overhead".
      • Users are interested in 1 <= k <= 100 (search engines)
    • Top-k queries in SPARQL 1.1
      • Example query on BSBM [Bizer2009]:
        • The top 10 offers ordered by the product ratings and offer price:
            SELECT ?product ?offer
                   (norm1(?avgRat1) + norm2(?avgRat2) + norm3(?price) AS ?score)
            WHERE {
              ?product hasAvgRat1 ?avgRat1 .
              ?product hasAvgRat2 ?avgRat2 .
              ?product hasName ?name .
              ?product hasOffers ?offer .
              ?offer hasPrice ?price
            }
            ORDER BY DESC(?score)
            LIMIT 10
      • Tens of seconds on 5M triples (could be improved to milliseconds)
    • Split-and-interleave in SPARQL? Related work
      • A possible solution [Straccia2010, Bozzon2011]:
        • Rewrite SPARQL into SQL
        • Use an existing optimized RDBMS (e.g. RankSQL [Li2005])
      • Disadvantages:
        • Works only if the data are already in an RDBMS
      • What about native SPARQL optimizations?
        • Federated queries over Linked Data [Wagner2012]: complementary to our approach
    • Challenges for native SPARQL split-and-interleave solutions
      • Pipeline: Query, Algebra generator, Algebraic tree, Planner, Query plan (the planner relies on algebra operators, physical operators and planning strategies)
      • Differences with SQL and RDBMS, each with the proposed solution:
        • Different algebra: STEP 1, new algebra (algebraic operators and algebraic equivalences)
        • Different cost of data access in native RDF triplestores (sorted access is slow): STEP 2, new algorithms for physical operators, possibly using less sorted access
        • Additional optimization dimensions: STEP 3, new planning strategies
    • Step 1: a rank-aware algebra
      • SPARQL-Rank algebra [Bozzon2011]
        • Extends the standard SPARQL algebra [Perez2009]
        • Ranked set of mappings: a set of mappings augmented with an order relation
        • Extended operators and new equivalences
    • The SPARQL-Rank algebraic operators
      • New operator: rank (ρ)
      • Figure: three example plans (a), (b), (c), each ending in SLICE [0,10] over ?pr, ?of, ?score. Recoverable labels: rank operators g1(?a1) and g3(?p1), a Sequence join on ?pr = ?pr with {?pr hasN ?n}, and seqScan / orderScan_a1 accesses over {?pr hasA1 ?a1 . ?pr hasN ?n . ?pr hasO ?of . ?of hasP1 ?p1}.
    • The Rank Operator
      • Input Ω:
               ?x   ?y   ?p1   ?p2
          µ1   1    8    0.8   0.8
          µ2   3    3    0.3   0.6
          µ3   3    4    0.4   0.6
      • Output ρp1(Ω), ordered by Fp1:
               ?x   ?y   ?p1   Fp1
          µ1   1    8    0.8   1.8
          µ3   3    4    0.4   1.4
          µ2   3    3    0.3   1.3
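The table above can be reproduced with a small Python sketch of the rank operator. It assumes the scoring function is the plain sum of the criteria (the slide does not name it, but the sum matches the Fp1 column), and the function names `upper_bound` and `rank` are illustrative, not part of ARQ-Rank.

```python
def upper_bound(mapping, criteria, evaluated):
    # Assumed scoring function: a sum. Evaluated criteria contribute their
    # value; criteria not yet evaluated contribute the maximum, 1.0.
    return sum(mapping[c] if c in evaluated else 1.0 for c in criteria)


def rank(mappings, criteria, evaluated):
    # Attach the upper bound to each mapping and order by it, best first.
    scored = [
        (name, upper_bound(m, criteria, evaluated))
        for name, m in mappings.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


omega = {
    "mu1": {"x": 1, "y": 8, "p1": 0.8, "p2": 0.8},
    "mu2": {"x": 3, "y": 3, "p1": 0.3, "p2": 0.6},
    "mu3": {"x": 3, "y": 4, "p1": 0.4, "p2": 0.6},
}

# Evaluate only p1, as in the slide: p2 is still bounded by 1.
ranked = rank(omega, ["p1", "p2"], {"p1"})
```

Ordering by an upper bound rather than the final score is what lets later operators refine the order incrementally.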
    • The SPARQL-Rank algebraic operators
      • Redefined standard operators
    • The Join Operator
      • Left input Ωp1:
               ?x   ?y   ?p1   Fp1
          µ1   1    8    0.8   1.8
          µ3   3    4    0.4   1.4
          µ2   3    3    0.3   1.3
      • Right input Ω'p2:
               ?x   ?z   ?p2   Fp2
          µ4   1    9    0.8   1.8
          µ5   3    0    0.6   1.6
      • Join result, ordered by Fp1∪p2:
                    ?x   ?y   ?z   ?p1   ?p2   Fp1∪p2
          µ1 ∪ µ4   1    8    9    0.8   0.8   1.6
          µ3 ∪ µ5   3    4    0    0.4   0.6   1.0
          µ2 ∪ µ5   3    3    0    0.3   0.6   0.9
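A minimal Python sketch of the redefined join, checked against the table above. It assumes a sum scoring function over the two criteria p1 and p2 (an assumption that matches the Fp1∪p2 column); the nested-loop join and the function name are illustrative only and say nothing about how ARQ-Rank implements the operator.

```python
def rank_aware_join(left, right, key):
    # Join compatible mappings and score them with both criteria evaluated.
    joined = []
    for left_name, l in left:
        for right_name, r in right:
            if l[key] == r[key]:
                row = {**l, **r}
                # Assumed scoring function: p1 + p2 (no criterion is unknown
                # any more, so no term is replaced by the maximum 1.0).
                row["score"] = l["p1"] + r["p2"]
                joined.append((f"{left_name} U {right_name}", row))
    joined.sort(key=lambda item: item[1]["score"], reverse=True)
    return joined


left = [
    ("mu1", {"x": 1, "y": 8, "p1": 0.8}),
    ("mu3", {"x": 3, "y": 4, "p1": 0.4}),
    ("mu2", {"x": 3, "y": 3, "p1": 0.3}),
]
right = [
    ("mu4", {"x": 1, "z": 9, "p2": 0.8}),
    ("mu5", {"x": 3, "z": 0, "p2": 0.6}),
]

result = rank_aware_join(left, right, "x")
```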
    • SPARQL-Rank algebraic equivalences: Split
    • SPARQL-Rank algebraic equivalences
      • Allows the splitting of a monolithic scoring function into several rank operators
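The effect of the split equivalence can be illustrated in Python: applying one rank operator per criterion ends in the same order as sorting once by the full score. The criteria names, the sample mappings and the sum scoring function are all assumptions made for this sketch.

```python
CRITERIA = ("a1", "a2", "p1")  # hypothetical criteria, normalized to [0, 1]


def bound(mapping, evaluated):
    # Upper bound under an assumed sum scoring function.
    return sum(mapping[c] if c in evaluated else 1.0 for c in CRITERIA)


def rank(state, criterion):
    # One rank operator: evaluate one more criterion and re-order.
    mappings, evaluated = state
    evaluated = evaluated | {criterion}
    ordered = sorted(mappings, key=lambda m: bound(m, evaluated), reverse=True)
    return ordered, evaluated


def monolithic(mappings):
    # Materialize-then-sort: evaluate the whole scoring function at once.
    return sorted(mappings, key=lambda m: sum(m[c] for c in CRITERIA), reverse=True)


mappings = [
    {"id": "m1", "a1": 0.9, "a2": 0.2, "p1": 0.5},
    {"id": "m2", "a1": 0.4, "a2": 0.8, "p1": 0.9},
    {"id": "m3", "a1": 0.7, "a2": 0.7, "p1": 0.1},
]

state = (mappings, frozenset())
for criterion in CRITERIA:
    state = rank(state, criterion)

split_order = [m["id"] for m in state[0]]
mono_order = [m["id"] for m in monolithic(mappings)]
```

The intermediate orders differ from the final one, since each is only a partial view of the score; the benefit of splitting comes from being able to place other operators between the rank operators.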
    • SPARQL-Rank algebraic equivalences: Interleave
    • SPARQL-Rank algebraic equivalences
      • Allows the results to be ordered incrementally by pushing the rank operator inside the query tree
    • From algebra to execution
      • Image from: http://de-timekeeper.com/yahoo_site_admin/assets/images/benzinger20gold20gears200291.17120724_std.jpg
    • Step 2: physical operators (top-k algorithms)
      • Rank operator
        • If there is a sorted access index on the ranking criterion, we use it
        • Otherwise: rank aggregation algorithms, e.g. [Hwang2007]
      • Join operator
        • If the right operand does not influence the ranking: streaming index join
        • Otherwise: a rank-join algorithm [see next slides]
      • Other operators are straightforward:
        • E.g. the standard FILTER preserves the ordering of its input
    • Rank-Join algorithms
      • Different algorithms based on the available access in the inputs:
        • Hash Rank-Join (from the literature), plan (a) RankJoin: sorted access on both inputs
          • e.g. HRJN [Ilyas2004]
        • Random Access Rank-Join (from the literature), plan (c) RA-RankJoin: sorted and random access on both inputs
          • e.g. RA-HRJN [Ilyas2004]
        • RankSequence (new), plan (b): sorted access on one input, random access on the other
          • e.g. RSEQ
          • Minimum sorted access
          • Leverages random access
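To make the sorted-access case concrete, here is a simplified Python sketch in the spirit of a hash rank-join such as HRJN. It is not the ARQ-Rank implementation: it assumes a sum scoring function, alternates between the inputs in round-robin fashion, and uses made-up input data. It emits a joined result as soon as its score reaches the threshold, the best score any unseen combination could still achieve.

```python
import heapq


def hrjn(left, right, k):
    """left/right: lists of (join_key, score), sorted by score descending."""
    inputs = (left, right)
    pos = [0, 0]            # next position to read in each input
    seen = ({}, {})         # hash tables: join_key -> scores seen so far
    top = [None, None]      # first (highest) score read from each input
    last = [None, None]     # most recent (lowest) score read from each input
    queue = []              # candidate results as (-score, key): a max-heap
    results = []
    side = 0
    while len(results) < k:
        exhausted = [pos[i] >= len(inputs[i]) for i in (0, 1)]
        if all(exhausted):
            # Nothing left to read: the remaining candidates are final.
            while queue and len(results) < k:
                neg_score, key = heapq.heappop(queue)
                results.append((key, -neg_score))
            break
        if exhausted[side]:
            side = 1 - side
        key, score = inputs[side][pos[side]]
        pos[side] += 1
        if top[side] is None:
            top[side] = score
        last[side] = score
        seen[side].setdefault(key, []).append(score)
        # Probe the other input's hash table for join partners.
        for other_score in seen[1 - side].get(key, []):
            heapq.heappush(queue, (-(score + other_score), key))
        side = 1 - side
        if None in top:
            continue
        # Upper bound on the score of any combination not yet produced.
        threshold = max(top[0] + last[1], last[0] + top[1])
        while queue and len(results) < k and -queue[0][0] >= threshold:
            neg_score, key = heapq.heappop(queue)
            results.append((key, -neg_score))
    return results, pos


# Hypothetical normalized scores, each list sorted descending.
left = [("A", 1.0), ("B", 0.9), ("C", 0.5), ("D", 0.2)]
right = [("B", 1.0), ("A", 0.8), ("D", 0.4), ("C", 0.3)]
results, reads = hrjn(left, right, 2)
```

On this toy input the top 2 (B, then A) are returned after reading 5 of the 8 input tuples, whereas materialize-then-sort would read and join all of them.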
    • Step 3: planning strategies
      • Using the algebraic equivalences we can produce several equivalent algebraic trees
      • The planner can use them to implement several planning strategies:
        1. Rank of BGPs
        2. Interleaved
        3. Rank Join
      • Figure: example plans for the three strategies, each ending in SLICE [0,10] over ?pr, ?of, ?score.
    • 1. Rank of BGPs (ROB)
      • Split the monolithic scoring function into several incremental rank operators (rho)
      • Materialize-then-sort plan, top-down: SLICE [0,10]; ORDER [?score]; EXTEND [?score = norm1(?avgRat1)+norm2(?avgRat2)+norm3(?price)]; BGP {?product hasAvgRat1 ?avgRat1 . ?product hasAvgRat2 ?avgRat2 . ?product hasName ?name . ?product hasOffer ?offer . ?offer hasPrice ?price}
      • Rank of BGPs plan, top-down: SLICE [0,10]; rank norm3(?price); rank norm2(?avgRat2); rank norm1(?avgRat1); the same BGP
    • 2. Interleaved (INTER)
      • Separate the pattern into two groups:
        • Triple patterns that influence the ranking
        • Triple patterns that don't influence the ranking
      • Materialize-then-sort plan: as in the previous slide
      • Interleaved plan: SLICE [0,10] over a join on ?product = ?product between the ranked sub-plan (rank operators norm1(?avgRat1), norm2(?avgRat2), norm3(?price) over {?product hasAvgRat1 ?avgRat1 . ?product hasAvgRat2 ?avgRat2 . ?product hasOffer ?offer . ?offer hasPrice ?price}) and {?product hasName ?name}
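The grouping used by the interleaved strategy can be sketched in Python. The rule below is an assumption made for illustration, not the planner's documented rule: a triple pattern is put in the ranking group if it binds a variable of the scoring function, or if all of its variables already occur in such patterns (so it only connects them). On the example query this reproduces the grouping shown in the slide.

```python
def variables(pattern):
    # Variables are the terms that start with "?".
    return {term for term in pattern if term.startswith("?")}


def split_patterns(patterns, scoring_vars):
    # Patterns that bind a scoring variable directly.
    direct = [p for p in patterns if variables(p) & scoring_vars]
    direct_vars = set().union(*[variables(p) for p in direct])
    ranking, other = [], []
    for p in patterns:
        # Assumed heuristic: connectors between ranking patterns also
        # belong to the ranking group.
        if p in direct or variables(p) <= direct_vars:
            ranking.append(p)
        else:
            other.append(p)
    return ranking, other


patterns = [
    ("?product", "hasAvgRat1", "?avgRat1"),
    ("?product", "hasAvgRat2", "?avgRat2"),
    ("?product", "hasName", "?name"),
    ("?product", "hasOffer", "?offer"),
    ("?offer", "hasPrice", "?price"),
]
scoring_vars = {"?avgRat1", "?avgRat2", "?price"}

ranking, other = split_patterns(patterns, scoring_vars)
```

Only {?product hasName ?name} ends up outside the ranking group, so it can be joined after the rank operators without disturbing the order.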
    • 3. Rank-Join (RJ)
      • Split into one triple pattern for each ranking criterion
      • Most appropriate join algorithm based on available access
      • Materialize-then-sort plan: as in the previous slides
      • Rank-Join plan: SLICE [0,10] over RankJoins on ?product = ?product that combine norm1(?avgRat1) over {?product hasAvgRat1 ?avgRat1}, norm2(?avgRat2) over {?product hasAvgRat2 ?avgRat2} and norm3(?price) over {?product hasOffer ?offer . ?offer hasPrice ?price}, joined with {?product hasName ?name}
    • Experimental evaluation
    • Experimental evaluation
      • Prototype implementation of our system:
        • ARQ-Rank (extends Jena ARQ 2.8.9)
      • Extended version of the Berlin SPARQL Benchmark [Bizer2009]
        • Added ranking attributes
        • Added top-k queries
      • Jena TDB 0.8.11 as storage
      • Code and experiments: sparqlrank.search-computing.org
    • Experiment 1: compare planning strategies
      • Example query, 5M triples dataset
      • Worst-case scenario: no sorted access indexes (slow sorted access)
      • One to two orders of magnitude better
    • Experiment 1: compare planning strategies
      • Example query, 5M triples dataset
      • Standard scenario: sorted access indexes (fast sorted access)
      • Two orders of magnitude better
    • Experiment 2: Small Benchmark (8 queries)
      • Figure: plots whose axis labels and values are not recoverable from the transcript.
    • Conclusions and Future Work
      • A system that speeds up the execution of top-k queries in SPARQL by orders of magnitude:
        • STEP 1: A rank-aware SPARQL algebra (SPARQL-Rank algebra)
        • STEP 2: A rank-join algorithm (RSEQ)
        • STEP 3: Three planning strategies (ROB, INTER, RJ)
      • ARQ-Rank, a rank-aware extension of Jena ARQ
      • A small benchmark for top-k queries, based on BSBM [Bizer2009]
      • All available at sparqlrank.search-computing.org
      • Future work:
        • More advanced, cost-based optimization techniques
        • Extension to federated top-k query processing
        • Top-k queries under the OWL2QL entailment regime
    • Bibliography
      • [Bozzon2011] A. Bozzon et al. Towards and efficient SPARQL top-k query execution in virtual RDF stores. In DBRANK workshop at VLDB '11, 2011.
      • [Wagner2012] A. Wagner et al. Top-k Linked Data Query Processing. In ESWC '12. Springer, 2012.
      • [Bizer2009] C. Bizer and A. Schultz. The Berlin SPARQL Benchmark. Int. J. Semantic Web Inf. Syst., 5(2), 2009.
      • [Li2005] C. Li et al. RankSQL: query algebra and optimization for relational top-k queries. In SIGMOD '05. ACM, 2005.
      • [DellaValle2012] E. Della Valle et al. Order matters! Harnessing a world of orderings for reasoning over massive data. Semantic Web Journal, 2012.
      • [Hwang2007] S.-w. Hwang and K. Chang. Probe minimization by schedule optimization: Supporting top-k queries with expensive predicates. IEEE TKDE, 19(5), 2007.
      • [Ilyas2004] I. F. Ilyas et al. Rank-aware Query Optimization. In SIGMOD '04. ACM, 2004.
      • [Ilyas2008] I. F. Ilyas et al. A survey of top-k query processing techniques in relational database systems. ACM Comput. Surv., 40(4), 2008.
      • [Perez2009] J. Perez et al. Semantics and complexity of SPARQL. ACM Trans. Database Syst., 34(3), 2009.
      • [Schmidt2010] M. Schmidt et al. Foundations of SPARQL query optimization. In ICDT '10. ACM, 2010.
      • [Straccia2010] U. Straccia. SoftFacts: A top-k retrieval engine for ontology mediated access to relational databases. In SMC '10. IEEE, 2010.
    • BACK-UP SLIDES
    • Why do we need to optimize them?
      • An additional, less intuitive and less simplified example:
        • Top 2 couples of most populated cities and largest countries
      • Figure labels: Moscow, Shanghai
    • The materialize-then-sort scheme
      • Inputs: countries by area (249) and cities by population (14K*)
        • Normalized values shown in the figure: 1, …, 0.05, 0.04, …, 2e-08 (Vatican); Shanghai 0.567, Istanbul 0.563, Karachi, Mumbai 0.497, Moscow 0.185, …
        1. Materialize all 14K combinations
        2. Sort all 14K join combinations
        3. Fetch the 2 best results (figure labels: Moscow, Shanghai)
      • * According to DBpedia, but probably more
    • Can we make it more efficient?
      • Can we exploit the sorted access by area and by population?
      • Inputs: countries by area (9) and cities by population (13: Shanghai, Istanbul, Karachi, Mumbai, Moscow, …)
        1. Order the combinations incrementally using partial orders
        2. Fetch the 2 best results (figure labels: Moscow, Shanghai)
    • SPARQL-Rank algebra: Definitions
      • Mapping µ: an intermediate SPARQL solution, equivalent to a SQL tuple
      • Set of mappings, for example:
               ?x   ?y   ?p1   ?p2
          µ1   1    8    0.8   0.8
          µ2   3    3    0.3   0.6
      • Maximal possible score: given a scoring function $F(p_1, \dots, p_n)$ and a set of predicates $P = \{p_1, \dots, p_j\}$, the maximal possible score for a mapping $\mu$ is defined as $F_P(p_1, \dots, p_n)[\mu] = F(\hat{p}_1, \dots, \hat{p}_n)$, where $\hat{p}_i = p_i[\mu]$ if $p_i \in P$ and $\hat{p}_i = 1$ otherwise, for all $i$
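As a worked instance of the definition of the maximal possible score, assume the scoring function is the plain sum $F(p_1, p_2) = p_1 + p_2$. The slides do not state $F$; the sum is an assumption that agrees with the values in the Rank Operator and Join Operator tables. For the two mappings $\mu_1$ and $\mu_2$ of the example table:

```latex
\begin{align*}
F_{\{p_1\}}[\mu_1]      &= p_1[\mu_1] + 1 = 0.8 + 1 = 1.8 \\
F_{\{p_1\}}[\mu_2]      &= p_1[\mu_2] + 1 = 0.3 + 1 = 1.3 \\
F_{\{p_1,p_2\}}[\mu_1]  &= p_1[\mu_1] + p_2[\mu_1] = 0.8 + 0.8 = 1.6 \\
F_{\{p_1,p_2\}}[\mu_2]  &= p_1[\mu_2] + p_2[\mu_2] = 0.3 + 0.6 = 0.9
\end{align*}
```

With a monotone $F$ and criteria in $[0,1]$, evaluating one more predicate can only lower the bound or leave it unchanged, which is why $F_{\{p_1,p_2\}}[\mu] \le F_{\{p_1\}}[\mu]$ in both rows.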
    • SPARQL-Rank algebra: Definitions
      • Ranking principle: given two mappings µ1 and µ2 with FP[µ1] > FP[µ2], if we process µ2 we also need to process µ1
      • Ranked set of mappings: given a set of predicates P, a ranked set of mappings ΩP is a set of mappings Ω augmented with the following properties:
        • Score: for each mapping µ, the maximal possible score FP[µ]
        • Order: the order relation <ΩP is defined on ΩP based on the scores of the single mappings
    • The SPARQL-Rank algebraic operators
    • SPARQL-Rank algebraic equivalences
    • SPARQL-Rank algebraic equivalences
      • Allows the results to be ordered incrementally by pushing the rank operator inside the query execution tree
    • The RSEQ algorithm
    • Evaluation: additional technical information
      • Experimental setting:
        • AMD 64 bit processor, 2.66 GHz
        • 4 GB RAM
        • Debian kernel 2.6.26-2
        • Sun Java 1.6.0
        • Maximum heap size 2 GB
      • 8 queries available at sparqlrank.search-computing.org
    • More experimental results: the RankJoin operators
      • Example query, 5M triples dataset
      • Worst-case scenario: no sorted access indexes (left)
        • RSEQ is the best, especially for k < 1000
      • Standard scenario: sorted access indexes (right)
        • All three are comparable, RA-HRJN is best for k > 1000
    • ARQ-Rank architecture