BowlognaBench --- Benchmarking RDF Analytics
Conference talk at SIMPDA 2011, June 2011, Campione d'Italia, Italy

Published in: Education, Technology


Transcript

  • 1. BowlognaBench --- Benchmarking RDF Analytics
      Gianluca Demartini, Iliya Enchev, Joël Gapany, and Philippe Cudré-Mauroux
      eXascale Infolab & Faculty of Humanities, University of Fribourg, Switzerland
      30-Jun-11, Gianluca Demartini
  • 2. Motivation
      • Semantic data keeps increasing on the Web
      • The need to run OLAP-type queries is increasingly common
          – How did university student performance evolve over the last 5 years?
      • A novel benchmark for knowledge bases, focusing on complex analytics queries
  • 3. Why do we need a new RDF benchmark?
      • Existing RDF benchmarks (e.g., LUBM)
          – Don't deal with complex analytic queries
          – Don't look at the temporal dimension
          – Don't model a realistic setting
      • Analytic benchmarks exist for relational systems (e.g., TPC-H)
  • 4. The Bologna Reform
      • Started in June 1999
      • Framework for higher education systems
      • 47 countries
      • Common academic degrees
      • Common study structure
      • Common terminology
  • 5. The university setting after Bologna
      • A lot of data is available
          – Not following standard schemas
          – Comprehensive and available data is a success factor
      • Shared data
          – Erasmus exchanges
          – Courses in a given language
      • Analytic tools may help monitor university performance
  • 6. An ontology about Bologna
      • A lexicon for the Bologna Reform
          – Basic set of terms for the new system
          – Stable across time and institutions
          – Developed by a professional terminologist
  • 7. The ontology creation process
      • The Bowlogna Ontology
          – 29 top classes (67 in total)
          – Classes: student, professor, evaluation, teaching unit, ECTS credit, semester, etc.
          – Concept definitions in English, French, German
  • 8. Bowlogna Ontology
  • 9. Bowlogna Ontology
      • Private / public parts
          – Public data can be shared with other universities (e.g., course descriptions)
          – Private data is sensitive (e.g., evaluation results)
      • Private data might contain more instances
      • Aggregations over private data may be shared (e.g., number of enrolled students)
  • 10. The Benchmark
      • Bowlogna Ontology
          – 67 classes
      • 12 analytics queries
          – Natural language and SPARQL translation
      • Automatic instance generator
          – Populates the ontology with a given number of instances
      • Tests over the number of instances and the number of universities
  • 11. Analytic Queries
      • Count
      • Molecule
          – Query 4. Return all information about Student0 within a scope of two
      • Max Min
      • Ranking and TopK
      • Temporal
          – Query 8. What is the average completion time of Bachelor studies for each Study Track?
      • Path
      • Multiple Universities
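As an illustration of the molecule category, here is a minimal Python sketch of Query 4. It assumes that "within a scope of two" means every triple reachable from Student0 in at most two hops, following links from subject to object; the slides do not spell this out. The triples, property names, and the `molecule` helper are hypothetical and are not taken from the Bowlogna Ontology or the benchmark's SPARQL.

```python
# Toy triple store: (subject, predicate, object).
# All instance and property names below are hypothetical.
TRIPLES = [
    ("Student0", "enrolledIn", "StudyTrack0"),
    ("Student0", "takes", "TeachingUnit0"),
    ("TeachingUnit0", "taughtBy", "Professor0"),
    ("Professor0", "memberOf", "Department0"),
    ("StudyTrack0", "offeredBy", "University0"),
]


def molecule(triples, root, scope):
    """Collect triples reachable from `root` in at most `scope` hops.

    Assumption: a hop follows a triple from its subject to its object.
    """
    seen = {root}        # nodes already expanded or queued
    frontier = {root}    # nodes to expand in the current hop
    result = []
    for _ in range(scope):
        next_frontier = set()
        for s, p, o in triples:
            if s in frontier:
                result.append((s, p, o))
                if o not in seen:
                    seen.add(o)
                    next_frontier.add(o)
        frontier = next_frontier
    return result


if __name__ == "__main__":
    for triple in molecule(TRIPLES, "Student0", 2):
        print(triple)
```

With a scope of two, the triple about Professor0's department is left out because it sits three hops away from Student0.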
  • 12. Query Classification
      We classify a query as having a large input size if it involves more than 5% of instances, and small otherwise. Selectivity measures the number of instances that match the query: we classify a query as having high selectivity if less than 10% of instances match the query, and low otherwise. Complexity measures the number of classes and properties involved in the query: queries are classified as having high or low complexity according to the RDF schema we have defined.

      Table 1. Classification of queries according to their need to access private and public data, input size, selectivity, and complexity.

      Query categories: Count, Molecule, MaxMin, TopK, Temp, Path, MultiUniv

      Query        1      2      3      4      5      6      7      8      9      10     11     12
      Input Size   Small  Large  Small  Small  Large  Small  Large  Large  Large  Large  Large  Large
      Selectivity  High   Low    Low    Low    Low    Low    High   Low    Low    Low    Low    Low
      Complexity   Low    Low    Low    High   High   Low    High   Low    High   High   Low    High

      Public: 8 of the 12 queries marked; Private: 8 of the 12 queries marked (the column positions of the marks were lost in extraction).

      As we can see, the majority of queries have low selectivity, which reflects our intent of performing analytic queries, that is, queries for which a lot of data is retrieved and aggregated. For the same reason, most of the queries have a large input. Finally, queries are equally divided between high and low complexity.
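The two numeric thresholds stated on this slide can be sketched as a small Python function. The function name, its parameters, and the idea of counting "instances involved" and "instances matched" per query are assumptions made for illustration; the slide gives no numeric rule for complexity, so it is left out.

```python
def classify_query(instances_involved, instances_matched, total_instances):
    """Classify a query by input size and selectivity.

    Thresholds come from the slide: input is Large above 5% of instances,
    selectivity is High when fewer than 10% of instances match.
    Integer arithmetic avoids floating-point edge cases at the boundaries.
    """
    if total_instances <= 0:
        raise ValueError("total_instances must be positive")
    input_size = "Large" if instances_involved * 100 > 5 * total_instances else "Small"
    selectivity = "High" if instances_matched * 100 < 10 * total_instances else "Low"
    return input_size, selectivity


if __name__ == "__main__":
    # Hypothetical counts for a knowledge base of 10,000 instances.
    print(classify_query(100, 10, 10_000))
    print(classify_query(6_000, 3_000, 10_000))
```

Note that "high selectivity" here means few matching instances, so an aggregation-heavy analytic query would typically come out as Large input with Low selectivity.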
  • 13. From the process analyst's point of view
      • Which system should I pick for my specific problem?
          – Not looking for the best system overall
          – Look at problem-specific query sets
  • 14. Which system to use?

                 Count Queries   Path Queries   Rank Queries   Temporal Queries
      System A   0.5s            5s             0.1s           2s
      System B   3s              0.4s           2s             1s
      System C   0.5s            0.5s           0.5s           0.5s
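The point of this slide, choosing a system for a problem-specific query mix rather than a single overall winner, can be sketched in Python. The timings are copied from the slide, where they appear to be illustrative rather than measured; the `best_system` helper and the workload format (number of queries per category) are hypothetical.

```python
# Per-category response times in seconds, as shown on the slide.
TIMINGS = {
    "System A": {"count": 0.5, "path": 5.0, "rank": 0.1, "temporal": 2.0},
    "System B": {"count": 3.0, "path": 0.4, "rank": 2.0, "temporal": 1.0},
    "System C": {"count": 0.5, "path": 0.5, "rank": 0.5, "temporal": 0.5},
}


def total_time(system, workload):
    """Total time for a workload given as {category: number_of_queries}."""
    return sum(TIMINGS[system][category] * n for category, n in workload.items())


def best_system(workload):
    """Return the system with the lowest total time for this workload."""
    return min(TIMINGS, key=lambda system: total_time(system, workload))


if __name__ == "__main__":
    print(best_system({"rank": 10, "count": 1}))   # rank-heavy mix
    print(best_system({"path": 10}))               # path-heavy mix
    print(best_system({"count": 1, "path": 1, "rank": 1, "temporal": 1}))
```

Under these numbers a rank-heavy mix favours System A, a path-heavy mix favours System B, and an even mix favours System C, which is the argument for evaluating against problem-specific query sets.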
  • 15. Conclusions
      • BowlognaBench for analytic queries
      • OWL ontology for higher education systems
      • Next steps
          – Run a comparative evaluation of RDF systems
          – Set up a wiki-like space where groups can upload experimental results
  • 16. http://diuf.unifr.ch/xi/bowlognabench/