BowlognaBench --- Benchmarking RDF Analytics

Conference talk at SIMPDA 2011, June 2011, Campione d'Italia, Italy

  1. BowlognaBench: Benchmarking RDF Analytics
     Gianluca Demartini, Iliya Enchev, Joël Gapany, and Philippe Cudré-Mauroux
     eXascale Infolab & Faculty of Humanities, University of Fribourg, Switzerland
     30-Jun-11
  2. Motivation
     •  Semantic data keeps increasing on the Web
     •  The need to run OLAP-type queries is becoming more common
        –  How did university student performance evolve over the last 5 years?
     •  A novel benchmark for knowledge bases, focusing on complex analytics queries
  3. Why do we need a new RDF benchmark?
     •  Existing RDF benchmarks (e.g., LUBM)
        –  Don't deal with complex analytic queries
        –  Don't look at the temporal dimension
        –  Don't model a realistic setting
     •  Analytic benchmarks exist for relational systems (e.g., TPC-H)
  4. The Bologna Reform
     •  Started in June 1999
     •  Framework for higher education systems
     •  47 countries
     •  Common academic degrees
     •  Common study structure
     •  Common terminology
  5. The university setting after Bologna
     •  A lot of data is available
        –  Not following standard schemas
        –  Comprehensive and available data is a success factor
     •  Shared data
        –  Erasmus exchanges
        –  Courses in a given language
     •  Analytic tools may help in monitoring university performance
  6. An ontology about Bologna
     •  A lexicon for the Bologna Reform
        –  Basic set of terms for the new system
        –  Stable across time and institutions
        –  Developed by a professional terminologist
  7. The ontology creation process
     •  The Bowlogna Ontology
        –  29 top classes (67 in total)
        –  Classes: student, professor, evaluation, teaching unit, ECTS credit, semester, etc.
        –  Concept definitions in English, French, and German
  8. Bowlogna Ontology
  9. Bowlogna Ontology
     •  Private / public parts
        –  Public data can be shared with other universities (e.g., course descriptions)
        –  Private data is sensitive (e.g., evaluation results)
     •  Private data might contain more instances
     •  Aggregations over private data may be shared (e.g., number of enrolled students)
  10. The Benchmark
     •  Bowlogna Ontology
        –  67 classes
     •  12 analytics queries
        –  Natural language and SPARQL translation
     •  Automatic instance generator
        –  Populates the ontology with a given number of instances
     •  Tests over the number of instances and the number of universities
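The idea of an automatic instance generator can be illustrated with a minimal Python sketch. It is not the generator shipped with BowlognaBench: the function name `generate_instances`, the predicates `enrolledAt` and `followsTrack`, and the `StudyTrack` identifiers are hypothetical, chosen only to show how the number of universities and the number of instances can be varied independently.

```python
import random


def generate_instances(num_universities, students_per_university, seed=0):
    """Return a list of (subject, predicate, object) triples.

    All identifiers and predicates here are illustrative assumptions,
    not names taken from the Bowlogna Ontology.
    """
    rng = random.Random(seed)  # seeded so that runs are repeatable
    tracks = ["StudyTrack0", "StudyTrack1", "StudyTrack2"]
    triples = []
    for u in range(num_universities):
        university = f"University{u}"
        triples.append((university, "rdf:type", "University"))
        for s in range(students_per_university):
            student = f"{university}/Student{s}"
            triples.append((student, "rdf:type", "Student"))
            triples.append((student, "enrolledAt", university))
            triples.append((student, "followsTrack", rng.choice(tracks)))
    return triples


if __name__ == "__main__":
    data = generate_instances(num_universities=2, students_per_university=3)
    print(len(data), "triples generated")
```

Scaling either argument produces data sets of different sizes, which is the kind of parameter sweep the slide describes.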
  11. Analytic Queries
     •  Count
     •  Molecule
        –  Query 4. Return all information about Student0 within a scope of two
     •  Max Min
     •  Ranking and TopK
     •  Temporal
        –  Query 8. What is the average completion time of Bachelor studies for each Study Track?
     •  Path
     •  Multiple Universities
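To make the temporal query concrete, the following Python sketch computes what Query 8 asks for over a handful of in-memory records. It is only an illustration of the aggregation, not the benchmark's SPARQL query: the record layout (student, study track, start date, end date) and the sample values are assumptions.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample data: (student, study track, Bachelor start, Bachelor end).
SAMPLE_RECORDS = [
    ("Student0", "TrackA", date(2005, 9, 1), date(2008, 9, 1)),
    ("Student1", "TrackA", date(2005, 9, 1), date(2009, 9, 1)),
    ("Student2", "TrackB", date(2006, 9, 1), date(2009, 9, 1)),
]


def average_completion_days(records):
    """Group records by study track and average the completion time in days."""
    totals = defaultdict(lambda: [0, 0])  # track -> [sum of days, count]
    for _student, track, start, end in records:
        totals[track][0] += (end - start).days
        totals[track][1] += 1
    return {track: days / count for track, (days, count) in totals.items()}


if __name__ == "__main__":
    for track, days in sorted(average_completion_days(SAMPLE_RECORDS).items()):
        print(f"{track}: {days:.1f} days on average")
```

The query touches every completed Bachelor enrolment and returns one aggregated value per track, which is what makes it analytic rather than a simple lookup.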
  12. Query Classification
     We classify a query as having a large input size if it involves more than 5% of instances, and small otherwise. Selectivity measures the number of instances that match the query: we classify a query as having high selectivity if less than 10% of instances match the query, and low otherwise. Complexity measures the number of classes and properties involved in the query: queries are classified as having high or low complexity according to the RDF schema we have defined.

     Table 1. Classification of queries according to their need to access private and public data, input size, selectivity, and complexity.

     Query categories: Count, Molecule, MaxMin, TopK, Temp, Path, MultiUniv

     Query        1      2      3      4      5      6      7      8      9      10     11     12
     Input Size   Small  Large  Small  Small  Large  Small  Large  Large  Large  Large  Large  Large
     Selectivity  High   Low    Low    Low    Low    Low    High   Low    Low    Low    Low    Low
     Complexity   Low    Low    Low    High   High   Low    High   Low    High   High   Low    High

     Public       x x x x x x x x   (8 of the 12 queries)
     Private      x x x x x x x x   (8 of the 12 queries)

     As we can see, the majority of queries have low selectivity, which reflects our intent of performing analytic queries, that is, queries for which a lot of data is retrieved and aggregated. For the same reason, most of the queries have a large input. Finally, queries are equally divided between high and low complexity.
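The two numeric thresholds in the classification above can be written down directly. The sketch below is a plain restatement of the 5% and 10% rules in Python; the function names and the way instance counts are passed in are assumptions, and complexity is left out because the slide gives no numeric rule for it.

```python
def classify_input_size(instances_involved, total_instances):
    """'Large' if the query involves more than 5% of all instances."""
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return "Large" if instances_involved * 100 > 5 * total_instances else "Small"


def classify_selectivity(instances_matched, total_instances):
    """'High' if fewer than 10% of all instances match the query."""
    return "High" if instances_matched * 100 < 10 * total_instances else "Low"


if __name__ == "__main__":
    total = 1000
    print(classify_input_size(600, total), classify_selectivity(30, total))
```

Note that high selectivity means few matching instances, so an analytic query that retrieves and aggregates a lot of data is classified as low selectivity.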
  13. From the process analyst's point of view
     •  Which system should I pick for my specific problem?
        –  Not looking for the best system overall
        –  Look at problem-specific query sets
  14. Which system to use?

                Count     Path      Rank      Temporal
                Queries   Queries   Queries   Queries
     System A   0.5s      5s        0.1s      2s
     System B   3s        0.4s      2s        1s
     System C   0.5s      0.5s      0.5s      0.5s
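The point of the table is that the right choice depends on the query mix. A small Python sketch, using the figures shown on the slide, makes this explicit. The workload representation (a mapping from query type to the number of queries of that type) and the function names are assumptions made for the illustration.

```python
# Per-query times in seconds, copied from the slide's table.
TIMINGS = {
    "System A": {"count": 0.5, "path": 5.0, "rank": 0.1, "temporal": 2.0},
    "System B": {"count": 3.0, "path": 0.4, "rank": 2.0, "temporal": 1.0},
    "System C": {"count": 0.5, "path": 0.5, "rank": 0.5, "temporal": 0.5},
}


def total_time(system, workload):
    """Total time in seconds for a workload given as {query type: count}."""
    return sum(TIMINGS[system][kind] * n for kind, n in workload.items())


def best_system(workload):
    """Return the system with the lowest total time for this workload."""
    return min(TIMINGS, key=lambda system: total_time(system, workload))


if __name__ == "__main__":
    for name, workload in [
        ("ranking only", {"rank": 10}),
        ("path only", {"path": 10}),
        ("even mix", {"count": 1, "path": 1, "rank": 1, "temporal": 1}),
    ]:
        print(name, "->", best_system(workload))
```

With these figures a ranking-only workload favours System A, a path-only workload favours System B, and an even mix favours System C, which is why a problem-specific query set matters more than a single overall winner.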
  15. Conclusions
     •  BowlognaBench for analytic queries
     •  OWL ontology for higher education systems
     •  Next steps
        –  Run a comparative evaluation of RDF systems
        –  Set up a wiki-like space where groups can upload experimental results
  16. http://diuf.unifr.ch/xi/bowlognabench/
