RDF Analytics... SPARQL and Beyond


Transcript

  • 1. RDF Analytics… SPARQL and Beyond… Fadi Maali (@fsheer, fadi.maali@deri.org). Digital Enterprise Research Institute (www.deri.ie), Enabling networked knowledge. © Copyright 2011 Digital Enterprise Research Institute. All rights reserved.
  • 2. Why analytics (1/2)
  • 3. Why analytics (2/2)
  • 4. Appetite Whetting (1/3) Google accurately detects flu trends ahead of the U.S. Centers for Disease Control and Prevention. http://www.google.org/flutrends/about/how.html
  • 5. Appetite Whetting (2/3) http://www.dailymail.co.uk/sciencetech/article-2120416/Twitter-predicts-stock-prices-accurately-investment-tactic-say-scientists.html
  • 6. Appetite Whetting (3/3) Flavor pyramids for North American and East Asian cuisines. http://www.nature.com/srep/2011/111215/srep00196/full/srep00196.html
  • 7. Data Science and RDF
      • Can we do “data science” using RDF data?
          • Do we have the data?
          • Do we have the tools?
      • Why should we use RDF?
  • 8. RDF Characteristics
      • Graph data model
      • Clearly defined semantics
      • Supports Web-scale distributed publication
  • 9. Available RDF Data
      • Freebase has 1.2 billion triples (Google)
      • The LOD Cloud has more than 31 billion triples (http://lod-cloud.net/)
      • Embedded RDF data: schema.org, Drupal…
  • 10. Available RDF Tools. In this presentation we focus on the standard, SPARQL:
      • W3C Recommendation
      • Supports querying, transforming and updating RDF data
      • Large number of available implementations
      • Defines a communication protocol
      • 427 public SPARQL endpoints registered on the DataHub*
      * http://sw.deri.org/~aidanh/docs/epmonitorISWC.pdf
  • 11. RDF Data… a graph
  • 12. SPARQL… Simple queries
        SELECT ?name
        WHERE {
          ?p :name ?name .
        }
        ORDER BY ?name
  • 13-14. SPARQL… BI queries
        SELECT ?gender (COUNT(*) AS ?count)
        WHERE {
          ?p :gender ?gender
        }
        GROUP BY ?gender
  • 15-16. SPARQL… BI queries
        SELECT ?name (COUNT(?n) AS ?neighbours)
        WHERE {
          ?p :knows ?n .
          ?p :name ?name .
        }
        GROUP BY ?p ?name
        ORDER BY DESC(?neighbours)
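The neighbour-count query above is an out-degree ranking. As a rough sketch of the same computation outside SPARQL, the Python below counts outgoing :knows edges per person and sorts by that count. The edge list, the identifiers and the helper name `rank_by_out_degree` are illustrative assumptions, not data or code from the slides.

```python
from collections import Counter

# Hypothetical :knows edges (subject, object) and :name values; not from the slides.
KNOWS = [("p1", "p2"), ("p1", "p3"), ("p2", "p3"), ("p3", "p1")]
NAMES = {"p1": "Alice", "p2": "Bob", "p3": "Carol"}


def rank_by_out_degree(knows, names):
    """Return (name, neighbour count) rows, highest count first.

    Mirrors GROUP BY ?p ?name with COUNT(?n) and ORDER BY DESC(?neighbours).
    Ties are ordered by name so the output is deterministic (an added
    assumption; the SPARQL query leaves tie order unspecified).
    """
    degree = Counter(subject for subject, _ in knows)
    # Like the join on :name, people without a name are left out.
    rows = [(names[p], count) for p, count in degree.items() if p in names]
    return sorted(rows, key=lambda row: (-row[1], row[0]))


ranking = rank_by_out_degree(KNOWS, NAMES)
```

People who know nobody do not appear in the result, which matches the query: without a `?p :knows ?n` match there is no row to count.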
  • 17. SPARQL… BI queries
      • How influential a person is within a social network
      • How important a road is within an urban network
      • How central an employee is in an enterprise
  • 18. SPARQL… Graph measure. Can we use SPARQL to compute shortest paths in the graph? Short answer: NO! Long answer: Let’s try!
  • 19. SPARQL… graph measure
        SELECT ?v1 ?v2 (MIN(?l) AS ?shortestPath)
        WHERE {
          {
            ?v1 :knows ?v2 .
            BIND (1 AS ?l)
          } UNION {
            ?v1 :knows/:knows ?v2 .
            BIND (2 AS ?l)
          } UNION {
            ?v1 :knows/:knows/:knows ?v2 .
            BIND (3 AS ?l)
          }
          FILTER (?v1 != ?v2)
        }
        GROUP BY ?v1 ?v2
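The query above only finds paths of up to three hops, because every path length needs its own UNION branch. For comparison, here is a minimal breadth-first search sketch in Python that handles both the bounded and the unbounded case. The adjacency map and the function name `shortest_paths` are assumptions made for illustration.

```python
from collections import deque

# Hypothetical directed :knows graph as an adjacency map; not from the slides.
KNOWS = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["e"]}


def shortest_paths(graph, source, max_hops=None):
    """Return {target: hop count} for nodes reachable from `source`.

    Breadth-first search visits nodes in order of distance, so the first
    time a node is seen its distance is the shortest one. With `max_hops`
    set, the search stops expanding at that depth, like the fixed number
    of UNION branches in the SPARQL query.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if max_hops is not None and dist[node] >= max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, []):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    del dist[source]  # mirrors FILTER (?v1 != ?v2)
    return dist


bounded = shortest_paths(KNOWS, "a", max_hops=3)
unbounded = shortest_paths(KNOWS, "a")
```

With `max_hops=3` the node `e`, four hops away, is missing from the result, which is the limitation behind the "short answer: NO" on the previous slide.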
  • 20. SPARQL… graph measure
  • 21. SPARQL… graph measure
      • Finding directions between physical locations
      • Finding the most direct way to contact a person
      • Finding the min-delay communication path
  • 22. SPARQL… clustering. Can we do clustering using SPARQL? YES! Peer-pressure algorithm implemented using (almost only) SPARQL*
      * http://yarcdata.com/blog/?p=318
  • 23-24. SPARQL… clustering
        DROP GRAPH <urn:ga/g/xjz1> ;
        CREATE GRAPH <urn:ga/g/xjz1> ;
        INSERT { GRAPH <urn:ga/g/xjz1> { ?s :cluster ?clus3 } }
        WHERE {
          SELECT ?s (SAMPLE(?clus) AS ?clus3) {
            { SELECT ?s (MAX(?clusCt) AS ?maxClusCt)
              { SELECT ?s ?clus (COUNT(?clus) AS ?clusCt) WHERE {
                  ?s :knows ?o .
                  GRAPH <urn:ga/g/xjz0> { ?o :cluster ?clus }
                } GROUP BY ?s ?clus
              } GROUP BY ?s
            }
            { SELECT ?s ?clus (COUNT(?clus) AS ?clusCt) WHERE {
                ?s :knows ?o .
                GRAPH <urn:ga/g/xjz0> { ?o :cluster ?clus }
              } GROUP BY ?s ?clus
            }
            FILTER (?clusCt = ?maxClusCt)
          } GROUP BY ?s
        }
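The update above performs one round of peer-pressure clustering: each vertex reads its neighbours' cluster labels from the previous graph and adopts the most frequent one. The Python below is a small sketch of that round plus a driver loop, not the original implementation. The sample graph and function names are assumptions. Ties are broken with `min` where the query uses SAMPLE, and a vertex with no labelled neighbours keeps its label, whereas the query would drop it.

```python
from collections import Counter

# Hypothetical directed :knows graph; not from the slides.
KNOWS = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b"],
    "d": ["a", "b"],
}


def peer_pressure_step(knows, clusters):
    """One synchronous round: every vertex adopts its neighbours' top label."""
    updated = {}
    for vertex, neighbours in knows.items():
        votes = Counter(clusters[o] for o in neighbours if o in clusters)
        if not votes:
            # Assumption: keep the old label when no neighbour is labelled.
            updated[vertex] = clusters.get(vertex, vertex)
            continue
        top = max(votes.values())
        # Assumption: deterministic tie-break instead of SPARQL's SAMPLE.
        updated[vertex] = min(c for c, n in votes.items() if n == top)
    return updated


def peer_pressure(knows, max_rounds=10):
    """Start with one cluster per vertex and iterate until nothing changes."""
    clusters = {vertex: vertex for vertex in knows}
    for _ in range(max_rounds):
        updated = peer_pressure_step(knows, clusters)
        if updated == clusters:
            break
        clusters = updated
    return clusters


first_round = peer_pressure_step(KNOWS, {v: v for v in KNOWS})
final = peer_pressure(KNOWS)
```

The `max_rounds` cap matters: with synchronous updates some graphs, such as two vertices that only know each other, swap labels forever instead of converging.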
  • 25. SPARQL Expressivity
      • BI-like operations (rollup and drilldown)
      • Graph measures
      • Iterative algorithms (clustering)
  • 26. SPARQL Scalability… One approach is to use a scale-out architecture… think MapReduce or Hadoop
      • Translate SPARQL into MapReduce
      • Process RDF data directly in MapReduce
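To make "process RDF data directly in MapReduce" concrete, here is a single-process Python sketch of the map, shuffle and reduce phases, applied to the count-by-gender query from earlier in the deck. It only illustrates the programming model and uses no Hadoop API. The partitions, the triples and all function names are assumptions.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical triples split into two partitions, as if stored on two nodes.
PARTITIONS = [
    [("alice", "gender", "female"), ("alice", "name", "Alice")],
    [("bob", "gender", "male"), ("carol", "gender", "female")],
]


def map_triple(triple, predicate):
    """Map phase: emit (object, 1) for each triple with the given predicate."""
    _subject, p, obj = triple
    if p == predicate:
        yield obj, 1


def shuffle(pairs):
    """Shuffle phase: group emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce phase: sum the values collected for one key."""
    return key, sum(values)


def run_job(partitions, predicate):
    """Run map over every partition, then shuffle and reduce the results."""
    mapped = chain.from_iterable(
        map_triple(triple, predicate)
        for partition in partitions
        for triple in partition
    )
    return dict(reduce_counts(k, vs) for k, vs in shuffle(mapped).items())


gender_counts = run_job(PARTITIONS, "gender")
```

A single-pattern aggregate like this fits one MapReduce job. Queries that join several triple patterns typically need additional jobs, which is one reason translating SPARQL into MapReduce is harder than this sketch suggests.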
  • 27. Conclusion
      • Can we do “data science” using RDF data?
          • Do we have the data? YES
          • Do we have the tools? Almost
              • Is SPARQL expressive enough? Almost
              • Does it scale? Yes in principle, no in practice
              • Is it usable/easy? Not really
      All examples used in this presentation, and Pig Latin equivalents of some of them, are available at: https://github.com/fadmaa/rdf-analytics