Digital Enterprise Research Institute www.deri.ie
Enabling networked knowledge

RDF Analytics... SPARQL and Beyond

  1. RDF Analytics… SPARQL and Beyond…
     Fadi Maali (@fsheer), fadi.maali@deri.org
     © Copyright 2011 Digital Enterprise Research Institute. All rights reserved.
  2. Why analytics (1/2)
  3. Why analytics (2/2)
  4. Appetite Whetting (1/3)
     Google accurately detects flu trends ahead of the U.S. Centers for Disease Control and Prevention.
     http://www.google.org/flutrends/about/how.html
  5. Appetite Whetting (2/3)
     http://www.dailymail.co.uk/sciencetech/article-2120416/Twitter-predicts-stock-prices-accurately-investment-tactic-say-scientists.html
  6. Appetite Whetting (3/3)
     Flavor pyramids for North American and East Asian cuisines
     http://www.nature.com/srep/2011/111215/srep00196/full/srep00196.html
  7. Data Science and RDF
     - Can we do “data science” using RDF data?
         - Do we have the data?
         - Do we have the tools?
     - Why should we use RDF?
  8. RDF Characteristics
     - Graph data model
     - Clearly defined semantics
     - Supports Web-scale distributed publication
  9. Available RDF Data
     - Freebase has 1.2 billion triples (Google)
     - The LOD Cloud has more than 31 billion triples
     - Embedded RDF data: schema.org, Drupal…
     http://lod-cloud.net/
  10. Available RDF Tools
     This presentation focuses on SPARQL, the standard query language for RDF:
     - W3C Recommendation
     - Supports querying, transforming and updating RDF data
     - Large number of available implementations
     - Defines a communication protocol
     - 427 public SPARQL endpoints registered on the DataHub*
     * http://sw.deri.org/~aidanh/docs/epmonitorISWC.pdf
  11. RDF Data… a graph
  12. SPARQL… Simple queries

         SELECT ?name
         WHERE {
           ?p :name ?name .
         }
         ORDER BY ?name
  13. SPARQL… BI queries

         SELECT ?gender (COUNT(*) AS ?count)
         WHERE {
           ?p :gender ?gender
         }
         GROUP BY ?gender
  15. SPARQL… BI queries

         SELECT ?name (COUNT(?n) AS ?neighbours)
         WHERE {
           ?p :knows ?n .
           ?p :name ?name .
         }
         GROUP BY ?p ?name
         ORDER BY DESC(?neighbours)
  17. SPARQL… BI queries
     - How influential a person is within a social network
     - How central a road is within an urban network
     - How central an employee is in an enterprise
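
     The neighbour-count query can be mirrored outside a triple store. The following is a minimal sketch, assuming a small hypothetical edge list (`knows`) in place of the `:knows` triples; the names and data are illustrative and not taken from the presentation.

```python
from collections import Counter

# Hypothetical edge list: (person, person they know), mirroring ?p :knows ?n.
knows = [
    ("alice", "bob"),
    ("alice", "carol"),
    ("bob", "carol"),
    ("carol", "alice"),
    ("carol", "bob"),
    ("carol", "dave"),
]


def neighbour_counts(edges):
    """Count outgoing :knows edges per person, highest count first."""
    counts = Counter(person for person, _ in edges)
    # Sort by descending count, then by name so that ties are deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


ranking = neighbour_counts(knows)
print(ranking)
```

     As in the SPARQL version, a person with no outgoing `:knows` edge (here `dave`) does not appear in the result at all.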
  18. SPARQL… Graph measure
     Can we use SPARQL to compute shortest paths in the graph?
     Short answer: NO!
     Long answer: Let’s try!
  19. SPARQL… graph measure

         SELECT ?v1 ?v2 (MIN(?l) AS ?shortestPath)
         WHERE {
           { ?v1 :knows ?v2 . BIND (1 AS ?l) }
           UNION
           # :knows{2} exists only in SPARQL 1.1 drafts; written here as a sequence path.
           { ?v1 :knows/:knows ?v2 . BIND (2 AS ?l) }
           UNION
           { ?v1 :knows/:knows/:knows ?v2 . BIND (3 AS ?l) }
           FILTER (?v1 != ?v2)
         }
         GROUP BY ?v1 ?v2
  20. SPARQL… graph measure
  21. SPARQL… graph measure
     - finding directions between physical locations
     - finding the most direct way to contact a person
     - finding the min-delay communication path
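
     The shortest-path query only enumerates paths of up to three hops, which is why the short answer is "no". A minimal sketch of the difference, assuming a hypothetical adjacency map `knows` and a breadth-first search that is not part of the presentation:

```python
from collections import deque

# Hypothetical directed :knows graph forming a chain of five people.
knows = {
    "alice": ["bob"],
    "bob": ["carol"],
    "carol": ["dave"],
    "dave": ["erin"],
    "erin": [],
}


def shortest_path_lengths(graph, source, max_hops=None):
    """Breadth-first search; returns {target: hops}, excluding the source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if max_hops is not None and dist[node] >= max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in graph.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    del dist[source]  # mirrors FILTER (?v1 != ?v2)
    return dist


bounded = shortest_path_lengths(knows, "alice", max_hops=3)
unbounded = shortest_path_lengths(knows, "alice")
print(bounded)
print(unbounded)
```

     With `max_hops=3` the search behaves like the three-way UNION and never reaches `erin`; without the limit it finds her at four hops.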
  22. SPARQL… clustering
     Can we do clustering using SPARQL? YES!
     Peer-pressure algorithm implemented using (almost only) SPARQL*
     * http://yarcdata.com/blog/?p=318
  23. SPARQL… clustering

         DROP GRAPH <urn:ga/g/xjz1> ;
         CREATE GRAPH <urn:ga/g/xjz1> ;
         INSERT { GRAPH <urn:ga/g/xjz1> { ?s :cluster ?clus3 } }
         WHERE {
           SELECT ?s (SAMPLE(?clus) AS ?clus3) {
             { SELECT ?s (MAX(?clusCt) AS ?maxClusCt)
               { SELECT ?s ?clus (COUNT(?clus) AS ?clusCt) WHERE {
                   ?s :knows ?o .
                   GRAPH <urn:ga/g/xjz0> { ?o :cluster ?clus }
                 } GROUP BY ?s ?clus
               } GROUP BY ?s
             }
             { SELECT ?s ?clus (COUNT(?clus) AS ?clusCt) WHERE {
                 ?s :knows ?o .
                 GRAPH <urn:ga/g/xjz0> { ?o :cluster ?clus }
               } GROUP BY ?s ?clus
             }
             FILTER (?clusCt = ?maxClusCt)
           } GROUP BY ?s
         }
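
     The update above performs a single round: each node adopts the cluster that is most common among the nodes it knows. A minimal sketch of that round plus a driver loop follows. The loop, the initial assignment (every node in its own cluster), the `max_rounds` cap, and the tie-breaking rule are assumptions for illustration, not details taken from the presentation or the referenced blog post.

```python
from collections import Counter


def peer_pressure_round(knows, cluster):
    """One round: each node adopts the most common cluster among nodes it knows."""
    updated = {}
    for node, neighbours in knows.items():
        votes = Counter(cluster[n] for n in neighbours if n in cluster)
        if not votes:
            continue  # nodes with no clustered neighbours get no assignment
        top = max(votes.values())
        # SAMPLE picks an arbitrary winner on ties; min() is used for determinism.
        updated[node] = min(c for c, v in votes.items() if v == top)
    return updated


def peer_pressure(knows, max_rounds=20):
    """Repeat rounds until assignments stop changing (or max_rounds is hit)."""
    cluster = {node: node for node in knows}  # assumed start: one cluster per node
    for _ in range(max_rounds):
        updated = peer_pressure_round(knows, cluster)
        if updated == cluster:
            break
        cluster = updated
    return cluster


# Hypothetical graph: two disconnected triangles.
knows = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"],
    "d": ["e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
result = peer_pressure(knows)
print(result)
```

     Swapping the roles of the two named graphs between rounds plays the part of the `cluster = updated` assignment here. Synchronous updates like this can oscillate on some graphs, hence the cap on rounds.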
  25. SPARQL Expressivity
     - BI-like operations (rollup and drilldown)
     - Graph measures
     - Iterative algorithms (clustering)
  26. SPARQL Scalability…
     One approach is to use a scale-out architecture… think MapReduce or Hadoop
     - Translate SPARQL into MapReduce
     - Process RDF data directly in MapReduce
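
     To illustrate what translating SPARQL into MapReduce can look like, here is a minimal single-process sketch of a GROUP BY count over `:gender` triples expressed as map, shuffle and reduce steps. The triples and function names are hypothetical, and a real deployment would run these phases on a framework such as Hadoop rather than in one Python process.

```python
from collections import defaultdict

# Hypothetical triples as (subject, predicate, object).
triples = [
    ("p1", ":gender", "female"),
    ("p2", ":gender", "male"),
    ("p3", ":gender", "female"),
    ("p1", ":name", "Alice"),
]


def map_phase(triple):
    """Emit (gender, 1) for each triple matching ?p :gender ?gender."""
    _, predicate, obj = triple
    if predicate == ":gender":
        yield obj, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """COUNT(*) for one group."""
    return key, sum(values)


pairs = [pair for t in triples for pair in map_phase(t)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)
```

     The triple pattern becomes the map-side filter, GROUP BY becomes the shuffle key, and the aggregate becomes the reducer.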
  27. Conclusion
     - Can we do “data science” using RDF data?
         - Do we have the data? YES
         - Do we have the tools? Almost
             - Is SPARQL expressive enough? Almost
             - Does it scale? Yes… in principle, No in practice
             - Is it usable/easy? Not really
     All examples used in this presentation, and Pig Latin equivalents of some of them, are available at:
     https://github.com/fadmaa/rdf-analytics