Functional manipulations of large data graphs 20160601

Talk presented at The University of Queensland's Data & Knowledge Engineering Lab (DKE) within the School of ITEE on 1 June 2016.

  1. 1. [Figure: Linking Open Data cloud diagram] Functional Manipulation of Large Data Graphs. David Hyland-Wood, david.wood@ephox.com, @prototypo. 1 June 2016
  2. 2. Something --[a relationship]--> Something else
  3. 3. UQ --[is a]--> University
  4. 4. UQ --[label]--> The University of Queensland
        UQ --[is a]--> University
        UQ --[affiliation]--> Group of 8
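The subject, predicate, object statements drawn on the slide above can be held as plain values. The following is a minimal sketch, not code from the talk: `Triple`, `TripleSketch` and `objectsOf` are hypothetical names, and the short strings stand in for the full IRIs a real RDF store would use.

```scala
// A minimal sketch: RDF-style statements as plain Scala values.
// Triple, TripleSketch and objectsOf are hypothetical names, not from the talk.
final case class Triple(subject: String, predicate: String, obj: String)

object TripleSketch {
  // The three statements drawn on the slide.
  val triples: List[Triple] = List(
    Triple("UQ", "is a", "University"),
    Triple("UQ", "label", "The University of Queensland"),
    Triple("UQ", "affiliation", "Group of 8")
  )

  // All objects attached to `subject` by `predicate`.
  def objectsOf(subject: String, predicate: String): List[String] =
    triples.collect { case Triple(`subject`, `predicate`, o) => o }

  def main(args: Array[String]): Unit =
    triples.foreach(t => println(s"${t.subject} --[${t.predicate}]--> ${t.obj}"))
}
```

Because every statement has the same three-part shape, adding a new fact about UQ means appending one more `Triple`, with no schema change.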
  5. 5. We’ve Seen This Before
  6. 6. 08 Oct 2007
  7. 7. The RDF Data Model
        Standard serialisation formats:
        • Turtle, TriG, N-Triples, N-Quads (the Turtle family of RDF formats)
        • JSON-LD
        • RDFa
        • RDF/XML
        Possibly lossy alternatives:
        • CSV
        • ODATA
        • etc.
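Of the formats listed above, N-Triples is the simplest to produce by hand: one statement per line, terminated by a full stop. The sketch below writes a single language-tagged label statement. `NTriplesSketch`, `escape` and `literalTriple` are hypothetical names, the escaping covers only the backslash, double quote, newline and carriage return, and a real project would more likely use an RDF library.

```scala
object NTriplesSketch {
  // Escape the characters that must not appear raw inside an N-Triples literal.
  // The backslash is handled first so later escapes are not doubled.
  def escape(s: String): String =
    s.replace("\\", "\\\\")
      .replace("\"", "\\\"")
      .replace("\n", "\\n")
      .replace("\r", "\\r")

  // One N-Triples line: IRI subject, IRI predicate, language-tagged literal object.
  def literalTriple(s: String, p: String, value: String, lang: String): String =
    s"""<$s> <$p> "${escape(value)}"@$lang ."""

  def main(args: Array[String]): Unit =
    println(literalTriple(
      "http://dbpedia.org/resource/University_of_Queensland",
      "http://www.w3.org/2000/01/rdf-schema#label",
      "The University of Queensland",
      "en"))
}
```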
  8. 8. Source page: https://en.wikipedia.org/wiki/University_of_Queensland
        $ curl http://dbpedia.org/page/University_of_Queensland
          → HTML
        $ curl http://dbpedia.org/data/University_of_Queensland
          → RDF in XML (Yuck!)
        $ curl http://dbpedia.org/data/University_of_Queensland.n3 > University_of_Queensland.n3
          → Many formats, e.g. sane RDF, ODATA, Microdata, JSON…
  9. 9. UQ --[label]--> The University of Queensland
        UQ --[affiliation]--> Group of 8
        UQ --[number of undergraduate students]--> 34228
        UQ --[number of students]--> 48771
  10. 10. # G8 universities ordered by the number of students
        # at each university.
        PREFIX dbo: <http://dbpedia.org/ontology/>
        # rdfs: is predefined on the DBpedia endpoint; declared here for portability.
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        select ?name ?students ?undergrads
        where {
          ?s dbo:affiliation <http://dbpedia.org/resource/Group_of_Eight_(Australian_universities)> .
          ?s rdfs:label ?name .
          OPTIONAL { ?s dbo:numberOfStudents ?students }
          OPTIONAL { ?s dbo:numberOfUndergraduateStudents ?undergrads }
          FILTER ( lang(?name) = "en" )
        }
        ORDER BY DESC (?students)
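One way to run the query above from code is to send it over HTTP as a URL-encoded `query` parameter, as the SPARQL 1.1 Protocol allows. The sketch below only builds the request URL and makes no network call. It assumes DBpedia's public endpoint at http://dbpedia.org/sparql, which the slide does not name, and it adds an explicit `rdfs:` prefix declaration so the query does not depend on endpoint defaults. `SparqlRequestSketch` and `requestUrl` are hypothetical names.

```scala
import java.net.URLEncoder
import java.nio.charset.StandardCharsets

object SparqlRequestSketch {
  // Assumption: DBpedia's public SPARQL endpoint (not named on the slide).
  val endpoint = "http://dbpedia.org/sparql"

  // The slide's query, with an explicit rdfs: prefix added.
  val query: String =
    """PREFIX dbo: <http://dbpedia.org/ontology/>
      |PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
      |SELECT ?name ?students ?undergrads WHERE {
      |  ?s dbo:affiliation <http://dbpedia.org/resource/Group_of_Eight_(Australian_universities)> .
      |  ?s rdfs:label ?name .
      |  OPTIONAL { ?s dbo:numberOfStudents ?students }
      |  OPTIONAL { ?s dbo:numberOfUndergraduateStudents ?undergrads }
      |  FILTER ( lang(?name) = "en" )
      |}
      |ORDER BY DESC(?students)""".stripMargin

  // Build a GET URL carrying the query as a form-encoded `query` parameter.
  def requestUrl(endpoint: String, query: String): String =
    endpoint + "?query=" + URLEncoder.encode(query, StandardCharsets.UTF_8.name())

  def main(args: Array[String]): Unit =
    println(requestUrl(endpoint, query))
}
```

The printed URL can be passed to `curl`, in the same way as the DBpedia data URLs earlier in the talk.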
  18. 18. OpenStreetMap, Wikimedia Commons, DBpedia, US EPA RCRA, US EPA FRS, ABT Associates
  19. 19. [Graph diagram] Nodes and their labels:
        • UQ: The University of Queensland
        • ANU: Australian National University
        • Monash: Monash University
        • UMelbourne: University of Melbourne
        • UNSW: University of NSW
        • USydney: University of Sydney
        • UAdelaide: University of Adelaide
        • Go8: Group of 8
        Edges: five "affiliation" edges and seven "memberOf" edges
  20. 20. [Graph diagram] Nodes:
        • UQ (label: The University of Queensland)
        • ANU (label: Australian National University)
        • Monash, UMelbourne, UNSW, USydney, UAdelaide
        Edges: five "affiliation" edges
  21. 21. Graphs in Scala
        import org.apache.spark.graphx._

        // vertexRDD and edgeRDD are built earlier in the talk's code (not shown on the slide).
        val graph: Graph[String, String] = Graph(vertexRDD, edgeRDD)

        // Create a subgraph based on the vertices connected
        // by an "affiliation" property.
        val affiliationRelatedSubgraph =
          graph.subgraph(t => t.attr == "http://dbpedia.org/ontology/affiliation")

        // Find connected components of affiliationRelatedSubgraph.
        val ccGraph = affiliationRelatedSubgraph.connectedComponents()
  22. 22. Graphs in Scala
        import scala.collection.mutable.{HashMap, ListBuffer}

        // Create a hashmap of componentLists.
        // The declaration is not on the slide; it is added so the fragment stands alone.
        val componentLists = HashMap[VertexId, ListBuffer[VertexId]]()

        affiliationRelatedSubgraph.vertices.leftJoin(ccGraph.vertices) {
          case (id, u, comp) => comp.get
        }.collect().foreach { // collect() first, so the map is updated on the driver
          case (id, startingNode) =>
            if (!componentLists.contains(startingNode)) {
              componentLists(startingNode) = new ListBuffer[VertexId]
            }
            componentLists(startingNode) += id
        }
  23. 23. Graphs in Scala
        // Output a report on the connected components.
        // labelMap (vertex id to label) is built elsewhere in the talk's code.
        println("------ connected components in related triples ------\n")
        for ((component, componentList) <- componentLists) {
          if (componentList.size > 1) {
            for (c <- componentList) {
              println(labelMap(c))
            }
            println("--------------------------")
          }
        }
  24. 24. ------ connected components in related triples ------
        Australian National University
        University of Sydney
        University of Adelaide
        University of New South Wales
        --------------------------
        The University of Queensland
        University of Melbourne
        Monash University
        --------------------------
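The grouping reported above comes from GraphX's `connectedComponents()`. To show the idea without a Spark cluster, here is a minimal plain-Scala sketch that groups vertices by reachability, treating edges as undirected. It is a stand-in technique (a depth-first walk over an adjacency map), not GraphX's distributed algorithm. `ComponentsSketch` is a hypothetical name, and the vertex ids and edges in `main` are made up for illustration, not taken from the DBpedia data.

```scala
object ComponentsSketch {
  // Group vertices into connected components, treating edges as undirected.
  def connectedComponents(vertices: Set[Long], edges: Seq[(Long, Long)]): Set[Set[Long]] = {
    // Adjacency map built from both directions of every edge.
    val neighbours: Map[Long, Set[Long]] =
      edges.flatMap { case (a, b) => Seq(a -> b, b -> a) }
        .groupBy(_._1)
        .map { case (v, pairs) => v -> pairs.map(_._2).toSet }

    // Depth-first walk collecting every vertex reachable from the frontier.
    @annotation.tailrec
    def reach(frontier: List[Long], seen: Set[Long]): Set[Long] = frontier match {
      case Nil                  => seen
      case v :: rest if seen(v) => reach(rest, seen)
      case v :: rest =>
        reach(neighbours.getOrElse(v, Set.empty[Long]).toList ::: rest, seen + v)
    }

    // Start a new walk from each vertex not already placed in a component.
    vertices.foldLeft(Set.empty[Set[Long]]) { (components, v) =>
      if (components.exists(_.contains(v))) components
      else components + reach(List(v), Set.empty[Long])
    }
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical vertex ids and edges, for illustration only.
    val vertices = Set(1L, 2L, 3L, 4L, 5L, 6L)
    val edges = Seq((1L, 2L), (2L, 3L), (4L, 5L))
    connectedComponents(vertices, edges)
      .filter(_.size > 1) // like the talk's report, skip single-vertex components
      .foreach(c => println(c.toList.sorted.mkString(", ")))
  }
}
```

GraphX represents the same result differently: it labels each vertex with the lowest vertex id in its component, which is why the talk's code regroups vertices by that id into `componentLists`.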
  25. 25. Resources
        • Slides: http://w3id.org/people/prototypo/talks/UQ-DKE-20160601/slides
        • Code: http://w3id.org/people/prototypo/talks/UQ-DKE-20160601/code
  26. 26. Resources
        • Callimachus: http://callimachusproject.org
        • Apache Spark: http://spark.apache.org
        • GraphX Programming Guide: http://spark.apache.org/docs/latest/graphx-programming-guide.html
  27. 27. Attributions • Linking Open Data cloud diagram by Richard Cyganiak and Anja Jentzsch, used under a CC license: http://lod-cloud.net/
  28. 28. This work is Copyright © 2015 David Hyland-Wood. It is licensed under the Creative Commons Attribution 3.0 Unported License. Full details at: http://creativecommons.org/licenses/by/3.0/
        You are free:
        • to Share: to copy, distribute and transmit the work
        • to Remix: to adapt the work
        Under the following conditions:
        • Attribution. You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).
        • Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same or similar license to this one.