BigData & Wikidata - no lies
SPARQL queries on DBPedia
Camelia Boban
GDG Meets U event - Big data & Wikidata - no lies codelab


RDF triple information from Wikipedia. Making SPARQL queries on the DBpedia endpoint.

Published in: Technology
Transcript of "GDG Meets U event - Big data & Wikidata - no lies codelab"

  1. BigData & Wikidata - no lies. SPARQL queries on DBpedia. Camelia Boban
  3. Resources for the codelab:
     Eclipse Luna for J2EE developers - https://www.eclipse.org/downloads/index-developer.php
     Java SE 1.8 - http://www.oracle.com/technetwork/java/javase/downloads/jre8-downloads-2133155.html
     Apache Tomcat 8.0.5 - http://tomcat.apache.org/download-80.cgi
     Axis2 1.6.2 - http://axis.apache.org/axis2/java/core/download.cgi
     Apache Jena 2.11.1 - http://jena.apache.org/download/
     DBpedia SPARQL endpoint - dbpedia.org/sparql
  4. JARs needed:
     httpclient-4.2.3.jar, httpcore-4.2.2.jar, jena-arq-2.11.1.jar, jena-core-2.11.1.jar, jena-iri-1.0.1.jar, jena-sdb-1.4.1.jar, jena-tdb-1.0.1.jar, slf4j-api-1.6.4.jar, slf4j-log4j12-1.6.4.jar, xercesImpl-2.11.0.jar, xml-apis-1.4.01.jar
     Attention! Do NOT include:
     ● jcl-over-slf4j-1.6.4.jar: it conflicts with slf4j-log4j12-1.6.4.jar (“Can’t override final class exception”)
     ● httpcore-4.0.jar: it is added by Axis, conflicts with httpcore-4.2.2.jar, and prevents the web service from being created
  5. The Semantic Web: a project that intends to add computer-processable meaning (semantics) to the World Wide Web.
     SPARQL: a protocol and an SQL-like query language for querying RDF graphs via pattern matching.
     VIRTUOSO: both the back-end database engine and the HTTP/SPARQL server.
  7. DBpedia.org: the Semantic Web mirror of Wikipedia.
     RDF: a graph data model built from subject, predicate, object triples.
     APACHE JENA: a free and open source Java framework for building Semantic Web and Linked Data applications.
     ARQ: a SPARQL processor for Jena, also used for querying remote SPARQL services.
  9. DBpedia.org extracts data from Wikipedia editions in 119 languages, converts it into RDF, and makes this information available on the Web:
     ★ 24.9 million things (16.8 million from the English DBpedia);
     ★ labels and abstracts for 12.6 million unique things;
     ★ 24.6 million links to images and 27.6 million links to external web pages;
     ★ 45.0 million external links into other RDF datasets, 67.0 million links to Wikipedia categories, and 41.2 million links to YAGO categories.
  10. The dataset consists of 2.46 billion RDF triples: 470 million were extracted from the English edition of Wikipedia, 1.98 billion from other language editions, and 45 million are links to external datasets. DBpedia uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web. We use the SPARQL query language to query this data.
  12. What is a Triple? A triple is the minimal unit of information expressible in the Semantic Web. It is composed of 3 elements:
      1. A subject, which is a URI (e.g., a "web address") that represents something.
      2. A predicate, which is another URI that represents a certain property of the subject.
      3. An object, which can be a URI or a literal (for example a string) and is related to the subject through the predicate.
  13. John        has the email address   john@email.com
      (subject)   (predicate)             (object)
      Subjects, predicates, and objects are represented with URIs, which can be abbreviated as prefixed names. Objects can also be literals: strings, integers, booleans, etc.
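
As a rough illustration of the triple model on slides 12 and 13, the sketch below stores triples as plain Python tuples and matches them against a pattern in which `None` acts as a wildcard. The `ex:` names, the second e-mail address, and the `match` helper are made up for this example; a real application would rely on an RDF library such as Apache Jena instead.

```python
# A triple is (subject, predicate, object). All names here are illustrative.
triples = [
    ("ex:John", "foaf:mbox", "john@email.com"),
    ("ex:John", "foaf:name", "John"),
    ("ex:Mary", "foaf:mbox", "mary@email.com"),
]


def match(pattern, data):
    """Return the triples that agree with pattern; None matches anything."""
    return [
        t for t in data
        if all(p is None or p == v for p, v in zip(pattern, t))
    ]


# "What is John's e-mail address?" becomes a pattern with an open object.
johns_mail = match(("ex:John", "foaf:mbox", None), triples)
print(johns_mail)
```

The wildcard positions play the role that `?variables` play in a SPARQL query pattern.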
  14. Why SPARQL? SPARQL is the query language of the Semantic Web. It lets us:
      1. Extract values from structured and semi-structured data
      2. Explore data by querying unknown relationships
      3. Perform complex joins across various datasets in a single query
      4. Transform data from one vocabulary into another
  15. Structure of a SPARQL query:
      ● Prefix declarations, for abbreviating URIs (with PREFIX dbpowl: <http://dbpedia.org/ontology/>, the URI <http://dbpedia.org/ontology/Mountain> can be written dbpowl:Mountain)
      ● Dataset definition, stating what RDF graph(s) are being queried (DBpedia, Darwin Core Terms, YAGO, FOAF - Friend of a Friend)
      ● A result clause, identifying what information to return from the query (SELECT)
      ● The query pattern, specifying what to query for in the underlying dataset (WHERE)
      ● Query modifiers for slicing, ordering, and otherwise rearranging query results: ORDER BY, GROUP BY
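
The parts listed on slide 15 can be sketched as a small string builder. This is only an illustration of how the pieces fit together: `build_query` and its parameters are hypothetical names, and the dataset clause (FROM) is left out because it is optional.

```python
def build_query(prefixes, select_vars, patterns, order_by=None, limit=None):
    """Assemble a SELECT query string from its structural parts."""
    # 1. Prefix declarations
    lines = [f"PREFIX {name}: <{uri}>" for name, uri in prefixes.items()]
    # 2. Result clause
    lines.append("SELECT " + " ".join(select_vars))
    # 3. Query pattern
    lines.append("WHERE {")
    lines.extend(f"  {p} ." for p in patterns)
    lines.append("}")
    # 4. Query modifiers
    if order_by:
        lines.append("ORDER BY " + " ".join(order_by))
    if limit is not None:
        lines.append(f"LIMIT {limit}")
    return "\n".join(lines)


query = build_query(
    prefixes={"dbpowl": "http://dbpedia.org/ontology/"},
    select_vars=["?mountain"],
    patterns=["?mountain a dbpowl:Mountain"],
    order_by=["?mountain"],
    limit=10,
)
print(query)
```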
  17. ## EXAMPLE - Give me all cities & towns in Abruzzo with more than 50,000 inhabitants
      PREFIX dbpclass: <http://dbpedia.org/class/yago/>
      PREFIX dbpprop: <http://dbpedia.org/property/>
      SELECT ?resource ?value
      WHERE {
        ?resource a dbpclass:CitiesAndTownsInAbruzzo .
        ?resource dbpprop:populationTotal ?value .
        FILTER ( ?value > 50000 )
      }
      ORDER BY ?resource ?value
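
One way to send the Abruzzo query to the public endpoint is a plain HTTP GET with the query in a `query` parameter, as the SPARQL Protocol describes. The sketch below only builds the request and does not send it; `make_request` is a hypothetical helper, and asking for JSON through the `Accept` header is an assumption about what the caller wants back. The codelab itself uses Java with Jena rather than Python.

```python
import urllib.parse
import urllib.request

ENDPOINT = "http://dbpedia.org/sparql"  # endpoint named on slide 3

QUERY = """
PREFIX dbpclass: <http://dbpedia.org/class/yago/>
PREFIX dbpprop: <http://dbpedia.org/property/>
SELECT ?resource ?value
WHERE {
  ?resource a dbpclass:CitiesAndTownsInAbruzzo .
  ?resource dbpprop:populationTotal ?value .
  FILTER ( ?value > 50000 )
}
ORDER BY ?resource ?value
"""


def make_request(endpoint, query):
    """Build (but do not send) a SPARQL Protocol GET request."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


request = make_request(ENDPOINT, QUERY)
print(request.get_method(), request.full_url[:40])
# To actually run the query: urllib.request.urlopen(request).read()
```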
  19. Some prefixes:
      PREFIX owl: <http://www.w3.org/2002/07/owl#>
      PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
      PREFIX dc: <http://purl.org/dc/elements/1.1/>
      PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
      PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
      PREFIX dcterms: <http://purl.org/dc/terms/>
      PREFIX foaf: <http://xmlns.com/foaf/0.1/>
      PREFIX txn: <http://lod.taxonconcept.org/ontology/txn.owl#>
  20. DBpedia prefixes:
      PREFIX dbp: <http://dbpedia.org/>
      PREFIX dbpowl: <http://dbpedia.org/ontology/>
      PREFIX dbpres: <http://dbpedia.org/resource/>
      PREFIX dbpprop: <http://dbpedia.org/property/>
      PREFIX dbpclass: <http://dbpedia.org/class/yago/>
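
A prefix is only shorthand: the prefixed name is replaced by the declared namespace URI followed by the local part. A minimal sketch of that expansion, using a few of the prefixes from slides 19 and 20 (the `expand` helper is a made-up name):

```python
PREFIXES = {
    "dbpowl": "http://dbpedia.org/ontology/",
    "dbpres": "http://dbpedia.org/resource/",
    "dbpprop": "http://dbpedia.org/property/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand(prefixed_name, prefixes=PREFIXES):
    """Turn 'prefix:local' into the full URI it abbreviates."""
    prefix, _, local = prefixed_name.partition(":")
    if prefix not in prefixes:
        raise KeyError(f"undeclared prefix: {prefix}")
    return prefixes[prefix] + local


print(expand("dbpres:Pulp_Fiction"))  # http://dbpedia.org/resource/Pulp_Fiction
```

Using a prefix that was never declared is an error in SPARQL too, which is why the sketch raises instead of guessing.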
  21. Wikipedia articles consist mostly of free text, but also contain different types of structured information: infobox templates, categorisation information, images, geo-coordinates, and links to external Web pages. DBpedia transforms the data entered in Wikipedia into RDF triples, so creating a page in Wikipedia creates RDF in DBpedia.
  23. Example: https://en.wikipedia.org/wiki/Pulp_Fiction describes the movie. DBpedia creates a URI of the form http://dbpedia.org/resource/wikipedia_page_name (where wikipedia_page_name is the name of the regular Wikipedia HTML page), here http://dbpedia.org/resource/Pulp_Fiction, which is displayed as an HTML page at http://dbpedia.org/page/Pulp_Fiction. Underscore characters replace spaces. DBpedia can be queried via a Web interface at http://dbpedia.org/sparql. The interface uses the Virtuoso SPARQL Query Editor to query the DBpedia endpoint.
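
The naming rule on slide 23 can be sketched in a couple of lines. This is a simplification: `dbpedia_resource` is a hypothetical helper, and percent-encoding the remaining special characters with `urllib.parse.quote` is an assumption that may not match DBpedia's own encoding rules for every title.

```python
from urllib.parse import quote


def dbpedia_resource(title):
    """Map a Wikipedia page title to a DBpedia resource URI."""
    # Spaces become underscores, as in Wikipedia page names.
    return "http://dbpedia.org/resource/" + quote(title.replace(" ", "_"))


print(dbpedia_resource("Pulp Fiction"))  # http://dbpedia.org/resource/Pulp_Fiction
```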
  24. Public SPARQL endpoint - uses OpenLink Virtuoso
      Wikipedia page: http://en.wikipedia.org/wiki/Pulp_Fiction
      DBpedia resource: http://dbpedia.org/resource/Pulp_Fiction (HTML view: http://dbpedia.org/page/Pulp_Fiction)
      Infobox properties: dbpedia-owl:abstract; dbpedia-owl:starring; dbpedia-owl:budget; dbpprop:country; dbpprop:caption etc.
      For instance, the figure below shows the source code and the visualisation of an infobox template containing structured information about Pulp Fiction.
  26. PREFIX prop: <http://dbpedia.org/property/>
      PREFIX res: <http://dbpedia.org/resource/>
      PREFIX owl: <http://dbpedia.org/ontology/>
      PREFIX dcterms: <http://purl.org/dc/terms/>
      SELECT DISTINCT ?name ?abstract ?caption ?image ?budget ?director ?cast ?country ?category
      WHERE {
        res:Pulp_Fiction prop:name ?name ;
                         owl:abstract ?abstract ;
                         prop:caption ?caption ;
                         owl:thumbnail ?image ;
                         owl:budget ?budget ;
                         owl:director ?director ;
                         owl:starring ?cast ;
                         prop:country ?country ;
                         dcterms:subject ?category .
        FILTER langMatches( lang(?abstract), 'en' )
      }
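
When a SELECT query like the one on slide 26 is answered in the W3C SPARQL JSON results format, each row is a set of bindings keyed by variable name. The sketch below flattens such a document into plain dictionaries. The `SAMPLE` text is hand-written for illustration and is not real endpoint output; `rows` is a hypothetical helper.

```python
import json

# Hand-written sample in the SPARQL JSON results layout (not real output).
SAMPLE = """
{
  "head": {"vars": ["name", "director"]},
  "results": {"bindings": [
    {"name": {"type": "literal", "xml:lang": "en", "value": "Pulp Fiction"},
     "director": {"type": "uri",
                  "value": "http://dbpedia.org/resource/Quentin_Tarantino"}}
  ]}
}
"""


def rows(document):
    """Flatten a SPARQL JSON result document into a list of plain dicts."""
    data = json.loads(document)
    variables = data["head"]["vars"]
    return [
        # A variable can be unbound in a row, so skip missing keys.
        {v: binding[v]["value"] for v in variables if v in binding}
        for binding in data["results"]["bindings"]
    ]


result = rows(SAMPLE)
print(result[0]["name"], "/", result[0]["director"])
```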
  28. Linked Data is a method of publishing RDF data on the Web and of interlinking data between different data sources.
      Query builders:
      ➢ http://dbpedia.org/snorql/
      ➢ http://querybuilder.dbpedia.org/
      ➢ http://dbpedia.org/isparql/
      ➢ http://dbpedia.org/fct/
      ➢ http://it.dbpedia.org/sparql
      Variable names start with "?".
  29. The current RDF vocabularies are available at the following locations:
      ➔ W3:
        http://www.w3.org/TR/vcard-rdf/ vCard Ontology - for describing people and organizations
        http://www.w3.org/2003/01/geo/ Geo Ontology - for spatially-located things
        http://www.w3.org/2004/02/skos/ SKOS - Simple Knowledge Organization System
  30. ➔ GEO NAMES: http://www.geonames.org/ geospatial semantic information (postal codes)
      ➔ DUBLIN CORE: http://www.dublincore.org/ defines general metadata attributes used in a particular application
      ➔ FOAF: http://www.foaf-project.org/ Friend of a Friend, vocabulary for describing people
      ➔ UNIPROT: http://www.uniprot.org/core/, http://beta.sparql.uniprot.org/uniprot protein sequence and annotation data
  31. ➔ MUSIC ONTOLOGY: http://musicontology.com/, provides terms for describing artists, albums and tracks.
      ➔ REVIEW VOCABULARY: http://purl.org/stuff/rev, vocabulary for representing reviews.
      ➔ CREATIVE COMMONS (CC): http://creativecommons.org/ns, vocabulary for describing license terms.
      ➔ OPEN UNIVERSITY: http://data.open.ac.uk/
  32. ➔ Semantically-Interlinked Online Communities (SIOC): www.sioc-project.org/, vocabulary for representing online communities
      ➔ Description of a Project (DOAP): http://usefulinc.com/doap/, vocabulary for describing projects
      ➔ Simple Knowledge Organization System (SKOS): http://www.w3.org/2004/02/skos/, vocabulary for representing taxonomies and loosely structured knowledge
  34. SPARQL queries have two parts (FROM is optional):
      1. The query (WHERE) part, which produces a list of variable bindings (although some variables may be unbound).
      2. The part which puts together the results: SELECT, ASK, CONSTRUCT, or DESCRIBE.
      Other keywords: UNION, OPTIONAL (the pattern is matched only if the data exists), FILTER (conditions), ORDER BY, GROUP BY
  35. SELECT - is effectively what the query returns (a ResultSet).
      ASK - just looks to see if there are any results.
      CONSTRUCT - uses a template to make RDF from the results. For each result row it binds the variables and adds the statements to the result model. If a template triple contains an unbound variable it is skipped. Returns a new RDF graph.
      DESCRIBE - unusual, since it takes each result node, finds triples associated with it, and adds them to a result model. Returns a new RDF graph.
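
The difference between the query forms can be shown on a toy in-memory dataset. This is only an analogy written in plain Python, not how a SPARQL engine works internally; the `ex:` names and the three helper functions are invented for the example, and DESCRIBE is left out because its result depends on the server.

```python
# Toy data: (subject, predicate, object) tuples with made-up names.
DATA = [
    ("ex:Pulp_Fiction", "ex:director", "ex:Quentin_Tarantino"),
    ("ex:Pulp_Fiction", "ex:starring", "ex:Uma_Thurman"),
    ("ex:Jackie_Brown", "ex:director", "ex:Quentin_Tarantino"),
]


def select(predicate):
    """SELECT-like: one row of variable bindings per matching triple."""
    return [{"s": s, "o": o} for s, p, o in DATA if p == predicate]


def ask(predicate):
    """ASK-like: only reports whether any match exists."""
    return any(p == predicate for _, p, _ in DATA)


def construct(predicate, new_predicate):
    """CONSTRUCT-like: fill the template (?o new_predicate ?s) per row."""
    return [(row["o"], new_predicate, row["s"]) for row in select(predicate)]


print(select("ex:director"))
print(ask("ex:budget"))
print(construct("ex:director", "ex:directed"))
```

SELECT returns a table of bindings, ASK a single boolean, and CONSTRUCT a new list of triples, which mirrors the ResultSet versus RDF graph distinction on the slide.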
  36. What is linked data good for? Don’t search for a single thing, but explore a whole set of related things together!
      1) Revolutionize Wikipedia search
      2) Include DBpedia data in our own web page
      3) Mobile and geographic applications
      4) Document classification, annotation and social bookmarking
      5) Multi-domain ontology
      6) Nucleus for the Web of Data
  38. MOBILE: QRpedia.org - MIT Licence
  39. WIKIPEDIA DUMPS
      ● Arabic Wikipedia dumps: http://dumps.wikimedia.org/arwiki/
      ● Dutch Wikipedia dumps: http://dumps.wikimedia.org/nlwiki/
      ● English Wikipedia dumps: http://dumps.wikimedia.org/enwiki/
      ● French Wikipedia dumps: http://dumps.wikimedia.org/frwiki/
      ● German Wikipedia dumps: http://dumps.wikimedia.org/dewiki/
      ● Italian Wikipedia dumps: http://dumps.wikimedia.org/itwiki/
      ● Persian Wikipedia dumps: http://dumps.wikimedia.org/fawiki/
      ● Polish Wikipedia dumps: http://dumps.wikimedia.org/plwiki/
  40. WIKIPEDIA DUMPS (continued)
      ● Portuguese Wikipedia dumps: http://dumps.wikimedia.org/ptwiki/
      ● Russian Wikipedia dumps: http://dumps.wikimedia.org/ruwiki/
      ● Serbian Wikipedia dumps: http://dumps.wikimedia.org/srwiki/
      ● Spanish Wikipedia dumps: http://dumps.wikimedia.org/eswiki/
      ● Swedish Wikipedia dumps: http://dumps.wikimedia.org/svwiki/
      ● Ukrainian Wikipedia dumps: http://dumps.wikimedia.org/ukwiki/
      ● Vietnamese Wikipedia dumps: http://dumps.wikimedia.org/viwiki/
  41. LINKS
      Codelab’s project code: http://github.com/GDG-L-Ab/SparqlOpendataWS
      SPARQL endpoints: http://dbpedia.org/sparql & http://it.dbpedia.org/sparql
      Datasets: http://wiki.dbpedia.org/Datasets
      Wikipedia: http://en.wikipedia.org/ & http://it.wikipedia.org/
      SPARQL Explorer: http://dbpedia.org/snorql, http://data.semanticweb.org/snorql/
      Downloads: http://downloads.dbpedia.org/3.9/ & http://wiki.dbpedia.org/Downloads39
  42. Projects that use linked data:
      JAVA: Open Learn Linked data - free access to Open University course materials
      PHP: Semantic MediaWiki - lets you store and query data within the wiki's pages
      PERL: WikSAR
      PYTHON: Braindump - semantic search in Wikipedia
      RUBY: SemperWiki
  43. THANK YOU! :-) I AM CAMELIA BOBAN
      G+: https://plus.google.com/u/0/+cameliaboban
      Twitter: http://twitter.com/GDGRomaLAb
      LinkedIn: it.linkedin.com/pub/camelia-boban/22/191/313/
      Blog: http://blog.aissatechnologies.com/
      Skype: camelia.boban
      Email: camelia.boban@gmail.com