These slides accompanied the first part of the workshop that Vinayak Das Gupta and I gave at the Data Visualization for the Arts and Humanities event, held at Queen's University Belfast on 5-6 March 2015. The workshop, entitled 'Data-mining the Semantic Web and spatially visualising the results', introduced participants to the concepts and technologies of Open Data, the Semantic Web, RDF, SPARQL, GeoJSON and Leaflet.js. These slides cover the data-mining of online cultural heritage resources.
Data-mining the Semantic Web
and spatially visualising the results
Data Visualization for the Arts and Humanities
Queen’s University Belfast 5-6 March 2015
2. 1 of 47@flynam @bilusaurus
Data-mining the Semantic Web and spatially visualising the results
Data Visualization for the Arts and Humanities
Workshop overview
• Day 1: Data-mining
– Open Data
– Linked Data
– Linked Open Data implementation
– Semantic Web and ontologies
– Hands-on practicals
Workshop overview
• Day 2: Data visualisation
– Introduction to data visualisation concepts
– Web maps and geo-tagging
– Hands-on practical
– Interpretations
– Hermeneutic circle
From the horse’s mouth
(source: www.ted.com/talks/tim_berners_lee_on_the_next_web)
Open Access
Terminology
Open Data
Big Data
The web of data
The Semantic Web
Linked Data
Data mining
Asking questions of digital datasets
Terminology
Open Access
Terminology
Design by Julie Beck
for the Harvard University Neuroinformatics dept
(source: www.juliebcreative.com/portfolio/open-data-logo/)
http://linkedarc.net/surveys/arch-datasharing
Linked Data
Terminology
The linkages between the major Linked Data datasets (source: lod-cloud.net)
Big Data
Terminology
Wordle of terms associated with Big Data activity (source: sfdata.startupweekend.org)
5 Stars of Open Data
put your data online under an open license
make it structured (e.g. as an Excel file)
use non-proprietary formats (e.g. XML and not Excel)
use URIs to identify resources
link your data to external datasets
The RDF Triple
A Triple Example
‘…the boy’s name is Tom…’
(subject: the boy; predicate: name; object: Tom)
Triple Linking
‘…Tom is short for Thomas…’
(subject: Tom; predicate: is short for; object: Thomas)
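The two sentences above can be sketched as plain (subject, predicate, object) tuples. This is a minimal illustration rather than an RDF library: the `ex:` identifiers and the predicate names `hasName` and `isShortFor` are hypothetical stand-ins for full URIs. Note that for the link to work, Tom has to be a resource with an identifier (`ex:Tom`) rather than a plain string, because in RDF only resources, not literals, can be the subject of another triple. Since `ex:Tom` is the object of the first triple and the subject of the second, the statements join into a small graph that can be walked from the boy to Thomas.

```python
# Triples as (subject, predicate, object) tuples. The "ex:" names are
# hypothetical placeholders for full URIs such as <http://example.org/Tom>.
Triple = tuple[str, str, str]

TRIPLES: list[Triple] = [
    ("ex:theBoy", "ex:hasName", "ex:Tom"),     # '...the boy's name is Tom...'
    ("ex:Tom", "ex:isShortFor", "ex:Thomas"),  # '...Tom is short for Thomas...'
]


def objects_of(subject: str, triples: list[Triple]) -> list[tuple[str, str]]:
    """Return (predicate, object) pairs for every triple with this subject."""
    return [(p, o) for s, p, o in triples if s == subject]


def walk(start: str, triples: list[Triple]) -> list[str]:
    """Follow links depth-first from `start`, returning each visited node once."""
    visited: list[str] = []
    stack = [start]
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.append(node)
        # An object that is also used as a subject is what links statements together.
        for _, obj in objects_of(node, triples):
            stack.append(obj)
    return visited


if __name__ == "__main__":
    print(walk("ex:theBoy", TRIPLES))  # ['ex:theBoy', 'ex:Tom', 'ex:Thomas']
```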
Graph data
Serialising RDF
• Turtle
• JSON
• RDF/XML
• N-Triples
RDF Turtle
@base <http://example.org/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rel: <http://www.perceive.net/schemas/relationship/> .

<green-goblin>
    rel:enemyOf <spiderman> ;
    a foaf:Person ;    # in the context of the Marvel universe
    foaf:name "Green Goblin" .

<spiderman>
    rel:enemyOf <green-goblin> ;
    a foaf:Person ;
    foaf:name "Spiderman", "Человек-паук"@ru .
As N-Triples
<http://example.org/green-goblin> <http://www.perceive.net/schemas/relationship/enemyOf> <http://example.org/spiderman> .
<http://example.org/green-goblin> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/green-goblin> <http://xmlns.com/foaf/0.1/name> "Green Goblin" .
<http://example.org/spiderman> <http://www.perceive.net/schemas/relationship/enemyOf> <http://example.org/green-goblin> .
<http://example.org/spiderman> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/spiderman> <http://xmlns.com/foaf/0.1/name> "Spiderman" .
<http://example.org/spiderman> <http://xmlns.com/foaf/0.1/name> "\u0427\u0435\u043B\u043E\u0432\u0435\u043A-\u043F\u0430\u0443\u043A"@ru .
As JSON
{
  "http://example.org/green-goblin": {
    "http://www.perceive.net/schemas/relationship/enemyOf": [{"type": "uri", "value": "http://example.org/spiderman"}],
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#type": [{"type": "uri", "value": "http://xmlns.com/foaf/0.1/Person"}],
    "http://xmlns.com/foaf/0.1/name": [{"type": "literal", "value": "Green Goblin"}]
  },
  "http://example.org/spiderman": {
    "http://www.perceive.net/schemas/relationship/enemyOf": [{"type": "uri", "value": "http://example.org/green-goblin"}],
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#type": [{"type": "uri", "value": "http://xmlns.com/foaf/0.1/Person"}],
    "http://xmlns.com/foaf/0.1/name": [
      {"type": "literal", "value": "Spiderman"},
      {"type": "literal", "value": "\u0427\u0435\u043b\u043e\u0432\u0435\u043a-\u043f\u0430\u0443\u043a", "lang": "ru"}
    ]
  }
}
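To make the "same data, different serialisations" point concrete, the RDF/JSON above can be turned back into N-Triples with nothing more than a JSON parser. This is a minimal sketch using Python's standard `json` module; it assumes the RDF/JSON layout shown (subject, then predicate, then a list of objects with `type`, `value` and optional `lang` keys) and handles only `uri` and `literal` objects. Blank nodes, datatyped literals and escaping of control characters are deliberately left out.

```python
import json

# The Green Goblin / Spiderman data in the RDF/JSON layout shown above.
# A raw string keeps the \u escapes for the JSON parser to decode.
RDF_JSON = r"""
{
  "http://example.org/green-goblin": {
    "http://www.perceive.net/schemas/relationship/enemyOf": [{"type": "uri", "value": "http://example.org/spiderman"}],
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#type": [{"type": "uri", "value": "http://xmlns.com/foaf/0.1/Person"}],
    "http://xmlns.com/foaf/0.1/name": [{"type": "literal", "value": "Green Goblin"}]
  },
  "http://example.org/spiderman": {
    "http://www.perceive.net/schemas/relationship/enemyOf": [{"type": "uri", "value": "http://example.org/green-goblin"}],
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#type": [{"type": "uri", "value": "http://xmlns.com/foaf/0.1/Person"}],
    "http://xmlns.com/foaf/0.1/name": [
      {"type": "literal", "value": "Spiderman"},
      {"type": "literal", "value": "\u0427\u0435\u043b\u043e\u0432\u0435\u043a-\u043f\u0430\u0443\u043a", "lang": "ru"}
    ]
  }
}
"""


def to_ntriples(doc: dict) -> list[str]:
    """Flatten an RDF/JSON document into N-Triples lines (uri and literal objects only)."""
    lines = []
    for subject, predicates in doc.items():
        for predicate, objects in predicates.items():
            for obj in objects:
                if obj["type"] == "uri":
                    term = f"<{obj['value']}>"
                else:
                    # Escape backslashes and double quotes inside the literal.
                    value = obj["value"].replace("\\", "\\\\").replace('"', '\\"')
                    term = f'"{value}"'
                    if "lang" in obj:
                        term += "@" + obj["lang"]
                lines.append(f"<{subject}> <{predicate}> {term} .")
    return lines


if __name__ == "__main__":
    for line in to_ntriples(json.loads(RDF_JSON)):
        print(line)
```

The seven lines printed are the same seven statements as the N-Triples listing; N-Triples also permits the Cyrillic name to be written directly in UTF-8 instead of as `\u` escapes.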
As RDF/XML
<?xml version="1.0" encoding="utf-8" ?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:ns0="http://www.perceive.net/schemas/relationship/">
  <foaf:Person rdf:about="http://example.org/green-goblin">
    <ns0:enemyOf>
      <foaf:Person rdf:about="http://example.org/spiderman">
        <ns0:enemyOf rdf:resource="http://example.org/green-goblin"/>
        <foaf:name>Spiderman</foaf:name>
        <foaf:name xml:lang="ru">Человек-паук</foaf:name>
      </foaf:Person>
    </ns0:enemyOf>
    <foaf:name>Green Goblin</foaf:name>
  </foaf:Person>
</rdf:RDF>
Visualised as a Graph
Triplestores and Infrastructure
Practical: Making RDF
http://www.franklynam.com/blog.aspx?id=85
Q: Create RDF representations of yourself and
your relationships
The Semantic Web and Ontologies
The stages of the Web (source: urenio.org)
Ontological Classes and Properties
The British Museum data mapping onto the CIDOC CRM
(source: confluence.ontotext.com/display/ResearchSpace/BM+Mapping)
The CIDOC CRM basic entity types and their relationships
(source: www.cidoc-crm.org/)
Vocabularies
Graph data
Minna Sundberg (source: www.sssscomic.com/comic.php?page=196)
Querying using SPARQL
SELECT *
WHERE {
  ?s ?p ?o
} LIMIT 10
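Queries like this are usually typed into an endpoint's web form, but the same request can be scripted. The sketch below follows the SPARQL 1.1 Protocol (the query sent as a `query` URL parameter in an HTTP GET, with results requested as `application/sparql-results+json`) and uses only Python's standard library. The DBpedia endpoint URL comes from the practicals; whether a particular endpoint accepts exactly this request, and how it limits traffic, is an assumption to check against that endpoint's own documentation.

```python
import urllib.parse
import urllib.request

QUERY = """
SELECT *
WHERE {
  ?s ?p ?o
} LIMIT 10
"""


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


def rows(results: dict) -> list[dict[str, str]]:
    """Flatten SPARQL JSON results into one {variable: value} dict per solution."""
    variables = results["head"]["vars"]
    return [
        # Unbound variables are absent from a binding, so they are skipped here.
        {v: b[v]["value"] for v in variables if v in b}
        for b in results["results"]["bindings"]
    ]


if __name__ == "__main__":
    request = build_request("http://dbpedia.org/sparql", QUERY)
    print(request.full_url)
    # To actually run the query (needs network access):
    #   import json
    #   with urllib.request.urlopen(request, timeout=30) as response:
    #       print(rows(json.load(response)))
```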
More complex SPARQL
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX letters1916: <http://letters1916.linkedarc.net/ontology/>
PREFIX letters1916data: <http://letters1916.linkedarc.net/data/>
PREFIX schema: <http://schema.org/>
SELECT DISTINCT ?letter ?letterName ?recipientPostalAddressName ?recipientLongitude ?recipientLatitude
WHERE {
  ?letter rdf:type letters1916:Letter ;
          schema:name ?letterName ;
          letters1916:recipientLocation ?recipientPostalAddress .
  ?recipientPostalAddress schema:addressRegion ?recipientPostalAddressRegion .
  FILTER regex(?recipientPostalAddressRegion, 'Galway', 'i')
  ?recipientPostalAddress schema:name ?recipientPostalAddressName .
  ?recipientPlace schema:address ?recipientPostalAddress ;
                  schema:geo ?recipientGeoCoordinates .
  ?recipientGeoCoordinates schema:longitude ?recipientLongitude ;
                           schema:latitude ?recipientLatitude
}
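Since the aim of the second day is to map results like these, here is a minimal sketch of turning this query's results into a GeoJSON FeatureCollection that a Leaflet.js map could load. It assumes the endpoint returns standard SPARQL JSON results and that `?recipientLongitude` and `?recipientLatitude` come back as decimal strings; the property names `letter`, `name` and `address` are illustrative choices. GeoJSON orders coordinates as longitude, then latitude.

```python
import json


def to_geojson(results: dict) -> dict:
    """Convert SPARQL JSON results from the letters query into a GeoJSON FeatureCollection."""
    features = []
    for b in results["results"]["bindings"]:
        # Skip any row without coordinates rather than guessing a position.
        if "recipientLongitude" not in b or "recipientLatitude" not in b:
            continue
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "Point",
                # GeoJSON (RFC 7946) positions are [longitude, latitude].
                "coordinates": [
                    float(b["recipientLongitude"]["value"]),
                    float(b["recipientLatitude"]["value"]),
                ],
            },
            "properties": {
                "letter": b["letter"]["value"],
                "name": b["letterName"]["value"],
                "address": b["recipientPostalAddressName"]["value"],
            },
        })
    return {"type": "FeatureCollection", "features": features}


if __name__ == "__main__":
    empty = {"head": {"vars": []}, "results": {"bindings": []}}
    print(json.dumps(to_geojson(empty)))
```

Passing the parsed endpoint response to `to_geojson` and writing the result with `json.dumps` gives a file Leaflet's GeoJSON layer can read.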
Practical: Universities on DBpedia
http://www.franklynam.com/blog.aspx?id=86
Q: Get a list of all of the universities that DBpedia
knows about
SKOS
@prefix dct: <http://purl.org/dc/terms/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix cc: <http://creativecommons.org/ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://linkedarc.net/vocabs/vessel-jar> a skos:Concept ;
    cc:license <http://creativecommons.org/licenses/by/3.0> ;
    cc:attributionURL <http://linkedarc.net> ;
    cc:attributionName "linkedarc.net" ;
    skos:inScheme <http://linkedarc.net/vocabs> ;
    skos:prefLabel "Jar" ;
    skos:scopeNote "A jar concept. Pottery. This isn’t a great scope note." ;
    dct:publisher <http://linkedarc.net> ;
    dct:identifier <http://linkedarc.net/vocabs/vessel-jar> ;
    dct:issued "2015-02-23"^^xsd:date ;
    skos:exactMatch <http://purl.org/heritagedata/schemes/mda_obj/concepts/97609> .
SPARQL + FILTER
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT * WHERE {
  ?s rdfs:label ?label .
  FILTER langMatches(lang(?label), "en")
}
SPARQL + FILTER
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT * WHERE {
  ?s rdfs:label ?label .
  FILTER langMatches(lang(?label), "en") .
  FILTER regex(?label, "bell", "i")
}
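To see what these two filters keep, the sketch below re-creates their behaviour over a handful of labels in plain Python. SPARQL's `langMatches` uses RFC 4647 basic filtering: the range `"en"` matches the tag `en` and any sub-tag such as `en-GB`, case-insensitively, and `"*"` matches any non-empty tag. `regex(?label, "bell", "i")` is a case-insensitive pattern search. The labels are made-up examples, and Python's `re` is close to, but not identical with, the XPath regex dialect that SPARQL specifies.

```python
import re

# Hypothetical (label, language tag) pairs standing in for ?label bindings.
LABELS = [("Bell", "en"), ("Bell-shaped jar", "en-GB"), ("Glocke", "de"), ("Doorbell", "")]


def lang_matches(tag: str, lang_range: str) -> bool:
    """RFC 4647 basic filtering, as used by SPARQL's langMatches()."""
    tag, lang_range = tag.lower(), lang_range.lower()
    if lang_range == "*":
        return tag != ""  # "*" matches any literal that has a language tag
    return tag == lang_range or tag.startswith(lang_range + "-")


def filter_labels(labels, lang_range="en", pattern="bell"):
    """Keep labels in the given language that contain `pattern`, ignoring case."""
    return [
        label
        for label, tag in labels
        if lang_matches(tag, lang_range) and re.search(pattern, label, re.IGNORECASE)
    ]


if __name__ == "__main__":
    print(filter_labels(LABELS))  # ['Bell', 'Bell-shaped jar']
```

"Doorbell" is dropped because it has no language tag, even though it matches the regex.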
SPARQL + FILTER
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT * WHERE {
  ?s dct:created ?dateCreated .
  FILTER (?dateCreated > "1900-01-01"^^xsd:date)
}
Practical: Getty Concepts
Q: Get all of the Getty URIs that represent
concepts related to amphorae
SPARQL endpoint: http://vocab.getty.edu/sparql
Practical: British Museum Sarcophagi
Q: Get the find spots of all of the sarcophagi in
the British Museum collection
SPARQL endpoint: http://collection.britishmuseum.org/sparql
Geo-coding the Find Spots
with Google Refine
The Google Maps API
Address String
Geo-coordinates as JSON
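The step on this slide, an address string in and geo-coordinates out as JSON, can also be scripted. This sketch assumes the Google Geocoding API's JSON endpoint (`https://maps.googleapis.com/maps/api/geocode/json` with `address` and `key` parameters) and its response layout (a `status` field, then `results[0].geometry.location` holding `lat` and `lng`); check both against Google's current documentation, and note that an API key is required. Sending the request is left to `urllib.request.urlopen`.

```python
import urllib.parse
from typing import Optional

# Assumed endpoint for Google's Geocoding API (JSON output).
GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def geocode_url(address: str, api_key: str) -> str:
    """Build a forward-geocoding request URL: address string in, coordinates out."""
    return GEOCODE_URL + "?" + urllib.parse.urlencode({"address": address, "key": api_key})


def first_location(response: dict) -> Optional[tuple[float, float]]:
    """Return (lat, lng) from the first result, or None if the lookup failed."""
    if response.get("status") != "OK" or not response.get("results"):
        return None
    location = response["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]


if __name__ == "__main__":
    # "YOUR_API_KEY" is a placeholder; a real key is needed before sending the request.
    print(geocode_url("Queen's University Belfast", "YOUR_API_KEY"))
```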
Export as CSV
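Outside Google Refine, the same export takes a few lines with Python's `csv` module. A minimal sketch, assuming the geocoded find spots are held as dicts with the hypothetical keys `find_spot`, `lat` and `lng`:

```python
import csv
import io


def to_csv(rows: list[dict]) -> str:
    """Write geocoded find spots as CSV text with a header row."""
    buffer = io.StringIO()
    # fieldnames fixes the column order; the keys are hypothetical.
    writer = csv.DictWriter(buffer, fieldnames=["find_spot", "lat", "lng"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Writing to a file instead of a string works the same way, using `open(path, "w", newline="")` as the target.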
Practical: data.cso.ie
Q: Get the employment figures generated by the
2011 Irish census by region
SPARQL endpoint: http://data.cso.ie/query.html
Practical: Nomisma and Ancient Coins
Q: Get the geo-coordinates of all of the coin
hoards stored in the Nomisma triplestore
SPARQL endpoint: http://nomisma.org/sparql
Additional Linked Data Resources
http://www.franklynam.com/blog.aspx?id=89
Thank you!
Martin Lemay (source: twitter.com/martinlemay)
Editor's Notes
Me
DAH PhD
Archaeology as UG and MPhil
IT industry background
Bit of overlap
Themes of the day
Using LD and OD as tools of the CH researcher
As a way of dealing with Big Data
And as a way of combining data from different datasets
From the perspective of both data consumer and data provider. Mainly the former.
Practically focused
Have your laptops ready
Introduction to RDF and the most popular LD technologies
Introduce yourselves
Tomorrow and Bilu
Data viz
Taking the data mined today and visualising it spatially
Theory briefly
Mapping on the web
Geo-tagging content
Very practical
Visualisations aren’t the end point. They lead to more questions.
Back to data-mining.
Easy start
Tim Berners-Lee at TED
http://www.ted.com/talks/tim_berners_lee_on_the_next_web#t-327012
This will necessarily include a potted history of the field
Open Access and Open Data
Open Access
What is it?
Sharing
Web 2.0
Democracy
Open government
Sectors affected
Academia
Business
Journalism
Typically human-readable content
HTML
Images
Video
Legality of sharing
Open Data
As we saw in TBL’s TED talk
Model is the Document Web
But for data
What is data?
Is it publications?
Raw data
Text
Binary data
3D data
Images
Video
Metadata
Paradata
Clément: live data sources for data viz
Shameless plug
Linked Data or Linked Open Data
Expands the Open Data idea
But more
Make datasets transparent
Make them interdependent
The document web model
First used by John Mashey in the mid-1990s
Handling and analysis of massive datasets (Kitchin 2014, 67)
By 2013 it had moved from:
The ‘Peak of Inflated Expectations’ to the ‘Trough of Disillusionment’
Cf. Dr. Clément Levallois: plateau of productivity
According to Gartner
It still retains a lot of popularity in the government, business and academic sectors
Data size?
EAA 2014 Gabriele Gattiglia, Uni of Pisa paper
Focus on approaches to data
Not data size
Having said that, global data volumes are growing exponentially
mainly thanks to sensor data, more digital bureaucracy and commerce
Stat: data size growth
Berners-Lee in 2006
He calls it Open Data
but really should be Linked Data or LOD
In fact, it goes back to the earliest proposal for the WWW
“Evolution of objects from being principally human-readable documents to contain more machine-oriented semantic information” (Berners-Lee et al., 1994)
Use the existing architecture of the WWW
Publish data
Link data
Data-mine
For one star…
OK. Pause. Review
Lots of terms. Lots of overlap. In a word.
Open Data espouses the free movement of nodes of information within and across knowledge domains
Linked Data builds on OD and is often called LOD. It is everything that OD is, plus its data nodes are linkable. See later.
Big Data is the environment in which LOD lives. It is a modus operandi: a way of approaching questions. It doesn’t have to be about massive datasets, but it often is.
We have done the WHAT in a very general sense.
Now on to the HOW.
Linked Open Data is a knowledge philosophy
It is abstract
It needs implementation
Resource Description Framework
Based around simple concept of the triple
Very simple, but when triples are combined they can encode great complexity
Based on linguistic theory
URI at core
See previous 5 stars
The boy’s name is Tom
Tom is short for Thomas
This is KEY
Links create graphs of data.
Graphs are not hierarchical in the strict, tree-like sense in which each node can have only one parent.
They are poly-hierarchical. Multiple parents and children.
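A minimal sketch of that difference: in a tree each node has at most one parent, whereas in the graph below one concept has two. The concept names and the `broader` links are hypothetical, loosely modelled on a SKOS-style vocabulary.

```python
from collections import defaultdict

# Hypothetical (narrower, "broader", broader) triples.
TRIPLES = [
    ("amphora", "broader", "vessel"),
    ("amphora", "broader", "transport container"),
    ("vessel", "broader", "object"),
    ("transport container", "broader", "object"),
]


def parents(triples):
    """Map each node to the set of nodes it points to via 'broader'."""
    result = defaultdict(set)
    for child, predicate, parent in triples:
        if predicate == "broader":
            result[child].add(parent)
    return result


def is_tree(triples) -> bool:
    """A strict hierarchy allows at most one parent per node."""
    return all(len(ps) <= 1 for ps in parents(triples).values())


if __name__ == "__main__":
    print(sorted(parents(TRIPLES)["amphora"]))  # ['transport container', 'vessel']
    print(is_tree(TRIPLES))  # False
```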
RDF needs to be encoded or serialised in some way
Many serialisations out there
Formats
N-Triples
Turtle
RDF/XML
JSON
There are others
We will look at Turtle
Header
Resource 1: Green Goblin
Resource 2: Spiderman
Link between the two
Different serialisations
Same data
From data provider point of view
Need to think about:
Storage
Native triplestores
Apache Jena
Quad stores
Named graphs
Virtuoso Quad stores
Interfaces
Static RDF files
Web API
SPARQL
Key. Come back to this.
You have been introduced to LD and RDF
Now write some
Encode some meaning
Using a popular ontology
Read the instructions on my blog
Create RDF representations of yourselves and your relationships.
Back to terminology
SW
Web of Data
Needs semantics
Plus the ability to find out about the structure of remote datasets
What we have just been talking about
Structure
What do we mean?
Ontologies
Philosophical sense
Relationship of humans to the world around us
CS sense
Way of ordering data
Car example
Structured
Good for data-mining
Bad for determinism, essentialism
General ontologies
Schema.org
FOAF
Dublin Core
CH ontologies
CIDOC CRM
Extensions
EH
ARIADNE
linkedARC.net
ARCHAEO-ML
CHARM
Or build your own
CIDOC CRM
Aka thesauri, taxonomies
Literals
Weak for indexing
Controlled lists
Balance needed
Control
Flexibility
Seneschal project
Getty AAT
See practical
Marc Alexander this morning
The data is RDF but how do we get at the semantics?
Similar to SQL querying (e.g. in MySQL)
Can be difficult to get your head around
Try it out
Explain. Spend a good bit of time here. This is key to the practicals.
Ask students
Get an overview of the predicates associated with the dbpedia-owl:University type.
Might have to use http://live.dbpedia.org/sparql instead of http://dbpedia.org/sparql
Back to vocabularies
SKOS
Simple Knowledge Organization System
Key to how CH institutions work, since the Library of Alexandria
Combine our understanding of SKOS concepts and filters.
Get me all the Getty URIs that represent concepts related to amphorae.
No one correct answer.
What good is a place string?
Get URL for GMaps geo-coding (address string to coordinates)
Need a GMaps API key. Sign up.