DH101 2013/2014 course 6 - Semantic coding, RDF, CIDOC-CRM
Transcript

  1. Digital Humanities 101 - 2013/2014 - Course 6. Digital Humanities Laboratory. Frédéric Kaplan, frederic.kaplan@epfl.ch
  2. Semester 1: Content of each course
     • (1) 19.09 Introduction to the course / Live tweeting and collective note taking
     • (2) 25.09 Introduction to Digital Humanities / Wordpress / First assignment
     • (3) 2.10 Introduction to the Venice Time Machine project / Zotero
     • 9.10 No course
     • (4) 16.10 Digitization techniques / Deadline first assignment
     • (5) 23.10 Datafication / Presentation of projects
     • (6) 30.10 Semantic modelling / RDF / Deadline peer-reviewing of first assignment
  3. Semester 1: Content of each course
     • (7) 6.11 Pattern recognition / OCR / Semantic disambiguation
     • (8) 13.11 Historical Geographical Information Systems, Procedural modelling / City Engine / Deadline Project selection
     • (9) 20.11 Crowdsourcing / Wikipedia / OpenStreetMap
     • (10) 27.11 Cultural heritage interfaces and visualisation / Museographic experiences
     • 4.12 Group work on the projects
     • 11.12 Oral exam / Presentation of projects / Deadline Project blog
     • 18.12 Oral exam / Presentation of projects
  4. Objective of today's course
     • Show you the beauty and make you feel the power of semantic coding
     • Give you a quick idea of what is behind the following strange acronyms: RDF, URI, OWL, SPARQL, SWRL, CIDOC-CRM
     • Motivate you to look deeper.
  5. A short introduction to semantic coding
     • Many good books exist. I recommend this one.
     • I will reuse some of their examples in the following slides.
  6. Doris Stockly
  7. incanti.dhlab.ch
  8. The simplest kind of dataset, one that everyone is familiar with, is tabular data (any data kept in a table, such as an Excel spreadsheet).
  10. Data kept in a table is easy to display, sort, print and edit.
  11. You might not even think of data in an Excel spreadsheet as modeled. But there are semantics in a data table. Where?
  12. There are also obvious limitations with this kind of storage.
  14. You cannot search for the routes that stay more than 2 days at Corfu. Sorting the columns does not capture the deeper meaning of the text we entered.
  15. Relational databases are a solution. Many very mature products exist, such as Oracle DB, MySQL and PostgreSQL. A relational database allows multiple tables to be joined in a standardized way.
  17. But as our project evolves, we may need to reformat our tables. This is called schema migration, a painful process.
  18. For big databases, schemas can get incredibly complex.
  19. Trying to normalize these databases into a single schema is a labor-intensive process.
  20. How to make future-proof schemata
  21. How to make future-proof schemata
     • With this mode of coding we can easily add new properties (price of a route, captain, etc.). The schema is future-proof.
     • In addition, the data about the data (i.e. the metadata, such as the names of columns) is now part of the data itself.
     • This is ideal for projects in perpetual beta.
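
As a rough illustration of this mode of coding, the sketch below turns one spreadsheet row into one statement per cell. The row contents, the `row_to_triples` helper and the captain identifier `C1` are assumptions made up for the example; only the route R1, Venice and the date appear in the slides.

```python
# Hypothetical spreadsheet row: the column names become part of the data.
row = {"id": "R1", "departure": "Venice", "departure-date": "2.7.1422"}


def row_to_triples(row, key="id"):
    """Turn one table row into (subject, predicate, object) statements."""
    subject = row[key]
    return [(subject, column, value)
            for column, value in row.items()
            if column != key]


triples = row_to_triples(row)

# Adding a new property needs no schema migration: it is one more statement.
triples.append(("R1", "captain", "C1"))  # "C1" is an illustrative identifier

for triple in triples:
    print(triple)
```

The former column names ("departure", "captain") now sit inside the data, which is what makes the schema extensible without reformatting any table.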
  22. And most important, it makes a direct and simple connection with a well-developed research field: logic.
  23. Indeed, this can be written in a different way
  24. Indeed, this can be written in a different way
     • (Subject Predicate Object)
     • (R1 departure Venice)
     • This is called an RDF statement, an atomic relation in a database
  25. RDF statements
     • (Subject Predicate Object)
     • (R1 departure Venice)
     • This is called an RDF statement, an atomic relation in a database
     • (R1 departure-date 2.7.1422)
  26. This is a graph
  27. As RDF statements can be understood both as logic statements and as parts of a graph, one can use many tools and ideas from logic and graph theory to manipulate them.
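
To make the graph reading concrete, here is a minimal sketch in plain Python that stores statements as tuples and indexes them as labelled edges. The `arrival` statement and the helper names `build_graph` and `objects` are assumptions for illustration, not part of the course material.

```python
from collections import defaultdict

# Each statement is a (subject, predicate, object) tuple.
triples = [
    ("R1", "departure", "Venice"),
    ("R1", "departure-date", "2.7.1422"),
    ("R1", "arrival", "Corfu"),  # hypothetical statement
]


def build_graph(triples):
    """Index statements as a directed graph: node -> [(edge label, node)]."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph


def objects(graph, subject, predicate):
    """Follow every edge with the given label starting from one node."""
    return [obj for label, obj in graph.get(subject, []) if label == predicate]


graph = build_graph(triples)
print(objects(graph, "R1", "departure"))
```

The same list of tuples can be read as a set of logical facts or as the edge list of a graph; nothing in the data changes between the two views.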
  28. URIs
     • The nodes of the graph are called resources.
     • When you want to coordinate multiple datasets, it can become increasingly difficult to guarantee unique and consistent identifiers for each node.
     • The R1 that we use in our database may mean something else in another database.
     • For naming resources, RDF uses URIs (Uniform Resource Identifiers) and an optional fragment identifier.
  29. URIs
     • You are probably familiar with URLs (Uniform Resource Locators), the strings used to specify how web pages are retrieved.
     • URIs generalize this concept further by saying that anything, whether you can retrieve it electronically or not, can be uniquely identified in a similar way.
  30. URIs
  31. Since URIs can identify anything as a resource, the subject of an RDF statement can be a resource, the object can be a resource and, most importantly, predicates are always resources.
  32. An example of a URIref for a common RDF predicate
  33. It is common in RDF to shorten URIs by assigning a namespace prefix to the base URI and writing only the distinctive part of the identifier. The last URI can be written in a shorter manner: rdf:type
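
The prefix mechanism can be sketched in a few lines of Python. The `rdf` namespace URI below is the standard one; the `ex` namespace and the helper names `expand` and `shorten` are assumptions chosen for the example.

```python
PREFIXES = {
    # Standard RDF namespace.
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    # Hypothetical namespace for the maritime route examples.
    "ex": "http://example.org/routes#",
}


def expand(qname, prefixes=PREFIXES):
    """Expand a shortened name such as 'rdf:type' into a full URI."""
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {qname!r}")
    return prefixes[prefix] + local


def shorten(uri, prefixes=PREFIXES):
    """Shorten a full URI if it starts with a known namespace."""
    for prefix, base in prefixes.items():
        if uri.startswith(base):
            return f"{prefix}:{uri[len(base):]}"
    return uri


print(expand("rdf:type"))
print(shorten("http://example.org/routes#R1"))
```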
  34. Serialization
     • While the data model that RDF uses is very simple, the serialized representation tends to get complicated when an RDF graph is saved in a file or sent over a network.
     • Different serialization formats exist: N3, RDF/XML (the most frequently used), RDFa (RDF in attributes)
  35. Vocabularies
     • A set of URIrefs is known as a vocabulary.
     • We can design a specific vocabulary for our maritime route examples.
     • There are also famous vocabularies like the RDF vocabulary (the set of URIrefs describing the RDF concepts, e.g. rdf:resource, rdf:type)
  36. SPARQL
     • Just as SQL provides a standard query language across relational databases, SPARQL provides a query language for RDF graphs (pronounced "sparkle").
     • SPARQL queries attempt to match patterns in the graph and bind wildcard variables as they find solutions.
     • Departure(?x1, Venice)
     • Captain(?x1, ?x2), Gender(?x2, Women)
     • Semantic coding is all about asking bigger questions.
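
The pattern-matching idea behind such queries can be imitated in plain Python. This is only a toy matcher under simplifying assumptions, not SPARQL itself: the sample data (routes R1 and R2, captains C1 and C2) and the function names `match` and `query` are invented for the illustration.

```python
triples = [
    ("R1", "departure", "Venice"),
    ("R1", "captain", "C1"),
    ("C1", "gender", "Women"),
    ("R2", "departure", "Venice"),
    ("R2", "captain", "C2"),
    ("C2", "gender", "Men"),
]


def match(pattern, triple, bindings):
    """Try to unify one pattern with one triple; return new bindings or None."""
    new = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):          # wildcard variable
            if new.setdefault(term, value) != value:
                return None               # already bound to something else
        elif term != value:               # constant that does not match
            return None
    return new


def query(patterns, triples):
    """Return every variable binding that satisfies all patterns at once."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [extended
                     for bindings in solutions
                     for triple in triples
                     if (extended := match(pattern, triple, bindings)) is not None]
    return solutions


# Routes leaving Venice whose captain's gender is recorded as "Women".
result = query(
    [("?x1", "departure", "Venice"),
     ("?x1", "captain", "?x2"),
     ("?x2", "gender", "Women")],
    triples,
)
print(result)
```

Each pattern narrows the set of candidate bindings, so a question spanning several statements is answered by chaining simple matches.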
  37. SWRL
     • With RDF coding, we can also write rules to infer new triples
     • If hasParent(?x1, ?x2) and hasBrother(?x2, ?x3) then hasUncle(?x1, ?x3)
     • This is also a way of detecting possible incoherence in the knowledge coded in the triple store (actors doing things after their death)
     • One standard language to do this is SWRL (Semantic Web Rule Language)
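
The uncle rule can be hand-coded in Python to show what inferring new triples means. This is a sketch of the single rule quoted above, not a SWRL engine; the family facts and the name `infer_uncles` are assumptions used for illustration.

```python
# Illustrative facts; the names are chosen for the example only.
facts = {
    ("Marco", "hasParent", "Niccolo"),
    ("Niccolo", "hasBrother", "Maffeo"),
}


def infer_uncles(triples):
    """Apply: hasParent(x1, x2) and hasBrother(x2, x3) -> hasUncle(x1, x3)."""
    known = set(triples)
    parents = [(s, o) for s, p, o in known if p == "hasParent"]
    brothers = [(s, o) for s, p, o in known if p == "hasBrother"]
    inferred = {
        (child, "hasUncle", uncle)
        for child, parent in parents
        for sibling, uncle in brothers
        if sibling == parent
    }
    return inferred - known  # only the statements that are actually new


new_triples = infer_uncles(facts)
print(new_triples)
```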
  38. Ontologies
     • An ontology provides a special vocabulary with which knowledge can be represented.
     • This vocabulary allows us to specify which entities will be represented, how they can be grouped and what relationships connect them together.
     • (Venice isa Place), (Corfu isa Place), (Place haslat latitude), (Place haslong longitude)
     • Now, something very beautiful...
  39. An ontology can be expressed as RDF triples and stored in a graph alongside the data it describes.
  41. OWL
     • OWL (Web Ontology Language) is an ontology language layered on top of RDF and RDFS
     • Terminology statements
        • ex:Bridge rdf:type rdfs:Class
        • ex:Bridge rdfs:subClassOf ex:Place
     • Assertion statements
        • ex:Rialto rdf:type ex:Bridge
        • ex:RialtoCons ex:broughtIntoExistence ex:Rialto
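
Because terminology and assertions live in the same graph, a few lines of Python are enough to show how a subclass statement lets us derive extra types. The sketch assumes the standard spellings `rdfs:Class` and `rdfs:subClassOf`; the helper names `superclasses` and `types_of` are made up for the example.

```python
triples = {
    # Terminology statements
    ("ex:Bridge", "rdf:type", "rdfs:Class"),
    ("ex:Bridge", "rdfs:subClassOf", "ex:Place"),
    # Assertion statement
    ("ex:Rialto", "rdf:type", "ex:Bridge"),
}


def superclasses(cls, triples):
    """Collect every class reachable through rdfs:subClassOf edges."""
    found, todo = set(), [cls]
    while todo:
        current = todo.pop()
        for s, p, o in triples:
            if s == current and p == "rdfs:subClassOf" and o not in found:
                found.add(o)
                todo.append(o)
    return found


def types_of(resource, triples):
    """Direct types of a resource plus the types implied by subclassing."""
    direct = {o for s, p, o in triples if s == resource and p == "rdf:type"}
    inferred = set(direct)
    for cls in direct:
        inferred |= superclasses(cls, triples)
    return inferred


print(types_of("ex:Rialto", triples))
```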
  42. It is relatively easy to create your own ontology using software such as Protégé. But some ontologies aim at being universal.
  43. CIDOC-CRM
  44. CIDOC-CRM
     • CIDOC-CRM is an ontology for cultural heritage.
     • About 20 years of work.
     • An ISO standard (ISO 21127).
     • 100+ schema. Very stable.
     • CIDOC-CRM is an attempt to formalise an underlying semantics common to many classifications. It includes very interesting ideas.
  45. CIDOC-CRM: Events
     • In CIDOC-CRM, the modelling is event-centric.
     • The underlying idea is to model change, not state. Therefore, temporal entities play a central role.
     • Instead of coding the birthdate of an actor, it is better to code the event of their birth.
  46. Actors relate to things only via temporal entities and events.
  47. CIDOC-CRM: Events
     • The participation or presence of several non-temporal entities in an event e1 allows us to conclude that they have been in the same time interval and space, even without knowledge of the particular time or space.
     • They must have existed at that time. They cannot have been somewhere else at that time (with electronic communication, the space volume in which events occur can become very large).
     • The events e0_i of creation of each participant i have happened before or at the time of e1. The events e2_i of destruction (or vanishing) of each participant have happened after or at the time of e1.
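
The ordering constraint on creation, participation and destruction can be checked mechanically. The following sketch simplifies every event to a single year; the `Participant` class, the actor names and all dates are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    name: str
    created: int    # year of the creation event e0_i (e.g. a birth)
    destroyed: int  # year of the destruction event e2_i (e.g. a death)


def inconsistent_participants(event_year, participants):
    """Names of participants that did not exist at the time of event e1."""
    return [p.name for p in participants
            if not (p.created <= event_year <= p.destroyed)]


# Hypothetical event e1 in 1422 with two recorded participants.
participants = [
    Participant("ActorA", created=1380, destroyed=1440),
    Participant("ActorB", created=1300, destroyed=1400),  # gone before 1422
]

problems = inconsistent_participants(1422, participants)
print(problems)
```

A non-empty result points to an incoherence in the recorded knowledge, such as an actor doing things after their death.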
  48. CIDOC-CRM: Properties
     • The property "P11 had participant" denotes active or passive involvement of Actors, whereas "P12 occurred in the presence of" ranges from objects just being there (e.g. a desk where a treaty was signed)
     • The properties "P92 brought into existence" and "P93 took out of existence" delimit the existence of things which have a persistent existence.
  49. CIDOC-CRM: Properties
  52. CIDOC-CRM: Place
     • CIDOC-CRM has also implemented a very interesting model for places. What is hard about places?
     • The question "where is it?" can be answered in natural language by relation to two different kinds of entities: geometric areas or objects.
  53. In France, in Athens, 39N 124E. Points given by spatial coordinates are typically understood as the centre of a wider, extended area.
  54. On Mount St Helens, at the Rhine river.
  55. On Queen Elizabeth (the ship), in my suitcase, at home.
  56. CIDOC-CRM: Place
     • Following the CIDOC CRM, geometric areas (E53 Place) can only be defined relative to larger objects, including the surface of the earth.
     • Those objects in turn may be located at different times at different places (relative to a larger object).
     • The cultural interest is in the relation to other things and not to an abstract absolute space. Absolute coordinates seem to make no sense when the reference objects move.
     • As historical information is incomplete and sparse, and many reference objects move, normalization of place information to absolute coordinates should not replace the primary information, which is typically relative.
  57. CIDOC-CRM: Places
  58. CIDOC-CRM: Influence
     • Another problematic issue is the notion of influence. It is difficult to develop a systematic understanding of the different forms of influence and their mutual relations.
     • Some are more physical, like using a mould or a tool. The influence of a mould on a produced object is strong and can often be verified on the object afterwards. The influence of a hammer is less specific.
     • Similarly, making a copy of a painting has a strong influence on the product; copying the idea of a painting, a weak one. The latter is more an intellectual influence than a physical one.
     • If a real influence existed, a temporal sequence can be deduced.
  59. CIDOC-CRM: Influence
  61. CIDOC-CRM
  63. Summary: Guidelines for coding historical data
  64. (1) Prefer events to properties. Actors do not have properties, they participate in events. Instead of coding the birthdate of an actor, it is better to code the event of their birth.
  65. (2) Code date intervals instead of dates. This is much more flexible and makes it possible to detect inconsistencies.
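
A small sketch of guideline (2), assuming dates are reduced to years: each event carries an earliest and a latest possible year, and a claimed ordering between two events can be tested against those bounds. The `Interval` type, the function names and the second event are assumptions; the 1588-1591 span is the Rialto reconstruction example used later in the course.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    earliest: int  # earliest possible year
    latest: int    # latest possible year


def overlaps(a, b):
    """True if the two events may have happened at the same time."""
    return a.earliest <= b.latest and b.earliest <= a.latest


def can_precede(a, b):
    """True if 'a happened before or at the time of b' is still possible."""
    return a.earliest <= b.latest


reconstruction = Interval(1588, 1591)
later_event = Interval(1595, 1600)  # hypothetical event

# A source claiming that later_event preceded the reconstruction would be
# inconsistent with the recorded intervals.
print(can_precede(later_event, reconstruction))
print(can_precede(reconstruction, later_event))
```

With single dates, an uncertain event has to be given a spuriously precise value; with intervals, the uncertainty is kept and contradictions still surface.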
  66. (3) Code places in a relative manner and not an absolute manner. The cultural interest is in the relation to other things and not to an abstract absolute space. Absolute coordinates seem to make no sense when the reference objects move.
  67. All this is very beautiful, but is it sufficient to do the kind of historical modeling we want to do? We have an issue. Which one?
  68. Metaknowledge: knowledge about how knowledge is produced.
  69. How can we encode metaknowledge
     • Expressed knowledge (RDF triples) is not in the same space as resources (URIs). We can easily attach new information to resources but not to triples.
     • It is not easy to represent metaknowledge like the origin of the uncertainty linked with a piece of information.
     • To overcome this issue we need to introduce two levels of knowledge and use a trick.
  70. Reified RDF vs. Standard RDF
     • An expressed RDF statement (RialtoReconstruction hasTimeSpan 1588-1591) can be transformed into 3 reified triples
     • (s1 rdf:subject RialtoReconstruction)
     • (s1 rdf:predicate hasTimeSpan)
     • (s1 rdf:object 1588-1591)
  71. Reified RDF vs. Standard RDF
     • An expressed RDF statement (RialtoReconstruction hasTimeSpan 1588-1591) can be transformed into 3 reified triples
     • (s1 rdf:subject RialtoReconstruction)
     • (s1 rdf:predicate hasTimeSpan)
     • (s1 rdf:object 1588-1591)
     • (s1 metardf:reliability 0.8)
     • (s1 metardf:creator FredericKaplan)
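
The reification trick can be written as a small helper that gives a statement its own identifier and then attaches metadata to that identifier. The `reify` function and the way identifiers are generated are assumptions for illustration; the `metardf:` properties follow the slide.

```python
import itertools

_ids = itertools.count(1)  # generates s1, s2, ... (an arbitrary naming choice)


def reify(triple, **metadata):
    """Turn one statement into triples about that statement, plus metadata."""
    subject, predicate, obj = triple
    sid = f"s{next(_ids)}"
    reified = [
        (sid, "rdf:subject", subject),
        (sid, "rdf:predicate", predicate),
        (sid, "rdf:object", obj),
    ]
    # The statement now has an identifier, so we can say things about it.
    reified += [(sid, f"metardf:{key}", value) for key, value in metadata.items()]
    return sid, reified


sid, statements = reify(
    ("RialtoReconstruction", "hasTimeSpan", "1588-1591"),
    reliability=0.8,
    creator="FredericKaplan",
)
for statement in statements:
    print(statement)
```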
  72. Possible historical spaces
     • Now our RDF store includes both historical knowledge and knowledge about the creation of this historical knowledge.
     • These kinds of metainformation can document all the construction phases (whether realized by humans or machines)
     • With this approach, we can extract through queries the historical knowledge corresponding to some specific sources and thus create a possible historical reality.
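
Extracting one possible historical reality then amounts to filtering reified statements on their metadata. In this sketch the second statement `s2`, its creator `AnotherSource` and its reliability are invented to make the filter visible, and `possible_history` is a hypothetical helper name.

```python
reified = [
    ("s1", "rdf:subject", "RialtoReconstruction"),
    ("s1", "rdf:predicate", "hasTimeSpan"),
    ("s1", "rdf:object", "1588-1591"),
    ("s1", "metardf:reliability", 0.8),
    ("s1", "metardf:creator", "FredericKaplan"),
    # Hypothetical competing statement from another source.
    ("s2", "rdf:subject", "RialtoReconstruction"),
    ("s2", "rdf:predicate", "hasTimeSpan"),
    ("s2", "rdf:object", "1587-1590"),
    ("s2", "metardf:reliability", 0.3),
    ("s2", "metardf:creator", "AnotherSource"),
]


def possible_history(reified, creator=None, min_reliability=0.0):
    """Rebuild the plain triples whose metadata satisfies the given filter."""
    by_id = {}
    for sid, prop, value in reified:
        by_id.setdefault(sid, {})[prop] = value
    selected = []
    for statement in by_id.values():
        if creator is not None and statement.get("metardf:creator") != creator:
            continue
        if statement.get("metardf:reliability", 0.0) < min_reliability:
            continue
        selected.append((statement["rdf:subject"],
                         statement["rdf:predicate"],
                         statement["rdf:object"]))
    return selected


print(possible_history(reified, min_reliability=0.5))
print(possible_history(reified, creator="AnotherSource"))
```

Each choice of filter yields a different, fully documented reconstruction from the same store.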
  73. Summary
  74. Encoding metahistorical information
     • We must not only model historical information, but model each step of the construction of historical knowledge.
     • There is a need for a semantic framework capable of coding historical information and meta-historical information.
     • Coding meta-historical information implies documenting the choice of sources, transcription phases and interpretation processes realized by humans or machines.
  75. No unique global truth but fully documented possible historical reconstructions