Schema and Identity for Linked Data


Published on

IASLOD 2012, August 13-17, 2012, KAIST, Daejeon, Korea

Published in: Education, Technology


  1. 2012 International Asian Summer School in Linked Data (IASLOD 2012), August 13-17, 2012, KAIST, Daejeon, Korea
     Identity and schema for Linked Data
     Hideaki Takeda / National Institute of Informatics
  2. How to put the data into a computer?
     • How to describe the data?
       – The way to describe individual data
         • Schema/Class/Concept
       – The way to describe relationships among schemas/classes/concepts
         • Ontology/Taxonomy/Thesaurus
     • How to refer to the data?
       – The way to identify individual data
         • Identifier
       – Relationships among identifiers
  3. Architecture for the Semantic Web
     • The world of classes (Ontologies)
     • The world of instances (Linked Data)
     [Figure: Semantic Web architecture diagram, credited to Tim Berners-Lee]
  4. Layers of Semantic Web
     • Ontology
       – Descriptions on classes
       – RDFS, OWL
       – Challenges for ontology building
         • Ontology building is difficult by nature
           – Consistency, comprehensiveness, logicality
         • Alignment of ontologies is even more difficult
     [Figure labels: Ontology = descriptions on classes; Linked Data = descriptions on instances. Credited to Tim Berners-Lee]
  5. Layers of Semantic Web
     • Linked Data
       – Descriptions on instances (individuals)
       – RDF + (RDFS, OWL)
       – Pros for Linked Data
         • Easy to write (mainly fact description)
         • Easy to link (fact-to-fact link)
       – Cons for Linked Data
         • Difficult to describe complex structures
         • Class descriptions are still needed (-> ontology)
     [Figure labels: Ontology = descriptions on classes; Linked Data = descriptions on instances. Credited to Tim Berners-Lee]
  6. Importance of Identifiers for Entities
     • Everything should be identifiable!
     • Humans can identify things with vague identifiers, or even without identifiers, with help from the context around the things
     • On the web, the context is usually not available, and the computer can seldom understand the context even if it exists
     • So we need identifiers for all things
  7. Identification System
     • Identification is one of the primary functions of human information processing
       – Naming: e.g., names for people, pets, and some daily things
         • OK if the number of things is not so big
       – Systematic Identification
         • e.g., phone number, post-code, passport number, product number, ISBN
         • Needed if the number of things is big enough
     • Requirements for Systematic Identification
       – Identifier is stable and sustainable
       – Uniqueness is guaranteed
       – Identifier publisher is reliable and sustainable
  8. Identification system for Web
     • Not so different from conventional identification systems
     • Difference
       – Cross-system use
       – Truly digitized
     • Requirements for Systematic Identification for web
       – Identifier is stable and sustainable (even after an entity may disappear)
       – Uniqueness is guaranteed over all systems
       – A description should be associated with each identifier
         • since entities may not be accessible
       – Identifier publisher is reliable and sustainable
  9. Solutions for the Requirements by LOD
     • Requirements for Systematic Identification for web
       – 1. Identifier is stable and sustainable (even after an entity may disappear)
         • (up to each identifier publisher)
       – 2. Uniqueness is guaranteed over all systems
         • URI (not URN)
       – 3. A description should be associated with each identifier
         • Dereferenceable URI
           – If the URI is accessed, a description associated with it should be returned
       – 4. Identifier publisher is reliable and sustainable
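Requirement 3 above (a dereferenceable URI returns a description) is usually met over HTTP by asking for an RDF serialization in the Accept header. The following is a minimal sketch of preparing such a request with the Python standard library. The helper name `build_rdf_request`, the chosen media types and the example DBpedia-style URI are illustrative assumptions, not something given in the slides, and the request is only constructed, not sent.

```python
from urllib.request import Request

# Media types for two common RDF serializations (an illustrative choice).
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9"


def build_rdf_request(uri: str) -> Request:
    """Prepare (but do not send) an HTTP GET that asks for RDF about `uri`."""
    return Request(uri, headers={"Accept": RDF_ACCEPT})


# Hypothetical example identifier; any dereferenceable URI could be used here.
request = build_rdf_request("http://dbpedia.org/resource/Tokyo")
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
```

Passing the prepared request to `urllib.request.urlopen` would perform the actual lookup; whether RDF comes back depends entirely on the identifier publisher.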
  10. Some examples: ISBN (International Standard Book Number)
      • Abstract
        – A unique numeric commercial book identifier
        – 13 digits
          • Prefix: 978 or 979 (for compatibility with EAN code)
          • Group (language-sharing country group): 1 to 5 digits
          • Publisher code
          • Item number
          • Check digit: 1 digit
        – Management: two layers
          • National ISBN Agency – Publisher
      • Requirement Satisfaction
        – 1. (Stable ID) Maybe (versioning often matters, and sometimes a publisher may re-use an ISBN)
        – 2. (Unique ID) Uniqueness is guaranteed, but it is not a URI
        – 3. (Dereferenceable) No mechanisms (Amazon does this instead!)
        – 4. (Reliable publisher) Yes
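The single check digit mentioned above is computed from the first 12 digits using the standard EAN-13 weighting (weights 1 and 3 alternating). A minimal sketch follows; the function names are illustrative and not part of the slides.

```python
def isbn13_check_digit(first12: str) -> int:
    """Return the check digit for the first 12 digits of an ISBN-13."""
    digits = [int(c) for c in first12 if c.isdigit()]
    if len(digits) != 12:
        raise ValueError("expected exactly 12 digits")
    # Weights alternate 1, 3, 1, 3, ... starting from the leftmost digit.
    total = sum(d * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return (10 - total % 10) % 10


def is_valid_isbn13(isbn: str) -> bool:
    """Check a full ISBN-13; hyphens and spaces are ignored."""
    digits = [c for c in isbn if c.isdigit()]
    if len(digits) != 13:
        return False
    return isbn13_check_digit("".join(digits[:12])) == int(digits[12])


print(isbn13_check_digit("978030640615"))
print(is_valid_isbn13("978-0-306-40615-7"))
```

A passing check only shows that the string is well formed; it says nothing about whether the ISBN was actually assigned, which is why requirement 3 (a description behind the identifier) still matters.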
  11. Some examples: DOI (Digital Object Identifier)
      • Abstract
        – An identifier for scientific digital objects (mostly scientific articles)
        – A variable-length string: "prefix/suffix"
          • Prefix: assigned to publishers
          • Suffix: assigned to each object
        – Management: three layers
          • IDF (International DOI Foundation) – Registration Agency – Publisher
      • Requirement Satisfaction
        – 1. (Stable ID) Yes (not re-usable)
        – 2. (Unique ID) Uniqueness is guaranteed and URI accessible ("DOI")
        – 3. (Dereferenceable) Mapping to object pages but no RDF
        – 4. (Reliable publisher) Maybe
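The "prefix/suffix" structure and the "URI accessible" property can be sketched as follows. This is an illustration only: the function names are hypothetical, the resolver base `https://doi.org/` is the current public DOI proxy (an assumption relative to the slides, which do not name one), and `10.1000/182` is used as a sample DOI.

```python
from urllib.parse import quote


def split_doi(doi: str) -> tuple[str, str]:
    """Split a DOI into (prefix, suffix) at the first slash."""
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a prefix/suffix DOI: {doi!r}")
    return prefix, suffix


def doi_to_uri(doi: str, resolver: str = "https://doi.org/") -> str:
    """Turn a DOI into an HTTP URI by appending it to a resolver base."""
    split_doi(doi)  # validate the shape first
    # Percent-encode characters that are not safe in a URI path.
    return resolver + quote(doi, safe="/")


print(split_doi("10.1000/182"))
print(doi_to_uri("10.1000/182"))
```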
  12. Some examples: DBpedia (as Identifier)
      • Abstract
        – A Wikipedia page
        – Name of the Wikipedia page
          • Maintained manually
            – Disambiguation page
            – Redirect page
      • Requirement Satisfaction
        – 1. (Stable ID) Maybe (pages sometimes disappear, sometimes change names, sometimes change contents)
        – 2. (Unique ID) Uniqueness is mostly guaranteed and URI accessible
        – 3. (Dereferenceable) RDF
        – 4. (Reliable publisher) Maybe
  13. Identification of relationships between identifiers
      • Co-existence of multiple identification systems in a field
        – Difference of coverage
        – Difference of viewpoint
      • An entity can have multiple identifiers
      • Need for mapping between identifiers in different identification systems
      • Method: use special properties
        – owl:sameAs, (rdfs:seeAlso, skos:exactMatch)
      • Some problems
        – Logical inconsistency with owl:sameAs
        – Maintenance
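Because owl:sameAs is symmetric and transitive, every mapping merges two groups of identifiers into one, which is also why a single wrong link can cause the logical inconsistency mentioned above. The sketch below illustrates this with a small union-find structure. The class name and the `ex:` identifiers are hypothetical placeholders, not taken from the slides.

```python
class SameAsIndex:
    """Groups identifiers that are connected by sameAs-style links."""

    def __init__(self) -> None:
        self._parent: dict[str, str] = {}

    def _find(self, ident: str) -> str:
        # Unknown identifiers start as their own group.
        self._parent.setdefault(ident, ident)
        while self._parent[ident] != ident:
            # Path halving keeps later lookups short.
            self._parent[ident] = self._parent[self._parent[ident]]
            ident = self._parent[ident]
        return ident

    def add_same_as(self, a: str, b: str) -> None:
        """Record that `a` and `b` identify the same entity."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same(self, a: str, b: str) -> bool:
        return self._find(a) == self._find(b)


index = SameAsIndex()
index.add_same_as("ex:book-in-system-A", "ex:book-in-system-B")
index.add_same_as("ex:book-in-system-B", "ex:book-in-system-C")
print(index.same("ex:book-in-system-A", "ex:book-in-system-C"))  # linked via B
print(index.same("ex:book-in-system-A", "ex:other-book"))
```

Weaker properties such as rdfs:seeAlso do not carry this merging semantics, so an index like this would only be appropriate for links that really assert identity.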
  14. LOD Cloud (Linking Open Data)
  15. Summary for ID
      • Identification is the crucial part of LOD
        – Data availability
        – Data inconsistency
        – Data interoperability
      • Establishing a good identification system leads to a reliable and sustainable LOD.
  16. Structuring Information
      • A wide range of ways to structure information
        – Keywords, tags
          • A freely chosen word or phrase just indicating some features
        – Controlled vocabulary
          • Mapping to a fixed set of words or phrases
          • e.g., the list of countries, the name authorities
        – Classification
          • System for classifying entities. Often hierarchical. A class may not carry meaning.
        – Taxonomy
          • Hierarchical term system for classification. The upper/lower relation usually means a general/specific relation
          • e.g., the subject headings of LC
        – Thesaurus
          • System for semantics. More types of relations: (hypernym, hyponym), synonym, antonym, homonym, holonym, meronym
        – Ontology
          • System of concepts. Concepts rather than words. More varied relations, plus the definitions of concepts
  17. Examples in Library Science
      • Many systems in the library community
      • Classification
        – Universal Decimal Classification (UDC)
      • Controlled Vocabulary
        – The authority files for person names, organizations, location names
          • Library of Congress: 8 million records, MADS & SKOS
          • British Library: 2.6 million records, foaf & BIO (a vocabulary for biographical information)
          • National Diet Library (Japan): 1 million records, foaf
          • Deutsche Nationalbibliothek (DNB, Germany): 1.8 & 1.3 million records (names & organizations)
          • Virtual International Authority File (VIAF): 4 million records
      • Taxonomy
        – Subject Heading: LC, NDL
          • Library of Congress: MADS & SKOS
          • British Library:
          • National Diet Library (Japan): 0.1 million records, SKOS
          • Deutsche Nationalbibliothek (DNB, Germany): 0.16 million records
  20. UDC as Linked Data
      Columns: UDC element / Definition / SKOS term / UDC subproperty
      • UDC number (notation)
        – Definition: UDC notation is a combination of symbols (numerals, signs and letters) that represent a class, its position in the hierarchy and its relation to other classes. Notation is a language-independent indexing term that enables mechanical sorting and filing of subjects. Also called UDC number and UDC classmark
        – SKOS term: skos:notation; UDC subproperty: ---
      • class identifier (URI)
        – Definition: A unique identifier assigned to each UDC class. It identifies the relationship between a class meaning and its notational representation
        – SKOS term: skos:Concept; UDC subproperty: ---
      • broader class (URI)
        – Definition: Superordinate class: the class hierarchically above the class in question
        – SKOS term: skos:broader; UDC subproperty: ---
      • caption
        – Definition: Verbal description of the class content
        – SKOS term: skos:prefLabel; UDC subproperty: ---
      • including note
        – Definition: Extension of the caption containing verbal examples of the class content (usually a selection of important terms that do not appear in the subdivision)
        – SKOS term: skos:note; UDC subproperty: udc:includingNote
      • application note
        – Definition: Instructions for number building, further extension and specification of the class
        – SKOS term: skos:note; UDC subproperty: udc:applicationNote
      • scope note
        – Definition: Note explaining the extent and the meaning of a UDC class. Used to resolve disambiguation or to distinguish this class from other similar classes
        – SKOS term: skos:scopeNote; UDC subproperty: ---
      • examples
        – Definition: Examples of combination are used to illustrate UDC class building, i.e. complex subject statements
        – SKOS term: skos:example; UDC subproperty: ---
      • see also reference
        – Definition: Indication of conceptual relationship between UDC classes from different hierarchies
        – SKOS term: skos:related; UDC subproperty: ---
      69,000 records, 40 languages
      Example:
        <skos:Concept rdf:about="">
          <skos:inScheme rdf:resource=""/>
          <skos:broader rdf:resource=""/>
          <skos:notation rdf:datatype="">510.6</skos:notation>
          <skos:prefLabel xml:lang="en">Mathematical logic</skos:prefLabel>
          <skos:prefLabel xml:lang="ja">記号論理学</skos:prefLabel>
          <skos:related rdf:resource=""/>
        </skos:Concept>
  21. <> <> <> .
      <> <> <> .
      <> <> "Natsume, Sōseki, 1867-1916"@en .
      <> <> _:bnode7authoritiesnamesn79084664 .
      _:bnode7authoritiesnamesn79084664 <> _:bnode8authoritiesnamesn79084664 .
      _:bnode7authoritiesnamesn79084664 <> _:bnode010 .
      _:bnode8authoritiesnamesn79084664 <> <> .
      _:bnode8authoritiesnamesn79084664 <> "Natsume, Sōseki,"@en .
      _:bnode010 <> _:bnode11authoritiesnamesn79084664 .
      _:bnode010 <> <> .
      _:bnode11authoritiesnamesn79084664 <> <> .
  24. Some examples: Scientific Names for Species and Taxa
      • Abstract
        – Names for biological species and other taxa (kingdom, division, class, order, family, tribe, genus)
        – A string
          • Binomial name for species
          • Academic societies maintain taxon names individually
        – E.g., Papilio xuthus (Asian Swallowtail, ナミアゲハ, 호랑나비)
      • Requirement Satisfaction
        – 1. Mostly yes (names sometimes disappear, change, or change contents)
        – 2. Uniqueness is generally guaranteed but, precisely speaking, there is some ambiguity because of changes
        – 3. No. Many systems exist but none covers all species
        – 4. Maybe
  25. Name suffixes by taxon rank
      分類群     Taxon                    Plants      Algae       Fungi        Animals
      ドメイン   Domain
      界         Kingdom
      門         Division/Phylum          -phyta      -phyta      -mycota
      亜門       Subdivision/Subphylum    -phytina    -phytina    -mycotina
      綱         Class                    -opsida     -phyceae    -mycetes
      亜綱       Subclass                 -idae       -phycidae   -mycetidae
      目         Order                    -ales       -ales       -ales
      亜目       Suborder                 -ineae      -ineae      -ineae
      上科       Superfamily              -acea       -acea       -acea        -oidea
      科         Family                   -aceae      -aceae      -aceae       -idae
      亜科       Subfamily                -oideae     -oideae     -oideae      -inae
      族/連      Tribe                    -eae        -eae        -eae         -ini
      亜族/亜連  Subtribe                 -inae       -inae       -inae        -ina
      属         Genus
      亜属       Subgenus
      種         Species
      亜種       Subspecies
      (Column groups: 植物 Plants, 藻類 Algae, 菌類 Fungi, 動物 Animals)
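The suffix table above also shows why a name string alone is an ambiguous identifier: the same ending can indicate different ranks in different kingdoms. The following sketch encodes a few rows of the table and lists every (rank, group) pair a name ending is compatible with. The function name and the choice of rows are illustrative assumptions; Papilionidae is the swallowtail family to which the earlier example species belongs.

```python
# A subset of the rank/suffix table from the slide (suffixes as given there).
SUFFIXES = {
    "Subclass": {"Plants": "-idae", "Algae": "-phycidae", "Fungi": "-mycetidae"},
    "Family": {"Plants": "-aceae", "Algae": "-aceae", "Fungi": "-aceae",
               "Animals": "-idae"},
    "Subfamily": {"Plants": "-oideae", "Algae": "-oideae", "Fungi": "-oideae",
                  "Animals": "-inae"},
    "Tribe": {"Plants": "-eae", "Algae": "-eae", "Fungi": "-eae",
              "Animals": "-ini"},
    "Subtribe": {"Plants": "-inae", "Algae": "-inae", "Fungi": "-inae",
                 "Animals": "-ina"},
}


def candidate_ranks(name: str) -> list[tuple[str, str]]:
    """Return every (rank, group) whose suffix matches the end of `name`."""
    matches = []
    for rank, by_group in SUFFIXES.items():
        for group, suffix in by_group.items():
            if name.lower().endswith(suffix.lstrip("-")):
                matches.append((rank, group))
    return matches


# The ending "-idae" fits an animal family but also a plant subclass.
for rank, group in candidate_ranks("Papilionidae"):
    print(rank, group)
```

Resolving such a name to one taxon therefore needs extra context (the kingdom, or an identifier from a specific naming authority), which is the point of requirement 2 in the previous slide.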
  26. Ontology
      • An ontology is an explicit specification of a conceptualization [Gruber]
      • "An ontology is an explicit specification of a conceptualization. The term is borrowed from philosophy, where an Ontology is a systematic account of Existence. For AI systems, what "exists" is that which can be represented. When the knowledge of a domain is represented in a declarative formalism, the set of objects that can be represented is called the universe of discourse. This set of objects, and the describable relationships among them, are reflected in the representational vocabulary with which a knowledge-based program represents knowledge. Thus, in the context of AI, we can describe the ontology of a program by defining a set of representational terms. In such an ontology, definitions associate the names of entities in the universe of discourse (e.g., classes, relations, functions, or other objects) with human-readable text describing what the names mean, and formal axioms that constrain the interpretation and well-formed use of these terms. Formally, an ontology is the statement of a logical theory."
  27. Conceptualization
      [Figure: two conceptualizations of the same world of boxes on a desk.
       (a) Classes: object, box, red box, blue box, yellow box. Relations: on_desk(A), on(A, B), put(A, B).
       (b) Classes: object, box with color:{red, blue, yellow}, desk. Relations: on_desk(A), on(A/box, B/object), put(A/box, B/object).]
      • There are many possible ways to conceptualize the target world
      • Trade-off between generality and efficiency
  28. Types of Ontologies
      • Upper (top-level) ontology vs. domain ontology
        – Upper ontology: a common ontology throughout all domains
        – Domain ontology: an ontology which is meaningful in a specific domain
      • Object ontology vs. task ontology
        – Object ontology: an ontology on "things" and "events"
        – Task ontology: an ontology on "doing"
      • Heavy-weight ontology vs. light-weight ontology
        – Heavy-weight ontology: fully described ontology including concept definitions and relations, in particular in a logical way
        – Light-weight ontology: partially described ontology, typically including only is-a relations
  29. Top-level ontology
      • An ontology which covers all of the world!
      • Very... difficult
        – e.g., how does a thing exist?
          • Is a thing a four-dimensional existence?
          • Does a thing exist three-dimensionally over time?
      • Common requirements
        – A small number of concepts can cover the world
        – Concepts can be used in lower ontologies
        – Concepts should be general and abstract
  30. Top-level ontology
      • Three approaches
        – Formal approach
          • Logical formalization
          • Fully abstract
          • Pros: clean
          • Cons: hardly understandable
          • e.g., Sowa's top-level ontology, DOLCE
        – Linguistic approach
          • Use and extension of linguistic concepts
          • Partially abstract and partially general
          • Pros: understandable
          • Cons: limitation to the linguistic world
          • e.g., Penman Upper Model, WordNet
        – Empirical approach
          • Use and extension of everyday concepts
          • Mostly general
          • Pros: understandable and applicable to all the world
          • Cons: lack of solid foundation
          • e.g., SUMO, Cyc, EDR
  31. Empirical top-level ontology
      • SUMO (Suggested Upper Merged Ontology)
        – Collection and organization of concepts used frequently
        – Simple relationships between concepts
      [Figure: part of the SUMO concept hierarchy rooted at Entity. Labels shown: Physical, Abstract, Object, SelfConnectedObject, Substance, CorpuscularObject, Organic, Inorganic, Collection, Process, NaturalProcess, Biological Process, Physiologic Process, Pathologic Process, Intentionally Caused Process, Searching, Social Interaction, Communication, Cooperation, Contest, Meeting, ChangeOfPossession, Transfer, Motion, Impelling, Putting, Impacting, Removing, Transportation, BringingTogether, Separating, ChangeOfState]
  32. Formal Ontology: DOLCE
      • DOLCE (a Descriptive Ontology for Linguistic and Cognitive Engineering)
        – Intended as a reference system for top-level ontologies
        – Logical definition
        – Particular (DOLCE) vs. Universal
          • Particular: ontology about things, phenomena, quality…
          • Universal: ontology for describing particulars, like categories and attributes
  33. Formal Ontology: DOLCE
      • Concepts
        – Endurant / Perdurant / Quality / Abstract
          • Endurant
            – "Things"
            – An existence over time
            – May change its attributes
          • Perdurant
            – "Process"
            – No change over time
            – May switch a part to the other
      • Relations
        – Parthood (abstract or perdurant)
        – Temporary parthood (endurant)
        – Constitution (endurant or perdurant)
        – Participation between perdurant and endurant
      [Figure: DOLCE taxonomy, with abbreviations as shown on the slide]
        ALL Entity
          ED Endurant
            PED Physical Endurant: M Amount of Matter, F Feature, POB Physical Object (APO Agentive Physical Object, NAPO Non-agentive Physical Object)
            NPED Non-physical Endurant: NPOB Non-physical Object (MOB Mental Object, SOB Social Object)
            AS Arbitrary Sum
          PD Perdurant / Occurrence
            EV Event: ACH Achievement, ACC Accomplishment
            STV Stative: ST State, PRO Process
          Q Quality
            TQ Temporal Quality: TL Temporal Location
            PQ Physical Quality: SL Spatial Location
            AQ Abstract Quality
          AB Abstract
            Fact, Set
            R Region: TR Temporal Region (T Time Interval), PR Physical Region (S Space Region), AR Abstract Region
  34. Linguistic top-level ontology
      • WordNet
        – A lexical reference system
          • "Link-based electronic dictionary"
        – Concepts
          • synset
            – Noun 79,689
            – Verb 13,508
        – Relations
          • synonym
          • hypernym/hyponym (is-a)
          • holonym/meronym (a-part-of)
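The hypernym (is-a) relation listed above links each synset to a more general one, so following it repeatedly leads from a specific concept up to a top-level concept. The sketch below walks such a chain over a tiny hand-made table. The entries are a toy illustration made up for this example, not actual WordNet data, and the function name is hypothetical.

```python
# Toy is-a links (child -> parent); NOT real WordNet content.
HYPERNYM = {
    "swallowtail": "butterfly",
    "butterfly": "insect",
    "insect": "animal",
    "animal": "organism",
    "organism": "entity",
}


def hypernym_chain(term: str) -> list[str]:
    """Follow is-a links upward from `term` until no parent is recorded."""
    chain = []
    seen = {term}
    while term in HYPERNYM:
        term = HYPERNYM[term]
        if term in seen:  # guard against accidental cycles in the table
            break
        seen.add(term)
        chain.append(term)
    return chain


print(" -> ".join(["swallowtail"] + hypernym_chain("swallowtail")))
```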
  35. Linguistic top-level ontology
      • WordNet
        – Top-level
          • { entity, physical thing (that which is perceived or known or inferred to have its own physical existence (living or nonliving)) }
          • { psychological_feature, (a feature of the mental life of a living organism) }
          • { abstraction, (a general concept formed by extracting common features from specific examples) }
          • { state, (the way something is with respect to its main attributes; "the current state of knowledge"; "his state of health"; "in a weak financial state") }
          • { event, (something that happens at a given place and time) }
          • { act, human_action, human_activity, (something that people do or cause to happen) }
          • { group, grouping, (any number of entities (members) considered as a unit) }
          • { possession, (anything owned or possessed) }
          • { phenomenon, (any state or process known through the senses rather than by intuition or reasoning) }
  36. Summary for structuring information
      • Keywords, tags / Controlled vocabulary / Classification / Taxonomy / Thesaurus / Ontology
        – The difference among them is not clear, and not important
        – The trend is toward more structured ones
        – The same requirements apply as for identification systems
  37. Summary
      • Requirements for Successful Structuring Systems
        – 1. Entity is stable and sustainable
        – 2. Uniqueness is guaranteed over all systems
        – 3. A description should be associated with each entity
        – 4. System publisher is reliable and sustainable
      • LOD technology can help
      • Learn from success in the library community
  38. Schema/Vocabulary for LOD
      • Class/Concept description
        – Axiom of a concept in an ontology
        – Database schema for a table in a relational database
        – Object definition in object-oriented programming/DB
      • Class description in the Semantic Web
        – RDFS/OWL description for a class
          • RDFS: simple class system
          • OWL: Description Logic-based
      • Class description in Linked Data
        – Mostly RDFS-based (exception: owl:sameAs)
        – Simple structure (mostly property-value pairs)
  39. Schema/Vocabulary for LOD
      • The importance of sharing schemata
        – Interoperability
        – Generic applications
      • Some famous and frequently used schemata
        – Dublin Core
        – FOAF (Friend-Of-A-Friend)
        – SKOS (Simple Knowledge Organization System)
  40. Usage of Common Vocabularies
      Prefix     Used by
      dc         66 (31.88 %)
      foaf       55 (26.57 %)
      dcterms    38 (18.36 %)
      skos       29 (14.01 %)
      akt        17 (8.21 %)
      geo        14 (6.76 %)
      mo         13 (6.28 %)
      bibo        8 (3.86 %)
      vcard       6 (2.90 %)
      frbr        5 (2.42 %)
      sioc        4 (1.93 %)
      Source: LDOW2011 Presentation, Christian Bizer (Freie Universität Berlin), 2011
  41. (Simple) Dublin Core
      • Started from the library community
      • Now maintained by DCMI (Dublin Core Metadata Initiative)
      • (Simple) Dublin Core
        – Just 15 elements
        – Simple is best
        – No range restriction
      • 15 elements
        – Title
        – Creator
        – Subject
        – Description
        – Publisher
        – Contributor
        – Date
        – Type
        – Format
        – Identifier
        – Source
        – Language
        – Relation
        – Coverage
        – Rights
  42. dc terms
      • Qualified Dublin Core
        – Domain & Range
        – More precise terms
      • Extension of simple dc
      • Properties in the / namespace: abstract, accessRights, accrualMethod, accrualPeriodicity, accrualPolicy, alternative, audience, available, bibliographicCitation, conformsTo, contributor, coverage, created, creator, date, dateAccepted, dateCopyrighted, dateSubmitted, description, educationLevel, extent, format, hasFormat, hasPart, hasVersion, identifier, instructionalMethod, isFormatOf, isPartOf, isReferencedBy, isReplacedBy, isRequiredBy, issued, isVersionOf, language, license, mediator, medium, modified, provenance, publisher, references, relation, replaces, requires, rights, rightsHolder, source, spatial, subject, tableOfContents, temporal, title, type, valid
      • Properties in the /elements/1.1/ namespace: contributor, coverage, creator, date, description, format, identifier, language, publisher, relation, rights, source, subject, title, type
      • Vocabulary Encoding Schemes: DCMIType, DDC, IMT, LCC, LCSH, MESH, NLM, TGN, UDC
      • Syntax Encoding Schemes: Box, ISO3166, ISO639-2, ISO639-3, Period, Point, RFC1766, RFC3066, RFC4646, RFC5646, URI, W3CDTF
      • Classes: Agent, AgentClass, BibliographicResource, FileFormat, Frequency, Jurisdiction, LicenseDocument, LinguisticSystem, Location, LocationPeriodOrJurisdiction, MediaType, MediaTypeOrExtent, MethodOfAccrual, MethodOfInstruction, PeriodOfTime, PhysicalMedium, PhysicalResource, Policy, ProvenanceStatement, RightsStatement, SizeOrDuration, Standard
      • DCMI Type Vocabulary: Collection, Dataset, Event, Image, InteractiveResource, MovingImage, PhysicalObject, Service, Software, Sound, StillImage, Text
      • Terms related to the DCMI Abstract Model: memberOf, VocabularyEncodingScheme
  43. Dcterms properties: subPropertyOf, domain and range
      Dcterms                 subPropertyOf                      Domain                          Range
      contributor             dc:contributor                     rdfs:Resource                   dcterms:Agent
      creator                 dc:creator, dcterms:contributor    rdfs:Resource                   dcterms:Agent
      coverage                dc:coverage                        rdfs:Resource                   dcterms:LocationPeriodOrJurisdiction
      spatial                 dc:coverage, dcterms:coverage      rdfs:Resource                   dcterms:Location
      temporal                dc:coverage, dcterms:coverage      rdfs:Resource                   dcterms:PeriodOfTime
      date                    dc:date                            rdfs:Resource                   rdfs:Literal
      available               dc:date, dcterms:date              rdfs:Resource                   rdfs:Literal
      created                 dc:date, dcterms:date              rdfs:Resource                   rdfs:Literal
      dateAccepted            dc:date, dcterms:date              rdfs:Resource                   rdfs:Literal
      dateCopyrighted         dc:date, dcterms:date              rdfs:Resource                   rdfs:Literal
      dateSubmitted           dc:date, dcterms:date              rdfs:Resource                   rdfs:Literal
      issued                  dc:date, dcterms:date              rdfs:Resource                   rdfs:Literal
      modified                dc:date, dcterms:date              rdfs:Resource                   rdfs:Literal
      valid                   dc:date, dcterms:date              rdfs:Resource                   rdfs:Literal
      description             dc:description                     rdfs:Resource                   rdfs:Resource
      abstract                dc:description, dcterms:description  rdfs:Resource                 rdfs:Resource
      tableOfContents         dc:description, dcterms:description  rdfs:Resource                 rdfs:Resource
      format                  dc:format                          rdfs:Resource                   dcterms:MediaTypeOrExtent
      extent                  dc:format, dcterms:format          rdfs:Resource                   dcterms:SizeOrDuration
      medium                  dc:format, dcterms:format          dcterms:PhysicalResource        dcterms:PhysicalMedium
      identifier              dc:identifier                      rdfs:Resource                   rdfs:Literal
      bibliographicCitation   dc:identifier, dcterms:identifier  dcterms:BibliographicResource   rdfs:Literal
      language                dc:language                        rdfs:Resource                   dcterms:LinguisticSystem
      publisher               dc:publisher                       rdfs:Resource                   dcterms:Agent
      relation                dc:relation                        rdfs:Resource                   rdfs:Resource
      source                  dc:source, dcterms:relation        rdfs:Resource                   rdfs:Resource
      conformsTo              dc:relation, dcterms:relation      rdfs:Resource                   dcterms:Standard
      hasFormat               dc:relation, dcterms:relation      rdfs:Resource                   rdfs:Resource
      hasPart                 dc:relation, dcterms:relation      rdfs:Resource                   rdfs:Resource
      hasVersion              dc:relation, dcterms:relation      rdfs:Resource                   rdfs:Resource
      isFormatOf              dc:relation, dcterms:relation      rdfs:Resource                   rdfs:Resource
      isPartOf                dc:relation, dcterms:relation      rdfs:Resource                   rdfs:Resource
      isReferencedBy          dc:relation, dcterms:relation      rdfs:Resource                   rdfs:Resource
      isReplacedBy            dc:relation, dcterms:relation      rdfs:Resource                   rdfs:Resource
      isRequiredBy            dc:relation, dcterms:relation      rdfs:Resource                   rdfs:Resource
      isVersionOf             dc:relation, dcterms:relation      rdfs:Resource                   rdfs:Resource
      references              dc:relation, dcterms:relation      rdfs:Resource                   rdfs:Resource
      replaces                dc:relation, dcterms:relation      rdfs:Resource                   rdfs:Resource
      requires                dc:relation, dcterms:relation      rdfs:Resource                   rdfs:Resource
      rights                  dc:rights                          rdfs:Resource                   dcterms:RightsStatement
      accessRights            dc:rights, dcterms:rights          rdfs:Resource                   dcterms:RightsStatement
      license                 dc:rights, dcterms:rights          rdfs:Resource                   dcterms:LicenseDocument
      subject                 dc:subject                         rdfs:Resource                   rdfs:Resource
      title                   dc:title                           rdfs:Resource                   rdfs:Literal
      alternative             dc:title, dcterms:title            rdfs:Resource                   rdfs:Literal
      type                    dc:type                            rdfs:Resource                   rdfs:Class
      audience                                                   rdfs:Resource                   dcterms:AgentClass
      educationLevel          dcterms:audience                   rdfs:Resource                   dcterms:AgentClass
      mediator                dcterms:audience                   rdfs:Resource                   dcterms:AgentClass
      accrualMethod                                              dcmitype:Collection             dcterms:MethodOfAccrual
      accrualPeriodicity                                         dcmitype:Collection             dcterms:Frequency
      accrualPolicy                                              dcmitype:Collection             dcterms:Policy
      instructionalMethod                                        rdfs:Resource                   dcterms:MethodOfInstruction
      provenance                                                 rdfs:Resource                   dcterms:ProvenanceStatement
      rightsHolder                                               rdfs:Resource                   dcterms:Agent
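The subPropertyOf column matters in practice because an application that only understands simple `dc:` elements can still use data written with the more precise `dcterms:` properties: under RDFS semantics, a statement made with a sub-property also holds for each of its super-properties. The sketch below applies that rule to a few rows of the table. The function names and the `ex:` subject and object are hypothetical, and only four of the sub-property links are encoded.

```python
# A few sub-property links taken from the table above (child -> parents).
SUB_PROPERTY_OF = {
    "dcterms:creator": ["dc:creator", "dcterms:contributor"],
    "dcterms:contributor": ["dc:contributor"],
    "dcterms:created": ["dc:date", "dcterms:date"],
    "dcterms:date": ["dc:date"],
}


def super_properties(prop: str) -> set[str]:
    """All direct and indirect super-properties of `prop`."""
    seen: set[str] = set()
    stack = [prop]
    while stack:
        for parent in SUB_PROPERTY_OF.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def infer(triples: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """Add (s, q, o) for every super-property q of p in each (s, p, o)."""
    inferred = set(triples)
    for s, p, o in triples:
        for q in super_properties(p):
            inferred.add((s, q, o))
    return inferred


data = {("ex:report", "dcterms:creator", "ex:alice")}  # hypothetical statement
for triple in sorted(infer(data)):
    print(triple)
```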
  44. The Friend of a Friend (FOAF)
      • Metadata to describe persons and their relationships
      • Voluntary project
      • Classes: Agent | Document | Group | Image | LabelProperty | OnlineAccount | OnlineChatAccount | OnlineEcommerceAccount | OnlineGamingAccount | Organization | Person | PersonalProfileDocument | Project
      • Properties: account | accountName | accountServiceHomepage | age | aimChatID | based_near | birthday | currentProject | depiction | depicts | dnaChecksum | familyName | family_name | firstName | focus | fundedBy | geekcode | gender | givenName | givenname | holdsAccount | homepage | icqChatID | img | interest | isPrimaryTopicOf | jabberID | knows | lastName | logo | made | maker | mbox | mbox_sha1sum | member | membershipClass | msnChatID | myersBriggs | name | nick | openid | page | pastProject | phone | plan | primaryTopic | publications | schoolHomepage | sha1 | skypeID | status | surname | theme | thumbnail | tipjar | title | topic | topic_interest | weblog | workInfoHomepage | workplaceHomepage | yahooChatID
      • Example:
        @prefix rdf: <> .
        @prefix foaf: <> .
        @prefix rdfs: <> .
        <#JW>
          a foaf:Person ;
          foaf:name "Jimmy Wales" ;
          foaf:mbox <> ;
          foaf:homepage <> ;
          foaf:nick "Jimbo" ;
          foaf:depiction <> ;
          foaf:interest <> ;
          foaf:knows [
            a foaf:Person ;
            foaf:name "Angela Beesley"
          ] .
        <> rdfs:label "Wikipedia" .
  45. SKOS (Simple Knowledge Organization System)
      • Metadata for taxonomy
        – Hierarchical structure of concepts
          • Invented to represent taxonomies such as subject headings
          • Not the same as the subclass relationship among classes
      • W3C Recommendation, 18 August 2009
  46. SKOS (Simple Knowledge Organization System)
      • SKOS Core (hierarchical concept structure)
        – skos:semanticRelation
        – skos:broaderTransitive
        – skos:narrowerTransitive
        – skos:broader
        – skos:narrower
        – skos:related
        – skos:prefLabel
        – skos:altLabel
        – skos:hiddenLabel
      • The relation properties are linked by subPropertyOf (e.g., skos:broader is a sub-property of skos:broaderTransitive, which is a sub-property of skos:semanticRelation)
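In SKOS, skos:broader records only the direct parent of a concept and is not itself transitive; skos:broaderTransitive is the property under which all ancestors can be collected. The following sketch computes that closure from direct broader links. The concept labels are hypothetical stand-ins for concept URIs (they are not UDC data), and the function name is an assumption for illustration.

```python
# Direct skos:broader links (concept -> direct broader concepts).
# Labels are hypothetical placeholders for concept URIs.
BROADER = {
    "mathematical logic": ["mathematics"],
    "algebra": ["mathematics"],
    "mathematics": ["science"],
}


def broader_transitive(concept: str) -> list[str]:
    """All ancestors of `concept`, nearest first (breadth-first order)."""
    ancestors: list[str] = []
    queue = list(BROADER.get(concept, []))
    while queue:
        current = queue.pop(0)
        if current not in ancestors:
            ancestors.append(current)
            queue.extend(BROADER.get(current, []))
    return ancestors


print(BROADER["mathematical logic"])             # direct parents only
print(broader_transitive("mathematical logic"))  # every ancestor
```

Keeping the two properties separate lets a publisher assert only the hierarchy it actually maintains, while consumers who want "everything above this concept" query the transitive property.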
  47. SKOS (Simple Knowledge Organization System)
      • SKOS Mapping
        – skos:mappingRelation
        – skos:closeMatch
        – skos:exactMatch
        – skos:broadMatch
        – skos:narrowMatch
        – skos:relatedMatch
      • The mapping properties are linked by subPropertyOf, with skos:mappingRelation at the top
  48. Linked Open Vocabulary (LOV)
      • A technical platform for search and quality assessment among the vocabularies ecosystem
        – Register schemata
        – Search schemata
  50. More Info.
      • ocabulary_and_Dataset
  51. Summary for schema
      • Some major schemata
        – DC, DC terms, FOAF, SKOS …
      • More domain-specific schemata
        – CIDOC CRM
        – PRISM
        – …
      • Re-using is highly recommended
        – LOV
  52. Summary
      • Three layers
        – Ontology/Thesaurus/Taxonomy
        – Schema
        – Identification
      • Not just top-down, but rather bottom-up
      • Each layer has its own role
      • Do not pursue the value of each layer in isolation; rather, make a good combination of them