Some thoughts about the gaps across languages and domains through the experience on building the core common vocabularies

Building core vocabularies is becoming important for enabling seamless digital communication and the use of open data. Based on my experience of building a core vocabulary, I will talk about what is easy and what is difficult in building one, and in bridging different core vocabularies across languages and domains.
Presented at the Glocal KO Workshop, Thursday August 13, 2015, Copenhagen

  1. Some thoughts about the gaps across languages and domains through the experience on building the core common vocabularies
     Hideaki Takeda, National Institute of Informatics, takeda@nii.ac.jp
     Glocal KO Workshop, Thursday August 13, 2015, Copenhagen
  2. Who am I?
     Hideaki Takeda, Dr., Eng.
     • Professor, National Institute of Informatics
       – Research institute mainly for computer science
     • Background: computer science, in particular artificial intelligence
     • Current interests: Semantic Web, ontology, Linked Open Data (LOD), social media analysis
     • Social activities
       – President, Linked Open Data Initiative (NPO)
       – Founder, DBpedia Japanese Chapter
       – Specialist, Information-technology Promotion Agency, Japan (IPA)
       – Chair, Japan Link Center (Registration Agency of the International DOI Foundation)
       – Board member, ORCID
  3. Core Vocabularies
     • Background
       – Everything is on the infosphere, i.e., the web
       – Lots of information, lots of data, lots of systems
     • Problems
       – Misunderstanding, mismatching, and “missing links” across different domains
       – Gap between humans and machines (computers)
  4. Core Vocabularies
     • Aim
       – Increase interoperability of information/data
       – Bridge human and machine understanding
     • Target
       – Governmental documents/data
     • Method
       – Define a set of concepts that bridge (human-readable) terms and (computer-processable) symbols (URIs)
       – Start from the most common concepts
  5. Core Vocabularies
     • Activities worldwide
       – USA: NIEM Core
         • NIEM (National Information Exchange Model)
       – Europe: ISA Core Vocabularies
       – UN: United Nations Centre for Trade Facilitation and Electronic Business (UN/CEFACT)
         • Core Components Library (UN/CCL)
       – Japan: IMI Core Vocabulary
  6. ISA Core Vocabularies v1.1
  7. NIEM Architecture
     http://niem.github.io/technical/iepd-versions/
  8. NIEM
     http://reference.niem.gov/niem/guidance/user-guide/vol1/user-guide-vol1.pdf
     http://www.epa.gov/oei/symposium/2010/roy.pdf
  9. IMI Project
     • Supported by
       – Ministry of Economy, Trade, and Industry, Japan
     • Technical framework
       – Data model
       – Core vocabulary
       – Design rules
     • Support framework
       – Tools
         • for data developers
         • for schema developers
       – Database
         • schemas / tools / templates / …
     [Figure: example data model. Person Type (Name, Gender, Gender Code, Birth Date, Address, …), Name Type (Type, Name, Family Name, Given Name, …), and Address Type (Type, Notation, Zip Code, Prefecture, City, …). Properties are typed as String, Code Type, Name Type, or Address Type; Codelist Type, String, and Thing Type also appear.]
  10. IMI as a template for schema
      [Figure: designing an individual form schema from IMI.
       • Target form: “Registration form for Confere[nce]” with the fields Name, Address, Gender, Affiliation, Affiliation Address, and Attending date.
       • IMI: Person Type (Name, Gender, Gender Code, Birth Date, Address, …), Name Type (Type, Name, Family Name, Given Name, …), Address Type (Type, Notation, Zip Code, Prefecture, City, …).
       • Individual form: Person Type (Name, Gender, Address, Affiliation), Name Type (Name), Address Type (Notation, Zip-code), and Event Participation Type (Participant, Date), typed with String, Name, Address, Org., Person, and Date.]
      • Design schema
        – Remove unnecessary items
        – Add necessary items
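The "remove unnecessary items, add necessary items" step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not IMI tooling: the dictionary layout, the `derive_schema` helper, and the type names attached to each field are hypothetical and only loosely follow the figure.

```python
# Hypothetical sketch: deriving an individual form schema from a template.
# The template layout, helper name, and field types are illustrative assumptions.

PERSON_TEMPLATE = {
    "Name": "Name Type",
    "Gender": "String",
    "Gender Code": "Code Type",
    "Birth Date": "String",
    "Address": "Address Type",
}


def derive_schema(template, keep, extra):
    """Keep only the template items named in `keep`, then add the items in `extra`."""
    missing = [name for name in keep if name not in template]
    if missing:
        raise KeyError(f"not in template: {missing}")
    schema = {name: template[name] for name in keep}
    schema.update(extra)
    return schema


# Registration form: drop Gender Code and Birth Date, add Affiliation.
registration_person = derive_schema(
    PERSON_TEMPLATE,
    keep=["Name", "Gender", "Address"],
    extra={"Affiliation": "Org."},
)
print(registration_person)
```

Because the kept items are copied from the template unchanged, two forms derived this way still agree on the meaning of the items they share.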
  11. Roles of IMI
      • Structured concept dictionary
        – Concept dictionary
          • Terms are notations of concepts
            – Each entry is a concept, not a term
          • Class concepts and relation concepts
          • General-specific relations
        – Structured dictionary
          • Concepts form a network, which in turn represents the meaning of the individual concepts
          • A class concept consists of relation concepts representing its attributes, plus general/specific relations
          • A relation concept consists of class concepts connected as domains and ranges, plus general/specific relations
      • Template for schemata
        – Add or remove items for specific needs
  12. Use of IMI
      • Define the concept model
      • “Serialize” it into specific “physical” forms
      • Use a suitable physical form
      IMI Concept Model, serialized as:
      • RDF: for open data
        – Relaxed definition
        – Interoperability with other open data schemata
      • XML: for data exchange
        – Strict definition
        – Interoperability with DB schemata
      • Natural language form: for spreadsheets and documents
        – Relaxed definition with simple structure
        – Readability by humans
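The idea of one concept model with several physical forms can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the `CONCEPT` layout, the `ex:` prefix, and the three helper functions are hypothetical and do not reproduce the actual IMI serializations. The Turtle fragment omits prefix declarations and drops cardinality (a relaxed form), while the XML form keeps it (a stricter form).

```python
import xml.etree.ElementTree as ET

# Hypothetical concept model: one class with (property, range, cardinality) triples.
CONCEPT = {
    "class": "Person",
    "properties": [
        ("name", "Name", "0..n"),
        ("gender", "string", "0..1"),
    ],
}


def to_turtle(concept, prefix="ex"):
    """Relaxed form: RDF Schema statements in Turtle syntax, without cardinality."""
    cls = concept["class"]
    lines = [f"{prefix}:{cls} a rdfs:Class ."]
    for prop, _range, _card in concept["properties"]:
        lines.append(f"{prefix}:{prop} a rdf:Property ; rdfs:domain {prefix}:{cls} .")
    return "\n".join(lines)


def to_xml(concept):
    """Stricter form: one element per property, keeping type and cardinality."""
    root = ET.Element("class", name=concept["class"])
    for prop, rng, card in concept["properties"]:
        ET.SubElement(root, "property", name=prop, type=rng, cardinality=card)
    return ET.tostring(root, encoding="unicode")


def to_text(concept):
    """Human-readable form for documents and spreadsheets."""
    props = ", ".join(prop for prop, _range, _card in concept["properties"])
    return f"A {concept['class']} has: {props}."


print(to_turtle(CONCEPT))
print(to_xml(CONCEPT))
print(to_text(CONCEPT))
```

All three outputs are generated from the same `CONCEPT` value, so a change to the model propagates to every physical form.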
  13. IMI Core vocabulary v2.2
      • Published on Feb. 3, 2015
      • 48 core class terms
        – person, address, facility, location, date, …
      • 206 core property terms
        – name of person, birth date, birth country, …
      • Multiple formats
        – RDF Schema, XML Schema, and documents for humans
      http://imi.ipa.go.jp/ns/core/2/
  14. Class definition (person class)
      person 人
      Description: 人の情報を表現するためのデータ型 (Data type to describe a person)
      Inherits from: ic:実体型
      Property (en) | Property (ja) | Data type | Cardinality | Description (ja) | Description (en)
      ID | ID | ic:ID型 | 0..n | ID | Identification of a Person
      Name of person | 氏名 | ic:氏名型 | 0..n | 氏名 | Name of a Person
      Gender | 性別 | xsd:string | 0..1 | 性別の表記 | Gender of a Person
      Gender code | 性別コード | ic:コード型 | 0..1 | 性別コード | Gender of a Person
      Birth date | 生年月日 | ic:日付型 | 0..1 | 生年月日 | Date of Birth of a Person
      Death date | 死亡年月日 | ic:日付型 | 0..1 | 死亡年月日 | Date of Death of a Person
      Residence address | 住所 | ic:住所型 | 0..n | 現住所 | Present address of a Person
      Domicile of origin | 本籍 | ic:住所型 | 0..1 | 本籍 | Legal residence address of a Person
      Contact information | 連絡先 | ic:連絡先型 | 0..n | 連絡先 | Contact information of a Person
      Nationality | 国籍 | xsd:string | 0..n | 国籍の表記 | A country that assigns rights, duties, and privileges to a person because of the birth or naturalization of the person in that country.
      Nationality code | 国籍コード | ic:コード型 | 0..n | 住民基本台帳で利用されている国籍コード | A country that assigns rights, duties, and privileges to a person because of the birth or naturalization of the person in that country.
      Birth country | 出生国 | xsd:string | 0..1 | 生まれた国名 | A location where a person was born.
      Birth country code | 出生国コード | ic:コード型 | 0..1 | 生まれた国のコード | A location where a person was born.
      Birth place | 出生地 | ic:住所型 | 0..1 | 生まれた場所 | A location where a person was born.
  15. Class Structure
      • A class term has property terms as sub-elements, and each property term can refer to a class term. That class term in turn has its own list of property terms. This builds a layered structure of terms:
        person 人
          name: ic:氏名型
          contact: ic:連絡先型
          : :
        name 氏名
          Family name: xsd:string
          Romanized family name: xsd:string
          : :
        contact 連絡先
          Phone number: ic:電話番号型
          Address: ic:住所型
          : :
        phone number 電話番号
          : :
        address 住所
          Country: xsd:string
          Prefecture: xsd:string
          : :
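The layered structure described above can be walked recursively. The following Python sketch is only an illustration: the English keys stand in for the Japanese IMI terms, the `CLASSES` layout and the `leaf_paths` helper are hypothetical, and only the properties visible on the slide are included (the elided ones are left out rather than guessed).

```python
# Each class term maps its property terms to the class term (or XSD datatype)
# that the property refers to. English keys are illustrative stand-ins.
CLASSES = {
    "person": {"name": "name", "contact": "contact"},
    "name": {"family name": "xsd:string", "romanized family name": "xsd:string"},
    "contact": {"phone number": "phone number", "address": "address"},
    "phone number": {},  # properties elided on the slide
    "address": {"country": "xsd:string", "prefecture": "xsd:string"},
}


def leaf_paths(class_name, classes, prefix=()):
    """Yield (path, datatype) for every property chain that ends in a datatype.

    Assumes the class graph has no cycles; a cyclic vocabulary would need
    a visited set.
    """
    for prop, target in classes[class_name].items():
        path = prefix + (prop,)
        if target in classes:
            # The property refers to another class term: descend one layer.
            yield from leaf_paths(target, classes, path)
        else:
            yield path, target


for path, datatype in leaf_paths("person", CLASSES):
    print(".".join(path), "->", datatype)
```

The dotted paths this prints have the same shape as the identifiers used in the mapping results, such as ic:人型.ic:氏名.ic:姓.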
  16. Concept of the IMI framework
      International interoperability was a major consideration in preparing IMI.
      [Figure: IMI consists of a Core Vocabulary, a Cross Domain Vocabulary, and Domain-specific Vocabularies (Geographical Space / Facilities, Transportation, Disaster Prevention, Finance). Example terms shown: Shelter, Location, Hospital, Station, Disaster Restoration Cost. IMI is shown alongside the Japanese local government standard (APPLIC), de facto standards (DC, foaf, etc.), NIEM (US), ISA (EU), and Schema.org.]
  17. Mapping between concepts in different core vocabularies
      • Difficulty of concept-concept mapping
        – Matching meanings tends to become a very abstract discussion
      [Figure: two concepts in an ontology, each with a reference into the real world; the match between them is marked “?”.]
  18. Mapping between concepts in different core vocabularies
      • Difficulty of concept-concept mapping
        – Matching meanings tends to become a very abstract discussion
        – Matching references is easier
      [Figure: two concepts in an ontology, each with a reference into the real world; the match between them is marked “?”.]
  19. Mapping between concepts in different core vocabularies
      • Difficulty of concept-concept mapping
        – Syntactical mapping vs. semantic mapping
          • Consider only what a concept refers to in the real world, not how it is represented in systems.
      [Figure: two concepts in an ontology, each with a reference; the match between them is marked “?”. The diagram is divided into a Systems World and a Cognitive World.]
  20. Person
      person 人
      Description: 人の情報を表現するためのデータ型 (Data type to describe a person)
      Inherits from: ic:実体型
      Property (en) | Property (ja) | Data type | Cardinality | Description (ja) | Description (en)
      ID | ID | ic:ID型 | 0..n | ID | Identification of a Person
      Name of person | 氏名 | ic:氏名型 | 0..n | 氏名 | Name of a Person
      Gender | 性別 | xsd:string | 0..1 | 性別の表記 | Gender of a Person
      Gender code | 性別コード | ic:コード型 | 0..1 | 性別コード | Gender of a Person
      Birth date | 生年月日 | ic:日付型 | 0..1 | 生年月日 | Date of Birth of a Person
      Death date | 死亡年月日 | ic:日付型 | 0..1 | 死亡年月日 | Date of Death of a Person
      Residence address | 住所 | ic:住所型 | 0..n | 現住所 | Present address of a Person
      Domicile of origin | 本籍 | ic:住所型 | 0..1 | 本籍 | Legal residence address of a Person
      Contact information | 連絡先 | ic:連絡先型 | 0..n | 連絡先 | Contact information of a Person
      Nationality | 国籍 | xsd:string | 0..n | 国籍の表記 | A country that assigns rights, duties, and privileges to a person because of the birth or naturalization of the person in that country.
      … | … | … | … | 住民基本台帳 … | A country that assigns rights, duties, and privileges to a …
      [Figure overlay: this definition sits in the Systems World; its correspondence to the Cognitive World is marked “?”.]
  21. Postal Code
      [Figure: in the Systems World, the values “101-8430”^^xsd:string (postal code in Japan) and “SW1A 0AA”@en (postal code in Europe); their correspondence in the Cognitive World is marked “?”.]
  22. Semantic Mapping
      • Semantic mapping
        – Mapping on the cognitive layer
        – Two ways of judging a mapping
          • Extensional mapping
            – Check whether ‘things’ are shared
            – e.g., person
            – Mostly for class mapping
          • Intensional mapping
            – Check whether ‘values’ are shared
            – e.g., postal-code
            – Mostly for property mapping
      • Syntactical mapping
        – Mapping on the systems layer
  23. Types of matching: SKOS
      • Exact match
      • Close match
      • Broad/narrow match
      • Related match
  24. Close match
      • Close match: nearly, but not exactly, matched
      • Extensional mapping
        – The coverage of ‘things’ overlaps to a large extent
          • The coverage of ‘Country’ is slightly different
        – The ‘things’ are close
          • The reference of ‘Person’ is slightly different (person vs. legal person)
      • Intensional mapping
        – The coverage of ‘values’ overlaps to a large extent
  25. Broad match/narrow match
      • Broad/narrow match
        – One subsumes the other
      • Extensional mapping
        – The coverage of ‘things’ of one is subsumed by the other, i.e., the subset is an exact match
      • Intensional mapping
        – The coverage of ‘values’ of one is subsumed by the other, i.e., the subset is an exact match
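The coverage-based reading of exact, broad/narrow, and close matches on the preceding slides can be sketched with plain set operations. This is a rough illustration, not a procedure from the talk: the `classify_match` helper and its `close_threshold` value are assumptions, and SKOS itself defines no numeric threshold. The same function applies to sets of 'things' (extensional mapping) and sets of 'values' (intensional mapping).

```python
def classify_match(source, target, close_threshold=0.8):
    """Classify the mapping from `source` to `target` by comparing coverage.

    `source` and `target` are collections of 'things' (extensional mapping)
    or of 'values' (intensional mapping). The threshold for a close match is
    an arbitrary assumption made for this sketch.
    """
    source, target = set(source), set(target)
    if source == target:
        return "exactMatch"
    if source < target:
        return "broadMatch"   # the target covers more: it is the broader concept
    if source > target:
        return "narrowMatch"  # the target covers less: it is the narrower concept
    union = source | target
    overlap = len(source & target) / len(union)
    if overlap >= close_threshold:
        return "closeMatch"
    return "undecided"        # left to human judgment


print(classify_match({"JP", "FR", "DE"}, {"JP", "FR", "DE", "US"}))
print(classify_match(range(1, 11), range(2, 12)))
```

The relation names follow the SKOS convention in which the object of a broad match is the broader concept. Real vocabularies rarely expose their full extension, so in practice such a check would run on samples and only suggest candidates for review.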
  26. Other kinds of matching
      • Complicated match
        – An element of one system matches a combination of two or more elements of the other.
        – “Pathway” match
          • A single property matches a combination of two or more properties
          • Example: IdentifierIssuingAuthority has related match IMI ic:ID型.ic:ID体系.ic:発行者
        – “Conditional” match
          • An element matches the other element if some condition holds
          • Example: LegalEntityRegisteredAddress has broad match IMI ic:法人型.ic:住所. It is an exact match if the value of ic:住所.種別 is "登記住所" (registered address).
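A conditional match is a mapping whose relation depends on the data being mapped. The Python sketch below illustrates this for the registered-address example; the dictionary standing in for one ic:住所 value and the `registered_address_relation` helper are hypothetical and serve only to make the rule concrete.

```python
def registered_address_relation(address):
    """Relation of LegalEntityRegisteredAddress to ic:法人型.ic:住所 for one value.

    `address` is a dict standing in for a single ic:住所 value; the key "種別"
    (address kind) and the dict layout are assumptions for illustration.
    """
    if address.get("種別") == "登記住所":  # "registered address"
        return "exactMatch"
    return "broadMatch"


print(registered_address_relation({"種別": "登記住所", "表記": "..."}))
print(registered_address_relation({"表記": "..."}))
```

A static mapping table can only record the broad match, which is why the condition has to be carried as a separate rule.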
  27. Results
      Core Vocabulary Identifier | Mapping relation | Data model | Identifier
      Address | Has exact match | IMI | ic:住所型
      AddressAddressArea | Has narrow match | IMI | ic:住所型.ic:町名
      AddressAddressArea | Has narrow match | IMI | ic:住所型.ic:丁目
      AddressAddressArea | Has narrow match | IMI | ic:住所型.ic:番地補足
      AddressAddressArea | Has narrow match | IMI | ic:住所型.ic:番地
      AddressAddressArea | Has narrow match | IMI | ic:住所型.ic:号
      AddressAddressID | Has exact match | IMI | ic:住所型.ic:ID
      AddressAdminUnitL1 | Has exact match | IMI | ic:住所型.ic:国
      AddressAdminUnitL2 | Has narrow match | IMI | ic:住所型.ic:都道府県
      AddressFullAddress | Has exact match | IMI | ic:住所型.ic:表記
      AddressLocatorDesignator | Has narrow match | IMI | ic:住所型.ic:ビル番号
      AddressLocatorDesignator | Has narrow match | IMI | ic:住所型.ic:部屋番号
      AddressLocatorName | Has narrow match | IMI | ic:住所型.ic:ビル名
      AddressPOBox | Has related match | IMI | ic:住所型.ic:方書
      AddressPostCode | Has exact match | IMI | ic:住所型.ic:郵便番号
      AddressPostName | Has narrow match | IMI | ic:住所型.ic:市区町村
      AddressPostName | Has narrow match | IMI | ic:住所型.ic:区
      AddressThoroughfare | Has no match | IMI |
      Agent | Has exact match | IMI | ic:実体型
  28. Results
      Core Vocabulary Identifier | Mapping relation | Data model | Identifier
      Identifier | Has exact match | IMI | ic:ID型
      IdentifierIdentifier | Has exact match | IMI | ic:ID型.ic:識別値
      IdentifierIssueDate | Has no match | IMI |
      IdentifierIssuingAuthority | Has related match | IMI | ic:ID型.ic:ID体系.ic:発行者
      IdentifierIssuingAuthorityURI | Has exact match | IMI | ic:ID型.ic:ID体系.ic:URI
      IdentifierType | Has no match | IMI |
      JurisdictionIdentifier | Has related match | IMI | ic:国籍コード
      JurisdictionName | Has related match | IMI | ic:国籍
      LegalEntity | Has exact match | IMI | ic:法人型
      LegalEntityAddress | Has broad match | IMI | ic:法人型.ic:住所
      LegalEntityAlternativeName | Has no match | IMI |
      LegalEntityCompanyActivity | Has close match | IMI | ic:法人型.ic:事業種目
      LegalEntityCompanyStatus | Has related match | IMI | ic:法人型.ic:活動状況
      LegalEntityCompanyType | Has exact match | IMI | ic:法人型.ic:組織種別
      LegalEntityIdentifier | Has exact match | IMI | ic:法人型.ic:ID
      LegalEntityLegalIdentifier | Has no match | IMI |
      LegalEntityLegalName | Has broad match | IMI | ic:法人型.ic:名称.表記
      LegalEntityLocation | Has related match | IMI | ic:法人型.ic:地物.説明
      LegalEntityRegisteredAddress | Has broad match | IMI | ic:法人型.ic:住所
      Location | Has exact match | IMI | ic:場所型
      LocationAddress | Has exact match | IMI | ic:場所型.ic:住所
      LocationGeographicIdentifier | Has broad match | IMI | ic:場所型.ic:地理識別子
      LocationGeographicName | Has exact match | IMI | ic:場所型.ic:名称.ic:表記
      LocationGeometry | Has exact match | IMI | ic:場所型.ic:地理座標
  29. Results
      Core Vocabulary Identifier | Mapping relation | Data model | Identifier
      Person | Has exact match | IMI | ic:人型
      PersonAddress | Has exact match | IMI | ic:人型.ic:住所
      PersonAlternativeName | Has broad match | IMI | ic:人型.ic:氏名.ic:姓名
      PersonBirthName | Has broad match | IMI | ic:人型.ic:氏名.ic:姓名
      PersonCitizenship | Has no match | IMI |
      PersonCountryOfBirth | Has exact match | IMI | ic:人型.ic:出生国
      PersonCountryOfDeath | Has no match | IMI |
      PersonDateOfBirth | Has exact match | IMI | ic:人型.ic:生年月日
      PersonDateOfDeath | Has exact match | IMI | ic:人型.ic:死亡年月日
      PersonFamilyName | Has exact match | IMI | ic:人型.ic:氏名.ic:姓
      PersonFullName | Has exact match | IMI | ic:人型.ic:氏名.ic:姓名
      PersonGender | Has exact match | IMI | ic:人型.ic:性別コード
      PersonGivenName | Has exact match | IMI | ic:人型.ic:氏名.ic:名
      PersonIdentifier | Has broad match | IMI | ic:人型.ic:ID
      PersonPatronymicName | Has no match | IMI | ic:人型.ic:氏名.ic:姓名
      PersonPlaceOfBirth | Has narrow match | IMI | ic:人型.ic:出生地
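Mapping results like these are commonly published as SKOS mapping statements. The sketch below emits Turtle for the class-level rows only, since a property path such as ic:人型.ic:住所 is not a single term and would need a richer representation. The namespace URIs passed as `src_ns` and `imi_ns` are placeholders rather than the vocabularies' real namespaces, and the `to_turtle` helper is hypothetical; only the SKOS namespace and property names are standard.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Class-level rows from the results: (source term, SKOS relation, IMI term).
ROWS = [
    ("Person", "exactMatch", "人型"),
    ("Address", "exactMatch", "住所型"),
    ("LegalEntity", "exactMatch", "法人型"),
    ("Location", "exactMatch", "場所型"),
    ("Agent", "exactMatch", "実体型"),
]


def to_turtle(rows, src_ns="http://example.org/core#", imi_ns="http://example.org/imi#"):
    """Render mapping rows as SKOS statements in Turtle.

    Both default namespace URIs are placeholders (assumptions), not the
    vocabularies' published namespaces.
    """
    lines = [
        f"@prefix skos: <{SKOS}> .",
        f"@prefix src: <{src_ns}> .",
        f"@prefix ic: <{imi_ns}> .",
        "",
    ]
    for source, relation, target in rows:
        lines.append(f"src:{source} skos:{relation} ic:{target} .")
    return "\n".join(lines)


print(to_turtle(ROWS))
```

Rows marked "Has no match" produce no statement at all, which is one reason a plain SKOS export loses information that the tables keep.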
  30. Bridging core and domain vocabularies (work in progress)
      • Aim: extend the core vocabulary to domain vocabularies
        – Agriculture
        – Finance
        – Traffic
        – …
      • Task:
        – Can concepts really be shared between the core and the domains?
  31. Agricultural Activity Ontology (AAO)
      http://cavoc.org/aao/
      [Figure: AAO class hierarchy; terms shown, in order:]
      • Agricultural activity
      • crop production activity
      • activity for propagation
      • activity in the vegetative growth stage
      • activity in the reproductive growth stage
      • activity for environment control
      • activity for soil control
      • activity for climate control
      • activity for water control
      • activity for biotic control
      • activity for chemical control
      • post production activity
      • activity for harvesting
      • activity for processing
      • activity for extending shelf-life
      • activity for wrapping
      • indirect activity
      • activity for preparing materials
      • activity for cleaning
      • activity for transport
      • activity for monitoring
      • activity for maintaining farm equipment
      • administrative activity
      • activity for business administration
  32. An example: “activity” (and “event”)
      • S: (n) activity (any specific behavior) "they avoided all recreational activity"
        – direct hyponym / full hyponym
        – direct hypernym / inherited hypernym / sister term
          • S: (n) act, deed, human action, human activity (something that people do or cause to happen)
            – S: (n) event (something that happens at a given place and time)
        – [WordNet]
      • Each activity is a Happening which involves volition and participants. It has temporal dimension. It is distinguished from Events by the fact that the activity does not trigger change of state and does not have a conceptual end point.
        – [PROTON Extent module (a lightweight upper-level ontology)]
      • Activity: This class represents the abstract content of an event, which may be repeated many times, once or never. For example a training course, or a play.
        – [The Event Programme Vocabulary (prog)]
      • E5 Event
        – Subclass of: E4 Period
        – Superclass of: E7 Activity, E63 Beginning of Existence, E64 End of Existence
      • E7 Activity
        – Subclass of: E5 Event
        – Superclass of: E8 Acquisition, E9 Move, E10 Transfer of Custody, E11 Modification, E13 Attribute Assignment, E65 Creation …
        – [CIDOC Conceptual Reference Model]
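The contrast between these definitions shows up as soon as the subclass links are followed mechanically. The sketch below encodes only the two CIDOC CRM links quoted above; the `is_subclass` helper is a hypothetical illustration, not part of any of the cited vocabularies.

```python
# Subclass links quoted above for the CIDOC CRM.
SUBCLASS_OF = {
    "E7 Activity": "E5 Event",
    "E5 Event": "E4 Period",
}


def is_subclass(cls, ancestor, subclass_of):
    """True if `ancestor` is reachable from `cls` by following subclass links.

    Checks proper subclasses only: a class does not count as its own subclass.
    """
    seen = set()
    while cls in subclass_of and cls not in seen:
        seen.add(cls)  # guards against accidental cycles in the table
        cls = subclass_of[cls]
        if cls == ancestor:
            return True
    return False


print(is_subclass("E7 Activity", "E5 Event", SUBCLASS_OF))
print(is_subclass("E5 Event", "E7 Activity", SUBCLASS_OF))
```

In the CIDOC CRM every activity is therefore an event, whereas the PROTON definition explicitly distinguishes activities from events, so mapping the two "activity" concepts as an exact match would import a subclass relation that one side rejects.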
  33. Summary
      • Sharing concepts is a long road
      • There is no ground truth
        – Step-by-step understanding of the world
        – Careful consensus making
      • A more flexible framework is needed
        – Simple mapping is not satisfactory
