OKFN Korea
Hackathon Day
2013. 06. 22.
Toward Open Data World
What is linked data, open data?
Refine
Modelling
Access
Triple Storage
other topics
image: Leo Oosterloo @ flickr.com
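Seoul City Data Enrichment
• Goals
  – Design or map ontologies to enrich Seoul city data
  – Structure, add semantics, and link: model Seoul city data (unstructured data) with an ontology and link it to external data
  – English localization: provide Seoul city data that non-Korean speakers can use
• Scope
  – About 40 Seoul city datasets
  – Cultural heritage: domestic cultural properties collected by the Cultural Heritage Administration (national treasures, treasures, designated cultural properties, intangible cultural properties, etc.)
• Methodology: model the data by reusing existing RDF vocabularies
  – 1) Data selection: select the datasets to model from the Seoul Open Data Plaza
  – 2) Dataset field review: review how each dataset field maps to the DBpedia ontology (classes, properties)
    – DBpedia ontology: covers concepts about things and Wikipedia infobox fields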
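Seoul City Data Enrichment (continued)
  – For example, to model a "museum":
    – Select the museum infobox template from Wikipedia
    – Select the vocabulary that DBpedia maps to the museum infobox
    – Map the vocabulary to the dataset fields
    – Decide whether to model unmapped fields (including classes and properties); a modelling tool needs to be chosen
    – Apply a URI scheme (needs to be designed separately)
    – Finalize the ontology schema design
  – 3) Data cleaning
    – Clean the data with Google Refine
    – Before adding data in Refine:
      – Location data: convert or add location values in the source data (Seoul city)
      – English names: convert and map the Korean names (manual work required)
    – In Refine:
      – Add Korean and English Wikipedia URLs
      – Add DBpedia and Freebase URLs using Refine reconciliation
      – Build the RDF export mapping skeleton
      – Export RDF and Excel
  – 4) Data upload (RDF or Excel)
• Data store selection
  – Jena, 4Store, …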
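The field-mapping step described above could be sketched roughly as follows. This is only an illustration: the base URI, the field names (`name_ko`, `name_en`, `wikipedia_en`, `phone`), and the sample row are all hypothetical, and the slides leave the actual URI scheme and mapping to be designed. Fields without a mapping are reported separately so that someone can decide whether to model them.

```python
# Hypothetical URI scheme; the slides say the real one still has to be designed.
BASE_URI = "http://example.org/seoul/museum/"

# Hypothetical mapping from dataset fields to (property, language tag).
FIELD_TO_PROPERTY = {
    "name_ko": ("rdfs:label", "ko"),
    "name_en": ("rdfs:label", "en"),
    "wikipedia_en": ("foaf:isPrimaryTopicOf", None),
}


def row_to_triples(row):
    """Turn one dataset row into triples and list the fields left unmapped."""
    subject = BASE_URI + row["id"]
    triples = [(subject, "rdf:type", "dbo:Museum")]
    unmapped = []
    for field, value in row.items():
        if field == "id":
            continue
        if field in FIELD_TO_PROPERTY:
            prop, lang = FIELD_TO_PROPERTY[field]
            triples.append((subject, prop, (value, lang)))
        else:
            unmapped.append(field)  # needs a modelling decision
    return triples, unmapped


if __name__ == "__main__":
    sample = {"id": "m001", "name_ko": "예시 박물관",
              "name_en": "Example Museum", "phone": "02-000-0000"}
    triples, unmapped = row_to_triples(sample)
    for t in triples:
        print(t)
    print("unmapped fields:", unmapped)
```

Keeping the mapping in a plain dictionary makes it easy to review against the DBpedia ontology before any data is converted.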
Contents
1. Modeling Issues
2. Management Issues
Modelling – RDF
Subject Predicate Object
Modelling – RDF
Subject Predicate Object
some school has a name/label some literal
Modelling – RDF
Subject Predicate Object
http://education.data.gov.uk/id/school/401874 has a name/label "Cardiff High School"
Modelling – RDF
Subject Predicate Object
http://education.data.gov.uk/id/school/401874 http://www.w3.org/2000/01/rdf-schema#label "Cardiff High School"
Modelling – RDF
Subject Predicate Object
school:401874 rdfs:label "Cardiff High School"
where
school: = http://education.data.gov.uk/id/school/
rdfs: = http://www.w3.org/2000/01/rdf-schema#
Modelling – RDF
Subject Predicate Object
school:401874 rdfs:label "Cardiff High School"
school:401874 ont:districtAdministrative la:00PT
la:00PT rdfs:label "Cardiff"
Modelling – RDF
Subject Predicate Object
school:401874 rdfs:label "Cardiff High School"
school:401874 ont:districtAdministrative la:00PT
la:00PT rdfs:label "Cardiff"
[Graph: school:401874 -rdfs:label-> "Cardiff High School"; school:401874 -ont:districtAdministrative-> la:00PT; la:00PT -rdfs:label-> "Cardiff"]
Modelling – RDF
Subject Predicate Object
school:401874 rdfs:label "Cardiff High School"
school:401874 ont:districtAdministrative la:00PT
la:00PT rdfs:label "Cardiff"
la:00PT rdfs:label "Caerdydd"@cy
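The triples on these slides can be tried out with nothing more than tuples. The sketch below is a minimal illustration rather than a real RDF library: it expands the two prefixes the slides define (`school:` and `rdfs:`), leaves the undefined `ont:` and `la:` prefixes untouched, and represents a literal as an assumed `(text, language)` pair.

```python
# Prefixes defined on the slides; ont: and la: are not spelled out there.
PREFIXES = {
    "school": "http://education.data.gov.uk/id/school/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def expand(curie):
    """Expand 'prefix:local' to a full URI; unknown prefixes stay unchanged."""
    prefix, sep, local = curie.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    return curie


# Literals are (text, language tag) pairs; None means no language tag.
TRIPLES = [
    ("school:401874", "rdfs:label", ("Cardiff High School", None)),
    ("school:401874", "ont:districtAdministrative", "la:00PT"),
    ("la:00PT", "rdfs:label", ("Cardiff", None)),
    ("la:00PT", "rdfs:label", ("Caerdydd", "cy")),
]


def match(s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [t for t in TRIPLES
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


if __name__ == "__main__":
    print(expand("school:401874"))
    for _, _, label in match(s="la:00PT", p="rdfs:label"):
        print(label)
```

Matching with wildcards is the same idea that a SPARQL triple pattern expresses, only without variables shared between patterns.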
Modelling – vocabularies
Logical modelling
modelling the domain, not a particular data structure
• what exists
• what is asserted? what can you deduce from that?
• not about constraints as such
• monotonic, open world
[Diagram: controlled vocabulary, taxonomy, thesaurus, ontology]
Modelling – vocabularies
unfamiliar terminology but related to
• information architecture and conceptual modelling
• domain-driven design
• ... and yes knowledge representation
RDF Schema is…
Elements of:
• Vocabulary (defining terms)
  – I define a relationship called “prescribed dose.”
• Schema (defining types)
  – “prescribed dose” relates “treatments” to “dosages”
• Taxonomy (defining hierarchies)
  – Any “doctor” is a “medical professional”
Modelling – RDFS
RDF vocabulary description language
classes, types and type hierarchy
ont:School rdf:type rdfs:Class
ont:School rdfs:label "School"
Modelling – RDFS
RDF vocabulary description language
classes, types and type hierarchy
ont:School rdf:type rdfs:Class
ont:School rdfs:label "School"
ont:WelshEstablishment rdf:type rdfs:Class
ont:WelshEstablishment rdfs:subClassOf ont:School
Modelling – RDFS
RDF vocabulary description language
classes, types and type hierarchy
ont:School rdf:type rdfs:Class
ont:School rdfs:label "School"
ont:WelshEstablishment rdf:type rdfs:Class
ont:WelshEstablishment rdfs:subClassOf ont:School
school:401874 rdf:type ont:WelshEstablishment
Modelling – RDFS
RDF vocabulary description language
classes, types and type hierarchy
ont:School rdf:type rdfs:Class
ont:School rdfs:label "School"
ont:WelshEstablishment rdf:type rdfs:Class
ont:WelshEstablishment rdfs:subClassOf ont:School
school:401874 rdf:type ont:WelshEstablishment
⇒ inferred:
school:401874 rdf:type ont:WelshEstablishment
school:401874 rdf:type ont:School
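The type inference on this slide can be sketched as a small fixed-point loop. This is an illustrative toy, not a complete RDFS reasoner: it applies only the rule "if x has type C and C is a subclass of D, then x has type D", and it assumes the subclass direction shown here (ont:WelshEstablishment under ont:School).

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def apply_subclasses(triples):
    """Add the rdf:type triples implied by rdfs:subClassOf until nothing changes."""
    closure = set(triples)
    while True:
        supers = {(s, o) for s, p, o in closure if p == SUBCLASS}
        derived = {(s, RDF_TYPE, sup)
                   for s, p, o in closure if p == RDF_TYPE
                   for sub, sup in supers if o == sub}
        if derived <= closure:
            return closure
        closure |= derived


if __name__ == "__main__":
    asserted = {
        ("school:401874", RDF_TYPE, "ont:WelshEstablishment"),
        ("ont:WelshEstablishment", SUBCLASS, "ont:School"),
    }
    for triple in sorted(apply_subclasses(asserted) - asserted):
        print("inferred:", triple)
```

Because the loop repeats until no new triple appears, chains of subclasses are followed without a separate transitivity step.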
Modelling – RDFS
RDF vocabulary description language
properties, property hierarchy
person:JoeBloggs ont:headOf school:401874
ont:headOf rdf:type rdf:Property
ont:headOf rdfs:subPropertyOf ont:staffAt
⇒ inferred:
person:JoeBloggs ont:headOf school:401874
person:JoeBloggs ont:staffAt school:401874
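The property hierarchy works the same way for statements as the class hierarchy does for types. As a minimal sketch (again a toy rule, not a full reasoner), every statement made with a sub-property is repeated with its super-property:

```python
SUBPROPERTY = "rdfs:subPropertyOf"


def apply_subproperties(triples):
    """Repeat each statement with every super-property of its predicate."""
    closure = set(triples)
    while True:
        supers = {(s, o) for s, p, o in closure if p == SUBPROPERTY}
        derived = {(s, sup, o)
                   for s, p, o in closure
                   for sub, sup in supers if p == sub}
        if derived <= closure:
            return closure
        closure |= derived


if __name__ == "__main__":
    asserted = {
        ("person:JoeBloggs", "ont:headOf", "school:401874"),
        ("ont:headOf", SUBPROPERTY, "ont:staffAt"),
    }
    for triple in sorted(apply_subproperties(asserted) - asserted):
        print("inferred:", triple)
```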
Modelling – RDFS
RDF vocabulary description language
class/property relations
• domain
• range
Already have power to do some vocabulary mapping
• declare classes or properties from different vocabularies to be equivalent:
A rdfs:subClassOf B
B rdfs:subClassOf A
OWL is…
Web Ontology Language (written OWL rather than WOL)
OWL is…
Elements of ontology
• Same/different identity
  – “author” and “auteur” are the same relation
  – two resources with the same “ISBN” are the same “book”
• More expressive type definitions
  – A “cycle” is a “vehicle” with at least one “wheel”
  – A “bicycle” is a “cycle” with exactly two “wheels”
• More expressive relation definitions
  – “sibling” is a symmetric predicate
  – the value of the “favorite dwarf” relation must be one of “happy”, “sleepy”, “sneezy”, “grumpy”, “dopey”, “bashful”, “doc”
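Two of the statements above, the shared ISBN and the symmetric "sibling" relation, are easy to imitate in a few lines. The sketch below only mimics their effect on plain Python data; it is not OWL reasoning, and the resource names and ISBN strings are invented placeholders.

```python
from collections import defaultdict


def same_by_key(resources, key):
    """Group resources that share a value for a key property such as an ISBN."""
    groups = defaultdict(list)
    for uri, props in resources.items():
        if key in props:
            groups[props[key]].append(uri)
    return [sorted(uris) for uris in groups.values() if len(uris) > 1]


def symmetric_closure(pairs):
    """For a symmetric predicate, (a, b) also implies (b, a)."""
    return set(pairs) | {(b, a) for a, b in pairs}


if __name__ == "__main__":
    # Hypothetical resources; the ISBN values are placeholders, not real books.
    books = {
        "ex:bookA": {"isbn": "isbn-0001"},
        "ex:bookB": {"isbn": "isbn-0001"},
        "ex:bookC": {"isbn": "isbn-0002"},
    }
    print(same_by_key(books, "isbn"))
    print(sorted(symmetric_closure({("ex:anna", "ex:ben")})))
```

In OWL itself these would be stated declaratively (a key or inverse functional property, and a symmetric property), and a reasoner would draw the same conclusions.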
What can we do with OWL?
Answer questions of
• Consistency
  – Are there any contradictions in this model?
• Classification
  – What are all the inferred types of this resource?
• Satisfiability
  – Are there any classes in this ontology that cannot possibly have any members?
Building Useful Ontologies
• Developing and maintaining quality ontologies is very challenging
• Users need tools and services, e.g., to help check if ontology is:
  – Meaningful: all named classes can have instances
http://www.aber.ac.uk/compsci/public/media/presentations/OUCL-seminar.ppt
Building Useful Ontologies
• Developing and maintaining quality ontologies is very challenging
• Users need tools and services, e.g., to help check if ontology is:
  – Meaningful: all named classes can have instances
  – Correct: captures intuitions of domain experts
  – Minimally redundant: no unintended synonyms (e.g. “Banana split” and “Banana sundae”)
Modelling – OWL
• richer modelling and semantics
• axioms on properties
  – transitive, symmetric, inverseOf, ...
  – functional, inverse functional
  – equivalent property
• axioms on classes
  – intersection, union, disjoint, equivalent
• restrictions on classes
  – some values from, all values from, cardinality, has value, one of, keys
• axioms on individuals
  – same as, different from, all different
• imports
Modelling – OWL
supports much richer modelling
consistency checking of model
consistency checking of data
• some surprises if used to schema languages
• open world, no unique name assumption
• can extend to closed world checking
inference
• classification
• inferred relationships
Modelling
Spectrum of goals and styles
Lightweight vocabularies
• simple modelling
• just enough agreement to get useful work done
• removing boundaries to enable information to be found and connected
• global consistency not possible
• a little semantics goes a long way
Rich ontological models
• rich domain models
• need expressivity
• consistency is critical
• make complex inferences you can rely on, across data you trust
• knowledge is power
Modelling
Ontology reuse
invest in complete ontology for a domain
• rich but general model, may be modular inside
• strong “ontological commitment”
• e.g. medical ontologies
reuse small, common, vocabularies
• FOAF, SIOC, Dublin Core, Org ...
• pick and choose classes and properties you need
• fill in a few missing links for your domain
generic reusable vocabularies
• Data cube vocabulary
Reusable, public ontologies
• Measurement Units Ontology
• The Event Ontology
• FOAF
Schema.org
schema.org is one of a number of microdata vocabularies
it is a shared collection of microdata schemas for use by webmasters
includes a type hierarchy, like an RDFS schema
• starts with top-level Thing and DataType types
• properties are inherited by descendant types
microdata properties
annotate an item with text-valued properties using the “itemprop” attribute
<div itemscope>
<p>My name is <span itemprop="name">Daniel</span>.</p>
</div>
<div itemscope>
<p>Flavors in my favorite ice cream:</p>
<ul>
<li itemprop="flavor">Lemon sorbet</li>
<li itemprop="flavor">Apricot sorbet</li>
</ul>
</div>
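To see what a consumer gets out of this markup, the two snippets above can be read with Python's standard `html.parser`. This is a simplified sketch, not a conforming microdata parser: it assumes well-formed markup, no nested `itemscope` elements, no void tags such as `<br>`, and text-valued properties only.

```python
from html.parser import HTMLParser


class MicrodataParser(HTMLParser):
    """Collect text-valued itemprop values, one dict per itemscope element."""

    def __init__(self):
        super().__init__()
        self.items = []   # one {property: [values]} dict per itemscope
        self._open = []   # stack of (tag, itemprop name or None)
        self._text = []   # text chunks of the current itemprop element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs:
            self.items.append({})
        prop = attrs.get("itemprop")
        if prop is not None:
            self._text = []
        self._open.append((tag, prop))

    def handle_data(self, data):
        # Only keep text that sits inside an element carrying itemprop.
        if any(prop for _, prop in self._open):
            self._text.append(data)

    def handle_endtag(self, tag):
        if not self._open:
            return
        _, prop = self._open.pop()
        if prop and self.items:
            value = "".join(self._text).strip()
            self.items[-1].setdefault(prop, []).append(value)


def parse_microdata(html):
    parser = MicrodataParser()
    parser.feed(html)
    return parser.items


if __name__ == "__main__":
    page = """
    <div itemscope>
    <p>My name is <span itemprop="name">Daniel</span>.</p>
    </div>
    <div itemscope>
    <p>Flavors in my favorite ice cream:</p>
    <ul>
    <li itemprop="flavor">Lemon sorbet</li>
    <li itemprop="flavor">Apricot sorbet</li>
    </ul>
    </div>
    """
    print(parse_microdata(page))
```

Each `itemscope` becomes one item, and a property that appears twice, like `flavor`, simply collects two values.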
Why should you use schema.org?
Google, Yahoo, Bing
Top types
Schema.rdfs.org
maintains schema.org ↔ RDF mappings
• there are mappings for BIBO, DBpedia, Dublin Core, FOAF, GoodRelations, SIOC, and WordNet
also provides examples, tutorials, and data dumps
Triple Store
Triple Store & RDB
http://blog.gniewoslaw.pl/2012/11/relational-databases-vs-triple-stores/
Storage Solutions for RDF Data
Triple Table (Basic Idea)
• Store all RDF triples in a single table
• Create indexes on combinations of S, P, and O
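The triple-table idea can be tried with SQLite from the Python standard library. The sketch below is an assumption-laden illustration: the table and index names are made up, the choice of three index orders (SPO, POS, OSP) is just one common combination, and literals are stored as plain text with language tags dropped.

```python
import sqlite3


def build_store(triples):
    """Create an in-memory triple table with indexes on S, P, O combinations."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE triples (s TEXT NOT NULL, p TEXT NOT NULL, o TEXT NOT NULL)"
    )
    # One index per access pattern: by subject, by predicate, by object.
    conn.execute("CREATE INDEX idx_spo ON triples (s, p, o)")
    conn.execute("CREATE INDEX idx_pos ON triples (p, o, s)")
    conn.execute("CREATE INDEX idx_osp ON triples (o, s, p)")
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)
    return conn


def district_labels(conn, school):
    """Follow school -> district -> label with a self-join on the triple table."""
    rows = conn.execute(
        """
        SELECT l.o
        FROM triples AS d
        JOIN triples AS l ON l.s = d.o
        WHERE d.s = ? AND d.p = ? AND l.p = ?
        ORDER BY l.o
        """,
        (school, "ont:districtAdministrative", "rdfs:label"),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    store = build_store([
        ("school:401874", "rdfs:label", "Cardiff High School"),
        ("school:401874", "ont:districtAdministrative", "la:00PT"),
        ("la:00PT", "rdfs:label", "Cardiff"),
        ("la:00PT", "rdfs:label", "Caerdydd"),
    ])
    print(district_labels(store, "school:401874"))
```

Every hop in the graph becomes another self-join on the same table, which is why the index combinations matter for query speed.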
The Internet Map
http://internet-map.net/
Credits
These slides are partially based on “Linked data and its role in the semantic web” by Dave Reynolds, Epimorphics Ltd.
