Illuminating Chaos
Using Semantics to Harness the
Web
Dagobert Soergel
Department of Library and Information Studies,
University at Buffalo
1
AAT Workshop
Academia Sinica, Taipei
June 7, 2010
Outline
• Overview of issues
• Semantics for whom and for what
• Representation to assist with query formulation
• Representation for comprehension
• Systems of representation
• Support for finding: Indexing
• Building KOS
• How can it all get done
• Zeroing in on the conceptual foundation
• Issues in the realm of AAT Taiwan
2
Semantics, structure,
meaning
• Classification
• Meaningful arrangement
• All kinds of relationships
3
Semantics for whom?
• Semantics for computer systems
inference
answers and solutions instead of lots of Web pages
• Semantics for people
assist users in creating meaning and making sense
structure for learning
4
Semantics for what
• Finding
• Comprehending
• To know what to look for, a user (a person or a
system) must first comprehend something – a cycle
• Both finding and comprehending require navigating in
an information space – need meaningful structure
5
Representation
to assist with query
formulation
6
Problem clarification for
search
JG prevention approach
JG10 . individual-level prevention
JG10.2 . . individual- vs. family-focused prevention
JG10.2.2 . . . individual-focused prevention
JG10.2.4 . . . family-focused prevention
JG10.4 . . prevention through information and education
JG10.4.2 . . . social marketing prevention approach
JG10.4.4 . . . prevention through information dissemination
JG10.4.6 . . . prevention through education
JG10.4.8 . . . peer prevention
JG10.8 . . prevention through spirituality and religion
JG10.10 . . prevention through public commitment
JG12 . environmental-level prevention
JG12.4 . . social policy prevention approach
JG14 . multi-level prevention
7
Problem clarification for
search
churches (buildings)
. <church buildings by function>
. . chapels of ease (buildings)
. . fortified churches
. . pilgrimage churches (buildings)
. . procathedrals (buildings)
. <church buildings by location or context>
. . abbey churches
. . cathedrals (buildings)
. . cave churches
. . collegiate churches
. . . . .
. <churches by form>
. . double churches
. . hall churches
. . rock-cut churches
. . stave churches
8
Browse structure for search
• Make a table of contents for the entire Wikipedia
using UDC
• Make a classified (hierarchically structured) index for
an art textbook using the Art and Architecture
Thesaurus
• Make a classified index for the collection of an art
museum using the Art and Architecture Thesaurus
9
10
Facet structure to guide search
A Area of ability combines with B Degree of ability
A1 psychomotor ability
A2 senses
A2.1 . vision
A2.1.1 . . night vision
A2.2 . hearing
A3 intelligence
A4 artistic ability
B1 low degree of ability, disabled
B2 average degree of ability
B3 above average degree of ability
B3.1 . very high degree of ability
Examples A2.1B1 visually impaired
A2.2B1 hearing impaired
A3B1 mentally handicapped
A3B3 intellectually gifted
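The facet combination above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original deck: the dictionaries simply copy the notations from the slide, and the function names `combine` and `narrower` are hypothetical.

```python
# In-memory copy of the two facets shown on the slide.
AREA = {
    "A1": "psychomotor ability",
    "A2": "senses",
    "A2.1": "vision",
    "A2.1.1": "night vision",
    "A2.2": "hearing",
    "A3": "intelligence",
    "A4": "artistic ability",
}
DEGREE = {
    "B1": "low degree of ability, disabled",
    "B2": "average degree of ability",
    "B3": "above average degree of ability",
    "B3.1": "very high degree of ability",
}


def combine(area_code, degree_code):
    """Build a compound notation and a readable gloss from one value per facet."""
    gloss = f"{AREA[area_code]} / {DEGREE[degree_code]}"
    return area_code + degree_code, gloss


def narrower(facet, code):
    """Codes hierarchically below `code`, relying on the dotted notation."""
    return [c for c in facet if c.startswith(code + ".")]


notation, gloss = combine("A2.1", "B1")  # the slide labels A2.1B1 "visually impaired"
```

A facet-based search interface could offer one pick list per facet and call something like `combine` to form the query concept.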
Provide front-ends to assist
users
• Elicit a query with a facet-based interface;
the system then creates a free-text query
• Create a structure that normalizes terms assigned
through social tagging and arranges them in a
meaningful structure.
The user can then browse and select concepts
The system maps to all appropriate tags
11
Problem space for diseases
Used by people or computer systems
for search and arranging search output
Pathologic process
Body system affected
12
Pathologic process
Body system affected
Cause (condition, organism, chemical substance,
environmental factors)
Treatment
Representation for
comprehension
A question of information representation (knowledge
representation)
• For computer systems: formal representation
• For people: Text, images, graphical representation,
visualization
• Transformations between representations, such as
• from text to formal: information extraction
• from text to a map showing the text structure
• from a conventional thesaurus display to a concept map
13
Two representations
Text (for people)
High blood pressure is a serious disease often caused
by being overweight. In kids 4 – 12 it can be treated
highly effectively with Nystatin.
Formal representation (for computer system)
Causation (HighBloodPressure, Obesity)
Treatment (HighBloodPressure, {Human, [Age, 4-12y]},
Nystatin, [Effectiveness, 4])
14
Answering questions
Question
How can high blood pressure be prevented?
Answer
Lose weight?
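A minimal sketch of how a system might derive such an answer from the formal representation. The `Causation` statement is the one from the earlier slide; the `COUNTERACTS` table and the function name are assumptions added only for illustration.

```python
from collections import namedtuple

# Argument order follows the slide: Causation (HighBloodPressure, Obesity)
Causation = namedtuple("Causation", ["effect", "cause"])

STATEMENTS = [Causation(effect="HighBloodPressure", cause="Obesity")]

# Hypothetical lookup of an action that counteracts a condition
# (not part of the statements shown in the deck).
COUNTERACTS = {"Obesity": "lose weight"}


def prevention_options(condition):
    """Answer 'how can X be prevented?' by counteracting each known cause of X."""
    return [
        COUNTERACTS[s.cause]
        for s in STATEMENTS
        if s.effect == condition and s.cause in COUNTERACTS
    ]
```

The inference step is trivial here; the point is that the answer comes from combining stored statements rather than from retrieving a page that happens to contain it.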
15
Two representations
Text
Kids begin grazing independently from their mothers at
three months
Formal representation
Separation (Mother, Child, {Goat, [Age, 3m]})
16
Information extraction
• Information extraction produces representations
needed for the semantic Web
• Also useful for people if formal expressions are
transformed into sentences that state the findings of
a document as individual "bullets"
• Could arrange statements from one or more
documents in UDC order as a kind of summary
• Information extraction needs rich KOS
17
Representation of text structure
18
Meaningful arrangement of
terms
in document representations
19
• Terms assigned in social tagging
• Terms assigned from controlled vocabulary, e.g., AAT
The Martyrdom of Saint Bartholomew
20
Tags arranged alphabetically
• 1634
• 17th century
• bearded
• biblical
• Christ’s sacrifice
and crucifixion
{Christ metaphor}
• confronts
• executioner
• expressive hands
• flayed alive
• gestures
• Intensity
• Jusepe de Ribera
• luminous
• lurking
• martyrdom
• mystical experience
• nude body
• old man
• physical anguish
• profound emotion {emotion}
• Pulls the viewer into the scene
• religious
• Saint Bartholomew
• torture
21
Tags arranged
by how they relate to the
image
22
Matching topic (Direct)
• Image theme
• martyrdom
• mystical experience
• biblical
• religious
• Image content: Focal
• Reference
• nude body
• old man
• Saint Bartholomew
• executioner
• knife
• Elaboration (Adj.)
• Bearded
• physical anguish
• profound emotion {emotion}
• luminous
• Elaboration (Adv.)
• expressive hands
• gestures
• confronts
• flayed alive
• torture
• Image content: Peripheral
• Elaboration (Adv.)
• lurking
23
Comparison
• By similarity:
Metaphor / analogy
• Christ’s sacrifice
and crucifixion
{Christ metaphor}
Cause / Effect
• Reaction or feeling
• Intensity
• Effect / Outcome
• Pulls the viewer into the scene
Context
• Biographic info: Artist
• Jusepe de Ribera
• Biographic info: Time / period
• 1634
• 17th century
24
Comparison, cause/effect, context
Tags arranged
by how they relate to the
image
with descriptors from the
Art and Architecture
Thesaurus
25
• Image theme AAT
• martyrdom sacrifice
• mystical experience mysticism
• Biblical biblical stories
• Religious religion and religious concepts
• Image content: Focal
• Reference
• nude body nudes (representations)
• old man elderly
• Saint Bartholomew saints
• Executioner executioners
• Knife knives
• Elaboration (Adj.)
• Bearded
• physical anguish pain (sensation)
• profound emotion {emotional}
• Luminous shine
Matching topic (Direct)
26
Matching topic (Direct)
• Image content: Focal
• Elaboration (Adv.)
• expressive hands hands
• Gestures gesture
gesture drawings
• confronts
• flayed alive
• torture torturing
• Image content: Peripheral
• Elaboration (Adv.)
• lurking
27
Comparison
• By similarity:
Metaphor / analogy
• Christ’s sacrifice
and crucifixion
{Christ metaphor}
Cause / Effect
• Reaction or feeling
• Intensity
• Effect / Outcome
• Pulls the viewer into the scene
Context
• Biographic info: Artist
• Jusepe de Ribera
• Biographic info: Time / period
• 1634
• 17th century
28
Comparison, cause/effect, context
No AAT terms
Support comprehension
through links to KOS
• Map text term to concept in KOS,
show definition,
show place in hierarchical structure
29
Example
mysticism
Note: Refers in a general sense to a spiritual quest for hidden
truth, the goal of which is to be united with the divine. It also
refers more specifically to a belief in the existence of important
realities beyond perceptual or intellectual understanding that are
accessible by subjective experience, such as by intuition or
meditation. Forms of mysticism are found in all major religions
as well as in secular experience.
30
Example, continued
Associated Concepts Facet
. Associated Concepts
. . <philosophical concepts>
. . . <philosophical movements and attitudes>
. . . . aestheticism (philosophical movements and attitudes)
. . . . existentialism
. . . . holism
. . . . idealism (philosophical movement)
. . . . individualism
. . . . mysticism
. . . . . Hasidism
. . . . spiritualism
. . . . utilitarianism
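The lookup described above (map a text term to a KOS concept, show its definition, show its place in the hierarchy) might be sketched as follows. The data is a small hand-copied fragment of the AAT example; the dictionary layout and the function names are assumptions, not an actual AAT interface.

```python
# Child -> parent links for the fragment shown on the slide.
PARENT = {
    "Hasidism": "mysticism",
    "mysticism": "<philosophical movements and attitudes>",
    "<philosophical movements and attitudes>": "<philosophical concepts>",
    "<philosophical concepts>": "Associated Concepts",
    "Associated Concepts": "Associated Concepts Facet",
}

# First sentence of the scope note quoted on the slide.
NOTE = {
    "mysticism": "Refers in a general sense to a spiritual quest for hidden "
                 "truth, the goal of which is to be united with the divine."
}

# Mapping from a free-text term (here a social tag) to a KOS concept.
TERM_TO_CONCEPT = {"mystical experience": "mysticism"}


def hierarchy_path(concept):
    """Path from the top of the hierarchy down to `concept`."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return list(reversed(path))


def explain(term):
    """Return (concept, scope note or None, hierarchy path) for a text term."""
    concept = TERM_TO_CONCEPT.get(term, term)
    return concept, NOTE.get(concept), hierarchy_path(concept)
```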
31
Comprehension "in the
large"
• Learning and sense making require comprehension
across multiple sources
• Requires structure – can be supplied by KOS
• Requires tools for the manipulation of external
structures the learner / sensemaker builds, such as
concept maps
32
Representation systems
33
Representations need rules
• Formal representations need logical formalisms, such
as full first-order logic or subsets (for ease of
processing) or extensions (to be more expressive)
• Text needs rules of syntax and broader document
structure
• Graphical representations need rules of design
34
Representations
need names for entities
• Names for (abstract) concepts – classification
• Names for many different types of other entities, such
as persons, places, buildings, events, currencies, …
(named entities)
• Systems of such names – Knowledge Organization
Systems, authority lists of personal names
• Mappings between such systems
35
Representations need
relationships
• Relationships are used to connect entities,
thus forming statements
obesity <causes> high blood pressure
• Need system of relationships
Many such systems exist (a type of KOS)
Problem of mapping
36
Rhetorical relationships
• To map text structure
• To discern how a retrieved document, paragraph,
statement, or image relates to the topic of a search
37
Function-based Reasoning-based
38
Argument structure
Grounds
Warrants
Claim
Generic inference
Comparison-based
Induction / rule-based
Causal-based
Transitivity-based
Topical relevance typology
Rhetorical structure
Matching topic
Evidence (Indirect)
Context
Comparison
Evaluation
Method / Solution
Purpose/ Goal Semantic-based
(Green & Bean, 1995)
Taxonomy
Partonomy
Frame-based,
etc.
Matching topic (Direct)
. Manifestation
. Image content
. Image theme
Evidence (Indirect)
Context
. Scope
. Framework
. Environmental setting
. Social background
. Time & sequence
. Assumption / expectation
. Biographic information
Condition
. Helping or hindering factor
. Unconditional
. Exceptional condition
Purpose / Motivation
Cause / Effect
. Cause
. Effect / Outcome
. Explanation (causal)
. Prediction
Comparison
. By similarity (analogy) /
By difference (contrast)
. By factor that is different
Method / Solution
. Method / Approach
. Instrument
. Technique / Style
Evaluation
. Significance
. Limitation
. Criterion / Standard
. Comparative evaluation
39
RST+ Functional Role
Functional role: Comparison
Comparison
. By similarity vs. By difference (Contrast)
. . By similarity
. . . Analogy & metaphor
. . By difference (Contrast)
. By factor that is different
. . Different external factor
. . . Different time
. . . Different place
. . Different participant
. . . Different actor
. . . Different subject acted upon
. . Different act or experience
. . . Different act
. . . Different experience
40
Support for finding: Indexing
• Finding based on text:
Knowledge-based expansion of query
Front-end as discussed earlier
• Finding based on indexing:
Semantically enriched documents
41
A semantically enriched document
Reis et al. (2008)
Impact of Environment and Social Gradient on Leptospira infection in Urban
Slums (doi:10.1371/journal.pntd.0000228).
Infectious disease studied: Leptospirosis
Pathogen (causative agent of disease): Leptospira spirochete
Vector of disease pathogen: Rat (Rattus norvegicus)
Pathogen host subjected to study: Human (Homo sapiens)
Number of subject individuals in study: 3,171
. . .
Purpose of study: Quantify risk factors for leptospirosis . . .
Principal finding 1: Prevalence of Leptospira antibodies . . .
Principal finding 2: Disease risk . . .open sewers . . .
42
(http://dx.doi.org/10.1371/journal.pntd.0000228.x002)
A semantically enriched document
43
Tag Trees of Individual Semantic
Classes of Highlighted Terms
disease
infectious diseases
diarrheal disease
childhood diarrhea
dengue
leptospirosis
human leptospirosis
meningococcal disease
pulmonary hemorrhage syndrome
visceral leishmaniasis
Weil's disease
occupational disease
zoonotic disease
ID = Infectious Disease Ontology
GO = Gene Ontology term used in ID
ID:0000012 immunity
ID:0000017 mortality
ID:0000023 zoonotic
ID:0000025 pathogenicity
ID:0000034 endemic
ID:0000038 parasite
ID:0000056 host
ID:0000057 carrier
ID:0000063 vector
ID:0000064 pathogen
ID:0000066 infectious agent
ID:0000069 primary pathogen
ID:0000104 infection
44
ID = Infectious Disease Ontology GO = Gene Ontology
IDO:0000000 ! process
IDO:0000083 transmission
IDO:0000231 horizontal transmission (GO:0000031)
IDO:0000104 infection
IDO:0000084 pathogenesis
IDO:0000221 ! infectious disease progression
IDO:0000100 ! pathogen evasion of host immune response
IDO:0000111 antigenic variation
IDO:0000115 genetic diversification
IDO:0000226 pathogen life cycle (GO:0000026)
IDO:0000001 ! role
IDO:0000036 ! colonizer
IDO:0000038 parasite
IDO:0000048 symptom
IDO:0000056 host
IDO:0000057 carrier
IDO:0000059 reservoir
IDO:0000063 vector
IDO:0000064 pathogen
IDO:0000066 infectious agent
IDO:0000069 primary pathogen
IDO:0000200 mode of transmission (GO:0000000)
IDO:0000002 ! quality
IDO:0000215 ! quality of host population
IDO:0000098 infectious disease
IDO:0000210 ! quality of host
IDO:0000012 immunity
Semantically enriched documents
• Semantic enrichment supports semantic retrieval
• Broad area of its own
• Many different forms
• Explicit document structure
• Concept and named entity tagging and identification
• Assigning additional concepts or named entities
• Assigning extracted propositions
• Closely linked with information extraction
• IE produces elements of semantic enrichment
45
Need KOS
Needed for all this
• Large Knowledge Organization Systems
• Large knowledge bases with mappings
• Methods and procedures for developing KOS
46
How to get all this work
done?
The forces that created the problem
also support the solution
• Use automation
• Automated information extraction gets better every day and also
provides input to building KOS
• Automated classification could be used for the UDC Wikipedia project
• Use Web-enabled collaborative work ("crowdsourcing")
• Use computer systems to assist people
• Use Web-based systems to collect and integrate results
• Bootstrap: The more knowledge is in formal systems, the more
information extraction and structuring tasks can be automated
47
Example: Guided tagging
• Use facet structure to get taggers to think a bit more
out of the box
For example, could ask
What does this image remind you of
• Could assign some terms automatically, for example,
extracting terms from text assigned to an image
48
Semantic analysis
as the basis for
everything
52
Hub
Water transport
Inland water transport
Ocean transport
Traffic station ⊓ Water transport
Traffic station ⊓ Inland water tr.
Traffic station ⊓ Ocean transport
Dewey
387 Water, air, space transportation
386 Inland waterway & ferry transportation
387.5 Ocean transportation
386.8 Inland waterway tr. > Ports
387.1 Ports
LCSH
Shipping
Inland water transport
Merchant marine
Harbors
German
Hafen
Mapping through a Hub
Outline
• Objective: Interoperability Plus
• KOS concept hub
• Method: Knowledge-based, computer-assisted
creation of canonical representations of concepts
• Resulting knowledge base and applications
53
Objective
Improve semantic-based search
across multiple collections in multiple languages.
• Interoperability between any two participating KOS
(Knowledge Organization Systems)
• Support for search, esp. facet-based search
• for any collection indexed by a participating KOS
• for search based on free-text or free-form social tagging
• Assistance in cataloging (metadata creation)
by catalogers or users (social tagging)
• Long-range goal: Web service where a KOS can be uploaded
and mappings to specified target KOS are returned
54
KOS Concept Hub
• Interoperability is achieved by
expressing concepts from all participating KOS
as a canonical representation,
such as a description logic formula
using atomic concepts and relationships
• The backbone of the proposed system is a
faceted core classification of atomic concepts
together with a set of relationships
• Mapping from KOS to KOS is achieved by reasoning
over these canonical representations
55
56
Hub
Water transport
Inland water transport
Ocean transport
Traffic station ⊓ Water transport
Traffic station ⊓ Inland water tr.
Traffic station ⊓ Ocean transport
Dewey
387 Water, air, space transportation
386 Inland waterway & ferry transportation
387.5 Ocean transportation
386.8 Inland waterway tr. > Ports
387.1 Ports
LCSH
Shipping
Inland water transport
Merchant marine
Harbors
German
Hafen
Mapping through a Hub
57
Hub
Traffic station
Vehicle parking
Terminal facilities
Water transport
Inland water transport
Ocean transport
Traffic station ⊓ Water transport
By type of water transport
Traffic station ⊓ Inland water tr.
Traffic station ⊓ Ocean transport
By component of traffic station
Vehicle parking ⊓ Water transport
Terminal facilities ⊓ Water transport
Dewey
387 Water, air, space transportation
386 Inland waterway & ferry transportation
387.5 Ocean transportation
386.8 Inland waterway tr. > Ports
387.1 Ports
LCSH/AAT
Shipping
water transport
Inland water transport
Merchant marine
Harbors
ports
harbors
Mapping through a Hub
Method: How to get DL formulas
Key: Efficient creation of canonical representations (DL formulas)
• Apply existing knowledge:
Large knowledge base ▬► less effort for processing a new KOS
• Use knowledge of KOS structure for hierarchical inheritance
• Use linguistic analysis of terms and captions
• Eliminate redundant atomic concepts
• Check or produce mapping results from assignment of concepts to
the same records
• Get human editors’ input and verification where needed through a
user-friendly interface
• KOS “owners” may verify and edit data pertaining to their KOS
58
Knowledge base
Requires an ever larger classification and lexical
knowledge base containing many kinds of data:
1. A faceted classification of atomic concepts
Seeded from sources with well-developed facets such as
UDC
the Alcohol and Other Drug (AOD) Thesaurus
the Harvard Business Thesaurus
the Art and Architecture Thesaurus
various systems called ontologies
59
Knowledge base 2
Requires an ever larger classification and lexical
knowledge base containing many kinds of data:
2. Linguistic knowledge bases such as WordNet and mono-,bi-, and
multi-lingual dictionaries and thesauri
3. Many KOS (Knowledge Organization Systems), such as LCC, UDC,
DDC, DMOZ directory, LCSH, Gene Ontology, Schlagwortnormdatei
4. These will over time be fused into one large multilingual knowledge
base with many terminological and translation relationships and
relationships linking terms to concepts,
with an increasing number of concepts semantically represented by a
DL formula.
60
Examples of deriving
DL formulas
61
L00 Transportation and traffic
L10 Traffic system components
L13 Traffic facilities
L15 Traffic stations
L17 Vehicles
L30 Modes of transportation
L33 Air transport
L37 Water transport
P00 Buildings, construction
P23 Buildings
P27 Architecture
P43 Construction
R00 Engineering
R30 Acoustics
R37 Soundproofing
T70 Military vs. civilian
T73 Military
T77 Civilian
62
Underlying faceted classification
HE Transportation
= L00 Transportation and traffic ⊓ T77 Civilian
HE550-560 Ports, harbors, docks, wharves, etc.
Inherited:
L00 Transportation and traffic ⊓ T77 Civilian
Added by editor:
L15 Traffic stations ⊓ L37 Water transport
Resolved to:
L15 Traffic stations ⊓ L37 Water transport ⊓ T77 Civilian
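The inherit, add, resolve sequence for HE550-560 can be sketched as a small set operation. This is one possible reading of the step, not the project's actual procedure: the parent links in `PARENT` are assumed (the deck lists the classes of the faceted classification but not their nesting), and `resolve` is a hypothetical name.

```python
# Assumed nesting of the atomic concepts; illustrative only.
PARENT = {
    "L10": "L00", "L13": "L10", "L15": "L13", "L17": "L10",
    "L30": "L00", "L33": "L30", "L37": "L30",
    "T73": "T70", "T77": "T70",
}


def ancestors(code):
    """All broader atomic concepts of `code` under the assumed nesting."""
    found = set()
    while code in PARENT:
        code = PARENT[code]
        found.add(code)
    return found


def resolve(inherited, added):
    """Conjoin inherited and editor-added atoms, then drop every atom that is
    an ancestor of another atom in the formula (it adds no information)."""
    atoms = set(inherited) | set(added)
    redundant = set().union(*(ancestors(a) for a in atoms))
    return atoms - redundant


# HE550-560: inherits L00 and T77 from HE, editor adds L15 and L37.
formula = resolve({"L00", "T77"}, {"L15", "L37"})
```

Under these assumptions L00 is dropped because L15 and L37 both fall under it, which matches the "Resolved to" line on the slide.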
63
Method: Assigning atomic concepts 1
NA6300-6307 Airport buildings
From database already established:
Airport = L15 Traffic stations ⊓ L33 Air transport
Buildings = P23 Buildings
Added by editor: T77 Civilian
Resolved to:
L15 Traffic stations ⊓ L33 Air transport ⊓ P23 Buildings ⊓ T77 Civilian
64
Method: Assigning atomic concepts 2
TL681.S6 Airplanes. Soundproofing
From database already established:
Airplane = L17 Vehicles ⊓ L33 Air transport
Soundproofing = R37 Soundproofing
Added by editor: Nothing
Resolved to:
L17 Vehicles ⊓ L33 Air transport ⊓ R37 Soundproofing
65
Method: Assigning atomic concepts 3
Aeroplanes-Soundproofing
From database already established:
Aeroplanes = Airplane [Spelling variant]
Therefore the term is recognized as the same as
Airplanes. Soundproofing
Resolved to:
L17 Vehicles ⊓ L33 Air transport ⊓ R37 Soundproofing
66
Method: Assigning atomic concepts 4
Any class formed by geographical subdivision, such as
NA6300-6307 Airport buildings
NA6305.E3 Egypt
Recognized using a dictionary of geographical names
Inherits from subject class above it; simply add the country
L15 Traffic stations ⊓ L33 Air transport ⊓ P23 Buildings ⊓ T77 Civilian ⊓ Egypt
No editor checking needed
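A minimal sketch of the geographical-subdivision rule, assuming the dictionary of geographical names is available as a simple set. `GEO_NAMES` and `subdivide` are hypothetical names used only for this illustration.

```python
# Stand-in for a dictionary of geographical names.
GEO_NAMES = {"Egypt"}


def subdivide(parent_formula, caption):
    """If the caption is a known place name, inherit the parent's formula and
    add the place; otherwise return None so that an editor can review it."""
    if caption in GEO_NAMES:
        return frozenset(parent_formula) | {caption}
    return None


# NA6300-6307 Airport buildings, as resolved on the earlier slide.
AIRPORT_BUILDINGS = {"L15", "L33", "P23", "T77"}

egypt = subdivide(AIRPORT_BUILDINGS, "Egypt")  # formula for NA6305.E3
```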
67
Method: Assigning atomic concepts 5
Examples from the resulting
knowledge base
68
HE550-560 Ports, harbors, docks, wharves, etc.
= L15 Traffic stations ⊓ L37 Water transport ⊓ T77 Civilian
NA2800 Architectural acoustics
= P27 Architecture ⊓ R30 Acoustics
NA6300-6307 Airport buildings
= L15 Traffic stations ⊓ L33 Air transport ⊓ P23 Buildings ⊓ T77 Civilian
NA6330 Dock buildings, ferry houses, etc.
= L15 Traffic stations ⊓ L37 Water transport ⊓ P23 Buildings ⊓ T77 Civilian
TC350-374 Harbor works
= L15 Traffic stations ⊓ L37 Water transport ⊓ R00 Engineering ⊓ T77 Civilian
TH1725 Soundproof construction
= P23 Buildings ⊓ P43 Construction ⊓ R37 Soundproofing
TL681.S6 Airplanes. Soundproofing
= L17 Vehicles ⊓ L33 Air transport ⊓ R37 Soundproofing
TL725-726 Airways (Routes). Airports and landing fields. Aerodromes
= L13 Traffic facilities ⊓ L33 Air transport ⊓ Technical aspects
VA67-79 Naval ports, bases, reservations, docks
= L15 Traffic stations ⊓ L37 Water transport ⊓ T73 Military
VM367.S6 Submarines. Soundproofing
= L17 Vehicles ⊓ L37 Water transport ⊓ R37 Soundproofing ⊓ T73 Military ⊓ Underwater
69
Aeroplanes-Soundproofing
= L17 Vehicles ⊓ L33 Air transport ⊓ R37 Soundproofing
Airports-Buildings
= P23 Buildings ⊓ L15 Traffic stations ⊓ L33 Air transport
Buildings-Soundproofing
= P23 Buildings ⊓ P43 Construction ⊓ R37 Soundproofing
Ships-Soundproofing
= L17 Vehicles ⊓ L37 Water transport ⊓ R37 Soundproofing
70
LC subject headings
with combinations of atomic concepts
71
Hub
L17 Vehicles ⊓ L33 Air transport ⊓ R37 Soundproofing
L17 Vehicles ⊓ L37 Water transport ⊓ R37 Soundproofing
L17 Vehicles ⊓ L37 Water transport ⊓ R37 Soundproofing ⊓ T73 Military ⊓ Underwater
LCC
TL681.S6 Airplanes. Soundproofing
VM367.S6 Submarines. Soundproofing
LCSH
Aeroplanes-Soundproofing
Ships-Soundproofing
Mapping through a Hub
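Mapping through the hub can be sketched by treating each canonical representation as a set of atomic concepts, where the set stands for their conjunction. The formulas are those shown on the slide; the function `map_concept` and the two relation labels are assumptions. A fuller reasoner would also use the hierarchy among atomic concepts, which this sketch ignores.

```python
# Canonical representations from the slide (codes only).
LCC = {
    "TL681.S6 Airplanes. Soundproofing":
        frozenset({"L17", "L33", "R37"}),
    "VM367.S6 Submarines. Soundproofing":
        frozenset({"L17", "L37", "R37", "T73", "Underwater"}),
}
LCSH = {
    "Aeroplanes-Soundproofing": frozenset({"L17", "L33", "R37"}),
    "Ships-Soundproofing": frozenset({"L17", "L37", "R37"}),
}


def map_concept(formula, target_kos):
    """Return (relation, term) pairs for one source formula.

    'exact'   : the target formula has the same atoms
    'broader' : the target formula uses a proper subset of the atoms,
                so the target concept is more general
    """
    result = []
    for term, target in target_kos.items():
        if target == formula:
            result.append(("exact", term))
        elif target < formula:
            result.append(("broader", term))
    return result
```

With this data, the LCC class for airplanes maps exactly to the LCSH heading Aeroplanes-Soundproofing, while the class for submarines finds only the broader heading Ships-Soundproofing.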
72
Hub
Canonical form of
query
(DL formula)
User query
Free text
Combination of elemental
concepts through facets
(guided query formulation)
Controlled term(s) from a
KOS, possibly found
through browsing a KOS
Final query
(Enriched) free
text query
Query in terms
of a KOS
Mapping user queries
TL681.S6 Airplanes. Soundproofing
[L17 Vehicles ⊓ L33 Air transport ⊓ R37 Soundproofing]
VM367.S6 Submarines. Soundproofing
[L17 Vehicles ⊓ L37 Water transport ⊓ R37 Soundproofing ⊓ Military]
Aeroplanes-Soundproofing
[L17 Vehicles ⊓ L33 Air transport ⊓ R37 Soundproofing]
Ships-Soundproofing
[L17 Vehicles ⊓ L37 Water transport ⊓ R37 Soundproofing]
73
Query:
L17 Vehicles AND R37 Soundproofing
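A minimal sketch of how the query above could retrieve terms from several KOS at once. The knowledge base holds the four formulas from the slides (the submarine class uses the fuller formula given on the hub slide); `search` is a hypothetical function name.

```python
# Terms from LCC and LCSH with their canonical representations.
KNOWLEDGE_BASE = {
    "TL681.S6 Airplanes. Soundproofing": frozenset({"L17", "L33", "R37"}),
    "VM367.S6 Submarines. Soundproofing":
        frozenset({"L17", "L37", "R37", "T73", "Underwater"}),
    "Aeroplanes-Soundproofing": frozenset({"L17", "L33", "R37"}),
    "Ships-Soundproofing": frozenset({"L17", "L37", "R37"}),
}


def search(query_atoms):
    """Terms whose formula contains every atomic concept in the query."""
    query = set(query_atoms)
    return [term for term, formula in KNOWLEDGE_BASE.items() if query <= formula]


hits = search({"L17", "R37"})  # L17 Vehicles AND R37 Soundproofing
```

Because the query is expressed in atomic concepts, it finds the relevant classes and headings regardless of which KOS they come from.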
Examples
from NALT and LCSH
• NALT National Agricultural Library Thesaurus
• LCSH Library of Congress Subject Headings
74
Air pollution laws
LCSH term
Air – Pollution – Laws and regulations
[isa] Legal rule [appliedTo] {[isa] Condition [isConditionOf] Air [causedBy]
Pollutant [property] Undesirable}
NALT terms
Air pollution
[isa] Condition [isConditionOf] Air [causedBy] Pollutant [prop.] Undesirable
Laws and regulations
[isa] Legal rule
Mapping LCSH ▬► NALT
Air – Pollution – Laws and regulations ▬► Air pollution AND
Laws and regulations
Interpretation for indexing and searching in both directions
75
Soil moisture vs. Soil water
LCSH term
Soil moisture
[isa] Water [containedIn] Soil
NALT term
Soil water
[isa] Water [containedIn] Soil
Mapping LCSH ▬► NALT
Soil moisture ▬► Soil water
76
Greenhouse gardening
LCSH term
Greenhouse gardening
[isa] Gardening [inEnvironment] Greenhouse [inEnvironment] Home
NALT terms
Home gardening
[isa] Gardening [inEnvironment] Home
Greenhouse
[isa] Greenhouse
Mapping LCSH ▬► NALT
Greenhouse gardening ▬► Home gardening AND
Greenhouse
77
Salad greens
LCSH term
Salad greens
[isa] Green leafy vegetable [usedFor] Salad
NALT term
Green leafy vegetables
[isa] Green leafy vegetable
Mapping LCSH ▬► NALT
Salad greens ▬► BT Green leafy vegetables
78
Emerging diseases
LCSH term
Emerging infectious diseases
[isa] Disease [hasProperty] Infectious [hasProperty] Emerging
NALT term
Emerging diseases
[isa] Disease [hasProperty] Infectious ??? [hasProperty] Emerging
Mapping LCSH ▬► NALT
Emerging infectious diseases ▬► Emerging diseases
Emerging infectious diseases ▬► BT Emerging diseases
79
Distributed implementation
• A KOS on the Web could assign DL formulas to its
concepts − let's call this a
semantically enhanced KOS or SEKOS
• Could use any of a number of faceted core
classifications or even several (using a unique URI
for each elemental concept)
• Core classifications could be mapped to each other
• It is now a simple matter to map from any SEKOS
to any other (somewhat dependent on the core
classifications used)
80
Examples
from the realm of AAT Taiwan
AAT Art and Architecture Thesaurus (Getty)
AAT Taiwan TELDAP, Institute for Information Science
Academia Sinica
TGM Thesaurus of Graphic Materials,
Library of Congress
E-HowNet A Lexical Knowledge Base for Semantic
Composition, Academia Sinica
81
82
Hub
Facility ⊓ Worship
Facility ⊓ Worship ⊓ Judaism
Facility ⊓ Worship ⊓ Christianity
Facility ⊓ Worship ⊓ Islam
Facility ⊓ Worship ⊓ Buddhism
Facility ⊓ Worship ⊓ Taoism
TGM
temples
synagogues
churches
mosques
Buddhist temples
Taoist temples
AAT
temples (buildings)
synagogues (buildings)
churches (buildings)
mosques (buildings)
Mapping through a Hub
Mapping to Chinese
• Use E-HowNet formal semantic expressions
83
E-HowNet ontology 廣義知識知識本體
• Building | 建築物
Facilities |設施
Chinese Word: 廟
English: Temple
Conceptual expression: {facilities |設施 : domain = {religion |宗教 }}
Chinese Word: 禪寺
English: Buddhist temple
Conceptual expression: {facilities |設施 : domain = {Buddhist |佛教 }}
Chinese Word: 道觀
English: Taoist temple / Taoist guan
Conceptual expression: {facilities |設施 : domain = {Taoism |道教 }}
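To reuse such conceptual expressions in a mapping system, they first have to be read into a structured form. The sketch below parses only the simple shape shown on this slide (one head concept and one attribute with a concept value); it is an assumption that this shape is sufficient for illustration, and real E-HowNet expressions can be richer. The names `EXPR` and `parse_expression` are hypothetical.

```python
import re

# Handles only the shape seen on the slide:
#   {head |中文 : attribute = {value |中文 }}
EXPR = re.compile(
    r"\{\s*(?P<head>\w+)\s*\|\s*(?P<head_zh>\S+?)\s*:\s*"
    r"(?P<attr>\w+)\s*=\s*"
    r"\{\s*(?P<value>\w+)\s*\|\s*(?P<value_zh>\S+?)\s*\}\s*\}"
)


def parse_expression(text):
    """Return the parts of a simple expression as a dict, or None if the
    text does not have the expected shape."""
    match = EXPR.fullmatch(text.strip())
    if match is None:
        return None
    return match.groupdict()


taoist = parse_expression("{facilities |設施 : domain = {Taoism |道教 }}")
```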
84
Mapping to Chinese
• Use E-HowNet formal semantic expressions
• Use terms that already exist in E-HowNet
• Add terms using computer-assisted derivation of
semantic expressions as described above for English
85
Cross-language
mapping problems
Example
AAT stitching maps to two Chinese terms:
縫合 (feng he) for needleworking and
縫訂 (feng ding) for bookbinding
86
Analysis
Since English has only one word stitching,
AAT does not distinguish between the two specific concepts
even though the AAT scope note describes the two concepts
Solution
AAT AAT Taiwan
stitching 縫 (feng)
stitching (needlework) 縫合 (feng he)
stitching (bookbinding) 縫訂 (feng ding)
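The proposed solution can be sketched as concept records in which the two specific concepts are added below the general one. The three records follow the table above; the record layout and the function `chinese_candidates` are assumptions for illustration.

```python
# Concept records after the split proposed on the slide.
CONCEPTS = {
    "stitching": {"zh": "縫", "broader": None},
    "stitching (needlework)": {"zh": "縫合", "broader": "stitching"},
    "stitching (bookbinding)": {"zh": "縫訂", "broader": "stitching"},
}


def chinese_candidates(english_concept):
    """Chinese term for the concept itself, followed by the terms of its
    immediate narrower concepts."""
    own = CONCEPTS[english_concept]["zh"]
    narrower = [
        record["zh"]
        for record in CONCEPTS.values()
        if record["broader"] == english_concept
    ]
    return [own] + narrower
```

A search on the general English term can then offer all three Chinese terms, while a search on one of the specific concepts yields exactly one.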
87
Principle
The classification should include all concepts that are lexicalized in
any language participating in a cross-language mapping system.
If a language does not have a term for a concept, a term must be
invented.
This also happens when a concept is found through conceptual
analysis.
88
Shades of meaning
Example
The AAT defines temple as
Buildings housing places devoted to the worship of a deity or
deities
But in Chinese culture, a temple (miao 廟) is devoted to
worshiping or honoring or communing with ancestors or spirits.
There are a number of further terms in Chinese for buildings
devoted to worshiping/ commemorating saints, or some famous
scholars, poets, or people with great achievement.
89
Shades of meaning
Thus in the concept structure we need
Temple (broad definition)
Building housing places devoted to the worshiping, communing
with, or honoring or commemorating a deity or deities or ancestors
or spirits or saints, or some famous scholars, poets, people with
great achievement.
Temple (narrow AAT definition)
Miao( 廟 )
Other Chinese terms
90
The importance of good definitions
AAT Taiwan must make sure that all readers, English
and Chinese, understand all terms, English and
Chinese, and the often subtle differences.
The table on the next slide illustrates that
91
Uses of AAT Taiwan
92
               Searching Western art               Searching Chinese art
Western user   Understands English terms           Needs to understand Chinese terms
Chinese user   Needs to understand English terms   Understands Chinese terms
All users need a good conceptual structure
Take-home message
Semantics gives powerful systems
93
Dagobert Soergel
dsoergel @ buffalo.edu
www.dsoergel.com
94
97
Mapping Issues 1
Terms related to Chinese religious concepts
The word “temples” is frequently treated as equivalent to the Chinese term “廟 miao”.
However, because the buildings differ in purpose and in the spirits worshipped, religious
buildings in Taiwan go by a variety of names.
Temples (buildings) (religious buildings, <religious structures>, ... Built Environment (Hierarchy
Name))
Note: Buildings housing places devoted to the worship of a deity or deities. In the strictest sense, it refers to
the dwelling place of a deity, and thus often houses a cult image. In modern usage a temple is generally a
structure, but it was originally derived from the Latin "templum" and historically has referred to an uncovered
place affording a view of the surrounding region. For Christian or Islamic religious buildings the terms
"churches" or "mosques" are generally used, but an exception is that "temples" is used for Protestant, as
opposed to Roman Catholic, places of worship in France and some French-speaking regions.
Q1. The mapping team has found that “temple” in AAT is broader than the concept in Chinese.
Therefore it is necessary to distinguish the differences among the Chinese terms before mapping.
98
Mapping Issues 1
Terms related to Chinese religious concepts
Despite their similar appearance, each term differs slightly
from the others.
• miao (廟): In the past, a place to worship ancestors.
Since the Han dynasty, it has been used as a place to
worship both ancestors and spirits.
• ci (祠): Built to worship or commemorate saints, famous
scholars, poets, or people of great achievement.
Sometimes also refers to places where ancestors are
worshipped.
• si (寺): Generally refers to a place where Buddhist
spirits are worshipped. Sometimes also refers to the place
where Buddhist monks live.
• an (庵): Formerly referred to a scholar's study (書齋).
Nowadays it refers to the place where Buddhist nuns live.
• guan (觀): Refers only to Taoist buildings.
• yan (巖): Refers to miao (廟) established on or near
a mountain.
99
Mapping Issues 2
A Chinese collective term with a broader meaning
文玩 (Wenwan)
• A compound of two words: “文物” (cultural object) and “古玩” (antique curio).
(文玩兼有文物與古玩的特點)
• It refers specifically to objects used in an educated person's study,
including writing equipment, small tools, and decorations.
(特指文人書齋中的書寫設備、小工具和擺飾)
• It represents the culture of the study, combining the practical function of
a scholar's study equipment with art objects made for appreciation.
(文玩是種書齋文化,結合了文人書生的實用器物與具觀賞價值的藝術品)
• Common objects include ink stones, seals, washing vessels, finely sculpted
decorations, etc. “Elegant” and “exquisite” are its essential characteristics.
(文玩為以下器物的泛稱 : 古硯、印章、洗器、牙雕…等,“雅” 與“巧”是其基本特徵)
• It is produced in a highly artistic manner. Nowadays it has become a popular collectible,
valued more as an artifact than as equipment.
(以高藝術性的方式製造,現今多為賞而勿用的文房珍玩)
Mapping Issues - 2
A Chinese set term with a broader meaning
• lotus pod shaped vessel for injecting water 雙蓮房水注
• banana leaf shaped wooden plate 癭木蕉葉盤
• olive stone boat sculpture 果核小舟
• blue snuff bottle 藍地金星套料鼻煙壺
• lotus leaf shaped washing vessel 白玉荷葉式洗
• seal 鴛錦雲章循連環田黃石印
• ivory desk tidy 象牙雕山水人物筆筒
Mapping Issues - 2
A Chinese set term with a broader meaning
Q2. The mapping team has found that the meaning of Wenwan is broader than that of the
term “desk sets”, although the two partly coincide. The two terms are therefore
in an inexact equivalence relation.
Is it more suitable to create a new term “Wenwan” in the structure, or should
it be mapped to desk sets?
desk sets (sets (groups), <object groupings by general context>, ... Object Groupings and Systems)
Note: Sets of matching articles intended to be used on a desk including such articles as inkstands,
pen trays, and stamp boxes.
When English terms have broader meanings (1/2)
EX1:
• ID: 300053660 Record Type: concept
stitching (<processes and techniques by specific type>, <processes and techniques>, Processes and Techniques)
Note: Refers to the process of fastening, joining, closing, uniting, mending, or creating ornamentation by stitches,
which are the portions of thread left in fabric or another material by the in and out movement of a threaded needle
through the thickness or surface of the material, or the loops of thread created on a needle in knitting or other
needlework. In the context of textiles and needleworking, its meaning overlaps with "sewing." In the context of
bookbinding, it refers to the fastening together a number of leaves or gatherings by passing the thread or wire through
all of the sheets at once; it is distinct from "sewing," which, in the context of bookbinding, is used for the joining of
leaves or gatherings together one by one by drawing thread or wire backwards and forwards through the back fold of
each sheet to attach it to the cords.
縫綴 / 縫訂 (< 依特定種類區分之過程與技術 >, < 過程與技術 >, 過程與技術 )
範圍註:意指藉由針線進出穿過材料或其表面的動作,將針腳留在布料或其他材料上,或是在編織或針織時形成針目,
以固定、結合、閉合、合併、修補或製作裝飾的過程。若指涉的是紡織品與手工繡品方面,則其意義與「縫紉
(sewing) 」一詞重疊。若指涉的是書籍裝幀方面,則意指將若干頁面或疊層,用線或金屬線一次穿過所有紙張固定在一
起。而「線訂( sewing )」在書籍裝幀方面,是指用針線或金屬線,在一疊書頁的摺縫處上下穿梭,使其與裝訂線固定
的方法。
• In different contexts (bookbinding vs. needleworking), the meaning of stitching changes accordingly. In
AAT, both meanings are explained in the same record, but translating the term into Chinese requires
two translations: 縫合 (feng he) for needleworking and 縫訂 (feng ding) for
bookbinding. The same problem occurs in the record for sewing (ID: 300053658).
Stitching in needleworking
Stitching in bookbinding
When English terms have broader meanings (2/2)
EX2:
300004184 Record Type: concept
patios (<uncovered spaces>, <rooms and spaces by form>, ... Components (Hierarchy Name))
Note: Paved recreation areas adjoining contemporary houses and the paved interior courts of
Spanish or Spanish-style buildings.
The term refers to two types of open space, so the translation could be 屋外休憩區 (outdoor
recreation area) or ( 西班牙 ) 內院 ((Spanish) inner courtyard).
Spanish patio
Patio adjoining a house
When English terms have broader meanings (2/2)
EX3:
• 300266238
Record Type: concept
maculatures (<prints by process or technique>, prints (visual works), ... Visual and Verbal
Communication)
Note: Prints made by taking a second impression without reinking the plate, often
used for cleaning the plate. May also refer to blotting paper. Also used for scrap
paper that can reinforce fabric in Medieval embroidery.
• The term maculatures could be used in three different contexts (prints, blotting paper, and
scrap paper), and there are three corresponding translations ( 吸墨紙版畫、吸墨紙、固定刺繡布料的紙片 ).
Q3: In these cases the record contains multiple meanings, so the issue is not which
translation should be the preferred term. How should the Chinese translations be
displayed?
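One conceivable way to approach Q3 is to store each translation together with the context it applies to and to show the context as a qualifier. The sketch below is illustrative only: the `Record` class, its field names, and the display format are hypothetical and do not reflect how AAT or AAT-Taiwan stores or displays terms. The IDs, contexts, and translations are those given in EX1 to EX3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical container for one AAT record with context-bound translations."""
    aat_id: str
    english: str
    translations: tuple  # (context, Chinese translation) pairs


RECORDS = {
    "300053660": Record("300053660", "stitching", (
        ("needleworking", "縫合"),
        ("bookbinding", "縫訂"),
    )),
    "300004184": Record("300004184", "patios", (
        ("recreation area adjoining a house", "屋外休憩區"),
        ("interior court of a Spanish-style building", "(西班牙)內院"),
    )),
    "300266238": Record("300266238", "maculatures", (
        ("prints", "吸墨紙版畫"),
        ("blotting paper", "吸墨紙"),
        ("scrap paper", "固定刺繡布料的紙片"),
    )),
}


def display_label(record):
    """Render every translation with its context as a parenthetical qualifier."""
    parts = "; ".join(f"{zh} ({context})" for context, zh in record.translations)
    return f"{record.english}: {parts}"


for rec in RECORDS.values():
    print(rec.aat_id, display_label(rec))
```

With this layout no single translation has to be singled out as preferred; the context qualifier tells the reader which sense each Chinese term covers.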


Illuminating Chaos Using Semantics to Harness the Web

  • 1. Illuminating Chaos Using Semantics to Harness the Web Dagobert Soergel Department of Library and Information Studies, University at Buffalo 1 AAT Workshop Academia Sinica, Taipei June 7,2010
  • 2. Outline • Overview of issues • Semantics for whom and for what • Representation to assist with query formulation • Representation for comprehension • Systems of representation • Support for finding: Indexing • Building KOS • How can it all get done • Zeroing in on the conceptual foundation • Issues in the realm of AAT Taiwan 2
  • 3. Semantics, structure, meaning • Classification • Meaningful arrangement • All kinds of relationships 3
  • 4. Semantics for whom? • Semantics for computer systems inference answers and solutions instead of lots of Web pages • Semantics for people assist users in creating meaning and making sense structure for learning 4
  • 5. Semantics for what • Finding • Comprehending • To know what to look for, a user (a person or a system) must first comprehend something – a cycle • Both finding and comprehending require navigating in an information space – need meaningful structure 5
  • 6. Representation to assist with query formulation 6
  • 7. Problem clarification for search JG prevention approach JG10 . individual-level prevention JG10.2 . . individual- vs. family-focused prevention JG10.2.2 . . . individual-focused prevention JG10.2.4 . . . family-focused prevention JG10.4 . . prevention through information and education JG10.4.2 . . . social marketing prevention approach JG10.4.4 . . . prevention through information dissemination JG10.4.6 . . . prevention through education JG10.4.8 . . . peer prevention JG10.8 . . prevention through spirituality and religion JG10.10 . . prevention through public commitment JG12 . environmental-level prevention JG12.4 . . social policy prevention approach JG14 . multi-level prevention 7
  • 8. Problem clarification for search churches (buildings) . <church buildings by function> . . chapels of ease (buildings) . . fortified churches . . pilgrimage churches (buildings) . . procathedrals (buildings) . <church buildings by location or context> . . abbey churches . . cathedrals (buildings) . . cave churches . . collegiate churches . . . . . . <churches by form> . . double churches . . hall churches . . rock-cut churches . . stave churches 8
  • 9. Browse structure for search • Make a table of contents for the entire Wikipedia using UDC • Make a classified (hierarchically structured) index for an art textbook using the Art and Architecture Thesaurus • Make a classified index for the collection of an art museum using the Art and Architecture Thesaurus 9
  • 10. 10 Facet structure to guide search A Area of ability combines with B Degree of ability A1 psychomotor ability A2 senses A2.1 . vision A2.1.1 . . night vision A2.2 . hearing A3 intelligence A4 artistic ability B1 low degree of ability, disabled B2 average degree of ability B3 above average degree of ability B3.1 . very high degree of ability Examples A2.1B1 visually impaired A2.2B1 hearing impaired A3B1 mentally handicapped A3B3 intellectually gifted
  • 11. Provide front-ends to assist users • Elicit a query with a facet-based interface, then the system creates a free-text query • Create a structure that normalizes terms assigned through social tagging and arranges them in a meaningful structure. The user can then browse and select concepts. The system maps to all appropriate tags 11
  • 12. Problem space for diseases Used by people or computer systems for search and arranging search output Pathologic process Body system affected 12 Pathologic process Body system affected Cause (condition, organism, chemical substance, environmental factors) Treatment
  • 13. Representation for comprehension A question of information representation (knowledge representation) • For computer systems: formal representation • For people: Text, images, graphical representation, visualization • Transformations between representations, such as • from text to formal: information extraction • from text to a map showing the text structure • from a conventional thesaurus display to a concept map 13
  • 14. Two representations Text (for people) High blood pressure is a serious disease often caused by being overweight. In kids 4 – 12 it can be treated highly effectively with Nystatin. Formal representation (for computer system) Causation (HighBloodPressure, Obesity) Treatment (HighBloodPressure, {Human, [Age, 4-12y]}, Nystatin, [Effectiveness, 4]) 14
  • 15. Answering questions Question How can high blood pressure be prevented? Answer Lose weight? 15
  • 16. Two representations Text Kids begin grazing independently from their mothers at three months Formal representation Separation (Mother, Child, {Goat, [Age, 3m]}) 16
  • 17. Information extraction • Information extraction produces representations needed for the semantic Web • Also useful for people if formal expressions are transformed into sentences that state the findings of a document as individual "bullets" • Could arrange statements from one or more documents in UDC order as a kind of summary • Information extraction needs rich KOS 17
  • 18. Representation of text structure 18
  • 19. Meaningful arrangement of terms in document representations 19 • Terms assigned in social tagging • Terms assigned from controlled vocabulary, e.g., AAT
  • 20. The Martyrdom of Saint Bartholomew 20
  • 21. Tags arranged alphabetically • 1634 • 17th century • bearded • biblical • Christ’s sacrifice and crucifixion {Christ metaphor} • confronts • executioner • expressive hands • flayed alive • gestures • Intensity • Jusepe de Ribera • luminous • lurking • martyrdom • mystical experience • nude body • old man • physical anguish • profound emotion {emotion} • Pulls the viewer into the scene • religious • Saint Bartholomew • torture 21
  • 22. Tags arranged by how they relate to the image 22
  • 23. Matching topic (Direct) • Image theme • martyrdom • mystical experience • biblical • religious • Image content: Focal • Reference • nude body • old man • Saint Bartholomew • executioner • knife • Elaboration (Adj.) • Bearded • physical anguish • profound emotion {emotion} • luminous • Elaboration (Adv.) • expressive hands • gestures • confronts • flayed alive • torture • Image content: Peripheral • Elaboration (Adv.) • lurking 23
  • 24. Comparison • By similarity: Metaphor / analogy • Christ’s sacrifice and crucifixion {Christ metaphor} Cause / Effect • Reaction or feeling • Intensity • Effect / Outcome • Pulls the viewer into the scene Context • Biographic info: Artist • Jusepe de Ribera • Biographic info: Time / period • 1634 • 17th century 24 Comparison, cause/effect, context
  • 25. Tags arranged by how they relate to the image with descriptors from the Art and Architecture Thesaurus 25
  • 26. • Image theme AAT • martyrdom sacrifice • mystical experience mysticism • Biblical biblical stories • Religious religion and religious concepts • Image content: Focal • Reference • nude body nudes (representations) • old man elderly • Saint Bartholomew saints • Executioner executioners • Knife knives • Elaboration (Adj.) • Bearded • physical anguish pain (sensation) • profound emotion {emotional} • Luminous shine Matching topic (Direct) 26
  • 27. Matching topic (Direct) • Image content: Focal • Elaboration (Adv.) • expressive hands hands • Gestures gesture gesture drawings • confronts • flayed alive • torture torturing • Image content: Peripheral • Elaboration (Adv.) • lurking 27
  • 28. Comparison • By similarity: Metaphor / analogy • Christ’s sacrifice and crucifixion {Christ metaphor} Cause / Effect • Reaction or feeling • Intensity • Effect / Outcome • Pulls the viewer into the scene Context • Biographic info: Artist • Jusepe de Ribera • Biographic info: Time / period • 1634 • 17th century 28 Comparison, cause/effect, context No AAT terms
  • 29. Support comprehension through links to KOS • Map text term to concept in KOS, show definition, show place in hierarchical structure 29
  • 30. Example mysticism Note: Refers in a general sense to a spiritual quest for hidden truth, the goal of which is to be united with the divine. It also refers more specifically to a belief in the existence of important realities beyond perceptual or intellectual understanding that are accessible by subjective experience, such as by intuition or meditation. Forms of mysticism are found in all major religions as well as in secular experience. 30
  • 31. Example, continued Associated Concepts Facet . Associated Concepts . . <philosophical concepts> . . . <philosophical movements and attitudes> . . . . aestheticism (philosophical movements and attitudes) . . . . existentialism . . . . holism . . . . idealism (philosophical movement) . . . . individualism . . . . mysticism . . . . . Hasidism . . . . spiritualism . . . . utilitarianism 31
  • 32. Comprehension "in the large" • Learning and sense making require comprehension across multiple sources • Requires structure – can be supplied by KOS • Requires tools for the manipulation of external structures the learner / sensemaker builds, such as concept maps 32
  • 34. Representations need rules • Formal representations need logical formalisms, such as full first-order logic or subsets (for ease of processing) or extensions (to be more expressive) • Text needs rules of syntax and broader document structure • Graphical representations need rules of design 34
  • 35. Representations need names for entities • Names for (abstract) concepts – classification • Names for many different types of other entities, such as persons, places, buildings, events, currencies, … (named entities) • Systems of such names – Knowledge Organization Systems, authority lists of personal names • Mappings between such systems 35
  • 36. Representations need relationships • Relationships are used to connect entities, thus forming statements obesity <causes> high blood pressure • Need system of relationships Many such systems exist (a type of KOS) Problem of mapping 36
  • 37. Rhetorical relationships • To map text structure • To discern how a retrieved document, paragraph, statement, or image relates to the topic of a search 37
  • 38. Function-based Reasoning-based 38 Argument structure Grounds Warrants Claim Generic inference Comparison-based Induction / rule-based Causal-based Transitivity-based Topical relevance typology Rhetorical structure Matching topic Evidence (Indirect) Context Comparison Evaluation Method / Solution Purpose/ Goal Semantic-based (Green & Bean, 1995) Taxonomy Partonomy Frame-based, etc.
  • 39. Matching topic (Direct) . Manifestation . Image content . Image theme Evidence (Indirect) Context . Scope . Framework . Environmental setting . Social background . Time & sequence . Assumption / expectation . Biographic information Condition . Helping or hindering factor . Unconditional . Exceptional condition Purpose / Motivation Cause / Effect . Cause . Effect / Outcome . Explanation (causal) . Prediction Comparison . By similarity (analogy) / By difference (contrast) . By factor that is different Method / Solution . Method / Approach . Instrument . Technique / Style Evaluation . Significance . Limitation . Criterion / Standard . Comparative evaluation 39 RST+ Functional Role
  • 40. Functional role: Comparison Comparison . By similarity vs. By difference (Contrast) . . By similarity . . . Analogy & metaphor . . By difference (Contrast) . By factor that is different . . Different external factor . . . Different time . . . Different place . . Different participant . . . Different actor . . . Different subject acted upon . . Different act or experience . . . Different act . . . Different experience 40
  • 41. Support for finding: Indexing • Finding based on text: Knowledge-based expansion of query Front-end as discussed earlier • Finding based on indexing: Semantically enriched documents 41
  • 42. A semantically enriched document. Reis et al. (2008) Impact of Environment and Social Gradient on Leptospira infection in Urban Slums (doi:10.1371/journal.pntd.0000228). Infectious disease studied: Leptospirosis; Pathogen (causative agent of disease): Leptospira spirochete; Vector of disease pathogen: Rat (Rattus norvegicus); Pathogen host subjected to study: Human (Homo sapiens); Number of subject individuals in study: 3,171 . . . Purpose of study: Quantify risk factors for leptospirosis . . . Principal finding 1: Prevalence of Leptospira antibodies . . . Principal finding 2: Disease risk . . . open sewers . . . (http://dx.doi.org/10.1371/journal.pntd.0000228.x002)
  • 43. A semantically enriched document. Tag trees of individual semantic classes of highlighted terms: disease; infectious diseases; diarrheal disease; childhood diarrhea; dengue; leptospirosis; human leptospirosis; meningococcal disease; pulmonary hemorrhage syndrome; visceral leishmaniasis; Weil's disease; occupational disease; zoonotic disease. Key: ID = Infectious Disease Ontology; GO = Gene Ontology term used in ID. ID:0000012 immunity; ID:0000017 mortality; ID:0000023 zoonotic; ID:0000025 pathogenicity; ID:0000034 endemic; ID:0000038 parasite; ID:0000056 host; ID:0000057 carrier; ID:0000063 vector; ID:0000064 pathogen; ID:0000066 infectious agent; ID:0000069 primary pathogen; ID:0000104 infection
  • 44. Key: ID = Infectious Disease Ontology; GO = Gene Ontology. IDO:0000000 ! process; IDO:0000083 transmission; IDO:0000231 horizontal transmission (GO:0000031); IDO:0000104 infection; IDO:0000084 pathogenesis; IDO:0000221 ! infectious disease progression; IDO:0000100 ! pathogen evasion of host immune response; IDO:0000111 antigenic variation; IDO:0000115 genetic diversification; IDO:0000226 pathogen life cycle (GO:0000026); IDO:0000001 ! role; IDO:0000036 ! colonizer; IDO:0000038 parasite; IDO:0000048 symptom; IDO:0000056 host; IDO:0000057 carrier; IDO:0000059 reservoir; IDO:0000063 vector; IDO:0000064 pathogen; IDO:0000066 infectious agent; IDO:0000069 primary pathogen; IDO:0000200 mode of transmission (GO:0000000); IDO:0000002 ! quality; IDO:0000215 ! quality of host population; IDO:0000098 infectious disease; IDO:0000210 ! quality of host; IDO:0000012 immunity
  • 45. Semantically enriched documents • Semantic enrichment supports semantic retrieval • Broad area of its own • Many different forms • Explicit document structure • Concept and named entity tagging and identification • Assigning additional concepts or named entities • Assigning extracted propositions • Closely linked with information extraction • IE produces elements of semantic enrichment
  • 46. Need KOS. Needed for all this: • Large Knowledge Organization Systems • Large knowledge bases with mappings • Methods and procedures for developing KOS
  • 47. How to get all this work done? The forces that created the problem also support the solution • Use automation • Automated information extraction gets better every day and also provides input to building KOS • Automated classification could be used for the UDC Wikipedia project • Use Web-enabled collaborative work ("crowdsourcing") • Use computer systems to assist people • Use Web-based systems to collect and integrate results • Bootstrap: The more knowledge is in formal systems, the more information extraction and structuring tasks can be automated
  • 48. Example: Guided tagging • Use the facet structure to get taggers to think a bit more outside the box. For example, could ask "What does this image remind you of?" • Could assign some terms automatically, for example by extracting terms from text assigned to an image
  • 51. Semantic analysis as the basis for everything
  • 52. Mapping through a Hub. Hub: Water transport; Inland water transport; Ocean transport; Traffic station ⊓ Water transport; Traffic station ⊓ Inland water tr.; Traffic station ⊓ Ocean transport. Dewey: 387 Water, air, space transportation; 386 Inland waterway & ferry transportation; 387.5 Ocean transportation; 386.8 Inland waterway tr. > Ports; 387.1 Ports. LCSH: Shipping; Inland water transport; Merchant marine; Harbors. German: Hafen
  • 53. Outline • Objective: Interoperability Plus • KOS concept hub • Method: Knowledge-based, computer-assisted creation of canonical representations of concepts • Resulting knowledge base and applications
  • 54. Objective: Improve semantic-based search across multiple collections in multiple languages. • Interoperability between any two participating KOS (Knowledge Organization Systems) • Support for search, esp. facet-based search • for any collection indexed by a participating KOS • for search based on free-text or free-form social tagging • Assistance in cataloging (metadata creation) by catalogers or users (social tagging) • Long-range goal: Web service where a KOS can be uploaded and mappings to specified target KOS are returned
  • 55. KOS Concept Hub • Interoperability is achieved by expressing concepts from all participating KOS as a canonical representation, such as a description logic formula using atomic concepts and relationships • The backbone of the proposed system is a faceted core classification of atomic concepts together with a set of relationships • Mapping from KOS to KOS is achieved by reasoning over these canonical representations
  • 56. Mapping through a Hub. Hub: Water transport; Inland water transport; Ocean transport; Traffic station ⊓ Water transport; Traffic station ⊓ Inland water tr.; Traffic station ⊓ Ocean transport. Dewey: 387 Water, air, space transportation; 386 Inland waterway & ferry transportation; 387.5 Ocean transportation; 386.8 Inland waterway tr. > Ports; 387.1 Ports. LCSH: Shipping; Inland water transport; Merchant marine; Harbors. German: Hafen
  • 57. Mapping through a Hub. Hub: Traffic station; Vehicle parking; Terminal facilities; Water transport; Inland water transport; Ocean transport; Traffic station ⊓ Water transport; by type of water transport: Traffic station ⊓ Inland water tr., Traffic station ⊓ Ocean transport; by component of traffic station: Vehicle parking ⊓ Water transport, Terminal facilities ⊓ Water transport. Dewey: 387 Water, air, space transportation; 386 Inland waterway & ferry transportation; 387.5 Ocean transportation; 386.8 Inland waterway tr. > Ports; 387.1 Ports. LCSH/AAT: Shipping; water transport; Inland water transport; Merchant marine; Harbors; ports; harbors
  • 58. Method: How to get DL formulas. Key: Efficient creation of canonical representations (DL formulas) • Apply existing knowledge: Large knowledge base ▬► less effort for processing a new KOS • Use knowledge of KOS structure for hierarchical inheritance • Use linguistic analysis of terms and captions • Eliminate redundant atomic concepts • Check or produce mapping results from assignment of concepts to the same records • Get human editors’ input and verification where needed through a user-friendly interface • KOS “owners” may verify and edit data pertaining to their KOS
  • 59. Knowledge base. Requires an ever larger classification and lexical knowledge base containing many kinds of data: 1. A faceted classification of atomic concepts, seeded from sources with well-developed facets such as UDC, the Alcohol and Other Drug (AOD) Thesaurus, the Harvard Business Thesaurus, the Art and Architecture Thesaurus, and various systems called ontologies
  • 60. Knowledge base 2. Requires an ever larger classification and lexical knowledge base containing many kinds of data: 2. Linguistic knowledge bases such as WordNet and mono-, bi-, and multilingual dictionaries and thesauri 3. Many KOS (Knowledge Organization Systems), such as LCC, UDC, DDC, DMOZ directory, LCSH, Gene Ontology, Schlagwortnormdatei 4. These will over time be fused into one large multilingual knowledge base with many terminological and translation relationships and relationships linking terms to concepts, with an increasing number of concepts semantically represented by a DL formula.
  • 61. Examples of deriving DL formulas
  • 62. Underlying faceted classification. L00 Transportation and traffic; L10 Traffic system components; L13 Traffic facilities; L15 Traffic stations; L17 Vehicles; L30 Modes of transportation; L33 Air transport; L37 Water transport; P00 Buildings, construction; P23 Buildings; P27 Architecture; P43 Construction; R00 Engineering; R30 Acoustics; R37 Soundproofing; T70 Military vs. civilian; T73 Military; T77 Civilian
  • 63. Method: Assigning atomic concepts 1. HE Transportation = L00 Transportation and traffic ⊓ T77 Civilian. HE550-560 Ports, harbors, docks, wharves, etc. Inherited: L00 Transportation and traffic ⊓ T77 Civilian. Added by editor: L15 Traffic stations ⊓ L37 Water transport. Resolved to: L15 Traffic stations ⊓ L37 Water transport ⊓ T77 Civilian
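The "resolved to" step on this slide drops L00 once the narrower L15 and L37 are present. A minimal sketch of that redundancy elimination follows, assuming a parent map that the slides do not spell out (the indentation of the classification was lost, so `PARENT` is a guess) and hypothetical helper names `ancestors` and `resolve`:

```python
# Hypothetical parent links: the deck lists these classes, but the exact
# hierarchy is an assumption made for this illustration.
PARENT = {
    "L10": "L00", "L13": "L10", "L15": "L13", "L17": "L10",
    "L30": "L00", "L33": "L30", "L37": "L30",
    "T73": "T70", "T77": "T70",
}


def ancestors(code):
    """Return every broader concept above `code` in the assumed hierarchy."""
    found = set()
    while code in PARENT:
        code = PARENT[code]
        found.add(code)
    return found


def resolve(inherited, added):
    """Combine inherited and editor-added concepts, dropping redundant broader ones."""
    combined = set(inherited) | set(added)
    redundant = set()
    for code in combined:
        redundant |= ancestors(code)
    return combined - redundant


# HE550-560: inherits L00 and T77, editor adds L15 and L37.
he550 = resolve(inherited={"L00", "T77"}, added={"L15", "L37"})
print(sorted(he550))  # ['L15', 'L37', 'T77']
```

Under these assumptions L00 disappears because it is an ancestor of both L15 and L37, which matches the formula shown on the slide.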
  • 64. Method: Assigning atomic concepts 2. NA6300-6307 Airport buildings. From database already established: Airport = L15 Traffic stations ⊓ L33 Air transport; Buildings = P23 Buildings. Added by editor: T77 Civilian. Resolved to: L15 Traffic stations ⊓ L33 Air transport ⊓ P23 Buildings ⊓ T77 Civilian
  • 65. Method: Assigning atomic concepts 3. TL681.S6 Airplanes. Soundproofing. From database already established: Airplane = L17 Vehicles ⊓ L33 Air transport; Soundproofing = R37 Soundproofing. Added by editor: Nothing. Resolved to: L17 Vehicles ⊓ L33 Air transport ⊓ R37 Soundproofing
  • 66. Method: Assigning atomic concepts 4. Aeroplanes-Soundproofing. From database already established: Aeroplanes = Airplane [Spelling variant]. Therefore the term is recognized as the same as Airplanes. Soundproofing. Resolved to: L17 Vehicles ⊓ L33 Air transport ⊓ R37 Soundproofing
  • 67. Method: Assigning atomic concepts 5. Any class formed by geographical subdivision, such as NA6300-6307 Airport buildings, NA6305.E3 Egypt. Recognized using a dictionary of geographical names. Inherits from the subject class above it; simply add the country: L15 Traffic stations ⊓ L33 Air transport ⊓ P23 Buildings ⊓ T77 Civilian ⊓ Egypt. No editor checking needed
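The geographical-subdivision case can be sketched in a few lines. This is only an illustration: `PLACE_NAMES` stands in for the dictionary of geographical names mentioned on the slide, and the function name is hypothetical.

```python
# Formula already established for the parent class (codes from the slide).
FORMULAS = {
    "NA6300-6307": frozenset({"L15", "L33", "P23", "T77"}),
}

# Tiny stand-in for a dictionary of geographical names (hypothetical content).
PLACE_NAMES = {"Egypt", "France", "Taiwan"}


def formula_for_subdivision(parent_class, caption):
    """Return the parent's formula plus the place, or None if the caption is not a place."""
    if caption not in PLACE_NAMES:
        return None  # not a geographical subdivision; needs other handling
    return FORMULAS[parent_class] | {caption}


# NA6305.E3 Egypt inherits from NA6300-6307 Airport buildings.
print(sorted(formula_for_subdivision("NA6300-6307", "Egypt")))
```

Because the place name is recognized mechanically and the rest is inherited, no editor input is involved, which is the point the slide makes.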
  • 68. Examples from the resulting knowledge base
  • 69. HE550-560 Ports, harbors, docks, wharves, etc. = L15 Traffic stations ⊓ L37 Water transport ⊓ T77 Civilian; NA2800 Architectural acoustics = P27 Architecture ⊓ R30 Acoustics; NA6300-6307 Airport buildings = L15 Traffic stations ⊓ L33 Air transport ⊓ P23 Buildings ⊓ T77 Civilian; NA6330 Dock buildings, ferry houses, etc. = L15 Traffic stations ⊓ L37 Water transport ⊓ P23 Buildings ⊓ T77 Civilian; TC350-374 Harbor works = L15 Traffic stations ⊓ L37 Water transport ⊓ R00 Engineering ⊓ T77 Civilian; TH1725 Soundproof construction = P23 Buildings ⊓ P43 Construction ⊓ R37 Soundproofing; TL681.S6 Airplanes. Soundproofing = L17 Vehicles ⊓ L33 Air transport ⊓ R37 Soundproofing; TL725-726 Airways (Routes). Airports and landing fields. Aerodromes = L13 Traffic facilities ⊓ L33 Air transport ⊓ Technical aspects; VA67-79 Naval ports, bases, reservations, docks = L15 Traffic stations ⊓ L37 Water transport ⊓ T73 Military; VM367.S6 Submarines. Soundproofing = L17 Vehicles ⊓ L37 Water transport ⊓ R37 Soundproofing ⊓ T73 Military ⊓ Underwater
  • 70. LC subject headings with combinations of atomic concepts. Aeroplanes-Soundproofing = L17 Vehicles ⊓ L33 Air transport ⊓ R37 Soundproofing; Airports-Buildings = P23 Buildings ⊓ L15 Traffic stations ⊓ L33 Air transport; Buildings-Soundproofing = P23 Buildings ⊓ P43 Construction ⊓ R37 Soundproofing; Ships-Soundproofing = L17 Vehicles ⊓ L37 Water transport ⊓ R37 Soundproofing
  • 71. Mapping through a Hub. Hub: L17 Vehicles ⊓ L33 Air transport ⊓ R37 Soundproofing; L17 Vehicles ⊓ L37 Water transport ⊓ R37 Soundproofing; L17 Vehicles ⊓ L37 Water transport ⊓ R37 Soundproofing ⊓ T73 Military ⊓ Underwater. LCC: TL681.S6 Airplanes. Soundproofing; VM367.S6 Submarines. Soundproofing. LCSH: Aeroplanes-Soundproofing; Ships-Soundproofing
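One way to picture mapping through the hub is to treat each DL formula as a plain set of atomic concepts and compare sets. The sketch below is a simplification under that assumption: it ignores the hierarchy among atomic concepts and any relationships other than conjunction, and `map_term` is a hypothetical name.

```python
# Formulas taken from the knowledge base examples; each conjunction is a set.
LCC = {
    "TL681.S6 Airplanes. Soundproofing": frozenset({"L17", "L33", "R37"}),
    "VM367.S6 Submarines. Soundproofing": frozenset(
        {"L17", "L37", "R37", "T73", "Underwater"}
    ),
}
LCSH = {
    "Aeroplanes-Soundproofing": frozenset({"L17", "L33", "R37"}),
    "Ships-Soundproofing": frozenset({"L17", "L37", "R37"}),
}


def map_term(term, source, target):
    """Map a source term to target terms: exact matches if any, else broader terms."""
    formula = source[term]
    exact = [t for t, f in target.items() if f == formula]
    if exact:
        return [("exact", t) for t in exact]
    # A target formula with a proper subset of the conjuncts is broader.
    return [("broader", t) for t, f in target.items() if f < formula]


print(map_term("TL681.S6 Airplanes. Soundproofing", LCC, LCSH))
print(map_term("VM367.S6 Submarines. Soundproofing", LCC, LCSH))
```

With these inputs the airplane class maps exactly to Aeroplanes-Soundproofing, while the submarine class, which carries the extra conjuncts T73 and Underwater, maps to Ships-Soundproofing as a broader term.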
  • 72. Mapping user queries. User query: free text; combination of elemental concepts through facets (guided query formulation); controlled term(s) from a KOS, possibly found through browsing a KOS. Hub: canonical form of query (DL formula). Final query: (enriched) free text query; query in terms of a KOS
  • 73. Query: L17 Vehicles AND R37 Soundproofing. Results: TL681.S6 Airplanes. Soundproofing [L17 Vehicles ⊓ L33 Air transport ⊓ R37 Soundproofing]; VM367.S6 Submarines. Soundproofing [L17 Vehicles ⊓ L37 Water transport ⊓ R37 Soundproofing ⊓ Military]; Aeroplanes-Soundproofing [L17 Vehicles ⊓ L33 Air transport ⊓ R37 Soundproofing]; Ships-Soundproofing [L17 Vehicles ⊓ L37 Water transport ⊓ R37 Soundproofing]
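The query on this slide can be read as a containment test: a class or heading is retrieved when its formula contains every atomic concept in the query. A minimal sketch, again treating formulas as flat sets (an assumption) and using T73 as the code for Military; `search` is a hypothetical name:

```python
# Canonical formulas for two LCC classes and two LCSH headings.
RECORDS = {
    "TL681.S6 Airplanes. Soundproofing": {"L17", "L33", "R37"},
    "VM367.S6 Submarines. Soundproofing": {"L17", "L37", "R37", "T73"},
    "Aeroplanes-Soundproofing": {"L17", "L33", "R37"},
    "Ships-Soundproofing": {"L17", "L37", "R37"},
}


def search(query, records):
    """Return labels whose formula contains every concept in the query."""
    return [label for label, formula in records.items() if query <= formula]


# Query: L17 Vehicles AND R37 Soundproofing retrieves all four entries,
# regardless of which KOS they come from.
print(search({"L17", "R37"}, RECORDS))

# Adding L37 Water transport narrows the result to the two water-transport entries.
print(search({"L17", "L37", "R37"}, RECORDS))
```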
  • 74. Examples from NALT and LCSH • NALT: National Agricultural Library Thesaurus • LCSH: Library of Congress Subject Headings
  • 75. Air pollution laws. LCSH term: Air – Pollution – Laws and regulations = [isa] Legal rule [appliedTo] {[isa] Condition [isConditionOf] Air [causedBy] Pollutant [property] Undesirable}. NALT terms: Air pollution = [isa] Condition [isConditionOf] Air [causedBy] Pollutant [prop.] Undesirable; Laws and regulations = [isa] Legal rule. Mapping LCSH ▬► NALT: Air – Pollution – Laws and regulations ▬► Air pollution AND Laws and regulations. Interpretation for indexing and searching in both directions
  • 76. Soil moisture vs. Soil water. LCSH term: Soil moisture = [isa] Water [containedIn] Soil. NALT term: Soil water = [isa] Water [containedIn] Soil. Mapping LCSH ▬► NALT: Soil moisture ▬► Soil water
  • 77. Greenhouse gardening. LCSH term: Greenhouse gardening = [isa] Gardening [inEnvironment] Greenhouse [inEnvironment] Home. NALT terms: Home gardening = [isa] Gardening [inEnvironment] Home; Greenhouse = [isa] Greenhouse. Mapping LCSH ▬► NALT: Greenhouse gardening ▬► Home gardening AND Greenhouse
  • 78. Salad greens. LCSH term: Salad greens = [isa] Green leafy vegetable [usedFor] Salad. NALT term: Green leafy vegetables = [isa] Green leafy vegetable. Mapping LCSH ▬► NALT: Salad greens ▬► BT Green leafy vegetables
  • 79. Emerging diseases. LCSH term: Emerging infectious diseases = [isa] Disease [hasProperty] Infectious [hasProperty] Emerging. NALT term: Emerging diseases = [isa] Disease [hasProperty] Infectious ??? [hasProperty] Emerging. Mapping LCSH ▬► NALT: Emerging infectious diseases ▬► Emerging diseases; Emerging infectious diseases ▬► BT Emerging diseases
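The NALT and LCSH examples use relationship-filler pairs rather than bare atomic concepts. As a rough sketch, assuming each formula can be flattened into a set of (relationship, filler) pairs, the exact and BT mappings for two of the examples fall out of set comparison. Combination mappings such as Greenhouse gardening are not covered here, and `map_to_nalt` is a hypothetical name.

```python
# Formulas from the slides, flattened into (relationship, filler) pairs.
LCSH = {
    "Soil moisture": frozenset({("isa", "Water"), ("containedIn", "Soil")}),
    "Salad greens": frozenset(
        {("isa", "Green leafy vegetable"), ("usedFor", "Salad")}
    ),
}
NALT = {
    "Soil water": frozenset({("isa", "Water"), ("containedIn", "Soil")}),
    "Green leafy vegetables": frozenset({("isa", "Green leafy vegetable")}),
}


def map_to_nalt(term):
    """Return ('exact', labels), ('BT', labels), or ('none', []) for an LCSH term."""
    formula = LCSH[term]
    exact = [label for label, target in NALT.items() if target == formula]
    if exact:
        return ("exact", exact)
    # A NALT formula with fewer restrictions than the LCSH one is a broader term.
    broader = [label for label, target in NALT.items() if target < formula]
    return ("BT", broader) if broader else ("none", [])


print(map_to_nalt("Soil moisture"))  # ('exact', ['Soil water'])
print(map_to_nalt("Salad greens"))   # ('BT', ['Green leafy vegetables'])
```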
  • 80. Distributed implementation • A KOS on the Web could assign DL formulas to its concepts; let's call this a semantically enhanced KOS or SEKOS • Could use any of a number of faceted core classifications or even several (using a unique URI for each elemental concept) • Core classifications could be mapped to each other • It is now a simple matter to map from any SEKOS to any other (somewhat dependent on the core classifications used)
  • 81. Examples from the realm of AAT Taiwan. AAT: Art and Architecture Thesaurus (Getty); AAT Taiwan: TELDAP, Institute for Information Science, Academia Sinica; TGM: Thesaurus for Graphic Materials, Library of Congress; E-HowNet: A Lexical Knowledge Base for Semantic Composition, Academia Sinica
  • 82. Mapping through a Hub. Hub: Facility ⊓ Worship; Facility ⊓ Worship ⊓ Judaism; Facility ⊓ Worship ⊓ Christianity; Facility ⊓ Worship ⊓ Islam; Facility ⊓ Worship ⊓ Buddhism; Facility ⊓ Worship ⊓ Taoism. TGM: temples; synagogues; churches; mosques; Buddhist temples; Taoist temples. AAT: temples (buildings); synagogues (buildings); churches (buildings); mosques (buildings)
  • 83. Mapping to Chinese • Use E-HowNet formal semantic expressions
  • 84. E-HowNet ontology 廣義知識知識本體 • Building | 建築物; Facilities | 設施. Chinese word: 廟; English: Temple; Conceptual expression: {facilities |設施 : domain = {religion |宗教 }}. Chinese word: 禪寺; English: Buddhist temple; Conceptual expression: {facilities |設施 : domain = {Buddhist |佛教 }}. Chinese word: 道觀; English: Taoist temple / Taoist guan; Conceptual expression: {facilities |設施 : domain = {Taoism |道教 }}
  • 85. Mapping to Chinese • Use E-HowNet formal semantic expressions • Use terms that already exist in E-HowNet • Add terms using computer-assisted derivation of semantic expressions as described above for English
  • 86. Cross-language mapping problems. Example: AAT stitching maps to two Chinese terms: 縫合 (feng he) for needleworking and 縫訂 (feng ding) for bookbinding
  • 87. Analysis: Since English has only one word, stitching, AAT does not distinguish between the two specific concepts, even though the AAT scope note describes both. Solution (AAT term: AAT Taiwan term): stitching: 縫 (feng); stitching (needlework): 縫合 (feng he); stitching (bookbinding): 縫訂 (feng ding)
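The solution on this slide amounts to giving each specific concept its own node with labels in both languages. A minimal sketch of such a structure follows; the concept identifiers and function names are hypothetical, not AAT record IDs.

```python
# Hypothetical concept identifiers; the labels come from the slide.
CONCEPTS = {
    "stitching": {"en": "stitching", "zh": "縫", "broader": None},
    "stitching_needlework": {
        "en": "stitching (needlework)", "zh": "縫合", "broader": "stitching",
    },
    "stitching_bookbinding": {
        "en": "stitching (bookbinding)", "zh": "縫訂", "broader": "stitching",
    },
}


def narrower(concept_id):
    """Return ids of concepts whose broader concept is `concept_id`."""
    return [cid for cid, c in CONCEPTS.items() if c["broader"] == concept_id]


def candidate_terms(concept_id, lang):
    """Return the concept's own term followed by the terms of its narrower concepts."""
    ids = [concept_id] + narrower(concept_id)
    return [CONCEPTS[cid][lang] for cid in ids]


print(candidate_terms("stitching", "zh"))  # ['縫', '縫合', '縫訂']
```

A searcher starting from the broad English term can then be shown both specific Chinese terms instead of a single, possibly wrong, translation.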
  • 88. Principle: The classification should include all concepts that are lexicalized in any language participating in a cross-language mapping system. If a language does not have a term for a concept, a term must be invented. This also happens when a concept is found through conceptual analysis
  • 89. Shades of meaning. Example: The AAT defines temple as "Buildings housing places devoted to the worship of a deity or deities". But in Chinese culture, a temple (miao, 廟) is devoted to worshiping, honoring, or communing with ancestors or spirits. There are a number of further terms in Chinese for buildings devoted to worshiping or commemorating saints, famous scholars, poets, or people with great achievements.
  • 90. Shades of meaning. Thus in the concept structure we need: Temple (broad definition): Building housing places devoted to worshiping, communing with, honoring, or commemorating a deity or deities, ancestors, spirits, saints, famous scholars, poets, or people with great achievements; Temple (narrow AAT definition); Miao (廟); Other Chinese terms
  • 91. The importance of good definitions. AAT Taiwan must make sure that all readers, English and Chinese, understand all terms, English and Chinese, and the often subtle differences. The table on the next slide illustrates that
  • 92. Uses of AAT Taiwan. Western user: searching Western art, understands English terms; searching Chinese art, needs to understand Chinese terms. Chinese user: searching Western art, needs to understand English terms; searching Chinese art, understands Chinese terms. All users need a good conceptual structure
  • 93. Take-home message: Semantics gives powerful systems
  • 94. Dagobert Soergel, dsoergel @ buffalo.edu, www.dsoergel.com
  • 97. Mapping Issues 1: Terms related to Chinese religious concepts. The word “temples” is frequently considered equivalent to the Chinese term “廟 miao”. However, because the buildings differ in purpose and in the spirits worshiped, religious buildings in Taiwan go by a variety of names. Temples (buildings) (religious buildings, <religious structures>, ... Built Environment (Hierarchy Name)) Note: Buildings housing places devoted to the worship of a deity or deities. In the strictest sense, it refers to the dwelling place of a deity, and thus often houses a cult image. In modern usage a temple is generally a structure, but it was originally derived from the Latin "templum" and historically has referred to an uncovered place affording a view of the surrounding region. For Christian or Islamic religious buildings the terms "churches" or "mosques" are generally used, but an exception is that "temples" is used for Protestant, as opposed to Roman Catholic, places of worship in France and some French-speaking regions. Q1. The mapping team has found that “temple” in AAT is broader than the concept in Chinese. Therefore it is necessary to distinguish the differences among the Chinese terms before mapping.
  • 98. Mapping Issues 1: Terms related to Chinese religious concepts. Despite their similar appearance, each differs slightly from the others. • miao (廟): In the past, a place to worship ancestors. Since the Han dynasty, it has been used as a place to worship both ancestors and spirits. • ci (祠): Built to worship or commemorate saints, famous scholars, poets, or people with great achievements. Sometimes also refers to places where ancestors are worshiped. • si (寺): Generally refers to a place where Buddhist spirits are worshiped. Sometimes it also refers to a place where Buddhist monks live. • an (庵): Formerly referred to a scholar's study (書齋). Nowadays it refers to a place where Buddhist nuns live. • guan (觀): Refers only to Taoist buildings. • yan (巖): Refers to miao (廟) established on or near a mountain.
  • 99. Mapping Issues 2: A Chinese set term stands for a broader meaning. 文玩 (Wenwan) • A word combining “文物 cultural object” and “古玩 antique curio”. (文玩兼有文物與古玩的特點) • It specifically refers to objects used in an educated person's study, including writing equipment, small tools, and decorations. (特指文人書齋中的書寫設備、小工具和擺飾) • It represents the culture of the study, combining the practical function of a scholar's study equipment with art crafts made for appreciation. (文玩是種書齋文化,結合了文人書生的實用器物與具觀賞價值的藝術品) • Common objects include ink stones, seals, washing vessels, finely sculptured decorations, etc. “Elegant” and “exquisite” are its essential characteristics. (文玩為以下器物的泛稱: 古硯、印章、洗器、牙雕…等,“雅”與“巧”是其基本特徵) • It is produced in a highly artistic manner. Nowadays it has become a popular collectible, valued more as an artifact than as equipment. (以高藝術性的方式製造,現今多為賞而勿用的文房珍玩)
  • 100. Mapping Issues 2: A Chinese set term stands for a broader meaning • lotus pod shaped vessel for injecting water 雙蓮房水注 • banana leaf shaped wooden plate 癭木蕉葉盤 • olive stone boat sculpture 果核小舟 • blue snuff bottle 藍地金星套料鼻煙壺 • lotus leaf shaped washing vessel 白玉荷葉式洗 • seal 鴛錦雲章循連環田黃石印 • ivory desk tidy 象牙雕山水人物筆筒
  • 101. Mapping Issues 2: A Chinese set term stands for a broader meaning. Q2. The mapping team has found that the meaning of Wenwan is broader than that of the term “desk sets”, although the two partly overlap. The two terms are therefore inexact equivalents. Is it more suitable to create a new term “Wenwan” in the structure, or should it be mapped to desk sets? desk sets (sets (groups), <object groupings by general context>, ... Object Groupings and Systems) Note: Sets of matching articles intended to be used on a desk including such articles as inkstands, pen trays, and stamp boxes.
  • 102. When English terms have broader meanings (1/2) EX1: • ID: 300053660 Record Type: concept stitching (<processes and techniques by specific type>, <processes and techniques>, Processes and Techniques) Note: Refers to the process of fastening, joining, closing, uniting, mending, or creating ornamentation by stitches, which are the portions of thread left in fabric or another material by the in and out movement of a threaded needle through the thickness or surface of the material, or the loops of thread created on a needle in knitting or other needlework. In the context of textiles and needleworking, its meaning overlaps with "sewing." In the context of bookbinding, it refers to the fastening together a number of leaves or gatherings by passing the thread or wire through all of the sheets at once; it is distinct from "sewing," which, in the context of bookbinding, is used for the joining of leaves or gatherings together one by one by drawing thread or wire backwards and forwards through the back fold of each sheet to attach it to the cords. 縫綴 / 縫訂 (<依特定種類區分之過程與技術>, <過程與技術>, 過程與技術) 範圍註:意指藉由針線進出穿過材料或其表面的動作,將針腳留在布料或其他材料上,或是在編織或針織時形成針目,以固定、結合、閉合、合併、修補或製作裝飾的過程。若指涉的是紡織品與手工繡品方面,則其意義與「縫紉 (sewing)」一詞重疊。若指涉的是書籍裝幀方面,則意指將若干頁面或疊層,用線或金屬線一次穿過所有紙張固定在一起。而「線訂 (sewing)」在書籍裝幀方面,是指用針線或金屬線,在一疊書頁的摺縫處上下穿梭,使其與裝訂線固定的方法。 • In different contexts (bookbinding vs. needleworking), the meaning of stitching changes accordingly. In AAT, both meanings are explained in the same record, but when the term is translated into Chinese there are two translations: 縫合 (feng he) for needleworking and 縫訂 (feng ding) for bookbinding. The same problem occurs in the record for sewing (ID: 300053658). [Images: stitching in needleworking; stitching in bookbinding]
  • 103. When English terms have broader meanings (2/2) EX2: 300004184 Record Type: concept patios (<uncovered spaces>, <rooms and spaces by form>, ... Components (Hierarchy Name)) Note: Paved recreation areas adjoining contemporary houses and the paved interior courts of Spanish or Spanish-style buildings. The term refers to two types of open spaces, so the translations could be 屋外休憩區 or (西班牙) 內院. [Images: Spanish patio; patio adjoining a house]
  • 104. When English terms have broader meanings (2/2) EX3: • 300266238 Record Type: concept maculatures (<prints by process or technique>, prints (visual works), ... Visual and Verbal Communication) Note: Prints made by taking a second impression without reinking the plate, often used for cleaning the plate. May also refer to blotting paper. Also used for scrap paper that can reinforce fabric in Medieval embroidery. • The term maculatures can be used in three different contexts (prints, blotting paper, and scrap paper), and there are three translations (吸墨紙版畫、吸墨紙、固定刺繡布料的紙片). Q3: In this case, since the record contains multiple meanings, the question is not which one is the preferred term. How should the Chinese translations be displayed?

Editor's Notes

  1. For the medical data, one important finding is the multi-level topical structure: the typology can be applied at multiple levels. Here is an example. The answer to the question “What is the most effective treatment for ADHD in children?” is analyzed; the figure shows the top levels of the topical map from the coding. The central topic is ADHD in children. Inattention, hyperactivity, and comorbid conditions are symptoms directly matching the topic; stimulant medication therapy is the medical treatment, in the relevance category of method and solution. Further on, the answer provides information about the significance of the therapy, a medical trial of the therapy, the side effects of the therapy, and other treatment methods in comparison to the therapy. Poor patient and parent education is a hindering factor for the medical treatment. The general idea is that some information presented in the answer relates to the central topic only through “steps” of connection. For example, poor patient and parent education is not a hindering factor for the disease but a hindering condition of the medical treatment for the disease. Similarly, “A large random trial” does not directly connect to the central topic of “ADHD in children”: it is not the “Evaluation” of “ADHD in children”; instead, it is the “Evaluation” of “Stimulant medication therapy”, which in turn is the “Medical treatment” of “ADHD in children”. This coupling structure is interesting and important: it indicates that the same set of topical relevance relationships can be applied on many levels; although the level can vary, the relationship types remain stable on each level.
  2. The typology also applies to image tagging
  3. The tags are organized first by functional relevance category, then by presentation type. This slide shows all the tags directly matching the topic of the image: saint bartholomew, executioner, and knife refer to the focal image content; physical anguish and profound emotion are adjectival elaboration; expressive hands, gestures, and confronts are adverbial elaboration; martyrdom is the theme of the image.
  5. The tag “Christ’s sacrifice and crucifixion” is an analogy to the image; biographic information about the artist and the creation time period serve as context.
  9. The study focuses on the rhetorical functional roles, which include evidence, context, comparison, evaluation, method, etc. Let’s zoom in on this facet. The typology has many levels and it’s easy to get lost, so this is just an overview; I will discuss some of the categories in greater detail as we move along. I will present the findings from the literature analysis and from the MALACH data analysis together; in this way, you get to see how the literature analysis contributes to developing the typology and how the typology is exemplified by the MALACH data. The typology has two facets: (1) functional role and (2) mode of reasoning. These two facets are orthogonal but equally important for characterizing types of topical relevance. Functional role is concerned about … I’d like to focus the talk on this perspective.
  10. This slide gives second-level detail of the function-based facet. RST stands for Rhetorical Structure Theory; it provides a comprehensive framework for investigating relationships based on functional role. It was developed by Mann & Thompson in the 1980s for the purpose of guiding natural language generation. Later, it was widely applied in discourse analyses in various domains. [RST looks at the relationships that hold between text parts in a coherent discourse by identifying the functional role of each text part in the discourse.] [You may ask how discourse relations relate to topical relevance. In most cases, a coherent discourse is organized around a topic; different text parts play different roles and work together to contribute to the reader’s understanding of the topic. Information search is not much different: we also have a topic, and we gather and organize different pieces of relevant information to improve the user’s understanding of the topic. In terms of contributing to the receiver's understanding of a topic, the functional roles played by different parts of text and those played by different pieces of relevant information are much the same.] From this framework, close relations can be drawn to the MALACH relevance types (direct, indirect, context, comparison); it also adds other elements, such as method, solution, evaluation, and so on. Now let’s zoom into Direct relevance.
  11. Comparison is based on perceived similarities, [but what really makes it interesting are the pieces that are different.] Under comparison, we have two sub-facets: first, comparison by similarity or by difference (both similar and contrasting cases are considered relevant); second, by the factor that is different. These two sub-facets are coupled. For example, by varying the external factors or participants, we often get similar cases happening in a different place, at a different time, or with a different person; by fixing these factors and varying the act itself, we often get contrasting cases happening in the same time-space or with the same participant. Varying values of the first two topical facets, we get the same or comparable event/experience/phenomenon happening in a different place, at a different time, in a different situation, or with a different person; varying values of the last facet, we get an opposite event/experience/phenomenon happening in the same time-space or involving the same participant(s). The three major topical facets define three specific types of comparative evidence. This is the detailed typology for comparison relevance: by similarity and contrast, and by factor that is different.
  12. Guided tagging
  14. E-HowNet ontology- http://ckip.iis.sinica.edu.tw/taxonomy/?lang=eng E-HowNet technical report- http://rocling.iis.sinica.edu.tw/CKIP/paper/Technical_Reprt_E-HowNet.pdf
  16. Taiwanese folk beliefs (台灣民間信仰): http://61.60.100.220/%E5%8F%B0%E7%81%A3%E6%B0%91%E9%96%93%E4%BF%A1%E4%BB%B0/unit05-1.htm (Tourism Bureau, Ministry of Transportation and Communications, 交通部觀光局)
  18. Preface (序), 文玩賞讀, by 韓天衡 and 韓回之 (上海人民出版社, Shanghai People's Publishing House) http://www.books.com.tw/exep/prod/china/chinafile.php?item=CN10086239
  19. Southern Song to Yuan (南宋至元), 雙蓮房水注 - http://catalog.digitalarchives.tw/dacs5/System/Exhibition/Detail.jsp?OID=836791 • Ming (明), 癭木蕉葉盤 - http://catalog.digitalarchives.tw/dacs5/System/Exhibition/Detail.jsp?OID=1094790 • Qing (清), 陳祖章, 果核小舟 - http://catalog.digitalarchives.tw/dacs5/System/Exhibition/Detail.jsp?OID=1094815 • 白玉荷葉式洗 - http://catalog.digitalarchives.tw/dacs5/System/Exhibition/Detail.jsp?OID=1122025 • Qing (清), 鴛錦雲章循連環田黃石印 - http://catalog.digitalarchives.tw/dacs5/System/Exhibition/Detail.jsp?OID=1885289 • Qing (清), 象牙雕山水人物筆筒 - http://catalog.digitalarchives.tw/dacs5/System/Exhibition/Detail.jsp?OID=3345839
  20. Desk sets- AAT Taiwan