Illuminating Chaos Using Semantics to Harness the Web

  • For the medical data, one important finding is the multi-level topical structure: the typology can be applied at multiple levels. As an example, the answer to the question “What is the most effective treatment for ADHD in children?” is analyzed; the figure shows the top levels of the topical map derived from the coding.
    The central topic is ADHD in children. Inattention and hyperactivity or comorbid conditions are symptoms that directly match the topic; stimulant medication therapy is the medical treatment, in the relevance category of method and solution. The answer then provides information about the significance of the therapy, a medical trial of the therapy, the side effects of the therapy, and other treatment methods in comparison with the therapy. Poor patient and parent education is a hindering factor for the medical treatment.
    The general idea is that some information presented in the answer relates to the central topic only through “steps” of connection. For example, poor patient and parent education is not a hindering factor for the disease itself, but a hindering condition for the medical treatment of the disease.
    This coupling structure is interesting and important: it indicates that the same set of topical relevance relationships can be applied at many levels. Although the level varies, the relationship types remain stable at each level.
    Likewise, “A large random trial” does not connect directly to the central topic of “ADHD in children”: it is not the “Evaluation” of “ADHD in children”; instead, it is the “Evaluation” of “Stimulant medication therapy”, which in turn is the “Medical treatment” of “ADHD in children”.
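A minimal sketch of the "steps of connection" idea, assuming each piece of information is stored with one typed link to the item it elaborates. The `LINKS` table and the `path_to_topic` helper are hypothetical names introduced here for illustration; only the three example items and their relationship types come from the notes above.

```python
# Hypothetical encoding of the ADHD example: item -> (relationship type, target).
LINKS = {
    "Stimulant medication therapy": ("Medical treatment", "ADHD in children"),
    "A large random trial": ("Evaluation", "Stimulant medication therapy"),
    "Poor patient and parent education": ("Hindering factor", "Stimulant medication therapy"),
}


def path_to_topic(item, topic):
    """Return the chain of (item, relation, target) steps linking item to topic."""
    steps = []
    while item != topic:
        if item not in LINKS:
            raise KeyError(f"{item!r} is not connected to {topic!r}")
        relation, target = LINKS[item]
        steps.append((item, relation, target))
        item = target
    return steps


for source, relation, target in path_to_topic("A large random trial", "ADHD in children"):
    print(f"{source} --[{relation}]--> {target}")
```

The same relationship vocabulary is used at every step, which is the point of the multi-level structure: only the target of the relationship changes.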
  • The typology also applies to image tagging
  • The tags are organized first by functional relevance category, then by presentation type. This slide shows all the tags that directly match the topic of the image: saint bartholomew, executioner, and knife refer to the focal image content; physical anguish and profound emotion are adjectival elaboration; expressive hands, gestures, and confronts are adverbial elaboration; martyrdom is the theme of the image.
  • The tag Christ’s sacrifice and crucifixion is an analogy to the image; biographic information about the artist and the creation time period serves as context.
  • The study focuses on rhetorical functional roles, which include evidence, context, comparison, evaluation, method, etc. Let’s zoom in on this facet.
    Since the typology has many levels, it’s easy to get lost. This is just an overview; I will discuss some of the categories in greater detail as we move along.
    I will present the findings from the literature analysis together with those from the MALACH data analysis. In this way, you get to see how the literature analysis contributes to developing the typology and how the typology is exemplified by the MALACH data.
    The typology has two facets: (1) functional role and (2) mode of reasoning. These two facets are orthogonal but equally important for characterizing types of topical relevance.
    Functional role is concerned about … I’d like to focus the talk on this perspective.
  • This slide gives second-level detail of the function-based facet.
    RST stands for Rhetorical Structure Theory; it provides a comprehensive framework for investigating relationships based on functional role. It was developed by Mann & Thompson in the 1980s for the purpose of guiding natural language generation. It has since been widely applied in discourse analysis in various domains.
    [RST looks at the relationships that hold between text parts in a coherent discourse by identifying the functional role of each text part in the discourse.]
    [You may ask how discourse relations relate to topical relevance. In most cases, a coherent discourse is organized around a topic; different text parts play different roles and work together to contribute to the reader’s understanding of the topic. In information search, this is not much different: we also have a topic, and we gather and organize different pieces of relevant information to improve the user’s understanding of the topic. In terms of contributing to the receiver's understanding of a topic, the functional roles played by different parts of a text and those played by different pieces of relevant information are much the same.]
    From this framework, close relations can be drawn to the MALACH relevance types (direct, indirect, context, comparison); it also supplements other elements, such as method, solution, evaluation, and so on.
    Now let’s zoom into Direct relevance.
  • Comparison is based on perceived similarities, [but what really makes it interesting are the pieces that are different.]
    Under comparison, we have two sub-facets: first, comparison by similarity or by difference (both similar and contrasting cases are considered relevant); second, comparison by the factor that is different. These two sub-facets are coupled. For example, by varying the external factors or participants, we often get similar cases happening in a different place, at a different time, or with a different person; by fixing these factors and varying the act itself, we often get contrasting cases happening in the same time-space or with the same participant.
    Varying the values of the first two topical facets, we get the same or a comparable event / experience / phenomenon happening in a different place, at a different time, in a different situation, or with a different person; varying the values of the last facet, we get an opposite event / experience / phenomenon happening in the same time-space or involving the same participant(s). The three major topical facets define three specific types of comparative evidence:
    This is the detailed typology for comparison relevance:
    By similarity and contrast
    By factor that is different
  • Guided tagging
  • E-HowNet ontology- http://ckip.iis.sinica.edu.tw/taxonomy/?lang=eng
    E-HowNet technical report- http://rocling.iis.sinica.edu.tw/CKIP/paper/Technical_Reprt_E-HowNet.pdf
  • Taiwanese folk beliefs: http://61.60.100.220/%E5%8F%B0%E7%81%A3%E6%B0%91%E9%96%93%E4%BF%A1%E4%BB%B0/unit05-1.htm (Tourism Bureau, Ministry of Transportation and Communications)
  • (Preface) Wenwan Shangdu (Appreciating Scholars' Objects) - Han Tianheng, Han Huizhi (Shanghai People's Publishing House)
    http://www.books.com.tw/exep/prod/china/chinafile.php?item=CN10086239
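  • Southern Song to Yuan: water dropper in the shape of a double lotus seedpod - http://catalog.digitalarchives.tw/dacs5/System/Exhibition/Detail.jsp?OID=836791
    Ming: burl wood dish in the shape of a banana leaf - http://catalog.digitalarchives.tw/dacs5/System/Exhibition/Detail.jsp?OID=1094790
    Qing: Chen Zuzhang, miniature boat carved from a fruit pit - http://catalog.digitalarchives.tw/dacs5/System/Exhibition/Detail.jsp?OID=1094815
    White jade brush washer in the shape of a lotus leaf - http://catalog.digitalarchives.tw/dacs5/System/Exhibition/Detail.jsp?OID=1122025
    Qing: Yuanjin Yunzhang Xun Lianhuan tianhuang stone seal - http://catalog.digitalarchives.tw/dacs5/System/Exhibition/Detail.jsp?OID=1885289
    Qing: carved ivory brush holder with landscape and figures - http://catalog.digitalarchives.tw/dacs5/System/Exhibition/Detail.jsp?OID=3345839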
  • Desk sets- AAT Taiwan
  • Illuminating Chaos Using Semantics to Harness the Web

    1. 1. Illuminating Chaos Using Semantics to Harness the Web
        Dagobert Soergel, Department of Library and Information Studies, University at Buffalo
        AAT Workshop, Academia Sinica, Taipei, June 7, 2010
    2. 2. Outline • Overview of issues • Semantics for whom and for what • Representation to assist with query formulation • Representation for comprehension • Systems of representation • Support for finding: Indexing • Building KOS • How can it all get done • Zeroing in on the conceptual foundation • Issues in the realm of AAT Taiwan 2
    3. 3. Semantics, structure, meaning • Classification • Meaningful arrangement • All kinds of relationships 3
    4. 4. Semantics for whom? • Semantics for computer systems: inference; answers and solutions instead of lots of Web pages • Semantics for people: assist users in creating meaning and making sense; structure for learning
    5. 5. Semantics for what • Finding • Comprehending • To know what to look for, a user (a person or a system) must first comprehend something – a cycle • Both finding and comprehending require navigating in an information space – need meaningful structure 5
    6. 6. Representation to assist with query formulation 6
    7. 7. Problem clarification for search
        JG prevention approach
        JG10 . individual-level prevention
        JG10.2 . . individual- vs. family-focused prevention
        JG10.2.2 . . . individual-focused prevention
        JG10.2.4 . . . family-focused prevention
        JG10.4 . . prevention through information and education
        JG10.4.2 . . . social marketing prevention approach
        JG10.4.4 . . . prevention through information dissemination
        JG10.4.6 . . . prevention through education
        JG10.4.8 . . . peer prevention
        JG10.8 . . prevention through spirituality and religion
        JG10.10 . . prevention through public commitment
        JG12 . environmental-level prevention
        JG12.4 . . social policy prevention approach
        JG14 . multi-level prevention
    8. 8. Problem clarification for search
        churches (buildings)
        . <church buildings by function>
        . . chapels of ease (buildings)
        . . fortified churches
        . . pilgrimage churches (buildings)
        . . procathedrals (buildings)
        . <church buildings by location or context>
        . . abbey churches
        . . cathedrals (buildings)
        . . cave churches
        . . collegiate churches
        . . . . .
        . <churches by form>
        . . double churches
        . . hall churches
        . . rock-cut churches
        . . stave churches
    9. 9. Browse structure for search • Make a table of contents for the entire Wikipedia using UDC • Make a classified (hierarchically structured) index for an art textbook using the Art and Architecture Thesaurus • Make a classified index for the collection of an art museum using the Art and Architecture Thesaurus 9
    10. 10. Facet structure to guide search
        Facet A (Area of ability) combines with facet B (Degree of ability)
        A Area of ability
          A1 psychomotor ability
          A2 senses
          A2.1 . vision
          A2.1.1 . . night vision
          A2.2 . hearing
          A3 intelligence
          A4 artistic ability
        B Degree of ability
          B1 low degree of ability, disabled
          B2 average degree of ability
          B3 above average degree of ability
          B3.1 . very high degree of ability
        Examples
          A2.1B1 visually impaired
          A2.2B1 hearing impaired
          A3B1 mentally handicapped
          A3B3 intellectually gifted
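As an illustration of how the two facets on this slide combine, the sketch below builds a compound code by concatenating one code from facet A with one from facet B. The codes and captions are taken from the slide; the function names `combine` and `broader` are hypothetical, and deriving the broader concept from the dotted notation is an assumption about how the notation is meant to be read.

```python
# Facet values copied from the slide.
AREA = {
    "A1": "psychomotor ability",
    "A2": "senses",
    "A2.1": "vision",
    "A2.1.1": "night vision",
    "A2.2": "hearing",
    "A3": "intelligence",
    "A4": "artistic ability",
}
DEGREE = {
    "B1": "low degree of ability, disabled",
    "B2": "average degree of ability",
    "B3": "above average degree of ability",
    "B3.1": "very high degree of ability",
}


def combine(area_code, degree_code):
    """Build a compound code and a readable gloss from one value per facet."""
    gloss = f"{AREA[area_code]} + {DEGREE[degree_code]}"
    return area_code + degree_code, gloss


def broader(code):
    """Assumed reading of the notation: drop the last dotted segment."""
    return code.rsplit(".", 1)[0] if "." in code else None


print(combine("A2.1", "B1"))
print(broader("A2.1.1"))
```

A facet-based search front-end could offer the two lists separately and let the user pick one value from each, rather than listing every precombined concept.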
    11. 11. Provide front-ends to assist users • Elicit a query with a facet-based interface; the system then creates a free-text query • Create a structure that normalizes terms assigned through social tagging and arranges them in a meaningful structure. The user can then browse and select concepts; the system maps them to all appropriate tags
    12. 12. Problem space for diseases. Used by people or computer systems for search and for arranging search output. Dimensions: Pathologic process; Body system affected; Cause (condition, organism, chemical substance, environmental factors); Treatment
    13. 13. Representation for comprehension A question of information representation (knowledge representation) • For computer systems: formal representation • For people: Text, images, graphical representation, visualization • Transformations between representations, such as • from text to formal: information extraction • from text to a map showing the text structure • from a conventional thesaurus display to a concept map 13
    14. 14. Two representations
        Text (for people): High blood pressure is a serious disease often caused by being overweight. In kids 4 – 12 it can be treated highly effectively with Nystatin.
        Formal representation (for computer system):
          Causation (HighBloodPressure, Obesity)
          Treatment (HighBloodPressure, {Human, [Age, 4-12y]}, Nystatin, [Effectiveness, 4])
    15. 15. Answering questions Question: How can high blood pressure be prevented? Answer: Lose weight?
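A minimal sketch of how the formal statements from the "Two representations" slide could support the question on the "Answering questions" slide. The class names, field names, and the inference rule (avoiding a known cause is a candidate way to prevent its effect) are assumptions made for illustration; the slides give only the two statements and the tentative answer.

```python
from typing import NamedTuple


class Causation(NamedTuple):
    effect: str
    cause: str


class Treatment(NamedTuple):
    condition: str
    population: str
    agent: str
    effectiveness: int


# The two statements from the slide, in the slide's argument order.
FACTS = [
    Causation("HighBloodPressure", "Obesity"),
    Treatment("HighBloodPressure", "Human, Age 4-12y", "Nystatin", 4),
]


def prevention_candidates(condition):
    """Assumed rule: avoiding a known cause is a candidate way to prevent the effect."""
    return [
        "avoid " + fact.cause
        for fact in FACTS
        if isinstance(fact, Causation) and fact.effect == condition
    ]


print(prevention_candidates("HighBloodPressure"))
```

The answer is not stated in the text; it follows only once the text has been turned into a causation statement and a rule is applied to it, which is why the slide ends its answer with a question mark.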
    16. 16. Two representations Text Kids begin grazing independently from their mothers at three months Formal representation Separation (Mother, Child, {Goat, [Age, 3m]}) 16
    17. 17. Information extraction • Information extraction produces representations needed for the semantic Web • Also useful for people if formal expressions are transformed into sentences that state the findings of a document as individual "bullets" • Could arrange statements from one or more documents in UDC order as a kind of summary • Information extraction needs rich KOS 17
    18. 18. Representation of text structure 18
    19. 19. Meaningful arrangement of terms in document representations 19 • Terms assigned in social tagging • Terms assigned from controlled vocabulary, e.g., AAT
    20. 20. The Martyrdom of Saint Bartholomew 20
    21. 21. Tags arranged alphabetically • 1634 • 17th century • bearded • biblical • Christ’s sacrifice and crucifixion {Christ metaphor} • confronts • executioner • expressive hands • flayed alive • gestures • Intensity • Jusepe de Ribera • luminous • lurking • martyrdom • mystical experience • nude body • old man • physical anguish • profound emotion {emotion} • Pulls the viewer into the scene • religious • Saint Bartholomew • torture 21
    22. 22. Tags arranged by how they relate to the image 22
    23. 23. Matching topic (Direct)
        • Image theme: martyrdom, mystical experience, biblical, religious
        • Image content: Focal
          • Reference: nude body, old man, Saint Bartholomew, executioner, knife
          • Elaboration (Adj.): bearded, physical anguish, profound emotion {emotion}, luminous
          • Elaboration (Adv.): expressive hands, gestures, confronts, flayed alive, torture
        • Image content: Peripheral
          • Elaboration (Adv.): lurking
    24. 24. Comparison, cause/effect, context
        • Comparison
          • By similarity: Metaphor / analogy: Christ’s sacrifice and crucifixion {Christ metaphor}
        • Cause / Effect
          • Reaction or feeling: Intensity
          • Effect / Outcome: Pulls the viewer into the scene
        • Context
          • Biographic info: Artist: Jusepe de Ribera
          • Biographic info: Time / period: 1634, 17th century
    25. 25. Tags arranged by how they relate to the image with descriptors from the Art and Architecture Thesaurus 25
    26. 26. Matching topic (Direct), with AAT descriptors (tag -> AAT)
        • Image theme
          • martyrdom -> sacrifice
          • mystical experience -> mysticism
          • biblical -> biblical stories
          • religious -> religion and religious concepts
        • Image content: Focal
          • Reference
            • nude body -> nudes (representations)
            • old man -> elderly
            • Saint Bartholomew -> saints
            • executioner -> executioners
            • knife -> knives
          • Elaboration (Adj.)
            • bearded
            • physical anguish -> pain (sensation)
            • profound emotion {emotional}
            • luminous -> shine
    27. 27. Matching topic (Direct), with AAT descriptors (tag -> AAT)
        • Image content: Focal
          • Elaboration (Adv.)
            • expressive hands -> hands
            • gestures -> gesture; gesture drawings
            • confronts
            • flayed alive
            • torture -> torturing
        • Image content: Peripheral
          • Elaboration (Adv.)
            • lurking
    28. 28. Comparison, cause/effect, context (no AAT terms)
        • Comparison
          • By similarity: Metaphor / analogy: Christ’s sacrifice and crucifixion {Christ metaphor}
        • Cause / Effect
          • Reaction or feeling: Intensity
          • Effect / Outcome: Pulls the viewer into the scene
        • Context
          • Biographic info: Artist: Jusepe de Ribera
          • Biographic info: Time / period: 1634, 17th century
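The tag-to-descriptor pairs on the preceding slides can be read as a small lookup table. The sketch below, which assumes case-insensitive matching and uses the hypothetical names `TAG_TO_AAT` and `normalize`, separates free tags that have an AAT descriptor from those that do not; the pairs themselves are copied from the slides.

```python
# Tag -> AAT descriptor pairs copied from the slides (tags lower-cased).
TAG_TO_AAT = {
    "martyrdom": "sacrifice",
    "mystical experience": "mysticism",
    "biblical": "biblical stories",
    "religious": "religion and religious concepts",
    "nude body": "nudes (representations)",
    "old man": "elderly",
    "saint bartholomew": "saints",
    "executioner": "executioners",
    "knife": "knives",
    "physical anguish": "pain (sensation)",
    "expressive hands": "hands",
    "torture": "torturing",
}


def normalize(tags):
    """Split tags into those with an AAT descriptor and those without one."""
    mapped, unmapped = {}, []
    for tag in tags:
        key = tag.strip().lower()  # assumption: matching ignores case and padding
        if key in TAG_TO_AAT:
            mapped[tag] = TAG_TO_AAT[key]
        else:
            unmapped.append(tag)
    return mapped, unmapped


mapped, unmapped = normalize(["Knife", "old man", "lurking", "Jusepe de Ribera"])
print(mapped)
print(unmapped)
```

Tags that stay unmapped, such as the artist's name, are the ones the slides place under context or comparison, where the AAT offers no terms.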
    29. 29. Support comprehension through links to KOS • Map text term to concept in KOS, show definition, show place in hierarchical structure 29
    30. 30. Example mysticism Note: Refers in a general sense to a spiritual quest for hidden truth, the goal of which is to be united with the divine. It also refers more specifically to a belief in the existence of important realities beyond perceptual or intellectual understanding that are accessible by subjective experience, such as by intuition or meditation. Forms of mysticism are found in all major religions as well as in secular experience. 30
    31. 31. Example, continued
        Associated Concepts Facet
        . Associated Concepts
        . . <philosophical concepts>
        . . . <philosophical movements and attitudes>
        . . . . aestheticism (philosophical movements and attitudes)
        . . . . existentialism
        . . . . holism
        . . . . idealism (philosophical movement)
        . . . . individualism
        . . . . mysticism
        . . . . . Hasidism
        . . . . spiritualism
        . . . . utilitarianism
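"Show place in hierarchical structure" can be sketched as a walk up a parent table. The `PARENT` table below covers only the branch shown on the slide, and `hierarchy_path` is a hypothetical helper, not an AAT interface.

```python
# Child -> parent links for the branch shown on the slide.
PARENT = {
    "Hasidism": "mysticism",
    "mysticism": "<philosophical movements and attitudes>",
    "<philosophical movements and attitudes>": "<philosophical concepts>",
    "<philosophical concepts>": "Associated Concepts",
    "Associated Concepts": "Associated Concepts Facet",
}


def hierarchy_path(concept):
    """Return the chain from the top of the hierarchy down to the concept."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return list(reversed(path))


# Print the path in the dotted-indent style used on the slide.
for depth, name in enumerate(hierarchy_path("mysticism")):
    print(". " * depth + name)
```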
    32. 32. Comprehension "in the large" • Learning and sense making require comprehension across multiple sources • Requires structure – can be supplied by KOS • Requires tools for the manipulation of external structures the learner / sensemaker builds, such as concept maps 32
    33. 33. Representation systems 33
    34. 34. Representations need rules • Formal representations need logical formalisms, such as full first-order logic or subsets (for ease of processing) or extensions (to be more expressive) • Text needs rules of syntax and broader document structure • Graphical representations need rules of design 34
    35. 35. Representations need names for entities • Names for (abstract) concepts – classification • Names for many different types of other entities, such as persons, places, buildings, events, currencies, … (named entities) • Systems of such names – Knowledge Organization Systems, authority lists of personal names • Mappings between such systems 35
    36. 36. Representations need relationships • Relationships are used to connect entities, thus forming statements obesity <causes> high blood pressure • Need system of relationships Many such systems exist (a type of KOS) Problem of mapping 36
    37. 37. Rhetorical relationships • To map text structure • To discern how a retrieved document, paragraph, statement, or image relates to the topic of a search 37
    38. 38. Topical relevance typology
        Function-based
          Rhetorical structure: Matching topic; Evidence (Indirect); Context; Comparison; Evaluation; Method / Solution; Purpose / Goal
        Reasoning-based
          Argument structure: Grounds; Warrants; Claim
          Generic inference: Comparison-based; Induction / rule-based; Causal-based; Transitivity-based
        Semantic-based (Green & Bean, 1995)
          Taxonomy; Partonomy; Frame-based, etc.
    39. 39. RST+ Functional Role
        Matching topic (Direct)
        . Manifestation
        . Image content
        . Image theme
        Evidence (Indirect)
        Context
        . Scope
        . Framework
        . Environmental setting
        . Social background
        . Time & sequence
        . Assumption / expectation
        . Biographic information
        Condition
        . Helping or hindering factor
        . Unconditional
        . Exceptional condition
        Purpose / Motivation
        Cause / Effect
        . Cause
        . Effect / Outcome
        . Explanation (causal)
        . Prediction
        Comparison
        . By similarity (analogy) / By difference (contrast)
        . By factor that is different
        Method / Solution
        . Method / Approach
        . Instrument
        . Technique / Style
        Evaluation
        . Significance
        . Limitation
        . Criterion / Standard
        . Comparative evaluation
    40. 40. Functional role: Comparison
        Comparison
        . By similarity vs. By difference (Contrast)
        . . By similarity
        . . . Analogy & metaphor
        . . By difference (Contrast)
        . By factor that is different
        . . Different external factor
        . . . Different time
        . . . Different place
        . . Different participant
        . . . Different actor
        . . . Different subject acted upon
        . . Different act or experience
        . . . Different act
        . . . Different experience
    41. 41. Support for finding: Indexing • Finding based on text: Knowledge-based expansion of query Front-end as discussed earlier • Finding based on indexing: Semantically enriched documents 41
    42. 42. A semantically enriched document Reis et al. (2008) Impact of Environment and Social Gradient on Leptospira infection in Urban Slums (doi:10.1371/journal.pntd.0000228). Infectious disease studied: Leptospirosis Pathogen (causative agent of disease): Leptospira spirochete Vector of disease pathogen: Rat (Rattus norvegicus) Pathogen host subjected to study: Human (Homo sapiens) Number of subject individuals in study: 3,171 . . . Purpose of study: Quantify risk factors for leptospirosis . . . Principal finding 1: Prevalence of Leptospira antibodies . . . Principal finding 2: Disease risk . . .open sewers . . . 42 (http://dx.doi.org/10.1371/journal.pntd.0000228.x002)
    43. 43. A semantically enriched document 43 Tag Trees of Individual Semantic Classes of Highlighted Terms disease infectious diseases diarrheal disease childhood diarrhea dengue leptospirosis human leptospirosis meningococcal disease pulmonary hemorrhage syndrome visceral leishmaniasis Weil's disease occupational disease zoonotic disease ID = Infectious Disease Ontology GO = Gene Ontology term used in ID ID:0000012 immunity ID:0000017 mortality ID:0000023 zoonotic ID:0000025 pathogenicity ID:0000034 endemic ID:0000038 parasite ID:0000056 host ID:0000057 carrier ID:0000063 vector ID:0000064 pathogen ID:0000066 infectious agent ID:0000069 primary pathogen ID:0000104 infection
    44. 44. 44 ID = Infectious Disease Ontology GO = Gene Ontology IDO:0000000 ! process IDO:0000083 transmission IDO:0000231 horizontal transmission (GO:0000031) IDO:0000104 infection IDO:0000084 pathogenesis IDO:0000221 ! infectious disease progression IDO:0000100 ! pathogen evasion of host immune response IDO:0000111 antigenic variation IDO:0000115 genetic diversification IDO:0000226 pathogen life cycle (GO:0000026) IDO:0000001 ! role IDO:0000036 ! colonizer IDO:0000038 parasite IDO:0000048 symptom IDO:0000056 host IDO:0000057 carrier IDO:0000059 reservoir IDO:0000063 vector IDO:0000064 pathogen IDO:0000066 infectious agent IDO:0000069 primary pathogen IDO:0000200 mode of transmission (GO:0000000) IDO:0000002 ! quality IDO:0000215 ! quality of host population IDO:0000098 infectious disease IDO:0000210 ! quality of host IDO:0000012 immunity
    45. 45. Semantically enriched documents • Semantic enrichment supports semantic retrieval • Broad area of its own • Many different forms • Explicit document structure • Concept and named entity tagging and identification • Assigning additional concepts or named entities • Assigning extracted propositions • Closely linked with information extraction • IE produces elements of semantic enrichment 45
    46. 46. Need KOS Needed for all this • Large Knowledge Organization Systems • Large knowledge bases with mappings • Methods and procedures for developing KOS 46
    47. 47. How to get all this work done? The forces that created the problem also support the solution • Use automation • Automated information extraction gets better every day and also provides input to building KOS • Automated classification could be used for the UDC Wikipedia project • Use Web-enabled collaborative work ("crowdsourcing") • Use computer systems to assist people • Use Web-based systems to collect and integrate results • Bootstrap: The more knowledge is in formal systems, the more information extraction and structuring tasks can be automated 47
    48. 48. Example: Guided tagging • Use facet structure to get taggers to think a bit more out of the box. For example, could ask: What does this image remind you of? • Could assign some terms automatically, for example, by extracting terms from text assigned to an image
    49. 49. DH June 2009
    50. 50. DH June 2009
    51. 51. Semantic analysis as the basis for everything
    52. 52. Mapping through a Hub
        Hub: Water transport; Inland water transport; Ocean transport; Traffic station ⊓ Water transport; Traffic station ⊓ Inland water tr.; Traffic station ⊓ Ocean transport
        Dewey: 387 Water, air, space transportation; 386 Inland waterway & ferry transportation; 387.5 Ocean transportation; 386.8 Inland waterway tr. > Ports; 387.1 Ports
        LCSH: Shipping; Inland water transport; Merchant marine; Harbors
        German: Hafen
    53. 53. Outline • Objective: Interoperability Plus • KOS concept hub • Method: Knowledge-based, computer-assisted creation of canonical representations of concepts • Resulting knowledge base and applications 53
    54. 54. Objective Improve semantic-based search across multiple collections in multiple languages. • Interoperability between any two participating KOS (Knowledge Organization Systems) • Support for search, esp. facet-based search • for any collection indexed by a participating KOS • for search based on free-text or free-form social tagging • Assistance in cataloging (metadata creation) by catalogers or users (social tagging) • Long-range goal: Web service where a KOS can be uploaded and mappings to specified target KOS are returned 54
    55. 55. KOS Concept Hub • Interoperability is achieved by expressing concepts from all participating KOS as a canonical representation, such as a description logic formula using atomic concepts and relationships • The backbone of the proposed system is a faceted core classification of atomic concepts together with a set of relationships • Mapping from KOS to KOS is achieved by reasoning over these canonical representations 55
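A rough sketch of "mapping by reasoning over canonical representations", assuming each concept is reduced to a set of atomic hub concepts (a conjunction) and that atomic concepts form a simple broader/narrower hierarchy. The formulas assigned to the Dewey, LCSH, and German entries below are illustrative guesses, not assignments taken from the slides; all names (`CANONICAL`, `BROADER`, `subsumes`, `map_concept`) are hypothetical.

```python
# Assumed atomic-concept hierarchy in the hub: narrower -> broader.
BROADER = {
    "Inland water transport": "Water transport",
    "Ocean transport": "Water transport",
}

# Illustrative canonical representations: a set stands for a conjunction (⊓).
CANONICAL = {
    ("Dewey", "386.8 Inland waterway tr. > Ports"): frozenset({"Inland water transport", "Traffic station"}),
    ("Dewey", "387.1 Ports"): frozenset({"Ocean transport", "Traffic station"}),
    ("LCSH", "Harbors"): frozenset({"Water transport", "Traffic station"}),
    ("German", "Hafen"): frozenset({"Water transport", "Traffic station"}),
}


def ancestors(atom):
    """The atom itself plus every broader atomic concept."""
    found = {atom}
    while atom in BROADER:
        atom = BROADER[atom]
        found.add(atom)
    return found


def subsumes(general, specific):
    """True if every atom of `general` is matched by an equal or narrower atom of `specific`."""
    return all(any(g in ancestors(s) for s in specific) for g in general)


def map_concept(source, target_kos):
    """Relate one concept to every concept of the target KOS via the hub formulas."""
    src = CANONICAL[source]
    result = {}
    for (kos, label), formula in CANONICAL.items():
        if kos != target_kos:
            continue
        if formula == src:
            result[label] = "exact"
        elif subsumes(formula, src):
            result[label] = "broader"
        elif subsumes(src, formula):
            result[label] = "narrower"
    return result


print(map_concept(("LCSH", "Harbors"), "German"))
print(map_concept(("LCSH", "Harbors"), "Dewey"))
```

Because every KOS is mapped only to the hub, adding a new KOS requires one set of formulas rather than pairwise mappings to every other system.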
    56. 56. Mapping through a Hub
        Hub: Water transport; Inland water transport; Ocean transport; Traffic station ⊓ Water transport; Traffic station ⊓ Inland water tr.; Traffic station ⊓ Ocean transport
        Dewey: 387 Water, air, space transportation; 386 Inland waterway & ferry transportation; 387.5 Ocean transportation; 386.8 Inland waterway tr. > Ports; 387.1 Ports
        LCSH: Shipping; Inland water transport; Merchant marine; Harbors
        German: Hafen
    57. 57. Mapping through a Hub
        Hub: Traffic station (Vehicle parking; Terminal facilities); Water transport (Inland water transport; Ocean transport); Traffic station ⊓ Water transport
          By type of water transport: Traffic station ⊓ Inland water tr.; Traffic station ⊓ Ocean transport
          By component of traffic station: Vehicle parking ⊓ Water transport; Terminal facilities ⊓ Water transport
        Dewey: 387 Water, air, space transportation; 386 Inland waterway & ferry transportation; 387.5 Ocean transportation; 386.8 Inland waterway tr. > Ports; 387.1 Ports
        LCSH/AAT: Shipping; water transport; Inland water transport; Merchant marine; Harbors; ports; harbors
    58. 58. Method: How to get DL formulas Key: Efficient creation of canonical representations (DL formulas) • Apply existing knowledge: Large knowledge base ▬► less effort for processing a new KOS • Use knowledge of KOS structure for hierarchical inheritance • Use linguistic analysis of terms and captions • Eliminate redundant atomic concepts • Check or produce mapping results from assignment of concepts to the same records • Get human editors’ input and verification where needed through a user-friendly interface • KOS “owners” may verify and edit data pertaining to their KOS 58
    59. 59. Knowledge base Requires an ever larger classification and lexical knowledge base containing many kinds of data: 1. A faceted classification of atomic concepts Seeded from sources with well-developed facets such as UDC the Alcohol and Other Drug (AOD) Thesaurus the Harvard Business Thesaurus the Art and Architecture Thesaurus various systems called ontologies 59
    60. 60. Knowledge base 2 Requires an ever larger classification and lexical knowledge base containing many kinds of data: 2. Linguistic knowledge bases such as WordNet and mono-,bi-, and multi-lingual dictionaries and thesauri 3. Many KOS (Knowledge Organization Systems), such as LCC, UDC, DDC, DMOZ directory, LCSH, Gene Ontology, Schlagwortnormdatei 4. These will over time be fused into one large multilingual knowledge base with many terminological and translation relationships and relationships linking terms to concepts, with an increasing number of concepts semantically represented by a DL formula. 60
    61. 61. Examples of deriving DL formulas 61
    62. 62. Underlying faceted classification
        L00 Transportation and traffic
          L10 Traffic system components
            L13 Traffic facilities
            L15 Traffic stations
            L17 Vehicles
          L30 Modes of transportation
            L33 Air transport
            L37 Water transport
        P00 Buildings, construction
          P23 Buildings
          P27 Architecture
          P43 Construction
        R00 Engineering
          R30 Acoustics
          R37 Soundproofing
        T70 Military vs. civilian
          T73 Military
          T77 Civilian
    63. 63. Method: Assigning atomic concepts 1
        HE Transportation = L00 Transportation and traffic ⊓ T77 Civilian
        HE550-560 Ports, harbors, docks, wharves, etc.
          Inherited: L00 Transportation and traffic ⊓ T77 Civilian
          Added by editor: L15 Traffic stations ⊓ L37 Water transport
          Resolved to: L15 Traffic stations ⊓ L37 Water transport ⊓ T77 Civilian
    64. 64. Method: Assigning atomic concepts 2
        NA6300-6307 Airport buildings
          From database already established:
            Airport = L15 Traffic stations ⊓ L33 Air transport
            Buildings = P23 Buildings
          Added by editor: T77 Civilian
          Resolved to: L15 Traffic stations ⊓ L33 Air transport ⊓ P23 Buildings ⊓ T77 Civilian
    65. 65. Method: Assigning atomic concepts 3
        TL681.S6 Airplanes. Soundproofing
          From database already established:
            Airplane = L17 Vehicles ⊓ L33 Air transport
            Soundproofing = R37 Soundproofing
          Added by editor: Nothing
          Resolved to: L17 Vehicles ⊓ L33 Air transport ⊓ R37 Soundproofing
    66. 66. Method: Assigning atomic concepts 4
        Aeroplanes-Soundproofing
          From database already established: Aeroplanes = Airplane [Spelling variant]
          Therefore the term is recognized as the same as Airplanes. Soundproofing
          Resolved to: L17 Vehicles ⊓ L33 Air transport ⊓ R37 Soundproofing
    67. 67. Method: Assigning atomic concepts 5
        Any class formed by geographical subdivision, such as
          NA6300-6307 Airport buildings
          NA6305.E3 Egypt
        Recognized using a dictionary of geographical names
        Inherits from the subject class above it; simply add the country:
          L15 Traffic stations ⊓ L33 Air transport ⊓ P23 Buildings ⊓ T77 Civilian ⊓ Egypt
        No editor checking needed
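The five "Assigning atomic concepts" slides can be summarized in a short sketch: look up each caption word in the established database, fold in spelling variants, add whatever the editor supplies, and let geographical subclasses inherit from their parent. The function names, the lower-casing, and the assumption that captions arrive already split into words are illustrative choices; the atomic concepts and examples are from the slides.

```python
# Term database "already established", as on the slides (keys lower-cased).
TERM_DB = {
    "airport": {"L15 Traffic stations", "L33 Air transport"},
    "buildings": {"P23 Buildings"},
    "airplanes": {"L17 Vehicles", "L33 Air transport"},
    "soundproofing": {"R37 Soundproofing"},
}
SPELLING_VARIANTS = {"aeroplanes": "airplanes"}
PLACE_NAMES = {"Egypt"}  # stand-in for a dictionary of geographical names


def atoms_for_caption(words, editor_added=()):
    """Union the atomic concepts of each caption word, plus editor additions."""
    atoms = set(editor_added)
    for word in words:
        key = SPELLING_VARIANTS.get(word.lower(), word.lower())
        atoms |= TERM_DB[key]
    return frozenset(atoms)


def geographic_subclass(parent_atoms, place):
    """Inherit the parent formula and simply add the country."""
    if place not in PLACE_NAMES:
        raise ValueError(f"{place!r} is not a known geographical name")
    return frozenset(parent_atoms | {place})


airport_buildings = atoms_for_caption(["Airport", "Buildings"], editor_added={"T77 Civilian"})
print(" ⊓ ".join(sorted(airport_buildings)))
print(" ⊓ ".join(sorted(geographic_subclass(airport_buildings, "Egypt"))))
```

Only the first step needed an editor (to add T77 Civilian); the spelling variant and the geographical subdivision resolve from knowledge already in the database, which is the efficiency argument the slides make.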
    68. 68. Examples from the resulting knowledge base 68
    69. 69. LCC classes with combinations of atomic concepts
        HE550-560 Ports, harbors, docks, wharves, etc. = L15 Traffic stations ⊓ L37 Water transport ⊓ T77 Civilian
        NA2800 Architectural acoustics = P27 Architecture ⊓ R30 Acoustics
        NA6300-6307 Airport buildings = L15 Traffic stations ⊓ L33 Air transport ⊓ P23 Buildings ⊓ T77 Civilian
        NA6330 Dock buildings, ferry houses, etc. = L15 Traffic stations ⊓ L37 Water transport ⊓ P23 Buildings ⊓ T77 Civilian
        TC350-374 Harbor works = L15 Traffic stations ⊓ L37 Water transport ⊓ R00 Engineering ⊓ T77 Civilian
        TH1725 Soundproof construction = P23 Buildings ⊓ P43 Construction ⊓ R37 Soundproofing
        TL681.S6 Airplanes. Soundproofing = L17 Vehicles ⊓ L33 Air transport ⊓ R37 Soundproofing
        TL725-726 Airways (Routes). Airports and landing fields. Aerodromes = L13 Traffic facilities ⊓ L33 Air transport ⊓ Technical aspects
        VA67-79 Naval ports, bases, reservations, docks = L15 Traffic stations ⊓ L37 Water transport ⊓ T73 Military
        VM367.S6 Submarines. Soundproofing = L17 Vehicles ⊓ L37 Water transport ⊓ R37 Soundproofing ⊓ T73 Military ⊓ Underwater
    70. 70. LC subject headings with combinations of atomic concepts
        Aeroplanes-Soundproofing = L17 Vehicles ⊓ L33 Air transport ⊓ R37 Soundproofing
        Airports-Buildings = P23 Buildings ⊓ L15 Traffic stations ⊓ L33 Air transport
        Buildings-Soundproofing = P23 Buildings ⊓ P43 Construction ⊓ R37 Soundproofing
        Ships-Soundproofing = L17 Vehicles ⊓ L37 Water transport ⊓ R37 Soundproofing
    71. 71. Mapping through a Hub
        Hub: L17 Vehicles ⊓ L33 Air transport ⊓ R37 Soundproofing
             L17 Vehicles ⊓ L37 Water transport ⊓ R37 Soundproofing
             L17 Vehicles ⊓ L37 Water transport ⊓ R37 Soundproofing ⊓ T73 Military ⊓ Underwater
        LCC: TL681.S6 Airplanes. Soundproofing; VM367.S6 Submarines. Soundproofing
        LCSH: Aeroplanes-Soundproofing; Ships-Soundproofing
    72. 72. Mapping user queries
        User query: free text; combination of elemental concepts through facets (guided query formulation); controlled term(s) from a KOS, possibly found through browsing a KOS
        Hub: canonical form of query (DL formula)
        Final query: (enriched) free text query; query in terms of a KOS
    73. 73. Query: L17 Vehicles AND R37 Soundproofing
        TL681.S6 Airplanes. Soundproofing [L17 Vehicles ⊓ L33 Air transport ⊓ R37 Soundproofing]
        VM367.S6 Submarines. Soundproofing [L17 Vehicles ⊓ L37 Water transport ⊓ R37 Soundproofing ⊓ Military]
        Aeroplanes-Soundproofing [L17 Vehicles ⊓ L33 Air transport ⊓ R37 Soundproofing]
        Ships-Soundproofing [L17 Vehicles ⊓ L37 Water transport ⊓ R37 Soundproofing]
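One way to read the query on this slide: a conjunctive query retrieves every class or heading, from any scheme, whose formula contains all the queried atomic concepts. The sketch below assumes that reading and again models each formula as a set; `KB` and `search` are illustrative names, and the submarine formula uses the fuller form given in the knowledge-base examples.

```python
# (scheme, label) -> formula, with each formula a set read as a conjunction (⊓).
KB = {
    ("LCC", "TL681.S6 Airplanes. Soundproofing"):
        {"L17 Vehicles", "L33 Air transport", "R37 Soundproofing"},
    ("LCC", "VM367.S6 Submarines. Soundproofing"):
        {"L17 Vehicles", "L37 Water transport", "R37 Soundproofing",
         "T73 Military", "Underwater"},
    ("LCC", "NA6300-6307 Airport buildings"):
        {"L15 Traffic stations", "L33 Air transport", "P23 Buildings", "T77 Civilian"},
    ("LCSH", "Aeroplanes-Soundproofing"):
        {"L17 Vehicles", "L33 Air transport", "R37 Soundproofing"},
    ("LCSH", "Ships-Soundproofing"):
        {"L17 Vehicles", "L37 Water transport", "R37 Soundproofing"},
}


def search(query):
    """Return all entries whose formula includes every concept in the query."""
    return sorted(entry for entry, formula in KB.items() if query <= formula)


hits = search({"L17 Vehicles", "R37 Soundproofing"})
for scheme, label in hits:
    print(scheme, label)
```

The query is expressed once, in hub terms, and finds the two LCC classes and the two LCSH headings together; the airport buildings class is not retrieved because its formula lacks both queried concepts.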
    74. 74. Examples from NALT and LCSH • NALT: National Agricultural Library Thesaurus • LCSH: Library of Congress Subject Headings
    75. 75. Air pollution laws
        LCSH term: Air – Pollution – Laws and regulations: [isa] Legal rule [appliedTo] {[isa] Condition [isConditionOf] Air [causedBy] Pollutant [property] Undesirable}
        NALT terms: Air pollution: [isa] Condition [isConditionOf] Air [causedBy] Pollutant [property] Undesirable; Laws and regulations: [isa] Legal rule
        Mapping LCSH ▬► NALT: Air – Pollution – Laws and regulations ▬► Air pollution AND Laws and regulations
        Interpretation for indexing and searching in both directions
    76. 76. Soil moisture vs. Soil water
        LCSH term: Soil moisture: [isa] Water [containedIn] Soil
        NALT term: Soil water: [isa] Water [containedIn] Soil
        Mapping LCSH ▬► NALT: Soil moisture ▬► Soil water
    77. 77. Greenhouse gardening
        LCSH term: Greenhouse gardening: [isa] Gardening [inEnvironment] Greenhouse [inEnvironment] Home
        NALT terms: Home gardening: [isa] Gardening [inEnvironment] Home; Greenhouse: [isa] Greenhouse
        Mapping LCSH ▬► NALT: Greenhouse gardening ▬► Home gardening AND Greenhouse
    78. 78. Salad greens
        LCSH term: Salad greens: [isa] Green leafy vegetable [usedFor] Salad
        NALT term: Green leafy vegetables: [isa] Green leafy vegetable
        Mapping LCSH ▬► NALT: Salad greens ▬► BT Green leafy vegetables
    79. 79. Emerging diseases
        LCSH term: Emerging infectious diseases: [isa] Disease [hasProperty] Infectious [hasProperty] Emerging
        NALT term: Emerging diseases: [isa] Disease [hasProperty] Infectious ??? [hasProperty] Emerging
        Mapping LCSH ▬► NALT: Emerging infectious diseases ▬► Emerging diseases
        Emerging infectious diseases ▬► BT Emerging diseases
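The exact and BT mappings in the examples above can be derived mechanically once each term carries a formula. The sketch below is an assumption-laden simplification: each formula is flattened to a set of (relation, value) pairs, a target term counts as an exact match when the sets are equal, and as a broader term (BT) when its set is a proper subset of the source set. It does not handle nested formulas (Air pollution laws) or AND-combinations (Greenhouse gardening). The names `LCSH`, `NALT`, and `map_term` are illustrative.

```python
LCSH = {
    "Soil moisture": {("isa", "Water"), ("containedIn", "Soil")},
    "Salad greens": {("isa", "Green leafy vegetable"), ("usedFor", "Salad")},
    "Emerging infectious diseases": {
        ("isa", "Disease"), ("hasProperty", "Infectious"), ("hasProperty", "Emerging"),
    },
}

NALT = {
    "Soil water": {("isa", "Water"), ("containedIn", "Soil")},
    "Green leafy vegetables": {("isa", "Green leafy vegetable")},
    # Assumption for this run: "Infectious" is NOT part of the NALT meaning
    # (the slide marks this feature with "???").
    "Emerging diseases": {("isa", "Disease"), ("hasProperty", "Emerging")},
}


def map_term(features, target_kos):
    """Map a source formula to target terms: exact matches first, else broader terms."""
    exact = sorted(label for label, f in target_kos.items() if f == features)
    if exact:
        return [("exact", label) for label in exact]
    # A target with strictly fewer features is more general than the source.
    broader = sorted(label for label, f in target_kos.items() if f < features)
    return [("BT", label) for label in broader]


for term, features in LCSH.items():
    print(term, "->", map_term(features, NALT))
```

Under this reading, the two alternative mappings for Emerging infectious diseases correspond to the two possible answers to the "???": adding ("hasProperty", "Infectious") to the NALT entry turns the BT mapping into an exact one.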
    80. 80. Distributed implementation • A KOS on the Web could assign DL formulas to its concepts (call this a semantically enhanced KOS, or SEKOS) • It could use any of a number of faceted core classifications, or even several (using a unique URI for each elemental concept) • Core classifications could be mapped to each other • It is then a simple matter to map from any SEKOS to any other (somewhat dependent on the core classifications used)
    81. 81. Examples from the realm of AAT Taiwan • AAT: Art and Architecture Thesaurus (Getty) • AAT Taiwan: TELDAP, Institute for Information Science, Academia Sinica • TGM: Thesaurus of Graphic Materials, Library of Congress • E-HowNet: A Lexical Knowledge Base for Semantic Composition, Academia Sinica
    82. 82. Mapping through a Hub
        Hub: Facility ⊓ Worship; Facility ⊓ Worship ⊓ Judaism; Facility ⊓ Worship ⊓ Christianity; Facility ⊓ Worship ⊓ Islam; Facility ⊓ Worship ⊓ Buddhism; Facility ⊓ Worship ⊓ Taoism
        TGM: temples; synagogues; churches; mosques; Buddhist temples; Taoist temples
        AAT: temples (buildings); synagogues (buildings); churches (buildings); mosques (buildings)
    83. 83. Mapping to Chinese • Use E-HowNet formal semantic expressions
    84. 84. E-HowNet ontology 廣義知識知識本體 • Building | 建築物 Facilities | 設施
        Chinese word: 廟; English: Temple; Conceptual expression: {facilities|設施: domain={religion|宗教}}
        Chinese word: 禪寺; English: Buddhist temple; Conceptual expression: {facilities|設施: domain={Buddhist|佛教}}
        Chinese word: 道觀; English: Taoist temple / Taoist guan; Conceptual expression: {facilities|設施: domain={Taoism|道教}}
    85. 85. Mapping to Chinese • Use E-HowNet formal semantic expressions • Use terms that already exist in E-HowNet • Add terms using computer-assisted derivation of semantic expressions as described above for English
    86. 86. Cross-language mapping problems. Example: AAT stitching maps to two Chinese terms: 縫合 (feng he) for needleworking and 縫訂 (feng ding) for bookbinding
    87. 87. Analysis: Since English has only one word, stitching, AAT does not distinguish between the two specific concepts, even though the AAT scope note describes both.
        Solution (AAT term, AAT Taiwan term):
        stitching, 縫 (feng)
        stitching (needlework), 縫合 (feng he)
        stitching (bookbinding), 縫訂 (feng ding)
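The solution above amounts to storing concepts rather than words: one general concept plus two specific ones, each with a label per language. A minimal sketch of such a record structure follows; the `Concept` class, its field names, and the local keys are hypothetical and do not reflect the actual AAT or AAT Taiwan data model.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Concept:
    key: str                       # hypothetical local identifier, not an AAT ID
    labels: Dict[str, str]         # language code -> preferred label
    broader: Optional[str] = None  # key of the broader concept, if any


CONCEPTS = [
    Concept("stitching", {"en": "stitching", "zh": "縫"}),
    Concept("stitching-needlework",
            {"en": "stitching (needlework)", "zh": "縫合"}, broader="stitching"),
    Concept("stitching-bookbinding",
            {"en": "stitching (bookbinding)", "zh": "縫訂"}, broader="stitching"),
]


def narrower(key: str) -> List[Concept]:
    """Concepts directly below the given concept."""
    return [c for c in CONCEPTS if c.broader == key]


def translate(en_label: str, lang: str) -> List[str]:
    """Labels in `lang` for the matching concept and its narrower concepts."""
    out = []
    for c in CONCEPTS:
        if c.labels["en"] == en_label:
            out.append(c.labels[lang])
            out.extend(n.labels[lang] for n in narrower(c.key))
    return out


print(translate("stitching", "zh"))
print(translate("stitching (bookbinding)", "zh"))
```

A lookup from the ambiguous English word then surfaces both specific Chinese terms, while a lookup from a qualified English label returns exactly one.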
    88. 88. Principle: The classification should include all concepts that are lexicalized in any language participating in a cross-language mapping system. If a language does not have a term for a concept, a term must be invented. This also happens when a concept is found through conceptual analysis
    89. 89. Shades of meaning. Example: The AAT defines temple as “Buildings housing places devoted to the worship of a deity or deities”. But in Chinese culture, a temple (miao, 廟) is devoted to worshiping, honoring, or communing with ancestors or spirits. There are a number of further terms in Chinese for buildings devoted to worshiping or commemorating saints, famous scholars, poets, or people with great achievements.
    90. 90. Shades of meaning. Thus in the concept structure we need: • Temple (broad definition): building housing places devoted to worshiping, communing with, honoring, or commemorating a deity or deities, ancestors, spirits, saints, famous scholars, poets, or people with great achievements • Temple (narrow AAT definition) • Miao (廟) • Other Chinese terms
    91. 91. The importance of good definitions. AAT Taiwan must make sure that all readers, English and Chinese, understand all terms, English and Chinese, and the often subtle differences between them. The table on the next slide illustrates this.
    92. 92. Uses of AAT Taiwan
                          Searching Western art               Searching Chinese art
        Western user      Understands English terms           Needs to understand Chinese terms
        Chinese user      Needs to understand English terms   Understands Chinese terms
        All users need a good conceptual structure
    93. 93. Take-home message: Semantics gives powerful systems
    94. 94. Dagobert Soergel dsoergel @ buffalo.edu www.dsoergel.com
    97. 97. Mapping Issues 1: Terms related to Chinese religious concepts. The word “temples” is frequently considered equivalent to the Chinese term “廟 miao”. However, because the buildings serve different purposes and honor different spirits, the names of religious buildings in Taiwan vary. AAT record: temples (buildings) (religious buildings, <religious structures>, ... Built Environment (Hierarchy Name)) Note: Buildings housing places devoted to the worship of a deity or deities. In the strictest sense, it refers to the dwelling place of a deity, and thus often houses a cult image. In modern usage a temple is generally a structure, but it was originally derived from the Latin "templum" and historically has referred to an uncovered place affording a view of the surrounding region. For Christian or Islamic religious buildings the terms "churches" or "mosques" are generally used, but an exception is that "temples" is used for Protestant, as opposed to Roman Catholic, places of worship in France and some French-speaking regions. Q1. The mapping team has found that “temple” in AAT is broader than the concept in Chinese. Therefore it is necessary to distinguish the differences among the Chinese terms before mapping.
    98. 98. Mapping Issues 1: Terms related to Chinese religious concepts. Despite their similar appearance, each building type differs slightly from the others. • miao (廟): In the past, a place to worship ancestors. Since the Han dynasty, it has been used as a place to worship both ancestors and spirits. • ci (祠): Built to worship or commemorate saints, famous scholars, poets, or people with great achievements. Sometimes also refers to places where ancestors are worshiped. • si (寺): Generally refers to a place where Buddhist spirits are worshiped. Sometimes also refers to a place where Buddhist monks live. • an (庵): Formerly referred to a scholar's study (書齋). Nowadays it refers to a place where Buddhist nuns live. • guan (觀): Refers only to Taoist buildings. • yan (巖): Refers to miao (廟) established on or near a mountain.
    99. 99. Mapping Issues 2: A Chinese set term stands for a broader meaning. 文玩 (Wenwan) • A word combining two words, “文物 cultural object” and “古玩 antique curio”. (文玩兼有文物與古玩的特點) • It specifically refers to objects used in an educated person's reading room, including writing equipment, small tools, and decorations. (特指文人書齋中的書寫設備、小工具和擺飾) • It represents the culture of the reading room, combining the practical study equipment of educated people with art crafts made for appreciation. (文玩是種書齋文化,結合了文人書生的實用器物與具觀賞價值的藝術品) • Common objects include ink stones, seals, washing vessels, finely sculptured decorations, etc. “Elegant” and “exquisite” are its essential characteristics. (文玩為以下器物的泛稱: 古硯、印章、洗器、牙雕…等,“雅”與“巧”是其基本特徵) • It is produced in a highly artistic manner. Nowadays it has become a popular collectible, valued more as an artifact than as equipment. (以高藝術性的方式製造,現今多為賞而勿用的文房珍玩)
    100. 100. Mapping Issues 2: A Chinese set term stands for a broader meaning. Examples: • lotus pod shaped vessel for injecting water 雙蓮房水注 • banana leaf shaped wooden plate 癭木蕉葉盤 • olive stone boat sculpture 果核小舟 • blue snuff bottle 藍地金星套料鼻煙壺 • lotus leaf shaped washing vessel 白玉荷葉式洗 • seal 鴛錦雲章循連環田黃石印 • ivory desk tidy 象牙雕山水人物筆筒
    101. 101. Mapping Issues 2: A Chinese set term stands for a broader meaning. Q2. The mapping team has found that the meaning of Wenwan is broader than that of the term “desk sets”, although the two partly coincide. The two terms are therefore in an inexact equivalence relation. Is it more suitable to create a new term “Wenwan” in the structure, or should it be mapped to desk sets? AAT record: desk sets (sets (groups), <object groupings by general context>, ... Object Groupings and Systems) Note: Sets of matching articles intended to be used on a desk including such articles as inkstands, pen trays, and stamp boxes.
    102. 102. When English terms have broader meanings (1/2) EX1: • ID: 300053660 Record Type: concept stitching (<processes and techniques by specific type>, <processes and techniques>, Processes and Techniques) Note: Refers to the process of fastening, joining, closing, uniting, mending, or creating ornamentation by stitches, which are the portions of thread left in fabric or another material by the in and out movement of a threaded needle through the thickness or surface of the material, or the loops of thread created on a needle in knitting or other needlework. In the context of textiles and needleworking, its meaning overlaps with "sewing." In the context of bookbinding, it refers to the fastening together a number of leaves or gatherings by passing the thread or wire through all of the sheets at once; it is distinct from "sewing," which, in the context of bookbinding, is used for the joining of leaves or gatherings together one by one by drawing thread or wire backwards and forwards through the back fold of each sheet to attach it to the cords. 縫綴 / 縫訂 (<依特定種類區分之過程與技術>, <過程與技術>, 過程與技術) 範圍註:意指藉由針線進出穿過材料或其表面的動作,將針腳留在布料或其他材料上,或是在編織或針織時形成針目,以固定、結合、閉合、合併、修補或製作裝飾的過程。若指涉的是紡織品與手工繡品方面,則其意義與「縫紉 (sewing)」一詞重疊。若指涉的是書籍裝幀方面,則意指將若干頁面或疊層,用線或金屬線一次穿過所有紙張固定在一起。而「線訂 (sewing)」在書籍裝幀方面,是指用針線或金屬線,在一疊書頁的摺縫處上下穿梭,使其與裝訂線固定的方法。 The meaning of stitching changes with the context (bookbinding vs. needleworking). In AAT, both meanings are explained in the same record, but translating the term into Chinese yields two translations: 縫合 (feng he) for needleworking and 縫訂 (feng ding) for bookbinding. The same problem occurs in the record for sewing (ID: 300053658). [Figures: Stitching in needleworking; Stitching in bookbinding]
    103. 103. When English terms have broader meanings (2/2) EX2: 300004184 Record Type: concept patios (<uncovered spaces>, <rooms and spaces by form>, ... Components (Hierarchy Name)) Note: Paved recreation areas adjoining contemporary houses and the paved interior courts of Spanish or Spanish-style buildings. The term refers to two types of open spaces, so the translations could be 屋外休憩區 or (西班牙)內院. [Figures: Spanish patio; Patio adjoining a house]
    104. 104. When English terms have broader meanings (2/2) EX3: • 300266238 Record Type: concept maculatures (<prints by process or technique>, prints (visual works), ... Visual and Verbal Communication) Note: Prints made by taking a second impression without reinking the plate, often used for cleaning the plate. May also refer to blotting paper. Also used for scrap paper that can reinforce fabric in Medieval embroidery. The term maculatures can be used in three different contexts (prints, blotting paper, and scrap paper), and there are three corresponding translations (吸墨紙版畫、吸墨紙、固定刺繡布料的紙片). Q3: In this case the record contains multiple meanings, so the problem is not which translation should be the preferred term. How should the Chinese translations be displayed?
