UWN: A Large Multilingual Lexical Knowledge Base


We present UWN, a large multilingual lexical knowledge base that describes the meanings and relationships of words in over 200 languages. This paper explains how link prediction, information integration and taxonomy induction methods have been used to build UWN based on WordNet and extend it with millions of named entities from Wikipedia. We additionally introduce extensions to cover lexical relationships, frame-semantic knowledge, and language data. An online interface provides human access to the data, while a software API enables applications to look up over 16 million words and names.
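To make the data model concrete, here is a minimal sketch of how an application might query a multilingual lexical knowledge base of this kind: terms in many languages point to shared, language-independent senses, and senses are linked taxonomically. This is not the UWN API. The class `LexicalKB`, its methods, and the link weights are hypothetical; the sense names come from the poster's "school" example.

```python
from collections import defaultdict, deque


class LexicalKB:
    """Toy multilingual lexical KB (hypothetical, not the UWN API).

    Terms are keyed by (language code, surface form) and point to
    language-independent sense ids; senses are linked taxonomically.
    """

    def __init__(self):
        # (language code, term) -> {sense id: link weight}
        self.lexicalizations = defaultdict(dict)
        # sense id -> set of parent sense ids
        self.parents = defaultdict(set)

    def add_term(self, lang, term, sense, weight=1.0):
        self.lexicalizations[(lang, term)][sense] = weight

    def add_parent(self, child, parent):
        self.parents[child].add(parent)

    def lookup(self, lang, term):
        """Return (sense, weight) pairs for a term, highest weight first."""
        senses = self.lexicalizations.get((lang, term), {})
        return sorted(senses.items(), key=lambda kv: kv[1], reverse=True)

    def terms_for(self, sense, lang):
        """Return all terms in `lang` that are linked to `sense`, sorted."""
        return sorted(term for (lng, term), senses in self.lexicalizations.items()
                      if lng == lang and sense in senses)

    def ancestors(self, sense):
        """Return every sense reachable via parent links (breadth-first)."""
        seen, queue = set(), deque([sense])
        while queue:
            for parent in self.parents.get(queue.popleft(), ()):
                if parent not in seen:
                    seen.add(parent)
                    queue.append(parent)
        return seen


# Illustrative entries; the weights are made up for this example.
kb = LexicalKB()
kb.add_term("eng", "school", "school (institution)", 0.6)
kb.add_term("eng", "school", "school (building)", 0.3)
kb.add_term("eng", "school", "school (group of fish)", 0.1)
kb.add_term("deu", "Schulhaus", "school (building)", 0.9)
kb.add_term("deu", "Schulgebäude", "school (building)", 0.9)
kb.add_parent("school (institution)", "educational institution")
kb.add_parent("educational institution", "institution")

print(kb.lookup("eng", "school")[0])                 # ('school (institution)', 0.6)
print(kb.terms_for("school (building)", "deu"))      # ['Schulgebäude', 'Schulhaus']
print(sorted(kb.ancestors("school (institution)")))  # ['educational institution', 'institution']
```

Keying senses by language-independent ids is what lets a single lookup of eng "school" surface the building sense, and `terms_for` then retrieve its German lexicalizations.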

Gerard de Melo and Gerhard Weikum
ICSI Berkeley / Max Planck Institute for Informatics

Better NLP Features using Lexical Semantics
• Goal: Richer, Less Sparse Features
• How: Model Synonymy, Polysemy, Semantic Relatedness, Taxonomy (within and across languages)

UWN's Multilingual Graph
• Over 16 million words and names in over 200 languages, semantically connected
• Ambiguity and synonymy captured
[Figure: excerpt of the graph. It shows concepts such as Entity (por: "entidade", heb: "ישות", ara: "كينونة، وجود"), Institution (cmn: "制度", tha: "สถาบัน"), Educational institution (deu: "Bildungseinrichtung", fin: "oppilaitos"), University (fin: "yliopisto", srp: "универзитете") and University of California, Berkeley (eng: "Berkeley", eng: "UC Berkeley", eng: "Cal", cmn: "柏克萊加州大學"). Nearby nodes include Berkeley, CA (City, Geopolitical Entity) and George Berkeley. The word "school" appears with three senses: school (group of fish) (deu: "Fischschwarm", ces: "hejno", fra: "banc"), school (institution) (chv: "шкул", jpn: "学校", kor: "학교", lao: "ໂຮງຮຽນ", kat: "სკოლა"), and school (building) (deu: "Schulgebäude", deu: "Schulhaus").]

Language Descriptions
• Languages, Scripts, Characters, Countries
• Examples: Chuvash, Georgian, Cyrillic (Script), Russia (Country)

UWN
• Meaning Distinctions
• Ontological Taxonomy
• Encyclopedic Knowledge, Pictures, Video, Sounds, Maps
• Etymological and other word relationships
• Millions of Named Entities (People, Places, Proteins, Asteroids, Companies, etc.)
• 200+ languages

Step 1: Link Prediction
• Link multilingual words to WordNet

Step 2: Entity Integration
• Linear program (LP) for constraint-based computation of equivalence classes of entities
• Region-growing approximation algorithm
[Figure: cross-lingual links among Wikipedia articles on television: en: Television, en: Television set, en: TV set, en: T.V., es: Televisor, es: Televisión, ru: Телевизор, hi: दूरदर्शन, ja: テレビ, ja: テレビ受像機, zh: 电视机]

Step 3: Taxonomy Induction
• Connect Wikipedia with WordNet (equivalence and taxonomic links)
• Markov chain to rank taxonomic parents
• 270 Wikipedia taxonomies integrated with WordNet's hypernym hierarchy

Extras
• FrameNet Linking
• Common-Sense Knowledge Extraction
• Multilingual Roget's Thesaurus

More Information
• Downloadable API available
• Web User Interface
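Step 1 (link prediction) decides which WordNet senses a word from another language should be linked to. The poster does not specify the scoring model, so what follows is only a minimal sketch under a simplifying assumption: a foreign word is linked through its English translations, and each translation spreads one unit of evidence evenly over its candidate senses, so unambiguous translations count for more. The function names, the heuristic, the translations, and the 0.5 threshold are all hypothetical.

```python
from collections import defaultdict


def score_sense_links(translations, senses_of):
    """Score candidate senses for one foreign word (hypothetical heuristic).

    translations: English words the foreign word translates to.
    senses_of: dict mapping an English word to its candidate sense ids.
    Each usable translation spreads one unit of evidence evenly over its
    senses; the totals are averaged over the usable translations.
    """
    usable = [t for t in translations if senses_of.get(t)]
    scores = defaultdict(float)
    for t in usable:
        candidates = senses_of[t]
        for sense in candidates:
            scores[sense] += 1.0 / len(candidates)
    return {sense: total / len(usable) for sense, total in scores.items()}


def predict_links(translations, senses_of, threshold=0.5):
    """Return the senses whose score reaches `threshold`, best first."""
    scores = score_sense_links(translations, senses_of)
    kept = [s for s, v in scores.items() if v >= threshold]
    return sorted(kept, key=lambda s: (-scores[s], s))


# German "Schulhaus" with two (illustrative) English translations.
senses_of = {
    "school": ["school (institution)", "school (building)", "school (group of fish)"],
    "schoolhouse": ["school (building)"],
}
scores = score_sense_links(["school", "schoolhouse"], senses_of)
print(round(scores["school (building)"], 3))                # 0.667
print(predict_links(["school", "schoolhouse"], senses_of))  # ['school (building)']
```

Dividing by the number of candidate senses encodes that the ambiguous translation "school" says little on its own, while "schoolhouse" pins down the building sense.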
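Step 2 (entity integration) computes equivalence classes of entities subject to constraints, using an LP and a region-growing approximation. Reproducing that algorithm is beyond a short example, so the sketch below swaps in a simpler greedy heuristic with the same goal: merge entities along weighted equivalence links, strongest first, but refuse any merge that would place a pair marked as distinct in the same class. The edge weights and the distinctness pairs (here, assumed to be two articles from the same language edition) are hypothetical; the article names follow the poster's television example.

```python
from collections import defaultdict


def integrate_entities(edges, distinct):
    """Greedy constrained merging (a stand-in, not the poster's LP algorithm).

    edges: iterable of (weight, u, v) equivalence evidence, e.g. cross-lingual links.
    distinct: iterable of (u, v) pairs that must end up in different classes.
    Returns the equivalence classes as sorted lists, in sorted order.
    """
    edges, distinct = list(edges), list(distinct)
    forbidden = defaultdict(set)
    for u, v in distinct:
        forbidden[u].add(v)
        forbidden[v].add(u)

    root, members = {}, {}  # entity -> class id; class id -> set of entities
    for x in [n for _, u, v in edges for n in (u, v)] + [n for pair in distinct for n in pair]:
        if x not in root:
            root[x], members[x] = x, {x}

    # Strongest evidence first (sorted() is stable, so ties keep input order).
    for _, u, v in sorted(edges, key=lambda e: e[0], reverse=True):
        ru, rv = root[u], root[v]
        if ru == rv:
            continue
        # Skip the merge if it would put a distinct pair into one class.
        if any(forbidden[x] & members[rv] for x in members[ru]):
            continue
        if len(members[ru]) < len(members[rv]):
            ru, rv = rv, ru  # relabel the smaller class
        for x in members[rv]:
            root[x] = ru
        members[ru] |= members.pop(rv)
    return sorted(sorted(m) for m in members.values())


edges = [
    (0.9, "en:Television", "es:Televisión"),
    (0.9, "en:Television set", "es:Televisor"),
    (0.8, "es:Televisor", "zh:电视机"),
    (0.5, "es:Televisión", "zh:电视机"),  # noisy link joining the two concepts
    (0.4, "en:Television", "ja:テレビ"),
]
# Hypothetical constraint: articles from the same edition stay apart.
distinct = [("en:Television", "en:Television set"), ("es:Televisión", "es:Televisor")]
for cls in integrate_entities(edges, distinct):
    print(cls)
# ['en:Television', 'es:Televisión', 'ja:テレビ']
# ['en:Television set', 'es:Televisor', 'zh:电视机']
```

A plain connected-components pass would fuse all six articles through the noisy 0.5 link; the constraint check keeps the medium and the device apart. A greedy pass like this one offers no approximation guarantee.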
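Step 3 (taxonomy induction) uses a Markov chain to rank candidate taxonomic parents. The poster gives no details of the chain, so this is a minimal sketch of one common Markov-chain ranking, a random walk with restart computed by power iteration. The graph, the edge weights (read here as the number of language editions supporting a link), and the restart probability are all hypothetical.

```python
def rank_parents(start, edges, restart=0.15, iterations=100):
    """Rank nodes by visit probability of a random walk with restart.

    start: the entity whose parents are being ranked.
    edges: dict node -> {neighbour: weight}; weights are normalised per node.
    Returns (node, probability) pairs for all nodes except `start`, best first.
    """
    nodes = {start} | set(edges) | {n for out in edges.values() for n in out}
    prob = {n: 0.0 for n in nodes}
    prob[start] = 1.0
    for _ in range(iterations):
        nxt = {n: 0.0 for n in nodes}
        nxt[start] = restart  # teleport back to the start entity
        for node, p in prob.items():
            out = edges.get(node, {})
            total = sum(out.values())
            if total == 0:
                nxt[start] += (1 - restart) * p  # dangling node: return to start
                continue
            for nbr, w in out.items():
                nxt[nbr] += (1 - restart) * p * w / total
        prob = nxt
    ranked = [(n, p) for n, p in prob.items() if n != start]
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


edges = {
    "UC Berkeley": {"University": 3.0, "City": 1.0},
    "University": {"Educational institution": 1.0},
    "Educational institution": {"Institution": 1.0},
    "City": {"Geopolitical entity": 1.0},
}
ranking = rank_parents("UC Berkeley", edges)
print([name for name, _ in ranking[:2]])  # ['University', 'Educational institution']
```

Ancestors further up the chain also accumulate probability mass, so to choose among an entity's direct candidate parents one would filter the ranking to those candidates; here University outranks City.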