Cwn aat talk

  1. 1. Methods and Practices of Sinica BOW and Chinese Wordnet. Shu-Kai Hsieh (謝舒凱), Lab of Ontologies, Language Processing and e-Humanities, NTNU; CWN group, Institute of Linguistics, Academia Sinica. shukai@gmail.com. April 24, 2010
  2. 2. Background; Sinica BOW and Chinese Wordnet (CWN); On-going Efforts and Future Perspectives
  3. 3. Background; Sinica BOW and Chinese Wordnet (CWN); On-going Efforts and Future Perspectives
  4. 4. Who Are We? We are a group of people with a good "sense" who also know how to work "relations".
  5. 5. What Have We Been Working On? Language resources construction, evaluation and knowledge modelling: Corpus (ASBC, LDC-Gigaword, twWaC (balanced, domain and social media)); Lexicon / lexical knowledge base (Core Vocabulary, Domain lexicon knowledge base); Ontology (Sinica BOW (SUMO), KYOTO-DOLCE, Hanzi/radical Ontology, Domain ontologies)
  6. 6. Corpus and Query Tools
  7. 7. Ontology and Cross-language Validation. SUMO Chinese example: 時間 (time), 時點 (time point), 球面度 (steradian), 國際單位 (SI unit)
  8. 8. Lexicon: Corpus distribution-based approach; Simulation-based computational approach; (Psycho-)linguistic approach
  9. 9. Latent Semantics in the Mental Lexicon
  10. 10. Random Walk in the Mental Lexicon
  11. 11. WordNet
  12. 12. WordNet Browser (e.g., Dubey)
  13. 13. Background; Sinica BOW and Chinese Wordnet (CWN); On-going Efforts and Future Perspectives
  14. 14. Bootstrapping Bilingual Wordnet (I): Sinica BOW
  15. 15. Bootstrapping Bilingual Wordnet (II): GoogleCWN
  16. 16. Chinese-anchored Bilingual Wordnet from Scratch
  17. 17. Methodologies, Issues and Solutions 1. Word segmentation and selection (frequency and lexical semantic theory-based) 2. Word sense distinction: synset (同義詞集), sense (詞義), meaning facet (義面), variant word form (異體詞) 3. Word sense relations: LSR algebra (transitivity in the network), paronymy, troponymy, morpho-semantic relations, etc.
  18. 18. Implementation 1. From MS Access to MySQL database. 2. Python-NLTK modules for CWN (and other resources) 3. Convert to LMF-compatible markup
  19. 19. Lexicon Standards and Markup Languages: LMF (Lexical Markup Framework); GLML (Generative Lexicon Markup Language); KAF (KYOTO Annotation Format)
  20. 20. KAF Example
  21. 21. Current status
  22. 22. Toward a Global Wordnet Grid: HanziGrid among CJKV (partly done with Chinese Hanzi and Japanese Kanji mapping); Chinese-Italian WordNet; Web Service (RDF/OWL representation as a data model for the Semantic Web); Global Wordnets Sense Tagging (Environmental domain for SemEval 2010)
  23. 23. Toward a Mashup Approach to a Dynamic LKB: Wordnik. Test online
  24. 24. Toward a Better Understanding of Lexical and Social Networks
  25. 25. KYOTO-CWN WORKSHOP: Around mid-September. Release of tools, resources, technical reports, browsing system. You are warmly welcome to participate, critique, advise, and collaborate. Thank you!
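
Slide 17 mentions transitivity in the network of word sense relations. The slides do not show how the LSR algebra is implemented, so the following is only a minimal sketch of one transitive inference: collecting every hypernym a sense inherits by following hypernymy links. The edge list and the function names `build_graph` and `inherited_hypernyms` are hypothetical and are not taken from CWN.

```python
from collections import defaultdict

# Hypothetical toy hypernymy edges (hyponym -> hypernym); not CWN data.
HYPERNYM_EDGES = [
    ("poodle", "dog"),
    ("dog", "canine"),
    ("canine", "mammal"),
    ("cat", "feline"),
    ("feline", "mammal"),
]


def build_graph(edges):
    """Map each node to the set of its direct hypernyms."""
    graph = defaultdict(set)
    for child, parent in edges:
        graph[child].add(parent)
    return graph


def inherited_hypernyms(graph, sense):
    """Return all hypernyms reachable from `sense` by transitivity."""
    seen = set()
    stack = list(graph.get(sense, ()))
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # already visited; also guards against cycles
        seen.add(node)
        stack.extend(graph.get(node, ()))
    return seen


graph = build_graph(HYPERNYM_EDGES)
print(sorted(inherited_hypernyms(graph, "poodle")))
```

The same traversal would apply to any relation treated as transitive; relations that are not transitive would need their own composition rules.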

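Slide 18 lists conversion to LMF-compatible markup as an implementation step. As a rough illustration only, the sketch below serializes one lexical entry using element names from the LMF model (`LexicalEntry`, `Lemma`, `Sense`, `feat`). The sample entry, the sense identifier, and the helper names `feat` and `entry_to_lmf` are assumptions; the output is simplified and has not been validated against the LMF DTD or against the actual CWN export.

```python
import xml.etree.ElementTree as ET


def feat(parent, att, val):
    """Attach an LMF-style <feat att="..." val="..."/> child to `parent`."""
    ET.SubElement(parent, "feat", {"att": att, "val": val})


def entry_to_lmf(lemma, pos, senses):
    """Build a simplified LexicalEntry element.

    `senses` is a list of (sense_id, gloss) pairs; the ids are hypothetical.
    """
    entry = ET.Element("LexicalEntry")
    feat(entry, "partOfSpeech", pos)
    lemma_el = ET.SubElement(entry, "Lemma")
    feat(lemma_el, "writtenForm", lemma)
    for sense_id, gloss in senses:
        sense_el = ET.SubElement(entry, "Sense", {"id": sense_id})
        feat(sense_el, "gloss", gloss)
    return entry


# Hypothetical sample entry, for illustration only.
entry = entry_to_lmf("書", "noun", [("demo_s1", "a bound set of printed pages")])
xml_text = ET.tostring(entry, encoding="unicode")
print(xml_text)
```

In a fuller pipeline the arguments would presumably come from rows of the MySQL database mentioned on the same slide, but the slides do not describe that schema.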