A Multilingual Semantic Wiki Based on Controlled Natural Language

This presentation introduces AceWiki-GF, a semantic wiki based on controlled natural language that makes its knowledge base viewable and editable in different languages by applying high-quality rule-based machine translation.

A Multilingual Semantic Wiki Based on Controlled Natural Language

  1. A Multilingual Semantic Wiki Based on Controlled Natural Language
     Tobias Kuhn
     Chair of Sociology, in particular of Modeling and Simulation, ETH Zurich, Switzerland
     Insight, National University of Ireland, Galway
     19 August 2014
  2. About This Talk
     This talk is mainly based on the following papers:
     • Kaarel Kaljurand and Tobias Kuhn. A Multilingual Semantic Wiki Based on Attempto Controlled English and Grammatical Framework. In Proceedings of the 10th Extended Semantic Web Conference (ESWC). 2013. http://purl.org/tkuhn/eswc2013acewikigf
     • Kaarel Kaljurand, Tobias Kuhn, and Laura Canedo. Collaborative multilingual knowledge management based on controlled natural language. Semantic Web. Accepted, to appear. http://www.semantic-web-journal.net/system/files/swj524.pdf
     Tobias Kuhn, ETH Zurich A Multilingual Semantic Wiki 2 / 27
  3. Imagine ...
     ... that Wikipedia can check consistency and answer questions about the contained knowledge, and
     ... that all content is instantly available in all languages!
  4. • AceWiki is a semantic wiki
     • Articles are written in Attempto Controlled English (ACE)
     • These sentences are internally translated into the Semantic Web language OWL
     • An OWL reasoner is built in to answer questions and detect inconsistencies
     • Special editor for writing ACE statements
     • Extended to support multilinguality
  5. Monolingual AceWiki: Screenshot
  6. Attempto Controlled English (ACE)
     Subset of natural English:
     • Conjunction, disjunction, negation, if-then, ...
     • Anaphoric references: pronouns, definite noun phrases, variables
     • Quantifiers: every, no, at least 3, ...
     • Content words: proper names, nouns, verbs, adjectives, ...
     Grammar is fixed, but users can change content words.
     Deterministic ambiguity handling:
     • Anaphora resolution (France borders Spain and it borders Portugal.)
     • Quantifier scope (Every country borders a country.)
     • Attachment (Every EU-country borders a country that is an EU-country and is a NATO-country.)
     Well-defined translations to and from first-order logic, OWL, ...
  7. Predictive Editor
     [Screenshot]
  8. Consistency Checking
     AceWiki ensures consistency by checking every new statement:
     [Screenshot]
  9. Question Answering
     AceWiki supports simple wh-questions:
     [Screenshot]
  10. Monolingual AceWiki: Demo
      http://attempto.ifi.uzh.ch/acewiki/
  11. ACE Reasoning via Translation to OWL
      Every country that does not border a sea is a landlocked-country.
        SubClassOf(
          ObjectIntersectionOf(
            :country
            ObjectComplementOf( ObjectSomeValuesFrom( :border :sea ) )
          )
          :landlocked-country
        )
      Which country is a landlocked-country?
        ObjectIntersectionOf( :country :landlocked-country )
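To make the class expressions on this slide concrete, the following is a minimal sketch that evaluates them over a tiny hand-made model using Python sets. The individuals and the `borders` facts are invented for illustration, and the helper names are hypothetical. It also assumes a closed world (anything not listed is false), which is not how an OWL reasoner works: under OWL's open-world semantics, the absence of a "borders a sea" statement would not by itself make a country landlocked.

```python
# Toy model: every individual and fact here is made up for illustration.
countries = {"Switzerland", "Austria", "France"}
seas = {"Mediterranean"}
borders = {("France", "Mediterranean"), ("Switzerland", "France")}

# Everything we know about (needed to take a complement).
domain = countries | seas


def some_values_from(relation, fillers):
    """ObjectSomeValuesFrom: subjects related to at least one filler."""
    return {subj for (subj, obj) in relation if obj in fillers}


def complement_of(cls):
    """ObjectComplementOf, under the closed-world assumption of this sketch."""
    return domain - cls


# ObjectIntersectionOf(:country ObjectComplementOf(ObjectSomeValuesFrom(:border :sea)))
left_hand_side = countries & complement_of(some_values_from(borders, seas))

# SubClassOf(left_hand_side :landlocked-country): every member of the
# left-hand side must also be a landlocked-country.
landlocked_countries = set(left_hand_side)

# "Which country is a landlocked-country?"
# ObjectIntersectionOf(:country :landlocked-country)
answer = countries & landlocked_countries
print(sorted(answer))
```

In this toy model the query returns Austria and Switzerland, because France is the only country with a recorded border to a sea.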
  12. Evaluation
      Two small usability experiments with earlier versions of AceWiki:
      • Altogether 26 untrained participants
      • Task: Collaborative creation of a knowledge base
      Results:
      • 78%-81% of the sentences were correct and sensible
      • 61%-70% of them were complex (containing negations, implications, disjunctions or number restrictions)
      • Creation of a correct sentence every 5–6 minutes
      • Definition of a new word every 5–7 minutes
      → Even untrained users can effectively use AceWiki
  13. Multilingual AceWiki: AceWiki-GF
      General ideas:
      • Make wiki content available in different languages
      • Automatically translate content using high-quality rule-based machine translation: Grammatical Framework (GF)
      • Language switching like in Wikipedia
      • Localization of the user interface
  14. Grammatical Framework (GF)
      GF is a framework for multilingual grammar engineering:
      • Rule-based
      • Functional programming language (based on Haskell) optimized to handle natural language
      • Resource Grammar Library implementing common morphological and syntactic structures
      • Mildly context sensitive
      • Bidirectional translations: concrete language ⇔ abstract syntax
  15. GF grammars and translations
      GF grammars consist of:
      • One language-neutral abstract syntax
      • Concrete syntaxes specify words, agreement, word order, etc. by implementing the abstract categories and functions
      Example:
        border : Country -> Country -> Relation
        English:  border x y = x!Nom + "borders" + y!Nom
        Estonian: border x y = x!Gen + "naaber on" + y!Nom
      GF translations consist of:
      • First, parse a string in the original language to a tree (or trees) in the abstract syntax
      • Then, linearize these trees as strings in the target language
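The parse-then-linearize pipeline described on this slide can be imitated in a few lines of Python. This is only a sketch of the idea, not GF itself: the `Border` class, the `LEXICON` table and the function names are hypothetical, the case forms in the lexicon are illustrative, and parsing is done by brute-force generate-and-test rather than with a real parsing algorithm.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Border:
    """Abstract syntax tree for `border : Country -> Country -> Relation`."""
    x: str
    y: str


# Illustrative lexicon: abstract country name -> case forms per language.
LEXICON = {
    "eng": {
        "Portugal": {"Nom": "Portugal"},
        "Spain": {"Nom": "Spain"},
    },
    "est": {
        "Portugal": {"Nom": "Portugal", "Gen": "Portugali"},
        "Spain": {"Nom": "Hispaania", "Gen": "Hispaania"},
    },
}


def linearize(tree: Border, lang: str) -> str:
    """Concrete syntax: turn an abstract tree into a string."""
    lex = LEXICON[lang]
    if lang == "eng":  # x!Nom + "borders" + y!Nom
        return f'{lex[tree.x]["Nom"]} borders {lex[tree.y]["Nom"]}'
    if lang == "est":  # x!Gen + "naaber on" + y!Nom
        return f'{lex[tree.x]["Gen"]} naaber on {lex[tree.y]["Nom"]}'
    raise ValueError(f"no concrete syntax for {lang!r}")


def parse(text: str, lang: str) -> list[Border]:
    """Return all abstract trees whose linearization equals the input."""
    names = LEXICON[lang]
    candidates = (Border(x, y) for x, y in product(names, repeat=2))
    return [tree for tree in candidates if linearize(tree, lang) == text]


def translate(text: str, source: str, target: str) -> list[str]:
    """Parse in the source language, then linearize in the target language."""
    return [linearize(tree, target) for tree in parse(text, source)]


print(translate("Portugal borders Spain", "eng", "est"))
```

Because both directions go through the same abstract tree, the same two functions also translate from Estonian back to English, which mirrors the bidirectional design of GF grammars.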
  16. Multilingual AceWiki: Screenshot
  17. Multilingual AceWiki: Demo
      http://attempto.ifi.uzh.ch/acewiki-gf/
  18. ACE-in-GF
      • Multiple controlled versions of natural languages that map to ACE (and to each other)
      • As a result, they can be bidirectionally mapped to various formal languages already supported by ACE
  19. ACE-in-GF: Example
      German: Jedes Land, das nicht an ein Meer grenzt, ist ein Binnenland.
      ACE-in-GF tree: baseText (sText (s (vpS (everyNP (relCN (cn_as_VarCN country_CN) (neg_predRS which_RP (v2VP border_V2 (thereNP_as_NP (aNP (cn_as_VarCN sea_CN))))))) (npVP (thereNP_as_NP (aNP (cn_as_VarCN landlocked_country_CN)))))))
      ACE: Every country that does not border a sea is a landlocked-country.
      OWL: SubClassOf( ObjectIntersectionOf( :country ObjectComplementOf( ObjectSomeValuesFrom( :border :sea ) ) ) :landlocked-country )
  20. ACE-in-GF: Implementation
      Implementation of the ACE syntax:
      • Targeting the subset of ACE that can be mapped to OWL
      • Almost 100% coverage at almost 0% ambiguity
      Support of most RGL languages:
      • Bulgarian, Catalan, Chinese, Danish, Dutch, English, Finnish, French, German, Greek, Hindi, Italian, Latvian, Norwegian, Polish, Romanian, Russian, Spanish, Swedish, Thai, Urdu
      • RGL-based design provides automatic increase in quality and language-coverage over time
      Status:
      • Some precision problems, e.g. with anaphoric references
      • Ambiguity and coverage problems in some languages
  21. Evaluation of AceWiki-GF
      Hypothesis: A group of users reaches almost the same level of agreement on the content of an article presented to them in different languages as when the article is presented to all of them in the same language.
      Design:
      • Based on a 500-word lexicon on European geography in three languages: English, German and Spanish
      • 30 participants accessed AceWiki-GF and wrote sentences in their language (10 participants for each language)
      • They had to enter true and false sentences and tag them as such
      • In a post-editing task, each participant checked the output of two other participants: one translated from another language and one written in the same language (true/false tags were removed and sentences shuffled); they were asked to remove all false sentences
  22. Evaluation of AceWiki-GF: Results
      30 participants spent on average 37 minutes using AceWiki-GF, creating 316 sentences in total.
      Definition of agreement level: (Tk + Fd) / S
      S is the total number of sentences, Tk the number of sentences marked as true and kept, and Fd the number marked as false and deleted.
      Agreement level (difference is not significant):
      • 82.2% without translation
      • 84.0% with translation
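As a worked illustration of the agreement-level definition, here is a small sketch in Python. The function name and the counts in the example are hypothetical; they are not the study's per-participant data, which the slides do not give.

```python
def agreement_level(true_kept: int, false_deleted: int, total: int) -> float:
    """Agreement level (Tk + Fd) / S from the slide.

    true_kept     -- Tk: sentences tagged true by the author and kept by the checker
    false_deleted -- Fd: sentences tagged false by the author and deleted by the checker
    total         -- S: all sentences the checker was shown
    """
    if total <= 0:
        raise ValueError("total number of sentences must be positive")
    if true_kept < 0 or false_deleted < 0 or true_kept + false_deleted > total:
        raise ValueError("counts are inconsistent with the total")
    return (true_kept + false_deleted) / total


# Hypothetical example: of 20 sentences, 12 true ones were kept and
# 6 false ones were deleted; the remaining 2 are disagreements.
level = agreement_level(true_kept=12, false_deleted=6, total=20)
print(f"{level:.1%}")  # 90.0%
```

The two sentences not counted in the numerator are exactly the disagreements: true sentences that were deleted or false sentences that were kept.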
  23. Evaluation of AceWiki-GF: Results
      Assumption: Translation introduces a constant translation error rate r that has the effect of reducing the agreement level a to (1 − r) · a.
      New hypothesis: The translation error rate is less than 5%.
      Agreement level:
      • 78.1% with hypothetical translation (r = 5%)
      • 84.0% with translation
      p-value with one-tailed Wilcoxon signed-rank test: 0.046
      → With AceWiki-GF, translation error rate is less than 5%
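The hypothetical 78.1% figure follows directly from the stated assumption, as the short check below shows. It takes the 82.2% agreement level without translation (reported on the previous slide) as the baseline a; the function name is hypothetical, and the sketch only reproduces the arithmetic, not the Wilcoxon test.

```python
def reduced_agreement(a: float, r: float) -> float:
    """Agreement level after applying a constant translation error rate r."""
    return (1 - r) * a


baseline = 0.822   # agreement without translation (previous slide)
observed = 0.840   # agreement with translation
hypothetical = reduced_agreement(baseline, r=0.05)

print(f"{hypothetical:.1%}")    # 78.1%
print(observed > hypothetical)  # True
```

The observed agreement with translation lies above the level expected under a 5% error rate; the slide's signed-rank test is what establishes that this difference is statistically significant.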
  24. Evaluation of AceWiki-GF: Feedback
      Questionnaire for the participants contained these questions:
      1 Was AceWiki Geography easy or difficult to use in general?
      2 Was the sentence editor easy or difficult to use?
      3 Was creating true and false statements easy or difficult to perform?
      Possible answers: “very difficult” (0), “difficult” (1), “medium” (2), “easy” (3), and “very easy” (4)
      Results:
      1 Average: 2.93 (∼“easy”)
      2 Average: 2.77 (∼“easy”)
      3 Average: 2.70 (∼“easy”)
  25. The Future...?
      Can we make a truly multilingual Wikipedia?
      • Store main content in a semantic representation
      • Verbalization in different languages
      • All content is instantly available in all languages (once the required vocabulary is defined)
      • Breaking the current dominance of English and putting an end to the lock-out of users speaking less widespread or underrepresented languages
      • Contributing to the Semantic Web
      Related:
      • http://www.wikidata.org
      • http://meta.wikimedia.org/wiki/A_proposal_towards_a_multilingual_Wikipedia
  26. Links
      ACE parser (APE) source code: https://github.com/Attempto/APE
      ACE-in-GF source code: http://github.com/Attempto/ACE-in-GF
      AceWiki and AceWikiGF:
      • Source code: http://github.com/AceWiki/AceWiki
      • Demos (non-GF): http://attempto.ifi.uzh.ch/acewiki/
      • Demos (GF): http://attempto.ifi.uzh.ch/acewiki-gf/
      MOLTO project web site: http://www.molto-project.eu
      Attempto web site: http://attempto.ifi.uzh.ch
      GF: http://www.grammaticalframework.org
  27. Thank you for your attention!
      If you are interested in Controlled Natural Languages, come visit us at CNL 2014!
      CNL 2014: Fourth Workshop on Controlled Natural Language
      20–22 August 2014, Galway
      http://attempto.ifi.uzh.ch/site/cnl2014/
