A Multilingual Semantic Wiki based on
Attempto Controlled English and
Grammatical Framework
Tobias Kuhn
Chair of Sociology, in particular of Modeling and Simulation, ETH Zurich,
Switzerland

Computational Linguistics Colloquium, University of Zurich
29 October 2013
About This Talk
This talk is mainly based on the following papers:
Kaarel Kaljurand and Tobias Kuhn. A Multilingual Semantic Wiki
Based on Attempto Controlled English and Grammatical Framework.
In Proceedings of the 10th Extended Semantic Web Conference
(ESWC). 2013.
http://purl.org/tkuhn/eswc2013acewikigf

Kaarel Kaljurand, Tobias Kuhn, and Laura Canedo. Collaborative
multilingual knowledge management based on controlled natural
language. Under review.
http://www.semantic-web-journal.net/system/files/swj524.pdf

Tobias Kuhn, ETH Zurich

A Multilingual Semantic Wiki

2 / 32
Imagine ...

... that Wikipedia can check consistency and answer
questions about the contained knowledge, and
... that all content is instantly available in all
languages!

• AceWiki is a semantic wiki
• Articles are written in Attempto Controlled English (ACE)
• These sentences are internally translated into the Semantic Web language OWL
• An OWL reasoner is built in to answer questions and detect inconsistencies
• Special editor for writing ACE statements
• Has been extended to support multilinguality
Monolingual AceWiki: Screenshot

Attempto Controlled English (ACE)
Subset of natural English:
• Conjunction, disjunction, negation, if-then, ...
• Anaphoric references: pronouns, definite noun phrases, variables
• Quantifiers: every, no, at least 3, ...
• Content words: proper names, nouns, verbs, adjectives, ...

Grammar is fixed, but users can change content words.
Deterministic ambiguity handling:
• Anaphora resolution (France borders Spain and it borders Portugal.)
• Quantifier scope (Every country borders a country.)
• Attachment (Every EU-country borders a country that is an EU-country and is a NATO-country.)
Well-defined translations to and from first-order logic, OWL, ...
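As a worked illustration of the quantifier-scope example, and assuming ACE's convention that quantifier scope follows surface order, "Every country borders a country." would receive only the first reading below. The predicate names are chosen for illustration and are not the exact output of the ACE parser.

```latex
% Reading selected under surface-order scoping (each country may border a different country):
\forall x\, \bigl(\mathit{country}(x) \rightarrow \exists y\, (\mathit{country}(y) \wedge \mathit{border}(x, y))\bigr)

% Reading that is excluded (one single country bordered by every country):
\exists y\, \bigl(\mathit{country}(y) \wedge \forall x\, (\mathit{country}(x) \rightarrow \mathit{border}(x, y))\bigr)
```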
Predictive Editor

Consistency Checking

AceWiki ensures consistency by checking every new statement:

Question Answering

AceWiki supports simple wh-questions:

Monolingual AceWiki: Demo

http://attempto.ifi.uzh.ch/acewiki/

ACE Reasoning via Translation to OWL
Every country that does not border a sea is a landlocked-country.
SubClassOf(
  ObjectIntersectionOf(
    :country
    ObjectComplementOf(
      ObjectSomeValuesFrom(
        :border
        :sea
      )
    )
  )
  :landlocked-country
)

Which country is a landlocked-country?
ObjectIntersectionOf(
  :country
  :landlocked-country
)
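The OWL expressions above can be assembled mechanically from a small tree of class expressions. The following is a minimal sketch in Python, not the translation code used by AceWiki or the ACE parser; the class names `Named`, `Some`, `Not`, `And` and the helpers `render` and `subclass_of` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Named:
    """A named class such as :country."""
    name: str


@dataclass(frozen=True)
class Some:
    """ObjectSomeValuesFrom(property filler)."""
    prop: str
    filler: "ClassExpr"


@dataclass(frozen=True)
class Not:
    """ObjectComplementOf(arg)."""
    arg: "ClassExpr"


@dataclass(frozen=True)
class And:
    """ObjectIntersectionOf(left right)."""
    left: "ClassExpr"
    right: "ClassExpr"


ClassExpr = Union[Named, Some, Not, And]


def render(expr: ClassExpr) -> str:
    """Serialize a class expression in OWL functional-style syntax."""
    if isinstance(expr, Named):
        return f":{expr.name}"
    if isinstance(expr, Some):
        return f"ObjectSomeValuesFrom(:{expr.prop} {render(expr.filler)})"
    if isinstance(expr, Not):
        return f"ObjectComplementOf({render(expr.arg)})"
    if isinstance(expr, And):
        return f"ObjectIntersectionOf({render(expr.left)} {render(expr.right)})"
    raise TypeError(f"unsupported class expression: {expr!r}")


def subclass_of(sub: ClassExpr, sup: ClassExpr) -> str:
    """Serialize a SubClassOf axiom."""
    return f"SubClassOf({render(sub)} {render(sup)})"


# "Every country that does not border a sea is a landlocked-country."
axiom = subclass_of(
    And(Named("country"), Not(Some("border", Named("sea")))),
    Named("landlocked-country"),
)

# "Which country is a landlocked-country?" becomes a class expression
# whose instances a reasoner can enumerate.
query = render(And(Named("country"), Named("landlocked-country")))

print(axiom)
print(query)
```

The question is represented as a class expression rather than an axiom, so answering it amounts to asking the reasoner for the individuals that belong to that class.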

Evaluation
Two small usability experiments with earlier versions of AceWiki:
• Altogether 26 untrained participants
• Task: Collaborative creation of a knowledge base

Results:
• 78%–81% of the sentences were correct and sensible
• 61%–70% of them were complex (containing negations, implications, disjunctions or number restrictions)
• Creation of a correct sentence every 5–6 minutes
• Definition of a new word every 5–7 minutes

→ Even untrained users can effectively use AceWiki

Multilingual AceWiki: AceWiki-GF

General ideas:
• Make wiki content available in different languages
• Translate content automatically using rule-based machine translation with Grammatical Framework (GF)
• Language switching like in Wikipedia
• Localization of the user interface

Grammatical Framework (GF)

GF is a framework for multilingual grammar engineering:
• Rule-based
• Functional programming language (based on Haskell) optimized to handle natural language
• Resource Grammar Library implementing common morphological and syntactic structures
• Mildly context sensitive
• Bidirectional translations: concrete language ⇔ abstract syntax
GF grammars and translations
GF grammars consist of:
• One language-neutral abstract syntax
• Concrete syntaxes that specify words, agreement, word order, etc. by implementing the abstract categories and functions
Example
border : Country -> Country -> Relation
English: border x y = x!Nom + "borders" + y!Nom
Estonian: border x y = x!Gen + "naaber on" + y!Nom
GF translations consist of:
• First, parse a string in the original language to a tree (or trees) in the abstract syntax
• Then, linearize these trees as strings in the target language
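The parse-then-linearize pipeline can be mimicked with a toy model in Python. This is only a sketch under loud assumptions: it does not use the GF runtime, the lexicon and its Estonian word forms are illustrative, and parsing is done by brute-force enumeration of trees rather than by a real parser. It does, however, follow the `border` example above: English puts both arguments in the nominative, Estonian puts the first argument in the genitive.

```python
from itertools import permutations

# Toy lexicon (assumption): abstract constant -> language -> case -> surface form.
LEXICON = {
    "Portugal": {
        "Eng": {"Nom": "Portugal"},
        "Est": {"Nom": "Portugal", "Gen": "Portugali"},
    },
    "Spain": {
        "Eng": {"Nom": "Spain"},
        "Est": {"Nom": "Hispaania", "Gen": "Hispaania"},
    },
}


def linearize(tree, lang):
    """Turn an abstract tree ("border", x, y) into a string in one concrete syntax."""
    fun, x, y = tree
    if fun != "border":
        raise ValueError(f"unknown function: {fun}")
    if lang == "Eng":
        return f"{LEXICON[x]['Eng']['Nom']} borders {LEXICON[y]['Eng']['Nom']}"
    if lang == "Est":
        return f"{LEXICON[x]['Est']['Gen']} naaber on {LEXICON[y]['Est']['Nom']}"
    raise ValueError(f"unknown language: {lang}")


def all_trees():
    """Enumerate every abstract tree this tiny grammar can express."""
    return [("border", x, y) for x, y in permutations(LEXICON, 2)]


def parse(sentence, lang):
    """Return all trees whose linearization in `lang` equals `sentence`.

    The result is a list because a string may be ambiguous (several trees)
    or outside the grammar (no tree).
    """
    return [tree for tree in all_trees() if linearize(tree, lang) == sentence]


def translate(sentence, source, target):
    """Parse in the source language, then linearize each tree in the target language."""
    return [linearize(tree, target) for tree in parse(sentence, source)]


print(translate("Portugal borders Spain", "Eng", "Est"))
print(translate("Hispaania naaber on Portugal", "Est", "Eng"))
```

Because both directions go through the same abstract tree, adding a further language only requires one more branch in `linearize`, which is the property that makes the abstract syntax work as an interlingua.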
Multilingual AceWiki: Screenshot

GF Resource Grammar Library (RGL)
• Morphology and syntax for ∼30 languages via language-neutral API
• Developers do not need detailed knowledge of the languages that they want to support in their application

Implementation of AceWiki-GF
Integration of ACE with GF (ACE-in-GF):
• Implementation of a multilingual grammar of ACE in the GF framework
• Coverage of the languages supported by the GF Resource Grammar Library
• No fine-tuning to any particular language (apart from ACE)

Integration of AceWiki with GF (AceWiki-GF):
• Implementation of connections to GF tools (GF Webservice / Cloud Service)
• Support for the management of multilinguality, ambiguity, and grammar/lexicon editing

Multilingual AceWiki: Demo

http://attempto.ifi.uzh.ch/acewiki-gf/

ACE-in-GF
• Multiple controlled versions of natural languages that map to ACE (and to each other)
• As a result, they can be bidirectionally mapped to various formal languages already supported by ACE

ACE-in-GF: Example
German: Jedes Land, das nicht an ein Meer grenzt, ist ein Binnenland.
ACE-in-GF tree:
baseText (sText (s (vpS (everyNP (relCN (cn_as_VarCN country_CN)
  (neg_predRS which_RP (v2VP border_V2 (thereNP_as_NP
  (aNP (cn_as_VarCN sea_CN))))))) (npVP (thereNP_as_NP
  (aNP (cn_as_VarCN landlocked_country_CN)))))))

ACE: Every country that does not border a sea is a landlocked-country.
OWL:
SubClassOf(
  ObjectIntersectionOf(
    :country
    ObjectComplementOf(
      ObjectSomeValuesFrom( :border :sea )
    )
  )
  :landlocked-country
)
ACE-in-GF: Implementation
Implementation of the ACE syntax:
• Targeting the subset of ACE that can be mapped to OWL
• Almost 100% coverage at almost 0% ambiguity

Support of most RGL languages:
• Bulgarian, Catalan, Chinese, Danish, Dutch, English, Finnish, French, German, Greek, Hindi, Italian, Latvian, Norwegian, Polish, Romanian, Russian, Spanish, Swedish, Thai, Urdu
• RGL-based design provides automatic increase in quality and language coverage over time
Status
• Some precision problems, e.g. with anaphoric references
• Ambiguity and coverage problems in some languages

Ambiguity Resolution

Evaluation of ACE-in-GF
Design
• Generation of ∼100 ACE sentences/questions and automatic translation to all supported languages
• Full coverage of grammar functions
• Large coverage of OWL axiom structures (subclass, range, domain, transitivity, ...)
• Measuring translation accuracy from ACE to other languages
• Using Google Translate as the baseline
• 20 human evaluators (2 per language) as the gold standard

Results
• Participants preferred ACE-in-GF translations to Google Translate output and post-edited them less
• Many edits were stylistic

Evaluation of ACE-in-GF: Results

Evaluation of AceWiki-GF
Hypothesis: A group of users reaches almost the same level of agreement
on the content of an article presented to them in different languages as
when the article is presented to all of them in the same language.

Design
• Based on a 500-word lexicon on European geography in three languages: English, German and Spanish
• 30 participants accessed AceWiki-GF and wrote sentences in their language (10 participants for each language)
• They had to enter true and false sentences and tag them as such
• In a post-editing task, each participant checked the output of two other participants: one translated from another language and one written in the same language (true/false tags were removed and sentences shuffled); they were asked to remove all false sentences
Evaluation of AceWiki-GF: Results
30 participants spent on average 37 minutes using AceWiki-GF,
creating 316 sentences in total.
Definition of agreement level: (T_k + F_d) / S
where S is the total number of sentences, T_k the number of sentences marked as true and kept, and F_d the number marked as false and deleted
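The agreement level defined above is a simple ratio. A minimal sketch in Python follows; the function name `agreement_level` and the counts in the example are hypothetical and are not data from the experiment.

```python
def agreement_level(true_kept: int, false_deleted: int, total: int) -> float:
    """Share of sentences on which author and post-editor agree.

    true_kept     -- sentences tagged true by the author and kept by the checker
    false_deleted -- sentences tagged false by the author and deleted by the checker
    total         -- all sentences shown to the checker
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if true_kept < 0 or false_deleted < 0 or true_kept + false_deleted > total:
        raise ValueError("counts are inconsistent with total")
    return (true_kept + false_deleted) / total


# Hypothetical counts for one checker: 40 sentences, 25 true ones kept,
# 8 false ones deleted, 7 disagreements.
example = agreement_level(true_kept=25, false_deleted=8, total=40)
print(f"{example:.1%}")
```

Sentences that were tagged true but deleted, or tagged false but kept, count as disagreements and only enter the denominator.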

Agreement level (difference is not significant):
• without translation: 82.2%
• with translation: 84.0%
Evaluation of AceWiki-GF: Results
Assumption: translation introduces a constant translation error rate r,
so that the agreement level is (1 − r) × a instead of a
New hypothesis: The translation error rate is less than 5%.

Agreement level:
• with hypothetical translation (r = 5%): 78.1%
• with translation: 84.0%

p-value with one-tailed Wilcoxon signed-rank test: 0.046
→ With AceWiki-GF, translation error rate is less than 5%
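The hypothetical agreement level follows directly from the assumption (1 − r) × a. The short Python sketch below redoes that arithmetic with the reported agreement level without translation (82.2%) and r = 5%; the function name `agreement_with_error` is an illustrative choice. The significance test itself needs the per-participant data and is not reproduced here.

```python
def agreement_with_error(a: float, r: float) -> float:
    """Agreement level expected if translation added a constant error rate r."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must be between 0 and 1")
    return (1.0 - r) * a


baseline = 0.822          # agreement level without translation
hypothetical = agreement_with_error(baseline, r=0.05)
print(f"{hypothetical:.1%}")  # 0.95 * 0.822 = 0.7809
```

The observed agreement level with translation (84.0%) is then compared against this hypothetical value rather than against the untranslated baseline.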
Evaluation of AceWiki-GF: Feedback
Questionnaire for the participants contained these questions:
1. Was AceWiki Geography easy or difficult to use in general?
2. Was the sentence editor easy or difficult to use?
3. Was creating true and false statements easy or difficult to perform?

Possible answers: “very difficult” (0), “difficult” (1), “medium” (2), “easy” (3), and “very easy” (4)
Results (average score per question):
1. 2.93 (∼“easy”)
2. 2.77 (∼“easy”)
3. 2.70 (∼“easy”)

The Future...?
Can we make a truly multilingual Wikipedia?
• Store main content in a semantic representation
• Verbalization in different languages
• All content is instantly available in all languages (once the required vocabulary is defined)
• Breaking the current dominance of English and putting an end to the lock-out of users speaking less widespread or underrepresented languages
• Contributing to the Semantic Web

Related:
• http://www.wikidata.org
• http://meta.wikimedia.org/wiki/A_proposal_towards_a_multilingual_Wikipedia
Links
ACE parser (APE) source code: https://github.com/Attempto/APE
ACE-in-GF source code: http://github.com/Attempto/ACE-in-GF
AceWiki and AceWiki-GF
• Source code: http://github.com/AceWiki/AceWiki
• Demos (non-GF): http://attempto.ifi.uzh.ch/acewiki/
• Demos (GF): http://attempto.ifi.uzh.ch/acewiki-gf/
MOLTO project web site: http://www.molto-project.eu
Attempto web site: http://attempto.ifi.uzh.ch
GF: http://www.grammaticalframework.org

Thank you for your attention!

Questions?


Corporate Governance for South African Mining Companies
 
Restaurant Chiraz Sindbad Hotel Hammamet
Restaurant Chiraz Sindbad Hotel HammametRestaurant Chiraz Sindbad Hotel Hammamet
Restaurant Chiraz Sindbad Hotel Hammamet
 
EN_Chinese-Automotive-in-SEA-Vero-White-Paper_2023.pdf
EN_Chinese-Automotive-in-SEA-Vero-White-Paper_2023.pdfEN_Chinese-Automotive-in-SEA-Vero-White-Paper_2023.pdf
EN_Chinese-Automotive-in-SEA-Vero-White-Paper_2023.pdf
 
Apparel Sourcing Week 2024 DelegateDeck.pdf
Apparel Sourcing Week 2024 DelegateDeck.pdfApparel Sourcing Week 2024 DelegateDeck.pdf
Apparel Sourcing Week 2024 DelegateDeck.pdf
 
STRATEGY TO OVERCOME CURRENT PROBLEMS AT MTC.pptx
STRATEGY TO OVERCOME CURRENT PROBLEMS AT MTC.pptxSTRATEGY TO OVERCOME CURRENT PROBLEMS AT MTC.pptx
STRATEGY TO OVERCOME CURRENT PROBLEMS AT MTC.pptx
 
Movers near me in Dubai , Best Packers and Movers In Dubai
Movers near me in Dubai , Best Packers and Movers In DubaiMovers near me in Dubai , Best Packers and Movers In Dubai
Movers near me in Dubai , Best Packers and Movers In Dubai
 
Innovation Hub_ Spotlight on Toms River's Role as a Beacon for Entrepreneuria...
Innovation Hub_ Spotlight on Toms River's Role as a Beacon for Entrepreneuria...Innovation Hub_ Spotlight on Toms River's Role as a Beacon for Entrepreneuria...
Innovation Hub_ Spotlight on Toms River's Role as a Beacon for Entrepreneuria...
 
Top five predictions today, .
Top five predictions today,            .Top five predictions today,            .
Top five predictions today, .
 
New Girls Call Mumbai 9910780858 Provide Best And Top Girl Service And No1 in...
New Girls Call Mumbai 9910780858 Provide Best And Top Girl Service And No1 in...New Girls Call Mumbai 9910780858 Provide Best And Top Girl Service And No1 in...
New Girls Call Mumbai 9910780858 Provide Best And Top Girl Service And No1 in...
 
Connected Small Boat Protection Solution | July 2024
Connected Small Boat Protection Solution | July  2024Connected Small Boat Protection Solution | July  2024
Connected Small Boat Protection Solution | July 2024
 
TEST BANK For Auditing & Assurance Services A Systematic Approach, 12th Editi...
TEST BANK For Auditing & Assurance Services A Systematic Approach, 12th Editi...TEST BANK For Auditing & Assurance Services A Systematic Approach, 12th Editi...
TEST BANK For Auditing & Assurance Services A Systematic Approach, 12th Editi...
 
Accounts.pdfjsjshwnsksowowkwmwmsmmsmsksk
Accounts.pdfjsjshwnsksowowkwmwmsmmsmskskAccounts.pdfjsjshwnsksowowkwmwmsmmsmsksk
Accounts.pdfjsjshwnsksowowkwmwmsmmsmsksk
 
TALENT ACQUISITION AND MANAGEMENT LECTURE 5
TALENT ACQUISITION AND MANAGEMENT LECTURE 5TALENT ACQUISITION AND MANAGEMENT LECTURE 5
TALENT ACQUISITION AND MANAGEMENT LECTURE 5
 
Qatar Airways Kuwait Office.pdf.........
Qatar Airways Kuwait Office.pdf.........Qatar Airways Kuwait Office.pdf.........
Qatar Airways Kuwait Office.pdf.........
 
WAM Corporate Presentation July 2024.pdf
WAM Corporate Presentation July 2024.pdfWAM Corporate Presentation July 2024.pdf
WAM Corporate Presentation July 2024.pdf
 
Retail Store Scavenger Hunt powerpoint slides
Retail Store Scavenger Hunt powerpoint slidesRetail Store Scavenger Hunt powerpoint slides
Retail Store Scavenger Hunt powerpoint slides
 
21stcenturyskillsframeworkfinalpresentation2-240509214747-71edb7ee.pdf
21stcenturyskillsframeworkfinalpresentation2-240509214747-71edb7ee.pdf21stcenturyskillsframeworkfinalpresentation2-240509214747-71edb7ee.pdf
21stcenturyskillsframeworkfinalpresentation2-240509214747-71edb7ee.pdf
 
Path to the next normal collection McKinsey
Path to the next normal collection McKinseyPath to the next normal collection McKinsey
Path to the next normal collection McKinsey
 
YouTube Automation Step-by-step Guide.pdf
YouTube Automation Step-by-step Guide.pdfYouTube Automation Step-by-step Guide.pdf
YouTube Automation Step-by-step Guide.pdf
 

A Multilingual Semantic Wiki based on Attempto Controlled English and Grammatical Framework

  • 1. A Multilingual Semantic Wiki based on Attempto Controlled English and Grammatical Framework Tobias Kuhn Chair of Sociology, in particular of Modeling and Simulation, ETH Zurich, Switzerland Computational Linguistics Colloquium, University of Zurich 29 October 2013
  • 2. About This Talk This talk is mainly based on the following papers: Kaarel Kaljurand and Tobias Kuhn. A Multilingual Semantic Wiki Based on Attempto Controlled English and Grammatical Framework. In Proceedings of the 10th Extended Semantic Web Conference (ESWC). 2013. http://purl.org/tkuhn/eswc2013acewikigf Kaarel Kaljurand, Tobias Kuhn, and Laura Canedo. Collaborative multilingual knowledge management based on controlled natural language. Under review. http://www.semantic-web-journal.net/system/files/swj524.pdf Tobias Kuhn, ETH Zurich A Multilingual Semantic Wiki 2 / 32
  • 3. Imagine ... ... that Wikipedia can check consistency and answer questions about the contained knowledge, and ... that all content is instantly available in all languages!
  • 4. • AceWiki is a semantic wiki • Articles are written in Attempto Controlled English (ACE) • These sentences are internally translated into the Semantic Web language OWL • An OWL reasoner is built in to answer questions and detect inconsistencies • Special editor for writing ACE statements • Has been extended to support multilinguality
  • 5. Monolingual AceWiki: Screenshot
  • 6. Attempto Controlled English (ACE) Subset of natural English: • Conjunction, disjunction, negation, if-then, ... • Anaphoric references: pronouns, definite noun phrases, variables • Quantifiers: every, no, at least 3, ... • Content words: proper names, nouns, verbs, adjectives, ... Grammar is fixed, but users can change content words. Deterministic ambiguity handling: • Anaphora resolution (France borders Spain and it borders Portugal.) • Quantifier scope (Every country borders a country.) • Attachment (Every EU-country borders a country that is an EU-country and is a NATO-country.) Well-defined translations to and from first-order logic, OWL, ...
  • 7. Predictive Editor
  • 8. Consistency Checking AceWiki ensures consistency by checking every new statement:
  • 9. Question Answering AceWiki supports simple wh-questions:
  • 10. Monolingual AceWiki: Demo http://attempto.ifi.uzh.ch/acewiki/
  • 11. ACE Reasoning via Translation to OWL Every country that does not border a sea is a landlocked-country. SubClassOf( ObjectIntersectionOf( :country ObjectComplementOf( ObjectSomeValuesFrom( :border :sea ) ) ) :landlocked-country ) Which country is a landlocked-country? ObjectIntersectionOf( :country :landlocked-country )
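The OWL expressions on slide 11 can be read as nested class expressions. The following is a minimal Python sketch that builds the two expressions from the slide and prints them in OWL functional-style syntax. The class names (Named, Some, Not, And, SubClassOf) are hypothetical helpers introduced only for illustration; this is not how AceWiki or the ACE parser produce OWL.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Named:
    """A named OWL class such as :country (hypothetical helper)."""
    name: str

    def render(self) -> str:
        return f":{self.name}"


@dataclass(frozen=True)
class Some:
    """Existential restriction: ObjectSomeValuesFrom( :prop filler )."""
    prop: str
    filler: "ClassExpr"

    def render(self) -> str:
        return f"ObjectSomeValuesFrom( :{self.prop} {self.filler.render()} )"


@dataclass(frozen=True)
class Not:
    """Class complement: ObjectComplementOf( operand )."""
    operand: "ClassExpr"

    def render(self) -> str:
        return f"ObjectComplementOf( {self.operand.render()} )"


@dataclass(frozen=True)
class And:
    """Class intersection: ObjectIntersectionOf( a b ... )."""
    operands: Tuple["ClassExpr", ...]

    def render(self) -> str:
        inner = " ".join(op.render() for op in self.operands)
        return f"ObjectIntersectionOf( {inner} )"


ClassExpr = Union[Named, Some, Not, And]


@dataclass(frozen=True)
class SubClassOf:
    """Axiom stating that every instance of sub is an instance of sup."""
    sub: ClassExpr
    sup: ClassExpr

    def render(self) -> str:
        return f"SubClassOf( {self.sub.render()} {self.sup.render()} )"


# "Every country that does not border a sea is a landlocked-country."
axiom = SubClassOf(
    And((Named("country"), Not(Some("border", Named("sea"))))),
    Named("landlocked-country"),
)

# "Which country is a landlocked-country?" becomes a class expression
# whose instances are the answers.
question = And((Named("country"), Named("landlocked-country")))

print(axiom.render())
print(question.render())
```

The relative clause "that does not border a sea" becomes a complement of an existential restriction, and the question is answered by asking a reasoner for the instances of the intersection class.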
  • 12. Evaluation Two small usability experiments with earlier versions of AceWiki: • Altogether 26 untrained participants • Task: Collaborative creation of a knowledge base Results: • 78%-81% of the sentences were correct and sensible • 61%-70% of them were complex (containing negations, implications, disjunctions or number restrictions) • Creation of a correct sentence every 5–6 minutes • Definition of a new word every 5–7 minutes → Even untrained users can effectively use AceWiki
  • 13. Multilingual AceWiki: AceWiki-GF General ideas: • Make wiki content available in different languages • Automatically translated content using rule-based machine translation: Grammatical Framework (GF) • Language switching like in Wikipedia • Localization of the user interface
  • 14. Grammatical Framework (GF) GF is a framework for multilingual grammar engineering: • Rule-based • Functional programming language (based on Haskell) optimized to handle natural language • Resource Grammar Library implementing common morphological and syntactic structures • Mildly context sensitive • Bidirectional translations: concrete language ⇔ abstract syntax
  • 15. GF grammars and translations GF grammars consist of: • One language-neutral abstract syntax • Concrete syntaxes specify words, agreement, word order, etc. by implementing the abstract categories and functions Example border : Country -> Country -> Relation English: border x y = x!Nom + "borders" + y!Nom Estonian: border x y = x!Gen + "naaber on" + y!Nom GF translations consist of: • First, parse a string in the original language to a tree (or trees) in the abstract syntax • Then, linearize these trees as strings in the target language
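The parse-then-linearize idea on slide 15 can be imitated in a few lines of Python. This is a toy sketch, not GF: the Border class, the LEXICON and PATTERN tables, and the generate-and-test parser are all assumptions made for illustration. The Estonian word forms in the lexicon are likewise illustrative entries supplied here, not taken from the talk; only the two linearization patterns come from the slide.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Border:
    """Abstract-syntax tree for `border : Country -> Country -> Relation`."""
    x: str  # abstract identifier of the first country
    y: str  # abstract identifier of the second country


# Hypothetical mini-lexicons: abstract country id -> case -> surface form.
LEXICON = {
    "Eng": {
        "france": {"Nom": "France"},
        "spain": {"Nom": "Spain"},
    },
    "Est": {
        "france": {"Nom": "Prantsusmaa", "Gen": "Prantsusmaa"},
        "spain": {"Nom": "Hispaania", "Gen": "Hispaania"},
    },
}

# Concrete syntax of `border` per language, following the slide:
# (case of first argument, verb phrase, case of second argument).
PATTERN = {
    "Eng": ("Nom", "borders", "Nom"),
    "Est": ("Gen", "naaber on", "Nom"),
}


def linearize(tree: Border, lang: str) -> str:
    """Turn an abstract tree into a string of the given language."""
    case_x, verb, case_y = PATTERN[lang]
    lex = LEXICON[lang]
    return f"{lex[tree.x][case_x]} {verb} {lex[tree.y][case_y]}"


def parse(text: str, lang: str) -> List[Border]:
    """Return every tree whose linearization in `lang` equals `text`.

    Generate-and-test is only feasible for a toy grammar; it does show
    why parsing may yield several trees when a string is ambiguous.
    """
    ids = LEXICON[lang]
    return [
        Border(x, y)
        for x in ids
        for y in ids
        if linearize(Border(x, y), lang) == text
    ]


def translate(text: str, src: str, dst: str) -> List[str]:
    """Translate by parsing in the source and linearizing in the target."""
    return [linearize(tree, dst) for tree in parse(text, src)]


print(translate("France borders Spain", "Eng", "Est"))
```

Because both directions go through the same abstract tree, adding a language only requires a new lexicon and pattern entry, which is the property the multilingual wiki relies on.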
  • 16. Multilingual AceWiki: Screenshot
  • 17. GF Resource Grammar Library (RGL) • Morphology and syntax for ∼30 languages via language-neutral API • Developers do not need detailed knowledge of the languages that they want to support in their application
  • 18. Implementation of AceWiki-GF Integration of ACE with GF (ACE-in-GF): • Implementation of a multilingual grammar of ACE in the GF framework • Coverage of the languages supported by the GF resource grammar • No fine-tuning to any particular language (apart from ACE) Integration of AceWiki with GF (AceWiki-GF): • Implementation of connections to GF tools (GF Webservice / Cloud Service) • Support for the management of multilinguality, ambiguity, and grammar/lexicon editing
  • 19. Multilingual AceWiki: Demo http://attempto.ifi.uzh.ch/acewiki-gf/
  • 20. ACE-in-GF • Multiple controlled versions of natural languages that map to ACE (and to each other) • As a result, they can be bidirectionally mapped to various formal languages already supported by ACE
  • 21. ACE-in-GF: Example German: Jedes Land, das nicht an ein Meer grenzt, ist ein Binnenland. ACE-in-GF tree: baseText (sText (s (vpS (everyNP (relCN (cn_as_VarCN country_CN) (neg_predRS which_RP (v2VP border_V2 (thereNP_as_NP (aNP (cn_as_VarCN sea_CN))))))) (npVP (thereNP_as_NP (aNP (cn_as_VarCN landlocked_country_CN))))))) ACE: Every country that does not border a sea is a landlocked-country. OWL: SubClassOf( ObjectIntersectionOf( :country ObjectComplementOf( ObjectSomeValuesFrom( :border :sea ) ) ) :landlocked-country )
  • 22. ACE-in-GF: Implementation Implementation of the ACE syntax: • Targeting the subset of ACE that can be mapped to OWL • Almost 100% coverage at almost 0% ambiguity Support of most RGL languages: • Bulgarian, Catalan, Chinese, Danish, Dutch, English, Finnish, French, German, Greek, Hindi, Italian, Latvian, Norwegian, Polish, Romanian, Russian, Spanish, Swedish, Thai, Urdu • RGL-based design provides automatic increase in quality and language-coverage over time Status • Some precision problems, e.g. with anaphoric references • Ambiguity and coverage problems in some languages
  • 23. Ambiguity Resolution
  • 24. Evaluation of ACE-in-GF Design • Generation of ∼100 ACE sentences/questions and automatic translation to all supported languages • Full coverage of grammar functions • Large coverage of OWL axiom structures (subclass, range, domain, transitivity, ...) • Measuring translation accuracy from ACE to other languages • Using Google Translate as the baseline • 20 human evaluators (2 per language) as the gold standard Results • Participants preferred ACE-in-GF translations to Google translations and post-edited them less • Many edits were stylistic
  • 25. Evaluation of ACE-in-GF: Results
  • 26. Evaluation of AceWiki-GF Hypothesis: A group of users reaches almost the same level of agreement on the content of an article presented to them in different languages as when the article is presented to all of them in the same language. Design • Based on a 500-word lexicon on European geography in three languages: English, German and Spanish • 30 participants accessed AceWiki-GF and wrote sentences in their language (10 participants for each language) • They had to enter true and false sentences and tag them as such • In a post-editing task, each participant checked the output of two other participants: one translated from another language and one written in the same language (true/false tags were removed and sentences shuffled); they were asked to remove all false sentences
  • 27. Evaluation of AceWiki-GF: Results 30 participants spent on average 37 minutes using AceWiki-GF, creating 316 sentences in total. Definition of agreement level: (T_k + F_d) / S, where S is the total number of sentences, T_k the number of sentences marked as true and kept, and F_d the number marked as false and deleted. Agreement level (bar chart, axis from 0% to 100%; difference is not significant): without translation 82.2%, with translation 84.0%.
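As a worked illustration of the agreement level defined on slide 27, here is a minimal Python sketch. The function name and the example counts are hypothetical and chosen only to show the arithmetic; they are not data from the study.

```python
def agreement_level(true_kept: int, false_deleted: int, total: int) -> float:
    """Agreement level (T_k + F_d) / S from slide 27.

    true_kept      T_k: sentences tagged true by the author and kept
    false_deleted  F_d: sentences tagged false by the author and deleted
    total          S:   all sentences shown to the checking participant
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if true_kept < 0 or false_deleted < 0 or true_kept + false_deleted > total:
        raise ValueError("counts are inconsistent with total")
    return (true_kept + false_deleted) / total


# Hypothetical example: of 10 sentences, 7 true ones were kept and
# 2 false ones were deleted; the remaining sentence was judged
# differently by author and checker.
print(agreement_level(true_kept=7, false_deleted=2, total=10))
```

The measure counts both kinds of agreement, so a checker who keeps everything or deletes everything does not score well unless the sentences really are all true or all false.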
  • 28. Evaluation of AceWiki-GF: Results Assumption: translation introduces a constant translation error rate r that has the effect that the agreement level is (1 − r) × a instead of a. New hypothesis: The translation error rate is less than 5%. Agreement level (bar chart, axis from 0% to 100%): with hypothetical translation (r = 5%) 78.1%, with translation 84.0%. p-value with one-tailed Wilcoxon signed-rank test: 0.046 → With AceWiki-GF, the translation error rate is less than 5%
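The hypothetical 78.1% bar on slide 28 follows from the assumption (1 − r) × a applied to the 82.2% agreement level measured without translation. A small Python sketch of that arithmetic is given below; the function name is a hypothetical helper, and the significance test itself is not reproduced here.

```python
def agreement_with_translation_error(a: float, r: float) -> float:
    """Expected agreement level (1 - r) * a under a constant error rate r.

    a  agreement level without translation, in percent
    r  assumed translation error rate as a fraction (0.05 means 5%)
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie between 0 and 1")
    return (1.0 - r) * a


# Slide values: a = 82.2% without translation, hypothetical r = 5%.
hypothetical = agreement_with_translation_error(82.2, 0.05)
print(round(hypothetical, 1))
```

The observed agreement level with translation (84.0%) lies above this hypothetical value, which is the comparison the one-tailed test on the slide evaluates.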
  • 29. Evaluation of AceWiki-GF: Feedback Questionnaire for the participants contained these questions: (1) Was AceWiki Geography easy or difficult to use in general? (2) Was the sentence editor easy or difficult to use? (3) Was creating true and false statements easy or difficult to perform? Possible answers: “very difficult” (0), “difficult” (1), “medium” (2), “easy” (3), and “very easy” (4). Results: (1) Average: 2.93 (∼“easy”) (2) Average: 2.77 (∼“easy”) (3) Average: 2.70 (∼“easy”)
  • 30. The Future...? Can we make a truly multilingual Wikipedia? • Store main content in a semantic representation • Verbalization in different languages • All content is instantly available in all languages (once the required vocabulary is defined) • Breaking the current dominance of English and putting an end to the lock-out of users speaking less widespread or underrepresented languages • Contributing to the Semantic Web Related: • http://www.wikidata.org • http://meta.wikimedia.org/wiki/A_proposal_towards_a_multilingual_Wikipedia
  • 31. Links ACE parser (APE) source code: https://github.com/Attempto/APE ACE-in-GF source code: http://github.com/Attempto/ACE-in-GF AceWiki and AceWikiGF • Source code: http://github.com/AceWiki/AceWiki • Demos (non-GF): http://attempto.ifi.uzh.ch/acewiki/ • Demos (GF): http://attempto.ifi.uzh.ch/acewiki-gf/ MOLTO project web site: http://www.molto-project.eu Attempto web site: http://attempto.ifi.uzh.ch GF: http://www.grammaticalframework.org
  • 32. Thank you for your Attention! Questions?