This document discusses building the localization web by publishing localization data on the semantic web using W3C standards. This enables terms and translations to become linkable resources that can add value when leveraged for machine translation and text analytics. The localization web is envisioned as a decentralized, annotated global translation memory and term base. Examples of use cases involving publishing linguistic data as linked open data are provided, as well as how different language resources, technology, and workflows are interdependent in the language lifecycle.
VIP Kolkata Call Girls Salt Lake 8250192130 Available With Room
Interverbum falcon-10oct14-az
1. FALCON: Building the Localization Web
Andrzej Zydroń
XTM International
Dave Lewis
CNGL at Trinity College Dublin
2. • Open Data on the Web: W3C Semantic Web
standards allow data to be published on Web
– Fine-grained URI-based inter-linking
– Extensible meta-data
– Standard Query APIs
• Enables a Localization Web
– Terms and translations become linkable resources
– Meta-data from L10n workflows adds value
– Leverage in training Machine Translation and Text
Analytics
The Localization Web
The Localization Web = Decentralised Annotated
Global Translation Memory and Term Base
3. Linguistic Linked Data: Lexicons
Red
Phonetic form
Form
number
singular
[RED]
Form
plural
[REDES]
Phonetic form
number
Red
Sense
written form
“red”
Sense
written form
“malla”
equivalent
Red
image
Red
Sense Sense
translation
es - en
written form
“red” “network”
written form
Red
written form
Form
gender
femenine
“red”
5. Words as Resources on the Web
Barak Obama if the
44th president of the
United State of
America. He was first
elected in 2009.
Barak Obama si el 44 º
presidente de los Estados
Unidos de América. Ha
fue electo primera vez en
2009.
http:// www.ex.org/obama_en.html
http:// www.ex.org/obama_es.html
The Web of Content The Localization Web
http://data.ex.org/String_0001
http:// data.ex.org/String_0002
Derived
From
Derived
From
Text: “Barak Obama if
the 44th president of the
United State of
America.”
Lang:en
Text:“Barak Obama si el
44 º presidente de los
Estados Unidos de
América.”
Lang:es
TranslatedBy:Google
Translate
Translated
From
Translation Data
Term: “United State of
America.”
Lang:en
Term:“Estados Unidos
de América.”
Lang:es
Translation
Of
http:// babelnet.org/345621
http:// babelnet.org/57835
Terminology Data
Topic: Barak Obama
Lang: en
BirthDate: 1961-08-04
Spouse: Michelle Obama
Residence: White House
http:// Dbpedia.org/Page/
Barak_Obama
Encyclopaedic Data
9. Data Management Lifecycles
Publish
Correct &
refine
Lex-
concept
lifecycleCorrect &
refine
Discover &
use
Discover &
use
Correct &
refine
Bitext lifecycle
Discover
data
(Re)train-
MT
Revise and
annotate
Publish
Content
lifecycle
Publish
I18n &
source QA
Trans
QA
Post-
edit
Automated
translation
Consume Create
10. • Better in-context postediting:
– XTM-Easyling
• Feeding term suggestions from posteditor to Terminology Management
– XTM-Interverbum
• Dynamic Retraining
– XTM-DCU
• Bilingual Dictionary SMT improvements
– XTM-DCU
• NER, terminology enforcements, forced decoding
– XTM-Interverbum-DCU
• Postediting prioritisation and term flagging
– TCD-DCU-XTM
• Publishing interlinks of parallel text, lexically rich term bases
– TCD: DG-T TM, EurVoc, Snomed-CT, LEMON, BabelNet
• Closing the loop – operational instrumentation of postediting
– Translog, iOmegaT
Systems Integration
11. More Information
• Contact: dave.lewis@cs.tcd.ie
• https://www.w3.org/International/its/wiki/Open_Data_Management_for_Public_Au
tomated_Translation_Services
• http://www.falcon-project.eu
• See also:
– Linked Data for Language Technology (LD4LT) W3C
Community Group
– http://www.w3.org/community/ld4lt/