A hugely beautiful piece of philosophy. Extend a word to be an idea or an action, say a war, and you can surmise that the meaning of that idea or war is equal to its use in the language. The problem is that we toss phrases like... clash of civilizations and..
‘the war on terror’ and ideas like the invasion of Iraq into a place where we cannot speak the language. We have single events that are shaping our global landscape and that are understood in radically different ways by each party...
This is the Arabic phrase that approximates the ‘literal’ translation of the war on terror, roughly ‘the war against terrorism’. It is only occasionally used, in some of the conservative media outlets.
More commonly, though, it is ‘the war against Arabs’.
In this phrasing it is simply “Bush’s War”...
you can see that some of the original intent of the...
So the issue of what the war in Iraq means is very much a namespace issue. But when we understand the common referent and acknowledge that a different signifier is used, we can come a bit closer to understanding the complexity of the context for that referent.
source author, source language, source location, translation author, etc, etc.
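As a rough illustration of the namespace idea, one common referent can be stored alongside the different signifiers used for it, each carrying the metadata fields named above. This is only a sketch: the class names, field names, language codes, and the "unknown" placeholder values are assumptions made for the example, and the Arabic entries are stored as English glosses rather than the original Arabic phrases.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Signifier:
    """One phrase used to name a referent, with its provenance metadata."""
    text: str
    source_language: str
    source_author: str
    source_location: str
    translation_author: Optional[str] = None  # None if the phrase is not a translation


@dataclass
class Referent:
    """A shared event or idea that different communities name differently."""
    label: str
    signifiers: List[Signifier] = field(default_factory=list)

    def add(self, signifier: Signifier) -> None:
        self.signifiers.append(signifier)

    def phrases_in(self, language: str) -> List[str]:
        """Return every phrase recorded for this referent in one language."""
        return [s.text for s in self.signifiers if s.source_language == language]


# Placeholder metadata throughout; Arabic phrases appear as English glosses.
war = Referent(label="war-on-terror")
war.add(Signifier("the war on terror", "en", "unknown", "unknown"))
war.add(Signifier("the war against terrorism", "ar", "unknown", "unknown",
                  translation_author="unknown"))
war.add(Signifier("Bush's War", "ar", "unknown", "unknown"))

print(war.phrases_in("ar"))
```

Keeping the referent separate from its signifiers lets a reader see that two communities are talking about the same event even when their names for it differ.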
Umd draft-2010 jun22
crowdsourcing for the crowd
generating and curating open and accessible linguistic data
Crowdsourcing and Translation Workshop, University of Maryland, June 10-11, 2010
The Language Commons seeks to increase open and accessible linguistic data of all
forms for all languages.
We are a consortium of individuals, institutes, organizations, and corporations working to
build and promote the tools, standards, policy, infrastructure, awareness, and community
needed to preserve the world’s linguistic diversity and gather the open data needed to
provide global access to knowledge and information across all languages.
[Linguistics may] go down in history as the only science that presided obliviously
over the disappearance of 90 per cent of the very field to which it is dedicated.
–Hale et al.
We live during a brief period of overlap between the mass extinction of the world’s
languages and the advent of the digital age.
Web-based: The multilingual, read/write web has created the opportunity to generate,
share, and curate linguistic data
Open: leverage the momentum behind open content (Creative Commons) and open data
Crowdsourced: Semisupervised communities can scale datasets (Haiti)
Capturing the public imagination: This project represents the convergence of a grand
social and grand scientific challenge
Function as a consortium working in parallel on various aspects of the mission:
Collaborate on needed tools
Influence data/content publishers to open-license their data
Influence policy makers to mandate open linguistic data for publicly funded projects
Generate and curate open data among our consortium members
Work to identify and share resources - Language Commons SourceWiki
Pursue longer-term goals for universal corpus infrastructure and API design
NSF Si2 annotation framework for video/audio data (LDC/Meedan)
UN Corpus effort ~600 million words / seven languages (LC Steering Committee)
Language Commons SourceWiki - presenting at Wikimania (Rosetta Project, Freebase)
Human Language Project universal corpus infrastructure and API design (Bird, Abney)
we are building translation solutions for
bloggers and bishops
translation for news.meedan.net
translation for the distributed global newsroom
A Wikipedia ethic for translation editing
+translation as dynamic
+revisions are collaborative
+show translation history
+the consumer as editor
+MT feedback loop
+able to translate more, better
+Community vets translations
+humanizes the translator: translator's profile
Makes media global, conversational, social, cross-language
translation for a network of religious scholars
Translation as a form of scholarship
+no Machine Translation
+Domain trained translators
+Annotation layers: address the namespace issue
+Granular Attribution - word/sentence/document level
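One way to picture annotation layers with attribution at the word, sentence, and document level is as character-offset spans attached to a translated text. The sketch below is an assumption-laden illustration, not a description of any actual Meedan system: the class names, the layer name "attribution", and the contributor names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

LEVELS = ("word", "sentence", "document")  # most specific first


@dataclass(frozen=True)
class Annotation:
    layer: str   # name of the annotation layer, e.g. "attribution"
    level: str   # one of LEVELS
    start: int   # character offset, inclusive
    end: int     # character offset, exclusive
    author: str
    note: str = ""


class AnnotatedTranslation:
    """A translated text plus span annotations at several granularities."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.annotations: List[Annotation] = []

    def annotate(self, layer: str, level: str, start: int, end: int,
                 author: str, note: str = "") -> Annotation:
        if level not in LEVELS:
            raise ValueError(f"unknown level: {level}")
        if not (0 <= start < end <= len(self.text)):
            raise ValueError("span falls outside the text")
        ann = Annotation(layer, level, start, end, author, note)
        self.annotations.append(ann)
        return ann

    def contributors_at(self, offset: int, layer: str = "attribution") -> List[str]:
        """Authors credited at a character offset, most specific level first."""
        hits = [a for a in self.annotations
                if a.layer == layer and a.start <= offset < a.end]
        hits.sort(key=lambda a: LEVELS.index(a.level))
        return [a.author for a in hits]


doc = AnnotatedTranslation("the war against terrorism")
doc.annotate("attribution", "document", 0, len(doc.text), "translator_a")
word_start = doc.text.index("terrorism")
doc.annotate("attribution", "word", word_start, word_start + len("terrorism"),
             "editor_b", note="word choice revised")

print(doc.contributors_at(word_start))
```

Storing spans rather than editing the text in place means a second layer (for example, glosses explaining a contested phrase) can be added without disturbing the attribution layer.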
the user interface
the meaning of a translation includes
the fact that it is a translation
Showing two languages side by side
counter to traditional UI/UX best practices
+lots of human effort
other fun stuff: generating data, transporting
knowledge, globalizing great NGOs
WikiArabia
+Project with KACST
+Translate 2000 articles
+Science Tech Health
+116k articles in AR WP
+530 million AR speakers
Meedan Memory
+Open AR/EN TM
+Circa 2m words
+Informal domain
+on Github
Kiva.org
+Translate 700k words
+Jump start Kiva AR
+Cisco Funded