Umd draft-2010 jun22
1. crowdsourcing for the crowd
generating and curating open and accessible linguistic data
Crowdsourcing and Translation Workshop, University of Maryland, June 10-11, 2010
2.
3. mission
The Language Commons seeks to increase open and accessible linguistic data of all
forms for all languages.
We are a consortium of individuals, institutes, organizations, and corporations working to
build and promote the tools, standards, policy, infrastructure, awareness, and community
needed to preserve the world’s linguistic diversity and gather the open data needed to
provide global access to knowledge and information across all languages.
4. urgency
[Linguistics may] go down in history as the only science that presided obliviously
over the disappearance of 90 per cent of the very field to which it is dedicated.
–Hale et al
We live during a brief period of overlap between the mass extinction of the world’s
languages and the advent of the digital age.
–Bird
5. rationale
Web-based: The multilingual, read/write web has created the opportunity to generate,
share, and curate linguistic data
Open: leverage the momentum behind open content (Creative Commons) and open data
(data.gov) movements
Crowdsourced: Semi-supervised communities can scale datasets (Haiti)
Capturing the public imagination: This project represents the convergence of a grand
social and grand scientific challenge
6. solution
Function as a consortium working in parallel on various aspects of the mission:
Collaborate on needed tools
Influence data/content publishers to open-license their data
Influence policy makers to mandate open linguistic data for publicly funded projects
Generate and curate open data among our consortium members
Work to identify and share resources - Language Commons SourceWiki
Pursue longer term goals for universal corpus infrastructure and API design
7. projects
NSF Si2 annotation framework for video/audio data (LDC/Meedan)
UN Corpus effort: ~600 million words, seven languages (LC Steering Committee)
Language Commons SourceWiki - presenting at WikiMania (Rosetta Project, Freebase)
Human Language Project universal corpus infrastructure and API design (Bird, Abney)
10. “the meaning of a word is its use in the language”
Wittgenstein, Philosophical Investigations
11. a language is a socially constructed
framework for storing and
transporting meaning within a
community
however, there is an increasing need
to transport meaning across this:
26. translation for the distributed global newsroom
Wikipedia ethic to translation editing
+translation as dynamic
+revisions are collaborative
+show translation history
+the consumer as editor
+MT feedback loop
+able to translate more, better
+constantly improving
+Community vets translations
+humanizes the translator: translator's profile
Makes media global, conversational, social, cross-language
http://news.meedan.net
27. translation for a network of religious scholars
Translation as a form of scholarship
+no Machine Translation
+Domain trained translators
+Glossaries
+Annotation layers: address the namespace issue
+Granular Attribution - word/sentence/document level
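A minimal sketch of what granular attribution might look like as a data structure, assuming credit is recorded as character spans over the translated text. The names `Attribution`, `TranslatedDocument`, `credit`, and `translators_for` are hypothetical and chosen for illustration only; this is not a description of any existing system.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Attribution levels named on the slide, ordered from most to least granular.
LEVELS = ("word", "sentence", "document")


@dataclass
class Attribution:
    """Credits one translator for a span of the translated text (hypothetical)."""
    level: str       # one of LEVELS
    start: int       # character offset, inclusive
    end: int         # character offset, exclusive
    translator: str


@dataclass
class TranslatedDocument:
    """A translation plus its attribution records (hypothetical)."""
    source_language: str
    target_language: str
    text: str
    attributions: List[Attribution] = field(default_factory=list)

    def credit(self, level: str, start: int, end: int, translator: str) -> None:
        """Record that `translator` is responsible for text[start:end]."""
        if level not in LEVELS:
            raise ValueError(f"unknown attribution level: {level}")
        if not 0 <= start < end <= len(self.text):
            raise ValueError("span falls outside the translated text")
        self.attributions.append(Attribution(level, start, end, translator))

    def translators_for(self, offset: int) -> List[Tuple[str, str]]:
        """Return (level, translator) pairs covering `offset`, most granular first."""
        covering = [a for a in self.attributions if a.start <= offset < a.end]
        covering.sort(key=lambda a: LEVELS.index(a.level))
        return [(a.level, a.translator) for a in covering]


doc = TranslatedDocument("ar", "en", "peace be upon you")
doc.credit("document", 0, len(doc.text), "translator_a")
doc.credit("word", 0, 5, "translator_b")
print(doc.translators_for(2))
print(doc.translators_for(8))
```

Storing spans rather than a single author field lets a reader ask who is responsible for any given word while still falling back to sentence-level or document-level credit.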
28. the user interface
the meaning of a translation includes
the fact that it is a translation
29. Showing two languages side by side
counter to traditional UI/UX best practices
+provenance
+attribution
+version control
+visual cues
+url translation
+lots of human effort
31. other fun stuff: generating data, transporting knowledge, globalizing great NGOs
WikiArabia
+Project with KACST
+Translate 2000 articles
+Science Tech Health
+116k articles in AR WP
+530 million AR speakers
Meedan Memory
+Open AR/EN TM
+Circa 2m words
+Informal domain
+on GitHub
Kiva.org
+Translate 700k words
+Jump start Kiva AR
+Cisco Funded
Editor's Notes
A hugely beautiful piece of philosophy. Extend a word to be an idea or an action, like, say, a war, and you can surmise that the meaning of that idea or war is equal to its use in the language. The problem is that we toss phrases like... clash of civilizations and..
‘the war on terror’ and ideas like the invasion of Iraq into a place where we cannot speak the language. We have single events that are shaping our global landscape that are understood in radically different ways by each party...
This is the Arabic phrase that approximates the ‘literal’ translation of the war on terror; it is approximately ‘the war against terrorism’. It is only occasionally used in some of the conservative media outlets.
Though more commonly, it is ‘the war against Arabs’.
In this phrasing it is simply “Bush’s War”...
You can see that some of the original intent of the...
So the issue of what the war in Iraq means is very much a namespace issue, but when we understand the common referent and acknowledge that a different signifier is used, we can come a bit closer to understanding the complexity of the context for that referent.
humility
source author, source language, source location, translation author, etc, etc.