Umd draft-2010 jun22


presentation given at the University of Maryland workshop on Crowdsourcing Translation

Published in: Business, Technology

  1. crowdsourcing for the crowd: generating and curating open and accessible linguistic data. Crowdsourcing and Translation Workshop, University of Maryland, June 10-11, 2010
  2. mission: The Language Commons seeks to increase open and accessible linguistic data of all forms for all languages. We are a consortium of individuals, institutes, organizations, and corporations working to build and promote the tools, standards, policy, infrastructure, awareness, and community needed to preserve the world’s linguistic diversity and gather the open data needed to provide global access to knowledge and information across all languages.
  3. urgency: [Linguistics may] go down in history as the only science that presided obliviously over the disappearance of 90 per cent of the very field to which it is dedicated. –Hale et al. We live during a brief period of overlap between the mass extinction of the world’s languages and the advent of the digital age. –Bird
  4. rationale: Web-based: The multi-lingual, read/write web has created the opportunity to generate, share, and curate linguistic data. Open: leverage the momentum behind the open content (Creative Commons) and open data (data.gov) movements. Crowdsourced: Semi-supervised communities can scale datasets (Haiti). Capturing the public imagination: This project represents the convergence of a grand social and a grand scientific challenge.
  5. solution: Function as a consortium working in parallel on various aspects of the mission: collaborate on needed tools; influence data/content publishers to open-license their data; influence policy makers to mandate open linguistic data for publicly funded projects; generate and curate open data among our consortium members; work to identify and share resources (Language Commons SourceWiki); pursue longer-term goals for universal corpus infrastructure and API design.
  6. projects: NSF Si2 annotation framework for video/audio data (LDC/Meedan); UN Corpus effort, ~600 million words in seven languages (LC Steering Committee); Language Commons SourceWiki, presenting at WikiMania (Rosetta Project, Freebase); Human Language Project universal corpus infrastructure and API design (Bird, Abney)
  7. 1. theory
  8. “the meaning of a word is its use in the language” Wittgenstein, Philosophical Investigations
  9. a language is a socially constructed framework for storing and transporting meaning within a community. however, there is an increasing need to transport meaning across this:
  10. often the meaning (use) of the words does not translate
  11. huh?
  12. the war on terror
  13. global understanding problem ? ? Creative Commons - Mushon Zer-Aviv
  14. semantic namespace problem
  15. translation in sensitive contexts often does not solve for understanding, it merely exposes the mis(dis)understanding
  16. 2. practice
  17. we are building translation solutions for bloggers and bishops
  18. translation for news.meedan.net http://news.meedan.net
  19. translation for the distributed global newsroom: Wikipedia ethic to translation editing +translation as dynamic +revisions are collaborative +show translation history +the consumer as editor +MT feedback loop +able to translate more, better +constantly improving +community vets translations +humanizes the translator - translator's profile. Makes media global, conversational, social, cross-language http://news.meedan.net
  20. translation for a network of religious scholars: Translation as a form of scholarship +no Machine Translation +domain-trained translators +glossaries +annotation layers - addresses the namespace issue +granular attribution - word/sentence/document level
  21. the user interface: the meaning of a translation includes the fact that it is a translation
  22. Showing two languages side by side: counter to traditional UI/UX best practices +provenance +attribution +version control +visual cues +url translation +lots of human effort
  23. unintended consequences: language learning
  24. other fun stuff: generating data, transporting knowledge, globalizing great NGOs
      WikiArabia: +Project with KACST +Translate 2000 articles +Science Tech Health +116k articles in AR WP
      Meedan Memory: +Open AR/EN TM +Circa 2m words +Informal domain +on Github
      Kiva.org: +Translate 700k words +Jump start Kiva AR +Cisco Funded +530 million AR speakers