crowdsourcing for the crowd
generating and curating open and accessible linguistic data




   Crowdsourcing and Translatio...
mission



The Language Commons seeks to increase open and accessible linguistic data of all
forms for all languages.

We ...
urgency



   [Linguistics may] go down in history as the only science that presided obliviously
   over the disappearance...
rationale

   Web-based: The multi-lingual, read/write web has created the opportunity to generate,
   share, curate lingu...
solution
   Function as a consortium working in parallel on various aspects of the mission:

     Collaborate on needed to...
projects

  NSF Si2 annotation framework for video/audio data (LDC/Meedan)

  UN Corpus effort ~600 million words/ seven l...
1. theory
“the meaning of a word is its use in the language”
Wittgenstein, Philosophical Investigations
a language is a socially constructed
framework for storing and
transporting meaning within a
community

however, there is ...
often the meaning (use) of the words
does not translate
huh?
the war on terror
global understanding problem




      ?                                                 ?




                           ...
semantic namespace problem
translation in sensitive contexts often
does not solve for understanding, it
merely exposes the
mis(dis)understanding
2. practice
we are building translation solutions for
bloggers and bishops
translation for news.meedan.net




                    http://news.meedan.net
translation for the distributed global newsroom
    Wikipedia ethic to translation editing

      +translation as dynamic
    ...
translation for a network of religious scholars

     Translation as a form of scholarship

       +no Machine Translation
   ...
the user interface



the meaning of a translation includes
the fact that it is a translation
Showing two languages side by side
                        counter to traditional UI/UX best practices




+provenance
+at...
unintended consequences: language learning
other fun stuff: generating data, transporting
knowledge, globalizing great NGOs




WikiArabia                  Meedan Me...
Umd draft-2010 jun22
Presentation given at the University of Maryland workshop on Crowdsourcing Translation.

  • A hugely beautiful piece of philosophy. Extend a word to be an idea or an action, say a war, and you can surmise that the meaning of that idea or war is equal to its use in the language. The problem is that we toss phrases like... clash of civilizations and..







  • ‘the war on terror’ and ideas like the invasion of Iraq into a place where we cannot speak the language. We have single events that are shaping our global landscape that are understood in radically different ways by each party...
  • This is the Arabic phrase that approximates the ‘literal’ translation of ‘the war on terror’; it is roughly ‘the war against terrorism’. It is only occasionally used, in some of the conservative media outlets.
  • More commonly, though, it is ‘the war against Arabs’.
  • In this phrasing it is simply “Bush’s War”...

    You can see that some of the original intent of the...

  • So the issue of what the war in Iraq means is very much a namespace issue. But when we understand the common referent and offer that a different signifier is used, we can come a bit closer to understanding the complexity of the context for that referent.
  • humility






  • source author, source language, source location, translation author, etc, etc.


  • Umd draft-2010 jun22

    1. crowdsourcing for the crowd
       generating and curating open and accessible linguistic data
       Crowdsourcing and Translation Workshop, University of Maryland, June 10-11, 2010
    2. mission
       The Language Commons seeks to increase open and accessible linguistic data of all forms for all languages.
       We are a consortium of individuals, institutes, organizations, and corporations working to build and promote the tools, standards, policy, infrastructure, awareness, and community needed to preserve the world’s linguistic diversity and gather the open data needed to provide global access to knowledge and information across all languages.
    3. urgency
       [Linguistics may] go down in history as the only science that presided obliviously over the disappearance of 90 per cent of the very field to which it is dedicated. –Hale et al
       We live during a brief period of overlap between the mass extinction of the world’s languages and the advent of the digital age. –Bird
    4. rationale
       Web-based: The multi-lingual, read/write web has created the opportunity to generate, share, and curate linguistic data
       Open: leverage the momentum behind the open content (Creative Commons) and open data (data.gov) movements
       Crowdsourced: Semi-supervised communities can scale datasets (Haiti)
       Capturing the public imagination: This project represents the convergence of a grand social and a grand scientific challenge
    5. solution
       Function as a consortium working in parallel on various aspects of the mission:
       Collaborate on needed tools
       Influence data/content publishers to open-license their data
       Influence policy makers to mandate open linguistic data for publicly funded projects
       Generate and curate open data among our consortium members
       Work to identify and share resources - Language Commons SourceWiki
       Pursue longer-term goals for universal corpus infrastructure and API design
    6. projects
       NSF Si2 annotation framework for video/audio data (LDC/Meedan)
       UN Corpus effort ~600 million words/ seven languages (LC Steering Committee)
       Language Commons SourceWiki - presenting at WikiMania (Rosetta Project, Freebase)
       Human Language Project universal corpus infrastructure and API design (Bird, Abney)
    7. 1. theory
    8. “the meaning of a word is its use in the language”
       Wittgenstein, Philosophical Investigations
    9. a language is a socially constructed framework for storing and transporting meaning within a community
       however, there is an increasing need to transport meaning across this:
    10. often the meaning (use) of the words does not translate
    11. huh?
    12. the war on terror
    13. global understanding problem
        ? ?
        Creative Commons - Mushon Zer-Aviv
    14. semantic namespace problem
    15. translation in sensitive contexts often does not solve for understanding, it merely exposes the mis(dis)understanding
    16. 2. practice
    17. we are building translation solutions for bloggers and bishops
    18. translation for news.meedan.net
        http://news.meedan.net
    19. translation for the distributed global newsroom
        Wikipedia ethic to translation editing
        +translation as dynamic
        +revisions are collaborative
        +show translation history
        +the consumer as editor
        +MT feedback loop
        +able to translate more, better
        +constantly improving
        +Community vets translations
        +humanizes the translator: translator's profile
        Makes media global, conversational, social, cross-language
        http://news.meedan.net
    20. translation for a network of religious scholars
        Translation as a form of scholarship
        +no Machine Translation
        +Domain trained translators
        +Glossaries
        +Annotation layers: addresses the namespace issue
        +Granular Attribution - word/sentence/document level
    21. the user interface
        the meaning of a translation includes the fact that it is a translation
    22. Showing two languages side by side
        counter to traditional UI/UX best practices
        +provenance
        +attribution
        +version control
        +visual cues
        +url translation
        +lots of human effort
    23. unintended consequences: language learning
    24. other fun stuff: generating data, transporting knowledge, globalizing great NGOs
        WikiArabia: +Project with KACST +Translate 2000 articles +Science Tech Health +116k articles in AR WP
        Meedan Memory: +Open AR/EN TM +Circa 2m words +Informal domain +on Github
        Kiva.org: +Translate 700k words +Jump start Kiva AR +Cisco Funded +530 million AR speakers
