Amel Fraisse, Ronald Jenn, Quoc-Tan Tran, Samia Takhtoukh: Merging Crowdsourcing and Computational Approaches for Digital Humanities

5th International Scientific Conference: Information Science in the Age of Change. Innovative Information Services. Faculty of Journalism, Information and Book Studies, Department of Information Studies, University of Warsaw, Warsaw, 15 – 16 May 2017

Published in: Education

  1. 1. Introduction; Needs; Dimensions; Methods for Construction Corpora; The case of Mark Twain Translations; Conclusion. Merging Crowdsourcing and Computational Approaches for Digital Humanities: A Case of Mark Twain Translations. Amel Fraisse, Ronald Jenn, Quoc-Tan Tran, Samia Takhtoukh. The 4th International Scientific Conference Information Science in the Age of Change, Warsaw, 15th – 16th May 2017.
  2. 2. Outline
  3. 3. Objectives
  4. 4. Needs
  5. 5. Corpora and Resources Dimensions
  6. 6. Types of methods
  7. 7. Outline: 1 Introduction; 2 Needs (Which Types of Tasks; Which Types of Resources); 3 Dimensions; 4 Methods for Construction Corpora (Traditional Human Creation; Crowdsourcing; Our Approach); 5 The case of Mark Twain Translations (Experiment Setup; Building and Visualizing Parallel Corpora; Quality Evaluation); 6 Conclusion
  8. 8. Tasks of Natural Language Processing: Natural Language Processing processes language material (e.g., text documents) to perform useful tasks.
  9. 9. NLP Tasks for Digital Humanities: multilingual text analysis; automatic construction of multilingual knowledge (lexicons, terminologies, ontologies, etc.); machine translation.
  11. 11. Parallel Corpora. Definition: a parallel corpus is a corpus that contains a collection of original texts in language L1 and their translations into a set of languages L2...Ln.
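The definition on slide 11 can be illustrated with a small data structure. This is only a sketch: the class names `ParallelText` and `ParallelCorpus`, the use of language codes as keys, and the placeholder strings are assumptions made for illustration, not the authors' implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ParallelText:
    """One original text in language L1 plus its translations into L2..Ln."""
    source_lang: str
    source_text: str
    # Hypothetical layout: target language code -> translated text.
    translations: dict[str, str] = field(default_factory=dict)

    def add_translation(self, lang: str, text: str) -> None:
        # A translation into the source language would not be a translation.
        if lang == self.source_lang:
            raise ValueError("target language must differ from the source language")
        self.translations[lang] = text

    def languages(self) -> list[str]:
        """Source language first, then the target languages in sorted order."""
        return [self.source_lang] + sorted(self.translations)


@dataclass
class ParallelCorpus:
    """A collection of original texts together with their translations."""
    texts: list[ParallelText] = field(default_factory=list)

    def pairs(self, target_lang: str) -> list[tuple[str, str]]:
        """All (source, translation) pairs available for one target language."""
        return [
            (t.source_text, t.translations[target_lang])
            for t in self.texts
            if target_lang in t.translations
        ]


# Placeholder strings stand in for real passages.
text = ParallelText("en", "<English source passage>")
text.add_translation("fr", "<French translation>")
text.add_translation("de", "<German translation>")
corpus = ParallelCorpus([text])
```

Keeping the translations in a mapping keyed by language makes it cheap to add a new language later without touching the texts already stored.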
  12. 12. Annotated Parallel Corpora: entities; relation markers.
  13. 13. Linguistic level: corpora (texts, sentences, sentence segments); lexical (words and terms). Language: well-endowed languages; under-resourced languages. Domain: general-purpose resources.
  15. 15. Human Construction of Language Resources. The 'traditional' way: writing a lexicon; writing a thesaurus; writing a grammar (patterns, local grammar; phrase-structure rules; lexico-syntactico-semantic patterns or rules).
  17. 17. Collective Human Intelligence for Language Resources Construction. Crowdsourcing is "the act of a company or institution taking a function once performed by employees and outsourcing it to an undefined (and generally large) network of people in the form of an open call" [Howe, 2006].
  18. 18. Crowdsourcing. Crowdsourcing is "the act of a company or institution taking a function once performed by employees and outsourcing it to an undefined (and generally large) network of people in the form of an open call" [Howe, 2006]. No a priori selection of the participants ("open call").
  19. 19. Crowdsourcing. Crowdsourcing is "the act of a company or institution taking a function once performed by employees and outsourcing it to an undefined (and generally large) network of people in the form of an open call" [Howe, 2006]. No a priori selection of the participants ("open call"); massive (in production and participation).
  20. 20. Crowdsourcing. Crowdsourcing is "the act of a company or institution taking a function once performed by employees and outsourcing it to an undefined (and generally large) network of people in the form of an open call" [Howe, 2006]. No a priori selection of the participants ("open call"); massive (in production and participation); (relatively) cheap.
  21. 21. Crowdsourcing model
  22. 22. Previous Works: [Fraisse et al., 2014]
  24. 24. 2-Step Approach for Resources Construction. Step 1: Building an initial core of translations. Crawl open databases and online sources to collect the source version of a literary text and its translations into a number of well-endowed languages (such as French, German, or Spanish). Step 2: Data enrichment. Incrementally extend this core to other languages through crowdsourced data collection tasks, which should allow us to collect translations into languages that would otherwise be inaccessible.
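The two steps on slide 24 can be sketched as two small functions. This is a minimal illustration under stated assumptions: the function names `build_core` and `enrich`, the language codes, and the rule that an existing translation is never overwritten are all hypothetical choices, and the crawling itself is not shown (the functions receive translations that have already been collected).

```python
from typing import Iterable

# Hypothetical layout: language code -> text in that language.
Translations = dict[str, str]


def build_core(
    source_lang: str,
    source_text: str,
    crawled: Iterable[tuple[str, str]],
    well_endowed: set[str],
) -> Translations:
    """Step 1: keep the source text and the crawled translations
    that are in well-endowed languages as the initial core."""
    core: Translations = {source_lang: source_text}
    for lang, text in crawled:
        if lang in well_endowed and lang not in core:
            core[lang] = text
    return core


def enrich(core: Translations, contributions: Iterable[tuple[str, str]]) -> Translations:
    """Step 2: extend the core with crowdsourced translations,
    adding only languages that are not covered yet."""
    enriched = dict(core)  # leave the original core untouched
    for lang, text in contributions:
        enriched.setdefault(lang, text)
    return enriched
```

Returning a new mapping from `enrich` keeps the crawled core separate from the crowdsourced additions, which may help when the two sources need to be evaluated differently.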
  26. 26. Mark Twain's Adventures of Huckleberry Finn. Why? The digitization of the writings of American author Mark Twain (1835-1910) is already well advanced. Adventures of Huckleberry Finn deals with transnational and universal topics such as slavery, freedom, childhood, racism, and coming of age; this focus, combined with the astounding number of translations available, makes it an ideal text to use for the prototype in an investigation of the global circulation of a literary text. Large portions of Twain's writings are now in the public domain.
  28. 28. Two Steps for Building Mark Twain Parallel Corpora. 1. Collect the English source text of Adventures of Huckleberry Finn and its translations into a number of well-endowed languages (we started with French), using open databases offered by national libraries or any other online source. 2. Use crowdsourcing to collect translations into languages that would otherwise be inaccessible.
  29. 29. Data Collection Tasks
  30. 30. Data Collection Tasks
  31. 31. Translation Tasks
  32. 32. The Deep Maps Model (Shelley Fisher Fishkin, 2011). Deep Maps would embed links to archival texts and images in nodes on an interactive map. To construct them, scholars would mine digital archives around the world for material to include as links, using the durable URL of the text or image in the digital archive in which it resides, as well as additional relevant source information (including the online citation and, if available, the original print source of the text or image as indicated in the online source where it is found).
  33. 33. The Deep Maps Model (Shelley Fisher Fishkin, 2011). Deep Maps would focus on topics that cross borders and would include links to texts and images in different locations, sometimes in different languages, and sometimes reflecting conflicting interpretations of the material involved.
  34. 34. The Deep Maps Model (Shelley Fisher Fishkin, 2011). Deep Maps would be accessible to as broad an international public as possible. Ideally they would be free and would be available as pedagogical tools to any teacher or student with access to the internet. Ideally, they would be hosted on open access university or other non-profit websites. Scholars involved in creating Deep Maps would work with colleagues and consortiums working in this area with technical expertise to develop user interfaces that were simple and clean.
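The Deep Maps description on slides 32 to 34 (map nodes that embed links, each link carrying a durable URL and source information) suggests a simple record layout. The sketch below is an assumption made for illustration: the class names `ArchiveLink` and `MapNode`, the field names, and the coordinate fields are hypothetical and do not come from the slides or from Fishkin's proposal.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class ArchiveLink:
    """One link to an archival text or image, with its source information."""
    durable_url: str                    # durable URL in the hosting digital archive
    online_citation: str                # citation as given by the online source
    print_source: Optional[str] = None  # original print source, if available
    language: Optional[str] = None      # language of the linked material, if known


@dataclass
class MapNode:
    """A node on the interactive map that embeds archival links."""
    label: str
    latitude: float
    longitude: float
    links: list[ArchiveLink] = field(default_factory=list)

    def links_in(self, language: str) -> list[ArchiveLink]:
        """Links whose material is in the given language."""
        return [link for link in self.links if link.language == language]
```

Making `print_source` optional mirrors the "if available" wording on slide 32, and storing a language per link allows one node to hold material in several languages.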
  35. 35. User interface
  37. 37. User Feedback. Users will be able to express their opinions about the collected translations. Expressed opinions and comments will be automatically analysed (an opinion mining task) in order to propose a first classification of opinions by polarity into the following four classes: Positive (translation of good quality), Negative (translation of bad quality), Mixed (translation has as many positive as negative opinions), and Neutral (when the given opinion is none of the above).
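One possible reading of the four classes on slide 37 is sketched below. It assumes that each individual opinion has already been labelled by an opinion mining step (not shown), and that a translation-level class is obtained by comparing the counts of positive and negative opinions. That comparison rule and the names `Polarity` and `aggregate_polarity` are assumptions for illustration, not the authors' method.

```python
from collections import Counter
from enum import Enum
from typing import Iterable


class Polarity(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    MIXED = "mixed"
    NEUTRAL = "neutral"


def aggregate_polarity(labels: Iterable[Polarity]) -> Polarity:
    """Combine per-opinion labels into one class for a translation."""
    counts = Counter(labels)
    positive = counts[Polarity.POSITIVE]
    negative = counts[Polarity.NEGATIVE]
    if positive == 0 and negative == 0:
        # No opinion leans either way: none of the other classes applies.
        return Polarity.NEUTRAL
    if positive == negative:
        # As many positive as negative opinions.
        return Polarity.MIXED
    return Polarity.POSITIVE if positive > negative else Polarity.NEGATIVE
```

For example, `aggregate_polarity([Polarity.POSITIVE, Polarity.NEGATIVE])` yields `Polarity.MIXED` under this rule, since the two counts are equal.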
  38. 38. Conclusion and Future Work. We propose a new paradigm to assess the contribution of crowdsourcing-based models for collection and annotation purposes. Setting up a generic methodology for tracking the global circulation of any literary text. Future work: include other types of documents related to Mark Twain's novel (scientific papers, studies, etc.); use the collected parallel corpora to extract multilingual knowledge.
  39. 39. Thank you!
