Published in: Economy & Finance

Transcript

  • 1. Working Together: A Collaborative Approach to DIY Corpora. Lynne Bowker, University of Ottawa, Canada
  • 2. Overview
    • Background
    • Experiment
    • Results
    • Discussion
    • Observations about WWW as corpus resource
    • Concluding remarks
  • 3. Background
    • Previous experience with corpus use in translation classroom:
      • A single large “multipurpose” corpus
        • One size doesn’t fit all…
      • Individual corpora built by trainer
        • Students not learning about corpus design and building; total exhaustion of trainer…
      • Individual corpora built by students
        • Small and poorly designed…
  • 4. There has to be a better way…
    • Inspiration from
      • those who have shown that group work can work
        • e.g. Maia, Varantola
      • advocates of a “learning-centred” approach
        • e.g. Kiraly, Yuste
  • 5. Collaborative approach
    • Build one corpus per text/group of related texts
    • Entire class would contribute to each corpus
    • Parameters:
      • a) coordinators, b) number of texts contributed, c) quality of texts, d) time frame, e) file format
  • 6. a) coordinators
    • 2 students per corpus
      • Coordinators don’t have to contribute
    • Receive corpus submissions from other students via email
      • Special email account set up
    • Act as “clearing house”
      • evaluate relevance of texts and eliminate duplicates
    • Collate remaining texts into corpus for posting on class website
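The mechanical part of the coordinators' "clearing house" role (eliminating duplicates and collating what remains) could be sketched as follows. This is a minimal illustration, not the procedure used in the experiment: the function names, the (student, text) pair layout, and the choice to treat texts as duplicates when they match after lowercasing and whitespace normalization are all assumptions. Judging relevance remains a manual task for the coordinators.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def remove_duplicates(submissions: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Keep the first copy of each text; `submissions` holds (student, text) pairs."""
    seen: set[str] = set()
    unique = []
    for student, text in submissions:
        digest = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append((student, text))
    return unique


def collate(texts: list[tuple[str, str]]) -> str:
    """Join the accepted texts into the body of one plain-text corpus file."""
    return "\n\n".join(text.strip() for _, text in texts)


# Hypothetical submissions: the second is a near-verbatim copy of the first.
submissions = [
    ("student_a", "A cookie is a small file stored by the browser."),
    ("student_b", "A  cookie is a small file stored by the BROWSER. "),
    ("student_c", "Firewalls filter network traffic."),
]
unique = remove_duplicates(submissions)
corpus = collate(unique)
print(f"{len(submissions)} submitted, {len(unique)} kept")
```

Hashing the normalized text keeps the comparison cheap even when many students send the same page; it will not catch copies that differ in wording, which still need a human eye.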
  • 7. b) number of texts contributed by each student per corpus
    • Class size of 20-30 students
      • 22 students for this experiment
    • Each student tries to contribute 3 relevant texts
      • Can submit more if they find more
    • 3 x 22 = 66 texts, minus some duplicates = a reasonably sized corpus
      • Likely to be larger than those previously created by individuals
  • 8. c) quality of texts
    • Students must put time and care into text selection
      • If everyone simply sends the first 3 hits found using Alta Vista:
        • texts may not be relevant
        • there would be too much duplication
  • 9. d) time frame
    • Everyone needs a reasonable amount of time to do their job
      • Trainer provides source text 3 weeks in advance
      • Students have 1 week to submit texts
      • Coordinators have 1 week to evaluate and collate texts
      • Everyone has 1 week to consult corpus
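The three one-week phases above can be laid out as concrete dates. The sketch below assumes a hypothetical release date and hypothetical milestone labels; only the one-week intervals come from the slide.

```python
from datetime import date, timedelta


def corpus_schedule(source_text_released: date) -> dict[str, date]:
    """Return the milestone dates for one corpus, one week per phase."""
    week = timedelta(weeks=1)
    return {
        "source text released": source_text_released,
        "student submissions due": source_text_released + week,
        "corpus posted by coordinators": source_text_released + 2 * week,
        "corpus consultation ends": source_text_released + 3 * week,
    }


schedule = corpus_schedule(date(2024, 1, 8))  # hypothetical release date
for milestone, due in schedule.items():
    print(f"{due.isoformat()}  {milestone}")
```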
  • 10. e) file format
    • Texts to be submitted as plain text files
      • Easier for coordinators (don’t need to convert files or have access to different software packages)
      • Resulting corpus can be manipulated using a variety of corpus processing tools
      • Lower risk of spreading viruses
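Since most texts came from web pages, students would need some way to save them as plain text. One possible approach, sketched here with Python's standard `html.parser` module, strips the markup and keeps the visible text. The class and function names are illustrative assumptions, and the slides do not say how students actually converted their files.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> content."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_plain_text(html: str) -> str:
    """Return the text content of an HTML page, one fragment per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


sample = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><h1>Firewalls</h1><p>A firewall filters traffic.</p></body></html>"
)
plain = html_to_plain_text(sample)
print(plain)
```

Navigation menus and other page furniture would still come through as text, so a quick manual clean-up of each file is likely to be needed before submission.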
  • 11. Resulting corpora
    • Columns: subject (text type): texts submitted, texts rejected, number of texts / words in corpus
    • Passwords (FAQ Web page): 58 submitted, 35 rejected, 23 texts / 40,600 words
    • Antivirus programs (Instructional): 78 submitted, 22 rejected, 56 texts / 170,919 words
    • Encryption (Informative/popularized): 74 submitted, 19 rejected, 55 texts / 216,522 words
    • Firewalls (Buyer’s guide): 63 submitted, 18 rejected, 45 texts / 136,017 words
    • Steganography (Product description): 35 submitted, 21 rejected, 14 texts / 7,401 words
    • Biometrics (Research article): 29 submitted, 17 rejected, 12 texts / 69,651 words
    • Cookies (Technical encyclopedia entry): 41 submitted, 19 rejected, 22 texts / 11,754 words
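The figures for the seven corpora above can be checked and compared with a few lines of arithmetic. The sketch below uses only the numbers reported on the slide; the derived measures (rejection rate and mean words per text) and all variable names are additions for illustration and do not appear in the original.

```python
corpora = [
    # (subject, submitted, rejected, texts_in_corpus, words_in_corpus)
    ("Passwords", 58, 35, 23, 40_600),
    ("Antivirus programs", 78, 22, 56, 170_919),
    ("Encryption", 74, 19, 55, 216_522),
    ("Firewalls", 63, 18, 45, 136_017),
    ("Steganography", 35, 21, 14, 7_401),
    ("Biometrics", 29, 17, 12, 69_651),
    ("Cookies", 41, 19, 22, 11_754),
]

stats = {}
for subject, submitted, rejected, kept, words in corpora:
    # Consistency check: texts kept should equal submitted minus rejected.
    assert submitted - rejected == kept, subject
    stats[subject] = {
        "rejection_rate": rejected / submitted,
        "mean_words": words / kept,
    }

for subject, s in stats.items():
    print(f"{subject:<20} rejected {s['rejection_rate']:6.1%}   "
          f"mean {s['mean_words']:8.1f} words/text")
```

Every row passes the consistency check. The derived means also reflect what the later slides report: the biometrics texts average roughly 5,800 words each, while the cookies and steganography texts average under 550.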
  • 12. Corpus 1: FAQ on passwords
    • High degree of duplication
      • All students used web (not library)
      • Most used Alta Vista search engine
      • Most simply took first 3 hits
      • Students were told that different search engines index different pages, and were introduced to meta-search engines
      • Agreed to consult more resources and look beyond first 3 hits
  • 13. Corpora 2 - 4
    • Popular subjects
      • Viruses, encryption, firewalls
    • Relatively common text types
      • Instructional text, buyer’s guide, popularized articles
        • lots of info available
      • Some students submitted more than 3 texts
      • Less duplication than with corpus 1
  • 14. Corpus 5: steganography
    • Less common subject
      • Not popular with “average” users
    • Text type: product description
      • Relatively few commercial packages available
      • Fewer texts to choose from
      • more judged “not relevant” (wrong text type)
        • Students couldn’t find texts meeting all the criteria but wanted to submit something so they chose anything at all on the subject of steganography
  • 15. Corpus 6: biometrics
    • Recent research article
      • Many links looked promising, but required paid subscription
      • Free texts were “older” (not state of the art)
      • Relatively few texts submitted
        • But texts were long so word count relatively high
  • 16. Corpus 7: cookies
    • Online technical encyclopedia entry
      • Limited number of comparable texts
      • Texts were quite short
        • Low word count
  • 17. Observations about using Web as a resource for corpora
    • great resource on the whole, but does have some limitations
      • Sometimes overwhelmed by information
        • Must formulate queries carefully to reduce noise
        • Think about criteria beyond subject (e.g. type)
          • “cookies” vs. +cookies +encyclopedia
      • Sometimes underwhelmed by information
        • Try same query using different search engines
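The idea of adding criteria beyond the subject can be illustrated with a small helper that builds a query in the `+term` style shown on the slide, plus a local check that a downloaded text really mentions every required term. Both function names are hypothetical, and the sketch deliberately does not call any search engine.

```python
import re


def build_query(*required_terms: str) -> str:
    """Prefix each term with '+', as in the slide's '+cookies +encyclopedia'."""
    return " ".join(f"+{term.strip()}" for term in required_terms)


def contains_all_terms(text: str, *required_terms: str) -> bool:
    """Check that a candidate text mentions every required term as a whole word."""
    lowered = text.lower()
    return all(
        re.search(r"\b" + re.escape(term.lower()) + r"\b", lowered)
        for term in required_terms
    )


query = build_query("cookies", "encyclopedia")
print(query)

candidates = [
    "This encyclopedia entry explains how browser cookies work.",
    "Our best chocolate chip cookies recipe.",
]
relevant = [t for t in candidates if contains_all_terms(t, "cookies", "encyclopedia")]
print(len(relevant), "of", len(candidates), "candidates mention every term")
```

The same query string can then be pasted into several search engines, in line with the advice to try a query in more than one place when results are thin.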
  • 18. Observations about using the Web (continued)
    • Quality control
      • Anyone can post material – be selective
      • Seen as an ephemeral resource
    • Limited range of text types available
      • General interest: widely available, free
      • Specialized: more limited selection, subscription
    • Nature of the web
      • Good web design is not conducive to easy corpus building: hyperlinked documents are time-consuming to download
      • Multimedia texts not always suitable for text-based corpora
  • 19. Concluding remarks
    • An overall success
      • Corpora were more useful than either the “multipurpose” corpus or the corpora built by individual students
        • General improvement in quality of translations
      • Shift in pedagogical strategy gave students opportunity to become independent learners
        • Reflect on suitability of resources
        • Reflect on issues of text type
      • Students were positive about the experience