Crowdsourcing Transcription Beyond Mechanical Turk
 

Talk at AAAI Human Computation 2013 Workshop on Scaling Speech, Language Understanding and Dialogue through Crowdsourcing (November 9, 2013): http://faculty.washington.edu/mtjalve/HCOMP2013.Workshop.html



    Presentation Transcript

    • Crowdsourcing Transcription Beyond Mechanical Turk Haofeng Zhou, Denys Baskov, Matthew Lease School of Information, University of Texas at Austin @mattlease ml@utexas.edu
    • Roadmap • Natural Speech: Opportunity & Challenge • Strengths & Limitations of AMT research – e.g. AMT-based Transcription • Qualitative review of 8 transcription providers • Quantitative evaluation of 4 providers • Observations & Contributions 2
    • The Rise of Stored Natural Speech • Conversational speech is the most ubiquitous form of human communication on the planet • We can now capture & store our conversations in new ways & at massive scale • But… need effective technology to search massive conversational speech archives • Oard: “Unlocking the Potential of the Spoken Word” 3
    • Oral History as a Testbed 4
    • oh i'll you know are yeah yeah yeah yeah yeah yeah yeah the very why don't we start with you saying anything in your about grandparents great grandparents well as a small child i remember only one of my grandfathers and his wife his second wife he was selling flour and the type of business it was he didn't even have a store he just a few sacks of different flour and the entrance of an apartment building and people would pass by everyday and buy a chela but two killers of flour we have to remember related times were there was no already baked bread so people had to baked her own bread all the time for some strange reason i do remember fresh rolls where everyone would buy every day but not the bread so that was the business that's how he made a living where was this was the name of the town it wasn't shammay dish he ours is we be and why i as i know in southern poland and alisa are close 5
    • Perfect ASR: Raw Transcription I never left new York before I didn't know anything else so some fellow I knew he said I have a friend that lives in Tucson Arizona so I went to the map looked it up I never heard of Tucson he says I'll write him a letter and when you go there you could stay with him so he did he wrote a letter and his friend he was a dentist he invited me to come over there and spend a week with him 6
    • Rich Transcription [so I didn't] * I never left New York before. I didn't know anything else. So some fellow I knew [mentioned that] <uh> * he said I have a friend that lives [in Arizona] * in Tucson Arizona. So I went to the map looked it up. <um> I never heard of Tucson. <uh and anyhow> He says <well> I'll write him a letter and when you go there you could <uh> stay with him. So he did. He wrote a letter. And his friend, he was a dentist. He invited me to come over there and spend a week with him. 7
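For scoring, a rich transcript like the one above is typically reduced to a plain word sequence first. The sketch below is a minimal, hypothetical normalizer assuming the markup conventions visible in the slide's example (restarts in square brackets followed by "*", fillers and asides in angle brackets); it is not the authors' cleaning pipeline.

```python
import re

def strip_rich_markup(text: str) -> str:
    """Reduce a rich transcript to a plain word sequence for scoring.

    Assumes the markup seen in the slide's example: restarts in square
    brackets followed by '*', and fillers/asides in angle brackets.
    """
    text = re.sub(r"\[[^\]]*\]\s*\*?", " ", text)  # drop restarts like "[so I didn't] *"
    text = re.sub(r"<[^>]*>", " ", text)           # drop fillers like "<uh>" or "<um>"
    text = re.sub(r"[^\w'\s]", " ", text)          # drop remaining punctuation
    return " ".join(text.lower().split())          # lowercase, collapse whitespace

print(strip_rich_markup("[so I didn't] * I never left New York before. <uh> He says <well> hello."))
# -> "i never left new york before he says hello"
```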
    • Transcription Research via AMT: Audhkhasi et al. (2011); Evanini et al. (2010); Gruenstein et al. (2009); Lee et al. (2011); Marge et al. (2010); Novotney et al. (2010); Parent et al. (2010); Williams et al. (2011) 8
    • Why Eytan Adar hates MTurk Research (CHI 2011 CHC Workshop) • Overly narrow focus on MTurk – Identify general vs. platform-specific problems – Academic vs. industrial problems • How much should we focus on “...writing the user’s manual for MTurk ... struggl[ing] against the limits of the platform...”? 10
    • HCOMP 2013 Panel Anand Kulkarni: “How do we dramatically reduce the complexity of getting work done with the crowd?” Greg Little: “How can we post a task and with 98% confidence know we’ll get a quality result?” 11
    • Beyond AMT: An Analysis of Crowd Work Platforms • Vakharia & Lease, arXiv online 2013 • Near-exclusive research focus on AMT risks its particular vagaries and limitations overly shaping our understanding of crowd work and the research questions and directions being pursued. • We present a cross-platform content analysis of seven crowd work platforms. 12
    • Transcription Providers 13
    • Qualitative Analysis • Base Price • Accuracy • Transcript Formats • Time stamps • Speaker Identification/Changes • Verbatim • Turnaround Time • Difficult Audio Surcharge 14
    • Experiment • 10-minute segments from 6 interviews – USC-SFI MALACH English corpus (LDC2012S05) • 4 low-cost service providers – CastingWords (CW) – Transcription Hub (TH) – 1-888-Type-It-Up (VerbalFusion, VF) – oDesk: 3 workers • Format Issues & Data Cleaning • Aligned with revised CMU Sphinx code 15
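The slide states that transcripts were aligned with revised CMU Sphinx code. As a point of reference only, the sketch below computes WER with a standard Levenshtein word alignment; it does not reproduce the authors' Sphinx-based tooling.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via Levenshtein alignment over whitespace tokens.

    Illustrative only; the study itself used revised CMU Sphinx code,
    which this sketch does not reproduce.
    """
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return 100.0 * d[len(ref)][len(hyp)] / max(len(ref), 1)

# One deletion against six reference words: roughly 16.7% WER.
print(wer("i never left new york before", "i never left york before"))
```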
    • Word Error Rate (WER) vs. Cost
      – CastingWords (CW) ($60/hr per audio): 00017 31.356/9.707/0.154; 00038 33.198/17.005/0.881; 00042 23.273/14.885/0.822; 00058 28.624/15.976/0.814; 00740 16.833/11.643/1.996; 13078 26.452/14.129/2.119; Avg. 26.623/13.891/1.131; Accuracy/$ 1.435
      – Transcription Hub (TH) ($45/hr per audio): 00017 30.233/8.450/0.155; 00038 34.628/18.405/1.022; 00042 29.129/18.308/1.221; 00058 33.433/18.399/1.197; 00740 18.071/9.036/2.495; 13078 28.874/14.588/2.116; Avg. 29.061/14.531/1.368; Accuracy/$ 1.899
      – 1-888-Type-It-Up (VF) (avg. $125/hr per audio): 00017 28.874/9.524/0.151; 00038 26.819/11.051/1.011; 00042 18.543/11.175/0.662; 00058 23.921/11.658/0.454; 00740 12.559/6.212/2.296; 13078 24.072/10.977/2.120; Avg. 22.465/10.099/1.116; Accuracy/$ 0.719
      – oDesk Worker1 (OD1) ($5.56/hr per work): 00017 31.144/10.510/0.155; 00038 29.787/16.884/1.098; Avg. 30.465/13.697/0.626; Accuracy/$ 15.522
      – oDesk Worker2 (OD2) ($11.11/hr per work): 00740 20.066/12.226/2.591; 13078 28.495/14.973/2.597; Avg. 24.281/13.600/2.594; Accuracy/$ 7.777
      – oDesk Worker3 (OD3) ($13.89/hr per work): 00042 34.415/22.545/1.623; 00058 37.983/19.228/1.734; Avg. 36.199/20.886/1.678; Accuracy/$ 5.696
      – Avg. by Interview: 00017 30.402/9.548/0.154; 00038 31.108/15.836/1.003; 00042 26.340/16.728/1.082; 00058 30.990/16.315/1.050; 00740 16.883/9.779/2.345; 13078 26.973/13.667/2.238; Overall 28.183/14.451/1.419
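The Accuracy/$ ratios in the last column are consistent with dividing average word accuracy by the provider's hourly price, treating the middle of the three reported figures as the WER: e.g. (100 - 13.891) / 60 is roughly 1.435 for CastingWords. A minimal sketch under that assumption:

```python
def accuracy_per_dollar(avg_wer_pct: float, price_per_hour: float) -> float:
    """Word accuracy (100 - WER, in percent) divided by the hourly price.

    Assumption: this appears to match the slide's Accuracy/$ column,
    e.g. CastingWords: (100 - 13.891) / 60 ~= 1.435.
    """
    return (100.0 - avg_wer_pct) / price_per_hour

print(round(accuracy_per_dollar(13.891, 60.0), 3))   # 1.435 (CW)
print(round(accuracy_per_dollar(14.531, 45.0), 3))   # 1.899 (TH)
print(round(accuracy_per_dollar(10.099, 125.0), 3))  # 0.719 (VF)
```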
    • Errors Distribution in WER [stacked bar chart: error counts (0 to 3000) per provider (CW, OD, TH, VF), by category: Misc, Name, Alignment, PostError, Spelling, Revision, Repetition, Filler, RefError, Partial, Background] 17
    • Hidden Costs • Management costs beyond Base Price – Crowdsourcing studies rarely discuss other costs (other costs dwarf crowd costs…) • CW, TH, and VF prices are higher than oDesk's • But… oDesk's price rate includes no management cost; additional effort was needed to – communicate with workers and negotiate price – clarify requirements and monitor work – take on the risk of low quality or late/no delivery 18
    • Contributions • Snapshot in time of current crowdsourcing transcription providers & offerings beyond AMT – Those looking for alternatives today – Retrospective studies • Quantitative WER vs. cost for spontaneous speech transcription across providers • Discussion of tradeoffs among quality, cost, risk & effort in crowdsourcing transcription 19
    • Thank You! Matt Lease ml@utexas.edu Slides: www.slideshare.net/mattlease ir.ischool.utexas.edu