Adding Sentence Boundaries to Conversational Speech Transcriptions using Noisily Labelled Examples
Tetsuya Nasukawa, IBM Tokyo Research Lab
Diwakar Punjani, IBM India Research Lab
Shourya Roy, IBM India Research Lab
L V Subramaniam, IBM India Research Lab
Hironori Takeuchi, IBM Tokyo Research Lab
Presented by: Shourya Roy
What Are We Trying to Do?
- Automatically identify sentence boundaries in noisy transcriptions of conversational data
- Transcriptions can be manual or automatic (ASR)
- Works without any manual supervision; accuracy improves with manual supervision
- Detects only periods, not commas or semicolons
Importance: One Motivating Example from Real Life
- Huge amounts of telephonic conversational data are produced in domains such as CRM and BPO
- Analyzing this data is important for improving customer satisfaction, agent productivity, and market reputation
- Applying NLP techniques to the transcriptions is an obvious approach
- But transcriptions are noisy and do not contain any punctuation marks
- POS taggers and syntactic parsers perform poorly in the absence of sentence boundaries
- Hence: analyzing transcriptions matters, and sentence boundary detection matters for that analysis
Why This Is Non-Trivial
- Noise in the dataset
- Spontaneous nature of conversation
- Variation in speaking style
- Boundary density varies from call to call; removing the calls with very low boundary density improves the scores by approx. 10%
Existing Solutions
- Not much work on SBD for conversational data
- Existing approaches are based on pause (silence) information
Example: Manual Transcription
64.88 67.59 A: i've i've barely been out of the country. i wouldn't {breath}
65.10 67.16 B: {lipsmack} {breath}
67.64 71.26 A: i think my most memorable trip was when i was in high school.
70.57 71.81 B: {breath} uh-huh.
71.69 74.29 A: i went to %uh ^London and ^Paris.
74.29 75.01 B: %oh that's cool.
74.82 76.80 A: and that's about as exotic as it ever got.
76.75 77.76 B: {breath} was it fun?
77.49 79.95 A: %uh other than that, i haven't been west of ^Texas
80.04 80.44 B: %hm.
81.31 83.63 B: {breath} it looks like you are a east *coaster born and raised.
84.02 86.14 A: yeah. how about yourself? where are you?
86.74 87.38 B: {breath} i'm in ^Philly
87.72 90.78 A: you're in ^Philly, i guess? i wonder if everybody here is in ^Philly? probably. {breath}
88.57 89.01 B: yeah.
90.82 94.68 B: yeah, i think so because it's a ~U ^Penn thing. they probably just did it locally. plus
94.80 96.69 B: %uh are you using an ^Omnipoint phone?
96.82 97.23 A: uh-huh
(Callouts on the slide: timing, speaker, meta info, names of places)
Example: Automatic Transcription
then go to properties ok now once when you go to properties up if you scroll down there that he's having internet protocol ok you have to no i'm sorry just any scroll down that you're having a net firewall so that's no we have to check if there's a check next to it ok if it's not checked you have to get a check that ok and if if you do not so if you are calling you having a check all you have to do is i can check the net firewalls so this ok and you have to go ahead and reboot the system
Summary of Proposed Technique
- From (possibly imprecisely) marked sentence boundaries in conversational data, identify n-grams that are more likely to occur at sentence boundaries than inside sentences
- Mark sentence boundaries before head n-grams and after tail n-grams in test data
Technique
- Preprocess the data: remove pause-filling words, repetitions, and unclear words
- Identify frequent head and tail n-grams from the training data, i.e. n-grams that occur at the beginnings and ends of sentences
- Filter out n-grams that also occur a significant number of times in the middle of sentences, using a threshold on the head/tail-to-middle ratio (see the sketch below)
- Handle interruption and continuation across turns separately, using words that indicate an incomplete turn, e.g. get, and
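The head/tail n-gram selection can be sketched as follows. This is a minimal Python sketch, not the authors' implementation: the function name, the minimum count, and the ratio threshold are illustrative assumptions, and it assumes training sentences arrive as whitespace-tokenized strings with boundaries already (noisily) marked.

```python
from collections import Counter

def head_tail_ngrams(sentences, n=2, min_count=10, ratio_threshold=3.0):
    """Find n-grams that start (head) or end (tail) sentences far more often
    than they occur sentence-internally (hypothetical thresholds)."""
    heads, tails, middles = Counter(), Counter(), Counter()
    for sent in sentences:
        tokens = sent.split()
        if len(tokens) < n:
            continue
        grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        heads[grams[0]] += 1          # n-gram at the sentence beginning
        tails[grams[-1]] += 1         # n-gram at the sentence end
        for g in grams[1:-1]:         # everything else is mid-sentence
            middles[g] += 1

    def keep(counter):
        # Keep frequent n-grams whose boundary-to-middle ratio clears the threshold.
        return {g for g, c in counter.items()
                if c >= min_count and c / (middles[g] + 1) >= ratio_threshold}

    return keep(heads), keep(tails)
```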
Technique (Contd.)
- In the test set, mark a boundary before every head n-gram and after every tail n-gram
- For ASR data that already carries boundaries marked from silence information, add these new sentence boundaries on top
- If a turn does not end with a word from the set of incomplete-turn words, mark a boundary at the end of the turn (see the sketch below)
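A matching sketch for the test-time step, again illustrative rather than the paper's code: heads and tails are the sets produced by the sketch above, and incomplete_words is the hypothetical set of turn-continuation words (e.g. {'get', 'and'}). Silence-based boundaries, when available, would simply be pre-inserted as periods in the input turns.

```python
def mark_boundaries(turns, heads, tails, incomplete_words, n=2):
    """Insert '.' before head n-grams, after tail n-grams, and at the end of
    any turn that does not finish with an incomplete-turn word."""
    marked = []
    for turn in turns:
        tokens = turn.split()
        out = []
        for i, tok in enumerate(tokens):
            if tuple(tokens[i:i + n]) in heads and out and out[-1] != '.':
                out.append('.')        # boundary before a head n-gram
            out.append(tok)
            if i + 1 >= n and tuple(tokens[i - n + 1:i + 1]) in tails:
                out.append('.')        # boundary after a tail n-gram
        if tokens and tokens[-1] not in incomplete_words and out[-1] != '.':
            out.append('.')            # close a completed turn
        marked.append(' '.join(out))
    return marked
```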
Nature of Data
- Manual transcriptions: the Switchboard and Call Home corpora of transcribed phone conversations from LDC
- Automatic transcriptions: ASR-transcribed calls from the IBM helpdesk
  - Punctuation inserted manually
  - Punctuation inserted automatically based on silence
Data Statistics

                   Switchboard   Call Home   IBM Helpdesk
                   (Manual)      (Manual)    (Automatic)
Total Calls            245           120          1720
Turns                37822         28633         56000
Words               266476        219846       1636889
Complete Turns       16597         10345          6429
Total Boundaries     20821         13886         92839
Results
Punctuation insertion for helpdesk data (precision, recall, F1: higher is better; WER: lower is better):

Method                                  Precision  Recall  F1    WER
Only Silence                            0.54       0.28    0.37  0.96
Only Head/Tail                          0.78       0.55    0.65  0.60
Head/Tail + Silence                     0.66       0.72    0.68  0.66
Head/Tail + Silence – FalseBoundaries   0.72       0.69    0.70  0.58
Improvement in PoS Tagging
- PoS tagging accuracy on helpdesk data
- An example of PoS tagging improving with sentence boundary detection: ideally, 'i' should be tagged as a pronoun, and 'yeah' and 'oh' as interjections
Improvement in PoS Tagging (Contd.)
- Top 10 noun phrases extracted from the Switchboard data set
Word Error Rate (WER) = (FP + FN) / (TP + FN), where TP, FP, FN, and TN come from the confusion matrix of predicted vs. actual boundaries.
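For concreteness, the metrics in the results table can be computed from confusion counts as below; a small sketch with made-up counts, chosen only to roughly reproduce the Only Head/Tail row.

```python
def boundary_scores(tp, fp, fn):
    """Precision, recall, F1, and the WER-style error rate defined above."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    wer = (fp + fn) / (tp + fn)   # boundary errors over true boundary count
    return precision, recall, f1, wer

# Illustrative counts (not from the paper):
# boundary_scores(55, 16, 45) -> (~0.77, 0.55, ~0.64, 0.61)
```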
Distribution of "boundaries-at-the-turns" in 50 randomly selected calls from the Switchboard data
Variation in the ratio of completed turns to total turns
Summary
- Sentence boundary detection is a fundamental operation that must be performed before state-of-the-art NLP techniques can be applied to (automatic) transcriptions of conversations
- We proposed a technique to train a sentence boundary detector with minimal manual supervision
- It would be interesting to see how much improvement this brings to an actual extraction task!
Questions?
