
Automatic Key Term Extraction and Summarization from Spoken Course Lectures


National Taiwan University - MS Oral Defense (June 2011)


  1. Speaker: Yun-Nung Chen 陳縕儂; Advisor: Prof. Lin-Shan Lee 李琳山; National Taiwan University. Automatic Key Term Extraction and Summarization from Spoken Course Lectures (課程錄音之自動關鍵用語擷取及摘要).
  2. Introduction. Target: extract key terms and summaries from course lectures.
  3. Introduction. Key terms support indexing and retrieval and capture the relations between key terms and segments of documents; a summary lets users efficiently understand the document. Both are related to document understanding and to the semantics of the document, and both are forms of "information extraction".
  5. Definition. A key term has higher term frequency and carries the core content. Two types:
       • Keyword, e.g. "語音" (speech)
       • Key phrase, e.g. "語言模型" (language model)
  6-8. Automatic Key Term Extraction (system flow): spoken documents in the archive pass through ASR (speech signal → ASR transcriptions), then Branching Entropy, then Feature Extraction, then Learning Methods (1) AdaBoost, (2) Neural Network.
  9. Phrase Identification: branching entropy is first applied to the ASR transcriptions to identify phrases as candidate terms.
  10-11. Key Term Extraction: learning methods then use features of each candidate term to extract the key terms (e.g. "entropy", "acoustic model", ...).
  12-13. Branching Entropy: how to decide the boundary of a phrase? Consider "hidden Markov model": inside the phrase the next word is almost deterministic, while at the edges many different words occur (e.g. "represent", "is", "can" before it; "is", "of", "in" after it).
  14. Inside a phrase, the uncertainty about the next token is therefore low; at the phrase boundary it rises sharply. Branching entropy is defined to detect such possible boundaries.
  15. Definition of right branching entropy: for a token sequence X followed by a token x_i with probability p(x_i | X), the right branching entropy of X is H_r(X) = -Σ_i p(x_i | X) log p(x_i | X).
  16. Decision of the right boundary: the right boundary is located between X and x_i at the point where the right branching entropy increases, i.e. where extending the sequence suddenly makes the next token unpredictable.
  17-19. The same criterion applied in the reverse direction gives the left boundary of a phrase; the branching entropies are computed efficiently using a PAT tree.
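To make the criterion concrete, here is a minimal Python sketch of right-boundary detection. It is an illustration, not the thesis implementation: it counts n-gram continuations directly instead of using a PAT tree, and the function names and the simple "entropy rises" test are assumptions.

```python
from collections import defaultdict
from math import log

def build_continuation_counts(corpus, max_len=4):
    """counts[prefix][w] = how often token w follows the token
    sequence `prefix` in the corpus (prefixes up to max_len tokens)."""
    counts = defaultdict(lambda: defaultdict(int))
    for sent in corpus:
        for i in range(len(sent)):
            for n in range(1, max_len + 1):
                if i + n < len(sent):
                    counts[tuple(sent[i:i + n])][sent[i + n]] += 1
    return counts

def right_branching_entropy(counts, prefix):
    """H_r(X) = -sum_i p(x_i|X) log p(x_i|X) over tokens x_i seen after X."""
    followers = counts[prefix]
    total = sum(followers.values())
    if total == 0:
        return 0.0
    return -sum(c / total * log(c / total) for c in followers.values())

def right_boundary(counts, tokens):
    """Grow the candidate phrase one token at a time and place the
    right boundary where the entropy stops falling and rises again."""
    prev = right_branching_entropy(counts, (tokens[0],))
    for i in range(1, len(tokens)):
        cur = right_branching_entropy(counts, tuple(tokens[:i + 1]))
        if cur > prev:          # next token suddenly unpredictable
            return i + 1        # boundary after tokens[:i + 1]
        prev = cur
    return len(tokens)
```

On a large enough corpus, right_boundary(counts, ("hidden", "Markov", "model", "is")) should place the boundary after "model", matching the example above; a symmetric pass over reversed sentences gives left boundaries.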
  20. Feature Extraction: prosodic, lexical, and semantic features are extracted for each candidate term.
  21-25. Prosodic features, for each candidate term at its first occurrence. Speakers tend to use longer duration, higher pitch, and higher energy to emphasize key terms:
       • Duration (I-IV): normalized duration (max, min, mean, range); the duration of each phone is normalized by the average duration of that phone
       • Pitch (I-IV): F0 (max, min, mean, range)
       • Energy (I-IV): energy (max, min, mean, range)
  26. Lexical features: well-known features for each candidate term:
       • TF: term frequency
       • IDF: inverse document frequency
       • TFIDF: tf × idf
       • PoS: the PoS tag
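A minimal sketch of the first three lexical features, assuming each document is a list of tokens; PoS tagging is omitted, and the log-based IDF is an illustrative variant rather than the thesis's exact formula.

```python
from math import log

def tfidf_features(docs):
    """Compute TF, IDF, and TF*IDF for every term in a document archive.
    docs: list of documents, each a list of tokens."""
    n_docs = len(docs)
    tf, df = {}, {}
    for doc in docs:
        seen = set()
        for t in doc:
            tf[t] = tf.get(t, 0) + 1   # corpus-wide term frequency
            seen.add(t)
        for t in seen:
            df[t] = df.get(t, 0) + 1   # document frequency
    feats = {}
    for t, f in tf.items():
        idf = log(n_docs / df[t])
        feats[t] = {"tf": f, "idf": idf, "tfidf": f * idf}
    return feats
```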
  27. Semantic features: Probabilistic Latent Semantic Analysis (PLSA) models each document D_i as a mixture of latent topics T_k, each of which generates terms t_j, with parameters P(T_k | D_i) and P(t_j | T_k). Key terms tend to focus on limited topics.
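In the slide's notation (documents D_i, latent topics T_k, terms t_j), the standard PLSA decomposition underlying these features is:

```latex
P(t_j \mid D_i) = \sum_{k=1}^{K} P(t_j \mid T_k)\, P(T_k \mid D_i)
```

with the parameters P(t_j | T_k) and P(T_k | D_i) estimated by the EM algorithm.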
  28-32. Latent topic features from PLSA. Because key terms tend to focus on limited topics, a key term's latent topic distribution is peaked while a non-key term's is flat:
       • LTP (I-III): Latent Topic Probability (mean, variance, standard deviation), describing the term's probability distribution over topics
       • LTS (I-III): Latent Topic Significance (mean, variance, standard deviation), a within-topic to out-of-topic frequency ratio
       • LTE: the entropy of the term's latent topic distribution; key terms have lower LTE, non-key terms higher LTE
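A plausible reconstruction of the two quantities named above, consistent with the slide text (within-topic to out-of-topic frequency ratio; entropy of the topic distribution); the exact normalizations in the thesis may differ. With n(t_i, D_j) the count of term t_i in document D_j:

```latex
\mathrm{LTS}(t_i, T_k) =
  \frac{\sum_j n(t_i, D_j)\, P(T_k \mid D_j)}
       {\sum_j n(t_i, D_j)\,\bigl(1 - P(T_k \mid D_j)\bigr)}
\qquad
\mathrm{LTE}(t_i) = -\sum_{k=1}^{K} P(T_k \mid t_i)\,\log P(T_k \mid t_i)
```

A term concentrated in a few topics has high LTS for those topics and low LTE, which is exactly the profile of a key term.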
  33. Key Term Extraction: supervised approaches are applied to these features to extract the key terms.
  34. Learning Methods: Adaptive Boosting (AdaBoost) and a Neural Network automatically adjust the weights of the features to train a classifier.
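A minimal, runnable sketch of this supervised step with scikit-learn stand-ins for the two learners; the synthetic feature matrix merely stands in for the real prosodic/lexical/semantic vectors.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

# Toy stand-in: 500 candidate terms x 20 features
# (duration/pitch/energy stats + TF/IDF/TFIDF/PoS + LTP/LTS/LTE).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))
y = (X[:, 0] + X[:, 5] + rng.normal(size=500) > 1).astype(int)  # 1 = key term

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for clf in (AdaBoostClassifier(n_estimators=100),
            MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000)):
    clf.fit(X_tr, y_tr)
    print(type(clf).__name__, f1_score(y_te, clf.predict(X_te)))
```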
  36. Experiments. Corpus: NTU lecture corpus, Mandarin Chinese with embedded English words, single speaker, 45.2 hours. ASR system: bilingual acoustic model with model adaptation [1] and language model adaptation using random forests [2]. Character accuracy: 78.15% (Mandarin), 53.44% (English), 76.26% (overall). [1] Ching-Feng Yeh, "Bilingual Code-Mixed Acoustic Modeling by Unit Mapping and Model Recovery," Master Thesis, 2011. [2] Chao-Yu Huang, "Language Model Adaptation for Mandarin-English Code-Mixed Lectures Using Word Classes and Random Forests," Master Thesis, 2011.
  37. Reference key terms: annotations from 61 students who had taken the course. If an annotator labeled 150 key terms, each of those terms received a score of 1/150 and all other terms 0; terms were ranked by the sum of the scores from all annotators, and the top N terms were chosen from the list, where N is the average number of key terms (N = 154: 59 key phrases and 95 keywords). Evaluation: 3-fold cross validation.
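The scoring scheme is simple enough to transcribe directly; `annotations` (one list of labeled terms per annotator) and the function name are illustrative.

```python
from collections import Counter

def rank_reference_terms(annotations, n_top):
    """Each annotator's list gives each of their terms a score of
    1/len(list); terms are ranked by the summed score over all
    annotators and the top-N are kept, as described on the slide."""
    scores = Counter()
    for terms in annotations:          # one list per annotator
        for t in terms:
            scores[t] += 1.0 / len(terms)
    return [t for t, _ in scores.most_common(n_top)]
```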
  38. Feature effectiveness (neural network, keywords, ASR transcriptions; F-measure): Pr (prosodic) 20.78, Lx (lexical) 42.86, Sm (semantic) 35.63, Pr+Lx 48.15, Pr+Lx+Sm 56.55. Each feature set alone gives an F1 between 20% and 42%; prosodic and lexical features are additive; all three feature sets are useful.
  39-40. Overall performance on keywords and key phrases (F-measure, ASR / manual transcriptions):
       • N-Gram + TFIDF (baseline): 23.44 / 32.19
       • Branching Entropy + TFIDF: 52.60 / 55.84
       • Branching Entropy + AdaBoost: 57.68 / 62.39
       • Branching Entropy + Neural Network: 62.70 / 67.31
     Branching entropy performs well; supervised learning with the neural network gives the best results; performance on manual transcriptions is only slightly better than on ASR output.
  42. Introduction (summarization). An extractive summary consists of the important sentences in the document. Sentence importance is computed from a statistical measure, a linguistic measure, a confidence score, an N-gram score, and a grammatical structure score; sentences are then ranked by importance and a summary ratio is decided. This work proposes a better statistical measure of a term.
  43. Statistical measure of a term. Baseline: LTE-based statistical measure. Proposed: key-term-based statistical measure, which considers only key terms (t_i ∈ key) and weights each by the LTS of the term. Key terms can represent the core content of the document, and their latent topic probability can be estimated more accurately.
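A schematic of the two measures. The slide gives only the ingredients (an LTE-based baseline; key terms only, weighted by LTS), so the exact combinations below, for a sentence S with term counts n(t_i, S), are assumptions rather than the thesis's formulas:

```latex
I_{\mathrm{LTE}}(S) = \sum_{t_i \in S} \frac{n(t_i, S)}{\mathrm{LTE}(t_i)}
\qquad
I_{\mathrm{key}}(S) = \sum_{t_i \in S \cap \mathrm{Key}} n(t_i, S)\,\overline{\mathrm{LTS}}(t_i)
```

where \overline{\mathrm{LTS}}(t_i) denotes some aggregate of the term's latent topic significance over topics (e.g. its mean). Dividing by LTE rewards terms whose topic distributions are concentrated.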
  44. Importance of a sentence. Original importance: the LTE-based or key-term-based statistical measure. New importance: combine the original importance with the similarity to other sentences, since sentences similar to more sentences should get higher importance.
  45. Random walk on a graph. Idea: sentences similar to more important sentences should themselves be more important. Graph construction: nodes are the sentences of the document; edges are weighted by the similarity between nodes. Node score: interpolate the normalized original score of sentence S_i with the scores propagated from its neighbors according to the edge weights p(j, i), iterating the score of S_i until convergence; nodes connected to more high-scoring nodes receive higher scores.
  46. Topical similarity between sentences: the edge weight sim(S_i, S_j) from sentence S_i to sentence S_j is computed from the latent topic probabilities of the sentences, using Latent Topic Significance.
  47. Scores of sentences: at convergence the iteration satisfies a fixed-point equation; in matrix form its solution is the dominant eigenvector of P′, and the resulting scores are integrated with the original importance.
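A minimal power-iteration sketch of this random walk in Python; the interpolation weight alpha, the convergence test, and the row-normalization of the similarity matrix are illustrative assumptions.

```python
import numpy as np

def random_walk_scores(sim, orig, alpha=0.9, iters=100):
    """Iterate F = (1 - alpha) * r + alpha * P^T F, where P[j, i] is the
    similarity-based transition probability from sentence j to i and r
    is the normalized original importance. Assumes every sentence has
    nonzero similarity to at least one other (rows of `sim` sum > 0)."""
    P = sim / sim.sum(axis=1, keepdims=True)   # row-normalize edge weights
    r = orig / orig.sum()
    F = r.copy()
    for _ in range(iters):
        F_new = (1 - alpha) * r + alpha * (P.T @ F)
        if np.allclose(F_new, F, atol=1e-10):  # converged to the fixed point
            break
        F = F_new
    return F

# Three sentences: S0 and S1 are topically close, S2 is an outlier.
sim = np.array([[1.0, 0.8, 0.1],
                [0.8, 1.0, 0.3],
                [0.1, 0.3, 1.0]])
orig = np.array([0.5, 0.3, 0.2])       # original importance (normalized inside)
print(random_walk_scores(sim, orig))   # S0 and S1 reinforce each other
```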
  48. Automatic Summarization.
  49. Experiments: same corpus and ASR system (NTU lecture corpus). Reference summaries: two human-produced reference summaries for each document, with sentences ranked from "the most important" to "of average importance". Evaluation metrics: ROUGE-1, ROUGE-2, ROUGE-3, and ROUGE-L (longest common subsequence).
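For reference, a simplified single-reference ROUGE-N recall (the actual evaluation presumably used the standard ROUGE toolkit; ROUGE-L additionally requires a longest-common-subsequence computation not shown here):

```python
from collections import Counter

def rouge_n(candidate, reference, n=1):
    """Clipped n-gram overlap divided by the number of n-grams in the
    reference; candidate and reference are lists of tokens."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n])
                       for i in range(len(tokens) - n + 1))
    cand, ref = ngrams(candidate), ngrams(reference)
    overlap = sum(min(c, ref[g]) for g, c in cand.items())
    return overlap / max(sum(ref.values()), 1)
```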
  50. Evaluation on ASR transcriptions (ROUGE-1, ROUGE-2, ROUGE-3, and ROUGE-L at 10%, 20%, and 30% summary ratios, comparing the LTE-based and key-term-based measures): the key-term-based statistical measure is helpful.
  51. Evaluation on ASR transcriptions (same ROUGE panels, adding random walk): the random walk helps both the LTE-based and the key-term-based statistical measures; topical similarity can compensate for recognition errors.
  52. Evaluation on manual transcriptions (same ROUGE panels): the key-term-based statistical measure and the random walk using topical similarity are also useful for summarization.
  54. Conclusions.
       • Automatic key term extraction: performance is improved by identifying phrases with branching entropy and by combining prosodic, lexical, and semantic features.
       • Automatic summarization: performance is improved by the key-term-based statistical measure and by the random walk with topical similarity, which compensates for recognition errors, gives higher scores to sentences topically similar to more important sentences, and considers all sentences in the document.
  55. Published Papers:
       [1] Yun-Nung Chen, Yu Huang, Sheng-Yi Kong, and Lin-Shan Lee, "Automatic Key Term Extraction from Spoken Course Lectures Using Branching Entropy and Prosodic/Semantic Features," in Proceedings of SLT, 2010.
       [2] Yun-Nung Chen, Yu Huang, Ching-Feng Yeh, and Lin-Shan Lee, "Spoken Lecture Summarization by Random Walk over a Graph Constructed with Automatically Extracted Key Terms," in Proceedings of InterSpeech, 2011.
