Semi-supervised classification for natural language processing

This presentation describes semi-supervised learning and its applications to natural language processing tasks.


Published in: Education, Technology

Transcript

  • 1. SEMI-SUPERVISED CLASSIFICATION FOR NATURAL LANGUAGE PROCESSING
  • 2. PRESENTATION AT A GLANCE
  • 3. SEMI-SUPERVISED LEARNING
  • 4. SEMI-SUPERVISED LEARNING PROBLEMS: (1) learn from labeled data (inductive learning); (2) apply the learned model to unlabeled data to label them (transductive learning); (3) if confident in the labeling, learn from (1) and (2); (4) apply the learned model to unseen unlabeled data
  • 5. SEMI-SUPERVISED LEARNING PROBLEMS
  • 6. SCOPES OF SEMI-SUPERVISED LEARNING
  • 7. HOW DOES SEMI-SUPERVISED CLASSIFICATION WORK?
  • 8. TYPES OF SEMI-SUPERVISED LEARNING
  • 9. GENERATIVE VS DISCRIMINATIVE MODELS
  • 10. GENERATIVE VS DISCRIMINATIVE MODELS
  • 11. GENERATIVE VS DISCRIMINATIVE MODELS
  • 12. GENERATIVE VS DISCRIMINATIVE MODELS: Discriminative models learn the conditional probability P(y|x) to determine class boundaries (e.g., transductive SVM, graph-based methods) and cannot be used without considering P(x). Generative models learn the joint probability P(x,y), so for any given y we can generate its x (e.g., EM algorithm, self-learning), but are difficult to use when the P(x|y) models are inadequate.
  • 13. GENERATIVE VS DISCRIMINATIVE MODELS
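The generative approach named on slide 12 (EM with a mixture model) can be sketched minimally. This is an illustrative assumption, not code from the presentation: a two-class, 1-D Gaussian mixture with unit variance and equal priors, where labeled points keep their class and unlabeled points receive soft responsibilities in each E-step.

```python
# Semi-supervised EM sketch: labeled points contribute hard counts,
# unlabeled points contribute soft (responsibility-weighted) counts.
import math

def em_fit(labeled, unlabeled, iters=20):
    # Initialize class means from the labeled data only.
    mu = {}
    for y in ("0", "1"):
        pts = [x for x, lab in labeled if lab == y]
        mu[y] = sum(pts) / len(pts)
    for _ in range(iters):
        # E-step: responsibility of class "1" for each unlabeled point
        # (unit-variance Gaussians, equal priors).
        resp = []
        for x in unlabeled:
            p0 = math.exp(-0.5 * (x - mu["0"]) ** 2)
            p1 = math.exp(-0.5 * (x - mu["1"]) ** 2)
            resp.append(p1 / (p0 + p1))
        # M-step: re-estimate means from hard labels + soft responsibilities.
        for y, r in (("0", [1 - w for w in resp]), ("1", resp)):
            num = sum(x for x, lab in labeled if lab == y) \
                + sum(w * x for w, x in zip(r, unlabeled))
            den = sum(1 for _, lab in labeled if lab == y) + sum(r)
            mu[y] = num / den
    return mu

# Toy run: one labeled point per class, four unlabeled points.
mu = em_fit([(0.0, "0"), (4.0, "1")], [0.5, 0.2, 3.8, 4.3])
```

With only one labeled point per class, the unlabeled points pull each class mean toward its cluster, which is exactly the benefit (and the risk, if P(x|y) is misspecified) that slide 12 alludes to.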
  • 14. IS THERE A FREE LUNCH?
  • 15. IS THERE A FREE LUNCH?
  • 16. IS THERE A FREE LUNCH?
  • 17. SELF-TRAINING
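The self-training procedure of slide 17 follows the loop from slide 4: train on labeled data, pseudo-label the unlabeled points the model is most confident about, absorb them, and repeat. A minimal pure-Python sketch, where the 1-D nearest-centroid classifier, the margin-based confidence, and the toy data are illustrative assumptions:

```python
def centroids(labeled):
    """Mean feature value per class."""
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(cents, x):
    """Return (label, confidence); confidence is the margin between
    the two nearest centroids."""
    dists = sorted((abs(x - c), y) for y, c in cents.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin

def self_train(labeled, unlabeled, threshold=1.0, rounds=5):
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(rounds):
        cents = centroids(labeled)
        confident, rest = [], []
        for x in unlabeled:
            y, margin = predict(cents, x)
            (confident if margin >= threshold else rest).append((x, y))
        if not confident:
            break
        labeled += confident                 # trust confident pseudo-labels
        unlabeled = [x for x, _ in rest]     # keep the ambiguous points
    return centroids(labeled)

# Toy run: two 1-D classes seeded near 0 and 10.
cents = self_train([(0.0, "neg"), (10.0, "pos")],
                   [1.0, 2.0, 8.5, 9.0, 5.2])
print(predict(cents, 1.5)[0])  # -> neg
```

Note that the ambiguous point 5.2 is never absorbed: a confidence threshold is what keeps self-training from reinforcing its own mistakes, which is the hazard the slides raise under "is there a free lunch?".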
  • 18. CO-TRAINING
  • 19. CO-TRAINING
  • 20. CO-TRAINING
  • 21. CO-TRAINING
  • 22. CO-TRAINING: CAVEATS
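Co-training (slides 18 to 22) assumes each example has two conditionally independent views, with one classifier per view; each round, each classifier pseudo-labels the unlabeled example it is most confident about and both retrain on the enlarged pool. A toy sketch under assumed 1-D views and nearest-centroid classifiers (not code from the slides):

```python
def centroids(pairs):
    sums, counts = {}, {}
    for x, y in pairs:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(cents, x):
    d = sorted((abs(x - c), y) for y, c in cents.items())
    margin = d[1][0] - d[0][0] if len(d) > 1 else float("inf")
    return d[0][1], margin

def co_train(labeled, unlabeled, rounds=3):
    """labeled: [((view0, view1), label)]; unlabeled: [(view0, view1)]."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not unlabeled:
            break
        for view in (0, 1):                  # each view's classifier in turn
            cents = centroids((ex[view], y) for ex, y in labeled)
            scored = [(predict(cents, ex[view]), ex) for ex in unlabeled]
            (label, margin), best = max(scored, key=lambda s: s[0][1])
            labeled.append((best, label))    # teach the OTHER view's model
            unlabeled.remove(best)
            if not unlabeled:
                break
    return labeled

# Toy run: two 2-view classes seeded near (0, 0) and (10, 10).
result = co_train([((0.0, 0.0), "a"), ((10.0, 10.0), "b")],
                  [(1.0, 2.0), (9.0, 8.0)])
```

The caveat slide's point survives even in this toy: if the two views are not informative on their own (or not independent), each classifier just confirms the other's errors.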
  • 23. ACTIVE LEARNING
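Active learning (slide 23) inverts the self-training idea: instead of trusting its most confident predictions, the learner asks a human to label the example it is least confident about. A sketch of uncertainty (margin) sampling, with the centroid model and toy data again being assumptions for illustration:

```python
def centroids(labeled):
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def margin(cents, x):
    """Distance gap between the two nearest class centroids."""
    d = sorted(abs(x - c) for c in cents.values())
    return d[1] - d[0] if len(d) > 1 else float("inf")

def query(labeled, pool):
    """Pick the pool example the current model is least sure about."""
    cents = centroids(labeled)
    return min(pool, key=lambda x: margin(cents, x))

# The point near the midpoint of the two classes is the useful query.
print(query([(0.0, "neg"), (10.0, "pos")], [1.0, 5.1, 9.0]))  # -> 5.1
```

The chosen point (5.1) is exactly the one self-training would have skipped or mislabeled, which is why the two strategies complement each other.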
  • 24. WHICH METHOD SHOULD I USE?
  • 25. WHICH METHOD SHOULD I USE?
  • 26. SEMI-SUPERVISED CLASSIFICATION FOR NLP
  • 27. EFFECTIVE SELF-TRAINING FOR PARSING
  • 28. INTRODUCTION
  • 29. METHODS
  • 30. DATASETS
  • 31. RESULTS
  • 32. LIMITATIONS
  • 33. SEMI-SUPERVISED SPAM FILTERING: DOES IT WORK?
  • 34. INTRODUCTION
  • 35. BACKGROUND
  • 36. BACKGROUND
  • 37. BACKGROUND
  • 38. METHODS AND MATERIALS
  • 39. RESULTS: DELAYED FEEDBACK VS CROSS-USER
  • 40. RESULTS: CROSS-CORPUS
  • 41. EXTRACTIVE SUMMARIZATION USING SUPERVISED AND SEMI-SUPERVISED LEARNING
  • 42. INTRODUCTION
  • 43. METHOD
  • 44. DATASETS
  • 45. RESULTS: FEATURE SELECTION (human-summary ROUGE-1 score: 0.422)
  • 46. RESULTS: EFFECT OF UNLABELED DATA (more labeled data produced a better F-score)
  • 47. RESULTS: SUPERVISED VS SEMI-SUPERVISED
  • 48. RESULTS: EFFECT OF SUMMARY LENGTH
  • 49. LIMITATIONS
  • 50. SEMI-SUPERVISED CLASSIFICATION FOR EXTRACTING PROTEIN INTERACTION SENTENCES USING DEPENDENCY PARSING
  • 51. INTRODUCTION
  • 52. INTRODUCTION
  • 53. METHOD
  • 54. DATASETS
  • 55. RESULTS: AIMED DATASET
  • 56. RESULTS: CB DATASET
  • 57. RESULTS: EFFECT OF TRAINING DATA SIZE (AIMED)
  • 58. RESULTS: EFFECT OF TRAINING DATA SIZE (CB)
  • 59. LIMITATIONS
  • 60. HOW MUCH UNLABELED DATA IS USED?
  • 61. CONCLUSIONS
  • 62. CONCLUSIONS
  • 63. CONCLUSIONS