Practical Natural Language Processing From Theory to Industrial Applications

Published in: Technology, Education

  1. 1. Practical Natural Language Processing From Theory to Industrial Applications Jaganadh G http://jaganadhg.in jaganadhg@gmail.com Karpagam University Coimbatore 19th March 2012 Jaganadh G Practical Natural Language Processing
  2. 2. About me !! Working in Natural Language Processing, Machine Learning, Data Mining, etc. Passionate about Free and Open Source :-) In my free time I teach Python, speak about FOSS, and blog at http://jaganadhg.in I am a computational linguist, linguist and Indologist, and book reviewer; a software engineer by profession.
  3. 3. Question ?? Have you ever used any Natural Language Processing based tools/services?
  9. 9. What is Natural Language Processing (NLP)? Aim: to build intelligent systems that can interact with human beings the way human beings do. A sub-field of Artificial Intelligence (AI). An inter-disciplinary subject (Language + Linguistics + Statistics + Computer Science + ...). Natural language refers to the languages spoken by people, e.g. English, Japanese, Tamil, Malayalam, as opposed to artificial languages like C++, Java, etc.
  10. 10. Definition: Natural Language Processing is a theoretically motivated range of computational techniques for analyzing and representing naturally occurring texts/speech at one or more levels of linguistic analysis for the purpose of achieving human-like language processing for a range of tasks or applications. Some 10 to 20 years ago NLP was considered an academic discipline. Now concepts from NLP are applied in a variety of computing platforms and services.
  13. 13. Practical NLP? Problem: before going into some theory, can we have some fun practical problems to solve? Picture Courtesy: http://twitpic.com/1y21qm/full
  23. 23. Practical NLP Problem: Tweet-a-Toddy receives thousands of tweets per day: tweets requesting home delivery, tweets about the quality of products, and tweets related to enquiries. They require the following things to be automated: identify the tweet category, process home-delivery requests, and evaluate quality-related tweets. How? How do we find a solution for Tweet-a-Toddy?
  31. 31. Solution ?? Any solutions? Some thoughts: Text Classification; Entity Identification; Information Extraction; Sentiment Analysis; Parsing, grammar ...; Regex (Regular Expressions).
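
As a rough illustration of the regex idea from the slide above, the following Python sketch routes a tweet to one of the three Tweet-a-Toddy categories by counting keyword matches. The category names, the keyword patterns, and the `categorize` function are assumptions made for this example, not something taken from the talk; a real system would more likely use a trained text classifier.

```python
import re

# Hypothetical keyword patterns, one per tweet category named on the slide.
CATEGORY_PATTERNS = {
    "home_delivery": re.compile(r"\b(?:deliver(?:y|ed)?|send|order|bring)\b", re.IGNORECASE),
    "quality": re.compile(r"\b(?:quality|taste[sd]?|fresh|stale)\b", re.IGNORECASE),
    "enquiry": re.compile(r"\b(?:price|cost|timings?|where|when|available)\b", re.IGNORECASE),
}


def categorize(tweet: str) -> str:
    """Return the category with the most keyword matches, or 'other' if none match."""
    scores = {
        name: sum(1 for _ in pattern.finditer(tweet))
        for name, pattern in CATEGORY_PATTERNS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "other"


if __name__ == "__main__":
    for text in ["Please deliver two bottles to my home",
                 "The toddy tasted stale today",
                 "What is the price?"]:
        print(categorize(text), "<-", text)
```

Keyword matching like this is brittle (it ignores negation, spelling variants, and other languages), which is why the slide also lists text classification and sentiment analysis as candidate techniques.
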
  35. 35. Another Practical Question: Everybody might have used the spell checker available in word processing systems like OpenOffice.org or Microsoft Word. Any guess on how to develop a spell checker system? Solutions: a word list; the structure of words; dynamic programming (edit distance).
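
The word-list and edit-distance ideas on this slide can be sketched as follows. This is a minimal Python illustration, assuming a tiny hypothetical word list: `edit_distance` is the standard Levenshtein dynamic programme, and `suggest` is an illustrative helper, not the method used by any particular word processor.

```python
def edit_distance(source: str, target: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    # previous[j] holds the distance between the processed prefix of source
    # and the first j characters of target.
    previous = list(range(len(target) + 1))
    for i, source_char in enumerate(source, start=1):
        current = [i]
        for j, target_char in enumerate(target, start=1):
            substitution = previous[j - 1] + (source_char != target_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def suggest(word: str, word_list: set, max_distance: int = 2) -> list:
    """Return the word itself if known, else nearby words ordered by distance."""
    if word in word_list:
        return [word]
    scored = sorted((edit_distance(word, candidate), candidate) for candidate in word_list)
    return [candidate for distance, candidate in scored if distance <= max_distance]


# Hypothetical word list, for illustration only.
WORD_LIST = {"language", "linguistics", "processing", "natural"}

if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))  # 3
    print(suggest("langauge", WORD_LIST))
```

Comparing the input against every entry is fine for a toy list; a real checker would need an index or a generate-and-filter strategy to stay fast on a full dictionary.
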
  39. 39. Another Practical Question ... Context-Sensitive Spell-checking: identifying and suggesting the spelling of words based on context. How ?? Solutions: statistical models; word-category-based suggestions.
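
One simple reading of "statistical models" here is a bigram count model that picks, among easily confused words, the one that fits its neighbours best. The sketch below is only an assumption-laden illustration: the four-sentence corpus, the `piece`/`peace` confusion set, and the function names are all made up for this example.

```python
from collections import Counter

# Tiny hypothetical training corpus; a real model would use millions of words.
CORPUS = (
    "i want a piece of cake . "
    "we want peace in the world . "
    "she ate a piece of bread . "
    "they signed a peace treaty ."
).split()

# Hypothetical sets of words that are commonly confused with each other.
CONFUSION_SETS = [{"piece", "peace"}]

BIGRAM_COUNTS = Counter(zip(CORPUS, CORPUS[1:]))


def context_score(previous_word: str, candidate: str, next_word: str) -> int:
    """How often the candidate was seen next to these neighbours in the corpus."""
    return BIGRAM_COUNTS[(previous_word, candidate)] + BIGRAM_COUNTS[(candidate, next_word)]


def correct(tokens: list) -> list:
    """Replace a confusable word only when another candidate fits the context better."""
    corrected = list(tokens)
    for i, word in enumerate(tokens):
        previous_word = tokens[i - 1] if i > 0 else "."
        next_word = tokens[i + 1] if i + 1 < len(tokens) else "."
        for confusion_set in CONFUSION_SETS:
            if word not in confusion_set:
                continue
            best_word = word
            best_score = context_score(previous_word, word, next_word)
            for candidate in sorted(confusion_set):
                score = context_score(previous_word, candidate, next_word)
                if score > best_score:
                    best_word, best_score = candidate, score
            corrected[i] = best_word
    return corrected


if __name__ == "__main__":
    print(" ".join(correct("a peace of cake".split())))
```

The original word is kept on ties, so the sketch never changes a word without evidence from the corpus.
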
  40. 40. Can Machines Translate ?? Answer !!!
  45. 45. Why NLP? Because “Information is Power !!!” Every day a vast amount of text and speech data is being produced. Internet == at least 40 million pages. Picture Courtesy: http://soundsgood.in/wikipediafat print book/
  51. 51. History: Second World War !!! Machine Translation. Now: the most promising imperfect technology; moves from lab to industry to layman.
  53. 53. NLP Really Hard to Achieve? NLP deals with human languages. Human language is dynamic and mysterious !!! Communication in Human Language
  54. 54. NLP Really Hard to Achieve? Levels of Knowledge encoding in Language Data
  57. 57. Tasks in NLP: Broad Areas: Text Processing; Speech Processing.
  65. 65. Major tasks in Text Processing: Word Level Analysis: Morphological Synthesis, Part of Speech Tagging, Stemming, Lemmatization; Sentence Level Analysis - Syntactical Parsing; Discourse Analysis - Semantic Processing.
  73. 73. Morphology: the branch of linguistics that studies word structures. To a computer program a word is: ??? Morphological analysis can be explained as the process of analyzing words to identify their constituents. Computational Analysis of Morphology: Morphological Analysis; Morphological Generation; Stemming; Lemmatization.
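
To make "analyzing words to identify their constituents" concrete, here is a deliberately naive suffix-stripping sketch in Python. The suffix list and the `split_morphemes` function are assumptions for illustration and cover only a few English endings; real stemmers and morphological analysers use far richer rules and lexicons.

```python
# Hypothetical, very small list of English inflectional suffixes.
SUFFIXES = ("ing", "ed", "es", "s")


def split_morphemes(word: str, min_stem_length: int = 3) -> tuple:
    """Split a word into (stem, suffix); the suffix is '' when nothing is stripped."""
    lowered = word.lower()
    # Try longer suffixes first so that 'ing' wins over 's'.
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        stem_length = len(lowered) - len(suffix)
        if lowered.endswith(suffix) and stem_length >= min_stem_length:
            return lowered[:stem_length], suffix
    return lowered, ""


if __name__ == "__main__":
    for example in ["processing", "words", "is"]:
        print(example, "->", split_morphemes(example))
```

The output is a crude stem rather than a dictionary form: stripping "es" from "languages" leaves "languag", which is the kind of gap between stemming and lemmatization that the slide hints at.
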
  74. 74. Practical Question from Morphology: the approximate number of word forms that can be derived from the word “maram”
  76. 76. Parts of Speech Tagging: POS tagging is the process of marking up the words in a text (corpus) as corresponding to a particular part of speech, based on both their definition and their context. Example: Ram goes to school. Ram/NNP goes/VBZ to/TO school/NN ./. Words are ambiguous !!!! e.g. book, cricket, bank
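
The tagged example on this slide, and the remark that words like "book" are ambiguous, can be illustrated with a toy lexicon-plus-context tagger. Everything in this Python sketch (the lexicon, the two context rules, the default tag, and the function names) is an assumption for illustration; practical taggers are trained statistically on annotated corpora.

```python
# Hypothetical lexicon mapping lower-cased words to their possible Penn Treebank tags.
LEXICON = {
    "ram": ["NNP"], "goes": ["VBZ"], "reads": ["VBZ"], "to": ["TO"],
    "school": ["NN"], "ticket": ["NN"], "a": ["DT"], "the": ["DT"],
    "book": ["NN", "VB"],  # ambiguous: noun or verb
    ".": ["."],
}


def tag(tokens: list) -> list:
    """Tag each token, using the previous tag to resolve ambiguous words."""
    tagged = []
    for token in tokens:
        candidates = LEXICON.get(token.lower(), ["NN"])  # unknown words default to noun
        previous_tag = tagged[-1][1] if tagged else None
        choice = candidates[0]
        if len(candidates) > 1:
            if previous_tag == "DT" and "NN" in candidates:
                choice = "NN"  # after a determiner, prefer the noun reading
            elif previous_tag in (None, "TO") and "VB" in candidates:
                choice = "VB"  # sentence-initially or after 'to', prefer the verb reading
        tagged.append((token, choice))
    return tagged


def format_tagged(tagged: list) -> str:
    """Render tagged tokens in the word/TAG style used on the slide."""
    return " ".join(f"{word}/{pos}" for word, pos in tagged)


if __name__ == "__main__":
    print(format_tagged(tag("Ram goes to school .".split())))
    print(format_tagged(tag("Book a ticket .".split())))
    print(format_tagged(tag("Ram reads the book .".split())))
```

The same word "book" comes out as VB in the second sentence and NN in the third, which is the definition-plus-context behaviour the slide describes.
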
  78. 78. Syntactical Parsing: In computer science and linguistics, parsing, or, more formally, syntactic analysis, is the process of analyzing a text, made of a sequence of tokens (for example, words), to determine its grammatical structure with respect to a given (more or less) formal grammar. Sentences are ambiguous !!!!
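
The claim that sentences are ambiguous can be checked mechanically by counting how many parse trees a grammar allows. The Python sketch below uses a CYK-style chart over a small hand-written grammar in Chomsky normal form; the grammar, the lexicon, and the example sentence are assumptions chosen for illustration, not material from the talk.

```python
from collections import defaultdict

# Hypothetical toy grammar: (parent, left child, right child).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # prepositional phrase attaches to the verb phrase
    ("NP", "NP", "PP"),   # prepositional phrase attaches to the noun phrase
    ("NP", "Det", "N"),
    ("PP", "P", "NP"),
]

# Hypothetical lexicon: word -> categories.
LEXICAL_RULES = {
    "I": ["NP"], "saw": ["V"], "the": ["Det"],
    "man": ["N"], "telescope": ["N"], "with": ["P"],
}


def count_parses(tokens: list, start_symbol: str = "S") -> int:
    """Count the distinct parse trees for tokens under the toy grammar (CYK chart)."""
    n = len(tokens)
    chart = defaultdict(int)  # (begin, end, symbol) -> number of trees
    for i, token in enumerate(tokens):
        for symbol in LEXICAL_RULES.get(token, []):
            chart[(i, i + 1, symbol)] += 1
    for width in range(2, n + 1):
        for begin in range(0, n - width + 1):
            end = begin + width
            for split in range(begin + 1, end):
                for parent, left, right in BINARY_RULES:
                    chart[(begin, end, parent)] += (
                        chart[(begin, split, left)] * chart[(split, end, right)]
                    )
    return chart[(0, n, start_symbol)]


if __name__ == "__main__":
    print(count_parses("I saw the man with the telescope".split()))  # 2
```

The two trees correspond to the two readings: the seeing was done with the telescope, or the man had the telescope.
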
  80. 80. Semantics: the study of meaning and its structure. Word meaning is ambiguous !!!! E.g. marriage
  85. 85. Where can I apply these techniques? Machine Translation Systems; Search Engines; Spell-checkers; Grammar Checkers; ...
  89. 89. Other Interesting Tasks: Named Entity Identification; Information Extraction; Information Retrieval; Text Classification and Clustering.
  91. 91. Speech Processing: Two Major Areas: Text to Speech; Speech Recognition. Practical Applications: IVR; Technology for Visually Challenged People; Mobile Phones; Speech Enabled Web; Vehicle Mounted GPS Navigator.
  100. 100. Commercial NLP Applications: What Industry Looks For: Components of Word Processors; Machine Translation Systems; Custom Search Systems; Information Extraction; Entity Identification; Text Summarization; Speech Systems; Question Answering Systems.
  101. 101. Future of NLP: Future!!! Semantics-oriented technologies
  102. 102. NLP in other domains: Bio-Medical; Legal; Forensic Science; Advertisement; Education; Politics; E-governance; Business Development; Marketing; and wherever we use language !!!
  103. 103. Natural Language Processing in India: Academic Institutions: IIT Kanpur, Kharagpur, Bombay; IIIT Hyderabad; IISc Bangalore; AU-KBC Chennai; Amrita University, Ettimadai, Coimbatore; IIITMK, Trivandrum; Central University, Hyderabad; JNU, Delhi; Tamil University, Thanjavur.
  104. 104. Natural Language Processing in India: Industry: Microsoft; Yahoo!; AOL; 365Media Pvt. Ltd.; Inside View; Thaazza; AIAIO Labs.
  105. 105. Questions ??
  106. 106. References: Daniel Jurafsky, James H. Martin, Speech and Language Processing, 2nd Edition. U.S. Tiwary, Tanveer Siddiqui, Natural Language Processing and Information Retrieval.
  107. 107. Finally
