Practical Natural Language Processing

Published in: Technology

  1. Practical Natural Language Processing: From Theory to Industrial Applications
     Jaganadh G
     http://jaganadhg.in
     jaganadhg@gmail.com
     IIIT-MK, Thiruvananthapuram
     6th April 2013
     Jaganadh G Practical Natural Language Processing
  2. About me
     • Working as a Data Scientist at a Fortune 500 company
     • Working in Natural Language Processing, Machine Learning, Data Mining, etc.
     • Passionate about Free and Open Source software :-)
     • When I get free time, I teach Python, speak about FOSS, and blog at http://jaganadhg.in
     • I am a computational linguist, linguist, and Indologist, and a book reviewer
     • Software engineer by profession
  3. Past to Future
  4. Question: Have you ever used any Natural Language Processing based tools or services?
  10. What is Natural Language Processing (NLP)?
      Aim: to build intelligent systems that can interact with human beings the way human beings do.
      • A sub-field of Artificial Intelligence (AI)
      • An interdisciplinary subject (Language + Linguistics + Statistics + Computer Science + ...)
      Natural language refers to the language spoken by people, e.g. English, Japanese, Tamil, Malayalam, as opposed to artificial languages like C++, Java, etc.
  11. Definition: Natural Language Processing
      Natural Language Processing is a theoretically motivated range of computational techniques for analyzing and representing naturally occurring texts/speech at one or more levels of linguistic analysis for the purpose of achieving human-like language processing for a range of tasks or applications.
      • Some 10 to 20 years ago, NLP was regarded as an academic discipline.
      • Now concepts from NLP are applied in a variety of computing platforms and services.
  14. Practical NLP? Problem
      Before going into theory, can we try some fun practical problems?
      Picture courtesy: http://twitpic.com/1y21qm/full
  24. Practical NLP: Problem
      Tweet-a-Toddy receives thousands of tweets per day:
      • Tweets requesting home delivery
      • Tweets about the quality of products
      • Tweets related to enquiries
      They require the following to be automated:
      • Identify the tweet category
      • Process home-delivery requests
      • Evaluate quality-related tweets
      How can we find a solution for Tweet-a-Toddy?
  32. Solution? Any solutions? Some thoughts:
      • Text Classification
      • Entity Identification
      • Information Extraction
      • Sentiment Analysis
      • Parsing, grammar ...
      • Regex (Regular Expressions)
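As a rough illustration of how the regex and text classification ideas could apply to the Tweet-a-Toddy problem, here is a minimal Python sketch. The category names, keyword patterns, and the `classify_tweet` and `extract_order` helpers are all hypothetical and are not from the slides; keyword counting merely stands in for a trained text classifier.

```python
import re

# Hypothetical keyword patterns, one per tweet category named in the problem.
CATEGORY_PATTERNS = {
    "home_delivery": re.compile(r"\b(?:deliver(?:y|ed)?|home|address|order)\b", re.I),
    "quality": re.compile(r"\b(?:quality|taste|fresh|stale|bad|good)\b", re.I),
    "enquiry": re.compile(r"\b(?:price|cost|when|where|open|available)\b|\?", re.I),
}

# Hypothetical pattern for pulling a quantity and unit out of a delivery request.
ORDER_PATTERN = re.compile(r"(\d+)\s*(bottles?|litres?|liters?)", re.I)


def classify_tweet(text):
    """Return the category with the most keyword matches, or 'other' if none match."""
    scores = {
        name: sum(1 for _ in pattern.finditer(text))
        for name, pattern in CATEGORY_PATTERNS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "other"


def extract_order(text):
    """Return (quantity, unit) from a delivery request, or None if no quantity is found."""
    match = ORDER_PATTERN.search(text)
    if match is None:
        return None
    return int(match.group(1)), match.group(2).lower()


if __name__ == "__main__":
    tweet = "Please deliver 2 bottles to my home address"
    print(classify_tweet(tweet), extract_order(tweet))
```

A keyword table like this is brittle; in practice the category decision would more likely come from a classifier trained on labelled tweets, with regular expressions kept for extracting structured fields such as quantities.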
  36. Another Practical Question
      Everybody might have used the spell checker available in word processing systems like OpenOffice.org or Microsoft Word. Any guess on how to develop a spell checker system?
      Solutions:
      • Word list
      • Structure of words
      • Dynamic programming (edit distance)
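The word list and edit distance ideas can be combined into a very small spell checker. The sketch below is an assumption-laden illustration: the toy `WORD_LIST`, the `suggest` helper, and the `max_distance` threshold are invented here, and the distance used is plain Levenshtein distance with unit costs.

```python
def edit_distance(a, b):
    """Levenshtein distance by dynamic programming (insert, delete, substitute all cost 1)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(
                prev[j] + 1,         # delete char_a
                curr[j - 1] + 1,     # insert char_b
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


# Toy word list (hypothetical); a real checker would load a full dictionary.
WORD_LIST = ["language", "processing", "natural", "grammar", "parsing"]


def suggest(word, word_list=WORD_LIST, max_distance=2):
    """Return the word itself if known, else known words within max_distance edits, closest first."""
    if word in word_list:
        return [word]
    scored = sorted((edit_distance(word, known), known) for known in word_list)
    return [known for distance, known in scored if distance <= max_distance]


if __name__ == "__main__":
    print(suggest("gammer"))
```

Comparing against every entry is fine for a toy list; a larger dictionary would usually need candidate generation or indexing so that only plausible words are scored.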
  40. Another Practical Question: Context-Sensitive Spell-Checking
      Identifying and suggesting the spelling of words based on context. How?
      Solutions:
      • Statistical models
      • Word category based suggestions
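One simple statistical model for context-sensitive spelling is to compare bigram counts for each member of a confusion set. The following sketch assumes a tiny invented corpus and a single hypothetical confusion set ("piece" / "peace"); the names `CORPUS`, `CONFUSION_SETS`, and `correct` are illustrative only.

```python
from collections import Counter

# Tiny invented corpus; a real system would count bigrams over a large text collection.
CORPUS = (
    "i would like a piece of cake . "
    "we want peace in the world . "
    "she ate a piece of bread . "
    "they signed a peace treaty ."
).split()

BIGRAMS = Counter(zip(CORPUS, CORPUS[1:]))

# Hypothetical confusion set: correctly spelled words that are easily mixed up.
CONFUSION_SETS = [{"piece", "peace"}]


def context_score(prev_word, word, next_word):
    """Count how often word was seen after prev_word plus how often before next_word."""
    return BIGRAMS[(prev_word, word)] + BIGRAMS[(word, next_word)]


def correct(tokens):
    """Replace a confusable word only when an alternative fits the context strictly better."""
    result = list(tokens)
    for i, token in enumerate(tokens):
        for confusion_set in CONFUSION_SETS:
            if token not in confusion_set:
                continue
            prev_word = tokens[i - 1] if i > 0 else "<s>"
            next_word = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
            best = token
            for candidate in sorted(confusion_set):
                if context_score(prev_word, candidate, next_word) > context_score(prev_word, best, next_word):
                    best = candidate
            result[i] = best
    return result


if __name__ == "__main__":
    print(" ".join(correct("a peace of cake".split())))
```

Keeping the original word on ties is a deliberate choice in this sketch: with sparse counts, changing a word without evidence would introduce errors rather than fix them.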
  41. Can Machines Translate? Answer!
  46. Why NLP?
      Because "Information is Power!"
      • Every day a vast amount of text and speech data is being produced
      • Internet == at least 40 million pages
      Picture courtesy: http://soundsgood.in/wikipediafat print book/
  52. History
      Second World War!
      • Machine Translation
      Now:
      • Most promising imperfect technology
      • Moves from lab to industry to layman
  54. NLP: Really Hard to Achieve?
      • NLP deals with human languages
      • Human language is dynamic and mysterious!
      • Communication in human language
  55. NLP: Really Hard to Achieve?
      Levels of knowledge encoding in language data
  58. Tasks in NLP: Broad Areas
      • Text Processing
      • Speech Processing
  66. Major Tasks in Text Processing
      • Word Level Analysis: Morphological Synthesis, Part of Speech Tagging, Stemming, Lemmatization
      • Sentence Level Analysis: Syntactical Parsing
      • Discourse Analysis: Semantic Processing
  74. Morphology
      The branch of linguistics that studies word structure.
      • To a computer program, a word is: ???
      • Morphological analysis can be explained as the process of analyzing words to identify their constituents
      Computational Analysis of Morphology:
      • Morphological Analysis
      • Morphological Generation
      • Stemming
      • Lemmatization
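To make the difference between stemming and lemmatization concrete, here is a minimal sketch for English. The suffix list, the `min_stem_len` threshold, and the small `LEMMA_TABLE` are assumptions made for illustration; this is a naive suffix stripper, not the Porter algorithm or any library's implementation.

```python
# Suffixes to strip, checked in this order so that longer endings win over shorter ones.
SUFFIXES = ["ing", "ed", "es", "s"]

# Hypothetical lookup table of irregular forms and their dictionary forms.
LEMMA_TABLE = {"went": "go", "mice": "mouse", "better": "good"}


def stem(word, min_stem_len=3):
    """Strip the first matching suffix, provided at least min_stem_len characters remain."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


def lemmatize(word):
    """Return the dictionary form from the lookup table, or the lowercased word if unknown."""
    word = word.lower()
    return LEMMA_TABLE.get(word, word)


if __name__ == "__main__":
    for token in ["parsing", "tagged", "boxes", "went"]:
        print(token, stem(token), lemmatize(token))
```

The stemmer can return strings that are not words ("pars" for "parsing"), while the lemmatizer only returns dictionary forms but needs linguistic resources; that trade-off is the usual reason to pick one over the other.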
  75. Practical Question from Morphology
      Approximately how many word forms can be derived from the word "maram"?
  77. Parts of Speech Tagging
      POS tagging is the process of marking up the words in a text (corpus) as corresponding to a particular part of speech, based on both their definition and their context.
      Ram goes to school.
      Ram/NNP goes/VBZ to/TO school/NN ./.
      Words are ambiguous! E.g. book, cricket, bank
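The interplay of definition and context can be sketched with a toy tagger: a lexicon supplies candidate tags, and the previous tag decides between them. The `LEXICON` entries and the two context rules below are hypothetical and cover only the examples shown; real taggers learn such preferences from annotated corpora.

```python
# Hypothetical lexicon mapping lowercase words to candidate Penn Treebank style tags.
LEXICON = {
    "ram": ["NNP"],
    "goes": ["VBZ"],
    "to": ["TO"],
    "school": ["NN"],
    "the": ["DT"],
    "a": ["DT"],
    "i": ["PRP"],
    "want": ["VBP"],
    "read": ["VBP"],
    "ticket": ["NN"],
    "book": ["NN", "VB"],  # ambiguous: noun ("the book") or verb ("to book")
    ".": ["."],
}


def tag(tokens):
    """Tag each token, using the previous tag to choose among ambiguous candidates."""
    tagged = []
    prev_tag = None
    for token in tokens:
        candidates = LEXICON.get(token.lower(), ["NN"])  # unknown words default to NN
        if len(candidates) == 1:
            choice = candidates[0]
        elif prev_tag == "TO" and "VB" in candidates:
            choice = "VB"  # after "to", prefer the verb reading
        elif prev_tag == "DT" and "NN" in candidates:
            choice = "NN"  # after a determiner, prefer the noun reading
        else:
            choice = candidates[0]
        tagged.append((token, choice))
        prev_tag = choice
    return tagged


if __name__ == "__main__":
    print(" ".join(f"{word}/{pos}" for word, pos in tag("Ram goes to school .".split())))
```

With this lexicon, "book" is tagged VB in "I want to book a ticket" and NN in "I read the book", which is the kind of ambiguity the slide points at.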
  79. Syntactical Parsing
      In computer science and linguistics, parsing, or, more formally, syntactic analysis, is the process of analyzing a text, made of a sequence of tokens (for example, words), to determine its grammatical structure with respect to a given (more or less) formal grammar.
      Sentences are ambiguous!
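Sentence-level ambiguity can be demonstrated by counting how many parse trees a grammar assigns to one sentence. The sketch below uses a CYK-style chart over a tiny grammar in Chomsky normal form; the grammar, the lexicon, and the example sentence are assumptions chosen for illustration and do not come from the slides.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form: (parent, left child, right child).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),  # prepositional phrase attaches to the verb phrase
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),  # prepositional phrase attaches to the noun phrase
    ("PP", "P", "NP"),
]

# Hypothetical lexicon: word -> categories.
LEXICAL_RULES = {
    "i": ["NP"],
    "saw": ["V"],
    "the": ["Det"],
    "man": ["N"],
    "telescope": ["N"],
    "with": ["P"],
}


def count_parses(tokens, start="S"):
    """Count distinct parse trees rooted in start, using a CYK chart of counts."""
    n = len(tokens)
    # chart[i][j][A] = number of ways category A derives tokens[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, token in enumerate(tokens):
        for category in LEXICAL_RULES.get(token.lower(), []):
            chart[i][i + 1][category] += 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point between left and right child
                for parent, left, right in BINARY_RULES:
                    chart[i][j][parent] += chart[i][k][left] * chart[k][j][right]
    return chart[0][n][start]


if __name__ == "__main__":
    print(count_parses("I saw the man with the telescope".split()))
```

Under this grammar the example sentence receives two parses, one where the telescope is the instrument of seeing and one where it belongs to the man, while "I saw the man" receives exactly one.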
  81. Semantics
      Study of meaning and its structure.
      Word meaning is ambiguous! E.g. marriage
  86. Where can I apply these techniques?
      • Machine Translation Systems
      • Search Engine
      • Spell-checker
      • Grammar Checker
      • ...
  90. Other Interesting Tasks
      • Named Entity Identification
      • Information Extraction
      • Information Retrieval
      • Text Classification and Clustering
  92. Speech Processing
      Two major areas:
      • Text to Speech
      • Speech Recognition
      Practical applications:
      • IVR
      • Technology for Visually Challenged People
      • Mobile Phones
      • Speech Enabled Web
      • Vehicle Mounted GPS Navigator
  101. Commercial NLP Applications: What Industry Looks For
       • Components of Word Processors
       • Machine Translation Systems
       • Custom Search Systems
       • Information Extraction
       • Entity Identification
       • Text Summarization
       • Speech Systems
       • Question Answering Systems
  102. Future of NLP
       Semantics-oriented technologies
  103. NLP in Other Domains
       • Bio-Medical
       • Legal
       • Forensic Science
       • Advertisement
       • Education
       • Politics
       • E-governance
       • Business Development
       • Marketing
       • and wherever we use language!
  104. Natural Language Processing in India: Academic Institutions
       • IIT Kanpur, Kharagpur, Bombay
       • IIIT Hyderabad
       • IISc Bangalore
       • AU-KBC Chennai
       • Amrita University, Ettimadai, Coimbatore
       • IIITMK, Trivandrum
       • Central University, Hyderabad
       • JNU, Delhi
       • Tamil University, Thanjavur
  105. Natural Language Processing in India: Industry
       • Microsoft
       • Yahoo!
       • AOL
       • 365Media Pvt. Ltd.
       • Inside View
       • Thaazza
       • AIAIO Labs
  106. Questions?
  107. References
       • Daniel Jurafsky, James H. Martin, Speech and Language Processing, 2nd Edition.
       • U.S. Tiwary, Tanveer Siddiqui, Natural Language Processing and Information Retrieval.
  108. Finally