Practical Natural Language Processing


Transcript

  • 1. Practical Natural Language Processing: From Theory to Industrial Applications. Jaganadh G, http://jaganadhg.in, jaganadhg@gmail.com. Central University of Kerala, Kasargod, 22nd Feb 2013. (Jaganadh G, Practical Natural Language Processing)
  • 2. About me: Working in Natural Language Processing, Machine Learning, Data Mining, etc. Passionate about Free and Open Source :-) When I get free time, I teach Python, speak about FOSS, and blog at http://jaganadhg.in. I am a computational linguist, linguist and Indologist; book reviewer; software engineer by profession.
  • 3. Question: Have you ever used any Natural Language Processing based tools or services?
  • 9. What is Natural Language Processing (NLP)? Aim: to build intelligent systems that can interact with human beings the way human beings do. A sub-field of Artificial Intelligence (AI). Inter-disciplinary subject (Language + Linguistics + Statistics + Computer Science + ...). Natural language: refers to the language spoken by people, e.g. English, Japanese, Tamil, Malayalam, as opposed to artificial languages like C++, Java, etc.
  • 10. Definition: Natural Language Processing is a theoretically motivated range of computational techniques for analyzing and representing naturally occurring texts/speech at one or more levels of linguistic analysis for the purpose of achieving human-like language processing for a range of tasks or applications. Until some 10 to 20 years ago, NLP was considered an academic discipline. Now concepts from NLP are applied in a variety of computing platforms and services.
  • 13. Practical NLP? Problem: Before going into some theory, can we try some fun practical problems to solve? Picture courtesy: http://twitpic.com/1y21qm/full
  • 23. Practical NLP: Problem. Tweet-a-Toddy receives thousands of tweets per day: tweets requesting home delivery; tweets about quality of products; tweets related to enquiries. They require the following things to be automated: identify tweet category; process home-delivery requests; evaluate quality-related tweets. How can we find a solution for Tweet-a-Toddy?
  • 31. Solution? Any solutions? Some thoughts: Text Classification; Entity Identification; Information Extraction; Sentiment Analysis; Parsing, grammar, ...; Regex (Regular Expressions).
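The simplest of the ideas listed above, regular expressions, can already give a rough first pass at "identify tweet category". The following is only a minimal sketch: the category names and keyword patterns are hypothetical and were chosen for illustration, not taken from the talk. A real system would more likely train a text classifier on labelled tweets.

```python
import re

# Hypothetical keyword patterns for the three tweet categories on the slide.
# These word lists are illustrative assumptions, not part of the original talk.
CATEGORY_PATTERNS = {
    "home_delivery": re.compile(r"\b(deliver(y|ed)?|send|bring|order)\b", re.IGNORECASE),
    "quality": re.compile(r"\b(taste[sd]?|quality|fresh|stale|awful)\b", re.IGNORECASE),
    "enquiry": re.compile(r"\b(price|cost|timings?|available|where|when)\b|\?", re.IGNORECASE),
}


def categorize_tweet(text):
    """Return the first category whose pattern matches, else 'unknown'."""
    for category, pattern in CATEGORY_PATTERNS.items():
        if pattern.search(text):
            return category
    return "unknown"


if __name__ == "__main__":
    for tweet in ["Please deliver two bottles to my home",
                  "The toddy was stale today",
                  "What is the price per litre?"]:
        print(categorize_tweet(tweet), "<-", tweet)
```

Categories are checked in dictionary order, so a tweet that matches several patterns gets the first one. That is a deliberate simplification and one reason rule-based approaches tend to be replaced by statistical classification as the data grows.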
  • 35. Another Practical Question: Everybody might have used the spell checker available in word processing systems like OpenOffice.org or Microsoft Word. Any guess on how to develop a spell checker system? Solutions: Word List; Structure of words; Dynamic Programming (Edit Distance).
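The "Word List" and "Dynamic Programming (Edit Distance)" ideas can be combined into a small sketch: compute the Levenshtein distance between a misspelled word and every entry of a word list, then suggest the closest entries. The word list and the `max_distance` threshold below are hypothetical values chosen for illustration.

```python
def edit_distance(source, target):
    """Levenshtein distance by dynamic programming.

    Insertions, deletions and substitutions each cost 1.
    Only two rows of the table are kept in memory at a time.
    """
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(previous[j] + 1,          # delete s_char
                               current[j - 1] + 1,       # insert t_char
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


# Hypothetical word list; a real spell checker would load a full dictionary.
WORD_LIST = ["grammar", "language", "parsing", "processing", "natural"]


def suggest(word, max_distance=2):
    """Return dictionary words within max_distance edits, closest first."""
    if word in WORD_LIST:
        return [word]
    scored = sorted((edit_distance(word, entry), entry) for entry in WORD_LIST)
    return [entry for distance, entry in scored if distance <= max_distance]


if __name__ == "__main__":
    print(suggest("gammer"))
```

Scanning the whole word list is fine for a sketch. Larger dictionaries usually need an index or a candidate-generation step so that not every entry is compared.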
  • 39. Another Practical Question: Context Sensitive Spell-checking. Identifying and suggesting spelling of words based on context. How? Solutions: Statistical Models; Word category based suggestions.
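One way to read "Statistical Models" here is a bigram model: for a word that belongs to a set of easily confused words, pick the candidate that co-occurs most often with its neighbours in a corpus. The sketch below assumes a tiny made-up corpus and a single hypothetical confusion set. It illustrates the idea and is not the method used in the talk.

```python
from collections import Counter

# Tiny hypothetical training corpus; real systems count over large text collections.
CORPUS = ("i want a piece of cake . we want peace on earth . "
          "a piece of paper . peace and quiet .")

# Hypothetical confusion set: correctly spelled words that are easily mixed up.
CONFUSION_SETS = [{"piece", "peace"}]


def bigram_counts(text):
    """Count adjacent word pairs in whitespace-tokenized text."""
    tokens = text.split()
    return Counter(zip(tokens, tokens[1:]))


BIGRAMS = bigram_counts(CORPUS)


def best_in_context(previous_word, word, next_word):
    """Pick the confusion-set member that fits best between its neighbours."""
    for confusion_set in CONFUSION_SETS:
        if word in confusion_set:
            def score(candidate):
                return (BIGRAMS[(previous_word, candidate)]
                        + BIGRAMS[(candidate, next_word)])
            # sorted() makes tie-breaking deterministic.
            return max(sorted(confusion_set), key=score)
    return word


if __name__ == "__main__":
    print(best_in_context("a", "peace", "of"))
```

Unlike the plain edit-distance checker, this catches real-word errors: "peace" is a valid word, but "a peace of" is far less likely than "a piece of" under the counts.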
  • 40. Can Machines Translate? Answer!
  • 45. Why NLP? Because "Information is Power!" Every day a vast amount of text and speech data is being produced. Internet == at least 40 million pages. Picture courtesy: http://soundsgood.in/wikipediafat print book/
  • 51. History: Second World War! Machine Translation. Now: most promising imperfect technology; moves from lab to industry to layman.
  • 53. NLP: Really Hard to Achieve? NLP deals with human languages. Human language is dynamic and mysterious! Communication in human language.
  • 54. NLP: Really Hard to Achieve? Levels of knowledge encoding in language data.
  • 57. Tasks in NLP. Broad areas: Text Processing; Speech Processing.
  • 65. Major tasks in Text Processing: Word Level Analysis; Morphological Synthesis; Part of Speech Tagging; Stemming; Lemmatization; Sentence Level Analysis - Syntactical Parsing; Discourse Analysis - Semantic Processing.
  • 73. Morphology: The branch of linguistics that studies word structures. To a computer program a word is: ??? Morphological analysis can be explained as the process of analyzing words to identify their constituents. Computational analysis of morphology: Morphological Analysis; Morphological Generation; Stemming; Lemmatization.
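The difference between stemming and lemmatization can be sketched in a few lines. The stemmer below simply strips a suffix from a short hypothetical list, while the lemmatizer looks words up in a small hypothetical table of dictionary forms. Both the suffix list and the lemma table are illustrative assumptions. Production stemmers such as the Porter stemmer use many more rules.

```python
# Hypothetical suffix list, checked in this order (longer suffixes first).
SUFFIXES = ("ing", "ed", "es", "s")
MIN_STEM_LENGTH = 3  # do not strip if the remaining stem would be too short

# Hypothetical lemma table mapping inflected forms to dictionary forms.
LEMMAS = {"went": "go", "goes": "go", "better": "good", "mice": "mouse"}


def stem(word):
    """Crude suffix stripping: the result need not be a real word."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LENGTH:
            return word[: -len(suffix)]
    return word


def lemmatize(word):
    """Dictionary lookup: returns a real base form when the word is known."""
    word = word.lower()
    return LEMMAS.get(word, word)


if __name__ == "__main__":
    for token in ["parsing", "walked", "boxes", "went"]:
        print(token, "-> stem:", stem(token), "| lemma:", lemmatize(token))
```

Stemming is fast and needs no dictionary, but produces truncated forms such as "pars". Lemmatization returns valid words, but only for forms the table (or a morphological analyzer) knows about.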
  • 74. Practical Question from Morphology: Approximate number of word forms that can be derived from the word "maram"?
  • 76. Parts of Speech Tagging: POS tagging is the process of marking up the words in a text (corpus) as corresponding to a particular part of speech, based on both their definition and their context. Example: "Ram goes to school." is tagged as Ram/NNP goes/VBZ to/TO school/NN ./. Words are ambiguous! E.g. book, cricket, bank.
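A toy tagger makes the "definition plus context" point concrete. In this sketch each word has a list of possible tags in a small hypothetical lexicon (the "definition"), and a hypothetical rule table prefers a tag based on the previous tag (the "context"). Real taggers learn both from annotated corpora. Nothing below is taken from the talk beyond the example sentence and its tags.

```python
# Hypothetical lexicon: each word maps to its possible tags, default tag first.
LEXICON = {
    "ram": ["NNP"], "goes": ["VBZ"], "to": ["TO"], "school": ["NN"], ".": ["."],
    "i": ["PRP"], "a": ["DT"], "the": ["DT"], "read": ["VBP"], "want": ["VBP"],
    "book": ["NN", "VB"], "ticket": ["NN"],
}

# Hypothetical context rule: tag preferred right after a given previous tag.
PREFERRED_AFTER = {"DT": "NN", "TO": "VB"}


def tag(tokens):
    """Tag tokens left to right using the lexicon plus one-tag context."""
    tagged = []
    previous_tag = None
    for token in tokens:
        candidates = LEXICON.get(token.lower(), ["NN"])  # unknown words: guess noun
        preferred = PREFERRED_AFTER.get(previous_tag)
        chosen = preferred if preferred in candidates else candidates[0]
        tagged.append((token, chosen))
        previous_tag = chosen
    return tagged


def format_tagged(tagged):
    """Render (word, tag) pairs in the word/TAG notation used on the slide."""
    return " ".join(f"{word}/{pos}" for word, pos in tagged)


if __name__ == "__main__":
    print(format_tagged(tag("Ram goes to school .".split())))
    print(format_tagged(tag("I want to book a ticket .".split())))
```

The ambiguous word "book" comes out as a verb after "to" and as a noun after "the", which is the kind of decision a statistical tagger makes from learned probabilities rather than hand-written rules.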
  • 78. Syntactical Parsing: In computer science and linguistics, parsing, or, more formally, syntactic analysis, is the process of analyzing a text, made of a sequence of tokens (for example, words), to determine its grammatical structure with respect to a given (more or less) formal grammar. Sentences are ambiguous!
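Sentence ambiguity can be demonstrated by counting how many parse trees a grammar allows for one sentence. The sketch below uses the CKY chart algorithm over a tiny hypothetical grammar in Chomsky normal form. The grammar, the lexicon, and the example sentence are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form: (parent, left child, right child).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # prepositional phrase modifies the verb phrase
    ("NP", "NP", "PP"),   # prepositional phrase modifies the noun phrase
    ("NP", "Det", "N"),
    ("PP", "P", "NP"),
]

# Hypothetical lexicon: word -> possible categories.
LEXICON = {
    "i": ["NP"], "saw": ["V"], "the": ["Det"],
    "man": ["N"], "telescope": ["N"], "with": ["P"],
}


def count_parses(tokens, start="S"):
    """Count distinct parse trees for tokens using a CKY chart."""
    n = len(tokens)
    # chart[(i, j)][symbol] = number of ways symbol can span tokens[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, token in enumerate(tokens):
        for symbol in LEXICON.get(token.lower(), []):
            chart[(i, i + 1)][symbol] += 1
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):  # split point between left and right child
                left, right = chart[(i, k)], chart[(k, j)]
                for parent, left_sym, right_sym in BINARY_RULES:
                    if left_sym in left and right_sym in right:
                        chart[(i, j)][parent] += left[left_sym] * right[right_sym]
    return chart[(0, n)].get(start, 0)


if __name__ == "__main__":
    print(count_parses("I saw the man with the telescope".split()))
```

Under this grammar the sentence has two analyses: the telescope either belongs to the seeing (attached to the verb phrase) or to the man (attached to the noun phrase). A parser has to choose between them, usually with probabilities learned from a treebank.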
  • 80. Semantics: Study of meaning and its structure. Word meaning is ambiguous! E.g. marriage.
  • 85. Where can I apply these techniques? Machine Translation Systems; Search Engine; Spell-checker; Grammar Checker; ...
  • 89. Other Interesting Tasks: Named Entity Identification; Information Extraction; Information Retrieval; Text Classification and Clustering.
  • 91. Speech Processing. Two major areas: Text to Speech; Speech Recognition. Practical applications: IVR; Technology for Visually Challenged People; Mobile Phones; Speech Enabled Web; Vehicle Mounted GPS Navigator.
  • 100. Commercial NLP Applications. What industry looks for: Components of Word Processors; Machine Translation Systems; Custom Search Systems; Information Extraction; Entity Identification; Text Summarization; Speech Systems; Question Answering Systems.
  • 101. Future of NLP: Semantics-oriented technologies.
  • 102. NLP in other domains: Bio-Medical; Legal; Forensic Science; Advertisement; Education; Politics; E-governance; Business Development; Marketing; and wherever we use language!
  • 103. Natural Language Processing in India. Academic institutions: IIT Kanpur, Kharagpur, Bombay; IIIT Hyderabad; IISc Bangalore; AU-KBC Chennai; Amritha University, Ettimadai, Coimbatore; IIITMK, Trivandrum; Central University, Hyderabad; JNU, Delhi; Tamil University, Thanjore.
  • 104. Natural Language Processing in India. Industry: Microsoft; Yahoo!; AOL; 365Media Pvt. Ltd.; Inside View; Thaazza; AIAIO Labs.
  • 105. Questions?
  • 106. References: Daniel Jurafsky, James H. Martin, Speech and Language Processing, 2nd Edition. U.S. Tiwary, Tanveer Siddiqui, Natural Language Processing and Information Retrieval.
  • 107. Finally