Getting Started with NLTK
    An Introduction to NLTK


           Sreejith S
     srssreejith@gmail.com
          @tweet2sree

     FOSSMeet 2011,NIC Calicut


       06 February 2011




        Sreejith S   Getting Started with NLTK
Just a word about me !!




     Working in Natural Language Processing (NLP), Machine Learning
     and Text Mining
     Active member of ilugcbe, http://ilugcbe.techstud.org
     Works for 365Media Pvt. Ltd., Coimbatore, India
     @tweet2sree, srssreejith@gmail.com




Introduction - NLP



     Natural Language Processing
     NLP is an inter-disciplinary subject
         Computer Science
         Linguistics
         Statistics etc...
     NLP is a subfield of Artificial Intelligence
     NLP: any kind of computer manipulation of natural language
     It is a rapidly developing field of study
     Everyday applications of NLP
         Handwriting recognition, machine translation, question-answering
         systems, spell checkers, grammar checkers, etc.




Natural Language Toolkit (NLTK)



     A collection of Python programs, modules, data sets and tutorials to
     support research and development in Natural Language Processing
     (NLP)
     Written by Steven Bird, Edward Loper and Ewan Klein
     NLTK is
         Free and Open source
         Easy to use
         Modular
         Well documented
         Simple and extensible
     http://www.nltk.org




What You Will Learn




     How simple programs can help you manipulate and analyze language
     data, and how to write these programs
     How key concepts from NLP and linguistics are used to describe and
     analyze language
     How data structures and algorithms are used in NLP
     How language data is stored in standard formats, and how data can
     be used to evaluate the performance of NLP techniques




Installation of NLTK


     Make sure that Python 2.4, 2.5 or 2.6 is available on your system
     Install the Python Tkinter package
     Install NumPy, Matplotlib, Prover9, MaltParser and MegaM
     Download NLTK and install it
         If you are installing NLTK from source, download
         http://nltk.googlecode.com/files/nltk-2.0b9.zip
         Unzip it; this creates the directory nltk-2.0b9
         Open a terminal, cd into this directory and, as superuser, run
         python setup.py install
     To install the data
         Start the Python interpreter
         >>> import nltk
         >>> nltk.download()
     Now you are ready to play with NLTK!



NLTK Modules


  NLTK Modules                       Functionality

  nltk.corpus                        Corpora
  nltk.tokenize, nltk.stem           Tokenizers, stemmers
  nltk.collocations                  t-test, chi-squared, mutual information
  nltk.tag                           n-gram, backoff, Brill, HMM, TnT
  nltk.classify, nltk.cluster        Decision tree, naive Bayes, k-means
  nltk.chunk                         Regex, n-gram, named entity
  nltk.parse                         Parsing
  nltk.sem, nltk.inference           Semantic interpretation
  nltk.metrics                       Evaluation metrics
  nltk.probability                   Probability and estimation
  nltk.app, nltk.chat                Applications




Let us start the game

     To access the data used in the examples from the book
         Start the Python interpreter
     Some basic exercises from the book
         Concordance: every occurrence of a word, shown with its context
              >>> from nltk.book import *
              >>> text1.concordance("monstrous")
         Similar: words that appear in similar contexts
              >>> text1.similar("monstrous")
         Dispersion plot: positional information
              >>> text4.dispersion_plot(["citizens", "democracy",
              ...     "freedom", "duties", "America"])
              >>> text4.dispersion_plot(["and", "to", "of",
              ...     "with", "the"])
         What does the second plot show, and why?
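To make concrete what a concordance and a dispersion plot compute, here is a minimal sketch in plain Python (standard library only, written for Python 3). It is not NLTK's implementation; the function names `concordance_lines` and `offsets` and the sample sentence are made up for illustration.

```python
def concordance_lines(tokens, word, width=3):
    """Return each occurrence of `word` with up to `width` tokens of context per side."""
    target = word.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            lines.append(" ".join(left + [tok] + right))
    return lines


def offsets(tokens, word):
    """Token positions of `word`: the x-coordinates a dispersion plot would mark."""
    target = word.lower()
    return [i for i, tok in enumerate(tokens) if tok.lower() == target]


# Hypothetical sample text, already split into tokens.
sample = "the whale was a monstrous size and the sea was a monstrous grey".split()

for line in concordance_lines(sample, "monstrous"):
    print(line)
print(offsets(sample, "the"))
```

A very common word such as "the" produces offsets spread over the whole text, which is why a dispersion plot of function words looks like a nearly solid band.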



Continued...


     Some basic exercises from the book
         Generate random text in the style of a text
              >>> text3.generate()
         Count the tokens (words and punctuation) in a text
              >>> len(text3)
         List the distinct words, sorted in dictionary order
              >>> sorted(set(text3))
         Count the occurrences of a particular word in a text
              >>> text3.count("and")
         What percentage of the text is taken up by a specific word?
         (100.0 forces floating-point division in Python 2)
              >>> 100.0 * text3.count("and") / len(text3)
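The same counts can be tried without the book data. The sketch below assumes a plain Python list of tokens (the sample sentence and the helper name `percentage` are illustrative only) and uses the same built-ins that the NLTK `Text` examples above rely on.

```python
# Hypothetical sample: a short token list standing in for text3.
tokens = "in the beginning God created the heaven and the earth".split()

# Total number of tokens.
total = len(tokens)

# Distinct tokens, sorted; uppercase letters sort before lowercase ones.
vocab = sorted(set(tokens))


def percentage(tokens, word):
    """Share of the text taken up by `word`, as a float percentage."""
    return 100.0 * tokens.count(word) / len(tokens)


print(total)
print(vocab)
print(tokens.count("the"))
print(percentage(tokens, "the"))
```

Note that `len` counts every token, repeats included, while `len(set(...))` counts the vocabulary.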




Collocation & Bigram

  Collocation
  A collocation is a sequence of words that occur together unusually often,
  e.g. "red wine" or "strong tea"
  But "strong computer" is not a collocation
       >>> text4.collocations()

  Bigrams
  A list of adjacent word pairs
       >>> from nltk import bigrams
       >>> text = "sreejith is talking about NLTK"
       >>> wordlist = text.split()
       >>> bigrams(wordlist)
  What will happen if I do this instead?
       >>> bigrams(text)
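The answer to the question follows from how bigrams are built: each item of a sequence is paired with its successor, and a Python string is a sequence of characters. A minimal sketch, assuming nothing beyond the standard library (the name `pairwise_bigrams` is made up here and is not NLTK's function):

```python
def pairwise_bigrams(sequence):
    """Pair every item of `sequence` with the item that follows it."""
    items = list(sequence)
    return list(zip(items, items[1:]))


text = "sreejith is talking about NLTK"

# A list of words gives word pairs.
word_pairs = pairwise_bigrams(text.split())
print(word_pairs)

# A raw string is iterated character by character, so it gives character pairs.
char_pairs = pairwise_bigrams(text)
print(char_pairs[:3])
```

So the text has to be split into words (or tokenized) first; otherwise the result is a list of character pairs, spaces included.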


Work with our own data

     Populate our own corpus with NLTK and analyse it
         >>> from nltk.corpus import PlaintextCorpusReader as ptr
         >>> corpus = '/home/developer/Desktop/Sreejith'
         >>> wordlist = ptr(corpus, '.*')
         >>> wordlist.fileids()
     Let us find out how to count the number of characters, words
     and sentences in each file of the corpus
         >>> for fid in wordlist.fileids():
         ...     print(len(wordlist.raw(fid)))
         >>> for fid in wordlist.fileids():
         ...     print(len(wordlist.words(fid)))
         >>> for fid in wordlist.fileids():
         ...     print(len(wordlist.sents(fid)))
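For comparison, here is a rough standard-library sketch of the same three counts over a directory of text files. It is only an approximation: the regular expressions used for word and sentence splitting are simplifying assumptions and will not match the output of NLTK's tokenizers on real text. The function name `corpus_stats` and the demo file are hypothetical.

```python
import os
import re
import tempfile


def corpus_stats(root):
    """Return {filename: (characters, words, sentences)} for each file in `root`."""
    stats = {}
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        if not os.path.isfile(path):
            continue
        with open(path) as handle:
            raw = handle.read()
        # Words are runs of word characters; each punctuation mark counts as one token.
        words = re.findall(r"\w+|[^\w\s]", raw)
        # Naive sentence split: break after ., ! or ? followed by whitespace.
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", raw.strip()) if s]
        stats[name] = (len(raw), len(words), len(sentences))
    return stats


# Build a throwaway one-file corpus so the example runs anywhere.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "note.txt"), "w") as handle:
    handle.write("NLTK is fun. It is written in Python.")

result = corpus_stats(demo_dir)
print(result)
```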



Continued...


     Plotting a conditional frequency distribution (CFD)
          >>> import nltk
          >>> from nltk import bigrams, FreqDist
          >>> text = "sreejith is talking about NLTK"
          >>> words = text.split()
          >>> big = bigrams(words)
          >>> gd = nltk.ConditionalFreqDist(big)
          >>> gd.plot()
     Tabulate the CFD
          >>> gd.tabulate()
     Plot a frequency distribution (text1 comes from nltk.book)
          >>> fdist = FreqDist(text1)
          >>> fdist.plot(50, cumulative=True)
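A conditional frequency distribution is, in essence, one frequency count per condition. When it is built from bigrams, the first word of each pair is the condition and the second word is the sample being counted. The sketch below shows that idea with `collections` only; it is not NLTK's `ConditionalFreqDist`, and the function name and sample sentence are made up for illustration.

```python
from collections import Counter, defaultdict


def conditional_freq(pairs):
    """Map each condition to a Counter of the samples observed with it."""
    table = defaultdict(Counter)
    for condition, sample in pairs:
        table[condition][sample] += 1
    return table


# Hypothetical sample with some repeated words, so the counts are not all 1.
words = "the dog saw the cat and the dog ran".split()
pairs = list(zip(words, words[1:]))

cfd = conditional_freq(pairs)
print(cfd["the"].most_common())
print(cfd["dog"]["ran"])
```

With a five-word sentence in which no word repeats, every count is 1, so a longer text gives a more informative plot or table.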




Normalizing Text



  Stemming
  Stemming is the process of reducing inflected (or sometimes derived)
  words to their stem, base or root form, generally a written word form
       >>> import nltk
       >>> porter = nltk.PorterStemmer()
       >>> word = 'running'
       >>> porter.stem(word)

       >>> lancaster = nltk.LancasterStemmer()
       >>> lancaster.stem(word)




Normalizing Text




  Lemmatization
  Stemming + make sure that the resulting form is a known word in a
  dictionary
      >>> wnl = nltk.WordNetLemmatizer()
      >>> wnl.lemmatize(word)
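The difference between the two normalizations can be illustrated with a toy example that follows the slide's description literally: strip a suffix, then accept the result only if it is a known word. This is a deliberately crude sketch, not the Porter or Lancaster algorithm, and NLTK's WordNetLemmatizer consults WordNet rather than a hand-made set. The suffix list, the word set and both function names are assumptions made for illustration.

```python
SUFFIXES = ("ing", "ed", "es", "s")          # assumed, far from complete
KNOWN_WORDS = {"run", "walk", "talk", "box"}  # tiny stand-in for a dictionary


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def toy_lemmatize(word):
    """Stem, then keep the result only if it is a known word."""
    stem = toy_stem(word)
    if stem in KNOWN_WORDS:
        return stem
    # Undo a doubled final consonant, e.g. "runn" -> "run".
    if len(stem) >= 2 and stem[-1] == stem[-2] and stem[:-1] in KNOWN_WORDS:
        return stem[:-1]
    # No dictionary form found: leave the word unchanged.
    return word


for w in ("running", "boxes", "string"):
    print(w, toy_stem(w), toy_lemmatize(w))
```

The stemmer happily turns "string" into the non-word "str", while the dictionary check makes the lemmatizer leave it alone.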




POS Tagging




  POS Tagging
  The process of classifying words into their parts of speech and labeling
  them accordingly is known as part-of-speech tagging, or POS tagging.
       >>> text = nltk.word_tokenize("we are attending "
       ...                           "FOSS meet at NIC calicut")
       >>> nltk.pos_tag(text)
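
As a sketch of what "classifying and labeling" means, the toy tagger below looks each word up in a small lexicon and falls back to spelling patterns when the word is unknown. The lexicon, the patterns and the function name are all hypothetical; `nltk.pos_tag` uses a trained statistical model, so its output for the same sentence may differ.

```python
import re

# Tiny hand-made lexicon (hypothetical); real taggers learn this
# information from a tagged corpus
LEXICON = {"we": "PRP", "are": "VBP", "at": "IN", "the": "DT", "meet": "NN"}

# Backoff patterns, tried in order when a word is not in the lexicon
PATTERNS = [
    (re.compile(r".*ing$"), "VBG"),    # gerunds, e.g. "attending"
    (re.compile(r".*ed$"), "VBD"),     # simple past, e.g. "barked"
    (re.compile(r"^[A-Z].*"), "NNP"),  # capitalised: guess a proper noun
    (re.compile(r".*"), "NN"),         # default: guess a noun
]


def toy_pos_tag(tokens):
    """Return a list of (token, tag) pairs."""
    tagged = []
    for token in tokens:
        tag = LEXICON.get(token.lower())
        if tag is None:
            # The last pattern matches anything, so next() always succeeds
            tag = next(t for pattern, t in PATTERNS if pattern.match(token))
        tagged.append((token, tag))
    return tagged


print(toy_pos_tag("we are attending FOSS meet at NIC calicut".split()))
```

The output has the same shape as the result of `nltk.pos_tag`: a list of `(word, tag)` tuples.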




Parsing



  Sentence Parsing
  Analyzing sentence structure and creating a parse tree

      >>> sentence = [("the", "DT"), ("little", "JJ"),
      ...     ("yellow", "JJ"), ("dog", "NN"), ("barked", "VBD"),
      ...     ("at", "IN"), ("the", "DT"), ("cat", "NN")]
      >>> grammar = "NP: {<DT>?<JJ>*<NN>}"   # NP = optional DT, any JJs, then NN
      >>> cp = nltk.RegexpParser(grammar)
      >>> result = cp.parse(sentence)
      >>> print(result)
      >>> result.draw()                      # opens a Tkinter window
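
To show how a tag pattern such as `<DT>?<JJ>*<NN>` picks out noun phrases, here is a minimal chunker in plain Python that applies the same pattern to the tag sequence. It is a sketch under stated assumptions: the function name and the `("NP", [...])` output shape are invented for illustration, and it is not how `nltk.RegexpParser` is implemented.

```python
import re


def chunk_np(tagged):
    """Group (word, tag) pairs matching <DT>?<JJ>*<NN> into ("NP", [...]) chunks."""
    # Encode the tag sequence as "<DT><JJ><NN>..." so a plain regex can scan it
    tag_string = "".join("<%s>" % tag for _, tag in tagged)

    # Map character offsets in tag_string back to token indices
    starts = {}
    pos = 0
    for i, (_, tag) in enumerate(tagged):
        starts[pos] = i
        pos += len(tag) + 2  # +2 for the angle brackets
    starts[pos] = len(tagged)

    result, i = [], 0
    for m in re.finditer(r"(<DT>)?(<JJ>)*<NN>", tag_string):
        begin, end = starts[m.start()], starts[m.end()]
        result.extend(tagged[i:begin])            # tokens outside any chunk
        result.append(("NP", tagged[begin:end]))  # the matched noun phrase
        i = end
    result.extend(tagged[i:])
    return result


sentence = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
            ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]
for item in chunk_np(sentence):
    print(item)
```

For this sentence the chunker finds two noun phrases, "the little yellow dog" and "the cat", and leaves "barked" and "at" outside any chunk.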




Machine Translation



  Babelizer Shell
  Translates a sentence from its source language into a specified language.
  NLTK provides the babelize shell:
      >>> babelize_shell()
      Babel> hello how are you?
      Babel> german
      Babel> run


      You can also try Google Translate or Yahoo Babel Fish




What can you do?




     Contribute to NLTK
     Take part in Google Summer of Code (GSoC)
     NLP training
     Real-time research




Reference




     Steven Bird, Edward Loper and Ewan Klein
     Natural Language Processing with Python
     Jacob Perkins
     Python Text Processing with NLTK 2.0 Cookbook
     http://www.nltk.org




Questions




And finally...




                                                         Sreejith.S



Introduction to NLTK

  • 1. Getting Started with NLTK An Introduction to NLTK Sreejith S srssreejith@gmail.com @tweet2sree FOSSMeet 2011,NIC Calicut 06 February 2011 Sreejith S Getting Started with NLTK
  • 2. Just a word about me !! Working in Natural Language Processing (NLP), Machine Learning, Text Mining Active member of ilugcbe , http://ilugcbe.techstud.org Works for 365Media Pvt. Ltd. Coimbatore India. @tweet2sree , srssreejith@gmail.com Sreejith S Getting Started with NLTK
  • 3. Introduction - NLP Natural Language Processing Sreejith S Getting Started with NLTK
  • 4. Introduction - NLP Natural Language Processing NLP is an inter-disciplinary subject Sreejith S Getting Started with NLTK
  • 5. Introduction - NLP Natural Language Processing NLP is an inter-disciplinary subject Computer Science Sreejith S Getting Started with NLTK
  • 6. Introduction - NLP Natural Language Processing NLP is an inter-disciplinary subject Computer Science Linguistics Sreejith S Getting Started with NLTK
  • 7. Introduction - NLP Natural Language Processing NLP is an inter-disciplinary subject Computer Science Linguistics Statistics etc... Sreejith S Getting Started with NLTK
  • 8. Introduction - NLP Natural Language Processing NLP is an inter-disciplinary subject Computer Science Linguistics Statistics etc... NLP is a sub field of Artificial Intelligence Sreejith S Getting Started with NLTK
  • 9. Introduction - NLP Natural Language Processing NLP is an inter-disciplinary subject Computer Science Linguistics Statistics etc... NLP is a sub field of Artificial Intelligence NLP - Any kind of computer manipulation of natural language. Sreejith S Getting Started with NLTK
  • 10. Introduction - NLP Natural Language Processing NLP is an inter-disciplinary subject Computer Science Linguistics Statistics etc... NLP is a sub field of Artificial Intelligence NLP - Any kind of computer manipulation of natural language. It is a rapidly developing field of study Sreejith S Getting Started with NLTK
  • 11. Introduction - NLP Natural Language Processing NLP is an inter-disciplinary subject Computer Science Linguistics Statistics etc... NLP is a sub field of Artificial Intelligence NLP - Any kind of computer manipulation of natural language. It is a rapidly developing field of study Everyday applications of NLP Sreejith S Getting Started with NLTK
  • 12. Introduction - NLP Natural Language Processing NLP is an inter-disciplinary subject Computer Science Linguistics Statistics etc... NLP is a sub field of Artificial Intelligence NLP - Any kind of computer manipulation of natural language. It is a rapidly developing field of study Everyday applications of NLP Handwriting recognition,Machine translation,Question-answering systems,Spell checkers,Grammer checkers etc... Sreejith S Getting Started with NLTK
  • 13. Natural Language Toolkit (NLTK) A collection of Python programs, modules, data set and tutorial to support research and development in Natural Language Processing (NLP) Sreejith S Getting Started with NLTK
  • 14. Natural Language Toolkit (NLTK) A collection of Python programs, modules, data set and tutorial to support research and development in Natural Language Processing (NLP) Written by Steven Bird, Edvard Loper and Ewan Klien Sreejith S Getting Started with NLTK
  • 15. Natural Language Toolkit (NLTK) A collection of Python programs, modules, data set and tutorial to support research and development in Natural Language Processing (NLP) Written by Steven Bird, Edvard Loper and Ewan Klien NLTK is Sreejith S Getting Started with NLTK
  • 16. Natural Language Toolkit (NLTK) A collection of Python programs, modules, data set and tutorial to support research and development in Natural Language Processing (NLP) Written by Steven Bird, Edvard Loper and Ewan Klien NLTK is Free and Open source Sreejith S Getting Started with NLTK
  • 17. Natural Language Toolkit (NLTK) A collection of Python programs, modules, data set and tutorial to support research and development in Natural Language Processing (NLP) Written by Steven Bird, Edvard Loper and Ewan Klien NLTK is Free and Open source Easy to use Sreejith S Getting Started with NLTK
  • 18. Natural Language Toolkit (NLTK) A collection of Python programs, modules, data set and tutorial to support research and development in Natural Language Processing (NLP) Written by Steven Bird, Edvard Loper and Ewan Klien NLTK is Free and Open source Easy to use Modular Sreejith S Getting Started with NLTK
  • 19. Natural Language Toolkit (NLTK) A collection of Python programs, modules, data set and tutorial to support research and development in Natural Language Processing (NLP) Written by Steven Bird, Edvard Loper and Ewan Klien NLTK is Free and Open source Easy to use Modular Well documented Sreejith S Getting Started with NLTK
  • 20. Natural Language Toolkit (NLTK) A collection of Python programs, modules, data set and tutorial to support research and development in Natural Language Processing (NLP) Written by Steven Bird, Edvard Loper and Ewan Klien NLTK is Free and Open source Easy to use Modular Well documented Simple and extensible Sreejith S Getting Started with NLTK
  • 21. Natural Language Toolkit (NLTK) A collection of Python programs, modules, data set and tutorial to support research and development in Natural Language Processing (NLP) Written by Steven Bird, Edvard Loper and Ewan Klien NLTK is Free and Open source Easy to use Modular Well documented Simple and extensible http://www.nltk.org Sreejith S Getting Started with NLTK
  • 22. What You Will Learn How simple programs can help you manipulate and analyze language data, and how to write these programs Sreejith S Getting Started with NLTK
  • 23. What You Will Learn How simple programs can help you manipulate and analyze language data, and how to write these programs How key concepts from NLP and linguistics are used to describe and analyze language Sreejith S Getting Started with NLTK
  • 24. What You Will Learn How simple programs can help you manipulate and analyze language data, and how to write these programs How key concepts from NLP and linguistics are used to describe and analyze language How data structures and algorithms are used in NLP Sreejith S Getting Started with NLTK
  • 25. What You Will Learn How simple programs can help you manipulate and analyze language data, and how to write these programs How key concepts from NLP and linguistics are used to describe and analyze language How data structures and algorithms are used in NLP How language data is stored in standard formats, and how data can be used to evaluate the performance of NLP techniques Sreejith S Getting Started with NLTK
  • 26. Installation of NLTK Make sure that Ptyhon 2.4 or 2.5 or 2.6 is available in your system Sreejith S Getting Started with NLTK
  • 27. Installation of NLTK Make sure that Ptyhon 2.4 or 2.5 or 2.6 is available in your system Install Python Tkinter package Sreejith S Getting Started with NLTK
  • 35. Installation of NLTK
      Make sure that Python 2.4, 2.5, or 2.6 is available on your system.
      Install the Python Tkinter package.
      Install NumPy, Matplotlib, Prover9, MaltParser, and MegaM.
      Download NLTK and install it.
      If you are installing NLTK from source:
          Download http://nltk.googlecode.com/files/nltk-2.0b9.zip
          Unzip it; this creates the directory nltk-2.0b9.
          Open a terminal, cd into this directory, and as superuser run: python setup.py install
      To install the data, start the Python interpreter:
          >>> import nltk
          >>> nltk.download()
      Now you are ready to play with NLTK!
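Before working through the examples, it can help to confirm which of the packages listed above are actually importable. The following is a minimal sketch, assuming a modern Python 3 interpreter (the slides target Python 2.4 to 2.6, where the GUI module is named Tkinter rather than tkinter). The helper name is_installed is made up for this illustration.

```python
import importlib.util
import sys


def is_installed(module_name):
    """Return True if a top-level module with this name is on the import path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("Python version:", sys.version.split()[0])
    # Module names are the Python 3 spellings (an assumption of this sketch).
    for name in ("nltk", "numpy", "matplotlib", "tkinter"):
        print(name, "found" if is_installed(name) else "missing")
```

Prover9, MaltParser, and MegaM are external programs rather than Python modules, so this check does not cover them.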
  • 48. NLTK Modules
      NLTK Modules                  Functionality
      nltk.corpus                   Corpus
      nltk.tokenize, nltk.stem      Tokenizers, stemmers
      nltk.collocations             t-test, chi-squared, mutual information
      nltk.tag                      n-gram, backoff, Brill, HMM, TnT
      nltk.classify, nltk.cluster   Decision tree, naive Bayes, k-means
      nltk.chunk                    Regex, n-gram, named entity
      nltk.parse                    Parsing
      nltk.sem, nltk.inference      Semantic interpretation
      nltk.metrics                  Evaluation metrics
      nltk.probability              Probability and estimation
      nltk.app, nltk.chat           Applications
  • 56. Let us start the game
      To access the data for working through the examples in the book, start the Python interpreter.
      Some basic exercises from the book:
      Concordance
          >>> from nltk.book import *
          >>> text1.concordance("monstrous")
      Similar
          >>> text1.similar("monstrous")
      Dispersion plot (positional information)
          >>> text4.dispersion_plot(["citizens", "democracy", "freedom", "duties", "America"])
          >>> text4.dispersion_plot(["and", "to", "of", "with", "the"])
      What does the second plot show, and why?
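To make the ideas behind concordance and dispersion_plot concrete, here is a plain-Python sketch that needs neither NLTK nor its data. It is an illustration of the concepts, not NLTK's implementation. The function names, the window parameter, and the sample sentence are invented for this example.

```python
def concordance(tokens, word, window=3):
    """Return every occurrence of `word` with `window` tokens of context per side."""
    target = word.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            lines.append(" ".join(left + [tok] + right))
    return lines


def offsets(tokens, word):
    """Return the token positions of `word`: the x-coordinates of a dispersion plot."""
    return [i for i, tok in enumerate(tokens) if tok == word]


# Made-up sample text, not a quotation from any corpus.
tokens = "the whale was a monstrous size and the sea was monstrous calm".split()
print(concordance(tokens, "monstrous", window=2))
print(offsets(tokens, "the"))
```

Function words such as "the" or "of" occur at almost every position in a text, so their offsets fill the whole axis. Content words such as "democracy" tend to cluster in particular regions, which is presumably the contrast the second dispersion plot is meant to show.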
  • 65. Continued...
      Some basic exercises from the book:
      Generate
          >>> text3.generate()
      Counting vocabulary (total number of tokens in the text)
          >>> len(text3)
      List of distinct words, sorted in dictionary order
          >>> sorted(set(text3))
      Count the occurrences of a particular word in a text
          >>> text3.count("and")
      What percentage of the text is taken up by a specific word (100.0 avoids integer division in Python 2)
          >>> 100.0 * text3.count("and") / len(text3)
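The counting operations above work on any list of tokens, not only on the book's text objects. The sketch below gathers them into one helper on a small made-up token list. The function name word_stats and the sample sentence are assumptions for illustration, and it is written for Python 3, where / is true division.

```python
def word_stats(tokens, word):
    """Summarise token count, vocabulary, and the share of the text taken by `word`."""
    total = len(tokens)                  # every token, repeats included
    vocabulary = sorted(set(tokens))     # distinct tokens in dictionary order
    count = tokens.count(word)
    return {
        "tokens": total,
        "vocabulary": vocabulary,
        "count": count,
        "percentage": 100 * count / total,
    }


tokens = "in the beginning god created the heaven and the earth".split()
stats = word_stats(tokens, "the")
print(stats["tokens"], len(stats["vocabulary"]), stats["count"], stats["percentage"])
```

Note that len(tokens) counts tokens, while len(set(tokens)) counts the vocabulary. The two differ whenever a word repeats.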
  • 71. Collocation & Bigram
      Collocation
          A collocation is a sequence of words that occur together unusually often, e.g. "red wine" or "strong tea". "Strong computer", by contrast, is not a collocation.
          >>> text4.collocations()
      Bigrams
          A list of adjacent word pairs.
          >>> text = "sreejith is talking about NLTK"
          >>> wordlist = text.split()
          >>> bigrams(wordlist)
          What will happen if we do this instead?
          >>> bigrams(text)
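The question on the slide can be explored with a plain-Python stand-in for bigrams. This is a sketch of the idea, not NLTK's own function: a bigram list simply pairs each item of a sequence with its successor, so the result depends on what the items are.

```python
def bigrams(sequence):
    """Pair each item of `sequence` with the item that follows it."""
    items = list(sequence)
    return list(zip(items, items[1:]))


text = "sreejith is talking about NLTK"
print(bigrams(text.split()))   # items are words, so the pairs are word pairs
print(bigrams("NLTK"))         # items are characters, so the pairs are character pairs
```

A string is a sequence of characters, so passing the raw text instead of the word list yields pairs of adjacent characters (spaces included) rather than pairs of words.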
  • 75. Work with our own data
      Build our own corpus with NLTK and analyse it:
          >>> from nltk.corpus import PlaintextCorpusReader as ptr
          >>> corpus = '/home/developer/Desktop/Sreejith'
          >>> wordlist = ptr(corpus, '.*')
          >>> wordlist.fileids()
      Let us find out how to count the number of characters, words, and sentences in each file of the corpus:
          >>> for fid in wordlist.fileids(): print len(wordlist.raw(fid))
          >>> for fid in wordlist.fileids(): print len(wordlist.words(fid))
          >>> for fid in wordlist.fileids(): print len(wordlist.sents(fid))
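The three counts can be approximated without NLTK, which shows what raw, words, and sents conceptually return. The sketch below uses simple regular expressions as stand-ins for NLTK's tokenizers, so its numbers may differ from PlaintextCorpusReader on real text. The function name corpus_counts, the file name a.txt, and its contents are made up, and a temporary directory stands in for the corpus folder.

```python
import re
import tempfile
from pathlib import Path


def corpus_counts(root):
    """Map each file name directly under `root` to (characters, words, sentences)."""
    counts = {}
    for path in sorted(Path(root).iterdir()):
        if not path.is_file():
            continue
        raw = path.read_text(encoding="utf-8")
        # Rough word tokens: runs of word characters or runs of punctuation.
        words = re.findall(r"\w+|[^\w\s]+", raw)
        # Rough sentences: split at whitespace that follows ., ! or ?
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", raw.strip()) if s]
        counts[path.name] = (len(raw), len(words), len(sentences))
    return counts


with tempfile.TemporaryDirectory() as root:
    Path(root, "a.txt").write_text("NLTK is fun. It is written in Python!", encoding="utf-8")
    print(corpus_counts(root))
```

Punctuation marks count as separate tokens here, which is why the word count is higher than the number of space-separated strings.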
  • 81. Continued...
      Plotting a conditional frequency distribution
          >>> text = "sreejith is talking about NLTK"
          >>> words = text.split()
          >>> big = bigrams(words)
          >>> gd = nltk.ConditionalFreqDist(big)
          >>> gd.plot()
      Tabulate the CFD
          >>> gd.tabulate()
      Plot a frequency distribution
          >>> fdist = FreqDist(text1)
          >>> fdist.plot(50, cumulative=True)
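A conditional frequency distribution built from bigrams counts, for each word (the condition), which words follow it (the outcomes). The sketch below reproduces that data structure with the standard library only. It illustrates the idea and is not NLTK's ConditionalFreqDist class. The function name and sample sentence are invented for this example.

```python
from collections import Counter, defaultdict


def conditional_freq_dist(pairs):
    """Count outcomes per condition from an iterable of (condition, outcome) pairs."""
    cfd = defaultdict(Counter)
    for condition, outcome in pairs:
        cfd[condition][outcome] += 1
    return cfd


words = "the cat saw the dog and the cat ran".split()
cfd = conditional_freq_dist(zip(words, words[1:]))   # bigrams as (word, next word)
print(cfd["the"].most_common())
```

In the slide's five-word sentence every bigram occurs exactly once, so every count in the plot and table is 1. A text with repeated words, as in this sketch, gives a more informative distribution.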
  • 84. Normalizing Text
      Stemming
          Stemming is the process of reducing inflected (or sometimes derived) words to their stem, base, or root form, generally a written word form.
          >>> porter = nltk.PorterStemmer()
          >>> word = 'running'
          >>> porter.stem(word)
          >>> lancaster = nltk.LancasterStemmer()
          >>> lancaster.stem(word)
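To show what "reducing a word to its stem" means mechanically, here is a deliberately tiny suffix-stripping sketch. It is a toy and not the Porter or Lancaster algorithm, both of which apply many more rules and conditions. The suffix list and the name toy_stem are assumptions made for this illustration.

```python
SUFFIXES = ("ing", "ed", "es", "s")   # hypothetical, deliberately tiny rule list


def toy_stem(word):
    """Strip the first matching suffix, then undo a doubled final consonant."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Require at least three letters to remain so short words stay intact.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[:-len(suffix)]
            if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in "aeiou":
                stem = stem[:-1]          # "runn" -> "run"
            return stem
    return word


for w in ("running", "talked", "cats", "is"):
    print(w, "->", toy_stem(w))
```

Because the rules are blind to the dictionary, this toy over-stems some words (for example, "falling" becomes "fal"), which is the kind of error that motivates lemmatization.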
  • 87. Normalizing Text
      Lemmatization
          Stemming, plus a check that the resulting form is a known word in a dictionary.
          >>> wnl = nltk.WordNetLemmatizer()
          >>> wnl.lemmatize(word)
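The "stemming plus dictionary check" idea can be sketched in a few lines: generate candidate base forms by stripping suffixes, and accept a candidate only if it is a known word. The word list below is a hypothetical stand-in for a real dictionary such as WordNet, and the names toy_lemmatize and candidates are invented for this example. NLTK's lemmatizer works differently in detail.

```python
KNOWN_WORDS = {"run", "dog", "church", "make"}   # hypothetical stand-in dictionary


def candidates(word):
    """Yield possible base forms obtained by stripping common suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix):
            stem = word[:-len(suffix)]
            yield stem                            # "dogs" -> "dog"
            if len(stem) >= 2 and stem[-1] == stem[-2]:
                yield stem[:-1]                   # "runn" -> "run"
            yield stem + "e"                      # "mak" -> "make"


def toy_lemmatize(word):
    """Return the first candidate found in the dictionary, else the word unchanged."""
    word = word.lower()
    for candidate in candidates(word):
        if candidate in KNOWN_WORDS:
            return candidate
    return word


for w in ("running", "churches", "making", "bring"):
    print(w, "->", toy_lemmatize(w))
```

The dictionary check is what keeps "bring" intact, where blind suffix stripping would produce "br". When trying the slide's example, note that WordNetLemmatizer.lemmatize treats its input as a noun unless a pos argument is given, so the verb reading of "running" needs pos='v'.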
  • 90. POS Tagging
      The process of classifying words into their parts of speech and labeling them accordingly is known as part-of-speech tagging, or POS tagging.
          >>> text = nltk.word_tokenize("we are attending FOSS meet at NIC calicut")
          >>> nltk.pos_tag(text)
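The output of a tagger is a list of (word, tag) pairs. The simplest possible tagger looks each word up in a table and falls back to a default tag for unknown words. The sketch below shows that lookup-with-default idea. It is a toy and not the model behind nltk.pos_tag, and the small lexicon and the choice of NN as the default are assumptions made for illustration.

```python
# Hypothetical hand-written lexicon using Penn Treebank style tags.
LEXICON = {"we": "PRP", "are": "VBP", "attending": "VBG", "meet": "NN", "at": "IN"}


def toy_tag(tokens, default="NN"):
    """Tag each token from the lexicon, backing off to `default` for unknown words."""
    return [(tok, LEXICON.get(tok.lower(), default)) for tok in tokens]


tokens = "we are attending FOSS meet at NIC calicut".split()
print(toy_tag(tokens))
```

Tagging every unknown word as a noun is a common baseline, since nouns are the most frequent open-class words. Real taggers refine this with context and word shape.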
  • 93. Parsing
      Sentence parsing
          Analyzing the structure of a sentence and creating a parse tree.
          >>> sentence = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"), ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]
          >>> grammar = "NP: {<DT>?<JJ>*<NN>}"
          >>> cp = nltk.RegexpParser(grammar)
          >>> result = cp.parse(sentence)
          >>> print result
          >>> result.draw()
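The grammar NP: {<DT>?<JJ>*<NN>} reads as "an optional determiner, any number of adjectives, then a noun". One way to see how such a tag pattern picks out noun phrases is to join the tags into a string and run an ordinary regular expression over it. The sketch below does that with the standard library. It is a simplified illustration rather than NLTK's RegexpParser, and the function name np_chunks is an assumption.

```python
import re


def np_chunks(tagged, pattern=r"(?:<DT>)?(?:<JJ>)*<NN>"):
    """Return the word groups whose tag sequence matches `pattern`."""
    tag_string = "".join("<%s>" % tag for _, tag in tagged)
    # Character offset in tag_string at which each token's tag begins.
    starts, pos = [], 0
    for _, tag in tagged:
        starts.append(pos)
        pos += len(tag) + 2               # +2 for the angle brackets
    chunks = []
    for match in re.finditer(pattern, tag_string):
        first = starts.index(match.start())
        last = first + match.group().count("<")   # one "<" per matched tag
        chunks.append([word for word, _ in tagged[first:last]])
    return chunks


sentence = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
            ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]
print(np_chunks(sentence))
```

The two groups found correspond to the NP subtrees that result.draw() would display for the same sentence. The verb and preposition stay outside any chunk.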
  • 97. Machine Translation
      Babelizer shell
          Translating a sentence from its source language into a specified target language. NLTK provides the babelize shell:
          >>> babelize_shell()
          Babel> hello how are you?
          Babel> german
          Babel> run
      Also try Google Translate and Yahoo! Babel Fish.
  • 98. What can you do?
      Contribute to NLTK
      GSoC
      NLP training
      Real-time research
  • 99. References
      Steven Bird, Edward Loper, and Ewan Klein, Natural Language Processing with Python
      Jacob Perkins, Python Text Processing with NLTK 2.0 Cookbook
      http://www.nltk.org
  • 100. Questions
  • 101. And finally... Sreejith.S