Getting Started with NLTK
    An Introduction to NLTK


           Sreejith S
     srssreejith@gmail.com
          @tweet2sree

     FOSSMeet 2011, NIC Calicut


       06 February 2011




        Sreejith S   Getting Started with NLTK
Just a word about me!




     Working in Natural Language Processing (NLP), Machine Learning,
     and Text Mining
     Active member of ilugcbe, http://ilugcbe.techstud.org
     Works for 365Media Pvt. Ltd., Coimbatore, India
     @tweet2sree, srssreejith@gmail.com




Introduction - NLP



     Natural Language Processing
     NLP is an interdisciplinary subject
         Computer Science
         Linguistics
         Statistics, etc.
     NLP is a subfield of Artificial Intelligence
     NLP: any kind of computer manipulation of natural language
     It is a rapidly developing field of study
     Everyday applications of NLP
         Handwriting recognition, machine translation, question-answering
         systems, spell checkers, grammar checkers, etc.




Natural Language Toolkit (NLTK)



     A collection of Python programs, modules, data sets, and tutorials to
     support research and development in Natural Language Processing
     (NLP)
     Written by Steven Bird, Edward Loper, and Ewan Klein
     NLTK is
         Free and open source
         Easy to use
         Modular
         Well documented
         Simple and extensible
     http://www.nltk.org




What You Will Learn




     How simple programs can help you manipulate and analyze language
     data, and how to write these programs
     How key concepts from NLP and linguistics are used to describe and
     analyze language
     How data structures and algorithms are used in NLP
     How language data is stored in standard formats, and how data can
     be used to evaluate the performance of NLP techniques




Installation of NLTK


     Make sure that Python 2.4, 2.5, or 2.6 is available on your system
     Install the Python Tkinter package
     Install NumPy, Matplotlib, Prover9, MaltParser, and MegaM
     Download NLTK and install it
         If you are installing NLTK from source, download
         http://nltk.googlecode.com/files/nltk-2.0b9.zip
         Unzip it; this creates the folder nltk-2.0b9
         Open a terminal, cd into this folder, and as superuser run:
         python setup.py install
     To install the data
         Start the Python interpreter
         >>> import nltk
         >>> nltk.download()
     Now you are ready to play with NLTK!
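
     A quick way to confirm that the installation steps above worked is to
     ask Python whether it can locate the packages. The sketch below is not
     part of NLTK. It assumes Python 3 and uses only the standard library,
     and the helper name is_installed is made up for this example.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


# Report the status of the packages named in the installation steps.
for name in ("nltk", "numpy", "matplotlib"):
    status = "found" if is_installed(name) else "missing"
    print(name, status)
```

     Any package reported as missing should be installed before moving on.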



NLTK Modules


  NLTK Modules                       Functionality

  nltk.corpus                        Corpus access
  nltk.tokenize, nltk.stem           Tokenizers, stemmers
  nltk.collocations                  t-test, chi-squared, mutual information
  nltk.tag                           n-gram, backoff, Brill, HMM, TnT
  nltk.classify, nltk.cluster        Decision tree, naive Bayes, k-means
  nltk.chunk                         Regex, n-gram, named entity
  nltk.parse                         Parsing
  nltk.sem, nltk.inference           Semantic interpretation
  nltk.metrics                       Evaluation metrics
  nltk.probability                   Probability and estimation
  nltk.app, nltk.chat                Applications




Let us start the game

     To access the data for working through the examples in the book
         Start the Python interpreter
     Some basic exercises from the book
         Concordance
              >>> from nltk.book import *
              >>> text1.concordance("monstrous")
         Similar
              >>> text1.similar("monstrous")
         Dispersion plot: positional information
              >>> text4.dispersion_plot(["citizens", "democracy",
              ...     "freedom", "duties", "America"])
              >>> text4.dispersion_plot(["and", "to", "of", "with", "the"])
         What do the two plots show, and why do they differ?
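
     To see what a concordance and a dispersion plot are built from, here is
     a plain-Python sketch. It is not NLTK's implementation. The function
     names, the width parameter, and the sample sentence are assumptions
     made for illustration, and the code targets Python 3.

```python
def concordance(tokens, word, width=3):
    """Return each occurrence of word with up to `width` tokens of context."""
    target = word.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            lines.append(" ".join(left + [tok] + right))
    return lines


def positions(tokens, word):
    """Return the token offsets of word: the raw data of a dispersion plot."""
    target = word.lower()
    return [i for i, tok in enumerate(tokens) if tok.lower() == target]


# Made-up sample text, already split into tokens.
tokens = "the monstrous whale and the most monstrous size of the sea".split()
for line in concordance(tokens, "monstrous"):
    print(line)
print(positions(tokens, "monstrous"))
```

     A dispersion plot simply draws one mark per offset returned by
     positions, one row per word.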



Continued...


     Some basic exercises from the book
         Generate
              >>> text3.generate()
         Counting vocabulary
              >>> len(text3)
         List of distinct words, sorted in dictionary order
              >>> sorted(set(text3))
         Count occurrences of a particular word in a text
              >>> text3.count("and")
         What percentage of the text is taken up by a specific word
              >>> # 100.0 forces float division under Python 2
              >>> 100.0 * text3.count("and") / len(text3)
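
     The same counts work on any list of tokens, because an NLTK text
     behaves like a Python list here. The sketch below assumes Python 3
     (where / is always float division) and uses a short sample sentence
     instead of text3. The helper name percentage is made up for this
     example.

```python
# Sample tokens standing in for an NLTK text object.
tokens = "in the beginning God created the heaven and the earth".split()


def percentage(tokens, word):
    """Share of the text taken up by word, as a percentage."""
    return 100 * tokens.count(word) / len(tokens)


print(len(tokens))                 # number of tokens
print(sorted(set(tokens)))         # vocabulary; capitalised words sort first
print(tokens.count("the"))         # occurrences of one word
print(percentage(tokens, "the"))   # share of the text, in percent
```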




Collocation & Bigram

  Collocation
  A collocation is a sequence of words that occurs together unusually often,
  e.g. "red wine" or "strong tea"
  But "strong computer" is not a collocation
       >>> text4.collocations()

  Bigrams
  A list of adjacent word pairs
       >>> from nltk import bigrams
       >>> text = "sreejith is talking about NLTK"
       >>> wordlist = text.split()
       >>> bigrams(wordlist)
  What happens if we pass the raw string instead?
       >>> bigrams(text)
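
  The question above can be answered without NLTK. This is a minimal
  sketch in Python 3; adjacent_pairs is a made-up stand-in for the idea of
  a bigram function, not NLTK's own code. A string is a sequence of
  characters, so pairing its elements gives character pairs, not word pairs.

```python
def adjacent_pairs(sequence):
    """Return the list of adjacent pairs from any sequence."""
    items = list(sequence)
    return list(zip(items, items[1:]))


text = "sreejith is talking about NLTK"
print(adjacent_pairs(text.split()))   # pairs of words
print(adjacent_pairs(text)[:3])       # pairs of characters
```

  Always split or tokenize the text before asking for bigrams.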


Work with our own data

     Build our own corpus with NLTK and analyse it
         >>> from nltk.corpus import PlaintextCorpusReader as ptr
         >>> corpus = '/home/developer/Desktop/Sreejith'
         >>> wordlist = ptr(corpus, '.*')
         >>> wordlist.fileids()
     Let us count the number of characters, words, and sentences in each
     file of the corpus
         >>> for fid in wordlist.fileids():
         ...     print(len(wordlist.raw(fid)))
         >>> for fid in wordlist.fileids():
         ...     print(len(wordlist.words(fid)))
         >>> for fid in wordlist.fileids():
         ...     print(len(wordlist.sents(fid)))
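
     For comparison, here is a rough standard-library version of the same
     three counts, written for Python 3. It is only a sketch. The function
     name corpus_stats, the .txt filter, and the naive sentence rule are
     assumptions, and NLTK's tokenizers also count punctuation marks as
     tokens, so its numbers will usually differ.

```python
import os
import re
import tempfile


def corpus_stats(root):
    """Map each .txt file under root to (characters, words, sentences)."""
    stats = {}
    for name in sorted(os.listdir(root)):
        if not name.endswith(".txt"):
            continue
        with open(os.path.join(root, name), encoding="utf-8") as handle:
            raw = handle.read()
        words = re.findall(r"\w+", raw)
        # Naive rule: a sentence ends at '.', '!' or '?'.
        sentences = [s for s in re.split(r"[.!?]+", raw) if s.strip()]
        stats[name] = (len(raw), len(words), len(sentences))
    return stats


# Build a throwaway one-file corpus so the example runs anywhere.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "demo.txt"), "w", encoding="utf-8") as handle:
        handle.write("NLTK is fun. It is free!")
    demo_stats = corpus_stats(root)

print(demo_stats)
```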



Continued...


     Plotting a conditional frequency distribution (CFD)
          >>> text = "sreejith is talking about NLTK"
          >>> words = text.split()
          >>> big = nltk.bigrams(words)
          >>> gd = nltk.ConditionalFreqDist(big)
          >>> gd.plot()
     Tabulate the CFD
          >>> gd.tabulate()
     Plot a frequency distribution
          >>> fdist = nltk.FreqDist(text1)
          >>> fdist.plot(50, cumulative=True)
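
     A conditional frequency distribution is a table of counters: for each
     condition (here, the first word of a bigram) it counts the outcomes
     (the word that follows). The sketch below shows the idea with the
     Python 3 standard library. It is an illustration, not NLTK's class,
     and conditional_freq and the sample sentence are made up.

```python
from collections import Counter, defaultdict


def conditional_freq(pairs):
    """Count outcomes per condition from (condition, outcome) pairs."""
    table = defaultdict(Counter)
    for condition, outcome in pairs:
        table[condition][outcome] += 1
    return table


words = "the cat saw the dog and the cat ran".split()
cfd = conditional_freq(zip(words, words[1:]))

# Which words follow "the", and how often?
print(cfd["the"].most_common())
```

     Plotting or tabulating the distribution is then a matter of presenting
     these counts.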




Normalizing Text



  Stemming
  Stemming is the process of reducing inflected (or sometimes derived)
  words to their stem, base, or root form, generally a written word form
       >>> porter = nltk.PorterStemmer()
       >>> word = 'running'
       >>> porter.stem(word)

       >>> lancaster = nltk.LancasterStemmer()
       >>> lancaster.stem(word)
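
  To make the idea concrete, here is a deliberately crude suffix stripper
  in Python 3. It is not the Porter or Lancaster algorithm, which apply
  ordered rewrite rules and handle cases such as the doubled consonant in
  "running". The suffix list and the three-letter minimum are assumptions
  chosen for the example.

```python
# Hypothetical suffix list, checked in this order.
SUFFIXES = ("ing", "ly", "ed", "es", "s")


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


for word in ("running", "talked", "dogs", "is"):
    print(word, "->", naive_stem(word))
```

  Note that "running" becomes "runn", which is not a word. Stems do not
  have to be dictionary forms.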





  Lemmatization
  Like stemming, but also makes sure that the resulting form is a known
  word in a dictionary
      >>> wnl = nltk.WordNetLemmatizer()
      >>> wnl.lemmatize(word)
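
  The dictionary check is what separates a lemmatizer from a stemmer. The
  toy below, in Python 3, only accepts a stripped form if it appears in a
  word list. The word list, the exception table, and the suffixes are all
  made up for illustration. NLTK's WordNetLemmatizer relies on WordNet
  instead.

```python
# Hypothetical miniature dictionary and irregular forms.
KNOWN_WORDS = {"run", "dog", "talk", "be"}
IRREGULAR = {"ran": "run", "is": "be", "was": "be"}
SUFFIXES = ("ning", "ing", "ed", "s")


def naive_lemmatize(word):
    """Return a dictionary form of word, or word itself if none is found."""
    if word in IRREGULAR:
        return IRREGULAR[word]
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            candidate = word[:-len(suffix)]
            # Accept the stripped form only if the dictionary knows it.
            if candidate in KNOWN_WORDS:
                return candidate
    return word


for word in ("running", "dogs", "was", "sing"):
    print(word, "->", naive_lemmatize(word))
```

  Because "s" is not in the dictionary, "sing" is left alone, where a
  blind suffix stripper would have damaged it.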




POS Tagging




  POS Tagging
  The process of classifying words into their parts of speech and labeling
  them accordingly is known as part-of-speech tagging, or POS tagging
       >>> text = nltk.word_tokenize(
       ...     "we are attending FOSS meet at NIC calicut")
       >>> nltk.pos_tag(text)
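
  nltk.pos_tag relies on a trained model. To show what tagging means at
  its simplest, here is a rule-based sketch in Python 3: each token gets
  the tag of the first pattern it matches and falls back to a default tag
  otherwise. The patterns and the function name regexp_tag are assumptions
  for this example, and the accuracy of such rules is poor.

```python
import re

# Hypothetical rules: (regular expression, Penn Treebank style tag).
PATTERNS = [
    (r"^(the|a|an)$", "DT"),   # determiners
    (r".*ing$", "VBG"),        # gerunds
    (r".*ed$", "VBD"),         # simple past
    (r".*s$", "NNS"),          # plural nouns
]


def regexp_tag(tokens, default="NN"):
    """Tag each token with the first matching rule, else the default tag."""
    tagged = []
    for token in tokens:
        tag = default
        for pattern, candidate in PATTERNS:
            if re.match(pattern, token.lower()):
                tag = candidate
                break
        tagged.append((token, tag))
    return tagged


print(regexp_tag("the dogs barked".split()))
```

  The default tag plays the role of a backoff: it is used only when no
  more specific rule applies.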




Parsing



  Sentence Parsing
  Analyzing the structure of a sentence and building a parse tree.
  The example below chunks noun phrases (NP) from a POS-tagged sentence.

      >>> sentence = [("the", "DT"), ("little", "JJ"),
      ...     ("yellow", "JJ"), ("dog", "NN"), ("barked", "VBD"),
      ...     ("at", "IN"), ("the", "DT"), ("cat", "NN")]
      >>> grammar = "NP: {<DT>?<JJ>*<NN>}"
      >>> cp = nltk.RegexpParser(grammar)
      >>> result = cp.parse(sentence)
      >>> print(result)
      >>> result.draw()
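
The tag pattern `<DT>?<JJ>*<NN>` reads as "an optional determiner, any number of adjectives, then a noun". One way to see what it selects is to apply the same pattern to the tag sequence with Python's `re` module. This is a sketch of the idea in plain Python 3 and not a description of RegexpParser's internals; `np_chunks` and `NP_PATTERN` are hypothetical names.

```python
import re

sentence = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"),
            ("dog", "NN"), ("barked", "VBD"), ("at", "IN"),
            ("the", "DT"), ("cat", "NN")]

# Same pattern as the chunk grammar, written as an ordinary regex
# over a string of tags such as "<DT><JJ><JJ><NN>...".
NP_PATTERN = re.compile(r"(<DT>)?(<JJ>)*<NN>")


def np_chunks(tagged):
    """Return the words of every noun-phrase chunk found in tagged."""
    tag_string = "".join("<%s>" % tag for _, tag in tagged)
    chunks = []
    for match in NP_PATTERN.finditer(tag_string):
        # Every tag starts with "<", so counting "<" converts character
        # offsets in tag_string back into token positions.
        start = tag_string.count("<", 0, match.start())
        length = match.group(0).count("<")
        chunks.append([word for word, _ in tagged[start:start + length]])
    return chunks


if __name__ == "__main__":
    for chunk in np_chunks(sentence):
        print(" ".join(chunk))
```

The words that are not part of any chunk ("barked", "at") are simply skipped here, whereas a chunk parser keeps them in the tree outside the NP nodes.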




Machine Translation



  Babelizer Shell
  Translating a sentence from its source language to a specified language.
  NLTK provides the babelize shell for this.
      >>> babelize_shell()
      Babel> hello how are you?
      Babel> german
      Babel> run


      You can also try Google Translate or Yahoo! Babel Fish.




What can you do?




     Contribute to NLTK
     GSoC (Google Summer of Code)
     NLP Training
     Real time research




Reference




     Steven Bird, Ewan Klein and Edward Loper
     Natural Language Processing with Python
     Jacob Perkins
     Python Text Processing with NLTK 2.0 Cookbook
     http://www.nltk.org




Questions




And finally...




                                                         Sreejith.S


