Getting Started with NLTK
    An Introduction to NLTK


           Sreejith S
     srssreejith@gmail.com
          @tweet2sree

     FOSSMeet 2011,NIC Calicut


       06 February 2011




        Sreejith S   Getting Started with NLTK
Just a word about me !!




     Working in Natural Language Processing (NLP), Machine Learning,
     Text Mining
     Active member of ilugcbe, http://ilugcbe.techstud.org
     Works for 365Media Pvt. Ltd., Coimbatore, India
     @tweet2sree , srssreejith@gmail.com




Introduction - NLP



     Natural Language Processing
     NLP is an inter-disciplinary subject
         Computer Science
         Linguistics
         Statistics etc...
     NLP is a subfield of Artificial Intelligence
     NLP: any kind of computer manipulation of natural language
     It is a rapidly developing field of study
     Everyday applications of NLP
         Handwriting recognition, machine translation, question-answering
         systems, spell checkers, grammar checkers, etc.




Natural Language Toolkit (NLTK)



     A collection of Python programs, modules, data sets, and tutorials that
     support research and development in Natural Language Processing
     (NLP)
     Written by Steven Bird, Edward Loper, and Ewan Klein
     NLTK is
         Free and open source
         Easy to use
         Modular
         Well documented
         Simple and extensible
     http://www.nltk.org




What You Will Learn




     How simple programs can help you manipulate and analyze language
     data, and how to write these programs
     How key concepts from NLP and linguistics are used to describe and
     analyze language
     How data structures and algorithms are used in NLP
     How language data is stored in standard formats, and how data can
     be used to evaluate the performance of NLP techniques




Installation of NLTK


     Make sure that Python 2.4, 2.5, or 2.6 is available on your system
     Install the Python Tkinter package
     Install NumPy, Matplotlib, Prover9, MaltParser, and MegaM
     Download NLTK and install it
         If you are installing NLTK from source, download
         http://nltk.googlecode.com/files/nltk-2.0b9.zip
         Unzip it; this creates the directory nltk-2.0b9
         Open a terminal, cd into this directory, and, as superuser, run
         python setup.py install
     To install the data
         Start the Python interpreter
         >>> import nltk
         >>> nltk.download()
     Now you are ready to play with NLTK!



NLTK Modules


  NLTK Modules                       Functionality

  nltk.corpus                        Corpus
  nltk.tokenize, nltk.stem           Tokenizers, stemmers
  nltk.collocations                  t-test, chi-squared, mutual information
  nltk.tag                           n-gram, backoff, Brill, HMM, TnT
  nltk.classify, nltk.cluster        Decision tree, Naive Bayes, k-means
  nltk.chunk                         Regex, n-gram, named entity
  nltk.parse                         Parsing
  nltk.sem, nltk.inference           Semantic interpretation
  nltk.metrics                       Evaluation metrics
  nltk.probability                   Probability and estimation
  nltk.app, nltk.chat                Applications




Let us start the game

     To access the data for working through the examples in the book
         Start the Python interpreter
     Some basic exercises from the book
         Concordance
              >>> from nltk.book import *
              >>> text1.concordance("monstrous")
         Similar
              >>> text1.similar("monstrous")
         Dispersion plot - Positional information
              >>> text4.dispersion_plot(["citizens",
              ...     "democracy", "freedom", "duties", "America"])

              >>> text4.dispersion_plot(["and",
              ...     "to", "of", "with", "the"])
         What do the two plots show? Why do they differ?
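
     To make the idea behind a concordance and a dispersion plot concrete,
     here is a minimal sketch in plain Python (no NLTK needed). The names
     concordance_lines and offsets are made up for this illustration; this
     is not how NLTK implements text1.concordance() or dispersion_plot().

```python
def concordance_lines(tokens, target, width=2):
    """Return every occurrence of `target` with `width` tokens of context per side."""
    target = target.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            lines.append(" ".join(left + [tok] + right))
    return lines


def offsets(tokens, target):
    """Return the token positions of `target`: the raw data behind a dispersion plot."""
    return [i for i, tok in enumerate(tokens) if tok.lower() == target.lower()]


# Illustrative toy text (hypothetical, not taken from the NLTK corpora)
tokens = "the monstrous whale and the monstrous sea".split()
for line in concordance_lines(tokens, "monstrous"):
    print(line)
print(offsets(tokens, "monstrous"))
```

     A very frequent word such as "the" produces offsets spread across the
     whole text, while a topical word clusters in a few places.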



Continued...


     Some basic exercises from the book
         Generate random text
              >>> text3.generate()
         Counting vocabulary (length of the text in tokens)
              >>> len(text3)
         List of distinct words, sorted in dictionary order
              >>> sorted(set(text3))
         Count occurrences of a particular word in a text
              >>> text3.count("and")

         What percentage of the text is taken up by a specific word
              >>> 100.0 * text3.count("and") / len(text3)
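
     The same counting steps work on any list of tokens, so they can be
     tried without the book data. A small sketch, assuming a toy sentence
     chosen only for illustration:

```python
# Toy token list (an assumption for illustration; text3 in the slides is an NLTK Text)
tokens = "in the beginning god created the heaven and the earth".split()

total = len(tokens)                # number of tokens, duplicates included
vocabulary = sorted(set(tokens))   # distinct words in dictionary order
the_count = tokens.count("the")    # occurrences of one word
share = 100.0 * the_count / total  # 100.0 forces floating-point division

print(total, len(vocabulary), the_count, share)
```

     Writing 100.0 instead of 100 matters on Python 2, where dividing two
     integers discards the fractional part.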




Collocation & Bigram

  Collocation
  A collocation is a sequence of words that occur together unusually often,
  e.g. red wine, strong tea
  But strong computer is not a collocation
       >>> text4.collocations()
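
  One way to quantify "unusually often" is pointwise mutual information,
  which compares how often a pair occurs with how often it would occur if
  the two words were independent. The sketch below is an illustration in
  plain Python; pmi_scores and min_count are hypothetical names, and this is
  not necessarily the measure that text4.collocations() uses.

```python
import math
from collections import Counter


def pmi_scores(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (base 2)."""
    unigrams = Counter(tokens)
    pairs = Counter(zip(tokens, tokens[1:]))
    n_tokens = float(len(tokens))
    n_pairs = float(max(len(tokens) - 1, 1))
    scores = {}
    for (w1, w2), count in pairs.items():
        if count < min_count:
            continue  # rare pairs give unreliable scores
        p_pair = count / n_pairs
        p_w1 = unigrams[w1] / n_tokens
        p_w2 = unigrams[w2] / n_tokens
        scores[(w1, w2)] = math.log(p_pair / (p_w1 * p_w2), 2)
    return scores


# Toy text, made up for this example
tokens = "red wine is good red wine is cheap strong tea is hot".split()
scores = pmi_scores(tokens)
for pair in sorted(scores, key=scores.get, reverse=True):
    print(pair, round(scores[pair], 2))
```

  In this toy text "red wine" scores higher than "wine is", because "is"
  is common on its own and so pairs with many different words.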

  Bigrams
  List of adjacent word pairs
       >>> text = "sreejith is talking about NLTK"
       >>> wordlist = text.split()
       >>> bigrams(wordlist)
  What will happen if I do this instead?
       >>> bigrams(text)
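
  The question can be explored with a plain-Python stand-in for bigrams.
  The helper pairs below is a hypothetical name for this sketch, not the
  NLTK function. The point is that a string is itself a sequence of
  characters, so pairing over it yields character pairs rather than word
  pairs.

```python
def pairs(sequence):
    """Return the list of adjacent pairs in any sequence (list or string)."""
    return list(zip(sequence, sequence[1:]))


text = "sreejith is talking about NLTK"

word_pairs = pairs(text.split())  # sequence of words -> word bigrams
char_pairs = pairs(text)          # sequence of characters -> character bigrams

print(word_pairs)
print(char_pairs[:3])
```

  So always split (or tokenize) the text into words first if word bigrams
  are what you want.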


Work with our own data

     Populate our own corpus with NLTK and analyse it
         >>> from nltk.corpus import PlaintextCorpusReader as ptr
         >>> corpus = '/home/developer/Desktop/Sreejith'
         >>> wordlist = ptr(corpus, '.*')
         >>> wordlist.fileids()
     Let us find out how to count the number of characters, words,
     and sentences in the corpus
         >>> for fid in wordlist.fileids():
         ...     print(len(wordlist.raw(fid)))
         >>> for fid in wordlist.fileids():
         ...     print(len(wordlist.words(fid)))

         >>> for fid in wordlist.fileids():
         ...     print(len(wordlist.sents(fid)))
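
     To see what the three counts mean, here is a rough plain-Python
     approximation that works on a single string. The function
     corpus_counts and its regular expressions are assumptions made for
     this sketch; NLTK's corpus reader uses its own word and sentence
     tokenizers, so its numbers can differ.

```python
import re


def corpus_counts(raw):
    """Return (characters, words, sentences) for a raw text string."""
    # Words: runs of word characters, with each punctuation mark as its own token
    words = re.findall(r"\w+|[^\w\s]", raw)
    # Sentences: split after ., ! or ? when followed by whitespace
    sents = [s for s in re.split(r"(?<=[.!?])\s+", raw.strip()) if s]
    return len(raw), len(words), len(sents)


raw = "NLTK is fun. It is written in Python!"  # toy text for illustration
n_chars, n_words, n_sents = corpus_counts(raw)
print(n_chars, n_words, n_sents)
```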



Continued...


     Plotting a conditional frequency distribution
          >>> import nltk
          >>> text = "sreejith is talking about NLTK"
          >>> words = text.split()
          >>> big = nltk.bigrams(words)
          >>> gd = nltk.ConditionalFreqDist(big)
          >>> gd.plot()
     Tabulate the CFD
          >>> gd.tabulate()
     Plot a frequency distribution
          >>> from nltk.book import text1
          >>> fdist = nltk.FreqDist(text1)
          >>> fdist.plot(50, cumulative=True)




Normalizing Text



  Stemming
  Stemming is the process of reducing inflected (or sometimes derived)
  words to their stem, base, or root form, generally a written word form
       >>> import nltk
       >>> porter = nltk.PorterStemmer()
       >>> word = 'running'
       >>> porter.stem(word)

       >>> lancaster = nltk.LancasterStemmer()
       >>> lancaster.stem(word)
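
  To show the basic idea of suffix stripping, here is a deliberately naive
  stemmer in plain Python. The suffix list and the name naive_stem are
  assumptions for this sketch; it is far cruder than the Porter or
  Lancaster algorithms, which apply many more rules.

```python
SUFFIXES = ("ing", "ed", "ly", "es", "s")  # illustrative list, not exhaustive


def naive_stem(word, min_stem=3):
    """Strip the first matching suffix, keeping at least `min_stem` characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[:-len(suffix)]
    return word


for w in ("running", "walked", "cats", "is"):
    print(w, "->", naive_stem(w))
```

  Note that "running" becomes "runn" here, which is not a real word. Real
  stemmers add rules for cases such as doubled consonants.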




Normalizing Text




  Lemmatization
  Like stemming, but also makes sure that the resulting form is a known
  word in a dictionary
      >>> wnl = nltk.WordNetLemmatizer()
      >>> wnl.lemmatize(word)
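
  The "known word in a dictionary" check can be sketched in plain Python.
  Everything here is hypothetical: the tiny KNOWN_WORDS and IRREGULAR
  tables stand in for a real dictionary such as WordNet, and
  naive_lemmatize is not how WordNetLemmatizer works internally.

```python
KNOWN_WORDS = {"run", "walk", "cat", "wolf"}     # stand-in for a real dictionary
IRREGULAR = {"ran": "run", "wolves": "wolf"}     # forms that no suffix rule covers


def naive_lemmatize(word):
    """Return a dictionary form of `word` if one is found, else `word` unchanged."""
    if word in IRREGULAR:
        return IRREGULAR[word]
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix):
            candidate = word[:-len(suffix)]
            # Also try dropping a doubled final consonant: "runn" -> "run"
            for form in (candidate, candidate[:-1]):
                if form in KNOWN_WORDS:
                    return form
    return word


for w in ("running", "walked", "wolves", "zzzing"):
    print(w, "->", naive_lemmatize(w))
```

  Unlike a plain stemmer, this never returns an invented form such as
  "runn": it returns either a dictionary word or the input unchanged.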




POS Tagging




  POS Tagging
  The process of classifying words into their parts of speech and labeling
  them accordingly is known as part-of-speech tagging, or POS tagging
       >>> text = nltk.word_tokenize("we are attending "
       ...            "FOSS meet at NIC calicut")
       >>> nltk.pos_tag(text)
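
  To illustrate what a tagger returns, here is a toy tagger in plain Python
  that looks words up in a small lexicon and falls back to spelling
  patterns. LEXICON, PATTERNS, and the function names are made up for this
  sketch; nltk.pos_tag() relies on a trained model and will usually give
  different, better tags.

```python
import re

# Hypothetical mini-lexicon and fallback patterns, for illustration only
LEXICON = {"we": "PRP", "are": "VBP", "at": "IN", "the": "DT"}
PATTERNS = [
    (r".*ing$", "VBG"),   # gerunds
    (r".*ed$", "VBD"),    # simple past
    (r"^[A-Z].*", "NNP"),  # capitalized words treated as proper nouns
]


def guess_tag(token):
    """Tag one token: lexicon first, then patterns, then default to noun."""
    tag = LEXICON.get(token.lower())
    if tag is not None:
        return tag
    for pattern, pattern_tag in PATTERNS:
        if re.match(pattern, token):
            return pattern_tag
    return "NN"


def simple_tag(tokens):
    """Return a list of (token, tag) pairs, the same shape a POS tagger produces."""
    return [(tok, guess_tag(tok)) for tok in tokens]


tagged = simple_tag("we are attending FOSS meet at NIC calicut".split())
print(tagged)
```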




Parsing



  Sentence Parsing
  Analyzing sentence structure and creating a parse tree

      >>> sentence = [("the", "DT"), ("little", "JJ"),
          ("yellow", "JJ"),("dog", "NN"), ("barked", "VBD"),
          ("at", "IN"), ("the", "DT"), ("cat", "NN")]
      >>> grammar = "NP: {<DT>?<JJ>*<NN>}"
      >>> cp = nltk.RegexpParser(grammar)
      >>> result = cp.parse(sentence)
      >>> print(result)
      >>> result.draw()
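
Besides printing or drawing the tree, the chunks can be pulled out programmatically. The sketch below reuses the sentence and grammar from the slide and assumes NLTK 3, where a node's label is read with label(); the variable name noun_phrases is introduced here for illustration.

```python
import nltk

sentence = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"),
            ("dog", "NN"), ("barked", "VBD"), ("at", "IN"),
            ("the", "DT"), ("cat", "NN")]

# NP chunk: optional determiner, any number of adjectives, then a noun.
cp = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN>}")
tree = cp.parse(sentence)

# Collect the words of every NP subtree.
noun_phrases = [
    " ".join(word for word, tag in subtree.leaves())
    for subtree in tree.subtrees(filter=lambda t: t.label() == "NP")
]
print(noun_phrases)
```

Unlike result.draw(), this does not open a Tkinter window, so it also works on a machine without a display.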




Machine Translation



  Babelizer Shell
  Translating a sentence from its source language to a specified language.
  NLTK provides the babelize shell for this.
      >>> babelize_shell()
      Babel> hello how are you?
      Babel> german
      Babel> run


      Or just try Google Translate or Yahoo! Babel Fish




What can you do?




     Contribute to NLTK
     Google Summer of Code (GSoC)
     NLP training
     Real-time research




Reference




     Steven Bird, Edward Loper and Ewan Klein
     Natural Language Processing with Python
     Jacob Perkins
     Python Text Processing with NLTK 2.0 Cookbook
     http://www.nltk.org




Questions




And finally...




                                                         Sreejith.S



Introduction to NLTK

  • 1. Getting Started with NLTK An Introduction to NLTK Sreejith S srssreejith@gmail.com @tweet2sree FOSSMeet 2011,NIC Calicut 06 February 2011 Sreejith S Getting Started with NLTK
  • 2. Just a word about me !! Working in Natural Language Processing (NLP), Machine Learning, Text Mining Active member of ilugcbe , http://ilugcbe.techstud.org Works for 365Media Pvt. Ltd. Coimbatore India. @tweet2sree , srssreejith@gmail.com Sreejith S Getting Started with NLTK
  • 3. Introduction - NLP Natural Language Processing Sreejith S Getting Started with NLTK
  • 4. Introduction - NLP Natural Language Processing NLP is an inter-disciplinary subject Sreejith S Getting Started with NLTK
  • 5. Introduction - NLP Natural Language Processing NLP is an inter-disciplinary subject Computer Science Sreejith S Getting Started with NLTK
  • 6. Introduction - NLP Natural Language Processing NLP is an inter-disciplinary subject Computer Science Linguistics Sreejith S Getting Started with NLTK
  • 7. Introduction - NLP Natural Language Processing NLP is an inter-disciplinary subject Computer Science Linguistics Statistics etc... Sreejith S Getting Started with NLTK
  • 8. Introduction - NLP Natural Language Processing NLP is an inter-disciplinary subject Computer Science Linguistics Statistics etc... NLP is a sub field of Artificial Intelligence Sreejith S Getting Started with NLTK
  • 9. Introduction - NLP Natural Language Processing NLP is an inter-disciplinary subject Computer Science Linguistics Statistics etc... NLP is a sub field of Artificial Intelligence NLP - Any kind of computer manipulation of natural language. Sreejith S Getting Started with NLTK
  • 10. Introduction - NLP Natural Language Processing NLP is an inter-disciplinary subject Computer Science Linguistics Statistics etc... NLP is a sub field of Artificial Intelligence NLP - Any kind of computer manipulation of natural language. It is a rapidly developing field of study Sreejith S Getting Started with NLTK
  • 11. Introduction - NLP Natural Language Processing NLP is an inter-disciplinary subject Computer Science Linguistics Statistics etc... NLP is a sub field of Artificial Intelligence NLP - Any kind of computer manipulation of natural language. It is a rapidly developing field of study Everyday applications of NLP Sreejith S Getting Started with NLTK
  • 12. Introduction - NLP Natural Language Processing NLP is an inter-disciplinary subject Computer Science Linguistics Statistics etc... NLP is a sub field of Artificial Intelligence NLP - Any kind of computer manipulation of natural language. It is a rapidly developing field of study Everyday applications of NLP Handwriting recognition,Machine translation,Question-answering systems,Spell checkers,Grammer checkers etc... Sreejith S Getting Started with NLTK
  • 13. Natural Language Toolkit (NLTK) A collection of Python programs, modules, data set and tutorial to support research and development in Natural Language Processing (NLP) Sreejith S Getting Started with NLTK
  • 14. Natural Language Toolkit (NLTK) A collection of Python programs, modules, data set and tutorial to support research and development in Natural Language Processing (NLP) Written by Steven Bird, Edvard Loper and Ewan Klien Sreejith S Getting Started with NLTK
  • 15. Natural Language Toolkit (NLTK) A collection of Python programs, modules, data set and tutorial to support research and development in Natural Language Processing (NLP) Written by Steven Bird, Edvard Loper and Ewan Klien NLTK is Sreejith S Getting Started with NLTK
  • 16. Natural Language Toolkit (NLTK) A collection of Python programs, modules, data set and tutorial to support research and development in Natural Language Processing (NLP) Written by Steven Bird, Edvard Loper and Ewan Klien NLTK is Free and Open source Sreejith S Getting Started with NLTK
  • 17. Natural Language Toolkit (NLTK) A collection of Python programs, modules, data set and tutorial to support research and development in Natural Language Processing (NLP) Written by Steven Bird, Edvard Loper and Ewan Klien NLTK is Free and Open source Easy to use Sreejith S Getting Started with NLTK
  • 18. Natural Language Toolkit (NLTK) A collection of Python programs, modules, data set and tutorial to support research and development in Natural Language Processing (NLP) Written by Steven Bird, Edvard Loper and Ewan Klien NLTK is Free and Open source Easy to use Modular Sreejith S Getting Started with NLTK
  • 19. Natural Language Toolkit (NLTK) A collection of Python programs, modules, data set and tutorial to support research and development in Natural Language Processing (NLP) Written by Steven Bird, Edvard Loper and Ewan Klien NLTK is Free and Open source Easy to use Modular Well documented Sreejith S Getting Started with NLTK
  • 20. Natural Language Toolkit (NLTK) A collection of Python programs, modules, data set and tutorial to support research and development in Natural Language Processing (NLP) Written by Steven Bird, Edvard Loper and Ewan Klien NLTK is Free and Open source Easy to use Modular Well documented Simple and extensible Sreejith S Getting Started with NLTK
  • 21. Natural Language Toolkit (NLTK)
      A collection of Python programs, modules, data sets and tutorials to support research and development in Natural Language Processing (NLP)
      Written by Steven Bird, Edward Loper and Ewan Klein
      NLTK is
          Free and open source
          Easy to use
          Modular
          Well documented
          Simple and extensible
      http://www.nltk.org
  • 25. What You Will Learn
      How simple programs can help you manipulate and analyze language data, and how to write these programs
      How key concepts from NLP and linguistics are used to describe and analyze language
      How data structures and algorithms are used in NLP
      How language data is stored in standard formats, and how data can be used to evaluate the performance of NLP techniques
  • 35. Installation of NLTK
      Make sure that Python 2.4, 2.5 or 2.6 is available on your system
      Install the Python Tkinter package
      Install NumPy, Matplotlib, Prover9, MaltParser and MegaM
      Download NLTK and install it
      If you are installing NLTK from source
          Download http://nltk.googlecode.com/files/nltk-2.0b9.zip
          Unzip it; this creates the folder nltk-2.0b9
          Open a terminal, cd into this folder and, as superuser, run: python setup.py install
      To install the data, start the Python interpreter
          >>> import nltk
          >>> nltk.download()
      Now you are ready to play with NLTK!
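A quick way to confirm that the installation steps worked is to ask the interpreter whether it can find the package. The sketch below is an illustration only: it assumes a Python 3 interpreter (the slides target Python 2.4 to 2.6), and the helper name `nltk_is_installed` is made up for this example.

```python
import importlib.util
import sys


def nltk_is_installed():
    """Return True if a package named `nltk` is visible to this interpreter."""
    return importlib.util.find_spec("nltk") is not None


print("Python version:", sys.version.split()[0])
if nltk_is_installed():
    print("NLTK is importable; run nltk.download() to fetch the data")
else:
    print("NLTK was not found; install it before continuing")
```

The check only looks the package up and does not import it, so it is safe to run before any corpus data has been downloaded.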
  • 48. NLTK Modules
      NLTK Modules                   Functionality
      nltk.corpus                    Corpus
      nltk.tokenize, nltk.stem       Tokenizers, stemmers
      nltk.collocations              t-test, chi-squared, mutual information
      nltk.tag                       n-gram, backoff, Brill, HMM, TnT
      nltk.classify, nltk.cluster    Decision tree, naive Bayes, k-means
      nltk.chunk                     Regex, n-gram, named entity
      nltk.parse                     Parsing
      nltk.sem, nltk.inference       Semantic interpretation
      nltk.metrics                   Evaluation metrics
      nltk.probability               Probability and estimation
      nltk.app, nltk.chat            Applications
  • 56. Let us start the game
      To access the data for working through the examples in the book, start the Python interpreter
      Some basic workouts from the book
      Concordance
          >>> from nltk.book import *
          >>> text1.concordance("monstrous")
      Similar
          >>> text1.similar("monstrous")
      Dispersion plot (positional information)
          >>> text4.dispersion_plot(["citizens", "democracy", "freedom", "duties", "America"])
          >>> text4.dispersion_plot(["and", "to", "of", "with", "the"])
      What does the second plot show? Why?
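To make the two ideas on this slide concrete, here is a plain-Python sketch of what a concordance and a dispersion plot are built from. It is not NLTK's implementation: the functions `concordance` and `offsets` and the sample sentence are hypothetical, and Python 3 is assumed.

```python
def concordance(tokens, word, width=3):
    """Return each occurrence of `word` with up to `width` tokens of context per side."""
    target = word.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            lines.append(" ".join(left + [tok] + right))
    return lines


def offsets(tokens, word):
    """Return the token positions of `word`; a dispersion plot draws one mark per position."""
    return [i for i, tok in enumerate(tokens) if tok == word]


# Hypothetical sample text, not one of the nltk.book texts
tokens = "the whale was a monstrous size and the sea was monstrous calm".split()
for line in concordance(tokens, "monstrous"):
    print(line)
print(offsets(tokens, "the"))
```

Very common function words such as "the" yield offsets throughout a text, which is one way to think about the question the slide asks about the second plot.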
  • 65. Continued...
      Some basic workouts from the book
      Generate
          >>> text3.generate()
      Counting vocabulary
          >>> len(text3)
      List of distinct words, sorted in dictionary order
          >>> sorted(set(text3))
      Count the occurrences of a particular word in a text
          >>> text3.count("and")
      What percentage of the text is taken up by a specific word
          >>> 100.0 * text3.count("and") / len(text3)
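The counting steps above work on any list of tokens, so they can be tried without the book data. The following is a small sketch under that assumption; the token list is a made-up stand-in for text3, and Python 3 is assumed (where `/` is true division, so no `100.0` is needed).

```python
# Hypothetical token list standing in for text3
tokens = "in the beginning god created the heaven and the earth".split()

total = len(tokens)                    # number of tokens, repeats included
vocabulary = sorted(set(tokens))       # distinct tokens in dictionary order
count_the = tokens.count("the")        # occurrences of one word
percent_the = 100 * count_the / total  # share of the text taken up by that word

print(total, len(vocabulary), count_the, percent_the)
```

Note that `len(tokens)` counts every token, while `len(vocabulary)` counts distinct ones; the ratio between the two is a rough measure of how varied the text is.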
  • 71. Collocation & Bigram
      Collocation
          A collocation is a sequence of words that occur together unusually often, e.g. "red wine" or "strong tea"; "strong computer" is not a collocation
          >>> text4.collocations()
      Bigrams
          List of word pairs
          >>> text = "sreejith is talking about NLTK"
          >>> wordlist = text.split()
          >>> bigrams(wordlist)
          What will happen if I do this instead?
          >>> bigrams(text)
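The question on this slide can be explored with a few lines of plain Python. This sketch defines its own `bigrams` helper (a stand-in written for this example, not NLTK's function) and assumes Python 3; it pairs each item of a sequence with the item that follows it.

```python
def bigrams(sequence):
    """Return the list of adjacent pairs in `sequence`."""
    items = list(sequence)
    return list(zip(items, items[1:]))


text = "sreejith is talking about NLTK"
word_pairs = bigrams(text.split())  # sequence of words -> pairs of words
char_pairs = bigrams(text)          # a string is a sequence of characters

print(word_pairs)
print(char_pairs[:4])
```

Because a string is itself a sequence, passing the unsplit text yields pairs of characters (spaces included) rather than pairs of words.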
  • 75. Work with our own data
      Populate our own corpus with NLTK and analyse it
          >>> from nltk.corpus import PlaintextCorpusReader as ptr
          >>> corpus = '/home/developer/Desktop/Sreejith'
          >>> wordlist = ptr(corpus, '.*')
          >>> wordlist.fileids()
      Let us find out how to count the number of characters, words and sentences in the corpus
          >>> for fid in wordlist.fileids(): print len(wordlist.raw(fid))
          >>> for fid in wordlist.fileids(): print len(wordlist.words(fid))
          >>> for fid in wordlist.fileids(): print len(wordlist.sents(fid))
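For readers who want to see what the three counts mean before pointing NLTK at a real folder, here is a standard-library sketch. It is only an approximation: `corpus_counts` is a hypothetical helper, the word and sentence rules are deliberately naive regular expressions rather than NLTK's tokenizers (so the numbers will differ from those of PlaintextCorpusReader), and Python 3 is assumed.

```python
import os
import re
import tempfile


def corpus_counts(root):
    """Map each file name in `root` to (characters, words, sentences)."""
    counts = {}
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        if not os.path.isfile(path):
            continue
        with open(path, encoding="utf-8") as handle:
            raw = handle.read()
        words = re.findall(r"\w+", raw)
        # Naive rule: a sentence ends at '.', '!' or '?'
        sentences = [s for s in re.split(r"[.!?]+", raw) if s.strip()]
        counts[name] = (len(raw), len(words), len(sentences))
    return counts


# Build a throwaway corpus directory so the sketch runs anywhere
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "sample.txt"), "w", encoding="utf-8") as handle:
        handle.write("NLTK is fun. We like it!")
    result = corpus_counts(root)

print(result)
```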
  • 81. Continued...
      Plotting a conditional frequency distribution
          >>> import nltk
          >>> from nltk.book import *
          >>> text = "sreejith is talking about NLTK"
          >>> words = text.split()
          >>> big = bigrams(words)
          >>> gd = nltk.ConditionalFreqDist(big)
          >>> gd.plot()
      Tabulate the CFD
          >>> gd.tabulate()
      Plot a frequency distribution
          >>> fdist = FreqDist(text1)
          >>> fdist.plot(50, cumulative=True)
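A conditional frequency distribution built from bigrams simply counts, for each first word, which words follow it. The sketch below shows that idea with the standard library only; `conditional_freq` and the sample sentence are invented for this example, and Python 3 is assumed. A sentence with a repeated word is used because in the slide's sentence every pair occurs exactly once.

```python
from collections import Counter, defaultdict


def conditional_freq(pairs):
    """Count, for each condition, how often each event follows it."""
    table = defaultdict(Counter)
    for condition, event in pairs:
        table[condition][event] += 1
    return table


# Hypothetical sample sentence with a repeated word
words = "the cat saw the dog and the cat ran".split()
pairs = list(zip(words, words[1:]))
cfd = conditional_freq(pairs)

print(cfd["the"].most_common())
```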
  • 84. Normalizing Text
      Stemming
          Stemming is the process of reducing inflected (or sometimes derived) words to their stem, base or root form, generally a written word form
          >>> porter = nltk.PorterStemmer()
          >>> word = 'running'
          >>> porter.stem(word)
          >>> lancaster = nltk.LancasterStemmer()
          >>> lancaster.stem(word)
  • 87. Normalizing Text
      Lemmatization
          Stemming, plus making sure that the resulting form is a known word in a dictionary
          >>> wnl = nltk.WordNetLemmatizer()
          >>> wnl.lemmatize(word)
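The difference between stemming and lemmatization can be illustrated with a toy example. This is naive suffix stripping plus a dictionary check, not the Porter, Lancaster or WordNet algorithms; the suffix list, the mini-dictionary and both function names are assumptions made up for this sketch, and Python 3 is assumed.

```python
# Hypothetical mini-dictionary; a real lemmatizer consults a resource such as WordNet
KNOWN_WORDS = {"run", "pony", "walk", "talk"}
SUFFIXES = ("ing", "ies", "ed", "s")


def naive_stem(word):
    """Strip the first matching suffix, whether or not the result is a real word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word


def naive_lemmatize(word):
    """Return a candidate base form only if it appears in KNOWN_WORDS."""
    stem = naive_stem(word)
    candidates = [word, stem, stem + "y"]
    if len(stem) > 1 and stem[-1] == stem[-2]:
        candidates.append(stem[:-1])  # undo consonant doubling: "runn" -> "run"
    for candidate in candidates:
        if candidate in KNOWN_WORDS:
            return candidate
    return word  # no dictionary form found: leave the word unchanged


for word in ("running", "ponies", "walked"):
    print(word, naive_stem(word), naive_lemmatize(word))
```

The stemmer happily returns non-words such as "runn" and "pon", while the lemmatizer only returns forms it can find in its dictionary.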
  • 90. POS Tagging
      POS tagging
          The process of classifying words into their parts of speech and labeling them accordingly is known as part-of-speech tagging, or POS tagging
          >>> text = nltk.word_tokenize("we are attending FOSS meet at NIC calicut")
          >>> nltk.pos_tag(text)
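To show the shape of a tagger's output without needing a trained model, here is a toy lookup tagger with a default-tag fallback. It is not how nltk.pos_tag works (that relies on a trained statistical model); the lexicon, the function name `lookup_tag` and the choice of "NN" as default are assumptions for illustration, and Python 3 is assumed.

```python
# Hypothetical hand-made lexicon using Penn Treebank style tags
LEXICON = {"we": "PRP", "are": "VBP", "attending": "VBG", "at": "IN", "the": "DT"}


def lookup_tag(tokens, lexicon=LEXICON, default="NN"):
    """Tag each token from the lexicon, backing off to `default` for unknown words."""
    return [(tok, lexicon.get(tok.lower(), default)) for tok in tokens]


tokens = "we are attending FOSS meet at NIC calicut".split()
tagged = lookup_tag(tokens)
print(tagged)
```

The result is a list of (word, tag) pairs, the same structure that the chunking example on the parsing slide takes as input.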
  • 93. Parsing
      Sentence parsing
          Analyzing sentence structure and creating a parse tree
          >>> sentence = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"), ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]
          >>> grammar = "NP: {<DT>?<JJ>*<NN>}"
          >>> cp = nltk.RegexpParser(grammar)
          >>> result = cp.parse(sentence)
          >>> print result
          >>> result.draw()
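The grammar on this slide says that a noun phrase (NP) is an optional determiner, any number of adjectives, then a noun. The sketch below applies the same pattern with Python's `re` module over a string of tags. It is a simplified illustration, not NLTK's RegexpParser: the function `chunk_noun_phrases` is hypothetical, it returns flat word lists instead of a tree, and Python 3 is assumed.

```python
import re

# The slide's rule <DT>?<JJ>*<NN>, written as an ordinary regular expression
NP_PATTERN = re.compile(r"(?:<DT>)?(?:<JJ>)*<NN>")


def chunk_noun_phrases(tagged):
    """Return the word groups whose tag sequence matches DT? JJ* NN."""
    tag_string = ""
    starts = []  # character offset at which each token's tag begins
    for _, tag in tagged:
        starts.append(len(tag_string))
        tag_string += "<%s>" % tag
    chunks = []
    for match in NP_PATTERN.finditer(tag_string):
        first = starts.index(match.start())       # token index of the first tag
        last = first + match.group().count("<")   # one '<' per matched tag
        chunks.append([word for word, _ in tagged[first:last]])
    return chunks


sentence = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
            ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]
print(chunk_noun_phrases(sentence))
```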
  • 97. Machine Translation
      Babelizer shell
          Translating a sentence from its source language to a specified target language; NLTK provides the babelize shell
          >>> babelize_shell()
          Babel> hello how are you?
          Babel> german
          Babel> run
      Also try Google Translate and Yahoo! Babel Fish
  • 98. What can you do?
      Contribute to NLTK
      GSoC
      NLP training
      Real-time research
  • 99. Reference
      Steven Bird, Edward Loper and Ewan Klein, Natural Language Processing with Python
      Jacob Perkins, Python Text Processing with NLTK 2.0 Cookbook
      http://www.nltk.org
  • 100. Questions
  • 101. And finally... Sreejith.S