Text Analytics With
NLTK
Girish Khanzode
Contents
• Tokenization
• Corpora
• Frequency Distribution
• Stylistics
• Sentence Tokenization
• WordNet
• Stemming
• Lemmatization
• Part of Speech Tagging
• Tagging Methods
• Unigram Tagging
• N-gram Tagging
• Chunking – Shallow Parsing
• Entity Recognition
• Supervised Classification
• Document Classification
• Hidden Markov Models - HMM
• References
NLTK
• A set of Python modules to carry out many common natural language
tasks.
• Basic classes to represent data for NLP
• Infrastructure to build NLP programs in Python
• Python interface to over 50 corpora and lexical resources
• Focus on Machine Learning with specific domain knowledge
• Free and Open Source
NLTK
• Numpy and Scipy under the hood
• Fast and Formal
• Standard interfaces for tokenization, part-of-speech tagging, syntactic parsing
and text classification
• Windows:
>>> import nltk
>>> nltk.download('all')
• Linux
$ pip install --upgrade nltk
NLTK - Top-Level Organization
• Organized as a flat hierarchy of packages and modules
• Each module provides the tools necessary to address a specific task
• Modules have two types of classes
– Data-oriented classes
• Used to represent information relevant to natural language processing.
– Task-oriented classes
• Encapsulate the resources and methods needed to perform a specific task.
Modules
• Token - classes for representing and processing individual elements of
text, such as words and sentences
• Probability - classes for representing and processing probabilistic
information
• Tree - classes for representing and processing hierarchical information
over text
• Cfg - classes for representing and processing context free grammars
Modules
• Tagger - tagging each word with a part-of-speech, a sense, etc
• Parser - building trees over text (includes chart, chunk and probabilistic
parsers)
• Classifier - classify text into categories (includes feature,
featureSelection, maxent, naivebayes)
• Draw - visualize NLP structures and processes
• Corpus - access (tagged) corpus data
Tokenization
• Simplest way to represent a text is with a single string
• Difficult to process text in this format
• Convenient to work with a list of tokens
• Task of converting a text from a single string to a list of tokens is known as
tokenization
• The most basic natural language processing technique
• Example - Word Tokenization
Input : “Hey there, How are you all?”
Output : “Hey”, “there,”, “How”, “are”, “you”, “all?”
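• Note the output above comes from plain whitespace splitting, so punctuation stays attached; NLTK's tokenizer also splits punctuation off into separate tokens. A minimal sketch of the difference:
>>> import nltk
>>> text = "Hey there, How are you all?"
>>> text.split()                # whitespace split - punctuation stays attached
['Hey', 'there,', 'How', 'are', 'you', 'all?']
>>> nltk.word_tokenize(text)    # NLTK tokenizer - punctuation becomes tokens
['Hey', 'there', ',', 'How', 'are', 'you', 'all', '?']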
Tokens and Types
• The term word can be used in two different ways
– To refer to an individual occurrence of a word
– To refer to an abstract vocabulary item
• For example, the sentence “my dog likes his dog” contains five occurrences of
words, but four vocabulary items
Tokens and Types
• To avoid confusion use more precise terminology
– Word token - an occurrence of a word
– WordType - a vocabulary item
• Tokens constructed from their types using the Token constructor
• Token member functions - type and loc
Tokens and Types
>>> from nltk.token import *
>>> my_word_type = 'dog'
'dog'
>>> my_word_token = Token(my_word_type)
'dog'@[?]
Text Locations
• Text location @ [s:e] specifies a region of a text
– s is the start index
– e is the end index
• Specifies the text beginning at s, and including everything up to (but not
including) the text at e
• Consistent with Python slice
Text Locations
• Think of indices as appearing between elements
– I saw a man
– 0 1 2 3 4
• Shorthand notation when location width = 1
• Indices based on different units
– character
– word
– sentence
Text Locations
• Locations tagged with sources
– files, or other text locations (e.g. the first word of the first sentence in the file)
• Location member functions
– start
– end
– unit
– source
Text Corpus
• Large collection of text
• Concentrate on a topic or open domain
• May be raw text or annotated / categorized
Corpora
• Gutenberg - selection of e-books from Project Gutenberg
• Webtext - forum discussions, reviews, movie script
• nps_chat - anonymized chats
• Brown - 1 million word corpus, categorized by genre
• Reuters - news corpus
• Inaugural - inaugural addresses of presidents
• Udhr - multilingual corpus
Accessing Corpora
• Corpora on disk - text files
• NLTK provides Python modules / functions / classes that allow for
accessing the corpora in a convenient way
• It is quite an effort to write functions that read in a corpus, especially when it comes with annotations
• The task of reading in a corpus is needed in many NLP projects
Accessing Corpora
• # tell Python we want to use the Gutenberg corpus
• from nltk.corpus import gutenberg
• # which files are in this corpus?
• print(gutenberg.fileids())
• >>> ['austen-emma.txt', 'austen-persuasion.txt', 'austen-sense.txt', 'bible-
kjv.txt', ...]
Accessing Corpora - Raw Text
• # get the raw text of a corpus = one string
• >>> emmaText = gutenberg.raw("austen-emma.txt")
• # print the first 289 characters of the text
• >>> emmaText[:289]
• '[Emma by Jane Austen 1816]\n\nVOLUME I\n\nCHAPTER I\n\n\nEmma Woodhouse, handsome, clever, and rich, with a comfortable home\nand happy disposition, seemed to unite some of the best blessings\nof existence; and had lived nearly twenty-one years in the world\nwith very little to distress or vex her.'
Accessing Corpora - Words
• # get the words of a corpus as a list
• emmaWords = gutenberg.words("austen-emma.txt")
• # print the first 30 words of the text
• >>> print(emmaWords[:30])
• ['[', 'Emma', 'by', 'Jane', 'Austen', '1816', ']', 'VOLUME', 'I', 'CHAPTER', 'I', 'Emma', 'Woodhouse', 'handsome', ',', 'clever', ',', 'and', 'rich', ',', 'with', 'a', 'comfortable', 'home', 'and', 'happy', 'disposition', ',', 'seemed']
Accessing Corpora: Sentences
• # get the sentences of a corpus as a list of lists - one list of words per sentence
• >>> senseSents = gutenberg.sents("austen-sense.txt")
• # print out the first four sentences
• >>> print(senseSents[:4])
• [['[', 'Sense', 'and', 'Sensibility', 'by', 'Jane', 'Austen', '1811', ']'], ['CHAPTER', '1'], ['The', 'family', 'of', 'Dashwood', 'had', 'long', 'been', 'settled', 'in', 'Sussex', '.'], ['Their', 'estate', 'was', 'large', ',', 'and', 'their', 'residence', 'was', 'at', ...]]
Counting
• Use Inaugural Address text.
• >>> from nltk.book import text4
• Counting vocabulary: the length of a text from start to finish
• >>> len(text4)
• 145735
• How many distinct words?
• >>> len(set(text4)) #types
• 9754
• Richness of the text.
• >>> len(text4) / len(set(text4))
• 14.941049825712529
• >>> 100 * text4.count('democracy') /
len(text4)
• 0.03568120218204275
Positions of a Word in Text
Lexical Dispersion Plot
List Elements Operations
• List comprehension
– >>> len(set([word.lower() for word in
text4 if len(word)>5]))
– 7339
– >>> [w.upper() for w in text4[0:5]]
– ['FELLOW', '-', 'CITIZENS', 'OF', 'THE']
• Loops and conditionals
• >>> for word in text4[0:5]:
        if len(word) < 5 and word.endswith('e'):
            print word, 'is short and ends with e'
        elif word.istitle():
            print word, 'is a titlecase word'
        else:
            print word, 'is just another word'
Brown Corpus
• First million-word electronic corpus of English
• Created at Brown University in 1961
• Text from 500 sources, categorized by genre
• >>> from nltk.corpus import brown
• >>> print(brown.categories())
• ['adventure', 'belles_lettres', 'editorial', 'fiction', 'government', 'hobbies', 'humor', 'learned', 'lore', 'mystery', 'news', 'religion', 'reviews', 'romance', 'science_fiction']
Brown Corpus – Retrieve Words by Category
• >>> from nltk.corpus import brown
• >>> news_words = brown.words(categories = "news")
• >>> print(news_words)
• ['The', 'Fulton', 'County', 'Grand', 'Jury', 'said', 'Friday', 'an', 'investigation', 'of', "Atlanta's", 'recent', 'primary', 'election', 'produced', ...]
Brown Corpus – Retrieve Words by Category
• >>> adv_words = brown.words(categories = "adventure")
• >>> print(adv_words)
• ['Dan', 'Morgan', 'told', 'himself', 'he', 'would', 'forget', 'Ann', 'Turner', '.', ...]
• >>> reli_words = brown.words(categories = "religion")
• >>> print(reli_words)
• ['As', 'a', 'result', ',', 'although', 'we', 'still', 'make', 'use', 'of', 'this', 'distinction', ',',...]
Frequency Distribution
• Records how often each item occurs in a list of words
• Frequency distribution over words
• Basically a dictionary with some extra functionality
• init creates a frequency distribution from a list of words
Frequency Distribution
• >>>news_words = brown.words(categories = "news")
• >>>fdist = nltk.FreqDist(news_words)
• >>>print("shoe:", fdist["shoe"])
• >>>print("the: ", fdist["the"])
Frequency Distribution
• # show the 10 most frequent words & frequencies
• >>>fdist.tabulate(10)
• the , . of and to a in for The
• 5580 5188 4030 2849 2146 2116 1993 1893 943 806
Plot Frequency Distribution
• Create a plot of the 10 most frequent words
• >>>fdist.plot(10)
Stylistics
• Systematic differences between genres
• Brown corpus with its categories is a convenient resource
• Is there a difference in how the modal verbs (can, could, may, might,
must, will) are used in the genres?
• Let us look at the frequency distribution
Stylistics
• from nltk import FreqDist
• # Define modals of interest
• >>>modals = ["may", "could", "will"]
• # Define genres of interest
• >>>genres = ["adventure", "news",
"government", "romance"]
• # count how often they occur in the genres
of interest
• >>>for g in genres:
...     words = brown.words(categories=g)
...     fdist = FreqDist([w.lower() for w in words if w.lower() in modals])
...     print g, fdist
Conditional Frequency Distributions
• >>>from nltk import ConditionalFreqDist
• >>>cfdist = ConditionalFreqDist()
• >>>for g in genres:
        words = brown.words(categories=g)
        for w in words:
            if w.lower() in modals:
                cfdist[g].inc(w.lower())
• >>> cfdist.tabulate()
            could  may  will
adventure     154    7    51
government     38  179   244
news           87   93   389
romance       195   11    49
• >>>cfdist.plot(title="Modals in various Genres")
Conditional Frequency Distributions
Processing Raw Text
• Assume you have a text file on your disk...
• # Read the text
• >>> path = "holmes.txt"
• >>> f = open(path)
• >>> rawText = f.read()
• >>> f.close()
• >>> print(rawText[:165])
• THE ADVENTURES OF SHERLOCK HOLMES
• By
• SIR ARTHUR CONAN DOYLE
I. A Scandal in Bohemia
II. The Red-headed League
Sentence Tokenization
• # Split the text up into sentences
• >>> sents = nltk.sent_tokenize(rawText)
• >>> print(sents[20:22])
• ['I had seen little of Holmes lately.', 'My marriage had drifted us\r\naway from each other.', ...]
Word Tokenization
• >>># Tokenize the sentences using nltk
• >>>tokens = []
• >>>for sent in sents:
tokens += nltk.word_tokenize(sent)
• >>>print(tokens[300:350])
• ['such', 'as', 'his', '.', 'And', 'yet', 'there', 'was', 'but', 'one', 'woman', 'to', 'him', ',', 'and', 'that', 'woman', 'was', 'the', 'late', 'Irene', 'Adler', ',', 'of', 'dubious', 'and', 'questionable', 'memory', ...]
Creating a Text Object
• Using a list of tokens, we can create an nltk.Text object for a document.
• Collocations = terms that occur together unusually often
• Concordance view = shows the contexts in which a token occurs
Creating a Text Object
• >>># Create a text object
• >>>text = nltk.Text(tokens)
• >>># Do stuff with the text object
• >>>print(text.collocations())
• Sherlock Holmes; said Holmes; St. Simon; Baker Street; Lord St.; St. Clair; Mr. Holmes; Hosmer Angel; Irene Adler; Miss Hunter; young lady; Briony Lodge; Stoke Moran; Neville St.; Miss Stoner; Scotland Yard; could see; Mr. Holmes.; Boscombe Pool; Mr. Rucastle
Concordance View
• >>>print(text.concordance("Irene"))
• >>>Building index...
• >>>Displaying 17 of 17 matches:
• to love for Irene Adler . All emotions , and that one
• was the late Irene Adler , of dubious and questionable
• dventuress , Irene Adler . The name is no doubt familia
• nd . " " And Irene Adler ? " " Threatens to send them t
• se , of Miss Irene Adler . " " Quite so ; but the seque
• And what of Irene Adler ? " I asked . " Oh , she has t
• tying up of Irene Adler , spinster , to Godfrey Norton
• ction . Miss Irene , or Madame , rather , returns from
• ...
Annotated Corpora
• Example - The/at Fulton/np-tl County/nn-tl Grand/jj-tl Jury/nn-tl said/vbd
Friday/nr an/at investigation/nn ...
• Some corpora come with annotations - POS tags, parse trees,...
• NLTK provides convenient access to these corpora (get the text + annotations)
• Treebank (e.g. the Penn Treebank): a collection of manually annotated, (dependency-)parsed sentences; can be used for training a statistical parser or for parser evaluation
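• For example, the Penn Treebank sample bundled with NLTK exposes both the tagged tokens and the hand-annotated parse trees (a sketch; assumes the treebank sample data has been downloaded):
>>> from nltk.corpus import treebank
>>> treebank.tagged_words()[:5]
[('Pierre', 'NNP'), ('Vinken', 'NNP'), (',', ','), ('61', 'CD'), ('years', 'NNS')]
>>> print(treebank.parsed_sents()[0])   # first hand-annotated parse tree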
WordNet
• Structured, semantically oriented English dictionary
• Synonyms, antonyms, hyponyms, hypernyms, depth of a synset, trees, entailments, etc.
• >>> from nltk.corpus import wordnet as wn
• >>> wn.synsets('motorcar')
• [Synset('car.n.01')]
• >>> wn.synset('car.n.01').lemma_names
• ['car', 'auto', 'automobile', 'machine', 'motorcar']
WordNet
• >>> wn.synset('car.n.01').definition
• 'a motor vehicle with four wheels; usually propelled by an internal combustion engine'
• >>> for synset in wn.synsets('car')[1:3]:
• ... print synset.lemma_names
• ['car', 'railcar', 'railway_car', 'railroad_car'] ['car', 'gondola']
• >>> wn.synset('walk.v.01').entailments()
• #Walking involves stepping
• [Synset('step.v.01')]
Getting Input Text - HTML
• >>> from urllib import urlopen
• >>> url = "http://www.bbc.co.uk/news/science-environment-21471908"
• >>> html = urlopen(url).read()
• >>> html[:60]
• '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN" "http'
• >>> raw = nltk.clean_html(html)
• >>> tokens = nltk.word_tokenize(raw)
• >>> tokens[:15]
• ['BBC', 'News', '-', 'Exoplanet', 'Kepler', '37b', 'is', 'tiniest', 'yet', '-', 'smaller', 'than', 'Mercury', 'Accessibility', 'links']
Getting Input Text - User
• >>> s = raw_input("Enter some text: ")
• Use your own files on disk
• >>> f = open('C:/Data/Files/UK_natl_2010_en_Lab.txt')
• >>> raw = f.read()
• >>> print raw[:100]
• #Foreword by Gordon Brown
• This General Election is fought as our troops are bravely fighting to def
Import Files as Corpus
• >>> from nltk.corpus import PlaintextCorpusReader
• >>> corpus_root = "C:/Data/Files/"
• >>> wordlists = PlaintextCorpusReader(corpus_root, '.*.txt')
• >>> wordlists.fileids()[:3]
• ['UK_natl_1987_en_Con.txt', 'UK_natl_1987_en_Lab.txt',
• 'UK_natl_1987_en_LibSDP.txt']
• >>> wordlists.words('UK_natl_2010_en_Lab.txt')
• ['#', 'Foreword', 'by', 'Gordon', 'Brown', '.', 'This', ...]
Stemming
• Strip off affixes
• >>>porter = nltk.PorterStemmer()
• >>>[porter.stem(t) for t in tokens]
• Porter stemmer lying - lie, women - women
• >>>lancaster = nltk.LancasterStemmer()
• >>>[lancaster.stem(t) for t in tokens]
• Lancaster stemmer lying - lying, women - wom
Lemmatization
• Removes affixes if in dictionary
• >>>wnl = nltk.WordNetLemmatizer()
• >>>[wnl.lemmatize(t) for t in tokens]
• lying - lying, women - woman
Write Output to File
• Save the sentence-split text to a new file
• >>>output_file = open('C:/Data/Files/output.txt', 'w')
• >>>words = set(sents)
• >>>for word in sorted(words):
        output_file.write(word + "\n")
• To write non-text data, first convert it to string - str()
• Avoid filenames that contain space characters or that are identical except for
case distinctions
Part of Speech Tagging
• POS Tagging - the process of classifying words into their parts of speech and labelling them accordingly
– Words grouped into classes, such as nouns, verbs, adjectives, and adverbs
• Parts of speech are also known as word classes or lexical categories
• The collection of tags used for a particular task is known as a tagset
Part of Speech Tagging
• NLTK tags text automatically
– Predicting the behaviour of previously unseen words
– Analyzing word usage in corpora
– Text-to-speech systems
– Powerful searches
– Classification
Tagging Methods
• Default tagger
• Regular expression tagger
• Unigram tagger
• N-gram taggers
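• The first two can be built by hand in a few lines - a minimal sketch:
>>> import nltk
>>> # default tagger - assigns the same tag to every token
>>> default_tagger = nltk.DefaultTagger('NN')
>>> default_tagger.tag(['the', 'quick', 'brown', 'fox'])
[('the', 'NN'), ('quick', 'NN'), ('brown', 'NN'), ('fox', 'NN')]
>>> # regular expression tagger - first matching pattern wins
>>> patterns = [(r'.*ing$', 'VBG'), (r'.*ed$', 'VBD'), (r'.*s$', 'NNS'), (r'.*', 'NN')]
>>> regexp_tagger = nltk.RegexpTagger(patterns)
>>> regexp_tagger.tag(['the', 'cats', 'purred'])
[('the', 'NN'), ('cats', 'NNS'), ('purred', 'VBD')]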
Tagging Methods
• Can be combined using a technique known as backoff
– when a more specialized model (such as a bigram tagger) cannot assign a tag
in a given context, we backoff to a more general model (such as a unigram
tagger)
• Taggers can be trained and evaluated using tagged corpora
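• A typical backoff chain over tagged Brown sentences (a sketch following the NLTK book; the 90/10 split keeps the test sentences unseen during training):
>>> import nltk
>>> from nltk.corpus import brown
>>> tagged = brown.tagged_sents(categories='news')
>>> size = int(len(tagged) * 0.9)
>>> train_sents, test_sents = tagged[:size], tagged[size:]
>>> t0 = nltk.DefaultTagger('NN')                    # most general
>>> t1 = nltk.UnigramTagger(train_sents, backoff=t0)
>>> t2 = nltk.BigramTagger(train_sents, backoff=t1)  # most specialized
>>> t2.evaluate(test_sents)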
Tagging Examples
• Some corpora already tagged
• >>> nltk.corpus.brown.tagged_words()
• [('The', 'AT'), ('Fulton', 'NP-TL'), ...]
• A simple example
• >>> text = nltk.word_tokenize("And now for something completely different")
• >>> nltk.pos_tag(text)
• [('And', 'CC'), ('now', 'RB'), ('for', 'IN'), ('something', 'NN'), ('completely', 'RB'), ('different', 'JJ')]
– CC is coordinating conjunction; RB is adverb; IN is preposition; NN is noun; JJ is adjective
– Lots of others - foreign term, verb tenses, “wh” determiner etc
Tagging Examples
• An example with homonyms
• >>> text = nltk.word_tokenize("They refuse to permit us to obtain the
refuse permit")
• >>> nltk.pos_tag(text)
• [('They', 'PRP'), ('refuse', 'VBP'), ('to', 'TO'), ('permit', 'VB'), ('us', 'PRP'), ('to', 'TO'), ('obtain', 'VB'), ('the', 'DT'), ('refuse', 'NN'), ('permit', 'NN')]
Unigram Tagging
• Unigram tagging - nltk.UnigramTagger()
– Assign the tag that is most likely for that particular token
– Train it specifying tagged sentence data as a parameter when we initialize the
tagger
– Separate training and testing data
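• A sketch of those steps, reusing the train/test split from the backoff example above:
>>> unigram_tagger = nltk.UnigramTagger(train_sents)   # train on tagged sentences
>>> unigram_tagger.tag(['the', 'jury', 'said'])
[('the', 'AT'), ('jury', 'NN'), ('said', 'VBD')]
>>> unigram_tagger.evaluate(test_sents)                # score on held-out data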
N-gram Tagging
• Context is the current word together with the part-of-speech tags of the n-1 preceding tokens
• Evaluate performance
• Contexts that were not present in the training data – accuracy vs. coverage
• Combine taggers
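• Continuing the session above - a bigram tagger without backoff illustrates the accuracy vs. coverage trade-off, since it assigns None to any context it never saw in training:
>>> bigram_tagger = nltk.BigramTagger(train_sents)   # no backoff
>>> bigram_tagger.evaluate(test_sents)               # scores poorly on unseen contexts
>>> nltk.BigramTagger(train_sents, backoff=t1).evaluate(test_sents)   # backoff restores coverage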
Information Extraction
• Search large bodies of unrestricted
text for specific types of entities and
relations
• Move these into well-organized databases
• Use these databases to find answers
for specific questions
Information Extraction - Steps
• Segmenting, tokenizing, and part-of-speech tagging the text
• Search the resulting data for specific types of entities
• Examine entities that are mentioned near one another in the text to
determine if specific relationships hold between those entities
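• A sketch of the first step as a reusable pipeline function:
>>> import nltk
>>> def ie_preprocess(document):
...     sentences = nltk.sent_tokenize(document)                      # segment
...     sentences = [nltk.word_tokenize(sent) for sent in sentences]  # tokenize
...     return [nltk.pos_tag(sent) for sent in sentences]             # POS-tag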
Chunking – Shallow Parsing
• Analyzes a sentence to identify constituents such as noun groups, verbs, verb groups, etc.
• However, it does not specify their internal structure, nor their role in the main sentence
• In a typical chunk diagram, the smaller boxes show word-level tokenization and part-of-speech tagging, while the larger boxes show higher-level chunking
• Each of these larger boxes is called a chunk
• Like tokenization, which omits whitespace, chunking usually selects a subset of the tokens
• Like tokenization, the pieces produced by a chunker do not overlap in the source text
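• A minimal noun-phrase chunker built from a regular-expression grammar (a sketch; the grammar is deliberately simple - optional determiner, any adjectives, then a noun):
>>> import nltk
>>> grammar = "NP: {<DT>?<JJ>*<NN>}"
>>> cp = nltk.RegexpParser(grammar)
>>> sentence = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
...             ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]
>>> print(cp.parse(sentence))
(S (NP the/DT little/JJ yellow/JJ dog/NN) barked/VBD at/IN (NP the/DT cat/NN))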
Chunking – Shallow Parsing
Entity Recognition
• Entity recognition performed using chunkers
– Segment multi-token sequences and label them with the appropriate entity type
– ORGANIZATION, PERSON, LOCATION, DATE, TIME, MONEY, and GPE (geo-political
entity)
• Constructing chunkers
– Use rule-based systems like the RegexpParser class from NLTK
– Use machine learning techniques like ConsecutiveNPChunker
– POS tags are very important in this context
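• NLTK also ships a pre-trained named entity chunker - a sketch (the exact entity labels produced depend on the installed models):
>>> import nltk
>>> sent = "Mr. Vinken is chairman of Elsevier N.V., the Dutch publishing group."
>>> tagged = nltk.pos_tag(nltk.word_tokenize(sent))
>>> tree = nltk.ne_chunk(tagged)   # tree with PERSON / ORGANIZATION / GPE subtrees
>>> print(tree)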
Relation Extraction
• Rule-based systems - look for specific patterns in the text that connect
entities and the intervening words
• Machine-learning systems - attempt to learn patterns automatically from
a training corpus
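• A rule-based sketch following the NLTK book's IEER example - the regular expression encodes the pattern "ORG in LOC" while excluding gerunds like "located in ... -ing":
>>> import re, nltk
>>> IN = re.compile(r'.*\bin\b(?!\b.+ing)')
>>> for doc in nltk.corpus.ieer.parsed_docs('NYT_19980315'):
...     for rel in nltk.sem.extract_rels('ORG', 'LOC', doc, corpus='ieer', pattern=IN):
...         print(nltk.sem.rtuple(rel))
[ORG: 'WHYY'] 'in' [LOC: 'Philadelphia']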
Processing Text
• Choose a particular class label for a given input
• Identify particular features of language data that are salient for classifying it
• Construct models of language that can be used to perform language processing
tasks automatically
• Learn about text/language from these models
• Machine learning techniques
– Decision trees
– Naive Bayes classifiers
– Maximum entropy classifiers
Applications
• Determining the topic of an article or a book
• Deciding if an email is spam or not
• Determining who wrote a text
• Determining the meaning of a word in a particular context
• Open-class classification - set of labels is not defined in advance
• Multi-class classification - each instance may be assigned multiple labels
• Sequence classification - a list of inputs are jointly classified
Supervised Classification
Example – Identify Gender by Name
• Relevant feature: last letter
• Create a feature set (a dictionary) that maps feature names to their values
– >>>def gender_features(word):
– >>>    return {'last_letter': word[-1]}
• Import names, shuffle them
– >>>from nltk.corpus import names
– >>>import random
– >>>names = ([(name, 'male') for name in names.words('male.txt')] + [(name, 'female') for
name in names.words('female.txt')])
– >>>random.shuffle(names)
Example – Identify Gender by Name
• Divide list of features into training set and test set
– >>>featuresets = [(gender_features(n), g) for (n,g) in names]
– >>>from nltk.classify import apply_features
– >>>#Use apply_features when working with large corpora
– >>>train_set = apply_features(gender_features, names[500:])
– >>>test_set = apply_features(gender_features, names[:500])
• Use the training set to train a naive Bayes classifier
– >>>classifier = nltk.NaiveBayesClassifier.train(train_set)
Example – Identify Gender by Name
• Test the classifier on unseen data
– >>> classifier.classify(gender_features('Neo'))
– >>>'male'
– >>> classifier.classify(gender_features('Trinity'))
– >>>'female'
• >>> print nltk.classify.accuracy(classifier, test_set)
– >>>0.744
Example – Identify Gender by Name
• Examine the classifier to see which features are most effective at distinguishing between the classes
• >>> classifier.show_most_informative_features(5)
• Most Informative Features
• last_letter = 'a' female : male = 35.7 : 1.0
• last_letter = 'k' male : female = 31.7 : 1.0
• last_letter = 'f' male : female = 16.6 : 1.0
• last_letter = 'p' male : female = 11.9 : 1.0
• last_letter = 'v' male : female = 10.5 : 1.0
Example - Document Classification
• Use corpora where documents have been labelled with categories
– Build classifiers that will automatically tag new documents with appropriate
category labels
• Use the movie review corpus, which categorizes reviews as positive or negative, to construct a list of documents
• Define a feature extractor for documents - feature for each of the most
frequent 2000 words in the corpus
• Define a feature extractor that checks if words are present in a document
• Train a classifier to label new movie reviews
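• A sketch of that recipe, following the NLTK book (the 2000-word cutoff and the 100-document test set are the book's choices; FreqDist.keys() being frequency-sorted is NLTK 2-era behaviour):
>>> import random, nltk
>>> from nltk.corpus import movie_reviews
>>> documents = [(list(movie_reviews.words(fileid)), category)
...              for category in movie_reviews.categories()
...              for fileid in movie_reviews.fileids(category)]
>>> random.shuffle(documents)
>>> all_words = nltk.FreqDist(w.lower() for w in movie_reviews.words())
>>> word_features = all_words.keys()[:2000]   # 2000 most frequent words
>>> def document_features(document):
...     document_words = set(document)
...     features = {}
...     for word in word_features:
...         features['contains(%s)' % word] = (word in document_words)
...     return features
>>> featuresets = [(document_features(d), c) for (d, c) in documents]
>>> train_set, test_set = featuresets[100:], featuresets[:100]
>>> classifier = nltk.NaiveBayesClassifier.train(train_set)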
Document Classification
• Compute accuracy on the test set
– >>> print nltk.classify.accuracy(classifier, test_set)
– >>> 0.79
• Evaluation issues: the appropriate size of the test set depends on the number of labels, their balance, and the diversity of the test data
• Show most informative features
• >>> classifier.show_most_informative_features(5)
– Most Informative Features
– contains(outstanding) = True    pos : neg = 11.2 : 1.0
– contains(mulan) = True          pos : neg = 8.9 : 1.0
– contains(wonderfully) = True    pos : neg = 8.5 : 1.0
– contains(seagal) = True         neg : pos = 8.3 : 1.0
– contains(damon) = True          pos : neg = 6.0 : 1.0
Context
• Contextual features often provide powerful clues for
classification
• Context-dependent feature extractor - pass in a complete
(untagged) sentence, along with the index of the target word
• Joint classifier models - choose an appropriate labelling for a collection of related inputs
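• A sketch of such an extractor for POS classification - it sees the whole sentence plus the index of the target word:
>>> def pos_features(sentence, i):
...     features = {"suffix(1)": sentence[i][-1:],
...                 "suffix(2)": sentence[i][-2:],
...                 "suffix(3)": sentence[i][-3:]}
...     if i == 0:
...         features["prev-word"] = "<START>"
...     else:
...         features["prev-word"] = sentence[i-1]
...     return features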
Sequence Classification
• Jointly choose part-of-speech tags for all the words in a given
sentence
• Consecutive classification - find the most likely class label for the first input, then use that answer to help find the best label for the next input; repeat
• Feature extraction function needs to take a history argument
- list of tags predicted so far
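• Extending the context extractor above with a history argument - the tag predicted for the previous word becomes a feature:
>>> def pos_features(sentence, i, history):
...     features = {"suffix(1)": sentence[i][-1:]}
...     if i == 0:
...         features["prev-word"], features["prev-tag"] = "<START>", "<START>"
...     else:
...         features["prev-word"], features["prev-tag"] = sentence[i-1], history[i-1]
...     return features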
Hidden Markov Models - HMM
• Use inputs and the history of predicted tags
• Generate a probability distribution over tags
• Combine probabilities to calculate scores for sequences
• Choose tag sequence with the highest probability
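• NLTK includes a supervised HMM tagger trainer - a minimal sketch (the default MLE estimator handles unseen events poorly; a smoothed estimator works better in practice):
>>> from nltk.tag.hmm import HiddenMarkovModelTrainer
>>> from nltk.corpus import brown
>>> trainer = HiddenMarkovModelTrainer()
>>> hmm_tagger = trainer.train_supervised(brown.tagged_sents(categories='news')[:500])
>>> hmm_tagger.tag(['the', 'jury', 'said'])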
More Advanced Models
• Maximum Entropy Markov Models
• Linear-Chain Conditional Random Field Models
References
1. Indurkhya, Nitin and Fred Damerau (eds.) (2010) Handbook of Natural Language Processing (Second Edition). Chapman & Hall/CRC
2. Jurafsky, Daniel and James Martin (2008) Speech and Language Processing (Second Edition). Prentice Hall
3. Mitkov, Ruslan (ed.) (2003) The Oxford Handbook of Computational Linguistics. Oxford University Press
4. Bird, Steven; Klein, Ewan; Loper, Edward (2009) Natural Language Processing with Python. O'Reilly Media Inc
5. Perkins, Jacob (2010) Python Text Processing with NLTK 2.0 Cookbook. Packt Publishing
6. Bird, Steven; Klein, Ewan; Loper, Edward; Baldridge, Jason (2008) Proceedings of the Third Workshop on Issues in Teaching Computational Linguistics, ACL
Thank You
Check Out My LinkedIn Profile at
https://in.linkedin.com/in/girishkhanzode
