5. >>> inputstring = ' This is an example sent. The sentence splitter will split on sent markers. Ohh really !!'
>>> from nltk.tokenize import sent_tokenize
>>> all_sent = sent_tokenize(inputstring)
>>> print(all_sent)
['This is an example sent.', 'The sentence splitter will split on sent markers.', 'Ohh really !!']
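Behind the scenes, sent_tokenize relies on NLTK's pre-trained Punkt model. To make the underlying idea concrete, here is a minimal rule-based sketch (the naive_split helper below is illustrative, not part of NLTK): it cuts after sentence-final punctuation, whereas Punkt also learns abbreviations, so it will not wrongly split after strings like "Mr.".

```python
import re

# Naive splitter: cut after '.', '!' or '?' followed by whitespace.
# Punkt (used by sent_tokenize) additionally learns abbreviations,
# so unlike this sketch it will not split after "Mr." or "e.g.".
def naive_split(text):
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]

print(naive_split("This is an example sent. The sentence "
                  "splitter will split on sent markers. Ohh really !!"))
# -> ['This is an example sent.',
#     'The sentence splitter will split on sent markers.',
#     'Ohh really !!']
```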
13. Snowball stemmers are available for Dutch, English, French, German, Italian, Portuguese, Romanian, Russian, and other languages.
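Assuming NLTK is installed (no corpus download is needed for the stemmer), the full set of supported languages can be listed at run time via the languages attribute of SnowballStemmer:

```python
from nltk.stem import SnowballStemmer

# Tuple of language names the Snowball stemmer supports
print(SnowballStemmer.languages)
```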
14. The Snowball stemmer, which is based on the Snowball stemming algorithm, can be used in NLTK like this:
>>> from nltk.stem import SnowballStemmer
>>> snowball_stemmer = SnowballStemmer("english")
>>> snowball_stemmer.stem('maximum')
'maximum'
>>> snowball_stemmer.stem('presumably')
'presum'
>>> snowball_stemmer.stem('multiply')
'multipli'
>>> snowball_stemmer.stem('provision')
'provis'
>>> snowball_stemmer.stem('owed')
'owe'
>>> snowball_stemmer.stem('ear')
'ear'
18. >>> from nltk.corpus import stopwords
>>> stoplist = stopwords.words('english') # pass the language name
# NLTK supports 22 languages for removing the stop words
>>> text = "This is just a test"
>>> cleanwordlist = [word for word in text.lower().split() if word not in stoplist]
>>> print(cleanwordlist)
# every word except 'test' is in the stop list
['test']
StopWords
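The filtering pattern above works with any word list; here is a self-contained sketch using a tiny hand-written stop list (an illustrative subset, much smaller than NLTK's real list):

```python
# Tiny illustrative stop list -- NLTK's English list is far longer.
STOPLIST = {"this", "is", "just", "a", "the", "of", "and"}

def remove_stopwords(text):
    # Lowercase before comparing, since the stop list is lowercase.
    return [w for w in text.lower().split() if w not in STOPLIST]

print(remove_stopwords("This is just a test"))   # -> ['test']
```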
19. >>> # tokens is a list of all tokens in the corpus
>>> freq_dist = nltk.FreqDist(tokens)
>>> rarewords = [word for word, count in freq_dist.most_common()[-50:]]
>>> after_rare_words = [word for word in tokens if word not in rarewords]
Rarewords
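The same rare-word cut can be done with only the standard library. A sketch using collections.Counter on a toy corpus, dropping words that occur only once (what NLTK's FreqDist calls hapaxes) rather than a fixed tail of 50:

```python
from collections import Counter

tokens = ["the", "cat", "sat", "on", "the", "mat", "the", "zyzzyva"]
freq = Counter(tokens)

# Words occurring only once -- FreqDist exposes these as .hapaxes()
rare = {w for w, c in freq.items() if c == 1}
after_rare_words = [w for w in tokens if w not in rare]

print(after_rare_words)   # -> ['the', 'the', 'the']
```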
21. >>> import nltk
>>> from nltk import word_tokenize
>>> s = "I was watching TV"
>>> print(nltk.pos_tag(word_tokenize(s)))
[('I', 'PRP'), ('was', 'VBD'), ('watching', 'VBG'), ('TV', 'NN')]
What is Part of speech tagging
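NLTK's pos_tag uses a trained perceptron tagger. As a purely illustrative contrast, a toy rule-based tagger (every rule below is made up for this sketch) shows the kind of per-token decision a tagger makes:

```python
# Toy rule-based tagger -- illustrative only; real taggers are trained
# on annotated corpora (nltk.pos_tag uses an averaged perceptron).
def toy_tag(word):
    if word.endswith("ing"):
        return "VBG"          # gerund / present participle
    if word in {"I", "you", "he", "she", "it", "we", "they"}:
        return "PRP"          # personal pronoun
    if word in {"was", "were", "did"}:
        return "VBD"          # past-tense verb
    return "NN"               # default to noun

print([(w, toy_tag(w)) for w in "I was watching TV".split()])
# -> [('I', 'PRP'), ('was', 'VBD'), ('watching', 'VBG'), ('TV', 'NN')]
```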