pa-pe-pi-po-pure Python Text Processing

Rodrigo Senra
rsenra@acm.org
PythonBrasil[7] - São Paulo
Anatomy of the Blah
• Me, You and Python
• retrospective: PythonBrasil[7] years!
• pa-pe-pi-po-pure Python text processing
• references
• 1 word from the sponsors
Who's out there?
✓IT professionals

✓Developers
✓Students
✓Teachers
✓First time at PyConBrasil
✓APyBr members
•   None of the above!
Previously, on PythonBrasil...
[1] 2005 - BigKahuna
[2] 2006 - Show Pyrotécnico
           Iterators, Generators, Hooks, Decorators
[3] 2007 - Show Pyrotécnico II
           Routing, RTSP, Twisted, GIS
[4] 2008 - ISIS-NBP
          Digital Libraries
[5] 2009 - Rest, Gateway e Compiladores
         SFC (Petri Net) + ST (Pascal) > Ladder
[6] 2010 - Potter vs Voldemort:
           Parseltongue lessons from Pythonic practice
>>> type("bla")
<type 'str'>
>>> "".join(['pa',"pe",'''pi''',"""po"""])
'papepipo'
>>> str(2**1024)[100:120]
'21120113879871393357'
>>> 2**1024
1797693134862315907729305190789024733617976978942306572734
30081157732675805500963132708477322407536021120113879871393
3576587897688144166224928474306394741243777678934248654852
7630221960124609411945308295208500576883815068234246288147
3913110540827237163350510684586298239947245938479716304835
356329624224137216L
>>> 'ariediod'[::-1]
'doideira'
>>> "    deu branco no prefixo e no sufixo, limpa com strip ".strip()
'deu branco no prefixo e no sufixo, limpa com strip'
>>> _.startswith("deu")
True
>>> "o rato roeu a roupa do rei de roma".partition("r")
('o ', 'r', 'ato roeu a roupa do rei de roma')
>>> "o rato roeu a roupa do rei de roma".split("r")
['o ', 'ato ', 'oeu a ', 'oupa do ', 'ei de ', 'oma']
>>> "o rato roeu a roupa do rei de roma".split()
['o', 'rato', 'roeu', 'a', 'roupa', 'do', 'rei', 'de', 'roma']
>>> r"W:\nao\precisa\de\escape"
'W:\\nao\\precisa\\de\\escape'
>>> type(r"W:\nao\precisa\de\escape")
<type 'str'>
>>> type(u"Unicode")
<type 'unicode'>
>>> print(u"\xc3\xa2")
Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)

>>> print(unicode('\xc3\xa1','iso-8859-1').encode('iso-8859-1'))
á
>>> import codecs, sys
>>> sys.stdout = codecs.lookup('iso-8859-1')[-1](sys.stdout)
>>> print(u"\xc3\xa1")
á
>>> b"String de 8-bit chars"
'String de 8-bit chars'

Python 2.6.1              Python 3.1.4
>>> b"Bla"                >>> b"Bla"
'Bla'                     b'Bla'
>>> b"Bla"=="Bla"         >>> type(b"Bla")
True                      <class 'bytes'>
>>> type(b"Bla")          >>> type("Bla")
<type 'str'>              <class 'str'>
                          >>> "Bla"==b"Bla"
                          False
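The encoding error and the bytes/str split shown above can be reproduced in Python 3 with a minimal sketch like the following. The variable names are chosen for this illustration only, and the codecs are the ones the slides mention.

```python
# Python 3 sketch: str holds Unicode code points, bytes holds raw 8-bit data.
text = "\u00e1"                      # 'á', a single code point

utf8 = text.encode("utf-8")          # two bytes: 0xC3 0xA1
latin1 = text.encode("iso-8859-1")   # one byte: 0xE1

# Decoding UTF-8 bytes as Latin-1 does not raise; it silently yields two characters.
mojibake = utf8.decode("iso-8859-1")

# ASCII cannot represent the character at all, so encoding raises UnicodeEncodeError.
try:
    text.encode("ascii")
    error = None
except UnicodeEncodeError as exc:
    error = exc.reason

print(utf8, latin1, len(mojibake), error)
```

The point of the sketch is that the byte sequence depends on the codec, so the codec has to be chosen explicitly at the boundary where text leaves or enters the program.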
>>> [ord(i) for i in "nulalexsedlex"]
[110, 117, 108, 97, 108, 101, 120, 115, 101, 100, 108, 101, 120]
>>> "".join([chr(i) for i in _])
'nulalexsedlex'
>>> 'lex' in _
True
>>> import string
>>> dir(string)
['Formatter', 'Template', '_TemplateMetaclass', '__builtins__',
'__doc__', '__file__', '__name__', '__package__', '_float', '_idmap',
'_idmapL', '_int', '_long', '_multimap', '_re', 'ascii_letters',
'ascii_lowercase', 'ascii_uppercase', 'atof', 'atof_error', 'atoi',
'atoi_error', 'atol', 'atol_error', 'capitalize', 'capwords', 'center', 'count',
'digits', 'expandtabs', 'find', 'hexdigits', 'index', 'index_error', 'join',
'joinfields', 'letters', 'ljust', 'lower', 'lowercase', 'lstrip', 'maketrans',
'octdigits', 'printable', 'punctuation', 'replace', 'rfind', 'rindex', 'rjust',
'rsplit', 'rstrip', 'split', 'splitfields', 'strip', 'swapcase', 'translate', 'upper',
'uppercase', 'whitespace', 'zfill']
>>> string.hexdigits
'0123456789abcdefABCDEF'
>>> string.punctuation
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
>>> string.maketrans('','')
'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f
\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#
$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]
^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f
\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e
\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d
\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac
\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb
\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb
\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb
\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea
\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa
\xfb\xfc\xfd\xfe\xff'
>>> def t(x,y): return string.translate(x,string.maketrans('',''),y)
...
>>> t("O rato roeu. O que? A roupa! De quem? Do rei, de roma;",
...   string.punctuation)
'O rato roeu O que A roupa De quem Do rei de roma'
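In Python 3 the module-level `string.translate` and `string.maketrans` functions no longer exist. The same punctuation stripping can be sketched with `str.maketrans` and `str.translate`; the helper name `strip_punctuation` is an assumption made for this example.

```python
import string


def strip_punctuation(text):
    """Remove every ASCII punctuation character from text."""
    # With three arguments, str.maketrans maps each character of the
    # third argument to None, which deletes it during translate().
    table = str.maketrans("", "", string.punctuation)
    return text.translate(table)


cleaned = strip_punctuation("O rato roeu. O que? A roupa! De quem? Do rei, de roma;")
print(cleaned)
```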


>>> class Bla(object):
...   def __str__(self):
...       return "Belex"
...   def __repr__(self):
...       return "Bla()"
...
>>> b = Bla()
>>> for i in [b, eval(repr(b))]:
...   print(i, end='\t')
...
Belex Belex >>>
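>>> class istr(str):
...     pass
...
>>> def make_insensitive(name):
...     # a factory binds meth once per name; a def inside the loop would see only the last meth
...     meth = getattr(str, '__%s__' % name)
...     def new_meth(self, param, *args):
...         return meth(self.lower(), param.lower(), *args)
...     setattr(istr, '__%s__' % name, new_meth)
...
>>> for name in 'eq lt le gt ge ne cmp contains'.split():
...     make_insensitive(name)
...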
>>> istr("SomeCamelCase") == istr("sOmeCaMeLcase")
True
>>> 'Ec' in istr("SomeCamel")
True



                                          Adapted from Python Cookbook
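A Python 3 variant of the recipe above, as a rough sketch rather than the Cookbook's own code. The names `IStr` and `_wrap` are hypothetical, `__cmp__` is left out because Python 3 strings do not have it, and `casefold()` replaces `lower()`.

```python
class IStr(str):
    """Hypothetical case-insensitive str subclass (name chosen for this sketch)."""


def _wrap(name):
    original = getattr(str, name)  # captured per call, so each wrapper keeps its own method

    def method(self, other, *args):
        if not isinstance(other, str):
            # Keep str's normal behaviour for non-string operands.
            return original(self, other, *args)
        return original(self.casefold(), other.casefold(), *args)

    setattr(IStr, name, method)


for _name in ("__eq__", "__ne__", "__lt__", "__le__", "__gt__", "__ge__", "__contains__"):
    _wrap(_name)

# Objects that compare equal must hash equally, so hash the case-folded form.
IStr.__hash__ = lambda self: hash(self.casefold())

print(IStr("SomeCamelCase") == IStr("sOmeCaMeLcase"))
print("Ec" in IStr("SomeCamel"))
```

Overriding `__hash__` alongside `__eq__` keeps the class usable as a dictionary key or set member without surprising duplicates.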
>>> import re
>>> pat = re.compile(re.escape("<strong>"))
>>> re.escape("<strong>")
'\\<strong\\>'
>>> pat.sub("_","<strong>Hasta la vista<strong> baby")
'_Hasta la vista_ baby'
>>> date = re.compile(r"(\d\d\d\d-\d\d-\d\d)\s(\w+)")
>>> date.findall("Em 2011-09-29 PythonBrasil na parada. Em 2010-10-21 curitiba hospedou")
[('2011-09-29', 'PythonBrasil'), ('2010-10-21', 'curitiba')]
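The same extraction can be written with named groups, which makes each captured field self-describing. This is an illustrative sketch: the group names `date` and `word` and the constant `DATE_EVENT` are assumptions, not part of the original slide.

```python
import re

# \d{4}-\d{2}-\d{2} matches an ISO-style date; \w+ grabs the word right after it.
DATE_EVENT = re.compile(r"(?P<date>\d{4}-\d{2}-\d{2})\s(?P<word>\w+)")

text = "Em 2011-09-29 PythonBrasil na parada. Em 2010-10-21 curitiba hospedou"

# finditer yields match objects, so fields can be read by name instead of position.
pairs = [(m.group("date"), m.group("word")) for m in DATE_EVENT.finditer(text)]
print(pairs)
```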
$ python -mtimeit -s "import re; n=re.compile(r'abra')" "n.search('abracadabra')"
1000000 loops, best of 3: 0.306 usec per loop


$ python -mtimeit -s "import re; n=r'abra'" "n in 'abracadabra'"
10000000 loops, best of 3: 0.0591 usec per loop



$ python -mtimeit -s "import re; n=re.compile(r'\d+$')" "n.match('0123456789')"
1000000 loops, best of 3: 0.511 usec per loop


$ python -mtimeit -s "import re" "'0123456789'.isdigit()"
10000000 loops, best of 3: 0.0945 usec per loop



                                      Extracted from PyMag Jan 2008
$ python -mtimeit -s \
"import re;r=re.compile('pa|pe|pi|po|pu');h='patapetapitapotapuxa'" \
 "r.search(h)"
1000000 loops, best of 3: 0.383 usec per loop


$ python -mtimeit -s \
"import re;n=['pa','pe','pi','po','pu'];h='patapetapitapotapuxa'" \
"any(x in h for x in n)"
1000000 loops, best of 3: 0.914 usec per loop




                                          Extracted from PyMag Jan 2008
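The command-line measurements above can also be run from a script with the standard `timeit` module. This is a minimal sketch: the function names and the repetition count are assumptions, and the timings it prints will differ from the 2008 figures on any given machine.

```python
import re
import timeit

HAYSTACK = "patapetapitapotapuxa"
NEEDLES = ["pa", "pe", "pi", "po", "pu"]
PATTERN = re.compile("|".join(map(re.escape, NEEDLES)))


def with_regex():
    """One compiled alternation, a single scan of the haystack."""
    return PATTERN.search(HAYSTACK) is not None


def with_in():
    """One substring test per needle, stopping at the first hit."""
    return any(needle in HAYSTACK for needle in NEEDLES)


if __name__ == "__main__":
    runs = 100_000
    for fn in (with_regex, with_in):
        seconds = timeit.timeit(fn, number=runs)
        print(f"{fn.__name__}: {seconds / runs * 1e6:.3f} usec per call")
```

Both functions answer the same question, so the comparison isolates the cost of the lookup strategy rather than a difference in results.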
from pyparsing import Word, Literal, Combine
import string
def doSum(s,l,tokens):
    return int(tokens[0]) + int(tokens[2])
integer = Word(string.digits)
addition = Combine(integer) + Literal('+') + Combine(integer)
addition.setParseAction(doSum)


>>> addition.parseString("5+7")
([12], {})
import ply.lex as lex
tokens = 'NUMBER', 'PLUS'
t_PLUS = r'\+'
def t_NUMBER(t):
   r'\d+'
   t.value = int(t.value)
   return t
t_ignore = ' \t\n\w'
def t_error(t): t.lexer.skip(1)
lexer = lex.lex()




                                  Adapted from http://www.dabeaz.com
import ply.yacc as yacc
def p_expression_plus(p):
   'expression : expression PLUS expression'
   p[0] = p[1] + p[3]
def p_factor_num(p):
   'expression : NUMBER'
   p[0] = p[1]
def p_error(p):
   print "Syntax error in input!"
parser = yacc.yacc()




                                     Adapted from http://www.dabeaz.com
>>> parser.parse("1+2 + 45 \n + 10")
58
>>> parser.parse("Quanto vale 2 + 7")
9
>>> parser.parse("A soma 2 + 7 resulta em 9")
Syntax error in input!
>>> parser.parse("2 + 7 9")
Syntax error in input!




                                     Adapted from http://www.dabeaz.com
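For comparison, the behaviour of the lexer and parser above can be approximated with the standard library alone. This is a rough sketch, not PLY's algorithm: `parse_sum` and `TOKEN` are hypothetical names, and it returns `None` where the PLY version prints a syntax error.

```python
import re

# Only numbers and '+' are tokens; every other character is skipped,
# mirroring the t_error handler that discards illegal characters.
TOKEN = re.compile(r"\d+|\+")


def parse_sum(text):
    """Evaluate NUMBER (PLUS NUMBER)*; return None on a syntax error."""
    tokens = TOKEN.findall(text)
    if len(tokens) % 2 == 0:  # empty input, or a dangling '+' or extra number
        return None
    total = 0
    for position, token in enumerate(tokens):
        expect_number = position % 2 == 0
        if expect_number != token.isdigit():
            return None  # number where '+' was expected, or the reverse
        if expect_number:
            total += int(token)
    return total


print(parse_sum("1+2 + 45 \n + 10"))
print(parse_sum("Quanto vale 2 + 7"))
print(parse_sum("2 + 7 9"))
```

A grammar this small does not need a parser generator; tools such as PLY start to pay off once precedence, nesting or error recovery enter the picture.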
>>> from nltk.tokenize import sent_tokenize, word_tokenize
>>> msg = ("Congratulations to Erico and his team. PythonBrasil gets better "
...        "every year. You are now the BiggestKahuna.")
>>> sent_tokenize(msg)
['Congratulations to Erico and his team.', 'PythonBrasil gets better every
year.', 'You are now the BiggestKahuna.']
>>> word_tokenize(msg)
['Congratulations', 'to', 'Erico', 'and', 'his', 'team.', 'PythonBrasil', 'gets',
'better', 'every', 'year.', 'You', 'are', 'now', 'the', 'BiggestKahuna', '.']




                                             Extracted from NLP with Python
>>> def gender_features(word):
...    return {"last_letter": word[-1]}
...
>>> import nltk
>>> from nltk.corpus import names
>>> len(names.words("male.txt"))
2943
>>> names = ([(name,'male') for name in names.words('male.txt')] +
...        [(name,'female') for name in names.words('female.txt')])
>>> import random
>>> random.shuffle(names)
>>> featuresets = [(gender_features(n),g) for n,g in names]
>>> train_set, test_set = featuresets[500:], featuresets[:500]
>>> classifier = nltk.NaiveBayesClassifier.train(train_set)
>>> classifier.classify(gender_features("Dorneles"))
'male'
>>> classifier.classify(gender_features("Magali"))
'female'
                                        Extracted from NLP with Python
References
A word from the sponsors...
Thank you all
                         for your attention.

                            Rodrigo Dias Arruda Senra
                                 http://rodrigo.senra.nom.br
                                      rsenra@acm.org
The opinions and conclusions expressed in this presentation are the sole responsibility of Rodrigo Senra.

The author's permission is not required to use parts or all of this presentation, provided that no changes are
made to the reused content and that this notice appears in full in the resulting material.

Images and references to other works in this presentation remain the property of those who hold
their copyrights.

Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 

pa-pe-pi-po-pure Python Text Processing

  • 1. pa-pe-pi-po- Pure Python Text Processing Rodrigo Senra rsenra@acm.org PythonBrasil[7] - São Paulo
  • 2. Anatomy of the Talk • Me, You, and Python • a retrospective of PythonBrasil[7] years! • pa-pe-pi-po-pure python text processing • references • a word from the sponsors
  • 3. Who is out there? ✓IT professionals ✓Developers ✓Students ✓Teachers ✓First time at PyConBrasil ✓APyBr members • None of the above!
  • 4. Previously on...
      [1] 2005 - BigKahuna
      [2] 2006 - Pyrotechnic Show: Iterators, Generators, Hooks, Decorators
      [3] 2007 - Pyrotechnic Show II: Routing, RTSP, Twisted, GIS
      [4] 2008 - ISIS-NBP: Digital Libraries
      [5] 2009 - REST, Gtw and Compilers: SFC (Petri net) + ST (Pascal) > Ladder
      [6] 2010 - Potter vs Voldemort: Parseltongue lessons from Pythonic practice
  • 5.
      >>> type("bla")
      <type 'str'>
      >>> "".join(['pa',"pe",'''pi''',"""po"""])
      'papepipo'
      >>> str(2**1024)[100:120]
      '21120113879871393357'
      >>> 2**1024
      1797693134862315907729305190789024733617976978942306572734
      30081157732675805500963132708477322407536021120113879871393
      3576587897688144166224928474306394741243777678934248654852
      7630221960124609411945308295208500576883815068234246288147
      3913110540827237163350510684586298239947245938479716304835
      356329624224137216L
      >>> 'ariediod'[::-1]
      'doideira'
  • 6.
      >>> "    deu branco no prefixo e no sufixo, limpa com strip ".strip()
      'deu branco no prefixo e no sufixo, limpa com strip'
      >>> _.startswith("deu")
      True
      >>> "o rato roeu a roupa do rei de roma".partition("r")
      ('o ', 'r', 'ato roeu a roupa do rei de roma')
      >>> "o rato roeu a roupa do rei de roma".split("r")
      ['o ', 'ato ', 'oeu a ', 'oupa do ', 'ei de ', 'oma']
      >>> "o rato roeu a roupa do rei de roma".split()
      ['o', 'rato', 'roeu', 'a', 'roupa', 'do', 'rei', 'de', 'roma']
  • 7.
      >>> r"W:\nao\precisa\de\escape"
      'W:\\nao\\precisa\\de\\escape'
      >>> type(r"W:\nao\precisa\de\escape")
      <type 'str'>
      >>> type(u"Unicode")
      <type 'unicode'>
      >>> print(u"\xc3\xa2")
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
      UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
      >>> print(unicode('\xc3\xa1','iso-8859-1').encode('iso-8859-1'))
      á
      >>> import codecs, sys
      >>> sys.stdout = codecs.lookup('iso-8859-1')[-1](sys.stdout)
      >>> print(u"\xc3\xa1")
      á
  • 8.
      >>> b"String de 8-bit chars"
      'String de 8-bit chars'

      Python 2.6.1
      >>> b"Bla"
      'Bla'
      >>> b"Bla"=="Bla"
      True
      >>> type(b"Bla")
      <type 'str'>

      Python 3.1.4
      >>> b"Bla"
      b'Bla'
      >>> type(b"Bla")
      <class 'bytes'>
      >>> type("Bla")
      <class 'str'>
      >>> "Bla"==b"Bla"
      False
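The split between text and bytes shown in the Python 3.1.4 column can be sketched a little further. This is a minimal illustration, assuming a current Python 3 interpreter; the variable names are made up for the example and do not come from the slides.

```python
# Python 3 sketch: text (str) and bytes are distinct types.
text = "á"  # one Unicode code point, U+00E1

utf8_bytes = text.encode("utf-8")          # two bytes: 0xC3 0xA1
latin1_bytes = text.encode("iso-8859-1")   # one byte: 0xE1
print(utf8_bytes, latin1_bytes)

# Decoding UTF-8 bytes with the wrong codec does not raise here;
# it quietly produces two unrelated characters (mojibake).
mojibake = utf8_bytes.decode("iso-8859-1")
print(repr(mojibake))

# Unlike Python 2, bytes and str never compare equal.
same = (b"Bla" == "Bla")
print(same)
```

Choosing the codec explicitly at the boundary (when reading or writing bytes) is what avoids the `UnicodeEncodeError` seen in the Python 2 session.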
  • 9.
      >>> [ord(i) for i in "nulalexsedlex"]
      [110, 117, 108, 97, 108, 101, 120, 115, 101, 100, 108, 101, 120]
      >>> "".join([chr(i) for i in _])
      'nulalexsedlex'
      >>> 'lex' in _
      True
      >>> import string
      >>> dir(string)
      ['Formatter', 'Template', '_TemplateMetaclass', '__builtins__', '__doc__', '__file__', '__name__', '__package__', '_float', '_idmap', '_idmapL', '_int', '_long', '_multimap', '_re', 'ascii_letters', 'ascii_lowercase', 'ascii_uppercase', 'atof', 'atof_error', 'atoi', 'atoi_error', 'atol', 'atol_error', 'capitalize', 'capwords', 'center', 'count', 'digits', 'expandtabs', 'find', 'hexdigits', 'index', 'index_error', 'join', 'joinfields', 'letters', 'ljust', 'lower', 'lowercase', 'lstrip', 'maketrans', 'octdigits', 'printable', 'punctuation', 'replace', 'rfind', 'rindex', 'rjust', 'rsplit', 'rstrip', 'split', 'splitfields', 'strip', 'swapcase', 'translate', 'upper', 'uppercase', 'whitespace', 'zfill']
  • 10.
      >>> string.hexdigits
      '0123456789abcdefABCDEF'
      >>> string.punctuation
      '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
      >>> string.maketrans('','')
      '\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f
      \x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#
      $%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]
      ^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f
      \x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e
      \x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d
      \x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac
      \xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb
      \xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb
      \xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb
      \xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea
      \xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa
      \xfb\xfc\xfd\xfe\xff'
  • 11.
      >>> def t(x,y): return string.translate(x,string.maketrans('',''),y)
      ...
      >>> t("O rato roeu. O que? A roupa! De quem? Do rei, de roma;", string.punctuation)
      'O rato roeu O que A roupa De quem Do rei de roma'
      >>> class Bla(object):
      ...     def __str__(self):
      ...         return "Belex"
      ...     def __repr__(self):
      ...         return "Bla()"
      ...
      >>> b = Bla()
      >>> for i in [b, eval(repr(b))]:
      ...     print(i, end='\t')
      ...
      Belex   Belex   >>>
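The `string.translate` / `string.maketrans` functions used above belong to the Python 2 `string` module. A rough Python 3 equivalent, offered as a sketch rather than as the author's code, uses the three-argument form of `str.maketrans`; the helper name `strip_punctuation` is hypothetical.

```python
import string


def strip_punctuation(text):
    # With three arguments, str.maketrans maps every character of the
    # third argument to None, so translate() deletes it.
    table = str.maketrans("", "", string.punctuation)
    return text.translate(table)


cleaned = strip_punctuation("O rato roeu. O que? A roupa! De quem? Do rei, de roma;")
print(cleaned)
```

Building the table once and reusing it is usually preferable when many strings are cleaned in a loop.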
  • 12.
      >>> class istr(str):
      ...     pass
      >>> for name in 'eq lt le gt ge ne cmp contains'.split():
      ...     meth = getattr(str, '__%s__' % name)
      ...     # bind the current meth as a default argument; a plain closure
      ...     # would see only the last value assigned in the loop
      ...     def new_meth(self, param, _meth=meth):
      ...         return _meth(self.lower(), param.lower())
      ...     setattr(istr, '__%s__'% name, new_meth)
      ...
      >>> istr("SomeCamelCase") == istr("sOmeCaMeLcase")
      True
      >>> 'Ec' in istr("SomeCamel")
      True
      Adapted from Python Cookbook
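The same case-insensitive string idea can be sketched for Python 3. This is an illustrative variant, not the Cookbook recipe: the names `IStr` and `_case_insensitive` are assumptions, `cmp` is dropped because Python 3 has no `__cmp__`, and `casefold()` replaces `lower()`. A factory function gives each wrapper its own reference to the original `str` method.

```python
class IStr(str):
    """A str subclass whose comparisons and `in` tests ignore case."""

    def __hash__(self):
        # Keep hashing consistent with case-insensitive equality.
        return hash(self.casefold())


def _case_insensitive(name):
    # Each call binds its own `original`, so wrappers built in a loop
    # do not all end up sharing the last method looked up.
    original = getattr(str, name)

    def method(self, other):
        if not isinstance(other, str):
            # Defer to str's own behaviour for non-string operands.
            return original(self, other)
        return original(self.casefold(), other.casefold())

    return method


for _name in "eq ne lt le gt ge contains".split():
    _dunder = "__%s__" % _name
    setattr(IStr, _dunder, _case_insensitive(_dunder))

print(IStr("SomeCamelCase") == IStr("sOmeCaMeLcase"))
print("Ec" in IStr("SomeCamel"))
```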
  • 13.
      >>> import re
      >>> pat = re.compile(re.escape("<strong>"))
      >>> re.escape("<strong>")
      '\\<strong\\>'
      >>> pat.sub("_","<strong>Hasta la vista<strong> baby")
      '_Hasta la vista_ baby'
      >>> date = re.compile(r"(\d\d\d\d-\d\d-\d\d)\s(\w+)")
      >>> date.findall("Em 2011-09-29 PythonBrasil na parada. Em 2010-10-21 curitiba hospedou")
      [('2011-09-29', 'PythonBrasil'), ('2010-10-21', 'curitiba')]
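As a small variation on the session above, the date pattern can be written with counted repetitions and named groups. This is a sketch assuming Python 3; the group names `date` and `word` are chosen for the example only.

```python
import re

# re.escape makes a literal string safe to embed in a pattern.
tag = re.compile(re.escape("<strong>"))
replaced = tag.sub("_", "<strong>Hasta la vista<strong> baby")
print(replaced)

# Named groups document what each captured piece means.
date = re.compile(r"(?P<date>\d{4}-\d{2}-\d{2})\s(?P<word>\w+)")
text = "Em 2011-09-29 PythonBrasil na parada. Em 2010-10-21 curitiba hospedou"
pairs = [(m.group("date"), m.group("word")) for m in date.finditer(text)]
print(pairs)
```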
  • 14.
      $ python -mtimeit -s "import re; n=re.compile(r'abra')" "n.search('abracadabra')"
      1000000 loops, best of 3: 0.306 usec per loop
      $ python -mtimeit -s "import re; n=r'abra'" "n in 'abracadabra'"
      10000000 loops, best of 3: 0.0591 usec per loop
      $ python -mtimeit -s "import re; n=re.compile(r'\d+$')" "n.match('0123456789')"
      1000000 loops, best of 3: 0.511 usec per loop
      $ python -mtimeit -s "import re" "'0123456789'.isdigit()"
      10000000 loops, best of 3: 0.0945 usec per loop
      Extracted from PyMag Jan 2008
  • 15.
      $ python -mtimeit -s "import re;r=re.compile('pa|pe|pi|po|pu');h='patapetapitapotapuxa'" "r.search(h)"
      1000000 loops, best of 3: 0.383 usec per loop
      $ python -mtimeit -s "import re;n=['pa','pe','pi','po','pu'];h='patapetapitapotapuxa'" "any(x in h for x in n)"
      1000000 loops, best of 3: 0.914 usec per loop
      Extracted from PyMag Jan 2008
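The command-line measurements above can also be reproduced from a script with the `timeit` module. The sketch below assumes Python 3; the function names are hypothetical, and the numbers it prints depend on the machine and interpreter, so they should not be expected to match the figures quoted on the slides.

```python
import re
import timeit

haystack = "patapetapitapotapuxa"
needles = ["pa", "pe", "pi", "po", "pu"]
pattern = re.compile("|".join(needles))


def with_regex():
    # One compiled alternation searched once.
    return pattern.search(haystack) is not None


def with_in():
    # A substring test per needle, stopping at the first hit.
    return any(n in haystack for n in needles)


# Both strategies must agree before their timings are worth comparing.
assert with_regex() == with_in()

LOOPS = 100000
for func in (with_regex, with_in):
    seconds = timeit.timeit(func, number=LOOPS)
    print("%s: %.3f usec per loop" % (func.__name__, seconds / LOOPS * 1e6))
```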
  • 16.
      from pyparsing import Word, Literal, Combine
      import string

      def doSum(s,l,tokens):
          return int(tokens[0]) + int(tokens[2])

      integer = Word(string.digits)
      addition = Combine(integer) + Literal('+') + Combine(integer)
      addition.setParseAction(doSum)

      >>> addition.parseString("5+7")
      ([12], {})
  • 17.
      import ply.lex as lex

      tokens = 'NUMBER', 'PLUS'
      t_PLUS = r'\+'

      def t_NUMBER(t):
          r'\d+'
          t.value = int(t.value)
          return t

      t_ignore = ' \t\n\w'

      def t_error(t):
          t.lexer.skip(1)

      lexer = lex.lex()
      Adapted from http://www.dabeaz.com
  • 18.
      import ply.yacc as yacc

      def p_expression_plus(p):
          'expression : expression PLUS expression'
          p[0] = p[1] + p[3]

      def p_factor_num(p):
          'expression : NUMBER'
          p[0] = p[1]

      def p_error(p):
          print "Syntax error in input!"

      parser = yacc.yacc()
      Adapted from http://www.dabeaz.com
  • 19.
      >>> parser.parse("1+2 + 45 \n + 10")
      58
      >>> parser.parse("Quanto vale 2 + 7")
      9
      >>> parser.parse("A soma 2 + 7 resulta em 9")
      Syntax error in input!
      >>> parser.parse("2 + 7 9")
      Syntax error in input!
      Adapted from http://www.dabeaz.com
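For comparison, the behaviour seen in these sessions (numbers joined by `+`, unknown characters skipped, a stray number rejected) can be approximated with the standard library alone. This is a hand-written sketch assuming Python 3, not PLY and not the author's code; `tokenize` and `parse_sum` are hypothetical names, and errors are raised as `ValueError` instead of being printed.

```python
import re

TOKEN = re.compile(r"(?P<NUMBER>\d+)|(?P<PLUS>\+)")


def tokenize(text):
    # finditer scans past anything that is neither a number nor '+',
    # similar in spirit to a lexer error handler that skips characters.
    for match in TOKEN.finditer(text):
        kind = match.lastgroup
        value = int(match.group()) if kind == "NUMBER" else match.group()
        yield kind, value


def parse_sum(text):
    tokens = list(tokenize(text))
    if not tokens or tokens[0][0] != "NUMBER":
        raise ValueError("Syntax error in input!")
    total = tokens[0][1]
    rest = tokens[1:]
    # After the first number, tokens must come in (PLUS, NUMBER) pairs.
    if len(rest) % 2:
        raise ValueError("Syntax error in input!")
    for (op_kind, _), (num_kind, num) in zip(rest[0::2], rest[1::2]):
        if op_kind != "PLUS" or num_kind != "NUMBER":
            raise ValueError("Syntax error in input!")
        total += num
    return total


print(parse_sum("1+2 + 45 \n + 10"))
print(parse_sum("Quanto vale 2 + 7"))
```

A parser generator earns its keep once the grammar grows beyond a flat sum; for a grammar this small the explicit loop stays readable.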
  • 21.
      from nltk.tokenize import sent_tokenize, word_tokenize
      msg = "Congratulations to Erico and his team. PythonBrasil gets better every year. You are now the BiggestKahuna."

      >>> sent_tokenize(msg)
      ['Congratulations to Erico and his team.', 'PythonBrasil gets better every year.', 'You are now the BiggestKahuna.']
      >>> word_tokenize(msg)
      ['Congratulations', 'to', 'Erico', 'and', 'his', 'team.', 'PythonBrasil', 'gets', 'better', 'every', 'year.', 'You', 'are', 'now', 'the', 'BiggestKahuna', '.']
      Extracted from NLP with Python
  • 22.
      >>> def gender_features(word):
      ...     return {"last_letter": word[-1]}
      ...
      >>> import nltk
      >>> from nltk.corpus import names
      >>> len(names.words("male.txt"))
      2943
      >>> names = ([(name,'male') for name in names.words('male.txt')] +
      ...          [(name,'female') for name in names.words('female.txt')])
      >>> import random
      >>> random.shuffle(names)
      >>> featuresets = [(gender_features(n),g) for n,g in names]
      >>> train_set, test_set = featuresets[500:], featuresets[:500]
      >>> classifier = nltk.NaiveBayesClassifier.train(train_set)
      >>> classifier.classify(gender_features("Dorneles"))
      'male'
      >>> classifier.classify(gender_features("Magali"))
      'female'
      Extracted from NLP with Python
  • 24. A word from the sponsors...
  • 25. Thank you all for your attention. Rodrigo Dias Arruda Senra http://rodrigo.senra.nom.br rsenra@acm.org The opinions and conclusions expressed in this presentation are the sole responsibility of Rodrigo Senra. It is not necessary to request the author's permission to use part or all of this presentation, provided that no changes are made to the reused content and that this note appears in full in the resulting material. Images and references to other works in this presentation remain the property of those who hold their copyrights.