SlideShare a Scribd company logo
1 of 51
Introduction to Python
Chen LinChen Lin
clin@brandeis.educlin@brandeis.edu
COSI 134aCOSI 134a
Volen 110Volen 110
Office Hour: Thurs. 3-5Office Hour: Thurs. 3-5
For More Information?
http://python.org/
- documentation, tutorials, beginners guide, core
distribution, ...
Books include:
 Learning Python by Mark Lutz
 Python Essential Reference by David Beazley
 Python Cookbook, ed. by Martelli, Ravenscroft and
Ascher
 (online at
http://code.activestate.com/recipes/langs/python/)
 http://wiki.python.org/moin/PythonBooks
Python VideosPython Videos
http://showmedo.com/videotutorials/python
“5 Minute Overview (What Does Python
Look Like?)”
“Introducing the PyDev IDE for Eclipse”
“Linear Algebra with Numpy”
And many more
4 Major Versions of Python4 Major Versions of Python
“Python” or “CPython” is written in C/C++
- Version 2.7 came out in mid-2010
- Version 3.1.2 came out in early 2010
“Jython” is written in Java for the JVM
“IronPython” is written in C# for the .Net
environment
Go To Website
Development EnvironmentsDevelopment Environments
what IDE to use?what IDE to use? http://stackoverflow.com/questions/81584http://stackoverflow.com/questions/81584
1. PyDev with Eclipse
2. Komodo
3. Emacs
4. Vim
5. TextMate
6. Gedit
7. Idle
8. PIDA (Linux)(VIM Based)
9. NotePad++ (Windows)
10.BlueFish (Linux)
Pydev with EclipsePydev with Eclipse
Python Interactive ShellPython Interactive Shell
% python% python
Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29)Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.Type "help", "copyright", "credits" or "license" for more information.
>>>>>>
You can type things directly into a running Python sessionYou can type things directly into a running Python session
>>> 2+3*4>>> 2+3*4
1414
>>> name = "Andrew">>> name = "Andrew"
>>> name>>> name
'Andrew''Andrew'
>>> print "Hello", name>>> print "Hello", name
Hello AndrewHello Andrew
>>>>>>
 BackgroundBackground
 Data Types/StructureData Types/Structure
 Control flowControl flow
 File I/OFile I/O
 ModulesModules
 ClassClass
 NLTKNLTK
ListList
A compound data type:A compound data type:
[0][0]
[2.3, 4.5][2.3, 4.5]
[5, "Hello", "there", 9.8][5, "Hello", "there", 9.8]
[][]
Use len() to get the length of a listUse len() to get the length of a list
>>> names = [“Ben", “Chen", “Yaqin"]>>> names = [“Ben", “Chen", “Yaqin"]
>>> len(names)>>> len(names)
33
Use [ ] to index items in the listUse [ ] to index items in the list
>>> names[0]>>> names[0]
‘‘Ben'Ben'
>>> names[1]>>> names[1]
‘‘Chen'Chen'
>>> names[2]>>> names[2]
‘‘Yaqin'Yaqin'
>>> names[3]>>> names[3]
Traceback (most recent call last):Traceback (most recent call last):
File "<stdin>", line 1, in <module>File "<stdin>", line 1, in <module>
IndexError: list index out of rangeIndexError: list index out of range
>>> names[-1]>>> names[-1]
‘‘Yaqin'Yaqin'
>>> names[-2]>>> names[-2]
‘‘Chen'Chen'
>>> names[-3]>>> names[-3]
‘‘Ben'Ben'
[0] is the first item.
[1] is the second item
...
Out of range values
raise an exception
Negative values
go backwards from
the last element.
Strings share many features with listsStrings share many features with lists
>>> smiles = "C(=N)(N)N.C(=O)(O)O">>> smiles = "C(=N)(N)N.C(=O)(O)O"
>>> smiles[0]>>> smiles[0]
'C''C'
>>> smiles[1]>>> smiles[1]
'(''('
>>> smiles[-1]>>> smiles[-1]
'O''O'
>>> smiles[1:5]>>> smiles[1:5]
'(=N)''(=N)'
>>> smiles[10:-4]>>> smiles[10:-4]
'C(=O)''C(=O)'
Use “slice” notation to
get a substring
String Methods: find, splitString Methods: find, split
smiles = "C(=N)(N)N.C(=O)(O)O"smiles = "C(=N)(N)N.C(=O)(O)O"
>>> smiles.find("(O)")>>> smiles.find("(O)")
1515
>>> smiles.find(".")>>> smiles.find(".")
99
>>> smiles.find(".", 10)>>> smiles.find(".", 10)
-1-1
>>> smiles.split(".")>>> smiles.split(".")
['C(=N)(N)N', 'C(=O)(O)O']['C(=N)(N)N', 'C(=O)(O)O']
>>>>>>
Use “find” to find the
start of a substring.
Start looking at position 10.
Find returns -1 if it couldn’t
find a match.
Split the string into parts
with “.” as the delimiter
String operators: in, not inString operators: in, not in
if "Br" in “Brother”:if "Br" in “Brother”:
print "contains brother“print "contains brother“
email_address = “clin”email_address = “clin”
if "@" not in email_address:if "@" not in email_address:
email_address += "@brandeis.edu“email_address += "@brandeis.edu“
String Method: “strip”, “rstrip”, “lstrip” are ways toString Method: “strip”, “rstrip”, “lstrip” are ways to
remove whitespace or selected charactersremove whitespace or selected characters
>>> line = " # This is a comment line n">>> line = " # This is a comment line n"
>>> line.strip()>>> line.strip()
'# This is a comment line''# This is a comment line'
>>> line.rstrip()>>> line.rstrip()
' # This is a comment line'' # This is a comment line'
>>> line.rstrip("n")>>> line.rstrip("n")
' # This is a comment line '' # This is a comment line '
>>>>>>
More String methodsMore String methods
email.startswith(“c") endswith(“u”)email.startswith(“c") endswith(“u”)
True/FalseTrue/False
>>> "%s@brandeis.edu" % "clin">>> "%s@brandeis.edu" % "clin"
'clin@brandeis.edu''clin@brandeis.edu'
>>> names = [“Ben", “Chen", “Yaqin"]>>> names = [“Ben", “Chen", “Yaqin"]
>>> ", ".join(names)>>> ", ".join(names)
‘‘Ben, Chen, Yaqin‘Ben, Chen, Yaqin‘
>>> “chen".upper()>>> “chen".upper()
‘‘CHEN'CHEN'
Unexpected things about stringsUnexpected things about strings
>>> s = "andrew">>> s = "andrew"
>>> s[0] = "A">>> s[0] = "A"
Traceback (most recent call last):Traceback (most recent call last):
File "<stdin>", line 1, in <module>File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support itemTypeError: 'str' object does not support item
assignmentassignment
>>> s = "A" + s[1:]>>> s = "A" + s[1:]
>>> s>>> s
'Andrew‘'Andrew‘
Strings are read only
““” is for special characters” is for special characters
n -> newlinen -> newline
t -> tabt -> tab
 -> backslash -> backslash
......
But Windows uses backslash for directories!
filename = "M:nickel_projectreactive.smi" # DANGER!
filename = "M:nickel_projectreactive.smi" # Better!
filename = "M:/nickel_project/reactive.smi" # Usually works
Lists are mutable - some usefulLists are mutable - some useful
methodsmethods
>>> ids = ["9pti", "2plv", "1crn"]>>> ids = ["9pti", "2plv", "1crn"]
>>> ids.append("1alm")>>> ids.append("1alm")
>>> ids>>> ids
['9pti', '2plv', '1crn', '1alm']['9pti', '2plv', '1crn', '1alm']
>>>ids.extend(L)>>>ids.extend(L)
Extend the list by appending all the items in the given list; equivalent to a[len(a):] = L.Extend the list by appending all the items in the given list; equivalent to a[len(a):] = L.
>>> del ids[0]>>> del ids[0]
>>> ids>>> ids
['2plv', '1crn', '1alm']['2plv', '1crn', '1alm']
>>> ids.sort()>>> ids.sort()
>>> ids>>> ids
['1alm', '1crn', '2plv']['1alm', '1crn', '2plv']
>>> ids.reverse()>>> ids.reverse()
>>> ids>>> ids
['2plv', '1crn', '1alm']['2plv', '1crn', '1alm']
>>> ids.insert(0, "9pti")>>> ids.insert(0, "9pti")
>>> ids>>> ids
['9pti', '2plv', '1crn', '1alm']['9pti', '2plv', '1crn', '1alm']
append an element
remove an element
sort by default order
reverse the elements in a list
insert an element at some
specified position.
(Slower than .append())
Tuples:Tuples: sort of an immutable list
>>> yellow = (255, 255, 0) # r, g, b>>> yellow = (255, 255, 0) # r, g, b
>>> one = (1,)>>> one = (1,)
>>> yellow[0]>>> yellow[0]
>>> yellow[1:]>>> yellow[1:]
(255, 0)(255, 0)
>>> yellow[0] = 0>>> yellow[0] = 0
Traceback (most recent call last):Traceback (most recent call last):
File "<stdin>", line 1, in <module>File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignmentTypeError: 'tuple' object does not support item assignment
Very common in string interpolation:
>>> "%s lives in %s at latitude %.1f" % ("Andrew", "Sweden", 57.7056)
'Andrew lives in Sweden at latitude 57.7'
zipping lists togetherzipping lists together
>>> names>>> names
['ben', 'chen', 'yaqin']['ben', 'chen', 'yaqin']
>>> gender =>>> gender = [0, 0, 1][0, 0, 1]
>>> zip(names, gender)>>> zip(names, gender)
[('ben', 0), ('chen', 0), ('yaqin', 1)][('ben', 0), ('chen', 0), ('yaqin', 1)]
DictionariesDictionaries
 Dictionaries are lookup tables.
 They map from a “key” to a “value”.
symbol_to_name = {
"H": "hydrogen",
"He": "helium",
"Li": "lithium",
"C": "carbon",
"O": "oxygen",
"N": "nitrogen"
}
 Duplicate keys are not allowed
 Duplicate values are just fine
Keys can be any immutable valueKeys can be any immutable value
numbers, strings, tuples, frozensetnumbers, strings, tuples, frozenset,,
not list, dictionary, set, ...not list, dictionary, set, ...
atomic_number_to_name = {atomic_number_to_name = {
1: "hydrogen"1: "hydrogen"
6: "carbon",6: "carbon",
7: "nitrogen"7: "nitrogen"
8: "oxygen",8: "oxygen",
}}
nobel_prize_winners = {nobel_prize_winners = {
(1979, "physics"): ["Glashow", "Salam", "Weinberg"],(1979, "physics"): ["Glashow", "Salam", "Weinberg"],
(1962, "chemistry"): ["Hodgkin"],(1962, "chemistry"): ["Hodgkin"],
(1984, "biology"): ["McClintock"],(1984, "biology"): ["McClintock"],
}}
A set is an unordered collection
with no duplicate elements.
DictionaryDictionary
>>> symbol_to_name["C"]>>> symbol_to_name["C"]
'carbon''carbon'
>>> "O" in symbol_to_name, "U" in symbol_to_name>>> "O" in symbol_to_name, "U" in symbol_to_name
(True, False)(True, False)
>>> "oxygen" in symbol_to_name>>> "oxygen" in symbol_to_name
FalseFalse
>>> symbol_to_name["P"]>>> symbol_to_name["P"]
Traceback (most recent call last):Traceback (most recent call last):
File "<stdin>", line 1, in <module>File "<stdin>", line 1, in <module>
KeyError: 'P'KeyError: 'P'
>>> symbol_to_name.get("P", "unknown")>>> symbol_to_name.get("P", "unknown")
'unknown''unknown'
>>> symbol_to_name.get("C", "unknown")>>> symbol_to_name.get("C", "unknown")
'carbon''carbon'
Get the value for a given key
Test if the key exists
(“in” only checks the keys,
not the values.)
[] lookup failures raise an exception.
Use “.get()” if you want
to return a default value.
Some useful dictionary methodsSome useful dictionary methods
>>> symbol_to_name.keys()>>> symbol_to_name.keys()
['C', 'H', 'O', 'N', 'Li', 'He']['C', 'H', 'O', 'N', 'Li', 'He']
>>> symbol_to_name.values()>>> symbol_to_name.values()
['carbon', 'hydrogen', 'oxygen', 'nitrogen', 'lithium', 'helium']['carbon', 'hydrogen', 'oxygen', 'nitrogen', 'lithium', 'helium']
>>> symbol_to_name.update( {"P": "phosphorous", "S": "sulfur"} )>>> symbol_to_name.update( {"P": "phosphorous", "S": "sulfur"} )
>>> symbol_to_name.items()>>> symbol_to_name.items()
[('C', 'carbon'), ('H', 'hydrogen'), ('O', 'oxygen'), ('N', 'nitrogen'), ('P',[('C', 'carbon'), ('H', 'hydrogen'), ('O', 'oxygen'), ('N', 'nitrogen'), ('P',
'phosphorous'), ('S', 'sulfur'), ('Li', 'lithium'), ('He', 'helium')]'phosphorous'), ('S', 'sulfur'), ('Li', 'lithium'), ('He', 'helium')]
>>> del symbol_to_name['C']>>> del symbol_to_name['C']
>>> symbol_to_name>>> symbol_to_name
{'H': 'hydrogen', 'O': 'oxygen', 'N': 'nitrogen', 'Li': 'lithium', 'He': 'helium'}{'H': 'hydrogen', 'O': 'oxygen', 'N': 'nitrogen', 'Li': 'lithium', 'He': 'helium'}
 BackgroundBackground
 Data Types/StructureData Types/Structure
list, string, tuple, dictionarylist, string, tuple, dictionary
 Control flowControl flow
 File I/OFile I/O
 ModulesModules
 ClassClass
 NLTKNLTK
Control FlowControl Flow
Things that are FalseThings that are False
 The boolean value False
 The numbers 0 (integer), 0.0 (float) and 0j (complex).
 The empty string "".
 The empty list [], empty dictionary {} and empty set set().
Things that are TrueThings that are True
 The boolean value TrueThe boolean value True
 All non-zero numbers.All non-zero numbers.
 Any string containing at least one character.Any string containing at least one character.
 A non-empty data structure.A non-empty data structure.
IfIf
>>> smiles = "BrC1=CC=C(C=C1)NN.Cl">>> smiles = "BrC1=CC=C(C=C1)NN.Cl"
>>> bool(smiles)>>> bool(smiles)
TrueTrue
>>> not bool(smiles)>>> not bool(smiles)
FalseFalse
>>> if not smiles>>> if not smiles::
... print "The SMILES string is empty"... print "The SMILES string is empty"
......
 The “else” case is always optional
Use “elif” to chain subsequent testsUse “elif” to chain subsequent tests
>>> mode = "absolute">>> mode = "absolute"
>>> if mode == "canonical":>>> if mode == "canonical":
...... smiles = "canonical"smiles = "canonical"
... elif mode == "isomeric":... elif mode == "isomeric":
...... smiles = "isomeric”smiles = "isomeric”
...... elif mode == "absolute":elif mode == "absolute":
...... smiles = "absolute"smiles = "absolute"
... else:... else:
...... raise TypeError("unknown mode")raise TypeError("unknown mode")
......
>>> smiles>>> smiles
' absolute '' absolute '
>>>>>>
“raise” is the Python way to raise exceptions
Boolean logicBoolean logic
Python expressions can have “and”s andPython expressions can have “and”s and
“or”s:“or”s:
if (benif (ben <=<= 5 and chen5 and chen >=>= 10 or10 or
chenchen ==== 500 and ben500 and ben !=!= 5):5):
print “Ben and Chen“print “Ben and Chen“
Range TestRange Test
if (3if (3 <= Time <=<= Time <= 5):5):
print “Office Hour"print “Office Hour"
ForFor
>>> names = [“Ben", “Chen", “Yaqin"]>>> names = [“Ben", “Chen", “Yaqin"]
>>> for name in names:>>> for name in names:
...... print smilesprint smiles
......
BenBen
ChenChen
YaqinYaqin
Tuple assignment in for loopsTuple assignment in for loops
data = [ ("C20H20O3", 308.371),data = [ ("C20H20O3", 308.371),
("C22H20O2", 316.393),("C22H20O2", 316.393),
("C24H40N4O2", 416.6),("C24H40N4O2", 416.6),
("C14H25N5O3", 311.38),("C14H25N5O3", 311.38),
("C15H20O2", 232.3181)]("C15H20O2", 232.3181)]
forfor (formula, mw)(formula, mw) in data:in data:
print "The molecular weight of %s is %s" % (formula, mw)print "The molecular weight of %s is %s" % (formula, mw)
The molecular weight of C20H20O3 is 308.371The molecular weight of C20H20O3 is 308.371
The molecular weight of C22H20O2 is 316.393The molecular weight of C22H20O2 is 316.393
The molecular weight of C24H40N4O2 is 416.6The molecular weight of C24H40N4O2 is 416.6
The molecular weight of C14H25N5O3 is 311.38The molecular weight of C14H25N5O3 is 311.38
The molecular weight of C15H20O2 is 232.3181The molecular weight of C15H20O2 is 232.3181
Break, continueBreak, continue
>>> for value in [3, 1, 4, 1, 5, 9, 2]:>>> for value in [3, 1, 4, 1, 5, 9, 2]:
...... print "Checking", valueprint "Checking", value
...... if value > 8:if value > 8:
...... print "Exiting for loop"print "Exiting for loop"
...... breakbreak
...... elif value < 3:elif value < 3:
...... print "Ignoring"print "Ignoring"
...... continuecontinue
...... print "The square is", value**2print "The square is", value**2
......
Use “break” to stopUse “break” to stop
the for loopthe for loop
Use “continue” to stopUse “continue” to stop
processing the current itemprocessing the current item
Checking 3Checking 3
The square is 9The square is 9
Checking 1Checking 1
IgnoringIgnoring
Checking 4Checking 4
The square is 16The square is 16
Checking 1Checking 1
IgnoringIgnoring
Checking 5Checking 5
The square is 25The square is 25
Checking 9Checking 9
Exiting for loopExiting for loop
>>>>>>
Range()Range()
 ““range” creates a list of numbers in a specified rangerange” creates a list of numbers in a specified range
 range([start,] stop[, step]) -> list of integersrange([start,] stop[, step]) -> list of integers
 When step is given, it specifies the increment (or decrement).When step is given, it specifies the increment (or decrement).
>>> range(5)>>> range(5)
[0, 1, 2, 3, 4][0, 1, 2, 3, 4]
>>> range(5, 10)>>> range(5, 10)
[5, 6, 7, 8, 9][5, 6, 7, 8, 9]
>>> range(0, 10, 2)>>> range(0, 10, 2)
[0, 2, 4, 6, 8][0, 2, 4, 6, 8]
How to get every second element in a list?
for i in range(0, len(data), 2):
print data[i]
 BackgroundBackground
 Data Types/StructureData Types/Structure
 Control flowControl flow
 File I/OFile I/O
 ModulesModules
 ClassClass
 NLTKNLTK
Reading filesReading files
>>> f = open(“names.txt")>>> f = open(“names.txt")
>>> f.readline()>>> f.readline()
'Yaqinn''Yaqinn'
Quick WayQuick Way
>>> lst= [ x for x in open("text.txt","r").readlines() ]>>> lst= [ x for x in open("text.txt","r").readlines() ]
>>> lst>>> lst
['Chen Linn', 'clin@brandeis.edun', 'Volen 110n', 'Office['Chen Linn', 'clin@brandeis.edun', 'Volen 110n', 'Office
Hour: Thurs. 3-5n', 'n', 'Yaqin Yangn',Hour: Thurs. 3-5n', 'n', 'Yaqin Yangn',
'yaqin@brandeis.edun', 'Volen 110n', 'Offiche Hour:'yaqin@brandeis.edun', 'Volen 110n', 'Offiche Hour:
Tues. 3-5n']Tues. 3-5n']
Ignore the header?Ignore the header?
for (i,line) in enumerate(open(‘text.txt’,"r").readlines()):for (i,line) in enumerate(open(‘text.txt’,"r").readlines()):
if i == 0: continueif i == 0: continue
print lineprint line
Using dictionaries to countUsing dictionaries to count
occurrencesoccurrences
>>> for line in open('names.txt'):>>> for line in open('names.txt'):
...... name = line.strip()name = line.strip()
...... name_count[name] = name_count.get(name,0)+ 1name_count[name] = name_count.get(name,0)+ 1
......
>>> for (name, count) in name_count.items():>>> for (name, count) in name_count.items():
...... print name, countprint name, count
......
Chen 3Chen 3
Ben 3Ben 3
Yaqin 3Yaqin 3
File OutputFile Output
input_file = open(“in.txt")input_file = open(“in.txt")
output_file = open(“out.txt", "w")output_file = open(“out.txt", "w")
for line in input_file:for line in input_file:
output_file.write(line)output_file.write(line)
“w” = “write mode”
“a” = “append mode”
“wb” = “write in binary”
“r” = “read mode” (default)
“rb” = “read in binary”
“U” = “read files with Unix
or Windows line endings”
 BackgroundBackground
 Data Types/StructureData Types/Structure
 Control flowControl flow
 File I/OFile I/O
 ModulesModules
 ClassClass
 NLTKNLTK
ModulesModules
When a Python program starts it only has
access to a basic functions and classes.
(“int”, “dict”, “len”, “sum”, “range”, ...)
“Modules” contain additional functionality.
Use “import” to tell Python to load a
module.
>>> import math
>>> import nltk
import the math moduleimport the math module
>>> import math>>> import math
>>> math.pi>>> math.pi
3.14159265358979313.1415926535897931
>>> math.cos(0)>>> math.cos(0)
1.01.0
>>> math.cos(math.pi)>>> math.cos(math.pi)
-1.0-1.0
>>> dir(math)>>> dir(math)
['__doc__', '__file__', '__name__', '__package__', 'acos', 'acosh',['__doc__', '__file__', '__name__', '__package__', 'acos', 'acosh',
'asin', 'asinh', 'atan', 'atan2', 'atanh', 'ceil', 'copysign', 'cos','asin', 'asinh', 'atan', 'atan2', 'atanh', 'ceil', 'copysign', 'cos',
'cosh', 'degrees', 'e', 'exp', 'fabs', 'factorial', 'floor', 'fmod','cosh', 'degrees', 'e', 'exp', 'fabs', 'factorial', 'floor', 'fmod',
'frexp', 'fsum', 'hypot', 'isinf', 'isnan', 'ldexp', 'log', 'log10','frexp', 'fsum', 'hypot', 'isinf', 'isnan', 'ldexp', 'log', 'log10',
'log1p', 'modf', 'pi', 'pow', 'radians', 'sin', 'sinh', 'sqrt', 'tan','log1p', 'modf', 'pi', 'pow', 'radians', 'sin', 'sinh', 'sqrt', 'tan',
'tanh', 'trunc']'tanh', 'trunc']
>>> help(math)>>> help(math)
>>> help(math.cos)>>> help(math.cos)
““import” and “from ... import ...”import” and “from ... import ...”
>>> import math>>> import math
math.cosmath.cos
>>> from math import cos, pi
cos
>>> from math import *
 BackgroundBackground
 Data Types/StructureData Types/Structure
 Control flowControl flow
 File I/OFile I/O
 ModulesModules
 ClassClass
 NLTKNLTK
ClassesClasses
class ClassName(object):class ClassName(object):
<statement-1><statement-1>
. . .. . .
<statement-N><statement-N>
class MyClass(object):class MyClass(object):
"""A simple example class""""""A simple example class"""
i = 1234512345
def f(self):def f(self):
return self.ireturn self.i
class DerivedClassName(BaseClassName):class DerivedClassName(BaseClassName):
<statement-1><statement-1>
. . .. . .
<statement-N><statement-N>
 BackgroundBackground
 Data Types/StructureData Types/Structure
 Control flowControl flow
 File I/OFile I/O
 ModulesModules
 ClassClass
 NLTKNLTK
http://www.nltk.org/bookhttp://www.nltk.org/book
NLTK is on berry patch machines!NLTK is on berry patch machines!
>>>from nltk.book import *>>>from nltk.book import *
>>> text1>>> text1
<Text: Moby Dick by Herman Melville 1851><Text: Moby Dick by Herman Melville 1851>
>>> text1.name>>> text1.name
'Moby Dick by Herman Melville 1851''Moby Dick by Herman Melville 1851'
>>> text1.concordance("monstrous")>>> text1.concordance("monstrous")
>>> dir(text1)>>> dir(text1)
>>> text1.tokens>>> text1.tokens
>>> text1.index("my")>>> text1.index("my")
46474647
>>> sent2>>> sent2
['The', 'family', 'of', 'Dashwood', 'had', 'long', 'been', 'settled', 'in',['The', 'family', 'of', 'Dashwood', 'had', 'long', 'been', 'settled', 'in',
'Sussex', '.']'Sussex', '.']
Classify TextClassify Text
>>> def gender_features(word):>>> def gender_features(word):
...... return {'last_letter': word[-1]}return {'last_letter': word[-1]}
>>> gender_features('Shrek')>>> gender_features('Shrek')
{'last_letter': 'k'}{'last_letter': 'k'}
>>> from nltk.corpus import names>>> from nltk.corpus import names
>>> import random>>> import random
>>> names = ([(name, 'male') for name in names.words('male.txt')] +>>> names = ([(name, 'male') for name in names.words('male.txt')] +
... [(name, 'female') for name in names.words('female.txt')])... [(name, 'female') for name in names.words('female.txt')])
>>> random.shuffle(names)>>> random.shuffle(names)
Featurize, train, test, predictFeaturize, train, test, predict
>>> featuresets = [(gender_features(n), g) for (n,g) in names]>>> featuresets = [(gender_features(n), g) for (n,g) in names]
>>> train_set, test_set = featuresets[500:], featuresets[:500]>>> train_set, test_set = featuresets[500:], featuresets[:500]
>>> classifier = nltk.NaiveBayesClassifier.train(train_set)>>> classifier = nltk.NaiveBayesClassifier.train(train_set)
>>> print nltk.classify.accuracy(classifier, test_set)>>> print nltk.classify.accuracy(classifier, test_set)
0.7260.726
>>> classifier.classify(gender_features('Neo'))>>> classifier.classify(gender_features('Neo'))
'male''male'
fromfrom nltknltk.corpus import.corpus import reutersreuters
 Reuters Corpus:Reuters Corpus:10,788 news10,788 news
1.3 million words.1.3 million words.
 Been classified intoBeen classified into 9090 topicstopics
 Grouped into 2 sets, "training" and "test“Grouped into 2 sets, "training" and "test“
 Categories overlap with each otherCategories overlap with each other
http://nltk.googlecode.com/svn/trunk/doc/bohttp://nltk.googlecode.com/svn/trunk/doc/bo
ok/ch02.htmlok/ch02.html
ReutersReuters
>>> from nltk.corpus import reuters>>> from nltk.corpus import reuters
>>> reuters.fileids()>>> reuters.fileids()
['test/14826', 'test/14828', 'test/14829', 'test/14832', ...]['test/14826', 'test/14828', 'test/14829', 'test/14832', ...]
>>> reuters.categories()>>> reuters.categories()
['acq', 'alum', 'barley', 'bop', 'carcass', 'castor-oil', 'cocoa', 'coconut',['acq', 'alum', 'barley', 'bop', 'carcass', 'castor-oil', 'cocoa', 'coconut',
'coconut-oil', 'coffee', 'copper', 'copra-cake', 'corn', 'cotton', 'cotton-'coconut-oil', 'coffee', 'copper', 'copra-cake', 'corn', 'cotton', 'cotton-
oil', 'cpi', 'cpu', 'crude', 'dfl', 'dlr', ...]oil', 'cpi', 'cpu', 'crude', 'dfl', 'dlr', ...]

More Related Content

What's hot

Write better python code with these 10 tricks | by yong cui, ph.d. | aug, 202...
Write better python code with these 10 tricks | by yong cui, ph.d. | aug, 202...Write better python code with these 10 tricks | by yong cui, ph.d. | aug, 202...
Write better python code with these 10 tricks | by yong cui, ph.d. | aug, 202...
amit kuraria
 
Template Haskell Tutorial
Template Haskell TutorialTemplate Haskell Tutorial
Template Haskell Tutorial
kizzx2
 
groovy rules
groovy rulesgroovy rules
groovy rules
Paul King
 
Slides chapter3part1 ruby-forjavaprogrammers
Slides chapter3part1 ruby-forjavaprogrammersSlides chapter3part1 ruby-forjavaprogrammers
Slides chapter3part1 ruby-forjavaprogrammers
Giovanni924
 

What's hot (20)

Round PEG, Round Hole - Parsing Functionally
Round PEG, Round Hole - Parsing FunctionallyRound PEG, Round Hole - Parsing Functionally
Round PEG, Round Hole - Parsing Functionally
 
A Few of My Favorite (Python) Things
A Few of My Favorite (Python) ThingsA Few of My Favorite (Python) Things
A Few of My Favorite (Python) Things
 
TextMining with R
TextMining with RTextMining with R
TextMining with R
 
Write better python code with these 10 tricks | by yong cui, ph.d. | aug, 202...
Write better python code with these 10 tricks | by yong cui, ph.d. | aug, 202...Write better python code with these 10 tricks | by yong cui, ph.d. | aug, 202...
Write better python code with these 10 tricks | by yong cui, ph.d. | aug, 202...
 
Encryption: It's For More Than Just Passwords
Encryption: It's For More Than Just PasswordsEncryption: It's For More Than Just Passwords
Encryption: It's For More Than Just Passwords
 
Haste (Same Language, Multiple Platforms) and Tagless Final Style (Same Synta...
Haste (Same Language, Multiple Platforms) and Tagless Final Style (Same Synta...Haste (Same Language, Multiple Platforms) and Tagless Final Style (Same Synta...
Haste (Same Language, Multiple Platforms) and Tagless Final Style (Same Synta...
 
PHP 7 – What changed internally?
PHP 7 – What changed internally?PHP 7 – What changed internally?
PHP 7 – What changed internally?
 
Template Haskell Tutorial
Template Haskell TutorialTemplate Haskell Tutorial
Template Haskell Tutorial
 
Clean code
Clean codeClean code
Clean code
 
groovy rules
groovy rulesgroovy rules
groovy rules
 
Dropping ACID with MongoDB
Dropping ACID with MongoDBDropping ACID with MongoDB
Dropping ACID with MongoDB
 
Coffee, Danish & Search: Presented by Alan Woodward & Charlie Hull, Flax
Coffee, Danish & Search: Presented by Alan Woodward & Charlie Hull, FlaxCoffee, Danish & Search: Presented by Alan Woodward & Charlie Hull, Flax
Coffee, Danish & Search: Presented by Alan Woodward & Charlie Hull, Flax
 
Fun with processes - lightning talk
Fun with processes - lightning talkFun with processes - lightning talk
Fun with processes - lightning talk
 
PHP 7 – What changed internally? (Forum PHP 2015)
PHP 7 – What changed internally? (Forum PHP 2015)PHP 7 – What changed internally? (Forum PHP 2015)
PHP 7 – What changed internally? (Forum PHP 2015)
 
Natural Language Toolkit (NLTK), Basics
Natural Language Toolkit (NLTK), Basics Natural Language Toolkit (NLTK), Basics
Natural Language Toolkit (NLTK), Basics
 
Regular expression for everyone
Regular expression for everyoneRegular expression for everyone
Regular expression for everyone
 
Poetry with R -- Dissecting the code
Poetry with R -- Dissecting the codePoetry with R -- Dissecting the code
Poetry with R -- Dissecting the code
 
Python WATs: Uncovering Odd Behavior
Python WATs: Uncovering Odd BehaviorPython WATs: Uncovering Odd Behavior
Python WATs: Uncovering Odd Behavior
 
Slides chapter3part1 ruby-forjavaprogrammers
Slides chapter3part1 ruby-forjavaprogrammersSlides chapter3part1 ruby-forjavaprogrammers
Slides chapter3part1 ruby-forjavaprogrammers
 
The Internet Is a Series of Tubes
The Internet Is a Series of TubesThe Internet Is a Series of Tubes
The Internet Is a Series of Tubes
 

Similar to Python tutorial

Python tutorial
Python tutorialPython tutorial
Python tutorial
Rajiv Risi
 
python chapter 1
python chapter 1python chapter 1
python chapter 1
Raghu nath
 
Python chapter 2
Python chapter 2Python chapter 2
Python chapter 2
Raghu nath
 
Python language data types
Python language data typesPython language data types
Python language data types
Harry Potter
 

Similar to Python tutorial (20)

Python tutorial
Python tutorialPython tutorial
Python tutorial
 
Introduction to Python Programming.ppt
Introduction to Python Programming.pptIntroduction to Python Programming.ppt
Introduction to Python Programming.ppt
 
Python material
Python materialPython material
Python material
 
Python tutorial
Python tutorialPython tutorial
Python tutorial
 
python chapter 1
python chapter 1python chapter 1
python chapter 1
 
Python chapter 2
Python chapter 2Python chapter 2
Python chapter 2
 
Learn 90% of Python in 90 Minutes
Learn 90% of Python in 90 MinutesLearn 90% of Python in 90 Minutes
Learn 90% of Python in 90 Minutes
 
Python Workshop
Python  Workshop Python  Workshop
Python Workshop
 
Becoming a Pythonist
Becoming a PythonistBecoming a Pythonist
Becoming a Pythonist
 
Introduction to python programming 1
Introduction to python programming   1Introduction to python programming   1
Introduction to python programming 1
 
Python Workshop - Learn Python the Hard Way
Python Workshop - Learn Python the Hard WayPython Workshop - Learn Python the Hard Way
Python Workshop - Learn Python the Hard Way
 
Python
PythonPython
Python
 
P3 2018 python_regexes
P3 2018 python_regexesP3 2018 python_regexes
P3 2018 python_regexes
 
Introduction to R
Introduction to RIntroduction to R
Introduction to R
 
Τα Πολύ Βασικά για την Python
Τα Πολύ Βασικά για την PythonΤα Πολύ Βασικά για την Python
Τα Πολύ Βασικά για την Python
 
An Introduction to gensim: "Topic Modelling for Humans"
An Introduction to gensim: "Topic Modelling for Humans"An Introduction to gensim: "Topic Modelling for Humans"
An Introduction to gensim: "Topic Modelling for Humans"
 
Python fundamentals - basic | WeiYuan
Python fundamentals - basic | WeiYuanPython fundamentals - basic | WeiYuan
Python fundamentals - basic | WeiYuan
 
R Programming: Export/Output Data In R
R Programming: Export/Output Data In RR Programming: Export/Output Data In R
R Programming: Export/Output Data In R
 
Python language data types
Python language data typesPython language data types
Python language data types
 
Python language data types
Python language data typesPython language data types
Python language data types
 

Recently uploaded

Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 

Recently uploaded (20)

Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Interdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxInterdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptx
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
 
Wellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxWellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptx
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptx
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptx
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 

Python tutorial

  • 1. Introduction to Python Chen LinChen Lin clin@brandeis.educlin@brandeis.edu COSI 134aCOSI 134a Volen 110Volen 110 Office Hour: Thurs. 3-5Office Hour: Thurs. 3-5
  • 2. For More Information? http://python.org/ - documentation, tutorials, beginners guide, core distribution, ... Books include:  Learning Python by Mark Lutz  Python Essential Reference by David Beazley  Python Cookbook, ed. by Martelli, Ravenscroft and Ascher  (online at http://code.activestate.com/recipes/langs/python/)  http://wiki.python.org/moin/PythonBooks
  • 3. Python VideosPython Videos http://showmedo.com/videotutorials/python “5 Minute Overview (What Does Python Look Like?)” “Introducing the PyDev IDE for Eclipse” “Linear Algebra with Numpy” And many more
  • 4. 4 Major Versions of Python4 Major Versions of Python “Python” or “CPython” is written in C/C++ - Version 2.7 came out in mid-2010 - Version 3.1.2 came out in early 2010 “Jython” is written in Java for the JVM “IronPython” is written in C# for the .Net environment Go To Website
  • 5. Development EnvironmentsDevelopment Environments what IDE to use?what IDE to use? http://stackoverflow.com/questions/81584http://stackoverflow.com/questions/81584 1. PyDev with Eclipse 2. Komodo 3. Emacs 4. Vim 5. TextMate 6. Gedit 7. Idle 8. PIDA (Linux)(VIM Based) 9. NotePad++ (Windows) 10.BlueFish (Linux)
  • 6. Pydev with EclipsePydev with Eclipse
  • 7. Python Interactive ShellPython Interactive Shell % python% python Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29)Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29) [GCC 4.2.1 (Apple Inc. build 5646)] on darwin[GCC 4.2.1 (Apple Inc. build 5646)] on darwin Type "help", "copyright", "credits" or "license" for more information.Type "help", "copyright", "credits" or "license" for more information. >>>>>> You can type things directly into a running Python sessionYou can type things directly into a running Python session >>> 2+3*4>>> 2+3*4 1414 >>> name = "Andrew">>> name = "Andrew" >>> name>>> name 'Andrew''Andrew' >>> print "Hello", name>>> print "Hello", name Hello AndrewHello Andrew >>>>>>
  • 8.  BackgroundBackground  Data Types/StructureData Types/Structure  Control flowControl flow  File I/OFile I/O  ModulesModules  ClassClass  NLTKNLTK
  • 9. ListList A compound data type:A compound data type: [0][0] [2.3, 4.5][2.3, 4.5] [5, "Hello", "there", 9.8][5, "Hello", "there", 9.8] [][] Use len() to get the length of a listUse len() to get the length of a list >>> names = [“Ben", “Chen", “Yaqin"]>>> names = [“Ben", “Chen", “Yaqin"] >>> len(names)>>> len(names) 33
  • 10. Use [ ] to index items in the listUse [ ] to index items in the list >>> names[0]>>> names[0] ‘‘Ben'Ben' >>> names[1]>>> names[1] ‘‘Chen'Chen' >>> names[2]>>> names[2] ‘‘Yaqin'Yaqin' >>> names[3]>>> names[3] Traceback (most recent call last):Traceback (most recent call last): File "<stdin>", line 1, in <module>File "<stdin>", line 1, in <module> IndexError: list index out of rangeIndexError: list index out of range >>> names[-1]>>> names[-1] ‘‘Yaqin'Yaqin' >>> names[-2]>>> names[-2] ‘‘Chen'Chen' >>> names[-3]>>> names[-3] ‘‘Ben'Ben' [0] is the first item. [1] is the second item ... Out of range values raise an exception Negative values go backwards from the last element.
  • 11. Strings share many features with listsStrings share many features with lists >>> smiles = "C(=N)(N)N.C(=O)(O)O">>> smiles = "C(=N)(N)N.C(=O)(O)O" >>> smiles[0]>>> smiles[0] 'C''C' >>> smiles[1]>>> smiles[1] '(''(' >>> smiles[-1]>>> smiles[-1] 'O''O' >>> smiles[1:5]>>> smiles[1:5] '(=N)''(=N)' >>> smiles[10:-4]>>> smiles[10:-4] 'C(=O)''C(=O)' Use “slice” notation to get a substring
  • 12. String Methods: find, splitString Methods: find, split smiles = "C(=N)(N)N.C(=O)(O)O"smiles = "C(=N)(N)N.C(=O)(O)O" >>> smiles.find("(O)")>>> smiles.find("(O)") 1515 >>> smiles.find(".")>>> smiles.find(".") 99 >>> smiles.find(".", 10)>>> smiles.find(".", 10) -1-1 >>> smiles.split(".")>>> smiles.split(".") ['C(=N)(N)N', 'C(=O)(O)O']['C(=N)(N)N', 'C(=O)(O)O'] >>>>>> Use “find” to find the start of a substring. Start looking at position 10. Find returns -1 if it couldn’t find a match. Split the string into parts with “.” as the delimiter
  • 13. String operators: in, not inString operators: in, not in if "Br" in “Brother”:if "Br" in “Brother”: print "contains brother“print "contains brother“ email_address = “clin”email_address = “clin” if "@" not in email_address:if "@" not in email_address: email_address += "@brandeis.edu“email_address += "@brandeis.edu“
  • 14. String Method: “strip”, “rstrip”, “lstrip” are ways toString Method: “strip”, “rstrip”, “lstrip” are ways to remove whitespace or selected charactersremove whitespace or selected characters >>> line = " # This is a comment line n">>> line = " # This is a comment line n" >>> line.strip()>>> line.strip() '# This is a comment line''# This is a comment line' >>> line.rstrip()>>> line.rstrip() ' # This is a comment line'' # This is a comment line' >>> line.rstrip("n")>>> line.rstrip("n") ' # This is a comment line '' # This is a comment line ' >>>>>>
  • 15. More String methodsMore String methods email.startswith(“c") endswith(“u”)email.startswith(“c") endswith(“u”) True/FalseTrue/False >>> "%s@brandeis.edu" % "clin">>> "%s@brandeis.edu" % "clin" 'clin@brandeis.edu''clin@brandeis.edu' >>> names = [“Ben", “Chen", “Yaqin"]>>> names = [“Ben", “Chen", “Yaqin"] >>> ", ".join(names)>>> ", ".join(names) ‘‘Ben, Chen, Yaqin‘Ben, Chen, Yaqin‘ >>> “chen".upper()>>> “chen".upper() ‘‘CHEN'CHEN'
  • 16. Unexpected things about stringsUnexpected things about strings >>> s = "andrew">>> s = "andrew" >>> s[0] = "A">>> s[0] = "A" Traceback (most recent call last):Traceback (most recent call last): File "<stdin>", line 1, in <module>File "<stdin>", line 1, in <module> TypeError: 'str' object does not support itemTypeError: 'str' object does not support item assignmentassignment >>> s = "A" + s[1:]>>> s = "A" + s[1:] >>> s>>> s 'Andrew‘'Andrew‘ Strings are read only
  • 17. ““” is for special characters” is for special characters n -> newlinen -> newline t -> tabt -> tab -> backslash -> backslash ...... But Windows uses backslash for directories! filename = "M:nickel_projectreactive.smi" # DANGER! filename = "M:nickel_projectreactive.smi" # Better! filename = "M:/nickel_project/reactive.smi" # Usually works
  • 18. Lists are mutable - some usefulLists are mutable - some useful methodsmethods >>> ids = ["9pti", "2plv", "1crn"]>>> ids = ["9pti", "2plv", "1crn"] >>> ids.append("1alm")>>> ids.append("1alm") >>> ids>>> ids ['9pti', '2plv', '1crn', '1alm']['9pti', '2plv', '1crn', '1alm'] >>>ids.extend(L)>>>ids.extend(L) Extend the list by appending all the items in the given list; equivalent to a[len(a):] = L.Extend the list by appending all the items in the given list; equivalent to a[len(a):] = L. >>> del ids[0]>>> del ids[0] >>> ids>>> ids ['2plv', '1crn', '1alm']['2plv', '1crn', '1alm'] >>> ids.sort()>>> ids.sort() >>> ids>>> ids ['1alm', '1crn', '2plv']['1alm', '1crn', '2plv'] >>> ids.reverse()>>> ids.reverse() >>> ids>>> ids ['2plv', '1crn', '1alm']['2plv', '1crn', '1alm'] >>> ids.insert(0, "9pti")>>> ids.insert(0, "9pti") >>> ids>>> ids ['9pti', '2plv', '1crn', '1alm']['9pti', '2plv', '1crn', '1alm'] append an element remove an element sort by default order reverse the elements in a list insert an element at some specified position. (Slower than .append())
  • 19. Tuples:Tuples: sort of an immutable list >>> yellow = (255, 255, 0) # r, g, b>>> yellow = (255, 255, 0) # r, g, b >>> one = (1,)>>> one = (1,) >>> yellow[0]>>> yellow[0] >>> yellow[1:]>>> yellow[1:] (255, 0)(255, 0) >>> yellow[0] = 0>>> yellow[0] = 0 Traceback (most recent call last):Traceback (most recent call last): File "<stdin>", line 1, in <module>File "<stdin>", line 1, in <module> TypeError: 'tuple' object does not support item assignmentTypeError: 'tuple' object does not support item assignment Very common in string interpolation: >>> "%s lives in %s at latitude %.1f" % ("Andrew", "Sweden", 57.7056) 'Andrew lives in Sweden at latitude 57.7'
  • 20. zipping lists togetherzipping lists together >>> names>>> names ['ben', 'chen', 'yaqin']['ben', 'chen', 'yaqin'] >>> gender =>>> gender = [0, 0, 1][0, 0, 1] >>> zip(names, gender)>>> zip(names, gender) [('ben', 0), ('chen', 0), ('yaqin', 1)][('ben', 0), ('chen', 0), ('yaqin', 1)]
  • 21. DictionariesDictionaries  Dictionaries are lookup tables.  They map from a “key” to a “value”. symbol_to_name = { "H": "hydrogen", "He": "helium", "Li": "lithium", "C": "carbon", "O": "oxygen", "N": "nitrogen" }  Duplicate keys are not allowed  Duplicate values are just fine
  • 22. Keys can be any immutable valueKeys can be any immutable value numbers, strings, tuples, frozensetnumbers, strings, tuples, frozenset,, not list, dictionary, set, ...not list, dictionary, set, ... atomic_number_to_name = {atomic_number_to_name = { 1: "hydrogen"1: "hydrogen" 6: "carbon",6: "carbon", 7: "nitrogen"7: "nitrogen" 8: "oxygen",8: "oxygen", }} nobel_prize_winners = {nobel_prize_winners = { (1979, "physics"): ["Glashow", "Salam", "Weinberg"],(1979, "physics"): ["Glashow", "Salam", "Weinberg"], (1962, "chemistry"): ["Hodgkin"],(1962, "chemistry"): ["Hodgkin"], (1984, "biology"): ["McClintock"],(1984, "biology"): ["McClintock"], }} A set is an unordered collection with no duplicate elements.
  • 23. DictionaryDictionary >>> symbol_to_name["C"]>>> symbol_to_name["C"] 'carbon''carbon' >>> "O" in symbol_to_name, "U" in symbol_to_name>>> "O" in symbol_to_name, "U" in symbol_to_name (True, False)(True, False) >>> "oxygen" in symbol_to_name>>> "oxygen" in symbol_to_name FalseFalse >>> symbol_to_name["P"]>>> symbol_to_name["P"] Traceback (most recent call last):Traceback (most recent call last): File "<stdin>", line 1, in <module>File "<stdin>", line 1, in <module> KeyError: 'P'KeyError: 'P' >>> symbol_to_name.get("P", "unknown")>>> symbol_to_name.get("P", "unknown") 'unknown''unknown' >>> symbol_to_name.get("C", "unknown")>>> symbol_to_name.get("C", "unknown") 'carbon''carbon' Get the value for a given key Test if the key exists (“in” only checks the keys, not the values.) [] lookup failures raise an exception. Use “.get()” if you want to return a default value.
  • 24. Some useful dictionary methodsSome useful dictionary methods >>> symbol_to_name.keys()>>> symbol_to_name.keys() ['C', 'H', 'O', 'N', 'Li', 'He']['C', 'H', 'O', 'N', 'Li', 'He'] >>> symbol_to_name.values()>>> symbol_to_name.values() ['carbon', 'hydrogen', 'oxygen', 'nitrogen', 'lithium', 'helium']['carbon', 'hydrogen', 'oxygen', 'nitrogen', 'lithium', 'helium'] >>> symbol_to_name.update( {"P": "phosphorous", "S": "sulfur"} )>>> symbol_to_name.update( {"P": "phosphorous", "S": "sulfur"} ) >>> symbol_to_name.items()>>> symbol_to_name.items() [('C', 'carbon'), ('H', 'hydrogen'), ('O', 'oxygen'), ('N', 'nitrogen'), ('P',[('C', 'carbon'), ('H', 'hydrogen'), ('O', 'oxygen'), ('N', 'nitrogen'), ('P', 'phosphorous'), ('S', 'sulfur'), ('Li', 'lithium'), ('He', 'helium')]'phosphorous'), ('S', 'sulfur'), ('Li', 'lithium'), ('He', 'helium')] >>> del symbol_to_name['C']>>> del symbol_to_name['C'] >>> symbol_to_name>>> symbol_to_name {'H': 'hydrogen', 'O': 'oxygen', 'N': 'nitrogen', 'Li': 'lithium', 'He': 'helium'}{'H': 'hydrogen', 'O': 'oxygen', 'N': 'nitrogen', 'Li': 'lithium', 'He': 'helium'}
  • 25.  BackgroundBackground  Data Types/StructureData Types/Structure list, string, tuple, dictionarylist, string, tuple, dictionary  Control flowControl flow  File I/OFile I/O  ModulesModules  ClassClass  NLTKNLTK
  • 26. Control FlowControl Flow Things that are FalseThings that are False  The boolean value False  The numbers 0 (integer), 0.0 (float) and 0j (complex).  The empty string "".  The empty list [], empty dictionary {} and empty set set(). Things that are TrueThings that are True  The boolean value TrueThe boolean value True  All non-zero numbers.All non-zero numbers.  Any string containing at least one character.Any string containing at least one character.  A non-empty data structure.A non-empty data structure.
  • 27. IfIf >>> smiles = "BrC1=CC=C(C=C1)NN.Cl">>> smiles = "BrC1=CC=C(C=C1)NN.Cl" >>> bool(smiles)>>> bool(smiles) TrueTrue >>> not bool(smiles)>>> not bool(smiles) FalseFalse >>> if not smiles>>> if not smiles:: ... print "The SMILES string is empty"... print "The SMILES string is empty" ......  The “else” case is always optional
  • 28. Use “elif” to chain subsequent testsUse “elif” to chain subsequent tests >>> mode = "absolute">>> mode = "absolute" >>> if mode == "canonical":>>> if mode == "canonical": ...... smiles = "canonical"smiles = "canonical" ... elif mode == "isomeric":... elif mode == "isomeric": ...... smiles = "isomeric”smiles = "isomeric” ...... elif mode == "absolute":elif mode == "absolute": ...... smiles = "absolute"smiles = "absolute" ... else:... else: ...... raise TypeError("unknown mode")raise TypeError("unknown mode") ...... >>> smiles>>> smiles ' absolute '' absolute ' >>>>>> “raise” is the Python way to raise exceptions
  • 29. Boolean logicBoolean logic Python expressions can have “and”s andPython expressions can have “and”s and “or”s:“or”s: if (benif (ben <=<= 5 and chen5 and chen >=>= 10 or10 or chenchen ==== 500 and ben500 and ben !=!= 5):5): print “Ben and Chen“print “Ben and Chen“
  • 30. Range TestRange Test if (3if (3 <= Time <=<= Time <= 5):5): print “Office Hour"print “Office Hour"
  • 31. ForFor >>> names = [“Ben", “Chen", “Yaqin"]>>> names = [“Ben", “Chen", “Yaqin"] >>> for name in names:>>> for name in names: ...... print smilesprint smiles ...... BenBen ChenChen YaqinYaqin
  • 32. Tuple assignment in for loopsTuple assignment in for loops data = [ ("C20H20O3", 308.371),data = [ ("C20H20O3", 308.371), ("C22H20O2", 316.393),("C22H20O2", 316.393), ("C24H40N4O2", 416.6),("C24H40N4O2", 416.6), ("C14H25N5O3", 311.38),("C14H25N5O3", 311.38), ("C15H20O2", 232.3181)]("C15H20O2", 232.3181)] forfor (formula, mw)(formula, mw) in data:in data: print "The molecular weight of %s is %s" % (formula, mw)print "The molecular weight of %s is %s" % (formula, mw) The molecular weight of C20H20O3 is 308.371The molecular weight of C20H20O3 is 308.371 The molecular weight of C22H20O2 is 316.393The molecular weight of C22H20O2 is 316.393 The molecular weight of C24H40N4O2 is 416.6The molecular weight of C24H40N4O2 is 416.6 The molecular weight of C14H25N5O3 is 311.38The molecular weight of C14H25N5O3 is 311.38 The molecular weight of C15H20O2 is 232.3181The molecular weight of C15H20O2 is 232.3181
  • 33. Break, continueBreak, continue >>> for value in [3, 1, 4, 1, 5, 9, 2]:>>> for value in [3, 1, 4, 1, 5, 9, 2]: ...... print "Checking", valueprint "Checking", value ...... if value > 8:if value > 8: ...... print "Exiting for loop"print "Exiting for loop" ...... breakbreak ...... elif value < 3:elif value < 3: ...... print "Ignoring"print "Ignoring" ...... continuecontinue ...... print "The square is", value**2print "The square is", value**2 ...... Use “break” to stopUse “break” to stop the for loopthe for loop Use “continue” to stopUse “continue” to stop processing the current itemprocessing the current item Checking 3Checking 3 The square is 9The square is 9 Checking 1Checking 1 IgnoringIgnoring Checking 4Checking 4 The square is 16The square is 16 Checking 1Checking 1 IgnoringIgnoring Checking 5Checking 5 The square is 25The square is 25 Checking 9Checking 9 Exiting for loopExiting for loop >>>>>>
  • 34. Range()Range()  ““range” creates a list of numbers in a specified rangerange” creates a list of numbers in a specified range  range([start,] stop[, step]) -> list of integersrange([start,] stop[, step]) -> list of integers  When step is given, it specifies the increment (or decrement).When step is given, it specifies the increment (or decrement). >>> range(5)>>> range(5) [0, 1, 2, 3, 4][0, 1, 2, 3, 4] >>> range(5, 10)>>> range(5, 10) [5, 6, 7, 8, 9][5, 6, 7, 8, 9] >>> range(0, 10, 2)>>> range(0, 10, 2) [0, 2, 4, 6, 8][0, 2, 4, 6, 8] How to get every second element in a list? for i in range(0, len(data), 2): print data[i]
  • 35.  BackgroundBackground  Data Types/StructureData Types/Structure  Control flowControl flow  File I/OFile I/O  ModulesModules  ClassClass  NLTKNLTK
  • 36. Reading filesReading files >>> f = open(“names.txt")>>> f = open(“names.txt") >>> f.readline()>>> f.readline() 'Yaqinn''Yaqinn'
  • 37. Quick WayQuick Way >>> lst= [ x for x in open("text.txt","r").readlines() ]>>> lst= [ x for x in open("text.txt","r").readlines() ] >>> lst>>> lst ['Chen Linn', 'clin@brandeis.edun', 'Volen 110n', 'Office['Chen Linn', 'clin@brandeis.edun', 'Volen 110n', 'Office Hour: Thurs. 3-5n', 'n', 'Yaqin Yangn',Hour: Thurs. 3-5n', 'n', 'Yaqin Yangn', 'yaqin@brandeis.edun', 'Volen 110n', 'Offiche Hour:'yaqin@brandeis.edun', 'Volen 110n', 'Offiche Hour: Tues. 3-5n']Tues. 3-5n'] Ignore the header?Ignore the header? for (i,line) in enumerate(open(‘text.txt’,"r").readlines()):for (i,line) in enumerate(open(‘text.txt’,"r").readlines()): if i == 0: continueif i == 0: continue print lineprint line
  • 38. Using dictionaries to countUsing dictionaries to count occurrencesoccurrences >>> for line in open('names.txt'):>>> for line in open('names.txt'): ...... name = line.strip()name = line.strip() ...... name_count[name] = name_count.get(name,0)+ 1name_count[name] = name_count.get(name,0)+ 1 ...... >>> for (name, count) in name_count.items():>>> for (name, count) in name_count.items(): ...... print name, countprint name, count ...... Chen 3Chen 3 Ben 3Ben 3 Yaqin 3Yaqin 3
  • 39. File OutputFile Output input_file = open(“in.txt")input_file = open(“in.txt") output_file = open(“out.txt", "w")output_file = open(“out.txt", "w") for line in input_file:for line in input_file: output_file.write(line)output_file.write(line) “w” = “write mode” “a” = “append mode” “wb” = “write in binary” “r” = “read mode” (default) “rb” = “read in binary” “U” = “read files with Unix or Windows line endings”
  • 40.  BackgroundBackground  Data Types/StructureData Types/Structure  Control flowControl flow  File I/OFile I/O  ModulesModules  ClassClass  NLTKNLTK
  • 41. ModulesModules When a Python program starts it only has access to a basic functions and classes. (“int”, “dict”, “len”, “sum”, “range”, ...) “Modules” contain additional functionality. Use “import” to tell Python to load a module. >>> import math >>> import nltk
  • 42. import the math moduleimport the math module >>> import math>>> import math >>> math.pi>>> math.pi 3.14159265358979313.1415926535897931 >>> math.cos(0)>>> math.cos(0) 1.01.0 >>> math.cos(math.pi)>>> math.cos(math.pi) -1.0-1.0 >>> dir(math)>>> dir(math) ['__doc__', '__file__', '__name__', '__package__', 'acos', 'acosh',['__doc__', '__file__', '__name__', '__package__', 'acos', 'acosh', 'asin', 'asinh', 'atan', 'atan2', 'atanh', 'ceil', 'copysign', 'cos','asin', 'asinh', 'atan', 'atan2', 'atanh', 'ceil', 'copysign', 'cos', 'cosh', 'degrees', 'e', 'exp', 'fabs', 'factorial', 'floor', 'fmod','cosh', 'degrees', 'e', 'exp', 'fabs', 'factorial', 'floor', 'fmod', 'frexp', 'fsum', 'hypot', 'isinf', 'isnan', 'ldexp', 'log', 'log10','frexp', 'fsum', 'hypot', 'isinf', 'isnan', 'ldexp', 'log', 'log10', 'log1p', 'modf', 'pi', 'pow', 'radians', 'sin', 'sinh', 'sqrt', 'tan','log1p', 'modf', 'pi', 'pow', 'radians', 'sin', 'sinh', 'sqrt', 'tan', 'tanh', 'trunc']'tanh', 'trunc'] >>> help(math)>>> help(math) >>> help(math.cos)>>> help(math.cos)
  • 43. ““import” and “from ... import ...”import” and “from ... import ...” >>> import math>>> import math math.cosmath.cos >>> from math import cos, pi cos >>> from math import *
  • 44.  BackgroundBackground  Data Types/StructureData Types/Structure  Control flowControl flow  File I/OFile I/O  ModulesModules  ClassClass  NLTKNLTK
  • 45. ClassesClasses class ClassName(object):class ClassName(object): <statement-1><statement-1> . . .. . . <statement-N><statement-N> class MyClass(object):class MyClass(object): """A simple example class""""""A simple example class""" i = 1234512345 def f(self):def f(self): return self.ireturn self.i class DerivedClassName(BaseClassName):class DerivedClassName(BaseClassName): <statement-1><statement-1> . . .. . . <statement-N><statement-N>
  • 46.  BackgroundBackground  Data Types/StructureData Types/Structure  Control flowControl flow  File I/OFile I/O  ModulesModules  ClassClass  NLTKNLTK
  • 47. http://www.nltk.org/bookhttp://www.nltk.org/book NLTK is on berry patch machines!NLTK is on berry patch machines! >>>from nltk.book import *>>>from nltk.book import * >>> text1>>> text1 <Text: Moby Dick by Herman Melville 1851><Text: Moby Dick by Herman Melville 1851> >>> text1.name>>> text1.name 'Moby Dick by Herman Melville 1851''Moby Dick by Herman Melville 1851' >>> text1.concordance("monstrous")>>> text1.concordance("monstrous") >>> dir(text1)>>> dir(text1) >>> text1.tokens>>> text1.tokens >>> text1.index("my")>>> text1.index("my") 46474647 >>> sent2>>> sent2 ['The', 'family', 'of', 'Dashwood', 'had', 'long', 'been', 'settled', 'in',['The', 'family', 'of', 'Dashwood', 'had', 'long', 'been', 'settled', 'in', 'Sussex', '.']'Sussex', '.']
  • 48. Classify TextClassify Text >>> def gender_features(word):>>> def gender_features(word): ...... return {'last_letter': word[-1]}return {'last_letter': word[-1]} >>> gender_features('Shrek')>>> gender_features('Shrek') {'last_letter': 'k'}{'last_letter': 'k'} >>> from nltk.corpus import names>>> from nltk.corpus import names >>> import random>>> import random >>> names = ([(name, 'male') for name in names.words('male.txt')] +>>> names = ([(name, 'male') for name in names.words('male.txt')] + ... [(name, 'female') for name in names.words('female.txt')])... [(name, 'female') for name in names.words('female.txt')]) >>> random.shuffle(names)>>> random.shuffle(names)
  • 49. Featurize, train, test, predictFeaturize, train, test, predict >>> featuresets = [(gender_features(n), g) for (n,g) in names]>>> featuresets = [(gender_features(n), g) for (n,g) in names] >>> train_set, test_set = featuresets[500:], featuresets[:500]>>> train_set, test_set = featuresets[500:], featuresets[:500] >>> classifier = nltk.NaiveBayesClassifier.train(train_set)>>> classifier = nltk.NaiveBayesClassifier.train(train_set) >>> print nltk.classify.accuracy(classifier, test_set)>>> print nltk.classify.accuracy(classifier, test_set) 0.7260.726 >>> classifier.classify(gender_features('Neo'))>>> classifier.classify(gender_features('Neo')) 'male''male'
  • 50. fromfrom nltknltk.corpus import.corpus import reutersreuters  Reuters Corpus:Reuters Corpus:10,788 news10,788 news 1.3 million words.1.3 million words.  Been classified intoBeen classified into 9090 topicstopics  Grouped into 2 sets, "training" and "test“Grouped into 2 sets, "training" and "test“  Categories overlap with each otherCategories overlap with each other http://nltk.googlecode.com/svn/trunk/doc/bohttp://nltk.googlecode.com/svn/trunk/doc/bo ok/ch02.htmlok/ch02.html
  • 51. ReutersReuters >>> from nltk.corpus import reuters>>> from nltk.corpus import reuters >>> reuters.fileids()>>> reuters.fileids() ['test/14826', 'test/14828', 'test/14829', 'test/14832', ...]['test/14826', 'test/14828', 'test/14829', 'test/14832', ...] >>> reuters.categories()>>> reuters.categories() ['acq', 'alum', 'barley', 'bop', 'carcass', 'castor-oil', 'cocoa', 'coconut',['acq', 'alum', 'barley', 'bop', 'carcass', 'castor-oil', 'cocoa', 'coconut', 'coconut-oil', 'coffee', 'copper', 'copra-cake', 'corn', 'cotton', 'cotton-'coconut-oil', 'coffee', 'copper', 'copra-cake', 'corn', 'cotton', 'cotton- oil', 'cpi', 'cpu', 'crude', 'dfl', 'dlr', ...]oil', 'cpi', 'cpu', 'crude', 'dfl', 'dlr', ...]