Introduction to Python Language and Data Types

 Invented in the Netherlands in the early 1990s by Guido van Rossum
 Named after Monty Python
 Open sourced from the beginning
 Used by Google from the beginning
 80 versions as of 4th October 2014
 “Python” or “CPython” is the reference implementation, written in C
- Version 2.7 came out in mid-2010 (2.7.7 followed in mid-2014)
- Version 3.4.2 came out in late 2014
 “Jython” is written in Java for the JVM
 “IronPython” is written in C# for the .NET environment

“Python is an experiment in how much freedom programmers need. Too much freedom and nobody can read another's code; too little and expressiveness is endangered.”
- Guido van Rossum

 Python is
 easy to learn,
 relatively fast for an interpreted language,
 object-oriented,
 strongly typed,
 widely used, and
 portable.
 C is much faster but much harder to use.
 Java is typically faster and slightly harder to use.
 Perl is slower, is as easy to use, but is not strongly typed.
 R is more of a programming language for statistics.

 PyDev with Eclipse
 Komodo
 Emacs
 Vim
 TextMate
 Gedit
 IDLE
 PIDA (Linux, Vim-based)
 Notepad++ (Windows)
 Bluefish (Linux)
 Interactive Shell

 Great for learning the language
 Great for experimenting with the library
 Great for testing your own modules
 Two variations: IDLE (GUI, next slide) and python (command line)
 On Windows, press Windows + R, type “python”, and press Enter
 Type statements or expressions at the prompt:
>>> print "Hello, world"
Hello, world
>>> x = 12**2
>>> x/2
72

 IDLE is an Integrated Development Environment for Python, typically used on Windows
 Multi-window text editor with syntax highlighting, auto-completion, smart indent, and other features
 Python shell with syntax highlighting
 Integrated debugger with stepping, persistent breakpoints, and call stack visibility

 Install Python 2.7.7 from the official website. Choose the 32-bit build, as most packages are built for it.
 Install setuptools, which helps to install other packages: download it and then run the installer.
 Start Python IDLE and start programming.
 Install IPython, which provides a browser-based notebook with support for code, text, mathematical expressions, inline plots, and other rich media.
 Start writing small programs using the problems at http://www.ling.gu.se/~lager/python_exercises.html
 You can also use www.codecademy.com/tracks/python, which has an interactive structure and deals with web APIs.

 Start comments with #, rest of line is ignored
 Can include a “documentation string” as the first line of
a new function or class you define
# My first program
def fact(n):
    """fact(n) assumes n is a positive integer and returns
    the factorial of n."""
    assert n > 0
    return 1 if n == 1 else n * fact(n - 1)

 Names are case sensitive and cannot start with a
number. They can contain letters, numbers, and
underscores.
Valid names: var Var _var _2_var_ var_2 VaR
Invalid names: 2_Var 222var
 There are some reserved words (Python 2):
and, as, assert, break, class, continue, def, del, elif, else,
except, exec, finally, for, from, global, if, import, in, is,
lambda, not, or, pass, print, raise, return, try, while, with, yield
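The naming rules above can be checked mechanically. The following is a minimal sketch, not part of the original slides: `is_valid_name` is a hypothetical helper, the regular expression covers ASCII names only, and the standard `keyword` module supplies the reserved-word test. It is written to run on Python 2.7 and Python 3.

```python
import keyword
import re

# A letter or underscore, followed by any number of letters, digits, or underscores.
IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def is_valid_name(name):
    """Return True if name could be used as a variable name (hypothetical helper)."""
    return bool(IDENTIFIER.match(name)) and not keyword.iskeyword(name)

candidates = ["var", "Var", "_var", "_2_var_", "var_2", "VaR",
              "2_Var", "222var", "while"]
results = dict((name, is_valid_name(name)) for name in candidates)

for name in candidates:
    print("%-8s %s" % (name, results[name]))
```

Names that start with a digit and reserved words such as `while` are reported as invalid; the rest are accepted.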

 No need to declare variables, unlike in Java
 Simple assignment:
▪ a = 2
▪ A = 2 (a different name, since names are case sensitive)
 Need to assign (initialize)
▪ use of an uninitialized variable raises an exception
 Names are not typed: the same name can refer to objects of different types
if friendly: greeting = "hello world" # (or 'hello world')
else: greeting = 12**2
print greeting

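[Diagram: changing an integer]
a = 1      # a refers to the int object 1
b = a      # a and b both refer to the same object 1
a = a+1    # new int object created by add operator (1+1);
           # old reference deleted by assignment (a=...); b still refers to 1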

 If you try to access a name before it’s been properly created
(by placing it on the left side of an assignment), you’ll get an
error.
>>> y
Traceback (most recent call last):
  File "<pyshell#16>", line 1, in -toplevel-
    y
NameError: name 'y' is not defined
>>> y = 3
>>> y
3

 Assignment manipulates references
▪ x = y does not make a copy of y
▪ x = y makes x reference the object y references
 Very useful; but beware!
 Example:
>>> a = [1, 2, 3]
>>> b = a
>>> a.append(4)
>>> print b
[1, 2, 3, 4]
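To make the "beware" concrete, here is a small sketch (an illustration, not from the original slides) that contrasts a second name for the same list with an actual copy. The `is` operator and `list()` are standard Python; the variable names are arbitrary. It runs on Python 2.7 and Python 3.

```python
a = [1, 2, 3]
b = a          # b is another name for the same list object
c = list(a)    # c is a new list with the same elements

a.append(4)

print(b)         # the change made through a is visible through b
print(c)         # the copy is unaffected
print(a is b)    # same object
print(a is c)    # different objects
```

When an independent list is needed, copy it explicitly instead of assigning.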

 You can also assign to multiple names at the same time.
>>> x, y = 2, 3
>>> x
2
>>> y
3
>>> x, y, z = 2, 3, 5.67
>>> x
2
>>> y
3
>>> z
5.67

Operator  Description  Example (with a = 10, b = 20)
+   Addition: adds the values on either side of the operator.  a + b gives 30
-   Subtraction: subtracts the right-hand operand from the left-hand operand.  a - b gives -10
*   Multiplication: multiplies the values on either side of the operator.  a * b gives 200
/   Division: divides the left-hand operand by the right-hand operand.  b / a gives 2
%   Modulus: divides the left-hand operand by the right-hand operand and returns the remainder.  b % a gives 0
**  Exponent: raises the left-hand operand to the power of the right-hand operand.  a**b gives 10 to the power 20
//  Floor division: division in which the result is rounded down to a whole number.  9//2 gives 4 and 9.0//2.0 gives 4.0
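The arithmetic operators can be tried directly. This is a minimal sketch assuming a = 10 and b = 20, the values consistent with the results listed above. It deliberately avoids plain `/`, whose integer behaviour differs between Python 2 and Python 3, and adds a negative operand (not in the slides) to show that floor division rounds down rather than truncating.

```python
a, b = 10, 20

total = a + b          # addition
difference = a - b     # subtraction
product = a * b        # multiplication
remainder = b % a      # modulus
power = a ** 2         # exponent

print(total, difference, product, remainder, power)
print(9 // 2)          # floor division of integers
print(9.0 // 2.0)      # floor division of floats still returns a float
print(-9 // 2)         # rounds toward negative infinity, not toward zero
print(divmod(9, 2))    # quotient and remainder in one call
```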

Operator  Description  Example (with a = 10, b = 20)
==  True if the values of the two operands are equal.  (a == b) is not true.
!=  True if the values of the two operands are not equal.  (a != b) is true.
<>  Same as != (Python 2 only; removed in Python 3).  (a <> b) is true.
>   True if the left operand is greater than the right operand.  (a > b) is not true.
<   True if the left operand is less than the right operand.  (a < b) is true.
>=  True if the left operand is greater than or equal to the right operand.  (a >= b) is not true.
<=  True if the left operand is less than or equal to the right operand.  (a <= b) is true.

Operator  Description  Example (with a = 60 = 0011 1100, b = 13 = 0000 1101)
&   Binary AND: copies a bit to the result if it is set in both operands.  (a & b) gives 12, which is 0000 1100
|   Binary OR: copies a bit if it is set in either operand.  (a | b) gives 61, which is 0011 1101
^   Binary XOR: copies a bit if it is set in one operand but not both.  (a ^ b) gives 49, which is 0011 0001
~   Binary ones complement: unary operator that flips every bit.  (~a) gives -61, which is 1100 0011 in 2's complement form for a signed binary number
<<  Binary left shift: the left operand's value is moved left by the number of bits given by the right operand.  a << 2 gives 240, which is 1111 0000
>>  Binary right shift: the left operand's value is moved right by the number of bits given by the right operand.  a >> 2 gives 15, which is 0000 1111
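The bit patterns can be reproduced with a short script. This sketch assumes a = 60 and b = 13, the operand values consistent with the results in the table; `bits` is a hypothetical helper that shows only the low 8 bits, which is why `~a` prints as 1100 0011. It runs on Python 2.7 and Python 3.

```python
a = 60  # 0011 1100
b = 13  # 0000 1101

def bits(n):
    """Format the low 8 bits of n as two groups of four (illustrative helper)."""
    s = format(n & 0xFF, "08b")
    return s[:4] + " " + s[4:]

examples = [("a & b", a & b), ("a | b", a | b), ("a ^ b", a ^ b),
            ("~a", ~a), ("a << 2", a << 2), ("a >> 2", a >> 2)]

for label, value in examples:
    print("%-7s %4d  %s" % (label, value, bits(value)))
```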

Operator  Description  Example
and  Logical AND: the condition is true only if both operands are true.  (a and b) is true.
or   Logical OR: the condition is true if either operand is true (non-zero).  (a or b) is true.
not  Logical NOT: reverses the logical state of its operand; a true condition becomes false.  not (a and b) is false.

Operator  Description  Example
in      True if the value is found in the specified sequence, False otherwise.  x in y is True if x is a member of sequence y.
not in  True if the value is not found in the specified sequence, False otherwise.  x not in y is True if x is not a member of sequence y.
is      True if the variables on either side of the operator point to the same object, False otherwise.  x is y is True if id(x) equals id(y).
is not  False if the variables on either side of the operator point to the same object, True otherwise.  x is not y is True if id(x) is not equal to id(y).

 We use the term object to refer to any entity in a python
program.
 Every object has an associated type, which determines the
properties of the object.
 Six commonly used types of built-in objects:
Number: 10
String: "hello", 'hi'
List: [1, 17, 44]
Tuple: (4, 5)
Dictionary: {'food': 'something you eat',
'lobster': 'an edible, undersea arthropod'}
File
 Each type of object has its own properties, which we will learn about in the next several weeks.
 The type of an object can be found with the built-in type() function, e.g. type(10)

 The usual suspects
▪ 12, 3.14, 0xFF, 0377, (-1+2)*3/4**5, abs(x)
 Integer – 1, 2, 3, 4. Also long integers: 1L, 2L
 Float – 1.0, 2.0, 3.0, 3.2
 Addition
▪ 1+2 ->3, 1+2.0 -> 3.0
 Division
▪ 1/2 -> 0 , 1./2. -> 0.5, float(1)/2 -> 0.5
 Long (arbitrary precision)
▪ 2**100 -> 1267650600228229401496703205376

 Flexible arrays, mutable
▪ a = [99, "bottles of beer", ["on", "the", "wall"]]
 Same operators as for strings
▪ a+b, a*3, a[0], a[-1], a[1:], len(a)
 Item and slice assignment
▪ a[0] = 98
▪ a[1:2] = ["bottles", "of", "beer"]
-> [98, "bottles", "of", "beer", ["on", "the", "wall"]]
▪ del a[-1] # -> [98, "bottles", "of", "beer"]
 [0]*3 = [0,0,0]

>>> list_ex = (23, 'abc', 4.56, (2,3), 'def') # a tuple; lists are indexed and sliced the same way
Positive index: count from the left, starting with 0
>>> list_ex[1]
'abc'
Negative index: count from the right, starting with -1
>>> list_ex[-3]
4.56

>>> list_ex = (23, 'abc', 4.56, (2,3), 'def')
Return a copy of the container with a subset of the original members. Start copying at the first index, and stop copying before the second.
>>> list_ex[1:4]
('abc', 4.56, (2, 3))
Negative indices count from the end
>>> list_ex[1:-1]
('abc', 4.56, (2, 3))

>>> list_ex = (23, 'abc', 4.56, (2,3), 'def')
Omit the first index to make a copy starting from the beginning of the container
>>> list_ex[:2]
(23, 'abc')
Omit the second index to make a copy starting at the first index and going to the end
>>> list_ex[2:]
(4.56, (2, 3), 'def')

 [ : ] makes a copy of an entire sequence
>>> list_ex[:]
(23, 'abc', 4.56, (2, 3), 'def')
 Note the difference between these two lines for mutable
sequences
>>> l2 = l1 # Both refer to 1 ref,
# changing one affects both
>>> l2 = l1[:] # Independent copies, two refs
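One detail worth seeing in action: a slice copy is shallow. The sketch below goes slightly beyond the slide by using a nested list and `copy.deepcopy` from the standard library; the variable names are made up for illustration. It runs on Python 2.7 and Python 3.

```python
import copy

l1 = [1, 2, ["x", "y"]]
alias = l1                 # same list object
shallow = l1[:]            # new outer list, but the inner list is shared
deep = copy.deepcopy(l1)   # new outer list and new inner list

l1[0] = 99                 # replaces a slot in the outer list
l1[2].append("z")          # mutates the inner list in place

print(alias)     # sees both changes
print(shallow)   # keeps the old first element, but sees the inner change
print(deep)      # sees neither change
```

For flat lists of numbers or strings, `l1[:]` is enough; the difference only matters when the elements themselves are mutable.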

>>> li = ['abc', 23, 4.34, 23]
>>> li[1] = 45
>>> li
['abc', 45, 4.34, 23]
 Name li still points to the same memory reference when we’re done.

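[Diagram: changing a shared list]
a = [1, 2, 3]   # a refers to the list [1, 2, 3]
b = a           # a and b both refer to the same list
a.append(4)     # the shared list is now [1, 2, 3, 4], visible through both a and b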

>>> li = [1, 11, 3, 4, 5]
>>> li.append('a') # Note the method syntax
>>> li
[1, 11, 3, 4, 5, 'a']
>>> li.insert(2, 'i')
>>> li
[1, 11, 'i', 3, 4, 5, 'a']
>>> print (li + [1,2,3])
[1, 11, 'i', 3, 4, 5, 'a', 1, 2, 3]

 + creates a fresh list with a new memory ref
 extend operates on list li in place.
>>> li.extend([9, 8, 7])
>>> li
[1, 11, 'i', 3, 4, 5, 'a', 9, 8, 7]
 Potentially confusing:
 extend takes a list as an argument.
 append takes a single element as an argument.
>>> li.append([10, 11, 12])
>>> li
[1, 11, 'i', 3, 4, 5, 'a', 9, 8, 7, [10, 11, 12]]
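The three ways of growing a list are easy to mix up, so here is a compact side-by-side sketch (illustrative values, not from the slides) that runs on Python 2.7 and Python 3.

```python
li = [1, 2]

combined = li + [3]     # builds a new list; li itself is unchanged
li.extend([3, 4])       # adds each element of the argument to li, in place
li.append([5, 6])       # adds ONE element to li, and that element is a list

print(combined)
print(li)
print(len(li))
```

After these calls `li` has five elements, the last of which is the nested list `[5, 6]`.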

Lists have many methods, including index, count, remove, reverse, sort
>>> li = ['a', 'b', 'c', 'b']
>>> li.index('b') # index of 1st occurrence
1
>>> li.count('b') # number of occurrences
2
>>> li.remove('b') # remove 1st occurrence
>>> li
['a', 'c', 'b']

>>> a = range(5) # [0,1,2,3,4]
>>> a.append(5) # [0,1,2,3,4,5]
>>> a.pop() # [0,1,2,3,4]
5
>>> a.pop(2) # [0,1,3,4]
2
>>> a.insert(0, 42) # [42,0,1,3,4]
>>> a.pop(0) # [0,1,3,4]
42
>>> a.reverse() # [4,3,1,0]
>>> a.sort() # [0,1,3,4]

>>> names = ['Ben', 'Chen', 'Yaqin']
>>> names[0]
'Ben'
>>> names[1]
'Chen'
>>> names[2]
'Yaqin'
>>> names[3]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range
>>> names[-1]
'Yaqin'
>>> names[-2]
'Chen'
>>> names[-3]
'Ben'
[0] is the first item, [1] is the second item, and so on.
Out of range values raise an exception.
Negative values go backwards from the last element.

>>> ids = ["9pti", "2plv", "1crn"]
>>> ids.append("1alm")
>>> ids
['9pti', '2plv', '1crn', '1alm']
>>> del ids[0]
>>> ids
['2plv', '1crn', '1alm']
>>> ids.sort()
>>> ids
['1alm', '1crn', '2plv']
>>> ids.reverse()
>>> ids
['2plv', '1crn', '1alm']
>>> ids.insert(0, "9pti")
>>> ids
['9pti', '2plv', '1crn', '1alm']

>>> names
['ben', 'chen', 'yaqin']
>>> gender = [0, 0, 1]
>>> zip(names, gender)
[('ben', 0), ('chen', 0), ('yaqin', 1)]
Can also be done using a for loop, but this is more efficient

 The comma is the tuple creation operator, not parentheses
>>> 1, -> (1,)
 Python shows parentheses for clarity (best practice)
>>> (1,) -> (1,)
 Don't forget the comma!
>>> (1) -> 1
 Tuples are immutable lists. Elements are accessed the same way as in lists
 Empty tuples have a special syntactic form
>>> () -> ()
>>> tuple() -> ()

>>> yellow = (255, 255, 0) # r, g, b
>>> one = (1,)
>>> yellow[0]
255
>>> yellow[1:]
(255, 0)
>>> yellow[0] = 0
TypeError: 'tuple' object does not support item assignment
Very common in string interpolation:
>>> "%s lives in %s at latitude %.1f" % ("Andrew", "Sweden", 57.7056)
'Andrew lives in Sweden at latitude 57.7'

>>> t = (23, 'abc', 4.56, (2,3), 'def')
>>> t[2] = 3.14
Traceback (most recent call last):
  File "<pyshell#75>", line 1, in -toplevel-
    t[2] = 3.14
TypeError: object doesn't support item assignment
You can’t change a tuple.
You can make a fresh tuple and assign its reference to a previously used name.
>>> t = (23, 'abc', 3.14, (2,3), 'def')
Because tuples are immutable, they can be slightly faster and more compact than lists.

 Lists slower but more powerful than tuples
 Lists can be modified, and they have lots of handy
operations and methods
 Tuples are immutable and have fewer features
 To convert between tuples and lists use the list() and
tuple() functions:
li = list(tu)
tu = tuple(li)

▪ "hello"+"world" -> "helloworld" # concatenation
▪ "hello"*3 -> "hellohellohello" # repetition
▪ "hello"[0] -> "h" # indexing
▪ "hello"[-1] -> "o" # (from end)
▪ "hello"[1:4] -> "ell" # slicing
▪ len("hello") -> 5 # size
▪ "hello" < "jello" -> True # comparison
▪ "e" in "hello" -> True # search
▪ "escapes: \n etc, \033 etc, \if etc"
▪ 'single quotes' "double quotes"

>>> smiles = "C(=N)(N)N.C(=O)(O)O"
>>> smiles[0]
'C'
>>> smiles[1]
'('
>>> smiles[-1]
'O'
>>> smiles[1:5]
'(=N)'
>>> smiles[10:-4]
'C(=O)'

 Boolean test whether a value is inside a container:
>>> t = [1, 2, 4, 5]
>>> 2 in t
True
>>> 4 not in t
False
 For strings, tests for substrings
>>> a = 'abcde'
>>> 'c' in a
True
>>> 'cd' in a
True

The + operator produces a new tuple, list, or string whose
value is the concatenation of its arguments.
>>> (1, 2, 3) + (4, 5, 6)
(1, 2, 3, 4, 5, 6)
>>> [1, 2, 3] + [4, 5, 6]
[1, 2, 3, 4, 5, 6]
>>> "Hello" + " " + "World"
'Hello World'

 The * operator produces a new tuple, list, or string that
“repeats” the original content.
>>> (1, 2, 3) * 3
(1, 2, 3, 1, 2, 3, 1, 2, 3)
>>> [1, 2, 3] * 3
[1, 2, 3, 1, 2, 3, 1, 2, 3]
>>> "Hello" * 3
'HelloHelloHello'

>>> smiles = "C(=N)(N)N.C(=O)(O)O"
>>> smiles.find("(O)") # use find to locate the start of a substring
15
>>> smiles.find(".")
9
>>> smiles.find(".", 10) # start looking at position 10
-1
>>> smiles.split(".") # split the string into parts with "." as the delimiter
['C(=N)(N)N', 'C(=O)(O)O']
find returns -1 if it couldn’t find a match.

if "Br" in "Brother":
    print "contains brother"
Output: contains brother
email_address = "ravi"
if "@" not in email_address:
    email_address += "@latentview.com"
print email_address
Output: ravi@latentview.com

>>> line = " # This is a comment line \n"
>>> line.strip()
'# This is a comment line'
>>> line.rstrip()
' # This is a comment line'
>>> line.rstrip("\n")
' # This is a comment line '

>>> email = "cku@stanford.edu"
>>> email.startswith("c")
True
>>> email.endswith("u")
True
>>> "%s@brandeis.edu" % "clin"
'clin@brandeis.edu'
>>> names = ["Ben", "Chen", "Yaqin"]
>>> ", ".join(names)
'Ben, Chen, Yaqin'
>>> "chen".upper()
'CHEN'

 capitalize() -- Capitalizes the first letter of the string
 count(str, beg=0, end=len(string))
Counts how many times str occurs in the string (or in the slice from beg to end)
 encode(encoding='UTF-8', errors='strict')
Returns an encoded version of the string
 isalnum()
Returns true if the string has at least 1 character and all characters are alphanumeric
 isalpha()
Returns true if the string has at least 1 character and all characters are alphabetic
 isdigit()
Returns true if the string contains only digits and false otherwise
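A few of these methods in use. This is a small illustrative sketch with made-up sample strings; it runs on Python 2.7 and Python 3. Note that the `is...` tests return False for an empty string, which is what "at least 1 character" means.

```python
s = "hello world"

print(s.capitalize())         # first letter upper-cased
print(s.count("o"))           # occurrences in the whole string
print(s.count("o", 5))        # occurrences from index 5 onward
print("abc123".isalnum())     # letters and digits only
print("abc 123".isalnum())    # the space is not alphanumeric
print("abc".isalpha())
print("".isalpha())           # empty string: no characters to test
print("2014".isdigit())
```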

Strings are read only
>>> s = "andrew"
>>> s[0] = "A"
TypeError: 'str' object does not support item assignment
>>> s = "A" + s[1:]
>>> s
'Andrew'

\n -> newline
\t -> tab
\\ -> backslash
...
But Windows uses backslash for directories!
filename = "M:\nickel_project\reactive.smi" # DANGER!
filename = "M:\\nickel_project\\reactive.smi" # Better!
filename = "M:/nickel_project/reactive.smi" # Usually works

 Dictionaries are lookup tables.
 Unordered (not sorted)
 They map from a “key” to a “value”.
symbol_to_name = {
"H": "hydrogen",
"He": "helium",
"Li": "lithium",
"C": "carbon",
"O": "oxygen",
"N": "nitrogen"
}
 Duplicate keys are not allowed
 Duplicate values are just fine
 Keys can be any immutable value

 Hash tables, "associative arrays"
▪ d = {"duck": "eend", "water": "water"}
 Lookup:
▪ d["duck"] -> "eend"
▪ d["back"] # raises KeyError exception
 Delete, insert, overwrite:
▪ del d["water"] # {"duck": "eend"}
▪ d["back"] = "rug" # {"duck": "eend", "back": "rug"}
▪ d["duck"] = "duik" # {"duck": "duik", "back": "rug"}

>>> symbol_to_name["C"]
'carbon'
>>> "O" in symbol_to_name, "U" in symbol_to_name
(True, False)
>>> "oxygen" in symbol_to_name
False
>>> symbol_to_name["P"]
KeyError: 'P'
>>> symbol_to_name.get("P", "unknown")
'unknown'
>>> symbol_to_name.get("C", "unknown")
'carbon'

 Keys, values, items:
▪ d.keys() -> ["duck", "back"]
▪ d.values() -> ["duik", "rug"]
▪ d.items() -> [("duck","duik"), ("back","rug")]
 Presence check:
▪ d.has_key("duck") -> True; d.has_key("spam") -> False (same as "duck" in d)
 Values of any type; keys almost any
▪ {"name":"Guido", "age":43, ("hello","world"):1,
42:"yes", "flag": ["red","white","blue"]}

>>> symbol_to_name.keys()
['C', 'H', 'O', 'N', 'Li', 'He']
>>> symbol_to_name.values()
['carbon', 'hydrogen', 'oxygen', 'nitrogen', 'lithium', 'helium']
>>> symbol_to_name.update( {"P": "phosphorus", "S": "sulfur"} )
>>> symbol_to_name.items()
[('C', 'carbon'), ('H', 'hydrogen'), ('O', 'oxygen'), ('N', 'nitrogen'),
('P', 'phosphorus'), ('S', 'sulfur'), ('Li', 'lithium'), ('He', 'helium')]
>>> del symbol_to_name['C']
>>> symbol_to_name
{'H': 'hydrogen', 'O': 'oxygen', 'N': 'nitrogen', 'P': 'phosphorus',
'S': 'sulfur', 'Li': 'lithium', 'He': 'helium'}

 dict.clear()
Removes all elements of dictionary dict
 dict.fromkeys(seq, value=None)
Creates a new dictionary with keys from seq and values set to value
 dict.setdefault(key, default=None)
Similar to get(), but will set dict[key]=default if key is not already in dict
 dict.update(dict2)
Adds dictionary dict2's key-value pairs to dict
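These methods are easiest to understand in a small task. The sketch below is an illustration with made-up data: it counts words with `get`, builds a position index with `setdefault`, and combines `fromkeys` with `update`. It runs on Python 2.7 and Python 3.

```python
words = ["spam", "eggs", "spam", "ham", "spam"]

# get(): read a value with a fallback, without raising KeyError
counts = {}
for w in words:
    counts[w] = counts.get(w, 0) + 1

# setdefault(): create the entry on first sight, then reuse it
positions = {}
for i, w in enumerate(words):
    positions.setdefault(w, []).append(i)

# fromkeys() + update(): start from defaults, then overwrite some entries
names = dict.fromkeys(["H", "He"], "unknown")
names.update({"H": "hydrogen"})

print(counts)
print(positions)
print(names)

scratch = dict(counts)
scratch.clear()          # empties the dictionary in place
print(scratch)
```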

 Keys must be immutable:
 numbers, strings, tuples of immutables
▪ these cannot be changed after creation
 reason is hashing (fast lookup technique)
 not lists or other dictionaries
▪ these types of objects can be changed "in place"
 no restrictions on values
 Keys will be listed in arbitrary order
 again, because of hashing

if condition:
    statements
[elif condition:
    statements] ...
else:
    statements

while condition:
    statements

for var in sequence:
    statements

break
continue

var1 = 100
if var1:
    print "1 - Got a true expression value"
    print var1
else:
    print "1 - Got a false expression value"
    print var1

var2 = 0
if var2:
    print "2 - Got a true expression value"
    print var2
else:
    print "2 - Got a false expression value"
    print var2
print "Good bye!"

Output:
1 - Got a true expression value
100
2 - Got a false expression value
0
Good bye!

count = 0
while (count < 9):
    print 'The count is:', count
    count = count + 1
print "Good bye!"

Output:
The count is: 0
The count is: 1
...
The count is: 8
Good bye!

Error with the program:
var = 1
while var == 1:  # This constructs an infinite loop
    num = raw_input("Enter a number :")
    print "You entered: ", num
print "Good bye!"

for i in range(20):
    if i%3 == 0:
        print i
        if i%5 == 0:
            print "Bingo!"
    print "---"

for letter in 'Python':  # First Example
    print 'Current Letter :', letter

fruits = ['banana', 'apple', 'mango']
for fruit in fruits:  # Second Example
    print 'Current fruit :', fruit
print "Good bye!"

def name(arg1, arg2, ...):
    """documentation"""  # optional doc string
    statements
    return             # from procedure
    return expression  # from function

def add():
    return 5
>>> print add()
5

def functionname( parameters ):
    "function_docstring"
    function_suite
    return [expression]

def printme( str ):
    "This prints a passed string into this function"
    print str
    print '\n'

# Now you can call printme function
printme("I'm first call to user defined function!")
printme("Again second call to the same function")

def gcd(a, b):
    "greatest common divisor"
    while a != 0:
        a, b = b%a, a  # parallel assignment
    return b
>>> gcd.__doc__
'greatest common divisor'
>>> gcd(12, 20)
4

class name:
    "documentation"
    statements
-or-
class name(base1, base2, ...):
    ...
Most statements are method definitions:
    def name(self, arg1, arg2, ...):
        ...
May also be class variable assignments

class Stack:
    "A well-known data structure…"
    def __init__(self):  # constructor
        self.items = []
    def push(self, x):
        self.items.append(x)
    def pop(self):
        x = self.items[-1]
        del self.items[-1]
        return x
    def empty(self):
        return len(self.items) == 0  # Boolean result

 To create an instance, simply call the class object:
x = Stack() # no 'new' operator!
 To use methods of the instance, call using dot notation:
x.empty() # -> True
x.push(1) # [1]
x.empty() # -> False
x.push("hello") # [1, "hello"]
x.pop() # -> "hello" # [1]
 To inspect instance variables, use dot notation:
x.items # -> [1]
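Put together, the class and its usage form a complete program. The version below is a self-contained sketch that keeps the same interface as the Stack shown in the slides; the only changes are assumptions of this sketch: it derives from `object` and uses `list.pop()` inside `pop`. It runs on Python 2.7 and Python 3.

```python
class Stack(object):
    """A last-in, first-out container with push, pop, and empty."""

    def __init__(self):            # constructor
        self.items = []

    def push(self, x):
        self.items.append(x)

    def pop(self):
        # list.pop() removes and returns the last item
        return self.items.pop()

    def empty(self):
        return len(self.items) == 0


x = Stack()
was_empty = x.empty()   # nothing pushed yet
x.push(1)
x.push("hello")
top = x.pop()           # the most recently pushed item comes off first

print(was_empty, top, x.items, x.empty())
```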

 When a Python program starts, it only has access to a basic set of functions and classes
(“int”, “dict”, “len”, “sum”, “range”, ...)
 “Modules” contain additional functionality
 Use “import” to tell Python to load a module
>>> import math #Used for mathematical functions
>>> import nltk #Used for NLP operations

 Importing modules:
 import re; print re.match("[a-z]+", s)
 from re import match; print match("[a-z]+", s)
 Import with rename:
 import re as regex
 from re import match as m
 Before Python 2.0:
▪ import re; regex = re; del re

>>> import math
>>> math.pi
3.1415926535897931
>>> math.cos(0)
1.0
>>> math.cos(math.pi)
-1.0
>>> dir(math)
['__doc__', '__file__', '__name__', '__package__', 'acos', 'acosh',
'asin', 'asinh', 'atan', 'atan2', 'atanh', 'ceil', 'copysign', 'cos',
'cosh', 'degrees', 'e', 'exp', 'fabs', 'factorial', 'floor', 'fmod',
'frexp', 'fsum', 'hypot', 'isinf', 'isnan', 'ldexp', 'log', 'log10',
'log1p', 'modf', 'pi', 'pow', 'radians', 'sin', 'sinh', 'sqrt', 'tan',
'tanh', 'trunc']

>>> import math               # then refer to math.cos
>>> from math import cos, pi  # then refer to cos and pi directly
>>> from math import *        # imports every public name from math

 BeautifulSoup – A bit slow, but useful and easy for
beginners
 Numpy – Advanced mathematical functionalities
 SciPy – Algorithms and mathematical tools for python
 Matplotlib – plotting library. Excellent graphs
 Scrapy – Advanced Web scraping
 NLTK - Natural Language Toolkit (Pattern, TextBlob)
 IPython – Notebooks
 Scikit-learn – machine learning on Python
 Pandas –
 Gensim – Topic Modelling
 http://

 a, X, 9, < -- ordinary characters just match themselves exactly. The meta-characters which do not match themselves because they have special meanings are: . ^ $ * + ? { [ ] \ | ( ) (details below)
 . (a period) -- matches any single character except newline '\n'
 \w -- (lowercase w) matches a "word" character: a letter or digit or underbar [a-zA-Z0-9_]. Note that although "word" is the mnemonic for this, it only matches a single word char, not a whole word. \W (upper case W) matches any non-word character.
 \b -- boundary between word and non-word
 \s -- (lowercase s) matches a single whitespace character -- space, newline, return, tab, form [ \n\r\t\f]. \S (upper case S) matches any non-whitespace character.
 \t, \n, \r -- tab, newline, return
 \d -- decimal digit [0-9] (some older regex utilities do not support \d, but they all support \w and \s)
 ^ = start, $ = end -- match the start or end of the string
 \ -- inhibit the "specialness" of a character. So, for example, use \. to match a period or \\ to match a slash. If you are unsure if a character has special meaning, such as '@', you can put a slash in front of it, \@, to make sure it is treated just as a character.

## Search for pattern 'iii' in string 'piiig'.
## All of the pattern must match, but it may appear anywhere.
## On success, match.group() is matched text.
match = re.search(r'iii', 'piiig') => found, match.group() == "iii"
match = re.search(r'igs', 'piiig') => not found, match == None
## . = any char but \n
match = re.search(r'..g', 'piiig') => found, match.group() == "iig"
## \d = digit char, \w = word char
match = re.search(r'\d\d\d', 'p123g') => found, match.group() == "123"
match = re.search(r'\w\w\w', '@@abcd!!') => found, match.group() == "abc"

 + -- 1 or more occurrences of the pattern to its left, e.g. 'i+' = one or more i's
 * -- 0 or more occurrences of the pattern to its left
 ? -- match 0 or 1 occurrences of the pattern to its left
## i+ = one or more i's, as many as possible.
match = re.search(r'pi+', 'piiig') => found, match.group() == "piii"
## Finds the first/leftmost solution, and within it drives the +
## as far as possible (aka 'leftmost and largest').
## In this example, note that it does not get to the second set of i's.
match = re.search(r'i+', 'piigiiii') => found, match.group() == "ii"
## \s* = zero or more whitespace chars
## Here look for 3 digits, possibly separated by whitespace.
match = re.search(r'\d\s*\d\s*\d', 'xx1 2 3xx') => found, match.group() == "1 2 3"
match = re.search(r'\d\s*\d\s*\d', 'xx12 3xx') => found, match.group() == "12 3"
match = re.search(r'\d\s*\d\s*\d', 'xx123xx') => found, match.group() == "123"
## ^ = matches the start of string, so this fails:
match = re.search(r'^b\w+', 'foobar') => not found, match == None
## but without the ^ it succeeds:
match = re.search(r'b\w+', 'foobar') => found, match.group() == "bar"
 https://developers.google.com/edu/python/regular-expressions
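The regular-expression examples above use "=>" as shorthand rather than real syntax. The following runnable sketch wraps `re.search` in a hypothetical helper, `first_match`, which returns the matched text or None; the patterns and strings are the ones discussed above, written with raw strings so the backslashes reach the `re` module intact. It runs on Python 2.7 and Python 3.

```python
import re

def first_match(pattern, text):
    """Return the text of the first match, or None (illustrative helper)."""
    m = re.search(pattern, text)
    return m.group() if m else None

print(first_match(r'iii', 'piiig'))
print(first_match(r'igs', 'piiig'))               # no match
print(first_match(r'..g', 'piiig'))               # . = any char but newline
print(first_match(r'\d\d\d', 'p123g'))            # three digits
print(first_match(r'\w\w\w', '@@abcd!!'))         # three word chars
print(first_match(r'pi+', 'piiig'))               # + is greedy
print(first_match(r'i+', 'piigiiii'))             # leftmost match wins
print(first_match(r'\d\s*\d\s*\d', 'xx1 2 3xx'))  # optional whitespace
print(first_match(r'^b\w+', 'foobar'))            # ^ anchors at the start
print(first_match(r'b\w+', 'foobar'))
```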

def foo(x):
    return 1/x

def bar(x):
    try:
        print foo(x)
    except ZeroDivisionError, message:
        print "Can't divide by zero:", message

bar(0)

Used for cleanup
f = open(file)
try:
    process_file(f)
finally:
    f.close()  # always executed
print "OK"  # executed on success only
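The except and finally clauses can be combined in one statement. The sketch below is an illustration rather than the slides' own code: `safe_divide` and `log` are made-up names, the optional `else` clause is an addition not covered above, and the `except ... as err` spelling is used because it works on both Python 2.6+ and Python 3 (the comma form shown earlier is Python 2 only).

```python
log = []

def safe_divide(x, y):
    """Divide x by y, recording which clauses ran (illustrative helper)."""
    try:
        result = x / y
    except ZeroDivisionError as err:
        log.append("error: %s" % err)   # runs only when the division fails
        return None
    else:
        log.append("ok")                # runs only when no exception was raised
        return result
    finally:
        log.append("cleanup")           # always runs, even after a return

half = safe_divide(1.0, 2)
failed = safe_divide(1.0, 0)

print(half, failed)
print(log)
```

The log shows that "cleanup" is recorded after both the successful and the failing call.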

 Gives an idea where the error happened
 For example, in nested loops
 raise IndexError
 raise IndexError("k out of range")
 raise IndexError, "k out of range"
 try:
    something
except:  # catch everything
    print "Oops"
    raise  # reraise

 “w” = “write mode”
 “a” = “append mode”
 “wb” = “write in binary”
 “r” = “read mode” (default)
 “rb” = “read in binary”
 “U” = “read files with Unix or Windows line endings”
>>> f = open("names.txt")
>>> f.readline()
'Ravi\n'
>>> f.read()
'Ravi\nShankar\n'

>>> lst = [ x for x in open("text.txt","r").readlines() ]
>>> lst
['Chen Lin\n', 'clin@brandeis.edu\n', 'Volen 110\n', 'Office Hour: Thurs. 3-5\n',
'\n', 'Yaqin Yang\n', 'yaqin@brandeis.edu\n', 'Volen 110\n',
'Office Hour: Tues. 3-5\n']
input_file = open("in.txt")
output_file = open("out.txt", "w")
for line in input_file:
    output_file.write(line)
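The copy loop above never closes its files. A common alternative is the `with` statement, which closes them automatically. The sketch below is self-contained under a few assumptions: it creates its own input file in a temporary directory (the paths and sample names are made up), and it relies on the multi-file form of `with`, available in Python 2.7 and Python 3.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, "in.txt")
out_path = os.path.join(workdir, "out.txt")

# Create a small input file to work with.
with open(in_path, "w") as f:
    f.write("Ben\nChen\nYaqin\n")

# Copy line by line; both files are closed when the block ends.
with open(in_path) as input_file, open(out_path, "w") as output_file:
    for line in input_file:
        output_file.write(line)

with open(out_path) as f:
    copied = f.read()

with open(out_path) as f:
    names = [line.rstrip("\n") for line in f]   # drop the trailing newlines

print(repr(copied))
print(names)
```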

>>> import csv
>>> with open('eggs.csv', 'rb') as csvfile:
...     spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
...     for row in spamreader:
...         print ', '.join(row)
Spam, Spam, Spam, Spam, Spam, Baked Beans
Spam, Lovely Spam, Wonderful Spam
>>> with open('eggs.csv', 'wb') as csvfile:
...     spamwriter = csv.writer(csvfile)
...     spamwriter.writerow(['Spam'] * 5 + ['Baked Beans'])
...     spamwriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])
>>> w = csv.writer(open(Fn,'ab'),dialect='excel') # Fn: path of a file to append to
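The csv examples above use the Python 2 file modes 'rb' and 'wb'. For comparison, here is a round-trip sketch that targets Python 3 only, where the csv documentation asks for text mode with `newline=""` instead. The temporary path is an assumption of this sketch; the rows are the same Spam rows used above.

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "eggs.csv")
rows = [["Spam"] * 5 + ["Baked Beans"],
        ["Spam", "Lovely Spam", "Wonderful Spam"]]

# Write the rows with the default (comma-separated) dialect.
with open(path, "w", newline="") as csvfile:
    writer = csv.writer(csvfile)
    writer.writerows(rows)

# Read them back; each row comes back as a list of strings.
with open(path, newline="") as csvfile:
    read_back = [row for row in csv.reader(csvfile)]

for row in read_back:
    print(", ".join(row))
```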

 Python is a great language for NLP:
 Simple
 Easy to debug:
▪ Exceptions
▪ Interpreted language
 Easy to structure
▪ Modules
▪ Object oriented programming
 Powerful string manipulation

Here is a description from the NLTK official site:
“NLTK is a leading platform for building Python programs
to work with human language data. It provides easy-to-use
interfaces to over 50 corpora and lexical resources such as
WordNet, along with a suite of text processing libraries for
classification, tokenization, stemming, tagging, parsing,
and semantic reasoning.”

 A software package for manipulating linguistic data and
performing NLP tasks
 Advanced tasks are possible from an early stage
 Permits projects at various levels
 Consistent interfaces
 Facilitates reusability of modules
 Implemented in Python

 The token class to encode information about NL texts.
 Each token instance represents a unit of text such as a
word, a text, or a document.
 A given instance is defined by a partial mapping from
property names to property values.
 TreebankWordTokenizer
 WordPunctTokenizer
 PunctWordTokenizer
 WhitespaceTokenizer

>>> import nltk
>>> text = nltk.word_tokenize("Dive into NLTK : Part-of-speech tagging and POS Tagger")
>>> text
['Dive', 'into', 'NLTK', ':', 'Part-of-speech', 'tagging', 'and', 'POS', 'Tagger']
>>> nltk.pos_tag(text)
[('Dive', 'JJ'), ('into', 'IN'), ('NLTK', 'NNP'), (':', ':'), ('Part-of-speech', 'JJ'),
('tagging', 'NN'), ('and', 'CC'), ('POS', 'NNP'), ('Tagger', 'NNP')]

In linguistic morphology and information retrieval, stemming is the
process for reducing inflected (or sometimes derived) words to their
stem, base or root form—generally a written word form. The stem
need not be identical to the morphological root of the word; it is
usually sufficient that related words map to the same stem, even if this
stem is not in itself a valid root.
Stemming programs are commonly referred to as stemming
algorithms or stemmers.
Helpful in reducing the vocabulary in large scale operations

>>> from nltk.stem.porter import PorterStemmer
>>> porter_stemmer = PorterStemmer()
>>> porter_stemmer.stem('maximum')
u'maximum'
>>> porter_stemmer.stem('presumably')
u'presum'
>>> porter_stemmer.stem('multiply')
u'multipli'
>>> porter_stemmer.stem('provision')
u'provis'
>>> porter_stemmer.stem('owed')
u'owe'
>>> porter_stemmer.stem('ear')
u'ear'
>>> porter_stemmer.stem('saying')
u'say'

>>> from nltk.stem.lancaster import LancasterStemmer
>>> lancaster_stemmer = LancasterStemmer()
>>> lancaster_stemmer.stem('maximum')
'maxim'
>>> lancaster_stemmer.stem('presumably')
'presum'
>>> lancaster_stemmer.stem('multiply')
'multiply'
>>> lancaster_stemmer.stem('provision')
u'provid'
>>> lancaster_stemmer.stem('owed')
'ow'
>>> lancaster_stemmer.stem('ear')
'ear'

>>> from nltk.stem import SnowballStemmer
>>> snowball_stemmer = SnowballStemmer("english")
>>> snowball_stemmer.stem('maximum')
u'maximum'
>>> snowball_stemmer.stem('presumably')
u'presum'
>>> snowball_stemmer.stem('multiply')
u'multipli'
>>> snowball_stemmer.stem('provision')
u'provis'
>>> snowball_stemmer.stem('owed')
u'owe'
>>> snowball_stemmer.stem('ear')
u'ear'

Lemmatisation (or lemmatization) in linguistics, is the process of
grouping together the different inflected forms of a word so they can
be analysed as a single item.
Lemmatisation is closely related to stemming. The difference is that a
stemmer operates on a single word without knowledge of the context,
and therefore cannot discriminate between words which have different
meanings depending on part of speech. However, stemmers are
typically easier to implement and run faster, and the reduced accuracy
may not matter for some applications.

>>> from nltk.stem import WordNetLemmatizer
>>> wordnet_lemmatizer = WordNetLemmatizer()
>>> wordnet_lemmatizer.lemmatize('dogs')
u'dog'
>>> wordnet_lemmatizer.lemmatize('churches')
u'church'
>>> wordnet_lemmatizer.lemmatize('aardwolves')
u'aardwolf'
>>> wordnet_lemmatizer.lemmatize('abaci')
u'abacus'
>>> wordnet_lemmatizer.lemmatize('hardrock')
'hardrock'
>>> wordnet_lemmatizer.lemmatize('are')
'are'
>>> wordnet_lemmatizer.lemmatize('is')
'is'

 Stanford POS Tagger
 Stanford Named Entity Recognizer
 english_nertagger.tag('Rami Eid is studying at Stony Brook University in NY'.split())
Out[3]:
[(u'Rami', u'PERSON'),
(u'Eid', u'PERSON'),
(u'is', u'O'),
(u'studying', u'O'),
(u'at', u'O'),
(u'Stony', u'ORGANIZATION'),
(u'Brook', u'ORGANIZATION'),
(u'University', u'ORGANIZATION'),
(u'in', u'O'),
(u'NY', u'O')]
 Stanford Parser

 Stanford Parser
 english_parser.raw_parse_sents(("this is the english parser test", "the parser is from stanford parser"))
Out[4]:
[[u'this/DT is/VBZ the/DT english/JJ parser/NN test/NN'],
[u'(ROOT',
u' (S',
u' (NP (DT this))',
u' (VP (VBZ is)',
u' (NP (DT the) (JJ english) (NN parser) (NN test)))))'],

 Part of Speech Tagging
 Parsing
 WordNet
 Named Entity Recognition
 Information Retrieval
 Sentiment Analysis
 Document Clustering
 Topic Segmentation
 Authoring
 Machine Translation
 Summarization
 Information Extraction
 Spoken Dialog Systems
 Natural Language Generation
 Word Sense Disambiguation

        Det   Noun  Verb  Prep  Adj
A       0.9   0.1   0     0     0
man     0     0.6   0.2   0     0.2
walks   0     0.2   0.8   0     0
into    0     0     0     1     0
bar     0     0.7   0.3   0     0
