TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
Introduction to Python Language and Data Types
1.
2. Invented in the Netherlands, early 90s by Guido van
Rossum
Named after Monty Python
Open sourced from the beginning
Used by Google from the beginning
80 versions as of 4th
October 2014.
“Python” or “CPython” is written in C/C++
- Version 2.7.7 came out in mid-2010
- Version 3.4.2 came out in late 2014
“Jython” is written in Java for the JVM
“IronPython” is written in C# for the .Net environment
3. “Python is an experiment in how
much freedom programmers need.
Too much freedom and nobody can
read another's code; too little and
expressive-ness is endangered.”
- Guido van Rossum
4. Python is
super easy to learn,
relatively fast,
object-oriented,
strongly typed,
widely used, and
portable.
C is much faster but
much harder to use.
Java is about as fast and
slightly harder to use.
Perl is slower, is as easy
to use, but is not strongly
typed.
R is more of
programming language
for statistics.
7. Great for learning the language
Great for experimenting with the library
Great for testing your own modules
Two variations: IDLE (GUI) (next slide),
python (command line)
Press “Windows + R” and then write “python”. Enter
Type statements or expressions at prompt:
>>> print "Hello, world"
Hello, world
>>> x = 12**2
>>> x/2
72
8. IDLE is an Integrated Development Environment for
Python, typically used on Windows
Multi-window text editor with syntax highlighting, auto-
completion, smart indent and other.
Python shell with syntax highlighting.
Integrated debugger
with stepping, persis-
tent breakpoints,
and call stack visi-
bility
9. Install Python 2.7.7 from the official website. 32 bit as most
packages are built for that.
Install setuptools, which will help to install other packages.
Download the setup tools and then install.
Start the Python IDLE and start programming.
Install Ipython which provides a browser-based notebook with
support for code, text, mathematical expressions, inline plots and
other rich media.
Start writing small programs using problems at
http://www.ling.gu.se/~lager/python_exercises.html
Can also use www.codecademy.com/tracks/python which has an
interactive structure + deal with web APIs.
10. Start comments with #, rest of line is ignored
Can include a “documentation string” as the first line of
a new function or class you define
# My first program
def fact(n):
“““fact(n) assumes n is a positive integer and returns
facorial of n.”””
assert(n>0)
return 1 if n==1 else n*fact(n-1)
11. Names are case sensitive and cannot start with a
number. They can contain letters, numbers, and
underscores.
var Var _var _2_var_ var_2 VaR
2_Var 222var
There are some reserved words:
and, assert, break, class, continue, def, del, elif, else,
except, exec, finally, for, from, global, if, import, in, is,
lambda, not, or, pass, print, raise, return, try, while
12. No need to declare unlike in JAVA
Simple assignment:
▪ a = 2, A = 2
Need to assign (initialize)
▪ use of uninitialized variable raises exception
Not typed
if friendly: greeting = "hello world“ #(or ‘hello word’)
else: greeting = 12**2
print greeting
13. a
1
b
a
1b
a = 1
a = a+1
b = a
a 1
2
old reference deleted
by assignment (a=...)
new int object created
by add operator (1+1)
14. If you try to access a name before it’s been properly created
(by placing it on the left side of an assignment), you’ll get an
error.
>>> y
Traceback (most recent call last):
File "<pyshell#16>", line 1, in -toplevel-
y
NameError: name ‘y' is not defined
>>> y = 3
>>> y
3
15. Assignment manipulates references
▪ x = y does not make a copy of y
▪ x = y makes x reference the object y references
Very useful; but beware!
Example:
>>> a = [1, 2, 3]
>>> b = a
>>> a.append(4)
>>> print b
[1, 2, 3, 4]
16. You can also assign to multiple names at the same time.
>>> x, y = 2, 3
>>> x
2
>>> y
3
>>> x, y, z = 2, 3, 5.67
>>> x
2
>>> y
3
>>> z
5.67
17. + Addition - Adds values on either side of the
operator
a + b will give 30
- Subtraction - Subtracts right hand operand from
left hand operand
a - b will give -10
* Multiplication - Multiplies values on either side of
the operator
a * b will give 200
/ Division - Divides left hand operand by right hand
operand
b / a will give 2
% Modulus - Divides left hand operand by right hand
operand and returns remainder
b % a will give 0
** Exponent - Performs exponential (power)
calculation on operators
a**b will give 10 to the power
20
// Floor Division - The division of operands where the
result is the quotient in which the digits after the
decimal point are removed.
9//2 is equal to 4 and 9.0//2.0
is equal to 4.0
18. == Checks if the value of two operands are equal or not, if
yes then condition becomes true.
(a == b) is not true.
!= Checks if the value of two operands are equal or not, if
values are not equal then condition becomes true.
(a != b) is true.
<> Checks if the value of two operands are equal or not, if
values are not equal then condition becomes true.
(a <> b) is true. This is similar
to != operator.
> Checks if the value of left operand is greater than the
value of right operand, if yes then condition becomes
true.
(a > b) is not true.
< Checks if the value of left operand is less than the value
of right operand, if yes then condition becomes true.
(a < b) is true.
>= Checks if the value of left operand is greater than or
equal to the value of right operand, if yes then condition
becomes true.
(a >= b) is not true.
<= Checks if the value of left operand is less than or equal to
the value of right operand, if yes then condition becomes
true.
(a <= b) is true.
19. & Binary AND Operator copies a bit to the result if it
exists in both operands.
(a & b) will give 12 which is
0000 1100
| Binary OR Operator copies a bit if it exists in eather
operand.
(a | b) will give 61 which is 0011
1101
^ Binary XOR Operator copies the bit if it is set in one
operand but not both.
(a ^ b) will give 49 which is
0011 0001
~ Binary Ones Complement Operator is unary and has
the efect of 'flipping' bits.
(~a ) will give -61 which is 1100
0011 in 2's complement form
due to a signed binary number.
<< Binary Left Shift Operator. The left operands value is
moved left by the number of bits specified by the
right operand.
a << 2 will give 240 which is
1111 0000
>> Binary Right Shift Operator. The left operands value is
moved right by the number of bits specified by the
right operand.
a >> 2 will give 15 which is
0000 1111
20. Operator Description Example
and Called Logical AND operator. If both the
operands are true then then condition
becomes true.
(a and b) is true.
or Called Logical OR Operator. If any of the two
operands are non zero then then condition
becomes true.
(a or b) is true.
not Called Logical NOT Operator. Use to reverses
the logical state of its operand. If a condition
is true then Logical NOT operator will make
false.
not(a and b) is false.
21. in Evaluates to true if it finds a variable in the
specified sequence and false otherwise.
x in y, here in results in a 1 if x
is a member of sequence y.
not in Evaluates to true if it does not finds a
variable in the specified sequence and false
otherwise.
x not in y, here not in results
in a 1 if x is not a member of
sequence y.
is Evaluates to true if the variables on either
side of the operator point to the same object
and false otherwise.
x is y, here is results in 1 if
id(x) equals id(y).
is not Evaluates to false if the variables on either
side of the operator point to the same object
and true otherwise.
x is not y, here is not results
in 1 if id(x) is not equal to
id(y).
22. We use the term object to refer to any entity in a python
program.
Every object has an associated type, which determines the
properties of the object.
Python defines six types of built-in objects:
Number : 10
String : “hello”, ‘hi’
List: [1, 17, 44]
Tuple : (4, 5)
Dictionary: {‘food’ : ‘something you eat’,
‘lobster’ : ‘an edible, undersea arthropod’}
File
Each type of object has its own properties, which we will learn
about in the next several weeks.
Types can be accessed by ‘type’ method
24. Flexible arrays, mutable
▪ a = [99, "bottles of beer", ["on", "the", "wall"]]
Same operators as for strings
▪ a+b, a*3, a[0], a[-1], a[1:], len(a)
Item and slice assignment
▪ a[0] = 98
▪ a[1:2] = ["bottles", "of", "beer"]
-> [98, "bottles", "of", "beer", ["on", "the", "wall"]]
▪ del a[-1] # -> [98, "bottles", "of", "beer"]
[0]*3 = [0,0,0]
25. >>> list_ex = (23, ‘abc’, 4.56, (2,3), ‘def’)
Positive index: count from the left, starting with 0
>>> list_ex[1]
‘abc’
Negative index: count from right, starting with –1
>>> list_ex[-3]
4.56
26. >>> list_ex = (23, ‘abc’, 4.56, (2,3), ‘def’)
Return a copy of the container with a subset of the original
members. Start copying at the first index, and stop copying
before second.
>>> list_ex[1:4]
(‘abc’, 4.56, (2,3))
Negative indices count from end
>>> list_ex[1:-1]
(‘abc’, 4.56, (2,3))
27. >>> list_ex = (23, ‘abc’, 4.56, (2,3), ‘def’)
Omit first index to make copy starting from beginning of
the container
>>> list_ex[:2]
(23, ‘abc’)
Omit second index to make copy starting at first index and
going to end
>>> list_ex[2:]
(4.56, (2,3), ‘def’)
28. [ : ] makes a copy of an entire sequence
>>> list_ex[:]
(23, ‘abc’, 4.56, (2,3), ‘def’)
Note the difference between these two lines for mutable
sequences
>>> l2 = l1 # Both refer to 1 ref,
# changing one affects both
>>> l2 = l1[:] # Independent copies, two refs
29. >>> li = [‘abc’, 23, 4.34, 23]
>>> li[1] = 45
>>> li
[‘abc’, 45, 4.34, 23]
Name li still points to the same memory reference when
we’re done.
30. a
1 2 3
b
a
1 2 3
b
4
a = [1, 2, 3]
a.append(4)
b = a
a 1 2 3
32. + creates a fresh list with a new memory ref
extend operates on list li in place.
>>> li.extend([9, 8, 7])
>>> li
[1, 2, ‘i’, 3, 4, 5, ‘a’, 9, 8, 7]
Potentially confusing:
extend takes a list as an argument.
append takes a singleton as an argument.
>>> li.append([10, 11, 12])
>>> li
[1, 2, ‘i’, 3, 4, 5, ‘a’, 9, 8, 7, [10, 11, 12]]
33. Lists have many methods, including index, count, remove,
reverse, sort
>>> li = [‘a’, ‘b’, ‘c’, ‘b’]
>>> li.index(‘b’) # index of 1st
occurrence
1
>>> li.count(‘b’) # number of occurrences
2
>>> li.remove(‘b’) # remove 1st
occurrence
>>> li
[‘a’, ‘c’, ‘b’]
35. >>> names[0]
‘Ben'
>>> names[1]
‘Chen'
>>> names[2]
‘Yaqin'
>>> names[3]
Traceback (most recent call last):Traceback (most recent call last):
File "<stdin>", line 1, in <module>File "<stdin>", line 1, in <module>
IndexError: list index out of rangeIndexError: list index out of range
>>> names[-1]
‘Yaqin'
>>> names[-2]
‘Chen'
>>> names[-3]
[0] is the first item.
[1] is the second item
...
Out of range values
raise an exception
Negative values
go backwards from
the last element.
Names = [ ‘Ben’, ‘Chen’, ‘Yaqin’]
37. >>> names
['ben', 'chen', 'yaqin']['ben', 'chen', 'yaqin']
>>> gender = [0, 0, 1][0, 0, 1]
>>> zip(names, gender)
[('ben', 0), ('chen', 0), ('yaqin', 1)][('ben', 0), ('chen', 0), ('yaqin', 1)]
Can also be done using a for loop, but this is more efficientCan also be done using a for loop, but this is more efficient
38. The comma is the tuple creation operator, not parentheses
>>> 1, -> (1,)
Python shows parentheses for clarity (best practice)
>>> (1,) -> (1,)
Don't forget the comma!
>>> (1) -> 1
Non-mutable lists. Elements can be accessed same way
Empty tuples have a special syntactic form
>>> () -> ()
>>> tuple() -> ()
39. >>> yellow = (255, 255, 0) # r, g, b
>>> one = (1,)>>> one = (1,)
>>> yellow[0]
>>> yellow[1:]
(255, 0)
>>> yellow[0] = 0
Traceback (most recent call last):Traceback (most recent call last):
File "<stdin>", line 1, in <module>File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignmentTypeError: 'tuple' object does not support item assignment
Very common in string interpolation:
>>> "%s lives in %s at latitude %.1f" % ("Andrew", "Sweden",
57.7056)
40. >>> t = (23, ‘abc’, 4.56, (2,3), ‘def’)
>>> t[2] = 3.14
Traceback (most recent call last):
File "<pyshell#75>", line 1, in -toplevel-
tu[2] = 3.14
TypeError: object doesn't support item assignment
You can’t change a tuple.
You can make a fresh tuple and assign its reference to a
previously used name.
>>> t = (23, ‘abc’, 3.14, (2,3), ‘def’)
The immutability of tuples means they’re faster than lists.
41. Lists slower but more powerful than tuples
Lists can be modified, and they have lots of handy
operations and methods
Tuples are immutable and have fewer features
To convert between tuples and lists use the list() and
tuple() functions:
li = list(tu)
tu = tuple(li)
44. Boolean test whether a value is inside a container:
>>> t = [1, 2, 4, 5]
>>> 2 in t
True
>>> 4 not in t
False
For strings, tests for substrings
>>> a = 'abcde'
>>> 'c' in a
True
>>> 'cd' in a
True
45. The + operator produces a new tuple, list, or string whose
value is the concatenation of its arguments.
>>> (1, 2, 3) + (4, 5, 6)
(1, 2, 3, 4, 5, 6)
>>> [1, 2, 3] + [4, 5, 6]
[1, 2, 3, 4, 5, 6]
>>> “Hello” + “ ” + “World”
‘Hello World’
46. The * operator produces a new tuple, list, or string that
“repeats” the original content.
>>> (1, 2, 3) * 3
(1, 2, 3, 1, 2, 3, 1, 2, 3)
>>> [1, 2, 3] * 3
[1, 2, 3, 1, 2, 3, 1, 2, 3]
>>> “Hello” * 3
‘HelloHelloHello’
47. smiles = "C(=N)(N)N.C(=O)(O)Osmiles = "C(=N)(N)N.C(=O)(O)O""
>>> smiles.find("(O)")
15
>>> smiles.find(".")
9
>>> smiles.find(".", 10)
-1
>>> smiles.split(".")
['C(=N)(N)N', 'C(=O)(O)O']
Use “find” to find the
start of a substring.
Start looking at position 10.
Find returns -1 if it couldn’t
find a match.
Split the string into parts
with “.” as the delimiter
48. if "Br" in “Brother”:
print "contains brother“
>>> contains brother
email_address = “ravi”
if "@" not in email_address:
email_address += "@latentview.com“
print email_address
>>> ravi@latentview.com
49. >>> line = " # This is a comment line n"
>>> line.strip()
'# This is a comment line'
>>> line.rstrip()
' # This is a comment line'
>>> line.rstrip("n")
' # This is a comment line '
51. capitalize() -- Capitalizes first letter of string
count(str, beg= 0,end=len(string))
Counts how many times str occurs in string.
encode(encoding=’UTF-8′,errors=’strict’)
Returns encoded string version of string
isalnum()
Returns true if string has at least 1 character
isalpha()
Returns true if string has at least 1 character
isdigit()
Returns true if string contains only digits and false
otherwise
52. Strings are read only
>>> s = "andrew"
>>> s[0] = "A"
Traceback (most recent call last):Traceback (most recent call last):
File "<stdin>", line 1, in <module>File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignmentTypeError: 'str' object does not support item assignment
>>> s = "A" + s[1:]
>>> s
'Andrew‘
53. n -> newline
t -> tab
-> backslash
...
But Windows uses backslash for directories!
filename = "M:nickel_projectreactive.smi" # DANGER!
filename = "M:nickel_projectreactive.smi" # Better!
filename = "M:/nickel_project/reactive.smi" # Usually works
54. Dictionaries are lookup tables.
Random, not sorted
They map from a “key” to a “value”.
symbol_to_name = {
"H": "hydrogen",
"He": "helium",
"Li": "lithium",
"C": "carbon",
"O": "oxygen",
"N": "nitrogen"
}
Duplicate keys are not allowed
Duplicate values are just fine
Keys can be any immutable value
59. dict.clear()
Removes all elements of dictionary dict
dict.fromkeys()
Create a new dictionary with keys from seq and
values set to value
dict.setdefault(key, default=None)
Similar to get(), but will set dict[key]=default if key is not
already in dict
dict.update(dict2)
Adds dictionary dict2‘s key-values pairs to dict
60. Keys must be immutable:
numbers, strings, tuples of immutables
▪ these cannot be changed after creation
reason is hashing (fast lookup technique)
not lists or other dictionaries
▪ these types of objects can be changed "in place"
no restrictions on values
Keys will be listed in arbitrary order
again, because of hashing
62. var2 = 0
if var2:
print "2 - Got a true
expression value"
print var2
else:
print "2 - Got a false
expression value"
print var2
print "Good bye!“
2 - Got a false expression
value 0 Good bye!
var1 = 100
if var1:
print "1 - Got a true
expression value"
print var1
else:
print "1 - Got a false
expression value"
print var1
1 - Got a true expression
value 100
63. count = 0
while (count < 9):
print 'The count is:',
count count = count + 1
print "Good bye!“
The count is: 0 The count is: 1
The count is: 2 The count is: 3
The count is: 4 The count is: 5
The count is: 6 The count is: 7
The count is: 8 Good bye!
Error with the program
var = 1
while var == 1 :
# This constructs an
infinite loop
num = raw_input("Enter a
number :")
print "You entered: ",
num print "Good bye!"
64. for i in range(20):
if i%3 == 0:
print i
if i%5 == 0:
print "Bingo!"
print "---“
for letter in 'Python':
# First Example
print 'Current Letter :', letter
fruits = ['banana', 'apple', 'mango']
for fruit in fruits:
# Second Example print 'Current fruit :', fruit
print "Good bye!"
65. def name(arg1, arg2, ...):
"""documentation""" # optional doc string
statements
return # from procedure
return expression # from function
def add():
return 5
>>>print add()
5
66. def functionname( parameters ):
"function_docstring" function_suite
return [expression]
def printme( str ):
"This prints a passed string into this function"
print str
print ‘n’
# Now you can call printme function
printme ("I'm first call to user defined function!")
printme("Again second call to the same function")
67. def gcd(a, b):
"greatest common divisor"
while a != 0:
a, b = b%a, a # parallel assignment
return b
>>> gcd.__doc__
'greatest common divisor'
>>> gcd(12, 20)
4
69. class Stack:
"A well-known data structure…"
def __init__(self): # constructor
self.items = []
def push(self, x):
self.items.append(x)
def pop(self):
x = self.items[-1]
del self.items[-1]
return x
def empty(self):
return len(self.items) == 0 # Boolean result
70. To create an instance, simply call the class object:
x = Stack() # no 'new' operator!
To use methods of the instance, call using dot notation:
x.empty() # -> 1
x.push(1) # [1]
x.empty() # -> 0
x.push("hello")# [1, "hello"]
x.pop() # -> "hello" # [1]
To inspect instance variables, use dot notation:
x.items # -> [1]
71. When a Python program starts it only has access to a
basic functions and classes.
(“int”, “dict”, “len”, “sum”, “range”, ...)
“Modules” contain additional functionality
Use “import” to tell Python to load a module
>>> import math #Used for mathematical functions
>>> import nltk #Used for NLP operations
72. Importing modules:
import re; print re.match("[a-z]+", s)
from re import match; print match("[a-z]+", s)
Import with rename:
import re as regex
from re import match as m
Before Python 2.0:
▪ import re; regex = re; del re
75. BeautifulSoup – A bit slow, but useful and easy for
beginners
Numpy – Advanced mathematical functionalities
SciPy – Algorithms and mathematical tools for python
Matplotlib – plotting library. Excellent graphs
Scrapy – Advanced Web scraping
NLTK - Natural Language Toolkit (Pattern, TextBlob)
IPython – Notebooks
Scikit-learn – machine learning on Python
Pandas –
Gensim – Topic Modelling
http://
76. a, X, 9, < -- ordinary characters just match themselves exactly. The meta-
characters which do not match themselves because they have special meanings
are: . ^ $ * + ? { [ ] | ( ) (details below)
. (a period) -- matches any single character except newline 'n'
w -- (lowercase w) matches a "word" character: a letter or digit or underbar [a-
zA-Z0-9_]. Note that although "word" is the mnemonic for this, it only matches
a single word char, not a whole word. W (upper case W) matches any non-
word character.
b -- boundary between word and non-word
s -- (lowercase s) matches a single whitespace character -- space, newline,
return, tab, form [ nrtf]. S (upper case S) matches any non-whitespace
character.
t, n, r -- tab, newline, return
d -- decimal digit [0-9] (some older regex utilities do not support but d, but
they all support w and s)
^ = start, $ = end -- match the start or end of the string
-- inhibit the "specialness" of a character. So, for example, use . to match a
period or to match a slash. If you are unsure if a character has special
meaning, such as '@', you can put a slash in front of it, @, to make sure it is
treated just as a character.
77. ## Search for pattern 'iii' in string 'piiig'.
## All of the pattern must match, but it may appear anywhere
## On success, match.group() is matched text.
match = re.search(r'iii', 'piiig') => found, match.group() == "iii"
match = re.search(r'igs', 'piiig') => not found, match == None
## . = any char but n
match = re.search(r'..g', 'piiig') => found, match.group() == "iig"
## d = digi , w = word
match = re.search(r'ddd', 'p123g') => found, match.group() ==
"123“
match = re.search(r'www', '@@abcd!!') => found,
match.group() == "abc"
78. + -- 1 or more occurrences of the pattern to its left, e.g. 'i+' = one or more i's
* -- 0 or more occurrences of the pattern to its left
? -- match 0 or 1 occurrences of the pattern to its left
## i+ = one or more i's, as many as possible.
match = re.search(r'pi+', 'piiig') => found, match.group() == "piii"
## Finds the first/leftmost solution, and within it drives the +
## as far as possible (aka 'leftmost and largest').
## In this example, note that it does not get to the second set of i's.
match = re.search(r'i+', 'piigiiii') => found, match.group() == "ii"
## s* = zero or more whitespace chars
## Here look for 3 digits, possibly separated by whitespace.
match = re.search(r'ds*ds*d', 'xx1 2 3xx') => found, match.group() == "1 2 3"
match = re.search(r'ds*ds*d', 'xx12 3xx') => found, match.group() == "12 3"
match = re.search(r'ds*ds*d', 'xx123xx') => found, match.group() == "123"
## ^ = matches the start of string, so this fails:
match = re.search(r'^bw+', 'foobar') => not found, match == None
## but without the ^ it succeeds:
match = re.search(r'bw+', 'foobar') => found, match.group() == "bar“
https://developers.google.com/edu/python/regular-expressions
80. Used for cleanup
f = open(file)
try:
process_file(f)
finally:
f.close()# always executed
print "OK" # executed on success only
81. Gives an idea where the error happened
For example, in nested loops
raise IndexError
raise IndexError("k out of range")
raise IndexError, "k out of range"
try:
something
except: # catch everything
print "Oops"
raise # reraise
82. “w” = “write mode”
“a” = “append mode”
“wb” = “write in binary”
“r” = “read mode” (default)
“rb” = “read in binary”
“U” = “read files with Unix
or Windows line endings”
>>> f = open(“names.txt")
>>> f.readline()
‘‘Ravin‘Ravin‘
>>> f.read()
‘‘Ravin‘Ravin‘
‘‘Shankarn’Shankarn’
83. >>> lst= [ x for x in open("text.txt","r").readlines() ]
>>> lst
['Chen Linn', 'clin@brandeis.edun', 'Volen 110n', 'Office['Chen Linn', 'clin@brandeis.edun', 'Volen 110n', 'Office
Hour: Thurs. 3-5n', 'n', 'Yaqin Yangn',Hour: Thurs. 3-5n', 'n', 'Yaqin Yangn',
'yaqin@brandeis.edun', 'Volen 110n', 'Offiche Hour:'yaqin@brandeis.edun', 'Volen 110n', 'Offiche Hour:
Tues. 3-5n']Tues. 3-5n']
input_file = open(“in.txt")
output_file = open(“out.txt", "w")
for line in input_file:
output_file.write(line)
84. >>> import csv
>>> with open('eggs.csv', 'rb') as csvfile:
... spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
... for row in spamreader:
... print ', '.join(row)
Spam, Spam, Spam, Spam, Spam, Baked Beans Spam,
Lovely Spam, Wonderful Spam
>>> with open('eggs.csv', 'wb') as csvfile:
spamwriter = csv.writer(csvfile)
spamwriter.writerow(['Spam'] * 5 + ['Baked Beans'])
spamwriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])
>>> w = csv.writer(open(Fn,'ab'),dialect='excel')
85.
86. Python is a great language for NLP:
Simple
Easy to debug:
▪ Exceptions
▪ Interpreted language
Easy to structure
▪ Modules
▪ Object oriented programming
Powerful string manipulation
87. Here is a description from the NLTK official site:
“NLTK is a leading platform for building Python programs
to work with human language data. It provides easy-to-use
interfaces to over 50 corpora and lexical resources such as
WordNet, along with a suite of text processing libraries for
classification, tokenization, stemming, tagging, parsing,
and semantic reasoning.”
88. A software package for manipulating linguistic data and
performing NLP tasks
Advanced tasks are possible from an early stage
Permits projects at various levels
Consistent interfaces
Facilitates reusability of modules
Implemented in Python
89. The token class to encode information about NL texts.
Each token instance represents a unit of text such as a
word, a text, or a document.
A given instance is defined by a partial mapping from
property names to property values.
TreebankWordTokenizer
WordPunctTokenizer
PunctWordTokenizer
WhitespaceTokenizer
91. In linguistic morphology and information retrieval, stemming is the
process for reducing inflected (or sometimes derived) words to their
stem, base or root form—generally a written word form. The stem
need not be identical to the morphological root of the word; it is
usually sufficient that related words map to the same stem, even if this
stem is not in itself a valid root.
Stemming programs are commonly referred to as stemming
algorithms or stemmers.
Helpful in reducing the vocabulary in large scale operations
95. Lemmatisation (or lemmatization) in linguistics, is the process of
grouping together the different inflected forms of a word so they can
be analysed as a single item.
Lemmatisation is closely related to stemming. The difference is that a
stemmer operates on a single word without knowledge of the context,
and therefore cannot discriminate between words which have different
meanings depending on part of speech. However, stemmers are
typically easier to implement and run faster, and the reduced accuracy
may not matter for some applications.
97. Stanford POS Tagger
Stanford Named Entity Recognizer
english_nertagger.tag(‘Rami Eid is studying at Stony Brook University in NY’.split())
Out[3]:
[(u’Rami’, u’PERSON’),
(u’Eid’, u’PERSON’),
(u’is’, u’O’),
(u’studying’, u’O’),
(u’at’, u’O’),
(u’Stony’, u’ORGANIZATION’),
(u’Brook’, u’ORGANIZATION’),
(u’University’, u’ORGANIZATION’),
(u’in’, u’O’),
(u’NY’, u’O’)]
Stanford Parser
98. Stanford Parser
english_parser.raw_parse_sents((“this is the english parser test”, “the parser is from
stanford parser”))
Out[4]:
[[u’this/DT is/VBZ the/DT english/JJ parser/NN test/NN’],
[u'(ROOT’,
u’ (S’,
u’ (NP (DT this))’,
u’ (VP (VBZ is)’,
u’ (NP (DT the) (JJ english) (NN parser) (NN test)))))’],
99. Part of Speech Tagging
Parsing
Word Net
Named Entity
Recognition
Information Retrieval
Sentiment Analysis
Document Clustering
Topic Segmentation
Authoring
Machine Translation
Summarization
Information Extraction
Spoken Dialog Systems
Natural Language
Generation
Word Sense
Disambiguation
100. Det Noun Verb Prep Adj
A 0.9 0.1 0 0 0
man 0 0.6 0.2 0 0.2
walks 0 0.2 0.8 0 0
into 0 0 0 1 0
bar 0 0.7 0.3 0 0