4. Overview
What is Python ?
Why Python 4 Bioinformatics ?
How to Python
IDE: Eclipse & PyDev / Athena
Code Sharing: Git(hub)
Strings
Regular expressions
5. Python
• Programming languages are overrated
– If you are going into bioinformatics, you will probably
learn/need several
– If you know one, you know 90% of a second
• Choice does matter, but it matters far less than people think it
does
• Why Python?
– Lets you start useful programs asap
– Built-in libraries, plus add-on packages incl. BioPython
– Free, most platforms, widely (scientifically) used
• Versus Perl?
– Incredibly similar
– Consistent syntax, indentation
7. Eclipse IDE Components
Menubars: Full drop-down menus plus quick access to common functions
Editor Pane: This is where we edit our source code
Perspective Switcher: We can switch between various perspectives here
Outline Pane: This contains a hierarchical view of a source file
Package Explorer Pane: This is where our projects/files are listed
Miscellaneous Pane: Various components can appear in this pane – typically this contains a console and a list of compiler problems
Task List Pane: This contains a list of “tasks” to complete
11. GitHub: Hosted GIT
• Largest open source git hosting site
• Public and private options
• User-centric rather than project-centric
• http://github.ugent.be (use your UGent login and password)
– Accept invitation from Bioinformatics-I-2015
URI:
– https://github.ugent.be/Bioinformatics-I-2015/Python.git
12. Run Install.py (is BioPython installed ?)
import sys
import platform
import webbrowser
# pip.get_installed_distributions() was removed in pip 10;
# importlib.metadata (standard library, Python 3.8+) gives the same list.
from importlib import metadata
print("Python " + platform.python_version() + " installed packages:")
installed_packages_list = sorted("%s==%s" % (dist.metadata["Name"], dist.version)
                                 for dist in metadata.distributions())
print(*installed_packages_list, sep="\n")
14. range
The range function specifies a range of integers:
range(start, stop) - the integers between start (inclusive)
and stop (exclusive)
It can also accept a third value specifying the change between values.
range(start, stop, step) - the integers between start (inclusive)
and stop (exclusive) by step
Example:
for x in range(5, 0, -1):
    print(x)
print("Blastoff!")
Output:
5
4
3
2
1
Blastoff!
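To check which integers a given range call produces, it can help to wrap it in list(). A minimal sketch; the variable names are only for illustration:

```python
# range() is lazy; list() materialises its values so they can be inspected.
up = list(range(2, 6))             # start inclusive, stop exclusive
countdown = list(range(5, 0, -1))  # a negative step counts down
evens = list(range(0, 10, 2))      # every second integer

print(up)
print(countdown)
print(evens)
```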
Exercise: How would we print the "99 Bottles of Beer" song?
15. Grouping Indentation
In Python:
for i in range(20):
    if i%3 == 0:
        print(i)
        if i%5 == 0:
            print("Bingo!")
    print("---")
Output:
0
Bingo!
---
---
---
3
---
---
---
6
---
---
---
9
---
---
---
12
---
---
---
15
Bingo!
---
---
---
18
---
---
16. while
while loop: Executes a group of statements as long as a
condition is True.
good for indefinite loops (repeat an unknown number of times)
Syntax:
while condition:
    statements
Example:
number = 1
while number < 200:
    print(number, end=" ")
    number = number * 2
Output:
1 2 4 8 16 32 64 128
17. if
if statement: Executes a group of
statements only if a certain condition
is true. Otherwise, the statements are
skipped.
Syntax:
if condition:
    statements
Example:
gpa = 3.4
if gpa > 2.0:
    print("Your application is accepted.")
18. if/else
if/else statement: Executes one block of statements if a certain
condition is True, and a second block of statements if it is False.
Syntax:
if condition:
    statements
else:
    statements
Example:
gpa = 1.4
if gpa > 2.0:
    print("Welcome to Mars University!")
else:
    print("Your application is denied.")
Multiple conditions can be chained with elif ("else if"):
if condition:
    statements
elif condition:
    statements
else:
    statements
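The elif form is shown above only as a syntax template. A possible worked example, reusing the GPA theme; the function name and the 3.5 threshold are made up for illustration:

```python
def admission_decision(gpa):
    """Return a message for a GPA; the thresholds are illustrative only."""
    if gpa > 3.5:
        return "Accepted with honours."
    elif gpa > 2.0:
        return "Your application is accepted."
    else:
        return "Your application is denied."

# Only the first branch whose condition is True runs.
for gpa in (3.8, 3.4, 1.4):
    print(gpa, admission_decision(gpa))
```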
19. Logic
Many logical expressions use relational operators:
Operator  Meaning                   Example      Result
==        equals                    1 + 1 == 2   True
!=        does not equal            3.2 != 2.5   True
<         less than                 10 < 5       False
>         greater than              10 > 5       True
<=        less than or equal to     126 <= 100   False
>=        greater than or equal to  5.0 >= 5.0   True
Logical expressions can be combined with logical operators:
Operator  Example             Result
and       9 != 6 and 2 < 3    True
or        2 == 3 or -1 < 5    True
not       not 7 > 0           False
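The table entries can be checked by evaluating them directly. A small sketch; the dictionary name checks is just a convenient container for the demonstration:

```python
# Each key is the expression as text, each value is what Python evaluates it to.
checks = {
    "1 + 1 == 2": 1 + 1 == 2,
    "3.2 != 2.5": 3.2 != 2.5,
    "10 < 5": 10 < 5,
    "126 <= 100": 126 <= 100,
    "9 != 6 and 2 < 3": 9 != 6 and 2 < 3,
    "2 == 3 or -1 < 5": 2 == 3 or -1 < 5,
    "not 7 > 0": not 7 > 0,
}

for text, value in checks.items():
    print(f"{text:<20} -> {value}")
```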
20. PI-thon.py
Introduction
Buffon's Needle is one of the oldest problems
in the field of geometrical probability. It
was first stated in 1777. It involves
dropping a needle on a lined sheet of paper
and determining the probability of the
needle crossing one of the lines on the page.
The remarkable result is that the probability
is directly related to the value of pi.
https://www.youtube.com/watch?v=Vws1jvMbs64&feature=youtu.be
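One way to see the link with pi is a small simulation. This is only a sketch and not necessarily what the course's PI-thon.py does; the function name estimate_pi and its parameters are assumptions. For a needle no longer than the line spacing, the crossing probability is 2 * length / (pi * spacing), which can be solved for pi. Note that the sketch uses math.pi to draw the random angle, so it illustrates the relationship rather than computing pi from scratch.

```python
import math
import random

def estimate_pi(drops, needle_length=1.0, line_spacing=1.0, seed=None):
    """Estimate pi with Buffon's needle (needle_length <= line_spacing)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(drops):
        # Distance from the needle's centre to the nearest line.
        centre = rng.uniform(0, line_spacing / 2)
        # Acute angle between the needle and the lines.
        angle = rng.uniform(0, math.pi / 2)
        if centre <= (needle_length / 2) * math.sin(angle):
            hits += 1
    # P(cross) = 2 * length / (pi * spacing), rearranged for pi.
    return 2 * needle_length * drops / (line_spacing * hits)

print(estimate_pi(200_000, seed=1))
```

More drops give a better estimate, but convergence is slow: the error shrinks roughly with the square root of the number of drops.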
21. Overview
What is Python ?
Why Python 4 Bioinformatics ?
How to Python
IDE: Eclipse & PyDev / Athena
Code Sharing: Git(hub)
Strings
22. Strings
string: A sequence of text characters in a program.
Strings start and end with quotation mark " or apostrophe ' characters.
Examples:
"hello"
"This is a string"
"This, too, is a string. It can be very long!"
A string may not span across multiple lines or contain an unescaped " character.
"This is not
a legal String."
"This is not a "legal" String either."
A string can represent special characters by preceding them with a backslash.
\t tab character
\n new line character
\" quotation mark character
\\ backslash character
Example: "Hello\tthere\nHow are you?"
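A short sketch of the escape sequences in use. The file path is hypothetical and only there to show a literal backslash:

```python
greeting = "Hello\tthere\nHow are you?"   # \t is one tab, \n is one newline
print(greeting)

# Two ways to put a quotation mark inside a string.
quoted = "This is a \"legal\" string."
also_quoted = 'This is a "legal" string.'
print(quoted)

# A literal backslash is written as two backslashes.
path = "C:\\data\\seq.fasta"   # hypothetical path
print(path)
```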
23. Indexes
Characters in a string are numbered with indexes starting at 0:
Example:
name = "P. Diddy"
index      0  1  2  3  4  5  6  7
character  P  .     D  i  d  d  y
(index 2 is the space)
Accessing an individual character of a string:
variableName [ index ]
Example:
print(name, "starts with", name[0])
Output:
P. Diddy starts with P
25. String properties
len(string) - number of characters in a string (including spaces)
str.lower(string) - lowercase version of a string
str.upper(string) - uppercase version of a string
Example:
name = "Martin Douglas Stepp"
length = len(name)
big_name = str.upper(name)  # more commonly written name.upper()
print(big_name, "has", length, "characters")
Output:
MARTIN DOUGLAS STEPP has 20 characters
a.replace(old, new) - copy of string a with each occurrence of old replaced by new
26. Text processing
text processing: Examining, editing, formatting text.
Often uses loops that examine the characters of a string one by one.
A for loop can examine each character in a string in sequence.
Example:
for c in "booyah":
    print(c)
Output:
b
o
o
y
a
h
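The same character-by-character loop is a natural fit for sequence data. A minimal sketch that counts G and C bases; the function name gc_count and the DNA string are made up for illustration:

```python
def gc_count(sequence):
    """Count the G and C characters by visiting each character in turn."""
    count = 0
    for base in sequence:
        if base == "G" or base == "C":
            count = count + 1
    return count

dna = "ATGGCGTACGCTTAA"   # made-up sequence
print(gc_count(dna), "of", len(dna), "bases are G or C")
```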
27. Strings and numbers
ord(text) - converts a single-character string into a number.
Example: ord("a") is 97, ord("b") is 98, ...
Characters map to numbers using standardized mappings such
as ASCII and Unicode.
chr(number) - converts a number into a single-character string.
Example: chr(99) is "c"
Exercise: Write a program that performs a rotation cypher.
e.g. "Attack" when rotated by 1 becomes "buubdl"
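One possible approach to the exercise, as a sketch rather than the intended solution. It assumes the text is lowercased first (which matches the "buubdl" example) and that only the letters a-z are rotated; the function name rotate is an assumption:

```python
def rotate(text, shift):
    """Rotate the letters a-z by `shift` places; other characters are kept."""
    result = ""
    for c in text.lower():
        if "a" <= c <= "z":
            # Map a..z to 0..25, shift with wrap-around, map back to a letter.
            offset = (ord(c) - ord("a") + shift) % 26
            result += chr(ord("a") + offset)
        else:
            result += c
    return result

print(rotate("Attack", 1))
```

Rotating by the negative shift undoes the rotation, because % 26 wraps negative offsets back into the alphabet.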
28. Lists
• Flexible arrays, not Lisp-like linked lists
• a = [99, "bottles of beer", ["on", "the", "wall"]]
• Same operators as for strings
• a+b, a*3, a[0], a[-1], a[1:], len(a)
• Item and slice assignment
• a[0] = 98
• a[1:2] = ["bottles", "of", "beer"]
-> [98, "bottles", "of", "beer", ["on", "the", "wall"]]
• del a[-1] # -> [98, "bottles", "of", "beer"]
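The operations above can be run in sequence to confirm the results shown. A small sketch; the extra variable names are only for the demonstration:

```python
a = [99, "bottles of beer", ["on", "the", "wall"]]

# Indexing and slicing work as they do for strings.
first, last, rest = a[0], a[-1], a[1:]

a[0] = 98                            # item assignment
a[1:2] = ["bottles", "of", "beer"]   # slice assignment can change the length
after_slice = list(a)                # copy of the list at this point

del a[-1]                            # remove the last item
print(after_slice)
print(a)
```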
31. More Dictionary Ops
• Keys, values, items:
• list(d.keys()) -> ["duck", "back"]
• list(d.values()) -> ["duik", "rug"]
• list(d.items()) -> [("duck","duik"), ("back","rug")]
• Presence check:
• "duck" in d -> True; "spam" in d -> False
(d.has_key() exists only in Python 2)
• Values of any type; keys almost any
• {"name":"Guido", "age":43, ("hello","world"):1,
42:"yes", "flag": ["red","white","blue"]}
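A runnable Python 3 sketch of these operations. The dictionary d is not defined on this slide; the definition below is an assumption inferred from the results listed above:

```python
d = {"duck": "duik", "back": "rug"}   # assumed definition

# keys(), values() and items() return views; list() turns them into lists.
keys = list(d.keys())
values = list(d.values())
items = list(d.items())

# Presence check with the `in` operator.
has_duck = "duck" in d
has_spam = "spam" in d

# Values can be of any type; keys can be numbers, strings or tuples.
mixed = {"name": "Guido", "age": 43, ("hello", "world"): 1,
         42: "yes", "flag": ["red", "white", "blue"]}
print(keys, values, items, has_duck, has_spam, mixed[("hello", "world")])
```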
32. Dictionary Details
• Keys must be immutable:
– numbers, strings, tuples of immutables
• these cannot be changed after creation
– reason is hashing (fast lookup technique)
– not lists or other dictionaries
• these types of objects can be changed "in place"
– no restrictions on values
• Do not rely on key order
– before Python 3.7 keys were listed in arbitrary order, because of hashing; from 3.7 on, dictionaries keep insertion order
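The immutability rule can be seen by trying both kinds of key. A minimal sketch; the dictionary name and its contents are made up for illustration:

```python
lookup = {}

# A tuple of immutables can be hashed, so it is a valid key.
lookup[("ATG", 1)] = "start codon, frame 1"

# A list can change in place, so Python refuses to hash it.
try:
    lookup[["ATG", 1]] = "not allowed"
except TypeError as error:
    message = str(error)

print(lookup)
print("TypeError:", message)
```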
33. Reference Semantics
• Assignment manipulates references
• x = y does not make a copy of y
• x = y makes x reference the object y references
• Very useful; but beware!
• Example:
>>> a = [1, 2, 3]
>>> b = a
>>> a.append(4)
>>> print(b)
[1, 2, 3, 4]
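When an independent list is wanted, a copy has to be made explicitly. A short sketch contrasting a second reference with a copy; the names alias and snapshot are only for illustration:

```python
a = [1, 2, 3]
alias = a            # second name for the same list object
snapshot = list(a)   # new list object with the same items (shallow copy)

a.append(4)

print(alias)      # follows the change, it is the same object
print(snapshot)   # unaffected by the append
print(alias is a, snapshot is a)
```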
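34. Changing a Shared List
[Diagram, three stages]
a = [1, 2, 3]  -> a refers to the list [1, 2, 3]
b = a          -> a and b refer to the same list [1, 2, 3]
a.append(4)    -> a and b both refer to the list [1, 2, 3, 4]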
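35. Changing an Integer
[Diagram, three stages]
a = 1    -> a refers to the int object 1
b = a    -> a and b refer to the same int object 1
a = a+1  -> new int object created by add operator (1+1); a refers to 2
            old reference deleted by assignment (a=...); b still refers to 1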
36. Example Function
def gcd(a, b):
    "greatest common divisor"
    while a != 0:
        a, b = b%a, a # parallel assignment
    return b
>>> gcd.__doc__
'greatest common divisor'
>>> gcd(12, 20)
4
37. Overview
What is Python ?
Why Python 4 Bioinformatics ?
How to Python
IDE: Eclipse & PyDev / Athena
Code Sharing: Git(hub)
Strings
REGULAR EXPRESSIONS
38. What is a regular expression?
• A regular expression (regex) is
simply a way of describing text.
• Regular expressions are built up of
small units (atoms) which can
represent the type and number of
characters in the text
• Regular expressions can be very
broad (describing everything), or
very narrow (describing only one
pattern).
39. Why would you use a regex?
• Often you wish to test a string for
the presence of a specific character,
word, or phrase
– Examples
• “Are there any letter characters in my
string?”
• “Is this a valid accession number?”
• “Does my sequence contain a start codon
(ATG)?”
• The EcoRI restriction enzyme cuts at the
consensus sequence GAATTC.
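Questions like these translate fairly directly into calls to Python's re module. A minimal sketch; the DNA sequence is made up for illustration:

```python
import re

sequence = "CCATGGAATTCGGATGA"   # made-up DNA sequence

# "Does my sequence contain a start codon (ATG)?"
has_start_codon = re.search("ATG", sequence) is not None

# Where does the EcoRI consensus sequence GAATTC occur?
ecori_sites = [m.start() for m in re.finditer("GAATTC", sequence)]

# "Are there any letter characters in my string?"
has_letters = re.search("[A-Za-z]", "12345") is not None

print(has_start_codon, ecori_sites, has_letters)
```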
40. Real world problems
• Match IP Addresses, email addresses,
URLs
• Match balanced sets of parentheses
• Substitute words
• Tokenize
• Validate
• Count
• Delete duplicates
• Natural Language processing
41. RE in Python
• Unleash the power - built-in re module
• Functions
– to compile patterns
• compile
– to perform matches
• match, search, findall, finditer
– to perform operations on match object
• group, start, end, span
– to substitute
• sub, subn
• - Metacharacters
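A compact tour of the functions listed above, as a sketch; the text and the pattern are made up for illustration:

```python
import re

pattern = re.compile(r"\d+")   # compile once, reuse many times
text = "exon 12 spans 340 to 512"

first = pattern.search(text)          # first match anywhere in the text
at_start = pattern.match(text)        # match() only tries the start of the text
all_numbers = pattern.findall(text)   # all matches, as a list of strings
positions = [m.span() for m in pattern.finditer(text)]   # match objects, one by one

# Operations on a match object.
print(first.group(), first.start(), first.end(), first.span())

# Substitution: subn() also reports how many replacements were made.
masked, n_subs = pattern.subn("N", text)
print(masked, n_subs)
```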
42. Quantifiers
• [ATGC]
• You can specify the number of times
you want to see an atom. Examples
• \d* : Zero or more times
• \d+ : One or more times
• \d{3} : Exactly three times
• \d{4,7} : At least four, and not more than seven
• \d{3,} : Three or more times
• We could rewrite /\d\d\d-\d\d\d\d/ as:
– /\d{3}-\d{4}/
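A quick check that the long and the quantified spelling of the phone pattern behave the same. This is a sketch; the sample strings are made up, and re.fullmatch is used so that the whole string has to fit the pattern:

```python
import re

phone_long = re.compile(r"\d\d\d-\d\d\d\d")
phone_short = re.compile(r"\d{3}-\d{4}")

samples = ["555-1234", "55-1234", "5555-12345"]
results = [(s, bool(phone_long.fullmatch(s)), bool(phone_short.fullmatch(s)))
           for s in samples]
print(results)

# {4,7} accepts between four and seven repetitions, inclusive.
lengths_ok = [bool(re.fullmatch(r"\d{4,7}", s))
              for s in ("123", "12345", "12345678")]
print(lengths_ok)
```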
43. Anchors
• Anchors force a pattern match to a
certain location
• ^ : match at beginning of string
• $ : match at end of string
• \b : match at word boundary (between \w
and \W)
• Example:
• /^\d\d\d-\d\d\d\d$/ : matches only when the whole string
is a phone number of the form 555-1234
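The effect of the anchors shows up when the same pattern is tried with and without them. A minimal sketch with made-up input strings:

```python
import re

# Without anchors the pattern may match anywhere inside the string.
unanchored = re.search(r"\d\d\d-\d\d\d\d", "call 555-1234 now")

# With ^ and $ the whole string has to be the phone number.
anchored = re.search(r"^\d\d\d-\d\d\d\d$", "call 555-1234 now")
exact = re.search(r"^\d\d\d-\d\d\d\d$", "555-1234")

# \b keeps "cat" from matching inside "concat".
whole_words = re.findall(r"\bcat\b", "cat concat cat.")

print(unanchored is not None, anchored is not None, exact is not None, whole_words)
```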
44. Grouping, capturing
• You can group atoms together with
parentheses
• /cat+/ matches cat, catt, cattt
• /(cat)+/ matches cat, catcat, catcatcat
• Use as many sets of parentheses as
you need
• match.group()
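A short sketch of grouping and capturing, reusing the phone-number pattern from the earlier slides; the input strings are made up:

```python
import re

# The + applies to the preceding atom: just "t", or the whole group "(cat)".
single = re.fullmatch(r"cat+", "cattt")
repeated = re.fullmatch(r"(cat)+", "catcatcat")
no_match = re.fullmatch(r"cat+", "catcat")

# Each pair of parentheses captures the text it matched.
m = re.search(r"(\d{3})-(\d{4})", "call 555-1234 now")
print(m.group())    # the whole match
print(m.group(1))   # first pair of parentheses
print(m.group(2))   # second pair of parentheses
```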
45. Regex.py
import re
line = "Cats are smarter than dogs"
matchObj = re.match(r'(.*) are (.*?) .*', line, re.M|re.I)
if matchObj:
    print("matchObj.group() : ", matchObj.group())
    print("matchObj.group(1) : ", matchObj.group(1))
    print("matchObj.group(2) : ", matchObj.group(2))
else:
    print("No match!!")
46. Regex.py
import re
text = 'abbaaabbbbaaaaa'
pattern = 'ab'
# Report every non-overlapping occurrence of the pattern with its position.
for match in re.finditer(pattern, text):
    s = match.start()
    e = match.end()
    print('Found "%s" at %d:%d' % (text[s:e], s, e))
48. Exercise 1
1. Which of the following 4 sequences
(seq1/2/3/4)
a) contain a “Galactokinase signature”?
b) How many of them?
http://us.expasy.org/prosite/