FBW
6-11-2018
Wim Van Criekinge
Google Calendar
Bioinformatics.be
Recap
if condition:
statements
[elif condition:
statements] ...
else:
statements
while condition:
statements
for var in sequence:
statements
break
continue
Strings
REGULAR EXPRESSIONS
Warm Up
• Random_Sequence.py
Python Functions
• There are two kinds of functions in Python.
• Built-in functions that are provided as part of Python -
raw_input(), type(), float(), int() ...
• Functions that we define ourselves and then use
• We treat the of the built-in function names as "new" reserved
words (i.e. we avoid them as variable names)
Building our Own Functions
• We create a new function using the def keyword followed by
optional parameters in parenthesis.
• We indent the body of the function
• This defines the function but does not execute the body of the
function
def print_lyrics():
print "I'm a lumberjack, and I'm okay.”
print 'I sleep all night and I work all day.'
x = 5
print 'Hello'
def print_lyrics():
print "I'm a lumberjack, and I'm okay.”
print 'I sleep all night and I work all day.'
print 'Yo'
print_lyrics()
x = x + 2
print x
Hello
Yo
I'm a lumberjack, and I'm okay.I
sleep all night and I work all day.
7
Arguments
• An argument is a value we pass into the function as its input
when we call the function
• We use arguments so we can direct the function to do different
kinds of work when we call it at different times
• We put the arguments in parenthesis after the name of the
function
big = max('Hello world')
Argument
Parameters
• A parameter is a variable
which we use in the
function definition that is
a “handle” that allows the
code in the function to
access the arguments for
a particular function
invocation.
>>> def greet(lang):
... if lang == 'es':
... print 'Hola’
... elif lang == 'fr':
... print 'Bonjour’
... else:
... print 'Hello’
...
>>> greet('en')Hello
>>> greet('es')Hola
>>> greet('fr')Bonjour
>>>
Return Values
• Often a function will take its arguments, do some computation
and return a value to be used as the value of the function call in
the calling expression. The return keyword is used for this.
def greet():
return "Hello”
print greet(), "Glenn”
print greet(), "Sally"
Hello Glenn
Hello Sally
Return Value
• A “fruitful” function is one
that produces a result (or
return value)
• The return statement
ends the function
execution and “sends
back” the result of the
function
>>> def greet(lang):
... if lang == 'es':
... return 'Hola’
... elif lang == 'fr':
... return 'Bonjour’
... else:
... return 'Hello’
... >>> print greet('en'),'Glenn’
Hello Glenn
>>> print greet('es'),'Sally’
Hola Sally
>>> print greet('fr'),'Michael’
Bonjour Michael
>>>
Multiple Parameters /
Arguments
• We can define more than
one parameter in the
function definition
• We simply add more
arguments when we call
the function
• We match the number and
order of arguments and
parameters
def addtwo(a, b):
added = a + b
return added
x = addtwo(3, 5)
print x
import matplotlib.pyplot as plt
xs = range(-100,100,10)
x2 = [x**2 for x in xs]
negx2 = [-x**2 for x in xs]
plt.plot(xs, x2)
plt.plot(xs, negx2)
plt.xlabel("x”)
plt.ylabel("y”)
plt.ylim(-2000, 2000)
plt.axhline(0) # horiz line
plt.axvline(0) # vert line
plt.savefig(“quad.png”)
plt.show()
Incrementally
modify the figure.
Show it on the screen
Save your figure to a file
We can group these options into functions as usual, but remember that they
are operating on a global, hidden variable
What makes a good visualization?
• Edward Tufte: Maximize the data-ink ratio
• Evolution.py
• Needleman-Wunsch-Simple.py
• Needleman-Wunsch-Complete.py
• Smith-Waterman.py
Swiss-Knife.py
• Using a database as input ! Parse
the entire Swiss Prot collection
– How many entries are there ?
– Average Protein Length (in aa and
MW)
– Relative frequency of amino acids
• Compare to the ones used to construct
the PAM scoring matrixes from 1978 –
1991
Question 3: Getting the database
Uniprot_sprot.dat.gz – 528Mb
(on Github onder Files)
https://github.ugent.be/wvcrieki/Bioinformatics_2018.py/blob/script_branch/Files/
https://github.ugent.be/wvcrieki/Bioinformatics_2018.py/blob/script_branch/Files/
http://www.ebi.ac.uk/uniprot/download-center
Amino acid frequencies
1978 1991
L 0.085 0.091
A 0.087 0.077
G 0.089 0.074
S 0.070 0.069
V 0.065 0.066
E 0.050 0.062
T 0.058 0.059
K 0.081 0.059
I 0.037 0.053
D 0.047 0.052
R 0.041 0.051
P 0.051 0.051
N 0.040 0.043
Q 0.038 0.041
F 0.040 0.040
Y 0.030 0.032
M 0.015 0.024
H 0.034 0.023
C 0.033 0.020
W 0.010 0.014
Second step: Frequencies of Occurence
Extra
• Program your own prosite parser !
• Download prosite pattern database
(prosite.dat)
• Automatically generate >2000 search
patterns, and search in sequence set
from question 1
Oefening 1
1. Which of following 4 sequences
(seq1/2/3/4)
a) contains a “Galactokinase signature”
b) How many of them?
http://us.expasy.org/prosite/
• Evolution.py
• Needleman-Wunsch-Simple.py
• Needleman-Wunsch-Complete.py
• Smith-Waterman.py
Check installed libraries
import pip
import sys
import platform
import webbrowser
print ("Python " + platform.python_version()+ " installed packages:")
installed_packages = pip.get_installed_distributions()
installed_packages_list = sorted(["%s==%s" % (i.key, i.version)
for i in installed_packages])
print(*installed_packages_list,sep="n")
Extra Questions
• How many records have a sequence of length 260?
• What are the first 20 residues of 143X_MAIZE?
• What is the identifier for the record with the
shortest sequence? Is there more than one record
with that length?
• What is the identifier for the record with the
longest sequence? Is there more than one record
with that length?
• How many contain the subsequence "ARRA"?
• How many contain the substring "KCIP-1" in the
description?

P4 2018 io_functions

  • 2.
  • 3.
  • 4.
  • 5.
    Recap if condition: statements [elif condition: statements]... else: statements while condition: statements for var in sequence: statements break continue Strings REGULAR EXPRESSIONS
  • 6.
  • 7.
    Python Functions • Thereare two kinds of functions in Python. • Built-in functions that are provided as part of Python - raw_input(), type(), float(), int() ... • Functions that we define ourselves and then use • We treat the of the built-in function names as "new" reserved words (i.e. we avoid them as variable names)
  • 8.
    Building our OwnFunctions • We create a new function using the def keyword followed by optional parameters in parenthesis. • We indent the body of the function • This defines the function but does not execute the body of the function def print_lyrics(): print "I'm a lumberjack, and I'm okay.” print 'I sleep all night and I work all day.'
  • 9.
    x = 5 print'Hello' def print_lyrics(): print "I'm a lumberjack, and I'm okay.” print 'I sleep all night and I work all day.' print 'Yo' print_lyrics() x = x + 2 print x Hello Yo I'm a lumberjack, and I'm okay.I sleep all night and I work all day. 7
  • 10.
    Arguments • An argumentis a value we pass into the function as its input when we call the function • We use arguments so we can direct the function to do different kinds of work when we call it at different times • We put the arguments in parenthesis after the name of the function big = max('Hello world') Argument
  • 11.
    Parameters • A parameteris a variable which we use in the function definition that is a “handle” that allows the code in the function to access the arguments for a particular function invocation. >>> def greet(lang): ... if lang == 'es': ... print 'Hola’ ... elif lang == 'fr': ... print 'Bonjour’ ... else: ... print 'Hello’ ... >>> greet('en')Hello >>> greet('es')Hola >>> greet('fr')Bonjour >>>
  • 12.
    Return Values • Oftena function will take its arguments, do some computation and return a value to be used as the value of the function call in the calling expression. The return keyword is used for this. def greet(): return "Hello” print greet(), "Glenn” print greet(), "Sally" Hello Glenn Hello Sally
  • 13.
    Return Value • A“fruitful” function is one that produces a result (or return value) • The return statement ends the function execution and “sends back” the result of the function >>> def greet(lang): ... if lang == 'es': ... return 'Hola’ ... elif lang == 'fr': ... return 'Bonjour’ ... else: ... return 'Hello’ ... >>> print greet('en'),'Glenn’ Hello Glenn >>> print greet('es'),'Sally’ Hola Sally >>> print greet('fr'),'Michael’ Bonjour Michael >>>
  • 14.
    Multiple Parameters / Arguments •We can define more than one parameter in the function definition • We simply add more arguments when we call the function • We match the number and order of arguments and parameters def addtwo(a, b): added = a + b return added x = addtwo(3, 5) print x
  • 15.
    import matplotlib.pyplot asplt xs = range(-100,100,10) x2 = [x**2 for x in xs] negx2 = [-x**2 for x in xs] plt.plot(xs, x2) plt.plot(xs, negx2) plt.xlabel("x”) plt.ylabel("y”) plt.ylim(-2000, 2000) plt.axhline(0) # horiz line plt.axvline(0) # vert line plt.savefig(“quad.png”) plt.show() Incrementally modify the figure. Show it on the screen Save your figure to a file
  • 16.
    We can groupthese options into functions as usual, but remember that they are operating on a global, hidden variable
  • 18.
    What makes agood visualization? • Edward Tufte: Maximize the data-ink ratio
  • 19.
    • Evolution.py • Needleman-Wunsch-Simple.py •Needleman-Wunsch-Complete.py • Smith-Waterman.py
  • 20.
    Swiss-Knife.py • Using adatabase as input ! Parse the entire Swiss Prot collection – How many entries are there ? – Average Protein Length (in aa and MW) – Relative frequency of amino acids • Compare to the ones used to construct the PAM scoring matrixes from 1978 – 1991
  • 21.
    Question 3: Gettingthe database Uniprot_sprot.dat.gz – 528Mb (on Github onder Files) https://github.ugent.be/wvcrieki/Bioinformatics_2018.py/blob/script_branch/Files/ https://github.ugent.be/wvcrieki/Bioinformatics_2018.py/blob/script_branch/Files/ http://www.ebi.ac.uk/uniprot/download-center
  • 22.
    Amino acid frequencies 19781991 L 0.085 0.091 A 0.087 0.077 G 0.089 0.074 S 0.070 0.069 V 0.065 0.066 E 0.050 0.062 T 0.058 0.059 K 0.081 0.059 I 0.037 0.053 D 0.047 0.052 R 0.041 0.051 P 0.051 0.051 N 0.040 0.043 Q 0.038 0.041 F 0.040 0.040 Y 0.030 0.032 M 0.015 0.024 H 0.034 0.023 C 0.033 0.020 W 0.010 0.014 Second step: Frequencies of Occurence
  • 23.
    Extra • Program yourown prosite parser ! • Download prosite pattern database (prosite.dat) • Automatically generate >2000 search patterns, and search in sequence set from question 1
  • 24.
    Oefening 1 1. Whichof following 4 sequences (seq1/2/3/4) a) contains a “Galactokinase signature” b) How many of them? http://us.expasy.org/prosite/
  • 25.
    • Evolution.py • Needleman-Wunsch-Simple.py •Needleman-Wunsch-Complete.py • Smith-Waterman.py
  • 26.
    Check installed libraries importpip import sys import platform import webbrowser print ("Python " + platform.python_version()+ " installed packages:") installed_packages = pip.get_installed_distributions() installed_packages_list = sorted(["%s==%s" % (i.key, i.version) for i in installed_packages]) print(*installed_packages_list,sep="n")
  • 27.
    Extra Questions • Howmany records have a sequence of length 260? • What are the first 20 residues of 143X_MAIZE? • What is the identifier for the record with the shortest sequence? Is there more than one record with that length? • What is the identifier for the record with the longest sequence? Is there more than one record with that length? • How many contain the subsequence "ARRA"? • How many contain the substring "KCIP-1" in the description?