Dictionaries, sets and flow control
       Karin Lagesen

  karin.lagesen@bio.uio.no
Dictionaries
Stores unordered, arbitrarily indexed data
Consists of key-value pairs
Dict = {key:value, key:value, key:value...}
Note: keys must be immutable!
  e.g. numbers, strings, or tuples (of immutable items)
Values may be anything, incl. another
 dictionary
Mainly used for storing associations or
 mappings
Create, add, lookup, remove
Creation:
  mydict = {} (empty), or
  mydict = { mykey:myval, mykey2:myval2 }
Adding:
  mydict[key] = value
Lookup:
  mydict[key] (raises KeyError if key is not present)
Remove:
  del mydict[key]
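A short sketch of the four basic operations. The dictionary name and the gene names used as keys are made up for illustration; the values stand for sequence lengths.

```python
# Creation: an empty dict, or one with initial key-value pairs
lengths = {}
lengths = {"geneA": 1200, "geneB": 850}

# Adding: assigning to a new key creates the pair
lengths["geneC"] = 400

# Lookup: indexing with an existing key returns its value
print(lengths["geneA"])  # prints 1200

# Remove: del deletes both the key and its value
del lengths["geneB"]

# Looking up a key that is no longer present raises KeyError
try:
    lengths["geneB"]
except KeyError:
    print("geneB is no longer in the dictionary")
```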
Dictionary methods
All keys:
   mydict.keys() - returns list of keys
All values:
   mydict.values() - returns list of values
All key-value pairs as list of tuples:
   mydict.items()
Get one specific value:
   mydict.get(key [, default])
   returns the value stored under key; if key is not present,
       returns default if it is given, otherwise None
Test for presence of key:
   key in mydict – returns True or False
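The methods above, tried on a small made-up dictionary of base counts. Dictionaries promise no particular order, so the sketch wraps the results in sorted() to get predictable output. In Python 3, keys(), values() and items() return view objects instead of lists; sorted() or list() turns them into lists in either version.

```python
counts = {"A": 3, "C": 1, "G": 2}

# keys(), values() and items(), sorted for a predictable order
print(sorted(counts.keys()))    # ['A', 'C', 'G']
print(sorted(counts.values()))  # [1, 2, 3]
print(sorted(counts.items()))   # [('A', 3), ('C', 1), ('G', 2)]

# get() returns None, or the supplied default, for a missing key
print(counts.get("T"))     # None
print(counts.get("T", 0))  # 0

# 'in' tests for a key, not for a value
print("A" in counts)  # True
print(3 in counts)    # False: 3 is a value, not a key
```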
Dictionary exercise
Log in to freebee as before
Run module load python, then start python
Create this dictionary:
  {"A": 1, 1: "A", "B": [1, 2, 3]}
Find out the following:
   how many keys are there?
   add "str": {1: "X"} to the dictionary
   is there something stored with key "strx"?
   what about the key "str"?
   remove the number 3 from the list stored under "B" and
     print the results
Sets
Similar to lists but:
  no order
  every element is unique
Can create set from list (duplicates are then
 removed)
Add elements with
  myset = set()
  myset.add(elem)
Neat trick: create a list without duplicates
 (the original order is not kept):
  newlist = list(set(oldlist))
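A sketch of creating sets, adding to them and the unique-list trick, using a made-up list of codons. Because a set has no order, sorted() is used here when a predictable list is wanted.

```python
# Creating a set from a list removes the duplicates
codons = ["ATG", "GCC", "ATG", "TAA", "GCC"]
unique_codons = set(codons)
print(len(unique_codons))  # 3

# Adding an element that is already present changes nothing
seen = set()
seen.add("ATG")
seen.add("ATG")
print(len(seen))  # 1

# Unique list with a predictable order: sort the set
newlist = sorted(set(codons))
print(newlist)  # ['ATG', 'GCC', 'TAA']
```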
Set operations
Intersection: elements found in both sets
  set1.intersection(set2)
Union: all elements from both sets
  set1.union(set2)
Difference: elements in set1 but not in set2
  set1 - set2
Symmetric difference: elements found in only one of the sets
  set1.symmetric_difference(set2)
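The four operations on two small example sets. Each method also has an operator form (&, |, - and ^), shown at the end; results are sorted only to make the printed output predictable.

```python
set1 = set([1, 2, 3, 4])
set2 = set([3, 4, 5])

print(sorted(set1.intersection(set2)))          # [3, 4]
print(sorted(set1.union(set2)))                 # [1, 2, 3, 4, 5]
print(sorted(set1 - set2))                      # [1, 2]
print(sorted(set2 - set1))                      # [5]: difference depends on order
print(sorted(set1.symmetric_difference(set2)))  # [1, 2, 5]

# Operator forms of the same operations
both = set1 & set2
either = set1 | set2
only_one = set1 ^ set2
```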
Set exercise
Create these lists:
 ["a", "B", 1, "a", 4], ["c", 1, 2, "A", "c", "a"]
make sets from these two lists
Figure out:
  the number of unique elements in each list
  the elements present in both
  the elements that are not shared
  the number of unique elements altogether
  the elements that are present in the second set,
    but not in the first
Input from terminal
Can get input from terminal (user)
Code:
  variable = raw_input("Prompt text")
Prompt text is printed to the screen, and the
 text the user types in is stored in variable
 (always as a string)
Indentation and scope
Python does not use brackets or other
 symbols to delineate a block of code
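Python uses indentation instead: either tabs or
 spaces, but do not mix them
Note: a variable created inside a function can
 only be seen and used within that function; this
 is called scope. Blocks such as if, for and while
 do not create a new scope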
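A small sketch of where variables are visible. The function count_gc is a hypothetical example: the variable it creates disappears when the function returns, while variables created inside if and for blocks are still available afterwards.

```python
# A variable created inside an if block is visible after the block
if True:
    message = "set inside the if block"
print(message)

# The loop variable also survives the loop
for base in "ACGT":
    pass
print(base)  # T

def count_gc(seq):
    # gc_count is local: it exists only while count_gc is running
    gc_count = seq.count("G") + seq.count("C")
    return gc_count

result = count_gc("ATGCGC")
print(result)  # 4

# gc_count is not defined out here, so this raises NameError
try:
    print(gc_count)
except NameError:
    print("gc_count is not visible outside count_gc")
```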
Flow control
Flow control determines which blocks of
  code will be executed
One conditional statement
  if - elif - else
Two iteration statements
  for: iterate over a group of elements
  while: repeat as long as something is true
If
Structure:
  if <boolean expression>:
     code block 1
  elif <boolean expression>:
     code block 2
  else:
     code block 3
At most one of these code blocks is executed
Executed block: the one whose expression
 first evaluates to True (or the else block,
 if none of them do)
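The if/elif/else structure in a small function. The name classify_length and the length thresholds are made up for the example; the point is that the elif branch is only reached when the first test was False.

```python
def classify_length(length):
    # Only the first branch whose condition is True is executed
    if length < 10:
        return "short"
    elif length < 100:
        return "medium"  # reached only when length >= 10
    else:
        return "long"

print(classify_length(5))    # short
print(classify_length(50))   # medium
print(classify_length(500))  # long
```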
Boolean expressions
Comparisons
    A > B      A greater than B
    A < B      A smaller than B
    A >= B     A greater than or equal to B
    A <= B     A smaller than or equal to B
    A == B     A equal to B
    A != B     A not equal to B

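Comparisons can be combined:
   and, or, and not
   B != C and B > A (evaluated left to right,
     stopping as soon as the result is known)
Other values
   True: non-zero numbers, non-empty strings, lists, sets, tuples etc
   False: 0, None, and empty strings, lists, sets, tuples etc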
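Combined comparisons with made-up values for A, B and C. The last part shows why left-to-right evaluation matters: once the left side of an and is False, the right side is never evaluated, which here avoids dividing by zero.

```python
A, B, C = 1, 5, 3

print(B != C and B > A)  # True: both comparisons are True
print(B < A or C > A)    # True: the second comparison is True
print(not B > A)         # False: 'not' flips the True from B > A

# 'and' stops at the first False operand, so when the list is empty
# the average (a division by zero) is never computed
values = []
if len(values) > 0 and sum(values) / len(values) > 2:
    print("average above 2")
else:
    print("empty list or average not above 2")
```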
If exercise
Use the interactive python shell
Create the following:
  Empty list
  List with elements
  A variable with value 0
  A variable with value -1
  A variable with value None
Use these in an if structure to see which
 ones evaluate to True
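One possible way to do this exercise is to test all five values in a single loop over (name, value) pairs. The variable names and the results dictionary are made up for this sketch. Note that -1, like any non-zero number, counts as True.

```python
empty_list = []
full_list = [1, 2, 3]
zero = 0
minus_one = -1
nothing = None

results = {}
for name, value in [("empty_list", empty_list), ("full_list", full_list),
                    ("zero", zero), ("minus_one", minus_one),
                    ("nothing", nothing)]:
    # The if test uses the value's own truth value
    if value:
        results[name] = True
        print(name + " evaluates to True")
    else:
        results[name] = False
        print(name + " evaluates to False")
```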
If script
Create a variable that takes input from the user
Test to see if:
  The sequence contains anything other than
    ATGC
  The sequence is at least 10 nucleotides long
Report results to user
If script
inputstring = raw_input("Input your DNA string: ")
mystring = inputstring.upper()
mylength = len(mystring)
myAs = mystring.count("A")
myCs = mystring.count("C")
myTs = mystring.count("T")
myGs = mystring.count("G")

nucleotidesum = myAs + myCs + myTs + myGs

if nucleotidesum < mylength:
    print "String contains something else than DNA"
elif mylength < 10:
    print "Length is below 10"
else:
    print "Sequence is ok"
For
Structure:
  for VAR in ITERABLE:
     code block
Code block executed for each element in
 ITERABLE
VAR takes on value of current element
Iterables include:
  strings, lists, tuples, dictionaries, sets,
    xrange, byte arrays, buffers
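Dictionaries can be looped over as well. A short sketch with made-up base counts: looping over the dictionary itself visits its keys, while items() gives (key, value) tuples that the for statement can unpack into two variables.

```python
counts = {"A": 3, "C": 1, "G": 2}

# Looping over a dictionary visits its keys; sorted() fixes the order
for base in sorted(counts):
    print(base)  # A, then C, then G

# items() yields (key, value) tuples, unpacked into base and count
total = 0
for base, count in counts.items():
    total += count
print(total)  # 6
```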
For example
Use the python interactive shell
Create string "ATGGCGGA"
Print out each letter in this string
  >>> a = "ATGGCGGA"
  >>> for var in a:
  ...     print var
  ...
  A
  T
  G
  G
  C
  G
  G
  A
  >>>
For exercise
Define list of numbers 1-9
Show each number multiplied with itself
  >>> a = [1,2,3,4,5,6,7,8,9]
  >>> for var in a:
  ...     print var*var
  ...
  1
  4
  9
  16
  25
  36
  49
  64
  81
  >>>
xrange
Iterate over a range of numbers
xrange(stop): numbers from 0 up to, but not
  including, stop
xrange(start, stop, step):
  start at start, increase by step each time,
    and stop before reaching stop
  >>> for i in xrange(0,10,2):
  ...     print i
  ...
  0
  2
  4
  6
  8
  >>>
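In Python 2, range() builds the whole list in memory while xrange() produces the numbers one at a time; Python 3 removed xrange() and made range() behave like it. The sketch below uses range() so it runs in both versions, and list() only to show the numbers. The sequence used at the end is a made-up example.

```python
print(list(range(5)))          # [0, 1, 2, 3, 4]: stop value 5 is excluded
print(list(range(0, 10, 2)))   # [0, 2, 4, 6, 8]
print(list(range(10, 0, -3)))  # [10, 7, 4, 1]: a negative step counts down

# Stepping through a sequence three positions at a time
seq = "ATGGCG"
codons = []
for i in range(0, len(seq), 3):
    codons.append(seq[i:i + 3])
print(codons)  # ['ATG', 'GCG']
```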
For exercise
Create dictionary where:
  Keys are all two-letter combinations of
    A, B, C (AA, AB, ..., CC)
  Values are integers counting up from 1
Hints
  Can use two for loops
  Adding to an integer variable:
     i += 1
For exercise

letters = "ABC"
valuedict = {}
i = 1
for letter1 in letters:
    for letter2 in letters:
        k = letter1 + letter2
        valuedict[k] = i
        i += 1
print valuedict


[karinlag@freebee]~/tmp/course% python forloopdict.py
{'AA': 1, 'AC': 3, 'AB': 2, 'BA': 4, 'BB': 5, 'BC': 6,
'CC': 9, 'CB': 8, 'CA': 7}
[karinlag@freebee]~/tmp/course%
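An alternative to the two nested loops, offered as a sketch rather than the course's intended solution: itertools.product(letters, repeat=2) produces the same pairs in the same order as the nested loops, and enumerate(..., 1) numbers them starting from 1.

```python
import itertools

letters = "ABC"
valuedict = {}
# product() yields ('A', 'A'), ('A', 'B'), ..., ('C', 'C');
# enumerate() pairs each with a counter that starts at 1
for i, pair in enumerate(itertools.product(letters, repeat=2), 1):
    valuedict["".join(pair)] = i

print(sorted(valuedict.items()))
```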
While
Structure
  while EXPRESSION:
     code block
Important: code block MUST change truth
  value of expression (or leave the loop with
  break), otherwise infinite loop
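A minimal while loop whose body changes the value tested in the condition, so the loop is guaranteed to end. The starting number 20 is arbitrary.

```python
# Halve n until it drops below 1; each pass makes n smaller,
# so the condition n >= 1 eventually becomes False
n = 20.0
steps = 0
while n >= 1:
    n = n / 2
    steps += 1
print(steps)  # 5: 10, 5, 2.5, 1.25, 0.625
```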
While example
>>> a = 10
>>> while True:
...     if a < 40:
...         print a
...     else:
...         break
...     a += 10
...
10
20
30
Break
Can be used to break out of a loop
Can greatly improve legibility and efficiency
What happens when next tuple is iterated
 over, after 'blue' is found?
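The example this question refers to is not included in this text. As a hypothetical stand-in, the sketch below loops over several tuples of colours and breaks when it finds 'blue'. The data is made up; what it shows is that break leaves only the innermost loop, so the outer loop carries on with the next tuple.

```python
# Hypothetical data, not the slide's original example
palettes = [("red", "blue", "green"), ("yellow", "blue")]

visited = []
for palette in palettes:
    for colour in palette:
        visited.append(colour)
        if colour == "blue":
            break  # leaves the inner loop over this tuple only
    # the outer loop then continues with the next tuple

print(visited)  # ['red', 'blue', 'yellow', 'blue']: 'green' is skipped
```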
Homework
ATCurve.py
  take an input string from the user
  check if the sequence only contains DNA; if
    not, prompt for a new sequence.
  calculate a running average of AT content along
    the sequence. Window size should be 3, and
    the step size should be 1. Print one value per
    line.
Note: you need to include several runtime
 examples to show that all parts of the code
 work.
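A sketch of only the windowing step, assuming "running average of AT content" means the fraction of A and T bases in each window. The function name at_window_fractions is hypothetical, and the input handling the assignment asks for is left out. float() keeps the division from being truncated under Python 2.

```python
def at_window_fractions(seq, window=3, step=1):
    """Return the fraction of A and T bases in each window along seq."""
    seq = seq.upper()
    fractions = []
    # The last window starts at len(seq) - window so that it fits in seq
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        at = chunk.count("A") + chunk.count("T")
        fractions.append(at / float(window))
    return fractions

# Windows ATG, TGC, GCA, CAT: two-thirds, one-third, one-third, two-thirds
for value in at_window_fractions("ATGCAT"):
    print(value)
```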
Homework
CodonFrequency.py
 take an input string from the user
 check if the sequence only contains DNA
   - if not, prompt for a new sequence
 find an open reading frame in the string (note:
    its length must be a multiple of three)
    - if not, prompt for a new sequence
 calculate the frequency of each codon in the
   ORF
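A sketch of only the last step, assuming the open reading frame has already been found and its length is a multiple of three. The function name codon_frequencies is hypothetical; it counts codons with dict.get() and then divides each count by the total number of codons.

```python
def codon_frequencies(orf):
    """Map each codon in orf to its relative frequency.

    Assumes orf is already validated and its length is a multiple of 3.
    """
    counts = {}
    for i in range(0, len(orf), 3):
        codon = orf[i:i + 3]
        counts[codon] = counts.get(codon, 0) + 1
    total = len(orf) // 3
    freqs = {}
    for codon, n in counts.items():
        freqs[codon] = n / float(total)
    return freqs

# ATG, GCC, GCC, TAA: GCC makes up half of the four codons
for codon, freq in sorted(codon_frequencies("ATGGCCGCCTAA").items()):
    print("%s %.2f" % (codon, freq))
```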
