# Day3


Day 3 of a Python intro course for biologists.
Theme: how to work with files

Published in: Education, Technology

### Transcript

• 1. File handling. Karin Lagesen, karin.lagesen@bio.uio.no
• 2. Homework
    ● ATCurve.py
        ● take an input string from the user
        ● check if the sequence only contains DNA
            – if not, prompt for a new sequence
        ● calculate a running average of AT content along the sequence. Window size should be 3, and the step size should be 1. Print one value per line.
    ● Note: you need to include several runtime examples to show that all parts of the code work.
• 3. ATCurve.py - thinking
    ● Take input from user:
        ● raw_input
    ● Check for the presence of !ATCG
        ● use sets – very easy
    ● Calculate AT – window = 3, step = 1
        ● iterate over string in slices of three
• 4. ATCurve.py

    # variable valid is used to see if the string is ok or not.
    valid = False
    while not valid:
        # prompt user for input using raw_input() and store in string,
        # convert all characters into uppercase
        test_string = raw_input("Enter string: ")
        upper_string = test_string.upper()
        # Figure out if anything else than ATGCs are present
        dnaset = set(list("ATGC"))
        upper_string_set = set(list(upper_string))
        if len(upper_string_set - dnaset) > 0:
            print "Non-DNA present in your string, try again"
        else:
            valid = True
    if valid:
        # window size 3, step 1: the last window starts at len(upper_string)-3
        for i in range(0, len(upper_string)-2, 1):
            at_sum = 0.0
            # count() excludes the end index, so i+3 covers three bases
            at_sum += upper_string.count("A",i,i+3)
            at_sum += upper_string.count("T",i,i+3)
            # the slide is cut off here; per the homework, print the window average
            print at_sum / 3
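The same sliding-window idea can be sketched as a reusable function. This is a minimal illustration, not the course solution: it is written for Python 3 (the slides use Python 2), and the function name `at_curve` and its `window` and `step` parameters are assumptions made for the example.

```python
def at_curve(sequence, window=3, step=1):
    """Return the AT fraction of each window along the sequence."""
    sequence = sequence.upper()
    # Reject anything that is not A, T, G or C, using the set trick from the slide.
    unexpected = set(sequence) - set("ATGC")
    if unexpected:
        raise ValueError("Non-DNA characters: " + "".join(sorted(unexpected)))
    values = []
    # The last window starts at len(sequence) - window, hence the + 1.
    for start in range(0, len(sequence) - window + 1, step):
        chunk = sequence[start:start + window]
        values.append((chunk.count("A") + chunk.count("T")) / window)
    return values


# Print one value per line, as the homework asks.
for value in at_curve("ATGCAT"):
    print(value)
```

Slicing out each window first and counting inside the slice avoids having to reason about the start and end arguments of `count()`.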
• 5. Homework
    ● CodonFrequency.py
        ● take an input string from the user
        ● if the sequence only contains DNA
            – find a start codon in your string
            – if startcodon is present
                ● count the occurrences of each three-mer from start codon and onwards
                ● print the results
• 6. CodonFrequency.py - thinking
    ● First part – same as earlier
    ● Find start codon: locate index of AUG
        ● Note, can simplify and find ATG
    ● If start codon is found:
        ● create dictionary
        ● for slice of three in input[StartCodon:]:
            – get codon
            – if codon is in dict:
                ● add to count
            – if not:
                ● create key-value pair in dict
• 7. CodonFrequency.py

    input = raw_input("Type a piece of DNA here: ")
    if len(set(input) - set(list("ATGC"))) > 0:
        print "Not a valid DNA sequence"
    else:
        atg = input.find("ATG")
        if atg == -1:
            print "Start codon not found"
        else:
            codondict = {}
            # NOTE: stopping at len(input)-3 skips a final codon that ends
            # exactly at the end of the string; len(input)-2 would include it.
            for i in xrange(atg,len(input)-3,3):
                codon = input[i:i+3]
                if codon not in codondict:
                    codondict[codon] = 1
                else:
                    codondict[codon] +=1
            for codon in codondict:
                print codon, codondict[codon]
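• 8. CodonFrequency.py w/ stopcodon

    input = raw_input("Type a piece of DNA here: ")
    if len(set(input) - set(list("ATGC"))) > 0:
        print "Not a valid DNA sequence"
    else:
        atg = input.find("ATG")
        if atg == -1:
            print "Start codon not found"
        else:
            codondict = {}
            # NOTE: stopping at len(input)-3 skips a final codon that ends
            # exactly at the end of the string; len(input)-2 would include it.
            for i in xrange(atg,len(input) -3,3):
                codon = input[i:i+3]
                # The input is validated as DNA, so the stop codons are given
                # as quoted DNA strings (the slide listed UAG, UAA, UAG unquoted).
                if codon in ["TAG", "TAA", "TGA"]:
                    break
                elif codon not in codondict:
                    codondict[codon] = 1
                else:
                    codondict[codon] +=1
            for codon in codondict:
                print codon, codondict[codon]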
• 9. Results

    [karinlag@freebee]/projects/temporary/cees-python-course/Karin% python CodonFrequency2.py
    Type a piece of DNA here: ATGATTATTTAAATG
    ATG 1
    ATT 2
    TAA 1
    [karinlag@freebee]/projects/temporary/cees-python-course/Karin% python CodonFrequency2.py
    Type a piece of DNA here: ATGATTATTTAAATGT
    ATG 2
    ATT 2
    TAA 1
    [karinlag@freebee]/projects/temporary/cees-python-course/Karin%
• 10. Working with files
    ● Reading – get info into your program
    ● Parsing – processing file contents
    ● Writing – get info out of your program
• 11. Reading and writing
    ● Three-step process
        ● Open file
            – create file handle – reference to file
        ● Read or write to file
        ● Close file
            – will be closed automatically when the program ends, but it is bad form not to close it
• 12. Opening files
    ● Opening modes:
        ● "r" - read file
        ● "w" - write file
        ● "a" - append to end of file
    ● fh = open("filename", "mode")
    ● fh = filehandle, reference to a file, NOT the file itself
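The open, read or write, close cycle and the three modes can be tried out with a short sketch like the one below. The file name `example.txt` and the temporary directory are assumptions made for the demonstration, not part of the course material.

```python
import os
import tempfile

# Hypothetical file in a temporary directory, used only for this demo.
path = os.path.join(tempfile.mkdtemp(), "example.txt")

# "w" creates the file, or empties it if it already exists.
fh = open(path, "w")
fh.write("first line\n")
fh.close()

# "a" keeps the existing content and adds to the end.
fh = open(path, "a")
fh.write("second line\n")
fh.close()

# "r" opens for reading; fh is only a handle, the text arrives via read().
fh = open(path, "r")
contents = fh.read()
fh.close()

print(contents)
```

Opening the file a second time with "w" instead of "a" would have discarded the first line, which is the practical difference between the two modes.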
• 14. Reading example
    ● Log on to freebee, and go to your area
    ● do cp ../Karin/fastafile.fsa .
    ● open python

        >>> fh = open("fastafile.fsa", "r")
        >>> fh

    ● Q: what does the response mean?
• 15. Read example
    ● Use all three methods to read the file. Print the results.
        ● read
        ● readlines
        ● readline
    ● Q: what happens after you have read the file?
    ● Q: What is the difference between the three?
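One way to explore these questions without access to freebee is the sketch below. It writes a small stand-in file first; the file name `demo.fsa` and its two records are made up for illustration, and `seek(0)` is an addition not mentioned on the slides, used here to rewind the handle.

```python
import os
import tempfile

# Hypothetical two-record FASTA file standing in for fastafile.fsa.
path = os.path.join(tempfile.mkdtemp(), "demo.fsa")
fh = open(path, "w")
fh.write(">seq1\nATGC\n>seq2\nGGTA\n")
fh.close()

fh = open(path, "r")
whole = fh.read()        # the entire file as one string
leftover = fh.read()     # the handle is now at the end, so this is empty
fh.seek(0)               # move back to the start of the file
first = fh.readline()    # one line, including its newline
rest = fh.readlines()    # the remaining lines as a list of strings
fh.close()

print(repr(whole))
print(repr(leftover))
print(repr(first))
print(rest)
```

Printing with `repr()` makes the newline characters visible, which helps when comparing what each method returns.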