10518261_biopython_python_slides_notes.ppt

What is Biopython?
• A set of tools for computational molecular biology,
written in Python
• Its goal is to make it as easy as possible to use
Python for bioinformatics by creating high-quality,
reusable modules and scripts

What can Biopython do?
• Manipulate DNA and protein sequences
• Run BLAST
• Access public databases
• Manipulate protein structures
• Population genetics
• Supervised learning methods
• Networks of various kinds

Obtaining Biopython
• http://www.biopython.org

Making sure it worked
>>> from Bio.Seq import Seq
>>> new_seq = Seq('GATCAGAAG')
>>> new_seq.complement()
>>> new_seq.reverse_complement()

Working with sequences
• A Biopython Seq object has two important
attributes:
– data: the actual sequence string
– alphabet: an object describing what the individual
characters making up the string "mean" and how they
should be interpreted
• The alphabet gives two advantages:
1. it indicates what type of information the data attribute
contains
2. it provides a means of constraining the information in the
data attribute, which acts as a form of type checking
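
The type-checking idea can be sketched in plain Python. The class below, TypedSeq, is a hypothetical illustration and not Biopython's implementation; the two alphabet strings are likewise assumptions. It only shows how pairing the data string with an alphabet lets mismatched sequences be rejected.

```python
class TypedSeq:
    """Hypothetical stand-in for a sequence with an alphabet (not Biopython's Seq)."""

    def __init__(self, data, alphabet):
        # Constrain the data: every character must belong to the alphabet.
        bad = sorted(set(data) - set(alphabet))
        if bad:
            raise ValueError("characters not in alphabet: %s" % "".join(bad))
        self.data = data
        self.alphabet = alphabet

    def __add__(self, other):
        # Type check: only sequences over the same alphabet may be joined.
        if self.alphabet != other.alphabet:
            raise TypeError("incompatible alphabets")
        return TypedSeq(self.data + other.data, self.alphabet)


DNA = "ACGT"                       # assumed alphabet: unambiguous DNA letters
PROTEIN = "ACDEFGHIKLMNPQRSTVWY"   # assumed alphabet: 20 standard amino acids

dna_seq = TypedSeq("ACGT", DNA)
protein_seq = TypedSeq("EVRNAK", PROTEIN)
joined = dna_seq + TypedSeq("GGCC", DNA)   # same alphabet: allowed
```

In this sketch, evaluating protein_seq + dna_seq raises TypeError because the two alphabets differ, and building a TypedSeq from characters outside its alphabet raises ValueError.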

Working with sequences
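>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> protein_seq = Seq('EVRNAK', IUPAC.protein)
>>> dna_seq = Seq('ACGT', IUPAC.unambiguous_dna)
>>> protein_seq + dna_seq    # fails: the alphabets are incompatible
>>> my_seq = Seq('GATCGATGGGCCTATATAGGATCGAAAATCGC', IUPAC.unambiguous_dna)
>>> my_seq.tostring()        # the sequence as a plain string
>>> my_seq[5] = 'G'          # fails: Seq objects are immutable
>>> mutable_seq = my_seq.tomutable()
>>> print mutable_seq
>>> mutable_seq[5] = 'T'
>>> mutable_seq.remove('T')  # removes the first 'T'
>>> mutable_seq.reverse()    # reverses the sequence in place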

Parsing biological file formats
>gi|6273290|gb|AF191664.1|AF191664 Opuntia clavata rpl16 gene; chloroplast
gene for...
TATACATTAAAGGAGGGGGATGCGGATAAATGGAAAGGCGAAAGAAAGAAAAAAATGAA
TCTAAATGATATAGGATTCCACTATGTAAGGTCTTTGAATCATATCATAAAAGACAATGTAAT
AAA...
import string
from Bio.ParserSupport import AbstractConsumer
class SpeciesExtractor(AbstractConsumer):
    def __init__(self):
        self.species_list = []
    def title(self, title_info):
        title_atoms = string.split(title_info)
        new_species = title_atoms[1]
        if new_species not in self.species_list:
            self.species_list.append(new_species)

Parsing biological file formats
from Bio import Fasta
def extract_organisms(file, num_records):
    scanner = Fasta._Scanner()
    consumer = SpeciesExtractor()
    file_to_parse = open(file, 'r')
    # feed one record at a time to the consumer
    for fasta_record in range(num_records):
        scanner.feed(file_to_parse, consumer)
    file_to_parse.close()
    return consumer.species_list
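
For readers without the scanner and consumer classes, the same extraction can be sketched in plain Python. The function name extract_genera and the in-memory sample input are assumptions made for illustration; like the consumer above, it takes the second whitespace-separated word of each title line.

```python
import io


def extract_genera(handle, num_records):
    """Collect the second word of the first num_records FASTA title lines."""
    names = []
    seen = 0
    for line in handle:
        if not line.startswith(">"):
            continue                      # skip sequence lines
        if seen == num_records:
            break                         # enough records processed
        seen += 1
        title_atoms = line[1:].split()
        if len(title_atoms) > 1 and title_atoms[1] not in names:
            names.append(title_atoms[1])
    return names


# Hypothetical two-record input, used only to exercise the function.
sample = io.StringIO(
    ">gi|1|gb|X1.1|X1 Opuntia clavata rpl16 gene\nTATACATT\n"
    ">gi|2|gb|X2.1|X2 Cypripedium irapeanum 5.8S rRNA gene\nCGTAACAA\n"
)
genera = extract_genera(sample, 2)
```

Passing an open file instead of the StringIO object works the same way, since the function only iterates over lines.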

Parsing biological file formats (easier)
>>> from Bio import Fasta
>>> parser = Fasta.RecordParser()
>>> file = open("ls_orchid.fasta")
>>> iterator = Fasta.Iterator(file, parser)
>>> cur_record = iterator.next()
>>> dir(cur_record)
>>> print cur_record.title
>>> print cur_record

Parsing biological file formats (easier)
from Bio import SeqIO
myFile = open("ls_orchid.fasta")
# each seq_record is a SeqRecord with an id and a sequence
for seq_record in SeqIO.parse(myFile, "fasta"):
    print(seq_record.id)
    print(repr(seq_record.seq))
    print(len(seq_record))
myFile.close()

FASTA files as Dictionaries
import string
def get_accession_num(fasta_record):
    title_atoms = string.split(fasta_record.title)
    # all of the accession number information is stuck in the
    # first element and separated by '|'s
    accession_atoms = string.split(title_atoms[0], '|')
    # the accession number is the 4th element
    gb_name = accession_atoms[3]
    # strip the version info before returning
    return gb_name[:-2]
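
To see what the function returns, here is a sketch of the same parsing applied directly to a title string in Python 3. The name accession_from_title is hypothetical, and the sketch swaps the [:-2] slice for a split on the last '.', an assumption that also copes with version suffixes longer than one digit.

```python
def accession_from_title(title):
    """Return the unversioned GenBank accession from an NCBI-style FASTA title."""
    first_atom = title.split()[0]            # 'gi|6273290|gb|AF191664.1|AF191664'
    accession_atoms = first_atom.split("|")
    gb_name = accession_atoms[3]             # 'AF191664.1'
    return gb_name.rsplit(".", 1)[0]         # drop the version suffix


title = "gi|6273290|gb|AF191664.1|AF191664 Opuntia clavata rpl16 gene; chloroplast"
accession = accession_from_title(title)      # 'AF191664'
```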

FASTA files as Dictionaries (easier)
>>> from Bio import Fasta
>>> Fasta.index_file("ls_orchid.fasta", "my_orchid_dict.idx",
...                  get_accession_num)
>>> from Bio.Alphabet import IUPAC
>>> dna_parser = Fasta.SequenceParser(IUPAC.ambiguous_dna)
>>> orchid_dict = Fasta.Dictionary("my_orchid_dict.idx", dna_parser)

Blast
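from Bio import SeqIO
from Bio.Blast import NCBIWWW
for seq in SeqIO.parse('marker.fa', 'fasta'):
    # submit each sequence to NCBI's online BLAST service
    b_results = NCBIWWW.qblast('blastn', 'nr',
                               seq.seq, format_type='Text')
    print(b_results.read())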

More information
http://www.biopython.org

Problem
• Write a program to read a FASTA file and print
the number of sequences, number of residues,
and minimum, maximum and average lengths of
the sequences.
> python read-fasta-file.py sample.fa
Number of sequences = 7
Number of residues = 285
Minimum length = 21
Maximum length = 94
Average length = 40.7
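
One possible starting point, sketched in plain Python without Biopython. The function name fasta_stats and the in-memory demo input are assumptions for illustration; the sample.fa file from the slide is not reproduced here, so the demo prints different numbers.

```python
import io


def fasta_stats(handle):
    """Return (count, residues, min_len, max_len, avg_len) for a FASTA handle."""
    lengths = []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            lengths.append(0)              # a title line starts a new record
        elif line and lengths:
            lengths[-1] += len(line)       # add residues to the current record
    if not lengths:
        return 0, 0, 0, 0, 0.0
    total = sum(lengths)
    return len(lengths), total, min(lengths), max(lengths), total / len(lengths)


# Hypothetical input with two short records.
demo = io.StringIO(">seq1\nACGT\nAC\n>seq2\nACGTACGTAC\n")
count, residues, shortest, longest, average = fasta_stats(demo)
print("Number of sequences = %d" % count)
print("Number of residues = %d" % residues)
print("Minimum length = %d" % shortest)
print("Maximum length = %d" % longest)
print("Average length = %.1f" % average)
```

To run it on a real file, open the path given on the command line (for example via sys.argv[1]) and pass the file object to fasta_stats.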
