SlideShare a Scribd company logo
Afile infasta format is probablythe most commonwayto store sequence information. The format is verysimple:the name ofeachsequence is
listed ona line beginningwitha '>' (the sequence name is everythingafter the '>'), and the sequence associated withthat name is listed onone or
more lines after the name line. For a more detailed description, see http://www.ncbi.nlm.nih.gov/BLAST/fasta.shtml. For the examples below,
programoutput willbe shownfor the followingfasta-format file, called seqs. fa:humanORF Finding:Write a programfind0rfs .pythat takes the
name ofa single fasta-format file fromthe command line. Your programshould read DNAsequences fromthe fasta file, translate theminto protein
sequences, and finallyprint out the proteinsequence ofallpossible ORFs (openreadingframes startingwitha start codonand endingwithone of
the three stop codons, withno stop codons inbetween]. Youshould look at allthree readingframes onthe forward strand (do not worryabout
the three readingframes onthe reverse complement). The ORFs your programwillbe able to find canrange inlengthfrom2 amino acids (a start
and a stop) to the entire fasta sequence record (ofanylength). There maybe more thanone ORF ineachfasta sequence record, possiblyin
different readingframes. Youdo not need to worryabout outputtingthe ORFs inanyspecific order so longas your output includes allORFs
present inthe sequence. To get youstarted, think about tacklingthe problemas follows:Read sequences fromthe fasta file For eachsequence,
translate the readingframes startingat the 1st, 2nd, and 3rd base inthe sequence For eachtranslated sequence, find anyORFs present inthat
sequence Example programusage:pythonfindOrfs.pyseqs.fa ORFs inseqA- human:ORFs inseqC - gorilla:MTRORFs inthat sequence

More Related Content

Similar to A file in fasta format is probably the most common way to store sequence information. The format is

Programming in Computational Biology
Programming in Computational BiologyProgramming in Computational Biology
Programming in Computational BiologyAtreyiB
 
Introduction to HMMER - A biosequence analysis tool with Hidden Markov Models
Introduction to HMMER - A biosequence analysis tool with Hidden Markov Models Introduction to HMMER - A biosequence analysis tool with Hidden Markov Models
Introduction to HMMER - A biosequence analysis tool with Hidden Markov Models
Anax Fotopoulos
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
Nawfal Aldujaily
 
Open Reading Frames
Open Reading FramesOpen Reading Frames
Open Reading FramesOsama Zahid
 
file handling1
file handling1file handling1
file handling1student
 
SGN Introduction to UNIX Command-line 2015 part 2
SGN Introduction to UNIX Command-line 2015 part 2SGN Introduction to UNIX Command-line 2015 part 2
SGN Introduction to UNIX Command-line 2015 part 2
solgenomics
 
Handout#01
Handout#01Handout#01
Handout#01
Sunita Milind Dol
 
File handling in c
File handling in cFile handling in c
File handling in c
aakanksha s
 
Unit 5 dwqb ans
Unit 5 dwqb ansUnit 5 dwqb ans
Unit 5 dwqb ans
Sowri Rajan
 
File management
File managementFile management
File management
AnishaThakkar2
 
Mesics lecture files in 'c'
Mesics lecture   files in 'c'Mesics lecture   files in 'c'
Mesics lecture files in 'c'
eShikshak
 
Dgaston dec-06-2012
Dgaston dec-06-2012Dgaston dec-06-2012
Dgaston dec-06-2012
Dan Gaston
 
Raptor user manual3.0
Raptor user manual3.0Raptor user manual3.0
Raptor user manual3.0
Elizabeth Reyna
 
file_handling_in_c.ppt
file_handling_in_c.pptfile_handling_in_c.ppt
file_handling_in_c.ppt
yuvrajkeshri
 
Introduction to-sas-1211594349119006-8
Introduction to-sas-1211594349119006-8Introduction to-sas-1211594349119006-8
Introduction to-sas-1211594349119006-8thotakoti
 
grep.1.pdf
grep.1.pdfgrep.1.pdf
grep.1.pdf
Saravana Kumar
 
grep.1.pdf
grep.1.pdfgrep.1.pdf
grep.1.pdf
Saravana Kumar
 

Similar to A file in fasta format is probably the most common way to store sequence information. The format is (20)

Programming in Computational Biology
Programming in Computational BiologyProgramming in Computational Biology
Programming in Computational Biology
 
Introduction to HMMER - A biosequence analysis tool with Hidden Markov Models
Introduction to HMMER - A biosequence analysis tool with Hidden Markov Models Introduction to HMMER - A biosequence analysis tool with Hidden Markov Models
Introduction to HMMER - A biosequence analysis tool with Hidden Markov Models
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
Fasta
FastaFasta
Fasta
 
Open Reading Frames
Open Reading FramesOpen Reading Frames
Open Reading Frames
 
file handling1
file handling1file handling1
file handling1
 
SGN Introduction to UNIX Command-line 2015 part 2
SGN Introduction to UNIX Command-line 2015 part 2SGN Introduction to UNIX Command-line 2015 part 2
SGN Introduction to UNIX Command-line 2015 part 2
 
Handout#01
Handout#01Handout#01
Handout#01
 
Unit5
Unit5Unit5
Unit5
 
50320130403003 2
50320130403003 250320130403003 2
50320130403003 2
 
File handling in c
File handling in cFile handling in c
File handling in c
 
Unit 5 dwqb ans
Unit 5 dwqb ansUnit 5 dwqb ans
Unit 5 dwqb ans
 
File management
File managementFile management
File management
 
Mesics lecture files in 'c'
Mesics lecture   files in 'c'Mesics lecture   files in 'c'
Mesics lecture files in 'c'
 
Dgaston dec-06-2012
Dgaston dec-06-2012Dgaston dec-06-2012
Dgaston dec-06-2012
 
Raptor user manual3.0
Raptor user manual3.0Raptor user manual3.0
Raptor user manual3.0
 
file_handling_in_c.ppt
file_handling_in_c.pptfile_handling_in_c.ppt
file_handling_in_c.ppt
 
Introduction to-sas-1211594349119006-8
Introduction to-sas-1211594349119006-8Introduction to-sas-1211594349119006-8
Introduction to-sas-1211594349119006-8
 
grep.1.pdf
grep.1.pdfgrep.1.pdf
grep.1.pdf
 
grep.1.pdf
grep.1.pdfgrep.1.pdf
grep.1.pdf
 

Recently uploaded

"Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe..."Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe...
SACHIN R KONDAGURI
 
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdfUnit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Thiyagu K
 
The Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptxThe Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptx
DhatriParmar
 
special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
Special education needs
 
Introduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp NetworkIntroduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp Network
TechSoup
 
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCECLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
BhavyaRajput3
 
How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...
Jisc
 
Language Across the Curriculm LAC B.Ed.
Language Across the  Curriculm LAC B.Ed.Language Across the  Curriculm LAC B.Ed.
Language Across the Curriculm LAC B.Ed.
Atul Kumar Singh
 
Guidance_and_Counselling.pdf B.Ed. 4th Semester
Guidance_and_Counselling.pdf B.Ed. 4th SemesterGuidance_and_Counselling.pdf B.Ed. 4th Semester
Guidance_and_Counselling.pdf B.Ed. 4th Semester
Atul Kumar Singh
 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
EugeneSaldivar
 
Overview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with MechanismOverview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with Mechanism
DeeptiGupta154
 
Palestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptxPalestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptx
RaedMohamed3
 
The Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdfThe Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdf
kaushalkr1407
 
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXXPhrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
MIRIAMSALINAS13
 
A Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in EducationA Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in Education
Peter Windle
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
Pavel ( NSTU)
 
1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx
JosvitaDsouza2
 
Francesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptxFrancesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptx
EduSkills OECD
 
The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
Delapenabediema
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
Tamralipta Mahavidyalaya
 

Recently uploaded (20)

"Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe..."Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe...
 
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdfUnit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdf
 
The Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptxThe Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptx
 
special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
 
Introduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp NetworkIntroduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp Network
 
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCECLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
 
How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...
 
Language Across the Curriculm LAC B.Ed.
Language Across the  Curriculm LAC B.Ed.Language Across the  Curriculm LAC B.Ed.
Language Across the Curriculm LAC B.Ed.
 
Guidance_and_Counselling.pdf B.Ed. 4th Semester
Guidance_and_Counselling.pdf B.Ed. 4th SemesterGuidance_and_Counselling.pdf B.Ed. 4th Semester
Guidance_and_Counselling.pdf B.Ed. 4th Semester
 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
 
Overview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with MechanismOverview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with Mechanism
 
Palestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptxPalestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptx
 
The Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdfThe Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdf
 
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXXPhrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
 
A Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in EducationA Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in Education
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
 
1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx
 
Francesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptxFrancesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptx
 
The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
 

A file in fasta format is probably the most common way to store sequence information. The format is

  • 1. Afile infasta format is probablythe most commonwayto store sequence information. The format is verysimple:the name ofeachsequence is listed ona line beginningwitha '>' (the sequence name is everythingafter the '>'), and the sequence associated withthat name is listed onone or more lines after the name line. For a more detailed description, see http://www.ncbi.nlm.nih.gov/BLAST/fasta.shtml. For the examples below, programoutput willbe shownfor the followingfasta-format file, called seqs. fa:humanORF Finding:Write a programfind0rfs .pythat takes the name ofa single fasta-format file fromthe command line. Your programshould read DNAsequences fromthe fasta file, translate theminto protein sequences, and finallyprint out the proteinsequence ofallpossible ORFs (openreadingframes startingwitha start codonand endingwithone of the three stop codons, withno stop codons inbetween]. Youshould look at allthree readingframes onthe forward strand (do not worryabout the three readingframes onthe reverse complement). The ORFs your programwillbe able to find canrange inlengthfrom2 amino acids (a start and a stop) to the entire fasta sequence record (ofanylength). There maybe more thanone ORF ineachfasta sequence record, possiblyin different readingframes. Youdo not need to worryabout outputtingthe ORFs inanyspecific order so longas your output includes allORFs present inthe sequence. To get youstarted, think about tacklingthe problemas follows:Read sequences fromthe fasta file For eachsequence, translate the readingframes startingat the 1st, 2nd, and 3rd base inthe sequence For eachtranslated sequence, find anyORFs present inthat sequence Example programusage:pythonfindOrfs.pyseqs.fa ORFs inseqA- human:ORFs inseqC - gorilla:MTRORFs inthat sequence