Use of bio-informatic tools in bacterial genetics


A nice paper on the topic as obvious from the title.


Transcript

  • 1. USE OF BIO-INFORMATIC TOOLS TO STUDY IMPLICATIONS OF G-C CONTENT OF DNA ON THE PROTEIN. DEBTANU CHAKRABORTY
  • 2. Index 1) Note of Acknowledgement 2) Bio-informatics 3) G-C content 4) Classification tree of Bacteria 5) List of low G-C bacteria 6) List of high G-C bacteria 7) Introduction to Carbonic Anhydrase 8) Peptide Sequence and their analysis 9) Gene Sequences and their analysis 10) Codon usage plot 11) Conclusion 12) Future work-scope
  • 3. Note of Acknowledgement: The project would have been incomplete without the help of a number of people. First, I would like to thank my mentor and guide, Prof. Chanchal K. Das Gupta, who gave me the idea and inspiration for the project and helped me at every step whenever I was in trouble. I would like to thank Prof. Punyasloke Bhadury, who introduced me to the NCBI website and showed me how to perform tasks such as alignment and BLAST online. I would be remiss if I did not mention my superiors Papri di, Amit da and Shimonti di, who also helped me with the project. In my work I have made extensive use of the NCBI and UniProt websites.
  • 4. Bioinformatics is the application of statistics and computer science to the field of molecular biology. The term bioinformatics was coined by Paulien Hogeweg in 1979 for the study of informatic processes in biotic systems. Its primary use since at least the late 1980s has been in genomics and genetics, particularly in those areas of genomics involving large-scale DNA sequencing. Bioinformatics now entails the creation and advancement of databases, algorithms, computational and statistical techniques and theory to solve formal and practical problems arising from the management and analysis of biological data. Over the past few decades, rapid developments in genomic and other molecular research technologies and developments in information technologies have combined to produce a tremendous amount of information related to molecular biology. Bioinformatics is the name given to the mathematical and computing approaches used to glean understanding of biological processes. Common activities in bioinformatics include mapping and analyzing DNA and protein sequences, aligning different DNA and protein sequences to compare them, and creating and viewing 3-D models of protein structures. The primary goal of bioinformatics is to increase the understanding of biological processes. What sets it apart from other approaches, however, is its focus on developing and applying computationally intensive techniques (e.g., pattern recognition, data mining, machine learning algorithms, and visualization) to achieve this goal. Major research efforts in the field include sequence alignment, gene finding, genome assembly, drug design, drug discovery, protein structure alignment, protein structure prediction, prediction of gene expression and protein-protein interactions, genome-wide association studies and the modeling of evolution.
  • 5. GC-content (or guanine-cytosine content), in molecular biology, is the percentage of nitrogenous bases on a molecule which are either guanine or cytosine (from a possibility of four different ones, also including adenine and thymine). This may refer to a specific fragment of DNA or RNA, or to the whole genome. When it refers to a fragment of the genetic material, it may denote the GC-content of part of a gene (domain), a single gene, a group of genes (or gene cluster) or even a non-coding region. G (guanine) and C (cytosine) undergo a specific hydrogen bonding, whereas A (adenine) bonds specifically with T (thymine). The GC pair is bound by three hydrogen bonds, while AT pairs are bound by two hydrogen bonds. DNA with high GC-content is more stable than DNA with low GC-content, but contrary to popular belief, the hydrogen bonds do not stabilize the DNA significantly and stabilization is mainly due to stacking interactions. In spite of the higher stability conferred to the genetic material, it is envisaged that cells with DNA with high GC-content undergo autolysis, thereby reducing the longevity of the cell per se. Due to the robustness endowed to the genetic material in high-GC organisms, it was commonly believed that the GC content played a vital part in temperature adaptation, a hypothesis which has recently been refuted. In PCR experiments, the GC-content of primers is used to predict their annealing temperature to the template DNA. A higher GC-content level indicates a higher melting temperature.
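    The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the tool used in the project: the function names `gc_content` and `wallace_tm` are hypothetical, and the melting-temperature estimate uses the common rule of thumb for short primers, Tm = 2(A+T) + 4(G+C) degrees Celsius, which the slides do not mention and which is only an approximation.

```python
def gc_content(seq: str) -> float:
    """Return the percentage of G and C among the unambiguous bases A, C, G, T/U."""
    seq = seq.upper()
    gc = sum(seq.count(b) for b in "GC")
    total = gc + sum(seq.count(b) for b in "ATU")
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T/U bases")
    return 100.0 * gc / total


def wallace_tm(primer: str) -> int:
    """Rule-of-thumb melting temperature in deg C for a short primer (assumption)."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


if __name__ == "__main__":
    print(gc_content("ATGCGC"))      # 4 of the 6 bases are G or C
    print(wallace_tm("ATGCGCATTA"))  # 6 A/T bases and 4 G/C bases
```

    Ambiguous symbols such as `n` (present in some of the sequences later in this deck) are ignored rather than counted, so they do not dilute the percentage.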
  • 6. THE EVOLUTIONARY TREE OF BACTERIA, WHERE THE STUDY OF G-C CONTENT IS AN ANALYTICAL TOOL. The guanine plus cytosine (GC) content in bacteria ranges from ~20% to 75%, whereas eukaryotic genomes often have GC contents within a restricted range of ~35-50% (about 40%-45% in vertebrates).
  • 7. Some Bacteria with low G-C content -
  • 8. Some Bacteria with high G-C content-
  • 9. For our convenience, we chose Carbonic Anhydrase because it is present in bacteria across the whole G-C content spectrum. The carbonic anhydrases (or carbonate dehydratases) form a family of enzymes that catalyze the rapid conversion of carbon dioxide and water to bicarbonate and protons, a reaction that occurs rather slowly in the absence of a catalyst.[1] The active site of most carbonic anhydrases contains a zinc ion; they are therefore classified as metalloenzymes. THE CARBONIC ANHYDRASE PROTEIN-
  • 10. In our analysis, we chose the following bacteria-
    1) Methanococcus voltae A3 (UI-A8TF20) (G-Cc=27%)
    2) Staphylococcus carnosus (UI-B9DMU8_STACT) (G-Cc=34%)
    3) Vibrio cholerae (UI-Q9KMP6_VIBCH) (G-Cc=47%)
    4) Escherichia coli (UI-P61517) (G-Cc=50%)
    5) Truepera radiovictrix DSM1703 (UI-ADI14363) (G-Cc=68.2%)
    6) Salinispora arenicola (UI-A8MOD8) (G-Cc=69.2%)
    7) Frankia CcI (UI-Q2JF50) (G-Cc=71%)
    *UI stands for the UniProt accession number of the Carbonic Anhydrase protein of the respective bacterium.
    We begin by analyzing the Carbonic Anhydrase protein from these bacteria. The peptide sequences are as follows-
    >Methanococcus voltae Carbonic Anhydrase Protein
    LN*LFNLASVNVNHKPFNFHIFRNCRVIFD*FDTFQHVFFFVIHFTHPSFKVWRKVWIYSSFNHFFSYLFNICSCHSTVGMTYDSYLFNI*TVYCNY*RP*YIVCNNITCVFDDFCVASF*THFFR*EIYESCIHTSYYC*FLFRFGFCSDSFTYTQ
    >Staphylococcus carnosus Carbonic Anhydrase Protein
    YPXXXMTLLESILAYNKDFVGNKEFENYTTSKKPDKKAVLFTCMDTRLQDLGTKALGFNNGDLKVVKNAGAIITHPYGSTIKSLLVGIYALGAEEIIIMAHKDCGMGCLDVSTVKDAMKERGVTEETFKIIEHSGVDVDSFLQGFKDAEENVRRNIDMVYNHPLFDKSVPIHGLVIDPHTGELDLIQDGYELAAQNK*
    >Vibrio cholerae Carbonic Anhydrase Protein
    MKKTTWVLAMVASMSFGVQASEWGYEGEHAPEHWGKVAPLCAEGKNQSPIDVAQSVEADLQPFTLNYQGQVVGLLNNGHTLQAIVRGNNPLQIDGKTFQLKQFHFHTPSENLLKGKQFPLEAHFVHADEQGNLAVVAVMYQVGSENPLLKVLTADMPTKGNSTQLTQGIPLADWIPESKHYYRFNGSLTTPPCSEGVRWIVLKEPAHLSNQQEQQLSAVMGHNNRPVQPHNARLVLQAD*
    >1st Escherichia coli Carbonic Anhydrase Protein
    LFVVGVFQLEVGDPVTVTLLKGFAVSRCDIQITQQAVVNAVGPAVNGDFLPAFPR*LHNSGVAQVIHLFHDVQFTQGIQTALLRHFAEQ*AMFEPDIADMQQPVVDKPQFRVFNCGLYAAATVV
  • 11. >2nd Escherichia coli Carbonic Anhydrase Final rip
    MKDIDTLISNNALWSKMLVEEDPGFFEKLAQAQKPRFLWIGCSDSRVPAERLTGLEPGELFVHRNVANLVIHINNWLLHIRDIWFKHSSLLGEMPQERRLDTLCELNVMEQVYNLGHSTIMQSAWKRGQKVTIHGWAYGIHDGLLRDLDVTATNRETLEQRYRHGISNLKLKHANHK*
    >3rd Escherichia coli Carbonic Anhydrase Final rip2
    VKEIIDGFLKFQREAFPKREALFKQLATQQSPRTLFISCSDSRLVPELVTQREPGDLFVIRNAGNIVPSYGPEPGGVSASVEYAVAALRVSDIVICGHSNCGAMTAIASCQCMDHMPAVSHWLRYADSARVVNEARPHSDLPSKAAAMVRENVIAQLANLQTHPSVRLALEEGRIALHGWVYDIESGSIAAFDGATRQFVPLAANPRVCAIPLRQPTAA*
    >Truepera radiovictrix DSM1703 Carbonic Anhydrase Protein
    S*PFQKRAVSGRAG*KGCRQQLEPARLEVVHGADDGERALGDARL*GRVRGDEANGRLDVLPHGPLERTPRPPLSRVAATSGAPQSGLERPHEGRQRGVGAPLLEGCGGGRDRAAAGVPQHHDERHAEHRDAVGEARQNRVVDDVAGDPVGKEVAQALVEDDLRRHARVGAAEHRREGVLLARQGRAPARVLVRVRHAPLEVALVPGQQALERPLGGQGRLGGGH
    >1 Salinispora arenicola Carbonic Anhydrase 1st Protein
    MNCPGTPDTQPGSHPVSSSGIGGSRSGPVGPEQALAELYDGNRRFAVGVPIRPHQDIDRRVALADGQQPFAVIVGCSDSRLAAEIIFDRGLGDLFVVRTAGHTVGPEVLGSVEYAVTVLGAPLVVVLGHDSCGAVQAARTADATGAPASGHLRAVVDGVVPSVRRAGARGVTEIDQIVDIHIEQTVEAVLGRSEAVAAAVAGGRCAVVGMSYRLTAGEVHTVTAVGLAAPTTPPAAPETRPSAGPA*
    >2 Salinispora arenicola Carbonic Anhydrase 2nd Protein
    XXTXXESGRVAESESTAFRWAGGRCGRACGVFVDEGALVGDQRITDSVAHHAHRRIREADGGQPAVGAWRPSTTQPGSSSASSRGPRTEALWGGPRMH*LAGAA*TPLRHRPG*SFRDTYGR*GDRPSGHWFCRVWTSDQWHSAHRGPWASALRRRQGGVHLPS*GQAAARQPTGDRYGPPAGV*TGSLSGERRPDRRHGPSPGRADRKRPALQPGTSPTRGEAGPCRGQCLLFPRYRRGGSPQWQTLL
    >1 Frankia CcI Carbonic Anhydrase 1st Protein
    CPSPTTT*PTTPPTRRPSPGRFRCRRPSTSPPSPAWTHGSTSTRSLAWATARLTSSATPAASSPTTRSVPSRSASACSAPARSS*STTPTAAC*PSPTTILNARSRTRPGSNQNGPWSRLPTWPKTYASRLRGSRRARSSRIPTPSAASSSMLPPDCSPKSR
  • 12. >2 Frankia CcI3 Carbonic Anhydrase 2nd Protein
    VDTDDHTAVDPVADVHADDVHADTVRPADTVSPVSGAATATELLLSYAAGHPARRREAGLPALPGARPRLGVAVVACMDVRIQVEALLGLVEGDAHILRNAGGVITPDVVRSLAVSQHVLGTTEIILLHHTGCGLERITDDGFRDQLECKTGVRPEWAVYSFPDVEEDVRKSVRVLRSSPFLQSTTSVRGFVYQVETGALVEVLP*
    We have 3 protein sequences for E. coli and 2 sequences each for Salinispora and Frankia. We now compare them amongst themselves. For E. coli- The sequence marked "Escherishsia" is the 1st sequence. The sequence marked "Ecoli" is the 2nd sequence. The sequence marked "Final" is the 3rd sequence.
  • 13. For Salinispora- For Frankia- After viewing the alignments of the suspected Carbonic Anhydrases within the same species, we now align the proteins from all the sources, including every protein from the species that have more than one.
  • 14. The alignment sequence of the bacteria is as follows-
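    The slides do not say which alignment program produced the figures, so the following is only a toy sketch of what a global pairwise alignment does, using the classic Needleman-Wunsch dynamic programme. The scoring values (match +1, mismatch -1, gap -2) and the function names are assumptions chosen for illustration; real protein aligners use substitution matrices and more refined gap penalties.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Globally align two sequences; return (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical, non-gap columns as a percentage of all alignment columns."""
    same = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * same / len(aligned_a)


if __name__ == "__main__":
    top, bottom, s = needleman_wunsch("ACGT", "AGT")
    print(top)
    print(bottom)
    print(s, percent_identity(top, bottom))
```

    The table has one cell per pair of positions, so this sketch is fine for sequences of a few hundred residues like the ones in this deck but would not scale to whole genomes.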
  • 15. Analysis- we can see two things from the above. 1) Bacteria with high G-C content have two genes for Carbonic Anhydrase and consequently two proteins suspected to be Carbonic Anhydrase. 2) Bacteria with high G-C content substitute, in their proteins, similar amino acids that are encoded by G-C rich codons. We will elaborate on the 2nd point later using the codon plot, where we show the corresponding codons in the DNA of the Carbonic Anhydrase gene of these bacteria. Now we move on to analyzing the DNA of the Carbonic Anhydrase genes. The DNA sequences are as follows-
    >Methanococcus voltae Carbonic Anhydrase of 471 bases
    ttaaattaactttttaatctcgccagtgttaatgtcaatcataagcccttcaacttccacatctttaggaattgcagggtgatttttgattaattcgacacctttcaacacgttttcttcttcgttatccattttacccatccaagcttcaaagtctggcgtaaagtatggatttactcctcttttaatcatttcttttcttatctcttcaatatctgctcctgccattccacagtcggtatgacctacgatagctatcttttcaacatctaaacagtatattgcaactactaacgaccttaatacatcgtctgtaataatattacctgcgtttttgatgacttttgcgtcgcctctttctaaacccattttttcaggtaagaaatttacgagtcttgtatccatacaagttattactgctaatttctttttcggtttggcttctgctccgatagtttcacctatactcaa
    >Staphylococcus carnosus Carbonic Anhydrase of 594 bases
    taccccancancanaatgacgttattagaaagcattttagcttataataaagattttgtcggcaacaaagaatttgaaaactatacaacaagtaaaaaaccagataaaaaagcagtgttatttacatgtatggatacacgtttgcaagatttaggtacaaaagcactcggttttaataatggtgacttgaaagttgttaaaaatgcaggtgcaattatcacgcacccatatggttcaactataaaaagcttactagtaggtatttatgcattaggtgctgaagaaattattattatggcacataaagattgcggaatgggttgtcttgatgtcagcactgttaaagacgcaatgaaagaacgtggcgtaacagaagaaacatttaaaatcatcgaacattctggtgtagatgtagacagctttttacaaggtttcaaagatgctgaagaaaatgtccgcagaaatatcgatatggtatataatcatcccttatttgataaatccgtacctattcacggcttagtcatcgatcctcatacgggggaattagatttaattcaagacggctatgaattagctgctcaaaataaataa
  • 16. >Vibrio cholerae Carbonic Anhydrase of 720 basesatgaaaaagacaacgtgggtattagcgatggtagccagtatgagcttcggcgtacaggcttccgagtgggggtatgaaggagagcatgctccggagcattggggcaaagttgcccctctttgcgcagagggtaaaaatcaaagcccgattgatgtcgcgcaaagcgtagaagcggatctacagcctttcacgctcaattatcaagggcaagtggttgggctgctcaataacgggcacactttacaagcgatagtccgtggtaataacccactgcagatcgatggcaaaacgtttcagcttaagcagtttcattttcataccccttctgaaaatttgctaaaaggaaaacaattcccactggaagcgcattttgttcatgccgacgagcaaggcaatctggcggttgttgcggtgatgtaccaagtggggtcggaaaatccgctgcttaaggttctcacggcggatatgccgaccaaagggaattcgactcagctcacgcaagggatccctttggctgattggatcccagaatcgaagcactactatcgtttcaatggttcattgactacgccgccttgcagtgaaggtgtacgttggattgtgttaaaagagccagcacatttgtcgaatcaacaagagcagcagcttagtgccgtgatgggacacaataatcgacccgtacaaccgcataatgctcgtcttgtcttgcaagccgactaa>Escherichia coli Carbonic Anhydrase of 372 basesttatttgtggttggcgtgtttcagcttgaggttggagatcccgtgacggtaacgttgctcaagggtttcgcggttagtcgctgtgacatccagatcacgcagcaagccgtcgtgaatgccgtaggcccagccgtgaatggtgactttctgcccgcgtttccacgctgattgcataatagtggagtggcccaggttatacacctgttccatgacgttcagttcacacaaggtatccagacggcgctcttgcggcatttcgccgagcaatgagctatgtttgaaccagatatcgcggatatgcagcagccagttgttgataagccccagttccgggttttcaactgcggcttgtacgccgccgcaaccgtagtg>123 Escherichia coli carbonic Anhydrase Finalaagccccagttccgggttttcaactgcggcttgtacgccgccgcaaccgtagtggccacagataataatgtgttcaacttcgagtacatccactgcatactgaaccacggaaaggcagttcaggtcagtttatttgtggttggcgtgtttcagcttgaggttggaaatcccgtgacggtaacgttgctcaagggtttcgcggttggtggcggtaacatccagatcacgcagcaagccgtcgtgaatgccgtaggcccagccgtgaatggtaactttctgcccgcgtttccacgctgattgcataatggtggagtggcccaggttatacacctgttccatgacgttcagttcacacaaggtatccagacggcgctcttgcggcatttcgccgagcaatgagctatgtttgaaccagatatcgcggatatgcagcagccagttgttgatgtgaatgaccaggttagcaacattacggtgaacaaagagttcgcccggctcaagaccggttaaacgttctgcaggaacgcgactgtcggaacatccaatccatagaaagcgcggtttttgcgcttgtgccagtttctcaaaaaacccgggatcctcttccaccagcatttttgaccatagtgcattgttgctgatgagtgtatctatgtcttt cat>456 Escherichia coli Carbonic Anhydrase Final 
2gtgaaagagattattgatggattccttaaattccagcgcgaggcatttccgaagcgggaagccttgtttaaacagctggcgacacagcaaagcccgcgcacactttttatctcctgctccgacagccg
  • 17. tctggtccctgagctggtgacgcaacgtgagcctggcgatctgttcgttattcgcaacgcgggcaatatcgtcccttcctacgggccggaacccggtggcgtttctgcttcggtggagtatgccgtcgctgcgcttcgggtatctgacattgtgatttgtggtcattccaactgtggcgcgatgaccgccattgccagctgtcagtgcatggaccatatgcctgccgtctcccactggctgcgttatgccgattcagcccgcgtcgttaatgaggcgcgcccgcattccgatttaccgtcaaaagctgcggcgatggtacgtgaaaacgtcattgctcagttggctaatttgcaaactcatccatcggtgcgcctggcgctcgaagaggggcggatcgccctgcacggctgggtctacgacattgaaagcggcagcatcgcagcttttgacggcgcaacccgccagtttgtgccactggccgctaatcctcgcgtttgtgccataccgctacgccaaccgaccgcagcgtaa>Truepera radiovictrix DSM1703 Carbonic Anhydrase consistingof 675 basestcataaccgttccaaaagcgggccgtgagcgggcgcgctgggtaaaaggggtgtcggcagcagctcgagcccgcccgtctcgaggtcgtacacggcgccgacgacggcgagcgtgcgctgggcgatgcgcgcctttaggggcgggtgcgcggcgatgaggcgaacggacgcctcgacgttctccctcacggcccccttgagcgtacaccccgtccccccctcagccgtgtcgcagcgacgagcggcgcgccgcaaagcgggctcgagcgcccgcacgaggggcgtcagcgaggggtcggcgcccccctcctcgagggctgcggcggcggccgcgaccgcgccgcagccggtgtgccccagcaccacgatgagcggcacgccgagcaccgagacgccgtaggtgaggctcgccaaaatcgcgtcgtcgacgatgttgccggcgacccggttggtaaagaggtcgcccaagccctggtcgaagatgatctgcggcggcacgcgcgagtcggcgcagccgagcaccgccgcgaaggggtgctgctcgcgcgtcaaggacgcgcgccagcgcgcgtcttggtgcgggtgcgccatgcgcccctcgaggtagcgctggtgcccggccaacaggcgctcgagcgccccttggggggtcaggggcgtctggggggcggtcat>Salinispora arenicola Carbonic Anhydrase 1 of 741 
bases.atgaactgcccaggaacgcccgacacacagccgggctcgcacccggtgtcctccagtggaatcggcggttcccggagcgggccggtcgggcccgagcaggcgcttgccgagttgtacgacggcaaccggcgattcgccgttggtgttccgatccgcccacaccaggacatcgaccgtcgggtcgccctggcggatggtcagcagcccttcgcggtgatcgtcggctgttccgactcccgacttgctgctgagatcatctttgaccgtggtctcggtgacctgttcgtggtacgcaccgctgggcacacggtcgggccagaggtgctgggcagcgtcgagtacgcggtcaccgtgctgggtgcgccgctggtggtggtgctcggccacgactcctgtggagcggtacaggcggcccggaccgccgacgccaccggcgcaccggcgtccgggcacctccgcgctgtggtggacggggtggtgccgagcgtgcgtcgggccggggcccgtggggttaccgagatcgaccagatcgtcgacattcatatcgagcagaccgttgaggcggtgcttggccgttctgaggcggtcgcagccgcggtggccggcggacggtgtgcggtggtgggaatgtcgtaccggctcaccgcaggtgaggtgcacacggttaccgcggttggcctcgcggcgccgaccacaccaccggccgcgcctgagacccgccccagcgccggaccggcgtaa>abc Salinispora arenicola Carbonic Anhydrase 2 of 748 basesnaancacancanatgaatcgggccgtgtggccgagtcggagagcactgctttccggtgggctggtgggcgctgtggccgtgcttgcggggtgttcgtcgacgaaggcgcgctcgtcggc
  • 18. gaccagcgcatcaccgacagcgtcgcccaccacgcccaccgccgcattcgagaggctgatggagggcaaccagcggtgggtgcgtggagaccttcaacaacccaaccgggatccagctcggcgtcaagtcgtggcccacgaacagaagccctttggggcggtcctcgcatgcattgactcgcgggtgccgcctgaactcctcttcgacaccggcctgggtgatcttttcgtgacacgtacgggaggtgaggcgatcggcccagtggtcactggttctgtcgagtttggacctctgaccagtggcactccgctcatcgtggtccttgggcatcagcgttgcggcgccgtcaaggcggcgtacacctcccttcgtgagggcaagccgctgcccggcaacctaccggcgatcgttacggccctccagccggcgtatgaacaggtagcctcagcggggagcgccgacccgatcgacgccatggcccgagcccaggccgagctgatcgcaaacgacctgcgctccaacccggaactagccccactcgtggcgaagcgggaccttgccgtggtcagtgcctactattccctcgataccggcgcggtggaagtcctcagtggcagaccctcctga>Frankia CcI Carbonic Anhydrase 1 of 488 basestgtccgtcaccgacgactacctgaccaacaacgccgcctacgcgaagaccttcgccgggccgcttccgctgccgccgtccaagcacatcgccgccgtcgcctgcatggacgcacggctcaacgtctacgcgatccttggcctgggcgacggcgaggctcacgtcatccgcaacgccggcggcgtcgtcaccgacgacgagatccgttccctcgcgatcagccagcgcctgctcggcacccgcgagatcatcctgatccaccacaccgactgcggcatgctgaccttcaccgacgacgattttaaacgctcgatccaggacgagaccgggatcaaaccagaatgggccgtggagtcgtttaccgacctggccgaagacatacgccagtcgattgcgcggatcaaggcgagcccgttcatcccgcataccgacgccatccgcggcttcatcttcgatgttgccaccggactgctcaccgaagtcgcgtga>xyz Frankia CcI3 Carbonic Anhydrase 2 of 618 basesgtggacaccgatgaccacaccgctgtcgaccccgttgccgatgtccatgcagacgatgtccatgcggacaccgtgcgccccgcggatacggtgagcccggtgagcggcgctgccacggcgaccgaactcctgctgagctacgctgcaggtcaccccgcccggcggcgggaggccgggctacctgccctgcccggcgcgcggccgcgcctgggcgtcgcggtggttgcgtgtatggacgtgcggatccaggtggaggccttgctcggtcttgtcgaaggtgacgcccacatcctgcgcaacgccggtggtgtcatcaccccggatgtggtccgctcgctcgccgtgagccagcacgtgctgggaacgacggagatcattcttttgcatcacaccgggtgtggtctcgaaaggatcaccgacgacgggttccgggaccagttggagtgcaagacgggcgttcgtcccgaatgggccgtgtattcctttcccgatgtcgaggaggacgtgcgcaagtccgtcagggtgctgcgttcgtcgccgttcctgcagtccaccacctcggtacgcgggttcgtctaccaggtggagaccggggcactggtcgaggttctgccgtagWe will now proceed to compare the translation product of the ORF of the gene with theoriginal protein product. 
Methanococcus produces the protein in reading frame 1 of the reverse strand of the DNA segment. It does not start with ATG: the first amino acid is L in place of M. Staphylococcus and Vibrio do the same in frame 1 of the forward direction. The same is observed in Frankia and Salinispora.
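    Reading frames can be checked by translating a DNA segment in all six frames. The sketch below is an illustration only: it assumes the standard genetic code (NCBI translation table 1), whereas the slides do not state which table or tool was used, and the helper names are hypothetical. Codons containing ambiguous bases such as `n` are reported as `X`.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement; non-ACGT symbols are left unchanged."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str, frame: int = 0) -> str:
    """Translate from offset `frame` (0, 1 or 2); '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(frame, len(dna) - 2, 3))


def six_frames(dna: str) -> dict:
    """Translate both strands in all three frames, keyed '+1'..'+3' and '-1'..'-3'."""
    rc = reverse_complement(dna)
    frames = {}
    for f in range(3):
        frames[f"+{f + 1}"] = translate(dna, f)
        frames[f"-{f + 1}"] = translate(rc, f)
    return frames


if __name__ == "__main__":
    # First 18 bases of the Methanococcus voltae segment listed in this deck.
    print(translate("ttaaattaactttttaat"))
```

    Run on the first 18 bases of the Methanococcus segment as listed, frame 1 gives `LN*LFN`, which matches the start of the peptide shown for that organism earlier in the deck.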
  • 19. The gene product is typically labeled ‘orf’. 1) Methanococcus voltae A3- 2) Staphylococcus carnosus- 3) Vibrio cholerae-
  • 20. 3) The comparison of the E. coli gene product and protein is as follows- For the rest, we will compare only one suspected protein and gene product, for consistency. For Truepera-
  • 21. 5) For Salinispora- 6) For Frankia-
  • 22. Codon Analysis is as follows- Results for 411 residue sequence "Methanococcus voltae Carbonic Anhydrase of 471 bases"
    AmAcid Codon Number /1000 Fraction
    Ala GCG 0.00 0.00 0.00
    Ala GCA 0.00 0.00 0.00
    Ala GCT 0.00 0.00 0.00
    Ala GCC 1.00 7.30 1.00
    Cys TGT 2.00 14.60 0.20
    Cys TGC 8.00 58.39 0.80
    Asp GAT 4.00 29.20 0.67
    Asp GAC 2.00 14.60 0.33
    Glu GAG 1.00 7.30 0.50
    Glu GAA 1.00 7.30 0.50
    Phe TTT 12.00 87.59 0.50
    Phe TTC 12.00 87.59 0.50
    Gly GGG 0.00 0.00 0.00
    Gly GGA 0.00 0.00 0.00
    Gly GGT 1.00 7.30 0.50
    Gly GGC 1.00 7.30 0.50
    His CAT 6.00 43.80 0.86
    His CAC 1.00 7.30 0.14
    Ile ATA 0.00 0.00 0.00
    Ile ATT 4.00 29.20 0.40
    Ile ATC 6.00 43.80 0.60
    Lys AAG 0.00 0.00 0.00
    Lys AAA 2.00 14.60 1.00
    Leu TTG 0.00 0.00 0.00
    Leu TTA 0.00 0.00 0.00
    Leu CTG 0.00 0.00 0.00
    Leu CTA 0.00 0.00 0.00
    Leu CTT 2.00 14.60 0.67
    Leu CTC 1.00 7.30 0.33
    Met ATG 1.00 7.30 1.00
  • 23.
    Asn AAT 5.00 36.50 0.71
    Asn AAC 2.00 14.60 0.29
    Pro CCG 0.00 0.00 0.00
    Pro CCA 1.00 7.30 0.50
    Pro CCT 1.00 7.30 0.50
    Pro CCC 0.00 0.00 0.00
    Gln CAG 0.00 0.00 0.00
    Gln CAA 2.00 14.60 1.00
    Arg AGG 3.00 21.90 0.50
    Arg AGA 0.00 0.00 0.00
    Arg CGG 1.00 7.30 0.17
    Arg CGA 1.00 7.30 0.17
    Arg CGT 1.00 7.30 0.17
    Arg CGC 0.00 0.00 0.00
    Ser AGT 2.00 14.60 0.17
    Ser AGC 2.00 14.60 0.17
    Ser TCG 0.00 0.00 0.00
    Ser TCA 0.00 0.00 0.00
    Ser TCT 4.00 29.20 0.33
    Ser TCC 4.00 29.20 0.33
    Thr ACG 0.00 0.00 0.00
    Thr ACA 3.00 21.90 0.30
    Thr ACT 1.00 7.30 0.10
    Thr ACC 6.00 43.80 0.60
    Val GTG 1.00 7.30 0.10
    Val GTA 2.00 14.60 0.20
    Val GTT 3.00 21.90 0.30
    Val GTC 4.00 29.20 0.40
    Trp TGG 2.00 14.60 1.00
    Tyr TAT 5.00 36.50 0.45
    Tyr TAC 6.00 43.80 0.55
    End TGA 0.00 0.00 0.00
    End TAG 0.00 0.00 0.00
    End TAA 7.00 51.09 1.00
  • 24. Results for 594 residue sequence "Staphylococcus carnosus Carbonic Anhydrase of 594 bases"
    AmAcid Codon Number /1000 Fraction
    Ala GCG 0.00 0.00 0.00
    Ala GCA 7.00 35.90 0.58
    Ala GCT 5.00 25.64 0.42
    Ala GCC 0.00 0.00 0.00
    Cys TGT 2.00 10.26 0.67
    Cys TGC 1.00 5.13 0.33
    Asp GAT 12.00 61.54 0.75
    Asp GAC 4.00 20.51 0.25
    Glu GAG 0.00 0.00 0.00
    Glu GAA 13.00 66.67 1.00
    Phe TTT 7.00 35.90 0.88
    Phe TTC 1.00 5.13 0.13
    Gly GGG 1.00 5.13 0.06
    Gly GGA 1.00 5.13 0.06
    Gly GGT 10.00 51.28 0.63
    Gly GGC 4.00 20.51 0.25
    His CAT 4.00 20.51 0.67
    His CAC 2.00 10.26 0.33
    Ile ATA 1.00 5.13 0.07
    Ile ATT 8.00 41.03 0.57
    Ile ATC 5.00 25.64 0.36
    Lys AAG 0.00 0.00 0.00
    Lys AAA 17.00 87.18 1.00
    Leu TTG 2.00 10.26 0.11
    Leu TTA 13.00 66.67 0.72
    Leu CTG 0.00 0.00 0.00
    Leu CTA 1.00 5.13 0.06
    Leu CTT 1.00 5.13 0.06
    Leu CTC 1.00 5.13 0.06
    Met ATG 6.00 30.77 1.00
    Asn AAT 8.00 41.03 0.80
    Asn AAC 2.00 10.26 0.20
  • 25.
    Pro CCG 0.00 0.00 0.00
    Pro CCA 2.00 10.26 0.33
    Pro CCT 2.00 10.26 0.33
    Pro CCC 2.00 10.26 0.33
    Gln CAG 0.00 0.00 0.00
    Gln CAA 4.00 20.51 1.00
    Arg AGG 0.00 0.00 0.00
    Arg AGA 1.00 5.13 0.25
    Arg CGG 0.00 0.00 0.00
    Arg CGA 0.00 0.00 0.00
    Arg CGT 2.00 10.26 0.50
    Arg CGC 1.00 5.13 0.25
    Ser AGT 1.00 5.13 0.13
    Ser AGC 4.00 20.51 0.50
    Ser TCG 0.00 0.00 0.00
    Ser TCA 1.00 5.13 0.13
    Ser TCT 1.00 5.13 0.13
    Ser TCC 1.00 5.13 0.13
    Thr ACG 3.00 15.38 0.25
    Thr ACA 7.00 35.90 0.58
    Thr ACT 2.00 10.26 0.17
    Thr ACC 0.00 0.00 0.00
    Val GTG 1.00 5.13 0.07
    Val GTA 6.00 30.77 0.43
    Val GTT 3.00 15.38 0.21
    Val GTC 4.00 20.51 0.29
    Trp TGG 0.00 0.00 0.00
    Tyr TAT 6.00 30.77 0.86
    Tyr TAC 1.00 5.13 0.14
    End TGA 0.00 0.00 0.00
    End TAG 0.00 0.00 0.00
    End TAA 1.00 5.13 1.00
  • 26. Results for 660 residue sequence "Vibrio cholerae Carbonic Anhydrase of 720 bases"
    AmAcid Codon Number /1000 Fraction
    Ala GCG 7.00 31.82 0.44
    Ala GCA 2.00 9.09 0.13
    Ala GCT 3.00 13.64 0.19
    Ala GCC 4.00 18.18 0.25
    Cys TGT 0.00 0.00 0.00
    Cys TGC 2.00 9.09 1.00
    Asp GAT 5.00 22.73 0.71
    Asp GAC 2.00 9.09 0.29
    Glu GAG 7.00 31.82 0.50
    Glu GAA 7.00 31.82 0.50
    Phe TTT 4.00 18.18 0.57
    Phe TTC 3.00 13.64 0.43
    Gly GGG 7.00 31.82 0.41
    Gly GGA 3.00 13.64 0.18
    Gly GGT 4.00 18.18 0.24
    Gly GGC 3.00 13.64 0.18
    His CAT 8.00 36.36 0.73
    His CAC 3.00 13.64 0.27
    Ile ATA 1.00 4.55 0.17
    Ile ATT 2.00 9.09 0.33
    Ile ATC 3.00 13.64 0.50
    Lys AAG 3.00 13.64 0.30
    Lys AAA 7.00 31.82 0.70
    Leu TTG 5.00 22.73 0.22
    Leu TTA 2.00 9.09 0.09
    Leu CTG 5.00 22.73 0.22
    Leu CTA 2.00 9.09 0.09
    Leu CTT 5.00 22.73 0.22
    Leu CTC 4.00 18.18 0.17
    Met ATG 3.00 13.64 1.00
    Asn AAT 13.00 59.09 0.87
    Asn AAC 2.00 9.09 0.13
  • 27.
    Pro CCG 6.00 27.27 0.38
    Pro CCA 4.00 18.18 0.25
    Pro CCT 5.00 22.73 0.31
    Pro CCC 1.00 4.55 0.06
    Gln CAG 7.00 31.82 0.35
    Gln CAA 13.00 59.09 0.65
    Arg AGG 0.00 0.00 0.00
    Arg AGA 0.00 0.00 0.00
    Arg CGG 0.00 0.00 0.00
    Arg CGA 1.00 4.55 0.20
    Arg CGT 4.00 18.18 0.80
    Arg CGC 0.00 0.00 0.00
    Ser AGT 2.00 9.09 0.18
    Ser AGC 2.00 9.09 0.18
    Ser TCG 4.00 18.18 0.36
    Ser TCA 1.00 4.55 0.09
    Ser TCT 1.00 4.55 0.09
    Ser TCC 1.00 4.55 0.09
    Thr ACG 5.00 22.73 0.50
    Thr ACA 0.00 0.00 0.00
    Thr ACT 3.00 13.64 0.30
    Thr ACC 2.00 9.09 0.20
    Val GTG 5.00 22.73 0.29
    Val GTA 3.00 13.64 0.18
    Val GTT 6.00 27.27 0.35
    Val GTC 3.00 13.64 0.18
    Trp TGG 4.00 18.18 1.00
    Tyr TAT 3.00 13.64 0.60
    Tyr TAC 2.00 9.09 0.40
    End TGA 0.00 0.00 0.00
    End TAG 0.00 0.00 0.00
    End TAA 1.00 4.55 1.00
  • 28. Results for 372 residue sequence "Escherichia coli Carbonic Anhydrase of 372 bases"
    AmAcid Codon Number /1000 Fraction
    Ala GCG 4.00 32.26 0.31
    Ala GCA 1.00 8.06 0.08
    Ala GCT 1.00 8.06 0.08
    Ala GCC 7.00 56.45 0.54
    Cys TGT 1.00 8.06 0.50
    Cys TGC 1.00 8.06 0.50
    Asp GAT 4.00 32.26 0.57
    Asp GAC 3.00 24.19 0.43
    Glu GAG 2.00 16.13 0.67
    Glu GAA 1.00 8.06 0.33
    Phe TTT 5.00 40.32 0.45
    Phe TTC 6.00 48.39 0.55
    Gly GGG 0.00 0.00 0.00
    Gly GGA 2.00 16.13 0.25
    Gly GGT 3.00 24.19 0.38
    Gly GGC 3.00 24.19 0.38
    His CAT 3.00 24.19 0.75
    His CAC 1.00 8.06 0.25
    Ile ATA 1.00 8.06 0.20
    Ile ATT 0.00 0.00 0.00
    Ile ATC 4.00 32.26 0.80
    Lys AAG 2.00 16.13 1.00
    Lys AAA 0.00 0.00 0.00
    Leu TTG 4.00 32.26 0.40
    Leu TTA 1.00 8.06 0.10
    Leu CTG 2.00 16.13 0.20
    Leu CTA 0.00 0.00 0.00
    Leu CTT 1.00 8.06 0.10
    Leu CTC 2.00 16.13 0.20
    Met ATG 2.00 16.13 1.00
  • 29.
    Asn AAT 3.00 24.19 0.75
    Asn AAC 1.00 8.06 0.25
    Pro CCG 0.00 0.00 0.00
    Pro CCA 4.00 32.26 0.57
    Pro CCT 0.00 0.00 0.00
    Pro CCC 3.00 24.19 0.43
    Gln CAG 9.00 72.58 0.75
    Gln CAA 3.00 24.19 0.25
    Arg AGG 0.00 0.00 0.00
    Arg AGA 0.00 0.00 0.00
    Arg CGG 2.00 16.13 0.50
    Arg CGA 0.00 0.00 0.00
    Arg CGT 0.00 0.00 0.00
    Arg CGC 2.00 16.13 0.50
    Ser AGT 2.00 16.13 1.00
    Ser AGC 0.00 0.00 0.00
    Ser TCG 0.00 0.00 0.00
    Ser TCA 0.00 0.00 0.00
    Ser TCT 0.00 0.00 0.00
    Ser TCC 0.00 0.00 0.00
    Thr ACG 4.00 32.26 0.67
    Thr ACA 1.00 8.06 0.17
    Thr ACT 0.00 0.00 0.00
    Thr ACC 1.00 8.06 0.17
    Val GTG 7.00 56.45 0.37
    Val GTA 3.00 24.19 0.16
    Val GTT 8.00 64.52 0.42
    Val GTC 1.00 8.06 0.05
    Trp TGG 0.00 0.00 0.00
    Tyr TAT 0.00 0.00 0.00
    Tyr TAC 1.00 8.06 1.00
    End TGA 2.00 16.13 1.00
    End TAG 0.00 0.00 0.00
    End TAA 0.00 0.00 0.00
  • 30. Results for 660 residue sequence "456 Ecoli Carbonic anhydrase Final 2"
    AmAcid Codon Number /1000 Fraction
    Ala GCG 9.00 40.91 0.30
    Ala GCA 4.00 18.18 0.13
    Ala GCT 7.00 31.82 0.23
    Ala GCC 10.00 45.45 0.33
    Cys TGT 4.00 18.18 0.67
    Cys TGC 2.00 9.09 0.33
    Asp GAT 4.00 18.18 0.44
    Asp GAC 5.00 22.73 0.56
    Glu GAG 7.00 31.82 0.58
    Glu GAA 5.00 22.73 0.42
    Phe TTT 5.00 22.73 0.63
    Phe TTC 3.00 13.64 0.38
    Gly GGG 2.00 9.09 0.17
    Gly GGA 1.00 4.55 0.08
    Gly GGT 2.00 9.09 0.17
    Gly GGC 7.00 31.82 0.58
    His CAT 4.00 18.18 0.67
    His CAC 2.00 9.09 0.33
    Ile ATA 1.00 4.55 0.08
    Ile ATT 8.00 36.36 0.62
    Ile ATC 4.00 18.18 0.31
    Lys AAG 1.00 4.55 0.20
    Lys AAA 4.00 18.18 0.80
    Leu TTG 3.00 13.64 0.18
    Leu TTA 1.00 4.55 0.06
    Leu CTG 8.00 36.36 0.47
    Leu CTA 1.00 4.55 0.06
    Leu CTT 3.00 13.64 0.18
    Leu CTC 1.00 4.55 0.06
    Met ATG 4.00 18.18 1.00
  • 31.
    Asn AAT 4.00 18.18 0.57
    Asn AAC 3.00 13.64 0.43
    Pro CCG 7.00 31.82 0.47
    Pro CCA 2.00 9.09 0.13
    Pro CCT 5.00 22.73 0.33
    Pro CCC 1.00 4.55 0.07
    Gln CAG 6.00 27.27 0.60
    Gln CAA 4.00 18.18 0.40
    Arg AGG 0.00 0.00 0.00
    Arg AGA 0.00 0.00 0.00
    Arg CGG 3.00 13.64 0.19
    Arg CGA 0.00 0.00 0.00
    Arg CGT 4.00 18.18 0.25
    Arg CGC 9.00 40.91 0.56
    Ser AGT 0.00 0.00 0.00
    Ser AGC 5.00 22.73 0.29
    Ser TCG 2.00 9.09 0.12
    Ser TCA 2.00 9.09 0.12
    Ser TCT 2.00 9.09 0.12
    Ser TCC 6.00 27.27 0.35
    Thr ACG 1.00 4.55 0.14
    Thr ACA 2.00 9.09 0.29
    Thr ACT 1.00 4.55 0.14
    Thr ACC 3.00 13.64 0.43
    Val GTG 6.00 27.27 0.32
    Val GTA 2.00 9.09 0.11
    Val GTT 4.00 18.18 0.21
    Val GTC 7.00 31.82 0.37
    Trp TGG 2.00 9.09 1.00
    Tyr TAT 2.00 9.09 0.50
    Tyr TAC 2.00 9.09 0.50
    End TGA 0.00 0.00 0.00
    End TAG 0.00 0.00 0.00
    End TAA 1.00 4.55 1.00
  • 32. Results for 663 residue sequence "123 Ecoli carbonic Anhydrase Final"
    AmAcid Codon Number /1000 Fraction
    Ala GCG 6.00 27.15 0.32
    Ala GCA 2.00 9.05 0.11
    Ala GCT 2.00 9.05 0.11
    Ala GCC 9.00 40.72 0.47
    Cys TGT 1.00 4.52 0.17
    Cys TGC 5.00 22.62 0.83
    Asp GAT 5.00 22.62 0.71
    Asp GAC 2.00 9.05 0.29
    Glu GAG 5.00 22.62 0.83
    Glu GAA 1.00 4.52 0.17
    Phe TTT 9.00 40.72 0.45
    Phe TTC 11.00 49.77 0.55
    Gly GGG 1.00 4.52 0.06
    Gly GGA 4.00 18.10 0.25
    Gly GGT 7.00 31.67 0.44
    Gly GGC 4.00 18.10 0.25
    His CAT 5.00 22.62 0.56
    His CAC 4.00 18.10 0.44
    Ile ATA 2.00 9.05 0.18
    Ile ATT 2.00 9.05 0.18
    Ile ATC 7.00 31.67 0.64
    Lys AAG 4.00 18.10 0.50
    Lys AAA 4.00 18.10 0.50
    Leu TTG 6.00 27.15 0.38
    Leu TTA 1.00 4.52 0.06
    Leu CTG 3.00 13.57 0.19
    Leu CTA 0.00 0.00 0.00
    Leu CTT 1.00 4.52 0.06
    Leu CTC 5.00 22.62 0.31
    Met ATG 2.00 9.05 1.00
  • 33.
    Asn AAT 8.00 36.20 0.50
    Asn AAC 8.00 36.20 0.50
    Pro CCG 0.00 0.00 0.00
    Pro CCA 6.00 27.15 0.60
    Pro CCT 0.00 0.00 0.00
    Pro CCC 4.00 18.10 0.40
    Gln CAG 13.00 58.82 0.81
    Gln CAA 3.00 13.57 0.19
    Arg AGG 1.00 4.52 0.14
    Arg AGA 0.00 0.00 0.00
    Arg CGG 4.00 18.10 0.57
    Arg CGA 0.00 0.00 0.00
    Arg CGT 0.00 0.00 0.00
    Arg CGC 2.00 9.05 0.29
    Ser AGT 1.00 4.52 0.33
    Ser AGC 1.00 4.52 0.33
    Ser TCG 0.00 0.00 0.00
    Ser TCA 0.00 0.00 0.00
    Ser TCT 0.00 0.00 0.00
    Ser TCC 1.00 4.52 0.33
    Thr ACG 6.00 27.15 0.50
    Thr ACA 3.00 13.57 0.25
    Thr ACT 1.00 4.52 0.08
    Thr ACC 2.00 9.05 0.17
    Val GTG 10.00 45.25 0.36
    Val GTA 3.00 13.57 0.11
    Val GTT 11.00 49.77 0.39
    Val GTC 4.00 18.10 0.14
    Trp TGG 0.00 0.00 0.00
    Tyr TAT 1.00 4.52 0.33
    Tyr TAC 2.00 9.05 0.67
    End TGA 3.00 13.57 0.50
    End TAG 2.00 9.05 0.33
    End TAA 1.00 4.52 0.17
  • 34. Results for 675 residue sequence "Truepera radiovictrix DSM1703 Carbo Anhyd consisting of 675 bases"
    AmAcid Codon Number /1000 Fraction
    Ala GCG 12.00 53.33 0.41
    Ala GCA 3.00 13.33 0.10
    Ala GCT 2.00 8.89 0.07
    Ala GCC 12.00 53.33 0.41
    Cys TGT 1.00 4.44 0.50
    Cys TGC 1.00 4.44 0.50
    Asp GAT 6.00 26.67 0.46
    Asp GAC 7.00 31.11 0.54
    Glu GAG 15.00 66.67 0.88
    Glu GAA 2.00 8.89 0.12
    Phe TTT 0.00 0.00 0.00
    Phe TTC 1.00 4.44 1.00
    Gly GGG 11.00 48.89 0.33
    Gly GGA 2.00 8.89 0.06
    Gly GGT 5.00 22.22 0.15
    Gly GGC 15.00 66.67 0.45
    His CAT 2.00 8.89 0.18
    His CAC 9.00 40.00 0.82
    Ile ATA 0.00 0.00 0.00
    Ile ATT 0.00 0.00 0.00
    Ile ATC 0.00 0.00 0.00
    Lys AAG 2.00 8.89 0.67
    Lys AAA 1.00 4.44 0.33
    Leu TTG 2.00 8.89 0.10
    Leu TTA 0.00 0.00 0.00
    Leu CTG 6.00 26.67 0.29
    Leu CTA 0.00 0.00 0.00
    Leu CTT 2.00 8.89 0.10
    Leu CTC 11.00 48.89 0.52
    Met ATG 0.00 0.00 0.00
    Asn AAT 1.00 4.44 0.50
  • 35.
    Asn AAC 1.00 4.44 0.50
    Pro CCG 4.00 17.78 0.25
    Pro CCA 1.00 4.44 0.06
    Pro CCT 1.00 4.44 0.06
    Pro CCC 10.00 44.44 0.63
    Gln CAG 6.00 26.67 0.50
    Gln CAA 6.00 26.67 0.50
    Arg AGG 0.00 0.00 0.00
    Arg AGA 0.00 0.00 0.00
    Arg CGG 7.00 31.11 0.21
    Arg CGA 3.00 13.33 0.09
    Arg CGT 8.00 35.56 0.24
    Arg CGC 15.00 66.67 0.45
    Ser AGT 0.00 0.00 0.00
    Ser AGC 4.00 17.78 0.80
    Ser TCG 0.00 0.00 0.00
    Ser TCA 1.00 4.44 0.20
    Ser TCT 0.00 0.00 0.00
    Ser TCC 0.00 0.00 0.00
    Thr ACG 1.00 4.44 0.50
    Thr ACA 1.00 4.44 0.50
    Thr ACT 0.00 0.00 0.00
    Thr ACC 0.00 0.00 0.00
    Val GTG 7.00 31.11 0.32
    Val GTA 3.00 13.33 0.14
    Val GTT 3.00 13.33 0.14
    Val GTC 9.00 40.00 0.41
    Trp TGG 0.00 0.00 0.00
    Tyr TAT 0.00 0.00 0.00
    Tyr TAC 0.00 0.00 0.00
    End TGA 0.00 0.00 0.00
    End TAG 1.00 4.44 0.33
    End TAA 2.00 8.89 0.67
  • 36. Results for 765 residue sequence "Salinispora arenicola"
    AmAcid Codon Number /1000 Fraction
    Ala GCG 17.00 68.55 0.47
    Ala GCA 3.00 12.10 0.08
    Ala GCT 4.00 16.13 0.11
    Ala GCC 12.00 48.39 0.33
    Cys TGT 3.00 12.10 0.75
    Cys TGC 1.00 4.03 0.25
    Asp GAT 1.00 4.03 0.08
    Asp GAC 12.00 48.39 0.92
    Glu GAG 11.00 44.35 1.00
    Glu GAA 0.00 0.00 0.00
    Phe TTT 1.00 4.03 0.25
    Phe TTC 3.00 12.10 0.75
    Gly GGG 8.00 32.26 0.26
    Gly GGA 6.00 24.19 0.19
    Gly GGT 7.00 28.23 0.23
    Gly GGC 10.00 40.32 0.32
    His CAT 1.00 4.03 0.13
    His CAC 7.00 28.23 0.88
    Ile ATA 0.00 0.00 0.00
    Ile ATT 1.00 4.03 0.10
    Ile ATC 9.00 36.29 0.90
    Lys AAG 0.00 0.00 0.00
    Lys AAA 0.00 0.00 0.00
    Leu TTG 1.00 4.03 0.07
    Leu TTA 0.00 0.00 0.00
    Leu CTG 5.00 20.16 0.36
    Leu CTA 0.00 0.00 0.00
    Leu CTT 3.00 12.10 0.21
    Leu CTC 5.00 20.16 0.36
    Met ATG 2.00 8.06 1.00
    Asn AAT 0.00 0.00 0.00
    Asn AAC 2.00 8.06 1.00
  • 37.
    Pro CCG 10.00 40.32 0.53
    Pro CCA 4.00 16.13 0.21
    Pro CCT 1.00 4.03 0.05
    Pro CCC 4.00 16.13 0.21
    Gln CAG 8.00 32.26 1.00
    Gln CAA 0.00 0.00 0.00
    Arg AGG 0.00 0.00 0.00
    Arg AGA 0.00 0.00 0.00
    Arg CGG 7.00 28.23 0.39
    Arg CGA 2.00 8.06 0.11
    Arg CGT 5.00 20.16 0.28
    Arg CGC 4.00 16.13 0.22
    Ser AGT 1.00 4.03 0.07
    Ser AGC 4.00 16.13 0.27
    Ser TCG 2.00 8.06 0.13
    Ser TCA 0.00 0.00 0.00
    Ser TCT 1.00 4.03 0.07
    Ser TCC 7.00 28.23 0.47
    Thr ACG 3.00 12.10 0.20
    Thr ACA 2.00 8.06 0.13
    Thr ACT 0.00 0.00 0.00
    Thr ACC 10.00 40.32 0.67
    Val GTG 18.00 72.58 0.53
    Val GTA 2.00 8.06 0.06
    Val GTT 6.00 24.19 0.18
    Val GTC 8.00 32.26 0.24
    Trp TGG 0.00 0.00 0.00
    Tyr TAT 0.00 0.00 0.00
    Tyr TAC 3.00 12.10 1.00
    End TGA 0.00 0.00 0.00
    End TAG 0.00 0.00 0.00
    End TAA 1.00 4.03 1.00
  • 38. Results for 774 residue sequence "abc Salinispora" starting
    AmAcid Codon Number /1000 Fraction
    Ala GCG 13.00 52.85 0.37
    Ala GCA 5.00 20.33 0.14
    Ala GCT 2.00 8.13 0.06
    Ala GCC 15.00 60.98 0.43
    Cys TGT 1.00 4.07 0.33
    Cys TGC 2.00 8.13 0.67
    Asp GAT 3.00 12.20 0.30
    Asp GAC 7.00 28.46 0.70
    Glu GAG 6.00 24.39 0.55
    Glu GAA 5.00 20.33 0.45
    Phe TTT 2.00 8.13 0.40
    Phe TTC 3.00 12.20 0.60
    Gly GGG 5.00 20.33 0.23
    Gly GGA 3.00 12.20 0.14
    Gly GGT 4.00 16.26 0.18
    Gly GGC 10.00 40.65 0.45
    His CAT 1.00 4.07 0.33
    His CAC 2.00 8.13 0.67
    Ile ATA 0.00 0.00 0.00
    Ile ATT 1.00 4.07 0.17
    Ile ATC 5.00 20.33 0.83
    Lys AAG 5.00 20.33 1.00
    Lys AAA 0.00 0.00 0.00
    Leu TTG 0.00 0.00 0.00
    Leu TTA 0.00 0.00 0.00
    Leu CTG 8.00 32.52 0.32
    Leu CTA 2.00 8.13 0.08
    Leu CTT 7.00 28.46 0.28
    Leu CTC 8.00 32.52 0.32
    Met ATG 3.00 12.20 1.00
  • 39.
    AmAcid  Codon  Number  /1000   Fraction
    Asn     AAT    1.00    4.07    0.17
    Asn     AAC    5.00    20.33   0.83
    Pro     CCG    9.00    36.59   0.45
    Pro     CCA    3.00    12.20   0.15
    Pro     CCT    2.00    8.13    0.10
    Pro     CCC    6.00    24.39   0.30
    Gln     CAG    6.00    24.39   0.67
    Gln     CAA    3.00    12.20   0.33
    Arg     AGG    1.00    4.07    0.06
    Arg     AGA    2.00    8.13    0.11
    Arg     CGG    7.00    28.46   0.39
    Arg     CGA    1.00    4.07    0.06
    Arg     CGT    5.00    20.33   0.28
    Arg     CGC    2.00    8.13    0.11
    Ser     AGT    4.00    16.26   0.20
    Ser     AGC    2.00    8.13    0.10
    Ser     TCG    6.00    24.39   0.30
    Ser     TCA    2.00    8.13    0.10
    Ser     TCT    1.00    4.07    0.05
    Ser     TCC    5.00    20.33   0.25
    Thr     ACG    4.00    16.26   0.27
    Thr     ACA    2.00    8.13    0.13
    Thr     ACT    2.00    8.13    0.13
    Thr     ACC    7.00    28.46   0.47
    Val     GTG    13.00   52.85   0.57
    Val     GTA    1.00    4.07    0.04
    Val     GTT    1.00    4.07    0.04
    Val     GTC    8.00    32.52   0.35
    Trp     TGG    2.00    8.13    1.00
    Tyr     TAT    2.00    8.13    0.50
    Tyr     TAC    2.00    8.13    0.50
    End     TGA    1.00    4.07    1.00
    End     TAG    0.00    0.00    0.00
    End     TAA    0.00    0.00    0.00
  • 40. Results for 488 residue sequence "Frankia CcI Carbonic Anhydrase1 of 488 bases"
    AmAcid  Codon  Number  /1000   Fraction
    Ala     GCG    7.00    43.21   0.39
    Ala     GCA    4.00    24.69   0.22
    Ala     GCT    2.00    12.35   0.11
    Ala     GCC    5.00    30.86   0.28
    Cys     TGT    1.00    6.17    0.20
    Cys     TGC    4.00    24.69   0.80
    Asp     GAT    0.00    0.00    0.00
    Asp     GAC    1.00    6.17    1.00
    Glu     GAG    0.00    0.00    0.00
    Glu     GAA    0.00    0.00    0.00
    Phe     TTT    0.00    0.00    0.00
    Phe     TTC    1.00    6.17    1.00
    Gly     GGG    1.00    6.17    0.20
    Gly     GGA    2.00    12.35   0.40
    Gly     GGT    0.00    0.00    0.00
    Gly     GGC    2.00    12.35   0.40
    His     CAT    0.00    0.00    0.00
    His     CAC    1.00    6.17    1.00
    Ile     ATA    1.00    6.17    0.50
    Ile     ATT    1.00    6.17    0.50
    Ile     ATC    0.00    0.00    0.00
    Lys     AAG    2.00    12.35   1.00
    Lys     AAA    0.00    0.00    0.00
    Leu     TTG    3.00    18.52   0.50
    Leu     TTA    2.00    12.35   0.33
    Leu     CTG    0.00    0.00    0.00
    Leu     CTA    0.00    0.00    0.00
    Leu     CTT    0.00    0.00    0.00
    Leu     CTC    1.00    6.17    0.17
    Met     ATG    1.00    6.17    1.00
  • 41.
    AmAcid  Codon  Number  /1000   Fraction
    Asn     AAT    1.00    6.17    0.33
    Asn     AAC    2.00    12.35   0.67
    Pro     CCG    17.00   104.94  0.63
    Pro     CCA    4.00    24.69   0.15
    Pro     CCT    4.00    24.69   0.15
    Pro     CCC    2.00    12.35   0.07
    Gln     CAG    1.00    6.17    1.00
    Gln     CAA    0.00    0.00    0.00
    Arg     AGG    3.00    18.52   0.14
    Arg     AGA    4.00    24.69   0.18
    Arg     CGG    0.00    0.00    0.00
    Arg     CGA    6.00    37.04   0.27
    Arg     CGT    4.00    24.69   0.18
    Arg     CGC    5.00    30.86   0.23
    Ser     AGT    2.00    12.35   0.06
    Ser     AGC    2.00    12.35   0.06
    Ser     TCG    8.00    49.38   0.24
    Ser     TCA    12.00   74.07   0.35
    Ser     TCT    2.00    12.35   0.06
    Ser     TCC    8.00    49.38   0.24
    Thr     ACG    15.00   92.59   0.63
    Thr     ACA    4.00    24.69   0.17
    Thr     ACT    2.00    12.35   0.08
    Thr     ACC    3.00    18.52   0.13
    Val     GTG    0.00    0.00    0.00
    Val     GTA    0.00    0.00    0.00
    Val     GTT    1.00    6.17    1.00
    Val     GTC    0.00    0.00    0.00
    Trp     TGG    4.00    24.69   1.00
    Tyr     TAT    0.00    0.00    0.00
    Tyr     TAC    1.00    6.17    1.00
    End     TGA    3.00    18.52   1.00
    End     TAG    0.00    0.00    0.00
    End     TAA    0.00    0.00    0.00
  • 42. Results for 618 residue sequence "xyz Frankia CcI3 Carbonic Anhydrase 2 of 618 bases"
    AmAcid  Codon  Number  /1000   Fraction
    Ala     GCG    6.00    29.13   0.27
    Ala     GCA    3.00    14.56   0.14
    Ala     GCT    3.00    14.56   0.14
    Ala     GCC    10.00   48.54   0.45
    Cys     TGT    2.00    9.71    0.67
    Cys     TGC    1.00    4.85    0.33
    Asp     GAT    6.00    29.13   0.35
    Asp     GAC    11.00   53.40   0.65
    Glu     GAG    8.00    38.83   0.67
    Glu     GAA    4.00    19.42   0.33
    Phe     TTT    1.00    4.85    0.25
    Phe     TTC    3.00    14.56   0.75
    Gly     GGG    5.00    24.27   0.31
    Gly     GGA    1.00    4.85    0.06
    Gly     GGT    6.00    29.13   0.38
    Gly     GGC    4.00    19.42   0.25
    His     CAT    3.00    14.56   0.38
    His     CAC    5.00    24.27   0.63
    Ile     ATA    0.00    0.00    0.00
    Ile     ATT    1.00    4.85    0.17
    Ile     ATC    5.00    24.27   0.83
    Lys     AAG    2.00    9.71    1.00
    Lys     AAA    0.00    0.00    0.00
    Leu     TTG    3.00    14.56   0.15
    Leu     TTA    0.00    0.00    0.00
    Leu     CTG    10.00   48.54   0.50
    Leu     CTA    1.00    4.85    0.05
    Leu     CTT    2.00    9.71    0.10
    Leu     CTC    4.00    19.42   0.20
    Met     ATG    1.00    4.85    1.00
  • 43.
    AmAcid  Codon  Number  /1000   Fraction
    Asn     AAT    0.00    0.00    0.00
    Asn     AAC    1.00    4.85    1.00
    Pro     CCG    5.00    24.27   0.42
    Pro     CCA    0.00    0.00    0.00
    Pro     CCT    1.00    4.85    0.08
    Pro     CCC    6.00    29.13   0.50
    Gln     CAG    5.00    24.27   1.00
    Gln     CAA    0.00    0.00    0.00
    Arg     AGG    2.00    9.71    0.13
    Arg     AGA    0.00    0.00    0.00
    Arg     CGG    6.00    29.13   0.38
    Arg     CGA    0.00    0.00    0.00
    Arg     CGT    2.00    9.71    0.13
    Arg     CGC    6.00    29.13   0.38
    Ser     AGT    0.00    0.00    0.00
    Ser     AGC    4.00    19.42   0.36
    Ser     TCG    4.00    19.42   0.36
    Ser     TCA    0.00    0.00    0.00
    Ser     TCT    0.00    0.00    0.00
    Ser     TCC    3.00    14.56   0.27
    Thr     ACG    5.00    24.27   0.33
    Thr     ACA    0.00    0.00    0.00
    Thr     ACT    0.00    0.00    0.00
    Thr     ACC    10.00   48.54   0.67
    Val     GTG    14.00   67.96   0.47
    Val     GTA    1.00    4.85    0.03
    Val     GTT    4.00    19.42   0.13
    Val     GTC    11.00   53.40   0.37
    Trp     TGG    1.00    4.85    1.00
    Tyr     TAT    1.00    4.85    0.33
    Tyr     TAC    2.00    9.71    0.67
    End     TGA    0.00    0.00    0.00
    End     TAG    1.00    4.85    1.00
    End     TAA    0.00    0.00    0.00
  • 44. From the above tables, we conclude two things:
    1) The codon-usage plots of different gene ORFs from the same organism are the same except at a few minor points.
    2) The codon-usage plots of the organisms confirm the suspicion raised while analyzing the peptide sequences: the choice of codons differs to suit the G-C content of the organism.
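The three numeric columns in the tables above can be reproduced with a short script. The following is a minimal sketch, not the tool that produced the tables: it assumes the standard genetic code, an ORF read in frame from its first base, and that "/1000" means occurrences per thousand codons while "Fraction" is the share of a codon among all codons for the same amino acid. The names `codon_usage` and `usage_distance` are hypothetical, and the distance measure is only one illustrative way to compare two ORFs.

```python
from collections import Counter

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_usage(orf):
    """Return {codon: (amino_acid, number, per_thousand, fraction)}."""
    seq = orf.upper().replace("U", "T")
    # Split into complete codons; a trailing partial codon is ignored.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)
    total = sum(counts.values())
    per_amino = Counter()
    for codon, n in counts.items():
        per_amino[CODON_TABLE[codon]] += n
    usage = {}
    for codon, amino in CODON_TABLE.items():
        n = counts.get(codon, 0)
        per_thousand = 1000.0 * n / total if total else 0.0
        fraction = n / per_amino[amino] if per_amino[amino] else 0.0
        usage[codon] = (amino, n, per_thousand, fraction)
    return usage


def usage_distance(orf_a, orf_b):
    """Mean absolute difference of the Fraction column (illustrative metric)."""
    ua, ub = codon_usage(orf_a), codon_usage(orf_b)
    return sum(abs(ua[c][3] - ub[c][3]) for c in CODON_TABLE) / len(CODON_TABLE)
```

Calling `codon_usage` on each ORF and comparing the Fraction values, for instance with `usage_distance`, gives a rough numeric counterpart to the statement that ORFs from the same organism show nearly the same codon plot.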
  • 45. Corrections: We undertook these because we noticed that the gene products of Methanococcus voltae and Frankia did not start with the amino acid Methionine.
    Methanococcus voltae corrections: The mistake seems to be in the database from which the sequence was downloaded. The DNA sequence had ‘ata’ instead of ‘atg’.
    Frankia sp. CcI3 corrections: The mistake again seems to be in the sequence. The DNA sequence began 27 bp upstream, and the claimed start site of the protein actually coded for Valine.
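A check of this kind is easy to automate before running any codon analysis. The sketch below is an illustration under stated assumptions, not the procedure used in the project: the function name `classify_start` is hypothetical, and treating GTG and TTG as alternative start codons is an assumption taken from the bacterial translation table, not something the slides state.

```python
CANONICAL_START = {"ATG"}            # codes for Methionine
ALTERNATIVE_STARTS = {"GTG", "TTG"}  # assumption: accepted bacterial alternatives


def classify_start(seq, offset=0):
    """Classify the codon found `offset` bases into `seq`.

    Returns "canonical", "alternative" or "none".
    """
    codon = seq[offset:offset + 3].upper()
    if codon in CANONICAL_START:
        return "canonical"
    if codon in ALTERNATIVE_STARTS:
        return "alternative"
    return "none"
```

For a sequence that begins with ‘ata’, `classify_start` returns "none", which flags the record for manual inspection. For a record with extra upstream bases, passing `offset=27` inspects the claimed start site instead of the first three bases.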
  • 46. Conclusion: After studying the three analyses we did with the protein, the DNA and the ORF codons, we conclude the following:
    1) Bacteria choose codons according to their G-C composition when encoding a given amino acid of a protein. G-C rich codons are preferred in G-C rich bacteria; conversely, A-T rich codons are preferred in G-C poor bacteria.
    2) If the same amino acid is not present, an amino acid with the same or nearly the same chemical properties is used in its place.
    3) High G-C content bacteria often employ two different genes for the same purpose. The finding of two possible genes for Carbonic Anhydrase in their genomes is the evidence for this statement.
    4) Most bacteria use Zinc at the metal site, yet a small number of bacteria use Cadmium and other metals.
    5) Even though the proteins vary in length, one may look for Serine and Glycine on the peptide chain and see that this region is conserved in all the proteins. This is because the protein domains must be similar for all the anhydrases.
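The first conclusion can be checked numerically by comparing the overall G-C fraction of an ORF with the G-C fraction at the third codon position, where most synonymous changes occur. This is a minimal sketch under the assumption that the ORF is given in frame; the function names `gc_content` and `gc3` are hypothetical and not taken from the slides.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases of `seq`."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return sum(b in "GC" for b in bases) / len(bases)


def gc3(orf):
    """G-C fraction at the third position of each complete codon."""
    # Indices 2, 5, 8, ... exist only for complete codons, so a
    # trailing partial codon contributes nothing.
    third_positions = orf.upper()[2::3]
    return gc_content(third_positions)
```

If codon choice follows genome composition as concluded above, `gc3` would be expected to be high for ORFs from G-C rich bacteria and low for ORFs from G-C poor bacteria, usually more extreme than `gc_content` of the same ORF.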