Introduction to Bioinformatics
 

    Introduction to Bioinformatics: Presentation Transcript

    • V. K. Singh, Information Officer, Centre for Bioinformatics, Banaras Hindu University. Introduction to Bioinformatics
    • What is Bioinformatics? “The analysis of biological information using computers and statistical techniques; the science of developing and utilizing computer databases and algorithms to accelerate and enhance biological research” (www.niehs.nih.gov/dert/trc/glossary.htm). 1: Introduction
    • What can bioinformatics offer to biologists?
    • Computational biology: the in silico genome revolution at the turn of the century.
    • In the very beginning:
      – Life was classified as plants and animals.
      – When bacteria were discovered, they were initially classified as plants.
      – Ernst Haeckel (1866) placed all unicellular organisms in a kingdom called Protista, separated from Plantae and Animalia.
    • When electron microscopes were developed, it was found that Protista in fact include cells both with and without a nucleus. Also, fungi were found to differ from plants, since they are heterotrophs (they do not synthesize their own food). Thus, life was classified into 5 kingdoms: Fungi, Plants, Animals, Protists, Procaryotes.
    • Later, plants, animals, protists and fungi were collectively called the Eucarya domain, and the procaryotes were shifted from a kingdom to a domain, Bacteria. Domains: Eucarya (kingdoms: Fungi, Plants, Animals, Protists) and Bacteria. Even later, a new domain was discovered…
    • rRNA was sequenced from a great number of organisms to study phylogeny.
    • Revolutionizing the classification of life: the rRNA phylogenetic tree
    • From sequence analysis alone, it was thus established that life is divided into 3 domains: Bacteria, Archaea, Eucarya.
    • Timeline: Gregor Mendel, laws of inheritance, the “gene” (1866); Watson and Crick, structure of DNA (1953); Genome Project (2003).
    • Sequencing of Genomes
    • Genomic Sequencing – shotgun sequencing. A single sequencing run usually reads only ~700 bp. How can we sequence a genome?
    • Genomic Sequencing – Walking.
      1. Design a primer
      2. Sequence
      3. Design a new primer
      4. Sequence
      5. …
      One has to design new primers every time. To do so, one has to wait for the sequencing results.
    • Genomic Sequencing – shotgun sequencing
      1. Break DNA into small pieces
      2. Sequence each piece
      3. Assemble
      ACCCCGAGGAGACGAACACCCGTATACAGTCGACGTTTATATATA
      ACCCCGAGGAGACGA
           GAGGAGACGAACACCCGTATACAGTCGACG
                           GTATACAGTCGACGTTTATATATA
    • Shotgun sequencing – why isn’t it a trivial task?
      1. By chance, some parts are not sequenced even once!
      GAGGAGACGAACACCCGTATACAGTCGACG
      ACCCCGAGGAGACGA ? GTATACAGTCGACGTTTATATATA
      GTATACAGTCGACGTTTATATATA
      ACCCCGAGGAGACGA
    • Shotgun sequencing – why isn’t it a trivial task?
      2. Some pieces do not align because of sequencing errors.
      GAGGTGAGGAACACCCGTATACAGTCGACG
      ACCCCGAGG?GA?GAACACCCGTATACAGTCGACGTTTATATATA
      ACCCCGAGGAGACGA
    • Shotgun sequencing – why isn’t it a trivial task?
      3. Repetitive sequences: satellite DNA.
      GGGGGGGGGGGGGGGGGGGGGGGGGGGGACCCCGGGGGGGGGGGGG????GGGGGGGGGGGGGAGGGGGGGGGGGGGGGGGGGGGGAACCCCGGGGG
    • A contig: a section of the genome that could be reliably assembled.
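
The "break, sequence, assemble" idea from the shotgun slides can be sketched as a greedy overlap merge. This is only an illustration under simplifying assumptions (error-free reads, no repeats, no reads fully contained in other reads); the function names `overlap` and `greedy_assemble` and the `min_len` threshold are hypothetical and not taken from the slides. Real assemblers are far more elaborate.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Overlaps shorter than min_len are treated as chance matches and ignored.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap.

    Returns the list of contigs left when no overlap of at least
    min_len remains (more than one contig means there are gaps).
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]


# Toy reads modeled on the slide example
reads = [
    "GAGGAGACGAACACCCGTATACAGTCGACG",
    "GTATACAGTCGACGTTTATATATA",
    "ACCCCGAGGAGACGA",
]
print(greedy_assemble(reads))
```

If a region is never sequenced, no overlap links its neighbours and the function returns several contigs instead of one, which mirrors the first difficulty listed on the slides.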
    • BIOINFORMATICS DATABASES
    • What’s in a database?
      – Sequences: genes, proteins, etc.
      – Full genomes
      – Expression data
      – Structures
      – Annotation (information about genes/proteins): function, cellular location, chromosomal location, introns/exons, phenotypes, diseases
      – Publications
    • NCBI and Entrez
      – NCBI is one of the largest and most comprehensive database collections. It belongs to the NIH (National Institutes of Health, the primary federal agency for conducting and supporting medical research in the USA).
      – Entrez is the search engine of NCBI.
      – Search for: genes, proteins, genomes, structures, diseases, publications, and more.
      http://www.ncbi.nlm.nih.gov
    • PubMed: NCBI’s database of biomedical articles. Example: Yang X, Kurteva S, Ren X, Lee S, Sodroski J. “Subunit stoichiometry of human immunodeficiency virus type 1 envelope glycoprotein trimers during virus entry into host cells”, J Virol. 2006 May;80(9):4388-95.
    • Use fields! Yang[AU] AND glycoprotein[TI] AND 2006[DP] AND J virol[TA]
      For the full list of field tags, go to Help -> Search Field Descriptions and Tags.
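
A fielded query like the one on this slide is just a set of `value[TAG]` terms joined with AND. The helper below is a minimal sketch for building such strings; the function name `pubmed_query` is hypothetical, and the code only formats text, it does not contact PubMed.

```python
def pubmed_query(*terms):
    """Join (value, tag) pairs into a fielded query string using AND."""
    return " AND ".join(f"{value}[{tag}]" for value, tag in terms)


# Rebuild the query shown on the slide
query = pubmed_query(
    ("Yang", "AU"),          # author
    ("glycoprotein", "TI"),  # title word
    ("2006", "DP"),          # date of publication
    ("J virol", "TA"),       # journal title abbreviation
)
print(query)
```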
    • Example: retrieve all publications in which the first author is Davidovich C and the last author is Yonath A.
    • Using limits: retrieve the publications of Yonath A in the journals Nature and Proc Natl Acad Sci U S A in the last 5 years.
    • Searching NCBI for the protein human CD4 (search demonstration)
    • Using field descriptions, qualifiers, and boolean operators
      – Cd4[GENE] AND human[ORGN], or Cd4[gene name] AND human[organism]
      – List of field codes: http://www.ncbi.nlm.nih.gov/entrez/query/static/help/Summary_Matrices.html#Search_Fields_and_Qualifiers
      – Boolean operators: AND, OR, NOT
      Note: do not use the field Protein name [PROT], only GENE!
    • This time we search directly in the protein database.
    • RefSeq: a subcollection of NCBI databases with only non-redundant, highly annotated entries (genomic DNA, transcript (RNA), and protein products).
    • An explanation of GenBank records
    • Fasta format
      >gi|10835167|ref|NP_000607.1| CD4 antigen precursor [Homo sapiens]
      MNRGVPFRHLLLVLQLALLPAATQGKKVVLGKKGDTVELTCTASQKKSIQFHWKNSNQIKILGNQGSFLTKGPSKLNDRADSRRSLWDQGNFPLIIKNLKIEDSDTYICEVEDQKEEVQLLVFGLTANSDTHLLQGQSLTLTLESPPGSSPSVQCRSPRGKNIQGGKTLSVSQLELQDSGTWTCTVLQNQKKVEFKIDIVVLAFQKASSIVYKKEGEQVEFSFPLAFTVEKLTGSGELWWQAERASSSKSWITFDLKNKEVSVKRVTQDPKLQMGKKLPLHLTLPQALPQYAGSGNLTLALEAKTGKLHQEVNLVVMRATQLQKNLTCEVWGPTSPKLMLSLKLENKEAKVSKREKAVWVLNPEAGMWQCLLSDSGQVLLESNIKVLPTWSTPVQPMALIVLGGVAGLLLFIGLGIFFCVRCRHRRRQAERMSQIKRLLSEKKTCQCPHRFQKTCSPI
      The line starting with “>” is the header (ID/accession and description); the lines after it are the sequence.
      Save accession numbers for future use (makes searching quicker). RefSeq accession number: NP_000607.1
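
Because a FASTA record is only a `>` header line followed by sequence lines, it can be read with a few lines of code. The sketch below is an illustration, not a replacement for an established parser; `parse_fasta` and `refseq_accession` are hypothetical names, and the accession helper assumes the older `gi|...|ref|ACCESSION|` header layout shown on the slide.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def refseq_accession(header):
    """Return the field following 'ref' in a 'gi|...|ref|ACC|' header, else None."""
    parts = header.split("|")
    if "ref" in parts[:-1]:
        return parts[parts.index("ref") + 1]
    return None


example = """>gi|10835167|ref|NP_000607.1| CD4 antigen precursor [Homo sapiens]
MNRGVPFRHLLLVLQLALLP
AATQGKKVVLGKKGDTVELT
"""
for header, sequence in parse_fasta(example):
    print(refseq_accession(header), len(sequence))
```

The same function handles a multiple-sequence FASTA file, since each new `>` line simply starts another record.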
    • Downloading
    • Homology Search Using Sequence Alignment
    • Before we begin…
      MVHLTPEEKTAVNALWGKVNVDAVGGEALGRLLVVYPWTQRFFE…
      || ||  ||||| ||| || |   ||||||||||||||||||||
      MVNLTSDEKTAVLALWNKVDVEDCGGEALGRLLVVYPWTQRFFE…
      DNA sequence:
      ATGGTGAACCTGACCTCTGACGAGAAGACTGCCGTCCTTGCCCTGTGGAACAAGGTGGACGTGGAAGACTGTGGTGGTGAGGCCCTGGGCAGGTTTGTATGGAGGTTACAAGGCTGCTTAAGGAGGGAGGATGGAAGCTGGGCATGTGGAGACAGACCACCTCCTGGATTTATGACAGGAACTGATTGCTGTCTCCTGTGCTGCTTTCACCCCTCAGGCTGCTGGTCGTGTATCCCTGGACCCAGAGGTTCTTTGAAAGCTTTGGGGACTTGTCCACTCCTGCTGCTGTGTTCGCAAATGCTAAGGTAAAAGCCCATGGCAAGAAGGTGCTAACTTCCTTTGGTGAAGGTATGAATCACCTGGACAACCTCAAGGGCACCTTTGCTAAACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGATCCTGAGAATTTCAAGGTGAGTCAATATTCTTCTTCTTCCTTCTTTCTATGGTCAAGCTCATGTCATGGGAAAAGGACATAAGAGTCAGTTTCCAGTTCTCAATAGAAAAAAAAATTCTGTTTGCATCACTGTGGACTCCTTGGGACCATTCATTTCTTTCACCTGCTTTGCTTATAGTTATTGTTTCCTCTTTTTCCTTTTTCTCTTCTTCTTCATAAGTTTTTCTCTCTGTATTTTTTTAACACAATCTTTTAATTTTGTGCCTTTAAATTATTTTTAAGCTTTCTTCTTTTAATTACTACTCGTTTCCTTTCATTTCTATACTTTCTATCTAATCTTCTCCTTTCAAGAGAAGGAGTGGTTCACTACTACTTTGCTTGGGTGTAAAGAATAACAGCAATAGCTTAAATTCTGGCATAATGTGAATAGGGAGGACAATTTCTCATATAAGTTGAGGCTGATATTGGAGGATTTGCATTAGTAGTAGAGGTTACATCCAGTTACCGTCTTGCTCATAATTTGTGGGCACAACACAGGGCATATCTTGGAACAAGGCTAGAATATTCTGAATGCAAACTGGGGACCTGTGTTAACTATGTTCATGCCTGTTGTCTCTTCCTCTTCAGCTCCTGGGCAATATGCTGGTGGTTGTGCTGGCTCGCCACTTTGGCAAGGAATTCGACTGGCACATGCACGCTTGTTTTCAGAAGGTGGTGGCTGGTGTGGCTAATGCCCTGGCTCACAAGTACCATTGA
    • What is sequence alignment? Alignment: comparing two (pairwise) or more (multiple) sequences, searching for a series of identical or similar characters in the sequences.
      MVNLTSDEKTAVLALWNKVDVEDCGGE
      || ||  ||||| ||| || |   |||
      MVHLTPEEKTAVNALWGKVNVDAVGGE
    • Why sequence alignment? To predict characteristics of a protein: use the structure or function information on known proteins with similar sequences available in databases in order to predict the structure or function of an unknown protein. Assumption: similar sequences produce similar proteins.
    • Local vs. Global
      – Global alignment finds the best alignment across the whole of the two sequences. It forces alignment in regions which differ.
        ADLGAVFALCDRYFQ
        ||||     |||| |
        ADLGRTQN-CDRYYQ
      – Local alignment finds regions of high similarity in parts of the sequences. It concentrates on regions of high similarity.
        ADLG CDRYFQ
        |||| |||| |
        ADLG CDRYYQ
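
The difference between global and local alignment can be made concrete with the two classic dynamic-programming recurrences (usually attributed to Needleman-Wunsch for global and Smith-Waterman for local alignment). The slides do not name these algorithms, so treat this as a sketch under assumptions: a single linear gap penalty, and illustrative default scores chosen here rather than taken from the slides. Only the scores are computed, not the alignments themselves.

```python
def alignment_scores(a, b, match=1, mismatch=-1, gap=-2):
    """Return (global_score, local_score) for sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    glob = [[0] * cols for _ in range(rows)]
    loc = [[0] * cols for _ in range(rows)]

    # Global alignment must pay for leading gaps; local alignment starts free.
    for i in range(1, rows):
        glob[i][0] = i * gap
    for j in range(1, cols):
        glob[0][j] = j * gap

    best_local = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            glob[i][j] = max(glob[i - 1][j - 1] + s,
                             glob[i - 1][j] + gap,
                             glob[i][j - 1] + gap)
            # The extra 0 lets a local alignment restart anywhere
            loc[i][j] = max(0,
                            loc[i - 1][j - 1] + s,
                            loc[i - 1][j] + gap,
                            loc[i][j - 1] + gap)
            best_local = max(best_local, loc[i][j])
    return glob[-1][-1], best_local


print(alignment_scores("ADLGAVFALCDRYFQ", "ADLGRTQNCDRYYQ"))
print(alignment_scores("GGGACGTCCC", "ACGT"))
```

In the second call the short sequence matches only the middle of the long one: the global score is dragged down by the forced end gaps, while the local score reflects just the matching region.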
    • Sequence evolution: in the course of evolution, sequences changed from the ancestral sequence by random mutations. Three types of changes:
      1. Insertion: an insertion of one or several letters into the sequence. AAGA → AAGTA
    • Sequence evolution: in the course of evolution, sequences changed from the ancestral sequence by random mutations. Three types of changes:
      1. Insertion: an insertion of one or several letters into the sequence. AAGA → AAGTA
      2. Deletion: a deletion of a letter (or more) from the sequence. AAGA → AGA
    • Evolutionary changes in sequences: in the course of evolution, sequences changed from the ancestral sequence by random mutations. Three types of mutations:
      1. Insertion: an insertion of one or several letters into the sequence. AAGA → AAGTA
      2. Deletion: a deletion of a letter (or more) from the sequence. AAGA → AGA
      3. Substitution: a replacement of one (or more) sequence letters by another. AAGA → AACA
      Insertion + Deletion → Indel
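
The three kinds of change can be written as tiny string operations, reproducing the slide examples. The function names and the 0-based `pos` argument are illustrative choices, not notation from the slides.

```python
def insert(seq, pos, letters):
    """Insert letters before 0-based position pos."""
    return seq[:pos] + letters + seq[pos:]


def delete(seq, pos, length=1):
    """Delete length letters starting at 0-based position pos."""
    return seq[:pos] + seq[pos + length:]


def substitute(seq, pos, letter):
    """Replace the letter at 0-based position pos."""
    return seq[:pos] + letter + seq[pos + 1:]


ancestor = "AAGA"
print(insert(ancestor, 3, "T"))      # insertion example from the slide
print(delete(ancestor, 0))           # deletion example from the slide
print(substitute(ancestor, 2, "C"))  # substitution example from the slide
```

Given only two present-day sequences, an insertion in one cannot be told apart from a deletion in the other, which is why both are grouped as indels.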
    • Sequence alignment
      Sequences: AAGCTGAATTCGAA and AGGCTCATTTCTGA
      One possible alignment:
      AAGCTGAATT-C-GAA
      AGGCT-CATTTCTGA-
      This alignment includes: 2 mismatches, 4 indels (gaps), 10 perfect matches.
    • Choosing an alignment: many different alignments are possible for AAGCTGAATTCGAA and AGGCTCATTTCTGA.
      Alignment 1:
      A-AGCTGAATTC--GAA
      AG-GCTCA-TTTCTGA-
      Alignment 2:
      AAGCTGAATT-C-GAA
      AGGCT-CATTTCTGA-
      Which alignment is better?
    • Scoring an alignment. Example of a naïve scoring system: Match: +1, Mismatch: -2, Indel: -1.
      AAGCTGAATT-C-GAA
      AGGCT-CATTTCTGA-
      Score = (+1)×10 + (-2)×2 + (-1)×4 = 2
      A-AGCTGAATTC--GAA
      AG-GCTCA-TTTCTGA-
      Score = (+1)×9 + (-2)×2 + (-1)×6 = -1
      Higher score → better alignment
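
The naïve scoring system from this slide is easy to apply column by column. The sketch below scores the two candidate alignments with the slide's values (match +1, mismatch -2, indel -1); the function name `score_alignment` is hypothetical.

```python
def score_alignment(row1, row2, match=1, mismatch=-2, indel=-1):
    """Score two equal-length aligned rows, where '-' marks a gap."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same length")
    score = 0
    for x, y in zip(row1, row2):
        if x == "-" or y == "-":
            score += indel
        elif x == y:
            score += match
        else:
            score += mismatch
    return score


print(score_alignment("AAGCTGAATT-C-GAA", "AGGCT-CATTTCTGA-"))    # 2
print(score_alignment("A-AGCTGAATTC--GAA", "AG-GCTCA-TTTCTGA-"))  # -1
```

Changing the three parameters changes which alignment wins, which is the point made on the following slide about scoring systems.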
    • Scoring system:
      – Different scoring systems can produce different optimal alignments.
      – Scoring systems implicitly represent a particular theory of similarity/dissimilarity between sequence characters: evolution based or physico-chemical properties based.
      – Some mismatches are more plausible than others: transition vs. transversion; Lys→Arg ≠ Lys→Cys.
      – Gap extension vs. gap opening.
    • Substitution Matrices
      – Nucleic acids: transition-transversion
      – Amino acids: evolution (empirical data) based (PAM, BLOSUM); physico-chemical properties based (Grantham, McLachlan)
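
For nucleic acids, the transition/transversion distinction rests on the standard grouping of bases into purines (A, G) and pyrimidines (C, T): a transition stays within a group, a transversion crosses between them. The slides do not spell this definition out, so the sketch below states it explicitly; the function name is hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(x, y):
    """Classify a DNA base pair as 'identical', 'transition' or 'transversion'."""
    x, y = x.upper(), y.upper()
    valid = PURINES | PYRIMIDINES
    if x not in valid or y not in valid:
        raise ValueError("expected bases from A, C, G, T")
    if x == y:
        return "identical"
    if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
        return "transition"
    return "transversion"


for pair in [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]:
    print(pair, classify_substitution(*pair))
```

A nucleotide scoring scheme that penalizes transversions more heavily than transitions could be built directly on top of this classification.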
    • Web server for pairwise alignment
    • BLAST 2 sequences (bl2seq) at NCBI: produces the local alignment of two given sequences using the BLAST (Basic Local Alignment Search Tool) engine for local alignment. It does not use an exact algorithm but a heuristic.
    • Back to NCBI
    • BLAST – bl2seq
    • Bl2seq query: blastn – nucleotide; blastp – protein
    • Bl2seq results
    • Bl2seq results: Match, Dissimilarity, Gaps, Similarity, Low complexity
    • Bl2seq results:
      – Bit score: a score for the alignment according to the number of similarities, identities, etc.
      – Expected score (E-value): the number of alignments with a score at least as high that one can “expect” to see by chance when searching a database of a particular size. The closer the E-value is to zero, the greater the confidence that the hit is real.
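
The link between the two numbers can be shown with the usual relation between a bit score S' and the E-value, E = m · n · 2^(-S'), where m and n are the query and database lengths. The slides do not give this formula; it is added here as background, and BLAST itself uses corrected "effective" lengths, so values computed this way are only approximate. The function name is hypothetical.

```python
def expected_hits(bit_score, query_len, db_len):
    """Approximate E-value from a bit score and the size of the search space."""
    return query_len * db_len * 2.0 ** (-bit_score)


# Same score, larger database: more chance hits are expected.
for db_len in (1_000_000, 1_000_000_000):
    print(db_len, expected_hits(50, 300, db_len))
```

Each additional bit of score halves the E-value, and a larger database raises it, which is why the same alignment can look convincing in a small search and marginal in a large one.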
    • BLAST – programs. Query: DNA or protein. Database: DNA or protein.
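
The query/database grid on this slide corresponds to the standard BLAST program names. The lookup below is a small sketch of that mapping; the program names are the standard ones, while the dictionary layout, the key strings and the function name are illustrative assumptions.

```python
# (query type, database type) -> BLAST program
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("translated nucleotide", "protein"): "blastx",
    ("protein", "translated nucleotide"): "tblastn",
    ("translated nucleotide", "translated nucleotide"): "tblastx",
}


def choose_blast_program(query_type, db_type):
    """Return the BLAST program for a query/database combination."""
    try:
        return BLAST_PROGRAMS[(query_type, db_type)]
    except KeyError:
        raise ValueError(f"no BLAST program for {query_type} vs {db_type}") from None


print(choose_blast_program("protein", "protein"))
print(choose_blast_program("translated nucleotide", "protein"))
```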
    • BLAST – Blastp
    • Blastp - results
    • Blastp – results (cont’)
    • Blastp – acquiring sequences
    • Blastp – acquiring sequences (cont’)
    • Multiple Sequence Alignment (MSA) and Phylogeny
    • One of the options to get a multiple-sequence Fasta file
    • Input: multiple-sequence Fasta file
      >gi|21536452|ref|NP_002762.2| mesotrypsin preproprotein [Homo sapiens]
      MNPFLILAFVGAAVAVPFDDDDKIVGGYTCEENSLPYQVSLNSGSHFCGGSLISEQWVVSAAHCYKTRIQVRLGEHNIKVLEGNEQFINAAKIIRHPKYNRDTLDNDIMLIKLSSPAVINARVSTISLPTAPPAAGTECLISGWGNTLSFGADYPDELKCLDAPVLTQAECKASYPGKITNSMFCVGFLEGGKDSCQRDSGGPVVCNGQLQGVVSWGHGCAWKNRPGVYTKVYNYVDWIKDTIAANS
      >gi|114051746|ref|NP_001040585.1| protease, serine, 2 [Macaca mulatta]
      MNPLLILAFVGVAVAAPFDDDDKIVGGYTCEENSVPYQVSLNSGYHFCGGSLINEQWVVSAAHCYKTRIQVRLGEHNIEVLEGTEQFINAAKIIRHPDYDRKTLNNDILLIKLSSPAVINARVSTISLPTAPPAAGAEALISGWGNTLSSGADYPDELQCLEAPVLSQAECEASYPGKITSNMFCVGFLEGGKDSCQGDSGGPVVSNGQLQGIVSWGYGCAQKNRPGVYTKVYNYVDWIRDTIAANS
      >gi|6755891|ref|NP_035775.1| mesotrypsin [Mus musculus]
      MNALLILALVGAAVAFPVDDDDKIVGGYTCQENSVPYQVSLNSGYHFCGGSLINDQWVVSAAHCYKTRIQVRLGEHNINVLEGNEQFVNAAKIIKHPNFNRKTLNNDIMLLKLSSPVTLNARVATVALPSSCAPAGTQCLISGWGNTLSFGVSEPDLLQCLDAPLLPQADCEASYPGKITGNMVCAGFLEGGKDSCQGDSGGPVVCNRELQGIVSWGYGCALPDNPGVYTKVCNYVDWIQDTIAAN
      >gi|6981422|ref|NP_036861.1| protease, serine, 2 [Rattus norvegicus]
      MRALLFLALVGAAVAFPVDDDDKIVGGYTCQENSVPYQVSLNSGYHFCGGSLINDQWVVSAAHCYKSRIQVRLGEHNINVLEGNEQFVNAAKIIKHPNFDRKTLNNDIMLIKLSSPVKLNARVATVALPSSCAPAGTQCLISGWGNTLSSGVNEPDLLQCLDAPLLPQADCEASYPGKITDNMVCVGFLEGGKDSCQGDSGGPVVCNGELQGIVSWGYGCALPDNPGVYTKVCNYVDWIQDTIAAN
      >gi|27819626|ref|NP_777115.1| pancreatic anionic trypsinogen [Bos taurus]
      MHPLLILAFVGAAVAFPSDDDDKIVGGYTCAENSVPYQVSLNAGYHFCGGSLINDQWVVSAAHCYQYHIQVRLGEYNIDVLEGGEQFIDASKIIRHPKYSSWTLDNDILLIKLSTPAVINARVSTLALPSACASGSTECL. . .
    • Step 1: Load the sequences
    • Sequences and conservation view
    • Step 2: Perform Alignment
    • Sequences and conservation view
    • Sequences and conservation view
    • Step 3: Create tree
    • Step 4: NJPlot
    • How robust is our tree?
      – We need some statistical way to estimate the confidence in the tree topology.
      – But we don’t know anything about the distribution or parameters of the tree topology.
      – The only data source we have is our data (the MSA).
      – So, we must rely on our own resources: “pull up by your own bootstraps”.
    • Bootstrap
      1. Resample K positions (with replacement) n times.
      Original data, positions 1 2 3 4 5 … K:
        1: ATCTG…A
        2: ATCTG…C
        3: ACTTA…C
        N: ACCTA…T
      Resampled data set, positions 1 1 2 4 4 … K:
        1: AATTT…T
        2: AATTT…G
        3: AACTT…T
        N: AACTT…T
      Resampled data set, positions 4 7 7 8 9 … K:
        1: TTTAT…T
        2: TAACC…G
        3: TAACC…T
        N: TGGGA…T
      Resampled data set, positions 1 5 5 7 8 … K:
        1: AGGTA…T
        2: AGGAC…G
        3: AAAAC…A
        N: AAAGG…C
    • Bootstrap
      2. Reconstruct a tree from each data set using the same method used for reconstructing the original tree.
      [Figure: one tree over Sp1, Sp2, Sp3, Sp4 for each of the three resampled data sets]
    • Bootstrap
      3. For each node in our original tree, we count the number of times it appeared in the bootstrap analysis.
      [Figure: original tree and bootstrap trees over Sp1, Sp2, Sp3, Sp4; node support values 67% and 100%]
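
The resampling and counting steps can be sketched in a few lines. The code below is a simplified illustration: it resamples alignment columns with replacement (step 1) and computes support for each group of species given the groupings found in the replicate trees (step 3). Tree reconstruction itself (step 2) is left out, and representing a tree as a set of frozensets of species names is an assumption made for this sketch, as are the function names.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample the columns of an alignment with replacement.

    alignment is a list of equal-length aligned sequences; the result has
    the same number of sequences and columns.
    """
    k = len(alignment[0])
    columns = [rng.randrange(k) for _ in range(k)]
    return ["".join(seq[c] for c in columns) for seq in alignment]


def clade_support(original_clades, replicate_clade_sets):
    """Fraction of replicate trees that contain each clade of the original tree."""
    n = len(replicate_clade_sets)
    return {
        clade: sum(clade in replicate for replicate in replicate_clade_sets) / n
        for clade in original_clades
    }


rng = random.Random(0)  # fixed seed so the example is repeatable
msa = ["ATCTGA", "ATCTGC", "ACTTAC", "ACCTAT"]
print(bootstrap_alignment(msa, rng))

# Hypothetical clades: Sp1+Sp2 appears in 2 of 3 replicate trees
sp12 = frozenset({"Sp1", "Sp2"})
replicates = [{sp12}, {sp12}, {frozenset({"Sp1", "Sp3"})}]
print(clade_support([sp12], replicates))
```

A clade recovered in two of three replicates gets a support of about 67%, the kind of value shown next to a node in the bootstrap figure.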
    • Step 3.5 - Bootstrap
    • Bootstrap values on NJPlot. Note: ClustalX saves trees as .ph files; trees with bootstrap values are saved as .phb. You might have to reopen the tree…
    • Protein Information Resources
      – Swissprot
      – PDB
    • Swissprot
      – A protein sequence database which strives to provide a high level of annotation regarding: the function of a protein, domain structure, post-translational modifications, variants.
      – One entry for each protein.
      http://www.expasy.ch/sprot
    • PDB: Protein Data Bank
      – Main database of 3D structures of macromolecules
      – Includes ~61,000 entries (proteins, nucleic acids, complex assemblies)
      – Is highly redundant
      http://www.rcsb.org
    • Human CD4 in complex with HIV gp120 (PDB ID 1G9M). [Figure labels: gp120, CD4]
    • What do bioinformaticians study?
      – Bioinformatics today is part of almost all molecular biology research.
      – Just a few examples…
    • Example 1: Compare proteins with similar sequences (for instance, kinases) and understand what the similarities and differences mean.
    • Example 2: Look at the genome and predict where genes are (promoters; transcription factor binding sites; introns; exons).
    • Example 3: Predict the 3-dimensional structure of a protein from its primary sequence. Ab initio prediction is extremely difficult!
    • Example 4: Correlate gene expression with disease. A gene chip quantifies gene expression in different tissues under different conditions. May be used for personalized medicine.
    • Role of Centre for Bioinformatics in School of Biotechnology, BHU
    • ©1996-2007 All Rights Reserved. Online Journal of Bioinformatics. You may not store these pages in any form except for your own personal use. All other usage or distribution is illegal under international copyright treaties. Permission to use any of these pages in any other way besides the before mentioned must be gained in writing from the publisher. This article is exclusively copyrighted in its entirety to OJB publications. This article may be copied once but may not be, reproduced or re-transmitted without the express permission of the editors. This journal satisfies the refereeing requirements (DEST) for the Higher Education Research Data Collection (Australia). Linking: To link to this page or any pages linking to this page you must link directly to this page only here rather than put up your own page.
      OJB™ Online Journal of Bioinformatics© 8 (1): 75-83, 2007
      In silico Cis-regulatory Elements Analysis of Seed Storage Protein Promoters Cloned from Different Cultivars of Wheat, Rice and Oat
      Yadav D(1), Singh VK(1), Singh NK(2)
      (1) Department of Molecular Biology and Genetic Engineering, College of Basic Sciences and Humanities, G.B. Pant University of Agriculture and Technology, Pantnagar (Uttarakhand)
      (2) National Research Center on Plant Biotechnology, Indian Agriculture Research Institute, New Delhi 110012
      ABSTRACT
      A total of 24 promoter sequences with assigned accession number EF393165 to EF393188 and representing major seed storage proteins of wheat namely High molecular weight glutenin subunit (HMW-GS), low molecular weight glutenin subunits (LMW-GS) alpha/beta gliadins, triticin along with rice glutelins and oat 12S globulins were cloned from indigenous cultivars of wheat, rice and oat and was subjected to in silico analysis using bioinformatic softwares for the presence of different cis-regulatory motifs. The phylogeny studies based on the multiple sequence alignment of these promoters revealed four distinct clusters showing major group of seed storage promoters. The presence of additional motifs like RY repeats, ABRE, AC-11, CAAT box, LTR, UTR, CCGTCC box, G box, GARE, MBS along with the common motifs present in seed storage promoters like Prolamin-box, TATA, CAAT provides a better option for multifarious uses.
      Keywords: Seed storage protein promoters, Cis-regulatory Elements, In silico.
      Seed Storage Protein Promoters      Accession Number   Cultivars       Length (bp)
      HMW Glutenin (Triticum aestivum)    EF396165           UP-262          402
                                          EF396184           UP-262          487
                                          EF396166           UP-262          397
                                          EF396167           UP-262          412
                                          EF396168           UP-262          385
                                          EF396169           UP-301          385
                                          EF396170           UP-301          393
                                          EF396171           UP-301          398
                                          EF396172           UP-301          392
                                          EF396173           UP-301          393
      LMW Glutenin (Triticum aestivum)    EF396187           HD-2329         551
      α/β gliadin (Triticum aestivum)     EF396174           Kalyansona      520
                                          EF396175           Kalyansona      564
                                          EF396177           UP-262          591
                                          EF396178           UP-262          521
                                          EF396176           UP-262          563
                                          EF396182           UP-301          548
      Triticin (Triticum aestivum)        EF396181           HD-2329         428
                                          EF396183           HD-2329         370
                                          EF396185           HD-2329         452
                                          EF396186           Kalyansona      343
      12S Globulin (Avena sativa)         EF396179           UPO-94          549
      Glutelins (Oryza sativa)            EF396180           PantDhan-12     562
                                          EF396188           Pusa Basmati    487
    • [Figure: motif diagram; labels: 200 bp, 172 bp, Motif-1, Motif-2, Motif-3]
    • http://www.insilicogenomics.in/cry-bt-search.asp
    • CERCOSPORA LEAF SPOT DISEASE OF PIGEONPEA AND ITS MANAGEMENT