Protein function and bioinformatics



   Outline of talk

   ● Why do we need bioinformatics?
   ● What tools do we need?
   ● Case study: the Methanococcoides burtonii genome

                                        Neil Saunders
                                        76-455
                                        n.saunders@uq.edu.au
                                        www.uq.edu.au/~uqnsaun1/

Why do we need bioinformatics?

● Rapid increase in data due to genomics
● Too much data to characterise genes/proteins individually
● Bioinformatics = “smart use” of information
● Ideally, computational and experimental biology are partners

The ideal computational/wet-lab cycle

[Figure: a cycle linking biological system, biological objects, computational objects, analyses, biological inferences and experiments]

Bioinformatics is about helping biologists solve problems

Introduction to genomics

● Genomes Online Database (GOLD): www.genomesonline.org

  Genome projects           Number
  Published/complete           413
  Bacteria in progress         977
  Eukarya in progress          629
  Archaea in progress           57
  Metagenomes                   56

10-50% of genes in a new genome may have no known function
        Computational skills for genomics



      "So what new skills will postdocs need to ensure that 
      they don't become science relics? The answer is math,
      statistics, and knowledge of a scripting language for 
      computers."

      The Scientist, "Bioinformatics Knowledge Vital to Careers"
      Volume 16 | Issue 17 | 53 | Sep. 2, 2002
      www.the-scientist.com

Using WWW resources

● The best web resources provide:
    - useful tools for analysis
    - integrated data from many sources

Good examples
● InterPro database          http://www.ebi.ac.uk/interpro/
● ExPASy                     http://au.expasy.org
● UniProt                    http://www.uniprot.org/
● CBS Prediction servers     http://www.cbs.dtu.dk/services/
● IMG Database               http://img.jgi.doe.gov/

But...
● Web services are no good for genome-scale analyses
● They usually limit data input (with good reason)

Nucleic Acids Research publishes an annual Database issue and
an annual Web Server issue: http://nar.oxfordjournals.org/

Computational infrastructure for genomics

[Figure: biological objects are turned into computational objects, which feed a limitless range of analyses]
  Biological objects: genome, assembly, gene sequence, protein sequence,
    protein structure, pathway
  Analyses: sequence analysis, regulatory motifs, structural modelling,
    phylogeny, comparative genomics, pathway reconstruction

Key points
● Appropriate hardware: workstation vs. cluster
● Linux Linux Linux!
● Freely available, open-source software is all you need
● Toolkits and libraries (e.g. BioPerl) to build your own solutions
● Philosophy of “many small tools plus glue”: a scripting language
● Website + database skills for sharing

BioPerl: a life sciences computational toolkit

● Website: http://www.bioperl.org
● A collection of Perl modules for biology
● Handles many common tasks in sequence/structure analysis, e.g.
    - read/write various sequence formats
    - run BLAST and parse the output
    - read/write/analyse sequence alignments
    - access local or remote databases
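
The first task in the list, reading and writing sequence formats, is what BioPerl's Bio::SeqIO module does in Perl. The sketch below shows the same idea in Python for plain FASTA only (header lines start with '>'); it is an illustration under that assumption, not BioPerl's API and not a replacement for a full toolkit. The record names and sequences in the usage lines are hypothetical.

```python
import io


def _record(header, chunks):
    """Split a FASTA header into identifier and description; join the sequence lines."""
    parts = header.split(None, 1)
    identifier = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    return identifier, description, "".join(chunks)


def read_fasta(handle):
    """Yield (identifier, description, sequence) tuples from a FASTA text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield _record(header, chunks)
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        yield _record(header, chunks)


def write_fasta(records, handle, width=60):
    """Write (identifier, description, sequence) tuples as FASTA, wrapping at `width` residues."""
    for identifier, description, seq in records:
        handle.write(f">{identifier} {description}".rstrip() + "\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")


# Usage with two hypothetical records
records = list(read_fasta(io.StringIO(">seq1 hypothetical protein\nMKV\nLLA\n>seq2\nMSTQ\n")))
out = io.StringIO()
write_fasta(records, out, width=4)
```

Using a generator keeps memory use flat, which matters when a whole genome's worth of ORFs is read in one pass.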

Annotation (or not) using BLAST

BLAST: Basic Local Alignment Search Tool
● Useful for finding similar sequences quickly
● Not very sensitive: less useful for weakly similar sequences
● Not much good at all for annotation

Why not?
● “Hypothetical”: the database sequence is unique
● “Conserved hypothetical”: several hits but no known function
● Multi-domain proteins
● The BLAST database contains incorrect annotations
● Annotation is at the whim of whoever deposited the sequence

Classic example: IMPDH
Wu et al. (2003) Comp. Biol. Chem. 27: 37-47
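
A typical first pass over genome-scale BLAST results is to keep the best hit per query. The sketch below assumes BLAST+ tabular output (-outfmt 6) with its default twelve columns; the e-value cutoff of 1e-5 is an arbitrary illustrative choice, and the hit lines in the usage example are hypothetical. As the slide warns, a best hit only transfers an annotation as good as the one attached to the database sequence.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def parse_blast_tab(handle):
    """Yield one dict per hit line, with numeric columns converted."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, row))
        for key in ("pident", "evalue", "bitscore"):
            hit[key] = float(hit[key])
        for key in ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"):
            hit[key] = int(hit[key])
        yield hit


def best_hits(hits, max_evalue=1e-5):
    """Keep the highest-scoring hit per query among hits with evalue <= max_evalue."""
    best = {}
    for hit in hits:
        if hit["evalue"] > max_evalue:
            continue
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


# Usage with hypothetical hits: q2's only hit fails the e-value cutoff
table = io.StringIO(
    "q1\tsA\t45.0\t100\t50\t2\t1\t100\t5\t104\t1e-20\t80.5\n"
    "q1\tsB\t30.0\t90\t60\t1\t1\t90\t1\t90\t1e-8\t40.1\n"
    "q2\tsC\t26.0\t50\t35\t1\t1\t50\t1\t50\t0.5\t20.0\n"
)
best = best_hits(parse_blast_tab(table))
```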

A better annotation tool: InterProScan

● IPRScan is a tool to search the InterPro database
● It uses sequence signature profiles, which are more sensitive than BLAST
● It integrates the search results from multiple databases
● A good first step to characterise a new sequence
● Available as a standalone package and runs on clusters
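
To illustrate the "integrates results from multiple databases" point, the sketch below groups each protein's matches by InterPro entry and records which member databases support it. The talk used iprscan (an older InterProScan), so the column layout here is an assumption taken from the TSV output of InterProScan 5: protein accession in column 1, analysis (member database) in column 4, InterPro accession in column 12, InterPro description in column 13. Accessions and descriptions in the usage example are hypothetical placeholders.

```python
import csv
import io
from collections import defaultdict


def interpro_summary(handle):
    """Map protein -> InterPro accession -> (description, set of member databases).

    Assumes InterProScan 5 TSV columns (0-based): 0 protein accession,
    3 analysis, 11 InterPro accession, 12 InterPro description.
    Matches without an InterPro entry ('-' or empty) are skipped.
    """
    summary = defaultdict(dict)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 13 or row[11] in ("", "-"):
            continue
        protein, analysis, ipr, ipr_desc = row[0], row[3], row[11], row[12]
        entry = summary[protein].setdefault(ipr, (ipr_desc, set()))
        entry[1].add(analysis)
    return summary


# Usage with hypothetical rows: two member databases agree on one entry
rows = [
    ["prot1", "md5", "120", "Pfam", "PF_X", "sig", "5", "70", "1e-10", "T", "01-01-2024", "IPR_A", "example domain"],
    ["prot1", "md5", "120", "SMART", "SM_X", "sig", "3", "72", "1e-12", "T", "01-01-2024", "IPR_A", "example domain"],
    ["prot1", "md5", "120", "Pfam", "PF_Y", "sig", "80", "110", "1e-4", "T", "01-01-2024", "-", "-"],
]
summary = interpro_summary(io.StringIO("\n".join("\t".join(r) for r in rows) + "\n"))
```

Agreement between several member databases on the same InterPro entry is generally more convincing than a single signature match.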

Structure prediction: threading and modelling

● The structure of a protein often explains how it functions
● However, structure determination is laborious, difficult and time-consuming
● Modelling can be useful when the sequence is similar to that of a known structure

  Threading: fit the query sequence to a fold database
  Homology modelling: assume similar sequence = similar structure
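
Because homology modelling rests on "similar sequence = similar structure", a quick check is the percent identity between query and template over an existing alignment. The sketch assumes two pre-aligned sequences with '-' gaps and uses one common definition (identical positions divided by ungapped columns; other definitions use a different denominator). The 30% cut-off in the hypothetical `suggest_approach` helper is an illustrative assumption, not a value from the talk.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Identical positions / columns where neither sequence has a gap, as a percentage."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # gapped columns are excluded from the denominator
        compared += 1
        identical += a == b
    return 100.0 * identical / compared if compared else 0.0


def suggest_approach(identity, threshold=30.0):
    """Hypothetical triage rule: homology modelling above the threshold, else threading."""
    return "homology modelling" if identity >= threshold else "threading"


# Usage with a hypothetical alignment: 5 identities over 6 ungapped columns
pid = percent_identity("MKV-LLA", "MRVELLA")
```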

Some modelling tools and databases

● SWISS-MODEL:  http://swissmodel.expasy.org/
● MODELLER:     http://www.salilab.org/modeller/
● PROSPECT:     http://compbio.ornl.gov/structure/prospect2/
● ModBase:      http://modbase.compbio.ucsf.edu/

Introduction to M. burtonii

[Images: M. burtonii; Ace Lake, Vestfold Hills; the Archaea]

Methanococcoides burtonii
● Isolated from Ace Lake, Antarctica (1-2 °C)
● Grows optimally at 23 °C
● Is an archaeon
● Is a psychrophilic methanogen

The M. burtonii genome

What features of this genome are related to cold adaptation?

Discovery of CSP-like proteins in M. burtonii

● CSP = cold shock protein
● Expressed in bacteria at low temperature
● Functions as an RNA chaperone to facilitate transcription at low temperature
● Present in some Archaea, including M. frigidum, but not M. burtonii

Discovery of CSP-like proteins in M. burtonii

  Protein sequences
    → PROSPECT: thread against CSD folds
    → MODELLER: structural model

[Figure: structures labelled d1sro__ and M. burtonii YP_564958]

● Both proteins are expressed (proteomics)
● Located in a putative exosome/proteasome superoperon
● This is consistent with their proposed function

Integrating information: structural RNA study

[Figure: tRNA % GC (stems and all bases) plotted against OGT (°C)]

Is tRNA GC content related to OGT (optimum growth temperature)?
● tRNAScan finds tRNA genes in genomes
● GC content calculated using Perl scripts

Dihydrouridine in M. burtonii
● tRNA contains > 1 hU per tRNA
● Maintains flexibility at low temperature
● DUS gene identified using iprscan
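
The GC calculation above was done with Perl scripts, separately for stem bases and for all bases. A Python sketch of the same arithmetic, assuming each tRNA comes with a secondary-structure string aligned to its sequence in which '.' marks unpaired positions and any other symbol marks a paired (stem) base (tRNAscan-SE writes paired bases as '>' and '<'). The toy hairpin in the usage line is hypothetical, not a real tRNA.

```python
def gc_percent(bases):
    """Percentage of G or C among `bases` (0.0 for an empty input)."""
    bases = [b.upper() for b in bases]
    if not bases:
        return 0.0
    return 100.0 * sum(b in "GC" for b in bases) / len(bases)


def trna_gc(seq, structure):
    """Return (%GC over all bases, %GC over stem bases) for one tRNA.

    Assumes `structure` is aligned 1:1 with `seq`, with '.' for unpaired
    positions and any other symbol marking a paired (stem) base.
    """
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must be the same length")
    stem_bases = [b for b, s in zip(seq, structure) if s != "."]
    return gc_percent(seq), gc_percent(stem_bases)


# Usage with a hypothetical hairpin: 3-bp stem closing a 3-nt loop
all_gc, stem_gc = trna_gc("GGGAAACCC", ">>>...<<<")
```

Per-tRNA values can then be averaged per genome and plotted against OGT, as in the figure.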

Pyrrolysine: a problem for bioinformatics

● Proteomics used to identify expressed proteins
● One is trimethylamine methyltransferase (TMA-MT)
● It shows post-translational modification
● It also maps to 2 ORFs in the genome sequence

● The ORFs are actually one gene with a read-through UAG codon
● Pyrrolysine is incorporated at the UAG
● This is the 22nd genetically encoded amino acid
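
Why this breaks gene prediction can be shown with a toy translator: under the standard genetic code (NCBI translation table 1) UAG is a stop, so one gene is split into two ORFs; reading UAG as pyrrolysine (one-letter code 'O') restores a single product. This is a minimal sketch; the `pyl_readthrough` switch is a hypothetical simplification, since real UAG read-through depends on cellular context that the code does not model. The toy gene is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons in TTT, TTC, TTA, TTG, TCT, ... order.
TABLE1_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), TABLE1_AMINO_ACIDS)}


def translate(dna, pyl_readthrough=False):
    """Translate frame 0 of `dna` until the first stop codon.

    With pyl_readthrough=True, TAG (UAG in the mRNA) is read as pyrrolysine,
    one-letter code 'O', instead of terminating translation.
    """
    dna = dna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        if pyl_readthrough and codon == "TAG":
            protein.append("O")
            continue
        aa = CODON_TABLE.get(codon, "X")  # 'X' for codons with ambiguous bases such as N
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical toy gene: ATG AAA TAG GGC TGA
gene = "ATGAAATAGGGCTGA"
print(translate(gene))                        # MK   (TAG treated as stop)
print(translate(gene, pyl_readthrough=True))  # MKOG (TAG read as pyrrolysine; TGA stops)
```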

Statistical analysis of protein properties

  Archaea: 27 organisms, 62 338 ORFs
  Bacteria: 52 organisms, 165 192 ORFs
    → amino acid frequency (BioPerl)
    → data matrix: organisms (rows) x composition (columns)
    → PCA: principal components (R stats package)
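
The amino acid frequency step, done with BioPerl in the talk, produces the organisms x composition matrix. A minimal Python sketch, assuming residues are pooled across all ORFs of an organism (averaging per-ORF compositions is an equally plausible choice the slide does not specify) and that only the 20 standard amino acids form the columns. The two tiny "proteomes" are hypothetical.

```python
# The 20 standard amino acids, one-letter codes; this order defines the matrix columns.
STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"


def composition(proteins):
    """Percentage of each standard amino acid, pooling all residues of all given proteins.

    Non-standard letters (e.g. X, or O for pyrrolysine) are ignored.
    """
    counts = dict.fromkeys(STANDARD_AA, 0)
    total = 0
    for seq in proteins:
        for residue in seq.upper():
            if residue in counts:
                counts[residue] += 1
                total += 1
    if total == 0:
        return [0.0] * len(STANDARD_AA)
    return [100.0 * counts[aa] / total for aa in STANDARD_AA]


def composition_matrix(proteomes):
    """Build an organisms (rows) x amino acid composition (columns) matrix.

    `proteomes` maps an organism name to a list of its protein sequences.
    Returns (row_names, matrix) with rows in sorted name order.
    """
    names = sorted(proteomes)
    return names, [composition(proteomes[name]) for name in names]


# Usage with two hypothetical, tiny "proteomes"
names, matrix = composition_matrix({"orgA": ["MKQ", "SST"], "orgB": ["LLW"]})
```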

Principal components analysis of composition

● 2 components explain most of the variation in amino acid composition
● PC1 correlates with genome GC content
● PC2 correlates with optimum growth temperature
● The psychrophilic archaea are distinguished by their PC2 score
● Their proteins contain:
    - more Gln, Ser, Thr, His, Asp
    - less Leu, Trp, Glu
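
The PCA itself was run with R's stats package (for example `prcomp()`). To show what that computation does, here is a dependency-free Python sketch: centre the organisms x composition matrix, form the covariance matrix, and extract the top components by power iteration with deflation. This numerical method is swapped in for brevity; it is not what R uses internally (`prcomp()` uses singular value decomposition). Like `prcomp()`'s default, it does not scale the columns, and power iteration assumes the leading eigenvalues are well separated. The input data are hypothetical.

```python
import math


def center(matrix):
    """Subtract each column's mean so that every column has mean zero."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[x - m for x, m in zip(row, means)] for row in matrix]


def covariance(centered):
    """Sample covariance matrix (columns x columns) of column-centred data; needs >= 2 rows."""
    n, p = len(centered), len(centered[0])
    return [[sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def power_iteration(a, iterations=1000):
    """Approximate the leading eigenvector and eigenvalue of a symmetric matrix."""
    p = len(a)
    v = [float(i + 1) for i in range(p)]  # arbitrary non-uniform starting vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(a[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # nothing left to explain
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * a[i][j] * v[j] for i in range(p) for j in range(p))
    return v, eigenvalue


def pca(matrix, k=2):
    """Scores on, and fraction of variance explained by, the top-k principal components.

    scores[r][c] is row r's score on component c. Component signs are arbitrary.
    Assumes the data are not constant (total variance > 0).
    """
    centered = center(matrix)
    cov = covariance(centered)
    p = len(cov)
    total = sum(cov[i][i] for i in range(p))
    components, explained = [], []
    for _ in range(k):
        v, lam = power_iteration(cov)
        components.append(v)
        explained.append(lam / total)
        # Deflation: subtract this component so the next pass finds the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    scores = [[sum(x * c for x, c in zip(row, comp)) for comp in components]
              for row in centered]
    return scores, explained


# Usage: hypothetical data lying exactly on a line, so PC1 explains all the variance
scores, explained = pca([[1, 2], [2, 4], [3, 6], [4, 8]])
```

Applied to the real composition matrix, `explained[0] + explained[1]` would quantify the slide's "2 components explain most of the variation", and the second score column is the PC2 score used to separate the psychrophiles.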

Conclusions

● Computational biology and bioinformatics are essential to modern biology
● Many tools are available to annotate proteins, both web-based and standalone
● Without experiments, bioinformatics produces only predictions
● Data integration is our biggest problem

www.uq.edu.au/~uqnsaun1/

Install Stable Diffusion in windows machine
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 

Protein Function and Bioinformatics: Computational Tools for Genomics Analysis

  • 1. Protein function and bioinformatics
      Outline of talk
      ● Why do we need bioinformatics?
      ● What tools do we need?
      ● Case study: The Methanococcoides burtonii genome
      Neil Saunders, 76-455, n.saunders@uq.edu.au, www.uq.edu.au/~uqnsaun1/
  • 2. Why do we need bioinformatics?
      ● Genomics is producing data at a rapidly increasing rate
      ● There is too much data to characterise genes/proteins individually
      ● Bioinformatics = “smart use” of information
      ● Ideally, computational and experimental biology are partners
  • 3. The ideal computational – wet lab cycle
      [Figure: a cycle linking biological system, biological objects, computational objects, analyses, biological inferences and experiments]
      Bioinformatics is about helping biologists solve problems
  • 4. Introduction to genomics
      Genomes Online database (www.genomesonline.org):
        Category                 Count
        Published/complete         413
        Bacteria in progress       977
        Eukarya in progress        629
        Archaea in progress         57
        Metagenomes                 56
      10-50% of genes in a new genome may have no known function
  • 5. Computational skills for genomics
      "So what new skills will postdocs need to ensure that they don't become science relics? The answer is math, statistics, and knowledge of a scripting language for computers."
      The Scientist, "Bioinformatics Knowledge Vital to Careers", Volume 16 | Issue 17 | 53 | Sep. 2, 2002, www.the-scientist.com
  • 6. Using WWW resources
      The best web resources provide:
        - useful tools for analysis
        - integrated data from many sources
      Good examples:
      ● InterPro database: http://www.ebi.ac.uk/interpro/
      ● Expasy: http://au.expasy.org
      ● UniProt: http://www.uniprot.org/
      ● CBS Prediction Servers: http://www.cbs.dtu.dk/services/
      ● IMG Database: http://img.jgi.doe.gov/
      But...
      ● Web services are poorly suited to genome-scale analyses
      ● They usually limit data input (with good reason)
      Nucleic Acids Research publishes annual Database and Web Server issues: http://nar.oxfordjournals.org/
  • 7. Computational infrastructure for genomics
      [Figure: biological objects mapped to computational objects, and the analyses applied to them]
        Biological / computational objects: genome, gene sequence, protein sequence, protein structure, pathway
        Analysis (limitless): sequence analysis, assembly, regulatory motifs, structural modeling, phylogeny, comparative genomics, pathway reconstruction
      Key points
      ● Appropriate hardware: workstation v. cluster
      ● Linux Linux Linux!
      ● Freely available, open source software is all you need
      ● Toolkits and libraries (e.g. BioPerl) to build your own solutions
      ● Philosophy of “many small tools plus glue”: a scripting language
      ● Website + database skills for sharing
  • 8. BioPerl: a life sciences computational toolkit
      ● Website: http://www.bioperl.org
      ● A collection of Perl modules for biology
      ● Handles many common tasks in sequence/structure analysis, e.g.
        - read/write various sequence formats
        - run BLAST and parse the output
        - read/write/analyse sequence alignments
        - access local or remote databases
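BioPerl itself is a Perl library (its sequence I/O lives in Bio::SeqIO). To show what the first task, reading and writing a sequence format, involves under the hood, here is a minimal stdlib-only Python sketch for FASTA alone. It is not BioPerl's API: `read_fasta` and `write_fasta` are hypothetical names, and a real pipeline would use a toolkit (BioPerl, or Biopython's `SeqIO` in Python) that also handles GenBank, EMBL and other formats.

```python
import io
from typing import Iterable, Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA text stream (hypothetical helper)."""
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def write_fasta(records: Iterable[Tuple[str, str]], handle: TextIO, width: int = 60) -> None:
    """Write (header, sequence) pairs as FASTA, wrapping sequence lines at `width`."""
    for header, seq in records:
        handle.write(f">{header}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")


if __name__ == "__main__":
    demo = ">seq1 example\nMKV\nLAA\n>seq2\nGGG\n"
    print(list(read_fasta(io.StringIO(demo))))  # [('seq1 example', 'MKVLAA'), ('seq2', 'GGG')]
```

Yielding one record at a time keeps memory use flat even when a genome-scale file holds thousands of ORFs.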
  • 9. Annotation (or not) using BLAST
      BLAST: Basic Local Alignment Search Tool
      ● Useful for finding similar sequences quickly
      ● Not very sensitive, so less useful for weakly similar sequences
      ● Not much good at all for annotation
      Why not?
      ● "Hypothetical": the database sequence is unique
      ● "Conserved hypothetical": several hits, but none with a known function
      ● Multi-domain proteins: a hit may cover only one domain
      ● The BLAST database contains incorrect annotations
      ● Annotation is at the whim of whoever deposited the sequence
      Classic example: IMPDH, Wu et al. (2003) Comp. Biol. Chem. 27: 37-47
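The "hypothetical" and "conserved hypothetical" labels fall out of naive best-hit annotation transfer, which is also how database errors propagate. The Python sketch below makes that rule explicit. It assumes BLAST+ tabular output written with a custom column list such as `-outfmt "6 qseqid sseqid evalue bitscore stitle"`; the E-value cutoff of 1e-5, the keyword test for "hypothetical" and the function names are illustrative assumptions, not a recommended annotation method.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# ASSUMED column order: qseqid sseqid evalue bitscore stitle
Hit = Tuple[str, float, float, str]  # (subject id, e-value, bit score, subject title)


def parse_blast_tab(lines: Iterable[str]) -> Dict[str, List[Hit]]:
    """Group tabular BLAST hits by query ID (hypothetical helper)."""
    hits: Dict[str, List[Hit]] = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blanks and comment lines
        qseqid, sseqid, evalue, bitscore, stitle = line.rstrip("\n").split("\t", 4)
        hits[qseqid].append((sseqid, float(evalue), float(bitscore), stitle))
    return hits


def naive_label(query: str, hits: Dict[str, List[Hit]], max_evalue: float = 1e-5) -> str:
    """Label a query the way naive best-hit annotation transfer would."""
    significant = [h for h in hits.get(query, []) if h[1] <= max_evalue]
    if not significant:
        return "hypothetical protein"  # no significant hits: the sequence looks unique
    informative = [h for h in significant if "hypothetical" not in h[3].lower()]
    if not informative:
        return "conserved hypothetical protein"  # hits exist, none with a known function
    best = max(informative, key=lambda h: h[2])  # highest bit score wins
    # The depositor's description is copied verbatim, errors included.
    return best[3]
```

The last branch is the weak point: if the top hit was mis-annotated (as in the IMPDH case), or matches only one domain of a multi-domain protein, its full description is inherited anyway.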
  • 10. A better annotation tool: InterProScan
      ● IPRScan is a tool to search the InterPro database
      ● It uses sequence signature profiles, which are more sensitive than BLAST
      ● It integrates the search results from multiple member databases
      ● A good first step to characterise a new sequence
      ● Available as a standalone package that runs on clusters
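To illustrate what "integrates the search results from multiple databases" means in practice, this Python sketch groups InterProScan hits by protein and InterPro entry, recording which member databases support each entry. It assumes the tab-separated output layout of InterProScan 5 (protein accession in column 1, member database in column 4, InterPro accession and description in columns 12 and 13); the iprscan release contemporary with this talk wrote a different format, so treat the column indices as an assumption to check against your version. `integrate_iprscan` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Entry = Tuple[str, str]  # (InterPro accession, InterPro description)


def integrate_iprscan(lines: Iterable[str]) -> Dict[str, Dict[Entry, Set[str]]]:
    """Map protein -> {InterPro entry: set of member databases that hit it}.

    ASSUMES InterProScan 5 TSV columns: 1 = protein accession, 4 = analysis
    (member database), 12-13 = InterPro accession / description.
    """
    result: Dict[str, Dict[Entry, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 13:
            continue  # row carries no InterPro integration columns
        protein, analysis = cols[0], cols[3]
        ipr_acc, ipr_desc = cols[11], cols[12]
        if not ipr_acc or ipr_acc == "-":
            continue  # member-database signature not integrated into an InterPro entry
        result[protein][(ipr_acc, ipr_desc)].add(analysis)
    return result
```

An entry supported by several independent member databases is usually a more trustworthy annotation than a single BLAST hit.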
  • 11. Structure prediction: threading and modelling
      ● The structure of a protein often explains how it functions
      ● However, structure determination is laborious, difficult and time-consuming
      ● Modelling can be useful when the sequence is similar to that of a known structure
      Threading: fit the query sequence to a fold database
      Homology modelling: assume similar sequence = similar structure
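Homology modelling rests on "similar sequence = similar structure", so a usual first check is the percent identity between the query and a candidate template in their alignment. This Python sketch computes identity over gap-free aligned columns (one of several definitions in common use) and applies a rule-of-thumb cutoff. The 30% threshold is a widely quoted heuristic, not a value from this talk, and both function names are hypothetical.

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap ('-')."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_query.upper(), aligned_template.upper())
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs)


def suggest_approach(identity: float, threshold: float = 30.0) -> str:
    """ASSUMED rule of thumb: high identity favours homology modelling, low favours threading."""
    return "homology modelling" if identity >= threshold else "threading / fold recognition"


if __name__ == "__main__":
    pid = percent_identity("AC-GT", "ACTGA")
    print(pid, suggest_approach(pid))  # 75.0 homology modelling
```

Other definitions divide by the full alignment length or by the shorter sequence, which gives lower numbers for gappy alignments; the threshold has to be read against whichever definition is used.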
  • 12. Some modelling tools and databases
      ● SwissModel: http://swissmodel.expasy.org/
      ● MODELLER: http://www.salilab.org/modeller/
      ● PROSPECT: http://compbio.ornl.gov/structure/prospect2/
      ● ModBase: http://modbase.compbio.ucsf.edu/
  • 13. Introduction to M. burtonii
      [Figures: M. burtonii; Ace Lake, Vestfold Hills; the Archaea]
      Methanococcoides burtonii
      ● Isolated from Ace Lake, Antarctica (1-2 °C)
      ● Grows optimally at 23 °C
      ● Is an archaeon
      ● Is a psychrophilic methanogen
  • 14. The M. burtonii genome
      What features of this genome are related to cold adaptation?
  • 15. Discovery of CSP-like proteins in M. burtonii
      ● CSP = cold shock protein
      ● Expressed in bacteria at low temperature
      ● Functions as an RNA chaperone to facilitate transcription at low temperature
      ● Canonical CSPs are present in some Archaea, including M. frigidum, but not in M. burtonii
  • 16. Discovery of CSP-like proteins in M. burtonii
      [Figure: protein sequences threaded with PROSPECT against cold shock domain (CSD) folds, then modelled with MODELLER; structures shown: d1sro__ and the M. burtonii YP_564958 structural model]
      ● Both proteins are expressed (proteomics)
      ● They are located in a putative exosome/proteasome superoperon
      ● This is consistent with their proposed function
  • 17. Integrating information: structural RNA study
      Is tRNA GC content related to OGT (optimum growth temperature)?
      [Figure: tRNA % GC (stems; all bases) plotted against OGT (°C)]
      ● tRNAScan: find tRNA in genomes
      ● GC content calculated using Perl scripts
      Dihydrouridine (hU) in M. burtonii
      ● tRNA contains > 1 hU/tRNA
      ● Maintains flexibility at low temperature
      ● DUS (dihydrouridine synthase) gene identified using iprscan
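The GC calculations on this slide were done with Perl scripts; the Python sketch below shows the same kind of calculation. It assumes each tRNA comes with a secondary-structure string of the same length in which paired (stem) positions are marked either by brackets or by the `>`/`<` characters used in tRNAscan-SE secondary-structure output, and it adds a plain Pearson correlation for asking whether GC content tracks OGT. All function names are hypothetical.

```python
from math import sqrt
from typing import Sequence

PAIRED = set("()<>")  # ASSUMED stem markers: dot-bracket or tRNAscan-SE style


def gc_percent(seq: str) -> float:
    """GC content (%) of a nucleotide sequence, case-insensitive."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def stem_gc_percent(seq: str, structure: str) -> float:
    """GC content (%) of base-paired (stem) positions only."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must be the same length")
    stems = "".join(base for base, s in zip(seq, structure) if s in PAIRED)
    return gc_percent(stems)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation, e.g. per-organism tRNA GC% against OGT."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)
```

Computing stem GC separately from all-base GC matters because G·C pairs in the stems are what stabilise the folded tRNA, while loop bases contribute little to stability.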
  • 18. Pyrrolysine: a problem for bioinformatics
      ● Proteomics used to identify expressed proteins
      ● One is trimethylamine methyltransferase (TMA-MT)
      ● It shows post-translational modification
      ● It also maps to 2 ORFs in the genome sequence
      ● The ORFs are actually one gene with a read-through UAG codon
      ● Pyrrolysine is incorporated at the UAG
      ● This is the 22nd genetically encoded amino acid
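Why the TMA-MT gene looked like two ORFs can be reproduced with a standard-code translation: the in-frame UAG (TAG in the DNA) ends the first ORF. This Python sketch optionally reads TAG as pyrrolysine (one-letter code O) instead. Reading every in-frame TAG as pyrrolysine is a deliberate simplification for illustration; it does not model how the cell chooses between stop and pyrrolysine, and `translate` is a hypothetical helper.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons in TCAG order; '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str, read_through_tag: bool = False) -> str:
    """Translate from the first base until the first stop codon.

    If read_through_tag is True, TAG (UAG in mRNA) is read as pyrrolysine 'O'
    instead of stop. SIMPLIFICATION: real UAG decoding is context-dependent.
    """
    dna = dna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        if read_through_tag and codon == "TAG":
            protein.append("O")  # pyrrolysine
            continue
        aa = CODON_TABLE.get(codon, "X")  # 'X' for codons with ambiguous bases (e.g. N)
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    orf = "ATGTAGAAATAA"
    print(translate(orf))                         # M
    print(translate(orf, read_through_tag=True))  # MOK
```

An ORF caller that treats TAG only as stop will split such a gene in two, which is exactly the mismatch proteomics exposed here.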
  • 19. Statistical analysis of protein properties
      [Workflow figure]
        Archaea: 27 organisms, 62,338 ORFs; Bacteria: 52 organisms, 165,192 ORFs
        → amino acid frequency (BioPerl)
        → data matrix: organisms (rows) × composition (columns)
        → PCA (R stats package) → principal components
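The first two steps of this workflow (amino acid frequencies, then an organisms × composition matrix) can be sketched in Python as follows. The original used BioPerl; here the input is assumed to be a dict mapping each organism to its list of protein sequences, residues are pooled across all ORFs (the slide does not say whether pooled or per-protein averages were used), and non-standard symbols are ignored. All names are hypothetical.

```python
from typing import Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues; fixes the column order


def composition(proteins: List[str]) -> List[float]:
    """Fraction of each standard amino acid, pooled over all residues of all proteins."""
    counts = dict.fromkeys(AMINO_ACIDS, 0)
    for seq in proteins:
        for residue in seq.upper():
            if residue in counts:  # skip X, *, U (Sec), O (Pyl) and other symbols
                counts[residue] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard residues found")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def composition_matrix(proteomes: Dict[str, List[str]]) -> Tuple[List[str], List[List[float]]]:
    """Rows = organisms (sorted by name), columns = AMINO_ACIDS."""
    names = sorted(proteomes)
    return names, [composition(proteomes[name]) for name in names]
```

Pooling weights each organism's composition by residue count, so long proteins contribute more than short ones; averaging per-protein compositions would be the alternative.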
  • 20. Principal components analysis of composition
      ● Two components explain most of the variation in amino acid composition
      ● PC1 correlates with genome GC content
      ● PC2 correlates with optimum growth temperature
      ● The psychrophilic archaea are distinguished by their PC2 score
      ● Their proteins contain more Gln, Ser, Thr, His and Asp, and less Leu, Trp and Glu
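The PCA itself was run with R's stats package; in practice `prcomp` in R or `numpy.linalg.svd` in Python would do the job. To show what the decomposition computes without extra dependencies, this stdlib-only Python sketch centres the columns, builds the covariance matrix and extracts the leading components by power iteration with deflation. It assumes unscaled (covariance) PCA, which the slides do not specify, and is only meant for small matrices such as 79 organisms × 20 amino acids. Function names are hypothetical.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def column_means(data: Matrix) -> List[float]:
    n = len(data)
    return [sum(row[j] for row in data) / n for j in range(len(data[0]))]


def covariance(data: Matrix) -> Matrix:
    """Sample covariance matrix of the columns (rows = organisms)."""
    means = column_means(data)
    centred = [[x - m for x, m in zip(row, means)] for row in data]
    n, p = len(data), len(means)
    return [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(p)]
            for i in range(p)]


def power_iteration(m: Matrix, iters: int = 500, seed: int = 0) -> Tuple[float, List[float]]:
    """Largest eigenvalue and unit eigenvector of a symmetric positive semi-definite matrix."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(len(m))]
    s = sum(x * x for x in v) ** 0.5
    v = [x / s for x in v]
    eigenvalue = 0.0
    for _ in range(iters):
        w = [sum(a * b for a, b in zip(row, v)) for row in m]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break  # no variance left in this matrix
        v = [x / norm for x in w]
        eigenvalue = norm  # ||M v|| for unit v converges to the top eigenvalue
    return eigenvalue, v


def pca(data: Matrix, k: int = 2) -> List[Tuple[float, List[float]]]:
    """Top-k components as (fraction of total variance, loading vector)."""
    cov = covariance(data)
    total = sum(cov[i][i] for i in range(len(cov)))  # trace = total variance
    components = []
    for _ in range(k):
        lam, vec = power_iteration(cov)
        components.append((lam / total if total else 0.0, vec))
        # Deflate: remove this component so the next pass finds the next one.
        cov = [[cov[i][j] - lam * vec[i] * vec[j] for j in range(len(vec))]
               for i in range(len(vec))]
    return components


def scores(data: Matrix, loading: List[float]) -> List[float]:
    """Project each centred row (organism) onto a loading vector: its PC score."""
    means = column_means(data)
    return [sum((x - m) * w for x, m, w in zip(row, means, loading)) for row in data]
```

The per-organism scores are what would be plotted, or correlated against genome GC content and optimum growth temperature; the sign of each loading vector is arbitrary, so only relative positions along a component are meaningful.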
  • 21. Conclusions
      ● Computational biology and bioinformatics are essential to modern biology
      ● Many tools are available to annotate proteins, both web-based and standalone
      ● Without experiments, bioinformatics is just predictions
      ● Data integration is our biggest problem
      www.uq.edu.au/~uqnsaun1/