AgBase: bioinformatics enabling knowledge generation from agricultural omics data
Fiona McCarthy
Summary
• 'omics' technologies: the 'data deluge'
• organising data: bioinformatics and biocuration
• data sharing and analysis: bio-ontologies
• from data to knowledge
• making sense of agricultural data
Databases and Biological Data
• The number of databases has increased
    - Sequence repositories: NCBI, EMBL, DDBJ
    - Model Organism Databases (MODs)
    - Specialist biological databases or 'knowledge databases' (e.g. InterPro, interaction databases, gene expression data)
• Need to connect information in different databases
• Databases are increasing in size and complexity
[Figure: growth charts. Y-axes: No. (0 to 25,000) and No. x 10^6 (0 to 18). X-axes: '00 to '09 and '70 to '05.]
Generating Biological Data
• Amount of biological data is increasing exponentially
• Completed and ongoing genome sequencing projects
• High throughput “omics” technologies
    - New sequencing technologies
    - Existing microarrays
    - Proteomics
Biocomputing
• New technologies let 'omics' methods move from large databases and consortia into individual laboratories
• Managing this data:
    - acquire
    - store
    - access
    - analyze
    - visualize
    - share
NIH WORKING DEFINITION OF BIOINFORMATICS AND
          COMPUTATIONAL BIOLOGY


Bioinformatics: Research, development, or application of
computational tools and approaches for expanding the use
of biological, medical, behavioral or health data, including
those to acquire, store, organize, archive, analyze, or
visualize such data.

Computational Biology: The development and application of
data-analytical and theoretical methods, mathematical
modeling and computational simulation techniques to the
study of biological, behavioral, and social systems.
Bioinformatics
• Managing data
    - different file formats
    - linking between different databases
• Adding value
    - multiple levels of information from one 'omics' data set
    - re-analysis
    - linking data sets
• Organizing
    - annotating data
    - biocuration - annotation
Annotation
• ANNOTATE: to denote or demarcate
• Genome annotation is the process of attaching biological information to genomic sequences. It consists of two main steps:
1. identifying functional elements in the genome: “structural annotation”
2. attaching biological information to these elements: “functional annotation”
Community Annotation
• Researchers are the domain experts, but relatively few contribute to annotation
    - time
    - 'reward' & 'employer/funding agency recognition'
    - training: easy to use tools, clear instructions
• Required submission
• Community annotation
    - Groups with special interest do focused annotation or ontology development
    - As part of a meeting/conference or distributed (e.g. wikis)
• Students!
Biocuration
• biocurators are biologists who are trained to annotate biological data (using database structures, bio-ontologies, etc.).
• databases use biocuration to enhance value of biological data
    - “knowledge databases”
• but how to ensure data consistency between databases?
What Are Ontologies?
“An ontology is a controlled vocabulary of well defined terms
with specified relationships between those terms, capable of
interpretation by both humans and computers.”
• Bio-ontologies are used to capture biological information in a way that can be read by both humans and computers
    - annotate data in a consistent way
    - allows data sharing across databases
    - allows computational analysis of high-throughput “omics” datasets
• Objects in an ontology (e.g. genes, cell types, tissue types, stages of development) are well defined.
• The ontology shows how the objects relate to each other
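As a rough illustration of the points above, an ontology term can be modelled as an identifier for computers, a name and definition for humans, and typed links to other terms. This is only a sketch: the `Term` class, the `TOY:` identifiers and the definitions are hypothetical and do not come from any real ontology.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    term_id: str      # digital identifier (for computers)
    name: str         # short label (for humans)
    definition: str   # description (for humans)
    # relationship type -> list of related term IDs
    relationships: dict = field(default_factory=dict)


# Hypothetical toy terms; the TOY: identifiers are not real ontology IDs
# and the definitions are placeholder wording.
TERMS = [
    Term("TOY:0001", "plant structure", "Any anatomical part of a plant."),
    Term("TOY:0002", "plant organ", "A plant structure made of several tissues.",
         {"is_a": ["TOY:0001"]}),
    Term("TOY:0003", "leaf", "A plant organ borne on a stem.",
         {"is_a": ["TOY:0002"]}),
]
ONTOLOGY = {term.term_id: term for term in TERMS}


def describe(term_id):
    """Render a term with its relationships resolved to human-readable names."""
    term = ONTOLOGY[term_id]
    links = [
        f"{rel} {ONTOLOGY[target].name}"
        for rel, targets in term.relationships.items()
        for target in targets
    ]
    return f"{term.term_id} | {term.name} | {'; '.join(links) or 'no parents'}"


print(describe("TOY:0003"))  # TOY:0003 | leaf | is_a plant organ
```

Because every annotation points at a `term_id` rather than free text, two databases that use the same ontology can exchange and compare annotations directly.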
Ontologies
[Figure: an ontology term, with callouts for its digital identifier (computers), its description (humans) and its relationships between terms]
Gene Ontology version 1.1348 (27/07/2010):
32,091 terms, 99.3% defined
    19,169 biological process
    2,745 cellular component
    8,736 molecular function
1,441 obsolete terms (not included in figures above)
Relationships: the True Path Rule
• Why are relationships between terms important?
• TRUE PATH RULE: all attributes of children must hold for all parents
• so if a protein is annotated to a term, it must also be true for all the parent terms
• this enables us to move up the ontology structure from a granular term to a broader term
    - Premise of many GO analysis tools
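A minimal sketch of how an analysis tool might apply the true path rule: each direct annotation is expanded to include every ancestor term. The `PARENTS` graph, the `TOY:` identifiers and the protein names are hypothetical; a real tool would load the parent links from the ontology file.

```python
# Hypothetical toy ontology: child term -> set of direct parent terms.
PARENTS = {
    "TOY:0001": set(),                     # root term
    "TOY:0002": {"TOY:0001"},
    "TOY:0005": {"TOY:0001"},
    "TOY:0003": {"TOY:0002"},
    "TOY:0004": {"TOY:0002", "TOY:0005"},  # a term may have several parents
}


def ancestors(term_id, parents=PARENTS):
    """Return every term reachable by following parent links upward."""
    seen = set()
    stack = [term_id]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate(direct_annotations, parents=PARENTS):
    """Map each gene product to its direct terms plus all ancestor terms."""
    full = {}
    for product, terms in direct_annotations.items():
        implied = set(terms)
        for term in terms:
            implied |= ancestors(term, parents)
        full[product] = implied
    return full


direct = {"protA": {"TOY:0003"}, "protB": {"TOY:0004"}}
full = propagate(direct)
print(sorted(full["protA"]))  # ['TOY:0001', 'TOY:0002', 'TOY:0003']
```

After propagation, gene products annotated to different granular terms can be grouped under a shared broader term, here `TOY:0002`.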
Genomic Annotation
Structural Annotation:
• Open reading frames (ORFs) predicted during genome assembly
• predicted ORFs require experimental confirmation

Functional Annotation:
• annotation of gene products = Gene Ontology (GO) annotation
• initially, predicted ORFs have no functional literature and GO annotation relies on computational methods (rapid)
• functional literature exists for many genes/proteins prior to genome sequencing
• Gene Ontology annotation does not rely on a completed genome sequence
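To make the structural annotation step above concrete, here is a deliberately simple ORF scan. It is a sketch under strong assumptions (forward strand only, ATG as the only start codon, the three standard stop codons) and is not how production gene predictors work; the function name and the `min_codons` parameter are illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Find forward-strand ORFs in all three reading frames.

    Returns (start, end, frame) tuples, where end is exclusive and the
    span includes the stop codon. min_codons counts start and stop codons.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None                   # close it and keep scanning
    return orfs


sequence = "CCATGAAATTTTAGGG"
for start, end, frame in find_orfs(sequence):
    print(frame, sequence[start:end])  # 2 ATGAAATTTTAG
```

Predictions like these are only candidates, which is why the slide stresses experimental confirmation.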
Genomic Annotation
[Figure: the components of genomic annotation]
• Structural annotation, including Sequence Ontology
• Nomenclature (species' genome nomenclature committees)
• Functional annotation using Gene Ontology
• Other annotations using other bio-ontologies, e.g. Anatomy Ontology
http://obo.sourceforge.net/
Gene Ontology
Plant Ontology
Sequence Ontology
Trait Ontology
Expression/Tissue Ontologies
Infectious Disease Ontology
Cell Ontology
Bio-ontology requirements
• bio-ontologies (Open Biomedical Ontologies)
• computational pipelines ('breadth')
    - for computational annotations
    - useful for gene products without published information
• manual biocuration ('depth')
    - requires trained biocurators
    - community annotation efforts
    - each species has its own body of literature
• biocuration co-ordination
    - MODs? Consortium? Community?
    - biocuration prioritization
    - co-ordination with existing databases, annotation, nomenclature initiatives
    - data updates
Gene Ontology (GO)
• de facto method for functional annotation
• Assigns functions based upon Biological Process, Molecular Function, Cellular Component
• Widely used for functional genomics (high throughput)
• Many tools available for gene expression analysis using GO

http://www.geneontology.org
Plant Ontology (PO)
• describes plant structures and growth and developmental stages
• Currently used for Arabidopsis, maize, rice; more being added (soybean, tomato, cotton, etc.)
• Plant Structure: describes morphological and anatomical structures representing organ, tissue and cell types
• Growth and developmental stages: describes (i) whole plant growth stages and (ii) plant structure developmental stages

http://www.plantontology.org/
Use GO for...
1.   Determining which classes of gene products
     are over-represented or under-represented.
2.   Grouping gene products.
3.   Relating a protein's location to its function.
4.   Focusing on particular biological pathways
     and functions (hypothesis-testing).
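Point 1 above is commonly tested with a hypergeometric (one-sided Fisher) test. The sketch below is one way to do it with the standard library only. The function name and the example counts are hypothetical, and real GO tools also correct for testing many terms at once, which is omitted here.

```python
from math import comb


def enrichment_p_value(study_hits, study_size, population_hits, population_size):
    """Upper-tail hypergeometric probability.

    Probability of seeing at least `study_hits` gene products annotated to a
    GO term in a study set of `study_size`, drawn without replacement from a
    population of `population_size` in which `population_hits` carry the term.
    """
    total = comb(population_size, study_size)
    upper = min(study_size, population_hits)
    tail = sum(
        comb(population_hits, k)
        * comb(population_size - population_hits, study_size - k)
        for k in range(study_hits, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20,000 gene products on an array, 400 annotated to
# the term, 200 differentially expressed, 15 of those annotated to the term.
p = enrichment_p_value(15, 200, 400, 20000)
print(f"over-representation p-value: {p:.3g}")
```

Under-representation is the mirror image: sum the lower tail (from 0 up to `study_hits`) instead of the upper tail.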
Ontologies               Pathways & Networks
GO Cellular Component    Pathway Studio 5.0
GO Biological Process    Ingenuity Pathway Analyses
GO Molecular Function    Cytoscape
BRENDA                   Interactome Databases

Functional Understanding
http://www.agbase.msstate.edu/
1.   Provides structural annotation for
     agriculturally important genomes
2.   Provides functional annotation (GO)
3.   Provides tools for functional modeling
4.   Provides bioinformatics & modeling
     support for research community
Avian Gene Nomenclature
GO & PO: literature annotation for rice,
 computational annotation for rice,
 maize, sorghum, Brachypodium

1. Literature annotation for Agrobacterium
   tumefaciens, Dickeya dadantii,
   Magnaporthe grisea, Oomycetes
2. Computational annotation for
   Pseudomonas syringae pv tomato,
   Phytophthora spp and the nematode
   Meloidogyne hapla.


   Literature annotation for chicken,
      cow, maize, cotton;
   Computational annotation for
      agricultural species & pathogens.

literature annotation for human;
computational annotation for
UniProtKB entries (237,201 taxa).
Comparing AgBase & EBI-GOA Annotations
[Figure: bar chart of gene products annotated (0 to 14,000) per project: AgBase Chick, EBI-GOA Chick, AgBase Cow, EBI-GOA Cow. Legend: computational; manual - sequence; manual - literature.]
Complementary to EBI-GOA: GenBank proteins not represented in UniProt & EST sequences on arrays
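A breakdown like the one in this chart could be produced by grouping annotations on their GO evidence code. The sketch below is an assumption about how the three legend categories map to evidence codes; the slides do not state the mapping. The records are hypothetical and their GO IDs are placeholders.

```python
# Assumed mapping from GO evidence codes to the three chart categories.
COMPUTATIONAL = {"IEA"}
MANUAL_SEQUENCE = {"ISS", "ISO", "ISA", "ISM"}
MANUAL_LITERATURE = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP", "TAS", "NAS", "IC"}


def category(evidence_code):
    """Place one evidence code into a chart category."""
    if evidence_code in COMPUTATIONAL:
        return "computational"
    if evidence_code in MANUAL_SEQUENCE:
        return "manual - sequence"
    if evidence_code in MANUAL_LITERATURE:
        return "manual - literature"
    return "other"


def products_per_category(annotations):
    """Count distinct gene products per category.

    annotations: iterable of (gene_product_id, go_id, evidence_code) tuples.
    A product with annotations in several categories is counted in each.
    """
    products = {}
    for product_id, _go_id, evidence_code in annotations:
        products.setdefault(category(evidence_code), set()).add(product_id)
    return {name: len(ids) for name, ids in products.items()}


# Hypothetical records; the GO IDs are placeholders.
records = [
    ("P1", "GO:0000001", "IEA"),
    ("P1", "GO:0000002", "IDA"),
    ("P2", "GO:0000001", "ISS"),
    ("P3", "GO:0000001", "IEA"),
    ("P3", "GO:0000002", "IEA"),
]
print(products_per_category(records))
```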
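Contribution to GO Literature Biocuration
[Figure: pie charts of contributors to GO literature biocuration for chicken and cow]
            AgBase    EBI GOA
Chicken     97.82%    < 0.50%
Cow         88.78%    < 1.50%
Other contributors in the legend: EBI-IntAct, Roslin, HGNC, UCL-Heart project, MGI, Reactome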
AgBase Quality Checks & Releases
[Figure: annotation release workflow]
1. AgBase biocurators enter annotations through the AgBase biocuration interface (‘sanity’ check).
2. AgBase database (‘sanity’ check): used by GO analysis tools and microarray developers.
3. EBI GOA Project (‘sanity’ check & GOC QC): feeds the UniProt db, QuickGO browser, GO analysis tools and microarray developers.
4. GO Consortium database (‘sanity’ check & GOC QC): feeds public databases, AmiGO browser, GO analysis tools and microarray developers.
‘sanity’ check: checks to ensure all appropriate information is captured, no obsolete GO:IDs are used, etc.
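The 'sanity' check described above could look something like the following. This is a hedged sketch, not AgBase's actual implementation: the field names in `REQUIRED_FIELDS`, the record layout and the obsolete ID in the example are all hypothetical.

```python
import re

# Hypothetical minimum set of fields an annotation record must carry.
REQUIRED_FIELDS = ("gene_product", "go_id", "evidence", "reference")
GO_ID_PATTERN = re.compile(r"GO:\d{7}")  # GO IDs are "GO:" plus seven digits


def sanity_check(record, obsolete_ids):
    """Return a list of problems found in one annotation record (empty if clean)."""
    problems = [f"missing {name}" for name in REQUIRED_FIELDS if not record.get(name)]
    go_id = record.get("go_id", "")
    if go_id and not GO_ID_PATTERN.fullmatch(go_id):
        problems.append("malformed GO ID")
    elif go_id in obsolete_ids:
        problems.append("obsolete GO ID")
    return problems


# Hypothetical data: GO:9999999 stands in for a term marked obsolete.
obsolete = {"GO:9999999"}
record = {
    "gene_product": "P12345",
    "go_id": "GO:9999999",
    "evidence": "IDA",
    "reference": "",
}
print(sanity_check(record, obsolete))  # ['missing reference', 'obsolete GO ID']
```

Running the same check at each hand-off (biocuration interface, AgBase database, EBI GOA, GO Consortium) catches terms that became obsolete between releases.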
Quality improvement Microarray annotations
IITA Crops
• cowpea: “reduced representation” sequencing underway
• soybean: preliminary assembly
• banana: sequencing in progress
• yam: genome sequencing for Dioscorea alata; EST development (IITA & VSU)
• cassava: genome sequencing in progress
• maize: genome sequencing completed; other subspecies being sequenced
Cowpea
• 54,123 genome sequences
• 187,483 ESTs
• Annotated via homology to Arabidopsis & other plants
• GO annotation via homology: availability?
Soybean
• NCBI: 1,459,639 ESTs, 34,946 proteins, 2,882 genes
• UniProt: 12,837 proteins (EBI GOA automatic GO annotation)
• UniGene assemblies available
• multiple microarrays available
Banana
• 7,102 genome sequences
• 14,864 ESTs
• 1,399 NCBI proteins; 680 UniProt
• Musa acuminata (sweet banana): 3,898 GO annotations to 491 proteins
• Musa acuminata AAA Group (Cavendish banana): 579 annotations to 96 proteins
Plantain
• Musa ABB Group (taxon:214693): cooking banana or plantain
• 11,070 ESTs, 112 proteins
• 173 GO annotations to 53 proteins
• functional genomics based on banana?
Yams
Taxon ID    Species                 Common name
55577       Dioscorea rotundata     white yam
55571       Dioscorea alata         water yam
29710       Dioscorea cayenensis    yellow yam

• Dioscorea (taxon:4672) & subspecies
• NCBI: 31 ESTs, 623 proteins
• Genome sequencing for Dioscorea alata; EST development (IITA & VSU)
• 183 GO annotations to 25 proteins
Cassava
• ESTs: 80,631
• NCBI proteins: 568, UniProt: 253
• 2,251 GO annotations assigned to 218 proteins
• 2 Euphorbia esula (leafy spurge) / cassava arrays
Maize
• Zea mays (taxon:4577)
• Genome sequencing completed by Washington University; other subspecies being sequenced
• Active GO annotation project: 131,925 GO annotations to 20,288 proteins
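The crop slides quote annotation and protein counts separately. A quick way to compare them is annotations per annotated protein. The sketch below uses only the counts stated on the slides; the `density` helper and the dictionary layout are illustrative.

```python
# (GO annotations, annotated proteins) as quoted on the crop slides.
COUNTS = {
    "Musa acuminata (sweet banana)": (3898, 491),
    "Musa acuminata AAA Group (Cavendish)": (579, 96),
    "Musa ABB Group (plantain)": (173, 53),
    "Dioscorea (yams)": (183, 25),
    "Cassava": (2251, 218),
    "Maize": (131925, 20288),
}


def density(annotations, proteins):
    """Average number of GO annotations per annotated protein."""
    return annotations / proteins


for crop, (annotations, proteins) in COUNTS.items():
    print(f"{crop}: {density(annotations, proteins):.1f} annotations per protein")
```

The ratio says nothing about coverage: yam, for example, has only 25 annotated proteins out of the 623 NCBI proteins listed.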
AgBase Collaborative Model
• How can we help you?
• Can make GO annotations public via the GO Consortium
• Have computational pipelines to do rapid, first pass GO annotation (including transcript/EST sequences)
• Provide bioinformatics support for collaborators
• Developing new tools
• Training/support for modeling data
Dr Teresia Buza
Dr Susan Bridges
Cathy Grisham
Divya Pedinti
Lakshmi Pillai
Philippe Chouvarine
Seval Ozkan
Hui Wang
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 

Recently uploaded (20)

Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 

bioinformatics enabling knowledge generation from agricultural omics data

  • 1. AgBase: bioinformatics enabling knowledge generation from agricultural omics data (Fiona McCarthy)
  • 2. Summary
      • 'omics' technologies: the 'data deluge'
      • organising data: bioinformatics and biocuration
      • data sharing and analysis: bio-ontologies
      • from data to knowledge
      • making sense of agricultural data
  • 3. Databases and Biological Data
      • The number of databases has increased
          • Sequence repositories: NCBI, EMBL, DDBJ
          • Model Organism Databases (MODs)
          • Specialist biological databases or 'knowledge databases' (e.g. InterPro, interaction databases, gene expression data)
      • Need to connect information in different databases
      • Databases are increasing in size and complexity
  • 4. [Figure: two growth charts; y-axes labelled "No." and "No. x 10^6"; x-axes spanning 1970 to 2005 and 2000 to 2009]
  • 5. Generating Biological Data
      • Amount of biological data is increasing exponentially
      • Completed and ongoing genome sequencing projects
      • High throughput "omics" technologies
          • New sequencing technologies
          • Existing microarrays
          • Proteomics
  • 6.
  • 7. Biocomputing
      • New technologies let 'omics' work move from large databases and consortia into individual laboratories
      • Managing this data:
          • acquire
          • store
          • access
          • analyze
          • visualize
          • share
  • 8. NIH Working Definition of Bioinformatics and Computational Biology
      • Bioinformatics: Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.
      • Computational Biology: The development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems.
  • 9. Bioinformatics
      • Managing data
          • different file formats
          • linking between different databases
      • Adding value
          • multiple levels of information from one 'omics' data set
          • re-analysis
          • linking data sets
      • Organizing
          • annotating data
          • biocuration (annotation)
  • 10. Annotation
      • ANNOTATE: to denote or demarcate
      • Genome annotation is the process of attaching biological information to genomic sequences. It consists of two main steps:
          1. identifying functional elements in the genome: "structural annotation"
          2. attaching biological information to these elements: "functional annotation"
  • 11. Community Annotation
      • Researchers are the domain experts, but relatively few contribute to annotation
          • time
          • 'reward' & 'employer/funding agency recognition'
          • training: easy to use tools, clear instructions
      • Required submission
      • Community annotation
          • Groups with special interest do focused annotation or ontology development
          • As part of a meeting/conference or distributed (e.g. wikis)
      • Students!
  • 12. Biocuration
      • Biocurators are biologists who are trained to annotate biological data (using database structures, bio-ontologies, etc.)
      • Databases use biocuration to enhance the value of biological data
          • "knowledge databases"
      • But how to ensure data consistency between databases?
  • 13. What Are Ontologies?
      • "An ontology is a controlled vocabulary of well defined terms with specified relationships between those terms, capable of interpretation by both humans and computers."
      • Bio-ontologies are used to capture biological information in a way that can be read by both humans and computers
          • annotate data in a consistent way
          • allows data sharing across databases
          • allows computational analysis of high-throughput "omics" datasets
      • Objects in an ontology (e.g. genes, cell types, tissue types, stages of development) are well defined.
      • The ontology shows how the objects relate to each other
  • 14. Ontologies
      • relationships between terms
      • digital identifier (computers)
      • description (humans)
      • Gene Ontology version 1.1348 (27/07/2010): 32,091 terms, 99.3% defined
          • 19,169 biological process
          • 2,745 cellular component
          • 8,736 molecular function
          • 1,441 obsolete terms (not included in figures above)
  • 15.
  • 16. Relationships: the True Path Rule
      • Why are relationships between terms important?
      • TRUE PATH RULE: all attributes of children must hold for all parents
          • so if a protein is annotated to a term, the annotation must also be true for all the parent terms
          • this enables us to move up the ontology structure from a granular term to a broader term
      • Premise of many GO analysis tools
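The True Path Rule on this slide can be illustrated with a small sketch. It is not taken from the talk: the term names and the `PARENTS` mapping are made up for illustration, and real GO tools read the parent links from the ontology files rather than from a hand-written dictionary.

```python
# Toy ontology: child term -> list of parent terms (hypothetical IDs).
PARENTS = {
    "TERM:leaf": ["TERM:mid"],
    "TERM:mid": ["TERM:root"],
    "TERM:root": [],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent links from `term`."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def propagate(direct_annotations, parents=PARENTS):
    """Expand each protein's direct terms to include all ancestor terms."""
    expanded = {}
    for protein, terms in direct_annotations.items():
        full = set(terms)
        for term in terms:
            full |= ancestors(term, parents)
        expanded[protein] = full
    return expanded
```

Calling `propagate({"P1": ["TERM:leaf"]})` attaches `TERM:mid` and `TERM:root` to `P1` as well, which is the "move up from a granular term to a broader term" step the slide describes.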
  • 17. Genomic Annotation
      • Structural Annotation:
          • Open reading frames (ORFs) predicted during genome assembly
          • predicted ORFs require experimental confirmation
      • Functional Annotation:
          • annotation of gene products = Gene Ontology (GO) annotation
          • initially, predicted ORFs have no functional literature and GO annotation relies on computational methods (rapid)
          • functional literature exists for many genes/proteins prior to genome sequencing
      • Gene Ontology annotation does not rely on a completed genome sequence
  • 18. Genomic Annotation
      • Structural annotation, including Sequence Ontology
      • Other annotations using other bio-ontologies, e.g. Anatomy Ontology
      • Nomenclature (species' genome nomenclature committees)
      • Functional annotation using Gene Ontology
  • 19. http://obo.sourceforge.net/
      • Gene Ontology
      • Plant Ontology
      • Sequence Ontology
      • Trait Ontology
      • Expression/Tissue Ontologies
      • Infectious Disease Ontology
      • Cell Ontology
  • 20. Bio-ontology requirements
      • bio-ontologies (Open Biomedical Ontologies)
      • computational pipelines ('breadth')
          • for computational annotations
          • useful for gene products without published information
      • manual biocuration ('depth')
          • requires trained biocurators
          • community annotation efforts
          • each species has its own body of literature
      • biocuration co-ordination
          • MODs? Consortium? Community?
          • biocuration prioritization
          • co-ordination with existing databases, annotation, nomenclature initiatives
          • data updates
  • 21. Gene Ontology (GO)
      • de facto method for functional annotation
      • Assigns functions based upon Biological Process, Molecular Function, Cellular Component
      • Widely used for functional genomics (high throughput)
      • Many tools available for gene expression analysis using GO
      • http://www.geneontology.org
  • 22. Plant Ontology (PO)
      • describes plant structures and growth and developmental stages
      • Currently used for Arabidopsis, maize, rice; more being added (soybean, tomato, cotton, etc.)
      • Plant Structure: describes morphological and anatomical structures representing organ, tissue and cell types
      • Growth and developmental stages: describes (i) whole plant growth stages and (ii) plant structure developmental stages
      • http://www.plantontology.org/
  • 23. Use GO for...
      1. Determining which classes of gene products are over-represented or under-represented.
      2. Grouping gene products.
      3. Relating a protein's location to its function.
      4. Focusing on particular biological pathways and functions (hypothesis-testing).
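The first use listed on this slide, finding over-represented classes of gene products, is often framed as a hypergeometric test. The slides do not name a specific statistic, so the sketch below is only one common choice, and the function name and parameter names are illustrative assumptions.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """One-sided p-value P(X >= hits) for over-representation.

    population: genes in the background set
    annotated:  background genes carrying the GO term
    sample:     genes in the list of interest (drawn without replacement)
    hits:       genes in the list of interest carrying the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total
```

For example, `hypergeom_enrichment_p(10, 5, 2, 2)` asks how likely it is that both genes in a two-gene list carry a term held by 5 of 10 background genes. In practice the test is repeated for many terms, so a multiple-testing correction is usually applied to the resulting p-values.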
  • 24. Ontologies, Pathways & Networks [diagram]
      • Ontologies: GO Cellular Component, GO Biological Process, GO Molecular Function, BRENDA
      • Pathways & Networks: Pathway Studio 5.0, Ingenuity Pathway Analyses, Cytoscape, Interactome Databases
      • Functional Understanding
  • 26.
      1. Provides structural annotation for agriculturally important genomes
      2. Provides functional annotation (GO)
      3. Provides tools for functional modeling
      4. Provides bioinformatics & modeling support for research community
  • 28.
  • 29.
      • GO & PO: literature annotation for rice; computational annotation for rice, maize, sorghum, Brachypodium
      • 1. Literature annotation for Agrobacterium tumefaciens, Dickeya dadantii, Magnaporthe grisea, Oomycetes; 2. Computational annotation for Pseudomonas syringae pv tomato, Phytophthora spp and the nematode Meloidogyne hapla
      • Literature annotation for chicken, cow, maize, cotton; computational annotation for agricultural species & pathogens
      • Literature annotation for human; computational annotation for UniProtKB entries (237,201 taxa)
  • 30.
  • 31. Comparing AgBase & EBI-GOA Annotations
      • [Figure: bar chart of gene products annotated (0 to 14,000) by project (AgBase and EBI-GOA, for chick and cow); series: computational, manual - sequence, manual - literature]
      • Complementary to EBI-GOA: GenBank proteins not represented in UniProt & EST sequences on arrays
  • 32. Contribution to GO Literature Biocuration
      • [Figure: charts for chicken and cow; legend: AgBase, EBI GOA, EBI-IntAct, Roslin, HGNC, UCL-Heart project, MGI, Reactome; labelled values: chicken 97.82% and < 0.50%; cow 88.78% and < 1.50%]
  • 33. AgBase Quality Checks & Releases
      • [Diagram: annotations flow from AgBase biocurators through the AgBase biocuration interface and AgBase database to the EBI GOA Project (UniProt db, QuickGO browser) and the GO Consortium database (AmiGO browser), and on to public databases, GO analysis tools and microarray developers, with 'sanity' checks and GOC QC between stages]
      • 'sanity' check: checks to ensure all appropriate information is captured, no obsolete GO:IDs are used, etc.
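The 'sanity' check on this slide (required information present, no obsolete GO:IDs) could look roughly like the sketch below. This is not AgBase's actual check: the record layout, the `REQUIRED_FIELDS` names and the GO IDs used in the usage note are hypothetical placeholders.

```python
# Assumed record layout; the real AgBase fields are not given in the slides.
REQUIRED_FIELDS = ("gene_product", "go_id", "evidence", "reference")


def sanity_check(annotations, obsolete_ids):
    """Return (record index, problem description) pairs for failing records."""
    problems = []
    for index, record in enumerate(annotations):
        # Flag any required field that is absent or empty.
        missing = [field for field in REQUIRED_FIELDS if not record.get(field)]
        if missing:
            problems.append((index, "missing: " + ", ".join(missing)))
        # Flag annotations that point at a term marked obsolete.
        if record.get("go_id") in obsolete_ids:
            problems.append((index, "obsolete term: " + record["go_id"]))
    return problems
```

Given one complete record with the placeholder ID `GO:9999998` and a second record that lacks a reference and uses `GO:9999999`, calling `sanity_check(records, {"GO:9999999"})` reports two problems, both for the second record.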
  • 35.
  • 36. IITA Crops
      • cowpea: "reduced representation" sequencing underway
      • soybean: preliminary assembly
      • banana: sequencing in progress
      • yam: genome sequencing for Dioscorea alata; EST development (IITA & VSU)
      • cassava: genome sequencing in progress
      • maize: genome sequencing completed; other subspecies being sequenced
  • 37. Cowpea
      • 54,123 genome sequences
      • 187,483 ESTs
      • Annotated via homology to Arabidopsis & other plants
      • GO annotation via homology: availability?
  • 38. Soybean
      • NCBI: 1,459,639 ESTs, 34,946 proteins, 2,882 genes
      • UniProt: 12,837 proteins (EBI GOA automatic GO annotation)
      • UniGene assemblies available
      • multiple microarrays available
  • 39.
  • 40.
  • 41. Banana
      • 7,102 genome sequences
      • 14,864 ESTs
      • 1,399 NCBI proteins; 680 UniProt
      • Musa acuminata (sweet banana): 3,898 GO annotations to 491 proteins
      • Musa acuminata AAA Group (Cavendish banana): 579 annotations to 96 proteins
  • 42. Plantain
      • Musa ABB Group (taxon:214693): cooking banana or plantain
      • 11,070 ESTs, 112 proteins
      • 173 GO annotations to 53 proteins
      • functional genomics based on banana?
  • 43. Yams
      • 55577: Dioscorea rotundata (white yam)
      • 55571: Dioscorea alata (water yam)
      • 29710: Dioscorea cayenensis (yellow yam)
      • Dioscorea (taxon:4672) & subspecies
      • NCBI: 31 ESTs, 623 proteins
      • Genome sequencing for Dioscorea alata; EST development (IITA & VSU)
      • 183 GO annotations to 25 proteins
  • 44. Cassava
      • ESTs: 80,631
      • NCBI proteins: 568; UniProt: 253
      • 2,251 GO annotations assigned to 218 proteins
      • 2 Euphorbia esula (leafy spurge)/cassava arrays
  • 45. Maize
      • Zea mays (taxon:4577)
      • Genome sequencing completed by Washington University; other subspecies being sequenced
      • Active GO annotation project: 131,925 GO annotations to 20,288 proteins
  • 46.
  • 47.
  • 48.
  • 49. AgBase Collaborative Model
      • How can we help you?
      • Can make GO annotations public via the GO Consortium
      • Have computational pipelines to do rapid, first pass GO annotation (including transcript/EST sequences)
      • Provide bioinformatics support for collaborators
      • Developing new tools
      • Training/support for modeling data
  • 50. Dr Teresia Buza, Dr Susan Bridges, Cathy Grisham, Divya Pedinti, Lakshmi Pillai, Philippe Chouvarine, Seval Ozkan, Hui Wang