TAIR is a database that provides comprehensive functional annotations of the Arabidopsis thaliana genome based on manual curation of scientific literature. Researchers can use TAIR annotations to infer gene function in other plant species through orthology mapping. TAIR annotations can also be used for applications like functional categorization and term enrichment analysis. Submitting community annotations to TAIR makes research more discoverable and credits individual contributors.
TAIR - Using biological ontologies to accelerate progress in plant biology research
1. TAIR: A Sustainable Community Resource
for Arabidopsis Research
International Conference on Arabidopsis Research (ICAR 2016), GyeongJu, Korea
2. 1. TAIR: a sustainable community resource for Arabidopsis
research (Eva Huala)
2. Using biological ontologies to accelerate progress in plant
biology research (Donghui Li)
3. Community annotation: making your data and publication
more discoverable (Donghui Li)
3. Using biological ontologies to accelerate
progress in plant biology research
Donghui Li
TAIR/Phoenix Bioinformatics
4. Every year, an average of:
• Over 3000 Arabidopsis research articles are added
• Over 2000 papers are associated with genes
• Over 400 articles have gene function, expression or
phenotype data extracted
• Over 5000 experiment-based annotations are added
using controlled vocabularies (GO and PO ontologies)
Producing a ‘gold standard’ annotated reference plant genome
Highly structured, searchable, computable
functional annotations
5.
6.
7. Outline
• How do we use biological ontologies to annotate Arabidopsis
gene function?
• How do you read and interpret annotations?
• What can you do with these annotations?
8. Why do we need ontologies?
Inconsistency in free text:
Different names for the same concept
translation, protein synthesis
Same name for different concepts
Bud initiation?
9. A Gene Ontology (GO) term
Accession: GO:0006412
Name: translation
Ontology: biological_process
Synonyms: protein anabolism, protein biosynthesis, protein biosynthetic
process, protein formation, protein synthesis, protein translation
Definition: The cellular metabolic process in which a protein is formed,
using the sequence of a mature mRNA molecule to specify the
sequence of amino acids in a polypeptide chain. Translation is
mediated by the ribosome, and begins with the formation of a ternary
complex between aminoacylated initiator methionine tRNA, GTP, and
initiation factor 2, which subsequently associates with the small subunit
of the ribosome and an mRNA. Translation ends with the release of a
polypeptide chain from the ribosome. Source: GOC:go_curators
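The fields of the GO term above map naturally onto a small record type. The following is a minimal Python sketch, not TAIR's or the GO Consortium's data model: the class name `GOTerm` and the method `matches` are hypothetical, while the accession, name, ontology and synonyms are copied from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GOTerm:
    """Hypothetical record holding the main fields of one GO term."""
    accession: str
    name: str
    ontology: str
    synonyms: tuple = ()

    def matches(self, text: str) -> bool:
        """Return True if text equals the name or a synonym, ignoring case."""
        needle = text.strip().lower()
        labels = [self.name.lower()] + [s.lower() for s in self.synonyms]
        return needle in labels


# Values taken from the slide for GO:0006412.
TRANSLATION = GOTerm(
    accession="GO:0006412",
    name="translation",
    ontology="biological_process",
    synonyms=(
        "protein anabolism",
        "protein biosynthesis",
        "protein biosynthetic process",
        "protein formation",
        "protein synthesis",
        "protein translation",
    ),
)

if __name__ == "__main__":
    # Two free-text names for the same concept resolve to one accession.
    for label in ("translation", "Protein synthesis"):
        print(label, "->", TRANSLATION.accession, TRANSLATION.matches(label))
```

Resolving every synonym to a single accession is what removes the "different names for the same concept" problem in free text.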
10. Gene Ontology (GO)
molecular function: catalytic / binding activities
e.g. kinase activity, DNA binding activity
biological process: biological goal or objective
e.g. protein translation, mitosis
cellular component: location or complex
e.g. nucleus, ribosome, proteasome
More info at www.geneontology.org
11. Terms in an ontology are connected
is_a
part_of
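Because terms are connected by is_a and part_of relations, an annotation to a specific term also implies its more general ancestors. The sketch below illustrates that traversal on a toy graph; the edges are simplified for illustration and are not copied from the current GO release, and the names `EDGES` and `ancestors` are hypothetical.

```python
from collections import deque

# Toy graph (an assumption for illustration): child -> [(relation, parent)].
EDGES = {
    "nucleolus": [("part_of", "nucleus")],
    "nucleus": [("is_a", "organelle")],
    "organelle": [("is_a", "cellular component")],
    "cellular component": [],
}


def ancestors(term, edges=EDGES):
    """Collect every term reachable by following is_a / part_of upward."""
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for _relation, parent in edges.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


if __name__ == "__main__":
    print(sorted(ancestors("nucleolus")))
```

A query for genes annotated to a general term can use this kind of traversal to also return genes annotated to its more specific descendants.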
17. Commonly used evidence codes
Experimental evidence codes (EXP), with example assays:
IDA Inferred from Direct Assay (enzyme assays, in situ hybridization)
IMP Inferred from Mutant Phenotype (analysis of visible trait)
IPI Inferred from Physical Interaction (yeast-2-hybrid)
IEP Inferred from Expression Pattern (RT-PCR, Western blot)
IGI Inferred from Genetic Interaction (double mutant analysis)
Computational analysis evidence codes (non-EXP), with examples:
ISS Inferred from Sequence or Structural Similarity (based on published sequence alignment)
IEA Inferred from Electronic Annotation (InterPro2GO)
http://geneontology.org/page/guide-go-evidence-codes
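When reading an annotation file, the evidence code tells you how much weight an annotation can carry. The following Python sketch groups codes into the two categories named on the slide. It covers only the codes listed here; any other code is reported as "unlisted" rather than guessed, and the function names are hypothetical.

```python
from collections import Counter

# Codes as grouped on the slide; other GO evidence codes exist.
EXPERIMENTAL_CODES = {"IDA", "IMP", "IPI", "IEP", "IGI"}
COMPUTATIONAL_CODES = {"ISS", "IEA"}


def evidence_category(code):
    """Map an evidence code to 'EXP', 'non-EXP', or 'unlisted'."""
    code = code.strip().upper()
    if code in EXPERIMENTAL_CODES:
        return "EXP"
    if code in COMPUTATIONAL_CODES:
        return "non-EXP"
    return "unlisted"


def count_by_category(codes):
    """Tally a sequence of evidence codes by category."""
    return Counter(evidence_category(c) for c in codes)


if __name__ == "__main__":
    print(count_by_category(["IDA", "IEA", "IMP", "ISS", "IDA"]))
```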
18. Summary of Arabidopsis GO annotations in TAIR
Evidence code    Annotation count    %
EXP                        95,435    34.7
  IDA                      56,271    20.4
  IEP                       6,651     2.4
  IGI                       4,286     1.6
  IMP                      19,441     7.1
  IPI                       8,786     3.2
Non-EXP                   179,801    65.3
Total                     275,236    100
Notes: 9,186 unique publications used in EXP annotations
Based on TAIR ATH_GO_GOSLIM.txt 2016-06-05
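The percentage column follows directly from the annotation counts. As a quick arithmetic check, this Python sketch recomputes each share from the counts reported on the slide; the variable and function names are hypothetical.

```python
# Annotation counts as reported on the slide (TAIR ATH_GO_GOSLIM.txt, 2016-06-05).
EXP_COUNTS = {"IDA": 56271, "IEP": 6651, "IGI": 4286, "IMP": 19441, "IPI": 8786}
NON_EXP_COUNT = 179801

exp_total = sum(EXP_COUNTS.values())
grand_total = exp_total + NON_EXP_COUNT


def share(count, total=grand_total):
    """Percentage of all annotations, rounded to one decimal place."""
    return round(100 * count / total, 1)


if __name__ == "__main__":
    print("EXP", exp_total, share(exp_total))
    for code, count in EXP_COUNTS.items():
        print(" ", code, count, share(count))
    print("Non-EXP", NON_EXP_COUNT, share(NON_EXP_COUNT))
    print("Total", grand_total)
```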
19. Summary of Arabidopsis GO annotations in TAIR
Based on annotation data as of May 24, 2016
20. Application: What can you do with TAIR GO/PO annotations?
- Query gene function information
- GO annotation projection
- Functional categorization
- Term enrichment
21.
22. Get annotations for individual genes from the TAIR locus page
Gene Ontology
annotations
Plant Ontology
annotations
23. Get annotations for individual genes from the TAIR locus page
Other functional information:
Gene summary
Polymorphism
Phenotype
Publications
Gene symbols
31. Application: What can you do with TAIR GO/PO annotations?
- Query gene function information
- GO annotation projection
- Functional categorization
- Term enrichment
33. Annotating new plant genomes by projecting GO terms from Arabidopsis
onto non-model plant species based on gene orthology
EnsemblPlants Compara
• Use the Compara pipeline to build orthology
• Automatically transfer GO annotations to plant orthologs
Rules
• Orthologs must share at least 40% peptide identity with each other
• Only GO annotations with an evidence type of IDA, IEP, IGI, IMP or IPI are projected
• No annotations with a 'NOT' qualifier are projected
• Annotations to the GO:0005515 protein binding term are not projected
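The four projection rules can be read as one filter applied to each annotation and ortholog pair. The sketch below is a minimal Python illustration of that filter, not the EnsemblPlants Compara implementation: the `Annotation` class, the `is_projectable` function, the pipe-separated qualifier convention, and the example annotations are all assumptions made for illustration.

```python
from dataclasses import dataclass

PROJECTABLE_EVIDENCE = {"IDA", "IEP", "IGI", "IMP", "IPI"}
PROTEIN_BINDING = "GO:0005515"
MIN_PERCENT_IDENTITY = 40.0


@dataclass(frozen=True)
class Annotation:
    """Hypothetical minimal annotation record."""
    gene: str
    go_id: str
    evidence: str
    qualifier: str = ""  # assumed pipe-separated, e.g. "NOT"


def is_projectable(annotation, percent_identity):
    """Apply the four rules from the slide to one annotation/ortholog pair."""
    qualifiers = annotation.qualifier.split("|")
    return (
        percent_identity >= MIN_PERCENT_IDENTITY
        and annotation.evidence in PROJECTABLE_EVIDENCE
        and "NOT" not in qualifiers
        and annotation.go_id != PROTEIN_BINDING
    )


if __name__ == "__main__":
    # Made-up annotations on a placeholder locus, for illustration only.
    examples = [
        (Annotation("GENE_A", "GO:0006412", "IDA"), 62.0),
        (Annotation("GENE_A", "GO:0006412", "IEA"), 62.0),
        (Annotation("GENE_A", "GO:0005515", "IPI"), 62.0),
        (Annotation("GENE_A", "GO:0006412", "IMP", "NOT"), 62.0),
        (Annotation("GENE_A", "GO:0006412", "IDA"), 35.0),
    ]
    for annotation, identity in examples:
        print(annotation.go_id, annotation.evidence, identity,
              is_projectable(annotation, identity))
```

Restricting projection to experimental evidence codes avoids propagating an inference that was itself computational.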
34. Application: What can you do with TAIR GO/PO annotations?
- Query gene function information
- GO annotation projection
- Functional categorization
- Term enrichment
39. Term enrichment analysis
[Figure: functional categories (biological process) with gene counts]
Overrepresentation statistical test:
In my list of genes, are any functional classes (for
example a GO process) found more often than
expected when compared with the reference list?
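One common way to frame an overrepresentation test is a one-sided hypergeometric test: given how many reference genes carry a term, how likely is it to see at least this many in a list of this size by chance? The Python sketch below shows that calculation. It is a generic illustration and makes no claim about the statistics used by any particular enrichment tool; the function name and parameters are hypothetical.

```python
from math import comb


def enrichment_p_value(list_hits, list_size, ref_hits, ref_size):
    """One-sided hypergeometric P(X >= list_hits).

    list_hits: genes in my list annotated to the term
    list_size: genes in my list
    ref_hits:  genes in the reference list annotated to the term
    ref_size:  genes in the reference list
    """
    max_hits = min(list_size, ref_hits)
    all_draws = comb(ref_size, list_size)
    tail = sum(
        comb(ref_hits, k) * comb(ref_size - ref_hits, list_size - k)
        for k in range(list_hits, max_hits + 1)
    )
    return tail / all_draws


if __name__ == "__main__":
    # Made-up numbers: 5 of 10 list genes carry a term that 20 of 100
    # reference genes carry.
    print(enrichment_p_value(5, 10, 20, 100))
```

In a real analysis, one test is run per term, so the resulting p-values need a multiple-testing correction before interpretation.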
40. GOC provides a term enrichment tool powered by PANTHER
pantherdb.org geneontology.org
43. Model for the regulation of long-term drought
responses in Q. suber root
Model for ABA-dependent drought response in cork oak
44. Summary
1 The main activity of TAIR curators is producing a ‘gold standard’
annotated reference genome dataset by integrating
experimental data from the research literature. New annotations
are constantly added.
2 One common use of TAIR is to infer the function of genes in
agriculturally important species based on orthology to
Arabidopsis genes.
3 TAIR’s annotations are used in applications such as functional
categorization and term enrichment. It is important to use the latest
annotation file from TAIR.
50. Tips to make your research more discoverable
1. Pre-publication: register your gene symbol to minimize accidental duplications in gene nomenclature
2. Preparing your manuscript: include AGI locus identifiers
3. Post-publication: submit your annotation to us (any journal)
51. Gene name duplication makes it harder to find the right gene
AT1G56650 PAP1 PRODUCTION OF ANTHOCYANIN PIGMENT 1
AT2G01180 PAP1 PHOSPHATIDIC ACID PHOSPHATASE 1
AT2G27190 PAP1 PURPLE ACID PHOSPHATASE 1
AT3G16500 PAP1 PHYTOCHROME-ASSOCIATED PROTEIN 1
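Ambiguous symbols like PAP1 are easy to detect once symbols are keyed against AGI locus identifiers, which is why including the locus identifier in a manuscript helps. The Python sketch below finds symbols shared by more than one locus, using the four PAP1 entries from the slide; the function name is hypothetical and `XYZ1` on `AT0G00000` is a made-up placeholder entry.

```python
from collections import defaultdict

# (AGI locus, symbol) pairs; the PAP1 rows come from the slide.
LOCUS_SYMBOLS = [
    ("AT1G56650", "PAP1"),
    ("AT2G01180", "PAP1"),
    ("AT2G27190", "PAP1"),
    ("AT3G16500", "PAP1"),
    ("AT0G00000", "XYZ1"),  # hypothetical placeholder with a unique symbol
]


def ambiguous_symbols(pairs):
    """Return {symbol: sorted loci} for symbols used by more than one locus."""
    loci_by_symbol = defaultdict(set)
    for locus, symbol in pairs:
        loci_by_symbol[symbol.upper()].add(locus)
    return {
        symbol: sorted(loci)
        for symbol, loci in loci_by_symbol.items()
        if len(loci) > 1
    }


if __name__ == "__main__":
    for symbol, loci in ambiguous_symbols(LOCUS_SYMBOLS).items():
        print(symbol, "is used by", len(loci), "loci:", ", ".join(loci))
```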
52. Conflicting nomenclature and errors in publications are not uncommon
Plant Cell Physiol. 2010 Jun;51(6):866-76
Plant Cell Physiol. Jun;51(6):877-83
59. Community feedback
• “I do profit a lot from the data on TAIR, thus
this submission is a small contribution to
extend the data present on TAIR.”
• “I gratefully did it [data submission] because I
already benefit from similar information for
other genes.”
61. Some (but not all) annotations have supporting information in
the "Evidence with" field
IPI: protein interacting partner
IGI: other mutated loci in a double or triple mutant
AT3G25070
AT2G32700