This document describes SObA, a WormBase tool that generates concise graphical summaries of ontology-based annotations for genes. WormBase uses ontologies extensively to annotate C. elegans genes with terms for gene function, anatomy, phenotypes, life stages, and human diseases. For genes with many annotations, long lists of mutant phenotypes are difficult to comprehend. SObA addresses this with interactive graphs that integrate the ontology hierarchy and logical inference to give a complete yet simplified view of a gene's annotations. The authors believe this represents the biological meaning better than untrimmed lists of annotations.
SObA WormBase Workshop International Worm Meeting 2017
1. Ontology Annotation Browsing Made Easy - SObA
Raymond Lee*, Juancarlos Chan, Chris A. Grove, and Paul W. Sternberg
June 2017
21st International C. elegans Conference
Los Angeles, CA
2. IWM 2017
Extensive Use of Ontologies
• Controlled vocabulary in a hierarchy.
• Developed @ WormBase

Ontology              Total Terms (% Used)    Annotations / Genes
Gene (GO)             44,458 (21%)            318,561 / 79,949
Anatomy (AO)          6,814 (39%)             134,163 / 17,609
Phenotype (PO)        2,423 (88%)             107,785 / 9,654
Life Stage (LSO)      731 (75%)               72,895 / 4,066
Human Disease (DO)    8,029 (18%)             3,050 / 1,470
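
As a small worked calculation, the annotation and gene counts on this slide give the average number of annotations per annotated gene for each ontology. The sketch below only divides the slide's own figures; the variable names are illustrative.

```python
# Figures copied from the slide: ontology -> (annotations, annotated genes).
COUNTS = {
    "Gene (GO)": (318_561, 79_949),
    "Anatomy (AO)": (134_163, 17_609),
    "Phenotype (PO)": (107_785, 9_654),
    "Life Stage (LSO)": (72_895, 4_066),
    "Human Disease (DO)": (3_050, 1_470),
}

# Average annotations per annotated gene for each ontology.
per_gene = {name: ann / genes for name, (ann, genes) in COUNTS.items()}

for name, ratio in per_gene.items():
    print(f"{name:20s} {ratio:5.1f} annotations per gene")
```

For phenotypes this comes to roughly 11 annotations per annotated gene on average, which is the kind of list length the rest of the talk is concerned with.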
3. IWM 2017
WormBase Ontology Browser (WOBr)
• top-down expandable browser
• graph viewer
• inference tree viewer
• canned query
4. IWM 2017
Gene -> Phenotype Annotations
• 9,600 genes have reported variation or RNAi phenotypes.
• Each gene has 1-328 annotations.
• 3,000 genes have 10 or more annotations.
• More annotations for a gene make its biological function more difficult to comprehend.
14. IWM 2017
Big Thanks To
• Sibyl Gao and Todd Harris
• Seth Carbon and Chris Mungall (AmiGO 2)
Editor's Notes
At WormBase, we develop ontologies and use these structured controlled vocabularies to annotate genes extensively. Therefore, we want to make sure this information is easily accessible to our users.
For browsing from the perspective of the ontologies, there is a WormBase Ontology Browser suite that offers multiple ways to view the hierarchies and to get at the set of genes annotated with each term.
How about browsing from the Genes’ perspective?
Take phenotype annotations:
About 10,000 genes have phenotype annotations.
Each gene can have up to about 300 annotations (pop-1 is the winner, followed by daf-16 and daf-2).
The annotation numbers tend to just keep growing as you publish more and we curate more.
For genes that have more than a few phenotypes, comprehension across them becomes difficult.
How can we make long and varied lists more comprehensible? One way is to can them into a short list of static items.
That is the RIBBON. The benefit of the ribbon approach is that it makes comparisons between genes easier by predefining which phenotypic aspects are important. The potential problem with the ribbon approach is just that: it defines what's important a priori.
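
The ribbon idea can be sketched in a few lines: count a gene's annotations under each predefined slot, and note which annotations fall under no slot at all. This is a minimal illustration under stated assumptions, not the ribbon implementation itself: the hierarchy, the slot list, and the function names are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy hierarchy (child -> parents); not the real Phenotype Ontology.
PARENTS: Dict[str, List[str]] = {
    "Vulvaless": ["Vulval Cell Induction Variant"],
    "Vulval Cell Induction Increase": ["Vulval Cell Induction Variant"],
    "Egg Laying Defective": ["Egg Laying Variant"],
    "Dumpy": ["Body Morphology Variant"],
}

# Hypothetical ribbon: the fixed slots chosen in advance.
RIBBON_SLOTS = ["Vulval Cell Induction Variant", "Egg Laying Variant"]


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return the term itself plus everything reachable via parent links."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def ribbon_summary(
    direct: Set[str], parents: Dict[str, List[str]], slots: List[str]
) -> Tuple[Dict[str, int], Set[str]]:
    """Count annotations per slot and collect annotations that match no slot."""
    counts = {slot: 0 for slot in slots}
    uncovered: Set[str] = set()
    for term in direct:
        hits = [slot for slot in slots if slot in ancestors(term, parents)]
        for slot in hits:
            counts[slot] += 1
        if not hits:
            uncovered.add(term)
    return counts, uncovered


annotations = {"Vulvaless", "Vulval Cell Induction Increase",
               "Egg Laying Defective", "Dumpy"}
counts, uncovered = ribbon_summary(annotations, PARENTS, RIBBON_SLOTS)
print(counts)
print(uncovered)
```

In this toy case "Dumpy" ends up in the uncovered set because no slot was defined for it in advance, which is the a priori limitation described above.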
We started exploring a graph-based approach because ontologies are intrinsically graphs, and graphs are visually intuitive for navigating hierarchical structures.
The product is the SObA graph.
The SObA graph fully supports logical inference: each and every annotation is there in the graph, but we do prune nonessential parts of the ontology hierarchy to make it less complex. Importantly, the whole graph is redrawn dynamically based on updated data; most of it is not static. We believe this better represents the biological meaning of annotations.
Here I use let-23 as an example to show the pruning process.
let-23 is annotated with 30 phenotype terms. Adding inferred ones, there are 92 annotating terms, and the ontology specifies 102 edges among the terms.
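
The growth from directly annotated terms to the full set with inferred terms comes from following parent links upward in the ontology. The sketch below shows that closure on a toy hierarchy. It assumes a simple child-to-parents map; the hierarchy and function names are hypothetical and are not the let-23 data or the WormBase code.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy hierarchy (child -> parents); not the real Phenotype Ontology.
PARENTS: Dict[str, List[str]] = {
    "Vulvaless": ["Vulval Cell Induction Variant"],
    "Vulval Cell Induction Increase": ["Vulval Cell Induction Variant"],
    "Vulval Cell Induction Variant": ["Cell Induction Variant"],
    "Cell Induction Variant": ["Phenotype"],
    "Phenotype": [],
}


def with_inferred(direct: Set[str], parents: Dict[str, List[str]]) -> Set[str]:
    """Return the direct terms plus every ancestor they imply."""
    seen: Set[str] = set()
    stack = list(direct)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(parents.get(term, []))
    return seen


def induced_edges(
    terms: Set[str], parents: Dict[str, List[str]]
) -> Set[Tuple[str, str]]:
    """Return the (child, parent) edges whose two ends are both in `terms`."""
    return {(child, parent)
            for child in terms
            for parent in parents.get(child, [])
            if parent in terms}


direct = {"Vulvaless", "Vulval Cell Induction Increase"}
all_terms = with_inferred(direct, PARENTS)
edges = induced_edges(all_terms, PARENTS)
print(len(direct), len(all_terms), len(edges))
```

Here two direct annotations expand to five terms and four edges; the same kind of expansion, on the real ontology, is what takes a gene from 30 annotated terms to 92.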
An LCA (lowest common ancestor) phenotype is an inferred, broader phenotype that multiple annotating phenotypes share. It represents the union, or summary if you will, of the phenotypes that specialize or partition from it. For example, “Vulval Cell Induction Variant” is the LCA of “Vulvaless” and “Vulval Cell Induction Increase”.
We think inferred terms that are not LCAs provide less useful information, so they are filtered out.
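
One way to picture the LCA filter is sketched below. It assumes the LCA is computed pairwise over the directly annotated terms and that every direct annotation is always kept. Both are assumptions made for illustration, as are the intermediate term names and the function names; this is not the SObA code.

```python
from itertools import combinations
from typing import Dict, List, Set

# Hypothetical toy hierarchy (child -> parents). Only the three vulval terms
# come from the talk; the others are made up for illustration.
PARENTS: Dict[str, List[str]] = {
    "Vulvaless": ["Vulval Cell Induction Variant"],
    "Vulval Cell Induction Increase": ["Vulval Cell Induction Variant"],
    "Vulval Cell Induction Variant": ["Intermediate A"],
    "Intermediate A": ["Phenotype"],
    "Egg Laying Defective": ["Intermediate B"],
    "Intermediate B": ["Phenotype"],
    "Phenotype": [],
}


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return the term itself plus everything reachable via parent links."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def lowest_common_ancestors(a: str, b: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Common ancestors of a and b that have no other common ancestor below them."""
    common = ancestors(a, parents) & ancestors(b, parents)
    return {c for c in common
            if not any(d != c and c in ancestors(d, parents) for d in common)}


def prune_to_lcas(direct: Set[str], parents: Dict[str, List[str]]) -> Set[str]:
    """Keep direct annotations plus the LCA of every pair of direct annotations."""
    kept = set(direct)
    for a, b in combinations(sorted(direct), 2):
        kept |= lowest_common_ancestors(a, b, parents)
    return kept


direct = {"Vulvaless", "Vulval Cell Induction Increase", "Egg Laying Defective"}
kept = prune_to_lcas(direct, PARENTS)
print(sorted(kept))
```

In this toy hierarchy "Vulval Cell Induction Variant" and the root survive as LCAs, while "Intermediate A" and "Intermediate B" are dropped because each only sits above a single branch.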
We also hide redundant connections to simplify the graph.
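
If "redundant" is read as "already implied by a longer path", hiding such connections amounts to a transitive reduction of the graph. That reading is an assumption, and the sketch below, with hypothetical function names and a toy edge set, only illustrates the idea on a small acyclic graph.

```python
from typing import Set, Tuple

Edge = Tuple[str, str]  # (child, parent)


def reachable(start: str, goal: str, edges: Set[Edge], skip: Edge) -> bool:
    """True if goal can be reached from start without using the edge `skip`."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(parent for (child, parent) in edges
                     if child == node and (child, parent) != skip)
    return False


def hide_redundant(edges: Set[Edge]) -> Set[Edge]:
    """Drop every edge whose endpoints are also connected by another path."""
    return {edge for edge in edges
            if not reachable(edge[0], edge[1], edges, skip=edge)}


edges = {
    ("Vulvaless", "Vulval Cell Induction Variant"),
    ("Vulval Cell Induction Variant", "Phenotype"),
    ("Vulvaless", "Phenotype"),  # implied by the two edges above
}
slim = hide_redundant(edges)
print(sorted(slim))
```

The direct edge from "Vulvaless" to the root is dropped because the two-step path already expresses it; the remaining edges carry the same reachability with less clutter.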
We are also developing a SObA graph for Gene Ontology annotations.
Sibyl and Todd helped us implement Cytoscape JavaScript on the WormBase web site.
Our backend Solr document store is modified from the one developed by Seth, Chris, and others for the AmiGO 2 project.