PomBase uses a combination of annotation conventions and QC mechanisms. In addition to identifying annotation inconsistencies and errors, these combined methods improve information content, annotation coverage, and depth or specificity, and reduce redundancy.
This document discusses G protein-coupled receptors (GPCRs) and efforts to design mutant GPCRs (called RASSLs) to better understand GPCR signaling and ligand binding. Specifically, it describes how researchers modified the human kappa opioid receptor to create the Ro1 and Ro2 RASSLs, which showed decreased responses to endogenous opioid peptides but near normal responses to synthetic agonists. This proves useful for drug discovery efforts. The document also discusses how combinatorial libraries were used to identify potential drug candidates for the 5HT1F and CCR1 GPCRs, though limited structural information remained a challenge.
The document discusses analyzing Gene Ontology (GO) annotations using Zipf's law, which states that word frequencies follow a power law distribution. The author:
1) Analyzed GO annotations from several species and found they generally follow a power law, indicating GO acts like a language. Exponents were in the normal range for communication.
2) Biological process annotations had higher exponents, suggesting more precise encoding, while molecular function and cellular component annotations favored the speaker with lower exponents.
3) High confidence annotations fit power laws better than low confidence ones, indicating higher quality communication.
The analysis provides a way to rapidly assess the "language-like" qualities and potential quality of GO annotations through the power law
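As an illustration of the kind of power-law check summarized above, the sketch below estimates a Zipf exponent from a list of annotation terms by ordinary least squares on log(rank) against log(frequency). The summary does not say how the original fit was performed, so the fitting method and the function name `zipf_exponent` are assumptions made for this example.

```python
import math
from collections import Counter


def zipf_exponent(terms):
    """Estimate a Zipf exponent from an iterable of term identifiers.

    Assumption: a plain least-squares fit of log(frequency) on log(rank).
    The analysis summarized above may have used a different fitting method.
    """
    # Frequencies sorted from most to least common give the rank order.
    counts = sorted(Counter(terms).values(), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two distinct terms")

    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    # Zipf's law predicts frequency ~ rank ** (-s), so s is minus the slope.
    return -(sxy / sxx)
```

Running this separately on, for example, the biological process and molecular function annotations of one species would give per-aspect exponents that could be compared, in the spirit of point 2 above.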
This presentation discusses mapping RNA-seq reads to genes in order to construct a count table. There are two scenarios - mapping to a reference genome sequence or performing a de novo assembly. When a reference is available, reads are mapped using gapped alignment to account for reads spanning introns. It is important to use genome annotations and check mapping quality using tools like RSeQC and BamQC to visualize coverage and duplication rates.
PomBase developed an online curation tool called Canto to enable community curation of Schizosaccharomyces pombe publications. Over 1,100 papers have been assigned to curators using Canto, generating over 7,700 annotations. While initial participation was 36%, reminders increased response. Curation quality is high, around 95-100% for term specificity and accuracy. The community is motivated by increased visibility and data dissemination. Lessons show that recent papers and post-curation dialogue improve quality, and reminders are often needed to solicit participation.
The thesis presents a study of thermal lens spectrometry, focusing on the theory and applications of the technique. The author describes Fresnel-Kirchhoff diffraction theory and its application to laser beam propagation. The thesis then details the theory of the radial thermal lens and its parabolic and aberrant models. Finally, it discusses experiments performed with thermal lenses in the large-angle regime.
Ubuntu Linux is a Debian-based operating system maintained by the Canonical developer community. It is aimed at the average user, with a strong focus on improving the user experience through ease of use and visual effects such as rotating and wobbly windows.
The document presents the curriculum and evaluation plan for the third cohort (January to December 2012) of the San Félix School of Law. The plan includes eight sessions of three hours each, covering topics IX to XI. The topics are adoption, guardianship, curatorship, and property regimes. The teaching strategies are lectures, in-class research, and group presentations. Evaluation considers attendance, participation, discipline, respect, collaboration, and research…
The document describes the psychological, behavioral, and intellectual development of children between 6 and 7 years old. At this stage, children develop through play, school learning, and social interactions. Starting school lets them learn new social skills while facing a less protective environment.
This document discusses using co-annotation analysis and biological knowledge as a quality control procedure for gene ontology (GO) annotations. It describes a multi-step iterative process where GO term intersections are identified and used to generate rules of expected relationships. Violations of these rules are reported to contributing databases and used to identify annotation errors. The process led to over 100 rules being created and 83 rule violations found by analyzing 568 gene annotations across several species. Future work involves implementing the rules directly into the GO pipeline and expanding the approach to additional biological processes and functions.
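A minimal sketch of how a co-annotation rule check of this kind might look is given below. It assumes a rule is simply a pair of terms that are not expected to be annotated to the same gene. The rule format, the function name `find_violations`, and the placeholder term labels are hypothetical, and a real check would also need to account for relationships between terms in the ontology.

```python
def find_violations(annotations, forbidden_pairs):
    """Report genes annotated to both terms of any forbidden pair.

    annotations: dict mapping a gene identifier to a set of term labels.
    forbidden_pairs: iterable of (term_a, term_b) tuples (hypothetical rule
    format: the two terms are not expected to co-occur on one gene).
    """
    violations = []
    # Sort genes so that the report order is deterministic.
    for gene in sorted(annotations):
        terms = annotations[gene]
        for term_a, term_b in forbidden_pairs:
            if term_a in terms and term_b in terms:
                violations.append((gene, term_a, term_b))
    return violations


# Placeholder data for illustration only, not real annotations.
example_annotations = {
    "gene1": {"process_X", "component_Y"},
    "gene2": {"process_X"},
    "gene3": {"process_X", "component_Y", "function_Z"},
}
example_rules = [("process_X", "component_Y")]
example_report = find_violations(example_annotations, example_rules)
```

Each reported tuple is a candidate for manual review rather than a confirmed error, since a violation may also indicate that the rule itself needs refining.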
The document discusses protein structure and its importance in determining protein function. It covers several key points:
1) There are multiple levels of protein structure from primary to quaternary structure. Higher-order structures like tertiary structure bring distant parts of the amino acid sequence into proximity, allowing proteins to perform their functions.
2) Protein structure is determined by the amino acid sequence through the physical properties of residues. The sequence encodes the folding pathway that results in a stable, functional 3D structure.
3) Experimental methods like X-ray crystallography and NMR spectroscopy are used to determine high-resolution protein structures that reveal how structure enables function. Databases like PDB archive and classify protein structures.
This document discusses various topics relating to protein structure and bioinformatics. It begins with an overview of protein structure and why understanding protein structure is important. It then discusses the different levels of protein structure from primary to quaternary structure. Methods for determining protein structure like X-ray crystallography and NMR are mentioned. Databases for storing protein structures like the Protein Data Bank are also summarized. The document touches on topics like protein folding, domains, membrane protein topology, and secondary structure prediction methods.
The document discusses:
1) An overview of bioinformatics lessons including introductions to databases, scoring matrices, and pairwise sequence alignment.
2) Descriptions of major bioinformatics databases and resources including NCBI, ExPASy, and EBI.
3) The importance of scoring matrices in sequence analysis and how the choice of matrix can influence outcomes. Matrices are discussed for nucleotides and proteins.
This document discusses different levels of protein structure from primary to quaternary structure. It explains that primary structure refers to the amino acid sequence of a protein. Secondary structure describes local folding patterns like alpha helices and beta sheets. Tertiary structure is the overall 3D shape of a single protein chain that results from folding. Quaternary structure involves the shape and interactions of multiple protein subunits. The document provides examples and diagrams to illustrate each level of structure and how they relate to determining a protein's function.
The document discusses various topics in bioinformatics and protein structure. It provides an overview of ongoing thesis topics at Biobix including biomarker prediction, methylation, metabolomics, peptidomics, and more. It also discusses the rationale for understanding protein structure and function, levels of protein structure from primary to quaternary, methods for determining structure like X-ray crystallography, and approaches to secondary structure prediction including Chou-Fasman.
This document discusses functional annotation and the Gene Ontology. It describes how functional annotation attaches biological information to sequences through searches of databases for homology, domains, and pathways as well as manual curation. Searches include BLAST for homology, Pfam and InterPro for domains, and KEGG and Reactome for pathways. Assignments include EC numbers for metabolic pathways and Gene Ontology terms from automated and manual annotation. Manual annotation combines all evidence and allows incorporation of experimental data but requires more time.
Cross Product Extensions to the Gene Ontology (Chris Mungall)
The document discusses extending the Gene Ontology (GO) through assigning logical computable definitions to GO classes. This involves partitioning GO classes into "cross product" sets based on the ontologies used in the definitions. Over 13,000 GO classes now have provisional logical definitions assigned using this approach, covering molecular function, biological process, cellular component, and other ontologies. The logical definitions allow for nested descriptions and reasoning over GO classes. Anatomy classes are standardized in the Uberon cross-species anatomy ontology.
Computational Prediction Of Protein-1.pptx (ashharnomani)
This document discusses computational methods for predicting protein structure, including homology modeling, fold recognition/threading, and ab initio prediction. Homology modeling predicts structure based on sequence similarity to proteins with known structures. It involves aligning the target sequence to template structures, then modeling secondary structure, loops, and side chains. Accuracy depends on template quality and sequence identity above 30%. Fold recognition matches sequences to structure folds without clear homology. Ab initio prediction predicts structure from sequence alone using physics-based forces.
1) The document discusses various bioinformatics databases including nucleotide databases like GenBank that contain nucleic acid sequences, protein databases like PDB that contain 3D protein structures, and specialized databases like dbSNP that contain human single nucleotide variations.
2) It also discusses tools for analyzing sequences like BLAST for similarity searches, multiple sequence alignments, and genome browsers for interactively viewing complete genomes.
3) Feature annotation is described as the process of identifying genes and other biological features in DNA sequences to increase their usefulness to the scientific community.
Homology modeling is a technique used to predict the 3D structure of a protein based on the alignment of its amino acid sequence to known protein structures. It relies on the observation that structure is more conserved than sequence during evolution. The key steps in homology modeling include: 1) identifying a template structure through sequence alignment tools like BLAST, 2) correcting any errors in the initial alignment, 3) generating the protein backbone based on the template structure, 4) modeling any loops or missing regions, 5) adding side chains, 6) optimizing the model structure energetically, and 7) validating that the final model matches the template structure and has correct stereochemistry. Homology modeling is useful for applications like structure-based drug design
This document discusses various methods for predicting protein function from sequence and structure. It begins by explaining the importance of predicting protein function for applications like disease diagnosis and drug discovery. It then outlines different types of data that can be used for functional prediction, including sequence, structure, expression profiles, and interactions. Both sequence-based methods like homology searches and domain identification as well as structure-based approaches are covered. Specific tools discussed include BLAST, Pfam, SCOP, CATH, and ProFunc. The document emphasizes that functional prediction is challenging given proteins can have multiple functions and homology does not always imply similar function. It also notes limitations of simple homology searches.
Apollo annotation guidelines for i5k projects Diaphorina citri (Monica Munoz-Torres)
Apollo is a web-based application that supports and enables collaborative genome curation in real time, allowing teams of curators to improve on existing automated gene models through an intuitive interface. Apollo allows researchers to break down large amounts of data into manageable portions to mobilize groups of researchers with shared interests.
This document discusses protein structure and bioinformatics. It begins by explaining the rationale for understanding protein structure and function, including determining protein sequences, structures, and relating this to function. It then covers levels of protein structure from primary to quaternary, methods for determining protein structures like X-ray crystallography, and uses of protein modeling and databases. The document provides examples of protein domains, folds, and membrane protein topology. It emphasizes that sequence determines conformation and that structure implies function.
Single-cell RNA sequencing (scRNA-seq) allows researchers to analyze gene expression at the individual cell level, exposing heterogeneity that is hidden in bulk tissue analysis. There are various platforms for scRNA-seq that differ in throughput and customizability. Experimental design considerations include the number of cells to sequence, desired sequencing depth, and controlling for batch effects. The analysis workflow generally involves processing and filtering data, normalization, clustering, differential expression analysis, and trajectory inference to reconstruct cellular responses.
This is a comprehensive account of the dos and don'ts of homology modeling and protein docking. It also briefly discusses approaches to research reproducibility.
BioAssay Express: Creating and exploiting assay metadata (Philip Cheung)
The challenge of accurately characterizing bioassays is a real pain point for many drug discovery organizations. Research has shown that some organizations have legacy assay collections exceeding 20,000 protocols, the great majority of which are not accurately characterized. This problem is compounded by the fact that many new protocol registrations are still not following FAIR (Findability, Accessibility, Interoperability, and Reusability) Data principles.
BioAssay Express is a tool focused on transforming the traditional protocol description from an unstructured free form text into a well-curated data store based upon FAIR Data principles. By using well-defined annotations for assays, the tool enables precise ontology based searches without having to resort to imprecise keyword searches.
This talk explores a number of new important features designed to help scientists accelerate the drug discovery process. Some example use-cases include: enabling drug repositioning projects; improving SAR models; identifying appropriate machine learning data sets; fine-tuning integrative-omic pathways;
An aspirational goal for our team is to build a metadata schema based on semantic web vocabularies that is comprehensive to the extent that the text description becomes optional. One of the many possibilities is to take the initial prospective ELN entry for a bioassay protocol and feed it directly to an automated instrument. While there are many challenges involved in creating the ELN-to-robot loop, we will provide some insights into our collaborations with UCSF automation experts.
In summary, the ability to quickly and accurately search or analyze bioassay data (public or internal) is a rate limiting problem in drug discovery. We will present the latest developments toward removing this bottleneck.
https://plan.core-apps.com/acs_sd2019/abstract/6f58993d-a716-49ad-9b09-609edde5a3f4
Techniques used for separation in proteomics (Nilesh Chandra)
Proteomics aims to characterize the complete set of proteins in a biological system. It faces challenges due to sample complexity and wide protein concentration ranges. Common separation techniques include 2D electrophoresis, 2D-DIGE, ICAT, SILAC, iTRAQ, MudPIT, and protein microarrays. Mass spectrometry is central to protein identification. Data analysis is challenging due to the large datasets and lack of standardization. Effective proteomics requires optimized multi-step workflows combining separation, labeling, mass spectrometry, and bioinformatics.
Critical Reading Biomedical Research Papers-2022.pptx (MingdergLai)
1. The study investigates whether the ATAC complex, which contains the histone acetyltransferase Gcn5, regulates mitotic progression.
2. Experiments using siRNA to knock down subunits of ATAC and SAGA complexes in NIH-3T3 cells show that ATAC knockdown, but not SAGA knockdown, leads to mitotic defects including delayed or asymmetric cell divisions.
3. Further experiments localize ATAC subunits to mitotic cells and show that the ATAC complex remains intact during mitosis.
Similar to PomBase conventions for improving annotation depth, breadth, consistency and accuracy
This document discusses the state of characterization of the eukaryotic proteome. It notes that as of 2019, around 20% of proteins in fission yeast and humans are still classified as having unknown biological processes. While the number of known or inferred protein roles has increased since 1992, progress in characterizing unknowns has been slow. Many recently characterized proteins in fission yeast are involved in non-core functions like environmental response, aging, and damage accumulation. The document calls for more research on these unknown and less studied proteins that are "hidden in plain sight" within the eukaryotic proteome.
Curators are necessarily detail oriented -- a trait born of, and reinforced by, our efforts to describe biological data accurately and precisely. To ensure comprehensive coverage and meaningful integration of new and existing knowledge, however, it is important to periodically step back from this fine-grained view and assess emergent features in accumulated curation. I will explore how PomBase has used the global "big picture" view of curated data to provide biological summaries, modularise content, and improve data display and access for our users. The global perspective can also be used to detect annotation errors and identify knowledge gaps, thereby improving overall annotation quality. I will also describe the progress we have made in engaging fission yeast researchers in community curation. Finally, I will show that the global curation perspective and community engagement share a common theme: both improve overall understanding, accessibility and reuse of accumulated knowledge by our user community.
This document discusses the use and development of Gene Ontology (GO) slims. GO slims provide a summary of an organism's biology using a reduced set of GO terms with less redundancy. The document outlines best practices for creating useful GO slims, including: 1) balancing coverage of annotated genes with biological relevance of terms; 2) minimizing overlaps between terms; and 3) splitting or lumping related terms as needed to improve biological interpretation. An iterative process is proposed to continually evaluate and improve existing GO slims.
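The first two practices listed above (coverage of annotated genes, and overlap between terms) can be quantified with a short sketch like the one below. It assumes each gene's term set has already been propagated to ancestor terms, so that membership in a slim term is a plain set lookup. The function names and data layout are hypothetical and are not taken from any GO tool.

```python
from itertools import combinations


def slim_coverage(gene_terms, slim):
    """Fraction of genes annotated to at least one slim term.

    gene_terms: dict of gene identifier -> set of term labels (assumed to be
    already propagated up the ontology). slim: set of slim term labels.
    """
    if not gene_terms:
        raise ValueError("no genes supplied")
    covered = sum(1 for terms in gene_terms.values() if terms & slim)
    return covered / len(gene_terms)


def slim_overlaps(gene_terms, slim):
    """Count the genes shared by each pair of slim terms (non-zero pairs only)."""
    overlaps = {}
    for term_a, term_b in combinations(sorted(slim), 2):
        shared = sum(
            1 for terms in gene_terms.values()
            if term_a in terms and term_b in terms
        )
        if shared:
            overlaps[(term_a, term_b)] = shared
    return overlaps
```

Recomputing both measures after each change to the slim gives a simple way to track the trade-off between coverage and redundancy during iterative revision.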
1. The document discusses the Community Genome Organization, which provides a database for researchers studying the fission yeast Schizosaccharomyces pombe, also known as pombe or beer yeast. The database has over 5,000 S. pombe proteins annotated with information on their functions, localization, and evolutionary relationships.
2. A brief history of S. pombe is given, noting that it was first described in 1893 and standard lab strains were isolated in the 1940s. It has traditionally been used as an inebriant in Eastern Africa.
3. Statistics are provided on the number of annotations in the PomBase database, which is curated by both the community and PomBase specifically.
The document discusses upcoming features and pages for the PomBase website, including publication details, genotype descriptions, phenotype annotations, and curated networks that connect genes and gene products through their biological activities and processes to represent pathways visually. It also outlines current limitations and plans to further develop network displays and features to better represent curated gene interaction and regulation data.
This document discusses classifying and characterizing proteins of unknown function in the fission yeast Schizosaccharomyces pombe. It begins by showing the progress that has been made in characterizing known proteins, but that the number of unknown proteins is decreasing only gradually. It then discusses classifying proteins as unknown if their broad cellular role is not known. The rest of the document proposes different methods for predicting the potential functions of unknown proteins, such as identifying informative features, clustering unknown proteins based on similar features, and predicting potential functions by finding the best matching known proteins. It emphasizes the need for high-quality curation and experimental data to make accurate predictions.
The binding of cosmological structures by massless topological defects (Sérgio Sacani)
Assuming spherical symmetry and weak field, it is shown that if one solves the Poisson equation or the Einstein field equations sourced by a topological defect, i.e. a singularity of a very specific form, the result is a localized gravitational field capable of driving flat rotation (i.e. Keplerian circular orbits at a constant speed for all radii) of test masses on a thin spherical shell without any underlying mass. Moreover, a large-scale structure which exploits this solution by assembling concentrically a number of such topological defects can establish a flat stellar or galactic rotation curve, and can also deflect light in the same manner as an equipotential (isothermal) sphere. Thus, the need for dark matter or modified gravity theory is mitigated, at least in part.
Immersive Learning That Works: Research Grounding and Paths Forward (Leonel Morgado)
We will metaverse into the essence of immersive learning, into its three dimensions and conceptual models. This approach encompasses elements from teaching methodologies to social involvement, through organizational concerns and technologies. Challenging the perception of learning as knowledge transfer, we introduce a 'Uses, Practices & Strategies' model operationalized by the 'Immersive Learning Brain' and 'Immersion Cube' frameworks. This approach offers a comprehensive guide through the intricacies of immersive educational experiences and spotlights research frontiers along the immersion dimensions of system, narrative, and agency. Our discourse extends to stakeholders beyond the academic sphere, addressing the interests of technologists, instructional designers, and policymakers. We span various contexts, from formal education to organizational transformation to the new horizon of an AI-pervasive society. This keynote aims to unite the iLRN community in a collaborative journey towards a future where immersive learning research and practice coalesce, paving the way for innovative educational research and practice landscapes.
The debris of the ‘last major merger’ is dynamically young (Sérgio Sacani)
The Milky Way’s (MW) inner stellar halo contains an [Fe/H]-rich component with highly eccentric orbits, often referred to as the ‘last major merger.’ Hypotheses for the origin of this component include Gaia-Sausage/Enceladus (GSE), where the progenitor collided with the MW proto-disc 8–11 Gyr ago, and the Virgo Radial Merger (VRM), where the progenitor collided with the MW disc within the last 3 Gyr. These two scenarios make different predictions about observable structure in local phase space, because the morphology of debris depends on how long it has had to phase mix. The recently identified phase-space folds in Gaia DR3 have positive caustic velocities, making them fundamentally different than the phase-mixed chevrons found in simulations at late times. Roughly 20 per cent of the stars in the prograde local stellar halo are associated with the observed caustics. Based on a simple phase-mixing model, the observed number of caustics are consistent with a merger that occurred 1–2 Gyr ago. We also compare the observed phase-space distribution to FIRE-2 Latte simulations of GSE-like mergers, using a quantitative measurement of phase mixing (2D causticality). The observed local phase-space distribution best matches the simulated data 1–2 Gyr after collision, and certainly not later than 3 Gyr. This is further evidence that the progenitor of the ‘last major merger’ did not collide with the MW proto-disc at early times, as is thought for the GSE, but instead collided with the MW disc within the last few Gyr, consistent with the body of work surrounding the VRM.
The ability to recreate computational results with minimal effort and actionable metrics provides a solid foundation for scientific research and software development. When people can replicate an analysis at the touch of a button using open-source software, open data, and methods to assess and compare proposals, it significantly eases verification of results, engagement with a diverse range of contributors, and progress. However, we have yet to fully achieve this; there are still many sociotechnical frictions.
Inspired by David Donoho's vision, this talk aims to revisit the three crucial pillars of frictionless reproducibility (data sharing, code sharing, and competitive challenges) with the perspective of deep software variability.
Our observation is that multiple layers — hardware, operating systems, third-party libraries, software versions, input data, compile-time options, and parameters — are subject to variability that exacerbates frictions but is also essential for achieving robust, generalizable results and fostering innovation. I will first review the literature, providing evidence of how the complex variability interactions across these layers affect qualitative and quantitative software properties, thereby complicating the reproduction and replication of scientific studies in various fields.
I will then present some software engineering and AI techniques that can support the strategic exploration of variability spaces. These include the use of abstractions and models (e.g., feature models), sampling strategies (e.g., uniform, random), cost-effective measurements (e.g., incremental build of software configurations), and dimensionality reduction methods (e.g., transfer learning, feature selection, software debloating).
I will finally argue that deep variability is both the problem and solution of frictionless reproducibility, calling the software science community to develop new methods and tools to manage variability and foster reproducibility in software systems.
Exposé invité Journées Nationales du GDR GPL 2024
Phenomics assisted breeding in crop improvementIshaGoswami9
As the population is increasing and will reach about 9 billion upto 2050. Also due to climate change, it is difficult to meet the food requirement of such a large population. Facing the challenges presented by resource shortages, climate
change, and increasing global population, crop yield and quality need to be improved in a sustainable way over the coming decades. Genetic improvement by breeding is the best way to increase crop productivity. With the rapid progression of functional
genomics, an increasing number of crop genomes have been sequenced and dozens of genes influencing key agronomic traits have been identified. However, current genome sequence information has not been adequately exploited for understanding
the complex characteristics of multiple gene, owing to a lack of crop phenotypic data. Efficient, automatic, and accurate technologies and platforms that can capture phenotypic data that can
be linked to genomics information for crop improvement at all growth stages have become as important as genotyping. Thus,
high-throughput phenotyping has become the major bottleneck restricting crop breeding. Plant phenomics has been defined as the high-throughput, accurate acquisition and analysis of multi-dimensional phenotypes
during crop growing stages at the organism level, including the cell, tissue, organ, individual plant, plot, and field levels. With the rapid development of novel sensors, imaging technology,
and analysis methods, numerous infrastructure platforms have been developed for phenotyping.
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...AbdullaAlAsif1
The pygmy halfbeak Dermogenys colletei, is known for its viviparous nature, this presents an intriguing case of relatively low fecundity, raising questions about potential compensatory reproductive strategies employed by this species. Our study delves into the examination of fecundity and the Gonadosomatic Index (GSI) in the Pygmy Halfbeak, D. colletei (Meisner, 2001), an intriguing viviparous fish indigenous to Sarawak, Borneo. We hypothesize that the Pygmy halfbeak, D. colletei, may exhibit unique reproductive adaptations to offset its low fecundity, thus enhancing its survival and fitness. To address this, we conducted a comprehensive study utilizing 28 mature female specimens of D. colletei, carefully measuring fecundity and GSI to shed light on the reproductive adaptations of this species. Our findings reveal that D. colletei indeed exhibits low fecundity, with a mean of 16.76 ± 2.01, and a mean GSI of 12.83 ± 1.27, providing crucial insights into the reproductive mechanisms at play in this species. These results underscore the existence of unique reproductive strategies in D. colletei, enabling its adaptation and persistence in Borneo's diverse aquatic ecosystems, and call for further ecological research to elucidate these mechanisms. This study lends to a better understanding of viviparous fish in Borneo and contributes to the broader field of aquatic ecology, enhancing our knowledge of species adaptations to unique ecological challenges.
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...Travis Hills MN
Travis Hills of Minnesota developed a method to convert waste into high-value dry fertilizer, significantly enriching soil quality. By providing farmers with a valuable resource derived from waste, Travis Hills helps enhance farm profitability while promoting environmental stewardship. Travis Hills' sustainable practices lead to cost savings and increased revenue for farmers by improving resource efficiency and reducing waste.
2. Annotation numbers are important
…but numbers aren’t everything…..
• Use of annotation for data-mining and data-analysis is limited
by errors, inconsistencies and omissions.
• PomBase uses a combination of annotation conventions, to
improve information content (annotation coverage, specificity
and redundancy), and QC mechanisms to identify possible
annotation inconsistencies and errors.
• In combination these mechanisms address many recurring
annotation issues.
3. 1. The definition is critical
All ontology terms have a “fixed” definition
• If a definition is misleading or incorrect, its meaning cannot
be changed. To fix it, the term is obsoleted and its annotations
are migrated.
• This makes annotations very robust to ontology changes. If
a term needs to be repositioned, the annotations remain
correct.
• We annotate to the definition, not the term name. Always
check the definition.
4. 2. Improving annotation specificity
• i) Consider descendant terms
• ii) Veto use of uninformative terms
5. 2i. Consider descendants
Annotate as specifically as experiment allows and be
unambiguous about the biology
• regulation: positive or negative?
• translation: cytoplasmic or mitochondrial?
• transport: of what? to where? how?
• chromosome segregation: mitotic or meiotic?
If the available terms are insufficient, request a more specific
term
6. • For a carboxylic acid carrier
“carboxylic acid transport”
looks initially OK
• However “transmembrane transport”
is not explicit here… Carboxylic acid
might be transported in other ways…
2i. Consider descendants e.g.
7. More specific annotation can
provide additional detail e.g.
• substrate,
• type (transmembrane),
• sometimes directionality
Additional parents increase the
information content, because the
gene product is annotated
indirectly to more terms.
2i. Consider descendants e.g.
8. 2ii. Veto use of non-specific terms
Identify the set of ontology terms where more specific
annotation should be possible (more biological detail)
Examples:
• cellular process (which one?)
• translation (cytoplasmic? mitochondrial?)
• transport (of what? to where?)
Some GO terms are already flagged as not for manual
annotation. Review and improve annotations to vetoed terms
PomBase blocks 1298 upper-level GO terms from direct
annotation (<200 violations)
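A veto check of this kind can be sketched in a few lines. This is only an illustration, not PomBase's pipeline: the veto list here is a hypothetical three-term set (cellular process, translation, transport), and the gene names and the helper `find_veto_violations` are invented for the example.

```python
# Hypothetical veto list: terms judged too general for direct annotation.
# GO:0009987 cellular process, GO:0006412 translation, GO:0006810 transport
VETOED_TERMS = {"GO:0009987", "GO:0006412", "GO:0006810"}


def find_veto_violations(annotations, vetoed=VETOED_TERMS):
    """Return the (gene, term) pairs annotated directly to a vetoed term."""
    return [(gene, term) for gene, term in annotations if term in vetoed]


# Toy direct annotations (gene names are placeholders).
annotations = [
    ("geneA", "GO:0006810"),  # transport: too general, flagged
    ("geneA", "GO:0055085"),  # transmembrane transport: allowed
    ("geneB", "GO:0002181"),  # cytoplasmic translation: allowed
]

violations = find_veto_violations(annotations)
for gene, term in violations:
    print(f"{gene}: direct annotation to vetoed term {term}")
```

Each reported pair is a candidate for review: either a more specific descendant term can be used, or the experiment does not support anything more specific.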
9. 3. i) Missing parents
Original arrangement
3. Improve the ontologies
10. 3i. Missing parents
These process annotations were originally in different branches
of the ontology, so all annotations were required
12. 3.i Missing parents
Collapsed 6 processes to 2. Exactly the same information content
Less redundancy, easier for users to interpret annotation
13. 3.ii Report incorrect parents
AKA “True Path Violations” or “TPVs”
For example
protein maturation
--protein processing (part_of)
----proteolysis (part_of)
(not all proteolysis is processing or
maturation)
14. 4. The power of Annotation Extensions
Provide additional specificity for a GO annotation e.g.
• Target gene (kinase substrate, TF regulation target)
• Location of a function
• Localization dependencies (protein A localizes protein B)
• Spatial and temporal aspects of processes, functions, locations (cell cycle stage
of occurrence)
• ADD an example of a gene product specific AE
See: Huntley et al. A method for increasing expressivity of Gene Ontology
annotations using a compositional approach. PMID:24885854
15. cyclin-dependent protein serine/threonine kinase
• has substrate fkh2 involved in negative regulation of conjugation with cellular fusion
• directly inhibits srw1 involved in positive regulation of G1/S transition
• has substrate drc1 involved in positive regulation of mitotic cell cycle DNA replication
• has substrate cdc18, orc2 involved in negative regulation of DNA replication during mitotic G2 phase
• has substrate xlf1 involved in negative regulation of double-strand break repair via nonhomologous end joining,
during mitotic G2 phase
• has substrate rap1 involved in negative regulation of mitotic telomere tethering at nuclear periphery
during mitotic M phase
• has substrate hcn1 during mitotic M phase
• has substrate cut3 involved in positive regulation of mitotic chromosome condensation during mitotic metaphase
• has substrate mde4 involved in correction of merotelic attachment, mitotic during mitotic metaphase
• has substrate nsk1 involved in negative regulation of attachment of mitotic spindle microtubules during mitotic
metaphase
• has substrate mde4, cut7 involved in negative regulation of mitotic spindle elongation during mitotic metaphase
• has substrate klp9 involved in negative regulation of mitotic spindle elongation during mitotic anaphase A
• directly inhibits clp1 involved in negative regulation of exit from mitosis
• has substrate byr4 involved in positive regulation of septation initiation signaling
• directly inhibits dis2
• has substrate rum1, crb2, sds23
Link function (cyclin-dependent-kinase) to target genes, processes,
and temporal information
4. Annotation Extension e.g. cdc2
17. 4. Using AE for effectors
• Reciprocal of the extension (automated) called “target of”
• Collects known “upstream effectors” on cdc2 page
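The reciprocal "target of" relation can be derived mechanically by inverting each extension. The sketch below assumes extensions are stored as (annotated gene, relation, target gene) triples; the underscore relation names and the helper `target_of` are assumptions made for the example, and the three triples are taken from the cdc2 list above.

```python
from collections import defaultdict

# Toy extensions as (annotated gene, relation, target gene) triples.
extensions = [
    ("cdc2", "has_substrate", "fkh2"),
    ("cdc2", "directly_inhibits", "srw1"),
    ("cdc2", "has_substrate", "rum1"),
]


def target_of(extensions):
    """Invert each extension so every target lists its upstream effectors."""
    reciprocal = defaultdict(list)
    for source, relation, target in extensions:
        reciprocal[target].append((relation, source))
    return dict(reciprocal)


effectors = target_of(extensions)
# Each target gene now carries a "target of" entry naming its effector.
for gene, entries in effectors.items():
    for relation, source in entries:
        print(f"{gene}: target of {source} ({relation})")
```

In this representation the "upstream effectors" shown on a gene page are simply the entries stored under that gene in the inverted mapping.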
18. • We can use effector substrate connections to generate
networks (interaction, metabolic, regulatory)
• Provide directional links to support pathway reconstruction
4. Using Annotation Extensions to
generate networks/pathways
[Figure: example network; node labels: sty1, cmk2, srk1, rum1, atf1, gsa1, gpx1, ntp1, sro1, ish1]
19. 4. Automated AE networks e.g.
44/59 connected in automated network based on annotated
connections within “regulation of G2/M transition” (fission yeast)
(Network for each GO slim category from the slim page)
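A figure such as "44/59 connected" can be computed from the annotated connections. The sketch below assumes that "connected" means "has at least one annotated connection to another gene in the same list"; the slides do not define it, and the gene names, edges and the helper `connected_fraction` are placeholders.

```python
def connected_fraction(genes, edges):
    """Count genes with at least one edge to another gene in `genes`."""
    gene_set = set(genes)
    connected = set()
    for source, target in edges:
        # Only count edges whose two ends are both in the process gene list.
        if source in gene_set and target in gene_set and source != target:
            connected.update((source, target))
    return len(connected), len(gene_set)


# Hypothetical gene list for one process and hypothetical directed edges.
genes = ["geneA", "geneB", "geneC", "geneD"]
edges = [("geneB", "geneA"), ("geneC", "geneA"), ("geneA", "geneX")]

n_connected, n_total = connected_fraction(genes, edges)
print(f"{n_connected}/{n_total} connected")
```

Genes left unconnected are a natural place to look for missing annotation extensions.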
20. 5. Suppress redundant IEA annotation
• PomBase pipelines filter redundant IEA
(Inferred from Electronic Annotation)
annotations
• This removes >90% of IEAs (because a manual
annotation already covers them)
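The redundancy rule can be sketched as follows, assuming an IEA counts as redundant when a manual annotation exists to the same term or to one of its descendants. The toy ontology, the term names and the helper names are invented for illustration and do not reproduce the PomBase pipeline.

```python
# Toy ontology: term -> set of direct parents (hypothetical structure).
PARENTS = {
    "mitotic chromosome segregation": {"chromosome segregation"},
    "chromosome segregation": {"cellular process"},
    "lipid metabolism": {"cellular process"},
    "cellular process": set(),
}


def ancestors(term, parents=PARENTS):
    """Return the term itself plus every term reachable via parent links."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def filter_redundant_iea(manual_terms, iea_terms):
    """Keep only the IEA terms not already implied by a manual annotation."""
    covered = set()
    for term in manual_terms:
        covered |= ancestors(term)
    return [term for term in iea_terms if term not in covered]


manual = ["mitotic chromosome segregation"]
iea = ["chromosome segregation", "cellular process", "lipid metabolism"]
kept = filter_redundant_iea(manual, iea)
print(kept)
```

Whatever survives the filter is either new information or a sign of a problem (incorrect mapping, missing parent, or missing manual annotation), which is why the remaining IEAs are worth reviewing.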
21. 5. Suppress redundant IEA annotation
13 annotations are reduced to 4
Same information, fewer terms
22. Incorrect annotations are more easily spotted
Mis16 is not involved in ‘chromatin modification’ -> fix mapping
5. Suppress redundant IEA,
QC of mappings
23. Missing parents in ontology more obvious
“inorganic anion exchanger” should be an ‘ancestor’ of
GO:0005452, to suppress the IEA as redundant
5. Suppress redundant IEA,
QC of ontology
(SPBC543.05c)
24. 5. Suppress redundant IEA annotation
• >40,000 fission yeast IEAs available.
• PomBase filters 36,000 as redundant and retains 4,000 (IEAs are at
least 90% accurate if the manual annotations are correct).
• It is easier to evaluate the remaining IEAs to identify/fix
anomalies
Reducing IEAs over time
25. 5. Suppress redundant IEA
• More concise view with zero loss of information
• IEA mappings derived from a single experiment/publication
can be interpreted as proof by repetition and make weak EXP
data appear multiply supported/acceptable
• Fewer annotations, easier QC of remaining IEAs
Q “Why isn’t an IEA covered by manual annotation?” Either:
1. Incorrect mapping
2. Missing parent in ontology
3. Missing annotation -> find supporting evidence and
annotate manually (EXP or ISO)
(PomBase also filters NAS/TAS/IC)
26. 6. Annotate by process (pathway)
• Annotating by process rather than “ad hoc”
improves consistency and allows ‘annotation
gaps’ to be targeted
• Process papers more quickly (become more
familiar with the field, experimental methods)
Become familiar with an area of biology and
the techniques used. Don’t need to read the
background every time. Recognise
phenotypes.
27. From PMID:22898774
Regulation of the
metaphase/anaphase
transition by the MCC, the
APC and upstream
Signalling
Identify obvious missing
annotation, for example
between complex
members
6. Annotate by process or pathway
28. 6. Annotate by process or pathway
cdc20
proteasome
APC separase
Cohesin subunit
securin
Post transition
SAC/MCC
Can perform QC on processes or components
e.g. Use STRING to evaluate outliers (potential annotation
errors). Input list: “regulation of mitotic metaphase/anaphase
transition”
Can also ask “are any
complex members missing?”
29. • We are annotating whole organisms… use a
holistic, whole-organism annotation approach
• Evaluate annotation breadth (coverage) using
slims
• Evaluate intersections between slim processes
7. Assess annotation at the
organismal level
30. 7. Evaluate organismal annotation
coverage using “slims”
• EXP supported BP
• ISO/IEA inferred BP
‘unknowns’
• Species specific, no
inference possible
• Conserved, but
unannotated in any
species
33. 7. Monitor unslimmed gene products
Note: Exclude biologically uninformative terms like “phosphorylation” or
“response to chemical” as these could apply to any real biological role.
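One way to monitor unslimmed gene products is sketched below. It assumes each gene maps to its set of process terms with ancestors already included, and that a gene annotated only to uninformative terms should not be reported as unslimmed; both are assumptions, and all gene and term names here are placeholders.

```python
# Terms treated as biologically uninformative for this check (assumed list,
# taken from the two examples named in the note above).
UNINFORMATIVE = {"phosphorylation", "response to chemical"}


def unslimmed_genes(gene_terms, slim_terms, uninformative=UNINFORMATIVE):
    """Return genes whose informative process terms hit no slim category.

    `gene_terms` maps gene -> set of process terms (ancestors included).
    """
    result = []
    for gene, terms in sorted(gene_terms.items()):
        informative = terms - uninformative
        # Report only genes that have an informative term outside the slim.
        if informative and not (informative & slim_terms):
            result.append(gene)
    return result


slim_terms = {"cytokinesis", "signalling"}
gene_terms = {
    "geneA": {"phosphorylation"},                 # only uninformative terms
    "geneB": {"cell adhesion"},                   # informative, not in slim
    "geneC": {"cytokinesis", "phosphorylation"},  # covered by the slim
}

missing = unslimmed_genes(gene_terms, slim_terms)
print(missing)
```

Genes reported here suggest either a gap in the slim or an annotation that needs a closer look.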
34. 7. Visual slim, all pombe proteins
[Figure: visual slim of all pombe proteins (total 5054; unknown 830). Top-level groups shown: cell division 751; mitochondrial org/exp 280; membranes, trafficking, cell surface 787; small molecule TM transport 288; central metabolism, energy and building blocks 549; expression 1294; protein assembly/stability 765; signalling 404; sexual reproductive process 262 (many intersections); other 290 (no intersections; includes adhesion, many proteases, peroxins). Sub-category and intersection counts are not reliably recoverable from the extracted text.]
35. 7. Evaluate intersections between slim
categories
Evaluate intersections between processes
Many GO processes are rarely co-annotated because they are
functionally, spatially or temporally distant. For example, we would
not expect “ribosome biogenesis” to intersect with “vitamin
metabolism”.
We can use this observation to identify potential conflicts using
the GO term matrix
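The intersection check can be sketched as a pairwise comparison of the gene sets annotated to each slim term. The gene sets, the list of "unexpected" pairs and the helper `term_matrix` below are all hypothetical; this illustrates the idea rather than the actual GO term matrix tool.

```python
from itertools import combinations

# Hypothetical slim term -> genes annotated (directly or indirectly) to it.
slim_genes = {
    "ribosome biogenesis": {"geneA", "geneB", "geneC"},
    "vitamin metabolism": {"geneC", "geneD"},
    "cytoplasmic translation": {"geneA", "geneE"},
}


def term_matrix(slim_genes):
    """Return {(term1, term2): shared genes} for every pair of slim terms.

    Pairs are keyed in alphabetical order of the term names.
    """
    return {
        (a, b): slim_genes[a] & slim_genes[b]
        for a, b in combinations(sorted(slim_genes), 2)
    }


# Pairs a curator would not expect to share genes (assumed list; each pair
# is written in alphabetical order to match the matrix keys).
UNEXPECTED = {("ribosome biogenesis", "vitamin metabolism")}

matrix = term_matrix(slim_genes)
to_review = {
    pair: genes for pair, genes in matrix.items() if pair in UNEXPECTED and genes
}
for (a, b), genes in to_review.items():
    print(f"{a} x {b}: review {sorted(genes)}")
```

Each gene in an unexpected intersection points to one of the error types on the following slides: an ontology error, an incorrect mapping, or an annotation based on an indirect effect.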
38. 7. Identifies ontology errors (e.g.)
DNA metabolism and chromosome segregation do not usually intersect
Regulation of chromosome condensation should not be a DNA metabolic process
39. 7. Ontology error (e.g.)
Genes annotated to folic acid metabolism were also incorrectly annotated to amino acid
metabolism. Folic acid was classified as an amino acid by ChEBI -> fix ChEBI, which fixes GO
40. 7. Finds incorrect mappings (e.g.)
Intersect between tRNA metabolism and transcription.
Elongator is no longer thought to have a direct role in transcription, mapping removed
41. 8. Consider Author intent
Think about the biology the author intended
e.g. rubidium ion transmembrane transporter/ transport
Rubidium ion is used to assay K+ transport, not rubidium transport
(non-physiological substrate)
e.g. Apoptosis (RPS19)
Rps19 mutant displayed condensed DNA, a fragmented nucleus
and caspase activation - indicative of apoptosis.
Since RPS19 has an essential role in ribosome biogenesis,
apoptosis is likely to be an indirect effect of disrupting an
upstream process, translation (i.e. an experimental readout)
42. 9. Communication with the author
and community curation
• Most authors are happy to discuss their publications. If unsure
about an annotation, ask them. PomBase routinely uses
authors as a QC step to refine annotation.
43. 9. Community Curation
• Most authors are happy to curate their own papers
• Co-curation by author and curator improves annotation quality
(especially PhD/postdoc/recent papers).
• 9619 annotations (FYPO/GO/MOD) created by the community
from 510 publications (excludes HTP spreadsheet submissions)
44. Some example sessions
• http://tinyurl.com/q2bgyqv
• http://tinyurl.com/p7d979b
• http://tinyurl.com/o72bzul
45. Very specific annotation is possible because Canto guides the user
step by step to construct genotypes and ontology based annotations.
“Drill down” to more specific terms is assisted.
Prompts are provided for AE of specified types for certain terms.
46. 10. Prioritise error fixing
• Fixing known errors takes precedence over new annotation....
like critical bugs in code
• Even small errors often uncover larger issues, or can fix many
problems simultaneously across multiple species.
• Prevents propagation of annotation errors
47. 11. GO process vs. phenotype
• GO annotation should reflect a gene's direct involvement
in, or role in regulating, processes or functions.
• Phenotypes may indicate that a mutation *affects* a
process, but may reflect downstream or indirect effects.
e.g. ER membrane defect -> nuclear envelope defect -> chromosome
decondensation defect-> defects in next round of DNA replication.
• A “DNA replication phenotype” alone is not enough to
make a “DNA replication” GO annotation.
• Single phenotype is often NOT SPECIFIC FOR A PROCESS.
48. Phenotype annotation rules
• To make GO annotations based on phenotypes, ask the
question:
“Is this phenotype or collection of phenotypes
specific to this process?” (usually need detailed
phenotypes)
Additional data can support GO inference from
phenotype (location, orthology), and author intent.
(Intersections between processes useful for identifying
annotation errors caused by indirect annotation)
Describe some PomBase curation procedures; these might be useful to other databases/curators
Coverage: genes annotated OR number of different processes for a gene
Another important point is that annotations are explicitly coupled by using a term which covers both (although this can also be done with extensions)
Arrange temporally?
NOTE: we don’t filter redundant EXP annotations, but we do manage this in the display so the term is presented and the source (often multiple) is available in a full view
Later we hope to hide higher level EXP annotations
Complexes cluster together, some genes incorrectly annotated, can work out how they are connected, check appropriate sub-processes annotated for complexes, complex annotations are internally consistent etc
Add error type examples
ChEBI is used to define chemicals in GO
This isn’t speculative; it’s the curator using what is known but not explicitly stated. It’s a valid interpretation of the experiment based on what is presented: we are modelling the biology