Ondex is a data integration and visualization platform used to integrate large amounts of biological data from multiple sources. It transforms the data into a graph of biological concepts and relationships. Ondex allows users to integrate data, perform semantic alignment of concepts, and visualize the integrated network. Filters and annotators can then be used to highlight specific areas of interest within the large integrated network. Ondex has been applied to problems such as candidate gene prioritization, pathway mapping, and analysis of quantitative trait loci regions in plants.
14. Syntactic integration challenge: over 1,000 databases are freely available to the public; GenBank holds over 60 million sequences; there are over 870 complete genomes and many ongoing projects; PubMed holds over 17 million citations and grows by 600,000 publications each year. Integration of life science data sources is essential for systems biology research. http://www.ncbi.nlm.nih.gov/Database
15. Semantic integration challenge: same concept, different names (synonyms); same name, different concepts (homographs).
19. Concepts and relations (1/2). Examples of a network of concepts and relations: a protein-protein interaction (PPI) network, where concepts of ConceptClass Protein are linked by the RelationType "interact"; and the cellular location of proteins, where a Protein concept is linked to a cellular component (CelComp) concept by "located in". Properties: compound name, protein sequence, protein structure, cellular component, Km value, pH optimum, … Ontology of concept classes, relation types and additional properties.
20. Concepts and relations (2/2). Transformation to a binary graph: a Reaction becomes a concept of its own, linked to Metabolite concepts by the relations "consumed by" and "produced by". Properties: compound name, protein sequence, protein structure, cellular component, Km value, pH optimum, …
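The transformation to a binary graph described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not Ondex's actual data model: the class names `Concept` and `Relation`, the function `reaction_to_binary`, and the example metabolites are all assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    id: str
    concept_class: str  # e.g. "Protein", "Reaction", "Metabolite"


@dataclass(frozen=True)
class Relation:
    source: Concept
    target: Concept
    relation_type: str  # e.g. "consumed_by", "produced_by"


def reaction_to_binary(reaction_id, substrates, products):
    """Represent one n-ary reaction as binary relations around a Reaction concept."""
    reaction = Concept(reaction_id, "Reaction")
    relations = []
    for name in substrates:
        relations.append(Relation(Concept(name, "Metabolite"), reaction, "consumed_by"))
    for name in products:
        relations.append(Relation(Concept(name, "Metabolite"), reaction, "produced_by"))
    return relations


# Hypothetical example reaction, used only to exercise the sketch.
relations = reaction_to_binary("R1", ["glucose", "ATP"], ["glucose-6-phosphate", "ADP"])
for r in relations:
    print(r.source.id, r.relation_type, r.target.id)
```

Making the reaction a concept keeps every edge binary, so the same graph operations (filters, layouts, mapping) apply to metabolic and interaction data alike.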
24. Importing data into Ondex. Questions to settle: what databases to import, and what format these are in. Ondex parsers already written: generic (OBO, PSI-MI, SBML, tab-delimited, FASTA) and database-specific (AraCyc, AtRegNet, BioCyc, BioGRID, Brenda, Drastic, EcoCyc, GO, GOA, Gramene, Grassius, KEGG, Medline, MetaCyc, Oglycbase, OMIM, PDB, Pfam, SGD, TAIR, TIGR, Transfac, Transpath, UniProt, WGS, WordNet).
25. Example of resulting graph. [Diagram: concept classes Gene, Protein, Target sequence, Transcription factor, Enzyme, Protein complex, Reaction, Catalysing class, EC and Pathway, connected by the relation types "encoded by", "is_a", "member is part of", "catalyses", "binds to", "has similar sequence", "repressed by", "regulated by" and "activated by".]
26. Ondex Data Integration Scheme. [Diagram with three stages: data input and transformation, data integration, visualisation. Heterogeneous data sources (e.g. treatments from DRASTIC, pathways from KEGG, UniProt, AraCyc, Transfac, microarray data) are each read by a parser into the Ondex graph warehouse (generalized object data model, database layer). Integration methods for graph alignment: accession based, name based, transitive, BLAST, protein family, Pfam2GO, Lucene. Clients/tools: Ondex Visualization Tool Kit, web client, Taverna, web service; data exchange via OXL/RDF.]
27. Semantic integration by graph alignment: create relations between equivalent entries from different data sources, identified by mapping methods: concept accessions (UniProt ID); concept name (gene name) and synonyms; sequence methods; graph neighbourhood; text mining.
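As a rough illustration of the accession-based mapping method listed on this slide, the sketch below links concepts from different data sources whenever they share an accession. It is an assumption-laden toy, not the Ondex implementation: the dictionary layout, the function name `map_by_accession`, and the identifiers and accessions in the example are invented for illustration.

```python
from collections import defaultdict
from itertools import combinations


def map_by_accession(concepts):
    """Return pairs of concept ids from different sources that share an accession.

    `concepts` is a list of dicts with keys 'id', 'source' and 'accessions'
    (a set of accession strings); this layout is an assumption of the sketch.
    """
    by_accession = defaultdict(list)
    for concept in concepts:
        for accession in concept["accessions"]:
            by_accession[accession].append(concept)

    equivalences = set()
    for group in by_accession.values():
        for a, b in combinations(group, 2):
            # Only align entries that come from different data sources.
            if a["source"] != b["source"]:
                equivalences.add(tuple(sorted((a["id"], b["id"]))))
    return equivalences


# Made-up concepts and accessions, for demonstration only.
concepts = [
    {"id": "kegg:1", "source": "KEGG", "accessions": {"P00001"}},
    {"id": "aracyc:1", "source": "AraCyc", "accessions": {"P00001", "P00002"}},
    {"id": "aracyc:2", "source": "AraCyc", "accessions": {"P00002"}},
]
print(map_by_accession(concepts))
```

Each returned pair would become an "equivalent" relation in the integrated graph. Name-based, sequence-based and text-mining mappings follow the same pattern with a different matching key.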
34. Filters: integrating different datasets yields a large resulting graph, so we need to narrow it down and select meaningful areas of the graph. Example in Ondex: protein-protein interaction network.
46. Annotators (2/3). Virtual Knock-out: an annotator to see how important a single concept is to all possible paths contained in a network; Ondex resizes the concepts based on this score. Scale Concept by Value: pie charts; up/down regulation is indicated in red/green.
47. Application case 2: Mapping microarray expression data to integrated pathways (Jan Taubert). [Workflow diagram: AraCyc (via parser) and an Arabidopsis C/N uptake tab file are loaded into Ondex (OXL); accession-based mapping using TAIR IDs; outputs are interactive exploration in Ondex and an enriched spreadsheet (tab file), e.g. with AraCyc pathways.]
54. Application case 3: Arabidopsis PPI network (Artem Lysenko). Mapping the 3 databases IntAct, TAIR and BioGRID based on TAIR accessions.
55. Adding 3 sources of evidence (co-expression, sequence similarity, co-occurrence in scientific literature) to facilitate the identification of functionally related groups of proteins.
56. Added attributes to nodes/edges. Network stats: betweenness centrality (BWC), how influential (bridge) a node is; degree centrality (DC), hub likeness. Markov Clustering: identifies strongly connected groups of proteins in the network.
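The two network statistics named on this slide can be computed as follows. This is a plain-Python sketch for an unweighted, undirected graph given as an adjacency dictionary. The betweenness function uses Brandes' algorithm, which is a standard choice and not necessarily what Ondex uses internally, and the toy graph is an assumption.

```python
from collections import deque


def degree_centrality(adj):
    """Fraction of the other nodes each node is directly connected to."""
    n = len(adj)
    return {v: len(neighbours) / (n - 1) for v, neighbours in adj.items()}


def betweenness_centrality(adj):
    """Unnormalised betweenness (Brandes' algorithm, unweighted undirected graph)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                  # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                  # accumulate dependencies back towards s
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair of endpoints was counted twice.
    return {v: value / 2 for v, value in bc.items()}


# Toy interaction network: protein B bridges A, C and D.
adj = {"A": ["B"], "B": ["A", "C", "D"], "C": ["B"], "D": ["B"]}
print(degree_centrality(adj))
print(betweenness_centrality(adj))
```

In the toy network, B is both the hub (highest degree centrality) and the bridge (every shortest path between the other proteins passes through it).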
60. Filters, annotators and layouts: combining these three types of tools in Ondex for a more complex application case …
61. Application case 4: Bioenergy Project (Keywan Hassani-Pak). Use bioinformatics to support phenotype-genotype research in bioenergy crops. Given a phenotypic variant, is it possible to pin down the relevant genes? Develop tools to support systematic analysis of QTL regions to pin down relevant genes. Identify genes implicated in biomass production in willow. Prioritise genes for experimental validation. Biofuel Conversion Process: http://www.jgi.doe.gov/education/bioenergy/bioenergy_1.html
62. QTL and genomic data. The willow genome is not sequenced yet. A QTL may encompass many potential candidates, perhaps hundreds. Poplar is the first tree with a fully sequenced genome: 19 chromosomes, 45,778 predicted genes, 4x larger than the Arabidopsis genome. Not much is known about the function of the genes.
63. Linking genes to data sources. [Diagram labels: Linked; References model, e.g. Poplar, Arabidopsis; Willow; Pathways; Plant Hormones; QTL Map; Orthologous; Markers; Physical map; Expression Patterns; Genes; Gene Function.] Result: list of candidate genes linked to biological processes.
64. Relevant data sources. Release 15.10; Poplar Gene Prediction v2.0 (Jan 2010); all plants: 739,396 proteins; reviewed: 28,404 proteins (3.84%). PoplarCyc 1.0: 285 pathways, 3,434 enzymes, 1,363 compounds (Oct 2009). Pfam 24.0: 11,912 protein families (Oct 2009). Poplar transcription factors: DPTF, 2,576 putative TF (March 2007); PlnTFDB, 2,901 putative TF (July 2009). 29,365 GO terms (Jan 2010). Poplar/willow QTL: work in progress, preliminary dataset available. Only loading referenced publications: ~15,000 articles.
65. Unique knowledge base for poplar: proteins annotated with functional information and publications, based on comparative genomics and protein family analysis; genes and QTLs enriched with positional information. Data integration was done in Ondex.
66. Ondex Genomics Layout Genomic Layout displays chromosomes, genes and QTLs Chromosomal regions and QTLs can be selected
68. Phenotypic information in literature. Poplar protein 217086. HMMer: 650581 – HLH, E-value: 3.4E-7, score: 30.0. BLAST: 217086 – LAX, E-value: 8.3E-17, score: 80.88. BLAST: 217086 – BHLH63, E-value: 8.3E-9, score: 54.3. PMID:13130077 “LAX and SPA: major regulators of shoot branching in rice.” We identified two remote homologs, one in rice (LAX) and one in Arabidopsis (BHLH63), as well as one protein domain (HLH). There is evidence that the LAX homolog is a major regulator of shoot branching. Hypothesis generation.
Light pink: increased virulence. Light blue: reduced virulence. Light green: loss of pathogenicity. Yellow: unaffected pathogenicity. Star: animal. Circle: plant.
The Virtual KO score is based on 3 other scores: "extension" gives the number of paths that would be extended if a concept was added; "deletion" gives the number of paths that would be deleted if this concept was deleted; "nochange" gives the number of paths that would not be shortened/extended if this concept was deleted.
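To make the "deletion" component of the Virtual KO score concrete, here is a small sketch that counts how many paths would be lost if a concept were removed. The notes do not say how Ondex enumerates paths, so the sketch assumes "paths" means simple paths between every pair of other concepts in an undirected graph; the function names and the toy graph are likewise hypothetical. The "extension" and "nochange" components are not covered.

```python
from itertools import combinations


def simple_paths(adj, start, goal, path=None):
    """Yield every simple path (no repeated nodes) from start to goal."""
    path = [start] if path is None else path
    if start == goal:
        yield list(path)
        return
    for nxt in adj[start]:
        if nxt not in path:
            path.append(nxt)
            yield from simple_paths(adj, nxt, goal, path)
            path.pop()


def deletion_score(adj, concept):
    """Count simple paths between other concepts that run through `concept`.

    Assumption: these are the paths that would be deleted if the concept
    were knocked out. Exhaustive enumeration only suits small graphs.
    """
    others = [v for v in adj if v != concept]
    count = 0
    for a, b in combinations(others, 2):
        for p in simple_paths(adj, a, b):
            if concept in p:
                count += 1
    return count


# Toy network: B connects A, C and D.
adj = {"A": ["B"], "B": ["A", "C", "D"], "C": ["B"], "D": ["B"]}
print({v: deletion_score(adj, v) for v in adj})
```

A concept with a high count sits on many paths, which is the kind of importance the annotator visualises by resizing concepts.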
IntAct: 4625 protein interactions (data derived from literature curation or direct user submissions). TAIR (The Arabidopsis Information Resource): 1143 interactions; genome sequence, gene structure, gene product information, metabolism, gene expression, DNA and seed stocks, genome maps, genetic and physical markers, publications. BioGRID (General Repository for Interaction Datasets): collections of protein and genetic interactions from major model organism species; 1223 interactions for Arabidopsis derived from high-throughput studies and conventional focused studies.
ATTED II (Arabidopsis thaliana trans-factor and cis-element prediction database): provides co-regulated gene relationships in Arabidopsis to estimate gene functions; gives the Pearson correlation coefficients of co-expressed genes in Arabidopsis calculated from available microarray data. NCBI PSI-BLAST: identify similarities between our reference set of proteins; matching against the Arabidopsis subset of UniProt. Co-occurrence of protein names: 25,900 Medline abstracts related to Arabidopsis thaliana; integrated Lucene-based mapping method.
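The co-expression evidence rests on the Pearson correlation coefficient between two genes' expression profiles, r = cov(x, y) / (sd(x) · sd(y)). A minimal sketch of that calculation follows; the function name and the expression values are made up for illustration and are not taken from ATTED II.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (sd_x * sd_y)


# Hypothetical expression values of two genes across four microarray samples.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.1, 3.9, 6.2, 8.0]
print(round(pearson(gene_a, gene_b), 3))
```

Pairs of genes whose coefficient exceeds a chosen threshold can then be linked by a co-expression relation. The threshold is a modelling decision and is not specified in these notes.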
Solid biomass (in the form of plants and trees) can be converted into liquid fuels (such as ethanol, methanol, and biodiesel). The challenge lies in efficient conversion, creating more energy than the input required to produce it. Increase biomass yield. Develop means to support systematic analysis of QTL regions and prioritise genes for experimental analyses. Identify genes controlling biomass production in willow.
QTL are genomic regions that assign variations observed in a phenotype to a region on the genetic map. Biomass traits: branching, height, leaf number etc. Going from willow to poplar to Arabidopsis and other species.
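Because a QTL is a region on a map, the first step in finding candidate genes is a positional filter: keep the genes whose coordinates overlap the QTL interval on the same chromosome. The sketch below shows that step only. The `Gene` record, the function name `genes_in_qtl`, and all identifiers and coordinates are hypothetical, and mapping a willow QTL onto poplar coordinates is assumed to have been done already.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    gene_id: str
    chromosome: str
    start: int
    end: int


def genes_in_qtl(genes, chromosome, qtl_start, qtl_end):
    """Return genes on `chromosome` that overlap the interval [qtl_start, qtl_end]."""
    return [
        g for g in genes
        if g.chromosome == chromosome and g.start <= qtl_end and g.end >= qtl_start
    ]


# Made-up gene models, for illustration only.
genes = [
    Gene("gene1", "chr1", 1_000, 2_500),
    Gene("gene2", "chr1", 9_000, 12_000),   # partially overlaps the QTL
    Gene("gene3", "chr1", 20_000, 21_000),  # outside the QTL
    Gene("gene4", "chr2", 10_500, 11_000),  # right position, wrong chromosome
]
candidates = genes_in_qtl(genes, "chr1", 10_000, 15_000)
print([g.gene_id for g in candidates])
```

The resulting list is only the starting point. The annotations integrated in the knowledge base (gene function, pathways, publications) are what narrow it to a few candidates worth validating.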
Reduced hypothesis space from 100 potential candidates to 3 hot candidates. Next steps: cloning and transformation for experimental validation.