Mining biological knowledge networks for
gene-phenotype discovery
Keywan Hassani-Pak
EBI course: Introduction to Omics data integration
March 2017
http://knetminer.rothamsted.ac.uk/
@KnetMiner
• Rothamsted is the longest running
agricultural research station in the world,
providing cutting-edge science and
innovation for more than 170 years.
• Over 450 staff
• Bioinformatics group
• Bioinformatics Analysts
• Software Developers
About us…
Agenda for today
Kevin Dialdestoro
Stephanie Brunet
Part I Part II
Keywan Hassani-Pak
Ajit Singh
Monika Mistry
• To understand why linking genotype to phenotype is complex
• To learn which information types are useful for candidate gene prioritization
• To understand the concept of knowledge networks/graphs
• To use KnetMiner for the interpretation of your RNA-seq, QTL, GWAS results
• To learn a little bit about neurodegenerative diseases
Learning Objectives for Part I
The Genotype to Phenotype Challenge
Genotype
QTL and GWAS
Omics
Includes any ‘omics
Phenotype
Disease
Intelligence
Flowering
Stress tolerance
Biological Knowledge Discovery
Data selection, processing, transformation, integration and
interpretation
The approach is generic and works similarly for other species
• Free and open source
• Data warehousing using a graph database
• Platform to integrate public and private
datasets in various formats
• Provides a GUI, CLI, APIs and workflows
for reproducible data integration
Ondex – Data Integration Platform
Ondex
www.ondex.org
Not covered in this training course!
Let’s start with some GWAS data…
http://plants.ensembl.org/biomart
Example Arabidopsis
#SNP=66,816 | #Gene=27,502 | #Phenotype=107
… transform into a network
(SNP)
(Phenotype)
associated
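A minimal sketch of this transformation, assuming the GWAS results are available as rows of (SNP, phenotype, p-value). The rows, the `build_network` helper and the 0.05 cutoff are illustrative assumptions; they are not taken from the Ensembl BioMart export or from Ondex.

```python
# Hypothetical GWAS rows: (snp_id, phenotype, p_value). Illustrative only.
gwas_rows = [
    ("snp_1", "flowering time", 0.001),
    ("snp_2", "flowering time", 0.20),
    ("snp_2", "seed weight", 0.03),
]


def build_network(rows, p_cutoff=0.05):
    """Turn significant rows into typed nodes and 'associated' edges."""
    nodes = {}  # node id -> concept type
    edges = []  # (source, relation, target, attributes)
    for snp, phenotype, p_value in rows:
        if p_value >= p_cutoff:
            continue  # skip associations that are not significant
        nodes[snp] = "SNP"
        nodes[phenotype] = "Phenotype"
        edges.append((snp, "associated", phenotype, {"p_value": p_value}))
    return nodes, edges


nodes, edges = build_network(gwas_rows)
for source, relation, target, attrs in edges:
    print(f"({source}) -[{relation}]-> ({target}) {attrs}")
```

Keeping the concept type on each node and the relation type on each edge is what later allows other datasets (interactions, annotations, publications) to be merged into the same graph.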
Biological interaction datasets
http://thebiogrid.org
(SNP)
(Phenotype)
associated
… add biological interactions
• Gene-GO
• Gene-Phenotype
Gene knock-out or overexpression
Text mining publications
• Gene-Publication
• Gene-Pathway
• Gene-Expression
• Protein-Small Molecule
• Homology to other species
… add other open linked data
>800,000 nodes
>3,000,000 edges
Genome-scale knowledge network
• Progressive loss of structure or function of
neurons, including death of neurons
• Many types including Alzheimer, Parkinson,
Huntington…
• Many similarities between these diseases on
a sub-cellular level
• Discovering these similarities offers hope for
therapeutic advances that could ameliorate
many diseases simultaneously
Neurodegenerative Diseases
• Use OMIM advanced search
Query: alzheimer parkinson huntington
Tick: Search in Title
Tick: MIM Number Prefix: “# phenotype”
• Download results as Tab-delimited file
• Copy MIM ids without the prefix “#”
• Use UniProt Retrieve/ID mapping
Provide your MIM identifiers
Select option: From MIM to UniProtKB
Press Go and download all proteins in
XML format (compressed)
Tutorial data – based on 33 human genes
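The "copy MIM ids without the prefix" step can be scripted. This is a sketch only: the column name `MIM Number` and the sample rows are assumptions about the tab-delimited export, and the real file may be laid out differently.

```python
import csv
import io

# Assumed layout of the tab-delimited export: a "MIM Number" column whose
# values carry the "#" phenotype prefix. Sample rows are illustrative.
sample_export = (
    "MIM Number\tTitle\n"
    "#104300\tALZHEIMER DISEASE\n"
    "#168600\tPARKINSON DISEASE\n"
    "#143100\tHUNTINGTON DISEASE\n"
)


def extract_mim_ids(text, column="MIM Number"):
    """Return MIM identifiers from the given column with the '#' removed."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    ids = []
    for row in reader:
        value = (row.get(column) or "").strip()
        if value.startswith("#"):
            ids.append(value.lstrip("#"))
    return ids


mim_ids = extract_mim_ids(sample_export)
print("\n".join(mim_ids))  # one identifier per line, ready to paste
```

The printed list can then be pasted into the UniProt Retrieve/ID mapping form described above.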
Integration of public datasets
Public Databases
Quantitative data
Interaction data
Omics Data
Datasets and workflows: https://github.com/Rothamsted/ondex-knet-builder
Relationships in Biological Knowledge Networks
Genes Homology Annotations Genetics Interactions Phenotype
• Methods needed to evaluate millions of
relationships in knowledge network, prioritize
genes and extract relevant subnetworks
• Interactive and exploratory tools needed to
enable knowledge discovery and decision
making
• Interpretation should be the task of domain
experts i.e. biologists!
How to search and interpret too much information?
KnetMiner System Overview
[Diagram: the client (web browser; DHTML, JavaScript) exchanges HTML, JSON, XML and images over HTTP via Ajax with the server (Apache Tomcat; Servlets and JSP pages rendering the views). The web server talks over a Java socket to a multithreaded Java server, which accesses the knowledge graph DB through the Ondex API.]
KnetMiner UI Overview
Search
Gene View
Map View
Evidence View
Network View
http://knetminer.rothamsted.ac.uk/HumanDisease
KnetMiner search interface (1)
Ontology-based
term suggestions
Concept Types
e.g. GO, Trait
OR, NOT, Replace
KnetMiner search interface (2)
User provided
QTL region
Supports gene
IDs and names
Example 1
1. Search terms: "cell death" OR apoptosis
2. Open Query Suggestor
3. Click on cell death tab
4. Replace with neuron death
5. Does it change the number of
documents and genes that can be found?
Exercise 1 – Search Interface
Video 1
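To see why replacing a term can change the number of hits, here is a toy keyword search. It is a sketch under stated assumptions: the three documents are invented, matching is plain case-insensitive substring search, and `parse_or_query` and `search` are hypothetical helpers, not KnetMiner's search implementation.

```python
import shlex

# Invented documents for illustration.
docs = {
    "doc1": "Apoptosis is a form of programmed cell death.",
    "doc2": "Neuron death is a hallmark of neurodegeneration.",
    "doc3": "Flowering time varies between accessions.",
}


def parse_or_query(query):
    """Split on whitespace, keep quoted phrases whole, drop the OR keyword."""
    return [t.lower() for t in shlex.split(query) if t.upper() != "OR"]


def search(query, documents):
    """Return ids of documents that contain at least one query term."""
    terms = parse_or_query(query)
    return sorted(
        doc_id
        for doc_id, text in documents.items()
        if any(term in text.lower() for term in terms)
    )


original = '"cell death" OR apoptosis'
replaced = original.replace('"cell death"', '"neuron death"')

hits_before = search(original, docs)
hits_after = search(replaced, docs)
print(len(hits_before), len(hits_after))
```

In this toy corpus the replaced query finds one more document, because "neuron death" matches text that neither original term covered.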
Gene View - Ranked genes and evidence summaries
1. Uses TF*IDF to rank documents by their relevance to a search term
2. Uses the properties of gene-evidence networks such as
   • the specificity of documents to a gene
   • the frequency of evidence concepts
3. Calculates Knet-Score for every gene
Smart pre-indexing of the knowledge network makes the computation of
the score very fast
Gene Ranking
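Step 1 can be illustrated with textbook TF*IDF. This is a sketch, not the Knet-Score: the documents, the gene-to-document links and the `toy_gene_score` aggregation (a plain sum over linked documents) are assumptions made for illustration.

```python
import math
from collections import Counter

# Invented documents and gene-document links.
docs = {
    "doc1": "neuron death in alzheimer disease",
    "doc2": "apoptosis and cell death",
    "doc3": "flowering time in arabidopsis",
}
gene_docs = {"gene_A": ["doc1", "doc2"], "gene_B": ["doc3"]}


def tf_idf_scores(term, documents):
    """Score each document as term frequency * inverse document frequency."""
    tokenized = {d: text.lower().split() for d, text in documents.items()}
    doc_freq = sum(1 for tokens in tokenized.values() if term in tokens)
    if doc_freq == 0:
        return {d: 0.0 for d in tokenized}
    idf = math.log(len(tokenized) / doc_freq)
    return {
        d: (Counter(tokens)[term] / len(tokens)) * idf
        for d, tokens in tokenized.items()
    }


def toy_gene_score(gene, doc_scores, links):
    """Hypothetical aggregation: sum the scores of a gene's documents."""
    return sum(doc_scores[d] for d in links[gene])


scores = tf_idf_scores("death", docs)
ranked_docs = sorted(scores, key=scores.get, reverse=True)
ranked_genes = sorted(
    gene_docs, key=lambda g: toy_gene_score(g, scores, gene_docs), reverse=True
)
print(ranked_docs, ranked_genes)
```

A shorter document that mentions the term scores higher than a longer one with the same count, and a term found in every document gets an IDF of zero, which is the behaviour that makes TF*IDF favour specific evidence.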
Network View – Interactive network visualization
• Enlarge
• Show all
• Re-layout
• Info Box
Add hidden nodes
and edges
Example 2
Exercise 2: Gene View → Network
Search terms: Alzheimer OR Parkinson OR Heparin OR "cell death"
Gene List: APP, MAPT, PRNP
1. Click on the APP gene which loads the Network View
2. Open the Info Box
3. Click on different Concept and Relation types
4. Check their attributes and click on links to external databases
5. Explore all direct and indirect paths from APP to Alzheimer and
Parkinson
6. Hide Publication concepts
7. Show all drugs that can target the APP interaction network
8. Go back to Gene View and select Known targets
9. Click View Network and find out if APP, MAPT and PRNP
interact, are differentially expressed and have GWAS data
Video 2
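Steps 5 and 6 of the exercise (exploring direct and indirect paths, then hiding Publication concepts) correspond to a path search over the network. The sketch below uses a toy directed graph in which every node and edge is a placeholder; `simple_paths` is a hypothetical helper and not the Network View's own algorithm.

```python
# Toy knowledge network; node ids are "type:name" placeholders.
graph = {
    "gene:G1": ["protein:P1", "publication:PUB1"],
    "protein:P1": ["disease:D1", "protein:P2"],
    "protein:P2": ["disease:D1"],
    "publication:PUB1": ["disease:D1"],
    "disease:D1": [],
}


def simple_paths(graph, start, goal, max_len=4, hidden_types=()):
    """Depth-first enumeration of cycle-free paths of at most max_len edges."""
    paths = []
    stack = [[start]]
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == goal:
            paths.append(path)
            continue
        if len(path) - 1 >= max_len:
            continue  # path already uses the maximum number of edges
        for neighbour in graph.get(node, []):
            if neighbour in path:
                continue  # avoid cycles
            if neighbour.split(":")[0] in hidden_types:
                continue  # concept type is hidden by the user
            stack.append(path + [neighbour])
    return paths


all_paths = simple_paths(graph, "gene:G1", "disease:D1")
without_publications = simple_paths(
    graph, "gene:G1", "disease:D1", hidden_types=("publication",)
)
for p in sorted(all_paths, key=len):
    print(" -> ".join(p))
```

Hiding a concept type removes every path that runs through it, which is why the network becomes easier to read once publications are hidden.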
Example 3
Exercise 3: Evidence View → Network
Search terms: Alzheimer "cell death"
1. Go to Evidence View
2. Sort table by column GENES
3. Find GO concept downregulation of neuron death
4. Find GO concept upregulation of apoptosis
5. Click on number of genes linked to these terms
6. In Network View, show labels for Gene and GO concepts
7. What’s the evidence linking genes to selected GO terms?
Video 3
Map View – Interactive map of chromosome, gene, SNP and QTL data
• Show
network
• Enlarge
• Reset
• Settings
GWAS studies
Example 4
Exercise 4: Map View → Network
Search terms: Parkinson "cell death"
QTL: Chromosome 12 :: 35000000 - 44000000
1. Go to Map View
2. Toggle Full Screen
3. Zoom into Chromosome 12 and find your QTL
4. Find one or several genes that are in close proximity to
GWAS SNPs
5. Select one or more genes, e.g. LRRK2, PRNP and EIF4G1
6. Launch Network View
7. Study the network and how the genes are connected
Video 4
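What the Map View does visually in steps 3 and 4 can be sketched as two interval checks: is a gene inside the QTL, and is a GWAS SNP nearby? All gene and SNP coordinates below are placeholders, and the 100 kb window is an arbitrary assumption; only the QTL interval comes from the exercise.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    name: str
    chromosome: str
    start: int
    end: int


def overlaps(feature, chromosome, region_start, region_end):
    """True if the feature overlaps the region on the same chromosome."""
    return (
        feature.chromosome == chromosome
        and feature.start <= region_end
        and feature.end >= region_start
    )


def near_snp(gene, snps, window=100_000):
    """True if any SNP lies within `window` bp of the gene (assumed cutoff)."""
    return any(
        overlaps(snp, gene.chromosome, gene.start - window, gene.end + window)
        for snp in snps
    )


# Placeholder features, not real annotations.
genes = [
    Feature("gene_A", "12", 36_000_000, 36_050_000),
    Feature("gene_B", "12", 50_000_000, 50_020_000),
    Feature("gene_C", "20", 40_000_000, 40_010_000),
    Feature("gene_D", "12", 43_000_000, 43_010_000),
]
snps = [Feature("snp_1", "12", 36_060_000, 36_060_000)]

qtl = ("12", 35_000_000, 44_000_000)  # QTL from the exercise
genes_in_qtl = [g for g in genes if overlaps(g, *qtl)]
candidates = [g.name for g in genes_in_qtl if near_snp(g, snps)]
print(candidates)
```

Only genes that pass both checks would be worth carrying into the Network View in step 6.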
• Web application for very fast search of
large genome-scale knowledge graphs
• Ranking of candidate genes based on
knowledge mining
• Interactive visualisation of genome
and knowledge maps
• Facilitates hypothesis validation and
generation
KnetMiner – Making Gene Discovery Efficient & Fun
http://knetminer.rothamsted.ac.uk/
• You like KnetMiner but you might be asking…
What if I’m interested in a different disease?
What if I’m interested in a different species?
What if I want to integrate my own private data?
What if I don’t have a server to run KnetMiner?
• As part of an Innovate UK project we are working with Genestack to address these
questions by integrating KnetMiner tools into the Genestack Bioinformatics Platform.
• Next: We will teach you how to use Genestack to build your own networks and deploy
your own KnetMiner application
Objectives for Part II
Acknowledgements
John Doonan
Sergio Feingold
Martin Castellote
Uwe Scholz
Matthias Lange
Andy Law
Keywan Hassani-Pak
Ajit Singh
Marco Brandizi
Monika Mistry
Lisa Lill
Chris Rawlings
Dave Edwards
Philipp Bayer
Misha Kapushesky
Kevin Dialdestoro
@KnetMiner

Editor's Notes

  • #6 Many phenotypes are complex, polygenic and the result of complex interactions at the cellular level. Linking genotype and phenotype is one of the greatest challenges in biology.
  • #9 SNP-Phenotype relations (122,919 relations) cover significant SNPs (as defined by Ensembl, p-value<0.05?) linked to 107 phenotypes; on average about 1,150 SNPs per phenotype. SNP-Gene relations (96,047 relations) are based on genes in close proximity to SNPs (<1000bp). How to integrate GWAS and biological interaction data?
  • #10 Using Ondex
  • #13 http://www.sciencedirect.com/science/article/pii/S2212066116300308
  • #17 Highlight text mining. Add gene expression. Mention xrefs and that they can be collapsed.
  • #18 Scale…