KnetMiner – Knowledge Network Miner
Keywan Hassani-Pak
http://knetminer.rothamsted.ac.uk/
@KnetMiner
About Rothamsted Research
• Rothamsted is the longest-running agricultural
research station in the world (est. 1843)
• Strategic research to address global food
security demands
• Improving crops to be tolerant to drought, heat
and pests while still providing optimum
nutrition
• Interdisciplinary research from gene to field
Outline
1. Routes to candidate gene discovery
2. Building genome-scale knowledge networks
3. Overview and demo of KnetMiner
4. Extending networks with text-mining
5. Candidate gene prioritization
6. Discussion
Routes to candidate gene
discovery
Routes to candidate gene discovery
Many gene discovery routes can identify candidate
genes for complex traits
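[Diagram: starting from a phenotype, four routes lead to candidate genes: (1) genetic methods, (2) gene expression, (3) research literature and (4) published data. Candidate genes then pass through prioritization and validation to yield markers for diagnosis, breeding and GM.]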
Quantitative Trait Locus (QTL) Mapping (route 1)
1. Development of an experimental population
2. Collection of phenotypic and genotypic data
3. Construction of a linkage map
4. Correlation of markers with the trait
5. Identification of QTL
QTL region can encompass 10s to
100s of genes. How to prioritize
them?
Genome Wide Association Studies (GWAS)
[Figure: GWAS results for FLC gene expression (FLC), leaf number (LN22) and AvrRpm1; Atwell et al., Nature 2010]
• GWAS results can be simple or complex to interpret
• Peaks can be diffuse, covering several hundred kb without a clear centre
• Causal polymorphisms do not always show the strongest association
Gene expression analysis (route 2)
[Figure: expression of genes across different tissues or genotypes, or across time points after infection or treatment]
• Gene expression studies can
be complex to interpret
• 100s to 1000s of differentially
expressed genes that are
somehow related to
phenotype
• What are the key pathways
leading to observed
phenotypes?
Text Mining - Trait and Gene Functions (route 3)
• Publications (free text) are the most up-to-date resource for information
• Finding sentences that link phenotypes (flowering time) and gene function (circadian clock) to genes (CONSTANS)
• Term variability and ambiguity can lead to missing or false associations
Life Sciences Databases (route 4)
• Plethora of public life sciences databases in various formats
• Databases constantly growing in size and content
• Challenging to keep up to date with the growing body of knowledge
1500+ databases published in NAR
Which associations (genes) are worth following up?
Often a highly subjective decision
Evaluation of all available information is expensive
How is genotype translated to phenotype?
Often involves direct and indirect interactions
Data integration and knowledge discovery is technically challenging
Building genome-scale
knowledge networks
Biological knowledge network/graph
Genotype
• QTL
• GWAS
Omics
• Transcriptomics
• Proteomics
• Metabolomics
Phenotype
• Disease
• Development
• Stress tolerance
Biological Knowledge Network
• Prior knowledge
• Structured, unstructured data
• Cross-species data
Integration
The approach is generic and works similarly for other species
Ondex – Data Integration Platform
• Free and open source
• Data warehousing using a graph-database
• Platform to integrate public and private
datasets in various formats
• Provides a GUI, CLI, APIs and workflows for
reproducible data integration
Ondex
www.ondex.org
Let’s start with some GWAS data…
http://plants.ensembl.org/biomart
Example Arabidopsis
#SNP=66,816 | #Gene=27,502 | #Phenotype=107
… transform into a network
(SNP) --associated-- (Phenotype)
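The transformation from a GWAS table into a network can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the Ondex workflow itself. The row layout, the identifiers (`snp_1`, `gene_a`, ...), the p-value threshold and the `located_in` relation between SNP and gene are all assumptions made for the example; only the SNP-associated-Phenotype relation comes from the slide.

```python
# Hypothetical GWAS rows: (SNP id, gene id, phenotype, p-value).
# All values are invented for illustration.
gwas_rows = [
    ("snp_1", "gene_a", "FLC", 1e-8),
    ("snp_2", "gene_a", "LN22", 0.01),   # not significant, will be dropped
    ("snp_3", "gene_b", "AvrRpm1", 2e-12),
]


def build_network(rows, p_threshold=1e-5):
    """Turn tabular associations into typed nodes and typed edges.

    Returns (nodes, edges) where nodes maps node id -> node type and
    edges is a list of (source, relation, target) triples.
    """
    nodes = {}
    edges = []
    for snp, gene, phenotype, p_value in rows:
        if p_value > p_threshold:
            continue  # keep only associations below the assumed threshold
        nodes[snp] = "SNP"
        nodes[gene] = "Gene"
        nodes[phenotype] = "Phenotype"
        edges.append((snp, "located_in", gene))       # assumed relation name
        edges.append((snp, "associated", phenotype))  # relation from the slide
    return nodes, edges


nodes, edges = build_network(gwas_rows)
for source, relation, target in edges:
    print(f"({source}) --{relation}--> ({target})")
```

Further datasets (interactions, expression, literature) would then be added as more typed edges over the same node identifiers.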
Biological interaction datasets
http://thebiogrid.org
… add biological interactions
… add differential gene expression data
early vs late flowering
… add other linked data
• Gene-GO
• Gene-Phenotype
• Gene knock-out or overexpression
• Text mining publications
• Gene-Publication
• Gene-Pathway
• Gene-Expression
• Protein-Small Molecule
• Homology to other species
>800k nodes
>3 million edges
Genome-Scale Knowledge Network (GSKN)
Same principles for other species
Knowledge graph of LRRK2 human gene
How to search and interpret too much information?
• Methods needed to evaluate millions of
relationships in knowledge network, prioritize
genes and extract relevant subnetworks
• Interactive and exploratory tools needed to
enable knowledge discovery and decision
making
• Interpretation should be the task of domain
experts i.e. biologists!
Overview and Demo of
KnetMiner
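KnetMiner System Overview
[Architecture diagram: the Client (web browser, JavaScript views) exchanges HTML, JSON, XML and images over HTTP via Ajax with the Server (Apache Tomcat, servlets and JSP page). The Tomcat layer communicates over a Java socket with a multithreaded Java server, which accesses the knowledge graph DB through the Ondex API.]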
Client
• Compatibility with all major
web browsers
• Based on D3.js, Cytoscape.js, Node.js
• Interactive and touch-enabled
Server
• Fast and scalable Java multi-
threaded server
• Pre-indexing of knowledge
graph
• Scoring and information
extraction
KnetMiner UI Overview
Search Select Explore
Google-like search interface
Search knowledge graph using trait-
based keywords
Real-time user feedback and query
suggestions
Trait related
keywords
Query term
suggestions
KnetMiner Map View (GenoMaps)
New touch-friendly web
App for Map View in
KnetMiner
Visualize genes, SNPs, QTL and GWAS data.
Select genes within QTL regions that overlap with SNPs and explore their network
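The selection step described for the Map View amounts to an interval-overlap query. The sketch below is not the GenoMaps implementation; it only illustrates the idea under the assumption that genes, SNPs and QTL are each described by a chromosome and inclusive start/end coordinates. The `Feature` type and all coordinates are hypothetical.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    """A genomic feature with inclusive coordinates (hypothetical model)."""
    name: str
    chrom: str
    start: int
    end: int


def overlaps(a, b):
    """True if two features share at least one position on the same chromosome."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def genes_in_qtl_with_snps(genes, snps, qtl):
    """Return names of genes that fall in the QTL and overlap at least one SNP."""
    selected = []
    for gene in genes:
        if not overlaps(gene, qtl):
            continue
        if any(overlaps(snp, gene) for snp in snps):
            selected.append(gene.name)
    return selected


# Invented example data.
qtl = Feature("qtl_1", "chr1", 1000, 5000)
genes = [
    Feature("geneA", "chr1", 1200, 1800),  # in QTL, has a SNP
    Feature("geneB", "chr1", 3000, 3500),  # in QTL, no SNP
    Feature("geneC", "chr1", 6000, 7000),  # outside QTL
    Feature("geneD", "chr2", 1200, 1800),  # other chromosome
]
snps = [
    Feature("snp_1", "chr1", 1500, 1500),
    Feature("snp_2", "chr1", 6500, 6500),
    Feature("snp_3", "chr2", 1500, 1500),
]

print(genes_in_qtl_with_snps(genes, snps, qtl))
```

For genome-scale data an interval tree or sorted coordinate index would be the more usual choice; the linear scan here keeps the example short.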
KnetMiner Network View (KnetMaps)
Touch-friendly web App for
Network View in KnetMiner
Explore networks linking
genes to proteins, SNPs,
phenotypes, publications,
etc.
Extending knowledge
networks with text-mining
Text-mining workflow in Ondex
• Ondex plugins to extract structured information from unstructured free
text
• Developed workflows to enrich knowledge networks with novel links
using the scientific literature
• Import: Ondex Graph, PubMed, Ontology, Tabular
• Mapping: NER-method, Concept Class
• Transformer: Publication, Abstract, Sentence
• Filter: Relation Type, Attribute Value, Unconnected
• Export: OXL, RDF, JSON
Hassani-Pak et al., JIB 2010
Ondex text-mining method
Input data
• 27,416 Arabidopsis gene names from Phytozome
• 52,561 Abstracts from PubMed that contain Arabidopsis
• 22,201 curated citations from TAIR
• 1,349 Trait Ontology terms from Planteome
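[Figure: text-mining method. Concepts A (Gene) and B (TO) are linked to Publication concepts x and y via occurs_in and published_in relations; these are collapsed into a weighted association network in which A and B are directly connected, with example edge attributes IP=1.7; M=1.2; N=2]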
Hassani-Pak et al., JIB 2010
Text-mining output
These steps connect 5,553 Arabidopsis genes to 409 TO terms
based on 18,341 co-citations (12,190 at sentence level)
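Counting co-citations at abstract and sentence level can be illustrated with a small sketch. This is not the Ondex text-mining plugin: it assumes plain whole-word, case-insensitive matching in place of the NER method, a naive sentence splitter, and two invented abstracts. The function names are hypothetical.

```python
import re
from collections import Counter


def find_terms(text, terms):
    """Return the subset of terms that occur as whole words (case-insensitive)."""
    found = set()
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.add(term)
    return found


def co_citations(abstracts, genes, traits):
    """Count gene/trait pairs that co-occur per abstract and per sentence."""
    abstract_level = Counter()
    sentence_level = Counter()
    for abstract in abstracts:
        for gene in find_terms(abstract, genes):
            for trait in find_terms(abstract, traits):
                abstract_level[(gene, trait)] += 1
        # Naive sentence split on terminal punctuation followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", abstract):
            for gene in find_terms(sentence, genes):
                for trait in find_terms(sentence, traits):
                    sentence_level[(gene, trait)] += 1
    return abstract_level, sentence_level


# Invented abstracts for illustration only.
abstracts = [
    "CONSTANS regulates flowering time in long days. "
    "The circadian clock is also involved.",
    "We measured plant height. CONSTANS expression was unchanged.",
]
genes = ["CONSTANS"]
traits = ["flowering time", "plant height"]

abstract_level, sentence_level = co_citations(abstracts, genes, traits)
print(dict(abstract_level))
print(dict(sentence_level))
```

In this toy input the pair (CONSTANS, plant height) is found only at abstract level, which shows why sentence-level co-citations form a stricter subset.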
Text-mining discussion
• TM method is flexible and can easily enhance data integration workflows and
knowledge networks
• TM is one of many evidence types in a knowledge network
• TM provides access to brand-new information that is not yet available in
structured databases
• Modest post-TM-filtering is required to retain high-quality relations
• TM for gene-phenotype adds 12k high-quality relations that were previously
absent in the knowledge network
Candidate Gene Prioritization
Definition of gene-evidence network
1. Gene-evidence network: Biologically plausible paths (semantic motifs) starting with a Gene node and
ending with Evidence nodes, e.g. 57 semantic motifs were defined in the wheat network
2. Gene-evidence networks are extracted using the Metadata-based Graph Query Engine (Hindle 2012)
3. Evidence nodes can be part of one (high specificity) or many (low specificity) gene-evidence networks
• Gene-evidence network of
Gene X contains 5 nodes
• Neighbourhood network (n=3)
of Gene X contains 9 nodes
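The difference between a gene-evidence network and a plain neighbourhood network can be sketched as below. This is not the Metadata-based Graph Query Engine; it assumes a semantic motif can be modelled as a sequence of node types starting with Gene, and it uses a small invented graph, so the node counts differ from those on the slide.

```python
from collections import deque

# Hypothetical typed graph (undirected for simplicity).
node_type = {
    "X": "Gene", "Y": "Gene",
    "pX": "Protein", "pY": "Protein",
    "path1": "Pathway",
    "pub1": "Publication", "pub2": "Publication",
    "trait1": "Trait",
}
edge_list = [
    ("X", "pX"), ("pX", "path1"), ("X", "pub1"),
    ("pX", "pY"), ("pY", "Y"), ("Y", "pub2"), ("Y", "trait1"),
]

adjacency = {}
for a, b in edge_list:
    adjacency.setdefault(a, set()).add(b)
    adjacency.setdefault(b, set()).add(a)

# Assumed semantic motifs: node-type paths from a Gene to an evidence node.
motifs = [
    ("Gene", "Protein", "Pathway"),
    ("Gene", "Publication"),
]


def gene_evidence_network(gene, motifs):
    """Collect all nodes on complete paths that match one of the motifs."""
    reached = {gene}
    for motif in motifs:
        paths = [[gene]]
        for wanted_type in motif[1:]:
            paths = [
                path + [nxt]
                for path in paths
                for nxt in adjacency.get(path[-1], ())
                if node_type[nxt] == wanted_type and nxt not in path
            ]
        for path in paths:
            reached.update(path)
    return reached


def neighbourhood(start, max_steps):
    """Collect all nodes within max_steps edges of start (breadth-first)."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_steps:
            continue
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return seen


print(sorted(gene_evidence_network("X", motifs)))
print(sorted(neighbourhood("X", 3)))
```

The motif-based result stays restricted to paths considered biologically plausible, while the untyped neighbourhood also pulls in the interacting protein and the second gene.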
Searching gene-evidence networks for keywords
1. Knowledge graph indexed and searched for user search terms using Lucene
2. A proportion of nodes in the gene-evidence network can contain the search term
[Figure: gene-evidence network of Gene X matched against user search terms. Labels shown: auxin, cytokinin, strigolactone, CCD, MAX, subapical shoots, axillary branching, shoot branching, pathway]
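Keyword search over gene-evidence networks can be illustrated with a tiny inverted index. The sketch stands in for Lucene and does not use its API; the evidence texts, gene names and OR-style query semantics are assumptions made for the example.

```python
import re
from collections import defaultdict

# Hypothetical evidence nodes (id -> text) and gene-evidence networks.
evidence_text = {
    "pub1": "Strigolactone signalling controls shoot branching",
    "go1": "auxin transport",
    "path1": "cytokinin biosynthesis pathway",
    "pub2": "seed dormancy and germination",
}
gene_evidence = {
    "geneX": {"pub1", "go1", "path1"},
    "geneY": {"pub2", "go1"},
}


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


# Build the inverted index once: token -> set of evidence node ids.
index = defaultdict(set)
for node_id, text in evidence_text.items():
    for token in tokenize(text):
        index[token].add(node_id)


def search(query):
    """Return evidence nodes containing any query token (OR semantics)."""
    hits = set()
    for token in tokenize(query):
        hits |= index.get(token, set())
    return hits


def matching_fraction(gene, query):
    """Proportion of a gene's evidence nodes that contain a search term."""
    network = gene_evidence[gene]
    return len(network & search(query)) / len(network)


query = "auxin branching"
print(sorted(search(query)))
for gene in gene_evidence:
    print(gene, round(matching_fraction(gene, query), 2))
```

Because the index is built ahead of time, each query only needs set lookups and intersections rather than a scan of all evidence texts.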
Gene scoring function (KNETScore)
1. Uses TF*IDF (Spärck Jones, 1972) to rank documents in the gene-evidence network by their relevance to a search term
2. Uses the specificity of documents to a gene (IGF: Inverse Gene
Frequency)
3. Uses the frequency of evidence concepts, normalised by size of gene-
evidence network (EDF: Evidence Document Frequency)
4. Calculates KNETScore (TFIDF*EDF*IGF) for every gene
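How the three factors might combine can be sketched as follows. The slide gives only the product TFIDF*EDF*IGF, so the exact definitions here are assumptions: TF is the relative term frequency in a document, IDF and IGF use a smoothed logarithm `log(1 + N/n)`, per-document TFIDF*IGF values are summed, and EDF is the share of matching documents in the gene-evidence network. The function `knet_like_score` and all data are hypothetical and will not reproduce actual KNETScore values.

```python
import math
import re


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def knet_like_score(gene, query, docs, gene_evidence):
    """Illustrative score: EDF * sum over matching docs of TFIDF * IGF."""
    terms = set(tokenize(query))
    network = gene_evidence[gene]
    if not network:
        return 0.0
    n_docs = len(docs)
    n_genes = len(gene_evidence)
    total = 0.0
    matching = 0
    for doc_id in network:
        tokens = tokenize(docs[doc_id])
        if not tokens:
            continue
        tfidf = 0.0
        for term in terms:
            tf = tokens.count(term) / len(tokens)
            df = sum(1 for text in docs.values() if term in tokenize(text))
            if df:
                tfidf += tf * math.log(1 + n_docs / df)  # assumed smoothing
        if tfidf == 0.0:
            continue  # document does not mention any search term
        matching += 1
        genes_with_doc = sum(1 for ev in gene_evidence.values() if doc_id in ev)
        igf = math.log(1 + n_genes / genes_with_doc)  # document specificity
        total += tfidf * igf
    edf = matching / len(network)  # normalised by network size
    return total * edf


# Invented evidence documents and gene-evidence networks.
docs = {
    "d1": "auxin response in shoot branching",
    "d2": "auxin transport",
    "d3": "auxin signalling",
    "d4": "auxin and cytokinin crosstalk",
    "d5": "seed dormancy",
    "d6": "root hair development",
    "d7": "cell wall biosynthesis",
}
gene_evidence = {
    "gene_left": {"d1", "d2"},
    "gene_right": {"d3", "d4", "d5", "d6", "d7"},
    "gene_other1": {"d3", "d4", "d5"},
    "gene_other2": {"d3", "d6"},
}

left = knet_like_score("gene_left", "auxin", docs, gene_evidence)
right = knet_like_score("gene_right", "auxin", docs, gene_evidence)
print(left > right)
```

Both genes have two documents mentioning the search term, but `gene_left` has the smaller network and documents linked to no other gene, so under these assumptions it ranks higher.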
Gene ranking – Example
Two genes have a similar number of evidence documents containing the search terms…
Left gene score: 5.72
Right gene score: 2.71
… the left gene scores higher because it has a smaller gene-evidence network and more specific evidence documents
Discussion – Candidate Gene Prioritization
• In a use-case study, KNETScore ranked the causal gene 3rd out of 75 genes
within a petal size QTL
• High overlap between the KnetMiner top 100 genes for the “gibberellin” and “lipid”
search terms and curated gene lists
• Smart pre-indexing of the knowledge network has reduced the computation of
the score from O(2n(|V|+|E|)) to O(1)
• Many ways to improve the scoring function, e.g. using weights for different
evidence types, distance of evidence to gene and edge-attribute information
Summary
• Web application for very fast search of
large genome-scale knowledge graphs
• Ranking of candidate genes based on
knowledge mining
• Interactive visualisation of genome and
knowledge maps
• Facilitates knowledge discovery and
hypothesis generation
KnetMiner – Makes Gene Discovery Faster & Fun
International academic collaborations
Interest from industry and start-ups
http://knetminer.rothamsted.ac.uk/
KnetMiner 2.0 – BBSRC BBR (GCRF) Proposal
SNP-Seek
Genetic diversity
Novel traits
Phenotype data
KnetMiner 2.0
Interactions
Pathways
Literature
Scientist/Breeder
Novel genes
Better crops
Faster discoveries
Ensembl Plants
Reference genomes
Model Species
Homology data
Data Information Knowledge Insight
A pangenomic and network based approach to search for novel genes and clues to design better rice varieties.
Acknowledgements
John Doonan
Sergio Feingold
Martin Castellote
Uwe Scholz
Matthias Lange
Keywan Hassani-Pak
Ajit Singh
Marco Brandizi
Monika Mistry
Lisa Lill
Chris Rawlings
Dave Edwards
Philipp Bayer
Misha Kapushesky
Kevin Dialdestoro
@KnetMiner
Jan Taubert
Artem Lysenko
Matthew Hindle
Catherine Canevet
Ramil Mauleon
Kenneth McNally
Nickolai Alexandrov
Andy Law

More Related Content

What's hot

Technology R&D Theme 1: Differential Networks
Technology R&D Theme 1: Differential NetworksTechnology R&D Theme 1: Differential Networks
Technology R&D Theme 1: Differential NetworksAlexander Pico
 
NetBioSIG2013-Talk Robin Haw
NetBioSIG2013-Talk Robin Haw NetBioSIG2013-Talk Robin Haw
NetBioSIG2013-Talk Robin Haw Alexander Pico
 
NRNB Annual Report 2011
NRNB Annual Report 2011NRNB Annual Report 2011
NRNB Annual Report 2011Alexander Pico
 
Free webinar-introduction to bioinformatics - biologist-1
Free webinar-introduction to bioinformatics - biologist-1Free webinar-introduction to bioinformatics - biologist-1
Free webinar-introduction to bioinformatics - biologist-1Elia Brodsky
 
NetBioSIG2012 anyatsalenko-en-viz
NetBioSIG2012 anyatsalenko-en-vizNetBioSIG2012 anyatsalenko-en-viz
NetBioSIG2012 anyatsalenko-en-vizAlexander Pico
 
AI in Bioinformatics
AI in BioinformaticsAI in Bioinformatics
AI in BioinformaticsAli Kishk
 
Mik Black bioinformatics symposium
Mik Black bioinformatics symposiumMik Black bioinformatics symposium
Mik Black bioinformatics symposiumguest5e6f31
 
Aug2015 Ali Bashir and Jason Chin Pac bio giab_assembly_summary_ali3
Aug2015 Ali Bashir and Jason Chin Pac bio giab_assembly_summary_ali3Aug2015 Ali Bashir and Jason Chin Pac bio giab_assembly_summary_ali3
Aug2015 Ali Bashir and Jason Chin Pac bio giab_assembly_summary_ali3GenomeInABottle
 
NRNB Annual Report 2012
NRNB Annual Report 2012NRNB Annual Report 2012
NRNB Annual Report 2012Alexander Pico
 
NRNB Annual Report 2013
NRNB Annual Report 2013NRNB Annual Report 2013
NRNB Annual Report 2013Alexander Pico
 
NRNB Annual Report 2016: Overall
NRNB Annual Report 2016: OverallNRNB Annual Report 2016: Overall
NRNB Annual Report 2016: OverallAlexander Pico
 
NetBioSIG2013-Talk Martina Kutmon
NetBioSIG2013-Talk Martina KutmonNetBioSIG2013-Talk Martina Kutmon
NetBioSIG2013-Talk Martina KutmonAlexander Pico
 
NRNB Annual Report 2018
NRNB Annual Report 2018NRNB Annual Report 2018
NRNB Annual Report 2018Alexander Pico
 
EnrichNet: Graph-based statistic and web-application for gene/protein set enr...
EnrichNet: Graph-based statistic and web-application for gene/protein set enr...EnrichNet: Graph-based statistic and web-application for gene/protein set enr...
EnrichNet: Graph-based statistic and web-application for gene/protein set enr...Enrico Glaab
 
NetBioSIG2012 chrisevelo
NetBioSIG2012 chriseveloNetBioSIG2012 chrisevelo
NetBioSIG2012 chriseveloAlexander Pico
 
The Matched Annotation from NCBI and EMBL-EBI (MANE) Project
The Matched Annotation from NCBI and EMBL-EBI (MANE) ProjectThe Matched Annotation from NCBI and EMBL-EBI (MANE) Project
The Matched Annotation from NCBI and EMBL-EBI (MANE) ProjectGenome Reference Consortium
 

What's hot (20)

NRNB EAC Report 2011
NRNB EAC Report 2011NRNB EAC Report 2011
NRNB EAC Report 2011
 
Technology R&D Theme 1: Differential Networks
Technology R&D Theme 1: Differential NetworksTechnology R&D Theme 1: Differential Networks
Technology R&D Theme 1: Differential Networks
 
NetBioSIG2013-Talk Robin Haw
NetBioSIG2013-Talk Robin Haw NetBioSIG2013-Talk Robin Haw
NetBioSIG2013-Talk Robin Haw
 
NRNB Annual Report 2011
NRNB Annual Report 2011NRNB Annual Report 2011
NRNB Annual Report 2011
 
Free webinar-introduction to bioinformatics - biologist-1
Free webinar-introduction to bioinformatics - biologist-1Free webinar-introduction to bioinformatics - biologist-1
Free webinar-introduction to bioinformatics - biologist-1
 
NetBioSIG2012 anyatsalenko-en-viz
NetBioSIG2012 anyatsalenko-en-vizNetBioSIG2012 anyatsalenko-en-viz
NetBioSIG2012 anyatsalenko-en-viz
 
AI in Bioinformatics
AI in BioinformaticsAI in Bioinformatics
AI in Bioinformatics
 
Mik Black bioinformatics symposium
Mik Black bioinformatics symposiumMik Black bioinformatics symposium
Mik Black bioinformatics symposium
 
Aug2015 Ali Bashir and Jason Chin Pac bio giab_assembly_summary_ali3
Aug2015 Ali Bashir and Jason Chin Pac bio giab_assembly_summary_ali3Aug2015 Ali Bashir and Jason Chin Pac bio giab_assembly_summary_ali3
Aug2015 Ali Bashir and Jason Chin Pac bio giab_assembly_summary_ali3
 
NRNB Annual Report 2012
NRNB Annual Report 2012NRNB Annual Report 2012
NRNB Annual Report 2012
 
NRNB Annual Report 2013
NRNB Annual Report 2013NRNB Annual Report 2013
NRNB Annual Report 2013
 
NRNB Annual Report 2016: Overall
NRNB Annual Report 2016: OverallNRNB Annual Report 2016: Overall
NRNB Annual Report 2016: Overall
 
NetBioSIG2013-Talk Martina Kutmon
NetBioSIG2013-Talk Martina KutmonNetBioSIG2013-Talk Martina Kutmon
NetBioSIG2013-Talk Martina Kutmon
 
NRNB Annual Report 2018
NRNB Annual Report 2018NRNB Annual Report 2018
NRNB Annual Report 2018
 
Michael Reich, GenomeSpace Workshop, fged_seattle_2013
Michael Reich, GenomeSpace Workshop, fged_seattle_2013Michael Reich, GenomeSpace Workshop, fged_seattle_2013
Michael Reich, GenomeSpace Workshop, fged_seattle_2013
 
EnrichNet: Graph-based statistic and web-application for gene/protein set enr...
EnrichNet: Graph-based statistic and web-application for gene/protein set enr...EnrichNet: Graph-based statistic and web-application for gene/protein set enr...
EnrichNet: Graph-based statistic and web-application for gene/protein set enr...
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
NetBioSIG2012 chrisevelo
NetBioSIG2012 chriseveloNetBioSIG2012 chrisevelo
NetBioSIG2012 chrisevelo
 
The Matched Annotation from NCBI and EMBL-EBI (MANE) Project
The Matched Annotation from NCBI and EMBL-EBI (MANE) ProjectThe Matched Annotation from NCBI and EMBL-EBI (MANE) Project
The Matched Annotation from NCBI and EMBL-EBI (MANE) Project
 
Introduction to Bioinformatics
Introduction to BioinformaticsIntroduction to Bioinformatics
Introduction to Bioinformatics
 

Similar to KnetMiner Overview Oct 2017

FedCentric_Presentation
FedCentric_PresentationFedCentric_Presentation
FedCentric_PresentationYatpang Cheung
 
2015 GU-ICBI Poster (third printing)
2015 GU-ICBI Poster (third printing)2015 GU-ICBI Poster (third printing)
2015 GU-ICBI Poster (third printing)Michael Atkins
 
150219 agbt giab_poster_marc
150219 agbt giab_poster_marc150219 agbt giab_poster_marc
150219 agbt giab_poster_marcGenomeInABottle
 
171017 giab for giab grc workshop
171017 giab for giab grc workshop171017 giab for giab grc workshop
171017 giab for giab grc workshopGenomeInABottle
 
Semantic Web & Web 3.0 empowering real world outcomes in biomedical research ...
Semantic Web & Web 3.0 empowering real world outcomes in biomedical research ...Semantic Web & Web 3.0 empowering real world outcomes in biomedical research ...
Semantic Web & Web 3.0 empowering real world outcomes in biomedical research ...Amit Sheth
 
CINECA webinar slides: Modular and reproducible workflows for federated molec...
CINECA webinar slides: Modular and reproducible workflows for federated molec...CINECA webinar slides: Modular and reproducible workflows for federated molec...
CINECA webinar slides: Modular and reproducible workflows for federated molec...CINECAProject
 
EiTESAL eHealth Conference 14&15 May 2017
EiTESAL eHealth Conference 14&15 May 2017 EiTESAL eHealth Conference 14&15 May 2017
EiTESAL eHealth Conference 14&15 May 2017 EITESANGO
 
Giab for jax long read 190917
Giab for jax long read 190917Giab for jax long read 190917
Giab for jax long read 190917GenomeInABottle
 
GIAB Integrating multiple technologies to form benchmark SVs 180517
GIAB Integrating multiple technologies to form benchmark SVs 180517GIAB Integrating multiple technologies to form benchmark SVs 180517
GIAB Integrating multiple technologies to form benchmark SVs 180517GenomeInABottle
 
Enabling knowledge management in the Agronomic Domain
Enabling knowledge management in the Agronomic DomainEnabling knowledge management in the Agronomic Domain
Enabling knowledge management in the Agronomic DomainPierre Larmande
 
Beyond Transparency: Success & Lessons From tambisBoston2003
Beyond Transparency: Success & Lessons From tambisBoston2003Beyond Transparency: Success & Lessons From tambisBoston2003
Beyond Transparency: Success & Lessons From tambisBoston2003robertstevens65
 
DeepBlue epigenomic data server: programmatic data retrieval and analysis of ...
DeepBlue epigenomic data server: programmatic data retrieval and analysis of ...DeepBlue epigenomic data server: programmatic data retrieval and analysis of ...
DeepBlue epigenomic data server: programmatic data retrieval and analysis of ...Felipe Albrecht
 
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...VHIR Vall d’Hebron Institut de Recerca
 
Kim Pruitt trainingbiocuration2015
Kim Pruitt trainingbiocuration2015Kim Pruitt trainingbiocuration2015
Kim Pruitt trainingbiocuration2015Kim D. Pruitt
 
Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...
Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...
Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...Spark Summit
 
HRGRN: enabling graph search and integrative analysis of Arabidopsis signalin...
HRGRN: enabling graph search and integrative analysis of Arabidopsis signalin...HRGRN: enabling graph search and integrative analysis of Arabidopsis signalin...
HRGRN: enabling graph search and integrative analysis of Arabidopsis signalin...Araport
 
Enhancing Data Integration with Text Analysis to Find Genes Implicated in Pla...
Enhancing Data Integration with Text Analysis to Find Genes Implicated in Pla...Enhancing Data Integration with Text Analysis to Find Genes Implicated in Pla...
Enhancing Data Integration with Text Analysis to Find Genes Implicated in Pla...Catherine Canevet
 

Similar to KnetMiner Overview Oct 2017 (20)

KnetMiner - EBI Workshop 2017
KnetMiner - EBI Workshop 2017KnetMiner - EBI Workshop 2017
KnetMiner - EBI Workshop 2017
 
FedCentric_Presentation
FedCentric_PresentationFedCentric_Presentation
FedCentric_Presentation
 
2015 GU-ICBI Poster (third printing)
2015 GU-ICBI Poster (third printing)2015 GU-ICBI Poster (third printing)
2015 GU-ICBI Poster (third printing)
 
150219 agbt giab_poster_marc
150219 agbt giab_poster_marc150219 agbt giab_poster_marc
150219 agbt giab_poster_marc
 
171017 giab for giab grc workshop
171017 giab for giab grc workshop171017 giab for giab grc workshop
171017 giab for giab grc workshop
 
171017 giab for giab grc workshop
171017 giab for giab grc workshop171017 giab for giab grc workshop
171017 giab for giab grc workshop
 
Semantic Web & Web 3.0 empowering real world outcomes in biomedical research ...
Semantic Web & Web 3.0 empowering real world outcomes in biomedical research ...Semantic Web & Web 3.0 empowering real world outcomes in biomedical research ...
Semantic Web & Web 3.0 empowering real world outcomes in biomedical research ...
 
CINECA webinar slides: Modular and reproducible workflows for federated molec...
CINECA webinar slides: Modular and reproducible workflows for federated molec...CINECA webinar slides: Modular and reproducible workflows for federated molec...
CINECA webinar slides: Modular and reproducible workflows for federated molec...
 
EiTESAL eHealth Conference 14&15 May 2017
EiTESAL eHealth Conference 14&15 May 2017 EiTESAL eHealth Conference 14&15 May 2017
EiTESAL eHealth Conference 14&15 May 2017
 
Giab for jax long read 190917
Giab for jax long read 190917Giab for jax long read 190917
Giab for jax long read 190917
 
GIAB Integrating multiple technologies to form benchmark SVs 180517
GIAB Integrating multiple technologies to form benchmark SVs 180517GIAB Integrating multiple technologies to form benchmark SVs 180517
GIAB Integrating multiple technologies to form benchmark SVs 180517
 
Enabling knowledge management in the Agronomic Domain
Enabling knowledge management in the Agronomic DomainEnabling knowledge management in the Agronomic Domain
Enabling knowledge management in the Agronomic Domain
 
Beyond Transparency: Success & Lessons From tambisBoston2003
Beyond Transparency: Success & Lessons From tambisBoston2003Beyond Transparency: Success & Lessons From tambisBoston2003
Beyond Transparency: Success & Lessons From tambisBoston2003
 
DeepBlue epigenomic data server: programmatic data retrieval and analysis of ...
DeepBlue epigenomic data server: programmatic data retrieval and analysis of ...DeepBlue epigenomic data server: programmatic data retrieval and analysis of ...
DeepBlue epigenomic data server: programmatic data retrieval and analysis of ...
 
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
 
Kim Pruitt trainingbiocuration2015
Kim Pruitt trainingbiocuration2015Kim Pruitt trainingbiocuration2015
Kim Pruitt trainingbiocuration2015
 
Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...
Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...
Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...
 
DR KL CV v5
DR KL CV v5DR KL CV v5
DR KL CV v5
 
HRGRN: enabling graph search and integrative analysis of Arabidopsis signalin...
HRGRN: enabling graph search and integrative analysis of Arabidopsis signalin...HRGRN: enabling graph search and integrative analysis of Arabidopsis signalin...
HRGRN: enabling graph search and integrative analysis of Arabidopsis signalin...
 
Enhancing Data Integration with Text Analysis to Find Genes Implicated in Pla...
Enhancing Data Integration with Text Analysis to Find Genes Implicated in Pla...Enhancing Data Integration with Text Analysis to Find Genes Implicated in Pla...
Enhancing Data Integration with Text Analysis to Find Genes Implicated in Pla...
 

Recently uploaded

Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...lizamodels9
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxpriyankatabhane
 
TOTAL CHOLESTEROL (lipid profile test).pptx
TOTAL CHOLESTEROL (lipid profile test).pptxTOTAL CHOLESTEROL (lipid profile test).pptx
TOTAL CHOLESTEROL (lipid profile test).pptxdharshini369nike
 
Solution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsSolution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsHajira Mahmood
 
‏‏VIRUS - 123455555555555555555555555555555555555555
‏‏VIRUS -  123455555555555555555555555555555555555555‏‏VIRUS -  123455555555555555555555555555555555555555
‏‏VIRUS - 123455555555555555555555555555555555555555kikilily0909
 
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxTHE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxNandakishor Bhaurao Deshmukh
 
insect anatomy and insect body wall and their physiology
insect anatomy and insect body wall and their  physiologyinsect anatomy and insect body wall and their  physiology
insect anatomy and insect body wall and their physiologyDrAnita Sharma
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Heredity: Inheritance and Variation of Traits
Heredity: Inheritance and Variation of TraitsHeredity: Inheritance and Variation of Traits
Heredity: Inheritance and Variation of TraitsCharlene Llagas
 
Temporomandibular joint Muscles of Mastication
Temporomandibular joint Muscles of MasticationTemporomandibular joint Muscles of Mastication
Temporomandibular joint Muscles of Masticationvidulajaib
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real timeSatoshi NAKAHIRA
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSarthak Sekhar Mondal
 
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |aasikanpl
 
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tantaDashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tantaPraksha3
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.PraveenaKalaiselvan1
 
Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)DHURKADEVIBASKAR
 
Forest laws, Indian forest laws, why they are important
Forest laws, Indian forest laws, why they are importantForest laws, Indian forest laws, why they are important
Forest laws, Indian forest laws, why they are importantadityabhardwaj282
 
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxLIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxmalonesandreagweneth
 
Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |aasikanpl
 

Recently uploaded (20)

Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx

KnetMiner Overview Oct 2017

  • 1. KnetMiner – Knowledge Network Miner Keywan Hassani-Pak http://knetminer.rothamsted.ac.uk/ @KnetMiner
  • 2. About Rothamsted Research • Rothamsted is the longest running agricultural research station in the world (est. 1843) • Strategic research to address global food security demands • Improving crops to be tolerant to drought, heat and pests while still providing optimum nutrition • Interdisciplinary research from gene to field
  • 3. Outline 1. Routes to candidate gene discovery 2. Building genome-scale knowledge networks 3. Overview and demo of KnetMiner 4. Extending networks with text-mining 5. Candidate gene prioritization 6. Discussion
  • 4. Routes to candidate gene discovery
  • 5. Routes to candidate gene discovery Many gene discovery routes can identify candidate genes for complex traits Gene Expression Genetic Methods Candidate Genes Prioritization Validation Markers for Diagnosis, Breeding, GM Phenotype 1 2 Research Literature Published Data 3 4
  • 6. Quantitative Trait Locus (QTL) Mapping 1. Development of an experimental population 2. Collection of phenotypic and genotypic data 3. Construction of a linkage map 4. Correlation of marker/trait 5. Identification of QTL 1 QTL region can encompass 10s to 100s of genes. How to prioritize them?
  • 7. Genome Wide Association Studies (GWAS) FLC gene expression (FLC) Leaf Number (LN22) Atwell et al., Nature 2010 • GWAS results can be simple or complex to interpret • Peaks can be diffuse, covering several hundred kb without a clear centre • Causal polymorphisms do not always have the strongest association AvrRpm1
  • 8. Gene expression analysis 2 different tissues or genotypes time points after infection or treatment Genes • Gene expression studies can be complex to interpret • 100s to 1000s of differentially expressed genes that are somehow related to phenotype • What are the key pathways leading to observed phenotypes?
  • 9. Text Mining - Trait and Gene Functions • Publications (free text) are most up-to-date resource for information • Finding sentences that link phenotypes (flowering time) and gene function (circadian clock) to genes (CONSTANS) • Term variability and ambiguity can produce missing or false associations 3
  • 10. Life Sciences Databases • Plethora of public Life Sciences databases in various formats • Databases constantly growing in size and content • Challenging to keep up-to-date with growing body of knowledge 1500+ databases published in NAR 4
  • 11. Which associations (genes) are worth following up? Often a highly subjective decision Evaluation of all available information is expensive How is genotype translated to phenotype? Often involves direct and indirect interactions Data integration and knowledge discovery is technically challenging
  • 13. Biological knowledge network/graph Genotype • QTL • GWAS Omics • Transcriptomics • Proteomics • Metabolomics Phenotype • Disease • Development • Stress tolerance Biological Knowledge Network • Prior knowledge • Structured, unstructured data • Cross-species data Integration
  • 14. The approach is generic and works similarly for other species
  • 15. Ondex – Data Integration Platform • Free and open source • Data warehousing using a graph-database • Platform to integrate public and private datasets in various formats • Provides a GUI, CLI, APIs and workflows for reproducible data integration Ondex www.ondex.org
  • 16. Let’s start with some GWAS data… http://plants.ensembl.org/biomart Example Arabidopsis #SNP=66,816 | #Gene=27,502 | #Phenotype=107
  • 17. … transform into a network (SNP) (Phenotype) associated
  • 20. … add differential gene expression data early vs late flowering
  • 21. … add other linked data • Gene-GO • Gene-Phenotype • Gene knock-out or overexpression • Text mining publications • Gene-Publication • Gene-Pathway • Gene-Expression • Protein-Small Molecule • Homology to other species >800k nodes >3 million edges Genome-Scale Knowledge Network (GSKN)
  • 22. Same principles for other species Knowledge graph of LRRK2 human gene
  • 23. How to search and interpret too much information? • Methods needed to evaluate millions of relationships in knowledge network, prioritize genes and extract relevant subnetworks • Interactive and exploratory tools needed to enable knowledge discovery and decision making • Interpretation should be the task of domain experts i.e. biologists!
  • 24. Overview and Demo of KnetMiner
  • 25. KnetMiner System Overview [diagram labels: Web Browser; JavaScript Views; HTML, JSON, XML and images over HTTP via Ajax; Apache Tomcat; Servlets and JSP Page; Java Socket; Multithreaded Java Server; Ondex API; Knowledge Graph DB] Client • Compatibility with all major web browsers • Based on D3.js, Cytoscape.js, Node.js • Interactive and touch-enabled Server • Fast and scalable Java multi-threaded server • Pre-indexing of knowledge graph • Scoring and information extraction
  • 27. Google-like search interface Search knowledge graph using trait- based keywords Real-time user feedback and query suggestions Trait related keywords Query term suggestions
  • 28. KnetMiner Map View (GenoMaps) New touch-friendly web app for Map View in KnetMiner. Visualize genes, SNP, QTL and GWAS data. Select genes that lie within QTL regions and overlap with SNPs, and explore their network
  • 29. KnetMiner Network View (KnetMaps) Touch-friendly web App for Network View in KnetMiner Explore networks linking genes to proteins, SNPs, phenotypes, publications, etc.
  • 30.
  • 31.
  • 33. Text-mining workflow in Ondex • Ondex plugins to extract structured information from unstructured free text • Developed workflows to enrich knowledge networks with novel links using the scientific literature Import •Ondex Graph •PubMed •Ontology •Tabular Mapping •NER-method •Concept Class Transformer •Publication •Abstract •Sentence Filter •Relation Type •Attribute Value •Unconnected Export •OXL •RDF •JSON Hassani-Pak et al., JIB 2010
  • 34. Ondex text-mining method Input data • 27,416 Arabidopsis gene names from Phytozome • 52,561 Abstracts from PubMed that contain Arabidopsis • 22,201 curated citations from TAIR • 1,349 Trait Ontology terms from Planteome [diagram labels: text-mining; Gene and TO concepts x, y; Publication concepts A, B; relations occurrs_in and published_in; weighted association network with IP=1.7; M=1.2; N=2] Hassani-Pak et al., JIB 2010
  • 35. Text-mining output These steps connect 5553 Arabidopsis genes to 409 TO terms based on 18,341 co-citations (12,190 on sentence level)
  • 36. Text-mining discussion • TM method is flexible and can easily enhance data integration workflows and knowledge networks • TM is one of many evidence types in a knowledge network • TM provides access to brand-new information that is not yet available in structured databases • Modest post-TM-filtering is required to retain high-quality relations • TM for gene-phenotype adds 12k high-quality relations that were previously absent in the knowledge network
  • 38. Definition of gene-evidence network 1. Gene-evidence network: Biologically plausible paths (semantic motifs) starting with a Gene node and ending with Evidence nodes, e.g. 57 semantic motifs were defined in the wheat network 2. Gene-evidence networks are extracted using the Metadata-based Graph Query Engine (Hindle 2012) 3. Evidence nodes can be part of one (high specificity) or many (low specificity) gene-evidence networks • Gene-evidence network of Gene X contains 5 nodes • Neighbourhood network (n=3) of Gene X contains 9 nodes X
  • 39. Searching gene-evidence networks for keywords 1. Knowledge graph indexed and searched for user search terms using Lucene 2. A proportion of nodes in the gene-evidence network can contain the search term auxin cytokinin strigolactone CCD MAX subapical shoots axillary branching shoot branching pathway X Gene-evidence network User search terms Gene
  • 40. Gene scoring function (KNETScore) 1. Uses TF*IDF (Spärck Jones, 1972) to rank documents in gene-evidence network by their relevance to a search term 2. Uses the specificity of documents to a gene (IGF: Inverse Gene Frequency) 3. Uses the frequency of evidence concepts, normalised by size of gene-evidence network (EDF: Evidence Document Frequency) 4. Calculates KNETScore (TFIDF*EDF*IGF) for every gene
  • 41. Gene ranking – Example Score: 5.72 Score: 2.71 … the left gene scores higher because it has a smaller gene-evidence network and more specific evidence documents Two genes have a similar number of evidence documents containing the search terms…
  • 42. Discussion – Candidate Gene Prioritization • In use case study KNETScore ranked causal gene in 3rd place out of 75 genes within a petal size QTL • High overlaps between KnetMiner top 100 genes for “gibberellin” and “lipid” search terms with curated gene lists • Smart pre-indexing of the knowledge network has reduced the computation of the score from O(2n(|V|+|E|)) to O(1) • Many ways to improve the scoring function, e.g. using weights for different evidence types, distance of evidence to gene and edge-attribute information
  • 43. Summary • Web application for very fast search of large genome-scale knowledge graphs • Ranking of candidate genes based on knowledge mining • Interactive visualisation of genome and knowledge maps • Facilitates knowledge discovery and hypothesis generation
  • 44. KnetMiner – Makes Gene Discovery Faster & Fun International academic collaborations Interest from industry and start-ups http://knetminer.rothamsted.ac.uk/
  • 45. KnetMiner 2.0 – BBSRC BBR (GCRF) Proposal SNP-Seek Genetic diversity Novel traits Phenotype data KnetMiner 2.0 Interactions Pathways Literature Scientist/Breeder Novel genes Better crops Faster discoveries Ensembl Plants Reference genomes Model Species Homology data Data Information Knowledge Insight A pangenomic and network based approach to search for novel genes and clues to design better rice varieties.
  • 46. Acknowledgements John Doonan Sergio Feingold Martin Castellote Uwe Scholz Matthias Lange Keywan Hassani-Pak Ajit Singh Marco Brandizi Monika Mistry Lisa Lill Chris Rawlings Dave Edwards Philipp Bayer Misha Kapushesky Kevin Dialdestoro @KnetMiner Jan Taubert Artem Lysenko Matthew Hindle Catherine Canevet Ramil Mauleon Kenneth McNally Nickolai Alexandrov Andy Law
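
The scoring idea on slide 40 can be illustrated with a small sketch. The slides name the three factors (TFIDF, IGF, EDF) but do not give exact formulas, so everything below is an assumption made for illustration: the function name `knet_score_sketch`, the log-based IDF and IGF, and the way the factors are combined (EDF times the sum of TF*IDF*IGF over matching evidence documents). It is not the KnetMiner implementation.

```python
import math
from collections import Counter


def knet_score_sketch(gene_evidence, documents, term):
    """Toy gene ranking in the spirit of KNETScore (hypothetical formulas).

    gene_evidence: dict mapping gene id -> set of evidence document ids
    documents:     dict mapping document id -> list of lowercase tokens
    term:          a single lowercase search term
    """
    n_docs = len(documents)
    n_genes = len(gene_evidence)

    # IDF: how rare the term is across all evidence documents (assumed form).
    df = sum(1 for tokens in documents.values() if term in tokens)
    if df == 0:
        return {gene: 0.0 for gene in gene_evidence}
    idf = math.log(n_docs / df) + 1.0

    # Number of gene-evidence networks each document belongs to.
    genes_per_doc = Counter(d for docs in gene_evidence.values() for d in docs)

    scores = {}
    for gene, docs in gene_evidence.items():
        total = 0.0
        hits = 0
        for d in docs:
            tokens = documents[d]
            tf = tokens.count(term) / len(tokens) if tokens else 0.0
            if tf == 0.0:
                continue
            hits += 1
            # IGF: documents linked to few genes are more specific (assumed form).
            igf = math.log(n_genes / genes_per_doc[d]) + 1.0
            total += tf * idf * igf
        # EDF: matching documents normalised by gene-evidence network size.
        edf = hits / len(docs) if docs else 0.0
        scores[gene] = total * edf
    return scores
```

Under these assumptions, two genes with the same number of matching evidence documents are separated by network size and document specificity, which is the behaviour described on slide 41.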

Editor's Notes

  1. https://www.thesius.de/blog/articles/verteidigung-dissertation-doktorarbeit-so-gelingts/
  2. http://www.nature.com/nrg/journal/v2/n5/fig_tab/nrg0501_370a_F1.html The basic strategy behind mapping quantitative trait loci (QTL) is illustrated here for the density of hairs (trichomes) that occur on a plant leaf. a | Inbred parents that differ in the density of trichomes are crossed to form an F1 population with intermediate trichome density. b | An F1 individual is selfed to form a population of F2 individuals. c | Each F2 is selfed for six additional generations, ultimately forming several recombinant inbred lines (RILs). Each RIL is homozygous for a section of a parental chromosome. The RILs are scored for several genetic markers, as well as for the trichome density phenotype. In c, the arrow marks a section of chromosome that derives from the parent with low trichome density. The leaves of all individuals that have inherited that section of chromosome from the parent with low trichome density also have low trichome density, indicating that this chromosomal region probably contains a QTL for this trait.
  3. GWAS results can also be complex to interpret. LD in Arabidopsis decays within 10 kb on average: https://www.ncbi.nlm.nih.gov/pubmed/17676040
  4. http://www.oxfordjournals.org/nar/database/c
  5. Worth = Have a positive impact on the biological outcome in the whole organism without producing negative side effects. Significant SNPs are rarely located within the causal gene sequence… Consider LD, closest gene is not always the correct candidate… Consider confounding, strongest association not always the main causal effect…
  6. Many phenotypes are complex, polygenic and the result of complex interactions on cellular level Linking genotype and phenotype is one of the greatest challenges in biology
  7. SNP-Phenotype relations (122,919 relations) of significant SNPs (as defined by Ensembl, p-value<0.05?) linked to 107 phenotypes; on average 1,150 SNPs per phenotype. SNP-Gene relations are based on genes in close proximity to SNPs <1000bp (96,047 relations) How to integrate GWAS and biological interaction data
  8. Using Ondex
  9. Recent work: Brought in differential gene expression data for Arabidopsis in KnetMiner. Added capability in KnetMiner Network View (KnetMaps) to visualize this data as a new concept type: DGES and “differentially_expressed” relation. The slide shows example of DGES: early vs late flowering in Arabidopsis.
  10. http://www.sciencedirect.com/science/article/pii/S2212066116300308
  11. Highlight text-mining. Add gene expression. Mention xrefs and that they can be collapsed.
  12. Scale: As these networks combine a wide variety of entities which are derived by parsing large volumes of data, they are quite large. A typical plant knowledge network created using Ondex has around 100,000 genes linked amongst 500,000 concepts (can be genes, proteins, SNPs, publications) in approx. 1-1.5 million interactions/relations. KnetMiner was developed to enable users to interactively explore such large, genome-scale knowledge networks and extract relevant plausible pathways linking candidate genes to agronomic traits. KnetMiner works by querying a species-encompassing knowledge network to mine its data and retrieve “relations” (i.e., links/connections) between the various entities (called “concepts”) within the network, such as genes, proteins, phenotypes, SNPs and publications. This helps create a pathway that can be visually traversed to understand how various biological entities are inter-linked and how a gene might influence a specific phenotype/trait.
  13. KnetMiner is a web-based app to search, select and explore a vast amount of information. We try to make the software intuitive and fun to use and embed the user into the discovery process. Achim: “KnetMiner is a great example for how an easy-to-use software app should look like. Even I can figure it out without being a techie or having to read a thick manual.”
  14. Real-time search results, as you type. Example queries. Useful dynamic Query suggester to help refine user’s search. Add specific QTL regions to especially focus on. Provide your own Gene List to search against.
  15. Once we had all this new data in place and integrated in the knowledge networks in KnetMiner, a new challenge was to work on software tools/components that enable interactive visualization and exploration of this new, highly dense data in a user-friendly manner. We collaborated with a UK-based company to develop a new tool called Genomaps.js, a lightweight, touch-friendly tool that enables the interactive visualization of high-density SNP, QTL, GWAS and gene data. After performing a search in KnetMiner and getting a list of ranked genes as output in Gene View, users can switch to Map View and explore the results in a chromosome-centric view (Genomaps.js). Users can now select genes that lie within specific QTL regions and overlap with SNPs, and explore their network in Network View to look for plausible pathways and interactions.
  16. In order to accommodate the new data and provide a more user-friendly way to explore these networks, the existing Network View was revamped and replaced with the new KnetMaps.js. KnetMaps.js is a lightweight, touch-friendly tool that allows users to visualize and explore heterogeneous networks linking genes to proteins, publications, phenotypes, biological processes, etc. It allows users to identify plausible pathways linking candidate genes to phenotypes within inter-linked networks.
  17. Use case 1 (screencast): An example use case in Arabidopsis, using our newly integrated GWAS data, where we will be looking to discover candidate genes controlling flowering. Showcasing Genomaps (Map View) in KnetMiner Arabidopsis. We will specifically look at FLC gene expression, a complex trait controlled by multiple genes. We will search the KnetMiner Arabidopsis knowledge network for the query: ‘flowering FLC’ and identify and explore the network of genes controlling this underlying trait. Search using keywords: flowering FLC or FT. Filter Genomaps: Change p-value, show top 100 genes. Select: FLC, SPA4, FRI, LD & launch KnetMaps.
  18. Use case 2 (screencast): Pre-harvest sprouting: Next use case will show how to use KnetMiner for mining genes related to grain color and PHS. This example is based on Andy Phillips (Rothamsted) RNA-seq experiment to identify differences between white and red grain wheat and links to PHS. Wheat white grain – more prone to sprouting. Showcasing new Wheat instance with TGACv1 release & new KnetMaps. Note: PHS is the result of premature germination of grain in the ear and results in loss of bread-making quality. Red grain colour is associated with increased dormancy and resistance to PHS. Grain colour is due to proanthocyanidins (condensed tannins) in the testa.
  19. Boosting queries found in the title TO by Laurel Cooper
  20. Illustration of a gene-evidence network as derived through “biologically plausible” semantic motifs. Blue nodes represent Gene concepts; red nodes are annotations such as GO, TO, EC, Pathway, Publication concepts. A path that goes via bold edges is valid (biologically meaningful path that allows annotations to be transferred to the seed gene). A path that goes via dashed edges is invalid. A gene-evidence network contains only filled nodes, whereas a gene-neighbourhood network would also contain the unfilled nodes.
  21. Our team also has strong expertise in the development of reusable research software. Our flagship software output is KnetMiner. The development of KnetMiner (fka QTLNetMiner) began in 2008 as part of a collaborative project with Steve H to prioritise candidate genes in willow QTL… Since then it has slowly grown to become a unique resource for gene discovery in many different species. 50% of users are from the UK and the rest from other countries including… Ajit has made significant contributions to the project that have helped to improve the usability and the user experience. Marco, who joined us last year, is responsible for improving the interoperability of KnetMiner using more standardised technologies. Monika was a sandwich student for one year and was a great help in maintaining our knowledge resources.
  22. KnetMiner still has room to grow. As part of a collaborative project with Sigrid, Gancho and scientists from IRRI, EBI and Uni Malaysia, we have submitted a grant application to the Bioinformatics and Biological Resource Fund to extend KnetMiner to rice research and breeding. BBR is hugely competitive: over 300 expressions of interest have been submitted under the GCRF highlight. We might need to find alternative ways to continue the Bioinformatics collaboration with IRRI.
  23. Acknowledgements: Various collaborators who have worked with us on KnetMiner, including contributing partners from science, academia and industry. Code available on GitHub: https://github.com/KeywanHP/KnetMiner
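
As an illustration of note 7 and slides 16-17, the step of turning GWAS results into network edges might look like the sketch below. It is a minimal example under stated assumptions, not the Ondex workflow: the function name `build_gwas_edges`, the relation labels `associated_with` and `close_to`, and the dictionary field names are hypothetical. The p-value cutoff of 0.05 and the 1000 bp proximity window are taken from note 7, which itself marks the cutoff as uncertain.

```python
def build_gwas_edges(snps, genes, p_cutoff=0.05, max_distance=1000):
    """Return (source, relation, target) triples for a toy SNP network.

    snps:  list of dicts with keys id, chrom, pos, phenotype, p_value
    genes: list of dicts with keys id, chrom, start, end
    """
    edges = []
    for snp in snps:
        # Keep only SNPs counted as significant (cutoff is an assumption).
        if snp["p_value"] >= p_cutoff:
            continue
        edges.append((snp["id"], "associated_with", snp["phenotype"]))
        for gene in genes:
            if gene["chrom"] != snp["chrom"]:
                continue
            # Distance is 0 when the SNP lies inside the gene boundaries.
            distance = max(gene["start"] - snp["pos"], snp["pos"] - gene["end"], 0)
            if distance < max_distance:
                edges.append((snp["id"], "close_to", gene["id"]))
    return edges
```

The nested loop is quadratic in the number of SNPs and genes; at the scale quoted on slide 16 (66,816 SNPs, 27,502 genes) a sorted or interval-indexed lookup per chromosome would likely be preferable.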