ICAR 2015
Workshop 10 (TUESDAY, JULY 7, 2015, 4:30-6:00 PM)
The Arabidopsis information portal for users and developers
Nick Provart (University of Toronto)
A Community Collaborator Perspective: Case study 1 - BioAnalytic Resource
HRGRN: enabling graph search and integrative analysis of Arabidopsis signalin...Araport
The biological networks controlling plant signal transduction, metabolism and gene regulation are composed not only of genes, RNA, proteins and compounds but also of the complicated interactions among them. Yet, even in the most thoroughly studied model plant, Arabidopsis thaliana, knowledge of these interactions is scattered throughout the literature and various public databases. Thus, making new scientific discoveries by exploring these complex and heterogeneous data remains a challenging task for biologists.
We developed a graph-search empowered platform named HRGRN to search known relationships and, more importantly, discover novel relationships among genes in Arabidopsis biological networks. The HRGRN includes over 51,000 “nodes” that represent very large sets of genes, proteins, small RNAs, and compounds and approximately 150,000 “edges” that are classified into nine types of interactions (interactions between proteins, compounds and proteins, transcription factors (TFs) and their downstream target genes, small RNAs and their target genes, kinases and downstream target genes, transporters and substrates, substrate/product compounds and enzymes, as well as gene pairs with similar expression patterns to provide deep insight into gene-gene relationships) to comprehensively model and represent the complex interactions between nodes.
The HRGRN allows users to discover novel interactions between genes and/or pathways, and to build sub-networks from user-specified seed nodes by searching the comprehensive collections of interactions stored in its back-end graph databases using graph traversal algorithms. The HRGRN database is freely available at http://plantgrn.noble.org/hrgrn/. Currently, we are collaborating with the Araport team to develop REST-like web services and provide the HRGRN’s graph search functions to the Araport system.
Deploying Automated Workstreams and Computational Approaches for Generation of Toxicity Data Used for Hazard Identification, by Robert T. Dunn, II, Ph.D., DABT
Analysis and visualization of microarray experiment data integrating Pipeline...Vladimir Morozov
More than 30 public and proprietary microarray experiments have been analyzed using in-house software. Pipeline Pilot workflows are developed to integrate the analysis results into the company gene target Knowledge Sphere platform. The gene expression values are analyzed and plotted via the R connector and custom R scripts. Pipeline Pilot workflows are embedded as Spotfire guides to retrieve gene annotation from NCBI, produce visualizations of differential expression statistics and biological pathway
WikiPathways: how open source and open data can make omics technology more us...Chris Evelo
Presentation about collaborative development of open source pathway analysis code and pathways and about usage in analytical software distributed with analytical machines like mass spectrophotometers.
Presentation for NetBio SIG 2013 by Robin Haw, Scientific Associate and Outreach Coordinator, Ontario Institute for Cancer Research. “Reactome Knowledgebase and Functional Interaction (FI) Cytoscape Plugin”
1. DEVELOPMENT OF COMPUTATIONAL ANALYSIS TOOLS FOR NATURAL PRODUCTS RESEARCH AND METABOLOMICS
(Japanese title: Development of Computational Analysis Tools for Natural Product Science and Metabolomics)
Ahmed Mohamed
Kyoto University
2. Presentation Contents
• Metabolic analysis
• Background
• NetPathMiner: Network path mining through gene expression.
• Overview of biological network analysis
• Workflow of NetPathMiner
• NMRPro: interactive online processing of NMR spectra
• Overview of NMR spectral processing
• Natural product dereplication and spectral processing
• NMRPro capabilities.
2Computational tools for metabolic analysis
3. Background
DNA → (Transcription) → RNA → (Translation) → Proteins → (Protein interaction, Metabolism) → Metabolites
Metabolic Analysis: The scientific study of chemical processes involving metabolites.
4. Metabolic Analysis
Background: Applications of metabolic analysis
Columns: Primary | Secondary | Recombinant
Goals
• Metabolic engineering
• Explain biological phenotypes
• Compare treatment efficacies
• Early disease prognosis
• Identify active metabolic pathways
• Study of metabolic disorders
• Identify drug leads from natural products
Methods
• Fluxomics
• Metabolite identification
• Network analysis
• Clustering & classification
7. What are Biological Networks?
• Biological entities are represented as nodes connected by edges.
• Nodes can represent chemical substrates or proteins.
• Edges represent relationships: metabolite production / consumption, activation / inhibition.
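The node-and-edge representation described on this slide can be sketched with plain Python dictionaries. This is only an illustration: the class name `BioNetwork` and the compound, protein, and relationship names are hypothetical and do not come from the slides.

```python
from collections import defaultdict


class BioNetwork:
    """Directed graph whose edges carry a relationship label."""

    def __init__(self):
        # node -> list of (neighbor, relationship) pairs
        self.edges = defaultdict(list)
        self.nodes = set()

    def add_edge(self, source, target, relationship):
        """Register both end nodes and store the labeled edge."""
        self.nodes.update((source, target))
        self.edges[source].append((target, relationship))

    def neighbors(self, node, relationship=None):
        """Return targets of `node`, optionally filtered by relationship."""
        return [t for t, r in self.edges.get(node, [])
                if relationship is None or r == relationship]


net = BioNetwork()
# Hypothetical entries: a metabolic step and two signaling relationships
net.add_edge("glucose", "hexokinase reaction", "consumption")
net.add_edge("hexokinase reaction", "glucose-6-phosphate", "production")
net.add_edge("kinase A", "protein B", "activation")
net.add_edge("kinase A", "protein C", "inhibition")

print(sorted(net.nodes))
print(net.neighbors("kinase A", "activation"))
```

Storing the relationship on the edge keeps metabolic (production / consumption) and signaling (activation / inhibition) links in one structure.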
8. Characteristics of Biological Networks
• Big (>1000s nodes)
• Small parts are active
• Relationship of active parts with biological conditions may explain observed phenotypes
Manual analysis infeasible!
9. Network Analysis in Biology
Gene expression profiling → Network path mining → Interpretation by a biologist → Experimental validation
Software needed!!
10. Mining Active Paths from Gene Expression
Inputs: gene expression data (numerical matrix) and a biological network
1. Weighting network
2. Enumeration of active linear paths
Output: paths active in a particular condition
Linear paths represent metabolic paths or signaling cascades.
[Figure: example network of reaction nodes (REACT_*) and species nodes (species_*)]
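The two steps on this slide (weighting the network, then enumerating linear paths) can be illustrated with a toy sketch. It is not NetPathMiner's algorithm: the weighting rule (mean expression of an edge's two end nodes), the depth-first enumeration, the scoring by mean edge weight, and all node names and expression values are assumptions made for illustration.

```python
def weight_edges(edges, expression):
    """Weight each edge by the mean expression of its two end nodes."""
    return {(u, v): (expression[u] + expression[v]) / 2.0 for u, v in edges}


def enumerate_paths(weights, max_len):
    """Enumerate all simple linear paths with 1..max_len edges (depth-first)."""
    adjacency = {}
    for (u, v) in weights:
        adjacency.setdefault(u, []).append(v)

    paths = []

    def extend(path):
        if len(path) > 1:
            paths.append(tuple(path))
        if len(path) - 1 == max_len:
            return
        for nxt in adjacency.get(path[-1], []):
            if nxt not in path:  # keep paths simple (no repeated node)
                extend(path + [nxt])

    for start in sorted({u for u, _ in weights}):
        extend([start])
    return paths


def score(path, weights):
    """Mean edge weight along a path."""
    steps = list(zip(path, path[1:]))
    return sum(weights[s] for s in steps) / len(steps)


# Hypothetical reaction network and per-reaction expression values
edges = [("R1", "R2"), ("R2", "R3"), ("R1", "R4"), ("R4", "R5")]
expression = {"R1": 8.0, "R2": 6.0, "R3": 7.0, "R4": 1.0, "R5": 2.0}

weights = weight_edges(edges, expression)
ranked = sorted(enumerate_paths(weights, max_len=2),
                key=lambda p: score(p, weights), reverse=True)
for p in ranked[:3]:
    print(" -> ".join(p), round(score(p, weights), 2))
```

Even this five-node example yields six candidate paths, which hints at why enumeration on networks with thousands of nodes needs ranking and clustering before a biologist can inspect the output.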
11. Challenges to Network Path Mining
1. Networks are downloaded in different file formats depending on the database.
2. Networks have different types.
3. Metabolic networks can have different representations.
4. Linear path enumeration outputs 1000s of paths.
• Difficult to investigate manually
• Hard to visualize
12. Challenges to Network Path Mining
1. Networks are downloaded in different file formats depending on the database.
• KEGG: KGML
• Reactome: SBML, BioPAX
• BioCyc: BioPAX
• Pathway Commons: BioPAX
13. Challenges to Network Path Mining
2. Networks have different types.
• Metabolic networks
• Signaling networks
14. Challenges to Network Path Mining
3. Metabolic networks can have different representations.
• Metabolite-reaction representation
• Reaction representation
• Gene representation
[Figure: the same metabolic network (reactions R1-R5, substrates S1-S6, genes G1-G5) drawn in each of the three representations. Example reaction labels: Pyruvate, NAD+, CoA-SH, Ac-CoA, CO2, NADH.]
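How one representation can be derived from another is sketched below. This is an assumption-laden illustration rather than NetPathMiner's conversion code: two reactions are linked when a product of the first is a substrate of the second, and each reaction edge is then expanded into edges between the genes assigned to the two reactions. The reaction, substrate, and gene identifiers are hypothetical.

```python
def to_reaction_network(reactions):
    """Connect reaction A -> B when a product of A is a substrate of B."""
    edges = set()
    for a, (_, products_a) in reactions.items():
        for b, (substrates_b, _) in reactions.items():
            if a != b and set(products_a) & set(substrates_b):
                edges.add((a, b))
    return sorted(edges)


def to_gene_network(reaction_edges, genes):
    """Expand each reaction edge into edges between the catalyzing genes."""
    edges = set()
    for a, b in reaction_edges:
        for ga in genes[a]:
            for gb in genes[b]:
                if ga != gb:
                    edges.add((ga, gb))
    return sorted(edges)


# Hypothetical metabolite-reaction network: reaction -> (substrates, products)
reactions = {
    "R1": (["S1", "S2"], ["S3"]),
    "R2": (["S3"], ["S4"]),
    "R3": (["S4"], ["S5"]),
}
# Hypothetical gene annotation: reaction -> catalyzing genes
genes = {"R1": ["G1"], "R2": ["G2", "G3"], "R3": ["G4"]}

reaction_edges = to_reaction_network(reactions)
print(reaction_edges)
print(to_gene_network(reaction_edges, genes))
```

Note that a reaction with several genes (R2 here) fans out into several gene-level edges, so the gene representation can be considerably larger than the reaction representation.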
16. Current Software for Path Mining
Feature | PathRanker [1] | rBiopaxParser [2] | PathView [3]
Input network format | KGML | BioPAX | KGML
Supported network types | Metabolic | Metabolic & Signaling | Metabolic & Signaling
Network representation conversion | Limited | ✗ | ✗
Path extraction | ✓ | ✗ | ✗
Visualization | Paths only | Networks only | Networks only
[1] Hancock, T., et al. Bioinformatics 2010, 26, 2128-2135.
[2] Kramer, F., et al. Bioinformatics 2013, 29 (4), 520-522.
[3] Luo, W., et al. Bioinformatics 2013, 29 (14), 1830-1831.
17. NetPathMiner: Motivation
Create path mining software that:
1. Supports different input network formats.
2. Supports both metabolic and signaling networks.
3. Converts between network representations.
4. Provides effective visualization of networks and paths.
5. Integrates into other software tools.
19. NetPathMiner: Process Flow
Processes implemented within NetPathMiner:
1. Pathway file processing: SBML, KGML, BioPAX (metabolic and signaling networks)
2. Network representation: metabolic, reaction, or gene representation
3. Network edges weighting (from gene expression) → weighted network
4. Path ranking → ranked path list
5. Clustering / Classification → path clusters
6. Visualization → network plots
Possible integration procedures:
• Gene set analysis, igraph network analysis, FBA, PPI analysis
• User-customized weighting function
20. NetPathMiner: Visualization
Visualization of top 100 paths, grouped into 3 clusters (red, green, blue)
[Figure panels: Metabolic representation, Reaction representation, Gene representation, Paths]
26. Current limitations in NMR spectral processing
1. NMR spectra (raw / processed) cannot be shared easily.
2. Advanced NMR processing requires programming scripts or installation of expensive software.
3. Spectral databases lack user interactivity.
27. Current limitations in NMR spectral processing
1. NMR spectra (raw / processed) cannot be shared easily.
Spectral processing and analysis require collaboration between NMR technicians, spectroscopists and chemists.
[Diagram: NMR / MS core center → Data processing → Data analysis]
28. Current limitations in NMR spectral processing
2. Advanced NMR processing requires programming scripts or installation of expensive software.
Advanced spectral processing requires MATLAB, R or Python scripts.
29. Current limitations in NMR spectral processing
3. Spectral databases lack user interactivity.
Displaying spectra as static images prevents investigation of small peaks.
30. NMRPro: interactive online processing of NMR spectra
• Motivation:
• Easy-to-use user interface for spectral processing.
• Online processing allows sharing of raw and processed spectra among collaborators.
• Does not require installation of software.
• Can be used to visualize NMR spectra in spectral databases (such as BMRB, HMDB).
• Challenges:
• Large size of NMR spectra.
• Processing is computationally expensive.
31. NMRPro architecture
Server-side
1. Python Core & Plugins
Core: classes for representing NMR spectra
• NMRSpectrum1D
• NMRSpectrum2D
• NMRDataset
• NMRSampleset
Plugins: each plugin provides a certain functionality
• Reading different file formats
• Zero filling
• Apodization
• Fourier transform
• Phase correction
• Baseline correction
• Peak picking
• Alignment
2. Django App
• Converts NMR spectra to compressed formats and sends them to the client side
• Processes user requests
• Extracts GUI information from plugins and sends it to the client side
Client-side
3. SpecdrawJS
• Displays NMR spectra interactively
• Displays plugin GUI as menu options
• Captures user requests and sends them to the server
Spectral compression allows data transfer across the web.
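One way a spectrum could be shrunk before it is sent to a browser is sketched below. The slides do not describe NMRPro's actual compressed format, so everything here is an assumption: the spectrum is reduced to per-bin (min, max) pairs so that narrow peaks survive, quantized to 16-bit integers, and deflated with `zlib`. The function names and the synthetic two-peak spectrum are hypothetical.

```python
import struct
import zlib


def compress_spectrum(intensities, n_bins):
    """Reduce to per-bin (min, max) pairs, quantize to 16 bits, deflate."""
    size = len(intensities)  # n_bins must not exceed size
    reduced = []
    for i in range(n_bins):
        chunk = intensities[i * size // n_bins:(i + 1) * size // n_bins]
        reduced.extend((min(chunk), max(chunk)))
    lo, hi = min(reduced), max(reduced)
    scale = (hi - lo) or 1.0  # avoid division by zero for flat spectra
    quantized = [round((v - lo) / scale * 65535) for v in reduced]
    payload = struct.pack("<%dH" % len(quantized), *quantized)
    return lo, hi, zlib.compress(payload)


def decompress_spectrum(lo, hi, blob):
    """Invert the quantization; returns the reduced (min, max) series."""
    raw = zlib.decompress(blob)
    values = struct.unpack("<%dH" % (len(raw) // 2), raw)
    return [lo + v / 65535 * (hi - lo) for v in values]


def lorentzian(x, center, width, height):
    """Lorentzian line shape used to build a synthetic test spectrum."""
    return height / (1.0 + ((x - center) / width) ** 2)


# Synthetic spectrum: one tall and one small peak over 16384 points
spectrum = [lorentzian(i, 3000, 5, 100.0) + lorentzian(i, 9000, 8, 1.5)
            for i in range(16384)]

lo, hi, blob = compress_spectrum(spectrum, n_bins=512)
restored = decompress_spectrum(lo, hi, blob)
print(len(spectrum) * 8, "bytes as float64 ->", len(blob), "bytes compressed")
```

Keeping both the minimum and the maximum of each bin means a peak narrower than one bin still appears at full height in the reduced series, which matters for the small peaks that static images hide.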
32. NMRPro architecture
Integration of the server side provides advanced and computationally expensive processing capabilities.
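Three of the processing steps named in the plugin list (apodization, zero filling, Fourier transform) can be chained as in the sketch below. This is not NMRPro's plugin code: the function names, the exponential window, the recursive radix-2 FFT, and the synthetic free induction decay (FID) with a single 100 Hz resonance are all assumptions chosen to keep the example self-contained.

```python
import cmath
import math


def apodize(fid, lb, dwell):
    """Exponential line broadening: multiply point k by exp(-pi*lb*k*dwell)."""
    return [s * math.exp(-math.pi * lb * k * dwell) for k, s in enumerate(fid)]


def zero_fill(fid, size):
    """Pad the FID with zeros up to `size` points."""
    return list(fid) + [0j] * (size - len(fid))


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


# Synthetic FID: one resonance at 100 Hz, 256 points, 1024 Hz spectral width
dwell = 1.0 / 1024
fid = [cmath.exp(2j * math.pi * 100 * k * dwell) * math.exp(-k * dwell / 0.1)
       for k in range(256)]

processed = zero_fill(apodize(fid, lb=1.0, dwell=dwell), 1024)
spectrum = [abs(v) for v in fft(processed)]
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
print("peak at", peak_bin, "Hz")  # 1024 points at 1024 Hz width: 1 Hz per bin
```

Each step takes a list and returns a list, so further steps such as phase or baseline correction could be appended to the chain in the same way.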
33. NMRPro architecture
SpecdrawJS can be integrated into current databases for interactive visualization of spectra.
36. Dereplication of natural product compounds
• Definition: rapid identification of previously isolated compounds in an automated manner.
• Importance:
• Reduces time and effort.
• Increases the chances of isolating new compounds.
• Challenges: requires the integration of diverse computational resources.
37. Dereplication overview
I. Without dereplication
Natural extract → Purification → Full spectral measurement → Manual structure elucidation → Literature inquiry
Literature inquiry: search by structure
II. With dereplication
Natural extract → Fractionation / Purification → Preliminary spectral measurement → Database search
Database search: search by spectra or structure fragments; filter by source organism or bioactivity
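The database search step in the dereplication workflow (search by spectra, filter by source organism) might look like the sketch below. The scoring rule (fraction of query peaks matched within a ppm tolerance), the field names, and the three database entries are hypothetical; no real database schema or compound data is implied.

```python
def match_fraction(query_peaks, reference_peaks, tolerance):
    """Fraction of query peaks with a reference peak within `tolerance` ppm."""
    hits = sum(
        any(abs(q - r) <= tolerance for r in reference_peaks)
        for q in query_peaks
    )
    return hits / len(query_peaks)


def search(database, query_peaks, tolerance=0.05, organism=None, min_score=0.5):
    """Return (score, name) pairs above `min_score`, best match first."""
    results = []
    for entry in database:
        if organism is not None and entry["organism"] != organism:
            continue  # filter by source organism
        s = match_fraction(query_peaks, entry["peaks"], tolerance)
        if s >= min_score:
            results.append((s, entry["name"]))
    return sorted(results, reverse=True)


# Hypothetical entries: chemical shifts in ppm of previously isolated compounds
database = [
    {"name": "compound A", "organism": "Streptomyces sp.",
     "peaks": [1.25, 2.10, 3.65, 7.30]},
    {"name": "compound B", "organism": "Streptomyces sp.",
     "peaks": [0.90, 2.12, 5.40]},
    {"name": "compound C", "organism": "Penicillium sp.",
     "peaks": [1.24, 2.11, 3.66, 7.31]},
]

query = [1.26, 2.11, 3.64, 7.29]  # peaks picked from a preliminary spectrum
print(search(database, query, organism="Streptomyces sp."))
```

A hit means the extract probably contains an already known compound, so full structure elucidation can be skipped for that fraction.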
38. Computational resources for dereplication
• Databases:
• Contain spectral data, source organism and bioactivity information of previously isolated compounds.
• Software:
• Spectral preprocessing
• Data reduction and metabolite identification.
39. Dereplication databases
General: BindingDB, ChEBI, ChemBank, ChEMBL, ChemIDplus, ChemSpider, CSEARCH, NCI, NIAID ChemDB, NMRShiftDB [1], PubChem, Reaxys [2], SciFinder [1,2], SpecInfo [1,2], ZINC
Natural product-specific: AntiBase [1,2], BACTIBASE, CamMedNP, ConMedNP, Dictionary of marine N. P., Dictionary of N. P., HeteroCycles, Marinlit [1,2], NAPROC-13 [1], NPACT, NuBBE, PhytAMP, SuperNatural, TCM database, UDNP
[1] Contain spectral data
[2] Commercial databases
40. Challenges for dereplication
• Databases:
• Scarcity of free-to-use databases that contain spectral data.
• Methods and software:
• Spectral preprocessing: lack of online processing software for NMR spectra.
• Compound identification: computational methods and software require familiarity with computer programming.
41. Challenges for dereplication
• NMRPro can be a building block
• NMRPro future extensions
NMRPro Live
43. Summary
• We surveyed the current status of computational tools for metabolic analysis, identifying several limitations.
• We presented two novel tools, which can be building blocks for automating research in natural products and metabolomics.
• NetPathMiner, a software package in R, is useful for mining metabolically active paths based on gene expression.
• NMRPro is a web component for extending the NMR processing functionality of web applications and spectral databases.
44. List of publications
• Mohamed, A., Hancock, T., Nguyen, C. H. & Mamitsuka, H. NetPathMiner: R/Bioconductor package for network path mining through gene expression. Bioinformatics 30, 3139-3141 (2014).
• Mohamed, A., Nguyen, C. H. & Mamitsuka, H. Current status and prospects of computational resources for natural product dereplication: a review. Briefings in Bioinformatics, bbv042 (2015).
• Mohamed, A., Nguyen, C. H. & Mamitsuka, H. NMRPro: An integrated web component for interactive processing and visualization of NMR spectra. Bioinformatics (in revision).
45. Acknowledgements
Professor Hiroshi Mamitsuka, Drs. Timothy Hancock and Canh Hao Nguyen for their guidance and contribution during this study and paper writing.