Valerie De Anda
Ecology Institute UNAM México
Laboratory of Computational Biology Zaragoza
CSIC Spain
valdeanda@ciencias.unam.mx
https://github.com/valdeanda
@val_deanda
The 12th International Conference on Genomics
October 26 to 29, 2017
Shenzhen, China
The iceberg illusion of metagenomics
Revolution in the microbial ecology field:
» Genomic reconstruction: microbial dark matter
» Large amount of data
» Diversity, ecology, evolution and functional makeup of the microbial world
» The ability to evaluate complex metabolic functions in large data sets remains biologically and computationally challenging
» It is really complex to infer and test biological hypotheses in such data
MOTIVATION GENERAL IDEA RESULTS CONCLUSIONS PERSPECTIVES THANKS
The 12th International Conference on Genomics, October 2017, Shenzhen, China. Valerie de Anda. 2/22
MEBS
The iceberg illusion of metagenomics
Microbial ecology-derived 'omic' studies
Metagenomic data:
» Most abundant
» Marker genes
» Statistically ≠ features
What do we need to improve the efficiency of data processing?
» Biological data interpretation (evaluate, compare and analyze complex data at large scale)
» Computational efficiency (high performance, accuracy, high speed, data processing, reproducibility)
Gomez Cabrero et al. 2014, BMC SB
Reshetova et al. 2013, BMC SB
Data integration
For a given system, multiple sources (and possibly types) of data are available, and we want to study them integratively to improve knowledge discovery.
What are the available data that can be used to characterize large-scale metabolic machineries?
How do we integrate them all to improve the understanding of the system?
Gomez Cabrero et al. 2014, BMC SB
Reshetova et al. 2013, BMC SB
Prior knowledge (targeted): to reduce the solution space and/or to focus the analysis on biologically meaningful regions (specific metabolic machineries)
Metabolism:
» Taxa involved in that particular metabolism
» Proteins involved in that particular metabolism
» Publicly available genomes?
Mathematical model: relative entropy
H' = \sum_{i} P(i) \log_2 \frac{P(i)}{Q(i)}
Informative Score (MEBS)
≥ 1: informative
≤ 0: non-informative
Does it really work?
Can it capture an entire metabolic machinery?
Can it be used to evaluate, compare and analyze complex data at large scale (genomes, metagenomes)?
Is it computationally efficient? Accurate, high speed in large datasets, and reproducible
Data integration
Single value
Data integration: case study
An entire biogeochemical cycle: the S-cycle (CHONS-P)
[Figure labels: atmosphere, solar, E°, redox reactions, metabolic guilds, geological processes]
They really capture the major processes involved in the mobilization and use of S-compounds through Earth's biosphere
Data integration: case study S-cycle
https://metacyc.org/META/NEW-IMAGE?object=Sulfur-Metabolism
http://www.genome.jp/kegg-bin/show_pathway?map00920
Manually curated reconstruction of the S-metabolic machinery
Data integration: case study S-cycle
Taxa: metabolic guilds (Suli, N=161)
i) CLSB: 24 genera
ii) PSB: 25 genera
iii) GSB: 9 genera
iv) SRB: 40 genera
v) SRM: 19 genera
vi) SO: 4 genera
Metabolic machinery (Sucy, N=152)
i) Sulfur compounds
ii) Metabolic pathways
iii) Genes
iv) Proteins
Complete nr sequenced S-genomes (txt):
GCF_000006985.1 Chlorobium tepidum TLS
GCF_000007005.1 Sulfolobus solfataricus P2
GCF_000007305.1 Pyrococcus furiosus DSM 3638
GCF_000008545.1 Thermotoga maritima MSB8
GCF_000008625.1 Aquifex aeolicus VF5
GCF_000008665.1 Archaeoglobus fulgidus DSM 4304
GCF_000009965.1 Thermococcus kodakarensis KOD1
>Protein1
MIKPVGSDELKPLFVYDPEEHHKLSHEAESLPSVVISSQGPRVSSM
MGAGYFSPAGFMNV
>Protein 2
MAYKTIIEDGIDVLVVGAGLGGTGAAFEARYWGQDKKIVIAEKANID
>Protein 3
MPTFVYMTRCDGCGQCVDICPSDIMHIDTTIRRAYNIEPNMCWEC
YSCVKACPHNAIDVR
Sucy: evidence linking them with the S-cycle (curated databases and, primarily, literature)
Suli: evidence suggesting their physiological and biochemical involvement in the use of sulfur compounds.
Data integration: case study S-cycle
Table 1. Metabolic pathways of the global biogeochemical S-cycle
Pathway number | Metabolism (a) | Chemical process (b) | Sulfur compound | Type (c) | Chemical formula | Source (d) | Number of Pfam domains (e)
P1 | DS | O | Sulfite | I | SO3^2- | E | 9
P2 | DS | O | Thiosulfate | I | S2O3^2- | E | 10
P3 | DS | O | Tetrathionate | I | S4O6^2- | E | 2
P4 | DS | R | Tetrathionate | I | S4O6^2- | E | 17
P5 | DS | R | Sulfate | I | SO4^2- | E | 20
P6 | DS | R | Elemental sulfur | I | S° | E | 20
P7 | DS | D | Thiosulfate | I | S2O3^2- | E | 9
P8 | DS | O | Carbon disulfide | O | CS2 | E | 1
P9 | A | DE | Alkanesulfonate | O | CH3O3SR | S | 5
P10 | A | R | Sulfate | I | SO4^2- | S | 20
P11 | DS | O | Sulfide | I | H2S | E/S | 29
P12 | A | DE | L-cysteate | O | C3H6NO5S | C/E | 1
P13 | A | DE | Dimethyl sulfone | O | C2H6O2S | C/E | 3
P14 | A | DE | Sulfoacetate | O | C2H2O5S | C/E | 2
P15 | A | DE | Sulfolactate | O | C3H4O6S | C/S | 14
P16 | A | DE | Dimethyl sulfide | O | C2H6S | C/S | 16
P17 | A | DE | Dimethylsulfoniopropionate | O | C5H10O2S | C/S/E | 12
P18 | A | DE | Methylthiopropanoate | O | C4H7O2S | C/S | 7
P19 | A | DE | Sulfoacetaldehyde | O | C2H3O4S | C/S | 7
P20 | DS | O | Elemental sulfur | I | S° | C/S/E | 7
P21 | DS | D | Elemental sulfur | I | S° | C/S/E | 1
P22 | A | DE | Methanesulfonate | O | CH3O3S | C/S/E | 7
P23 | A | DE | Taurine | O | C2H7NO3S | C/S/E | 11
P24 | DS | M | Dimethyl sulfide | O | C2H6S | C | 1
P25 | DS | M | Methylthio-propanoate | O | C4H7O2S | C | 1
P26 | DS | M | Methanethiol | O | CH4S | C | 1
P27 | A | DE | Homotaurine | O | C3H9NO3S | N | 1
P28 | A | B | Sulfolipid | O | SQDG | | 4
P29 | Markers | | Markers | | | | 12
Large omic datasets
How many genomes were available at the time of analysis?
Number of complete prokaryotic genomes: ≈4,000 (NCBI RefSeq), Dec 2016
Non-redundant: 2,107, Dec 2016
Gen: 2,107 nr genomes (faa), 1.5 GB
Publicly available and manually curated data
Large omic datasets
Taxa: Suli Proteins: Sucy
2,107 nr genomes (faa)
Datasets: Gen, Met, GenF
Dataset sizes shown: 104 GB, ≈ 500 GB, 1.5 GB
How many metagenomes were available at the time of analysis?
Metagenomes were selected if they:
i) were publicly available
ii) contained associated metadata
iii) had been isolated from well-defined environments (e.g., rivers, soil, biofilms)
iv) were not host-associated microbiome sequences (e.g., human, cow, chicken)
112-HMM of S-proteins
2,107 nr genomes (faa)
Gen GenF
MEBS: GENERAL OVERVIEW
Taxa: Suli. Proteins: Sucy.
Stage 1: Manual curation and omic datasets
Stage 2: Domain composition
Stage 3: Relative entropy
Domains enriched among the microorganisms of interest
Mathematical model:
H' = \sum_{i} P(i) \log_2 \frac{P(i)}{Q(i)}
P(i) (observed) = frequency of protein domain i in S genomes (161)
Q(i) (expected) = frequency of protein domain i in Gen (2,107)
≥ 1: informative
≤ 0: non-informative
Stage 4: Informative Score (single value)
Can it capture the S-metabolic machinery?
Can it be used to evaluate, compare and analyze complex data at large scale (genomes, metagenomes)?
Is it computationally efficient? Accurate, high speed in large datasets, and reproducible
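The relative entropy stage can be illustrated with a minimal Python sketch. It assumes that the "frequency" of a domain means the fraction of genomes in which the domain occurs at least once, and it uses toy genomes invented for illustration; the function names are hypothetical and are not taken from the MEBS code.

```python
import math

def domain_frequencies(genome_domains, domains):
    """Fraction of genomes containing each domain (assumed meaning of 'frequency')."""
    n = len(genome_domains)
    return {d: sum(d in g for g in genome_domains) / n for d in domains}

def relative_entropy(p_obs, q_exp):
    """Per-domain term P(i) * log2(P(i) / Q(i))."""
    if p_obs == 0.0:
        return 0.0  # usual convention: 0 * log(0) is taken as 0
    if q_exp == 0.0:
        raise ValueError("expected frequency Q(i) must be > 0 when P(i) > 0")
    return p_obs * math.log2(p_obs / q_exp)

# Toy data (hypothetical): one set of Pfam IDs per genome.
s_genomes = [{"PF12139", "PF01058"}, {"PF12139"},
             {"PF12139", "PF01058"}, {"PF01058"}]
# The background set includes the genomes of interest plus other genomes.
gen = s_genomes + [{"PF01058"}, {"PF01058"}, set(), {"PF01058"}]

domains = ["PF12139", "PF01058"]
P = domain_frequencies(s_genomes, domains)  # observed
Q = domain_frequencies(gen, domains)        # expected
H = {d: relative_entropy(P[d], Q[d]) for d in domains}
```

In this toy case PF12139 is twice as frequent in the genomes of interest as in the background, so its term is positive, while PF01058 is equally frequent in both sets and contributes zero.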
https://github.com/eead-csic-compbio/metagenome_Pfam_score
2,107 genomes 161 Suli +
935 metagenomes
Distribution of Sulfur Score (SS) in 2,107 nr genomes
» An unnamed endosymbiont of a scaly snail from a black smoker chimney
» The archaeon Geoglobus ahangari, sampled from a hydrothermal vent at 2,000 m depth
» Candidatus Desulforudis audaxviator MP104C
Metagenomic reconstructions of hard-to-culture taxa
Sur, N=192
ROC curve
Positive instances: Suli (N=161)
Negative instances: Gen (1,946)
• Two-dimensional graph in which the TP rate is plotted on the Y axis and the FP rate is plotted on the X axis.
• Depicts relative tradeoffs between benefits (true positives) and costs (false positives).
Perfect classification
Distribution of Sulfur Score (SS) in the metagenomic dataset (935 metagenomes)
Distribution of SS values observed in 935 metagenomes classified in terms of features (X-axis) and colored according to their particular habitats. Features are sorted according to their median SS values. Green lines indicate the lowest and largest 95th percentiles observed across MSL classes.
Geo-localized metagenomes sampled around the globe are colored according to their SS values.
[Figure: MEBS overview. Labels: BG cycling, S genes, S genomes, informative / non-informative, markers, comp]
Conclusions
» We present MEBS, a new open-source software to evaluate, quantify, compare, and predict the metabolic machinery of interest in large 'omic' datasets using a single value.
» To test the applicability of this approach, we evaluated one of the most complex biogeochemical cycles: the sulfur cycle.
» Using data integration and manual curation, we reconstructed the entire sulfur machinery: Suli and Sucy.
» We show that the mathematical framework of relative entropy can be used to capture complex metabolic machineries in large-scale omic samples.
» MEBS is a powerful and broadly applicable approach to predict and classify microorganisms closely involved in the sulfur cycle, even in hard-to-culture microbial lineages.
» Computationally efficient, accurate (AUC = 0.985) and reproducible.
» Not in the presentation: the entropy can be used to detect marker domains, and the completeness of the S-cycle pathways can be benchmarked at large scale.
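The idea of summarizing a sample with a single value can be sketched as follows. The slides do not spell out how the score is aggregated, so this sketch assumes the score is the sum of the H' values of the curated domains detected in a sample. The three H' values are the means listed for those Pfam IDs in the supplementary marker table; `PF00000` is a made-up ID standing in for a domain outside the curated list, and `informative_score` is a hypothetical name.

```python
def informative_score(sample_domains, entropies):
    """Sum the H' of every curated domain found in the sample (assumed aggregation)."""
    return sum(h for domain, h in entropies.items() if domain in sample_domains)

# H' means for three sulfur-related Pfam domains (from the supplementary table).
entropies = {"PF12139": 1.2, "PF01747": 1.03, "PF04358": 0.52}

genome_a = {"PF12139", "PF01747", "PF04358"}  # carries all three domains
genome_b = {"PF04358", "PF00000"}             # PF00000 is a placeholder ID

score_a = informative_score(genome_a, entropies)
score_b = informative_score(genome_b, entropies)
```

Under this assumption a genome that encodes more of the informative domains receives a higher score, which is what allows genomes and metagenomes to be ranked on one axis.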
[Figure: MEBS and BG cycling (C, N, O, S, Fe, P). Possible applications: bioremediation, antibiotics, extreme environments, agriculture, ?]
Perspectives
• We are currently finishing the analyses to demonstrate the applicability of this approach to other biogeochemical cycles (C, N, O, Fe, P).
• We hope that the MEBS pipeline will facilitate the analysis of biogeochemical cycles or complex metabolic networks carried out by specific prokaryotic guilds, such as bioremediation processes (e.g., degradation of hydrocarbons, toxic aromatic compounds, heavy metals, etc.).
• We look forward to collaborating with and helping other researchers by integrating comprehensive databases that might be helpful to the scientific community.
• Furthermore, we are currently working to improve the algorithm by using only a list of sequenced genomes involved in the metabolism of interest, in order to reduce the manual curation effort.
• We are also considering using k-mers instead of peptide Hidden Markov Models to increase the speed of the pipeline.
• We anticipate that our platform will stimulate interest and involvement among the scientific community to explore uncultured genomes derived from large metagenomic sequences: exploring microbial dark matter.
Icoquih Zapata
Valeria Souza
Luis Eguiarte
Bruno Contreras
Cesar Poot-Hernandez
De Anda et al., 2017. MEBS, a software platform to evaluate large (meta)genomic collections according to their metabolic machinery: unraveling the sulfur cycle. GigaScience, in press.
Laboratory of Molecular and Experimental Evolution, Ecology Institute, UNAM, Mexico
Laboratory of Computational Biology
Thank you for your attention!
Supplementary files
Completeness
[Figure: panel A, Gen (n=2,107); panel B, Met (n=935)]
Labeled genomes: D. acidiphilus, Hydrogenobaculum, A. caldus, A. ferrivorans, T. mobilis, D. aromatica, Thauera sp., T. humireducens, A. denitrificans, S. tokodaii, A. hospitalis (among other 12 genomes), P. phaeoclathratiforme, C. chlorochromatii, C. tepidum, T. denitrificans, T. violascens, S. thiotaurini
Completeness
The number of protein domains present in a given sample is divided by the total number of domains in the pre-defined pathway.
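The completeness ratio described above can be written as a short Python sketch. The pathway definition and sample below are invented for illustration (real pathways would come from the curated table of S-cycle pathways), and `completeness` is a hypothetical helper name.

```python
def completeness(sample_domains, pathway_domains):
    """Pathway domains present in the sample divided by all domains in the pathway."""
    pathway = set(pathway_domains)
    if not pathway:
        raise ValueError("pathway must define at least one domain")
    return len(pathway & set(sample_domains)) / len(pathway)

# Hypothetical pathway made of four Pfam domains.
pathway = {"PF12139", "PF01747", "PF04358", "PF02662"}
# Domains detected in a sample; PF01058 is not part of this pathway.
sample = {"PF12139", "PF04358", "PF01058"}

c = completeness(sample, pathway)
```

Domains found in the sample but absent from the pathway definition do not affect the ratio, so the value always lies between 0 and 1.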
35 private metagenomes: microbial mats, sediment and lake water
Reads processing
ORF prediction (gene calling)
Mean size length (MSL, aa residues)
https://microbiome.wordpress.com/
Large scale
Counts of prokaryotic genomes in each NCBI category as of July 2017 (non-redundant vs. redundant)
GenF size category | 5th percentile | 95th percentile
Real | -0.091 | 0.101
30 | -0.086 | 0.105
60 | -0.09 | 0.104
100 | -0.088 | 0.1
150 | -0.09 | 0.103
200 | -0.89 | 0.105
250 | -0.09 | 0.106
300 | -0.09 | 0.1
Completeness
Table 2. Informative Pfam domains with high H' and low std: novel proposed molecular marker domains in metagenomic data of variable MSL
Pfam ID | Suli occurrences | H' mean | H' std | Description
PF12139 | 58/161 | 1.2 | 0.01 | Adenosine-5'-phosphosulfate reductase beta subunit: Key protein domain for both sulfur oxidation/reduction metabolic pathways. It has been widely studied in the dissimilatory sulfate reduction metabolism. In all recognized sulfate-reducing prokaryotes, the dissimilatory process is mediated by three key enzymes: Sat, Apr and Dsr. Homologous proteins are also present in the anoxygenic photolithotrophic and chemolithotrophic sulfur-oxidizing bacteria (CLSB, PSB, GSB), in different cluster organization [35].
PF00374 | 135/161 | 1.1 | 0.09 | Nickel-dependent hydrogenase: Hydrogenases with S-cluster and selenium-containing Cys-x-x-Cys motifs involved in the binding of nickel. Among the homologues of this hydrogenase domain is the alpha subunit of the sulfhydrogenase I complex of Pyrococcus furiosus, which catalyzes the reduction of polysulfide to hydrogen sulfide with NADPH as the electron donor [55].
PF01747 | 103/161 | 1.03 | 0.06 | ATP-sulfurylase: Key protein domain for both sulfur oxidation and reduction processes. The enzyme catalyzes the transfer of the adenylyl group from ATP to inorganic sulfate, producing adenosine 5′-phosphosulfate (APS) and pyrophosphate, or the reverse reaction [56].
PF02662 | 62/161 | 0.82 | 0.03 | Methyl-viologen-reducing hydrogenase, delta subunit: One of the enzymes involved in methanogenesis and encoded in the mth-flp-mvh-mrt cluster of methane genes in Methanothermobacter thermautotrophicus. No specific functions have been assigned to the delta subunit [48].
PF10418 | 122/161 | 0.78 | 0.06 | Iron-sulfur cluster binding domain of dihydroorotate dehydrogenase B: Among the homologous genes in this family are asrA and asrB from Salmonella enterica subsp. enterica serovar Typhimurium, which encode 1) a dissimilatory sulfite reductase, 2) a gamma subunit of the sulfhydrogenase I complex of Pyrococcus furiosus and 3) a gamma subunit of the sulfhydrogenase II complex of the same organism [12].
PF13247 | 149/161 | 0.66 | 0.06 | 4Fe-4S dicluster domain: Homologues of this family include 1) DsrO, a ferredoxin-like protein related to the electron transfer subunits of respiratory enzymes, 2) dimethylsulfide dehydrogenase β subunit (ddhB), involved in dimethyl sulfide degradation in Rhodovulum sulfidophilum and 3) sulfur reductase FeS subunit (sreB) of Acidianus ambivalens, involved in sulfur reduction using H2 or organic substrates as electron donors [12].
PF04358 | 73/161 | 0.52 | 0 | DsrC-like protein: DsrC is present in all organisms encoding a dsrAB sulfite reductase (sulfate/sulfite reducers or sulfur oxidizers). Physiological studies suggest that sulfate reduction rates are determined by cellular levels of this protein. The dissimilatory sulfate reduction couples the four-electron reduction of the DsrC trisulfide to energy conservation [57]. DsrC was initially described as a subunit of DsrAB, forming a tight complex; however, it is not a subunit, but rather a protein with which DsrAB interacts. DsrC is involved in sulfur-transfer reactions; there is a disulfide bond between the two DsrC cysteines as a redox-active center in the sulfite reduction pathway. Moreover, DsrC is among the most highly expressed sulfur energy metabolism genes in isolated organisms and metatranscriptomes (Santos et al., 2015).
PF01058 | 158/161 | 0.45 | 0.01 | NADH ubiquinone oxidoreductase, 20 Kd subunit: Homologous genes are found in the delta subunits of both sulfhydrogenase complexes of Pyrococcus furiosus [12].
PF01568 | 156/161 | 0.4 | 0.05 | Molybdopterin dinucleotide binding domain: This domain corresponds to the C-terminal domain IV in dimethyl sulfoxide (DMSO) reductase [48].
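Selecting marker domains with a high mean H' and a low standard deviation can be sketched as a simple filter. The four rows below reuse values from the table above; the cutoffs `min_mean=1.0` and `max_std=0.06` are arbitrary values chosen for illustration, not thresholds stated in the slides, and the names `DomainStats` and `select_markers` are hypothetical.

```python
from typing import NamedTuple

class DomainStats(NamedTuple):
    pfam_id: str
    h_mean: float  # mean H' across the evaluated size classes
    h_std: float   # standard deviation of H' across those classes

def select_markers(stats, min_mean, max_std):
    """Keep domains with high mean H' and low std, sorted by decreasing mean."""
    picked = [s for s in stats if s.h_mean >= min_mean and s.h_std <= max_std]
    return sorted(picked, key=lambda s: s.h_mean, reverse=True)

stats = [
    DomainStats("PF12139", 1.2, 0.01),
    DomainStats("PF00374", 1.1, 0.09),
    DomainStats("PF01747", 1.03, 0.06),
    DomainStats("PF01568", 0.4, 0.05),
]

# Illustrative cutoffs only (assumptions, not the authors' thresholds).
markers = select_markers(stats, min_mean=1.0, max_std=0.06)
marker_ids = [m.pfam_id for m in markers]
```

With these cutoffs PF00374 is dropped because its H' varies too much, and PF01568 is dropped because its mean H' is too low.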
https://github.com/eead-csic-compbio/metagenome_Pfam_score
Advanced manual mode
» Biogeochemical cycles (C, N, O, P, Fe)
Species | SS | Genus | Guild
Ammonifex degensii KC4 | 12.508 | Moorella group | SRB/SR
Archaeoglobus profundus DSM 5631 | 12.024 | Archaeoglobus | SRB
Candidatus Desulforudis audaxviator MP104C | 11.972 | Candidatus Desulforudis | Sur
Pelodictyon phaeoclathratiforme BU-1 | 11.836 | Chlorobium/Pelodictyon group | GSB
Chlorobium phaeobacteroides BS1 | 11.649 | Chlorobium/Pelodictyon group | GSB
Chlorobium chlorochromatii CaD3 | 11.625 | Chlorobium/Pelodictyon group | GSB
Thiobacillus denitrificans ATCC 25259 | 11.61 | Thiobacillus | CLSB
Desulfohalobium retbaense DSM 5692 | 11.511 | Desulfohalobium | SRB
Desulfovibrio alaskensis G20 | 11.5 | Desulfovibrio | SRB
Desulfovibrio vulgaris DP4 | 11.442 | Desulfovibrio | SRB
Chlorobium tepidum TLS | 11.354 | Chlorobaculum | GSB
endosymbiont of unidentified scaly snail isolate Monju | 11.205 | 0 | Sur
Desulfovibrio vulgaris str. 'Miyazaki F' | 11.093 | Desulfovibrio | SRB
Desulfovibrio desulfuricans subsp. desulfuricans str. ATCC 27774 | 11.034 | Desulfovibrio | SRB
Sulfur: 112 H'
Nitrogen: 176 H'
Methane: 119 H'
Oxygen: 55 H'
Iron: 112 H'
Biogeochemical cycle | Genes | Pfam domains | Genomes | AUC
Sulfur (S) | 152 | 112 | 161 | 0.9855
Nitrogen (N) | 267 | 176 | 144 | 0.791
Methane (C) | 135 | 119 | 90 | 0.988
Oxygenic photosynthesis (O) | 50 | 55 | 53 | 0.983
Phosphorus (P) | | | |
Iron (Fe) | 36 | 33 | 34 | 0.863
Oxygen markers
python3 plot_entropy.py gen_genF_entropies.oxygen.tab -0.156 0.20625
ID | Description | H' mean | std
PF00067 | Cytochrome P450 | 0.644 | 0.033785
PF00115 | Cytochrome C and Quinol oxidase polypeptide I | 0.513 | 0.061551
PF01077 | Nitrite and sulphite reductase 4Fe-4S domain | 0.55825 | 0.049936
PF02560 | Cyanate lyase C-terminal domain | 0.93625 | 0.001389
PF03460 | Nitrite/Sulfite reductase ferredoxin-like half domain | 0.5525 | 0.040324
PF04898 | Glutamate synthase central domain | 0.479 | 0.034699
PF13442 | Cytochrome C oxidase, cbb3-type, subunit III | 0.6565 | 0.047093
Methane markers
python3 plot_entropy.py gen_genF_entropies.methane.tab -0.121 0.1475
ID | Description | H' mean | std
PF01913 | Formylmethanofuran-tetrahydromethanopterin formyltransferase | 3.629125 | 0.0227
PF01993 | methylene-5,6,7,8-tetrahydromethanopterin dehydrogenase | 2.876 | 0
PF02240 | Methyl-coenzyme M reductase gamma subunit | 3.168 | 0
PF02241 | Methyl-coenzyme M reductase beta subunit, C-terminal domain | 3.168 | 0
PF02289 | Cyclohydrolase (MCH) | 3.353 | 0
PF02741 | FTR, proximal lobe | 3.63475 | 0.034648
PF02745 | Methyl-coenzyme M reductase alpha subunit, N-terminal domain | 3.168 | 0
PF02783 | Methyl-coenzyme M reductase beta subunit, N-terminal domain | 3.168 | 0
PF04206 | Tetrahydromethanopterin S-methyltransferase, subunit E | 3.032 | 0
PF04207 | Tetrahydromethanopterin S-methyltransferase, subunit D | 3.032 | 0
PF04208 | Tetrahydromethanopterin S-methyltransferase, subunit A | 2.903375 | 0.015203
PF04211 | Tetrahydromethanopterin S-methyltransferase, subunit C | 3.02575 | 0.017678
PF05440 | Tetrahydromethanopterin S-methyltransferase subunit B | 2.980125 | 0.036537
Nitrogen markers
ID | Description | H' mean | std
PF00067 | Cytochrome P450 | 0.57375 | 0.0056
PF00174 | Oxidoreductase molybdopterin binding domain | 0.528125 | 0.006578
PF00355 | Rieske [2Fe-2S] domain | 0.507 | 0.032076
PF00507 | NADH-ubiquinone/plastoquinone oxidoreductase, chain 3 | 0.36975 | 0.010886
PF00547 | Urease, gamma subunit | 0.464 | 0
PF00699 | Urease beta subunit | 0.475125 | 0.001126
PF01077 | Nitrite and sulphite reductase 4Fe-4S domain | 0.47025 | 0.014568
PF02211 | Nitrile hydratase beta subunit | 0.405625 | 0.005041
PF02633 | Creatinine amidohydrolase | 0.58725 | 0.017466
PF03460 | Nitrite/Sulfite reductase ferredoxin-like half domain | 0.48 | 0.032715
PF05899 | Protein of unknown function (DUF861) | 0.52175 | 0.022914
PF09347 | Domain of unknown function (DUF1989) | 0.398875 | 0.007415
Iron markers
ID | Description | H' mean | std
PF14522 | Cytochrome c7 and related cytochrome c | 1.010 | 0.104
PF00355 | Rieske [2Fe-2S] domain | 0.51912 | 0.02854
PF00033 | Cytochrome b/b6/petB | 0.55875 | 0.04974
PF00034 | Cytochrome c | 0.5061 | 0.1013
Positive instances: Suli (N=161)
Negative instances: Gen (1946)
Classifiers that make positive classifications only with strong evidence make few false positive errors.
ROC CURVE
• Two-dimensional graph in which the tp (true positive) rate is plotted on the Y axis and the fp (false positive) rate is plotted on the X axis.
• Depicts relative tradeoffs between benefits (true positives) and costs (false positives).
• Point (0,0): never issuing a positive classification; such a classifier commits no false positive errors but also gains no true positives.
• Point (0,1): perfect classification.
• Random guessing produces the diagonal line between (0,0) and (1,1), which has an area of 0.5; no realistic classifier should have an AUC less than 0.5.
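The ROC description above can be made concrete with a minimal sketch. It assumes each genome receives a numeric score and a 0/1 label (1 for a positive instance, 0 for a negative one); the function names `roc_points` and `auc` are hypothetical and are not taken from the MEBS code. The sketch sweeps the decision threshold from the highest score to the lowest, records the (fp rate, tp rate) pairs, and integrates them with the trapezoidal rule.

```python
from typing import List, Tuple


def roc_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """Return ROC points as (fp_rate, tp_rate), starting at (0, 0).

    Hypothetical helper: labels are 1 (positive) or 0 (negative).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative instance")

    # Rank instances from the highest score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    points = [(0.0, 0.0)]  # threshold above every score: no positive calls
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Instances with tied scores cross the threshold together.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


if __name__ == "__main__":
    # Toy data (made up for illustration): positives outscore negatives.
    demo = roc_points([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
    print(demo, auc(demo))
```

With the toy data the curve passes through (0,1), the perfect-classification corner, and the area is 1.0; if every instance has the same score, the curve collapses to the diagonal and the area is 0.5.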
[Figure: relative entropy H’. Labeled domains: 4Fe-4S dicluster domain; Molydopterin dinucleotide binding domain; Cytochrome C oxidase, cbb3-type, subunit III; Nitrogenase component 1 type Oxidoreductase]

Valerie de Anda at #ICG12: A new multi-genomic approach for the study of biogeochemical cycles at global scale: the molecular reconstruction of the sulfur cycle

  • 1. Valerie De Anda. Ecology Institute, UNAM, México; Laboratory of Computational Biology, Zaragoza, CSIC, Spain. valdeanda@ciencias.unam.mx, https://github.com/valdeanda, @val_deanda. The 12th International Conference on Genomics, October 26 to 29, 2017, Shenzhen, China.
  • 2. The iceberg illusion of metagenomics. Revolution in the microbial ecology field: » Genomic reconstruction: microbial dark matter » Large amounts of data » Diversity, ecology, evolution and functional makeup of the microbial world. The ability to evaluate complex metabolic functions in large data sets remains biologically and computationally challenging: » It is really complex to infer and test biological hypotheses in such data. Talk outline: Motivation, General idea, Results, Conclusions, Perspectives, Thanks.
  • 3. The iceberg illusion of metagenomics. Microbial ecology-derived ‘omic’ studies: what do we need to improve the efficiency of data processing? Biological data interpretation (evaluate, compare and analyze complex data at large scale). Computational efficiency (high performance, accuracy, high-speed data processing, reproducibility). Metagenomic data: » Most abundant » Marker genes » Statistically different features. (Gomez Cabrero et al. 2014, BMC SB; Reshetova et al. 2013, BMC SB)
  • 4. Data integration. For a given system, multiple sources (and possibly types) of data are available, and we want to study them integratively to improve knowledge discovery (Gomez Cabrero et al. 2014, BMC SB; Reshetova et al. 2013, BMC SB). What are the available data that can be used to characterize large-scale metabolic machineries? How do we integrate them all to improve the understanding of the system? Prior knowledge, used to reduce the solution space and/or to focus the analysis on biologically meaningful regions (specific metabolic machineries): the (targeted) metabolism, the taxa involved in that particular metabolism, and the proteins involved in that particular metabolism. Publicly available genomes? Mathematical model: relative entropy, $H' = \sum_{i}^{n} P(i)\,\log_2 \frac{P(i)}{Q(i)}$. Informative score (MEBS): H’ ≥ 1 is informative, H’ ≤ 0 is non-informative.
  • 5. What are the available data that can be used to characterize large-scale metabolic machineries? How do we integrate them all to improve the understanding of the system? Prior knowledge, used to reduce the solution space and/or to focus the analysis on biologically meaningful regions (specific metabolic machineries): the (targeted) metabolism, the taxa involved in that particular metabolism, the proteins involved in that particular metabolism, and a large-scale dataset. Mathematical model: relative entropy, $H' = \sum_{i}^{n} P(i)\,\log_2 \frac{P(i)}{Q(i)}$. Informative score (MEBS): H’ ≥ 1 is informative, H’ ≤ 0 is non-informative. Data integration into a single value. Does it really work? Can it capture an entire metabolic machinery? Can we use it to evaluate, compare and analyze complex data at large scale (genomes, metagenomes)? Is it computationally efficient: accurate, fast on large datasets and reproducible?
  • 6. Data integration: case study. An entire biogeochemical cycle: the S-cycle (CHONS-P). Figure labels: Atmosphere, Solar E°, Redox reactions, Metabolic guilds, Geological processes. What are the available data that can be used to characterize large-scale metabolic machineries? How do we integrate them all to improve the understanding of the system? Taxa involved in that particular metabolism, proteins involved in that particular metabolism, and large-scale datasets. Mathematical model: relative entropy, $H' = \sum_{i}^{n} P(i)\,\log_2 \frac{P(i)}{Q(i)}$. Informative score (MEBS): H’ ≥ 1 is informative, H’ ≤ 0 is non-informative. They capture the major processes involved in the mobilization and use of S-compounds through Earth's biosphere.
  • 7. Data integration: case study, S-cycle. Manually curated reconstruction of the S-metabolic machinery. https://metacyc.org/META/NEW-IMAGE?object=Sulfur-Metabolism http://www.genome.jp/kegg-bin/show_pathway?map00920
  • 8. Data integration: case study, S-cycle.
      Taxa: metabolic guilds (Suli, N=161). Complete, non-redundant (nr) sequenced S-genomes, with evidence suggesting their physiological and biochemical involvement in the use of sulfur compounds.
        i) CLSB: 24 genera
        ii) PSB: 25 genera
        iii) GSB: 9 genera
        iv) SRB: 40 genera
        v) SRM: 19 genera
        vi) SO: 4 genera
      Genome list (txt):
        GCF_000006985.1 Chlorobium tepidum TLS
        GCF_000007005.1 Sulfolobus solfataricus P2
        GCF_000007305.1 Pyrococcus furiosus DSM 3638
        GCF_000008545.1 Thermotoga maritima MSB8
        GCF_000008625.1 Aquifex aeolicus VF5
        GCF_000008665.1 Archaeoglobus fulgidus DSM 4304
        GCF_000009965.1 Thermococcus kodakarensis KOD1
      Metabolic machinery (Sucy, N=152): i) sulfur compounds, ii) metabolic pathways, iii) genes, iv) proteins. Evidence linking them with the S-cycle (curated databases and primarily literature).
      Protein sequences (FASTA):
        >Protein1
        MIKPVGSDELKPLFVYDPEEHHKLSHEAESLPSVVISSQGPRVSSM
        MGAGYFSPAGFMNV
        >Protein 2
        MAYKTIIEDGIDVLVVGAGLGGTGAAFEARYWGQDKKIVIAEKANID
        >Protein 3
        MPTFVYMTRCDGCGQCVDICPSDIMHIDTTIRRAYNIEPNMCWEC
        YSCVKACPHNAIDVR
  • 9. Data integration: case study, S-cycle.
      Metabolic machinery (Sucy, N=152): i) sulfur compounds, ii) metabolic pathways, iii) genes, iv) proteins. Evidence linking them with the S-cycle (curated databases and primarily literature).
      Protein sequences (FASTA):
        >Protein1
        MIKPVGSDELKPLFVYDPEEHHKLSHEAESLPSVVISSQGPRVSSM
        MGAGYFSPAGFMNV
        >Protein 2
        MAYKTIIEDGIDVLVVGAGLGGTGAAFEARYWGQDKKIVIAEKANID
        >Protein 3
        MPTFVYMTRCDGCGQCVDICPSDIMHIDTTIRRAYNIEPNMCWEC
        YSCVKACPHNAIDVR
  • 10. Data integration: case study, S-cycle.
      Table 1. Metabolic pathways of the global biogeochemical S-cycle
      Pathway number | Metabolism (a) | Chemical process (b) | Sulfur compound | Type (c) | Chemical formula | Source (d) | Number of Pfam domains (e)
      P1 | DS | O | Sulfite | I | SO3^2- | E | 9
      P2 | DS | O | Thiosulfate | I | S2O3^2- | E | 10
      P3 | DS | O | Tetrathionate | I | S4O6^2- | E | 2
      P4 | DS | R | Tetrathionate | I | S4O6^2- | E | 17
      P5 | DS | R | Sulfate | I | SO4^2- | E | 20
      P6 | DS | R | Elemental sulfur | I | S° | E | 20
      P7 | DS | D | Thiosulfate | I | S2O3^2- | E | 9
      P8 | DS | O | Carbon disulfide | O | CS2 | E | 1
      P9 | A | DE | Alkanesulfonate | O | CH3O3SR | S | 5
      P10 | A | R | Sulfate | I | SO4^2- | S | 20
      P11 | DS | O | Sulfide | I | H2S | E/S | 29
      P12 | A | DE | L-cysteate | O | C3H6NO5S | C/E | 1
      P13 | A | DE | Dimethyl sulfone | O | C2H6O2S | C/E | 3
      P14 | A | DE | Sulfoacetate | O | C2H2O5S | C/E | 2
      P15 | A | DE | Sulfolactate | O | C3H4O6S | C/S | 14
      P16 | A | DE | Dimethyl sulfide | O | C2H6S | C/S | 16
      P17 | A | DE | Dimethylsulfoniopropionate | O | C5H10O2S | C/S/E | 12
      P18 | A | DE | Methylthiopropanoate | O | C4H7O2S | C/S | 7
      P19 | A | DE | Sulfoacetaldehyde | O | C2H3O4S | C/S | 7
      P20 | DS | O | Elemental sulfur | I | S° | C/S/E | 7
      P21 | DS | D | Elemental sulfur | I | S° | C/S/E | 1
      P22 | A | DE | Methanesulfonate | O | CH3O3S | C/S/E | 7
      P23 | A | DE | Taurine | O | C2H7NO3S | C/S/E | 11
      P24 | DS | M | Dimethyl sulfide | O | C2H6S | C | 1
      P25 | DS | M | Methylthio-propanoate | O | C4H7O2S | C | 1
      P26 | DS | M | Methanethiol | O | CH4S | C | 1
      P27 | A | DE | Homotaurine | O | C3H9NO3S | N | 1
      P28 | A | B | Sulfolipid | O | SQDG | | 4
      P29 | Markers | | Markers | | | | 12
      Metabolic machinery (Sucy, N=152): i) sulfur compounds, ii) metabolic pathways, iii) genes, iv) proteins. Evidence linking them with the S-cycle (curated databases and primarily literature).
      Protein sequences (FASTA):
        >Protein1
        MIKPVGSDELKPLFVYDPEEHHKLSHEAESLPSVVISSQGPRVSSM
        MGAGYFSPAGFMNV
        >Protein 2
        MAYKTIIEDGIDVLVVGAGLGGTGAAFEARYWGQDKKIVIAEKANID
        >Protein 3
        MPTFVYMTRCDGCGQCVDICPSDIMHIDTTIRRAYNIEPNMCWEC
        YSCVKACPHNAIDVR
  • 11. Data integration: case study, S-cycle.
      Metabolic machinery (Sucy, N=152): i) sulfur compounds, ii) metabolic pathways, iii) genes, iv) proteins. Evidence linking them with the S-cycle (curated databases and primarily literature).
      Protein sequences (FASTA):
        >Protein1
        MIKPVGSDELKPLFVYDPEEHHKLSHEAESLPSVVISSQGPRVSSM
        MGAGYFSPAGFMNV
        >Protein 2
        MAYKTIIEDGIDVLVVGAGLGGTGAAFEARYWGQDKKIVIAEKANID
        >Protein 3
        MPTFVYMTRCDGCGQCVDICPSDIMHIDTTIRRAYNIEPNMCWEC
        YSCVKACPHNAIDVR
  • 12. Large omic datasets. What are the available data that can be used to characterize large-scale metabolic pathways? How do we integrate them all to improve the understanding of the system? Taxa involved in that particular metabolism; proteins involved in that particular metabolism. Mathematical model: relative entropy, $H' = \sum_{i}^{n} P(i)\,\log_2 \frac{P(i)}{Q(i)}$. Informative score (MEBS): H’ ≥ 1 is informative, H’ ≤ 0 is non-informative. How many genomes were available at the time of analysis? Complete prokaryotic genomes in NCBI RefSeq: ≈4,000 (Dec 2016). Non-redundant: 2,107 (Dec 2016). Gen: 2,107 nr genomes (faa, txt), 1.5 GB. Publicly available and manually curated data.
  • 13. Large omic datasets. What are the available data that can be used to characterize large-scale metabolic machineries? How do we integrate them all to improve the understanding of the system? Taxa: Suli. Proteins: Sucy. Mathematical model: relative entropy, $H' = \sum_{i}^{n} P(i)\,\log_2 \frac{P(i)}{Q(i)}$. Informative score (MEBS): H’ ≥ 1 is informative, H’ ≤ 0 is non-informative. Datasets: Gen (2,107 nr genomes, faa, txt), Met and GenF; sizes shown: 104 GB, ≈500 GB and 1.5 GB. How many metagenomes were available at the time of analysis? Metagenomes were selected that i) were publicly available, ii) contained associated metadata, iii) had been isolated from well-defined environments (e.g., rivers, soil, biofilms), and iv) were not host-associated microbiome sequences (e.g., human, cow, chicken).
  • 14. MEBS: general overview.
      Stage 1: Manual curation and omic datasets. Taxa: Suli (genome list, txt). Proteins: Sucy (FASTA). Gen: 2,107 nr genomes (faa). GenF.
        GCF_000006985.1 Chlorobium tepidum TLS
        GCF_000007005.1 Sulfolobus solfataricus P2
        GCF_000007305.1 Pyrococcus furiosus DSM 3638
        GCF_000008545.1 Thermotoga maritima MSB8
        GCF_000008625.1 Aquifex aeolicus VF5
        GCF_000008665.1 Archaeoglobus fulgidus DSM 4304
        GCF_000009965.1 Thermococcus kodakarensis KOD1
        >Protein1
        MIKPVGSDELKPLFVYDPEEHHKLSHEAESLPSVVISSQGPRVSSM
        MGAGYFSPAGFMNV
        >Protein 2
        MAYKTIIEDGIDVLVVGAGLGGTGAAFEARYWGQDKKIVIAEKANID
        >Protein 3
        MPTFVYMTRCDGCGQCVDICPSDIMHIDTTIRRAYNIEPNMCWEC
        YSCVKACPHNAIDVR
      Stage 2: Domain composition. 112 HMMs of S-proteins.
      Stage 3: Relative entropy. Domains enriched among the microorganisms of interest. Mathematical model: $H' = \sum_{i}^{n} P(i)\,\log_2 \frac{P(i)}{Q(i)}$, where P(i) (observed) is the frequency of protein domain i in S genomes (161) and Q(i) (expected) is the frequency of protein domain i in Gen (2,107). H’ ≥ 1 is informative; H’ ≤ 0 is non-informative.
      Stage 4: Informative score (single value). Can it capture the S-metabolic machinery? Can we use it to evaluate, compare and analyze complex data at large scale (genomes, metagenomes)? Is it computationally efficient: accurate, fast on large datasets and reproducible?
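The relative entropy of Stage 3 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the MEBS implementation: the function names, the three domain labels and their genome counts are hypothetical, and summing the per-domain terms over the domains detected in a sample is only one assumed way of collapsing them into the single value mentioned on the slide. Only the set sizes (161 S genomes, 2,107 genomes in Gen) come from the slides.

```python
import math

def relative_entropy(p_obs, q_exp):
    """One term of H': P(i) * log2(P(i) / Q(i)).

    p_obs: frequency of domain i in the genomes of interest (observed).
    q_exp: frequency of domain i in the background set (expected).
    """
    if not (0.0 <= p_obs <= 1.0 and 0.0 < q_exp <= 1.0):
        raise ValueError("frequencies must satisfy 0 <= P <= 1 and 0 < Q <= 1")
    if p_obs == 0.0:
        return 0.0  # convention: 0 * log(0) is taken as 0
    return p_obs * math.log2(p_obs / q_exp)

N_S, N_GEN = 161, 2107  # set sizes quoted on the slide

# Hypothetical counts: (genomes with the domain in S set, genomes with it in Gen)
counts = {
    "domain_A": (120, 300),   # enriched in the S set
    "domain_B": (40, 1600),   # depleted in the S set
    "domain_C": (150, 2000),  # common everywhere
}

entropies = {
    name: relative_entropy(s / N_S, g / N_GEN)
    for name, (s, g) in counts.items()
}

def sample_score(domains_present, entropies):
    """Assumed scoring: sum the entropy terms of the domains found in a sample."""
    return sum(entropies[d] for d in domains_present if d in entropies)

if __name__ == "__main__":
    for name, h in sorted(entropies.items()):
        print(f"{name}: H' term = {h:.3f}")
    print("score:", round(sample_score(["domain_A", "domain_C"], entropies), 3))
```

A domain that is more frequent in the S set than in the background gets a positive term, while a depleted domain gets a negative one, which is what makes the sign of H’ usable as an informative versus non-informative indicator.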
  • 15. https://github.com/eead-csic-compbio/metagenome_Pfam_score 2,107 genomes, 161 Suli + 935 metagenomes.
  • 16. Distribution of Sulfur Score (SS) in 2,107 nr genomes. Sur (N=192): metagenomic reconstructions and hard-to-culture taxa. Highlighted genomes: » Candidatus Desulforudis audaxviator MP104C » an unnamed endosymbiont of a scaly snail from a black smoker chimney » the archaeon Geoglobus ahangari, sampled from a hydrothermal vent at 2,000 m depth.
  • 17. ROC curve. Positive instances: Suli (N=161). Negative instances: Gen (1,946).
      • A two-dimensional graph in which the TP rate is plotted on the Y axis and the FP rate is plotted on the X axis.
      • Depicts relative tradeoffs between benefits (true positives) and costs (false positives).
      Point of interest: perfect classification.
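As a rough sketch of how such a ROC curve and its area can be computed, the pure-Python code below sweeps a threshold from the highest score downward and integrates the resulting points with the trapezoid rule. The function names, scores and labels are made up for illustration (label 1 stands for a positive instance, 0 for a negative one); this is not the code used to obtain the AUC reported in the talk.

```python
def roc_points(scores, labels):
    """Return ROC points as (FP rate, TP rate), from (0, 0) to (1, 1).

    scores: higher means "more likely positive".
    labels: 1 for positive instances, 0 for negative instances.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative instance")

    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every instance tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Made-up scores: four positives and four negatives.
example_scores = [11.9, 10.2, 9.7, 4.1, 8.8, 3.0, 2.2, 1.5]
example_labels = [1, 1, 1, 1, 0, 0, 0, 0]
example_auc = auc(roc_points(example_scores, example_labels))

if __name__ == "__main__":
    print("AUC:", example_auc)
```

A classifier that ranks every positive above every negative reaches the top-left corner and an area of 1, while scores that carry no information (for example, all tied) give the diagonal and an area of 0.5.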
  • 18. Distribution of Sulfur Score (SS) in the metagenomic dataset (935 metagenomes). Distribution of SS values observed in 935 metagenomes, classified in terms of features (X axis) and colored according to their particular habitats. Features are sorted according to their median SS values. Green lines indicate the lowest and largest 95th percentiles observed across MSL classes. Geo-localized metagenomes sampled around the globe are colored according to their SS values.
  • 19. Conclusions. [Figure labels: S genes, S genomes, Informative, Non-informative, Markers, Comp]
      » We present MEBS, a new open-source software to evaluate, quantify, compare and predict the metabolic machinery of interest in large ‘omic’ datasets using one single value.
      » To test the applicability of this approach, we evaluated one of the most complex biogeochemical cycles: the sulfur cycle.
      » Using data integration and manual curation, we reconstructed the entire sulfur machinery: Suli and Sucy.
      » We show that the mathematical framework of relative entropy can be used to capture complex metabolic machineries in large-scale omic samples.
      » MEBS is a powerful and broadly applicable approach to predict and classify microorganisms closely involved in the sulfur cycle, even in hard-to-culture microbial lineages.
      » It is computationally efficient, accurate (AUC = 0.985) and reproducible.
      » Not in the presentation: the entropy can be used to detect marker domains, and the completeness of the S-cycle pathways can be benchmarked at large scale.
  • 20. Perspectives. [Figure labels: C, N, O, S, Fe, P; bioremediation, antibiotics, extreme environments, agriculture, ?]
      • We are currently finishing the analyses to demonstrate the applicability of this approach to other biogeochemical cycles (C, N, O, Fe, P).
      • We hope that the MEBS pipeline will facilitate the analysis of biogeochemical cycles or complex metabolic networks carried out by specific prokaryotic guilds, such as bioremediation processes (e.g., degradation of hydrocarbons, toxic aromatic compounds, heavy metals, etc.).
      • We look forward to collaborating with and helping other researchers by integrating comprehensive databases that might be helpful to the scientific community.
      • Furthermore, we are currently working to improve the algorithm by using only a list of sequenced genomes involved in the metabolism of interest, in order to reduce the manual curation effort.
      • We are also considering using k-mers instead of peptide Hidden Markov Models to increase the speed of the pipeline.
      • We anticipate that our platform will stimulate interest and involvement among the scientific community to explore uncultured genomes derived from large metagenomic sequences: exploring microbial dark matter.
  • 21. Thanks. Icoquih Zapata, Valeria Souza, Luis Equiarte, Bruno Contreras, Cesar-Poot Hernandez. De Anda et al., 2017. MEBS, a software platform to evaluate large (meta)genomic collections according to their metabolic machinery: unraveling the sulfur cycle. GigaScience, in press.
  • 22. Thank you for your attention! Laboratory of Molecular and Experimental Evolution, Ecology Institute, UNAM, Mexico. Laboratory of Computational Biology.
  • 23. Supplementary files
  • 24. Supplementary files: Completeness. [Figure with panels A and B; datasets Gen (n=2,107) and Met (n=935). Labeled genomes: D. acidiphilus, Hydrogenobaculum, A. caldus, A. ferrivorans, T. mobilis, D. aromatica, Thauera sp., T. humireducens, A. denitrificans, S. tokodaii, A. hospitalis (among 12 other genomes), P. phaeoclathratiforme, C. chlorochromatii, C. tepidum, T. denitrificans, T. violascens, S. thiotaurini.]
  • 25. Supplementary files: Completeness. Table 1. Metabolic pathways of the global biogeochemical S-cycle (the same table as on slide 10, listing pathways P1 to P29 with their number of Pfam domains). Completeness: the number of protein domains present in a given sample divided by the total number of domains in the pre-defined pathway.
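The completeness definition on this slide amounts to a simple ratio, sketched below. The function name is hypothetical, and the assignment of these four Pfam IDs to one pathway is invented purely for illustration; the real pathway-to-domain mapping is the one curated for Table 1 and is not reproduced in this transcript.

```python
def pathway_completeness(sample_domains, pathway_domains):
    """Fraction of a pathway's domains that are present in a sample."""
    pathway = set(pathway_domains)
    if not pathway:
        raise ValueError("pathway must contain at least one domain")
    present = pathway & set(sample_domains)
    return len(present) / len(pathway)

# Hypothetical pathway made of four Pfam domains (illustrative grouping only).
toy_pathway = ["PF12139", "PF01747", "PF04358", "PF00374"]

# Hypothetical sample: domains detected in one genome or metagenome.
toy_sample = {"PF12139", "PF01747", "PF00005"}

toy_completeness = pathway_completeness(toy_sample, toy_pathway)

if __name__ == "__main__":
    print(f"completeness = {toy_completeness:.2f}")
```

Domains found in the sample but absent from the pathway definition (PF00005 in the toy sample) do not affect the ratio, so the value always lies between 0 and 1.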
  • 26. Supplementary files
  • 27. Supplementary files: Large scale. 35 private metagenomes: microbial mats, sediment and lake water. Workflow: reads, processing, ORF prediction, gene calling (aa residues), Mean Size Length (MSL). Counts of prokaryotic genomes in each NCBI category as of July 2017 (non-redundant vs redundant): https://microbiome.wordpress.com/
  • 28. Supplementary files: Completeness.
      GenF size category | 5th percentile | 95th percentile
      Real | -0.091 | 0.101
      30 | -0.086 | 0.105
      60 | -0.09 | 0.104
      100 | -0.088 | 0.1
      150 | -0.09 | 0.103
      200 | -0.89 | 0.105
      250 | -0.09 | 0.106
      300 | -0.09 | 0.1
  • 29. Supplementary files. Table 2. Informative Pfam domains with high H’ and low std: novel proposed molecular marker domains in metagenomic data of variable MSL.
      Pfam ID | Suli occurrences | H’ mean | H’ std | Description
      PF12139 | 58/161 | 1.2 | 0.01 | Adenosine-5'-phosphosulfate reductase beta subunit: key protein domain for both sulfur oxidation and reduction metabolic pathways. It has been widely studied in dissimilatory sulfate reduction metabolism. In all recognized sulfate-reducing prokaryotes, the dissimilatory process is mediated by three key enzymes: Sat, Apr and Dsr. Homologous proteins are also present in the anoxygenic photolithotrophic and chemolithotrophic sulfur-oxidizing bacteria (CLSB, PSB, GSB), in a different cluster organization [35].
      PF00374 | 135/161 | 1.1 | 0.09 | Nickel-dependent hydrogenase: hydrogenases with S-cluster and selenium-containing Cys-x-x-Cys motifs involved in the binding of nickel. Among the homologues of this hydrogenase domain is the alpha subunit of the sulfhydrogenase I complex of Pyrococcus furiosus, which catalyzes the reduction of polysulfide to hydrogen sulfide with NADPH as the electron donor [55].
      PF01747 | 103/161 | 1.03 | 0.06 | ATP-sulfurylase: key protein domain for both sulfur oxidation and reduction processes. The enzyme catalyzes the transfer of the adenylyl group from ATP to inorganic sulfate, producing adenosine 5′-phosphosulfate (APS) and pyrophosphate, or the reverse reaction [56].
      PF02662 | 62/161 | 0.82 | 0.03 | Methyl-viologen-reducing hydrogenase, delta subunit: one of the enzymes involved in methanogenesis, encoded in the mth-flp-mvh-mrt cluster of methane genes in Methanothermobacter thermautotrophicus. No specific functions have been assigned to the delta subunit [48].
      PF10418 | 122/161 | 0.78 | 0.06 | Iron-sulfur cluster binding domain of dihydroorotate dehydrogenase B: among the homologous genes in this family are asrA and asrB from Salmonella enterica subsp. enterica serovar Typhimurium, which encode 1) a dissimilatory sulfite reductase, 2) a gamma subunit of the sulfhydrogenase I complex of Pyrococcus furiosus and 3) a gamma subunit of the sulfhydrogenase II complex of the same organism [12].
      PF13247 | 149/161 | 0.66 | 0.06 | 4Fe-4S dicluster domain: homologues of this family include 1) DsrO, a ferredoxin-like protein related to the electron transfer subunits of respiratory enzymes, 2) dimethylsulfide dehydrogenase β subunit (ddhB), involved in dimethyl sulfide degradation in Rhodovulum sulfidophilum, and 3) sulfur reductase FeS subunit (sreB) of Acidianus ambivalens, involved in sulfur reduction using H2 or organic substrates as electron donors [12].
      PF04358 | 73/161 | 0.52 | 0 | DsrC-like protein: DsrC is present in all organisms encoding a dsrAB sulfite reductase (sulfate/sulfite reducers or sulfur oxidizers). Physiological studies suggest that sulfate reduction rates are determined by cellular levels of this protein. Dissimilatory sulfate reduction couples the four-electron reduction of the DsrC trisulfide to energy conservation [57]. DsrC was initially described as a subunit of DsrAB, forming a tight complex; however, it is not a subunit but rather a protein with which DsrAB interacts. DsrC is involved in sulfur-transfer reactions; a disulfide bond between the two DsrC cysteines acts as a redox-active center in the sulfite reduction pathway. Moreover, DsrC is among the most highly expressed sulfur energy metabolism genes in isolated organisms and meta-transcriptomes (Santos et al., 2015).
      PF01058 | 158/161 | 0.45 | 0.01 | NADH ubiquinone oxidoreductase, 20 Kd subunit: homologous genes are found in the delta subunits of both sulfhydrogenase complexes of Pyrococcus furiosus [12].
      PF01568 | 156/161 | 0.4 | 0.05 | Molybdopterin dinucleotide binding domain: this domain corresponds to the C-terminal domain IV in dimethyl sulfoxide (DMSO) reductase [48].
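The idea behind Table 2, keeping domains whose H’ is high on average and stable (low std), can be expressed as a simple filter. In the sketch below the (Pfam ID, H’ mean, H’ std) triples are copied from Table 2, while the function name and the two cut-offs (`min_mean`, `max_std`) are hypothetical choices for illustration; the slides do not state which thresholds were actually applied.

```python
# (Pfam ID, H' mean, H' std) as listed in Table 2.
TABLE2 = [
    ("PF12139", 1.20, 0.01),
    ("PF00374", 1.10, 0.09),
    ("PF01747", 1.03, 0.06),
    ("PF02662", 0.82, 0.03),
    ("PF10418", 0.78, 0.06),
    ("PF13247", 0.66, 0.06),
    ("PF04358", 0.52, 0.00),
    ("PF01058", 0.45, 0.01),
    ("PF01568", 0.40, 0.05),
]

def select_markers(rows, min_mean, max_std):
    """Keep domains with a high mean H' and a low standard deviation.

    min_mean and max_std are illustrative cut-offs, not values from the talk.
    """
    return [
        pfam_id
        for pfam_id, h_mean, h_std in rows
        if h_mean >= min_mean and h_std <= max_std
    ]

strict_markers = select_markers(TABLE2, min_mean=1.0, max_std=0.1)
stable_markers = select_markers(TABLE2, min_mean=0.4, max_std=0.01)

if __name__ == "__main__":
    print("high-entropy markers:", strict_markers)
    print("most stable markers:", stable_markers)
```

Tightening `max_std` favors domains whose entropy barely changes across conditions, while raising `min_mean` favors the most strongly enriched ones; the two criteria select different subsets of the table.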
  • 30. Supplementary files. https://github.com/eead-csic-compbio/metagenome_Pfam_score Advanced manual mode » Biogeochemical cycles (C, N, O, P, Fe)
  • 31. Supplementary files.
      Species | SS | Genus | Guild
      Ammonifex degensii KC4 | 12.508 | Moorella group | SRB/SR
      Archaeoglobus profundus DSM 5631 | 12.024 | Archaeoglobus | SRB
      Candidatus Desulforudis audaxviator MP104C | 11.972 | Candidatus Desulforudis | Sur
      Pelodictyon phaeoclathratiforme BU-1 | 11.836 | Chlorobium/Pelodictyon group | GSB
      Chlorobium phaeobacteroides BS1 | 11.649 | Chlorobium/Pelodictyon group | GSB
      Chlorobium chlorochromatii CaD3 | 11.625 | Chlorobium/Pelodictyon group | GSB
      Thiobacillus denitrificans ATCC 25259 | 11.61 | Thiobacillus | CLSB
      Desulfohalobium retbaense DSM 5692 | 11.511 | Desulfohalobium | SRB
      Desulfovibrio alaskensis G20 | 11.5 | Desulfovibrio | SRB
      Desulfovibrio vulgaris DP4 | 11.442 | Desulfovibrio | SRB
      Chlorobium tepidum TLS | 11.354 | Chlorobaculum | GSB
      endosymbiont of unidentified scaly snail isolate Monju | 11.205 | 0 | Sur
      Desulfovibrio vulgaris str. 'Miyazaki F' | 11.093 | Desulfovibrio | SRB
      Desulfovibrio desulfuricans subsp. desulfuricans str. ATCC 27774 | 11.034 | Desulfovibrio | SRB
  • 32. Supplementary files
  • 33. Supplementary files
  • 34. Supplementary files
  • 35. Supplementary files
  • 36. Supplementary files. Sulfur: 112 H’. Nitrogen: 176 H’. Methane: 119 H’. Oxygen: 55 H’. Iron: 112 H’.
  • 37. Supplementary files.
      Biogeochemical cycle | Genes | Pfam domains | Genomes | AUC
      Sulfur (S) | 152 | 112 | 161 | 0.9855
      Nitrogen (N) | 267 | 176 | 144 | 0.791
      Methane (C) | 135 | 119 | 90 | 0.988
      Oxygenic photosynthesis (O) | 50 | 55 | 53 | 0.983
      Phosphorus (P) | | | |
      Iron (Fe) | 36 | 33 | 34 | 0.863
  • 38. Supplementary files: Oxygen markers.
      ID | Description | H’ mean | std
      PF00067 | Cytochrome P450 | 0.644 | 0.033785
      PF00115 | Cytochrome C and Quinol oxidase polypeptide I | 0.513 | 0.061551
      PF01077 | Nitrite and sulphite reductase 4Fe-4S domain | 0.55825 | 0.049936
      PF02560 | Cyanate lyase C-terminal domain | 0.93625 | 0.001389
      PF03460 | Nitrite/Sulfite reductase ferredoxin-like half domain | 0.5525 | 0.040324
      PF04898 | Glutamate synthase central domain | 0.479 | 0.034699
      PF13442 | Cytochrome C oxidase, cbb3-type, subunit III | 0.6565 | 0.047093
      Command: python3 plot_entropy.py gen_genF_entropies.oxygen.tab -0.156 0.20625
  • 39. Supplementary files: Methane.
      ID | Description | H’ mean | std
      PF01913 | Formylmethanofuran-tetrahydromethanopterin formyltransferase | 3.629125 | 0.0227
      PF01993 | methylene-5,6,7,8-tetrahydromethanopterin dehydrogenase | 2.876 | 0
      PF02240 | Methyl-coenzyme M reductase gamma subunit | 3.168 | 0
      PF02241 | Methyl-coenzyme M reductase beta subunit, C-terminal domain | 3.168 | 0
      PF02289 | Cyclohydrolase (MCH) | 3.353 | 0
      PF02741 | FTR, proximal lobe | 3.63475 | 0.034648
      PF02745 | Methyl-coenzyme M reductase alpha subunit, N-terminal domain | 3.168 | 0
      PF02783 | Methyl-coenzyme M reductase beta subunit, N-terminal domain | 3.168 | 0
      PF04206 | Tetrahydromethanopterin S-methyltransferase, subunit E | 3.032 | 0
      PF04207 | Tetrahydromethanopterin S-methyltransferase, subunit D | 3.032 | 0
      PF04208 | Tetrahydromethanopterin S-methyltransferase, subunit A | 2.903375 | 0.015203
      PF04211 | Tetrahydromethanopterin S-methyltransferase, subunit C | 3.02575 | 0.017678
      PF05440 | Tetrahydromethanopterin S-methyltransferase subunit B | 2.980125 | 0.036537
      Command: python3 plot_entropy.py gen_genF_entropies.methane.tab -0.121 0.1475
  • 40. Supplementary files: Nitrogen
      ID | Description | H’ mean | std
      PF00067 | Cytochrome P450 | 0.57375 | 0.0056
      PF00174 | Oxidoreductase molybdopterin binding domain | 0.528125 | 0.006578
      PF00355 | Rieske [2Fe-2S] domain | 0.507 | 0.032076
      PF00507 | NADH-ubiquinone/plastoquinone oxidoreductase, chain 3 | 0.36975 | 0.010886
      PF00547 | Urease, gamma subunit | 0.464 | 0
      PF00699 | Urease beta subunit | 0.475125 | 0.001126
      PF01077 | Nitrite and sulphite reductase 4Fe-4S domain | 0.47025 | 0.014568
      PF02211 | Nitrile hydratase beta subunit | 0.405625 | 0.005041
      PF02633 | Creatinine amidohydrolase | 0.58725 | 0.017466
      PF03460 | Nitrite/Sulfite reductase ferredoxin-like half domain | 0.48 | 0.032715
      PF05899 | Protein of unknown function (DUF861) | 0.52175 | 0.022914
      PF09347 | Domain of unknown function (DUF1989) | 0.398875 | 0.007415
  • 41. Supplementary files: Iron
      ID | Description | H’ mean | std
      PF14522 | Cytochrome c7 and related cytochrome c | 1.010 | 0.104
      PF00355 | Rieske [2Fe-2S] domain | 0.51912 | 0.02854
      PF00033 | Cytochrome b/b6/petB | 0.55875 | 0.04974
      PF00034 | Cytochrome c | 0.5061 | 0.1013
  • 42. ROC curve
      Positive instances: Suli (N=161). Negative instances: Gen (1946).
      » Two-dimensional graph in which the true positive (tp) rate is plotted on the Y axis and the false positive (fp) rate is plotted on the X axis.
      » Depicts the relative trade-offs between benefits (true positives) and costs (false positives).
      » Perfect classification.
      » Positive classifications only with strong evidence: such classifiers make few false positive errors.
      » Never issuing a positive classification: such a classifier commits no false positive errors but also gains no true positives.
      » Random guessing produces the diagonal line between (0,0) and (1,1), which has an area of 0.5; no realistic classifier should have an AUC below 0.5.
  • 43. Supplementary files
  • 44. Supplementary files
  • 45. Supplementary files
  • 46. Supplementary files. Figure: relative entropy H’ for the domains 4Fe-4S dicluster domain; Molydopterin dinucleotide binding domain; Cytochrome C oxidase, cbb3-type, subunit III; Nitrogenase component 1 type Oxidoreductase.
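
As a hedged illustration of the ROC description on slide 42, the AUC can be computed directly as the fraction of positive/negative pairs in which the positive instance receives the higher score, with ties counted as one half. The sketch below is not taken from the MEBS scripts: `auc_by_ranking` is a hypothetical helper name, and the scores are invented for illustration.

```python
from itertools import product


def auc_by_ranking(pos_scores, neg_scores):
    """Return the AUC as the probability that a positive outranks a negative.

    Ties contribute 0.5, the usual convention for this pairwise estimate.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical scores (NOT taken from the slides).
positives = [0.9, 0.8, 0.4]
negatives = [0.7, 0.3, 0.2, 0.1]

print(auc_by_ranking(positives, negatives))
```

A classifier that separates the two groups perfectly yields 1.0, and one whose scores carry no information about the class yields about 0.5, matching the diagonal described on the slide.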

Editor's Notes

  1. Over the last 15 years, enormous advances in high-throughput sequencing (HTS) technologies have dramatically improved our understanding of life’s microbial diversity to an unprecedented level of detail. Nowadays, accessing the total repertoire of genomes within complex communities by means of metagenomics is becoming a standard procedure to understand the diversity, ecology, evolution and functional makeup of the microbial world. Furthermore, the accurate reconstruction of microbial genomes from metagenomic studies has been shown to be a powerful approach to gain insight into the metabolic strategies of the microbial dark matter, or uncultivable microorganisms. However, despite the huge amount of metagenomic and genomic sequences accumulated so far, our ability to evaluate complex metabolic functions in large-scale ‘omic’ datasets remains biologically and computationally challenging. This is largely due to the challenges involved in testing meaningful biological hypotheses in such complex data; as a result, only a small proportion of the metabolic information is eventually used to draw ecologically relevant conclusions. But why? This is what I like to call the iceberg illusion of metagenomics.
  2. Let’s imagine the huge amount of data derived from metagenomic studies represented by this iceberg. In general, most microbial ecology studies using metagenomics have focused on developing broad descriptions of the metabolic pathways within a certain environment, analyzing the relative abundance of marker genes involved in several metabolic processes such as primary production, nitrogen fixation, etc. They have also focused on evaluating or discovering differentially abundant, shared or unique functional units (genes, proteins or metabolic pathways). Since the present critical bottleneck in metagenomic analysis is the efficiency of data processing, only a small proportion of the data is used to test biological hypotheses. What do we need? As metagenomic data analysis is both data- and computation-intensive, high-performance computing is needed, especially when (1) the dataset size is huge for a sample, (2) a project involves many metagenomic samples and (3) the analyses are complex and time-sensitive, but without losing sight of the biological interpretation.
  3. Here, we took the concept of data integration (Gomez Cabrero et al 2014) to try to solve this problem and to understand a given system. In this case, our system is microbial metabolism. But we cannot address all microbial metabolisms; instead, let’s reduce our metabolic universe to targeted metabolic machineries. Currently there are several databases with manually curated information and large collections of sequenced genomes. In order to address some of the limitations of these methods, we propose a novel approach to reduce the complexity of targeted metabolic pathways involved in several integral ecosystem processes, such as entire biogeochemical cycles, into a single informative score, called the Multigenomic Entropy-Based Score (MEBS). This approach is based on the mathematical rationalization of the Kullback-Leibler divergence, also known as relative entropy H’ [28].
  4. To test the applicability of this approach, we evaluated the metabolic machinery of the S-cycle. Due to its multiple redox states and its consequences for microbiological and geochemical transformations, S-metabolism can be seen as a complex metabolic machinery involving a myriad of genes, enzymes, organic substrates and electron carriers, which largely depend on the surrounding geochemical and ecological conditions. For these reasons, the complete repertoire involved in the metabolic machinery of the S-cycle has remained underexplored despite the massive data produced in ‘omic’ experiments. Here, we performed an integral curation effort to describe all the elements involved in the S-cycle, which were then used, as explained in the following sections, to score genomic and metagenomic datasets in terms of their sulfur relevance. These elements come from geological sources, derived from processes such as plate tectonics and atmospheric photochemical processes, which make possible the regeneration of the available forms of these elements so that they can be used by different metabolically related microbial populations (called metabolic guilds), which profoundly affect the geochemical properties of the biosphere. In summary, biogeochemical cycles are a complex interplay of biological, geological and chemical processes that operate on time scales from microseconds to eons and on spatial scales from micrometers to systems spanning the entire atmosphere and ocean.
  5. To compile this database, we first gathered the most important S-compounds derived from biogeochemical processes and biologically catalyzed reactions. Then we classified each S-compound according to its chemical and thermodynamic nature (Gibbs free energy of formation, GFEF). Finally, we classified whether each compound can be used as a source of carbon, nitrogen, energy or electron donor, fermentative substrate, or terminal electron acceptor in respiratory microbial processes. The schematic representation of the manual curation effort, summarizing the complexity of the sulfur biogeochemical cycle on a global scale, is shown in Figure 2.
  7. Proteins are generally composed of one or more functional regions, commonly termed domains. Different combinations of domains give rise to the diverse range of proteins found in nature. The identification of domains that occur within proteins can therefore provide insights into their function.
  8. At present, one critical bottleneck in metagenomic analysis is the efficiency of data processing, because of slow analysis speed. As metagenomic data analysis is both data- and computation-intensive, high-performance computing is needed, especially when (1) the dataset size is huge for a sample, (2) a project involves many metagenomic samples and (3) the analyses are complex and time-sensitive. Moreover, the increasing number of metagenomic projects usually requires the comparison of different samples. Yet current methods are limited by their low efficiency [7], [10], [11]. Thus, high-performance computational techniques are needed to speed up analysis without compromising its accuracy. However, due to the challenges involved in testing meaningful biological hypotheses with complex data, only a small proportion of the metabolic information derived from these datasets is eventually used to draw ecologically relevant conclusions.
  9. The AUC of a classifier is equivalent to the probability that the classifier will rank a randomly chosen positive instance higher than a randomly chosen negative instance.
  11. The tree shows the dominance of bacterial diversification and underlines the importance of organisms lacking isolated representatives, with substantial evolution concentrated in a major radiation of such organisms. This tree highlights major lineages currently underrepresented in biogeochemical models and identifies radiations that are probably important for future evolutionary analyses.
  12. The presence-absence patterns of Pfam domains belonging to particular pathways can be exploited to compute metabolic completeness. This optional task is invoked with parameter –keggmap and a TAB-separated file mapping Pfam identifiers to KEGG Orthology entries (KO numbers) and the corresponding pathway in Sucy (see Table S3). To compute completeness, the total number of domains involved in a given pathway (e.g., sulfate reduction, sulfide oxidation) must be retrieved from the Sucy database (see Table S2). Then, the number of protein domains present in a given sample is divided by the total number of domains in the pre-defined pathway. The script produces: i) a detailed report of the metabolic pathways of interest; and ii) a list of KO numbers with hex color codes, corresponding to KO matches in the omic sample, which can be exported to the KEGG Mapper – Search & Color Pathway tool [53] (see Figure S2).
  14. python3 /home/val/github/metagenome_Pfam_score/scripts/F_meanVSstd.py gen_genF_entropies.methane.tab --plot-random random_samples_tab -k 6 --labels 4 --dpi 400 -o gen_genF_entropies.methane.tab.markers.png
      ./get_names.sh
  15. #clustering map
      python3 /home/val/github/metagenome_Pfam_score/scripts/F_meanVSstd.py gen_genF_entropies.nitrogen.tab --plot-random random_samples_tab/ -k 8 --labels 7 -o gen_genF_entropies.nitrogen.tab.markers.png
      #Barplot
      python3 /home/val/github/metagenome_Pfam_score/scripts/plot_entropy.py gen_genF_entropies.nitrogen.tab -0.094 0.112
  16. /home/val/github/biogeochemical_network/notebooks/Iron
      #BARPLOT
      python3 /home/val/github/metagenome_Pfam_score/scripts/plot_entropy.py gen_genF_entropies.iron.tab -0.1965 0.24825
      #!/bin/bash
      # Extract the Pfam identifiers (first column of the entropy table)
      cut -f 1 gen_genF_entropies.oxygen.tab > pfam_terms.tab
      # Fetch the first line of each Pfam description page
      while read -r pfam; do
        desc=$(curl -s "http://pfam.xfam.org/family/$pfam/desc" | head -1)
        printf '%s\t%s\n' "$pfam" "$desc"
      done < pfam_terms.tab 2> /dev/null > pfam_terms.desc.tab
      # Replace HTML responses with the marker NF
      sed 's#<!DOCTYPE.*#NF#' pfam_terms.desc.tab > tmp && mv tmp pfam_terms.desc.tab
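
Note 3 bases MEBS on the Kullback-Leibler divergence (relative entropy H’). The following is a minimal sketch of that quantity, assuming that P is the observed frequency of a protein domain in the cycle-specific genomes and Q is its expected frequency in a background genomic dataset. That assignment of P and Q, the function names and the toy frequencies are assumptions made for illustration; they are not code or values from MEBS.

```python
import math


def relative_entropy_term(p, q):
    """One term of the Kullback-Leibler sum: p * log2(p / q).

    p: observed frequency (assumed: domain frequency in cycle-specific genomes)
    q: expected frequency (assumed: domain frequency in the background dataset)
    """
    if p == 0:
        return 0.0  # convention: 0 * log(0 / q) = 0
    if q == 0:
        raise ValueError("q must be > 0 wherever p > 0")
    return p * math.log2(p / q)


def kl_divergence(p_dist, q_dist):
    """Full Kullback-Leibler divergence D(P || Q) in bits."""
    return sum(relative_entropy_term(p, q) for p, q in zip(p_dist, q_dist))


# Toy frequencies, invented for illustration only.
print(relative_entropy_term(0.8, 0.1))  # domain enriched in the target set
print(relative_entropy_term(0.5, 0.5))  # equally frequent in both sets
print(kl_divergence([0.5, 0.5], [0.25, 0.75]))
```

Under these assumptions, a domain that is much more frequent in the target genomes than in the background gives a large positive term, while a domain that is equally frequent in both gives zero.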
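
Note 12 describes metabolic completeness as the number of pathway domains present in a sample divided by the total number of domains defined for that pathway. A minimal sketch of that ratio is given below. The pathway names, the mapping and the sample are hypothetical (the Pfam identifiers are borrowed from the slide tables only as placeholders), and `pathway_completeness` is not part of the MEBS scripts.

```python
def pathway_completeness(pathway_domains, sample_domains):
    """Fraction of each pathway's domains that are present in a sample.

    pathway_domains: dict mapping pathway name -> set of Pfam identifiers
    sample_domains:  set of Pfam identifiers detected in the sample
    """
    completeness = {}
    for pathway, domains in pathway_domains.items():
        if not domains:
            raise ValueError(f"pathway {pathway!r} has no domains defined")
        present = domains & sample_domains
        completeness[pathway] = len(present) / len(domains)
    return completeness


# Hypothetical mapping and sample, for illustration only.
pathways = {
    "pathway_A": {"PF00067", "PF00115", "PF01077", "PF03460"},
    "pathway_B": {"PF02240", "PF02241"},
}
sample = {"PF00067", "PF01077", "PF02240", "PF02241", "PF99999"}

for name, fraction in sorted(pathway_completeness(pathways, sample).items()):
    print(f"{name}\t{fraction:.2f}")
```

Domains found in the sample but absent from a pathway definition are ignored, so the fraction never exceeds 1.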