Valerie de Anda at #ICG12: A new multi-genomic approach for the study of biogeochemical cycles at global scale: the molecular reconstruction of the sulfur cycle
Valerie de Anda Torres at the #ICG12 GigaScience Prize Track: A new multi-genomic approach for the study of biogeochemical cycles at global scale: the molecular reconstruction of the sulfur cycle. Shenzhen, 26th October 2017
This document discusses network analysis of metabolism in four kingdoms of life using elementary flux modes. It provides examples of how this analysis can be used to determine optimal pathways, predict engineering effects, and assess enzyme deficiencies. The analysis allows detection of previously unknown pathways and futile cycles. Applying this to human metabolism, studies have found evidence that fatty acids can be converted to glucose through entangled routes, and have identified futile cycles that may play a role in aging.
Technology used for High Level Expression and Purification of Recombinant Pro... SookYee1234
The document discusses protein expression and purification techniques. It describes (1) in vivo and in vitro cell-based protein expression systems, (2) transfection of cells with DNA vectors followed by lysis to extract proteins, (3) use of affinity tags like poly-His tags and GST tags to purify recombinant proteins, and (4) common purification methods like immobilized metal affinity chromatography (IMAC). The document concludes that fusing proteins with ubiquitin allows high-level expression, easy purification, and production of authentic proteins for downstream applications.
The Subliminal Toolbox: automating steps in the reconstruction of metabolic n... Neil Swainston
The document discusses the Subliminal Toolbox, which aims to automate many of the steps involved in reconstructing genome-scale metabolic networks. These steps include merging pathway databases, standardizing naming, determining charge states, balancing reactions, adding compartmentalization and a biomass function. The toolbox was able to generate a draft metabolic network for yeast, though some manual curation was still needed. Further developments could include incorporating directionality and improving the editing interface. Overall, the toolbox provides a good starting point for reconstruction and reduces the manual effort required.
JBEI Research Highlights - November 2018 Irina Silva
This document discusses recent advances in x-ray hydroxyl radical footprinting at the Advanced Light Source synchrotron. It compares dose response curves and mass spectrometry results from focused and unfocused white light sources. It also describes developing "drop-on-demand" methodologies to increase sample dose while maintaining microsecond exposure times, which enables high-dose experiments while minimizing sample volume. Preliminary experiments demonstrate this approach yields high quality data. The document contributes to improving synchrotron hydroxyl radical footprinting techniques for investigating protein and nucleic acid structures.
Genome-Scale Metabolic Models and Systems Medicine of Metabolic Syndrome Natal van Riel
Workshop on 'The interplay of fat and carbohydrate metabolism with application in Metabolic Syndrome and Type 2 Diabetes', December 12 and 13, 2013, Eindhoven University of Technology
Roche Quantitative Systems Pharmacology methodology workshop
February 4th-5th, 2016, Basel, Switzerland
Bringing multi-level systems pharmacology models to life
Natal van Riel
Abstract
Computational modelling in Systems Medicine and Systems Pharmacology addresses biological processes at different levels and scales. Quantifying model parameters from experimental data is a complicated task. The talk addresses how variance in data propagates into parameter estimates and, more importantly, into model predictions. The Analysis of Dynamic Adaptations in Parameter Trajectories (ADAPT) approach is discussed as a method to model dynamics at multiple time-scales. Two examples will be provided: 1) modelling of longitudinal data in a cohort of Type 2 Diabetics using different medication, and 2) an application in preclinical research studying the effect of liver X receptor activation on HDL metabolism and liver steatosis.
2016 Presentation at the University of Hawaii Cancer Center Casey Greene
Date: February 19, 2016
Time: 10:30 am
Place: University of Hawaii Cancer Center, 701 Ilalo Street, Sullivan Conference Room
Details: Dr. Casey Greene
Department of Systems Pharmacology and Translational Therapeutics
Department of Genetics
University of Pennsylvania
Moore Investigator, Gordon and Betty Moore Foundation
Publication - Alternative Surfactants for Improved Efficiency of In Situ Tryp... Nathan Marshall
This document summarizes a study investigating the use of alternative surfactants to improve the efficiency of in situ tryptic proteolysis of fingermarks. The study tested a range of non-ionic and anionic surfactants individually and in combination, including MEGA-8, OcGlu, OcThio, DDM, and RapiGest SF. Data demonstrated that higher percentages of MEGA-8 as well as combinations of detergents resulted in more peptide peaks detected from fingermarks. RapiGest SF, normally used for solution digestions, was also shown to improve in situ proteolysis. Similar results were observed on rat brain tissue sections. Overall, the study indicates that surfactants can enhance the efficiency
This document describes computational techniques used to design novel competitive inhibitors of the E. coli 5'-methylthioadenosine/S-adenosylhomocysteine nucleosidase (MTN) enzyme. It utilized core hopping to generate 10,000 structures by varying the core while keeping functional groups constant. Docking and binding energies were calculated for subsets of compounds down to the top 8 ligands. Results show several compounds have more favorable predicted binding than the control TDI inhibitor, warranting further optimization and testing of lead compounds.
Can a combination of constrained-based and kinetic modeling bridge time scale... Natal van Riel
This document summarizes Natal van Riel's presentation on using parameter transition analysis (PTA) to model progressive metabolic adaptations associated with diseases. PTA involves nesting simulations with time-dependent parameters within parameter estimation to identify parameter trajectories that connect phenotype snapshots over time. This allows linking changes in the metabolome to the proteome and transcriptome. As an example, a model of liver lipoprotein metabolism was used to analyze the effects of activating the liver X receptor and predict reductions in SR-B1 expression based on metabolic adaptations. The approach provides insights into disease mechanisms and ways to prevent side effects of therapies.
The document provides an overview of sequence alignment concepts including:
- Definitions of terms like identity, homology, orthologous, and paralogous genes
- Examples and explanations of scoring matrices used for nucleotide and protein sequence alignments like BLOSUM and PAM matrices
- An example multiple sequence alignment of glyceraldehyde-3-phosphate dehydrogenases from different species
- Descriptions of how scoring matrices are used to quantify sequence similarity and their importance in sequence analysis
Differential metabolic activity and discovery of therapeutic targets using su... Joaquin Dopazo
The document discusses obtaining estimations of cell metabolic activities from gene expression data using summarized metabolic pathway models. It describes normalizing gene expression data and using an algorithm to estimate metabolic module activities for two conditions, identifying differentially active modules using statistical tests. Metabolic module activities are correlated with protein activation status and metabolite abundance changes, associated with cancer types and treatment responses, and predictive of cell survival. A web server called Metabolizer is introduced that performs various metabolic pathway analyses on gene expression data.
The document discusses telehealth assessments for autism diagnoses and the development of new observation-based instruments. Key points:
1. The BOSCC is introduced as a new brief observation-based instrument for measuring social communication behaviors in autism. Initial studies found it was valid, reliable, and sensitive to treatment effects.
2. Telehealth assessments like the BOSA are being developed to allow for remote administration of autism evaluations, as in-person assessments are not always possible due to COVID-19 restrictions. Preliminary studies found the BOSA performed well at detecting autism symptoms.
3. Emory University discusses their use of telehealth for autism assessments. Their NODA platform previously
Microbial community analysis in anaerobic palm oil mill effluent (pome) waste... eSAT Journals
Microorganisms play a key role in wastewater bio-treatment processes, and understanding the microbial community structure is of great importance for improving treatment performance. Denaturing gradient gel electrophoresis (DGGE) was used to monitor succession of the microbial community, and the predominant bands were sequenced to reveal the microbial community composition inside palm oil mill effluent (POME) wastewater. DNA bands from DGGE gels were excised with a sterile blade and placed in a 1.5 ml Eppendorf tube containing 50 μl deionized water (ddH2O). Tubes were incubated overnight at 4 °C to elute the DNA. Eluted DNA was purified using QIAquick gel extraction kits (QIAGEN, Inc., Valencia, CA) and was frozen and thawed three times. Microbial DNA successfully excised and purified from DGGE was amplified using polymerase chain reaction (PCR). Five microliters of the supernatant were used as a template to re-amplify the DNA using 16S rDNA primers, 341f (with no GC-clamp) (5'-cct-acg-gga-ggc-agc-ag-3') and the reverse primer 907r (5'-ccc-cgt-caa-ttc-att-tga-gtt-t-3'). Amplification was repeated following the steps in ‘PCR amplification of 16s rDNA’. PCR products were cut from agarose gels and purified using the QIAquick Gel Extraction Kit (QIAGEN, Inc., Valencia, CA), following steps similar to the purification after recovery of DNA from DGGE, and were sequenced in both directions with the same primers (with no GC-clamp) as used in PCR. Moreover, start-up is an important step in establishing a proper community structure in all kinds of biological treatment processes. In anaerobic POME wastewater, 6 sequences of Firmicutes, 5 sequences of Proteobacteria and 2 sequences of Bacteroidetes were found in the DGGE results. A sequence closely related to Rummeliibacillus suwonensis strain G20 was detected at bands BE10, BE11, BE12, BE15, BE16, BE17 and BE18. Meanwhile, bands BE25 and BE26 were found after the treatment process was complete. Rummeliibacillus suwonensis is an aerobic, Gram-positive, rod-shaped, round-spore-forming bacterium that was isolated under aerobic conditions.
Keywords: Palm oil mill effluent (POME), anaerobic POME, denaturing gradient gel electrophoresis (DGGE), microbial community.
This document provides an overview of databases, definitions, scoring matrices, and pairwise sequence alignment. It discusses major bioinformatics databases like NCBI, ExPASy, and EBI. It also defines key terms like identity, homology, orthologous, and paralogous sequences. Additionally, it examines the theoretical and empirical bases for scoring matrices like PAM, BLOSUM, and transition/transversion matrices, and how they are used in sequence alignment.
Aurelia Bioscience is a biology CRO that offers various assay development and screening services including:
- NanoBRET assays to study protein-protein interactions and target engagement of kinases
- High throughput screening of over 50,000 compounds using their NanoBRET assays which identified hit compounds
- Target engagement assays using NanoBRET technology to study binding kinetics and residency times of compounds in living cells
- PROTAC drug discovery services using their degradation technology
- High throughput western blotting using the Protein Simple WES system
- Future plans to develop 3D cell-based assays using electrospun scaffold materials.
Fast-SL: An efficient algorithm to identify synthetic lethals in metabolic ne... Karthik Raman
Slide deck on Fast-SL, an efficient algorithm to identify synthetic lethals. Presented at the annual NNMCB meeting at Pune, India on 27 Dec 2015. Original paper reference: http://bioinformatics.oxfordjournals.org/content/31/20/3299
This document provides an overview of sequence alignment and scoring matrices. It defines key terms like identity, homology, orthologous, and paralogous genes. It discusses different types of scoring matrices including unitary matrices that score matches as 1 and mismatches as 0, and transition/transversion matrices that account for the higher likelihood of transitional mutations in nucleic acids. The document emphasizes that scoring matrices represent underlying evolutionary models and influence sequence analysis outcomes.
This document describes a study that evaluated the performance of quantitative spectral analysis tools used in metabolic profiling when applied to mixtures of biofluid samples. Three urine samples were mixed in known proportions according to an experimental design and analyzed by 1H NMR spectroscopy. Fifty-four metabolites were then quantified from the spectra using two common methods: targeted spectral fitting and targeted spectral integration. Multivariate analysis showed the mixture design was accurately recapitulated from the spectral data. A metric was calculated to assess the reliability of each metabolite measurement across the varying sample compositions. Several metabolites were found to have low reliability, largely due to spectral overlap or low signal-to-noise ratios. This strategy allows evaluation of spectral features in conditions that better represent real biological samples and
The document provides an overview of metabolomics and describes key aspects of the metabolomics workflow and analytical techniques. It defines metabolomics as the study of metabolites in biological systems and discusses areas of application including biomarker discovery. Metabolite profiling using techniques like NMR, GC-MS and LC-MS is described. The document uses Barth syndrome, a mitochondrial disorder, as a case study and discusses how a cardiolipin deficiency can be detected using metabolomics. It outlines the data processing steps in metabolomics including peak detection, matching, retention time correction, and compound identification.
This document provides an overview of sequence alignment and scoring matrices. It defines key terms like identity, homology, orthologous, and paralogous genes. It discusses different types of scoring matrices, including unitary matrices that score matches as 1 and mismatches as 0, and transition/transversion matrices that account for the different likelihood of transition vs. transversion mutations in DNA. It explains that scoring matrices represent implicit models of evolution and influence sequence analysis outcomes. The document emphasizes that results depend critically on the chosen scoring matrix and model.
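The two matrix types named in this summary can be illustrated with a minimal Python sketch. The unitary scores (match 1, mismatch 0) come from the summary itself; the transition and transversion penalties (-1 and -2), the function names, and the two example sequences are illustrative assumptions, not values taken from the summarized slides.

```python
# Minimal sketch: scoring an ungapped nucleotide alignment with two matrices.
# Penalty values for transitions/transversions are assumed for illustration.

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def unitary_score(a: str, b: str) -> int:
    """Unitary matrix: a match scores 1, any mismatch scores 0."""
    return 1 if a == b else 0


def ts_tv_score(a: str, b: str, match: int = 1,
                transition: int = -1, transversion: int = -2) -> int:
    """Transition/transversion matrix (assumed weights).

    A transition swaps two purines (A<->G) or two pyrimidines (C<->T);
    a transversion swaps a purine for a pyrimidine or vice versa.
    """
    if a == b:
        return match
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return transition if same_class else transversion


def score_alignment(seq1: str, seq2: str, score_fn) -> int:
    """Sum per-column scores of two equal-length, ungapped sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    return sum(score_fn(a, b) for a, b in zip(seq1.upper(), seq2.upper()))


# Hypothetical example sequences: 3 matches, 2 transitions, 1 transversion.
seq_a = "ACGTAC"
seq_b = "ATGTGA"
unitary_total = score_alignment(seq_a, seq_b, unitary_score)
ts_tv_total = score_alignment(seq_a, seq_b, ts_tv_score)
print(unitary_total, ts_tv_total)
```

The same pair of sequences receives different totals under the two matrices, which is the point the summary makes: the outcome of a sequence comparison depends on the evolutionary model the scoring matrix encodes.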
Molecular modelling for in silico drug discovery Lee Larcombe
This document provides an overview of molecular modelling techniques used for in silico drug discovery. It discusses using computational approaches to model small molecule and protein interactions to assess drug safety and efficacy. The key techniques covered include obtaining protein structures from databases like PDB, simulating molecular interactions through docking and screening, and considering factors like binding affinity, pharmacokinetics and toxicity during the drug design process. Computational protein structure prediction is also discussed as an important technique when experimental structures are unavailable.
The document discusses using bioinformatics tools to analyze how genomic G-C content affects protein sequences. Specifically, it analyzes carbonic anhydrase protein sequences from bacteria with varying G-C levels. Key findings include that high G-C bacteria often have two carbonic anhydrase genes and utilize synonymous amino acids encoded by G-C-rich codons more frequently. Analysis of codon usage in the genes supports this observation.
IDW2022: A decades experiences in transparent and interactive publication of ... GigaScience, BGI Hong Kong
Scott Edmunds at International Data Week 2022: A decades experiences in transparent and interactive publication of FAIR data and software via an end-to-end XML publishing platform. 21st June 2022
GigaByte Chief Editor Scott Edmunds presents on how to prepare a data paper for the TDR and WHO sponsored call for data papers describing datasets on vectors of human diseases, launched in Nov 2021. Presented at the GBIF webinar on 25th January 2022 and aimed at authors interested in submitting a manuscript to the series.
More Related Content
Similar to Valerie de Anda at #ICG12: A new multi-genomic approach for the study of biogeochemical cycles at global scale: the molecular reconstruction of the sulfur cycle
Publication - Alternative Surfactants for Improved Efficiency of In Situ Tryp...Nathan Marshall
This document summarizes a study investigating the use of alternative surfactants to improve the efficiency of in situ tryptic proteolysis of fingermarks. The study tested a range of non-ionic and anionic surfactants individually and in combination, including MEGA-8, OcGlu, OcThio, DDM, and RapiGest SF. Data demonstrated that higher percentages of MEGA-8 as well as combinations of detergents resulted in more peptide peaks detected from fingermarks. RapiGest SF, normally used for solution digestions, was also shown to improve in situ proteolysis. Similar results were observed on rat brain tissue sections. Overall, the study indicates that surfactants can enhance the efficiency
This document describes computational techniques used to design novel competitive inhibitors of the E. coli 5'-methylthioadenosine/S-adenosylhomocysteine nucleosidase (MTN) enzyme. It utilized core hopping to generate 10,000 structures by varying the core while keeping functional groups constant. Docking and binding energies were calculated for subsets of compounds down to the top 8 ligands. Results show several compounds have more favorable predicted binding than the control TDI inhibitor, warranting further optimization and testing of lead compounds.
Can a combination of constrained-based and kinetic modeling bridge time scale...Natal van Riel
This document summarizes Natal van Riel's presentation on using parameter transition analysis (PTA) to model progressive metabolic adaptations associated with diseases. PTA involves nesting simulations with time-dependent parameters within parameter estimation to identify parameter trajectories that connect phenotype snapshots over time. This allows linking changes in the metabolome to the proteome and transcriptome. As an example, a model of liver lipoprotein metabolism was used to analyze the effects of activating the liver X receptor and predict reductions in SR-B1 expression based on metabolic adaptations. The approach provides insights into disease mechanisms and ways to prevent side effects of therapies.
The document provides an overview of sequence alignment concepts including:
- Definitions of terms like identity, homology, orthologous, and paralogous genes
- Examples and explanations of scoring matrices used for nucleotide and protein sequence alignments like BLOSUM and PAM matrices
- An example multiple sequence alignment of glyceraldehyde-3-phosphate dehydrogenases from different species
- Descriptions of how scoring matrices are used to quantify sequence similarity and their importance in sequence analysis
Differential metabolic activity and discovery of therapeutic targets using su...Joaquin Dopazo
The document discusses obtaining estimations of cell metabolic activities from gene expression data using summarized metabolic pathway models. It describes normalizing gene expression data and using an algorithm to estimate metabolic module activities for two conditions, identifying differentially active modules using statistical tests. Metabolic module activities are correlated with protein activation status and metabolite abundance changes, associated with cancer types and treatment responses, and predictive of cell survival. A web server called Metabolizer is introduced that performs various metabolic pathway analyses on gene expression data.
The document discusses telehealth assessments for autism diagnoses and the development of new observation-based instruments. Key points:
1. The BOSCC is introduced as a new brief observation-based instrument for measuring social communication behaviors in autism. Initial studies found it was valid, reliable, and sensitive to treatment effects.
2. Telehealth assessments like the BOSA are being developed to allow for remote administration of autism evaluations, as in-person assessments are not always possible due to COVID-19 restrictions. Preliminary studies found the BOSA performed well at detecting autism symptoms.
3. Emory University discusses their use of telehealth for autism assessments. Their NODA platform previously
Microbial community analysis in anaerobic palm oil mill effluent (pome) waste...eSAT Journals
Microorganisms play a key role in wastewater bio-treatment processes and understanding the microbial community structure is of great importance to improve treatment performance. Denaturing gradient gel electrophoresis (DGGE) was used to monitor succession of the microbial community and thus predominant bands were sequenced to reveal the microbial community composition inside palm oil mill effluent (POME) wastewater.DNA bands from DGGE gels were excised with a sterile blade and placed in 1.5 ml eppendorf tube containing 50 μl deionized water (ddH2O). Tubes were incubated overnight at 4C to elute the DNA. Eluted DNA was purified using QIAquick gel extraction kits (QIAGEN, Inc., Valencia, CA) and was frozen and thawed three times.Microbial DNA successfully excised and purified from DGGE was amplified using polymerase chain reaction (PCR).Five micro liters of the supernatant were used as a template to re-amplify the DNA using 16s rDNA primers,341f (with no GC-clamp) (5'- cct-acg-gga-ggc-agc-ag-3') and reverse(r) primers 907r (5'-ccc-cgt-caa-ttc-att-tga-gtt-t-3'). Amplification was repeated referring to the steps in ‘PCR amplification of 16s rDNA’. PCR products from agarose gels were cut and purified using QIAquick Gel Extraction Kit (QIAGEN, Inc., Valencia, CA), which were similar to the purification steps after recovery of DNA from DGGE, and sequenced in both directions with the same primers (with no GC-clamp) as used in PCR. Moreover, start-up is an important step in establishing proper community structure in all kinds of biological treatment processes. In anaerobic POME wastewater, 6 sequences of Firmicutes, 5 sequence of Proteobacterium and 2 sequences of Bacteroidetes were found through denaturing gradient gel electrophoresis (DGGE) results. Sequence closely related to Rummeliibacillus suwonensis strain G20 was detected grows at band BE10, BE11, BE12, BE15, BE16, BE17 and BE18. 
Meanwhile BE25 and BE26 were found at band after treatment process done.Rummeliibacillus suwonensis is an aerobic, Gram-positive, rod shaped, round-spore-forming bacteria which were isolated from aerobic condition. Keywords: Palm oil mill effluent (POME), anaerobic POME, Denaturing gradient gel electrophoresis (DGGE), and microbial community.
This document provides an overview of databases, definitions, scoring matrices, and pairwise sequence alignment. It discusses major bioinformatics databases like NCBI, ExPASy, and EBI. It also defines key terms like identity, homology, orthologous, and paralogous sequences. Additionally, it examines the theoretical and empirical bases for scoring matrices like PAM, BLOSUM, and transition/transversion matrices, and how they are used in sequence alignment.
Aurelia Bioscience is a biology CRO that offers various assay development and screening services including:
- NanoBRET assays to study protein-protein interactions and target engagement of kinases
- High throughput screening of over 50,000 compounds using their NanoBRET assays which identified hit compounds
- Target engagement assays using NanoBRET technology to study binding kinetics and residency times of compounds in living cells
- PROTAC drug discovery services using their degradation technology
- High throughput western blotting using the Protein Simple WES system
- Future plans to develop 3D cell-based assays using electrospun scaffold materials.
Fast-SL: An efficient algorithm to identify synthetic lethals in metabolic ne...Karthik Raman
Slide deck on Fast-SL, an efficient algorithm to identify synthetic lethals. Presented at the annual NNMCB meeting at Pune, India on 27 Dec 2015. Original paper reference: http://bioinformatics.oxfordjournals.org/content/31/20/3299
This document provides an overview of sequence alignment and scoring matrices. It defines key terms like identity, homology, orthologous, and paralogous genes. It discusses different types of scoring matrices including unitary matrices that score matches as 1 and mismatches as 0, and transition/transversion matrices that account for the higher likelihood of transitional mutations in nucleic acids. The document emphasizes that scoring matrices represent underlying evolutionary models and influence sequence analysis outcomes.
Valerie de Anda at #ICG12: A new multi-genomic approach for the study of biogeochemical cycles at global scale: the molecular reconstruction of the sulfur cycle
1. Valerie De Anda
Ecology Institute UNAM México
Laboratory of Computational Biology Zaragoza
CSIC Spain
valdeanda@ciencias.unam.mx
https://github.com/valdeanda
@val_deanda
The 12th International Conference on Genomics
October 26 to 29, 2017
Shenzhen, China
2. Revolution in the microbial ecology field » Genomic reconstruction: microbial dark matter » Large amount of data
»» Diversity, ecology, evolution and functional makeup of the microbial world
The iceberg illusion of metagenomics
» The ability to evaluate complex metabolic functions in large data sets remains biologically and computationally challenging
» Really complex to infer and test biological hypotheses in such data
MOTIVATION GENERAL IDEA RESULTS CONCLUSIONS PERSPECTIVES THANKS
The 12th International Conference on Genomics, October 2017, Shenzhen, China. Valerie de Anda. 2/22
MEBS
3. The iceberg illusion of metagenomics
Microbial ecology-derived ‘omic’ studies
Metagenomic data: » Most abundant » Marker genes » Statistically different (≠) features
What do we need to improve the efficiency of data processing?
Biological data interpretation (evaluate, compare and analyze complex data on a large scale)
Computational efficiency (high performance, accuracy, high-speed data processing, reproducibility)
Gomez Cabrero et al 2014 BMC SB
Reshetova et al 2013 BMC SB
4. Data integration
For a given system, multiple sources (and possibly types) of data are available, and we want to study them integratively to improve knowledge discovery.
What are the available data that can be used to characterize large-scale metabolic machineries?
How do we integrate them all to improve our understanding of the system?
Prior knowledge: to reduce the solution space and/or to focus the analysis on biologically meaningful regions (specific metabolic machineries)
(Targeted) metabolism
Taxa involved in that particular metabolism
Proteins involved in that particular metabolism
Publicly available genomes?
Mathematical model: relative entropy
$$H' = \sum_{i} P(i)\,\log_2 \frac{P(i)}{Q(i)}$$
Informative Score (MEBS): ≥1 informative; ≤0 non-informative
Gomez Cabrero et al 2014 BMC SB
Reshetova et al 2013 BMC SB
5. Data integration
What are the available data that can be used to characterize large-scale metabolic machineries?
How do we integrate them all to improve our understanding of the system?
Prior knowledge: to reduce the solution space and/or to focus the analysis on biologically meaningful regions (specific metabolic machineries)
(Targeted) metabolism
Taxa involved in that particular metabolism
Proteins involved in that particular metabolism
Large-scale dataset
Mathematical model: relative entropy
$$H' = \sum_{i} P(i)\,\log_2 \frac{P(i)}{Q(i)}$$
Informative Score (MEBS): ≥1 informative; ≤0 non-informative
Single value
Does it really work?
Can it capture an entire metabolic machinery?
Can we use it to evaluate, compare and analyze complex data on a large scale (genomes, metagenomes)?
Is it computationally efficient? Accurate, fast on large datasets, and reproducible
6. Data integration: case study
An entire biogeochemical cycle: the S-cycle (CHONS-P)
Atmosphere; Solar E°; Redox reactions; Metabolic guilds; Geological processes
What are the available data that can be used to characterize large-scale metabolic machineries?
How do we integrate them all to improve our understanding of the system?
Taxa involved in that particular metabolism
Proteins involved in that particular metabolism
Large-scale datasets
Mathematical model: relative entropy
$$H' = \sum_{i} P(i)\,\log_2 \frac{P(i)}{Q(i)}$$
Informative Score (MEBS): ≥1 informative; ≤0 non-informative
They really capture the major processes involved in the mobilization and use of S-compounds through Earth's biosphere
7. Data integration: case study, S-cycle
https://metacyc.org/META/NEW-IMAGE?object=Sulfur-Metabolism
http://www.genome.jp/kegg-bin/show_pathway?map00920
Manually curated reconstruction of the S-metabolic machinery
8. Data integration: case study, S-cycle
Taxa: metabolic guilds (Suli, N=161)
i) CLSB: 24 genera
ii) PSB: 25 genera
iii) GSB: 9 genera
iv) SRB: 40 genera
v) SRM: 19 genera
vi) SO: 4 genera
Evidence suggesting their physiological and biochemical involvement in the use of sulfur compounds.
Complete nr sequenced S-genomes (txt):
GCF_000006985.1 Chlorobium tepidum TLS
GCF_000007005.1 Sulfolobus solfataricus P2
GCF_000007305.1 Pyrococcus furiosus DSM 3638
GCF_000008545.1 Thermotoga maritima MSB8
GCF_000008625.1 Aquifex aeolicus VF5
GCF_000008665.1 Archaeoglobus fulgidus DSM 4304
GCF_000009965.1 Thermococcus kodakarensis KOD1
Metabolic machinery (Sucy, N=152)
i) Sulfur compounds
ii) Metabolic pathways
iii) Genes
iv) Proteins
>Protein1
MIKPVGSDELKPLFVYDPEEHHKLSHEAESLPSVVISSQGPRVSSM
MGAGYFSPAGFMNV
>Protein 2
MAYKTIIEDGIDVLVVGAGLGGTGAAFEARYWGQDKKIVIAEKANID
>Protein 3
MPTFVYMTRCDGCGQCVDICPSDIMHIDTTIRRAYNIEPNMCWEC
YSCVKACPHNAIDVR
Evidence linking them with the S-cycle (curated DB and primarily literature)
9. Data integration: case study, S-cycle
Metabolic machinery (Sucy, N=152)
i) Sulfur compounds
ii) Metabolic pathways
iii) Genes
iv) Proteins
>Protein1
MIKPVGSDELKPLFVYDPEEHHKLSHEAESLPSVVISSQGPRVSSM
MGAGYFSPAGFMNV
>Protein 2
MAYKTIIEDGIDVLVVGAGLGGTGAAFEARYWGQDKKIVIAEKANID
>Protein 3
MPTFVYMTRCDGCGQCVDICPSDIMHIDTTIRRAYNIEPNMCWEC
YSCVKACPHNAIDVR
Evidence linking them with the S-cycle (curated DB and primarily literature)
10. Data integration: case study, S-cycle
Table 1. Metabolic pathways of the global biogeochemical S-cycle
| Pathway number | Metabolism^a | Chemical process^b | Sulfur compound | Type^c | Chemical formula | Source^d | Number of Pfam domains^e |
|---|---|---|---|---|---|---|---|
| P1 | DS | O | Sulfite | I | SO3^2- | E | 9 |
| P2 | DS | O | Thiosulfate | I | S2O3^2- | E | 10 |
| P3 | DS | O | Tetrathionate | I | S4O6^2- | E | 2 |
| P4 | DS | R | Tetrathionate | I | S4O6^2- | E | 17 |
| P5 | DS | R | Sulfate | I | SO4^2- | E | 20 |
| P6 | DS | R | Elemental sulfur | I | S^0 | E | 20 |
| P7 | DS | D | Thiosulfate | I | S2O3^2- | E | 9 |
| P8 | DS | O | Carbon disulfide | O | CS2 | E | 1 |
| P9 | A | DE | Alkanesulfonate | O | CH3O3SR | S | 5 |
| P10 | A | R | Sulfate | I | SO4^2- | S | 20 |
| P11 | DS | O | Sulfide | I | H2S | E/S | 29 |
| P12 | A | DE | L-cysteate | O | C3H6NO5S | C/E | 1 |
| P13 | A | DE | Dimethyl sulfone | O | C2H6O2S | C/E | 3 |
| P14 | A | DE | Sulfoacetate | O | C2H2O5S | C/E | 2 |
| P15 | A | DE | Sulfolactate | O | C3H4O6S | C/S | 14 |
| P16 | A | DE | Dimethyl sulfide | O | C2H6S | C/S | 16 |
| P17 | A | DE | Dimethylsulfoniopropionate | O | C5H10O2S | C/S/E | 12 |
| P18 | A | DE | Methylthiopropanoate | O | C4H7O2S | C/S | 7 |
| P19 | A | DE | Sulfoacetaldehyde | O | C2H3O4S | C/S | 7 |
| P20 | DS | O | Elemental sulfur | I | S^0 | C/S/E | 7 |
| P21 | DS | D | Elemental sulfur | I | S^0 | C/S/E | 1 |
| P22 | A | DE | Methanesulfonate | O | CH3O3S | C/S/E | 7 |
| P23 | A | DE | Taurine | O | C2H7NO3S | C/S/E | 11 |
| P24 | DS | M | Dimethyl sulfide | O | C2H6S | C | 1 |
| P25 | DS | M | Methylthiopropanoate | O | C4H7O2S | C | 1 |
| P26 | DS | M | Methanethiol | O | CH4S | C | 1 |
| P27 | A | DE | Homotaurine | O | C3H9NO3S | N | 1 |
| P28 | A | B | Sulfolipid | O | SQDG | | 4 |
| P29 | Markers | | Markers | | | | 12 |
Metabolic machinery (Sucy, N=152)
i) Sulfur compounds
ii) Metabolic pathways
iii) Genes
iv) Proteins
>Protein1
MIKPVGSDELKPLFVYDPEEHHKLSHEAESLPSVVISSQGPRVSSM
MGAGYFSPAGFMNV
>Protein 2
MAYKTIIEDGIDVLVVGAGLGGTGAAFEARYWGQDKKIVIAEKANID
>Protein 3
MPTFVYMTRCDGCGQCVDICPSDIMHIDTTIRRAYNIEPNMCWEC
YSCVKACPHNAIDVR
Evidence linking them with the S-cycle (curated DB and primarily literature)
11. Data integration: case study, S-cycle
Metabolic machinery (Sucy, N=152)
i) Sulfur compounds
ii) Metabolic pathways
iii) Genes
iv) Proteins
>Protein1
MIKPVGSDELKPLFVYDPEEHHKLSHEAESLPSVVISSQGPRVSSM
MGAGYFSPAGFMNV
>Protein 2
MAYKTIIEDGIDVLVVGAGLGGTGAAFEARYWGQDKKIVIAEKANID
>Protein 3
MPTFVYMTRCDGCGQCVDICPSDIMHIDTTIRRAYNIEPNMCWEC
YSCVKACPHNAIDVR
Evidence linking them with the S-cycle (curated DB and primarily literature)
12. Large omic datasets
What are the available data that can be used to characterize large-scale metabolic pathways?
How do we integrate them all to improve our understanding of the system?
Taxa involved in that particular metabolism
Proteins involved in that particular metabolism
Publicly available and manually curated data
Mathematical model: relative entropy
$$H' = \sum_{i} P(i)\,\log_2 \frac{P(i)}{Q(i)}$$
Informative Score (MEBS): ≥1 informative; ≤0 non-informative
How many genomes were available at the time of analysis?
Number of complete prokaryotic genomes: ≈4,000 (NCBI RefSeq), Dec 2016
Non-redundant: 2,107, Dec 2016
Gen: 2,107 nr genomes (faa), 1.5 GB
13. Large omic datasets
What are the available data that can be used to characterize large-scale metabolic machineries?
How do we integrate them all to improve our understanding of the system?
Taxa: Suli. Proteins: Sucy
Mathematical model: relative entropy
$$H' = \sum_{i} P(i)\,\log_2 \frac{P(i)}{Q(i)}$$
Informative Score (MEBS): ≥1 informative; ≤0 non-informative
Datasets: Gen (2,107 nr genomes, faa), Met, GenF
Dataset sizes shown: 104 GB, ≈ 500 GB, 1.5 GB
How many metagenomes were available at the time of analysis?
Metagenomes were kept if they:
i) were publicly available
ii) contained associated metadata
iii) had been isolated from well-defined environments (e.g., rivers, soil, biofilms)
iv) were not host-associated microbiome sequences (e.g., human, cow, chicken)
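The four selection criteria for metagenomes can be sketched as a simple filter. This is only an illustration, not the authors' code: the `MetagenomeRecord` class, its field names, and the sample records are all hypothetical, and the host list just mirrors the examples named on the slide.

```python
from dataclasses import dataclass, field

# Hosts named on the slide as examples of host-associated microbiomes (assumed list).
HOST_ASSOCIATED = {"human", "cow", "chicken"}


@dataclass
class MetagenomeRecord:
    """Hypothetical record describing one metagenome."""
    identifier: str
    is_public: bool
    environment: str = ""                      # e.g. "soil", "river"; empty if undefined
    metadata: dict = field(default_factory=dict)


def passes_selection(record: MetagenomeRecord) -> bool:
    """Apply criteria i) to iv): public, has metadata, defined non-host environment."""
    env = record.environment.strip().lower()
    return (
        record.is_public                       # i) publicly available
        and bool(record.metadata)              # ii) has associated metadata
        and env != ""                          # iii) well-defined environment
        and env not in HOST_ASSOCIATED         # iv) not host-associated
    )


# Made-up records, for illustration only.
records = [
    MetagenomeRecord("mg-001", True, "soil", {"lat": 19.4}),
    MetagenomeRecord("mg-002", True, "human", {"body_site": "gut"}),
    MetagenomeRecord("mg-003", False, "river", {"lat": 22.5}),
    MetagenomeRecord("mg-004", True, "biofilm", {}),
]
selected = [r.identifier for r in records if passes_selection(r)]
```

Only `mg-001` meets all four criteria in this toy input; the other records each fail exactly one of them.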
14. MEBS: general overview
Stage 1: Manual curation and omic datasets
Taxa: Suli (txt)
GCF_000006985.1 Chlorobium tepidum TLS
GCF_000007005.1 Sulfolobus solfataricus P2
GCF_000007305.1 Pyrococcus furiosus DSM 3638
GCF_000008545.1 Thermotoga maritima MSB8
GCF_000008625.1 Aquifex aeolicus VF5
GCF_000008665.1 Archaeoglobus fulgidus DSM 4304
GCF_000009965.1 Thermococcus kodakarensis KOD1
Proteins: Sucy
>Protein1
MIKPVGSDELKPLFVYDPEEHHKLSHEAESLPSVVISSQGPRVSSM
MGAGYFSPAGFMNV
>Protein 2
MAYKTIIEDGIDVLVVGAGLGGTGAAFEARYWGQDKKIVIAEKANID
>Protein 3
MPTFVYMTRCDGCGQCVDICPSDIMHIDTTIRRAYNIEPNMCWEC
YSCVKACPHNAIDVR
2,107 nr genomes (faa): Gen, GenF
Stage 2: Domain composition
112 HMMs of S-proteins
Stage 3: Relative entropy
Domains enriched among the microorganisms of interest
Mathematical model:
$$H' = \sum_{i} P(i)\,\log_2 \frac{P(i)\ \text{(observed)}}{Q(i)\ \text{(expected)}}$$
P(i) = frequency of protein domain i in S genomes (161)
Q(i) = frequency of protein domain i in Gen (2,107)
Stage 4: Informative Score (single value)
≥1: informative; ≤0: non-informative
Can it capture the S-metabolic machinery?
Can we use it to evaluate, compare and analyze complex data on a large scale (genomes, metagenomes)?
Is it computationally efficient? Accurate, fast on large datasets, and reproducible
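The relative entropy in Stage 3 can be illustrated with a minimal Python sketch. It assumes P(i) and Q(i) are given as dictionaries of domain frequencies; the domain names and values below are invented for the example, and the code is not taken from the MEBS pipeline.

```python
import math


def relative_entropy_terms(p, q):
    """Return the per-domain terms P(i) * log2(P(i) / Q(i)).

    p: observed frequencies (e.g. in the genomes of interest)
    q: expected frequencies (e.g. in the background genome set)
    A domain with P(i) == 0 contributes 0 by convention.
    """
    terms = {}
    for domain, p_i in p.items():
        q_i = q.get(domain, 0.0)
        if p_i == 0.0:
            terms[domain] = 0.0
        elif q_i == 0.0:
            raise ValueError(f"Q is zero for domain {domain!r} but P is not")
        else:
            terms[domain] = p_i * math.log2(p_i / q_i)
    return terms


def relative_entropy(p, q):
    """Sum the per-domain terms over all domains i."""
    return sum(relative_entropy_terms(p, q).values())


# Hypothetical domain frequencies, for illustration only.
observed = {"PF_A": 0.5, "PF_B": 0.25, "PF_C": 0.25}
expected = {"PF_A": 0.25, "PF_B": 0.25, "PF_C": 0.5}

terms = relative_entropy_terms(observed, expected)
total = relative_entropy(observed, expected)
```

In this toy input the domain enriched in the observed set (`PF_A`) gets a positive term, the domain with equal frequencies (`PF_B`) gets zero, and the depleted domain (`PF_C`) gets a negative term, which matches the idea of domains being informative or non-informative.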
15. https://github.com/eead-csic-compbio/metagenome_Pfam_score
2,107 genomes 161 Suli +
935 metagenomes
16. Distribution of Sulfur Score (SS) in 2,107 nr genomes
Sur (N=192)
Metagenomic reconstructions of hard-to-culture taxa
» Candidatus Desulforudis audaxviator MP104C
» An unnamed endosymbiont of a scaly snail from a black smoker chimney
» The archaeon Geoglobus ahangari, sampled from a hydrothermal vent at 2,000 m depth
17. ROC curve
Positive instances: Suli (N=161). Negative instances: the remaining genomes in Gen (1,946).
• A ROC curve is a two-dimensional graph in which the TP rate is plotted on the Y axis and the FP rate on the X axis.
• It depicts the relative tradeoffs between benefits (true positives) and costs (false positives).
Perfect classification: the top-left corner (FP rate 0, TP rate 1)
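The ROC construction described on this slide can be sketched in plain Python: sort the instances by score, sweep the threshold from high to low, and record (FP rate, TP rate) pairs; the area under the curve then follows from the trapezoid rule. The scores and labels below are invented, and this is a generic sketch rather than the evaluation code used for MEBS.

```python
def roc_points(scores, labels):
    """Return ROC points as (FP rate, TP rate), sweeping the threshold downwards.

    labels: 1/True for positive instances, 0/False for negative instances.
    Tied scores are processed together so they yield a single point.
    """
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative instance")

    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical scores: three positive and two negative instances.
example_scores = [9.0, 7.5, 6.0, 3.0, 1.0]
example_labels = [1, 1, 0, 1, 0]
example_points = roc_points(example_scores, example_labels)
example_auc = auc(example_points)
```

A classifier that ranks every positive above every negative passes through the top-left corner and has an AUC of 1; the toy input above has one positive ranked below a negative, so its AUC is lower.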
18. Distribution of Sulfur Score (SS) in the metagenomic dataset (935 metagenomes)
Distribution of SS values observed in 935 metagenomes classified in terms of features (X-axis) and colored according to their particular habitats. Features are sorted according to their median SS values. Green lines indicate the lowest and largest 95th percentiles observed across MSL classes.
Geo-localized metagenomes sampled around the globe are colored according to their SS values.
19. Conclusions
[Figure labels: MEBS, BG cycling, S genes, S genomes, Informative, Non-informative, 9.5, Markers, Comp]
» We present MEBS, new open-source software to evaluate, quantify, compare, and predict the metabolic machinery of interest in large ‘omic’ datasets using a single value.
» To test the applicability of this approach, we evaluated one of the most complex biogeochemical cycles: the sulfur cycle.
» Using data integration and manual curation, we reconstructed the entire sulfur machinery: Suli and Sucy.
» We show that the mathematical framework of relative entropy can be used to capture complex metabolic machineries in large-scale omic samples.
» MEBS is a powerful and broadly applicable approach to predict and classify microorganisms closely involved in the sulfur cycle, even in hard-to-culture microbial lineages.
» Computationally efficient, accurate (AUC = 0.985) and reproducible.
» Not in the presentation: the entropy can be used to detect marker domains, and the completeness of the S-cycle pathways can be benchmarked at large scale.
20. MOTIVATION GENERAL IDEA RESULTS CONCLUSIONS PERSPECTIVES THANKS
T h e 1 2 t h I n t e r n a t i o n a l C o n f e r e n c e o n G e n o m i c s O c t o b e r 2 0 1 7 S h e n z h e n C h i n a V a l e r i e d e A n d a 2 0 / 2 2
mebs
BG CYGLING
9.5
C N O
SFe P
BIOREMEDIATION ANTIBIOTICS
EXTREME
ENVIRONMENTS
AGRICULTURE
?
Perspectives
• We are currently finishing the analyses to demonstrate the applicability of this approach to other biogeochemical cycles (C, N, O, Fe, P).
• We hope that the MEBS pipeline will facilitate the analysis of biogeochemical cycles or complex metabolic networks carried out by specific prokaryotic guilds, such as bioremediation processes (e.g., degradation of hydrocarbons, toxic aromatic compounds, heavy metals, etc.).
• We look forward to collaborating with and helping other researchers by integrating comprehensive databases that might be helpful to the scientific community.
• Furthermore, we are currently working to improve the algorithm by using only a list of sequenced genomes involved in the metabolism of interest, in order to reduce the manual curation effort.
• We are also considering using k-mers instead of peptide Hidden Markov Models to increase the speed of the pipeline.
• We anticipate that our platform will stimulate interest and involvement among the scientific community to explore uncultured genomes derived from large metagenomic sequences: exploring microbial dark matter.
21. Thanks: Icoquih Zapata, Valeria Souza, Luis Eguiarte, Bruno Contreras, Cesar Poot-Hernandez
De Anda et al., 2017. MEBS, a software platform to evaluate large (meta)genomic collections according to their metabolic machinery: unraveling the sulfur cycle. GigaScience, in press.
22. Laboratory of Molecular and Experimental Evolution, Ecology Institute, UNAM, Mexico
Laboratory of Computational Biology
Thank you for your attention!
23. Supplementary files
24. Completeness (supplementary files)
[Figure, panels A and B: Gen (n=2,107) and Met (n=935). Labeled genomes: D. acidiphilus, Hydrogenobaculum, A. caldus, A. ferrivorans, T. mobilis, D. aromatica, Thauera sp., T. humireducens, A. denitrificans, S. tokodaii, A. hospitalis (among 12 other genomes), P. phaeoclathratiforme, C. chlorochromatii, C. tepidum, T. denitrificans, T. violascens, S. thiotaurini]
25. Completeness (supplementary files)
Table 1. Metabolic pathways of the global biogeochemical S-cycle
Pathway number | Metabolism^a | Chemical process^b | Sulfur compound | Type^c | Chemical formula | Source^d | Number of Pfam domains^e
P1 | DS | O | Sulfite | I | SO3^2- | E | 9
P2 | DS | O | Thiosulfate | I | S2O3^2- | E | 10
P3 | DS | O | Tetrathionate | I | S4O6^2- | E | 2
P4 | DS | R | Tetrathionate | I | S4O6^2- | E | 17
P5 | DS | R | Sulfate | I | SO4^2- | E | 20
P6 | DS | R | Elemental sulfur | I | S^0 | E | 20
P7 | DS | D | Thiosulfate | I | S2O3^2- | E | 9
P8 | DS | O | Carbon disulfide | O | CS2 | E | 1
P9 | A | DE | Alkanesulfonate | O | CH3O3SR | S | 5
P10 | A | R | Sulfate | I | SO4^2- | S | 20
P11 | DS | O | Sulfide | I | H2S | E/S | 29
P12 | A | DE | L-cysteate | O | C3H6NO5S | C/E | 1
P13 | A | DE | Dimethyl sulfone | O | C2H6O2S | C/E | 3
P14 | A | DE | Sulfoacetate | O | C2H2O5S | C/E | 2
P15 | A | DE | Sulfolactate | O | C3H4O6S | C/S | 14
P16 | A | DE | Dimethyl sulfide | O | C2H6S | C/S | 16
P17 | A | DE | Dimethylsulfoniopropionate | O | C5H10O2S | C/S/E | 12
P18 | A | DE | Methylthiopropanoate | O | C4H7O2S | C/S | 7
P19 | A | DE | Sulfoacetaldehyde | O | C2H3O4S | C/S | 7
P20 | DS | O | Elemental sulfur | I | S^0 | C/S/E | 7
P21 | DS | D | Elemental sulfur | I | S^0 | C/S/E | 1
P22 | A | DE | Methanesulfonate | O | CH3O3S | C/S/E | 7
P23 | A | DE | Taurine | O | C2H7NO3S | C/S/E | 11
P24 | DS | M | Dimethyl sulfide | O | C2H6S | C | 1
P25 | DS | M | Methylthio-propanoate | O | C4H7O2S | C | 1
P26 | DS | M | Methanethiol | O | CH4S | C | 1
P27 | A | DE | Homotaurine | O | C3H9NO3S | N | 1
P28 | A | B | Sulfolipid | O | SQDG | | 4
P29 | Markers | | Markers | | | | 12
Completeness: the protein domains present in any given sample are divided by the total number of domains in the pre-defined pathway.
26. Supplementary files
27. Large scale (supplementary files)
35 private metagenomes: microbial mats, sediment and lake water.
[Figure: reads, processing, ORF prediction, gene calling; mean size length (aa residues)]
Counts of prokaryotic genomes in each NCBI category as of July 2017 (non-redundant and redundant).
https://microbiome.wordpress.com/
28. Completeness (supplementary files)
GenF size category | 5th percentile | 95th percentile
Real | -0.091 | 0.101
30 | -0.086 | 0.105
60 | -0.09 | 0.104
100 | -0.088 | 0.1
150 | -0.09 | 0.103
200 | -0.89 | 0.105
250 | -0.09 | 0.106
300 | -0.09 | 0.1
29. Table 2. Informative Pfam domains with high H’ and low std: novel proposed molecular marker domains in metagenomic data of variable MSL
Pfam ID | Suli occurrences | H’ mean | H’ std | Description
PF12139 | 58/161 | 1.2 | 0.01 | Adenosine-5'-phosphosulfate reductase beta subunit: Key protein domain for both sulfur oxidation and reduction pathways. It has been widely studied in dissimilatory sulfate reduction. In all recognized sulfate-reducing prokaryotes, the dissimilatory process is mediated by three key enzymes: Sat, Apr and Dsr. Homologous proteins are also present in the anoxygenic photolithotrophic and chemolithotrophic sulfur-oxidizing bacteria (CLSB, PSB, GSB), in a different cluster organization [35].
PF00374 | 135/161 | 1.1 | 0.09 | Nickel-dependent hydrogenase: Hydrogenases with S-cluster and selenium-containing Cys-x-x-Cys motifs involved in the binding of nickel. Among the homologues of this hydrogenase domain is the alpha subunit of the sulfhydrogenase I complex of Pyrococcus furiosus, which catalyzes the reduction of polysulfide to hydrogen sulfide with NADPH as the electron donor [55].
PF01747 | 103/161 | 1.03 | 0.06 | ATP-sulfurylase: Key protein domain for both sulfur oxidation and reduction processes. The enzyme catalyzes the transfer of the adenylyl group from ATP to inorganic sulfate, producing adenosine 5′-phosphosulfate (APS) and pyrophosphate, or the reverse reaction [56].
PF02662 | 62/161 | 0.82 | 0.03 | Methyl-viologen-reducing hydrogenase, delta subunit: One of the enzymes involved in methanogenesis, encoded in the mth-flp-mvh-mrt cluster of methane genes in Methanothermobacter thermautotrophicus. No specific functions have been assigned to the delta subunit [48].
PF10418 | 122/161 | 0.78 | 0.06 | Iron-sulfur cluster binding domain of dihydroorotate dehydrogenase B: Among the homologous genes in this family are asrA and asrB from Salmonella enterica subsp. enterica serovar Typhimurium, which encode 1) a dissimilatory sulfite reductase, 2) a gamma subunit of the sulfhydrogenase I complex of Pyrococcus furiosus and 3) a gamma subunit of the sulfhydrogenase II complex of the same organism [12].
PF13247 | 149/161 | 0.66 | 0.06 | 4Fe-4S dicluster domain: Homologues of this family include 1) DsrO, a ferredoxin-like protein related to the electron transfer subunits of respiratory enzymes, 2) dimethylsulfide dehydrogenase β subunit (ddhB), involved in dimethyl sulfide degradation in Rhodovulum sulfidophilum, and 3) sulfur reductase FeS subunit (sreB) of Acidianus ambivalens, involved in sulfur reduction using H2 or organic substrates as electron donors [12].
PF04358 | 73/161 | 0.52 | 0 | DsrC-like protein: DsrC is present in all organisms encoding a dsrAB sulfite reductase (sulfate/sulfite reducers or sulfur oxidizers). Physiological studies suggest that sulfate reduction rates are determined by cellular levels of this protein. Dissimilatory sulfate reduction couples the four-electron reduction of the DsrC trisulfide to energy conservation [57]. DsrC was initially described as a subunit of DsrAB, forming a tight complex; however, it is not a subunit, but rather a protein with which DsrAB interacts. DsrC is involved in sulfur-transfer reactions; a disulfide bond between the two DsrC cysteines acts as a redox-active center in the sulfite reduction pathway. Moreover, DsrC is among the most highly expressed sulfur energy metabolism genes in isolated organisms and metatranscriptomes (Santos et al., 2015).
PF01058 | 158/161 | 0.45 | 0.01 | NADH ubiquinone oxidoreductase, 20 Kd subunit: Homologous genes are found in the delta subunits of both sulfhydrogenase complexes of Pyrococcus furiosus [12].
PF01568 | 156/161 | 0.4 | 0.05 | Molybdopterin dinucleotide binding domain: This domain corresponds to the C-terminal domain IV in dimethyl sulfoxide (DMSO) reductase [48].
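The criterion named in the title of Table 2 (high mean H’ together with low std) can be sketched as a simple filter. This is only an illustration, not the MEBS implementation: the function name `informative_domains` and the cut-offs `min_mean` and `max_std` are assumptions made for this example, and the rows are copied from Table 2.

```python
# Rows copied from Table 2: (Pfam ID, mean H', std of H')
TABLE_2 = [
    ("PF12139", 1.20, 0.01),
    ("PF00374", 1.10, 0.09),
    ("PF01747", 1.03, 0.06),
    ("PF02662", 0.82, 0.03),
    ("PF10418", 0.78, 0.06),
    ("PF13247", 0.66, 0.06),
    ("PF04358", 0.52, 0.00),
    ("PF01058", 0.45, 0.01),
    ("PF01568", 0.40, 0.05),
]


def informative_domains(rows, min_mean, max_std):
    """Return Pfam IDs whose mean H' is >= min_mean and whose std is <= max_std."""
    return [pfam for pfam, mean_h, std_h in rows
            if mean_h >= min_mean and std_h <= max_std]


# Hypothetical cut-offs, chosen only to illustrate the filter.
selected = informative_domains(TABLE_2, min_mean=0.5, max_std=0.05)
print(selected)
```

Stricter or looser cut-offs change which domains are kept; the slides do not state the thresholds that were actually used.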
31. Supplementary files
Species | SS | Genus | Guild
Ammonifex degensii KC4 | 12.508 | Moorella group | SRB/SR
Archaeoglobus profundus DSM 5631 | 12.024 | Archaeoglobus | SRB
Candidatus Desulforudis audaxviator MP104C | 11.972 | Candidatus Desulforudis | Sur
Pelodictyon phaeoclathratiforme BU-1 | 11.836 | Chlorobium/Pelodictyon group | GSB
Chlorobium phaeobacteroides BS1 | 11.649 | Chlorobium/Pelodictyon group | GSB
Chlorobium chlorochromatii CaD3 | 11.625 | Chlorobium/Pelodictyon group | GSB
Thiobacillus denitrificans ATCC 25259 | 11.61 | Thiobacillus | CLSB
Desulfohalobium retbaense DSM 5692 | 11.511 | Desulfohalobium | SRB
Desulfovibrio alaskensis G20 | 11.5 | Desulfovibrio | SRB
Desulfovibrio vulgaris DP4 | 11.442 | Desulfovibrio | SRB
Chlorobium tepidum TLS | 11.354 | Chlorobaculum | GSB
endosymbiont of unidentified scaly snail isolate Monju | 11.205 | 0 | Sur
Desulfovibrio vulgaris str. 'Miyazaki F' | 11.093 | Desulfovibrio | SRB
Desulfovibrio desulfuricans subsp. desulfuricans str. ATCC 27774 | 11.034 | Desulfovibrio | SRB
32-35. Supplementary files
36. Supplementary files
Sulfur: 112 H’; Nitrogen: 176 H’; Methane: 119 H’; Oxygen: 55 H’; Iron: 112 H’
37. Supplementary files
Biogeochemical cycle | Genes | Pfam domains | Genomes | AUC
Sulfur (S) | 152 | 112 | 161 | 0.9855
Nitrogen (N) | 267 | 176 | 144 | 0.791
Methane (C) | 135 | 119 | 90 | 0.988
Oxygenic Photosynthesis (O) | 50 | 55 | 53 | 0.983
Phosphorus (P) | | | |
Iron (Fe) | 36 | 33 | 34 | 0.863
38. Oxygen markers (supplementary files)
ID | Description | H’ mean | std
PF00067 | Cytochrome P450 | 0.644 | 0.033785
PF00115 | Cytochrome C and Quinol oxidase polypeptide I | 0.513 | 0.061551
PF01077 | Nitrite and sulphite reductase 4Fe-4S domain | 0.55825 | 0.049936
PF02560 | Cyanate lyase C-terminal domain | 0.93625 | 0.001389
PF03460 | Nitrite/Sulfite reductase ferredoxin-like half domain | 0.5525 | 0.040324
PF04898 | Glutamate synthase central domain | 0.479 | 0.034699
PF13442 | Cytochrome C oxidase, cbb3-type, subunit III | 0.6565 | 0.047093
python3 plot_entropy.py gen_genF_entropies.oxygen.tab -0.156 0.20625
39. Methane (supplementary files)
ID | Description | H’ mean | std
PF01913 | Formylmethanofuran-tetrahydromethanopterin formyltransferase | 3.629125 | 0.0227
PF01993 | methylene-5,6,7,8-tetrahydromethanopterin dehydrogenase | 2.876 | 0
PF02240 | Methyl-coenzyme M reductase gamma subunit | 3.168 | 0
PF02241 | Methyl-coenzyme M reductase beta subunit, C-terminal domain | 3.168 | 0
PF02289 | Cyclohydrolase (MCH) | 3.353 | 0
PF02741 | FTR, proximal lobe | 3.63475 | 0.034648
PF02745 | Methyl-coenzyme M reductase alpha subunit, N-terminal domain | 3.168 | 0
PF02783 | Methyl-coenzyme M reductase beta subunit, N-terminal domain | 3.168 | 0
PF04206 | Tetrahydromethanopterin S-methyltransferase, subunit E | 3.032 | 0
PF04207 | Tetrahydromethanopterin S-methyltransferase, subunit D | 3.032 | 0
PF04208 | Tetrahydromethanopterin S-methyltransferase, subunit A | 2.903375 | 0.015203
PF04211 | Tetrahydromethanopterin S-methyltransferase, subunit C | 3.02575 | 0.017678
PF05440 | Tetrahydromethanopterin S-methyltransferase subunit B | 2.980125 | 0.036537
python3 plot_entropy.py gen_genF_entropies.methane.tab -0.121 0.1475
40. Nitrogen (supplementary files)
ID | Description | H’ mean | std
PF00067 | Cytochrome P450 | 0.57375 | 0.0056
PF00174 | Oxidoreductase molybdopterin binding domain | 0.528125 | 0.006578
PF00355 | Rieske [2Fe-2S] domain | 0.507 | 0.032076
PF00507 | NADH-ubiquinone/plastoquinone oxidoreductase, chain 3 | 0.36975 | 0.010886
PF00547 | Urease, gamma subunit | 0.464 | 0
PF00699 | Urease beta subunit | 0.475125 | 0.001126
PF01077 | Nitrite and sulphite reductase 4Fe-4S domain | 0.47025 | 0.014568
PF02211 | Nitrile hydratase beta subunit | 0.405625 | 0.005041
PF02633 | Creatinine amidohydrolase | 0.58725 | 0.017466
PF03460 | Nitrite/Sulfite reductase ferredoxin-like half domain | 0.48 | 0.032715
PF05899 | Protein of unknown function (DUF861) | 0.52175 | 0.022914
PF09347 | Domain of unknown function (DUF1989) | 0.398875 | 0.007415
41. Iron (supplementary files)
ID | Description | H’ mean | std
PF14522 | Cytochrome c7 and related cytochrome c | 1.010 | 0.104
PF00355 | Rieske [2Fe-2S] domain | 0.51912 | 0.02854
PF00033 | Cytochrome b/b6/petB | 0.55875 | 0.04974
PF00034 | Cytochrome c | 0.5061 | 0.1013
42. ROC curve
Positive instances: Suli (N=161). Negative instances: Gen (1946).
• Two-dimensional graph in which the tp rate is plotted on the Y axis and the fp rate is plotted on the X axis.
• Depicts relative tradeoffs between benefits (true positives) and costs (false positives).
• Never issuing a positive classification: such a classifier commits no false positive errors but also gains no true positives.
• Positive classifications only with strong evidence: such classifiers make few false positive errors.
• Perfect classification.
• Random guessing produces the diagonal line between (0,0) and (1,1), which has an area of 0.5; no realistic classifier should have an AUC less than 0.5.
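The construction described on this slide can be sketched in a few lines of Python. This is an illustrative sketch, not code from MEBS: the function names `roc_points` and `trapezoid_area` and the toy scores and labels are assumptions made for the example.

```python
def roc_points(scores, labels):
    """Return ROC points (fp_rate, tp_rate), one per distinct score threshold.

    labels: 1 for positive instances, 0 for negative instances.
    A sample is classified as positive when its score is >= the threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Threshold above every score: no positive classification is ever issued.
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_area(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical toy data: six samples, three positives and three negatives.
scores = [9.1, 8.4, 7.7, 3.2, 2.5, 1.0]
labels = [1, 1, 0, 1, 0, 0]

points = roc_points(scores, labels)
auc = trapezoid_area(points)
print(points)
print(round(auc, 3))
```

The curve starts at (0,0), the "never positive" corner, and ends at (1,1), where every sample is called positive.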
43-45. Supplementary files
46. Supplementary files
[Figure: relative entropy H’; labeled domains: 4Fe-4S dicluster domain; Molybdopterin dinucleotide binding domain; Cytochrome C oxidase, cbb3-type, subunit III; Nitrogenase component 1 type Oxidoreductase]
Editor's Notes
Over the last 15 years, the enormous advances in HTS technologies have dramatically improved our understanding of life’s microbial diversity to an unprecedented level of detail.
Nowadays, accessing the total repertoire of genomes within complex communities by means of metagenomics is becoming a standard procedure to understand the diversity, ecology, evolution and functional makeup of the microbial world.
Furthermore, the accurate reconstruction of microbial genomes from metagenomic studies has been shown to be a powerful approach to gain insight into the metabolic strategies of the microbial dark matter or uncultivable microorganisms.
However, despite the huge amount of metagenomic and genomic sequences accumulated so far, our ability to evaluate complex metabolic functions in large-scale ‘omic’ datasets remains biologically and computationally challenging.
This is largely due to the challenges involved in testing meaningful biological hypotheses in such complex data, because only a small proportion of the metabolic information is eventually used to draw ecologically relevant conclusions. But why? This is what I like to call the iceberg illusion of metagenomics.
• Let's imagine the huge amount of data derived from metagenomic studies represented by this iceberg.
In general, we could say that most microbial ecology studies using metagenomics have focused mainly on developing broad descriptions of the metabolic pathways within a certain environment, analyzing the relative abundance of marker genes involved in several metabolic processes such as primary production, nitrogen fixation, etc. They have also focused on evaluating or discovering differentially abundant, shared or unique functional units (genes, proteins or metabolic pathways).
Since the present critical bottleneck in metagenomic analysis is the efficiency of data processing, only a small proportion of the data is used to test biological hypotheses. What do we need?
As metagenomic data analysis is both data- and computation-intensive, high-performance computing is needed, especially when (1) the dataset size is huge for a sample, (2) a project involves many metagenomic samples and (3) the analyses are complex and time-sensitive, but without losing sight of the biological interpretation.
Here, we took the concept of data integration (Gomez Cabrero) to try to solve this problem and to understand a given system. In this case, our system is microbial metabolism. But we cannot address all microbial metabolisms; instead, let's reduce our metabolic universe to targeted metabolic machineries. Currently there are several databases including manually curated information and large collections of sequenced genomes.
In order to address some of the limitations of these methods, we propose a novel approach to reduce the complexity of targeted metabolic pathways involved in several integral ecosystem processes (such as entire biogeochemical cycles) into a single informative score, called Multigenomic Entropy-Based Score (MEBS). This approach is based on the mathematical rationalization of the Kullback-Leibler divergence, also known as relative entropy H’ [28].
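To make the idea of an entropy-based score concrete, here is a minimal Python sketch. The notes only state that the approach builds on the Kullback-Leibler divergence and yields a single score; the per-domain form `p * log2(p / q)`, the summation used in `score_sample`, the function names and the background frequency below are all assumptions made for illustration, not the documented MEBS algorithm.

```python
import math


def relative_entropy_term(p, q):
    """One term of the Kullback-Leibler divergence: p * log2(p / q).

    p: frequency of a domain among genomes known to carry the machinery
       of interest (assumed interpretation).
    q: frequency of the same domain in the background dataset
       (assumed interpretation).
    """
    if p == 0:
        return 0.0  # usual convention: 0 * log(0) is taken as 0
    if q == 0:
        raise ValueError("background frequency q must be > 0 when p > 0")
    return p * math.log2(p / q)


def score_sample(domains_present, entropy_by_domain):
    """Sum the H' of every scored domain found in a sample (assumed aggregation)."""
    return sum(entropy_by_domain[d] for d in domains_present
               if d in entropy_by_domain)


# 58/161 is the Suli occurrence reported for PF12139 in Table 2;
# the background frequency 0.04 is a hypothetical value.
h_pf12139 = relative_entropy_term(58 / 161, 0.04)
print(round(h_pf12139, 3))

# Hypothetical per-domain entropies and a hypothetical sample.
entropies = {"PF12139": 1.2, "PF01747": 1.03, "PF04358": 0.52}
print(score_sample({"PF12139", "PF04358", "PF00067"}, entropies))
```

A domain that is frequent in the curated genomes but rare in the background gets a large positive term, while a domain that is equally common in both contributes zero.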
To test the applicability of this approach, we evaluated the metabolic machinery of the S-cycle. Due to its multiple redox states and its consequences on microbiological and geochemical transformations, S-metabolism can be regarded as a complex metabolic machinery, involving a myriad of genes, enzymes, organic substrates and electron carriers, which largely depend on the surrounding geochemical and ecological conditions. For these reasons, the complete repertory involved in the metabolic machinery of the S-cycle has remained underexplored despite the massive data produced in ‘omic’ experiments. Here, we performed an integral curation effort to describe all the elements involved in the S-cycle and then used them, as explained in the following sections, to score genomic and metagenomic datasets in terms of their sulfur relevance.
These elements come from geological sources derived from processes such as plate tectonics and atmospheric photochemical processes, which make possible the regeneration of the available forms of these elements so that they can be used by different metabolically related microbial populations (termed metabolic guilds), which profoundly affect the geochemical properties of the biosphere.
In summary, biogeochemical cycles are a complex interplay of biological, geological and chemical processes that operate on time scales from microseconds to eons and on spatial scales from micrometers up to systems that span the entire atmosphere and ocean.
To compile this database, we first gathered the most important S-compounds derived from biogeochemical processes and biologically catalyzed reactions. Then we classified each S-compound according to its chemical and thermodynamic nature (Gibbs free energy of formation, GFEF). Finally, we classified whether each compound can be used as a source of carbon, nitrogen, energy or electron donor, fermentative substrate, or terminal electron acceptor in respiratory microbial processes. The schematic representation of the manual curation effort summarizing the complexity of the sulfur biogeochemical cycle at a global scale is shown in Figure 2.
Proteins are generally composed of one or more functional regions, commonly termed domains. Different combinations of domains give rise to the diverse range of proteins found in nature. The identification of domains that occur within proteins can therefore provide insights into their function.
At present, one critical bottleneck in metagenomic analysis is the efficiency of data processing, because of the slow analysis speed. As metagenomic data analysis is both data- and computation-intensive, high-performance computing is needed, especially when (1) the dataset size is huge for a sample, (2) a project involves many metagenomic samples and (3) the analyses are complex and time-sensitive. Moreover, the increasing number of metagenomic projects usually requires the comparison of different samples. Yet current methods are limited by their low efficiency [7], [10], [11]. Thus, high-performance computational techniques are needed to speed up analysis without compromising accuracy.
However, due to the challenges involved in testing meaningful biological hypotheses with complex data, only a small proportion of the metabolic information derived from these datasets is eventually used to draw ecologically relevant conclusions.
The AUC of a classifier is equivalent to the probability that the classifier will rank a randomly chosen positive instance higher than a randomly chosen negative instance.
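This probabilistic reading of the AUC can be checked directly by comparing every positive instance with every negative instance. The sketch below is illustrative only: the function name `auc_by_ranking` and the toy scores are assumptions, and counting ties as half a win is the usual Mann-Whitney convention rather than something stated in the slides.

```python
def auc_by_ranking(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs in which the positive scores higher.

    Ties count as half a win (Mann-Whitney convention).
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical scores for three positive and three negative instances.
positives = [9.1, 8.4, 3.2]
negatives = [7.7, 2.5, 1.0]

auc_pairs = auc_by_ranking(positives, negatives)
print(round(auc_pairs, 3))
```

Eight of the nine pairs are ranked correctly here, so the value is 8/9; a classifier that separates the two groups perfectly would give 1.0.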
Dominance of bacterial diversification and underline the importance of organisms lacking isolated representatives, with substantial evolution concentrated in a major radiation of such organisms. This tree highlights major lineages currently underrepresented in biogeochemical models and identifies radiations that are probably important for future evolutionary analyses.
The presence-absence patterns of Pfam domains belonging to particular pathways can be exploited to compute metabolic completeness. This optional task is invoked with parameter –keggmap and a TAB-separated file mapping Pfam identifiers to KEGG Orthology entries (KO numbers) and the corresponding pathway in Sucy (see Table S3). To compute completeness, the total number of domains involved in a given pathway (i.e., sulfate reduction, sulfide oxidation) must be retrieved from the Sucy database (See Table S2). Then, the protein domains currently present in any given sample are divided by the total number of domains in the pre-defined pathway. The script produces: i) a detailed report of the metabolic pathways of interest; and ii) a list of KO numbers with Hex color codes, corresponding to KO matches in the omic sample, which can be exported to the KEGG Mapper – Search & Color Pathway tool [53] (see Figure S2).
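The completeness calculation described above (domains present in a sample divided by the total number of domains in the pre-defined pathway) can be sketched as follows. This is a minimal illustration, not the MEBS script: the function name `pathway_completeness`, the pathway label and the assignment of Pfam IDs to that pathway are hypothetical.

```python
def pathway_completeness(sample_domains, pathway_domains):
    """Fraction of a pathway's Pfam domains that are present in a sample."""
    pathway = set(pathway_domains)
    if not pathway:
        raise ValueError("pathway must contain at least one domain")
    present = pathway & set(sample_domains)
    return len(present) / len(pathway)


# Hypothetical pathway definition; the real mapping would come from Sucy.
pathways = {
    "example pathway": {"PF12139", "PF01747", "PF04358", "PF13247"},
}

# Hypothetical set of Pfam domains detected in one sample.
sample = {"PF12139", "PF01747", "PF00067"}

report = {name: pathway_completeness(sample, domains)
          for name, domains in pathways.items()}
print(report)
```

Domains found in the sample but absent from the pathway definition (PF00067 here) do not affect the result.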