Shotgun metagenomics involves collecting environmental samples, extracting DNA from the samples, sequencing the DNA using shotgun sequencing, and then analyzing the sequence data computationally. Key steps include assembling reads into longer contigs to aid analysis and annotation. While assembly works well for some datasets, challenges include repeats, low coverage of low-abundance species, and strain variation. High coverage, often 10x or more per genome, is critical for robust assembly. The amount of sequencing needed can be substantial, such as terabases of data to deeply sample microbial communities.
Tools for Metagenomics with 16S/ITS and Whole Genome Shotgun SequencesSurya Saha
Presented at the Cornell Symbiosis symposium. Workflows for processing amplicon-based 16S/ITS sequences as well as whole genome shotgun sequences are described. Slides include a short description and links for each tool.
DISCLAIMER: This is a small subset of tools out there. No disrespect to methods not mentioned.
Evaluation of Pool-Seq as a cost-effective alternative to GWASAmin Mohamed
Whole-genome sequencing (WGS) of pools of individuals (Pool-Seq) provides a cost-effective method for genome-wide association studies (GWAS), and offers an alternative to sequencing of individuals, which remains cost prohibitive. Pool-Seq is increasingly used in population genomic studies in both model and non-model organisms. In this paper, the ability of Pool-Seq to recover known GWAS signals was evaluated. Existing GWAS data for 2,112 animals with 729K SNPs were obtained and pooled to simulate data obtained from a pooled WGS approach. Traditional GWAS results were compared with the absolute allele frequency difference (dAF) metric suitable for use with Pool-Seq data. Specifically, we tested the ability of dAF scans to recover known GWAS signals for two different traits with large and moderate gene effects. Pools of different sizes (50, 100 and 200 individuals per pool) were also compared. The results showed the ability of the dAF approach to recover known GWAS peaks obtained by traditional SNP association, and recommended the use of a pool size of 100 individuals for DNA pooling.
Bioinformatics tools for the diagnostic laboratory - T.Seemann - Antimicrobi...Torsten Seemann
"Bioinformatics tools for the diagnostic laboratory" presented at the Australian Society for Antimicrobials 2016 annual conference in Melbourne Australia. Slides are aimed at a biological / pathology / clinican audience. Some material has been re-imagined from Nick Loman's ECCMID 2015 talk.
Phylogenomic methods for comparative evolutionary biology - University Colleg...Joe Parker
Invited research seminar given to MSc students at University College Dublin on 24th October 2013.
I introduce the discipline of phylogenomics - comparative phylogenetic analyses of DNA sequences across genomes - and some of the applications and recent breakthroughs in the field.
As an in-depth case study I explain the methods and significance of our 2013 Nature paper on adaptive genotypic molecular convergence in echolocating mammals.
I then highlight some of the avenues of study on the frontiers of current research.
Cross-Kingdom Standards in Genomics, Epigenomics and MetagenomicsChristopher Mason
Challenges and biases in preparing, characterizing, and sequencing DNA and RNA can have significant impacts on research in genomics across all kingdoms of life, including experiments in single cells, RNA profiling, and metagenomics. Technical artifacts and contaminations can arise at each point of sample manipulation, extraction, sequencing, and analysis. Thus, the measurement and benchmarking of these potential sources of error are of paramount importance as next-generation sequencing (NGS) projects become more global and ubiquitous.
Fortunately, a variety of methods, standards, and technologies have recently emerged that improve measurements in genomics and sequencing, from the initial input material to the computational pipelines that process and annotate the data.
This webinar will review work to develop standards and their applications in genomics, including the ABRF-NGS Phase II NGS Study on DNA Sequencing; the FDA's Sequencing Quality Control Consortium (SEQC2); metagenomics standards efforts (ABRF, ATCC, Zymo, Metaquins); and the Epigenomics QC group of the SEQC2. The webinar will also review the computational methods for detection, validation, and implementation of these genomic measures.
Meren's pirate presentation at the STAMPS course covers the basic concepts most binning algorithms use to bin contigs into genome bins: sequence composition and differential coverage.
The presentation introduces big data, mainly metagenomic data, and discusses the hurdles in analyzing it with conventional approaches. The later part gives a brief introduction to machine learning approaches, with a biological example for each. It closes with work on the implementation of a machine learning approach, Random Forest, for the functional annotation and taxonomic classification of metagenomic data.
Metagenomics is the study of metagenomes, genetic material recovered directly from environmental samples. The broad field is also referred to as environmental genomics, ecogenomics or community genomics. Recent studies use "shotgun" Sanger sequencing or next generation sequencing (NGS) to obtain largely unbiased samples of all genes from all members of the sampled communities.
As next generation sequencing has moved into the clinic, there is an increased demand for accuracy and reproducibility. Target enrichment is needed for applications where high read depth is critical, but some performance limitations, especially in GC-rich regions of the genome, have raised questions about the overall usefulness of target capture methods. In this presentation, Dr Kristina Giorda presents a method using individually synthesized and quality checked capture baits that performs well, even for GC-rich sequences, and delivers accurate coverage of the target space. Dr Giorda covers library preparation and target capture, and shares informative data generated using our xGen® Exome Research Panel.
Molecular QC: Interpreting your Bioinformatics PipelineCandy Smellie
What is the impact of assay failure in your laboratory and how do you monitor for it?
The most heavily degraded samples are not suitable for standard exome coverage: sometimes it’s not even a matter of getting bad sequencing, you might get nothing at all!
FFPE artifacts increase with storage time
Artifacts reduce the statistical power of your variant calling analysis
Molecular reference standards help filter out bad mappings and spurious variants
Bioinformatics pipelines allow adding Molecular Reference Standards in your joint variant calling pipeline
Genome In A Bottle Reference Standards are invaluable for validating variant calling analysis
NIST and its collaborators shared datasets created with most NGS technologies
Horizon Diagnostics shared annotated, merged variant calls from NIST for the Ashkenazim Trio
~35K variants are predicted to have high or moderate impact within the Trio
GM24385 (Ashkenazim son) includes 352 small variants with high/moderate impact that are absent in the father and mother
Routinely monitor the performance of your workflows and assays with independent external controls
Building bioinformatics resources for the global communityExternalEvents
http://www.fao.org/about/meetings/wgs-on-food-safety-management/en/
Building bioinformatics resources for the global community. Presentation from the Technical Meeting on the impact of Whole Genome Sequencing (WGS) on food safety management and GMI-9, 23-25 May 2016, Rome, Italy.
Errors and Limitations of Next Generation SequencingNixon Mendez
High throughput sequencing technologies have made whole genome sequencing and resequencing available to many more researchers and projects.
Cost and time have been greatly reduced.
The error profiles and limitations of the new platforms differ significantly from those of previous sequencing technologies.
The selection of an appropriate sequencing platform for particular types of experiments is an important consideration.
NGS sequencing errors fall mainly into the following categories:
1. Low-quality bases
2. PCR errors
3. High error rates
NGS also has inherent limitations, as follows:
1. Sequence properties and algorithmic challenges
2. Contamination or new insertions
3. Repeat content
4. Segmental duplications
5. Missing and fragmented genes
6. Reference index
Single-Cell Analysis - Powered by REPLI-g: Single Cell Analysis Series Part 1QIAGEN
What can you do from a single cell? Actually, quite a lot! Beginning with the genome, you can discover new biomarkers by identifying new genetic variants and their association with specific diseases, including cancers. Moving on to RNA, recent advances in RNA sequencing technology have made single-cell transcriptomics a possibility. Along with these possibilities come challenges that start from the moment you get the sample to the final step of gaining insights into the cell. This slide deck provides an overview of the multiple steps involved as you move from sample acquisition to analysis and data interpretation in different sample types.
Speeding up sequencing: Sequencing in an hour enables sample to answer in a w...Thermo Fisher Scientific
At present, next generation sequencing (NGS) is hindered by slow and often manual workflow procedures. Decreasing overall workflow times is critical for the widespread adoption of targeted and whole genome sequencing (WGS) for many time-sensitive applications, in particular for infectious disease analysis. To this end, we describe improvements to the four main steps of the NGS workflow: i) library preparation; ii) template preparation; iii) sequencing; and iv) data analysis. Together, these advances dramatically decrease overall turnaround times.
Ion Torrent semiconductor-based sequencing instruments utilize flow sequencing, with speed largely dependent on the number of nucleotide flows (one flow produces ~0.5 base) and the speed of the flows (Figure 2).
Ernesto Picardi – Bioinformatica e genomica comparata: nuove strategie sperim...eventi-ITBbari
Bioinformatics and comparative genomics: new experimental and computational strategies for the production and analysis of NGS data, aimed at developing innovative processes and products for human health, the environment and the agri-food sector.
Course: Bioinformatics for Biomedical Research (2014).
Session: 2.1.2- Next Generation Sequencing. Technologies and Applications. Part II: NGS Applications I.
Statistics and Bioinformatics Unit (UEB) & High Technology Unit (UAT), Vall d'Hebron Research Institute (www.vhir.org), Barcelona.
The Global Microbial Identifier (GMI) initiative - and its working groupsExternalEvents
http://www.fao.org/about/meetings/wgs-on-food-safety-management/en/
The GMI initiative - and its working groups. Presentation from the Technical Meeting on the impact of Whole Genome Sequencing (WGS) on food safety management -23-25 May 2016, Rome, Italy.
WGS in public health microbiology - MDU/VIDRL Seminar - wed 17 jun 2015Torsten Seemann
How genomics is changing the practice of public health microbiology. The role of whole genome sequencing as the "one true assay". Another powerful tool for the epidemiologist.
Innovative NGS Library Construction TechnologyQIAGEN
Next-generation sequencing (NGS) is a driving force for numerous new and exciting applications, including cancer research, stem cell research, metagenomics, population genetics, medical research and single cell analysis. While NGS technology is continuously improving, library preparation remains one of the biggest bottlenecks in the NGS workflow and includes several time-consuming steps that can result in considerable sample loss and the potential to introduce handling errors. Moreover, conducting single-cell genomic analysis using NGS methods has traditionally been challenging since the amount of genomic DNA present in a single cell is very limited.
With technological breakthroughs in single cell isolation, whole genome amplification (WGA) and NGS library preparation, experiments using single cells are now possible. However, challenges still exist. In particular, methods for the unbiased and complete amplification of a single genome, and for the efficient conversion of that amplified DNA into a sequencer-compatible library, face several technical limitations, including incomplete amplification, the introduction of PCR errors, GC bias and locus or allelic drop-out. The presentation covers the impact of these factors and how to mitigate them.
Presentation from the ECDC expert consultation on Whole Genome Sequencing organised by the European Centre of Disease Prevention and Control - Stockholm, 19 November 2015
Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...nist-spin
"Next Generation Sequencing for Identification and Subtyping of Foodborne Pathogens" presentation at the Standards for Pathogen Identification via NGS (SPIN) workshop hosted by National Institute for Standards and Technology October 2014 by Rebecca Lindsey, PhD from Enteric Diseases Laboratory Branch of the CDC.
GRC Workshop at Churchill College on Sep 21, 2014. This is Michael Schatz's talk on the theory and practice of representing population data in graph structures.
This workshop is intended for those who are interested in, and are in the planning stages of, conducting an RNA-Seq experiment. Topics to be discussed include:
* Experimental Design of RNA-Seq experiment
* Sample preparation, best practices
* High throughput sequencing basics and choices
* Cost estimation
* Differential Gene Expression Analysis
* Data cleanup and quality assurance
* Mapping your data
* Assigning reads to genes and counting
* Analysis of differentially expressed genes
* Downstream analysis/visualizations and tables
Microbiome studies using 16S ribosomal DNA PCR: some cautionary tales.jennomics
Presentation at a workshop conducted by the UC Davis Bioinformatics Core Facility: Using the Linux Command Line for Analysis of High Throughput Sequence Data, September 15-19, 2014
This presentation explains the meaning of curation and includes an introduction to the Apollo genome annotation editing tool and its curation environment.
9. Topics to explore -
• Experimental design to enable good data analysis.
• Extracting whole genomes and evaluating strain variation.
• Technology review and open problems.
• Assembly graphs.
• Long reads and infinite sequencing - what next?
Or just a Q&A. I can talk for 90 minutes straight, too, but I'd rather not.
10. Assembly graphs. Assembly "graphs" are a way of representing sequencing data that retains all variation; assemblers then crunch this down into a single sequence.
Image from Iqbal et al., 2012. PMID 23172865
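To make the "assembly graph" idea concrete, here is a minimal Python sketch (a toy, not any particular assembler's implementation) that builds a de Bruijn graph from reads; a single-base difference between two strains shows up as a branch that an assembler must later resolve or crunch down into one sequence.

```python
# A toy de Bruijn graph: nodes are (k-1)-mers, and an edge connects a
# prefix to a suffix whenever some k-mer in the reads joins them. A SNP
# between two strains appears as a branch ("bubble") in the graph.
from collections import defaultdict

def debruijn(reads, k=5):
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])  # (k-1)-mer overlap edge
    return graph

# Two "strains" differing by one base create a bubble at node ATGG:
reads = ["ATGGCGTGCA", "ATGGAGTGCA"]
for node, successors in sorted(debruijn(reads).items()):
    print(node, "->", sorted(successors))
```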
17. Annotating individual reads
• Works really well when you have EITHER (a) evolutionarily close references, or (b) rather long sequences.
(This is obvious, right?)
18. Annotating individual reads #2
• We have found that this does not work well with Illumina samples from unexplored environments (e.g. soil).
• Sensitivity is fine (the correct match is usually there).
• Specificity is bad (the correct match may be drowned out by incorrect matches).
19. Assembly: good. Assemblies yield much more significant homology matches.
Howe et al., 2014, PMID 24632729
20. Recommendation: for reads < 200-300 bp,
• Annotate individual reads for human-associated samples, or exploration of well-studied systems.
• Otherwise, look to assembly.
21. So, why assemble?
• Increase your ability to assign homology/orthology correctly!!
Essentially all functional annotation systems depend on sequence similarity to assign homology. This is why you want to assemble your data.
22. Why else would you want to assemble?
• Assemble a new "reference".
• Look for large-scale variation from the reference - pathogenicity islands, etc.
• Discriminate between different members of gene families.
• Discover operon assemblages & annotate on co-occurrence of genes.
• Reduce the size of the data!!
23. Why don't you want to assemble??
• Abundance threshold - low-coverage filter.
• Strain variation.
• Chimerism.
26. Short read lengths are hard. Whiteford et al., Nucleic Acids Res., 2005.
Conclusion: even with a read length of 200, the E. coli genome cannot be assembled completely. Why?
27. Short read lengths are hard. Whiteford et al., Nucleic Acids Res., 2005.
Conclusion: even with a read length of 200, the E. coli genome cannot be assembled completely. Why? REPEATS.
This is why paired-end sequencing is so important for assembly.
28. Four main challenges for de novo sequencing.
• Repeats.
• Low coverage.
• Errors.
These introduce breaks in the construction of contigs.
• Variation in coverage - transcriptomes and metagenomes, as well as amplified genomic DNA.
This challenges the assembler to distinguish between erroneous connections (e.g. repeats) and real connections.
29. Repeats
• Overlaps don't place sequences uniquely when there are repeats present.
UMD assembly primer (cbcb.umd.edu)
32. Conclusions re mixed community sampling
In shotgun metagenomics, you are sampling randomly from the mixed population. Therefore, the lowest-abundance member of the population (that you want to observe) drives the required depth of sequencing!
1 in a million => ~50 Tbp of sequencing for 10x coverage.
33. Assembly depends on high coverage. HMP mock community assembly; Velvet-based protocol.
34. Conclusions from previous slide
• To recover any contigs at all, you need coverage > 10 (green line).
• To recover long sequences, you want coverage > 20 (blue line).
35. A few assembly stories
• Low complexity / Osedax symbionts.
• High complexity / soil.
36. Osedax symbionts
Table S1. Specimens, and collection sites, used in this study.

Osedax frankpressi with Rs1 symbiont
Site    Dive(1)   Date       Time (months)   # of specimens
2890m   T486      Oct 2002     8             2
        T610      Aug 2003    18             3
        T1069     Jan 2007    59             2
        DR010     Mar 2009    85             3
        DR098     Nov 2009    93             4
        DR204     Oct 2010   104             1
        DR234     Jun 2011   112             2
1820m   T1048     Oct 2006     7             3
        T1071     Jan 2007    10             3
        DR012     Mar 2009    36             4
        DR236(2)  Jun 2011    63             2   (O. frankpressi from DR236)

(1) Dive numbers begin with the remotely operated vehicle name; T = Tiburon, DR = Doc Ricketts (both owned and operated by the Monterey Bay Aquarium Research Institute).
(2) Samples used for genomic analysis were collected during dive DR236.
Goffredi et al., submitted.
40. Osedax assembly story
Conclusions include:
• Osedax symbionts have genes needed for a free-living stage;
• metabolic versatility in carbon, phosphate, and iron uptake strategies;
• the genome includes mechanisms for intracellular survival, and numerous potential virulence capabilities.
41. Osedax assembly story
• Low diversity metagenome!
• Physical isolation => MDA => sequencing => diginorm => binning =>
o 94% complete Rs1
o 66-89% complete Rs2
• Note: many interesting critters are hard to isolate => so, basically, metagenomes.
42. Human-associated communities
• "Time series community genomics analysis reveals rapid shifts in bacterial species, strains, and phage during infant gut colonization." Sharon et al. (Banfield lab); Genome Res. 2013 23: 111-120.
43. Setup
• Collected 11 fecal samples from a premature female infant, days 15-24.
• 260M 100-bp paired-end Illumina HiSeq reads; 400 and 900 base fragments.
• Assembled > 96% of reads into contigs > 500 bp; 8 complete genomes; reconstructed genes down to 0.05% of population abundance.
45. Key strategy: abundance binning
• Bin reads by k-mer abundance.
• Assemble the most abundant bin.
• Remove reads that map to the assembly.
Sharon et al., 2013; PMID 22936250
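A minimal sketch of the binning step above, under toy assumptions (error-free reads, exact in-memory k-mer counts; Sharon et al. used a production pipeline, not this code): each read is assigned to a coarse abundance bin by the median abundance of its k-mers, and the most abundant bin is assembled first.

```python
# Toy abundance binning: count all k-mers, score each read by the median
# abundance of its k-mers, and group reads into coarse abundance bins.
# The most abundant bin would be assembled first, its reads removed, and
# the process repeated on what remains.
from collections import Counter
from statistics import median

def kmers(seq, k):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def bin_reads_by_abundance(reads, k=17, edges=(10, 100)):
    counts = Counter(km for r in reads for km in kmers(r, k))
    bins = {lo: [] for lo in (0,) + edges}   # bin label = lower bound
    for read in reads:
        m = median(counts[km] for km in kmers(read, k))
        label = max(edge for edge in bins if m >= edge)
        bins[label].append(read)
    return bins  # e.g. bins[100] holds the most abundant reads
```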
48. Conclusions
• Recovered strain variation, phage variation, abundance variation, lateral gene transfer.
• Claim that "recovered genomes are superior to draft genomes generated in most isolate genome sequencing projects."
51. Protocol for messenger RNA extraction, assembly, and downstream analysis, including differential expression.
Note: used overlapping paired-end reads.
52. Great Prairie Grand Challenge - soil
• Together with Janet Jansson, Jim Tiedje, et al.
• "Can we make sense of soil with deep Illumina sequencing?"
53. Great Prairie Grand Challenge - soil
• What ecosystem-level functions are present, and how do microbes do them?
• How does agricultural soil differ from native soil?
• How does soil respond to climate perturbation?
• Questions that are not easy to answer without shotgun sequencing:
o What kind of strain-level heterogeneity is present in the population?
o What does the phage and viral population look like?
o What species are where?
55. Approach 1: Digital normalization (a computational version of library normalization)
Suppose you have a dilution factor of A (10) to B (1). To get 10x of B you need to get 100x of A! Overkill!! This 100x will consume disk space and, because of errors, memory. We can discard it for you…
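The idea sketched in Python (a simplified toy, not the khmer implementation): stream the reads, estimate each read's coverage so far as the median count of its k-mers, and keep the read only if that estimate is below a cutoff C. The redundant high-coverage data, and much of its error content, is discarded.

```python
# Toy digital normalization: keep a read only if its estimated coverage
# (median count of its k-mers among reads kept so far) is below C.
from collections import Counter
from statistics import median

def diginorm(reads, k=17, C=10):
    counts = Counter()
    for read in reads:
        kms = [read[i:i + k] for i in range(len(read) - k + 1)]
        if median(counts[km] for km in kms) < C:
            counts.update(kms)  # only retained reads contribute counts
            yield read          # redundant high-coverage reads are dropped
```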
56. Approach 2: Data partitioning (a computational version of cell sorting)
Split reads into "bins" belonging to different source species. Can do this based almost entirely on connectivity of sequences. "Divide and conquer." A memory-efficient implementation helps to scale assembly.
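A toy stand-in for connectivity-based partitioning: reads that share any k-mer are merged into the same partition with union-find, so reads from sources that share no sequence land in separate bins that can be assembled independently. (Real implementations use memory-efficient probabilistic graph representations; this sketch uses plain dictionaries.)

```python
# Toy partitioning: union-find over reads, merging any two reads that
# share a k-mer; each resulting partition can be assembled on its own.
def partition(reads, k=21):
    parent = list(range(len(reads)))

    def find(i):                       # find set representative
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}                    # k-mer -> first read containing it
    for idx, read in enumerate(reads):
        for i in range(len(read) - k + 1):
            km = read[i:i + k]
            if km in first_seen:
                parent[find(idx)] = find(first_seen[km])  # union
            else:
                first_seen[km] = idx

    bins = {}
    for idx, read in enumerate(reads):
        bins.setdefault(find(idx), []).append(read)
    return list(bins.values())
```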
57. Putting it in perspective:
Total equivalent of ~1200 bacterial genomes. Human genome ~3 billion bp.
Assembly results for Iowa corn and prairie (2x ~300 Gbp soil metagenomes):

Total Assembly   Total Contigs (> 300 bp)   % Reads Assembled   Predicted protein coding
2.5 bill         4.5 mill                   19%                 5.3 mill
3.5 bill         5.9 mill                   22%                 6.8 mill

Adina Howe
58. Resulting contigs are low coverage.
Figure 11: Coverage (median basepair) distribution of assembled contigs from soil metagenomes.
63. Concluding thoughts on assembly (I)
• There's no standard approach yet; almost every paper uses a specialized pipeline of some sort.
o More like genome assembly
o But unlike transcriptomics…
64. Concluding thoughts on assembly (II)
• Anecdotally, everyone worries about strain variation.
o Some groups (e.g. Banfield, us) have found that this is not a problem in their system so far.
o Others (viral metagenomes! HMP!) have found this to be a big concern.
65. Concluding thoughts on assembly (III)
• Some groups have found metagenome assembly to be very useful.
• Others (us! soil!) have not yet proven its utility.
66. High coverage is critical. Low coverage is the dominant problem blocking assembly of metagenomes.
67. Questions --
• How much sequencing should I do?
• How do I evaluate metagenome assemblies?
• Which assembler is best?
69. Shotgun sequencing and coverage
"Coverage" is simply the average number of reads that overlap each true base in the genome. Here, the coverage is ~10 - just draw a line straight down from the top through all of the reads.
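The definition above as a one-line calculation (illustrative numbers, not from the slides): with N reads of length L from a genome of size G, the expected coverage is C = N x L / G.

```python
# Expected coverage C = N * L / G for N reads of length L over a genome
# of size G (the Lander-Waterman estimate).
def expected_coverage(n_reads, read_len, genome_size):
    return n_reads * read_len / genome_size

print(expected_coverage(500_000, 100, 5_000_000))  # -> 10.0, i.e. ~10x
```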
70. Random sampling => deep sampling needed.
Typically 10-100x is needed for robust recovery (300 Gbp for human).
74. K-mer based assemblers scale poorly.
Why do big data sets require big machines??
Memory usage ~ "real" variation + number of errors; number of errors ~ size of data set.
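A toy model of that scaling argument (all numbers assumed, for intuition only): true k-mers plateau at roughly the size of the underlying genome or community, while each sequencing error creates up to k novel k-mers, so error k-mers grow linearly with data volume and come to dominate memory.

```python
# Toy model: distinct true k-mers plateau at ~genome size, while error
# k-mers (each error creates up to k novel k-mers) grow with data volume.
def distinct_kmers(n_reads, read_len, genome_size, k=31, err_rate=0.01):
    true_kmers = genome_size                 # fixed: the genome is finite
    error_bases = n_reads * read_len * err_rate
    return true_kmers + error_bases * k      # errors dominate at scale

for n_reads in (1e6, 1e7, 1e8):
    est = distinct_kmers(n_reads, 100, 5e6)
    print(f"{n_reads:.0e} reads -> ~{est:.2e} distinct k-mers")
```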
77. How much data do we need? (I) ("More" is rather vague…)
78. Putting it in perspective:
Total equivalent of ~1200 bacterial genomes. Human genome ~3 billion bp.
Assembly results for Iowa corn and prairie (2x ~300 Gbp soil metagenomes):

Total Assembly   Total Contigs (> 300 bp)   % Reads Assembled   Predicted protein coding
2.5 bill         4.5 mill                   19%                 5.3 mill
3.5 bill         5.9 mill                   22%                 6.8 mill

Adina Howe
79. Resulting contigs are low coverage.
Figure 11: Coverage (median basepair) distribution of assembled contigs from soil metagenomes.
80. How much? (II)
• Suppose we need 10x coverage to assemble a microbial genome, and microbial genomes average 5e6 bp of DNA.
• Further suppose that we want to be able to assemble a microbial species that is "1 in 100,000", i.e. 1 in 1e5.
• Shotgun sequencing samples randomly, so we must sample deeply to be sensitive.
10x coverage x 5e6 bp x 1e5 = 5e12, or 5 Tbp of sequence.
Currently this would cost approximately $100k, for 10 full Illumina runs, but in a year we will be able to do it for much less.
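The slide's arithmetic as a reusable sketch: the total sequencing required scales as target coverage times genome size divided by the abundance of the rarest genome you want to recover.

```python
# Required sequencing = coverage * genome size / abundance of the rarest
# target. Reproduces the slide's 5 Tbp figure and slide 32's 50 Tbp.
def required_bp(target_cov, genome_size, abundance):
    return target_cov * genome_size / abundance

print(required_bp(10, 5e6, 1e-5))  # 5e12 bp = 5 Tbp  ("1 in 100,000")
print(required_bp(10, 5e6, 1e-6))  # 5e13 bp = 50 Tbp ("1 in a million")
```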
82. "Whoa, that's a lot of data…"
http://ivory.idyll.org/blog/how-much-sequencing-is-needed.html
83. Some approximate metagenome sizes
• Deep carbon mine data set: 60 Mbp (x 10x => 600 Mbp)
• Great Prairie soil: 12 Gbp (4x human genome)
• Amazon Rain Forest Microbial Observatory soil: 26 Gbp
84. Partitioning separates reads by genome.
When computationally spiking HMP mock data with one E. coli genome (left) or multiple E. coli strains (right), the majority of partitions contain reads from only a single genome (blue) vs multi-genome partitions (green). Partitions containing spiked data are indicated with a *.
Adina Howe
85. Is your assembly good?
• Truly reference-free assembly is hard to evaluate.
• Traditional "genome" measures like N50 and NG50 simply do not apply to metagenomes, because very often you don't know what the genome "size" is.
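For reference, here is the N50 statistic that the slide says does not transfer to metagenomes: the contig length such that contigs at least that long contain half of all assembled bases. (NG50 substitutes the known genome size for the assembly total, which is exactly the number a metagenome lacks.)

```python
# N50: the length x such that contigs of length >= x contain at least
# half of the total assembled bases.
def n50(contig_lengths):
    total, running = sum(contig_lengths), 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length

print(n50([100, 200, 300, 400, 500]))  # -> 400
```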
88. What's the best assembler? [Bar chart: cumulative Z-score from ranking metrics, bird assembly; assemblers ranked: BCM*, BCM, ALLP, NEWB, SOAP**, MERAC, CBCB, SOAP*, SOAP, SGA, PHUS, RAY, MLK, ABL.] Bradnam et al., Assemblathon 2: http://arxiv.org/pdf/1301.5406v1.pdf
89. What's the best assembler? [Bar chart: cumulative Z-score from ranking metrics, fish assembly; assemblers ranked: BCM, CSHL, CSHL*, SYM, ALLP, SGA, SOAP*, MERAC, ABYSS, RAY, IOB, CTD, IOB*, CSHL**, CTD**, CTD*.] Bradnam et al., Assemblathon 2: http://arxiv.org/pdf/1301.5406v1.pdf
90. What's the best assembler? [Bar chart: cumulative Z-score from ranking metrics, snake assembly; assemblers ranked: SGA, PHUS, MERAC, SYMB, ABYSS, CRACS, BCM, RAY, SOAP, GAM, CURT.] Bradnam et al., Assemblathon 2: http://arxiv.org/pdf/1301.5406v1.pdf
91. Answer: for eukaryotic genomes, it depends
• Different assemblers perform differently, depending on
o Repeat content
o Heterozygosity
• Generally the results are very good (est. completeness, etc.) but different between different assemblers (!)
• There Is No One Answer.
92. Estimated completeness: CEGMA. Each assembler lost a different ~5% of CEGs.
Bradnam et al., Assemblathon 2: http://arxiv.org/pdf/1301.5406v1.pdf
94. Evaluating assemblies
• Every assembler returns different results for eukaryotic genomes.
• For metagenomes, it's even worse.
o No systematic exploration of precision, recall, etc.
o Very little in the way of cross-comparable data sets
o Often the sequencing technology being evaluated is out of date
o etc. etc.
95. Our experience
• Our metagenome assemblies compare well with others, but we have little in the way of ground truth with which to evaluate.
• Scaffold assembly is tricky; we believe in contig assembly for metagenomes, but not scaffolding.
• See the arXiv paper, "Assembling large, complex metagenomes", for our suggested pipeline and statistics & references.
99. Deep Carbon data set
• Name: DCO_TCO_MM5
• Masimong Gold Mine; microbial cells filtered from fracture water from within a 1.9 km borehole. (32,000 year old water?!)
• M.C.Y. Lau, C. Magnabosco, S. Grim, G. Lacrampe Couloume, K. Wilkie, B. Sherwood Lollar, D.N. Simkus, G.F. Slater, S. Hendrickson, M. Pullin, T.L. Kieft, O. Kuloyo, B. Linage, G. Borgonie, E. van Heerden, J. Ackerman, C. van Jaarsveld, and T.C. Onstott
100. DCO_TCO_MM5
Pipeline: 20M reads / 2.1 Gbp → adapter trim & quality filter → diginorm to C=10 → trim high-coverage reads at low-abundance k-mers → 5.6M reads / 601.3 Mbp.
"Could you take a look at this? MG-RAST is telling us we have a lot of artificially duplicated reads, i.e. the data is bad."
The entire process took ~4 hours of computation, or so.
(Minimum genome size est: 60.1 Mbp)
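A toy sketch of the "trim high-coverage reads at low-abundance k-mers" step (exact in-memory counts; the actual pipeline used the khmer tools, not this code): within a read that is otherwise high coverage, a k-mer seen only once or twice is almost certainly an error, so the read is truncated there.

```python
# Toy version of the trimming step: in an otherwise high-coverage read,
# a k-mer with near-zero abundance is almost certainly an error, so the
# read is truncated so that k-mer no longer appears intact.
from collections import Counter
from statistics import median

def trim_low_abund(reads, k=17, cutoff=2, high_cov=10):
    counts = Counter(r[i:i + k] for r in reads for i in range(len(r) - k + 1))
    for read in reads:
        kms = [read[i:i + k] for i in range(len(read) - k + 1)]
        if median(counts[km] for km in kms) >= high_cov:
            for i, km in enumerate(kms):
                if counts[km] <= cutoff:
                    read = read[:i + k - 1]  # drop the suspect k-mer's tail
                    break
        yield read
```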
101. Assembly stats: DCO_TCO_MM5 (minimum genome size est: 60.1 Mbp)

      All contigs            Contigs > 1kb
k     N contigs  Sum BP      N contigs  Sum BP      Max contig
21    343263     63217837    6271       10537601     9987
23    302613     63025311    7183       13867885    21348
25    276261     62874727    7375       15303646    34272
27    258073     62500739    7424       16078145    48742
29    242552     62001315    7349       16426147    48746
31    228043     61445912    7307       16864293    48750
33    214559     60744478    7241       17133827    48768
35    203292     60039871    7129       17249351    45446
37    189948     58899828    7088       17527450    59437
39    180754     58146806    7027       17610071    54112
41    172209     57126650    6914       17551789    65207
43    165563     56440648    6925       17654067    73231
102. Chose two: DCO_TCO_MM5
• A: k=43 ("long contigs"): 165563 contigs; 56.4 Mbp; longest contig 73231 bp.
• B: k=21 ("high recall"): 343263 contigs; 63.2 Mbp; longest contig 9987 bp.
How to evaluate??
103. How many reads map back?
Mapped 3.8M paired-end reads (one subsample):
• high recall: 41% of pairs map
• longer contigs: 70% of pairs map
Plus 150k single-end reads:
• high recall: 49% of sequences map
• longer contigs: 79% of sequences map
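A minimal sketch of this mapping-back evaluation, assuming reads have been aligned to the contigs with any SAM-emitting mapper: count primary records whose "unmapped" flag bit (0x4) is clear, skipping secondary (0x100) and supplementary (0x800) alignments.

```python
# Fraction of reads mapping back to the assembly, from a SAM file.
# FLAG bits: 0x4 = unmapped, 0x100 = secondary, 0x800 = supplementary.
def mapping_rate(sam_path):
    mapped = total = 0
    with open(sam_path) as sam:
        for line in sam:
            if line.startswith("@"):          # skip header lines
                continue
            flag = int(line.split("\t")[1])
            if flag & (0x100 | 0x800):        # count primary records only
                continue
            total += 1
            mapped += 0 if flag & 0x4 else 1
    return mapped / total

# usage (hypothetical file name): mapping_rate("reads_vs_contigs.sam")
```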
105. Conclusion
• This is a pretty good metagenome assembly - ~80% of reads map!
• Surprised that the larger dataset (63.2 Mbp, "high recall") accounts for a smaller percentage of the reads - 49% vs 79% for the 56.4 Mbp "long contigs" data set.
• I now suspect that different parameters are recovering different subsets of the sample…
• Don't trust MG-RAST ADR (artificially duplicated read) calls.
107. Estimates of metagenome size
Calculation: (# reads) x (avg read len) / (diginorm coverage).
Assumes: few entirely erroneous reads (upper bound); saturation (lower bound).
• E. coli: 384k x 86.1 / 5.0 => 6.6 Mbp est. (true: 4.5 Mbp)
• MM5 deep carbon: 60 Mbp
• Great Prairie soil: 12 Gbp
• Amazon Rain Forest Microbial Observatory: 26 Gbp
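The slide's estimator as a small function: after digital normalization to coverage C, the surviving data amounts to roughly C-fold coverage of the underlying (meta)genome, so dividing data volume by C estimates total genome content.

```python
# Metagenome size estimate from diginorm-saturated data:
# size ~ (# reads) * (average read length) / (diginorm coverage C).
def est_metagenome_size(n_reads, avg_read_len, diginorm_C):
    return n_reads * avg_read_len / diginorm_C

print(est_metagenome_size(384_000, 86.1, 5.0))  # ~6.6e6 bp (E. coli test)
```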
108. Extracting whole genomes?
So far, we have only assembled contigs, but not whole genomes. Can entire genomes be assembled from metagenomic data? YES. Iverson et al. (2012), from the Armbrust lab, contains a technique for scaffolding metagenome contigs into ~whole genomes.
PMID 22301318, Iverson et al.
110. What works? Today,
• From deep metagenomic data, you can get the gene and operon content (including abundance of both) from communities.
• You can get microarray-like expression information from metatranscriptomics.
111. What needs work?
• Assembling ultra-deep samples is going to require more engineering, but is straightforward. ("Infinite assembly.")
• Building scaffolds and extracting whole genomes has been done, but I am not yet sure how feasible it is to do systematically with existing tools (c.f. Armbrust Lab).
112. What will work, someday?
• Sensitive analysis of strain variation.
o Both assembly and mapping approaches do a poor job detecting many kinds of biological novelty.
o The 1000 Genomes Project has developed some good tools that need to be evaluated on community samples.
• Ecological/evolutionary dynamics in vivo.
o Most work done on 16S, not on genomes or functional content.
o Here, sensitivity is really important!
113. The interpretation challenge
• For soil, we have generated approximately 1200 bacterial genomes' worth of assembled genomic DNA from two soil samples.
• The vast majority of this genomic DNA contains unknown genes with largely unknown function.
• Most annotations of gene function & interaction are from a few phylogenetically limited model organisms.
o Est. 98% of annotations are computationally inferred: transferred from model organisms to genomic sequence, using homology.
o Can these annotations be transferred? (Probably not.)
This will be the biggest sequence analysis challenge of the next 50 years.
114. What are future needs?
• High-quality, medium+ throughput annotation of genomes?
o Extrapolating from model organisms is both immensely important and yet lacking.
o Strong phylogenetic sampling bias in existing annotations.
• Synthetic biology for investigating non-model organisms?
(Cleverness in experimental biology doesn't scale.)
• Integration of microbiology, community ecology/evolution modeling, and data analysis.
Editor's Notes
Both BLAST. Shotgun: annotated reads only.
Beware of Janet Jansson bearing data.
Diginorm is a subsampling approach that may help assemble highly polymorphic sequences. Observed levels of variation are quite low relative to e.g. marine free spawning animals.
A sketch showing the relationship between the number of sequence reads and the number of edges in the graph. Because the underlying genome is fixed in size, the number of edges due to the underlying genome plateaus once every part of the genome is covered. Conversely, since errors tend to be random and more or less unique, their number scales linearly with the number of sequence reads. So even once there is enough coverage to clearly distinguish true edges (which come from the underlying genome), they will usually be outnumbered by spurious edges (which arise from errors) by a substantial factor.