Long read sequencing - the good, the bad, and the really cool. Covers Illumina SLR, Pacbio RSII and Oxford Nanopore as of June 2015. Discusses bioinformatics differences of long reads over short reads.
Next generation-sequencing.ppt-convertedShweta Tiwari
The advance version, sequences the whole genome efficiently with high speed and high throughput sequencing at reduce cost is termed as Next Generation Sequencing (NGS) or massively parallel sequencing (MPS).
Oxford Nanopore was founded in Oxford Nanolabs by Dr.Gordon Sanghera, Dr.Spike Willcocks and Professor Hagan Bayley. Nanopore sequencing has been around since the 1990s, when Church et al. and Deamer and Akeson separately proposed that it is possible to sequence DNA using nanopore sensors.
Presentation carried out by Sergi Beltran Agulló, from the CNAG, at the course: Identification and analysis of sequence variants in sequencing projects: fundamentals and tools .
An introduction to bioinformatics practices and aims will be given and contrasted against approaches from other fields. Most importantly, it will be discussed how bioinformatics fits into the discovery cycle for hypothesis driven neuroscience research.
Next generation-sequencing.ppt-convertedShweta Tiwari
The advance version, sequences the whole genome efficiently with high speed and high throughput sequencing at reduce cost is termed as Next Generation Sequencing (NGS) or massively parallel sequencing (MPS).
Oxford Nanopore was founded in Oxford Nanolabs by Dr.Gordon Sanghera, Dr.Spike Willcocks and Professor Hagan Bayley. Nanopore sequencing has been around since the 1990s, when Church et al. and Deamer and Akeson separately proposed that it is possible to sequence DNA using nanopore sensors.
Presentation carried out by Sergi Beltran Agulló, from the CNAG, at the course: Identification and analysis of sequence variants in sequencing projects: fundamentals and tools .
An introduction to bioinformatics practices and aims will be given and contrasted against approaches from other fields. Most importantly, it will be discussed how bioinformatics fits into the discovery cycle for hypothesis driven neuroscience research.
Biotechnophysics: DNA Nanopore SequencingMelanie Swan
Biophysics (not merely bioengineering) is required to understand the fundamental mechanisms of biology in order to make technologies (bench and bioinformatic) for understanding them
RNA Sequence data analysis,Transcriptome sequencing, Sequencing steady state RNA in a sample is known as RNA-Seq. It is free of limitations such as prior knowledge about the organism is not required.
RNA-Seq is useful to unravel inaccessible complexities of transcriptomics such as finding novel transcripts and isoforms.
Data set produced is large and complex; interpretation is not straight forward.
A class of DNA sequencing techniques currently in active development is third-generation sequencing, commonly referred to as long-read sequencing. In comparison to second generation sequencing, also referred to as next generation sequencing, third generation sequencing technologies have the capacity to create noticeably longer reads.
HERE IN THIS PRESENTATION HY HOMOLOGY MODELING IS EXPLAIN , WITH EXAMPLES OF PROTEIN PRIMARY AND SECONDARY, SHOWING THE IMAGES FORM WHICH MAKES EASY TO UNDERSTAND
It contains information about- DNA Sequencing; History and Era sequencing; Next Generation Sequencing- Introduction, Workflow, Illumina/Solexa sequencing, Roche/454 sequencing, Ion Torrent sequencing, ABI-SOLiD sequencing; Comparison between NGS & Sangers and NGS Platforms; Advantages and Applications of NGS; Future Applications of NGS.
NGS in Clinical Research: Meet the NGS Experts Series Part 1QIAGEN
Next generation sequencing has revolutionized clinical testing but has also created novel challenges. This presentation will give an overview of state of the art clinical NGS and discuss validation, clinical implementation as well as the migration from gene panels to exome sequencing for inherited disorders with clinical and genetic heterogeneity. In addition, important shortcomings such as difficulties with regions of high sequence homology will be discussed.
Metagenomics is the study of a collection of genetic material (genomes) from a mixed community of organisms. Metagenomics usually refers to the study of microbial communities.
Biotechnophysics: DNA Nanopore SequencingMelanie Swan
Biophysics (not merely bioengineering) is required to understand the fundamental mechanisms of biology in order to make technologies (bench and bioinformatic) for understanding them
RNA Sequence data analysis,Transcriptome sequencing, Sequencing steady state RNA in a sample is known as RNA-Seq. It is free of limitations such as prior knowledge about the organism is not required.
RNA-Seq is useful to unravel inaccessible complexities of transcriptomics such as finding novel transcripts and isoforms.
Data set produced is large and complex; interpretation is not straight forward.
A class of DNA sequencing techniques currently in active development is third-generation sequencing, commonly referred to as long-read sequencing. In comparison to second generation sequencing, also referred to as next generation sequencing, third generation sequencing technologies have the capacity to create noticeably longer reads.
HERE IN THIS PRESENTATION HY HOMOLOGY MODELING IS EXPLAIN , WITH EXAMPLES OF PROTEIN PRIMARY AND SECONDARY, SHOWING THE IMAGES FORM WHICH MAKES EASY TO UNDERSTAND
It contains information about- DNA Sequencing; History and Era sequencing; Next Generation Sequencing- Introduction, Workflow, Illumina/Solexa sequencing, Roche/454 sequencing, Ion Torrent sequencing, ABI-SOLiD sequencing; Comparison between NGS & Sangers and NGS Platforms; Advantages and Applications of NGS; Future Applications of NGS.
NGS in Clinical Research: Meet the NGS Experts Series Part 1QIAGEN
Next generation sequencing has revolutionized clinical testing but has also created novel challenges. This presentation will give an overview of state of the art clinical NGS and discuss validation, clinical implementation as well as the migration from gene panels to exome sequencing for inherited disorders with clinical and genetic heterogeneity. In addition, important shortcomings such as difficulties with regions of high sequence homology will be discussed.
Metagenomics is the study of a collection of genetic material (genomes) from a mixed community of organisms. Metagenomics usually refers to the study of microbial communities.
Going Beyond Genomics in Precision Medicine: What's NextHealth Catalyst
Precision medicine processes, while involving genomics, are not confined to working with data about an individual’s genes, environment, and lifestyle. Precision medicine also means putting patients on the right path of care, taking into consideration other individual tolerances, such as participation and cost. Precision medicine processes incorporate data beyond the individual, pulling in socio-economic data, as well as relevant internal and external data, to create an entire patient data ecosystem. With reusable data modules, this information is processed within a closed-loop analytics framework to facilitate clinical decision making at the point of care. This optimizes clinical workflow, thus leading to more precise medicine.
Making powerful science: an introduction to NGS and beyondAdamCribbs1
This slide deck is from the Botnar Research Centre introduction to NGS sequencing workshop 2021- an overview of the theoretical concepts behind sequencing are given
New Technologies at the Center for Bioinformatics & Functional Genomics at Mi...Andor Kiss
A review of recent molecular biology technologies at the core genomics facility at Miami University (Ohio). The aim of this talk is to introduce the facility's capabilities to faculty, graduate and undergraduate students at MiamiU.
Bioinformatics tools for the diagnostic laboratory - T.Seemann - Antimicrobi...Torsten Seemann
"Bioinformatics tools for the diagnostic laboratory" presented at the Australian Society for Antimicrobials 2016 annual conference in Melbourne Australia. Slides are aimed at a biological / pathology / clinican audience. Some material has been re-imagined from Nick Loman's ECCMID 2015 talk.
Sequencing your poo with a usb stick - Linux.conf.au 2016 miniconf - mon 1 ...Torsten Seemann
This talk introduces a Linux Professional audience to bacterial genomics and modern sequencing technology. The title is slightly misleading and is a bit of clickbait. The diagrams are good.
A peek inside the bioinformatics black box - DCAMG Symposium - mon 20 july 2015Torsten Seemann
An introduction to basic genomics bioinformatics concepts in 20 minutes for an audience of clinicians, epidemiologists and other public health officials.
WGS in public health microbiology - MDU/VIDRL Seminar - wed 17 jun 2015Torsten Seemann
How genomics is changing the practice of public health microbiology. The role of whole genome sequencing as the "one true assay". Another powerful tool for the epidemiologist.
Why and how to clean Illumina genome sequencing reads. Includes illustrative examples, and a case where a project was saved by using Nesoni clip: to discover the cause of non-mapping reads.
Visualizing the pan genome - Australian Society for Microbiology - tue 8 jul ...Torsten Seemann
Invited talk at the Australian Society for Microbiology Annual Conference 2014 on "FriPan" our tool for visualizing bacterial pan genomes across 10-100s of isolates.
Snippy - Rapid bacterial variant calling - UK - tue 5 may 2015Torsten Seemann
Using Snippy to call variants in bacterial short read datasets via alignment to reference, and then using these alignments to produce core SNP alignments for phylogenomics.
A presentation to a lay audience at Melbourne Knowledge Week on how bacteria are a part of our life and what we are doing with genomics to manage them.
Parallel computing in bioinformatics t.seemann - balti bioinformatics - wed...Torsten Seemann
I describe the three levels of parallelism that can be exploited in bioinformatics software (1) clusters of multiple computers; (2) multiple cores on each computer; and (3) vector machine code instructions.
Multi-source connectivity as the driver of solar wind variability in the heli...Sérgio Sacani
The ambient solar wind that flls the heliosphere originates from multiple
sources in the solar corona and is highly structured. It is often described
as high-speed, relatively homogeneous, plasma streams from coronal
holes and slow-speed, highly variable, streams whose source regions are
under debate. A key goal of ESA/NASA’s Solar Orbiter mission is to identify
solar wind sources and understand what drives the complexity seen in the
heliosphere. By combining magnetic feld modelling and spectroscopic
techniques with high-resolution observations and measurements, we show
that the solar wind variability detected in situ by Solar Orbiter in March
2022 is driven by spatio-temporal changes in the magnetic connectivity to
multiple sources in the solar atmosphere. The magnetic feld footpoints
connected to the spacecraft moved from the boundaries of a coronal hole
to one active region (12961) and then across to another region (12957). This
is refected in the in situ measurements, which show the transition from fast
to highly Alfvénic then to slow solar wind that is disrupted by the arrival of
a coronal mass ejection. Our results describe solar wind variability at 0.5 au
but are applicable to near-Earth observatories.
The increased availability of biomedical data, particularly in the public domain, offers the opportunity to better understand human health and to develop effective therapeutics for a wide range of unmet medical needs. However, data scientists remain stymied by the fact that data remain hard to find and to productively reuse because data and their metadata i) are wholly inaccessible, ii) are in non-standard or incompatible representations, iii) do not conform to community standards, and iv) have unclear or highly restricted terms and conditions that preclude legitimate reuse. These limitations require a rethink on data can be made machine and AI-ready - the key motivation behind the FAIR Guiding Principles. Concurrently, while recent efforts have explored the use of deep learning to fuse disparate data into predictive models for a wide range of biomedical applications, these models often fail even when the correct answer is already known, and fail to explain individual predictions in terms that data scientists can appreciate. These limitations suggest that new methods to produce practical artificial intelligence are still needed.
In this talk, I will discuss our work in (1) building an integrative knowledge infrastructure to prepare FAIR and "AI-ready" data and services along with (2) neurosymbolic AI methods to improve the quality of predictions and to generate plausible explanations. Attention is given to standards, platforms, and methods to wrangle knowledge into simple, but effective semantic and latent representations, and to make these available into standards-compliant and discoverable interfaces that can be used in model building, validation, and explanation. Our work, and those of others in the field, creates a baseline for building trustworthy and easy to deploy AI models in biomedicine.
Bio
Dr. Michel Dumontier is the Distinguished Professor of Data Science at Maastricht University, founder and executive director of the Institute of Data Science, and co-founder of the FAIR (Findable, Accessible, Interoperable and Reusable) data principles. His research explores socio-technological approaches for responsible discovery science, which includes collaborative multi-modal knowledge graphs, privacy-preserving distributed data mining, and AI methods for drug discovery and personalized medicine. His work is supported through the Dutch National Research Agenda, the Netherlands Organisation for Scientific Research, Horizon Europe, the European Open Science Cloud, the US National Institutes of Health, and a Marie-Curie Innovative Training Network. He is the editor-in-chief for the journal Data Science and is internationally recognized for his contributions in bioinformatics, biomedical informatics, and semantic technologies including ontologies and linked data.
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Sérgio Sacani
We characterize the earliest galaxy population in the JADES Origins Field (JOF), the deepest
imaging field observed with JWST. We make use of the ancillary Hubble optical images (5 filters
spanning 0.4−0.9µm) and novel JWST images with 14 filters spanning 0.8−5µm, including 7 mediumband filters, and reaching total exposure times of up to 46 hours per filter. We combine all our data
at > 2.3µm to construct an ultradeep image, reaching as deep as ≈ 31.4 AB mag in the stack and
30.3-31.0 AB mag (5σ, r = 0.1” circular aperture) in individual filters. We measure photometric
redshifts and use robust selection criteria to identify a sample of eight galaxy candidates at redshifts
z = 11.5 − 15. These objects show compact half-light radii of R1/2 ∼ 50 − 200pc, stellar masses of
M⋆ ∼ 107−108M⊙, and star-formation rates of SFR ∼ 0.1−1 M⊙ yr−1
. Our search finds no candidates
at 15 < z < 20, placing upper limits at these redshifts. We develop a forward modeling approach to
infer the properties of the evolving luminosity function without binning in redshift or luminosity that
marginalizes over the photometric redshift uncertainty of our candidate galaxies and incorporates the
impact of non-detections. We find a z = 12 luminosity function in good agreement with prior results,
and that the luminosity function normalization and UV luminosity density decline by a factor of ∼ 2.5
from z = 12 to z = 14. We discuss the possible implications of our results in the context of theoretical
models for evolution of the dark matter halo mass function.
Seminar of U.V. Spectroscopy by SAMIR PANDASAMIR PANDA
Spectroscopy is a branch of science dealing the study of interaction of electromagnetic radiation with matter.
Ultraviolet-visible spectroscopy refers to absorption spectroscopy or reflect spectroscopy in the UV-VIS spectral region.
Ultraviolet-visible spectroscopy is an analytical method that can measure the amount of light received by the analyte.
Nutraceutical market, scope and growth: Herbal drug technologyLokesh Patil
As consumer awareness of health and wellness rises, the nutraceutical market—which includes goods like functional meals, drinks, and dietary supplements that provide health advantages beyond basic nutrition—is growing significantly. As healthcare expenses rise, the population ages, and people want natural and preventative health solutions more and more, this industry is increasing quickly. Further driving market expansion are product formulation innovations and the use of cutting-edge technology for customized nutrition. With its worldwide reach, the nutraceutical industry is expected to keep growing and provide significant chances for research and investment in a number of categories, including vitamins, minerals, probiotics, and herbal supplements.
13. Two flavours
∷ Synthetic long reads
: Still needs an Illumina short-read sequencer
: Molecular biology tricks + local assembly / constraints
: Illumina SLR10X Genomics, Dovetail
∷ Genuine long reads
: The real deal - no tricks!
: Pacific Biosciences, Oxford Nanopore
15. Adapted from http://www.nature.com/nbt/journal/v32/n3/abs/nbt.2833.html
Synthetic long reads
1. Genomic DNA
2. Shear ~10 kbp fragments
3. Dilute ~500 fragments per well
4. Amplify
5. Shear to ~ 500 bp fragments
6. Barcode (384)
7. Short read sequencing
8. De-barcode into pools
9. De novo assemble each pool
10. Get ~500 x 10 kbp “long reads”
30. Nanopore - technology
Signal is measured
from 5 bases
Timing is irregular
Base modifications
do alter the signal
31. Nanopore - reads
2D - normal
1D complement
1D template
2D - full
hairpin
complement
template
32. Nanopore - read lengths
Read length is not limited
by technology but by
library preparation.
Can get >100kbp reads.
But not trivial to do so!
Read length
33. Nanopore - error rate
∷ 5-mer errors
∷ Homopolymer
issues
∷ Not modelling
base mods yet
∷ Changes with
pore & motor
enzyme
Percent identity (aligned)
34. MinION - applications
∷ Same as PacBio plus....
∷ Portable sequencing
: in the field eg. Josh Quick in Guinea for Ebola
: in hospitals - infection control
: monitoring - water/food supply, production facilities
: at the GP - pathogen test in 10 min from blood prick?
: spit in a home device every morning?
37. “Read until”
∷ Can access events/bases during reading
: remember reads are long 40 kbp
: examine first 100 bp say (40 bp/sec currently)
: can decide to stop reading and eject molecule!
∷ This is a killer app!
: only want pathogens? eject if human DNA
: only want exome? eject if not exonic looking
: controlled with Python code
38. “Fast mode”
∷ 2015 / MkI
: enzyme deliberately slowed down - ASIC can’t keep up!
: ~40 bases / sec / channel
: 40 bp x 24 h x 512 channels ~ 2 Gbp
∷ 2016 / MkII
: new ASIC with ~3000 channels x 500 bp / sec
: 500 bp x 1 week x 3000 channels ~ 900 Gbp
∷ PromethION
: has 48 flow cells . . . ~ 400 Tbp / week !?!
40. A new business model
∷ No capital or reagent costs
: Instrument will be free
: Flow cells will be free
: Only pay for what you want to sequence
: Min. $20 and ~$1000 for a 100x human genome
∷ But I’ll scam the system!
: Flowcell stats sent back to base
: Won’t send you new flow cells if they look unused
42. Some things never change
∷ Don’t worry!
: 50% of our job will always be converting file formats ☺
∷ HDF5 - Hierarchial Data Format
: groups, multi-dimensional, indexed, random access
: Pacbio and Nanopore produce .h5 files
∷ Can extract FASTQ from HDF5 easily
43. New work patterns
∷ Less aligning to reference, more de novo
∷ Learning to work with haplotypes
∷ More graph-based methods
∷ Complex structural variant information: VCF4.2
∷ Smaller data files ?
44. Read alignment
∷ Reads are 15% error, mainly indels
∷ PacBio
: BLASR, Daligner, MHAP
: BWA MEM: bwa mem -x pacbio
∷ Nanopore
: BWA MEM: bwa mem -x ont
: MarginAlign - sum over possible alignments
45. De novo assembly
∷ Pacbio
: Very modular system: Overlap, Layout, Consensus
: Polylploid aware assembly possible
∷ MinION
: Higher error rate, but rapid community development
: NanoCorrect + Celera Assembler + NanoPolish
∷ For existing assemblies
: Gap-filling, scaffolding/breaking, hybrid assembly
46. Streaming analysis
∷ We are not going to keep all this data
: Need to think streaming analyses
∷ Extract info we need and discard
: Cheaper to resequence?
∷ Lots of new applications
: Much scope for method development
: Even more scope for biological discovery
48. Exciting times!
∷ Genomics is changing all the time
: but science will press on
∷ Pipelines are often short lived
: except maybe clinical / accredited ones
∷ Bioinformaticians need to be able to adapt
: focus on key skills not specific apps
49. Acknowledgments
Doherty Institute
∷ Tim Stinear
∷ Ben Howden
VLSCI
∷ Andrew Lonie
∷ Helen Gardiner
∷ Dieter Bulach
∷ Simon Gladman
Twitter/Blogs
∷ Nick Loman
∷ Lex Nederbragt
∷ Keith Robison
∷ Konrad Paszkiewicz
Oxford Nanopore
∷ Clive Brown
∷ Gordon Sanghera
Millennium Science
∷ Paul Lacaze
∷ Matthew Frazer
∷ Rubber Chicken
Pacific Biosciences
∷ Siddarth Singh
∷ Jason Chin
∷ Stephen Turner
All the other CIs on the LIEF grant