Long read sequencing technologies such as Pacific Biosciences and Oxford Nanopore offer several advantages over short read technologies. They can generate reads of over 100 kb, which helps untangle repeats, complete genomes and phase haplotypes. While PacBio is more established, Nanopore offers the potential for real-time, portable sequencing. Both require bioinformatics tools and analysis approaches to be adapted. The new technologies will change genomics jobs by shifting work towards streaming analysis and by rewarding the ability to adapt as technologies change.
Long read sequencing - LSCC lab talk - fri 5 june 2015
22. Nanopore - types of reads
“1D reads”
∷ Template 1D
﹕ only fwd strand
∷ Complement 1D
﹕ only rev strand
“2D reads”
∷ Normal 2D
﹕ mostly fwd, some rev
∷ Full 2D
﹕ most of fwd & rev
﹕ these are high quality
23. Nanopore - read lengths
Read length is not limited by technology but by library preparation.
Can get >100kbp reads.
[Figure: plot with axis "Read length"]
24. Nanopore - error rate
∷ 5-mer errors
∷ Not modelling base mods yet
∷ Basically where PacBio was a few years ago!
[Figure: plot with axis "Percent identity (aligned)"]
25. MinION - applications
∷ Same as PacBio plus....
∷ Portable sequencing
: in the field, e.g. Josh Quick in Guinea for Ebola
: in hospitals - infection control
: monitoring - water/food supply, production facilities
: at the GP - pathogen test in 10 min from blood prick?
: spit in a home device every morning?
26. MinION - bioinformatics
∷ Event space -vs- base space
: MinION MkI - base calling in cloud (Metrichor)
: MinION MkII - on device?
: PromethION - can choose on-device add-on
∷ Mostly 3rd-party tools - lots of activity
: poretools, poRe
: minoTour, nanoPolish
29. “Read until”
∷ Can access events/bases during reading
: remember reads are long, e.g. 40 kbp
: examine first 100 bp say
: can decide to stop reading and eject molecule!
∷ This is a killer app!
: only want pathogens? eject if human DNA
: only want exome? eject if not exonic looking
: controlled with Python code
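
The decision described on this slide can be sketched in a few lines of Python. This is a toy illustration only, not the MinION API: `should_eject`, `HOST_KMERS`, the k-mer size and the 50% threshold are all hypothetical choices, and a real implementation would act on the instrument's live event or base stream rather than on strings.

```python
# Toy "read until" decision: look at the first bases of a read and
# decide whether to keep sequencing or eject the molecule.
# All names here are hypothetical; this is not the MinION API.

K = 8             # k-mer size used for the lookup (assumed)
PREFIX_LEN = 100  # number of leading bases examined, as on the slide


def kmers(seq, k=K):
    """Return the set of all k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def should_eject(prefix, unwanted_kmers, threshold=0.5, k=K):
    """Eject if at least `threshold` of the prefix k-mers look unwanted."""
    observed = kmers(prefix[:PREFIX_LEN], k)
    if not observed:
        return False  # too short to judge, so keep reading
    hits = len(observed & unwanted_kmers)
    return hits / len(observed) >= threshold


# Example: pretend this is an index built from human DNA.
human_like = "ACGTTGCAAGGCTTAACCGGTTAGCATCGATCGGATCCATG" * 3
HOST_KMERS = kmers(human_like)

read_from_host = human_like[5:105]
read_from_pathogen = "TTTTTTTTGGGGGGGGCCCCCCCCAAAAAAAA" * 4

print(should_eject(read_from_host, HOST_KMERS))      # host-like read
print(should_eject(read_from_pathogen, HOST_KMERS))  # non-host read
```

Only the first 100 bases are inspected, so the cost of the decision stays small compared with the time it would take to read a 40 kbp molecule to the end.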
31. A new business model
∷ No capital or reagent costs
: Instrument will be free
: Flow cells will be free
: Only pay for what you want to sequence
: Min. $20 and ~$1000 for a 100x human genome
∷ But I’ll scam the system!
: Flowcell stats sent back to base
: Won’t send you new flow cells if they look unused
33. Some things never change
∷ Don’t worry!
: 50% of our job will always be converting file formats ☺
∷ But things are improving
: PacBio: HDF5
: MinION: HDF5 / FAST5
∷ Can convert .h5/.hd5 to .fastq easily
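
As a rough illustration of the output side of such a conversion, the sketch below formats one read as a FASTQ record. Reading the HDF5 input is deliberately left out, because the dataset paths inside those files differ between software versions. The function names and the example read are hypothetical, and Phred+33 quality encoding is assumed.

```python
# Sketch of the FASTQ side of a format conversion.
# Reading the HDF5/FAST5 input is deliberately left out: dataset paths
# inside those files differ between software versions.


def phred_to_ascii(quals, offset=33):
    """Encode integer Phred scores as a FASTQ quality string (Phred+33)."""
    return "".join(chr(min(q, 93) + offset) for q in quals)


def fastq_record(name, seq, quals):
    """Format one read as a four-line FASTQ record."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    return "@{}\n{}\n+\n{}\n".format(name, seq, phred_to_ascii(quals))


# Hypothetical read, as it might look after extraction from an HDF5 file.
record = fastq_record("read_1", "ACGTAC", [10, 12, 9, 30, 7, 15])
print(record, end="")
```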
34. Read alignment
∷ PacBio
: BLASR - Basic Local Alignment + Successive Refinement
: BWA MEM - bwa mem -x pacbio
∷ MinION
: MarginAlign - sum over possible alignments, HMMs
: BWA MEM - bwa mem -x ont2d
∷ Need to modify variant caller parameters
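
The two BWA MEM invocations can be wrapped in a small helper. This is a sketch under assumptions: the file names are placeholders, the thread count is arbitrary, and the helper itself is hypothetical. The preset names are the ones bwa lists in its own help text, where the Oxford Nanopore preset is spelled `ont2d`.

```python
import shlex

# bwa mem presets for long reads; "ont2d" is the preset name that
# bwa itself uses for Oxford Nanopore 2D reads.
PRESETS = {"pacbio": "pacbio", "minion": "ont2d"}


def bwa_mem_command(reference, reads, platform, threads=4):
    """Build the argument list for aligning long reads with bwa mem."""
    preset = PRESETS[platform]  # raises KeyError for unknown platforms
    return ["bwa", "mem", "-x", preset, "-t", str(threads), reference, reads]


# File names below are placeholders.
cmd = bwa_mem_command("ref.fasta", "reads.fastq", "minion")
print(" ".join(shlex.quote(part) for part in cmd))

# To run it (needs bwa on PATH and a reference indexed with `bwa index`):
# import subprocess
# with open("aln.sam", "w") as out:
#     subprocess.run(cmd, stdout=out, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when file names contain spaces.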
35. De novo assembly
∷ PacBio
: HGAP, HGAP2, FALCON, SPAdes, Celera Assembler
∷ MinION
: SPAdes, Celera Assembler, NanoPolish
∷ Lots of convergence
: Similar error models (indels)
: Long reads, lower coverage - back to the future!
36. Streaming analysis
∷ We are not going to keep all this data
∷ Extract info we need and discard
∷ Cheaper to resequence?
∷ Need to think streaming analyses
∷ Lots of new applications
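
One way to picture "extract info we need and discard" is a generator that yields one read at a time while only a handful of counters are kept. This is a minimal sketch, assuming four-line FASTQ input; the function names and the choice of statistics are hypothetical.

```python
import io


def stream_fastq(handle):
    """Yield (name, sequence) one read at a time from a FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line
        handle.readline()  # quality line, not needed here
        yield header.strip()[1:], seq


def running_summary(reads):
    """Keep only counters; each read is dropped once it has been counted."""
    stats = {"reads": 0, "bases": 0, "longest": 0, "gc": 0}
    for _name, seq in reads:
        stats["reads"] += 1
        stats["bases"] += len(seq)
        stats["longest"] = max(stats["longest"], len(seq))
        stats["gc"] += seq.count("G") + seq.count("C")
    return stats


# A tiny in-memory stand-in for data arriving from a sequencer.
demo = io.StringIO("@r1\nACGT\n+\n!!!!\n@r2\nGGCCAT\n+\n!!!!!!\n")
summary = running_summary(stream_fastq(demo))
print(summary)
```

Memory use stays constant no matter how many reads pass through, which is the property that matters if the raw data will not be kept.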
38. Exciting times!
∷ Genomics is changing all the time
: new technologies
: changing attributes/properties of current technology
∷ Bioinformaticians need to be able to adapt
: focus on key skills not specific apps
∷ Pipelines are often short lived
: except maybe clinical / accredited ones