This document discusses nanopore sequencing technology. It provides an overview of nanopore sequencing, including what nanopore sequencing is, the types of nanopores used (biological and solid state), advantages such as not requiring amplification or labeling, and challenges with processing large amounts of raw data. The document then examines raw nanopore data and the initial steps needed to process the data, including creating a training data set to predict genomic bases and releasing analysis packages to the community.
Oxford Nanopore was founded in 2005 as Oxford NanoLabs by Dr. Gordon Sanghera, Dr. Spike Willcocks and Professor Hagan Bayley. The idea of nanopore sequencing dates back to the 1990s, when Church et al. and Deamer and Akeson separately proposed that DNA could be sequenced using nanopore sensors.
Nanopore sequencing is a unique, scalable technology that enables direct, real-time analysis of long DNA or RNA fragments. It works by monitoring changes to an electrical current as nucleic acids are passed through a protein nanopore. The resulting signal is decoded to provide the specific DNA or RNA sequence.
Nanopore Sequencing
1. ADAM University of Kyrgyz Republic
Faculty of Medicine
Nanopore Sequencing
Student: Tashfeen Ahmad
Group: GM_4
Teacher: Prof. Domashov Iliya
2. Nanopore Sequencing
Outline
• Nanopore Sequencing Technology
• Raw Data
• Transformations and Raw Data Processing
• Toward Producing a Basecaller
• Future Directions
3. What is Nanopore Sequencing?
• Determine the sequence of DNA fragments by passing DNA through a protein (or other) pore in a membrane
• Nanopore sequencing is a third-generation[1] approach used in the sequencing of biopolymers, specifically polynucleotides in the form of DNA or RNA.
4. What is Nanopore Sequencing?
Oxford Nanopore became the first company to provide a commercially available nanopore sequencer in 2015 (announced in 2012; available to the community through an early-access programme in 2014)
5. What is Nanopore Sequencing?
Nanopore is a disruptive technology:
• Sequencer Size
• Read Length
• Potential direct RNA sequencing
• Biology Problem with Data Velocity Issues
• Currently ~400 GB per 24 hours needs to be processed
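As a rough illustration of the data velocity figure above, the following sketch converts ~400 GB per 24 hours into the average rate a processing pipeline would need to sustain. It assumes decimal units (1 GB = 10^9 bytes); the function name is hypothetical and not part of any nanopore software.

```python
# Back-of-envelope: sustained throughput implied by ~400 GB per 24 hours.
# Assumption: decimal units, 1 GB = 1e9 bytes and 1 MB = 1e6 bytes.

def sustained_rate_mb_per_s(gigabytes: float, hours: float) -> float:
    """Average data rate in MB/s needed to process `gigabytes` within `hours`."""
    total_bytes = gigabytes * 1e9
    seconds = hours * 3600
    return total_bytes / seconds / 1e6

rate = sustained_rate_mb_per_s(400, 24)
print(f"{rate:.2f} MB/s")  # prints "4.63 MB/s"
```

This is an average; bursts above it would need buffering or extra headroom.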
6. Advantages
• Using nanopore sequencing, a single molecule of DNA or RNA can be sequenced without the need for PCR amplification or chemical labeling of the sample.
• At least one of these steps is necessary in any previously developed sequencing approach.
• Nanopore sequencing has the potential to offer relatively low-cost genotyping, high mobility for testing, and rapid processing of samples with the ability to display results in real time.
7. Advantages
• Publications on the method outline its use in rapid identification of viral pathogens, monitoring Ebola, environmental monitoring, food safety monitoring, human genome sequencing, plant genome sequencing, monitoring of antibiotic resistance, haplotyping and other applications.
9. Biological Nanopore
• Also called transmembrane protein channels
• Usually inserted into a substrate such as planar lipid bilayers, liposomes or other polymer films
• E.g. alpha-hemolysin, MspA
13. Processing Raw Data
• First step is to create a training data set
• Starting from provided raw data, followed by processing to produce a useful data set for training to predict genomic bases
• Goal is to release this package to the community for greater access to create training data sets for this data
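A minimal sketch of what turning raw signal into training rows could look like. It is not the package described on the slide. It assumes median/MAD normalization of the raw current and one known base label per signal segment; the function names and the toy read are hypothetical.

```python
from statistics import median

def med_mad_normalize(signal):
    """Shift by the median and scale by the median absolute deviation (MAD)."""
    med = median(signal)
    mad = median([abs(x - med) for x in signal])
    if mad == 0:
        raise ValueError("signal has zero spread; cannot normalize")
    return [(x - med) / mad for x in signal]

def build_training_rows(signal, segments, bases):
    """Pair each (start, end) segment's mean normalized level with its base label.

    Assumes every segment is non-empty and that labels come from a trusted
    reference alignment (not shown here).
    """
    if len(segments) != len(bases):
        raise ValueError("need exactly one base label per segment")
    norm = med_mad_normalize(signal)
    rows = []
    for (start, end), base in zip(segments, bases):
        chunk = norm[start:end]
        rows.append((sum(chunk) / len(chunk), base))
    return rows

# Hypothetical toy read: three flat current levels, one per base.
raw = [80, 82, 81, 100, 101, 99, 60, 61, 59]
rows = build_training_rows(raw, [(0, 3), (3, 6), (6, 9)], "ACG")
for level, base in rows:
    print(f"{base}: {level:+.3f}")
```

Each row is a (feature, label) pair that a classifier could be trained on; a real data set would carry more features per segment, such as its standard deviation and length.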
19. Nanopore Raw Correction
Correct insertions:
1. Center on insertion
2. Expand to neighboring regions
3. Segment using mean changepoint
[Figure: stacked base-letter tracks (A, C, G, T, D) illustrating the correction; axis values 6523000 and 4652400]
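The three correction steps (center on the insertion, expand to neighboring regions, segment using a mean changepoint) can be sketched as follows. This is an illustrative single-changepoint version that minimizes the within-segment sum of squares, not the authors' implementation; the function names, window size and signal values are assumptions.

```python
def best_mean_changepoint(values):
    """Return the split index that minimizes the within-segment sum of squares.

    The returned index i splits values into values[:i] and values[i:].
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to place a changepoint")
    total = sum(values)
    total_sq = sum(v * v for v in values)
    best_idx, best_cost = 1, float("inf")
    left_sum = left_sq = 0.0
    for i in range(1, n):
        left_sum += values[i - 1]
        left_sq += values[i - 1] ** 2
        right_sum = total - left_sum
        right_sq = total_sq - left_sq
        # Sum of squared deviations of each side from its own mean.
        cost = (left_sq - left_sum ** 2 / i) + (right_sq - right_sum ** 2 / (n - i))
        if cost < best_cost:
            best_idx, best_cost = i, cost
    return best_idx

def resegment_window(signal, center, half_width):
    """Center on a position, expand by half_width each side, and re-split.

    Returns the changepoint as an index into the full signal.
    """
    lo = max(0, center - half_width)
    hi = min(len(signal), center + half_width)
    return lo + best_mean_changepoint(signal[lo:hi])

# Hypothetical current trace with a level shift between samples 3 and 4.
signal = [50.1, 49.8, 50.3, 50.0, 70.2, 69.9, 70.1, 70.0]
split = resegment_window(signal, center=4, half_width=4)
print(split)  # prints 4
```

Placing several boundaries inside one window would need the split applied recursively or a multi-changepoint method.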
21. Toward a Basecaller
Post correction and normalization distributions
• Clearly some signal exists before complex machine learning
• ~13% accuracy achievable by nearest mean calculations
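A nearest-mean calculation can be sketched as below: each observed level is assigned the label whose model mean is closest. The four-entry model and the observed values are hypothetical. Nothing here reproduces the ~13% figure on the slide.

```python
def nearest_mean_call(level, model_means):
    """Return the label whose model mean is closest to the observed level."""
    return min(model_means, key=lambda label: abs(model_means[label] - level))

def accuracy(levels, truth, model_means):
    """Fraction of observed levels whose nearest-mean call matches the truth."""
    calls = [nearest_mean_call(x, model_means) for x in levels]
    correct = sum(c == t for c, t in zip(calls, truth))
    return correct / len(truth)

# Hypothetical normalized mean current per label.
model = {"A": -1.0, "C": -0.3, "G": 0.4, "T": 1.1}
observed = [-0.9, 0.5, 0.1, 1.0]
truth = ["A", "G", "C", "T"]

print([nearest_mean_call(x, model) for x in observed])  # ['A', 'G', 'G', 'T']
print(accuracy(observed, truth, model))                 # 0.75
```

The third observation shows the limitation: when the distributions of two labels overlap, the closest mean is often the wrong one, which fits the slide's point that this baseline finds only some of the signal.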
22. Toward a Basecaller
• Oxford Nanopore has recently upgraded to an RNN basecaller which produces reads with ~85% accuracy, though it is still computationally intensive
• The larger sequencer (PromethION) produces 12 Tb of data in 48 hours (up to 1.44 GBps), with the current machine requiring ~1 kW.
23. Toward a Basecaller
Current event (base) segmentation is done using an FPGA t-test and all computation (RNN) is completed on the mean and SD of these segments
We are currently working to integrate basecalling and segmentation directly from the raw data via an RNN with potentially vast improvements in accuracy as well as speed which will become increasingly important with throughput improvements.
[Figure: plot of TPRate against FPRate (both 0 to 1), with a log10 FDR scale]
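To make the t-test segmentation idea concrete, here is a software sketch: a Welch t-statistic is computed between adjacent sliding windows, peaks above a threshold become event boundaries, and each resulting segment is summarized by its mean and SD. This is not the FPGA implementation mentioned above. The window size, threshold and signal are assumed values chosen for illustration.

```python
from statistics import mean, stdev

def t_statistic(left, right):
    """Absolute Welch t-statistic between two windows of samples."""
    var_l, var_r = stdev(left) ** 2, stdev(right) ** 2
    denom = (var_l / len(left) + var_r / len(right)) ** 0.5
    if denom == 0:
        return 0.0 if mean(left) == mean(right) else float("inf")
    return abs(mean(left) - mean(right)) / denom

def detect_boundaries(signal, window=4, threshold=8.0):
    """Indices where adjacent windows differ most strongly (local peaks in t)."""
    scores = {}
    for i in range(window, len(signal) - window + 1):
        scores[i] = t_statistic(signal[i - window:i], signal[i:i + window])
    boundaries = []
    for i, t in scores.items():
        is_peak = t >= scores.get(i - 1, 0.0) and t > scores.get(i + 1, 0.0)
        if t > threshold and is_peak:
            boundaries.append(i)
    return boundaries

def segment_features(signal, boundaries):
    """Mean and SD of each segment: the per-event inputs a basecaller would see."""
    edges = [0] + boundaries + [len(signal)]
    return [(mean(signal[a:b]), stdev(signal[a:b])) for a, b in zip(edges, edges[1:])]

# Hypothetical trace with one level change at sample 6.
trace = [50.0, 50.2, 49.8, 50.1, 49.9, 50.0, 70.1, 69.9, 70.0, 70.2, 69.8, 70.0]
boundaries = detect_boundaries(trace)
features = segment_features(trace, boundaries)
print(boundaries)  # prints [6]
```

Collapsing each event to two numbers is what the integrated approach described above would avoid, since an RNN reading raw samples can use information that the mean and SD discard.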
24. Challenges
• Data velocity: the basecaller must be able to keep up with the increasing speed of the data
• Accuracy: the basecaller must be accurate enough to provide meaningful biological insight
• Adaptability: we would like to be able to interrogate the data in order to assess confidence as well as possible alterations outside of the given model
25. Future Directions
• Produce 1D basecalls on par with current algorithms (~70-80%)
• Exploring architectures and pre-processing
• Investigate base alterations (methylation, acetylation, etc.) via encoding layers
• Release package to create raw data training sets and provide QC metrics for raw reads.