The Use of K-mer Minimizers to Identify Bacterium Genomes in High Throughput DNA Sequence Data
Mackenna Galicia - UC Davis Genome Center, Bioinformatics Core
B.S. Biotechnology, UC Davis 2019
Supervisor: Matthew Settles
Abstract
My project uses a sequence analysis technique, k-mer minimizers, to identify bacteria from a shotgun genomic DNA sample. We used Bevel, a sequence similarity tool that uses a minimizer database, to compare DNA sequences against standardized reference genomes in the PATRIC whole genome bacterial database. Minimizers are representative k-mers. A k-mer is a subsequence of length k, and it becomes a minimizer when it has the minimum hash value within a window of neighboring k-mers. The set of minimizers therefore serves as a compact, comparable signature of that genomic region. The two databases are queried against each other, which produces a list of positions where two or more sequences match. I am developing two Python applications. The first processes the results of the algorithm, and the second returns a score that allows bacterial matches to be ranked. The higher the score, the better the match between the unknown bacterium and the standardized reference genome.

Acknowledgements
I would like to thank my supervisor and mentor, Matthew Settles, for proposing this research project and guiding me throughout it. I also thank Zev Kronenberg for his support and for providing me with the similarity search tool, Bevel.
Figure: Sample “Seqmatch” output
What is Bioinformatics?
● Combines elements of biology, computer science, and statistics to analyze genome sequence data
● Large genomes are difficult to sequence because of their size and complex structure, so computational methods are needed to assemble and analyze the resulting sequence data
What is Whole Genome Shotgun Sequencing?
● A quick and efficient way to sequence large genomes
● Randomly breaks the genome into small DNA fragments, which are sequenced and then reassembled by computer programs
Reads are the sequences of the small DNA fragments produced by whole genome shotgun sequencing. Overlapping reads are assembled into contiguous genomic sequences called contigs. Scaffolds consist of one or more contigs, which are typically joined by runs of Ns that represent sequencing gaps. The scaffolds are then ordered, oriented, and assembled to form complete assemblies.
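The contig/scaffold relationship can be made concrete with a minimal Python sketch that recovers the contigs of a scaffold by splitting it at runs of N. The sketch assumes that gaps are marked only by N characters. The function name `split_scaffold` and its `min_gap` parameter are illustrative and do not come from any tool used in this project.

```python
import re


def split_scaffold(scaffold: str, min_gap: int = 1) -> list[str]:
    """Split a scaffold into its contigs at runs of N (gap) characters.

    min_gap is the shortest run of N's treated as a gap. This is a
    hypothetical parameter, since assemblers differ in how they mark gaps.
    """
    gap = re.compile(f"[Nn]{{{min_gap},}}")  # e.g. "[Nn]{1,}" for min_gap=1
    # Drop empty strings left by gaps at the very start or end of the scaffold.
    return [contig for contig in gap.split(scaffold) if contig]


if __name__ == "__main__":
    print(split_scaffold("ACGTACGTNNNNNGGCCTTAANNTTT"))
```

If `min_gap` is raised to 3, the two-base "NN" run stays inside a contig and only the five-base run is treated as a gap.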
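What are k-mer minimizers?
● A hash-based sampling method that reduces the redundancy among neighboring k-mers, which share k - 1 nucleotides and are shifted by only one nucleotide position. From each window of consecutive k-mers, only the k-mer with the smallest hash value (the minimizer) is kept.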
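The windowed selection described above, which also appears as steps (A) to (C) in the k-mer minimizer caption, can be sketched in Python. This is a minimal illustration and not Bevel's implementation. The BLAKE2b-based hash, the leftmost tie-breaking rule, and the function names are assumptions chosen for clarity.

```python
import hashlib


def kmer_hash(kmer: str) -> int:
    """Deterministic 64-bit hash of a k-mer.

    This is an illustrative choice. Real tools typically use faster
    integer hash functions.
    """
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minimizers(seq: str, k: int, w: int) -> list[tuple[int, str]]:
    """Return (position, k-mer) minimizers of seq.

    For each window of w consecutive k-mers, the k-mer with the smallest
    hash is kept (ties go to the leftmost k-mer). Neighboring windows
    usually share a minimizer, and each one is recorded only once.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    hashes = [kmer_hash(km) for km in kmers]
    selected: list[tuple[int, str]] = []
    for start in range(len(kmers) - w + 1):
        best = min(range(start, start + w), key=lambda i: (hashes[i], i))
        # Chosen positions never decrease, so checking the last entry is enough.
        if not selected or selected[-1][0] != best:
            selected.append((best, kmers[best]))
    return selected


if __name__ == "__main__":
    print(minimizers("ACGTTGCATGCATGCAAGTCCGA", k=5, w=4))
```

When `w = 1`, every k-mer is kept. Larger windows keep fewer k-mers, and this reduction is where the savings in storage and comparison time come from.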
Future work to build upon this project would include:
1. Continuing to collect query minimizer scores for the remaining query-target sequence pairs
2. Correlating the results of this project with previous findings obtained in the laboratory
● The goal of this experiment is to show that minimizers are a fast means of characterizing bacterial shotgun assembly contigs
● Given assembled contigs, we can compare them to a database of whole genome sequences
● The target sequence with the most matches to a query sequence is likely from the same organism
● This minimizer approach can be used to identify unknown samples or to check for contamination (samples that contain multiple organisms)
Figure: Sample “Bevel” output
● Running Bevel
○ Store every other match (-w 2)
○ K-mer/word size of 15 (-k 15)
○ Filter matches occurring more than 10 times (-n 10)
● Tally the hits/matches and assign a score.
● Target sequences with higher scores suggest a likely match with the query organism.
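The poster does not say exactly how Bevel implements `-n`. One plausible reading of "filter matches occurring more than 10 times" is that Bevel ignores any minimizer that appears more than n times in the target database, because such repetitive k-mers produce many uninformative matches. The minimal Python sketch below assumes that reading. It takes minimizers as (position, k-mer) pairs and lists the resulting raw matches. The names `build_index`, `find_matches`, and `max_occ` are hypothetical.

```python
from collections import defaultdict


def build_index(targets: dict[str, list[tuple[int, str]]]) -> dict[str, list[tuple[str, int]]]:
    """Map each minimizer k-mer to every (target_id, position) where it occurs."""
    index: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for target_id, mins in targets.items():
        for pos, kmer in mins:
            index[kmer].append((target_id, pos))
    return index


def find_matches(query_mins: list[tuple[int, str]],
                 index: dict[str, list[tuple[str, int]]],
                 max_occ: int = 10) -> list[tuple[int, str, int]]:
    """Return (query_pos, target_id, target_pos) for each shared minimizer.

    Minimizers that occur more than max_occ times in the index are skipped
    (assumed meaning of "-n 10").
    """
    matches = []
    for qpos, kmer in query_mins:
        hits = index.get(kmer, [])
        if len(hits) > max_occ:
            continue  # too repetitive to be informative
        for target_id, tpos in hits:
            matches.append((qpos, target_id, tpos))
    return matches
```

Lowering `max_occ` discards more repetitive minimizers. This trades a little sensitivity for fewer spurious hits.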
Why use a Dot Plot?
● Makes it easy to identify long regions of strong similarity between two sequences
● Reveals insertions, deletions, and mutations as breaks or shifts in the diagonal, which can be hard to spot with other methods
● Dot plot of target sequence accn|CP005975 against query sequence NODE_2_length_654753_cov_26.8031_ID_3779
● The diagonal line of dots shows the regions of local similarity between the two sequences
● Gaps in the diagonal line represent mutations or other differences between the sequences
● Isolated dots outside the diagonal line represent random matches
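A text-only version of a classic dot plot illustrates how these patterns arise. In the Python sketch below, a `*` marks each position where the query and target share an identical k-mer. Matching regions line up along a diagonal, and chance matches appear as isolated stars. This is a minimal illustration using exact k-mers. It is not necessarily how the poster's plot was produced, and the function name is hypothetical.

```python
def dot_plot(query: str, target: str, k: int) -> list[str]:
    """Build a text dot plot.

    Row i, column j is '*' when the k-mer starting at query[i] equals the
    k-mer starting at target[j], and '.' otherwise.
    """
    target_kmers = [target[j:j + k] for j in range(len(target) - k + 1)]
    rows = []
    for i in range(len(query) - k + 1):
        qk = query[i:i + k]
        rows.append("".join("*" if tk == qk else "." for tk in target_kmers))
    return rows


if __name__ == "__main__":
    # Comparing a sequence with itself gives a single main diagonal.
    for row in dot_plot("ACGTTGCA", "ACGTTGCA", 3):
        print(row)
```

An insertion in one sequence shifts the later part of the diagonal sideways, and this shift produces the gaps described above.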
● The Bevel output provides a “raw” listing of all target sequences (and query sequences) that have more than one match to another organism's sequence
● Seqmatch is a Python application I created that takes the Bevel output for each query/target sequence pair and calculates a “minimizer score”
● The higher the score, the greater the number of “hits” (matches) between the two sequences
● The score is the sum of all matching query minimizers for each unique target/query sequence ID pair
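The scoring rule in the last bullet can be sketched in Python as follows. The sketch assumes each match has already been parsed into a (target_id, query_id, query_pos, target_pos) tuple. It is not Seqmatch's source code, and Bevel's real output format is not reproduced here. The record layout, the choice to count each query minimizer position once per pair, and the function names are all assumptions.

```python
from collections import defaultdict


def minimizer_scores(matches):
    """Score each (target_id, query_id) pair by its number of matched query minimizers.

    A query minimizer that hits several positions in the same target is
    counted once (an assumption about how the score should be defined).
    """
    positions = defaultdict(set)
    for target_id, query_id, query_pos, _target_pos in matches:
        positions[(target_id, query_id)].add(query_pos)
    return {pair: len(qpos) for pair, qpos in positions.items()}


def rank_pairs(scores):
    """Sort pairs from highest to lowest score, breaking ties by ID for stable output."""
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    example = [  # hypothetical IDs and positions
        ("target_A", "query_1", 10, 1000),
        ("target_A", "query_1", 40, 1030),
        ("target_A", "query_1", 40, 5000),  # same query minimizer, second hit
        ("target_B", "query_1", 10, 77),
    ]
    print(rank_pairs(minimizer_scores(example)))
```

Counting raw rows instead would reward targets in which one query minimizer happens to repeat. For this reason, the sketch deduplicates by query position.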
https://www.ncbi.nlm.nih.gov/nuccore/CP005975.1
https://en.wikipedia.org/wiki/Shotgun_sequencing
Figure: Sample GenBank result
● Using the highest query minimizer scores (“best matches”), we can look up the matching target sequences in NCBI GenBank by accession number to identify the unknown bacteria
Figure: K-mer minimizers
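The GenBank lookup step can be automated with a small helper that turns the best-scoring target ID into an NCBI Nucleotide link. The sketch assumes target IDs have the form `accn|ACCESSION`, as in the dot-plot example, and that scores are stored in a dictionary keyed by (target_id, query_id). The function names are hypothetical.

```python
def genbank_url(target_id: str) -> str:
    """Turn a target ID such as 'accn|CP005975' into an NCBI Nucleotide URL.

    Assumes the accession follows the last '|' in the ID.
    """
    accession = target_id.rsplit("|", 1)[-1]
    return f"https://www.ncbi.nlm.nih.gov/nuccore/{accession}"


def best_match_url(scores: dict[tuple[str, str], int], query_id: str) -> str:
    """Return the GenBank URL of the highest-scoring target for one query.

    Raises ValueError if the query has no scored targets.
    """
    candidates = {t: s for (t, q), s in scores.items() if q == query_id}
    best_target = max(candidates, key=candidates.get)
    return genbank_url(best_target)
```

For example, a best match of `accn|CP005975` leads to the same NCBI record cited on this poster.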
(A) The two sequences are broken down into their constituent k-mers.
(B) All k-mers are converted into hash values. In this example, the window size is four (r1...r4).
(C) The lowest hash value (the minimizer, or min-mer) in each window is extracted and listed.
(D) The fragments are assembled according to the four lowest minimizers to find overlapping regions.
Source: http://dx.doi.org/10.1101/008003
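Step (D) can be illustrated with a simple voting scheme. Each minimizer shared by two fragments suggests an offset (its position in fragment A minus its position in fragment B), and the most common offset indicates how the fragments overlap. This is a simplified, hypothetical sketch. The method in the cited source and in real assemblers uses more elaborate chaining, and the function name is illustrative.

```python
from collections import Counter


def estimate_overlap(mins_a, mins_b):
    """Estimate how fragment B is offset relative to fragment A.

    mins_a and mins_b are lists of (position, k-mer) minimizers. Every
    shared minimizer votes for the offset pos_a - pos_b. Returns
    (offset, votes) for the most common offset, or None if the fragments
    share no minimizers.
    """
    positions_b = {}
    for pos, kmer in mins_b:
        positions_b.setdefault(kmer, []).append(pos)
    votes = Counter(
        pos_a - pos_b
        for pos_a, kmer in mins_a
        for pos_b in positions_b.get(kmer, [])
    )
    if not votes:
        return None
    return votes.most_common(1)[0]
```

An offset of 7 supported by two shared minimizers means fragment B most likely begins 7 bases into fragment A, so the fragments can be joined at that point.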
