Contamination Detection and Taxonomic Confirmation with Magic-BLAST

Sean La, Intern, Simon Fraser University, laseanl@sfu.ca
Cheryl Ames, Ph.D., Research Fellow, Smithsonian National Museum of Natural History, amesc@si.edu
Ben Busby, Ph.D., Genomics Outreach Coordinator, NCBI, ben.busby@nih.gov

1. Image taken from https://media1.britannica.com/eb-media/82/126182-004-A23C1423.jpg
Motivation
The scientific community wants to detect viruses in SRA.
SIDEARM workflow: SRR + BLASTDB of viruses -> Magic-BLAST (optimized version of BLAST) -> BAM alignments to viruses -> statistics, viral contigs
2. Image taken from https://github.com/NCBI-Hackathons/Virus_Detection_SRA
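The "statistics" box in the workflow above can be illustrated with a per-reference hit count. This is only a minimal sketch, not SIDEARM's actual implementation: the toy SAM file, the read names, and the reference names `virusA` and `virusB` are all hypothetical stand-ins for Magic-BLAST output against a virus database, and `count_hits` is a helper made up for this example.

```shell
#!/bin/sh
# Toy SAM file standing in for Magic-BLAST output against a virus database.
# All read and reference names are hypothetical.
stats_dir=$(mktemp -d)
viral_sam="$stats_dir/toy.viral.sam"
{
  printf '@SQ\tSN:virusA\tLN:5000\n'
  printf '@SQ\tSN:virusB\tLN:7000\n'
  printf 'r1\t0\tvirusA\t10\t255\t8M\t*\t0\t0\tACGTACGT\t*\n'
  printf 'r2\t16\tvirusA\t95\t255\t8M\t*\t0\t0\tACGTTTGT\t*\n'
  printf 'r3\t0\tvirusB\t400\t255\t8M\t*\t0\t0\tGGGTACGT\t*\n'
  printf 'r4\t4\t*\t0\t0\t*\t*\t0\t0\tTTTTACGT\t*\n'
} > "$viral_sam"

# Count alignment records per reference (SAM column 3).
# Header lines start with "@"; records without a reference carry "*".
# This counts records, so a read with several alignments is counted more than once.
count_hits() {
  awk -F '\t' '!/^@/ && $3 != "*" { n[$3]++ }
               END { for (ref in n) print ref "\t" n[ref] }' "$1" | sort
}

hits=$(count_hits "$viral_sam")
printf '%s\n' "$hits"
```

On real BAM output, `samtools idxstats` on a sorted, indexed file reports similar per-reference counts.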

Detect bacteria in metagenomics samples (1)
Identify proteins (2)
Detect plasmid sequences in bacterial reads (3)
Detect mitochondrial DNA (4)
1. Image taken from http://www.newhealthguide.org/images/19999893/image001.jpg
2. Image taken from https://3c1703fe8d.site.internapcdn.net/newman/gfx/news/hires/2014/auroraakinase.png
3. Image taken from https://upload.wikimedia.org/wikipedia/commons/thumb/c/cf/Plasmid_%28english%29.svg/300px-Plasmid_%28english%29.svg.png
4. Image taken from http://www.penrules.com/_Media/art_mito_300.png

Step 1: Generate A. alata NGS data sets (n=15)
Illumina.1.fasta = short reads (forward)
Illumina.2.fasta = short reads (reverse)
Pacbio.fasta = long reads
Step 2: Trim adaptors from NGS data sets
-- trimmomatic Illumina.fasta > trimmed.Illumina.fasta
-- removesmartbell.sh Pacbio.fasta > trimmed.pacbio.fasta
Step 3: Generate A. alata mtDNA database from the 8 A. alata mitochondrial chromosomes (GenBank)
makeblastdb -in alatina_mitochondria.fasta -out ala_mito_db -dbtype nucl
Step 4: Create Magic-BLAST report (.sam) of mapped and unmapped reads
magicblast -query trimmed.read.fasta -db ala_mito_db -splice F -perc_identity 90 -paired > trimmed.read.sam
Step 5: Extract reads that don't map to the mtDNA database
awk '$4 == 0 {print $0}' trimmed.read.sam > trimmed.read.nomtDNA.sam
Step 6: Convert mitochondria-free files from SAM to FASTA format
samtools fasta trimmed.read.nomtDNA.sam > trimmed.read.nomtDNA.fasta
Step 7: Pipe mitochondria-free reads (n=15) into downstream pipelines, e.g., genome assembly
trimmed.read.nomtDNA.fasta
box jellyfish A. alata
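The filtering and conversion in Steps 5 and 6 can be tried on a toy example without any bioinformatics tools installed. The sketch below is an illustration under stated assumptions, not the authors' pipeline: the SAM records and the reference name `mito_chr1` are hypothetical, `filter_unmapped` and `sam_to_fasta` are helpers made up for this example, and it assumes unaligned reads are reported with the SAM "unmapped" flag bit (0x4) set. Unlike the POS-based `awk` test in Step 5, it selects on the FLAG column and keeps the header lines.

```shell
#!/bin/sh
# Toy SAM file: two reads aligned to a mitochondrial reference, two unaligned.
# All names and sequences are hypothetical.
tmpdir=$(mktemp -d)
sam="$tmpdir/toy.sam"
{
  printf '@HD\tVN:1.6\tSO:unsorted\n'
  printf '@SQ\tSN:mito_chr1\tLN:1000\n'
  printf 'read1\t0\tmito_chr1\t101\t255\t8M\t*\t0\t0\tACGTACGT\t*\n'
  printf 'read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGGCCAA\t*\n'
  printf 'read3\t16\tmito_chr1\t250\t255\t8M\t*\t0\t0\tGGGGCCCC\t*\n'
  printf 'read4\t4\t*\t0\t0\t*\t*\t0\t0\tAATTAATT\t*\n'
} > "$sam"

# Keep header lines plus records whose FLAG (column 2) has the 0x4 bit set.
# POSIX awk has no bitwise AND, so the bit is tested arithmetically.
filter_unmapped() {
  awk -F '\t' '/^@/ { print; next }
               int($2 / 4) % 2 == 1 { print }' "$1"
}

# Write each alignment record as a two-line FASTA entry (name, sequence).
sam_to_fasta() {
  awk -F '\t' '!/^@/ { print ">" $1; print $10 }' "$1"
}

nomito="$tmpdir/toy.nomtDNA.sam"
filter_unmapped "$sam" > "$nomito"
sam_to_fasta "$nomito" > "$tmpdir/toy.nomtDNA.fasta"

unmapped_count=$(grep -c '^>' "$tmpdir/toy.nomtDNA.fasta")
echo "reads kept: $unmapped_count"
```

With samtools available, `samtools view -h -f 4` performs the same flag-based selection, and `samtools fasta` handles the conversion as in Step 6.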

Neisseria meningitidis genome (ERR1865236)
BLAST DB of known bacterial plasmids
SIDEARM
1. Image taken from https://upload.wikimedia.org/wikipedia/commons/thumb/c/cf/Plasmid_%28english%29.svg/300px-Plasmid_%28english%29.svg.png

Viral metagenome (ERR1301508, à la Chris O'Sullivan)
BLAST DB of complete bacterial genomes
SIDEARM
On SRA….
Using SIDEARM…

Greg Boratyn
Mike Muchow
Paul Cantalupo
Alex Goncearenco
Unix.systems
