Sean La, Intern, Simon Fraser University, laseanl@sfu.ca
Cheryl Ames, Ph.D., Research Fellow, Smithsonian National Museum of Natural History, amesc@si.edu
Ben Busby, Ph.D., Genomics Outreach Coordinator, NCBI, ben.busby@nih.gov
Motivation
The scientific community wants to detect viruses in SRA.
SIDEARM (workflow diagram): SRR + BLAST DB of viruses -> Magic-BLAST (optimized version of BLAST) -> BAM alignments to viruses -> statistics, viral contigs
Image credits: 1. https://media1.britannica.com/eb-media/82/126182-004-A23C1423.jpg; 2. https://github.com/NCBI-Hackathons/Virus_Detection_SRA
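The slide does not say how the "statistics" box of the SIDEARM diagram is computed. One simple statistic is the number of aligned reads per viral reference. The following is a minimal sketch of that count using plain awk on SAM-formatted alignments, not SIDEARM's actual code. The function name `count_hits_per_reference`, the file `toy_virus.sam`, and the reference names `virusA` and `virusB` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count aligned reads per reference in a SAM file.
# Assumes the reference name (RNAME) is column 3 and "*" means "no alignment".
count_hits_per_reference() {
    awk -F '\t' '
        /^@/ { next }                       # skip SAM header lines
        $3 != "*" { hits[$3]++ }            # tally reads by reference name
        END { for (ref in hits) print ref "\t" hits[ref] }
    ' "$1" | LC_ALL=C sort
}

# Toy input with made-up reads and reference names.
printf '@HD\tVN:1.6\n' > toy_virus.sam
printf 'r1\t0\tvirusA\t10\t60\t4M\t*\t0\t0\tACGT\tIIII\n' >> toy_virus.sam
printf 'r2\t0\tvirusB\t25\t60\t4M\t*\t0\t0\tGGCC\tIIII\n' >> toy_virus.sam
printf 'r3\t16\tvirusA\t40\t60\t4M\t*\t0\t0\tTTAA\tIIII\n' >> toy_virus.sam
printf 'r4\t4\t*\t0\t0\t*\t*\t0\t0\tCCCC\tIIII\n' >> toy_virus.sam

count_hits_per_reference toy_virus.sam
```

On the toy input this reports two reads for `virusA` and one for `virusB`; the unaligned read `r4` is not counted. Real BAM output would first need converting to SAM text, for example with `samtools view`.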
Detect bacteria in metagenomics samples (image 1)
Identify proteins (image 2)
Detect plasmid sequences in bacterial reads (image 3)
Detect mitochondrial DNA (image 4)
Image credits: 1. http://www.newhealthguide.org/images/19999893/image001.jpg; 2. https://3c1703fe8d.site.internapcdn.net/newman/gfx/news/hires/2014/auroraakinase.png; 3. https://upload.wikimedia.org/wikipedia/commons/thumb/c/cf/Plasmid_%28english%29.svg/300px-Plasmid_%28english%29.svg.png; 4. http://www.penrules.com/_Media/art_mito_300.png
Box jellyfish (A. alata)
Step 1: Generate A. alata NGS data sets (n=15)
    Illumina.1.fasta = short reads (forward)
    Illumina.2.fasta = short reads (reverse)
    Pacbio.fasta = long reads
Step 2: Trim adaptors from NGS data sets (commands abbreviated as on the slide)
    trimmomatic Illumina.fasta > trimmed.Illumina.fasta
    removesmartbell.sh Pacbio.fasta > trimmed.pacbio.fasta
Step 3: Generate A. alata mtDNA database from the 8 A. alata mitochondrial chromosomes (GenBank)
    makeblastdb -in alatina_mitochondria.fasta -out ala_mito_db -dbtype nucl
Step 4: Create Magic-BLAST report (.sam) of mapped and unmapped reads
    magicblast -query trimmed.read.fasta -db ala_mito_db -splice F -perc_identity 90 -paired > trimmed.read.sam
Step 5: Extract reads that don't map to the mtDNA database
    awk '$4 == 0 {print $0}' trimmed.read.sam >> trimmed.read.nomtDNA.sam
Step 6: Convert mitochondria-free files from SAM to FASTA format
    samtools fasta trimmed.read.nomtDNA.sam > trimmed.read.nomtDNA.fasta
Step 7: Pipe mitochondria-free reads (n=15) into downstream pipelines, e.g., genome assembly
    trimmed.read.nomtDNA.fasta
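Step 5 of the workflow above selects unmapped reads by testing the POS column, and Step 6 converts them to FASTA. As an alternative illustration, and not the authors' script, the sketch below does both in one pass by testing the SAM FLAG field instead (bit 0x4 marks an unmapped read in the SAM specification) and skipping header lines. The function name `unmapped_to_fasta` and the toy file names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the unmapped reads of a SAM file as FASTA.
# Assumes standard SAM columns: 1 = read name, 2 = FLAG, 10 = sequence.
unmapped_to_fasta() {
    awk -F '\t' '
        /^@/ { next }                       # skip SAM header lines
        int($2 / 4) % 2 == 1 {              # FLAG bit 0x4 set: read is unmapped
            print ">" $1
            print $10
        }
    ' "$1"
}

# Toy input with made-up reads: read1 maps to the mtDNA database, read2 does not.
printf '@HD\tVN:1.6\n' > toy.sam
printf 'read1\t0\tmito_chr1\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\n' >> toy.sam
printf 'read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTTGGGG\tIIIIIIII\n' >> toy.sam

unmapped_to_fasta toy.sam > toy.nomtDNA.fasta
cat toy.nomtDNA.fasta
```

On the toy input only `read2` is written to the FASTA file. Testing the FLAG rather than POS avoids depending on how the aligner fills in the position of an unmapped read. Where samtools is available, `samtools view -f 4` selects the same reads.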
Neisseria meningitidis genome (ERR1865236) + BLAST DB of known bacterial plasmids -> SIDEARM
Image credit: 1. https://upload.wikimedia.org/wikipedia/commons/thumb/c/cf/Plasmid_%28english%29.svg/300px-Plasmid_%28english%29.svg.png
Viral metagenome (ERR1301508, à la Chris O’Sullivan) + BLAST DB of complete bacterial genomes -> SIDEARM
On SRA... Using SIDEARM...
• Greg Boratyn
• Mike Muchow
• Paul Cantalupo
• Alex Goncearenco
• Unix.systems

Contamination Detection and Taxonomic Confirmation with Magic-BLAST


Editor's Notes

  • #5 Alatina alata has one of the most unusual mtDNA organizations in Metazoa, where genes are distributed on eight linear chromosomes with long terminal inverted repeats.