SlideShare a Scribd company logo
1 of 50
Cool New Stuff!
NCBI Resources for Phyletically-Defined Next Generation Analysis in and out of the Cloud
Ben Busby (ben.busby@nih.gov)
NCBI Genome Resources Workshop
PAG XXVII January 14, 2019
Tom
Madden’s
BLAST_v5
Talk (slightly
modified)
2
If you’d like to see the original slides
from Tom, or this slide deck, check out
https://www.slideshare.net/benbusby
BLASTDBv5: a better BLAST database!
• Limit search by taxonomy with a command-line parameter
-taxids
• Improved performance when limiting BLAST search with accessions.
• Faster!
• Retrieve sequences by taxonomy from the BLAST database.
• Extract data!
Search the Argonaute protein family
with BLAST!
• Against the nr protein database
• Limit the search to Schizosaccharomyces pombe
BLAST command-line parameters
blastp
-db nr
-query NP_850110.fsa
-outfmt "7 qaccver saccver pident length evalue ssciname” # tabular output
-out NP_850110.4896.tab
-task blastp-fast # runs faster!
-num_threads 8 # use 8 CPUs
-evalue 0.05 # limit to more significant matches
-taxids 4896 # limit search to Schizosaccharomyces pombe
A script to find taxid
Distributed with the BLAST package!
Tabular output
Need taxids for a whole Phylum?
BLAST against Ascomycota sequences!
blastp
-db nr
-query NP_850110.fsa
-outfmt "7 qaccver saccver pident length evalue ssciname” # tabular output
-out NP_850110.Ascomycota.tab
-task blastp-fast # runs faster!
-num_threads 8 # use 8 CPUs
-evalue 0.05 # limit to more significant matches
-taxidlist Ascomycota.taxids # limit search to Ascomycota
Ascomycota results
BLAST excluding Viridiplantae
blastp
-db nr
-query NP_850110.fsa
-outfmt "7 qaccver saccver pident length evalue ssciname”
-out NP_850110.noGreenPlants.tab
-task blastp-fast
-num_threads 8
-evalue 0.05
-negative_taxidlist Viridiplantae.taxids # exclude green plants
No Viridiplantae
Everything
Timings
Search set Run time (seconds)
Schizosaccharomyces pombe 8.7
Ascomycota 11.7
No Viridiplantae 101
Everything 91
8 CPUs, best of three runs using time command.
Why Use Identical Protein Groups?
22
STX1 STX2
Search the Shiga toxin family protein with
BLAST
• Against the nr protein database
• Limit the search to E coli
Search the Shiga toxin family protein with
BLAST
• Against the nr protein database
• Limit the search to E coli
Stx1 and Stx2 Example Sequences
Stx1 and Stx2 Example Sequences
BLAST command-line parameters
blastp
-db refseq-protein_v5
-query WP_123135120_123130813.fasta
-outfmt "7 qaccver saccver pident length evalue ssciname” # tabular output
-out Shiga_Ecoli.tab
-task blastp-fast # runs faster!
-num_threads 32 # use 8 CPUs
-evalue 0.00005 # limit to more significant matches
-taxids 562 # limit search to Escherichia coli
But what did I need to do to make my
demo?
28
Grab sequences!
Do I need to get all of nr?!
update_blast.pl -- decompress works
But theres a better way!
CloudBLAST and Docker Images to the
Rescue!
• http://ncbi.github.io/blast-cloud/
• https://github.com/ncbi/makeblastdb4cloud
• Also, check out magicBLAST!
29
But how do I use the BLAST Docker(s)?!
30
https://github.com/ncbi/docker/blob/master/blast/README.md
But how do I use the BLAST Docker(s)?!
31
https://github.com/ncbi/docker/blob/master/blast/README.md
But how do I use the BLAST Docker(s)?!
32
https://github.com/ncbi/docker/blob/master/blast/README.md
sudo docker run --rm -v $BLASTDB_DIR:/blast/blastdb:ro -v
$HOME/queries:/blast/queries:rw -w /blast/blastdb ncbi/blast blastp -db
nr_v5 -query /blast/queries/WP_123135120_123130813.fasta -out
/blast/queries/Shiga_Ecoli.tab -task blastp-fast -num_threads 32 -evalue
.00005 -taxids 562
Bonus: Prokaryotic Genome Annotation
Pipeline Wrapped with Docker and CWL
https://github.com/ncbi/pgap
33
https://github.com/NCBI-
Hackathons/ViruSpyHackathons
35
Polygenic SNP Search Tool
https://github.com/NCBI-
Hackathons/PSST
Variant
calling on
the fly!
36
37
38
39
40
41
42
43
44
Data Analysis Hackathons!
Planned Events
RNAseq – UNC, March 11-13 2019;
Palo Alto, June 2019
Smaller events at USF and Pittsburgh
Human PanGenomics – UCSC, 25-27 March 2019;
Haplotype Annotation, Baltimore, June 2019
Variant Synthetic Dataset Generation –
Boston, April 15-16, 2019
AMR and Prokaryotic Genome Annotate-athon
NIH Campus, August 2019
Creating a Community
https://biohackathons.github.io
MACHINE LEARNING
PROJECT MANAGEMENT
PRODUCT MANAGEMENT
46
GeneHummus
47
HummusGraph
Because graph genomes aren’t just for humans!
48
Thank you.
49
This research was supported by the Intramural
Research Program of the NIH, National Library of
Medicine.
Watch NCBI News for updates!
http://www.ncbi.nlm.nih.gov/news/
https://www.youtube.com/user/NCBINLM
Jan Buchmann
Chaochih Liu
Tom Madden
Moamen Elmassry
Kim LeBlanc
Allissa Dillman
Olaitan Awe
Anjana Raina
Peter Meric
Francoise Thibaud-Nissen
Douglas Slotta
Michael Crusoe
Eric Cox
Vamsi Kodali
Brian Smith-White
Bart Trawick
Kim Pruitt
Hackathon
Participants!
And visiting bioinformaticians!
NCBI Genome Resources Workshop
50
Visit NCBI Booth 223 Contact us info@ncbi.nlm.nih.gov
Time Topic
12:50 – 1:10 Submission of Genomes to GenBank
Karen Clark
1:10 – 1:30 GEO Submissions and Usage
Steve Wilhite
1:30 – 1:55 From Annotation to Visualization: Exploring Genes and Genomes with NCBI Tools
Eric Cox
1:55 – 2:15 Programmatic Access to Genomic Data: E-Utilities and FTP
Vamsi K. Kodali
2:15 – 2:35 NCBI Resources for Phyletically-Defined Next Generation Analysis in and out of the Cloud (a.k.a.
Cool New Stuff!)
Ben Busby
2:35 – 3:00 Q & A session

More Related Content

What's hot

Discovery and annotation of variants by exome analysis using NGS
Discovery and annotation of variants by exome analysis using NGSDiscovery and annotation of variants by exome analysis using NGS
Discovery and annotation of variants by exome analysis using NGScursoNGS
 
Jan2016 dnanexus giab uses andrew carroll
Jan2016 dnanexus giab uses andrew carrollJan2016 dnanexus giab uses andrew carroll
Jan2016 dnanexus giab uses andrew carrollGenomeInABottle
 
Exploring DNA/RNA-Seq Analysis Results with Golden Helix GenomeBrowse and SVS
Exploring DNA/RNA-Seq Analysis Results with Golden Helix GenomeBrowse and SVSExploring DNA/RNA-Seq Analysis Results with Golden Helix GenomeBrowse and SVS
Exploring DNA/RNA-Seq Analysis Results with Golden Helix GenomeBrowse and SVSGolden Helix Inc
 
Introduction to second generation sequencing
Introduction to second generation sequencingIntroduction to second generation sequencing
Introduction to second generation sequencingDenis C. Bauer
 
RNASeq Experiment Design
RNASeq Experiment DesignRNASeq Experiment Design
RNASeq Experiment DesignYaoyu Wang
 
Rnaseq basics ngs_application1
Rnaseq basics ngs_application1Rnaseq basics ngs_application1
Rnaseq basics ngs_application1Yaoyu Wang
 
Thoughts on the recent announcements by Oxford Nanopore Technologies
Thoughts on the recent announcements by Oxford Nanopore TechnologiesThoughts on the recent announcements by Oxford Nanopore Technologies
Thoughts on the recent announcements by Oxford Nanopore TechnologiesKeith Bradnam
 
Next Generation Sequencing - the basics
Next Generation Sequencing - the basicsNext Generation Sequencing - the basics
Next Generation Sequencing - the basicsUSD Bioinformatics
 
Genetic Recording in Yeast Using CRISPR-Cas9
Genetic Recording in Yeast Using CRISPR-Cas9Genetic Recording in Yeast Using CRISPR-Cas9
Genetic Recording in Yeast Using CRISPR-Cas9Robert Beem
 
What can we do with microbial WGS data? - t.seemann - mc gill summer 2016 - ...
What can we do with microbial WGS data?  - t.seemann - mc gill summer 2016 - ...What can we do with microbial WGS data?  - t.seemann - mc gill summer 2016 - ...
What can we do with microbial WGS data? - t.seemann - mc gill summer 2016 - ...Torsten Seemann
 
Preclinical Scale Bioprocessing, Nov. 2, 2009
Preclinical Scale Bioprocessing, Nov. 2, 2009Preclinical Scale Bioprocessing, Nov. 2, 2009
Preclinical Scale Bioprocessing, Nov. 2, 2009David Bienvenue
 
Genome walking – a new strategy for identification of nucleotide sequence in ...
Genome walking – a new strategy for identification of nucleotide sequence in ...Genome walking – a new strategy for identification of nucleotide sequence in ...
Genome walking – a new strategy for identification of nucleotide sequence in ...Dr. Mukesh Chavan
 
2014 khmer protocols
2014 khmer protocols2014 khmer protocols
2014 khmer protocolsc.titus.brown
 
Saha UC Davis Plant Pathology seminar Infrastructure for battling the Citrus ...
Saha UC Davis Plant Pathology seminar Infrastructure for battling the Citrus ...Saha UC Davis Plant Pathology seminar Infrastructure for battling the Citrus ...
Saha UC Davis Plant Pathology seminar Infrastructure for battling the Citrus ...Surya Saha
 

What's hot (20)

Discovery and annotation of variants by exome analysis using NGS
Discovery and annotation of variants by exome analysis using NGSDiscovery and annotation of variants by exome analysis using NGS
Discovery and annotation of variants by exome analysis using NGS
 
Jan2016 dnanexus giab uses andrew carroll
Jan2016 dnanexus giab uses andrew carrollJan2016 dnanexus giab uses andrew carroll
Jan2016 dnanexus giab uses andrew carroll
 
Biotech autumn2012-02-ngs2
Biotech autumn2012-02-ngs2Biotech autumn2012-02-ngs2
Biotech autumn2012-02-ngs2
 
Exploring DNA/RNA-Seq Analysis Results with Golden Helix GenomeBrowse and SVS
Exploring DNA/RNA-Seq Analysis Results with Golden Helix GenomeBrowse and SVSExploring DNA/RNA-Seq Analysis Results with Golden Helix GenomeBrowse and SVS
Exploring DNA/RNA-Seq Analysis Results with Golden Helix GenomeBrowse and SVS
 
Introduction to second generation sequencing
Introduction to second generation sequencingIntroduction to second generation sequencing
Introduction to second generation sequencing
 
RNASeq Experiment Design
RNASeq Experiment DesignRNASeq Experiment Design
RNASeq Experiment Design
 
Sept2016 sv nabsys
Sept2016 sv nabsysSept2016 sv nabsys
Sept2016 sv nabsys
 
Rnaseq basics ngs_application1
Rnaseq basics ngs_application1Rnaseq basics ngs_application1
Rnaseq basics ngs_application1
 
Thoughts on the recent announcements by Oxford Nanopore Technologies
Thoughts on the recent announcements by Oxford Nanopore TechnologiesThoughts on the recent announcements by Oxford Nanopore Technologies
Thoughts on the recent announcements by Oxford Nanopore Technologies
 
Next Generation Sequencing - the basics
Next Generation Sequencing - the basicsNext Generation Sequencing - the basics
Next Generation Sequencing - the basics
 
Genetic Recording in Yeast Using CRISPR-Cas9
Genetic Recording in Yeast Using CRISPR-Cas9Genetic Recording in Yeast Using CRISPR-Cas9
Genetic Recording in Yeast Using CRISPR-Cas9
 
What can we do with microbial WGS data? - t.seemann - mc gill summer 2016 - ...
What can we do with microbial WGS data?  - t.seemann - mc gill summer 2016 - ...What can we do with microbial WGS data?  - t.seemann - mc gill summer 2016 - ...
What can we do with microbial WGS data? - t.seemann - mc gill summer 2016 - ...
 
ETH_SymposiumCR
ETH_SymposiumCRETH_SymposiumCR
ETH_SymposiumCR
 
PoemTapp16
PoemTapp16PoemTapp16
PoemTapp16
 
Preclinical Scale Bioprocessing, Nov. 2, 2009
Preclinical Scale Bioprocessing, Nov. 2, 2009Preclinical Scale Bioprocessing, Nov. 2, 2009
Preclinical Scale Bioprocessing, Nov. 2, 2009
 
Exome Sequencing
Exome SequencingExome Sequencing
Exome Sequencing
 
Genome walking – a new strategy for identification of nucleotide sequence in ...
Genome walking – a new strategy for identification of nucleotide sequence in ...Genome walking – a new strategy for identification of nucleotide sequence in ...
Genome walking – a new strategy for identification of nucleotide sequence in ...
 
2014 khmer protocols
2014 khmer protocols2014 khmer protocols
2014 khmer protocols
 
Saha UC Davis Plant Pathology seminar Infrastructure for battling the Citrus ...
Saha UC Davis Plant Pathology seminar Infrastructure for battling the Citrus ...Saha UC Davis Plant Pathology seminar Infrastructure for battling the Citrus ...
Saha UC Davis Plant Pathology seminar Infrastructure for battling the Citrus ...
 
Advanced NCBI
Advanced NCBI Advanced NCBI
Advanced NCBI
 

Similar to BB_NCBI_PAG_2019_Workshop

Bioinformatics MiRON
Bioinformatics MiRONBioinformatics MiRON
Bioinformatics MiRONPrabin Shakya
 
Big data solution for ngs data analysis
Big data solution for ngs data analysisBig data solution for ngs data analysis
Big data solution for ngs data analysisYun Lung Li
 
Seqr - Protein Sequence Search: Presented by Lianyi Han, Medical Science & Co...
Seqr - Protein Sequence Search: Presented by Lianyi Han, Medical Science & Co...Seqr - Protein Sequence Search: Presented by Lianyi Han, Medical Science & Co...
Seqr - Protein Sequence Search: Presented by Lianyi Han, Medical Science & Co...Lucidworks
 
2013 pag-equine-workshop
2013 pag-equine-workshop2013 pag-equine-workshop
2013 pag-equine-workshopc.titus.brown
 
Design and evaluation of a genomics variant analysis pipeline using GATK Spar...
Design and evaluation of a genomics variant analysis pipeline using GATK Spar...Design and evaluation of a genomics variant analysis pipeline using GATK Spar...
Design and evaluation of a genomics variant analysis pipeline using GATK Spar...Paolo Missier
 
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...GenomeInABottle
 
Role of bioinformatics in life sciences research
Role of bioinformatics in life sciences researchRole of bioinformatics in life sciences research
Role of bioinformatics in life sciences researchAnshika Bansal
 
ICAR 2015 Workshop - Nick Provart
ICAR 2015 Workshop - Nick ProvartICAR 2015 Workshop - Nick Provart
ICAR 2015 Workshop - Nick ProvartAraport
 
Thesis def
Thesis defThesis def
Thesis defJay Vyas
 
Bb health ai_jan26_v2
Bb health ai_jan26_v2Bb health ai_jan26_v2
Bb health ai_jan26_v2Ben Busby
 
How to sequence a large eukaryotic genome
How to sequence a large eukaryotic genomeHow to sequence a large eukaryotic genome
How to sequence a large eukaryotic genomeLex Nederbragt
 
BLAST(Basic Local Alignment Tool)
BLAST(Basic Local Alignment Tool)BLAST(Basic Local Alignment Tool)
BLAST(Basic Local Alignment Tool)Sobia
 
Genome res. 2002-kent-656-64
Genome res. 2002-kent-656-64Genome res. 2002-kent-656-64
Genome res. 2002-kent-656-64PeterMaf
 
Genome res. 2002-kent-656-64
Genome res. 2002-kent-656-64Genome res. 2002-kent-656-64
Genome res. 2002-kent-656-64PeterMaf
 
Giab jan2016 intro and update 160128
Giab jan2016 intro and update 160128Giab jan2016 intro and update 160128
Giab jan2016 intro and update 160128GenomeInABottle
 
2015.04.08-Next-generation-sequencing-issues
2015.04.08-Next-generation-sequencing-issues2015.04.08-Next-generation-sequencing-issues
2015.04.08-Next-generation-sequencing-issuesDongyan Zhao
 
Butler - a framework for a large-scale scientific analysis on the cloud - EOS...
Butler - a framework for a large-scale scientific analysis on the cloud - EOS...Butler - a framework for a large-scale scientific analysis on the cloud - EOS...
Butler - a framework for a large-scale scientific analysis on the cloud - EOS...ATMOSPHERE .
 

Similar to BB_NCBI_PAG_2019_Workshop (20)

Bioinformatics MiRON
Bioinformatics MiRONBioinformatics MiRON
Bioinformatics MiRON
 
Jan2016 pac bio giab
Jan2016 pac bio giabJan2016 pac bio giab
Jan2016 pac bio giab
 
Big data solution for ngs data analysis
Big data solution for ngs data analysisBig data solution for ngs data analysis
Big data solution for ngs data analysis
 
Seqr - Protein Sequence Search: Presented by Lianyi Han, Medical Science & Co...
Seqr - Protein Sequence Search: Presented by Lianyi Han, Medical Science & Co...Seqr - Protein Sequence Search: Presented by Lianyi Han, Medical Science & Co...
Seqr - Protein Sequence Search: Presented by Lianyi Han, Medical Science & Co...
 
2013 pag-equine-workshop
2013 pag-equine-workshop2013 pag-equine-workshop
2013 pag-equine-workshop
 
Design and evaluation of a genomics variant analysis pipeline using GATK Spar...
Design and evaluation of a genomics variant analysis pipeline using GATK Spar...Design and evaluation of a genomics variant analysis pipeline using GATK Spar...
Design and evaluation of a genomics variant analysis pipeline using GATK Spar...
 
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
 
Role of bioinformatics in life sciences research
Role of bioinformatics in life sciences researchRole of bioinformatics in life sciences research
Role of bioinformatics in life sciences research
 
ICAR 2015 Workshop - Nick Provart
ICAR 2015 Workshop - Nick ProvartICAR 2015 Workshop - Nick Provart
ICAR 2015 Workshop - Nick Provart
 
Thesis def
Thesis defThesis def
Thesis def
 
Bb health ai_jan26_v2
Bb health ai_jan26_v2Bb health ai_jan26_v2
Bb health ai_jan26_v2
 
How to sequence a large eukaryotic genome
How to sequence a large eukaryotic genomeHow to sequence a large eukaryotic genome
How to sequence a large eukaryotic genome
 
BLAST(Basic Local Alignment Tool)
BLAST(Basic Local Alignment Tool)BLAST(Basic Local Alignment Tool)
BLAST(Basic Local Alignment Tool)
 
BLAST
BLASTBLAST
BLAST
 
Genome res. 2002-kent-656-64
Genome res. 2002-kent-656-64Genome res. 2002-kent-656-64
Genome res. 2002-kent-656-64
 
Genome res. 2002-kent-656-64
Genome res. 2002-kent-656-64Genome res. 2002-kent-656-64
Genome res. 2002-kent-656-64
 
Introduction to Apollo for i5k
Introduction to Apollo for i5kIntroduction to Apollo for i5k
Introduction to Apollo for i5k
 
Giab jan2016 intro and update 160128
Giab jan2016 intro and update 160128Giab jan2016 intro and update 160128
Giab jan2016 intro and update 160128
 
2015.04.08-Next-generation-sequencing-issues
2015.04.08-Next-generation-sequencing-issues2015.04.08-Next-generation-sequencing-issues
2015.04.08-Next-generation-sequencing-issues
 
Butler - a framework for a large-scale scientific analysis on the cloud - EOS...
Butler - a framework for a large-scale scientific analysis on the cloud - EOS...Butler - a framework for a large-scale scientific analysis on the cloud - EOS...
Butler - a framework for a large-scale scientific analysis on the cloud - EOS...
 

More from Ben Busby

Addressing privacy concerns_in_the_age_of_federated_data_access
Addressing privacy concerns_in_the_age_of_federated_data_accessAddressing privacy concerns_in_the_age_of_federated_data_access
Addressing privacy concerns_in_the_age_of_federated_data_accessBen Busby
 
Containerized attribute indexing and graph genomes for federated data access
Containerized attribute indexing and graph genomes for federated data accessContainerized attribute indexing and graph genomes for federated data access
Containerized attribute indexing and graph genomes for federated data accessBen Busby
 
Artificial_Intelligence_for_Data_Reuse_2019
Artificial_Intelligence_for_Data_Reuse_2019Artificial_Intelligence_for_Data_Reuse_2019
Artificial_Intelligence_for_Data_Reuse_2019Ben Busby
 
Dream.recomb.ncbi.hackathons v003
Dream.recomb.ncbi.hackathons v003Dream.recomb.ncbi.hackathons v003
Dream.recomb.ncbi.hackathons v003Ben Busby
 
Human_Pangenomics_Bio-IT_2019
Human_Pangenomics_Bio-IT_2019Human_Pangenomics_Bio-IT_2019
Human_Pangenomics_Bio-IT_2019Ben Busby
 
RNAML_Bio-IT_2019
RNAML_Bio-IT_2019RNAML_Bio-IT_2019
RNAML_Bio-IT_2019Ben Busby
 
Hackathon_Bio-IT_2019
Hackathon_Bio-IT_2019Hackathon_Bio-IT_2019
Hackathon_Bio-IT_2019Ben Busby
 
Data science futures_v_vu2
Data science futures_v_vu2Data science futures_v_vu2
Data science futures_v_vu2Ben Busby
 
Sage 2 19_v5_busby
Sage 2 19_v5_busbySage 2 19_v5_busby
Sage 2 19_v5_busbyBen Busby
 
Hackathons lightning v_nbs
Hackathons lightning v_nbsHackathons lightning v_nbs
Hackathons lightning v_nbsBen Busby
 
Genome web v_repro1
Genome web v_repro1Genome web v_repro1
Genome web v_repro1Ben Busby
 
Data science futures_v_une
Data science futures_v_uneData science futures_v_une
Data science futures_v_uneBen Busby
 
Variant and disease_grs_kickoff
Variant and disease_grs_kickoffVariant and disease_grs_kickoff
Variant and disease_grs_kickoffBen Busby
 
Bioinformatics_resources_SVAI_v2
Bioinformatics_resources_SVAI_v2Bioinformatics_resources_SVAI_v2
Bioinformatics_resources_SVAI_v2Ben Busby
 
Ncbi resources i5_k_v4
Ncbi resources i5_k_v4Ncbi resources i5_k_v4
Ncbi resources i5_k_v4Ben Busby
 
Ncbi resources abrf_v3
Ncbi resources abrf_v3Ncbi resources abrf_v3
Ncbi resources abrf_v3Ben Busby
 
Data science futures_v_lbirn
Data science futures_v_lbirnData science futures_v_lbirn
Data science futures_v_lbirnBen Busby
 
Pag three ways_to_ngs_at_ncbi
Pag three ways_to_ngs_at_ncbiPag three ways_to_ngs_at_ncbi
Pag three ways_to_ngs_at_ncbiBen Busby
 
NIH Hackathons Lightning Slides
NIH Hackathons Lightning SlidesNIH Hackathons Lightning Slides
NIH Hackathons Lightning SlidesBen Busby
 

More from Ben Busby (20)

Addressing privacy concerns_in_the_age_of_federated_data_access
Addressing privacy concerns_in_the_age_of_federated_data_accessAddressing privacy concerns_in_the_age_of_federated_data_access
Addressing privacy concerns_in_the_age_of_federated_data_access
 
Containerized attribute indexing and graph genomes for federated data access
Containerized attribute indexing and graph genomes for federated data accessContainerized attribute indexing and graph genomes for federated data access
Containerized attribute indexing and graph genomes for federated data access
 
Artificial_Intelligence_for_Data_Reuse_2019
Artificial_Intelligence_for_Data_Reuse_2019Artificial_Intelligence_for_Data_Reuse_2019
Artificial_Intelligence_for_Data_Reuse_2019
 
Dream.recomb.ncbi.hackathons v003
Dream.recomb.ncbi.hackathons v003Dream.recomb.ncbi.hackathons v003
Dream.recomb.ncbi.hackathons v003
 
Human_Pangenomics_Bio-IT_2019
Human_Pangenomics_Bio-IT_2019Human_Pangenomics_Bio-IT_2019
Human_Pangenomics_Bio-IT_2019
 
RNAML_Bio-IT_2019
RNAML_Bio-IT_2019RNAML_Bio-IT_2019
RNAML_Bio-IT_2019
 
Hackathon_Bio-IT_2019
Hackathon_Bio-IT_2019Hackathon_Bio-IT_2019
Hackathon_Bio-IT_2019
 
Data science futures_v_vu2
Data science futures_v_vu2Data science futures_v_vu2
Data science futures_v_vu2
 
Sage 2 19_v5_busby
Sage 2 19_v5_busbySage 2 19_v5_busby
Sage 2 19_v5_busby
 
Hackathons lightning v_nbs
Hackathons lightning v_nbsHackathons lightning v_nbs
Hackathons lightning v_nbs
 
Cmu oss 18
Cmu oss 18Cmu oss 18
Cmu oss 18
 
Genome web v_repro1
Genome web v_repro1Genome web v_repro1
Genome web v_repro1
 
Data science futures_v_une
Data science futures_v_uneData science futures_v_une
Data science futures_v_une
 
Variant and disease_grs_kickoff
Variant and disease_grs_kickoffVariant and disease_grs_kickoff
Variant and disease_grs_kickoff
 
Bioinformatics_resources_SVAI_v2
Bioinformatics_resources_SVAI_v2Bioinformatics_resources_SVAI_v2
Bioinformatics_resources_SVAI_v2
 
Ncbi resources i5_k_v4
Ncbi resources i5_k_v4Ncbi resources i5_k_v4
Ncbi resources i5_k_v4
 
Ncbi resources abrf_v3
Ncbi resources abrf_v3Ncbi resources abrf_v3
Ncbi resources abrf_v3
 
Data science futures_v_lbirn
Data science futures_v_lbirnData science futures_v_lbirn
Data science futures_v_lbirn
 
Pag three ways_to_ngs_at_ncbi
Pag three ways_to_ngs_at_ncbiPag three ways_to_ngs_at_ncbi
Pag three ways_to_ngs_at_ncbi
 
NIH Hackathons Lightning Slides
NIH Hackathons Lightning SlidesNIH Hackathons Lightning Slides
NIH Hackathons Lightning Slides
 

Recently uploaded

Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfSELF-EXPLANATORY
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)PraveenaKalaiselvan1
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxSwapnil Therkar
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxyaramohamed343013
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...anilsa9823
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhousejana861314
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real timeSatoshi NAKAHIRA
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Patrick Diehl
 
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |aasikanpl
 
NAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdf
NAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdfNAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdf
NAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdfWadeK3
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRDelhi Call girls
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfSwapnil Therkar
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSérgio Sacani
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...Sérgio Sacani
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxUmerFayaz5
 
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
Boyles law module in the grade 10 science
Boyles law module in the grade 10 scienceBoyles law module in the grade 10 science
Boyles law module in the grade 10 sciencefloriejanemacaya1
 

Recently uploaded (20)

Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docx
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhouse
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real time
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?
 
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
 
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
 
NAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdf
NAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdfNAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdf
NAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdf
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptx
 
Engler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomyEngler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomy
 
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Boyles law module in the grade 10 science
Boyles law module in the grade 10 scienceBoyles law module in the grade 10 science
Boyles law module in the grade 10 science
 

BB_NCBI_PAG_2019_Workshop

  • 1. Cool New Stuff! NCBI Resources for Phyletically-Defined Next Generation Analysis in and out of the Cloud Ben Busby (ben.busby@nih.gov) NCBI Genome Resources Workshop PAG XXVII January 14, 2019
  • 2. Tom Madden’s BLAST_v5 Talk (slightly modified) 2 If you’d like to see the original slides from Tom, or this slide deck, check out https://www.slideshare.net/benbusby
  • 3. BLASTDBv5: a better BLAST database! • Limit search by taxonomy with a command-line parameter -taxids • Improved performance when limiting BLAST search with accessions. • Faster! • Retrieve sequences by taxonomy from the BLAST database. • Extract data!
  • 4.
  • 5. Search the Argonaute protein family with BLAST! • Against the nr protein database • Limit the search to Schizosaccharomyces pombe
  • 6.
  • 7. BLAST command-line parameters blastp -db nr -query NP_850110.fsa -outfmt "7 qaccver saccver pident length evalue ssciname” # tabular output -out NP_850110.4896.tab -task blastp-fast # runs faster! -num_threads 8 # use 8 CPUs -evalue 0.05 # limit to more significant matches -taxids 4896 # limit search to Schizosaccharomyces pombe
  • 8. A script to find taxid Distributed with the BLAST package!
  • 10. Need taxids for a whole Phylum?
  • 11. BLAST against Ascomycota sequences! blastp -db nr -query NP_850110.fsa -outfmt "7 qaccver saccver pident length evalue ssciname” # tabular output -out NP_850110.Ascomycota.tab -task blastp-fast # runs faster! -num_threads 8 # use 8 CPUs -evalue 0.05 # limit to more significant matches -taxidlist Ascomycota.taxids # limit search to Ascomycota
  • 13. BLAST excluding Viridiplantae blastp -db nr -query NP_850110.fsa -outfmt "7 qaccver saccver pident length evalue ssciname” -out NP_850110.noGreenPlants.tab -task blastp-fast -num_threads 8 -evalue 0.05 -negative_taxidlist Viridiplantae.taxids # exclude green plants
  • 15.
  • 17. Timings Search set Run time (seconds) Schizosaccharomyces pombe 8.7 Ascomycota 11.7 No Viridiplantae 101 Everything 91 8 CPUs, best of three runs using time command.
  • 18.
  • 19.
  • 20.
  • 21. Why Use Identical Protein Groups?
  • 23. Search the Shiga toxin family protein with BLAST • Against the nr protein database • Limit the search to E coli
  • 24. Search the Shiga toxin family protein with BLAST • Against the nr protein database • Limit the search to E coli
  • 25. Stx1 and Stx2 Example Sequences
  • 26. Stx1 and Stx2 Example Sequences
  • 27. BLAST command-line parameters blastp -db refseq-protein_v5 -query WP_123135120_123130813.fasta -outfmt "7 qaccver saccver pident length evalue ssciname” # tabular output -out Shiga_Ecoli.tab -task blastp-fast # runs faster! -num_threads 32 # use 8 CPUs -evalue 0.00005 # limit to more significant matches -taxids 562 # limit search to Escherichia coli
  • 28. But what did I need to do to make my demo? 28 Grab sequences! Do I need to get all of nr?! update_blast.pl -- decompress works But theres a better way!
  • 29. CloudBLAST and Docker Images to the Rescue! • http://ncbi.github.io/blast-cloud/ • https://github.com/ncbi/makeblastdb4cloud • Also, check out magicBLAST! 29
  • 30. But how do I use the BLAST Docker(s)?! 30 https://github.com/ncbi/docker/blob/master/blast/README.md
  • 31. But how do I use the BLAST Docker(s)?! 31 https://github.com/ncbi/docker/blob/master/blast/README.md
  • 32. But how do I use the BLAST Docker(s)?! 32 https://github.com/ncbi/docker/blob/master/blast/README.md sudo docker run --rm -v $BLASTDB_DIR:/blast/blastdb:ro -v $HOME/queries:/blast/queries:rw -w /blast/blastdb ncbi/blast blastp -db nr_v5 -query /blast/queries/WP_123135120_123130813.fasta -out /blast/queries/Shiga_Ecoli.tab -task blastp-fast -num_threads 32 -evalue .00005 -taxids 562
  • 33. Bonus: Prokaryotic Genome Annotation Pipeline Wrapped with Docker and CWL https://github.com/ncbi/pgap 33
  • 35. 35 Polygenic SNP Search Tool https://github.com/NCBI- Hackathons/PSST Variant calling on the fly!
  • 36. 36
  • 37. 37
  • 38. 38
  • 39. 39
  • 40. 40
  • 41. 41
  • 42. 42
  • 43. 43
  • 44. 44 Data Analysis Hackathons! Planned Events RNAseq – UNC, March 11-13 2019; Palo Alto, June 2019 Smaller events at USF and Pittsburgh Human PanGenomics – UCSC, 25-27 March 2019; Haplotype Annotation, Baltimore, June 2019 Variant Synthetic Dataset Generation – Boston, April 15-16, 2019 AMR and Prokaryotic Genome Annotate-athon NIH Campus, August 2019
  • 45. Creating a Community https://biohackathons.github.io MACHINE LEARNING PROJECT MANAGEMENT PRODUCT MANAGEMENT
  • 46. 46
  • 48. HummusGraph Because graph genomes aren’t just for humans! 48
  • 49. Thank you. 49 This research was supported by the Intramural Research Program of the NIH, National Library of Medicine. Watch NCBI News for updates! http://www.ncbi.nlm.nih.gov/news/ https://www.youtube.com/user/NCBINLM Jan Buchmann Chaochih Liu Tom Madden Moamen Elmassry Kim LeBlanc Allissa Dillman Olaitan Awe Anjana Raina Peter Meric Francoise Thibaud-Nissen Douglas Slotta Michael Crusoe Eric Cox Vamsi Kodali Brian Smith-White Bart Trawick Kim Pruitt Hackathon Participants! And visiting bioinformaticians!
  • 50. NCBI Genome Resources Workshop 50 Visit NCBI Booth 223 Contact us info@ncbi.nlm.nih.gov Time Topic 12:50 – 1:10 Submission of Genomes to GenBank Karen Clark 1:10 – 1:30 GEO Submissions and Usage Steve Wilhite 1:30 – 1:55 From Annotation to Visualization: Exploring Genes and Genomes with NCBI Tools Eric Cox 1:55 – 2:15 Programmatic Access to Genomic Data: E-Utilities and FTP Vamsi K. Kodali 2:15 – 2:35 NCBI Resources for Phyletically-Defined Next Generation Analysis in and out of the Cloud (a.k.a. Cool New Stuff!) Ben Busby 2:35 – 3:00 Q & A session

Editor's Notes

  1. 83667 taxids in Ascomycota
  2. Now… with AMR data!