SlideShare a Scribd company logo
1 of 12
Discussion on Metagenomic Data
for NEON Workshop
Adina Howe
July 15, 2014
Boulder, CO
What I do with my sequencing data
• Compare it to known references (BLAST, alignment
mapping, HMMs)
– Database gaps, 50%+ in soil metagenomes unknown
– Annotation errors
• Assembly (site specific reference)
– Dependent on sequencing depth
– Largest soil datasets, 1 billion reads, 10% assembled
• Co-occurrence / binning
– Abundance or nucleotide based
– Sequencing coverage dependent, need “unique” bins
– Particularly problematic for short reads
Sentinels in metagenomic data?
• Presence/Absence? Quantification?
• Phyla? Strain? Functional level?
• Strong signals of similarity or differences in
both “annotatable” and unknown sequences
• Information about targets for which we need
more references (by other methods)
• Indexes (alpha diversity,
copiotroph/oligotroph, % polymorphism,
recA/16S gene count)
What I do with my sequencing data
• Compare it to known references (BLAST,
alignment mapping, HMMs)
– Database gaps, 50%+ in soil metagenomes unknown
– Annotation errors
• Assembly (site specific reference)
– Well developed pipelines and tutorials
– Almost too many choices
• Co-occurrence / binning
– Some tools available, still in development largely
– Eventually need to link to knowns for validation (or
experimental validation)
What has been done – MG-RAST
• Free
• Easy
• Growing in usage by many
• Not used by the “leading wedge” (at least as
more than a blast engine)
• Free data / annotation storage and maintenance
• NEON – do you want to build your own? Does
MG-RAST suffice?
MG-RAST
Fine tuning is not possible
• What happens when your sequence matches two known genes
equally?
• What happens if there are multiple taxa related to one function?
• What if you want more than ordination (e.g., primer evaluation?)
• What if you want to change your distance matrix for ordination?
• How do (not) you normalize / rarify your data?
• What if you want more sensitivity than a BLAT analysis (e.g.,
amplicons)
• Computationally-optimized, not biologically
– Denitrification, Nitrate reductase, nitrate ammonification all distinct
categories in Level 3
– If you want to merge, not trivial – especially in dealing with abundance
estimations
Looking forward
• Development of all-pleasing computational,
statistical, visualization tool will be impossible
• NEON community demands will more than likely
be easy to use and aimed at reducing (perhaps
overly simplified) complex data into comparable
metrics (TBD)
• NEON data is actually not a big dataset and
current tools should (imho) suffice
• Data management and associated “metadata”
may be the larger challenge (in the face of rapidly
changing datasets)
http://tamingdata.com/wp-content/uploads/2010/07/tree-swing-project-management-large.png

More Related Content

Similar to Metagenomic data analysis discussion NEON Workshop

Similar to Metagenomic data analysis discussion NEON Workshop (20)

2020 02 11_biological_databases_part1
2020 02 11_biological_databases_part12020 02 11_biological_databases_part1
2020 02 11_biological_databases_part1
 
2017 biological databases_part1_vupload
2017 biological databases_part1_vupload2017 biological databases_part1_vupload
2017 biological databases_part1_vupload
 
Interpreting transcriptomics (ers berlin 2017)
Interpreting transcriptomics (ers berlin 2017)Interpreting transcriptomics (ers berlin 2017)
Interpreting transcriptomics (ers berlin 2017)
 
Metabolomic data analysis and visualization tools
Metabolomic data analysis and visualization toolsMetabolomic data analysis and visualization tools
Metabolomic data analysis and visualization tools
 
Database Searching
Database SearchingDatabase Searching
Database Searching
 
EiTESAL eHealth Conference 14&15 May 2017
EiTESAL eHealth Conference 14&15 May 2017 EiTESAL eHealth Conference 14&15 May 2017
EiTESAL eHealth Conference 14&15 May 2017
 
GLBIO/CCBC Metagenomics Workshop
GLBIO/CCBC Metagenomics WorkshopGLBIO/CCBC Metagenomics Workshop
GLBIO/CCBC Metagenomics Workshop
 
Semi-automated Exploration and Extraction of Data in Scientific Tables
Semi-automated Exploration and Extraction of Data in Scientific TablesSemi-automated Exploration and Extraction of Data in Scientific Tables
Semi-automated Exploration and Extraction of Data in Scientific Tables
 
BLAST AND FASTA.pptx12345789999987544321234
BLAST AND FASTA.pptx12345789999987544321234BLAST AND FASTA.pptx12345789999987544321234
BLAST AND FASTA.pptx12345789999987544321234
 
2018 02 20_biological_databases_part1_v_upload
2018 02 20_biological_databases_part1_v_upload2018 02 20_biological_databases_part1_v_upload
2018 02 20_biological_databases_part1_v_upload
 
Basic BLAST (BLASTn)
Basic BLAST (BLASTn)Basic BLAST (BLASTn)
Basic BLAST (BLASTn)
 
{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...
{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...
{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...
 
Introduction to 16S rRNA gene multivariate analysis
Introduction to 16S rRNA gene multivariate analysisIntroduction to 16S rRNA gene multivariate analysis
Introduction to 16S rRNA gene multivariate analysis
 
Big data Intro - Presentation to OCHackerz Meetup Group
Big data Intro - Presentation to OCHackerz Meetup GroupBig data Intro - Presentation to OCHackerz Meetup Group
Big data Intro - Presentation to OCHackerz Meetup Group
 
2015 NISO Forum: The Future of Library Resource Discovery
2015 NISO Forum: The Future of Library Resource Discovery2015 NISO Forum: The Future of Library Resource Discovery
2015 NISO Forum: The Future of Library Resource Discovery
 
Databases_L2.pptx
Databases_L2.pptxDatabases_L2.pptx
Databases_L2.pptx
 
2014 aus-agta
2014 aus-agta2014 aus-agta
2014 aus-agta
 
Plant Pathogen Genome Data: My Life In Sequences
Plant Pathogen Genome Data: My Life In SequencesPlant Pathogen Genome Data: My Life In Sequences
Plant Pathogen Genome Data: My Life In Sequences
 
25.ranking on data manifold with sink points
25.ranking on data manifold with sink points25.ranking on data manifold with sink points
25.ranking on data manifold with sink points
 
Hadoop for Data Science
Hadoop for Data ScienceHadoop for Data Science
Hadoop for Data Science
 

More from Adina Chuang Howe

ASM 2013 Metagenomic Assembly Workshop Slides
ASM 2013 Metagenomic Assembly Workshop SlidesASM 2013 Metagenomic Assembly Workshop Slides
ASM 2013 Metagenomic Assembly Workshop Slides
Adina Chuang Howe
 

More from Adina Chuang Howe (13)

Merrill Retreat 2018 - Nebraska City, Nebraska
Merrill Retreat 2018 - Nebraska City, NebraskaMerrill Retreat 2018 - Nebraska City, Nebraska
Merrill Retreat 2018 - Nebraska City, Nebraska
 
Iowa State Bioinformatics BCB Symposium 2018 - There and Back Again
Iowa State Bioinformatics BCB Symposium 2018 - There and Back AgainIowa State Bioinformatics BCB Symposium 2018 - There and Back Again
Iowa State Bioinformatics BCB Symposium 2018 - There and Back Again
 
2015 Soil Science of America Meeting
2015 Soil Science of America Meeting2015 Soil Science of America Meeting
2015 Soil Science of America Meeting
 
ISU ENVSCI690 Graduate Seminar Slides
ISU ENVSCI690 Graduate Seminar SlidesISU ENVSCI690 Graduate Seminar Slides
ISU ENVSCI690 Graduate Seminar Slides
 
Job Talk Iowa State University Ag Bio Engineering
Job Talk Iowa State University Ag Bio EngineeringJob Talk Iowa State University Ag Bio Engineering
Job Talk Iowa State University Ag Bio Engineering
 
Adina's Faculty Introduction - ISU ABE
Adina's Faculty Introduction - ISU ABEAdina's Faculty Introduction - ISU ABE
Adina's Faculty Introduction - ISU ABE
 
Sweden_eemis_big_data
Sweden_eemis_big_dataSweden_eemis_big_data
Sweden_eemis_big_data
 
Big data nebraska
Big data nebraskaBig data nebraska
Big data nebraska
 
Big data nebraska
Big data nebraskaBig data nebraska
Big data nebraska
 
Big Data Field Museum
Big Data Field MuseumBig Data Field Museum
Big Data Field Museum
 
ANL Soil Metagenomics 2014 Soil Reference Database - Let's do this
ANL Soil Metagenomics 2014 Soil Reference Database - Let's do thisANL Soil Metagenomics 2014 Soil Reference Database - Let's do this
ANL Soil Metagenomics 2014 Soil Reference Database - Let's do this
 
ASM 2013 Metagenomic Assembly Workshop Slides
ASM 2013 Metagenomic Assembly Workshop SlidesASM 2013 Metagenomic Assembly Workshop Slides
ASM 2013 Metagenomic Assembly Workshop Slides
 
EPA 2013 Air Sensors Meeting Big Data Talk
EPA 2013 Air Sensors Meeting Big Data TalkEPA 2013 Air Sensors Meeting Big Data Talk
EPA 2013 Air Sensors Meeting Big Data Talk
 

Recently uploaded

Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
amitlee9823
 
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
gajnagarg
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
amitlee9823
 
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
gajnagarg
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
amitlee9823
 
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night StandCall Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 

Recently uploaded (20)

Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
 
Detecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachDetecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning Approach
 
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night StandCall Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 

Metagenomic data analysis discussion NEON Workshop

  • 1. Discussion on Metagenomic Data for NEON Workshop Adina Howe July 15, 2014 Boulder, CO
  • 2. What I do with my sequencing data • Compare it to known references (BLAST, alignment mapping, HMMs) – Database gaps, 50%+ in soil metagenomes unknown – Annotation errors • Assembly (site specific reference) – Dependent on sequencing depth – Largest soil datasets, 1 billion reads, 10% assembled • Co-occurrence / binning – Abundance or nucleotide based – Sequencing coverage dependent, need “unique” bins – Particularly problematic for short reads
  • 3. Sentinels in metagenomic data? • Presence/Absence? Quantification? • Phyla? Strain? Functional level? • Strong signals of similarity or differences in both “annotatable” and unknown sequences • Information about targets for which we need more references (by other methods) • Indexes (alpha diversity, copiotroph/oligotroph, % polymorphism, recA/16S gene count)
  • 4. What I do with my sequencing data • Compare it to known references (BLAST, alignment mapping, HMMs) – Database gaps, 50%+ in soil metagenomes unknown – Annotation errors • Assembly (site specific reference) – Well developed pipelines and tutorials – Almost too many choices • Co-occurrence / binning – Some tools available, still in development largely – Eventually need to link to knowns for validation (or experimental validation)
  • 5. What has been done – MG-RAST • Free • Easy • Growing in usage by many • Not used by the “leading wedge” (at least as more than a blast engine) • Free data / annotation storage and maintenance • NEON – do you want to build your own? Does MG-RAST suffice?
  • 7.
  • 8.
  • 9.
  • 10. Fine tuning is not possible • What happens when your sequence matches two known genes equally? • What happens if there are multiple taxa related to one function? • What if you want more than ordination (e.g., primer evaluation?) • What if you want to change your distance matrix for ordination? • How do (not) you normalize / rarify your data? • What if you want more sensitivity than a BLAT analysis (e.g., amplicons) • Computationally-optimized, not biologically – Denitrification, Nitrate reductase, nitrate ammonification all distinct categories in Level 3 – If you want to merge, not trivial – especially in dealing with abundance estimations
  • 11. Looking forward • Development of all-pleasing computational, statistical, visualization tool will be impossible • NEON community demands will more than likely be easy to use and aimed at reducing (perhaps overly simplified) complex data into comparable metrics (TBD) • NEON data is actually not a big dataset and current tools should (imho) suffice • Data management and associated “metadata” may be the larger challenge (in the face of rapidly changing datasets)

Editor's Notes

  1. Who is important and how can I reduce the noise?
  2. Who is important and how can I reduce the noise?