SlideShare a Scribd company logo
Discussion on Metagenomic Data
for NEON Workshop
Adina Howe
July 15, 2014
Boulder, CO
What I do with my sequencing data
• Compare it to known references (BLAST, alignment
mapping, HMMs)
– Database gaps, 50%+ in soil metagenomes unknown
– Annotation errors
• Assembly (site specific reference)
– Dependent on sequencing depth
– Largest soil datasets, 1 billion reads, 10% assembled
• Co-occurrence / binning
– Abundance or nucleotide based
– Sequencing coverage dependent, need “unique” bins
– Particularly problematic for short reads
Sentinels in metagenomic data?
• Presence/Absence? Quantification?
• Phyla? Strain? Functional level?
• Strong signals of similarity or differences in
both “annotatable” and unknown sequences
• Information about targets for which we need
more references (by other methods)
• Indexes (alpha diversity,
copiotroph/oligotroph, % polymorphism,
recA/16S gene count)
What I do with my sequencing data
• Compare it to known references (BLAST,
alignment mapping, HMMs)
– Database gaps, 50%+ in soil metagenomes unknown
– Annotation errors
• Assembly (site specific reference)
– Well developed pipelines and tutorials
– Almost too many choices
• Co-occurrence / binning
– Some tools available, still in development largely
– Eventually need to link to knowns for validation (or
experimental validation)
What has been done – MG-RAST
• Free
• Easy
• Growing in usage by many
• Not used by the “leading wedge” (at least as
more than a blast engine)
• Free data / annotation storage and maintenance
• NEON – do you want to build your own? Does
MG-RAST suffice?
MG-RAST
Fine tuning is not possible
• What happens when your sequence matches two known genes
equally?
• What happens if there are multiple taxa related to one function?
• What if you want more than ordination (e.g., primer evaluation?)
• What if you want to change your distance matrix for ordination?
• How do (not) you normalize / rarify your data?
• What if you want more sensitivity than a BLAT analysis (e.g.,
amplicons)
• Computationally-optimized, not biologically
– Denitrification, Nitrate reductase, nitrate ammonification all distinct
categories in Level 3
– If you want to merge, not trivial – especially in dealing with abundance
estimations
Looking forward
• Development of all-pleasing computational,
statistical, visualization tool will be impossible
• NEON community demands will more than likely
be easy to use and aimed at reducing (perhaps
overly simplified) complex data into comparable
metrics (TBD)
• NEON data is actually not a big dataset and
current tools should (imho) suffice
• Data management and associated “metadata”
may be the larger challenge (in the face of rapidly
changing datasets)
http://tamingdata.com/wp-content/uploads/2010/07/tree-swing-project-management-large.png

More Related Content

Similar to Metagenomic data analysis discussion NEON Workshop

2020 02 11_biological_databases_part1
2020 02 11_biological_databases_part12020 02 11_biological_databases_part1
2020 02 11_biological_databases_part1
Prof. Wim Van Criekinge
 
2017 biological databases_part1_vupload
2017 biological databases_part1_vupload2017 biological databases_part1_vupload
2017 biological databases_part1_vupload
Prof. Wim Van Criekinge
 
Interpreting transcriptomics (ers berlin 2017)
Interpreting transcriptomics (ers berlin 2017)Interpreting transcriptomics (ers berlin 2017)
Interpreting transcriptomics (ers berlin 2017)
Paul Agapow
 
Metabolomic data analysis and visualization tools
Metabolomic data analysis and visualization toolsMetabolomic data analysis and visualization tools
Metabolomic data analysis and visualization tools
Dmitry Grapov
 
Database Searching
Database SearchingDatabase Searching
Database Searching
Meghaj Mallick
 
EiTESAL eHealth Conference 14&15 May 2017
EiTESAL eHealth Conference 14&15 May 2017 EiTESAL eHealth Conference 14&15 May 2017
EiTESAL eHealth Conference 14&15 May 2017
EITESANGO
 
GLBIO/CCBC Metagenomics Workshop
GLBIO/CCBC Metagenomics WorkshopGLBIO/CCBC Metagenomics Workshop
GLBIO/CCBC Metagenomics Workshop
Morgan Langille
 
Semi-automated Exploration and Extraction of Data in Scientific Tables
Semi-automated Exploration and Extraction of Data in Scientific TablesSemi-automated Exploration and Extraction of Data in Scientific Tables
Semi-automated Exploration and Extraction of Data in Scientific Tables
Elsevier
 
BLAST AND FASTA.pptx12345789999987544321234
BLAST AND FASTA.pptx12345789999987544321234BLAST AND FASTA.pptx12345789999987544321234
BLAST AND FASTA.pptx12345789999987544321234
alizain9604
 
2018 02 20_biological_databases_part1_v_upload
2018 02 20_biological_databases_part1_v_upload2018 02 20_biological_databases_part1_v_upload
2018 02 20_biological_databases_part1_v_upload
Prof. Wim Van Criekinge
 
Basic BLAST (BLASTn)
Basic BLAST (BLASTn)Basic BLAST (BLASTn)
Basic BLAST (BLASTn)
Syed Lokman
 
{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...
{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...
{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...
Amit Sheth
 
Introduction to 16S rRNA gene multivariate analysis
Introduction to 16S rRNA gene multivariate analysisIntroduction to 16S rRNA gene multivariate analysis
Introduction to 16S rRNA gene multivariate analysis
Josh Neufeld
 
Big data Intro - Presentation to OCHackerz Meetup Group
Big data Intro - Presentation to OCHackerz Meetup GroupBig data Intro - Presentation to OCHackerz Meetup Group
Big data Intro - Presentation to OCHackerz Meetup Group
Sri Kanajan
 
2015 NISO Forum: The Future of Library Resource Discovery
2015 NISO Forum: The Future of Library Resource Discovery2015 NISO Forum: The Future of Library Resource Discovery
2015 NISO Forum: The Future of Library Resource Discovery
National Information Standards Organization (NISO)
 
Databases_L2.pptx
Databases_L2.pptxDatabases_L2.pptx
Databases_L2.pptx
kigaruantony
 
2014 aus-agta
2014 aus-agta2014 aus-agta
2014 aus-agta
c.titus.brown
 
Plant Pathogen Genome Data: My Life In Sequences
Plant Pathogen Genome Data: My Life In SequencesPlant Pathogen Genome Data: My Life In Sequences
Plant Pathogen Genome Data: My Life In Sequences
Leighton Pritchard
 
25.ranking on data manifold with sink points
25.ranking on data manifold with sink points25.ranking on data manifold with sink points
25.ranking on data manifold with sink points
Venkatesh Neerukonda
 
Hadoop for Data Science
Hadoop for Data ScienceHadoop for Data Science
Hadoop for Data Science
Donald Miner
 

Similar to Metagenomic data analysis discussion NEON Workshop (20)

2020 02 11_biological_databases_part1
2020 02 11_biological_databases_part12020 02 11_biological_databases_part1
2020 02 11_biological_databases_part1
 
2017 biological databases_part1_vupload
2017 biological databases_part1_vupload2017 biological databases_part1_vupload
2017 biological databases_part1_vupload
 
Interpreting transcriptomics (ers berlin 2017)
Interpreting transcriptomics (ers berlin 2017)Interpreting transcriptomics (ers berlin 2017)
Interpreting transcriptomics (ers berlin 2017)
 
Metabolomic data analysis and visualization tools
Metabolomic data analysis and visualization toolsMetabolomic data analysis and visualization tools
Metabolomic data analysis and visualization tools
 
Database Searching
Database SearchingDatabase Searching
Database Searching
 
EiTESAL eHealth Conference 14&15 May 2017
EiTESAL eHealth Conference 14&15 May 2017 EiTESAL eHealth Conference 14&15 May 2017
EiTESAL eHealth Conference 14&15 May 2017
 
GLBIO/CCBC Metagenomics Workshop
GLBIO/CCBC Metagenomics WorkshopGLBIO/CCBC Metagenomics Workshop
GLBIO/CCBC Metagenomics Workshop
 
Semi-automated Exploration and Extraction of Data in Scientific Tables
Semi-automated Exploration and Extraction of Data in Scientific TablesSemi-automated Exploration and Extraction of Data in Scientific Tables
Semi-automated Exploration and Extraction of Data in Scientific Tables
 
BLAST AND FASTA.pptx12345789999987544321234
BLAST AND FASTA.pptx12345789999987544321234BLAST AND FASTA.pptx12345789999987544321234
BLAST AND FASTA.pptx12345789999987544321234
 
2018 02 20_biological_databases_part1_v_upload
2018 02 20_biological_databases_part1_v_upload2018 02 20_biological_databases_part1_v_upload
2018 02 20_biological_databases_part1_v_upload
 
Basic BLAST (BLASTn)
Basic BLAST (BLASTn)Basic BLAST (BLASTn)
Basic BLAST (BLASTn)
 
{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...
{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...
{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...
 
Introduction to 16S rRNA gene multivariate analysis
Introduction to 16S rRNA gene multivariate analysisIntroduction to 16S rRNA gene multivariate analysis
Introduction to 16S rRNA gene multivariate analysis
 
Big data Intro - Presentation to OCHackerz Meetup Group
Big data Intro - Presentation to OCHackerz Meetup GroupBig data Intro - Presentation to OCHackerz Meetup Group
Big data Intro - Presentation to OCHackerz Meetup Group
 
2015 NISO Forum: The Future of Library Resource Discovery
2015 NISO Forum: The Future of Library Resource Discovery2015 NISO Forum: The Future of Library Resource Discovery
2015 NISO Forum: The Future of Library Resource Discovery
 
Databases_L2.pptx
Databases_L2.pptxDatabases_L2.pptx
Databases_L2.pptx
 
2014 aus-agta
2014 aus-agta2014 aus-agta
2014 aus-agta
 
Plant Pathogen Genome Data: My Life In Sequences
Plant Pathogen Genome Data: My Life In SequencesPlant Pathogen Genome Data: My Life In Sequences
Plant Pathogen Genome Data: My Life In Sequences
 
25.ranking on data manifold with sink points
25.ranking on data manifold with sink points25.ranking on data manifold with sink points
25.ranking on data manifold with sink points
 
Hadoop for Data Science
Hadoop for Data ScienceHadoop for Data Science
Hadoop for Data Science
 

More from Adina Chuang Howe

Merrill Retreat 2018 - Nebraska City, Nebraska
Merrill Retreat 2018 - Nebraska City, NebraskaMerrill Retreat 2018 - Nebraska City, Nebraska
Merrill Retreat 2018 - Nebraska City, Nebraska
Adina Chuang Howe
 
Iowa State Bioinformatics BCB Symposium 2018 - There and Back Again
Iowa State Bioinformatics BCB Symposium 2018 - There and Back AgainIowa State Bioinformatics BCB Symposium 2018 - There and Back Again
Iowa State Bioinformatics BCB Symposium 2018 - There and Back Again
Adina Chuang Howe
 
2015 Soil Science of America Meeting
2015 Soil Science of America Meeting2015 Soil Science of America Meeting
2015 Soil Science of America Meeting
Adina Chuang Howe
 
ISU ENVSCI690 Graduate Seminar Slides
ISU ENVSCI690 Graduate Seminar SlidesISU ENVSCI690 Graduate Seminar Slides
ISU ENVSCI690 Graduate Seminar Slides
Adina Chuang Howe
 
Job Talk Iowa State University Ag Bio Engineering
Job Talk Iowa State University Ag Bio EngineeringJob Talk Iowa State University Ag Bio Engineering
Job Talk Iowa State University Ag Bio Engineering
Adina Chuang Howe
 
Adina's Faculty Introduction - ISU ABE
Adina's Faculty Introduction - ISU ABEAdina's Faculty Introduction - ISU ABE
Adina's Faculty Introduction - ISU ABE
Adina Chuang Howe
 
Sweden_eemis_big_data
Sweden_eemis_big_dataSweden_eemis_big_data
Sweden_eemis_big_data
Adina Chuang Howe
 
Big Data Field Museum
Big Data Field MuseumBig Data Field Museum
Big Data Field Museum
Adina Chuang Howe
 
ANL Soil Metagenomics 2014 Soil Reference Database - Let's do this
ANL Soil Metagenomics 2014 Soil Reference Database - Let's do thisANL Soil Metagenomics 2014 Soil Reference Database - Let's do this
ANL Soil Metagenomics 2014 Soil Reference Database - Let's do this
Adina Chuang Howe
 
ASM 2013 Metagenomic Assembly Workshop Slides
ASM 2013 Metagenomic Assembly Workshop SlidesASM 2013 Metagenomic Assembly Workshop Slides
ASM 2013 Metagenomic Assembly Workshop SlidesAdina Chuang Howe
 
EPA 2013 Air Sensors Meeting Big Data Talk
EPA 2013 Air Sensors Meeting Big Data TalkEPA 2013 Air Sensors Meeting Big Data Talk
EPA 2013 Air Sensors Meeting Big Data Talk
Adina Chuang Howe
 

More from Adina Chuang Howe (13)

Merrill Retreat 2018 - Nebraska City, Nebraska
Merrill Retreat 2018 - Nebraska City, NebraskaMerrill Retreat 2018 - Nebraska City, Nebraska
Merrill Retreat 2018 - Nebraska City, Nebraska
 
Iowa State Bioinformatics BCB Symposium 2018 - There and Back Again
Iowa State Bioinformatics BCB Symposium 2018 - There and Back AgainIowa State Bioinformatics BCB Symposium 2018 - There and Back Again
Iowa State Bioinformatics BCB Symposium 2018 - There and Back Again
 
2015 Soil Science of America Meeting
2015 Soil Science of America Meeting2015 Soil Science of America Meeting
2015 Soil Science of America Meeting
 
ISU ENVSCI690 Graduate Seminar Slides
ISU ENVSCI690 Graduate Seminar SlidesISU ENVSCI690 Graduate Seminar Slides
ISU ENVSCI690 Graduate Seminar Slides
 
Job Talk Iowa State University Ag Bio Engineering
Job Talk Iowa State University Ag Bio EngineeringJob Talk Iowa State University Ag Bio Engineering
Job Talk Iowa State University Ag Bio Engineering
 
Adina's Faculty Introduction - ISU ABE
Adina's Faculty Introduction - ISU ABEAdina's Faculty Introduction - ISU ABE
Adina's Faculty Introduction - ISU ABE
 
Sweden_eemis_big_data
Sweden_eemis_big_dataSweden_eemis_big_data
Sweden_eemis_big_data
 
Big data nebraska
Big data nebraskaBig data nebraska
Big data nebraska
 
Big data nebraska
Big data nebraskaBig data nebraska
Big data nebraska
 
Big Data Field Museum
Big Data Field MuseumBig Data Field Museum
Big Data Field Museum
 
ANL Soil Metagenomics 2014 Soil Reference Database - Let's do this
ANL Soil Metagenomics 2014 Soil Reference Database - Let's do thisANL Soil Metagenomics 2014 Soil Reference Database - Let's do this
ANL Soil Metagenomics 2014 Soil Reference Database - Let's do this
 
ASM 2013 Metagenomic Assembly Workshop Slides
ASM 2013 Metagenomic Assembly Workshop SlidesASM 2013 Metagenomic Assembly Workshop Slides
ASM 2013 Metagenomic Assembly Workshop Slides
 
EPA 2013 Air Sensors Meeting Big Data Talk
EPA 2013 Air Sensors Meeting Big Data TalkEPA 2013 Air Sensors Meeting Big Data Talk
EPA 2013 Air Sensors Meeting Big Data Talk
 

Recently uploaded

原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
u86oixdj
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
dwreak4tg
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTESAdjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Subhajit Sahu
 
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
GetInData
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
g4dpvqap0
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
vikram sood
 
Nanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdfNanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdf
eddie19851
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
AnirbanRoy608946
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
sameer shah
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
balafet
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
Walaa Eldin Moustafa
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
roli9797
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 

Recently uploaded (20)

原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTESAdjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
 
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
 
Nanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdfNanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdf
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 

Metagenomic data analysis discussion NEON Workshop

  • 1. Discussion on Metagenomic Data for NEON Workshop Adina Howe July 15, 2014 Boulder, CO
  • 2. What I do with my sequencing data • Compare it to known references (BLAST, alignment mapping, HMMs) – Database gaps, 50%+ in soil metagenomes unknown – Annotation errors • Assembly (site specific reference) – Dependent on sequencing depth – Largest soil datasets, 1 billion reads, 10% assembled • Co-occurrence / binning – Abundance or nucleotide based – Sequencing coverage dependent, need “unique” bins – Particularly problematic for short reads
  • 3. Sentinels in metagenomic data? • Presence/Absence? Quantification? • Phyla? Strain? Functional level? • Strong signals of similarity or differences in both “annotatable” and unknown sequences • Information about targets for which we need more references (by other methods) • Indexes (alpha diversity, copiotroph/oligotroph, % polymorphism, recA/16S gene count)
  • 4. What I do with my sequencing data • Compare it to known references (BLAST, alignment mapping, HMMs) – Database gaps, 50%+ in soil metagenomes unknown – Annotation errors • Assembly (site specific reference) – Well developed pipelines and tutorials – Almost too many choices • Co-occurrence / binning – Some tools available, still in development largely – Eventually need to link to knowns for validation (or experimental validation)
  • 5. What has been done – MG-RAST • Free • Easy • Growing in usage by many • Not used by the “leading wedge” (at least as more than a blast engine) • Free data / annotation storage and maintenance • NEON – do you want to build your own? Does MG-RAST suffice?
  • 7.
  • 8.
  • 9.
  • 10. Fine tuning is not possible • What happens when your sequence matches two known genes equally? • What happens if there are multiple taxa related to one function? • What if you want more than ordination (e.g., primer evaluation?) • What if you want to change your distance matrix for ordination? • How do (not) you normalize / rarify your data? • What if you want more sensitivity than a BLAT analysis (e.g., amplicons) • Computationally-optimized, not biologically – Denitrification, Nitrate reductase, nitrate ammonification all distinct categories in Level 3 – If you want to merge, not trivial – especially in dealing with abundance estimations
  • 11. Looking forward • Development of all-pleasing computational, statistical, visualization tool will be impossible • NEON community demands will more than likely be easy to use and aimed at reducing (perhaps overly simplified) complex data into comparable metrics (TBD) • NEON data is actually not a big dataset and current tools should (imho) suffice • Data management and associated “metadata” may be the larger challenge (in the face of rapidly changing datasets)

Editor's Notes

  1. Who is important and how can I reduce the noise?
  2. Who is important and how can I reduce the noise?