SlideShare a Scribd company logo
Code sharing for microbiomics
        Leo, Wageningen 7.9.201 2
Challenges with computer code:
- Analyses not standardized -> confusion & non-optimal choices
- Poor documentation -> poor reproducibility & waste of time
- Reinventing the wheel -> waste of time & resources
Solution:
- Harmonized software libraries (e.g. R packages)
- Easier to share tools (GitHub)


-> more reliable & reproducible
-> more standardized
-> avoid repetitive coding
-> added value for publications
-> distributed version control; all changes automatically tracked
-> facilitates Helsinki-Wageningen collaboration
Wiki: various example analyses already implemented


- retrieve data from MySQL( H/M/PITChip)
- preprocessing (profiling & HITChip Atlas)
- analysis routines (diversities, tables,
Wilcoxon tests etc.)
- visualization
-> improving through time
Step­by­step examples with source code and
simulated data
Common core microbiota:
              effect of analysis depth and prevalence


                                    "Blanket analysis"
                                 github.com/microbiome
 Core size




                               Estimate the frequency of
                               belonging to the core for
                               each phylotype; confidence
                               intervals with bootstrap
Ab
 un
    dan




                                Salonen A, et al. (2012) The adult intestinal core

              Prevalence
         ce




                                microbiota is determined by analysis depth and health
                                status, Clinical Microbiology and Infection 18:16–20.
Compatible with HITChip Atlas of Human Gut
         Microbiota (>3200 samples)
                                    45 studies - Standardized Platform




                >1 000 phylotypes




                                            >3000 samples
-> Compare your own data to HITChip data collections?
Differences to the old profiling script?
    -> Separate preprocessing from analysis
    -> Support modularity
-> removed outdated options & outputs from profiling script
1. Preprocessing: minimal output from profiling script:
- preprocessed data matrices (oligo/L1 /L2/species/absolute
scale) with NMF/RPA/SUM
- preprocessing log (parameter values etc.)
- quality control plots (heatmap)
2. Analysis & visualization routines
- based on profiling script output & done afterwards -> modular
- used when needed, not run by default
-> keeping it simple & storing disk space
Summary: code development & sharing through GitHub
In-house sharing infrastructure for code
-> distributed package maintenance
-> avoid bugs; facilitate transparency & reproducibility
-> additional visibility & citations?
Avoid extra work and focus on the essential
      -> check for ready-made examples from the wiki!
      -> ask for help -> let's add examples to the wiki!
Manage and share your own code?
    -> GitHub and microbiome R package
    -> Version control

      microbiome.github.com
To discuss
Do you have R code which could be useful for others?
-> let's polish, document & add it in the package!

Which tools to include?
- diversity/richness/evenness calculations
- PCA, hierarchical clusterings, RDA etc.
- Wilcoxon tests
- Association (Spearman) tables phylotypes vs. phenotypes
- Relative contributions from bg variables


-> ideally, only standard things should be standardized;
for rare analyses just use basic R & other packages
HITChip preprocessing steps
- Spatial correction
- Between array normalization
- Background correction
- Oligo summarization
1. Spatial correction
2. Between­array normalization: minmax vs. quantiles?
3. Background correction: skip!
4. Oligo summarization


   NMF
   RPA
   SUM
   AVE
Preprocessing: recommendations
* Normalization:
   - minmax: use by default
   - quantile: use if samples have 'similar' microbiota
* Background correction
   -> ignore
* Oligo summarization
   -> NMF: for L0/L1 /L2 levels
   -> RPA: if species level is also included
   -> (SUM: for comparison)
   -> AVE: deprecated
=> The defaults readily implemented in the pipeline
Diversity analysis

Richness, evenness, diversity
Shannon vs. Inverse Simpson?
Detection threshold?
Richness with various indices and thresholds
Recommendation:
- oligo level
- shannon diversity
- richness as species
count with 80%
quantile detection
threshold
- evenness with
Pielou's index
Further analysis tools
microbiome.github.com

More Related Content

What's hot

Airflow at lyft for Airflow summit 2020 conference
Airflow at lyft for Airflow summit 2020 conferenceAirflow at lyft for Airflow summit 2020 conference
Airflow at lyft for Airflow summit 2020 conference
Tao Feng
 
Bigger Faster Easier: LinkedIn Hadoop Summit 2015
Bigger Faster Easier: LinkedIn Hadoop Summit 2015Bigger Faster Easier: LinkedIn Hadoop Summit 2015
Bigger Faster Easier: LinkedIn Hadoop Summit 2015
Shirshanka Das
 
Apache Flink 101 - the rise of stream processing and beyond
Apache Flink 101 - the rise of stream processing and beyondApache Flink 101 - the rise of stream processing and beyond
Apache Flink 101 - the rise of stream processing and beyond
Bowen Li
 
MongoDB Days Germany: Data Processing with MongoDB
MongoDB Days Germany: Data Processing with MongoDBMongoDB Days Germany: Data Processing with MongoDB
MongoDB Days Germany: Data Processing with MongoDB
MongoDB
 
Overview of Apache Fink: the 4 G of Big Data Analytics Frameworks
Overview of Apache Fink: the 4 G of Big Data Analytics FrameworksOverview of Apache Fink: the 4 G of Big Data Analytics Frameworks
Overview of Apache Fink: the 4 G of Big Data Analytics Frameworks
Slim Baltagi
 
Apache-Flink-What-How-Why-Who-Where-by-Slim-Baltagi
Apache-Flink-What-How-Why-Who-Where-by-Slim-BaltagiApache-Flink-What-How-Why-Who-Where-by-Slim-Baltagi
Apache-Flink-What-How-Why-Who-Where-by-Slim-Baltagi
Slim Baltagi
 
Dogfooding data at Lyft
Dogfooding data at LyftDogfooding data at Lyft
Dogfooding data at Lyft
markgrover
 
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
Databricks
 
Introduction to Stream Processing with Apache Flink (2019-11-02 Bengaluru Mee...
Introduction to Stream Processing with Apache Flink (2019-11-02 Bengaluru Mee...Introduction to Stream Processing with Apache Flink (2019-11-02 Bengaluru Mee...
Introduction to Stream Processing with Apache Flink (2019-11-02 Bengaluru Mee...
Timo Walther
 
Multiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier DominguezMultiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier Dominguez
Big Data Spain
 
Introduction to Streaming with Apache Flink
Introduction to Streaming with Apache FlinkIntroduction to Streaming with Apache Flink
Introduction to Streaming with Apache Flink
Tugdual Grall
 
Extracting Insights from Data at Twitter
Extracting Insights from Data at TwitterExtracting Insights from Data at Twitter
Extracting Insights from Data at Twitter
Prasad Wagle
 
The Lyft data platform: Now and in the future
The Lyft data platform: Now and in the futureThe Lyft data platform: Now and in the future
The Lyft data platform: Now and in the future
markgrover
 
Project Hydrogen: State-of-the-Art Deep Learning on Apache Spark
Project Hydrogen: State-of-the-Art Deep Learning on Apache SparkProject Hydrogen: State-of-the-Art Deep Learning on Apache Spark
Project Hydrogen: State-of-the-Art Deep Learning on Apache Spark
Databricks
 
Kafka for data scientists
Kafka for data scientistsKafka for data scientists
Kafka for data scientists
Jenn Rawlins
 
SAMOA: A Platform for Mining Big Data Streams (Apache BigData North America 2...
SAMOA: A Platform for Mining Big Data Streams (Apache BigData North America 2...SAMOA: A Platform for Mining Big Data Streams (Apache BigData North America 2...
SAMOA: A Platform for Mining Big Data Streams (Apache BigData North America 2...
Nicolas Kourtellis
 
Streaming Trend Discovery: Real-Time Discovery in a Sea of Events with Scott ...
Streaming Trend Discovery: Real-Time Discovery in a Sea of Events with Scott ...Streaming Trend Discovery: Real-Time Discovery in a Sea of Events with Scott ...
Streaming Trend Discovery: Real-Time Discovery in a Sea of Events with Scott ...
Databricks
 
Kamal Hakimzadeh – Reproducible Distributed Experiments
Kamal Hakimzadeh – Reproducible Distributed ExperimentsKamal Hakimzadeh – Reproducible Distributed Experiments
Kamal Hakimzadeh – Reproducible Distributed Experiments
Flink Forward
 
SAMOA: A Platform for Mining Big Data Streams (Apache BigData Europe 2015)
SAMOA: A Platform for Mining Big Data Streams (Apache BigData Europe 2015)SAMOA: A Platform for Mining Big Data Streams (Apache BigData Europe 2015)
SAMOA: A Platform for Mining Big Data Streams (Apache BigData Europe 2015)
Nicolas Kourtellis
 
Pinot: Near Realtime Analytics @ Uber
Pinot: Near Realtime Analytics @ UberPinot: Near Realtime Analytics @ Uber
Pinot: Near Realtime Analytics @ Uber
Xiang Fu
 

What's hot (20)

Airflow at lyft for Airflow summit 2020 conference
Airflow at lyft for Airflow summit 2020 conferenceAirflow at lyft for Airflow summit 2020 conference
Airflow at lyft for Airflow summit 2020 conference
 
Bigger Faster Easier: LinkedIn Hadoop Summit 2015
Bigger Faster Easier: LinkedIn Hadoop Summit 2015Bigger Faster Easier: LinkedIn Hadoop Summit 2015
Bigger Faster Easier: LinkedIn Hadoop Summit 2015
 
Apache Flink 101 - the rise of stream processing and beyond
Apache Flink 101 - the rise of stream processing and beyondApache Flink 101 - the rise of stream processing and beyond
Apache Flink 101 - the rise of stream processing and beyond
 
MongoDB Days Germany: Data Processing with MongoDB
MongoDB Days Germany: Data Processing with MongoDBMongoDB Days Germany: Data Processing with MongoDB
MongoDB Days Germany: Data Processing with MongoDB
 
Overview of Apache Fink: the 4 G of Big Data Analytics Frameworks
Overview of Apache Fink: the 4 G of Big Data Analytics FrameworksOverview of Apache Fink: the 4 G of Big Data Analytics Frameworks
Overview of Apache Fink: the 4 G of Big Data Analytics Frameworks
 
Apache-Flink-What-How-Why-Who-Where-by-Slim-Baltagi
Apache-Flink-What-How-Why-Who-Where-by-Slim-BaltagiApache-Flink-What-How-Why-Who-Where-by-Slim-Baltagi
Apache-Flink-What-How-Why-Who-Where-by-Slim-Baltagi
 
Dogfooding data at Lyft
Dogfooding data at LyftDogfooding data at Lyft
Dogfooding data at Lyft
 
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
 
Introduction to Stream Processing with Apache Flink (2019-11-02 Bengaluru Mee...
Introduction to Stream Processing with Apache Flink (2019-11-02 Bengaluru Mee...Introduction to Stream Processing with Apache Flink (2019-11-02 Bengaluru Mee...
Introduction to Stream Processing with Apache Flink (2019-11-02 Bengaluru Mee...
 
Multiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier DominguezMultiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier Dominguez
 
Introduction to Streaming with Apache Flink
Introduction to Streaming with Apache FlinkIntroduction to Streaming with Apache Flink
Introduction to Streaming with Apache Flink
 
Extracting Insights from Data at Twitter
Extracting Insights from Data at TwitterExtracting Insights from Data at Twitter
Extracting Insights from Data at Twitter
 
The Lyft data platform: Now and in the future
The Lyft data platform: Now and in the futureThe Lyft data platform: Now and in the future
The Lyft data platform: Now and in the future
 
Project Hydrogen: State-of-the-Art Deep Learning on Apache Spark
Project Hydrogen: State-of-the-Art Deep Learning on Apache SparkProject Hydrogen: State-of-the-Art Deep Learning on Apache Spark
Project Hydrogen: State-of-the-Art Deep Learning on Apache Spark
 
Kafka for data scientists
Kafka for data scientistsKafka for data scientists
Kafka for data scientists
 
SAMOA: A Platform for Mining Big Data Streams (Apache BigData North America 2...
SAMOA: A Platform for Mining Big Data Streams (Apache BigData North America 2...SAMOA: A Platform for Mining Big Data Streams (Apache BigData North America 2...
SAMOA: A Platform for Mining Big Data Streams (Apache BigData North America 2...
 
Streaming Trend Discovery: Real-Time Discovery in a Sea of Events with Scott ...
Streaming Trend Discovery: Real-Time Discovery in a Sea of Events with Scott ...Streaming Trend Discovery: Real-Time Discovery in a Sea of Events with Scott ...
Streaming Trend Discovery: Real-Time Discovery in a Sea of Events with Scott ...
 
Kamal Hakimzadeh – Reproducible Distributed Experiments
Kamal Hakimzadeh – Reproducible Distributed ExperimentsKamal Hakimzadeh – Reproducible Distributed Experiments
Kamal Hakimzadeh – Reproducible Distributed Experiments
 
SAMOA: A Platform for Mining Big Data Streams (Apache BigData Europe 2015)
SAMOA: A Platform for Mining Big Data Streams (Apache BigData Europe 2015)SAMOA: A Platform for Mining Big Data Streams (Apache BigData Europe 2015)
SAMOA: A Platform for Mining Big Data Streams (Apache BigData Europe 2015)
 
Pinot: Near Realtime Analytics @ Uber
Pinot: Near Realtime Analytics @ UberPinot: Near Realtime Analytics @ Uber
Pinot: Near Realtime Analytics @ Uber
 

Viewers also liked

Slideshare lahti-isme2014
Slideshare lahti-isme2014Slideshare lahti-isme2014
Slideshare lahti-isme2014
Leo Lahti
 
Rpresentation
RpresentationRpresentation
Rpresentation
Leo Lahti
 
soRvi - avoimen datan työkalupakki
soRvi - avoimen datan työkalupakkisoRvi - avoimen datan työkalupakki
soRvi - avoimen datan työkalupakki
Leo Lahti
 
Avoimen datan mahdollisuudet terveystieteissä THL 3.12.2013
Avoimen datan mahdollisuudet terveystieteissä THL 3.12.2013Avoimen datan mahdollisuudet terveystieteissä THL 3.12.2013
Avoimen datan mahdollisuudet terveystieteissä THL 3.12.2013
Leo Lahti
 
Mikrokosmos sisällämme - ihmisen pikku kumppanit
Mikrokosmos sisällämme - ihmisen pikku kumppanitMikrokosmos sisällämme - ihmisen pikku kumppanit
Mikrokosmos sisällämme - ihmisen pikku kumppanit
Leo Lahti
 
rOpenGov: an R ecosystem for open government data and computational social sc...
rOpenGov: an R ecosystem for open government data and computational social sc...rOpenGov: an R ecosystem for open government data and computational social sc...
rOpenGov: an R ecosystem for open government data and computational social sc...
Leo Lahti
 

Viewers also liked (6)

Slideshare lahti-isme2014
Slideshare lahti-isme2014Slideshare lahti-isme2014
Slideshare lahti-isme2014
 
Rpresentation
RpresentationRpresentation
Rpresentation
 
soRvi - avoimen datan työkalupakki
soRvi - avoimen datan työkalupakkisoRvi - avoimen datan työkalupakki
soRvi - avoimen datan työkalupakki
 
Avoimen datan mahdollisuudet terveystieteissä THL 3.12.2013
Avoimen datan mahdollisuudet terveystieteissä THL 3.12.2013Avoimen datan mahdollisuudet terveystieteissä THL 3.12.2013
Avoimen datan mahdollisuudet terveystieteissä THL 3.12.2013
 
Mikrokosmos sisällämme - ihmisen pikku kumppanit
Mikrokosmos sisällämme - ihmisen pikku kumppanitMikrokosmos sisällämme - ihmisen pikku kumppanit
Mikrokosmos sisällämme - ihmisen pikku kumppanit
 
rOpenGov: an R ecosystem for open government data and computational social sc...
rOpenGov: an R ecosystem for open government data and computational social sc...rOpenGov: an R ecosystem for open government data and computational social sc...
rOpenGov: an R ecosystem for open government data and computational social sc...
 

Similar to 20120907 microbiome-intro

BioAssay Express: Creating and exploiting assay metadata
BioAssay Express: Creating and exploiting assay metadataBioAssay Express: Creating and exploiting assay metadata
BioAssay Express: Creating and exploiting assay metadata
Philip Cheung
 
Australian Bioinformatics Conference (ABiC) 2014 Talk - Doing bioinformatics ...
Australian Bioinformatics Conference (ABiC) 2014 Talk - Doing bioinformatics ...Australian Bioinformatics Conference (ABiC) 2014 Talk - Doing bioinformatics ...
Australian Bioinformatics Conference (ABiC) 2014 Talk - Doing bioinformatics ...
The University of Queensland
 
Overview of Next Gen Sequencing Data Analysis
Overview of Next Gen Sequencing Data AnalysisOverview of Next Gen Sequencing Data Analysis
Overview of Next Gen Sequencing Data Analysis
Bioinformatics and Computational Biosciences Branch
 
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
GenomeInABottle
 
How to be a bioinformatician
How to be a bioinformaticianHow to be a bioinformatician
How to be a bioinformatician
Christian Frech
 
Computational Resources In Infectious Disease
Computational Resources In Infectious DiseaseComputational Resources In Infectious Disease
Computational Resources In Infectious Disease
João André Carriço
 
H Mishima - Biogem, Ruby UCSC API, and BioRuby
H Mishima - Biogem, Ruby UCSC API, and BioRubyH Mishima - Biogem, Ruby UCSC API, and BioRuby
H Mishima - Biogem, Ruby UCSC API, and BioRuby
Jan Aerts
 
Mar2013 Performance Metrics Working Group
Mar2013 Performance Metrics Working GroupMar2013 Performance Metrics Working Group
Mar2013 Performance Metrics Working GroupGenomeInABottle
 
Cshl minseqe 2013_ouellette
Cshl minseqe 2013_ouelletteCshl minseqe 2013_ouellette
Cshl minseqe 2013_ouellette
Functional Genomics Data Society
 
Making Use of NGS Data: From Reads to Trees and Annotations
Making Use of NGS Data: From Reads to Trees and AnnotationsMaking Use of NGS Data: From Reads to Trees and Annotations
Making Use of NGS Data: From Reads to Trees and Annotations
João André Carriço
 
Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016
Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016
Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016
Prof. Wim Van Criekinge
 
Cool Informatics Tools and Services for Biomedical Research
Cool Informatics Tools and Services for Biomedical ResearchCool Informatics Tools and Services for Biomedical Research
Cool Informatics Tools and Services for Biomedical ResearchDavid Ruau
 
Giab jan2016 intro and update 160128
Giab jan2016 intro and update 160128Giab jan2016 intro and update 160128
Giab jan2016 intro and update 160128
GenomeInABottle
 
BioThings API: Building a FAIR API Ecosystem for Biomedical Knowledge
BioThings API: Building a FAIR API Ecosystem for Biomedical KnowledgeBioThings API: Building a FAIR API Ecosystem for Biomedical Knowledge
BioThings API: Building a FAIR API Ecosystem for Biomedical Knowledge
Chunlei Wu
 
Initial steps towards a production platform for DNA sequence analysis on the ...
Initial steps towards a production platform for DNA sequence analysis on the ...Initial steps towards a production platform for DNA sequence analysis on the ...
Initial steps towards a production platform for DNA sequence analysis on the ...
Barbera van Schaik
 
BioDec Srl Company Profile
BioDec Srl Company ProfileBioDec Srl Company Profile
BioDec Srl Company Profile
BioDec
 
Visualize genomes with Integrated Genome Browser
Visualize genomes with Integrated Genome BrowserVisualize genomes with Integrated Genome Browser
Visualize genomes with Integrated Genome Browser
Ann Loraine
 
Resilience Engineering: A field of study, a community, and some perspective s...
Resilience Engineering: A field of study, a community, and some perspective s...Resilience Engineering: A field of study, a community, and some perspective s...
Resilience Engineering: A field of study, a community, and some perspective s...
John Allspaw
 

Similar to 20120907 microbiome-intro (20)

BioAssay Express: Creating and exploiting assay metadata
BioAssay Express: Creating and exploiting assay metadataBioAssay Express: Creating and exploiting assay metadata
BioAssay Express: Creating and exploiting assay metadata
 
Australian Bioinformatics Conference (ABiC) 2014 Talk - Doing bioinformatics ...
Australian Bioinformatics Conference (ABiC) 2014 Talk - Doing bioinformatics ...Australian Bioinformatics Conference (ABiC) 2014 Talk - Doing bioinformatics ...
Australian Bioinformatics Conference (ABiC) 2014 Talk - Doing bioinformatics ...
 
Overview of Next Gen Sequencing Data Analysis
Overview of Next Gen Sequencing Data AnalysisOverview of Next Gen Sequencing Data Analysis
Overview of Next Gen Sequencing Data Analysis
 
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
 
How to be a bioinformatician
How to be a bioinformaticianHow to be a bioinformatician
How to be a bioinformatician
 
Computational Resources In Infectious Disease
Computational Resources In Infectious DiseaseComputational Resources In Infectious Disease
Computational Resources In Infectious Disease
 
H Mishima - Biogem, Ruby UCSC API, and BioRuby
H Mishima - Biogem, Ruby UCSC API, and BioRubyH Mishima - Biogem, Ruby UCSC API, and BioRuby
H Mishima - Biogem, Ruby UCSC API, and BioRuby
 
Mar2013 Performance Metrics Working Group
Mar2013 Performance Metrics Working GroupMar2013 Performance Metrics Working Group
Mar2013 Performance Metrics Working Group
 
Cshl minseqe 2013_ouellette
Cshl minseqe 2013_ouelletteCshl minseqe 2013_ouellette
Cshl minseqe 2013_ouellette
 
Making Use of NGS Data: From Reads to Trees and Annotations
Making Use of NGS Data: From Reads to Trees and AnnotationsMaking Use of NGS Data: From Reads to Trees and Annotations
Making Use of NGS Data: From Reads to Trees and Annotations
 
Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016
Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016
Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016
 
Cool Informatics Tools and Services for Biomedical Research
Cool Informatics Tools and Services for Biomedical ResearchCool Informatics Tools and Services for Biomedical Research
Cool Informatics Tools and Services for Biomedical Research
 
Giab jan2016 intro and update 160128
Giab jan2016 intro and update 160128Giab jan2016 intro and update 160128
Giab jan2016 intro and update 160128
 
BioThings API: Building a FAIR API Ecosystem for Biomedical Knowledge
BioThings API: Building a FAIR API Ecosystem for Biomedical KnowledgeBioThings API: Building a FAIR API Ecosystem for Biomedical Knowledge
BioThings API: Building a FAIR API Ecosystem for Biomedical Knowledge
 
Initial steps towards a production platform for DNA sequence analysis on the ...
Initial steps towards a production platform for DNA sequence analysis on the ...Initial steps towards a production platform for DNA sequence analysis on the ...
Initial steps towards a production platform for DNA sequence analysis on the ...
 
BioDec Srl Company Profile
BioDec Srl Company ProfileBioDec Srl Company Profile
BioDec Srl Company Profile
 
Brizio rossibiodec
Brizio rossibiodecBrizio rossibiodec
Brizio rossibiodec
 
Visualize genomes with Integrated Genome Browser
Visualize genomes with Integrated Genome BrowserVisualize genomes with Integrated Genome Browser
Visualize genomes with Integrated Genome Browser
 
Resilience Engineering: A field of study, a community, and some perspective s...
Resilience Engineering: A field of study, a community, and some perspective s...Resilience Engineering: A field of study, a community, and some perspective s...
Resilience Engineering: A field of study, a community, and some perspective s...
 
NETTAB 2012
NETTAB 2012NETTAB 2012
NETTAB 2012
 

Recently uploaded

DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
nkrafacyberclub
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
Neo4j
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
DianaGray10
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 

Recently uploaded (20)

DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 

20120907 microbiome-intro

  • 1. Code sharing for microbiomics Leo, Wageningen 7.9.201 2
  • 2.
  • 3. Challenges with computer code: - Analyses not standardized -> confusion & non-optimal choices - Poor documentation -> poor reproducibility & waste of time - Reinventing the wheel -> waste of time & resources Solution: - Harmonized software libraries (e.g. R packages) - Easier to share tools (GitHub) -> more reliable & reproducible -> more standardized -> avoid repetitive coding -> added value for publications -> distributed version control; all changes automatically tracked -> facilitates Helsinki-Wageningen collaboration
  • 4.
  • 5.
  • 6. Wiki: various example analyses already implemented - retrieve data from MySQL( H/M/PITChip) - preprocessing (profiling & HITChip Atlas) - analysis routines (diversities, tables, Wilcoxon tests etc.) - visualization -> improving through time
  • 7. Step­by­step examples with source code and simulated data
  • 8. Common core microbiota: effect of analysis depth and prevalence "Blanket analysis" github.com/microbiome Core size Estimate the frequency of belonging to the core for each phylotype; confidence intervals with bootstrap Ab un dan Salonen A, et al. (2012) The adult intestinal core Prevalence ce microbiota is determined by analysis depth and health status, Clinical Microbiology and Infection 18:16–20.
  • 9. Compatible with HITChip Atlas of Human Gut Microbiota (>3200 samples) 45 studies - Standardized Platform >1 000 phylotypes >3000 samples -> Compare your own data to HITChip data collections?
  • 10. Differences to the old profiling script? -> Separate preprocessing from analysis -> Support modularity -> removed outdated options & outputs from profiling script 1. Preprocessing: minimal output from profiling script: - preprocessed data matrices (oligo/L1 /L2/species/absolute scale) with NMF/RPA/SUM - preprocessing log (parameter values etc.) - quality control plots (heatmap) 2. Analysis & visualization routines - based on profiling script output & done afterwards -> modular - used when needed, not run by default -> keeping it simple & storing disk space
  • 11. Summary: code development & sharing through GitHub In-house sharing infrastructure for code -> distributed package maintenance -> avoid bugs; facilitate transparency & reproducibility -> additional visibility & citations? Avoid extra work and focus on the essential -> check for ready-made examples from the wiki! -> ask for help -> let's add examples to the wiki! Manage and share your own code? -> GitHub and microbiome R package -> Version control microbiome.github.com
  • 12. To discuss Do you have R code which could be useful for others? -> let's polish, document & add it in the package! Which tools to include? - diversity/richness/evenness calculations - PCA, hierarchical clusterings, RDA etc. - Wilcoxon tests - Association (Spearman) tables phylotypes vs. phenotypes - Relative contributions from bg variables -> ideally, only standard things should be standardized; for rare analyses just use basic R & other packages
  • 13.
  • 14. HITChip preprocessing steps - Spatial correction - Between array normalization - Background correction - Oligo summarization
  • 16. 2. Between­array normalization: minmax vs. quantiles?
  • 18. 4. Oligo summarization NMF RPA SUM AVE
  • 19. Preprocessing: recommendations * Normalization: - minmax: use by default - quantile: use if samples have 'similar' microbiota * Background correction -> ignore * Oligo summarization -> NMF: for L0/L1 /L2 levels -> RPA: if species level is also included -> (SUM: for comparison) -> AVE: deprecated => The defaults readily implemented in the pipeline
  • 20. Diversity analysis Richness, evenness, diversity Shannon vs. Inverse Simpson? Detection threshold?
  • 21. Richness with various indices and thresholds
  • 22. Recommendation: - oligo level - shannon diversity - richness as species count with 80% quantile detection threshold - evenness with Pielou's index