This document provides an overview of research on the human microbiome and the American Gut project. It includes summaries of several scientific studies that examined the microbiome in infants and its role in health conditions like diabetes. It also describes the methods and collaborators involved in the American Gut project, which aims to characterize individual microbiomes through citizen science. Videos and interactive data visualization tools related to microbiome research are referenced. Variability in sample handling and analysis methods across studies is discussed.
3. A microbe-dominated world
The universal nature of biochemistry. Pace NR.
Proc Natl Acad Sci U S A. 2001 Jan 30;98(3):805-8.
4. A microbe-dominated world
Lozupone and Knight. Proc Natl Acad Sci U S A. 2007 Jul 3;104(27):11436-40
5. A microbe-dominated world
Lozupone and Knight. Proc Natl Acad Sci U S A. 2007 Jul 3;104(27):11436-40
Subway image adapted from http://laughingsquid.com/the-gastrointestinal-system-represented-as-a-subway-map/
6. A microbe-dominated world
Ley et al. Nat Rev Microbiol. 2008 Oct;6(10):776-88
Subway image adapted from http://laughingsquid.com/the-gastrointestinal-system-represented-as-a-subway-map/
12. Gevers et al. Cell Host Microbe. 2014 Mar 12;15(3)
13. Koenig et al. Proc Natl Acad Sci U S A. 2011 Mar 15;108
14. Video fun time!
Infant time series, https://www.youtube.com/watch?v=Pb272zsixSQ
Gut ecosystem restoration https://www.youtube.com/watch?v=-FFDqhM4pks
19. Blinding, randomization
[Figure: study design flowchart]
Samples: stool, fresh (N=11); stool, freeze-dried (N=7); Robogut (N=2); artificial (N=2)
Also shown: controls; raw sample transfer (N=53)
Flow: 96-tube sample sets → sequence generation (Labs 1 to 15) → HMP DACC FTP site → data processing (Sites 1 to 9) → integrated data analysis
Slide credit: Dr. Amnon Amir
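The blinding and randomization step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the study's actual procedure: the function `blind_and_randomize`, the `T01`-style tube codes, and the sample names are hypothetical. Each lab would receive only the shuffled, coded tube list, while the key linking codes back to real samples stays with the coordinator until analysis.

```python
import random


def blind_and_randomize(sample_ids, seed=None):
    """Shuffle samples and replace their names with opaque tube codes.

    Hypothetical helper. Returns (tube_codes, key): tube_codes is what a lab
    sees; key maps each code back to the real sample and is kept by the
    coordinator only.
    """
    rng = random.Random(seed)
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)  # random tube order, so codes reveal nothing
    key = {f"T{i + 1:02d}": sid for i, sid in enumerate(shuffled)}
    return list(key), key


# A different seed per lab gives each lab its own randomization.
codes, key = blind_and_randomize(["fresh_1", "fresh_2", "robogut_1"], seed=7)
```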
20. Variability in methods
Counts in parentheses.
Handling
DNA extraction kit manufacturer: Chemagen (1), MO-BIO (7), Omega BioTek (1), Promega (1), Qiagen (2), Zymo Research Corporation (1), not reported (3)
Homogenizer used: yes (11), no (3), not reported (1)
PCR primer set: 318F/806R (1), EMP V4 515F/806R (11), Schloss 2013 (2)
Sequencing machine model: HiSeq 2500 (1), MiSeq (13)
Paired-end sequencing: yes (13), no (1)
Sequencing read length (nt): 150 (4), 175 (1), 210 (1), 250 (6), 300 (2)
Fraction of PhiX: 0.11 ± 0.08 (range 0 to 0.3)
Fraction of quality bases: 0.86 ± 0.12 (range 0.57 to 0.96)
Bioinformatics
Primary QC software: QIIME (2), Trimmomatic (3), UPARSE (2), other/custom (2)
Primary OTU calling software: QIIME (4), UPARSE (5)
Unsupervised OTU clustering step: yes (8), no (2)
Taxonomic assignment strategy: classification (3), clustering (5), mapping (2)
OTU filtering: yes (3), no (7)
Slide credit: Dr. Amnon Amir
41. (Some of the) American Gut Collaborators
Prof. Rob Knight: co-founder; HHMI, CU-Boulder, UCSD
Dr. Jeff Leach: co-founder; Human Food Project
Prof. Cathy Lozupone: creator of UniFrac, focus on ASD; Anschutz Medical Center
Prof. Marty Blaser: antibiotic (ABX) impact; NYU School of Medicine
Prof. Maria Gloria Dominguez: succession in infants; NYU School of Medicine
Prof. Patrice Cani: type 2 diabetes; Université catholique de Louvain
Prof. George Church: everything; Harvard Medical School, MIT
Prof. Pieter Dorrestein: real-time mass spec; UCSD
Prof. Jack Gilbert: hospital microbiome, EMP, ASD; U-Chicago, ANL
Prof. Ruth Ley: env. relationship with human genetics; Cornell University
Prof. Owen White: genomics expert; U-Maryland School of Medicine, PI of HMP DACC
Prof. Phil Hugenholtz: taxonomy, microbial diversity; U-Queensland, Australian Centre for Ecogenomics
Dr. Tim Spector: genetic epidemiology; King’s College
42. The American Gut Consortium
The Knight Lab
Prof. Rob Knight
Prof. Robin Dowell
Prof. Aaron Clauset
Prof. Paul Wischmeyer
Dr. Antonio Gonzalez
Dr. Gail Ackermann
Yoshiki Vázquez-Baeza
Justine Debelius
Adam Robbins-Pianka
Emily TerAvest
Alina Prassas
Acknowledgements
Dr. Embriette Hyde
Jeff DeReus
Dr. Jenya Kopylova
Dr. Eva Nováková
Dr. Luke Thompson
Justine Debelius
Jose Navas
Editor's Notes
Always have to use this slide…
Go anywhere in the environment and you can find microbial life
Drivers of the global ecosystem
Estimated at 10^30 cells (Whitman)
…but it isn’t just that they’re everywhere, they are the dominant form of life
Billions of years of evolution and incredible genetic diversity
Complex microbial relationships
But of course, the environment selects (Baas Becking)
So if you shove a Q-tip everywhere in the environment you can imagine, what most distinguishes the collections of microbes you find?
It turns out to be whether samples are from saline or non-saline environments
This is a principal coordinates plot; I will show a few and go into detail later on how these plots are created. For now, interpret it as “each point is a sample, points closer together are more similar, and PC1 is more meaningful than PC2”
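For reference, the math behind a principal coordinates plot fits in a short sketch. It assumes a precomputed, symmetric sample-by-sample distance matrix (for example UniFrac); `pcoa` is a hypothetical helper written with plain NumPy, not the QIIME or scikit-bio implementation. Classical PCoA double-centers the squared distances and keeps the leading eigenvectors, so PC1 is by construction the axis explaining the most variation, which is why it is more meaningful than PC2.

```python
import numpy as np


def pcoa(dist: np.ndarray, n_axes: int = 2) -> tuple[np.ndarray, np.ndarray]:
    """Classical principal coordinates analysis (metric MDS).

    dist: symmetric (n x n) matrix of pairwise distances between samples.
    Returns (coordinates, fraction of variation explained per axis).
    """
    n = dist.shape[0]
    # Gower centering: double-center the matrix of -0.5 * squared distances.
    a = -0.5 * dist ** 2
    j = np.eye(n) - np.ones((n, n)) / n
    b = j @ a @ j
    # eigh returns eigenvalues in ascending order; sort them descending.
    vals, vecs = np.linalg.eigh(b)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    # Negative eigenvalues (non-Euclidean distances) give no real axis.
    pos = np.clip(vals, 0, None)
    coords = vecs[:, :n_axes] * np.sqrt(pos[:n_axes])
    explained = pos[:n_axes] / pos.sum()
    return coords, explained
```

With four toy samples at positions 0, 1, 2 and 10 on a line, PC1 explains essentially all of the variation and reproduces the original spacing between points.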
Hot springs/hydrothermal vents are not outliers
Estuaries are in the center
How do animals fit in?
Rather, what about the organisms that have specialized to gastrointestinal tracts?
Incredibly, notice the signal is washed out: the primary difference is now environmental vs. vertebrate
Notice the saline/non-saline split is still present
3 lbs of organisms in and on you
- Specialization of function
- Extends what our bodies can do
- vitamin synthesis
- short-chain fatty acid (SCFA) production
- breakdown of complex plant fibers
- protect us, even educate the immune system
Microbiome implicated in: autism, obesity, wasting diseases, Tylenol toxicity
What do our communities look like from all over?
Three primary clusters of samples
Not explained by person, day or sex
Not just PCoAs, though they do make data exploration feel like a video game
Enrichment of nasty bugs in Crohn’s disease; highlight the relationship between Fusobacteria and colorectal cancer
Depletion of organisms tied to butyrate and other SCFA production; note that butyrate feeds intestinal epithelial cells (IECs), which coordinate with the innate immune system
Antibiotic (ABX) usage in these patients appears to amplify the loss of beneficial microbes and the gain of detrimental ones
SWITCH TO VIDEO!
Infant time series video, using the short one that Rob uses for presentations
INTENTIONALLY CUT OFF
What I think is the coolest picture in the HMP, but this wasn’t new
Lots of taxonomic variation, but the individuals they defined as healthy were aged around 20-40, and there were not many of them
But I think the study design issue is highlighted better by Yatsunenko et al
Age gradient in global gut, and primary clusters explained by population
And the diversity represented by the HMP is…
- This little region
Population for the reference is one issue, but technical artifacts abound
Highlight primer choices (e.g., HMP V1-V3 and V3-V5)
Differing protocols used
Reinforces why it is hard to compare the results of study foo with bar if the protocols differ
Should mention study x used instrument y too…
Practically:
Protocols, primers, voodoo, and sequencing technologies can matter
- Design for reuse and integration
- Sample sizes
Process thousands of samples, entirely blinded for protocols and data processing
Assess variation: what matters and what doesn’t
Includes clinical samples
Vital for medical applications
Manuscript in prep
Add in population comparisons against HMP
Which is further emphasized with respect to the AG (which is primarily US individuals currently)
For this figure, randomly subsample N samples I times, collect the mean and standard deviation of the minimum observed distance, and repeat over an increasing number of samples
Note: this uses rounds 1-14 and SortMeRNA, filtered for blooms, with all AG fecal samples rarefied to 1,000 sequences
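One reading of the subsampling procedure above, as a sketch. The notes do not say exactly which distances are minimized, so this assumes a matrix of distances from a set of query samples to a reference pool (for example, AG fecal samples); each iteration records the average, over query samples, of the distance to the nearest drawn reference sample. `min_distance_curve` and its parameters are hypothetical names.

```python
import numpy as np


def min_distance_curve(dist, sizes, iterations, seed=0):
    """Mean and std of nearest-reference distance as the reference grows.

    dist: (n_query x n_ref) distances from query samples to a reference pool.
    For each N in sizes, draw N reference samples without replacement
    `iterations` times; per draw, average each query sample's minimum
    distance to the drawn references.
    """
    rng = np.random.default_rng(seed)
    n_ref = dist.shape[1]
    means, stds = [], []
    for n in sizes:
        per_iter = []
        for _ in range(iterations):
            picked = rng.choice(n_ref, size=n, replace=False)
            per_iter.append(dist[:, picked].min(axis=1).mean())
        means.append(np.mean(per_iter))
        stds.append(np.std(per_iter))
    return np.array(means), np.array(stds)
```

Drawing the whole pool always returns each query's overall nearest neighbor, so the mean can only fall or stay flat as N approaches the pool size, and the standard deviation reaches zero there.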
ICU microbiome data in isolation
Background: two time points, near admission and at day 10 or discharge
5 sites around the world
No restriction on diagnosis
We see clustering by site, but we don’t know if they are systematically different from the rest of the population
- Combined with the full AG data set (and filtered for blooms)
ICU samples are large dots; small dots are AG samples
Trending toward saliva; as far as I know, this has not been observed before
Have not evaluated specific differences yet
- Sleep duration was more interesting; exercise frequency and location were weak (p = 0.02 and 0.06, respectively, uncorrected)
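The p-values above are uncorrected. When many metadata variables are tested, a false discovery rate correction such as Benjamini-Hochberg is a common choice; the sketch shows the standard step-up adjustment. The notes do not say how many tests were run, so treating these two p-values as the whole family is purely illustrative, and `benjamini_hochberg` is a hypothetical helper (statsmodels' `multipletests` with `method="fdr_bh"` provides the same adjustment).

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg (FDR) adjusted p-values in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


adjusted = benjamini_hochberg([0.02, 0.06])
```

With only these two tests, the adjusted values are 0.04 and 0.06.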
NOTE ASD COHORT
British Gut, just off the ground, in conjunction with Dr. Tim Spector
Reduced overhead for participants, can ship domestically instead of internationally
EMP (Earth Microbiome Project) is the project that contains AG
Aiming to collect 200,000 samples from everywhere, everything (tens of thousands of samples sequenced now)
Over 160 independent research projects and > 200 collaborators
Strict sequencing protocols to minimize technical variability and maximize reuse of data
Adherence to MIMARKS, rigorous metadata standards
Open access, open source
Works closely with the GSC (Genomic Standards Consortium)
Explains more of the variance than genome
Samples from nearly 1000 individuals, 171 MZ pairs and 245 DZ pairs, all female
Identified Christensenellaceae through this as significantly heritable
In mouse transplant experiments, adding C. minuta to an inoculum that lacked it resulted in reduced adiposity (donor source was an obese human)
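As background on how a twin design can flag a taxon as heritable: the classical Falconer approximation compares how similar a taxon's abundance is within MZ pairs versus DZ pairs. This is a textbook shortcut, not necessarily the model the study used (twin studies of this kind typically fit ACE variance-component models); r_MZ and r_DZ denote within-pair correlations of a taxon's abundance.

```latex
h^2 \approx 2\,(r_{MZ} - r_{DZ}), \qquad
c^2 \approx 2\,r_{DZ} - r_{MZ}, \qquad
e^2 \approx 1 - r_{MZ}
```

MZ twins share essentially all segregating genes and DZ twins about half, so doubling the gap in within-pair similarity estimates the additive genetic share of variance (h^2); c^2 and e^2 are the shared-environment and unique-environment shares. With hypothetical values r_MZ = 0.4 and r_DZ = 0.25, h^2 ≈ 0.3, c^2 ≈ 0.1 and e^2 ≈ 0.6.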
Haven’t decided specific one yet, but probably shared OTUs as I know that one the best
Predominantly 16S (but predicting other data works _really_ well!)
Need to get more samples!
Can begin to use AG to provide preliminary evidence for more focused studies
Systems approach: integrating composition with other data types like metabolomics
- Can we apply linear transformations to the PCs so as to best get the points to match up?
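Matching two ordinations this way is usually done with Procrustes analysis: center and scale both sets of coordinates, find the rotation (or reflection) and uniform scaling that best superimposes one onto the other, and report the leftover squared error, M². A NumPy sketch, assuming both coordinate matrices list the same samples in the same row order; `procrustes_m2` is a hypothetical helper (SciPy's `scipy.spatial.procrustes` should report the same quantity as its disparity).

```python
import numpy as np


def procrustes_m2(x, y):
    """Procrustes fit of ordination y onto ordination x.

    x, y: (n_samples x n_axes) coordinates, rows paired by sample.
    Returns (fitted y, M^2). Both inputs are centered and scaled to unit
    sum of squares, so M^2 = 1 - (sum of singular values)^2; 0 = perfect match.
    """
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    x = x / np.linalg.norm(x)
    y = y / np.linalg.norm(y)
    # Optimal rotation/reflection from the SVD of the cross-product matrix.
    u, s, vt = np.linalg.svd(y.T @ x)
    rotation = u @ vt
    scale = s.sum()
    fitted = scale * (y @ rotation)
    m2 = 1.0 - scale ** 2
    return fitted, m2
```

M² near 0 means the two configurations agree closely after rotation, scaling and translation; values near 1 mean little agreement.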
For instance, we can ask how well composition and metagenomic potential are correlated
And it turns out they are correlated pretty well. In this data set, fecal samples were collected from a range of mammals
Primary clustering is by diet type, and paired metagenomic-16S samples are highly correlated
Going a step further, in a study of twins discordant for obesity, we can see that transcriptomic and metabolomic data correlate well too
- In conjunction with Jason Bobe, George Church, and the Personal Genome Project, we’ve been collecting microbiome samples from PGP participants
AG is the first outside project on Open Humans
Aiming to be the go-to place for members of the general public who want to participate in research
Link studies
Provide mechanisms to link individuals between studies while protecting privacy