• Three areas I found interesting over the summer
• Background of the data
• Working with count data
• Finding associations
5 | Jack Simpson
Where does metagenomic data start?
• Where did our counts and OTUs come from?
• Count data is not raw data: many processing decisions
• 16S rRNA gene resolution and primers
• Multiple variable regions
• Lab protocols and comparing projects
• Biological data starts in the real world
Zeroes: Much Ado About Nothing…?
• Does absence of evidence == evidence of absence?
• What do we do with zeroes?
• Remove them or replace them with pseudocounts?
• When to remove/replace?
• Merged at the class level: visualise and replace zeroes
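A minimal sketch of the two options for zeroes, removal and pseudocount replacement. The table, the function names, and the pseudocount value of 0.5 are all made up for illustration; the slides do not say which value or rule was actually used.

```python
def drop_all_zero_taxa(counts):
    """Remove taxa (columns) that are zero in every sample."""
    n_taxa = len(counts[0])
    keep = [j for j in range(n_taxa) if any(row[j] > 0 for row in counts)]
    return [[row[j] for j in keep] for row in counts], keep


def replace_zeroes(counts, pseudocount=0.5):
    """Replace each remaining zero with a small positive value.

    The default of 0.5 is an assumption, not a recommendation.
    """
    return [[c if c > 0 else pseudocount for c in row] for row in counts]


# Toy table: rows are samples, columns are taxa (invented numbers).
toy_counts = [
    [10, 0, 5, 0],
    [3, 0, 0, 7],
    [8, 0, 2, 1],
]

filtered, kept_columns = drop_all_zero_taxa(toy_counts)
replaced = replace_zeroes(filtered)
```

Dropping the all-zero taxon first means the pseudocount is only applied where a taxon was observed in at least one sample, so every value is positive before any log-based transformation.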
Processing the Data
• OTU count data analysis
• Dealt with zeroes
• Visualised the data
• Normalization and transformation: log or Aitchison’s CLR?
• What do different transformations do to the data with different
• See artefacts related to discretization and zeroes
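The log and CLR transformations named above can be sketched as follows. This is an illustration on a single invented sample whose zero has already been replaced; the function names are hypothetical and the code is not the analysis behind the slides.

```python
import math


def closure(row):
    """Rescale a sample so its parts sum to 1 (compositional form)."""
    total = sum(row)
    return [x / total for x in row]


def log_transform(row):
    """Natural log of every part; requires strictly positive values."""
    return [math.log(x) for x in row]


def clr(row):
    """Centred log-ratio: log of each part minus the mean of the logs."""
    logs = [math.log(x) for x in row]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


# One invented sample; 0.5 stands in for a zero replaced by a pseudocount.
sample = [10.0, 5.0, 0.5, 84.5]
proportions = closure(sample)

log_raw = log_transform(sample)
log_prop = log_transform(proportions)
clr_raw = clr(sample)
clr_prop = clr(proportions)
```

The log of the raw counts and the log of the proportions differ by a constant that depends on the sample total, while CLR gives the same values for both, because the sample total cancels when the mean log is subtracted.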
Gut log compositional and log raw data
Gut log compositional & CLR compositional
• Warning: compositional data!
• Be careful with correlation
• Fractions are not independent: they sum to a constant, which induces spurious negative correlation
• What can be done?
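The warning about correlation can be illustrated with a small simulation. This sketch assumes three taxa whose abundances are drawn independently (simulated values, not real data), then converts each sample to fractions and compares the correlation between the first two taxa before and after.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)  # fixed seed so the run is repeatable
n_samples = 2000

# Independent abundances for three taxa: no real association exists.
raw = [[rng.uniform(50, 150) for _ in range(3)] for _ in range(n_samples)]

# Convert every sample to fractions that sum to 1.
closed = [[x / sum(row) for x in row] for row in raw]

raw_r = pearson([r[0] for r in raw], [r[1] for r in raw])
closed_r = pearson([r[0] for r in closed], [r[1] for r in closed])

print(f"raw: {raw_r:.2f}  fractions: {closed_r:.2f}")
```

The raw abundances should show a correlation close to zero, while the fractions should show a clearly negative one (roughly -0.5 for three equally distributed parts), even though no association was simulated.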
• Metagenomic data background
• Processing our data
• Looking for associations the right way