The landscape of microbial
phenotypic traits and
associated genes
Maria Brbić 1, Matija Piškorec 1, Vedrana Vidulin 1, Anita Kriško 2,
Tomislav Šmuc 1, Fran Supek 1,3*
1 Rudjer Boskovic Institute, Zagreb, Croatia
2 Mediterranean Institute of Life Sciences, Split, Croatia
1,3 Centre for Genomic Regulation, Barcelona, Spain
* current address: Institute for Research in Biomedicine
(IRB Barcelona)
published as: Brbić M et al. (2016) Nucleic Acids Research
Microbial phenomics
• Prokaryotes display a variety of phenotypic traits
[Figure panels: ARCHAEA, BACTERIA]
The number of prokaryotic
genomes is rapidly increasing
The scientific literature abounds with
trait descriptions stored as
unstructured text
High-quality phenotype
annotations of microbes
are not keeping pace
Relying on manual curation does not
scale with the increasing volume of
scientific publications (and also
sequenced genomes)
Addressing the scarcity of systematic phenotype
annotations
We have developed the ProTraits pipeline that:
• relies on text mining of biological literature to annotate microbes with phenotypic traits
• is able to define novel phenotypic concepts from free text
• draws extensively on comparative genomics to validate text-mining inferences
http://protraits.irb.hr/
Initial data sets
Descriptions of the sites where a microbe
was isolated → no negative
examples
The most complete sets
currently available – many
microbes are still missing
Collected from 100s of
papers by manual curation
[Figure: trait annotation tables in which many entries are still unknown ("?")]
In addition to existing traits…
• We applied non-negative matrix factorization (NMF) to model phenotypic
concepts across texts
➢ Run NMF on each text corpus individually
➢ Cluster topics and retain only those consistent across at least 3 text corpora
➢ Repeat for different numbers of factors (50 and 100) and random seeds
➢ Manual curation
NMF-based methodology described in: Brbić M et al. (2016) Nucleic Acids Research
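
The topic-filtering step can be sketched in a few lines of Python. This is a minimal illustration, not the ProTraits implementation: the greedy clustering, the cosine-similarity threshold of 0.5, the function names, the toy word weights and the placeholder corpus name `corpus_3` are all assumptions. Only the rule of keeping topics seen in at least 3 corpora comes from the slide.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse {word: weight} topic vectors."""
    dot = sum(w * b.get(word, 0.0) for word, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def consistent_topics(topics, min_corpora=3, threshold=0.5):
    """Greedily cluster topics; keep clusters found in >= min_corpora corpora.

    topics: list of (corpus_name, {word: weight}) pairs, one per NMF factor.
    The clustering rule and threshold are illustrative assumptions.
    """
    clusters = []  # each cluster is a list of (corpus, topic) pairs
    for corpus, topic in topics:
        for cluster in clusters:
            # The first member acts as the cluster representative.
            if cosine(topic, cluster[0][1]) >= threshold:
                cluster.append((corpus, topic))
                break
        else:
            clusters.append([(corpus, topic)])
    return [c for c in clusters
            if len({corpus for corpus, _ in c}) >= min_corpora]


# Toy topics (hypothetical weights), as if produced by NMF on each corpus.
topics = [
    ("wikipedia", {"salt": 0.9, "halophilic": 0.8, "saline": 0.4}),
    ("pubmed", {"salt": 0.7, "halophilic": 0.9, "nacl": 0.3}),
    ("corpus_3", {"halophilic": 0.6, "salt": 0.8}),
    ("pubmed", {"spore": 0.9, "endospore": 0.7}),
    ("wikipedia", {"spore": 0.8, "germination": 0.5}),
]
kept = consistent_topics(topics)
```

In this toy input the salt-related topic appears in three corpora and is retained, while the spore-related topic appears in only two and is dropped. Surviving clusters would then go to manual curation.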
Phenotype predictions from text mining
• SVMs with a linear kernel were trained for 424 phenotypes on 6 text corpora (Wikipedia,
PubMed, etc.)
BAG OF WORDS representation
Instances: species/documents
Attributes: words (infect, lung, disease, …)
Class labels:
                     Plant pathogen   Gram positive   Mesophilic   …
Escherichia coli     0                0               1
Bacillus subtilis    0                1               1
• Cross-validation precision-recall
curves were used to convert the SVM
confidence scores to precision
(1 − FDR)
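
The conversion of classifier scores to precision can be illustrated as follows. This is a hedged sketch under simple assumptions (distinct held-out scores, a plain lookup of the highest threshold cleared, no smoothing of the curve). The function names and toy numbers are hypothetical and are not taken from ProTraits.

```python
def precision_curve(scores, labels):
    """Return (threshold, precision) pairs, sorted by decreasing threshold.

    scores: held-out (cross-validation) classifier scores.
    labels: matching true labels, 1 for positive and 0 for negative.
    Precision at a threshold is the fraction of true positives among
    all examples scoring at or above it (scores assumed distinct).
    """
    ranked = sorted(zip(scores, labels), reverse=True)
    curve, true_pos = [], 0
    for rank, (score, label) in enumerate(ranked, start=1):
        true_pos += label
        curve.append((score, true_pos / rank))
    return curve


def score_to_precision(score, curve):
    """Precision assigned to a new score: that of the highest threshold it clears."""
    for threshold, precision in curve:
        if score >= threshold:
            return precision
    return curve[-1][1]  # below every threshold: use the lowest threshold's value


# Toy held-out scores and labels (hypothetical numbers).
cv_scores = [2.1, 1.7, 0.9, 0.4, -0.3, -1.2]
cv_labels = [1, 1, 0, 1, 0, 0]
curve = precision_curve(cv_scores, cv_labels)

precision = score_to_precision(1.0, curve)
fdr = 1 - precision
```

A prediction is then reported at, for example, FDR<10% when its estimated precision exceeds 0.9. Raw precision-recall curves are not monotonic, so a production pipeline would likely smooth or monotonize the curve first.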
How accurately do SVM models predict
phenotypes?
What are the most
discriminative words?
e.g. the ‘halophilic’ phenotype
In addition to the text mining…
• We predicted phenotypes using 5 (independent) types of genomic data:
1. Proteome composition
2. Gene repertoires
3. Metagenomic co-occurrence
4. Gene neighbourhood
5. Translation efficiency
Thus far have been used to
predict 10s of phenotypes
Novel use to systematically
predict many phenotypes
all predictions browsable/available at:
http://protraits.irb.hr/
How many novel phenotypes do we infer?
~180,000 at FDR<10%
~308,000 at FDR<10%
~545,000 at FDR<10%
Are these predictions correct?
Two human experts validated ~2500 predictions by literature curation ✓
The ProTraits pipeline for phenotype prediction
Next up: finding gene-trait associations using ProTraits predictions
A network of co-occurrence of ~400 phenotypic traits across organisms
all annotations available at:
http://protraits.irb.hr/
Gene-phenotype associations
Can we increase the power
to detect gene-phenotype
associations with ProTraits
predictions?
• We tested for significant associations of ~80,000
COG/NOG gene families with 332 phenotypes
using logistic regression, while controlling for the
confounding effects of phylogenetic relatedness
of organisms:
~20,000 associations at FDR<10% with known labels
~117,000 associations at FDR<10% with ProTraits labels
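
The slides report association counts at FDR<10% but do not say which multiple-testing procedure was used. As an illustration only, the widely used Benjamini-Hochberg procedure is sketched below. The choice of procedure, the function name and the toy p-values are assumptions, and the logistic regression and phylogenetic correction that would produce the p-values are not shown.

```python
def benjamini_hochberg(p_values, fdr=0.10):
    """Return the indices of tests called significant at the given FDR.

    Sort the p-values, find the largest rank k such that
    p_(k) <= (k / m) * fdr, and accept the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# Toy p-values, one per gene family-phenotype test (hypothetical numbers).
p_values = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
significant = benjamini_hochberg(p_values, fdr=0.10)
```

With these toy values the four smallest p-values pass at FDR<10%, and only the smallest passes at FDR<1%.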
Pathogenic to plants: previously 61 associated genes; with ProTraits, 648 genes
Pathogenic to animals: previously 57 associated genes; with ProTraits, 1187 genes
Are these associations correct?
• We validated gene-phenotype associations for sporulation and
flagellar phenotypes in Bacillus subtilis
Validation: Gene Ontology enrichment
• We linked the individual phenotypes
to genes, which were highly enriched
in various biological functions: on average,
15 GO terms per phenotype at FDR<10%
with known labels, increasing to
23 GO terms per phenotype at FDR<10%
with added ProTraits labels
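
The slides do not state which enrichment test was used. One common choice for Gene Ontology enrichment is a one-sided hypergeometric test, sketched here purely as an assumption. The function name and all counts are hypothetical.

```python
from math import comb


def enrichment_p_value(n_total, n_annotated, n_selected, n_overlap):
    """One-sided hypergeometric P(X >= n_overlap).

    n_total:     all gene families considered
    n_annotated: families annotated with the GO term
    n_selected:  families associated with the phenotype
    n_overlap:   families that are both annotated and associated
    """
    upper = min(n_annotated, n_selected)
    tail = sum(
        comb(n_annotated, k) * comb(n_total - n_annotated, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / comb(n_total, n_selected)


# Toy counts: 1000 families, 40 carry the GO term, 50 are associated with
# the phenotype, and 10 are in both (2 would be expected by chance).
p = enrichment_p_value(1000, 40, 50, 10)
```

P-values computed this way for every GO term and phenotype pair would then be corrected for multiple testing before counting the enriched terms per phenotype.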
Example associations that are significant only
when using ProTraits phenotype annotations:

Trait                              GO term
iron-reducer                       iron ion binding
halophilic                         sodium ion transport
pathogenic in mammals              pathogenesis
lactic/cheese/food/ferment/milk    galactose metabolic process
Epistatic interactions between gene families
Synergistic epistatic
interaction
Antagonistic
epistatic interaction
Interaction may not be
evident without
knowing the
phenotypic labels
... i.e. not obvious simply from looking at phylogenetic profiles! These are 3-way interactions: gene-gene-phenotype.
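
A gene-gene-phenotype interaction of this kind can be illustrated with a toy log-odds calculation. This is a simplified sketch, not the method used in the study: it ignores phylogenetic relatedness, uses a pseudo-count of 0.5, and the function names and data are hypothetical. The score is the interaction term on the log-odds scale, i.e. how far the trait's log-odds with both genes present departs from what the two single-gene effects predict.

```python
import math


def log_odds(with_trait, without_trait, pseudo=0.5):
    """Log-odds of having the trait, with a pseudo-count to avoid log(0)."""
    return math.log((with_trait + pseudo) / (without_trait + pseudo))


def interaction_score(genomes):
    """Log-odds interaction between two gene families for one phenotype.

    genomes: list of (has_gene_a, has_gene_b, has_phenotype) triples.
    Positive: the trait is more common with both genes than the single-gene
    effects predict. Negative: less common. Near zero: no interaction.
    """
    counts = {(a, b): [0, 0] for a in (0, 1) for b in (0, 1)}
    for gene_a, gene_b, phenotype in genomes:
        # Index 0 counts genomes with the trait, index 1 those without it.
        counts[(int(gene_a), int(gene_b))][0 if phenotype else 1] += 1
    lo = {key: log_odds(*value) for key, value in counts.items()}
    return lo[(1, 1)] - lo[(1, 0)] - lo[(0, 1)] + lo[(0, 0)]


# Trait present only when both gene families are present.
both_needed = ([(1, 1, 1)] * 8 + [(1, 0, 0)] * 8
               + [(0, 1, 0)] * 8 + [(0, 0, 0)] * 8)
# Trait present when either gene family is present.
either_suffices = ([(1, 1, 1)] * 8 + [(1, 0, 1)] * 8
                   + [(0, 1, 1)] * 8 + [(0, 0, 0)] * 8)
# Trait depends on gene A only; gene B is irrelevant.
a_only = ([(1, 1, 1)] * 8 + [(1, 0, 1)] * 8
          + [(0, 1, 0)] * 8 + [(0, 0, 0)] * 8)

scores = [interaction_score(g) for g in (both_needed, either_suffices, a_only)]
```

The first data set gives a positive score, the second a negative one, and the third a score of zero. In all three, the two gene families have the same presence/absence profiles across genomes, so the interaction only shows up once the phenotype labels are taken into account.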
Epistatic interactions between COGs
We tested for significant epistatic interactions of ~2700 COG/NOG gene families to phenotypes,
while controlling for confounding effects of phylogenetic relatedness of organisms:
5.7 × 10^5 interactions at FDR<1% with known labels
3.9 × 10^6 interactions at FDR<1% with ProTraits labels
Genetic interactions conditional on the ‘endospore+’ phenotype
Known sporulation genes; new sporulation genes
Validated by independent data: Meeske et al. (2016) PLOS Biol
Genomic signatures of
epistasis are useful for
inferring gene function
Travel to the meeting was made possible, in part, by a travel
award from the NSF (to M.B.).
Predictions available at: protraits.irb.hr
Contact: maria.brbic@irb.hr / fran.supek@irb.hr
This work was supported by the FP7 FET project MAESTRA, and by the Croatian Science Foundation project.
Brbić M et al. (2016) Nucleic Acids Research
