STRING - Prediction of functionally associated proteins from heterogeneous genome scale data sets

STRING - Prediction of functionally associated proteins from heterogeneous genome scale data sets. Lars Juhl Jensen, EMBL Heidelberg

Cross-species integration of diverse data

STRING provides a modular protein network by integrating diverse types of evidence: genomic neighborhood; species co-occurrence; gene fusions; database imports; experimental interaction data; microarray expression data; literature co-mentioning.

Two modes of operation: “Protein mode”, with a separate network for each species; “COG mode”, with one network covering all species.

Inferring functional modules from gene presence/absence patterns. Figure: the “Cellulosome” (labels: cell, cell wall, anchoring proteins, cellulosomes, cellulose, resting protuberances, protracted protuberance). © Trends in Microbiology, 1999

Formalizing the phylogenetic profile method: align all proteins against all; calculate best-hit profile; join similar species by PCA; calculate PC profile distances; calibrate against KEGG maps.
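The best-hit profile and profile distance steps can be illustrated with a minimal sketch. It assumes the all-against-all alignment has already been reduced to (protein, species, score) hits, and it deliberately omits the PCA step, comparing raw profiles with a Euclidean distance instead. The function names, the toy hits, and the choice of distance are illustrative assumptions, not the method used in STRING.

```python
import math

def best_hit_profile(hits, species):
    """Build one profile per protein: the best hit score in each species (0.0 if none)."""
    rows = {}
    for protein, sp, score in hits:
        row = rows.setdefault(protein, {s: 0.0 for s in species})
        row[sp] = max(row[sp], score)
    return {p: [row[s] for s in species] for p, row in rows.items()}

def profile_distance(a, b):
    """Euclidean distance between two profiles (assumed metric; PCA step omitted)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical toy hits: (protein, species with a hit, alignment score)
species = ["sp1", "sp2", "sp3", "sp4"]
hits = [
    ("geneA", "sp1", 250.0), ("geneA", "sp3", 180.0), ("geneA", "sp3", 120.0),
    ("geneB", "sp1", 240.0), ("geneB", "sp3", 190.0),
    ("geneC", "sp2", 300.0), ("geneC", "sp4", 210.0),
]
profiles = best_hit_profile(hits, species)
d_ab = profile_distance(profiles["geneA"], profiles["geneB"])
d_ac = profile_distance(profiles["geneA"], profiles["geneC"])
```

In this toy data geneA and geneB occur in the same species and end up close together, while geneC has a complementary pattern and ends up far from both.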

Score calibration against a common reference

Predicting functional and physical interactions from gene fusion/fission events: calculate all-against-all pairwise alignments; find in A genes that match the same gene in B; exclude overlapping alignments; calibrate against KEGG maps.
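The fusion detection step can be sketched as follows, assuming the pairwise alignments are already available as tuples giving the matched region on the gene in species B. The tuple layout, the function names, and the half-open interval convention are assumptions made for illustration only.

```python
from itertools import combinations

def overlaps(a, b):
    """True if two half-open intervals (start, end) share any position."""
    return a[0] < b[1] and b[0] < a[1]

def fusion_candidates(alignments):
    """Find pairs of genes in A that match non-overlapping regions of the same gene in B.

    alignments: list of (gene_in_A, gene_in_B, start_on_B, end_on_B).
    Returns a list of (gene_a1, gene_a2, gene_b) triples.
    """
    by_target = {}
    for gene_a, gene_b, start, end in alignments:
        by_target.setdefault(gene_b, []).append((gene_a, (start, end)))
    found = []
    for gene_b, matches in by_target.items():
        for (a1, r1), (a2, r2) in combinations(matches, 2):
            # Exclude overlapping alignments: both genes would match the same region
            if a1 != a2 and not overlaps(r1, r2):
                found.append((a1, a2, gene_b))
    return found

# Hypothetical toy alignments
alignments = [
    ("a1", "b_fused", 0, 200),
    ("a2", "b_fused", 210, 450),
    ("a3", "b_fused", 150, 300),  # overlaps the regions matched by a1 and a2
    ("a4", "b_other", 0, 100),
]
pairs = fusion_candidates(alignments)
```

Only a1 and a2 are reported here, because a3 overlaps both of them on the B gene and a4 is the sole match to its target.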

Inferring functional associations from evolutionarily conserved operons: identify runs of adjacent genes with the same direction; score each gene pair based on intergenic distances; calibrate against KEGG maps; infer associations in other species.
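A minimal sketch of the first two steps, assuming genes are given as (name, start, end, strand) tuples sorted by position on one replicon. It only extracts the same-direction runs and the intergenic distances; how those distances are turned into scores is not specified here, so no scoring function is shown. All names and coordinates are hypothetical.

```python
def same_strand_runs(genes):
    """Group adjacent genes that share a strand; keep only runs of two or more genes.

    genes: list of (name, start, end, strand), sorted by start coordinate.
    """
    runs, current = [], []
    for gene in genes:
        if current and gene[3] != current[-1][3]:
            runs.append(current)
            current = []
        current.append(gene)
    if current:
        runs.append(current)
    return [run for run in runs if len(run) > 1]

def intergenic_distances(run):
    """Distance from the end of each gene to the start of the next gene in the run."""
    return [(a[0], b[0], b[1] - a[2]) for a, b in zip(run, run[1:])]

# Hypothetical toy gene order
genes = [
    ("g1", 100, 900, "+"),
    ("g2", 950, 1800, "+"),
    ("g3", 1820, 2500, "+"),
    ("g4", 2700, 3400, "-"),
    ("g5", 3600, 4100, "+"),
]
runs = same_strand_runs(genes)
distances = intergenic_distances(runs[0])
```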

Evidence transfer based on “fuzzy orthology” (diagram: source species, target species)

Integrating physical interaction screens: for data sets of complexes, make a binary representation of the complexes and calculate a score from the number of (co-)occurrences; yeast two-hybrid data sets are inherently binary, so calculate a score from non-shared partners; calibrate against KEGG maps; infer associations in other species; combine evidence from experiments.
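The raw quantities that the two scores rely on can be computed as in the sketch below. It counts occurrences and co-occurrences of proteins across complexes, and counts non-shared partners for a pair in a binary interaction list. The actual scoring formulas are not given here, so only the counts are shown; the function names and toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(complexes):
    """Count in how many complexes each protein and each protein pair occurs."""
    occurrences, co_occurrences = Counter(), Counter()
    for members in complexes:
        unique = sorted(set(members))
        occurrences.update(unique)
        co_occurrences.update(combinations(unique, 2))  # binary representation
    return occurrences, co_occurrences

def non_shared_partners(interactions, a, b):
    """Number of partners that only one of a and b has (a and b themselves excluded)."""
    partners = {}
    for x, y in interactions:
        partners.setdefault(x, set()).add(y)
        partners.setdefault(y, set()).add(x)
    partners_a = partners.get(a, set()) - {b}
    partners_b = partners.get(b, set()) - {a}
    return len(partners_a ^ partners_b)

# Hypothetical toy data
complexes = [["P1", "P2", "P3"], ["P1", "P2"], ["P2", "P4"]]
two_hybrid = [("P1", "P2"), ("P1", "P5"), ("P2", "P5"), ("P2", "P6")]

occurrences, co_occurrences = cooccurrence_counts(complexes)
unshared = non_shared_partners(two_hybrid, "P1", "P2")
```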

Mining microarray expression databases: re-normalize arrays by modern method to remove biases; build expression matrix; combine similar arrays by PCA; construct predictor by Gaussian kernel density estimation; calibrate against KEGG maps; infer associations in other species.
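One way a Gaussian kernel density estimate could serve as a predictor is sketched below. It assumes each gene pair is summarized by a single expression correlation value, estimates the density of that value separately for known associated and non-associated pairs, and combines the two densities with Bayes' rule. The single-feature setup, the fixed bandwidth, the prior, and the toy values are all assumptions for illustration, not the construction used in STRING.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a density function: the average of Gaussian kernels centred on the samples."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density

def association_probability(x, dens_pos, dens_neg, prior_pos):
    """Posterior probability of association at x from two class densities (Bayes' rule)."""
    num = prior_pos * dens_pos(x)
    den = num + (1.0 - prior_pos) * dens_neg(x)
    return num / den if den > 0 else prior_pos

# Hypothetical correlation values for associated and non-associated gene pairs
positives = [0.70, 0.80, 0.90, 0.85]
negatives = [-0.10, 0.00, 0.10, 0.20, 0.05]

dens_pos = gaussian_kde(positives, bandwidth=0.1)
dens_neg = gaussian_kde(negatives, bandwidth=0.1)
p_high = association_probability(0.8, dens_pos, dens_neg, prior_pos=0.5)
p_low = association_probability(0.0, dens_pos, dens_neg, prior_pos=0.5)
```

In practice the bandwidth and the prior would need to be chosen from the data rather than fixed by hand as they are here.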

Co-mentioning in the scientific literature: associate abstracts with species; identify gene names in title/abstract; count (co-)occurrences of genes; test significance of associations; calibrate against KEGG maps; infer associations in other species.
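The counting and significance steps can be sketched as follows, assuming gene names have already been identified so that each abstract is a list of gene identifiers. The significance test is not named here; the sketch uses a one-sided hypergeometric tail probability as one plausible choice. The function names and toy abstracts are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def count_mentions(abstract_genes):
    """Count abstracts mentioning each gene and each pair of genes."""
    single, pair = Counter(), Counter()
    for genes in abstract_genes:
        unique = sorted(set(genes))
        single.update(unique)
        pair.update(combinations(unique, 2))
    return single, pair

def hypergeom_tail(n_total, n_a, n_b, n_ab):
    """P(X >= n_ab) if the n_b abstracts mentioning B were drawn at random
    from n_total abstracts, n_a of which mention A (assumed test)."""
    total = math.comb(n_total, n_b)
    upper = min(n_a, n_b)
    return sum(
        math.comb(n_a, k) * math.comb(n_total - n_a, n_b - k)
        for k in range(n_ab, upper + 1)
    ) / total

# Hypothetical toy corpus: genes identified in each abstract
abstracts = [
    ["geneA", "geneB"], ["geneA", "geneB"], ["geneA", "geneB", "geneC"],
    ["geneC"], ["geneC", "geneD"], ["geneD"],
]
single, pair = count_mentions(abstracts)
p_value = hypergeom_tail(
    len(abstracts), single["geneA"], single["geneB"], pair[("geneA", "geneB")]
)
```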

The power of cross-species transfer and evidence integration

Predicting and defining metabolic pathways and other functional modules. Defined manually: cutting metabolic maps into pathways (metabolism overview; purine biosynthesis; histidine biosynthesis). Defined objectively: standard clustering of genome-scale data. Image: Molecular Biology of the Cell, 3rd edition

Getting more specific – generally speaking

Summary

Acknowledgments

STRING - Examples for practical session. Lars Juhl Jensen, EMBL Heidelberg
