This document discusses the Connectivity Map (CMap), a resource for exploring connections between genetics and disease. CMap contains over 1.4 million gene expression profiles from experiments with over 15 cell types and 12,488 compounds. It has been cited in over 1,200 scientific reports and has over 19,000 registered users. The document outlines CMap's data model for storing experimental signatures and metadata in a way that is flexible, scalable, and optimized for querying. It also discusses APIs for accessing and analyzing CMap data, as well as tools for tasks like predicting drug function and finding novel drug targets.
17. CMap-LINCS dataset
1.4 million gene expression profiles
3,800 Genes (shRNA & cDNA)
• Targets/pathways of approved drugs
• Candidate disease genes
• Community nominations
15 Cell types
• Banked primary cell types
• Cancer cell lines
• Primary hTERT-immortalized
• Patient-derived iPS cells
• Community nominated
12,488 Compounds
• FDA approved drugs
• Bioactive tool compounds
• Screening hits
18. • Diverse use-cases
• Users with varying technical expertise
• Annotations are complex and
incomplete
• Frequent updates
CMap Data!
Easy to describe, tough to Model
19. Store just what’s needed
Refactor frequently
Test and use daily
Data Model!
An agile philosophy keeps the model tractable
34. GCTX : A binary format based on HDF5
Cross platform
Multi-language support
Efficient I/O
Storage size for 30 billion data points is 110 Gb
Numeric Matrix Data!
HDF5 offers efficient storage for large matrices
35. Sign up at lincscloud.org
Lincscloud!
A platform for easy access to perturbational data
Free for academic use
39. Finding Novel Drug Targets!
Repurposing failed drugs
Original target
Failed in Phase 2 clinical trial due to lack of efficacy
40. Finding Novel Drug Targets!
Repurposing failed drugs
Original target
Novel Target A
Novel Target B
Novel Target C
Novel Target D
41.
42.
43.
44. Acknowledgements
Todd Golub
Core Team: Analysis & Software
Arvind Subramanian
Jacob Asiedu
Larson Hogstrom
Ian Smith
David Lahr
Aravind Subramanian
Josh Gould
Ted Natoli
David Wadden
!
Core Team: Lab
John Davis
David Peck
Xiaodong Lu
Melanie Donahue
Daniel Lam
Jackie Rosains (Project Manager)
Collaborators
Bang Wong
Steven Corsello (Golub lab)
Jake Jaffe (Proteomics)
David Takeda (Hahn lab)
Pablo Tamayo
!
Chemistry & Therapeutics
Lucienne Ronco
Josh Bittker
Arthur Liberzon
Mathias Wawer
Paul Clemons
!
Genetic Perturbation Platform
John Doench
Federica Piccioni
David Root