MongoDB and the Connectivity Map: Making Connections Between Genetics and Disease
 

MongoDB and the Connectivity Map: Making Connections Between Genetics and Disease

on

  • 405 views

The Broad Institute has developed a novel high-throughput gene-expression profiling technology and has used it to build an open-source catalog of over a million profiles that captures the functional ...

The Broad Institute has developed a novel high-throughput gene-expression profiling technology and has used it to build an open-source catalog of over a million profiles that captures the functional states of cells when treated with drugs and other types of perturbations. Referred to as the Connectivity Map (or CMap), these data when paired with pattern matching algorithms, facilitate the discovery of connections between drugs, genes and diseases. We wished to expose this resource to scientists around the world via an API that is easily accessible to programmers and biologists alike. We required a database solution that could handle a variety of data types and handle frequent changes to the schema. We realized that a relational database did not fit our needs, and gravitated towards MongoDB for its ease of use, support for dynamic schema, complex data structures and expressive query syntax. In this talk, we’ll walk through how we built the CMap library. We’ll discuss why we chose MongoDB, the various schema design iterations and tradeoffs we’ve made, how people are using the API, and what we’re planning for the next generation of biomedical data.

Statistics

Views

Total Views
405
Views on SlideShare
277
Embed Views
128

Actions

Likes
0
Downloads
7
Comments
0

3 Embeds 128

https://www.mongodb.com 90
http://www.mongodb.com 36
https://comwww-drupal.10gen.com 2

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

MongoDB and the Connectivity Map: Making Connections Between Genetics and Disease MongoDB and the Connectivity Map: Making Connections Between Genetics and Disease Presentation Transcript

  • making connections between genetics and disease MongoDB and the Connectivity Map
  • .
  • .
  • .
  • .
  • . Corey Rajiv
  • a common language Gene Expression
  • .
  • .
  • .
  • .
  • .
  • .13 ~7,000 experiments Over 19,000 registered users Cited by over 1,200 scientific reports
  • . 2006
  • . 2014
  • .16
  • CMap-LINCS dataset 1.4 million gene expression profiles 3,800 Genes (shRNA & cDNA) • Targets/pathways of approved drugs • Candidate disease genes • Community nominations 15 Cell types • Banked primary cell types • Cancer cell lines • Primary hTERT-immortalized • Patient-derived iPS cells • Community nominated 12,488 Compounds • FDA approved drugs • Bioactive tool compounds • Screening hits
  • • Diverse use-cases • Users with varying technical expertise • Annotations are complex and incomplete • Frequent updates CMap Data! Easy to describe, tough to Model
  • Store just what’s needed Refactor frequently Test and use daily Data Model! An agile philosophy keeps the model tractable
  • Data Model! An inventory of signatures siginfo
  • Data Model! Shared fields as separate collections siginfo cellinfo pertinfo
  • Data Model! Add computed fields and external meta-data siginfo cellinfo
  • Data Model! Duplicate data to optimize lookups siginfo pertinfo
  • APIs! Are awesome, we need more of them Picked functionality over convention! /siginfo?q={“cell”:”A”}  vs  /siginfo/cell/A
  • API! MongoDB inspired a rich query syntax Function Example Query /siginfo?q={“cell:”A”,”name”:”B”} Field selection /siginfo?q={}&f={“name”:1} Document count /siginfo?q={}&c=true Document limit /siginfo?q={}&l=10 Skip documents /siginfo?q={}&l=10&sk=10 Sort order /siginfo?q={}&s={“name”:-­‐1,”cell”:1} Distinct values /siginfo?q={}&d=name Aggregation /siginfo?q={}&g=name
  • API! Node and Mongoose enable easy API creation
  • Language Bindings! JSON as a universal format Javascript Python R
  • Analytic Tools! A compute API liberates command line scripts
  • Compute API! Messaging handled via a capped collection
  • Input Validation! JSON Schema simplifies validation
  • GCTX : A binary format based on HDF5 Cross platform Multi-language support Efficient I/O Storage size for 30 billion data points is 110 Gb Numeric Matrix Data! HDF5 offers efficient storage for large matrices
  • Sign up at lincscloud.org Lincscloud! A platform for easy access to perturbational data Free for academic use
  • Predicting Drug Function! Diverse structures, common activities
  • Predicting Drug Function! Diverse structures, common activities VEGFR inhibitor PPARG agonist PI3K/MTOR inhibitor ROCK inhibitor Estrogen agonist
  • Finding Novel Drug Targets! Repurposing failed drugs Original target
  • Finding Novel Drug Targets! Repurposing failed drugs Original target Failed in Phase 2 clinical trial due to lack of efficacy
  • Finding Novel Drug Targets! Repurposing failed drugs Original target Novel Target A Novel Target B Novel Target C Novel Target D
  • Acknowledgements Todd Golub
 Core Team: Analysis & Software Arvind Subramanian Jacob Asiedu Larson Hogstrom Ian Smith David Lahr Aravind Subramanian Josh Gould Ted Natoli David Wadden ! Core Team: Lab John Davis David Peck Xiaodong Lu Melanie Donahue Daniel Lam Jackie Rosains (Project Manager) Collaborators Bang Wong Steven Corsello (Golub lab) Jake Jaffe (Proteomics) David Takeda (Hahn lab) Pablo Tamayo ! Chemistry & Therapeutics Lucienne Ronco Josh Bittker Arthur Liberzon Mathias Wawer Paul Clemons ! Genetic Perturbation Platform John Doench Federica Piccioni David Root