SlideShare a Scribd company logo
1 of 44
Download to read offline
making connections between genetics and disease
MongoDB and the Connectivity Map
.
.
.
.
.
Corey Rajiv
a common language
Gene Expression
.
.
.
.
.
.13
~7,000 experiments
Over 19,000 registered users
Cited by over 1,200 scientific reports
.
2006
.
2014
.16
CMap-LINCS dataset
1.4 million gene expression profiles
3,800 Genes (shRNA & cDNA)
• Targets/pathways of approved drugs
• Candidate disease genes
• Community nominations
15 Cell types
• Banked primary cell types
• Cancer cell lines
• Primary hTERT-immortalized
• Patient-derived iPS cells
• Community nominated
12,488 Compounds
• FDA approved drugs
• Bioactive tool compounds
• Screening hits
• Diverse use-cases
• Users with varying technical expertise
• Annotations are complex and
incomplete
• Frequent updates
CMap Data!
Easy to describe, tough to Model
Store just what’s needed
Refactor frequently
Test and use daily
Data Model!
An agile philosophy keeps the model tractable
Data Model!
An inventory of signatures
siginfo
Data Model!
Shared fields as separate collections
siginfo
cellinfo
pertinfo
Data Model!
Add computed fields and external meta-data
siginfo cellinfo
Data Model!
Duplicate data to optimize lookups
siginfo pertinfo
APIs!
Are awesome, we need more of them
Picked functionality over convention!
/siginfo?q={“cell”:”A”}	
  vs	
  /siginfo/cell/A
API!
MongoDB inspired a rich query syntax
Function Example
Query /siginfo?q={“cell:”A”,”name”:”B”}
Field selection /siginfo?q={}&f={“name”:1}
Document count /siginfo?q={}&c=true
Document limit /siginfo?q={}&l=10
Skip documents /siginfo?q={}&l=10&sk=10
Sort order /siginfo?q={}&s={“name”:-­‐1,”cell”:1}
Distinct values /siginfo?q={}&d=name
Aggregation /siginfo?q={}&g=name
API!
Node and Mongoose enable easy API creation
Language Bindings!
JSON as a universal format
Javascript
Python
R
Analytic Tools!
A compute API liberates command line scripts
Compute API!
Messaging handled via a capped collection
Input Validation!
JSON Schema simplifies validation
GCTX : A binary format based on HDF5
Cross platform
Multi-language support
Efficient I/O
Storage size for 30 billion data points is 110 Gb
Numeric Matrix Data!
HDF5 offers efficient storage for large matrices
Sign up at lincscloud.org
Lincscloud!
A platform for easy access to perturbational data
Free for academic use
Predicting Drug Function!
Diverse structures, common activities
Predicting Drug Function!
Diverse structures, common activities
VEGFR inhibitor
PPARG agonist
PI3K/MTOR inhibitor
ROCK inhibitor
Estrogen agonist
Finding Novel Drug Targets!
Repurposing failed drugs
Original target
Finding Novel Drug Targets!
Repurposing failed drugs
Original target
Failed in Phase 2 clinical trial due to lack of efficacy
Finding Novel Drug Targets!
Repurposing failed drugs
Original target
Novel Target A
Novel Target B
Novel Target C
Novel Target D
Acknowledgements
Todd Golub

Core Team: Analysis & Software
Arvind Subramanian
Jacob Asiedu
Larson Hogstrom
Ian Smith
David Lahr
Aravind Subramanian
Josh Gould
Ted Natoli
David Wadden
!
Core Team: Lab
John Davis
David Peck
Xiaodong Lu
Melanie Donahue
Daniel Lam
Jackie Rosains (Project Manager)
Collaborators
Bang Wong
Steven Corsello (Golub lab)
Jake Jaffe (Proteomics)
David Takeda (Hahn lab)
Pablo Tamayo
!
Chemistry & Therapeutics
Lucienne Ronco
Josh Bittker
Arthur Liberzon
Mathias Wawer
Paul Clemons
!
Genetic Perturbation Platform
John Doench
Federica Piccioni
David Root

More Related Content

What's hot

BOUNCER: A Privacy-aware Query Processing Over Federations of RDF Datasets
BOUNCER: A Privacy-aware Query Processing Over Federations of RDF DatasetsBOUNCER: A Privacy-aware Query Processing Over Federations of RDF Datasets
BOUNCER: A Privacy-aware Query Processing Over Federations of RDF Datasets
Kemele M. Endris
 
Howe et al. - 2015 - BioAssay Research Database (BARD) chemical biolog
Howe et al. - 2015 - BioAssay Research Database (BARD) chemical biologHowe et al. - 2015 - BioAssay Research Database (BARD) chemical biolog
Howe et al. - 2015 - BioAssay Research Database (BARD) chemical biolog
Eleanor Howe
 
Toxicological information in PubChem
Toxicological information in PubChemToxicological information in PubChem
Toxicological information in PubChem
Sunghwan Kim
 

What's hot (20)

SureChEMBL patent annotations in Open PHACTS
SureChEMBL patent annotations in Open PHACTSSureChEMBL patent annotations in Open PHACTS
SureChEMBL patent annotations in Open PHACTS
 
Cassava genome hub
Cassava genome hubCassava genome hub
Cassava genome hub
 
BOUNCER: A Privacy-aware Query Processing Over Federations of RDF Datasets
BOUNCER: A Privacy-aware Query Processing Over Federations of RDF DatasetsBOUNCER: A Privacy-aware Query Processing Over Federations of RDF Datasets
BOUNCER: A Privacy-aware Query Processing Over Federations of RDF Datasets
 
Overview of SureChEMBL
Overview of SureChEMBLOverview of SureChEMBL
Overview of SureChEMBL
 
2009 0807 Lod Gmod
2009 0807 Lod Gmod2009 0807 Lod Gmod
2009 0807 Lod Gmod
 
Open-source from/in the enterprise: the RDKit
Open-source from/in the enterprise: the RDKitOpen-source from/in the enterprise: the RDKit
Open-source from/in the enterprise: the RDKit
 
Aug2015 Giab nist integration methods
Aug2015 Giab nist integration methodsAug2015 Giab nist integration methods
Aug2015 Giab nist integration methods
 
Semantic Technology: The Basics
Semantic Technology: The BasicsSemantic Technology: The Basics
Semantic Technology: The Basics
 
GtoPdb and GtoImmuPdb in context
GtoPdb and GtoImmuPdb in contextGtoPdb and GtoImmuPdb in context
GtoPdb and GtoImmuPdb in context
 
Howe et al. - 2015 - BioAssay Research Database (BARD) chemical biolog
Howe et al. - 2015 - BioAssay Research Database (BARD) chemical biologHowe et al. - 2015 - BioAssay Research Database (BARD) chemical biolog
Howe et al. - 2015 - BioAssay Research Database (BARD) chemical biolog
 
PubChem: a public chemical information resource for big data chemistry
PubChem: a public chemical information resource for big data chemistryPubChem: a public chemical information resource for big data chemistry
PubChem: a public chemical information resource for big data chemistry
 
IUPHAR/BPS Guide to Pharmacology: concise mapping of chemistry, data, and tar...
IUPHAR/BPS Guide to Pharmacology: concise mapping of chemistry, data, and tar...IUPHAR/BPS Guide to Pharmacology: concise mapping of chemistry, data, and tar...
IUPHAR/BPS Guide to Pharmacology: concise mapping of chemistry, data, and tar...
 
SureChEMBL and Open PHACTS
SureChEMBL and Open PHACTSSureChEMBL and Open PHACTS
SureChEMBL and Open PHACTS
 
Patent chemisty big bang: utilities for SMEs
Patent chemisty big bang: utilities for SMEsPatent chemisty big bang: utilities for SMEs
Patent chemisty big bang: utilities for SMEs
 
2016 bioinformatics i_databases_wim_vancriekinge
2016 bioinformatics i_databases_wim_vancriekinge2016 bioinformatics i_databases_wim_vancriekinge
2016 bioinformatics i_databases_wim_vancriekinge
 
Data Enthusiasts London: Scalable and Interoperable data services. Applied to...
Data Enthusiasts London: Scalable and Interoperable data services. Applied to...Data Enthusiasts London: Scalable and Interoperable data services. Applied to...
Data Enthusiasts London: Scalable and Interoperable data services. Applied to...
 
Enabling HTS Hit follow up via Chemo informatics, File Enrichment, and Outsou...
Enabling HTS Hit follow up via Chemo informatics, File Enrichment, and Outsou...Enabling HTS Hit follow up via Chemo informatics, File Enrichment, and Outsou...
Enabling HTS Hit follow up via Chemo informatics, File Enrichment, and Outsou...
 
Spark Summit Europe: Share and analyse genomic data at scale
Spark Summit Europe: Share and analyse genomic data at scaleSpark Summit Europe: Share and analyse genomic data at scale
Spark Summit Europe: Share and analyse genomic data at scale
 
Toxicological information in PubChem
Toxicological information in PubChemToxicological information in PubChem
Toxicological information in PubChem
 
How can you access PubChem programmatically?
How can you access PubChem programmatically?How can you access PubChem programmatically?
How can you access PubChem programmatically?
 

Viewers also liked

The Best of Both Worlds: Speeding Up Drug Research with MongoDB & Oracle (Gen...
The Best of Both Worlds: Speeding Up Drug Research with MongoDB & Oracle (Gen...The Best of Both Worlds: Speeding Up Drug Research with MongoDB & Oracle (Gen...
The Best of Both Worlds: Speeding Up Drug Research with MongoDB & Oracle (Gen...
MongoDB
 
A Translational Medicine Platform at Sanofi
A Translational Medicine Platform at SanofiA Translational Medicine Platform at Sanofi
A Translational Medicine Platform at Sanofi
MongoDB
 
MongoDB as a Data Warehouse: Time Series and Device History Data (Medtronic)
MongoDB as a Data Warehouse: Time Series and Device History Data (Medtronic)MongoDB as a Data Warehouse: Time Series and Device History Data (Medtronic)
MongoDB as a Data Warehouse: Time Series and Device History Data (Medtronic)
MongoDB
 
Replacing Traditional Technologies with MongoDB: A Single Platform for All Fi...
Replacing Traditional Technologies with MongoDB: A Single Platform for All Fi...Replacing Traditional Technologies with MongoDB: A Single Platform for All Fi...
Replacing Traditional Technologies with MongoDB: A Single Platform for All Fi...
MongoDB
 
Scaling Hike Messenger to 15M Users
Scaling Hike Messenger to 15M UsersScaling Hike Messenger to 15M Users
Scaling Hike Messenger to 15M Users
MongoDB
 

Viewers also liked (12)

The Best of Both Worlds: Speeding Up Drug Research with MongoDB & Oracle (Gen...
The Best of Both Worlds: Speeding Up Drug Research with MongoDB & Oracle (Gen...The Best of Both Worlds: Speeding Up Drug Research with MongoDB & Oracle (Gen...
The Best of Both Worlds: Speeding Up Drug Research with MongoDB & Oracle (Gen...
 
MongoDB at Medtronic
MongoDB at MedtronicMongoDB at Medtronic
MongoDB at Medtronic
 
Accelerate Pharmaceutical R&D with Big Data and MongoDB
Accelerate Pharmaceutical R&D with Big Data and MongoDBAccelerate Pharmaceutical R&D with Big Data and MongoDB
Accelerate Pharmaceutical R&D with Big Data and MongoDB
 
Webinar: Electronic Health Records (EHRs) and MongoDB - Advancing the Data Pl...
Webinar: Electronic Health Records (EHRs) and MongoDB - Advancing the Data Pl...Webinar: Electronic Health Records (EHRs) and MongoDB - Advancing the Data Pl...
Webinar: Electronic Health Records (EHRs) and MongoDB - Advancing the Data Pl...
 
A Translational Medicine Platform at Sanofi
A Translational Medicine Platform at SanofiA Translational Medicine Platform at Sanofi
A Translational Medicine Platform at Sanofi
 
Content Management with MongoDB by Mark Helmstetter
 Content Management with MongoDB by Mark Helmstetter Content Management with MongoDB by Mark Helmstetter
Content Management with MongoDB by Mark Helmstetter
 
Migration from SQL to MongoDB - A Case Study at TheKnot.com
Migration from SQL to MongoDB - A Case Study at TheKnot.com Migration from SQL to MongoDB - A Case Study at TheKnot.com
Migration from SQL to MongoDB - A Case Study at TheKnot.com
 
MongoDB as a Data Warehouse: Time Series and Device History Data (Medtronic)
MongoDB as a Data Warehouse: Time Series and Device History Data (Medtronic)MongoDB as a Data Warehouse: Time Series and Device History Data (Medtronic)
MongoDB as a Data Warehouse: Time Series and Device History Data (Medtronic)
 
Replacing Traditional Technologies with MongoDB: A Single Platform for All Fi...
Replacing Traditional Technologies with MongoDB: A Single Platform for All Fi...Replacing Traditional Technologies with MongoDB: A Single Platform for All Fi...
Replacing Traditional Technologies with MongoDB: A Single Platform for All Fi...
 
MongoDB Europe 2016 - Distributed Ledgers, Blockchain + MongoDB
MongoDB Europe 2016 - Distributed Ledgers, Blockchain + MongoDBMongoDB Europe 2016 - Distributed Ledgers, Blockchain + MongoDB
MongoDB Europe 2016 - Distributed Ledgers, Blockchain + MongoDB
 
Securing MongoDB to Serve an AWS-Based, Multi-Tenant, Security-Fanatic SaaS A...
Securing MongoDB to Serve an AWS-Based, Multi-Tenant, Security-Fanatic SaaS A...Securing MongoDB to Serve an AWS-Based, Multi-Tenant, Security-Fanatic SaaS A...
Securing MongoDB to Serve an AWS-Based, Multi-Tenant, Security-Fanatic SaaS A...
 
Scaling Hike Messenger to 15M Users
Scaling Hike Messenger to 15M UsersScaling Hike Messenger to 15M Users
Scaling Hike Messenger to 15M Users
 

Similar to MongoDB and the Connectivity Map: Making Connections Between Genetics and Disease

Similar to MongoDB and the Connectivity Map: Making Connections Between Genetics and Disease (20)

MongoDB and the Connectivity Map: Making Connections Between Genetics and Dis...
MongoDB and the Connectivity Map: Making Connections Between Genetics and Dis...MongoDB and the Connectivity Map: Making Connections Between Genetics and Dis...
MongoDB and the Connectivity Map: Making Connections Between Genetics and Dis...
 
Spark Summit EU talk by Erwin Datema and Roeland van Ham
Spark Summit EU talk by Erwin Datema and Roeland van HamSpark Summit EU talk by Erwin Datema and Roeland van Ham
Spark Summit EU talk by Erwin Datema and Roeland van Ham
 
Genome in a bottle for ashg grc giab workshop 181016
Genome in a bottle for ashg grc giab workshop 181016Genome in a bottle for ashg grc giab workshop 181016
Genome in a bottle for ashg grc giab workshop 181016
 
Platforms CIBERER and INB-ELIXIR-es
Platforms CIBERER and INB-ELIXIR-esPlatforms CIBERER and INB-ELIXIR-es
Platforms CIBERER and INB-ELIXIR-es
 
Wikidata for biomedical knowledge integration and curation
Wikidata for biomedical knowledge integration and curationWikidata for biomedical knowledge integration and curation
Wikidata for biomedical knowledge integration and curation
 
Intro to databases
Intro to databasesIntro to databases
Intro to databases
 
Variant analysis and whole exome sequencing
Variant analysis and whole exome sequencingVariant analysis and whole exome sequencing
Variant analysis and whole exome sequencing
 
Advanced Bioinformatics for Genomics and BioData Driven Research
Advanced Bioinformatics for Genomics and BioData Driven ResearchAdvanced Bioinformatics for Genomics and BioData Driven Research
Advanced Bioinformatics for Genomics and BioData Driven Research
 
Using Public Access Clinical Databases to Interpret NGS Variants
Using Public Access Clinical Databases to Interpret NGS VariantsUsing Public Access Clinical Databases to Interpret NGS Variants
Using Public Access Clinical Databases to Interpret NGS Variants
 
Genome in a bottle for amp GeT-RM 181030
Genome in a bottle for amp GeT-RM 181030Genome in a bottle for amp GeT-RM 181030
Genome in a bottle for amp GeT-RM 181030
 
Rna seq
Rna seq Rna seq
Rna seq
 
Guide to Immunopharmacology update
Guide to Immunopharmacology updateGuide to Immunopharmacology update
Guide to Immunopharmacology update
 
Stratergies for the intergration of information (IPI_ConfEX)
Stratergies for the intergration of information (IPI_ConfEX)Stratergies for the intergration of information (IPI_ConfEX)
Stratergies for the intergration of information (IPI_ConfEX)
 
171017 giab for giab grc workshop
171017 giab for giab grc workshop171017 giab for giab grc workshop
171017 giab for giab grc workshop
 
GIAB update for GRC GIAB workshop 191015
GIAB update for GRC GIAB workshop 191015GIAB update for GRC GIAB workshop 191015
GIAB update for GRC GIAB workshop 191015
 
MiAIRR:Minimum information about an Adaptive Immune Receptor Repertoire Seque...
MiAIRR:Minimum information about an Adaptive Immune Receptor Repertoire Seque...MiAIRR:Minimum information about an Adaptive Immune Receptor Repertoire Seque...
MiAIRR:Minimum information about an Adaptive Immune Receptor Repertoire Seque...
 
From Expression to Pathways Using Online Tools
From Expression to Pathways Using Online ToolsFrom Expression to Pathways Using Online Tools
From Expression to Pathways Using Online Tools
 
IUPHAR/BPS Guide to PHARMACOLOGY in 2017: new features and updates
IUPHAR/BPS Guide to PHARMACOLOGY in 2017: new features and updatesIUPHAR/BPS Guide to PHARMACOLOGY in 2017: new features and updates
IUPHAR/BPS Guide to PHARMACOLOGY in 2017: new features and updates
 
iOMICS Research
iOMICS ResearchiOMICS Research
iOMICS Research
 
171017 giab for giab grc workshop
171017 giab for giab grc workshop171017 giab for giab grc workshop
171017 giab for giab grc workshop
 

More from MongoDB

More from MongoDB (20)

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
 

Recently uploaded

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 

Recently uploaded (20)

What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 

MongoDB and the Connectivity Map: Making Connections Between Genetics and Disease

  • 1. making connections between genetics and disease MongoDB and the Connectivity Map
  • 2. .
  • 3. .
  • 4. .
  • 5. .
  • 8. .
  • 9. .
  • 10. .
  • 11. .
  • 12. .
  • 13. .13 ~7,000 experiments Over 19,000 registered users Cited by over 1,200 scientific reports
  • 16. .16
  • 17. CMap-LINCS dataset 1.4 million gene expression profiles 3,800 Genes (shRNA & cDNA) • Targets/pathways of approved drugs • Candidate disease genes • Community nominations 15 Cell types • Banked primary cell types • Cancer cell lines • Primary hTERT-immortalized • Patient-derived iPS cells • Community nominated 12,488 Compounds • FDA approved drugs • Bioactive tool compounds • Screening hits
  • 18. • Diverse use-cases • Users with varying technical expertise • Annotations are complex and incomplete • Frequent updates CMap Data! Easy to describe, tough to Model
  • 19. Store just what’s needed Refactor frequently Test and use daily Data Model! An agile philosophy keeps the model tractable
  • 20. Data Model! An inventory of signatures siginfo
  • 21. Data Model! Shared fields as separate collections siginfo cellinfo pertinfo
  • 22. Data Model! Add computed fields and external meta-data siginfo cellinfo
  • 23. Data Model! Duplicate data to optimize lookups siginfo pertinfo
  • 24. APIs! Are awesome, we need more of them Picked functionality over convention! /siginfo?q={“cell”:”A”}  vs  /siginfo/cell/A
  • 25. API! MongoDB inspired a rich query syntax Function Example Query /siginfo?q={“cell:”A”,”name”:”B”} Field selection /siginfo?q={}&f={“name”:1} Document count /siginfo?q={}&c=true Document limit /siginfo?q={}&l=10 Skip documents /siginfo?q={}&l=10&sk=10 Sort order /siginfo?q={}&s={“name”:-­‐1,”cell”:1} Distinct values /siginfo?q={}&d=name Aggregation /siginfo?q={}&g=name
  • 26. API! Node and Mongoose enable easy API creation
  • 27. Language Bindings! JSON as a universal format Javascript Python R
  • 28.
  • 29.
  • 30.
  • 31. Analytic Tools! A compute API liberates command line scripts
  • 32. Compute API! Messaging handled via a capped collection
  • 33. Input Validation! JSON Schema simplifies validation
  • 34. GCTX : A binary format based on HDF5 Cross platform Multi-language support Efficient I/O Storage size for 30 billion data points is 110 Gb Numeric Matrix Data! HDF5 offers efficient storage for large matrices
  • 35. Sign up at lincscloud.org Lincscloud! A platform for easy access to perturbational data Free for academic use
  • 36. Predicting Drug Function! Diverse structures, common activities
  • 37. Predicting Drug Function! Diverse structures, common activities VEGFR inhibitor PPARG agonist PI3K/MTOR inhibitor ROCK inhibitor Estrogen agonist
  • 38. Finding Novel Drug Targets! Repurposing failed drugs Original target
  • 39. Finding Novel Drug Targets! Repurposing failed drugs Original target Failed in Phase 2 clinical trial due to lack of efficacy
  • 40. Finding Novel Drug Targets! Repurposing failed drugs Original target Novel Target A Novel Target B Novel Target C Novel Target D
  • 41.
  • 42.
  • 43.
  • 44. Acknowledgements Todd Golub
 Core Team: Analysis & Software Arvind Subramanian Jacob Asiedu Larson Hogstrom Ian Smith David Lahr Aravind Subramanian Josh Gould Ted Natoli David Wadden ! Core Team: Lab John Davis David Peck Xiaodong Lu Melanie Donahue Daniel Lam Jackie Rosains (Project Manager) Collaborators Bang Wong Steven Corsello (Golub lab) Jake Jaffe (Proteomics) David Takeda (Hahn lab) Pablo Tamayo ! Chemistry & Therapeutics Lucienne Ronco Josh Bittker Arthur Liberzon Mathias Wawer Paul Clemons ! Genetic Perturbation Platform John Doench Federica Piccioni David Root