Speeding Up Drug Research with MongoDB
Introducing MongoDB into an RDBMS Environment
Doug Garrett
• Genentech Research and Early Development (gRED)
• Bioinformatics and Computational Biology (B&CB)
• Software Engineer
gRED: Disease and Drug Research
Bioinformatics Customers: Scientists
Most of All: Patients
Bioinformatics: Not Your Typical IT
MongoDB
• Not just about big data: MongoDB has a flexible schema
• Not just about new systems: MongoDB integrates easily with an RDBMS
• Not just about software: it’s about saving lives
Time to Introduce a New Genetic Test: Weeks
Drug Development Process
New Drug
New Mouse Model - Genetic Testing
File (csv)
J. Colin Cox Sept. 2013 Presentation
Growth in Genetic Testing
[chart: samples and genotypes (thousands); 6-month vs. 3-month intervals; varies by genetic test]
Case Study: New Genetic Test Instrument
New instrument! Impact?
Bio-Rad CFX384 / ABI 7900HT
DB schema? No impact
Project? 3 weeks
Going Live…
Failure Mode
Sync MongoDB with RDBMS
db.NewRdbmsMongoId.find().forEach(function(doc) {
    db.TestResults.update(
        {'_id': doc._id},
        {'$inc': {useCount: 1}}
    );
});
db.TestResults.remove({'useCount': 0});
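The reconciliation above marks every MongoDB document confirmed by the RDBMS load, then removes whatever was never marked. A minimal Python sketch of the same mark-and-sweep idea, using in-memory stand-ins (the names `sync_test_results` and `confirmed_ids` are illustrative, not from the talk):

```python
# Mark-and-sweep sync sketch: keep only documents whose _id the
# RDBMS load confirmed (hypothetical in-memory stand-ins).

def sync_test_results(test_results, confirmed_ids):
    """Increment useCount for confirmed docs, then drop unconfirmed ones."""
    for doc in test_results:
        if doc["_id"] in confirmed_ids:
            doc["useCount"] = doc.get("useCount", 0) + 1
    # equivalent of remove({'useCount': 0})
    return [doc for doc in test_results if doc.get("useCount", 0) > 0]

results = [{"_id": 1, "useCount": 0}, {"_id": 2, "useCount": 0}]
kept = sync_test_results(results, confirmed_ids={1})  # only _id 1 survives
```

The shell version does the same thing server-side, so no document set has to be held in application memory.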
But Wait! There’s More…
• Flexible data collection
Load CSV to MongoDB
"_id" : ObjectId(“…."),
“plate_wells” : [
{ "Well" : "A01",
"Sample" : "308…",
…
}
]
Add Fields to CSV
"_id" : ObjectId(“…."),
“plate_wells” : [
{ "Well" : "A01",
"Sample" : "308…",
…
"New1" : "New Value"
}
]
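Because the document schema is flexible, a new CSV column simply becomes a new field in each well entry, with no migration. A sketch of that load step (the sample data and field `New1` mirror the slide; the rest is illustrative):

```python
import csv
import io

# Each CSV row becomes one element of plate_wells; a newly added
# column ("New1") is stored with no schema change.
csv_text = "Well,Sample,New1\nA01,308,New Value\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
plate_doc = {"plate_wells": rows}  # ready to insert into MongoDB
```

With an RDBMS, the same new column would have required an ALTER TABLE and a code release before any data could land.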
Future – What If…
Avoiding the typical “Catch-22”:
1. Is it worth collecting the data?
2. What is the value of the data?
3. Need the data to find the value
Future Analytics
MongoDB
Aggregation
Framework
R
Matlab
MongoDB Aggregation Framework
40% Discount Thru July 4
Use Code: mdbdgcf
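A future analytics query of the kind the deck points to might group stored results by test. Since running a real pipeline needs a server, here is the shape of such a pipeline as a comment plus a pure-Python equivalent for illustration (the field `test_name` and the data are hypothetical):

```python
from collections import Counter

# Hypothetical aggregation: count results per genetic test.
# The equivalent MongoDB pipeline would be roughly:
#   db.TestResults.aggregate([
#       {"$group": {"_id": "$test_name", "n": {"$sum": 1}}}
#   ])
docs = [
    {"test_name": "TestA"},
    {"test_name": "TestA"},
    {"test_name": "TestB"},
]
counts = Counter(d["test_name"] for d in docs)
```

The point of the aggregation framework is that this grouping runs inside the database, so R or Matlab only receive the summarized result.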
Under Discussion
Sources: lab instruments, government agencies, third-party sources
Formats: CSV, JSON, XML, other
→ MongoDB → load to RDBMS, or process directly
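The architecture under discussion funnels heterogeneous feeds into one document store before deciding whether to push rows onward to the RDBMS or analyze them in place. A sketch of the normalization step under that assumption (`to_documents` and the sample payloads are illustrative, not from the talk):

```python
import csv
import io
import json

def to_documents(payload, fmt):
    """Normalize a CSV or JSON payload into a list of documents (dicts)."""
    if fmt == "csv":
        return list(csv.DictReader(io.StringIO(payload)))
    if fmt == "json":
        data = json.loads(payload)
        return data if isinstance(data, list) else [data]
    raise ValueError("unsupported format: " + fmt)

docs = to_documents("Well,Sample\nA01,308\n", "csv")
docs += to_documents('[{"Well": "A02", "Sample": "309"}]', "json")
```

Once every source lands as documents, the load-to-RDBMS and process-directly paths share one ingest pipeline.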
MongoDB
• Not just about big data: MongoDB has a flexible schema
• Not just about new systems: MongoDB integrates easily with an RDBMS
• Not just about software: it’s about saving lives
“You better do it fast”
For my Father
Who I hope would have enjoyed this talk
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 

The Best of Both Worlds: Speeding Up Drug Research with MongoDB & Oracle (Genentech)

Editor's Notes

  1. *** possible joke: This is the second time that I’ve been to the “First World Conference” for a ground-breaking product. You don’t get this chance very often. The first time was 18 years ago, for a product that some of you may have heard of: Java. My name is Doug Garrett. I’m a software engineer in the Bioinformatics and Computational Biology department of Genentech Research and Early Development. Since that’s quite a mouthful, I’ll just refer to them as gRED and Bioinformatics. Genentech was the first biotech company – the first company to produce drugs, such as insulin, from genetically engineered organisms. In 2009 Genentech was purchased by the Swiss pharmaceutical company Roche, who wisely decided to keep Genentech Research as a separate group reporting directly to the CEO. *** possible joke: describe the cultural clash between a laid-back San Francisco academic culture and a Swiss business – the first time we saw senior management together on the same stage, Genentech wore ties and Roche didn’t.
  2. gRED does basic research into disease mechanisms and causes, and then uses those discoveries to develop new drugs. Although our major successes have been in cancer, we are now investigating other areas as well, including Neurology (Alzheimer's, Parkinson's), Immunology (arthritis and asthma), Metabolism (diabetes), and Infectious Diseases (flu, hepatitis C).
  3. My customers are the scientists discovering the cause of diseases and then trying to find new drugs for the diseases.
  4. But uppermost, and most important of all, the ultimate customers are the patients.
  5. How is being a software engineer in Bioinformatics different from a typical software development environment? First, most of the people within Bioinformatics are scientists, but within Bioinformatics is a fairly small group of software engineers, such as myself. Software engineers in Bioinformatics have to speak a different language: we have to understand the terminology AND the underlying science. But ALSO, we need to be flexible and adapt quickly – it’s research. Terms used for the above word map: heterozygous, alleles, genes, polymorphism, SNP (single nucleotide polymorphism), PCR (polymerase chain reaction), IVF, cryo, multiplex, primer, probes, genetic assay, colony, congenic, genome, backcross, chimera, microinjection, hCG (human chorionic gonadotropin), PMSG.
  6. I’m going to be discussing a recently completed project which used MongoDB. I hope to expand or extend people's understanding of what MongoDB excels at and under what situations it is best utilized. Many talks discuss MongoDB for “big data” – but that’s not all MongoDB excels at: the flexible schema can speed development and provide system flexibility. Most talks I’ve seen also cover MongoDB for new systems, where that’s all that’s used. How many of you would have to integrate MongoDB with an existing relational database? (stop to ask this question?) In fact, both a relational database and MongoDB can co-exist in the same environment. There are some simple ways to allow the two to easily work together, and in many ways the two complement each other. And for us, MongoDB is not just about software – it’s about saving lives. Many of the people in this room have probably been touched by the death of someone in their family, quite often from cancer. In my case, my father died of non-Hodgkin's lymphoma shortly after I went to work for Genentech, so I know the importance of speeding up the development of new drugs, because you never know when even a single day will make a major difference in someone’s life.
  7. In our case The flexible schema has helped us reduce the time needed to introduce new lab equipment from months to weeks, or even days This reduced time is not entirely due to MongoDB, but MongoDB plays a key part in the improvement As far as integrating MongoDB with our existing relational database environment, - we did find a very simple way to integrate the two Not completely integrated, not a two-phase commit But Integrated “Enough” AND – it’s simple This allowed us to easily integrate our MongoDB with the existing system, use existing tools geared towards Relational Database While still being able to take advantage of MongoDB’s flexible schema.
  8. This is an oversimplified view of drug development, but it illustrates the importance of mouse genetic models in many cases. Drug research begins with an idea – what is the cause of this disease? If the cause is related to genes we create new mouse genetic models, new genetic strains of mice, which are meant to reflect the underlying disease cause. This mouse genetic model is then used to verify the underlying disease cause.
  9. If verified – move on to trying to discover drugs to address the underlying genetic cause They then test new drugs first on the genetically modified mouse, testing for safety and effectiveness
  10. If safe and effective, only then will they move on to initial clinical trials with humans, although in many cases it’s back to the drawing board. As you can see, the mouse genetic model is an important part of disease research and drug discovery. And Increasingly we’re finding that the underlying genetic cause is much more complex than we thought
  11. Determining disease causes and developing drugs to address those diseases requires genetically engineered mice. We support around 500 investigators and in the area of 500 different genetic strains of mice. New research requires that we develop in the area of 200 new genetic strains of mice per year. *** In most cases you can’t purchase a new genetic strain of an animal – a new mouse model. Creating a new mouse genetic strain requires genetic testing – LOTS of genetic testing – about 700,000 genetic tests per year for us. The entire process of developing new genetic animal strains is very complex: it requires breeding a number of generations of mice to obtain the desired genetic mutation. Today I’ll be covering only the step where we determine if a particular gene is present or not. Genetic tests use a plate of “amplified” DNA, with a well for each sample and genetic test. We run that DNA sample through one of a variety of lab instruments we use for genetic testing. We then load those test results, usually a CSV file, into our database. Using these results the investigator can then decide which animals to breed. There are different types of tests and different lab instruments – new ones coming out all the time.
  12. This has driven demand within one of the departments that I support, the Genetic Analysis Lab. The demand for mouse genetic testing has increased both because there is normal growth in research, and therefore in the number of samples to be tested, and because the growing complexity of sample testing is driving it even faster: we now test an average of two different genes instead of just one.
  13. In order to keep up with rising demand we needed to update the genotyping lab instruments. Originally we had just a 3730 genetic test. We loaded a file containing results for a plate; each file had results for one or more wells containing a genetic test.
  14. We added a new genetic test. From this test we would produce some of the same results information as for the original genetic test, but we needed to capture additional and different details for the new genetic test. So in our relational database we created a child table of PCR Wells. But we still generated the original PCR Well row, since that was the integration point with the rest of the system. It took six months to integrate this new genetic test.
  15. We then added a second new genetic test, this one from the same instrument but generating additional data. This required another child table for the new data, and took an additional three months to implement. You can see where this is going… every new instrument began to add new complexity AND, perhaps more important, it took too long. And the requirement to add new types of genetic tests was expected to increase, driven by the need to increase throughput in the lab in order to keep up with rising demand.
  16. To help address this we had undertaken a redesign of the system. As part of this new design we included a new DB design integrating our relational database with MongoDB. We were fortunate that a project for another department had required MongoDB; as a result our Oracle DBAs were comfortable with supporting MongoDB, making it easy for us to request a new MongoDB database. The key point was to isolate data which we expected to vary for different genetic tests into a new MongoDB document. For each different type of genetic test we planned to create an instrument-specific load process to: read the CSV file, parse that file into the MongoDB document, edit, validate, and preprocess it, and save the preprocessed data in MongoDB.
  17. The next step in the process, a “Generalized Loader”, would then use certain commonly defined fields within the MongoDB document to load the relational database. Now, if we need to add a new genetic test, no time is needed to modify the database schema.
  18. From a user perspective, this is how it appears. Most of the data displayed is coming from our relational database. But details within the results which come from MongoDB are combined with the relational DB data by a Java program and then displayed on the User Interface. Currently, the variable data is only needed when the genetic test results are initially being processed, though it will be available if needed. In the future we may perform further analysis on this data and we may also capture more data since that has become so easy - mainly because with MongoDB Flexible Schema we can do this without any programming effort.
  19. This is an actual example – before the new system was even done! The user was “nice” enough to give us an “opportunity” to test out the flexibility of our MongoDB schema. While we were in the middle of implementing the data loading for the first time, the user decided we should drop that genetic test and instead load a different, newer genetic test that was just coming online to replace the previous one.
  20. There was Zero impact on our data model – all changes were in the MongoDB Flexible Schema No time required to change the schema Approximate three week impact on project vs previous history of three to six months Mongo’s Flexible Schema was a big help in achieving this. It allowed us to use a new instrument without any changes to the data model.
  21. Luckily this was NOT what going live looked like. It wasn’t a circus. It might have been a circus if we hadn’t used MongoDB, though. The entire mouse breeding program, which this genetic testing is just a part of, is so important that we maintain a “disaster recovery data center” which keeps a running copy of the system, ready to take over if our main data center fails. Keeping a second copy of a database is a no-brainer in MongoDB – keeping one or two copies of the MongoDB collections is the default configuration for most production MongoDB systems. But if you’ve ever tried to do this with Oracle, the product we use, you may find it a much more difficult task. For example, when we went from Oracle 10g to Oracle 11g, somehow the defaults changed and our “disaster recovery” copy ended up being corrupted. Even scarier, we didn’t know until a number of months later, when we ran our yearly “disaster recovery” test and it failed. When we went live with MongoDB, though, we reminded our DBAs that we needed a copy of the production database at the backup site. Although they did already have a replica running, they hadn’t set one up in our disaster recovery data center. Luckily, because of MongoDB, they were able to set this up in less than an hour – something I wouldn’t have tried to do with Oracle.
  22. Next let’s talk about synchronizing our relational DB with the MongoDB. How do we maintain consistency between the Relational DB and MongoDB? Whenever you join two databases together you run into issues regarding keeping the two “synchronized”. Often this requires a complex two phase commit or similar mechanism. In our case we always insert the complete MongoDB document first. The MongoDB then contains a standard set of fields which are needed to define the genetic test results and are then used to load the relational database. But suppose there is a failure before the Relational Database insert is completed?
  23. Net result: a MongoDB document left in the collection with no corresponding relational database table row. We considered a “quasi two-phase commit”: set the document “status” to “in progress”, insert and commit to the relational database, then set the document “status” to “committed”. But then we still had to deal with scripts that clean up after any failure, such as finding any MongoDB documents with a status of “in progress” and either setting the status to “committed” if there is a corresponding relational database row, or deleting the document if there wasn’t. But the question was: why bother? Who cares if there is an “extra” MongoDB document? If we just look at those which have an ID in the relational database, we’ll never see extra MongoDB documents. Our simple solution makes the relational database the DB of record and lets it handle the transaction management, something it does quite well. If an ID isn’t in the relational database, it doesn’t exist, as far as we’re concerned. If we ever begin to go against MongoDB directly, we can write a simple “clean up” script to delete any orphan documents. But for now we just ignore them: it doesn’t cause a problem, it won’t happen often, and it’s not as if we have to worry about the MongoDB size. The main objective is keeping it simple.
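The "insert the MongoDB document first, treat the RDBMS as the database of record" pattern can be sketched in a few lines. This is a minimal illustration only: a dict stands in for the MongoDB collection, stdlib sqlite3 stands in for Oracle, and all table, field, and sample names are hypothetical.

```python
import sqlite3
import uuid

# Stand-ins for the two stores (names are illustrative, not the real schema):
# a dict plays the MongoDB collection, sqlite3 plays the relational DB of record.
mongo_results = {}  # _id -> full result document

rdb = sqlite3.connect(":memory:")
rdb.execute("CREATE TABLE pcr_well (mongo_id TEXT PRIMARY KEY, sample TEXT)")

def save_result(sample, details, fail_before_rdbms=False):
    """Insert the complete MongoDB document first, then the relational row.

    If the process dies between the two inserts, the document becomes an
    orphan -- harmless, because all reads start from the RDBMS."""
    _id = str(uuid.uuid4())
    mongo_results[_id] = {"_id": _id, "sample": sample, "details": details}
    if fail_before_rdbms:
        return None  # simulate a crash: orphan document left behind
    with rdb:  # sqlite3 context manager commits the transaction
        rdb.execute("INSERT INTO pcr_well VALUES (?, ?)", (_id, sample))
    return _id

def load_result(sample):
    """Reads go through the DB of record, so orphans are never visible."""
    row = rdb.execute(
        "SELECT mongo_id FROM pcr_well WHERE sample = ?", (sample,)
    ).fetchone()
    return mongo_results[row[0]] if row else None

save_result("308-A01", {"Well": "A01"})
save_result("309-B02", {"Well": "B02"}, fail_before_rdbms=True)  # orphan
print(load_result("308-A01")["sample"])  # found via the DB of record
print(load_result("309-B02"))            # None: the orphan is ignored
```

The design choice is that no status flag or two-phase commit is needed: a failure leaves only an invisible extra document, never a dangling relational row.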
  24. If at some future date we did go directly against MongoDB and needed to clean up the “orphan” MongoDB documents, there are various ways we could handle this. Here’s just one example of how it might be done; there are many other simple ways to do this. In this case we simply need to mark those documents where we do have a corresponding relational database row, and with a single delete command we can delete whatever doesn’t have a row in the relational database. The point is that there are a number of simple ways to correct this problem, in the rare case that it even happens.
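The mark-and-sweep idea from the shell script on the slide can be mirrored as a minimal in-memory Python sketch. All IDs and collection names here are hypothetical examples.

```python
# Documents in the results collection, keyed by _id; useCount starts at 0,
# as in the shell script's initial state.
test_results = {
    "a": {"_id": "a", "useCount": 0},
    "b": {"_id": "b", "useCount": 0},  # orphan: unknown to the RDBMS
    "c": {"_id": "c", "useCount": 0},
}

# IDs that actually exist in the relational database (the DB of record).
rdbms_mongo_ids = ["a", "c"]

# Mark phase: bump useCount for every document the RDBMS knows about,
# mirroring the $inc update in the shell script.
for _id in rdbms_mongo_ids:
    if _id in test_results:
        test_results[_id]["useCount"] += 1

# Sweep phase: remove every document the RDBMS never referenced,
# mirroring remove({'useCount': 0}).
for k in [k for k, doc in test_results.items() if doc["useCount"] == 0]:
    del test_results[k]

print(sorted(test_results))  # ['a', 'c']
```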
  25. Now that we’re live we’re realizing that **IF** we could easily do so, it might be nice to load some additional data that is available from the instrument. In the past we avoided this because we’d have to add new columns to the Relational Database schema. But many lab instruments often allow users to specify additional data elements they want in the CSV file we use to load the results.
  26. The current CSV load program always looks for a known set of fields to load into the MongoDB document. These fields must be at the beginning of each row, but in fact our load program will also load any other fields added onto the end of each row.
  27. As long as the beginning of each “row” of the CSV File is what we expect, we can parse and save any additional comma separated values into the MongoDB document without programming changes. Here again the MongoDB flexible schema allows us to do things which would otherwise be difficult to support in a relational database.
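The flexible CSV load described above can be sketched like this: a fixed set of leading columns is parsed by position, and any trailing columns the instrument operator added are captured into the document as-is. The field names here are hypothetical, not the real instrument schema.

```python
import csv
import io

# The known, fixed leading columns every row must start with
# (illustrative names only).
KNOWN_FIELDS = ["Well", "Sample", "Call"]

def parse_plate(csv_text):
    """Parse one plate file into a MongoDB-style document dict."""
    rows = csv.reader(io.StringIO(csv_text))
    header = next(rows)
    wells = []
    for row in rows:
        # Known fields by position...
        doc = dict(zip(KNOWN_FIELDS, row))
        # ...then whatever extra columns were appended to each row,
        # named from the header -- no code change needed to capture them.
        for name, value in zip(header[len(KNOWN_FIELDS):],
                               row[len(KNOWN_FIELDS):]):
            doc[name] = value
        wells.append(doc)
    return {"plate_wells": wells}

data = "Well,Sample,Call,New1\nA01,30801,Het,New Value\n"
print(parse_plate(data))
```

Because the extra columns land in a schemaless document, storing them requires no migration on the MongoDB side.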
  28. You often can’ tell how useful the data might be until you collect it and examine it. With MongoDB’s flexible schema it becomes very easy to collect this additional data at low or no cost, providing the luxury of collecting much more than you might otherwise. So why not collect as much as you can? It’s inexpensive It’s easy
  29. As a result we may one day start to analyze some of that additional data – accessing additional lab-instrument-specific detail which would otherwise be difficult to obtain. You never know what you’ll find: how the information can be used to improve the process, improve accuracy, spot problems before they occur – who knows what else. Until you capture the data and take a look at it, you never know what you’ll find. And with MongoDB you lower the barrier so much that it becomes easy to collect all the data you’d ever want.
  30. This is made even easier by new Aggregation Framework capabilities which have removed some of the previous resource limitations of the framework. If you want to find out more about the MongoDB Aggregation Framework, including major revisions in the April MongoDB release, 2.6 – which removes the 16MB limit for aggregation pipeline results and provides the option to remove limits on intermediate result set sizes by saving intermediate results on disk – Chapter 6 of the soon-to-be-released 2nd edition of MongoDB in Action will cover this. Please use code mdbdgcf for 44% off MongoDB in Action, 2e (all formats) for all attendees. Please also give away a free MEAP and send us the winner's name and email. The book is scheduled to be released this summer, but Manning has an early access program which will allow you to read the chapter when it is completed – which should be soon.
  31. There are other future possibilities for MongoDB in our department, Bioinformatics, as well. While conducting an internal review of this project (BTSC), the possibilities enabled by MongoDB flexible schema started others thinking about additional ways we could leverage it. One idea was to use MongoDB to help in dealing with different formats of data arriving from a variety of sources. (actually Jan’s idea) If nothing else, MongoDB could provide a common and flexible access method for programs which need to process these data. It could also provide a common place to first store and then curate the data, if we need to do any preprocessing or validation We could then use the results to load a Relational Database, or even process it directly from MongoDB with either the aggregation framework or other languages which have MongoDB adapters, such as R. MongoDB’s flexible schema as well as easy access makes it a natural tool for this use.
  32. So – as you can see, MongoDB is not just about big data: the flexible schema can speed development and provide system flexibility. In our case, just for the genetic testing system, we’ve reduced the time to introduce some new lab equipment from months to weeks, and we can actually capture some new instrument data without any programming changes. And again, MongoDB is not just for new systems where you don’t need to integrate with an existing relational database: we found a very simple way to integrate the two. Not completely integrated, but integrated “enough” – “eventually” as consistent as needed. And you never know when a single day will make a big difference in someone’s life.
  33. As we’ve seen, MongoDB does help us integrate new genetic tests faster, which in turn can help reduce drug development time. In closing I wanted to share a personal story, one that helps motivate me to do things faster. “You better do it fast” was the punch line from the last joke my father ever made. He died of cancer shortly after I went to work for Genentech. A few weeks before he died I had told him that I was joining this great company, Genentech, and that we were researching cures for cancer. He smiled, laughed and said “You better do it fast” With the help of MongoDB we’ve reduced the time needed to introduce new genetic tests. And you never know when even a single day will make a major difference in someone’s life.