MongoDB and research
Jan Aerts, PhD
Wellcome Trust Sanger Institute, Hinxton, UK
[email_address]
@jandot
Disclaimer 1
Disclaimer 2
Acknowledgments: MongoDB community, Karen Ambrose, 10gen
 
*omics: transcriptomics, genomics, proteomics, instantiationomics, metabolomics, spliceomics, interactomics, metallomics, lipidomics, orfeomics, phenomics, histomics
Academia != industry
heterogeneous systems
transitory
little optimization
slow adoption of new technology (don't break anything that works)
data management = afterthought
money
Who are the players?
[first group; label lost in extraction]: heavy investment in infrastructure/pipelines; data exchange => standards!
genome hackers (lone bioinformaticians): little investment in infrastructure; little time/effort for optimization; one-off "getting it done", creating legacy; need IT support for heavier work; often self-taught
bench-based scientists: use whatever everyone else is using; "normalization?"
Drawings by Morag Ann Lewis
The data landscape
1. Flat text files
2. Binary compressed flat files
3. MySQL and Oracle: curated data, meta-data, raw data (as BLOBs), denormalized copy
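The slide lists a denormalized copy next to the relational data. As a minimal sketch of what that step involves, assuming three hypothetical normalized tables (`AUTHORS`, `PAPERS` and an `AUTHOR_PAPERS` link table, with field values borrowed from the AceDB record), the Python below folds the rows into one nested document per author, roughly the shape a document store such as MongoDB holds natively. All table and field names are illustrative assumptions, not the schema used in the talk.

```python
from collections import defaultdict

# Hypothetical normalized tables (rows as dicts); names and fields are illustrative only.
AUTHORS = [
    {"author_id": 1, "name": "Patel B", "full_name": "Bala Patel", "laboratory": "CB"},
]
PAPERS = [
    {"paper_id": "cgc533", "title": "Yet more of those Genes",
     "journal": "Cell Reports", "volume": 3, "year": 1993},
]
# Hypothetical join table of (author_id, paper_id) pairs.
# cgc1011 deliberately has no row in PAPERS.
AUTHOR_PAPERS = [(1, "cgc1011"), (1, "cgc533")]


def denormalize(authors, papers, author_papers):
    """Return one self-contained document per author, with linked papers embedded."""
    papers_by_id = {p["paper_id"]: p for p in papers}
    paper_ids_by_author = defaultdict(list)
    for author_id, paper_id in author_papers:
        paper_ids_by_author[author_id].append(paper_id)

    documents = []
    for author in authors:
        doc = dict(author)  # shallow copy so the source row is not modified
        # Embed a copy of each linked paper; ids missing from the papers
        # table are kept as a bare reference.
        doc["papers"] = [
            dict(papers_by_id.get(pid, {"paper_id": pid}))
            for pid in paper_ids_by_author[author["author_id"]]
        ]
        documents.append(doc)
    return documents


if __name__ == "__main__":
    for document in denormalize(AUTHORS, PAPERS, AUTHOR_PAPERS):
        print(document)
```

The trade-off is the usual one for denormalization: reads need no joins, but a paper shared by several authors is copied into each author's document, so an update has to touch every copy.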
4. AceDB: A Caenorhabditis elegans DataBase (object-oriented)
Author "Patel B"
  Full_name "Bala Patel"
  Laboratory CB
  Paper [cgc1011]
  Paper [cgc533]
  Mail "Laboratory of Molecular Biology"
  Mail "Hills Road, Cambridge"
  Fax "050 3456789"

Paper [cgc533]
  Title "Yet more of those Genes"
  Journal "Cell Reports"
  Volume 3
  Year 1993
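The AceDB record on this slide is displayed as a class and name followed by tag-value lines, which is already close to a JSON-style document. The sketch below assumes a simplified text layout modeled on that record (one object per blank-line-separated block, a `Class "name"` header, then `Tag value` lines); it is not a full AceDB parser. It turns quoted strings into text, `[id]` references into plain ids, integers into ints, and repeated tags into lists, and uses the object name as MongoDB's `_id` field. The `class` field name is a hypothetical choice.

```python
import json
import re

# Simplified AceDB-style text, modeled on the record on the slide.
ACE_TEXT = '''
Author "Patel B"
  Full_name "Bala Patel"
  Laboratory CB
  Paper [cgc1011]
  Paper [cgc533]
  Mail "Laboratory of Molecular Biology"
  Mail "Hills Road, Cambridge"
  Fax "050 3456789"

Paper [cgc533]
  Title "Yet more of those Genes"
  Journal "Cell Reports"
  Volume 3
  Year 1993
'''

# A tag (no whitespace) followed by whitespace and the rest of the line.
TAG_LINE = re.compile(r"^(\S+)\s+(.*)$")


def parse_value(raw):
    """Convert one AceDB-style value to a Python value."""
    raw = raw.strip()
    if len(raw) >= 2 and raw[0] == raw[-1] == '"':
        return raw[1:-1]                      # quoted text
    if raw.startswith("[") and raw.endswith("]"):
        return raw[1:-1]                      # reference such as [cgc533]
    if re.fullmatch(r"-?\d+", raw):
        return int(raw)                       # integer such as 1993
    return raw                                # bare word such as CB


def parse_ace(text):
    """Turn blank-line-separated objects into MongoDB-style dicts."""
    documents = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        header = TAG_LINE.match(lines[0])
        if header is None:
            raise ValueError(f"object header without a name: {lines[0]!r}")
        # Header is 'Class "name"'; an optional colon after the class is tolerated.
        doc = {"_id": parse_value(header.group(2).lstrip(": ")),
               "class": header.group(1)}
        for line in lines[1:]:
            match = TAG_LINE.match(line)
            if match is None:
                continue                      # tag without a value: skipped here
            tag, value = match.group(1), parse_value(match.group(2))
            if tag not in doc:
                doc[tag] = value
            elif isinstance(doc[tag], list):
                doc[tag].append(value)        # third or later occurrence
            else:
                doc[tag] = [doc[tag], value]  # second occurrence: promote to list
        documents.append(doc)
    return documents


if __name__ == "__main__":
    print(json.dumps(parse_ace(ACE_TEXT), indent=2))
```

Each resulting dict could be handed to a driver call such as PyMongo's `Collection.insert_many`. Collapsing single values to scalars keeps the sketch short, but it means `Paper` would be a string for an author with one paper and a list otherwise; a real import would more likely fix each tag's cardinality up front.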
 
Challenges in *omics - Where can MongoDB play a role?
explosion of data: every researcher must be able to handle data
low stepping stone for bench-based scientists; big data
 
Takeoff within research community?
Thank you! Questions?
[email_address]
@jandot
http://saaientist.blogspot.com

GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 

MongoDB and research

Editor's Notes

  1. Not an expert; the reason is explained later in the presentation.
  2. Personal ideas/opinions; not necessarily Sanger's.
  3. My background
  4. There are hurdles to adoption in academia.
  5. Many people in the institute; many ways of doing things and many tools.
  6. Data is often transitory. Apart from the raw sequencing data (served by, e.g., the EBI), data can often be archived once the paper is written.
  7. Because the data is transitory, a one-off script that takes 5 minutes to write and a day to run is often preferable to one that takes a day to write and 5 minutes to run.
  8. Because we want to focus on the research, not the tools. If the available tools get the work done they will suffice.
  9. In many smaller labs, data management is not budgeted for in the initial grant. This is starting to change with next-generation sequencing data.
  10. “Genome hacker” is a very broad term: from the guy who knows how to record macros in Word to hardcore mathematicians.
  11. “need IT support for heavier work”: you can set up a MongoDB server yourself, but what if you need a sharded cluster? That requires investment from IT. “creating legacy”: if something will still be used after you're gone (a typical postdoc contract is 3-5 years), you don't want to build it on a technology that is not supported or actively used within the organization. “often self-taught”!!!
  12. “normalization?”: it is overkill to try to persuade them to use databases if you first have to teach them normal forms.
  13. What does the data look like?
  14. Very difficult to parse without custom libraries (bio*)
  15. “//” => record separator (it closes one record; the next record follows).
  16. State of the art. It is tab-delimited, but not really.
  17. “##”: header lines; “#”: column-header line. INFO field: ‘;’-separated tag-value pairs (each tag and value separated by ‘=’). FORMAT field: colon-separated, and needed to know what is in the NA00001 column.
  18. Not really tab-delimited anymore, because it is too structured. Self-taught => simple scripting languages!
  19. New technologies, improved existing technologies and the decreasing cost of data generation.
  20. Would benefit most. “bench-based scientists”: they are increasingly learning Perl and working with tab-delimited files; when going from Excel to a database, JSON looks more like how they think than having to cope with the normalization steps of a relational database. “big data”: auto-sharding, MapReduce, …
  21. Inroad into research: via the department bioinformatician, who is constantly looking for new things. Least implementation effort, and least costly if it fails.
  22. The focus is often on data exchange => a lot of effort goes into exchange file formats.
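Notes 14 and 15 refer to flat-file formats in which a line containing only “//” separates records. To illustrate why such files invite home-grown scripts, here is a minimal Python sketch that splits a file into records. It assumes the EMBL/GenBank-style convention that “//” ends each entry. The function name `iter_records` and the toy input are hypothetical, and interpreting the fields inside each record is left to a proper (Bio*) parser.

```python
import io
from typing import Iterator, List, TextIO


def iter_records(handle: TextIO) -> Iterator[List[str]]:
    """Yield each record as a list of its non-blank lines (hypothetical helper)."""
    record: List[str] = []
    for raw in handle:
        line = raw.rstrip("\n")
        if line.strip() == "//":
            # A '//' line closes the current record
            if record:
                yield record
            record = []
        elif line.strip():
            record.append(line)
    # Tolerate a final record that lacks a trailing '//'
    if record:
        yield record


# Toy input, not real database entries
example = """ID   SEQ1
DE   first entry
//
ID   SEQ2
DE   second entry
//
"""
records = list(iter_records(io.StringIO(example)))
print(len(records))    # 2
print(records[1][0])   # ID   SEQ2
```

Splitting on the separator is the easy part; the per-line codes inside each record are where custom libraries become necessary.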
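Note 17 describes the structure of a VCF line. The following is a minimal sketch, not a complete VCF parser, that turns one data line into a nested Python dict. INFO is split on ‘;’ and ‘=’, and a tag without ‘=’ is treated as a flag. Each sample column is split on ‘:’ using the keys listed in FORMAT. The sketch assumes the caller has already read the “##” and “#” header lines. The helper names and the example values are hypothetical.

```python
from typing import Any, Dict, List

# The nine fixed columns that precede the sample columns in a VCF data line
FIXED = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]


def parse_info(info: str) -> Dict[str, Any]:
    """Split INFO into ';'-separated tag=value pairs; a bare tag becomes a flag."""
    result: Dict[str, Any] = {}
    if info == ".":  # '.' marks an empty field
        return result
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        result[key] = value if sep else True
    return result


def parse_data_line(line: str, samples: List[str]) -> Dict[str, Any]:
    """Turn one tab-separated VCF data line into a nested dict (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    record: Dict[str, Any] = dict(zip(FIXED, fields[: len(FIXED)]))
    record["POS"] = int(record["POS"])
    record["INFO"] = parse_info(record["INFO"])
    # FORMAT lists the colon-separated keys that apply to every sample column
    fmt = record.pop("FORMAT", None)
    keys = fmt.split(":") if fmt else []
    record["SAMPLES"] = {
        name: dict(zip(keys, column.split(":")))
        for name, column in zip(samples, fields[len(FIXED):])
    }
    return record


# Toy example with a single sample column named NA00001
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNA00001"
samples = header.lstrip("#").split("\t")[len(FIXED):]
line = "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;DB\tGT:GQ:DP\t0|0:48:1"
rec = parse_data_line(line, samples)
print(rec["INFO"])                # {'NS': '3', 'DP': '14', 'DB': True}
print(rec["SAMPLES"]["NA00001"])  # {'GT': '0|0', 'GQ': '48', 'DP': '1'}
```

The nested result is close to the JSON-like document that note 20 has in mind. Holding the same data in a relational schema would typically need separate, joined tables for variants, INFO tags and per-sample genotypes.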