MongoDB and research Jan Aerts, PhD Wellcome Trust Sanger Institute Hinxton, UK [email_address] @jandot
  • Not an expert. Reason will be explained further in the presentation.
  • * personal ideas/opinions; not necessarily Sanger’s
  • My background
  • there are hurdles for adoption in academia
  • Many people in institute; many ways of doing things + many tools
  • Data is often transitory. Apart from the raw sequencing data (served by e.g. EBI): data can often be archived once paper is written.
  • Because transitory. A one-off script that takes 5 minutes to write and a day to run is often preferable to one that takes a day to write and 5 minutes to run.
  • Because we want to focus on the research, not the tools. If the available tools get the work done they will suffice.
  • In many smaller labs: data management is not part of the initial grants. Is starting to change with the next-generation sequencing data.
  • “Genome hacker”: very broad. From guy-who-knows-how-to-record-macros-in-Word to hardcore mathematicians.
  • “need IT support for heavier work”: set up MongoDB server => what if need sharded cluster? => investment from IT. “creating legacy”: if it's something that will be used after you're gone (typical contract: postdoc = 3-5 yrs), you don't want to use a technology that is not supported or actively used within the organization. “often self-taught”!!!
  • “normalization?”: Overkill to try and persuade them to use databases if you have to teach them normal forms.
  • What does the data look like?
  • Very difficult to parse without custom libraries (bio*)
  • “//” => end of a record; a new record starts on the next line
  • State of the art. Is tab-delimited, but not really.
  • “##”: header. “#”: column headers. INFO field: ‘;’-separated tag-value pairs (tag and value themselves separated by ‘=’). FORMAT field: necessary to know what is in the NA00001 column; colon-separated.
  • Not really tab-delimited anymore because too structured. Self-taught => simple scripting languages!
  • New technologies + existing technologies improved + decreasing cost of data generation
  • Would benefit most. “bench-based scientists”: are more and more learning Perl and working with tab-delimited files; to go from Excel to a database, JSON looks more like how they think than having to cope with normalization steps in a relational database. “big data”: auto-sharding, mapreduce, …
  • In-road into research: via the department bioinformatician, who is constantly looking for new things. Least effort to implement and least costly in case of failure.
  • Focus is often on data exchange => a lot of effort on exchange file formats
  • MongoDB and research
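
The notes on the VCF slides describe a nested structure (INFO as ‘;’-separated tag=value pairs, FORMAT naming the colon-separated values in the sample column). A minimal Python sketch of how one such line could be turned into a JSON-like document is given here. The function name `parse_vcf_line` and the output field names are illustrative assumptions, not part of the talk or of any VCF library, and the sketch splits on any whitespace because the slide text is space-aligned (real VCF files are tab-delimited).

```python
import json


def parse_vcf_line(line, sample_names):
    """Turn one VCF data line into a nested dict (a JSON-like document).

    Hypothetical helper for illustration only; values are kept as strings
    except for POS and QUAL.
    """
    fields = line.split()  # assumption: no field contains whitespace
    chrom, pos, vid, ref, alt, qual, flt, info, fmt = fields[:9]

    # INFO: ';'-separated entries, each either TAG=value or a bare flag (e.g. DB)
    info_doc = {}
    for entry in info.split(";"):
        tag, sep, value = entry.partition("=")
        info_doc[tag] = value if sep else True

    # FORMAT names the colon-separated values found in every sample column
    keys = fmt.split(":")
    samples = {
        name: dict(zip(keys, column.split(":")))
        for name, column in zip(sample_names, fields[9:])
    }

    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": vid,
        "ref": ref,
        "alt": alt.split(","),
        "qual": float(qual),
        "filter": flt,
        "info": info_doc,
        "samples": samples,
    }


# One data line taken from the VCF listing on the slides
line = "20 1110696 . A G,T 67 0 AF=0.421,0.579;AA=T;DB GT:GQ:DP:HQ 1|2:21:6:23,27"
doc = parse_vcf_line(line, ["NA00001"])
print(json.dumps(doc, indent=2))
```

A document of this shape keeps the tag-value pairs and per-sample values together in one record, which is the "JSON looks more like how they think" point from the notes; no normalization into separate tables is needed.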

    1. 1. MongoDB and research Jan Aerts, PhD Wellcome Trust Sanger Institute Hinxton, UK [email_address] @jandot
    2. 2. Disclaimer 1
    3. 3. Disclaimer 2
    4. 4. Acknowledgments MongoDB community Karen Ambrose 10gen
    5. 6. transcriptomics genomics proteomics *omics
    6. 7. transcriptomics genomics proteomics *omics instantiationomics metabolomics spliceomics interactomics metallomics lipidomics orfeomics phenomics histomics
    7. 8. Academia != industry
    8. 9. heterogeneous systems
    9. 10. transitory
    10. 11. little optimization
    11. 12. slow adoption of new technology (don't break anything that works)
    12. 13. data management = afterthought money
    13. 14. Who are the players?
    14. 15. large genome/data centers; genome hackers (lone bioinformaticians); bench-based scientists. Drawings by Morag Ann Lewis
    15. 16. large genome/data centers; genome hackers (lone bioinformaticians); bench-based scientists. Heavy investment in infrastructure/pipelines; data exchange => standards!
    16. 17. large genome/data centers; genome hackers (lone bioinformaticians); bench-based scientists. Little investment in infrastructure; little time/effort for optimization; one-off; getting it done; creating legacy; need IT support for heavier work; often self-taught
    17. 18. large genome/data centers; genome hackers (lone bioinformaticians); bench-based scientists. Use whatever everyone else is using; "normalization?"
    18. 19. The data landscape
    19. 20. 1. Flat text files
        LOCUS       SCU49845 5028 bp DNA PLN 21-JUN-1999
        DEFINITION  Saccharomyces cerevisiae TCP1-beta gene, partial cds, and Axl2p (AXL2)
                    and Rev7p (REV7) genes, complete cds.
        VERSION     U49845.1 GI:1293613
        KEYWORDS    .
        SOURCE      Saccharomyces cerevisiae (baker's yeast)
          ORGANISM  Saccharomyces cerevisiae
                    Eukaryota; Fungi; Ascomycota; Saccharomycotina;
                    Saccharomycetes; Saccharomycetales; Saccharomycetaceae; Saccharomyces.
        REFERENCE   1 (bases 1 to 5028)
          AUTHORS   Torpey,L.E., Gibbs,P.E., Nelson,J. and Lawrence,C.W.
          TITLE     Cloning and sequence of REV7, a gene whose function is required for DNA
                    damage-induced mutagenesis in Saccharomyces cerevisiae
          JOURNAL   Yeast 10 (11), 1503-1509 (1994)
          PUBMED    7871890
        FEATURES             Location/Qualifiers
             gene            687..3158
                             /gene="AXL2"
             gene            complement(3300..4037)
                             /gene="REV7"
        ORIGIN
                1 gatcctccat atacaacggt atctccacct caggtttaga tctcaacaac ggaaccattg
               61 ccgacatgag acagttaggt atcgtcgaga gttacaagct aaaacgagca gtagtcagct
              121 ctgcatctga agccgctgaa gttctactaa gggtggataa catcatccgt gcaagaccaa
              181 gaaccgccaa tagacaacat atgtaacata tttaggatat acctcgaaaa taataaaccg
              241 ccacactgtc attattataa ttagaaacag aacgcaaaaa ttatccacta tataattcaa
              301 agacgcgaaa aaaaaagaac aacgcgtcat agaacttttg gcaattcgcg tcacaaataa
              361 attttggcaa cttatgtttc ctcttcgagc agtactcgag ccctgtctca agaatgtaat
              421 aatacccatc gtaggtatgg ttaaagatag catctccaca acctc...
        //
        LOCUS       SCU49845 5028 bp DNA PLN 21-JUN-1999
        DEFINITION  Saccharomyces cerevisiae TCP1-beta gene, partial cds, and Axl2p (AXL2)
                    and Rev7p (REV7) ...
    21. 22. 1. Flat text files
        ##format=VCFv1
        ##fileDate=20090805
        ##source=myImputationProgramV3.1
        ##reference=1000GenomesPilot-NCBI36
        ##phasing=partial
        #CHROM  POS      ID  REF  ALT  QUAL      FILTER  INFO                       FORMAT       NA00001
        1       967433   .   G    A    151.43    0       AB=0.42;AC=1               GT:DP:GQ     1/0:11:99.00
        1       970323   .   G    A    492.61    0       AB=0.41;AC=1;AF=0.50       GT:DP:GQ     1/0:28:99.00
        1       970950   .   A    G    1287.90   0       AB=0.55;AC=1;AF=0.50       GT:DP:GQ     0/1:108:99.00
        1       972804   .   T    C    210.56    0       AB=0.53;AC=1;AF=0.50       GT:DP:GQ     1/0:13:99.00
        1       972857   .   T    C    846.18    0       AB=0.53;AC=1;AF=0.50;AN=2  GT:DP:GQ     1/0:58:99.00
        1       974165   .   T    C    810.47    0       AB=0.38;AC=1;AF=0.50;AN=2  GT:DP:GQ     1/0:6:67.05
        1       977063   .   C    T    1110.31   0       AB=0.50;AC=1;AF=0.50;AN=2  GT:DP:GQ     0/1:67:99.00
        1       1006892  .   C    G    62.39     SF      AC=2;AF=1.00;AN=2          GT:DP:GQ     1/1:2:6.02
        1       1148494  .   A    G    5237.88   0       AC=2;AF=1.00;AN=2          GT:DP:GQ     1/1:160:99.00
        1       1149380  .   T    C    165.10    0       AC=2;AF=1.00;AN=2          GT:DP:GQ     1/1:6:18.05
        1       1212553  .   C    T    426.61    0       AB=0.26;AC=1;AF=0.50;AN=2  GT:DP:GQ     0/1:18:99.00
        1       1235867  .   A    G    1158.08   0       AC=2;AF=1.00;AN=2          GT:DP:GQ     1/1:30:90.28
        1       1237357  .   T    C    142.01    0       AC=2;AF=1.00;AN=2          GT:DP:GQ     1/1:5:15.04
        1       1239050  .   G    A    13952.03  0       AC=2;AF=1.00;AN=2          GT:DP:GQ     1/1:340:99.00
        20      14370    .   G    A    29        0       NS=58;DP=258;AF=0.786      GT:GQ:DP:HQ  0|0:48:1:51,51
        20      13330    .   T    A    3         q10     NS=55;DP=202;AF=0.024      GT:GQ:DP:HQ  0|0:49:3:58,50
        20      1110696  .   A    G,T  67        0       AF=0.421,0.579;AA=T;DB     GT:GQ:DP:HQ  1|2:21:6:23,27
        20      10237    .   T    .    47        0       NS=57;DP=257;AA=T          GT:GQ:DP:HQ  0|0:54:7:56,60
        ...
    23. 24. 1. Flat text files (same VCF listing as slide 22); perl; java; python; ruby; “tab-delimited” is king
    24. 25. 2. Binary compressed flat files. One experiment => one datafile as text: 40-70Gb => compressed to 11-20Gb. Toolkits to access data (and generate tab-delimited); C; java
    25. 26. 3. MySQL and Oracle. Curated data; meta-data; raw data: BLOBs. Sequencing: >6 TB/week and growing… Departmental project: 40 individuals x 42 million datapoints/individual => joins? Denormalized copy
    26. 27. 4. AceDB - A Caenorhabditis elegans database; object-oriented
        Author "Patel B"
        Full_name "Bala Patel"
        Laboratory CB
        Paper [cgc1011]
        Paper [cgc533]
        Mail "Laboratory of Molecular Biology"
        Mail "Hills Road, Cambridge"
        Fax "050 3456789"

        Paper [cgc533]
        Title "Yet more of those Genes"
        Journal "Cell Reports"
        Volume 3
        Year 1993
    27. 29. Challenges in *omics - Where can MongoDB play a role?
    28. 30. explosion of data every researcher must be able to handle data
    29. 31. low stepping stone for bench-based scientists big data
    30. 33. Takeoff within research community?
        widespread?
            Cannot manage all data in-house <= data exchange!
            => focus more on file formats than on technology
        smaller scale
            Implement MongoDB for
                * local storage and querying (load file from standard file format into custom DB)
                * encourage non-informaticians to use MongoDB
    31. 34. Thank you! Questions? [email_address] @jandot http://saaientist.blogspot.com
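
The "load file from standard file format into custom DB" idea from slide 33 can be sketched for the GenBank-style flat file shown earlier. This is a rough Python illustration under stated assumptions: the helper names `split_records` and `record_to_doc` are hypothetical, records are taken to end at a line containing only `//`, and every indented line (including sub-keywords such as ORGANISM) is simply folded into the preceding top-level keyword. A real parser from one of the bio* libraries handles far more of the format.

```python
def split_records(text):
    """Yield each record as a list of lines, treating '//' as the terminator."""
    record = []
    for line in text.splitlines():
        if line.strip() == "//":
            if record:
                yield record
            record = []
        elif line.strip():
            record.append(line)
    if record:  # tolerate a final record without a terminator
        yield record


def record_to_doc(lines):
    """Map top-level keywords to values; indented lines continue the last key."""
    doc = {}
    key = None
    for line in lines:
        if not line[0].isspace():
            keyword, _, value = line.partition(" ")
            key = keyword.lower()
            doc[key] = value.strip()
        elif key is not None:
            doc[key] += " " + line.strip()
    return doc


# Abbreviated sample based on the record shown on the slides
sample = """\
LOCUS       SCU49845 5028 bp DNA PLN 21-JUN-1999
DEFINITION  Saccharomyces cerevisiae TCP1-beta gene, partial cds, and Axl2p (AXL2)
            and Rev7p (REV7) genes, complete cds.
VERSION     U49845.1 GI:1293613
//
LOCUS       SCU49845 5028 bp DNA PLN 21-JUN-1999
//
"""

docs = [record_to_doc(r) for r in split_records(sample)]
for d in docs:
    print(d["locus"])

# With pymongo installed and a server running locally, the documents could then
# be stored with something like the following (database and collection names
# are made up for this sketch):
#   from pymongo import MongoClient
#   MongoClient().research.genbank.insert_many(docs)
```

Because each record becomes one self-contained document, the loading script stays close to the one-off, tab-delimited style of scripting that the talk describes as typical for self-taught bioinformaticians.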
