Using MongoDB for Materials Discovery


How the Materials Project uses MongoDB

Published in: Technology, Business


  1. Using MongoDB for Materials Discovery. Michael Kocher and Dan Gunter, Lawrence Berkeley National Lab
  2. Energy Mission at LBNL • Li-ion Batteries • Photovoltaic (Solar Cells) • Thermoelectrics • Biofuels • New Computational Tools • Cutting-edge Spectroscopic Tools (Advanced Light Source)
  3. Current Material Design Model is Slow: 18 years from the average new materials discovery to commercialization. Bringing New Materials to the Market: Eagar, T.W. Technology Review Feb 1995, 98, 42.
  4. Materials Genome Initiative: A Renaissance of American Manufacturing. “To help businesses discover, develop, and deploy new materials twice as fast, we're launching what we call the Materials Genome Initiative. The invention of silicon circuits and lithium-ion batteries made computers and iPods and iPads possible -- but it took years to get those technologies from the drawing board to the marketplace. We can do it faster.” - President Obama at Carnegie Mellon University, 6/24/2011
  5. What is a Material?
  6. NaCl, Silicon
  7. LiCoO2: Li, O, Co
  8. What can we compute using quantum mechanics? Volume, density, total energy + formation energy, metallic?, etc. No empirical parameters!
  9. “The Google of Material Science Data”: MIT and LBNL collaboration
  10. Inverting the Problem
  11. Detailed Properties
  12. Machine Learning. How often can you substitute Mg for Ca? What about Na, V, P, O? [Diagram labels: Structure 1 through Structure 6, materials.bson, Learning Algorithm, (new materials)] Prof. Gerbrand Ceder (DOI: 10.1103/PhysRevLett.91.135503)
  13. Materials Project: A Play in Three Acts. I. Data generation using HTC. II. Data storage. III. Data analysis/logging
  14. Act I: Managing Calculations • Centralized distributed model is the only way to go • Hub is at LBNL • Store the state in the db • Overview of running many MPI jobs at many different HPC centers
  15. MasterQueue [Diagram labels: builder.x (create a new engine, add to queue); pull; crystal; master_queue.bson, ‘The Brain’; five manager.x processes on HPC systems Franklin, Hopper, Carver (NERSC, Oakland) and lr1, lr2 (Lawrencium, Berkeley)]
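One way to picture the "store the state in the db" idea from slides 14 and 15 is a queue of job documents that each manager process claims from. The sketch below is an assumption, not the Materials Project code: the field names (`state`, `host`, `formula`) and the function `claim_next_job` are hypothetical, and a plain Python list stands in for the `master_queue` collection so the example runs without a server. Against a real MongoDB the check-and-update would need to be one atomic operation (for instance pymongo's `find_one_and_update`) so that two managers cannot claim the same job.

```python
def claim_next_job(queue, host):
    """Claim the first waiting job for `host`, or return None if the queue is drained.

    `queue` is an in-memory stand-in for a queue collection (hypothetical schema).
    """
    for job in queue:
        if job["state"] == "waiting":
            # With a real database these two writes and the check above
            # must happen atomically in a single find-and-update call.
            job["state"] = "running"
            job["host"] = host
            return job
    return None


# Hypothetical queue contents: one document per calculation to run.
queue = [
    {"_id": i, "formula": formula, "state": "waiting", "host": None}
    for i, formula in enumerate(["LiCoO2", "NaCl", "Si"])
]

first = claim_next_job(queue, "hopper")   # a manager on one machine
second = claim_next_job(queue, "carver")  # a manager on another machine
print(first["formula"], second["formula"])
```

Because each job's state lives in the shared document rather than in any one manager, a manager on any machine can pick up work without coordinating with the others.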
  16. Centralized Logging Example: MongoDB and Management [Diagram labels: eight manager.x processes on O1, Cathode, Hopper, Franklin, Carver, lr1, lr2, DLX; sites MIT, NERSC (Oakland), LBNL, Kentucky] query = {'elements': {'$all': ['Li', 'O']}, 'nelectrons': {'$lte': 200}}
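To make the slide 16 query concrete, here is a minimal sketch of what it selects: documents whose `elements` array contains both Li and O and whose `nelectrons` is at most 200. With pymongo the dictionary would simply be passed to `collection.find(query)`; the small `matches` helper below is a hypothetical stand-in that understands only `$all` and `$lte`, so the example runs without a database. The sample documents and their values are illustrative, not Materials Project data.

```python
def matches(doc, query):
    """Return True if `doc` satisfies `query` (supports only $all and $lte)."""
    for field, condition in query.items():
        value = doc.get(field)
        for op, operand in condition.items():
            if op == "$all":
                # Every listed item must be present in the array field.
                if value is None or not all(item in value for item in operand):
                    return False
            elif op == "$lte":
                if value is None or not value <= operand:
                    return False
            else:
                raise ValueError(f"unsupported operator: {op}")
    return True


query = {"elements": {"$all": ["Li", "O"]}, "nelectrons": {"$lte": 200}}

# Hypothetical documents with illustrative values.
tasks = [
    {"formula": "Li2O", "elements": ["Li", "O"], "nelectrons": 14},
    {"formula": "LiCoO2", "elements": ["Li", "Co", "O"], "nelectrons": 46},
    {"formula": "NaCl", "elements": ["Na", "Cl"], "nelectrons": 28},
]

hits = [t["formula"] for t in tasks if matches(t, query)]
print(hits)
```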
  17. Act II: Core Data Storage
  18. Very Complex Documents
  19. Powerful Querying. Every crystal that has (Li or Na or K), (Mn), (O or S or F or Si) plus one other element except (Zn or Ni or Fe or Cu or Co): { "lattice.volume" : { "$lt" : 500 }, "elements" : { "$all" : ["Mn"], "$size" : 4, "$nin" : ["Zn", "Ni", "Fe", "Cu", "Co"] }, "atoms" : { "$elemMatch" : { "oxidation_state" : 3, "symbol" : "Mn" } }, "$where" : "match_all( this.element_names, ['Li', 'Na', 'K'], ['Mn'], ['O', 'S', 'F', 'Si'])" }
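The slide does not show how `match_all` is defined. Going only by the English description on slide 19, a plausible reading is "at least one element from each group must be present". The Python sketch below encodes that guess together with the size and exclusion constraints from the query; the function `is_candidate` and the constant `EXCLUDED` are hypothetical names introduced for illustration, and the lattice-volume and oxidation-state conditions are left out.

```python
def match_all(element_names, *groups):
    """Guess at the slide's helper: True if each group contributes at least one element."""
    return all(any(el in element_names for el in group) for group in groups)


# Elements ruled out by the "$nin" clause on the slide.
EXCLUDED = {"Zn", "Ni", "Fe", "Cu", "Co"}


def is_candidate(element_names):
    """Apply the element constraints from the slide-19 query to one element list."""
    return (
        len(element_names) == 4                      # "$size": 4
        and not EXCLUDED.intersection(element_names)  # "$nin": [...]
        and match_all(
            element_names,
            ["Li", "Na", "K"],
            ["Mn"],
            ["O", "S", "F", "Si"],
        )
    )


print(is_candidate(["Li", "Mn", "O", "P"]))   # four elements, one from each group
print(is_candidate(["Li", "Mn", "O", "Fe"]))  # contains an excluded element
```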
  20. pre-MongoDB :( Search for materials with Li and O, excluding duplicates: ((SELECT structure.structureid FROM structure NATURAL INNER JOIN database NATURAL INNER JOIN databaseentry WHERE structureid IN ((SELECT structure.structureid FROM structure NATURAL INNER JOIN elemententry WHERE elemententry.symbol='Li' INTERSECT SELECT structure.structureid FROM structure NATURAL INNER JOIN elemententry WHERE elemententry.symbol='O') INTERSECT SELECT structure.structureid FROM structure NATURAL INNER JOIN database NATURAL INNER JOIN databaseentry WHERE database.title='ICSD')) EXCEPT (SELECT structure.structureid FROM structure WHERE structure.entryid IN (SELECT duplicateentry.entryid FROM duplicateentry))) EXCEPT (SELECT structure.structureid FROM structure WHERE structure.entryid IN (SELECT entryid FROM removals))
  21. Map/Reduce [Diagram labels: Calculation 12, Calculation 13 ✓, Calculation 14, Calculation 15; tasks.bson, MR, materials.bson]
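The slide 21 diagram suggests that several calculation (task) documents are reduced to a single material document, with one calculation marked as the chosen one. The slides do not say what the selection rule is, so the sketch below makes an assumption: keep the converged task with the lowest energy for each material. All field names and numbers are hypothetical, and the map and reduce steps are written in plain Python rather than as the server-side JavaScript functions MongoDB's map/reduce would use.

```python
from collections import defaultdict

# Hypothetical task documents; energies are illustrative values only.
tasks = [
    {"task_id": 12, "material": "LiCoO2", "energy": -22.1, "converged": True},
    {"task_id": 13, "material": "LiCoO2", "energy": -22.7, "converged": True},
    {"task_id": 14, "material": "NaCl", "energy": -6.8, "converged": True},
    {"task_id": 15, "material": "NaCl", "energy": -7.3, "converged": False},
]


def map_task(task):
    """Map step: emit (material, task) pairs for converged calculations only."""
    if task["converged"]:
        yield task["material"], task


def reduce_tasks(material, values):
    """Reduce step: keep the lowest-energy task as the material's reference."""
    best = min(values, key=lambda t: t["energy"])
    return {
        "material": material,
        "best_task": best["task_id"],
        "energy": best["energy"],
        "ntasks": len(values),
    }


# Shuffle step: group emitted values by key.
grouped = defaultdict(list)
for task in tasks:
    for key, value in map_task(task):
        grouped[key].append(value)

materials = [reduce_tasks(key, values) for key, values in sorted(grouped.items())]
for doc in materials:
    print(doc)
```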
  22. Every App uses MongoDB: structure_predictors.bson, candidate_materials.bson, diffraction_patterns.bson (by G. Hautier)
  23. Structure Predictor
  24. Diffraction Pattern
  25. Act III: Analytics and Logging
  26. Rich Error Analysis (Experimental vs. Calculated)
  27. Integrated logging just makes sense • Semi-structured data easily stored • Can correlate with all other data • Automation Layer: Failed tasks • Web/App Layer
  28. Conclusions • MongoDB is a very versatile tool • Used in several different cases • Elegant query syntax • Very useful for scientific data storage • A lot of exciting future ideas
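As a rough illustration of the "semi-structured data easily stored" and "can correlate with all other data" points on slide 27, the sketch below turns Python log records into dictionaries that carry a task identifier alongside the message. It is an assumption about how such logging could be wired up, not the project's implementation: `DocumentLogHandler`, the `task_id` and `host` fields, and the log messages are all hypothetical, and a Python list stands in for a MongoDB collection (with pymongo, `sink.append(doc)` would become something like `collection.insert_one(doc)`).

```python
import logging


class DocumentLogHandler(logging.Handler):
    """Store each log record as a dict in `sink` (a stand-in for a collection)."""

    def __init__(self, sink, extra_fields=()):
        super().__init__()
        self.sink = sink
        self.extra_fields = tuple(extra_fields)

    def emit(self, record):
        doc = {
            "logger": record.name,
            "level": record.levelname,
            "message": record.getMessage(),
            "created": record.created,
        }
        # Copy over optional structured fields passed via `extra=`.
        for field in self.extra_fields:
            if hasattr(record, field):
                doc[field] = getattr(record, field)
        self.sink.append(doc)


log_docs = []
logger = logging.getLogger("manager")
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the example's output confined to our handler
logger.addHandler(DocumentLogHandler(log_docs, extra_fields=("task_id", "host")))

logger.error("SCF did not converge", extra={"task_id": 14, "host": "hopper"})
logger.info("job finished", extra={"task_id": 13})

# Because each log document carries a task_id, failed tasks can be looked up
# in the same way as any other data.
failed = [d["task_id"] for d in log_docs if d["level"] == "ERROR"]
print(failed)
```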
  29. Acknowledgements
  30. Thanks!