Automated Molecular Data Extraction
using Open Babel & ChemSpotlight:
       The Semantic Desktop

         Prof. Geoff Hutchison
         Department of Chemistry
         University of Pittsburgh
         geoffh@pitt.edu


         ACS CINF: Skolnik Symposium
         21 August 2012
         http://hutchison.chem.pitt.edu
“
I can plug my iPod into any
computer and it will recognize
my music and give me all sorts
of metadata: artist, title, type of
music...

Why can’t I read the chemical
metadata off my chemistry files?
                                      ”
— Prof. Henry S. Rzepa (Imperial College)
  Spring 2005 ACS Meeting, San Diego, CA
Pre-History: Chem://Dig


                                Index files, websites
                                Based on Chem MIME
                                Find files on extension
                                Perceive chemistry
                                Database Store
                                Search, Filter
                                Retrieval

    H. Rzepa et al. New J. Chem (2002) 26 p. 656
Open Babel
              Open Babel (Started 2001)
                 Free, open source chemical toolbox
                 Cross-platform: Win, Mac, Linux...
                 Both user-tools & C++ library
                 Interfaces in Python, Perl, Ruby,
                 Java, C#
                 Supports chemistry, bioinformatics,
                 solid-state…
                 100+ file formats and variants

          http://openbabel.org/
    O’Boyle et al. J. Cheminf. 2011, 3:33
Chemical Database?


    1. Some way to store data
         (Organize it)
    2. Index it
    3. Search / filter
    4. Visualize results
ChemSpotlight: Indexing Architecture



  Spotlight + Open Babel + ~300 lines of code

    http://chemspotlight.openmolecules.net/
ChemSpotlight: “Un” Database


      Use the system-wide search database
      No (Visible) Database!
      Index files in-place
      Includes textual data
      (e.g., chemical names, formulas, etc.)
      Multiple retrieval and filtering interfaces
      (i.e., any third-party search tool works)

      http://chemspotlight.openmolecules.net/
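
Because the index lives in the system-wide Spotlight store, one way to query it from a script is the macOS `mdfind` command-line tool. The following is a minimal sketch, not ChemSpotlight's own code: the helper names `build_predicate` and `find_files` are hypothetical, and the attribute names simply follow the `net_sourceforge_...` pattern used in this deck.

```python
import platform
import shutil
import subprocess

def build_predicate(attribute, op, value):
    """Return a Spotlight query predicate such as 'attr >= 3.0'."""
    if op not in {"==", "!=", "<", ">", "<=", ">="}:
        raise ValueError(f"unsupported operator: {op}")
    if isinstance(value, str):
        # String values are double-quoted; escape backslashes and quotes.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'{attribute} {op} "{escaped}"'
    return f"{attribute} {op} {value}"

def find_files(predicate, folder=None):
    """Run mdfind and return matching paths (empty list when not on macOS)."""
    if platform.system() != "Darwin" or shutil.which("mdfind") is None:
        return []
    cmd = ["mdfind"]
    if folder:
        cmd += ["-onlyin", folder]  # restrict the search to one directory
    cmd.append(predicate)
    out = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return [line for line in out.stdout.splitlines() if line]

# Example predicate: files whose indexed dipole moment is at least 3.0.
query = build_predicate("net_sourceforge_chemspotlight_DipoleMoment", ">=", 3.0)
```

Calling `find_files(query)` on a Mac with the importer installed should list the matching chemistry files; elsewhere it returns an empty list.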
So What’s Stored / Perceived
       Formula, mass, SMILES, InChI
       net_sourceforge_openbabel_Formula = C21H36N7O8S

       Fingerprints, number of atoms, bonds, residues
       PDB, SDF keywords, properties
       Calculation keywords:
       kMDItemComment = "Gaussian 09 #n B3LYP/6-31G(d) Opt"

       Calculation results (HOMO, LUMO, Dipole Moment)
       net_sourceforge_chemspotlight_DipoleMoment = 3.5
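
The attribute = value pairs listed above resemble what the macOS `mdls` tool prints for a file. As a rough sketch, assuming that layout, a script could turn such a listing into a dictionary. The function name `parse_mdls` is hypothetical; the sample text reuses only the three values shown on this slide.

```python
def parse_mdls(text):
    """Parse 'key = value' lines into a dict.

    Only scalar values are handled; multi-line list values are skipped.
    """
    attrs = {}
    for line in text.splitlines():
        key, sep, raw = line.partition("=")
        key, raw = key.strip(), raw.strip()
        if not sep or not key or raw in ("", "(", "(null)"):
            continue
        if len(raw) >= 2 and raw[0] == raw[-1] == '"':
            attrs[key] = raw[1:-1]      # quoted string: drop the quotes
            continue
        try:
            attrs[key] = float(raw)     # numeric value
        except ValueError:
            attrs[key] = raw            # bare string such as a formula
    return attrs

sample = '''
net_sourceforge_openbabel_Formula            = C21H36N7O8S
kMDItemComment                               = "Gaussian 09 #n B3LYP/6-31G(d) Opt"
net_sourceforge_chemspotlight_DipoleMoment   = 3.5
'''
metadata = parse_mdls(sample)
```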
ChemSpotlight “Un” Database
How Do We Visualize?

   “QuickLook” previews
   New code ~800 lines
   Generate SDF, PDB, CIF
   (if needed)
   Pass off to ChemDoodle
   Web Components
   Pseudo-3D, interactive JS
   + HTML5
   … or SVG generation
   from Open Babel

             http://web.chemdoodle.com/
Organic Heterojunction Solar Cells



  light
  Transparent Electrode
        +   p-type material
                              Circuit
    -       n-type material
    Reflective Electrode
Organic Heterojunction Solar Cells

  [Figure: the device stack (light, transparent electrode, p-type material,
  n-type material, reflective electrode, circuit) beside an energy-level
  diagram. Diagram labels: Optical Excitation (hν); ΔE ≥ Exciton Binding
  Energy; Hole Conducting Polymer; Electron Conductor (Nanoparticle);
  Effective Heterojunction Bandgap; Cathode; Anode; e-; h+]
Pipeline Model for Finding New Molecules

   Monomers
   >10^6 Possible Structures
   Electronic Properties (~9 minutes)
   Optical Properties
   Synthetic Score
   (Fast Screening at top; Slower at bottom)

J Phys Chem C 2011 vol. 115 pp. 16200       ...
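
One way to read the pipeline is as a funnel: each stage discards candidates so that the slower steps see fewer structures. The sketch below illustrates that reading only. The function `run_funnel`, the cutoffs, and the four toy candidates are all made up for illustration and are not values from the cited paper.

```python
def run_funnel(candidates, stages):
    """Apply (name, keep) stages in order; return survivors and per-stage counts."""
    counts = [("input", len(candidates))]
    survivors = list(candidates)
    for name, keep in stages:
        survivors = [c for c in survivors if keep(c)]
        counts.append((name, len(survivors)))
    return survivors, counts

# Toy stand-ins: made-up numbers, not real data.
candidates = [
    {"id": "A", "gap_ev": 1.4, "absorbs": True, "synth_score": 0.8},
    {"id": "B", "gap_ev": 3.1, "absorbs": True, "synth_score": 0.9},
    {"id": "C", "gap_ev": 1.6, "absorbs": False, "synth_score": 0.7},
    {"id": "D", "gap_ev": 1.5, "absorbs": True, "synth_score": 0.2},
]
stages = [
    ("electronic", lambda c: c["gap_ev"] < 2.0),      # hypothetical cutoff
    ("optical", lambda c: c["absorbs"]),              # hypothetical flag
    ("synthetic", lambda c: c["synth_score"] > 0.5),  # hypothetical cutoff
]
survivors, counts = run_funnel(candidates, stages)
```

The `counts` list records how many candidates remain after each stage, which is the quantity that determines how much time the slower stages consume.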
New Genetic Algorithm Approach

      Rather than directly driving
      calculations and waiting
      for results:
      Check Spotlight for
      new results
        “What are the top
        HOMO energies?”
      Update GA, generate
      new candidates,
      submit new jobs
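
The loop described here can be sketched as a polling cycle that never blocks on a running calculation. This is an illustration under several assumptions: a plain dictionary stands in for the Spotlight index, candidates are pairs of fragment names, "top" is taken to mean the highest HOMO energy, and all names and energies below are invented.

```python
import random

def top_results(index, n):
    """'What are the top HOMO energies?': rank finished jobs, highest first."""
    return sorted(index.items(), key=lambda kv: kv[1], reverse=True)[:n]

def breed(parents, fragments, rng):
    """Cross two parents, then mutate the second fragment with 20% probability."""
    a, b = rng.sample(parents, 2)
    child = (a[0], b[1])
    if rng.random() < 0.2:
        child = (child[0], rng.choice(fragments))
    return child

def ga_step(index, submitted, fragments, rng, n_parents=4, n_children=6):
    """One polling cycle: read the index, breed, return newly submitted jobs."""
    parents = [cand for cand, _ in top_results(index, n_parents)]
    if len(parents) < 2:
        return []  # not enough finished results yet; try again later
    new_jobs = []
    for _ in range(n_children):
        child = breed(parents, fragments, rng)
        if child not in submitted:      # never resubmit a known candidate
            submitted.add(child)
            new_jobs.append(child)
    return new_jobs

# Made-up index contents: candidate -> HOMO energy in eV.
fragments = ["f1", "f2", "f3", "f4"]
index = {("f1", "f2"): -5.1, ("f3", "f4"): -5.6, ("f2", "f3"): -4.9}
submitted = set(index)
jobs = ga_step(index, submitted, fragments, random.Random(0))
```

In a real workflow, `index` would be refreshed from the search index before every call, so results that finished since the last cycle are picked up automatically.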
Scaling Up the Polymer Solar Search

   2nd Gen. Search:
   680 Monomers
   2800+ Fragments
   Search Space:
   500+ million oligomers
   ~9 minutes per core

   [Plot: LUMO Energy (eV), 0 to −3, vs. HOMO Energy (eV), −9.5 to −6.5]
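
A quick back-of-the-envelope calculation shows why exhaustive enumeration is impractical at this scale. It uses only the two figures on this slide, treating "500+ million" as a lower bound and "~9 minutes" as exact; the 1,000-core cluster is a hypothetical size chosen for illustration.

```python
n_oligomers = 500_000_000   # "500+ million oligomers" (lower bound)
minutes_each = 9            # "~9 minutes per core"

core_minutes = n_oligomers * minutes_each
core_years = core_minutes / (60 * 24 * 365)

def wall_clock_days(cores):
    """Days needed if the work splits evenly across the given number of cores."""
    return core_minutes / cores / (60 * 24)

days_on_1000_cores = wall_clock_days(1000)  # hypothetical cluster size
```

That comes to about 8,560 core-years, or 3,125 days on 1,000 cores, which is why a guided search is attractive here.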
Take-Home Messages

   “Big Data” is a Big Headache
   ChemSpotlight & Un-Databases Work!
   Keep data as native files w/separate index
   Integrate into user-friendly tools
   Sell to users: “What’s in it for me?”
    Indexing, retrieval
    Improved workflows
Marcus Hanwell, Pitt / Kitware
Dr. Noel O’Boyle, U.C. Cork, Ireland
Casey Campbell, Pitt (2010)
