Materials Informatics Workshop
Peter Frazier, Operations Research & Information Engineering, Cornell University

4/3/2013, Research Supported By AFOSR Natural Materials, Systems & Extremophiles FA9550-12-1-0200
Optimal Learning
✤   My research interest is in statistics & sequential decision-making under uncertainty.

✤   My specific research interest is in what I call Optimal Learning:

        ✤   Statistics & machine learning is about using data to infer the unknown --- to
            “learn”.

        ✤   In many problems, we must make decisions that influence what data is
            available.

        ✤   In making such decisions we trade the benefit of information (the ability to
            make better decisions in the future) against its cost (money, time, or
            opportunity cost).

        ✤   If we balance these costs and benefits, we are learning optimally.

✤   Other names for optimal learning: sequential experimental design, active learning,
    value of information analysis, adaptive design optimization.
Choosing experiments to perform in the search
for a new material is an optimal learning problem.


✤   If we are developing a new material, we have a choice of experiments
    (physical and computational) that we can run.

✤   Each experiment would give us different information about how
    material quality depends on design parameters.

✤   We have a limited budget on how many experiments we can perform.

✤   We would like to have an adaptive rule for choosing the experiment
    to perform next that maximizes our chances of success.

Peptide design

✤   Given two target materials (e.g., gold and silver), find a peptide that is
    a strong binder for material 1, and a weak binder for material 2.

       ✤   Paras Prasad (Buffalo); Marc Knecht (Miami); Tiff Walsh (Deakin
           University, in Australia)

✤   This will be used to create a PARE (described on the next slide).

✤   Our collaborators hypothesize that PAREs can then be used to create
    reconfigurable 3D bio-mediated nanoparticle assemblies, with useful
    photonic, electronic, plasmonic, and magnetic properties.

Overall Strategy for Bio-nanocombinatorics

   Will use a library of material-binding peptides connected by
   switchable linkers to assemble nanoparticles into
   reconfigurable assemblies



[Figure: materials-binding peptide sequences joined by a switchable linker to nanoparticles (not to scale).]
What experiments can we run?

✤   We have the ability to run two kinds of experiments: computational, and
    physical, on any chosen pair of peptide x and target material y.

✤   If we choose to run a physical experiment, we observe the binding strength
    (Gibbs free energy of binding).

✤   If we choose to run a computational experiment, we observe an estimate of the
    binding strength, but also information about which amino acids are responsible
    for the binding:

             ✤   e.g., PPPWLPYMPPWS

             ✤   (red amino acids are in contact with the target > 60% of the time)

✤   Computational & physical experiments are both quite expensive (about 1 week of work).
We start by building two statistical
models:

✤   Statistical model 1 predicts, based on the peptide sequence, the
    percentage of time an amino acid is in contact with a given target
    material. [model uses hydrophobicity, charge, size, and binding strength of the amino acid, and of its two neighbors in the sequence.]

✤   Statistical model 2 predicts peptide binding strength, based on the
    percentage of time each amino acid in it is in contact. (Work in
    progress).

✤   Both models are Bayesian models, which means that we have more
    than an estimate. We have a predictive distribution for what would
    happen if we were to run an experiment.
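
As a concrete illustration (not the specific models used in this project), the sketch below shows what having a predictive distribution means in practice, using a Gaussian process regression on hypothetical numeric peptide features; the features, data, and kernel choice are all placeholder assumptions.

```python
# Minimal sketch (placeholder data/model, not this project's statistical models):
# a Gaussian process gives a full predictive distribution, not just a point estimate.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Hypothetical featurization of already-tested peptides
# (e.g., hydrophobicity, charge, size summarized over the sequence).
X_tested = np.array([[0.2, 1.0, 0.5],
                     [0.8, 0.0, 0.3],
                     [0.5, 0.5, 0.9]])
y_tested = np.array([-7.1, -5.4, -6.3])   # observed binding free energies (hypothetical)

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-6)
gp.fit(X_tested, y_tested)

# Predictive distribution for an untested peptide: a mean and a standard deviation,
# i.e., both an estimate and an interval for what an experiment would show.
x_new = np.array([[0.4, 0.8, 0.6]])
mean, std = gp.predict(x_new, return_std=True)
print(f"predicted binding free energy: {mean[0]:.2f} +/- {1.96 * std[0]:.2f}")
```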

Based on the statistical model, we
suggest what experiment to do next
✤   For simplicity in this presentation, suppose (1) we just wish to find a
    peptide with the highest binding strength against a single target material;
    and (2) experiments are conducted without noise. Both assumptions can
    be relaxed.

✤   Here are some strategies we will consider:

    ✤   Strategy 1 (exploitation)

    ✤   Strategy 2 (expected improvement)

    ✤   Strategy 3 (knowledge gradient)

    ✤   Strategy 4 (Bayes optimal)
Strategies
✤   exploitation:

    ✤   For each peptide, based on the existing data, make a prediction for binding strength. Run a physical
        experiment on the peptide predicted to be the best.

✤   expected improvement:

    ✤   For each peptide x, based on the existing data, calculate the predictive distribution for the result of the
        physical experiment f(x).

    ✤   Let f* be the previously observed smallest free energy of binding.

    ✤   If we measure at x, the best value observed will be min(f*,f(x)), and the improvement on the previous best
        is f* - min(f*,f(x)) = (f*-f(x))+.

    ✤   Do the experiment on the peptide x with the largest expected improvement, E[(f*-f(x))+].

    ✤   Pros: Expected improvement accounts for the uncertainty in our predictions, preferring to measure
        peptides with high upside potential.

    ✤   Cons: It is myopic.
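
For concreteness, here is a minimal sketch of the expected-improvement calculation just described, assuming a Gaussian predictive distribution N(mu, sigma^2) for each candidate's free energy of binding; the numbers are hypothetical placeholders.

```python
# Expected improvement for minimization: E[(f* - f(x))+] with f(x) ~ N(mu, sigma^2)
# and f* the smallest (best) free energy of binding observed so far.
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_star):
    mu, sigma = np.asarray(mu, float), np.asarray(sigma, float)
    z = (f_star - mu) / np.maximum(sigma, 1e-12)
    return (f_star - mu) * norm.cdf(z) + sigma * norm.pdf(z)

# Hypothetical predictive means/stds for three candidate peptides.
mu = np.array([-6.9, -5.0, -6.2])
sigma = np.array([0.2, 1.5, 0.8])
f_star = -7.1                                   # best value observed so far
ei = expected_improvement(mu, sigma, f_star)
print("run the physical experiment on candidate", int(np.argmax(ei)))
```

Note that a candidate with a mediocre mean but large uncertainty can still have the largest expected improvement: that is the "high upside potential" mentioned above.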
The value of optimal learning in peptide design

✤   Example showing why optimal learning is beneficial in peptide design.

    ✤   Suppose we want to find a peptide with strong binding to a given target material.

    ✤   We have identified a few peptides as binders through evolutionary search, and want to use this
        data to find ones that bind even better.

    ✤   Let’s compare two approaches:

        ✤   1. Use a statistical method to infer binding from the available data, select the top 10, and
            test these (the “test the best” or “exploitation” strategy).

        ✤   2. Use optimal learning together with the same statistical method.
[Figure: predicted binding strength for Group 1 and Group 2 peptides. Each horizontal bar is a point estimate; each vertical bar is an interval in which the binding strength is predicted to lie. The best previously tested peptide has small uncertainty because it has been tested. Peptides in group 1 are almost identical to the best previously tested peptide (e.g., one amino acid difference), and so our estimates have less uncertainty; peptides in group 2 have more differences with previously tested peptides, and so our estimates have more uncertainty.]

[Figure: exploitation tests the 5 peptides in red.]

[Figure: we reduce our uncertainty about the ones we test.]

[Figure: our improvement is the difference between the best one tested and the best previous.]
The value of optimal learning in peptide design

✤   Instead, suppose we use an optimal learning rule, which understands that we want to test
    peptides that have a good balance between:

    ✤   having large estimates,

    ✤   having high uncertainty, i.e., being different from what we’ve previously tested.

✤   This rule also understands correlations: closely related peptides have similar values, and so it
    is a good idea to spread measurements across different groups, in case one group is
    substantially worse than we thought.
[Figures: binding strength estimates for Group 1 and Group 2 peptides as the optimal learning rule selects experiments.]

[Figure: the resulting “Optimal Learning improvement” compared with the “Exploitation improvement”.]
Strategies
✤   knowledge gradient

    ✤   For each possible experiment we could do (computational or physical), calculate the
        predictive distribution for the observation.

    ✤   For each possible observation resulting from the experiment, determine how our statistical
        fit would change, and what the estimated value of the predicted best peptide, f**, would be.

    ✤   The improvement due to the experiment is f*-f**.

    ✤   This strategy tells us to do the experiment on the peptide x (computational or physical) with
        the largest value of E[f*-f**].

    ✤   Pros: Knowledge-gradient values information in a less restrictive way than expected
        improvement, and allows us to recommend doing computational experiments, rather than
        only physical experiments. It is less myopic than expected improvement.

    ✤   Cons: It requires more computation than expected improvement, and it is still myopic.
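
The knowledge-gradient score can be approximated by simulating possible outcomes of a candidate experiment from its predictive distribution, refitting the model on each simulated outcome, and averaging the resulting improvement in the predicted best value. Below is a rough Monte Carlo sketch, with a Gaussian process standing in for the real statistical models and purely hypothetical data; it is illustrative, not this project's implementation.

```python
# Monte Carlo sketch of E[f* - f**] for each candidate experiment (hypothetical data).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X_tested = np.array([[0.2, 1.0, 0.5], [0.8, 0.0, 0.3]])
y_tested = np.array([-7.1, -5.4])                        # observed binding free energies
X_cand = np.array([[0.4, 0.8, 0.6], [0.9, 0.1, 0.2], [0.1, 0.9, 0.7]])

def fit(X, y):
    return GaussianProcessRegressor(kernel=RBF(1.0), alpha=1e-6).fit(X, y)

def knowledge_gradient(i, n_samples=50):
    """Average improvement f* - f** from running the experiment on candidate i."""
    gp = fit(X_tested, y_tested)
    mu, std = gp.predict(X_cand, return_std=True)
    f_star = y_tested.min()                              # best (smallest) value observed so far
    gains = []
    for _ in range(n_samples):
        y_sim = rng.normal(mu[i], std[i])                # one possible outcome of the experiment
        gp_new = fit(np.vstack([X_tested, X_cand[i]]),
                     np.append(y_tested, y_sim))         # refit on the simulated observation
        f_dstar = gp_new.predict(X_cand).min()           # new estimated value of the predicted best
        gains.append(f_star - f_dstar)
    return np.mean(gains)

scores = [knowledge_gradient(i) for i in range(len(X_cand))]
print("suggested experiment: candidate", int(np.argmax(scores)))
```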
Strategies

✤   Bayes optimal

    ✤   The optimal adaptive rule for choosing experiments to perform, as
        measured by the expected free energy of binding of the best peptide
        found after some fixed number of experiments, is characterized by
        the solution to a partially observable Markov decision process.

    ✤   Pros: It is optimal.

    ✤   Cons: It is very hard to compute.
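
To make “the solution to a partially observable Markov decision process” slightly more concrete, here is a sketch of the underlying dynamic program; the notation is mine, not from the slides.

```latex
% Belief state S = posterior over binding energies given all data so far,
% n = number of experiments remaining, S + (x, y_x) = belief updated after
% running experiment x and observing its outcome y_x.
V_0(S) = \min_{x \ \mathrm{observed}} f(x), \qquad
V_n(S) = \min_{x} \; \mathbb{E}\big[\, V_{n-1}\big(S + (x, y_x)\big) \,\big|\, S \,\big].
```

The Bayes-optimal rule runs, at each stage, an experiment attaining the minimum; the difficulty is that S ranges over a high-dimensional space of posteriors, which is why this is very hard to compute.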

Ongoing work

✤   In ongoing work, we are:

      ✤   improving our statistical models

      ✤   developing computational methods for implementing these
          strategies.

      ✤   doing numerical studies to see how well these strategies work.

      ✤   actually using these strategies to guide experimentation.

What have I learned from all this?

✤   Datasets are really small.

✤   Because datasets are small, we have to use domain knowledge.

✤   Because we have to use domain knowledge, we machine learners / statisticians
    need to do some work to learn about chemistry / physics / materials science to
    be successful.

✤   Computational experiments are not just lower fidelity models of physical
    experiments --- they tell us interesting things that physical experiments cannot.

✤   Because data is expensive to get, experimental design is important.

✤   In some cases, statistical models from chemoinformatics developed for biology
    applications may be applicable here too (e.g., for small molecules & peptides).
Backup




Other projects in
materials informatics




Other projects in
materials informatics


✤   Another peptide design problem: Given two enzymes and some
    proteins from nature that act as a substrate for both, find a peptide
    that (1) is as short as possible and (2) acts as a substrate for both
    enzymes.

       ✤   Nathan Gianneschi, Mike Gilson, Mike Burkhart (UCSD)




Other projects in
materials informatics

✤   Solar energy (with Paulette Clancy and others at Cornell)

       ✤   The goal is to design assemblies of chlorinated
           hexabenzocoronenes and carbon nanotubes that will act as high-
           efficiency solar cells.

       ✤   The project just started, but we are planning to start by using
           optimal learning to predict crystal structures.



Other projects in
materials informatics
✤   Materials informatics at Princeton:

       ✤   This work is led by Warren Powell, and I have only a small amount of
           involvement.

       ✤   Experimental collaborators: Mike McAlpine, Sigurd Warner, Jim Sturm,
           Craig Arnold, Jamie Link.

       ✤   Warren is planning to use the “optimization of expensive functions”
           methodology, where function evaluations are physical experiments, and
           the goal is to find the setting of some input knobs that maximize the
           quality of the output.

       ✤   There is a similar project choosing experimental conditions for growing
           carbon nanotubes with Benji Maruyama at Air Force Research Lab (AFRL).
Optimal Learning has many applications:
    medical decision-making
✤   Adaptive scheduling of diagnostic tests for
    patients after vascular surgery.

       ✤   Shanshan Zhang, ORIE; Dr. Andrew
           Meltzer, Weill Cornell Medical College



✤   Design of cardiovascular bypass grafts

       ✤   Jing Xie, ORIE; Alison Marsden,
           UCSD


Optimal Learning has many applications:
    optimization of expensive noisy functions

✤   Stochastic root-finding

        ✤   Shane Henderson, ORIE;
            Rolf Waeber, ORIE



    ✤   Derivative-free global optimization, in
        stochastic and parallel settings

        ✤   Jing Xie ORIE; Jialei Wang ORIE; Scott
            Clark Cornell Center for Applied Math;
            Steve Chick INSEAD
Optimal Learning has many applications:
recommendation systems
                  ✤   A recommender system is a computer
                      system that recommends interesting
                      items to website users, e.g., Netflix,
                      Amazon.

                  ✤   We are building a recommendation
                      system for the arXiv.org collection of
                      scientific papers.

                  ✤   We are using optimal learning to make
                      recommendations that (1) provide a
                      good user experience now; and (2)
                      provide data for improving the user
                       experience in the future.
Optimal Learning has many applications:
optimal design of laboratory experiments
✤   Early stage drug development for
    Ewing’s sarcoma, a pediatric cancer.

       ✤   Jialei Wang Cornell Applied &
           Engineering Physics; Dr. Jeff
           Toretsky, Georgetown University


✤   Development of new nano-materials

       ✤   Jialei Wang Cornell Applied &
           Engineering Physics; Paulette
           Clancy, Cornell Chemical
           Engineering; Nathan Gianneschi
           UCSD; Marc Knecht Miami
