  • Rabitz

    1. GLOBAL SENSITIVITY ANALYSIS BY RANDOM SAMPLING - HIGH DIMENSIONAL MODEL REPRESENTATION (RS-HDMR)
       Herschel Rabitz, Department of Chemistry, Princeton University, Princeton, New Jersey 08544
    2. HDMR Methodology
       • HDMR expresses a system output as a hierarchical correlated function expansion of the inputs:
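For n inputs x = (x_1, …, x_n), the HDMR expansion is conventionally written as shown below. The notation is assumed here from the standard HDMR literature rather than taken from the slide:

$$
f(\mathbf{x}) = f_0 + \sum_{i=1}^{n} f_i(x_i) + \sum_{1 \le i < j \le n} f_{ij}(x_i, x_j) + \cdots + f_{12\cdots n}(x_1, x_2, \ldots, x_n)
$$

Here f_0 is a constant. Each f_i(x_i) describes the independent contribution of input x_i, and each f_ij(x_i, x_j) describes the cooperative contribution of a pair of inputs. Several of the later examples truncate this hierarchy at second or third order.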
    3. HDMR Methodology (Contd.)
       • HDMR component functions are optimally defined in terms of the unconditional and conditional probability density functions of the inputs
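The sketch below covers only the special case of independent inputs. There the joint density factorizes as p(x) = ∏_k p_k(x_k), so no conditional densities are needed. Each component is obtained by integrating f over the remaining inputs and then subtracting all lower-order components:

$$
\begin{aligned}
f_0 &= \int f(\mathbf{x})\, p(\mathbf{x})\, d\mathbf{x} \\
f_i(x_i) &= \int f(\mathbf{x}) \prod_{k \ne i} p_k(x_k)\, dx_k - f_0 \\
f_{ij}(x_i, x_j) &= \int f(\mathbf{x}) \prod_{k \ne i, j} p_k(x_k)\, dx_k - f_i(x_i) - f_j(x_j) - f_0
\end{aligned}
$$

This sketch does not cover the correlated-input case.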
    4. RS (Random Sampling)-HDMR (Contd.)
       • RS-HDMR component functions are approximated by expansions in orthonormal polynomials
         – Inputs can be sampled independently and/or in a correlated fashion
         – Only one set of data is needed to determine all of the component functions
         – Statistical analysis (an F-test) is used to determine the proper truncation of the RS-HDMR expansion
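The bullets above can be illustrated with a minimal Python sketch of a first-order RS-HDMR fit. It rests on several assumptions:

- The inputs are independent and uniform on [0, 1].
- The orthonormal basis is the shifted Legendre polynomials up to degree 3.
- The test model is a hypothetical f(x1, x2) = x1 + x2².

Each coefficient is a Monte Carlo average over one random sample, α_r^i ≈ (1/N) Σ_s [f(x^(s)) − f_0] φ_r(x_i^(s)). The F-test truncation and correlated sampling are left out, and all helper names are illustrative only.

```python
import math
import random


def phi(r, x):
    """Orthonormal shifted Legendre polynomial of degree r (1-3) on [0, 1].

    Under x ~ Uniform(0, 1), each phi(r, .) has mean 0 and variance 1.
    """
    u = 2.0 * x - 1.0  # map [0, 1] onto [-1, 1]
    if r == 1:
        return math.sqrt(3.0) * u
    if r == 2:
        return math.sqrt(5.0) * (3.0 * u * u - 1.0) / 2.0
    if r == 3:
        return math.sqrt(7.0) * (5.0 * u ** 3 - 3.0 * u) / 2.0
    raise ValueError("only degrees 1-3 are implemented")


def fit_first_order(samples, outputs, degree=3):
    """Monte Carlo estimates of f0 and alpha[i][r-1] from ONE random sample."""
    n = len(samples)
    f0 = sum(outputs) / n
    dim = len(samples[0])
    # Subtracting f0 does not change the expectation (E[phi_r] = 0),
    # but it lowers the variance of the estimate.
    alpha = [
        [sum((y - f0) * phi(r, x[i]) for x, y in zip(samples, outputs)) / n
         for r in range(1, degree + 1)]
        for i in range(dim)
    ]
    return f0, alpha


def predict_first_order(f0, alpha, x):
    """Evaluate f0 + sum_i f_i(x_i) using the fitted polynomial expansions."""
    return f0 + sum(
        a * phi(r, x[i])
        for i, coeffs in enumerate(alpha)
        for r, a in enumerate(coeffs, start=1)
    )


# Hypothetical test model f(x1, x2) = x1 + x2**2 with Uniform(0, 1) inputs.
rng = random.Random(0)
samples = [(rng.random(), rng.random()) for _ in range(20000)]
outputs = [x1 + x2 ** 2 for x1, x2 in samples]
f0, alpha = fit_first_order(samples, outputs)
print(round(f0, 3), [[round(a, 3) for a in row] for row in alpha])
print(round(predict_first_order(f0, alpha, (0.3, 0.6)), 3))  # true value is 0.66
```

Each φ_r has zero mean and unit variance under the uniform density. As a result, every coefficient is just a sample average of the output weighted by φ_r, and the same sample serves all of the component functions.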
    5. Global Sensitivity Analysis by RS-HDMR
       • Individual RS-HDMR component functions have a direct statistical correlation interpretation, which permits the model output variance to be decomposed into its input contributions: $\sigma^2 = \mathrm{Var}[f(\mathbf{x})] = \sum_i \sigma_i^2 + \sum_{i<j} \sigma_{ij}^2 + \cdots$
         – where $\sigma_i^2, \sigma_{ij}^2, \ldots$ are defined as the covariances of $f_i(x_i), f_{ij}(x_i, x_j), \ldots$ with $f(\mathbf{x})$, respectively
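Dividing each covariance term by the output variance gives normalized sensitivity indexes S_u = Cov(f_u, f)/Var(f), which sum to 1. The minimal sketch below computes these indexes from sampled component-function values. It assumes a hypothetical additive model f = x1 + 2·x2 with independent Uniform(0, 1) inputs, for which the component functions are known exactly. In a real analysis, the component functions would come from the fitted RS-HDMR expansion.

```python
import random


def covariance(a, b):
    """Unbiased sample covariance of two equal-length sequences."""
    n = len(a)
    ma = sum(a) / n
    mb = sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)


def sensitivity_indexes(components, f_values):
    """S_u = Cov(f_u, f) / Var(f) for each named component function."""
    var_f = covariance(f_values, f_values)
    return {name: covariance(vals, f_values) / var_f
            for name, vals in components.items()}


# Hypothetical model f = x1 + 2*x2, so f1 = x1 - 1/2 and f2 = 2*(x2 - 1/2).
rng = random.Random(1)
xs = [(rng.random(), rng.random()) for _ in range(20000)]
f_values = [x1 + 2 * x2 for x1, x2 in xs]
components = {
    "x1": [x1 - 0.5 for x1, _ in xs],
    "x2": [2 * (x2 - 0.5) for _, x2 in xs],
}
S = sensitivity_indexes(components, f_values)
# In expectation S_x1 = 0.2 and S_x2 = 0.8; here they always sum to 1.
print({k: round(v, 3) for k, v in S.items()})
```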
    6. A Propellant Ignition Model
       [Figure: calculated profiles of temperature and major mole fractions for the ignition and combustion of the M10 solid propellant]
    7. A Propellant Ignition Model
       • 10 independent and 44 cooperative contributions of inputs were identified as significant
    8. A Propellant Ignition Model
       • Nonlinear global sensitivity indexes efficiently identified all significant contributions of inputs
    9. Trichloroethylene (TCE) Microenvironmental/Pharmacokinetic Modeling
       [Figure: microenvironmental/exposure/dose modeling system]
       [Figure: structure of the TCE-PBPK model (adapted from Fisher et al., 1998)]
    10. Example: Trichloroethylene (TCE) Microenvironmental/Pharmacokinetic Modeling
       • The coupled microenvironmental/pharmacokinetic model accounts for:
         – Three exposure routes (inhalation, ingestion, and dermal absorption)
         – Release of TCE from water into the air within the residence
         – Activities of individuals and physiological uptake processes
       • Seven input variables [age (x1), tap water concentration (x2), shower stall volume (x3), drinking water consumption rate (x4), shower flow rate (x5), shower time (x6), time in bathroom after shower (x7)] are used to construct the RS-HDMR orthonormal polynomials
       • Target outputs: the total internal doses from intake (inhalation and ingestion) and uptake (dermal absorption)
         – The amount inhaled or ingested: $\int C(t)\,IR(t)\,dt$
         – The amount absorbed: $\int C(t)\,K_p\,SA(t)\,dt$
         – C(t): exposure concentration; IR(t): inhalation or ingestion rate; K_p: permeability coefficient; SA(t): surface area exposed
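Below is a minimal numerical sketch of the two dose integrals, under these assumptions:

- Intake takes the form ∫ C(t) IR(t) dt.
- Dermal uptake takes the form ∫ C(t) K_p SA(t) dt.
- Integration uses the trapezoidal rule over a sampled time grid.
- The constant profiles and unitless values are hypothetical, chosen only so the arithmetic is easy to check.

```python
def trapezoid(ts, ys):
    """Trapezoidal-rule integral of samples ys taken at times ts."""
    return sum((t1 - t0) * (y0 + y1) / 2.0
               for t0, t1, y0, y1 in zip(ts, ts[1:], ys, ys[1:]))


def intake_dose(ts, conc, rate):
    """Amount inhaled or ingested: integral of C(t) * IR(t) dt."""
    return trapezoid(ts, [c * r for c, r in zip(conc, rate)])


def dermal_uptake(ts, conc, kp, area):
    """Amount absorbed through the skin: integral of C(t) * Kp * SA(t) dt."""
    return trapezoid(ts, [c * kp * a for c, a in zip(conc, area)])


# Hypothetical constant profiles on a 3-point time grid.
ts = [0.0, 5.0, 10.0]
inhaled = intake_dose(ts, [2.0] * 3, [3.0] * 3)
absorbed = dermal_uptake(ts, [2.0] * 3, 0.01, [1.5] * 3)
print(inhaled, round(absorbed, 6))  # 60.0 0.3
```

In the actual model, C(t), IR(t) and SA(t) would be time-varying outputs of the exposure and activity submodels. All quantities must also share consistent units.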
    11. Trichloroethylene (TCE) Microenvironmental/Pharmacokinetic Modeling
       • Inputs (x1, x2, x3, x4) have uniform distributions, and inputs (x5, x6, x7) have triangular distributions; 10,000 input-output data points were generated
       [Figure: data distributions for the uniformly distributed variable x1 and the triangularly distributed variable x5]
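The sampling step can be sketched with Python's standard `random` module, which provides `uniform(a, b)` and `triangular(low, high, mode)`. The slide gives no bounds or modes, so this sketch assumes that all inputs are normalized to [0, 1] and that each triangular mode is 0.5. The model evaluation that would produce the 10,000 outputs is omitted.

```python
import random

# Hypothetical settings: every input is normalized to [0, 1], and each
# triangular input is assumed to peak at 0.5.
UNIFORM_INPUTS = ["x1", "x2", "x3", "x4"]  # age, tap water conc., stall volume, drinking rate
TRIANGULAR_INPUTS = ["x5", "x6", "x7"]     # shower flow rate, shower time, time after shower
MODE = 0.5


def sample_inputs(n, seed=0):
    """Draw n independent input vectors, each a dict keyed by variable name."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        row = {name: rng.uniform(0.0, 1.0) for name in UNIFORM_INPUTS}
        row.update({name: rng.triangular(0.0, 1.0, MODE) for name in TRIANGULAR_INPUTS})
        rows.append(row)
    return rows


def fraction_between(rows, name, lo, hi):
    """Share of samples whose `name` value lies in [lo, hi]."""
    return sum(lo <= row[name] <= hi for row in rows) / len(rows)


rows = sample_inputs(10_000)
# In expectation: 0.20 for uniform x1 and 0.36 for triangular x5.
print(fraction_between(rows, "x1", 0.4, 0.6), fraction_between(rows, "x5", 0.4, 0.6))
```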
    12. Trichloroethylene (TCE) Microenvironmental/Pharmacokinetic Modeling
       • Seven independent (first-order), fifteen second-order, and one third-order cooperative contributions of inputs were identified as significant
       [Figure: first-order sensitivity indexes]
    13. Trichloroethylene (TCE) Microenvironmental/Pharmacokinetic Modeling
       • Nonlinear global sensitivity indexes (second order and above) efficiently identified all significant contributions of inputs
       [Figure: the ten largest second- and third-order sensitivity indexes]
    14. Identification of bionetwork model parameters
       • Characteristics of the problem:
         – System nonlinearity
         – Limited number and type of experiments
         – Considerable biological and measurement noise
         – Multiple solutions exist!
       • Problems with traditional identification methods:
         – They provide only one or a few solutions for each parameter
         – They assume linear propagation from data noise to parameter uncertainties
       • The closed-loop identification protocol (CLIP):
         – Extract the full parameter distribution by global identification
         – Iteratively look for the most informative experiments for minimizing parameter uncertainty
    15. General operation of CLIP
       [Figure: pre-lab analysis and design of the most informative experiments; iterative experiment optimization and data acquisition; global parameter identification]
    16. Isoleucyl-tRNA synthetase proofreading of valyl-tRNA^Ile
       Okamoto and Savageau, Biochemistry, 23:1701-1709 (1984)
       [Figure: reaction scheme; the rate constants to be identified are marked *]
    17. The inversion module: identifying the rate constant distribution
       • The Genetic Algorithm (GA)
         – Mutation: 1101 11 1 1 + 1100 0 0 10 → 1101 11 0 1 + 1100 0 1 10
         – Crossover: 1101 1100 + 1111 0010 → 1101 0010 + 1111 1100
       [Figure: the inversion cost function; typical rate constant distribution after random perturbation/control; inversion quality index Q]
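The two GA operators on the slide can be reproduced with bit strings. This minimal sketch assumes that each candidate rate-constant set is encoded as a binary string. The crossover point and the mutated positions are fixed here to match the slide's example, whereas a GA would normally choose them at random.

```python
def crossover(parent_a, parent_b, point):
    """One-point crossover: swap the tails of two bit strings after `point`."""
    return (parent_a[:point] + parent_b[point:],
            parent_b[:point] + parent_a[point:])


def mutate(bits, index):
    """Bit-flip mutation at a single position."""
    flipped = "0" if bits[index] == "1" else "1"
    return bits[:index] + flipped + bits[index + 1:]


print(crossover("11011100", "11110010", 4))          # ('11010010', '11111100')
print(mutate("11011111", 6), mutate("11000010", 5))  # 11011101 11000110
```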
    18. The analysis module: estimating the most informative experiments
       • Estimate the best species for monitoring system behavior
       • Determine the best species for perturbing the system
       • Method: nonlinear sensitivity analysis by Random-Sampling High Dimensional Model Representation (RS-HDMR)
    19. Optimally controlled identification: squeezing on the rate constant distribution
       • The control cost function
       [Figure: inversion quality]
       Feng and Rabitz, Biophys. J., 86:1270-1281 (2004); Feng, Rabitz, Turinici, and Le Bris, J. Phys. Chem. A, 110:7755-7762 (2006)
    20. Network property optimization:
       A. Identifying the best targeted network locations for intervention
       B. Identifying the optimal network control
       [Figure: learning-control loop with the blocks Initial Guess/Random Control, Control Design, Biological System, Observed Response, Control Objective, Learning Algorithm, Optimal Controls, and Optimal Network Performance]
    21. A. Molecular target identification for network engineering
       • Random-sampling high dimensional model representation (RS-HDMR): randomly sample k
       • Advantages of RS-HDMR:
         – Global sensitivity analysis
         – Nonlinear component functions
         – Physically meaningful representation
         – Favorable scalability
       Li, Rosenthal, and Rabitz, J. Phys. Chem. A, 105:7765-7777 (2001)
    22. [Figure panels labeled k6, k10–k13, k10–k13 fixed, and k6 fixed; laboratory data on the mutants]
       Feng, Hooshangi, Chen, Li, Weiss, and Rabitz, Biophys. J., 87:2195-2202 (2004)
    23. Example: Biochemical multi-component formulation mapping
       • Allosteric regulation of aspartate transcarbamoylase (ATCase) in vitro by all four ribonucleoside triphosphates (NTPs)
       • ATCase activity (output) was measured in the laboratory for 300 random combinations of NTP concentrations (inputs)
       • A second-order RS-HDMR was constructed as an input → output map; its accuracy is comparable to the laboratory error
       [Figure: the absolute error of repeated measurements]
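For independent inputs, second-order RS-HDMR coefficients can be estimated much like first-order ones: as sample averages of the output times products of orthonormal polynomials. The minimal sketch below estimates a single coefficient β_11^12 under these assumptions:

- The inputs are independent and Uniform(0, 1).
- Only the degree-1 shifted Legendre polynomial φ1 is used.
- A hypothetical output f = x1·x2 stands in for the ATCase data.

For this f, the exact second-order component is (x1 − ½)(x2 − ½) = φ1(x1)φ1(x2)/12.

```python
import math
import random


def phi1(x):
    """Degree-1 orthonormal shifted Legendre polynomial on [0, 1]."""
    return math.sqrt(3.0) * (2.0 * x - 1.0)


# Hypothetical two-input output with a cooperative term, inputs Uniform(0, 1).
rng = random.Random(2)
n = 20000
points = [(rng.random(), rng.random()) for _ in range(n)]
outputs = [x1 * x2 for x1, x2 in points]

# beta_11^{12} ~ (1/N) * sum f(x) * phi1(x1) * phi1(x2); exact value here is 1/12
beta_11 = sum(y * phi1(x1) * phi1(x2) for (x1, x2), y in zip(points, outputs)) / n


def f12(x1, x2):
    """Lowest-degree estimate of the second-order component function f_12."""
    return beta_11 * phi1(x1) * phi1(x2)


print(round(beta_11, 3), round(f12(0.9, 0.9), 3))  # exact values: 1/12 and 0.16
```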
    24. Biochemical multi-component formulation mapping
       [Figure: comparison of the laboratory data with the 2nd-order RS-HDMR approximation for “used” and “test” data; the two parallel lines mark an absolute error of ±0.2]
    25. The s-space network identification procedure (SNIP)
       • Inputs: aTc (x1) and IPTG (x2); output: EYFP, y(x1, x2)
       • Encode: x1 → x1·m1(s), x2 → x2·m2(s)
       • Response measurement: y → y(s)
       • Decode: Fourier transform
       [Figure: laboratory data on the transcriptional cascade; circuit components aTc, TetR/tetR, pL(tet), LacI/lacI, p(lacIq), IPTG, P(lac), EYFP/eyfp]
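The encode/decode idea can be sketched as follows. The sketch relies on assumptions that the slide does not state:

- Each input's nominal value of 1 is multiplied by a sinusoidal modulation m_i(s) = 1 + 0.5 sin(ω_i s).
- The modulations use distinct integer frequencies (ω1 = 3, ω2 = 7), as in Fourier amplitude sensitivity testing.
- The response function is hypothetical.
- The decode step is a direct discrete Fourier transform written with the standard library.

```python
import cmath
import math

N = 64          # samples of the search variable s over one period [0, 2*pi)
W1, W2 = 3, 7   # hypothetical integer modulation frequencies for x1 and x2


def response(x1, x2):
    """Hypothetical output with additive and cooperative (x1 * x2) parts."""
    return x1 + x2 + x1 * x2


# Encode: x_i(s) = 1 * m_i(s) with m_i(s) = 1 + 0.5 * sin(W_i * s)
s_grid = [2 * math.pi * k / N for k in range(N)]
y = [response(1 + 0.5 * math.sin(W1 * s), 1 + 0.5 * math.sin(W2 * s)) for s in s_grid]


def amplitude(k):
    """Decode: amplitude of frequency k in y(s) via a direct DFT."""
    coeff = sum(v * cmath.exp(-2j * math.pi * k * idx / N) for idx, v in enumerate(y))
    return 2 * abs(coeff) / N


peaks = {k for k in range(1, N // 2) if amplitude(k) > 0.05}
print(sorted(peaks))  # [3, 4, 7, 10]
```

The peaks at ω1 and ω2 reflect the individual input effects. The peaks at ω2 − ω1 = 4 and ω1 + ω2 = 10 appear only because of the cooperative x1·x2 term.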
    26. Nonlinear property prediction by SNIP
       • Unmeasured region correctly predicted
       • Nonlinear, cooperative behavior revealed
       Feng, Nichols, Mitra, Hooshangi, Weiss, and Rabitz, in preparation
    27. SNIP application to an intracellular signaling network
       Sachs et al., Science, 308:523-529 (2005)
       [Figure: laboratory single-cell measurement data]
    28. Identified network with predictive capability
       [Figure: network connections identified by SNIP and by Bayesian analysis]
       [Figure: reliable SNIP prediction of Akt levels]
    29. Example: Ionospheric measured data
       • Ionospheric critical frequencies determined from ground-based ionosonde measurements at Huancayo, Peru, from 1957 to 1987 (8694 points)
       • Inputs: year, day, solar flux (f10.7), magnetic activity index (kp), geomagnetic field index (dst), and the previous day's value of foE
       • Output: the ionospheric critical frequency foE
       • The inputs are neither controllable nor independent; their pdf is not separable and was not explicitly known
    30. Ionospheric measured data
       [Figure: the dependence of foE on the input “day”]
       [Figure: ionosonde data distribution, showing the dependences between the normalized input variables year and f10.7, and kp and dst, for the data at 12 UT]
    31. Ionospheric measured data
       [Figure: the accuracy of the 2nd-order RS-HDMR expansion for the output foE]
    32. Quantitative molecular property prediction
       • Standard QSAR general strategy: molecular activity is a function of its chemical/physical/structural descriptors
       • Problems:
         – Overfitting (choice of descriptors)
         – Underlying physics
       • A simple solution, descriptor-free quantitative molecular property interpolation: y = f(x1, x2), x1 = 1, 2, …, N1, x2 = 1, 2, …, N2
       [Figure axes: X1, X2]
    33. Descriptor-free property prediction from an arbitrary substituent order
    34. Property prediction from the optimal substituent order
       Shenvi, Geremia, and Rabitz, J. Phys. Chem. A, 107:2066 (2003)
       • Complexity of the search: $N_1!\,N_2! = 14! \cdot 8! \approx 3.5 \times 10^{15}$
       • Cost function:
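The size of the search space follows directly from the factorials. A quick check in Python with `math.factorial`, using N1 = 14 and N2 = 8 from the slide, shows why exhaustive search over substituent orders is impractical:

```python
import math

n1, n2 = 14, 8  # substituent counts at the two sites, from the slide
orderings = math.factorial(n1) * math.factorial(n2)
print(orderings)           # 3515028701184000
print(f"{orderings:.1e}")  # 3.5e+15
```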
    35. Application to a chromophore transition metal complex library
       [Figure panels: before reordering; after reordering; outliers captured by the reordering algorithm]
       Liang, Feng, Lowry, and Rabitz, J. Phys. Chem. B, 109:5842-5854 (2005)
       • Cost function:
    36. Application to a drug compound library
       • Cost function:
       [Figure: 15% of data; reorder; prediction for >14,000 compounds]
    37. THE MODERN WAY TO DO SCIENCE*
       *Adaptively, under high duty cycle, and automated
       “You should understand the physics, write down the correct equations, and let nature do the calculations.”
       Peter Debye