System Dynamics Models: Building & Testing System Dynamics Models

Small System Dynamics Models for Big Issues: Triple Jump towards Real-World Dynamic Complexity. Erik Pruyt. First-time readers: start with the preface.
Chapter 13 Building & Testing System Dynamics Models

‘[T]esting is often designed to “prove” the model is “right”, an approach that makes learning difficult and ultimately erodes the utility of the model and the credibility of the modeler. [. . . ] Model testing should be designed to uncover errors so you and your clients can understand the model’s limitations, improve it, and ultimately use the best available model to assist in important decisions.’ (Sterman 2000, p.845-846)

13.1 Model Testing

Model Debugging

If errors made in the model specification phase show up during the first test simulation, then model debugging becomes a necessary – and often rather frustrating – activity. Model debugging consists of tracing the errors that keep the model from simulating properly and correcting them. Model debugging is an art that can be learned and trained. The following errors are common for novices:

• faulty combinations of numeric integration method and step size (given a model and integration method, too large a step size is often chosen), which leads to inaccurate calculations and/or faulty – even totally impossible – behavior ⇒ reduce the time step and reflect on the appropriateness of the integration method;
• wrong signs within stock equations, especially in the case of net flows, which often leads to runaway behavior in the opposite direction (e.g. goal seeking → exponential increase) and to floating point overflows ⇒ get rid of net flows and check all signs in stock equations;
• loops without stocks or delay/smoothing functions with initial conditions, which leads to simultaneous equations ⇒ (temporarily) add delays with initial values to loops without stocks, and/or (temporarily) break loops;
• floating point overflows, which indicate that variables can be calculated but their values have become too big, either due to positive loops or to divisions by very small numbers ⇒ track down the large/small values and correct the problems, or – if that does not work – (temporarily) add floors or ceilings (ZIDZ, XIDZ, (soft)MAX, (soft)MIN);
• very discrete implementations, which may lead to strange behaviors ⇒ replace discrete elements such as IF THEN ELSE functions with more continuous alternatives;
STEP: Building & Testing SD Models © 2013 by Erik Pruyt

• erroneous structures and equations, such as outflows that are not connected to the stock they should empty, or the use of special functions in stock equations ⇒ check all equations and structures.

While debugging, keep track of all temporary changes/hacks such as floors and ceilings, figure out what is going on, fix the real problem, and then try to relax all temporary hacks – one at a time.

Model Verification

Model verification activities correspond to a large extent to model debugging activities. The difference is that in the case of model debugging one knows there are bugs, whereas in the case of model verification one is looking for errors without knowing whether there are any. Model verification consists mostly of testing the appropriateness of the combination of numeric integration method and step size, checking all equations and inputs for errors, testing submodels and structures, and testing dimensional consistency (i.e. testing unit consistency, e.g. to see whether variables are missing from equations).

A useful tool for model verification is Argonne National Laboratory’s SDM-doc tool. This tool is actually a documentation tool; use it for that purpose too: SD modeling and simulation needs to be understandable, transparent, and fully replicable (see (Rahmandad and Sterman 2012)). It is also good practice – apart from using this tool – to keep track of all choices and decisions made during the modeling process in a lab report or notebook.

Model Validation

Validation in many research areas corresponds largely to testing whether a model reproduces past real data. In SD, reproduction of historical patterns is just one of many tests that may be needed, depending on the model and the modeling purpose. Comparing model behavior with past data is almost never a goal in itself, especially not for SD modeling about the future. . .
Due to the types of systems and problems studied and the SD-specific uses of models, SD validation goes beyond this more traditional concept of validation. Two common uses of SD modeling are (i) to explore plausible futures and (ii) to study the implications of different policies. Comparing model output with data from the past does not guarantee a good fit with future developments. Traditional validation in the sense of an ‘objective demonstration of the truth of a model’ is in these cases, and many other cases, impossible. Another common use of SD modeling is to learn about a system and the link between system structure and behavior. For that use, most can be learned from reflective modeling, i.e. unprotected experimentation with models to uncover and learn, not to force the model to replicate past system behavior. For this purpose, a model that does not produce a good fit with past data may be more useful than a model that does.

SD validation is actually all about building confidence in the usefulness of models for the purpose at hand (Sterman 2000). Valid models and valid modeling are therefore models and modeling that are believed to be useful for their intended purpose. Model validation tests are applied in each iteration of the modeling process, from rough to refined: in the first iterations rough tests are good enough; later, more refined tests are required. There is a large set of tests that could help in establishing confidence in the usefulness of a model or modeling process for its intended purpose.
As mentioned before, they can be categorized into (i) ‘direct structure tests’, in which the structure is tested without simulating the behavior; (ii) ‘structure-oriented behavior tests’, which test the structure indirectly by running the model and comparing its behavior to real/anticipated behavior in order to find, again, errors in the model structure; and (iii) ‘behavior reproduction tests’, which statistically compare model output with past behavior of the real system.

With respect to investigating the structure of the model (structural validation), a distinction can thus be made between direct structure tests and structure-oriented behavior tests: in the first type the model is not simulated, whereas in the second type it is. Direct structure tests are used to check whether the relations and assumptions in the model are based on accepted theories
and whether all important variables are included in the model. It is also important to check whether the equations and the model hold under extreme conditions. Structure-oriented behavior tests are used to test whether the modes of behavior, the frequencies and mechanisms causing the behavior, and other characteristics correspond to what one would expect. Unexpected results and responses to extreme conditions are then explored in detail. Sensitivity analysis is a very important structure-oriented behavior test for identifying parts of the model to which the behavior is particularly sensitive or insensitive (i.e. parameters, functions, structures, . . . that have a major or minor influence on the behavior when slightly changed). These model sensitivities should be compared to real-system sensitivities if information about the latter is available.

Replicative validation, i.e. investigating whether the values of the variables calculated by the model correspond to known or historical data, should be performed, but only in addition to structural validation. In the absence of real data, model results can sometimes be compared to the results of other models developed in the same (or a similar) area. This might, for example, include other models in which a sub-problem is analysed in detail and whose calculated variables can be compared with a number of the variables in the model under study. It should furthermore be verified whether the model focuses on the problems and questions deemed important by clients and stakeholders, whether the model itself is comprehensible, and whether the use of the model stimulates understanding of the behavior of the system. After the model has been used, one could investigate whether the model was useful and whether the real-world situation improved as in the model. This information can be used for ex-post evaluation, which may further increase confidence in the model.
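Extreme-conditions tests of this kind are easy to automate. The sketch below is a minimal, hypothetical example – the inventory model, its parameter values, and the non-negativity check are illustrative assumptions, not material from this chapter. It simulates a single stock under zero and absurdly high demand and flags implausible states:

```python
# Extreme-conditions behavior test on a toy stock-flow model (hypothetical example):
# an inventory stock filled by production and drained by shipments.
def simulate(demand, desired_inv=100.0, adj_time=4.0, ship_time=2.0,
             dt=0.25, horizon=100.0):
    """Euler integration of dInv/dt = production - shipments."""
    inv, trace = desired_inv, []
    for _ in range(int(horizon / dt)):
        production = max(0.0, (desired_inv - inv) / adj_time)  # production cannot be negative
        shipments = min(demand, inv / ship_time)               # cannot ship more than is stocked
        inv += dt * (production - shipments)
        trace.append(inv)
    return trace

# Extreme conditions: zero demand and absurdly high demand.
for extreme_demand in (0.0, 1e9):
    trace = simulate(extreme_demand)
    # The stock must stay non-negative whatever the demand.
    assert all(x >= 0.0 for x in trace), "implausible negative inventory"
```

With zero demand the inventory simply stays at its desired level; with extreme demand it settles at the equilibrium where production balances shipments, without ever going negative – exactly the kind of plausibility that such a test is meant to confirm.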
Validation tests, partly based on (Sterman 2000, chapter 21):

• Direct structure tests
  – Direct boundary adequacy test: test whether the boundaries are adequate (e.g. large enough to endogenously capture carbon leakage in a climate change policy model);
  – Direct structure assessment test: test whether the structure conforms to the real system and the laws of nature (e.g. irreversible decisions are irreversible);
  – Theoretical/empirical structure/parameter confirmation test: test whether structures and parameters have real-world counterparts and are consistent with knowledge about the system;
  – Direct extreme conditions test: test without simulation (i) whether structures and equations make sense under assumed extreme conditions, or (ii) what the limits are for the model to be plausible/useful;
  – Face validation: test whether domain experts find the model structure and equations appropriate for the intended purpose.
• Structure-oriented behavior tests
  – Extreme conditions behavior test: test through simulation (i) whether the model responds plausibly under extreme conditions, or (ii) what the limits are for the model to be plausible/useful;
  – Sensitivity analysis: manual and/or automated testing of whether relatively small changes in parameters, initial values, (graph) functions, alternative model structures, and boundary choices lead – in decreasing order of importance – to a modified order of preference of policies (policy sensitivity), to changes in modes of behavior (behavior mode sensitivity), or to mere ‘numerical sensitivity’;
  – Qualitative features analysis: test under specific test conditions whether the model generates particular qualitative features (modes, frequencies, trends, phasing, amplitude, phase relations between pairs of variables, et cetera);
  – Behavior anomaly test: test whether changing or deleting assumptions leads to anomalous behaviors;
  – Family member test: test whether the model can generate all plausible types of behavior that are observed in other instances of the same system or that could be imagined;
  – Surprise behavior test: test whether the model generates surprising behavior and, if so, whether these behaviors are plausible.
• Behavior reproduction tests: test statistically whether the model generates the behavior of interest.

Validity in the sense of usefulness can only be assessed in view of the intended purpose, and hence from the points of view of both the client/audience and the modelers/analysts. In fact, clients and audience need to be involved in some of the validation testing, or need to be informed about – or ask questions in order to test – the appropriateness and effects of choices made with regard to boundaries, time horizons, levels of aggregation, inclusion/exclusion of theories, (explicit and implicit) assumptions, and types of data used. The effects of these choices should be clarified.

13.2 Sensitivity & Uncertainty Analysis, and Scenario Discovery

Sensitivity Analysis

‘Sensitivity Analysis is the computation of the effect of changes in input values or assumptions (including boundaries and model functional form) on the outputs’ (Morgan and Henrion 1990, p39). Defined like this, Sensitivity Analysis (SA) refers to the analysis of the effect of relatively small changes to the values of parameters and functions on the behavior (behavioral sensitivity) or on the preference for a particular policy (policy sensitivity), starting from a base case¹. In SD, the term SA is often used² and mostly refers to a mixture of Sensitivity Analysis and Uncertainty Analysis (see below) as defined by Morgan and Henrion (1990). In this e-book I follow their distinction.
SA defined in this strict sense could be used, following Pannell (1997) and Tank-Nielsen (1980), for:

• searching for errors in models;
• increasing the understanding of relationships between inputs and outputs and, in the case of SD, generating insights about the link between structure and behavior;
• identifying candidates for uncertainty reduction efforts, and hence directing further work on parameters and structure;
• identifying inputs for which the output is insensitive, because dynamic limits may have been reached or non-linear thresholds crossed;
• identifying highly sensitive policy levers;
• testing the local robustness of results in the proximity of a base case scenario.

¹ Three types of sensitivity to (small) changes in a model are commonly distinguished: numerical sensitivity (just a small numeric change), behavior mode sensitivity (a change in the behavior pattern), and policy sensitivity (a change in the preference order of policies). Only the last two types really matter.
² See for example (Graham 1976; Tank-Nielsen 1976; Sharp 1976; Tank-Nielsen 1980; Richardson and Pugh 1981; Ford 1983; Clemson 1995; Sterman 2000; Moxnes 2005).
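A univariate sweep is the simplest form of such an analysis. The sketch below is a hypothetical illustration (the first-order goal-seeking model, the gain parameter g, and the classification thresholds are assumptions made for this example): sweeping the gain shows behavior mode sensitivity, including the goal-seeking → runaway flip that a wrong sign produces.

```python
# Univariate sensitivity sweep over the gain g of a hypothetical first-order
# goal-seeking model, dx/dt = g * (goal - x), classifying the behavior mode.
def run(g, goal=100.0, x0=10.0, dt=0.125, horizon=100.0):
    x = x0
    for _ in range(int(horizon / dt)):
        x += dt * g * (goal - x)  # Euler integration
    return x

def behavior_mode(g):
    x_end = run(g)
    if abs(x_end - 100.0) < 1.0:
        return "goal seeking"   # converged to the goal
    if abs(x_end) > 1e6:
        return "runaway"        # exploded away from the goal
    return "other"

for g in (-0.5, -0.1, 0.1, 0.5):
    print(g, "->", behavior_mode(g))
```

Positive gains produce goal-seeking behavior; negative gains (the wrong-sign case from the debugging list) flip the loop polarity and produce runaway growth – a behavior-mode sensitivity, which matters, rather than a mere numerical one.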
Both univariate and multivariate SA are performed in SD, both manually at many moments throughout the SD modeling process (Tank-Nielsen 1980) and in automated mode by means of Monte Carlo sampling (Fiddaman 2002), Latin Hypercube Sampling (Ford 1990), and Taguchi methods (Clemson 1995). However, rigorous theories and procedures for performing SA do not exist (Meadows 1980, p36), nor do tools for comprehensive sensitivity analysis. SA is therefore performed by hand as well as with sampling techniques, first one input at a time, followed by combinations of parameters, initial values, etc. The sensitivity of the key performance indicators of a model should, however, be tested not only for changes in parameter values, initial values, and table (graph/lookup) functions, but also for modifications of functions, model structures, and boundaries.

Sensitivity can be desirable as well as undesirable. High sensitivity is often undesirable if it cannot be controlled and could negatively influence key performance indicators. High sensitivity is, on the other hand, desirable if it can be controlled and opens up more desirable dynamics. SA is therefore essential, both in model testing and in policy analysis. In model testing, one would like to know which small changes to the model lead to large changes in behavior, whereas in policy analysis, one would like to know where the largest policy leverage can be found.

Uncertainty Analysis

‘Uncertainty Analysis is the computation of the total uncertainty induced in the output by quantified uncertainty in the inputs and models, and the attributes of the relative importance of the input uncertainties in terms of their contributions’ (Morgan and Henrion 1990, p39). Uncertainty Analysis (UA) thus refers to the exploration of the influence of the full range of uncertainty deemed plausible.
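In its simplest sampling form, UA propagates the full plausible ranges of the inputs through the model and inspects the induced spread in an output of interest. The sketch below is a hypothetical illustration: the logistic growth model, the two uncertain inputs, their ranges, and the peak-time indicator are all assumptions made for this example, not material from this chapter.

```python
import random

# Uncertainty analysis sketch over a hypothetical logistic growth model:
# sample the full plausible ranges of two inputs and inspect the induced
# spread in the output of interest (the time at which the net flow peaks).
def peak_time(growth, capacity, x0=1.0, dt=0.25, horizon=300.0):
    """Time at which the net flow of dx/dt = growth*x*(1 - x/capacity) peaks."""
    x, best_t, best_flow = x0, 0.0, 0.0
    for i in range(int(horizon / dt)):
        flow = growth * x * (1 - x / capacity)
        if flow > best_flow:
            best_flow, best_t = flow, i * dt
        x += dt * flow
    return best_t

random.seed(1)
# Assumed plausible uncertainty ranges, sampled uniformly (plain Monte Carlo).
ranges = {"growth": (0.05, 0.5), "capacity": (500.0, 5000.0)}
samples = [{k: random.uniform(*r) for k, r in ranges.items()} for _ in range(200)]
peaks = [peak_time(**s) for s in samples]
print("peak-time range:", min(peaks), "to", max(peaks))
```

The wide spread between the earliest and latest peak times is the point: instead of one base-case answer, UA yields an ensemble of plausible outcomes over the whole uncertainty space. Latin Hypercube Sampling would cover the same space more evenly with fewer runs.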
UA could be used for systematically:

• evaluating plausible effects of uncertainties in parameters, lookups, functions, structures, submodels, models, boundaries, methods, and possibly controversial/disputable perspectives;
• generating many plausible scenarios / behavior patterns (see for example (Kwakkel and Pruyt 2013; Pruyt and Kwakkel 2012a));
• exploring and analyzing ensembles of scenarios/runs and uncertainty spaces using advanced machine learning algorithms (Kwakkel et al. 2013; Pruyt et al. 2013), analytical tools, and statistical screening (Ford and Flynn 2005; Taylor et al. 2010);
• evaluating the appropriateness of models under uncertainty – similar to testing models under extreme conditions, where it is often easier to review whether the model is plausible or not (Tank-Nielsen 1980, p189) – and triangulating/comparing sets of alternative models³ (Pruyt and Kwakkel 2012b);
• directly searching the uncertainty space for limits, tipping points, best fits, implausible results, or high-leverage points using optimization techniques;
• searching the uncertainty space for particular behaviors and densely concentrated regions thereof, identifying joint root causes of behaviors with particular characteristics (desirable or undesirable dynamics, desirable or undesirable end-states, or undesirable side-effects) with dynamic scenario discovery and selection techniques (see (Kwakkel and Pruyt 2013; Kwakkel et al. 2013));
• supporting the design of adaptive robust policies (Hamarat et al. 2013);

³ Loosely following Lane’s definition of an SD model, i.e. ‘the assembly of causal hypotheses about relationships between variables [. . . ]’ (Lane 1998, p938), models are considered different here if there are differences in system boundaries, model structures, functions, internal and external parameter values and other input data, or model implementation (e.g. different simulation methods).
A model is thus not the same as a model file: one model file that contains switches between alternative structures contains many different models.
• testing and comparing the absolute and relative robustness of conclusions and policies (Sterman 2000; Lempert et al. 2003; Kwakkel and Pruyt 2013);
• identifying candidate parameters and structures for model simplification in highly non-linear models, i.e. model inputs and structures that do not significantly affect the output.

A major difference between SA and UA is that SA necessarily starts from a base run/scenario, which is not the case for UA. In addition, SA is a means to explore the sensitivity of a model to small perturbations, whereas UA is a means to virtually explore the plausible real-world effects of assumptions over their plausible uncertainty sets/ranges. Technically speaking, there is a large overlap between SA and UA, and both are extremely useful for SD. Using them in tandem – SA followed by UA – is most insightful for model-based studies. Advanced UA techniques beyond sampling techniques will be dealt with in the follow-up book on ESDMA.

Scenario Generation and Discovery

Scenarios are often useful to communicate a few different behavior patterns without having to communicate all plausible behaviors. Scenario discovery is the identification of potentially interesting scenarios. Scenarios can be generated by setting inputs such that different behaviors are generated. Another approach is to use UA to generate a large ensemble of runs and then use time series classification and/or machine learning techniques to identify different scenarios of interest (Kwakkel and Pruyt 2013; Kwakkel et al. 2013). In this e-book, we stick to the former, intuitive approach: simply simulating different consistent sets of inputs. The latter approach will be dealt with in the follow-up book on ESDMA.

Additional (non-mandatory) chapters: VI. Model Testing
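The ensemble-classification route to scenario discovery can be sketched in a few lines. Everything below is a hypothetical illustration: the toy growth-then-decline model, the uncertainty ranges, and the crude feature-based classification rules are assumptions made for this example, standing in for the time-series classification and machine-learning techniques cited above.

```python
import random
from collections import Counter

# Scenario-discovery sketch: generate an ensemble of runs under uncertainty,
# then bin the runs into qualitatively different scenarios via simple
# time-series features (a stand-in for real classification techniques).
def run(growth, decline_after):
    """Toy model: exponential growth that flips to decline above a threshold."""
    x, series, dt = 1.0, [], 0.25
    for _ in range(400):
        rate = growth if x < decline_after else -growth
        x += dt * rate * x
        series.append(x)
    return series

def classify(series):
    if series[-1] >= series[0] * 10:
        return "sustained growth"
    if max(series) > series[-1] * 2:
        return "overshoot and decline"
    return "stagnation"

random.seed(7)
ensemble = [run(random.uniform(0.01, 0.3), random.uniform(5.0, 1e4))
            for _ in range(500)]
print(Counter(classify(s) for s in ensemble))
```

Each resulting class is a candidate scenario: one representative run per class can then be communicated instead of all 500 trajectories.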
Flexible E-Book for Blended Learning with Online Materials

Although this e-book is first and foremost an electronic case book, it is much more than just a set of case descriptions: it is the backbone of an online blended-learning approach. It consists of 6 concise theory chapters, short theory videos, 6 chapters with about 90 modeling exercises and cases, many demo and feedback videos, feedback sheets for each case, 5 overall chapters with feedback, 5 chapters with multiple choice questions (with graphs or figures), hundreds of online multiple choice questions, links to on-site lectures, past exams, models, online simulators, 126 slots for new exercises and cases, and additional materials for lecturers (slides, exams, new cases). The fully hyperlinked e-version allows students (or anybody else for that matter) to learn – in a relatively short time – how to build SD models of dynamically complex issues, simulate and analyze them, and use them to design adaptive policies and test their robustness.

ISBN paperback version:
ISBN e-book version: