IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART A: SYSTEMS AND HUMANS, VOL. 40, NO. 5, SEPTEMBER 2010             ...
A Benchmark Diagnostic Model Generation System

  1. 1. IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART A: SYSTEMS AND HUMANS, VOL. 40, NO. 5, SEPTEMBER 2010

A Benchmark Diagnostic Model Generation System

Jun Wang and Gregory Provan

Abstract—It is critical to use automated generators for synthetic models and data given the sparsity of benchmark models for empirical analysis and the cost of generating models by hand. We describe an automated generator for benchmark models that is based on a compositional modeling framework and employs graphical models for the system topology. We propose a three-step process for synthetic model generation: 1) domain analysis; 2) topology generation; and 3) system-level behavioral model generation. To demonstrate our approach on two highly different domains, we generate models using this process for circuits drawn from the International Symposium on Circuits and Systems benchmark suite and a process-control system. We then analyze the synthetic models according to two criteria: 1) topological fidelity and 2) diagnostic efficiency. Based on this comparison, we identify parameters necessary for the autogenerated models to generate benchmark diagnosis circuit and process-control models with realistic properties.

Index Terms—Benchmark model generation, compositional modeling, diagnosis.

I. INTRODUCTION

BENCHMARK model suites are vital to facilitating progress in a variety of domains, and good benchmarks have had a major impact on several areas. For example, the benchmarks for SATISFIABILITY (SAT), e.g., SATLIB¹ and DIMACS,² have spurred progress in that area; further, they have enabled SAT algorithms to be applied to a variety of other domains, such as planning [1]. Benchmark model suites are becoming increasingly important for validating a variety of algorithms in other domains, including very large scale integration (VLSI) design [2], [3], process control [4]–[6], and bioinformatics [7].

Diagnosis, in contrast to areas such as SAT and constraint satisfaction (CSP), has very few benchmarks. To our knowledge, there are only two publicly available benchmarks for diagnostics: 1) the International Symposium on Circuits and Systems (ISCAS) benchmark models for discrete-valued models [8] and 2) the DAMADICS benchmark for continuous-valued models [9]. Given the sparsity of benchmark models and the cost of generating models by hand, it is critical to design an automated generator for synthetic models and data.

To satisfy the need for benchmark models, we describe a domain-independent Complex System Model Generator (CoSyMGen), which is based on a compositional modeling framework and employs graphical models for the system topology. Compositional modeling [10] is the predominant knowledge-based approach to automated model construction. It assumes that a system can be decomposed into a collection of components, each of which can be defined using a behavioral model. These component models are then integrated into the full system model using a system topology graph, which describes the component interactions.

Standard compositional modeling tools, e.g., [11]–[14], require manual construction of the component models and then use the component library to speed up the tedious manual process of system-level model development. This manual process is necessary to capture target systems but is costly for compiling a suite of similar benchmark models for tasks like algorithm analysis. Our use of automated topology generators overcomes the drawback of hand-generated topologies typical of compositional modeling [10].

We base our automated topology generation on the recent discovery that the topology of virtually all real-world systems, from domains as diverse as the World Wide Web (WWW), social networks, biological systems, and technological systems [15], [16], can be modeled using a graph framework [17]. A range of graph models has been proposed, e.g., [17]–[19], which are significant improvements over the classic random graph models traditionally used for the empirical analysis of algorithms in that they capture the topological properties of realistic systems much better than do classic random graphs [15]. Although it is known that different domains have different properties, e.g., [15] and [20], there has been little work on characterizing domains based on the underlying properties. Further, until now, most analyses of such models have been confined to the models' global statistical properties (e.g., degree distribution, average shortest connecting paths, and clustering coefficients) or the statistics of specific local connectivity patterns (motifs) [15]. In contrast, little research has focused on the functionality and the corresponding complexity of generated graphs in practical applications.

Further, existing models have inherently been inaccurate due to discrepancies between the graphical parameters of the real systems and those of the autogenerated graphs. For example, the well-known Watts–Strogatz [19] model requires an integral mean degree, whereas the mean degree of many systems is nonintegral [21].

We address the validity of models generated not only in terms of their topological properties but also in terms of their behavioral properties. The behavioral property that we examine in this paper is diagnostics, specifically the inference efficiency of model-based diagnosis (MBD) [22]–[24]. The MBD problem

Manuscript received May 2, 2008; revised March 3, 2009. Date of publication July 1, 2010; date of current version August 18, 2010. This work was supported by the Science Foundation of Ireland under Grant 04/IN3/I524. This paper was recommended by Guest Editor G. Biswas.
J. Wang is with Fujitsu Laboratories of America, Inc., Sunnyvale, CA 94085 USA (e-mail: jwang@fla.fujitsu.com).
G. Provan is with the Department of Computer Science, University College Cork, Cork, Ireland (e-mail: gprovan@cs.ucc.ie).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TSMCA.2010.2052039
¹ http://www.cs.ubc.ca/hoos/SATLIB/benchm.html.
² ftp://dimacs.rutgers.edu/pub/challenge/satisfiability/benchmarks/.
1083-4427/$26.00 © 2010 IEEE
  2. 2. focuses on isolating the root faults given an observation (e.g., of sensor values). More formally, MBD determines whether an assignment of failure status to a set of mode variables is consistent with a system description and an observation (e.g., of sensor values). This problem is addressed in various engineering fields, and the underlying structure of the problem that affects the diagnosis complexity is governed by the graph framework [21].

In this paper, we assume that we have a library of behavioral component models for the domain in question, so the main focus of benchmark generation is on creating ensembles of random but "realistic" topologies. A range of methods exist to generate system topologies, each of which has a set of specific input parameters that must be optimized to create a model that accurately depicts a domain-specific topology. As we will show, the different generation methodologies produce quite different models, with different topological properties, such as degree distribution. Since each application domain requires different topological properties, the key to generating good benchmark models is to match the generation methodology to the domain requirements.

Our contributions are as follows:
1) We propose a domain-independent synthetic model generation system, i.e., CoSyMGen, that can create models whose parameters can be optimized to conform to a range of different criteria.
2) We describe the following main phases of model generation:
   a) domain analysis, which extracts topology and component statistics and selects fidelity metrics Φ;
   b) topology generation, which automatically, rather than manually, generates a high-fidelity system topology G by optimizing the corresponding parameters Π in terms of Φ;
   c) behavioral model generation, which uses the domain library, the component statistics, and the system topology (G) to generate the behavioral equations.
3) We illustrate the domain-analysis and model-generation procedure on two quite different domains: 1) MBD of discrete Boolean circuits (where we compare the topological fidelity of the generated models to that of real circuit models) and 2) process control diagnostics for a pulp mill.

We organize the remainder of this paper as follows. Section II describes the assumptions underlying our model-generation approach. Section III illustrates the system architecture of CoSyMGen. Sections IV, V-A, and V-B describe the domain analysis, the system topology generation, and the system behavioral model generation in the proposed benchmark generation process, respectively. Section VI presents the experimental results of diagnosis benchmark generation on the two different application domains. Section VII compares our contributions to previous results in the literature. Finally, Section VIII summarizes our contributions.

II. KEY ASSUMPTIONS

This section discusses two key assumptions underlying this approach: 1) model compositionality and 2) the ability to generate realistic system topologies. We first introduce the necessary notation to discuss these topics.

A. Notation

The models used in standard MBD frameworks, such as the qualitative (logical) [22] or fault diagnosis and isolation (FDI) [23], [24] approaches, typically define a system model in terms of a set B of behavioral equations, defined over sets of failure modes and observable/controllable variables. The underlying assumption is that such systems are not explicitly viewed as being decomposable, and the system model is treated monolithically. In contrast to these approaches, we are interested in system models that are explicitly decomposable, so we must capture not only the behavioral equations but also a framework that captures the system's compositional properties.

A system is compositional if its behavior consists of the combination of the behaviors of its constituent components. Due to this assumption, modeling/analysis of compositional models can be more efficient than noncompositional modeling/analysis and scales better. A precondition for compositional modeling is the existence of an underlying structure for the model. This structure describes how the different components of the system constrain each other.

We can thus describe a decomposable model Ψ using two orthogonal aspects: 1) behavior and 2) topology (interaction). The behavior model describes the (possibly dynamic) behaviors of the system and components, whereas the topology model describes the component connectivity in terms of components and their connections and defines the constraints on component behaviors that enable their interactions to specify the system-level interaction [25].

Definition 1: A composable system Ψ is defined using the pair (G, B), where B is the behavior model, and G is the topology model.

We assume that a system Ψ can be decomposed into subsystems. There are two types of subsystem: 1) a component, which is a primitive subsystem, and 2) a composite subsystem, which can further be decomposed. A component represents a primitive behavioral specification of Ψ, i.e., no further decomposition of behavior is possible that allows each subfunction to coherently describe a process. We assume a set C = {C_1, ..., C_m} of components and that the input/output tuple for each C_j can be specified. By merging components and/or subsystems, we can define a hierarchical model; we define a flat model to consist of a system represented only in terms of components C_i ∈ C and their interconnections.

In this paper, we focus on generating two classes of diagnostic benchmark model: 1) a propositional logic MBD model and 2) a Bayesian network (BN) model, as subsequently described.³

1) MBD Model: This section describes the general MBD framework, which we will use to illustrate our model construction process throughout this paper. We can characterize an MBD problem using the triple Φ = ⟨COMPS, SD, OBS⟩ [22], where we have the following:
1) COMPS = {C_1, ..., C_m} describes the operating modes of the set of m components into which the system is decomposed.

³ Note that we start from a propositional logic model as well as a differential equation model, and our approach is not limited to generating discrete models; the approach is applicable to any class of model that satisfies the assumptions described in this paper.
  3. 3. 2) SD, or system description, describes the behavior of the system. This model encodes the system's topology within the equations in SD.
3) OBS, the set of observations, denotes possible sensor/actuator measurements, which may be control inputs, outputs, or intermediate variable values.

We adopt a propositional logic framework for our MBD system behavior model SD. Component i has the associated mode variable C_i; C_i can be functioning normally ([C_i = OK]) or can take on a finite set of abnormal behaviors.

MBD inference, using weak fault models [26], initially assumes that all components are functioning normally: [C_i = OK], i = 1, ..., m. Diagnosis is necessary when SD ∪ OBS ∪ {[C_i = OK] | C_i ∈ COMPS} is proved to be inconsistent. Hypothesizing that component i is faulty means switching from [C_i = OK] to [C_i ≠ OK]. Given some minimality criterion ω, a (minimal) diagnosis is a (ω-minimal) subset C′ ⊆ COMPS such that SD ∪ OBS ∪ {[C_i = OK] | C_i ∈ COMPS \ C′} ∪ {[C_i ≠ OK] | C_i ∈ C′} is consistent.

In this paper, we adopt a multivalued propositional logic using standard connectives (¬, ∨, ∧, ⇒). We denote variable A taking on value α using [A = α]. An example equation for a buffer X is [In = t] ∧ [X = OK] ⇒ [Out = t].

2) BN Diagnosis Model: We can frame a diagnosis problem as a BN, as done in [27]. Using our (G, B) framework, in a BN, the behavior model B consists of a set of factorized probability distributions, and the topology model G is a graph.

Definition 2 (BN): A BN is a tuple (G, B), where G is a directed acyclic graph, and B is a set of factorized probability distributions constructed from vertices V in G based on the topological structure of G. B satisfies Pr(V) = ∏_{i=1}^{n} Pr(v_i | π(v_i)), where π(v_i) denotes the parents of v_i in G.

If we model a diagnosis system as in [27], then given an observation OBS, a diagnosis consists of the posterior distribution of the failure modes, i.e., Pr(COMPS | OBS). We typically select those components with the highest probability as the most likely diagnoses. We can also compute multiple-fault diagnoses within this framework [28].

B. Compositionality Assumption

A domain D is compositional if a system model from D can be composed from model components, each of which is defined by a component behavioral model. CoSyMGen is applicable to any compositional domain for which structural models and component libraries exist. There is a large body of evidence that virtually all mechanisms, whether natural or man made, exhibit properties of modularity, compositionality, and hierarchical design and hence are amenable to component-based modeling approaches.

Biological systems, which have evolved according to natural evolutionary principles, clearly exhibit modularity and compositionality [29]–[31]. The issue in the analysis of biosystems, given the assumption that modularity is ubiquitous, is how to integrate the biological modules; integration theories include evolutionary frameworks, such as that governing the integration of skeletal modules [32]. Further, several tools that build biosystems from components have been implemented, such as [7], [33], and [34]. In addition, component-based tools for systems as complex as ecosystems are becoming standard. As an example, the Ecological Component Library for Parallel Spatial Simulation tool has been developed to build ecosystem models, at multiple spatial scales, from a library of reusable interchangeable components [35].

Technological (or man-made) systems exhibit modularity, regularity, and hierarchy to an even greater degree than biosystems; in fact, modularity, regularity, and hierarchy are fundamental principles of the theory and practice of engineering design [36]. Further, engineering design has adopted the principles of modularity, regularity, and hierarchy as a key to cost-effective and reliable design, both in theory and in practice [36]. As the complexity of technological systems increases, module reuse increases based on 1) symmetrical and regular structures and 2) developing standards for components and dimensions, since this regularity and component-based methodology translates into reduced design, fabrication, and operation costs.

Component-based approaches have been used to model a wide range of technological systems, with model representations ranging from discrete logic to the complexity of hybrid systems [37], [38]. Component-based hybrid system models have been developed for systems ranging from mechatronic systems [39] to electrical power systems [40].

Since the focus of this paper is not on the theory of system compositionality or on the domain-dependent methods for creating component-based system models, we will assume that all of the domains to which the synthetic generator will be applied are compositional domains, and refer the reader to the literature for precise expositions of system compositionality, for example, [10], [38], [41], and [42]. Note that, although we restrict our analysis to systems where components can be connected by unidirectional arcs, component composition methods exist for handling bidirectional component connections, e.g., [43].

In this paper, we focus on two forms of behavior equation:
1) Logic: The compositionality of propositional logic models has been described in [44].
2) Probability equations: If we model a probability distribution in terms of a graphical model, then the compositionality of probability distributions is well defined [45].

We assume that a model can be generated from the tuple (G, B), where G denotes the topology graph, and B denotes the system behavioral specification. The topology graph G = (V, E) consists of vertices V and edges E and specifies the topological relations among the system components. For any system Ψ that can be decomposed into a set of components, we define the graph G(V, E) for Ψ such that we have the following:
  • the vertices V of G correspond to the components, inputs, or outputs of Ψ;
  • the edges E of G are such that (v_i, v_j) ∈ E if (v_i, v_j) corresponds to a component pair (C_i, C_j) of Ψ and C_i and C_j are coupled.

If the coupling relations indicate directionality, then we assume that all edges of G are directed. Our component library specifies a behavioral description B_i for each component v_i in the system being modeled.

1) Compositionality Requirements: When we assume compositionality, we assume that the components can be represented in terms of a block, which consists of the tuple (I, O, X, B), where a block has two types of ports (i.e., I is a set of input ports, and O is a set of output ports), X defines the internal variables, and B is the block's behavioral
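Definition 2 and the posterior Pr(COMPS | OBS) can be illustrated on the single buffer X from the MBD example on this page. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: the prior fault probability (0.01), the uniform prior on the input, and the choice that a faulty buffer outputs either value with equal probability are all hypothetical numbers, and the function names are invented for illustration. The DAG has edges In → Out and X → Out, so the joint factorizes as Pr(In) Pr(X) Pr(Out | In, X).

```python
# Hypothetical numbers, chosen only for illustration.
PRIOR_MODE = {"OK": 0.99, "bad": 0.01}   # Pr(X)
PRIOR_IN = {True: 0.5, False: 0.5}       # Pr(In)


def pr_out(out, inp, mode):
    """Pr(Out | In, X): an OK buffer copies its input; a bad one is random."""
    if mode == "OK":
        return 1.0 if out == inp else 0.0
    return 0.5


def joint(inp, mode, out):
    """Factorized joint Pr(In) * Pr(X) * Pr(Out | In, X), as in Definition 2."""
    return PRIOR_IN[inp] * PRIOR_MODE[mode] * pr_out(out, inp, mode)


def posterior_mode(obs_in, obs_out):
    """Pr(X | In = obs_in, Out = obs_out), computed by enumerating the modes."""
    weights = {mode: joint(obs_in, mode, obs_out) for mode in PRIOR_MODE}
    total = sum(weights.values())
    return {mode: weight / total for mode, weight in weights.items()}
```

Under these assumed numbers, observing an output that differs from the input leaves "bad" as the only mode with nonzero posterior, whereas a matching output keeps nearly all of the probability mass on "OK".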
  4. 4. description (behavioral equations) defined over (I, O, X). The process of composing a system from components consists of selecting appropriate blocks and connecting the outputs of particular blocks to the inputs of other blocks. The output–input connection is possible if the types of the corresponding ports match [12]. Formal studies of causal block diagrams can be found in [12], [41], and [46]–[48].

Bond graphs are one modeling framework that explicitly captures a block composition framework for a broad range of continuous-time physical systems. Cellier [49] describes how to interconnect a set of basic bond graph elements within the object-oriented modeling language Dymola.

In this paper, we focus on discrete-valued systems. As an example of composing such systems, consider the case where we have two simple blocks B_i and B_j, each with one input and one output. We further assume that the equations for blocks B_i and B_j are O_i = φ(I_i, X_i) and O_j = φ(I_j, X_j), respectively. Assume that we connect the output of block B_i, which is denoted by O_i, to the input of block B_j, which is denoted by I_j. By our assumption of compositionality, we can compose the equations of B_i and B_j by equating O_i and I_j or by renaming the input I_j to the name of the output O_i.

Consider an example where B_i and B_j are both inverters, with modes M_i and M_j, respectively, where each mode has values {OK, bad}, and with the equations given by

    B_i : (M_i = OK) ∧ (I_i ⇒ ¬O_i)
    B_j : (M_j = OK) ∧ (I_j ⇒ ¬O_j).

If we rename I_j to be O_i, then we obtain the composed equations

    (M_i = OK) ∧ (I_i ⇒ ¬O_i)
    (M_j = OK) ∧ (O_i ⇒ ¬O_j).

If the system equations are defined by differential equations, then an analogous example on block compositionality can be provided, e.g., [12] and [46].

In our model generation, we assume that we can represent the block diagram, which consists of a set of blocks connected by directed arcs, in terms of a directed graph G.

2) Topology Graph: The topology graph G defines the connectivity over the system components. For example, electronic circuits can be viewed as graphs in which nodes are electronic components (such as logic gates in digital circuits) and edges are wires in a broad sense [16]. In gene transcriptional regulatory networks (TRNs), nodes represent genes and edges correspond to regulatory interactions at the transcriptional level between the genes [7], [50].

G can be either directed or undirected, depending on the semantics of the component coupling relations. It is important to note, however, that most compositional modeling frameworks assume directionality, either explicitly [41] or implicitly (i.e., deriving the directionality) [12], [46].

C. Topology Generation Assumption

A second key assumption that we make is that the topology of real-world systems can be captured using a random graph framework. In the past several years, theoretical studies and extensive data analyses have shown that a variety of complex systems, including biological [15], [17], social [15], [17], and technological [16], [17] systems, share a common underlying structure, which is characterized by a class of random graph models. In this structure, the nodes form several loosely connected clusters, every node can be reached from every other node by a small number of hops or steps, and the degree distribution P_k, which is the probability of finding a node with k links, displays a heavy tail [17]. Several random graph models have been proposed to capture the real-world graph properties, such as the Watts–Strogatz [or small-world graph (SWG)] [19] and the Barabasi–Albert [or preferential attachment (PA)] models [18].

Throughout this paper, we use this topological property to automatically generate the structure of our diagnosis models.

III. SYSTEM ARCHITECTURE

The use of random-graph generators is appealing because they can generate an infinite number of models, in each of which the model parameters (e.g., number and type of components) can be specified. However, only a small subset of random-graph models can be considered reasonable with respect to technological constraints and topological properties [51]. The key question for any work on benchmark generation is "How good are the synthetic models that are produced?" For synthetic benchmark models to be useful, they must be shown to be realistic proxies for real models. Thus, it is important to have both a strong experimental platform and objective measures of benchmark model quality with which to evaluate the output of the automated generation process. Naturally, we can demonstrate realism by comparing real benchmark models to "clones" generated synthetically from the characterization of the existing real models. Real and clone models can be compared on the basis of important structural and behavioral properties, and this realism assessment approach has been widely used in the synthetic model generation of various application domains, including electronic circuits [51]–[53], the Internet [54], and biological systems [7]. We extend these domain-specific approaches and propose our domain-independent automated benchmark model generators.

Generation Algorithm: We generate diagnostic (benchmark) models in a three-step process.
1) Domain analysis: Analyze existing domain models to extract important model properties.
2) Topology generation: Generate the (topology) graph Ĝ underlying each synthetic model.
3) System-level behavioral model generation: Assign components to each node in Ĝ and create the system-level behavioral model B̂.

Fig. 1 depicts the system architecture. The figure shows how Step 1, domain analysis, characterizes two types of information: 1) domain topological properties, which are used for topology generation (Step 2), and 2) component statistics, which are used for behavioral-model generation (Step 3). Step 2 is concerned with generating a system topology graph Ĝ that captures the domain topology with high fidelity. We have implemented this step in terms of model selection, i.e., selecting the model type (from among a large set of model types that can be created by different model generators) that best matches the key topological properties of the original domain model. Finally, in Step 3,
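The two-inverter composition on this page, combined with the consistency-based definition of a diagnosis given earlier, can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' code. It assumes a weak-fault-model reading of each inverter, namely that [M = OK] implies the output is the negation of the input while a bad component is unconstrained, and it takes subset minimality as the minimality criterion ω. The names `consistent` and `minimal_diagnoses` are hypothetical.

```python
from itertools import combinations

COMPONENTS = ("Mi", "Mj")


def consistent(modes, i_i, o_j):
    """True if some value of the shared wire Oi satisfies both inverters.

    modes maps a component name to "OK" or "bad". Renaming Ij to Oi is what
    composes the two blocks: the same variable o_i appears in both constraints.
    """
    for o_i in (False, True):
        ok_i = modes["Mi"] != "OK" or o_i == (not i_i)
        ok_j = modes["Mj"] != "OK" or o_j == (not o_i)
        if ok_i and ok_j:
            return True
    return False


def minimal_diagnoses(i_i, o_j):
    """Subset-minimal sets of faulty components consistent with the observation."""
    found = []
    for size in range(len(COMPONENTS) + 1):
        for faulty in combinations(COMPONENTS, size):
            if any(set(d) <= set(faulty) for d in found):
                continue  # a smaller diagnosis already explains the observation
            modes = {c: ("bad" if c in faulty else "OK") for c in COMPONENTS}
            if consistent(modes, i_i, o_j):
                found.append(faulty)
    return found
```

With input I_i = true, a healthy chain must output true, so `minimal_diagnoses(True, True)` yields only the empty diagnosis, while `minimal_diagnoses(True, False)` yields the two single-fault diagnoses. Enumeration over all subsets is exponential in the number of components and is used here only because the example has two.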
  5. 5. using Ĝ, the component statistics, and a component library, we create the behavioral model for the synthetic system.

In the following sections, we will show the main model generation steps in detail.

Fig. 1. Automated model generation framework.

IV. DOMAIN ANALYSIS MODULE

We perform domain analysis to capture two types of data:
1) component statistical properties;
2) topological statistical properties.
We now describe each data type in turn.

A. Component Analysis

The analysis of the component properties extracts information about the distribution of components in Ψ and also potentially the connectivity patterns, or motifs, for the components. The statistical data on components help us to generate more realistic behavioral models.

For example, for a circuit, we must classify the component types based on their connectivity and obtain the relative distribution for each connectivity class. Hence, in a circuit, which has a directed graph as its underlying structure such that every node corresponds to a gate, we identify the following:
1) the set of connectivity classes, where each class is distinguished by the pair η_ij = (#-inputs, #-outputs) of every component (node v ∈ G)⁴;
2) for each η_ij, the relative proportion of gates; for example, for η_11, we compute the relative proportion of buffers and inverters.

It is possible to generate models using component clusters of size larger than the primitive components. This approach has been advocated as the proper one for biological domains, among others. For example, various researchers, e.g., [55]–[57], have argued that the underlying building blocks of biosystems, or motifs, consist of interacting groups of between two and four genes, which control transcriptional regulation [58]. Given this evidence, Shen-Orr et al. [56] have proposed motifs as the basic building blocks in biological and technological networks and further argue that such motifs possess direct analogs in technological systems. A motif has been defined as a subgraph that occurs significantly more frequently in real-world networks than expected by chance alone [59]. The observed overrepresentation of motifs has been interpreted as a manifestation of behavioral constraints and design principles that have shaped the network architecture at the local level. Some researchers argue that motifs reflect the underlying processes that generated different networks and may have specific functions as elementary computational circuits in the networks [59]. Motifs may then predict what system-level function a network performs and how it performs it. The proposed model generation approach can be used for synthetic generation based on motifs as long as the domain analysis is focused on the motif level rather than the component level.

B. Topological Analysis

We extract topological properties that are widely used to characterize complex systems [54], [60]. The obtained statistical data, based on key topological properties, can help us to select the most plausible topology generation algorithms and behavioral model.

We assume that we have a graph G(V, E) with n vertices and m edges. Some key topological properties are now summarized.

1) Global Graph Properties: Two fundamental global graph properties, which are highly domain dependent, are the characteristic path length L̄ and the clustering coefficient C̄.

Shortest paths play an important role in the transport and communication within a network. For this reason, shortest paths have also played an important role in the characterization of the internal structure of a graph. A measure of the typical separation between two nodes in the graph is given by the average shortest path length, which is also known as the characteristic path length L̄, defined as the mean of geodesic lengths over all pairs of nodes.

Graph clustering characterizes the degree of cliquishness of a typical neighborhood (a node's immediately connected neighbors). The clustering coefficient C_i for a vertex v_i is the number of links between the vertices within its neighborhood divided by the number of links that could possibly exist between them. The graph clustering coefficient C̄ is the average of the clustering coefficient C_i over all vertices v_i [15].

2) Graph Degree Properties: The degree (or connectivity) k_i of a node v_i is the number of edges incident with the node. A list of the node degrees of a graph is called the degree sequence. The most basic topological characterization of a graph G can be obtained in terms of the degree distribution P_k, which is defined as the probability that a node chosen uniformly at random has degree k or, equivalently, as the fraction of nodes in the graph having degree k.

The degree distributions of some complex systems, such as power grids, appear to have exponential tails: P_k ∝ e^{−k/κ}, as indicated by their approximately straight line forms on the semilogarithmic scales [17].

Many real-world systems, such as the WWW and gene TRNs, are heavy tailed in their degree distributions. Power laws can characterize their tails, i.e., P_k ∝ k^{−γ}, as indicated by their approximately straight line forms on the double-logarithmic

⁴ Our current automated procedure deals only with the number of inputs, but in the future, we plan to extend the implementation to cover both inputs and outputs.
scales [17]. They are also referred to as scale-free networks, although it is only their degree distributions that are scale free [17].

Other common forms for degree distributions are power laws with cutoffs [17], [20], [61], such as those seen in electronic circuits and airport networks. The degree distribution looks like a power law over the lower range of values but decays quickly for higher values. Often, this decay is exponential, and hence, this is usually called an exponential cutoff: $P_k \propto k^{-\gamma} e^{-k/\kappa}$, where $e^{-k/\kappa}$ is the exponential cutoff term, and $k^{-\gamma}$ is the power law term.

While most of the focus regarding node degrees has fallen on degree distributions, there are higher-order statistics that could also be considered. Mahadevan et al. [54] introduce the dK series of probability distributions that specify all degree correlations within $d$-sized subgraphs of a given graph $G$. In this framework, the degree distribution $P_k$ is the 1K distribution. The 2K distribution is the joint degree distribution that describes degree correlations for pairs of connected nodes. The joint degree distribution is

$$P_{k_1,k_2} = \frac{m(k_1, k_2)\,\mu(k_1, k_2)}{2m}$$

where $m(k_1, k_2)$ is the number of edges between nodes of degree $k_1$ and $k_2$, and $\mu(k_1, k_2)$ is 2 if $k_1 = k_2$, and 1 otherwise. The higher-order distributions can be defined analogously. The statistical data of the dK distributions can be used as input constraints of the random graph generation approach in the subsequent section.

3) Spatial Properties: A particular class of networks are those embedded in real space, i.e., networks whose nodes occupy a precise position in 2-D or 3-D Euclidean space, and whose edges are real physical connections. Along with a complex topological structure, many spatial networks display a large heterogeneity in the wire length of the connections [15]. For example, both electronic circuits and brain networks have heavy-tailed wire length distributions [62], [63].

4) Design Objectives: Various researchers have also proposed optimization (OPT) approaches as a means of generating system topologies [64], [65]. By investigating plausible objectives and constraints in the design of actual networks, observed topological properties such as node degree distributions can be understood as the natural byproduct of an approximately optimal solution to a network design problem. For example, empirical analysis demonstrates that the wire length optimization is among the underlying driving forces creating power law degree distributions with cutoffs in both electronic circuits and brain networks [53], [62], [63].

C. Topological Metric Selection

We assume that we have a correct set of behavioral components, meaning that the system topology is the source of model fidelity. In this case, we need to identify metrics for topology comparison, i.e., methods to measure the topological distance $\delta(G, \hat{G})$ between the real topology graph $G$ and the synthetic topology graph $\hat{G}$.

Naturally, the topological properties discussed in the previous section, i.e., the characteristic path length $\bar{L}$, the clustering coefficient $\bar{C}$, and the degree distribution $P_k$, can be used as standard metrics. In addition to these standard metrics, some extended metrics have recently been introduced for various applications [15], [54], [66]. We focus on the following extended metrics.

s-Metric: The s-Metric of graph $G$ is defined as $s(G) = \sum_{(v_i, v_j) \in E} d_i d_j$, where $(v_i, v_j)$ are the edges in the graph, and $d_i$ and $d_j$ are the degrees of the nodes $v_i$ and $v_j$, respectively. The s-Metric is closely related to betweenness, degree correlation, and graph assortativity [54]. Recent research in both technological and biological systems has shown that the correlation structure has an important impact on system function and performance [67].

Subgraph Frequency Distribution: $P(F_x(G))$ denotes the probability of a subgraph of type $x$ occurring in graph $G$. The distribution $P(F_x(G))$ enables us to analyze the frequencies of all subgraphs with specified sizes, and such subgraph frequency statistics have been successfully applied to the evaluation of biological network models [68], [69].

Rent Exponent: For the evaluation of algorithms that are related to circuit partitioning, the main characterization parameter of interconnection complexity is the Rent exponent [53], which reflects the relationship between the number of terminals in a partitioned circuit and the number of blocks per partition.^5

^5 These differences in complexity of the interconnect topology have been experimentally observed by Rent, and his observations led to the well-known Rent's rule: a relationship between the average number of elementary blocks $B$ in the modules of a partitioned circuit and the average number of the module's external connections (terminals) $T$, i.e., $T = tB^r$, where $t$ is the average number of terminals per logic block, and $r$ is called the Rent exponent. This exponent is a measure of the interconnection complexity of the circuit. Its value is always smaller than 1, with increasing values for increasing interconnection complexity. Generally, it ranges from 0.47 for regular circuits up to 0.75 for complex circuits [53].

Inference Complexity Metric: In measuring the diagnosis inference complexity, we use the treewidth [70] of the system's topology graph.^6 We now explain our choice of metric for measuring the diagnosis inference complexity.

^6 Roughly speaking, the treewidth is a metric of the join-tree $T$ of a graph $G$, which is a topological transformation of $G$ into a tree of cliques, where a clique is a fully connected subgraph [70].

^7 The Gaifman graph of a conjunctive normal form (CNF) formula is a graph having a vertex for each variable and an edge $(v_1, v_2)$ if the variables $v_1$ and $v_2$ occur in the same clause of the formula. By treewidth of a CNF formula, we refer to the treewidth of its Gaifman graph.

The worst-case complexity of MBD has been thoroughly studied. For the propositional models considered in this paper, the complexity is $\Sigma_2^p$-hard [71]. The average-case complexity has not been heavily studied, apart from an empirical analysis of the ISCAS circuit benchmarks [72], nor are there detailed comparative analyses of the different MBD algorithms. Existing empirical evidence suggests that the diagnostic inference complexity of a model $\Phi$ is a function of two parameters: 1) the observation vector OBS and 2) the treewidth $\tau$ of the Gaifman graph of $\Phi$ [73].^7 The observation vector OBS is closely related to the minimal cardinality diagnosis $\beta$ [74]; search-based diagnosis algorithms have complexity proportional to $\beta$, but compilation-based algorithms, such as the Assumption-based Truth Maintenance System (ATMS) [75] or the causal network approach [44], or stochastic algorithms like SAFARI, are relatively insensitive to $\beta$ [76]. The treewidth $\tau$ of a diagnosis model $\Phi$ governs the worst-case complexity of $\Phi$: the abduction, SAT, and CSP problems are fixed-parameter tractable in $\tau$, which is
an algorithm-independent measure [77]; see also [78].^8 Provan [72] showed the proportionality of inference complexity to $\tau$ for the causal network approach, in addition to the fact that the parameter $\tau$ dominates $\beta$.

^8 The MBD problem can be framed as an abduction problem, and it can make use of SAT algorithms for its inference. Although the SAT problem is related to the diagnosis problem, their complexity is different, and the entailed algorithms are different. It was empirically shown in [76] that a SAT solver will not solve diagnosis problems as well as custom diagnosis algorithms like the one proposed in [76]. In fact, many diagnosis solvers use a SAT solver as a subroutine in the diagnosis algorithm; hence, SAT is often a subset of diagnosis inference.

Given that $\tau$ is an algorithm-independent measure, we adopt it in this paper as a diagnosis complexity measure for our propositional models. For our BN models, $\tau$ is accepted as the de facto inference complexity metric. Rather than computing the treewidth $\tau$, we adopt the metric of maximum clique size $\mu(\tau)$, noting that $\mu(\tau) = 1 + \tau$ [70].

We need to select suitable metrics to validate the synthetic topology based on the requirements of the particular application. A parsimonious model can capture the general principles or structures of real-world systems, but it is hard to match all topological metrics simultaneously and perfectly. As a consequence, we need to identify and understand the essential metrics that are responsible for key behaviors of each application to rank the performance of synthetic models primarily in terms of the specified metrics. For example, if the performance of a routing algorithm depends only on the distribution of the shortest path length in the network, then the topologies of a real-world and synthetic network match perfectly as soon as their distance distributions are the same, independent of other characteristics [54], [79]. In the evaluation of different physical design algorithms, not all aspects of realistic circuits will be taken into account, but only those that influence the aspects that one wants to study [52]. For algorithms related to partitioning, the Rent exponent is used as one of the main objective metrics, but for some other applications, particularly for timing-driven applications, the delay distribution is an important metric [52].

Our proposed framework enables a user to specify any evaluation metric. To empirically demonstrate the use of metrics, in this paper, we focus on generating benchmark models for evaluating the complexity of discrete MBD algorithms and adopt a join-tree metric that focuses on diagnostic inference complexity, as described in Section IV. We have previously shown that this inference metric is more important than other network characteristic metrics [21].

TABLE I. TOPOLOGY GENERATION APPROACHES. INPUT PARAMETERS FOR GENERATION ALGORITHMS ARE AS FOLLOWS: $n$: NODE NUMBER; $m$: EDGE NUMBER; $p_r$: REWIRING PROBABILITY; $\alpha$: SPATIAL FACTOR; $g_s$: SEED NETWORK; $p_d$: DUPLICATION PROBABILITY; $\lambda_i$: TRADEOFF WEIGHT; $d$: SUBGRAPH SIZE

Fig. 2. Selecting plausible algorithms depending on domain analysis. Candidate algorithms include SWG, PA, SPA, PD, and OPT.

V. MODEL GENERATION

This section describes the topology generation module, as well as the instantiation of the topology with a behavioral model.

A. Topology Generation Module

To generate a synthetic topology graph $\hat{G}$ using an algorithm $A$, we provide to $A$ a set $\Pi$ of input parameters. We then measure the properties of $\hat{G}$ (e.g., degree distribution) using a set $\Phi$ of graph metrics [54] to compare the properties of the real and synthetic topology graphs and make adjustments to the generation process, if necessary.

There is a wide range of generation algorithms available for synthetic topology generation, e.g., [15] and [80]. Table I classifies the space of topology generation approaches that our model generation tool supports, together with their key properties, corresponding parameters, recommended applications, and associated model generation computational costs. Based on the results of domain analysis, the key properties can be applied to select suitable algorithms. Fig. 2 shows an overview of this selection process. For example, the PA [15] algorithm requires the number of nodes and edges of $\hat{G}$ as input parameters and is a viable model when the degree distribution is power-law distributed, $P_k \propto k^{-\gamma}$, with no cutoff. We classify the generator models into two main groups, as shown in column 1 of Table I:
1) explanatory models, which attempt to capture the underlying growth or evolution process of the system topology in the resulting model, or 2) descriptive models, which randomly generate topology graphs under the constraints on the specified topological properties, independent of any complex system growth process. For example, the PA model captures a specific network growth process of complex systems, in which a new structure preferentially forms around existing substructures [15], and provides a plausible explanation for the origin of the corresponding power law degree distributions. In contrast, given a specified degree sequence, the descriptive generalized random graph (GRG) model [15], [81] randomly forms edges by pairing nodes with uniform probability and satisfies the degree constraint on each node simultaneously. The process we adopt for generating high-fidelity synthetic models differs based on this basic classification. In the following, we will summarize our model selection process (using these two classes) and then review the different generation approaches.

1) Topology Generation Using Explanatory Models: Given key properties obtained from the topological analysis of a real-world network $G$ in domain $D$ and a specified metric set $\Phi$ according to domain-specific requirements, we select an explanatory model from a set $\mathcal{A}$ of possible generators (see Table I) as follows:
1) Generate a potential algorithm set $A' \subseteq \mathcal{A}$ based on the results of domain analysis.
2) Optimize the parameters $\Pi_i$ of each algorithm $A_i \in A'$ to match $G$ in terms of the specified topological metrics $\Phi$ and put $A_i$ into the result set $\hat{A}$ if it can match $G$ with appropriate values of $\Pi_i$.
3) If $\hat{A}$ contains multiple algorithms, we compute additional metrics $\Phi'$, according to further requirements in $D$, and continue to evaluate and select algorithms in terms of $\Phi'$.

As discussed in Section IV, the degree distribution is one of the most fundamental topological properties, and as shown in Table I, many current topology generation approaches focus on the capability to capture the degree distributions of real-world systems. Fig. 2 shows a typical example of step 1 in the above topology generation process, in which potential algorithms are selected according to the analysis of the degree distribution. For example, in gene expression simulation, we analyzed the topology graph of the E. coli TRN and found that it displays a clear power law degree distribution. According to Fig. 2, the PA and partial duplication (PD) models seem viable choices for topology generation. Since the synthetic TRN topology graph $\hat{G}$ is used to generate gene expression data (on which the accuracy of reverse-engineering algorithms is evaluated), we only need to measure the model fidelity in terms of regular topological metrics. For this task, we can use the degree distribution $P_k$ as the basic fidelity metric $\Phi$ for the synthetic topology $\hat{G}$. The metric $P_k$ can be simplified to the corresponding exponent $\gamma$ when following a power law; the $\gamma$ of the E. coli TRN is about 2.5. The parameters in the $PA(n, m)$ and $PD(n, m, g_s, p_d)$ algorithms are optimized in terms of $\gamma$; $n$ and $m$ are assigned as the numbers of nodes and edges in the actual TRN model, respectively. In step 2, we further optimize values of the input parameters to minimize the difference in terms of $\gamma$ between the synthesized topology and the actual TRN. The PA model can only generate graphs with $\gamma$ around 3, deviating from that of the TRN, but the PD model can generate $\gamma$ in a wide range (1–3). Finally, the PD model with $p_d = 0.2$ closely matches the actual TRN, much better than the PA model [82]. In some more complicated cases, both PA and PD models can closely match the degree distribution of a real system, such as the yeast protein interaction network (PIN), and we need to compute additional metrics $\Phi'$ like the subgraph frequency distribution to further evaluate and select algorithms, as presented in step 3 [83]. Our topology generation approaches can be used in a wide range of applications, including diagnosis benchmark generation and bioinformatics simulation. More concrete examples of the topology generation process for diagnosis are demonstrated in Section VI.

When using an explanatory model, we first restrict the possible algorithms based on Model Focus (cf. column 2 in Table I), i.e., whether the domain $D$ provides information to generate a model from topological properties or using an OPT approach given the system's global objective function. We briefly discuss these two types of approaches.

Topology-Based Generators: Given the wide range of graph generators defined in the literature, e.g., [15] and [80], we have selected four of the most important approaches, i.e., SWG, PA, spatial PA (SPA), and PD models. These models show the general and fundamental principles underlying the topologies of real-world systems, and we can extend them and achieve higher fidelity by introducing richer sets of domain-specific external parameters. Each approach has particular properties, which lend themselves to modeling particular domains with differing fidelity. We now summarize each model in turn.

Fig. 3. Generating SWG from a regular ring lattice with rewiring probability $p_r$.

SWG model: This model aims to capture "small-world" properties observed in many real-world systems like electronic circuits [16] and power grids [19], such as a low characteristic path length $\bar{L}$ relative to that of a classic random-graph (ER) model, $\bar{L}_r$, and a high clustering coefficient $\bar{C}$ relative to that of an ER model, $\bar{C}_r$. The SWG generator extends a regular ring lattice with a set of random connections determined by a rewiring probability $p_r$. We adopt the extended SWG approach of [21], which can model the arbitrary mean degrees $\bar{k}$ that occur in real systems, and not just integral mean degrees, as in the standard generator. $p_r \to 0$ corresponds to a regular graph, and $p_r \to 1$ corresponds to a random graph; graphs with real-world structure occur in between these extremes. Fig. 3 depicts the graph generation process, where we control the proportion of random edges using the rewiring probability $p_r$. As $p_r$ increases, the regularity and modularity of the generated graph keep decreasing, more long-range links and nodes with higher degree appear, and consequently, the characteristic path length $\bar{L}$ becomes smaller.
Fig. 4. Graphs generated by the SPA model by enhancing the spatial constraint.

PA model: This model focuses on capturing the power law of the degree distribution using a WWW-inspired generation process [18]. Starting with $n_0$ isolated nodes, at each step $t = 1, 2, \ldots, n - n_0$, a new node $v_j$ with $m_{new}$ links is added to the graph. The probability $P(v_i, v_j)$ that a link will connect $v_j$ to an existing node $v_i$ is linearly proportional to the actual degree $d_i$ of the node $v_i$.

SPA model: The SPA model extends the PA model with a parameter $\alpha$ that improves the ability to capture networks with spatial constraints embedded in physical space, such as electronic circuits, telecommunications networks, and transportation networks [15]. In the SPA model, the node position is chosen randomly in a 2-D square space with uniform density. Connections of a new node $v_j$ with each existing node $v_i$ are established with probability $P(v_i, v_j) \propto d_i\, w_{ij}^{-\alpha}$, where $w_{ij}$ is the spatial (Euclidean or Manhattan) distance between the node positions, $d_i$ is the degree of the node $v_i$, and $\alpha \geq 0$ is a tunable parameter used to adjust the spatial constraint and shape the connection probability in the PA process. When $\alpha = 0$, the model corresponds to the standard PA model. As $\alpha$ increases, the modularity of the generated graph keeps increasing, fewer long-range links and high-degree nodes appear, and consequently, the characteristic path length $\bar{L}$ becomes larger. Finally, the degree distribution degrades from the power law distribution to the exponential distribution with a sharper cutoff. Fig. 4 displays the SPA graph generation by adjusting the geometric constraint. As with the SWG model, we also extend the PA process to match the mean degree $\bar{k}$ of the real circuit.

PD model: The PD model aims to capture the duplication mechanism, which is a dominant evolutionary force in shaping biological networks [80], in contrast to other mechanisms such as PA. Given an initial seed network $g_s$, the network is updated by randomly choosing a node $v_i$, adding a duplicate of $v_i$ called node $v_j$, and connecting $v_j$ to each neighbor of $v_i$ with probability $p_d$. This model and its variants have been widely applied to bioinformatic applications related to PINs and TRNs.

Optimization-Based Generators: Rather than explicitly replicating the statistical properties, the OPT approach uses an optimization framework to model the mechanisms driving network growth or evolution. The OPT model formulates a weighted objective function over conflicting system properties $\xi_i$ and weights $\lambda_i$, e.g., $f = \sum_{i=1}^{n} \xi_i \cdot \lambda_i$, and trades off the properties using the weights $\lambda_i$. For example, in some systems embedded in physical space, such as electronic circuits and brain networks, the topological structures are shaped and optimized under two conflicting constraints: 1) information transmission steps (characteristic path length $\bar{L}$) and 2) cost of constructing connections (average wire length $\bar{W}$) [53], [63], [84]. The objective function is formulated as follows: $f = \lambda \bar{L} + (1 - \lambda)\bar{W}$, where $0 \leq \lambda \leq 1$. For this OPT model, the optimization function concentrates only on minimizing the total wire length at $\lambda = 0$, and a regular network emerges with a nearly uniform degree and high characteristic path length $\bar{L}$. At $\lambda = 1$, the optimization function concentrates only on minimizing the average shortest path length, and a star-like network emerges with highly connected hubs. Topology graphs with power law degree distributions with cutoffs should emerge when $0 < \lambda < 1$ [64], [65], so the optimization process looks for a solution that minimizes the above objective function at an appropriate value of $\lambda$. Starting from a connected random network in which nodes are evenly placed on a 2-D square, we use simulated annealing to search for the minimum cost of the objective function [85], [86]. In each annealing rearrangement step, an edge is randomly selected and rewired. In rearrangements, duplicated edges and self-loops are not allowed, and no node may become disconnected or isolated.

2) Topology Generation Using Descriptive Models: The dK-series model generator [54] has as its primary input parameter an integer $d$, which allows one to specify all degree correlations within $d$-size subgraphs of a given graph $G$.^9 1K captures the degree distribution $P_k$ and is equal to the GRG approach [15], [81]. 2K randomly generates synthetic graphs by maintaining the joint degree distribution of the given topology graph $G$, and 3K considers interconnectivity among triples of nodes.

^9 A large number of parameters are needed for every value of $d$ in real implementations, but $d$ is the governing parameter.

Generally, the set of $(d + 1)$K-graphs is a subset of $d$K-graphs, and larger values of $d$ further constrain the number of possible graphs. Given a descriptive dK-series algorithm, we generate a synthetic model $\hat{G}$ by increasing the input parameter $d$ until the generated graph $\hat{G}$ matches the properties of the real-world graph $G$ with sufficient fidelity in terms of the specified metrics $\Phi$. Increasing values of $d$ capture progressively more properties of $G$ at the cost of a more complex representation of the probability distribution and dramatically increasing computational complexity.

Mahadevan et al. [54] found that the $d = 2$ case is sufficient for most practical purposes, whereas $d = 3$ essentially reconstructs the Internet AS- and router-level topologies exactly in terms of regular graph metrics. Our experiments show similar results on the TRN of E. coli and on electronic circuits [82].

Another dimension in model selection encompasses tradeoffs between 1) the complexity of a model and the number of metrics it tries to reproduce, and 2) its explanatory power and associated generality. Although the dK-series model can generally capture regular topological metrics better than explanatory models due to the number of constraints imposed, we cannot use it to discover laws governing the topology growth process of a particular system. It lacks the predictive and rescaling power necessary for benchmark generation. Our experiments on diagnosis model generation also showed that the dK-series model is not flexible enough for fitting more complicated join-tree metrics.
Fig. 5. Graph created for simple circuit topology. Inputs are denoted by yellow nodes, components by blue nodes, and outputs by dotted green nodes.

B. System Behavioral Model Generation

To generate a behavioral model, we assign components to $G$ and then merge the behavioral equations for each assigned system component.

1) Assign Components to Graph G: Given a topology graph $G$, we associate with each node in $G$ a component based on the number of incoming and outgoing arcs for the node. Hence, given a node $v \in G$ with $i$ inputs and $o$ outputs, we assign a component denoted as $\Delta_Z(i, o, \tau, B_Z, w)$, where $\tau$ denotes the type (e.g., AND gate, OR gate), $B_Z$ defines the functionality (behavioral equations) of component $Z$, and $w$ denotes the weights assigned to variables, e.g., probabilities assigned to the component failure modes of $Z$.

Example 1: Consider that the topology generation process has created a graph $G$, as shown in Fig. 5. This graph has been created with appropriate proportions of input, component, and output nodes based on our domain analysis. Given this structure as input, together with a component library and component statistics, we can now assign components to $G$.

Given a node $v \in G$, we randomly assign to $v$ a suitable component with probability based on the computed component distribution. For example, the single-input nodes correspond to single-input gates (NOT and buffer), and the dual-input nodes correspond to dual-input gates (AND, OR, NAND, NOR, and XOR).

2) Generate the System Behavioral Model: In the final step, we generate the system functionality in terms of the union of the component functions such that we match the corresponding inputs and outputs. As an example of input/output matching, consider the following: if output 1 of component $X$, which is denoted as $O_{X,1}$, is the second input to component $Y$, which is denoted as $O_{Y,2}$, then we set $O_{X,1} = O_{Y,2}$. Once this is done, we merge the component behavioral descriptions to generate the system behavioral model $B$. At present, we assume that we can simply take the union of the component behavioral descriptions; in future work, we plan to explore systems for which the behavioral composition is more complicated.

VI. CASE STUDIES: ISCAS BENCHMARK CIRCUITS AND PROCESS-CONTROL SYSTEMS

In this section, we summarize experimental results comparing the structure and diagnostic inference complexity properties of the autogenerated models with the ISCAS circuits, which are an established benchmark for circuit optimization [8], and a real pulp mill benchmark model developed by Castro and Doyle [6].

The ISCAS benchmark suites consist of multiple sets of circuits, which include the ISCAS-85, ISCAS-89, and ISCAS-99 circuits [8]. We have run experiments for the full suite of ISCAS-85 benchmarks. We present only a few demonstrative results here due to space limitations. The pulp mill benchmark model consists of modular representations of unit operations in a complete pulp mill. The pulping process benchmark is based on a nonlinear dynamic mathematical model of an actual pulp mill process. The benchmark can be used as a basis for studying several process-system tasks, including modeling, control, estimation, and fault diagnosis [6].

A. Domain Analysis

Component Analysis:

ISCAS-85 circuits: The ISCAS-85 benchmark circuits are presented in netlists of fundamental logic gates, which provide a standard nonhierarchical representation specifying both network topology and functionality (in terms of the functionality of primitive gates) [87]. Our component analysis revealed that only seven types of primitive gates (NAND, NOR, NOT, AND, OR, XOR, and BUFF) appear in ISCAS-85 circuits, and in general, each circuit contains only four or five types of gates. The NAND gates are the most common components in every ISCAS-85 circuit. For instance, the percentage of NAND gates in C17, C432, and C1355 is 100%, 74%, and 84%, respectively. One possible reason for the prevalence of the NAND gate is that it is the cheapest gate to manufacture. Additionally, NAND gates alone can be used to reproduce the functions of all the other logic gates. Fig. 6 shows how an XOR functional block can be implemented by four NAND gates, and the prevalence of NAND gates in C1355 may be due to the XOR functions it repeatedly performs.

Fig. 6. Two-input XOR functional block implemented by four NAND gates.

Another property of note is that the same type of component may have various numbers of inputs. For example, in C432, most AND gates have two inputs, but a small number of AND gates have four, eight, and even nine inputs. We need to carefully consider the above circuit component statistics in system behavioral model generation.

Pulp mill: According to the detailed schematic diagrams of the pulp mill in [6], there are 130 basic physical components and about 180 connections in the pulp mill benchmark. The most common basic components are valves, which are used to connect components in and between various key units. Table II lists the statistics of the major components used in the pulp mill benchmark.

Fig. 7 shows several typical devices used in our pulp mill process control component library, together with the variables representing the devices' primary behavioral role. Only the pulp washers are modeled by algebraic equations. All the other units are modeled by ordinary differential equations or partial differential equations [6].
TABLE II. STATISTICS OF MAJOR COMPONENTS IN THE PULP MILL BENCHMARK

Fig. 7. Partial component library for pulp mill process control domain. Each component has specific process measurement variables associated with its functionality, as shown under the component’s name.

Fig. 8. Cumulative degree distributions of ISCAS-85 benchmark circuits in log–log scale.

TABLE III. CHARACTERISTIC PATH LENGTHS OF REAL CIRCUITS AND CORRESPONDING RANDOM GRAPHS

Topological Analysis:

ISCAS-85 circuits: Cancho et al. found SWG patterns for a small collection of electronic circuits and observed power law tails with cutoffs in degree distributions [16]. Fig. 8 shows cumulative degree distributions for the full suite of ISCAS-85 benchmark circuits in log–log scale. We can see that most circuits exhibit long tails with cutoffs in their degree distributions. Existing analysis has conjectured that the cutoffs in power law degree distributions might result from the presence of spatial constraints limiting the number of links when connections are costly [20], and this has been confirmed by further studies on diverse networks such as the Internet, power grids, transport networks, and brain networks [16], [20], [63], [88]–[90].

In circuit design, the wire length has been treated as the prime parameter for performance evaluation since it has a direct impact on several important design parameters [53], [91]. Recent research on circuit placement showed that the wire length of real circuits exhibits power law distributions [53], [62], [84]. Another driving force underlying circuit design is timing. Many design cost metrics can be treated as technological parameters that can be optimized by trading off delay and wire length [84]. The delay of signal transmission among components can approximately be simplified as the characteristic path length. In an electronic circuit, a cluster of components corresponds to components that together serve a particular task, e.g., a subsystem; the relatively small number of connections between clusters corresponds to the fact that subsystems are typically loosely coupled. As shown in Table III, the characteristic path length of each ISCAS-85 circuit is close to that of the 1K random graph with corresponding size. Fig. 9(a) and (b) shows different views of a typical ISCAS-85 benchmark circuit, i.e., C432. It is important to note the density of shortcut edges joining nodes that have long paths (based on the circle connectivity). This circular network view displays such connections clearly and demonstrates that the overall graph distance or characteristic path length for this network will be relatively low due to such connections.

Pulp mill: Based on the flow sheet and detailed schematic diagrams in [6], we decomposed the pulp mill benchmark into fundamental device components and generated the underlying topology graph according to the physical connections among these fundamental components. The topology graph of the pulp mill displays a clear power law degree distribution with cutoff, which has been observed in many complex technological systems. Fig. 10(a) and (b) shows different views of the topology graph of the pulp mill, and both figures show that most connections are local connections between near neighbors, and there are a few long-range connections. The pattern displayed in both figures is consistent with the modular structure of the topology of the pulp mill. The detailed schematic diagrams in [6] show that the pulp mill consists of eight loosely coupled submodules, in each of which the nodes are densely connected. We also found that the characteristic path length L̄ of the topology graph is much larger than that of a corresponding random graph.
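The two topological measures used in this analysis, the cumulative degree distribution and the characteristic path length, can be computed as in the following minimal sketch. It assumes the topology graph is treated as undirected and stored as an adjacency dictionary. The function names and the toy ring graph are illustrative only and are not taken from the benchmark data.

```python
from collections import Counter, deque

def cumulative_degree_distribution(adj):
    """Map each observed degree k to the fraction of nodes with degree >= k."""
    n = len(adj)
    counts = Counter(len(nbrs) for nbrs in adj.values())
    dist, remaining = {}, n
    for k in sorted(counts):
        dist[k] = remaining / n
        remaining -= counts[k]
    return dist

def characteristic_path_length(adj):
    """Mean shortest-path length over all connected ordered node pairs (BFS)."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0

def ring_with_shortcut(n, shortcut=None):
    """Toy graph: a ring of n nodes, optionally with one long-range edge."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        j = (i + 1) % n
        adj[i].add(j)
        adj[j].add(i)
    if shortcut:
        a, b = shortcut
        adj[a].add(b)
        adj[b].add(a)
    return adj

plain = ring_with_shortcut(8)
shortcut = ring_with_shortcut(8, shortcut=(0, 4))
print(characteristic_path_length(plain), characteristic_path_length(shortcut))
```

On the toy ring, adding a single shortcut edge lowers the mean path length, which is the effect the circular views are meant to expose: many shortcuts in C432, few in the pulp mill.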
Fig. 9. (a) Directed graph depicting the topology of circuit C432, displayed in a circular view. (b) Directed graph depicting the topology of circuit C432, displayed in a regular view.

Fig. 10. (a) Topology graph of the physical components in the pulp mill displayed in a circular view. This figure shows far fewer shortcuts than the corresponding graph of C432. (b) Topology graph of the physical components in the pulp mill displayed in a regular view. (c) Topology graph of the physical components and embedded control interfaces in the pulp mill displayed in a circular view. (d) Topology graph of the physical components and embedded control interfaces in the pulp mill displayed in a regular view.

In our topological analysis, we only consider connections between physical components; however, some components, such as valves and condensers, have an embedded control interface in addition to the inlet and outlet flow. The topology shown in Fig. 10(a) and (b) ignores the embedded control interfaces of these components. If we treat each embedded control interface as an “input” component, then we can generate a new topology graph with 232 nodes and nearly 300 edges. Fig. 10(c)
and (d) shows different views of the corresponding topology graph, which share a similar pattern with the topology graph in Fig. 10(a) and (b). The new topology graph also displays a clear power law degree distribution with cutoff.

If we compare the two circular views of the pulp mill [Fig. 10(a) and (c)] with that of a typical circuit, e.g., C432 [Fig. 9(a)], then we see that the pulp mill has far fewer shortcut edges than does the circuit. As noted earlier, these shortcut edges connect components that would otherwise be connected by long paths. Hence, these figures clearly show that the pulp mill has a longer mean graph distance or characteristic path length than does the circuit. In addition, even with the control structure, the essential linear aspect of the process control system is the dominant structural feature.

1) Objective Metric Selection: This section addresses ways in which we analyze the fidelity of the synthetic models. The primary metric that we use is derived from the MBD inference task.

MBD Autogeneration Task: Given a real-world model (B, G), the objective of MBD autogeneration is to create an “equivalent” synthetic model $(\hat{B}, \hat{G})$ that minimizes $|\gamma_A(B, \mathrm{OBS}) - \gamma_A(\hat{B}, \mathrm{OBS})|$, where A is an MBD inference algorithm that has complexity $\gamma_A(B, \mathrm{OBS})$ when computing a probability minimal diagnosis given B and observations OBS.¹⁰

¹⁰ We assume that γ(·) returns a complexity parameter such as CPU time or number of nodes searched.

As noted earlier, the treewidth of the topology graph is the key parameter for determining the inference complexity. The treewidth is closely related to the largest clique size μ(T) of G [44], which is the parameter we adopt as a complexity measure for an MBD model defined in terms of propositional logic or as a BN.

The main objective of our study is to provide as close a comparison of the system models of the circuit and pulp mill domains as possible. For the circuits, it is straightforward to create propositional logic and BN diagnostic models. Although diagnostic models based on propositional logic and BN differ from the FDI approach taken for most diagnostics studies of the pulp mill, e.g., [92] and [93], system-level BN models can provide useful diagnostics for the global behaviors of a pulp mill. To build a qualitative BN model for the pulp mill benchmark, we discretized all of our variables. For example, the input signals (manipulated variables) are scaled and constrained between ±1.0, and output signals are scaled based on the nominal/steady-state value of outputs and the maximum range of change of the outputs [6]. The scaled continuous-valued variables can further be converted into discrete-valued variables according to the specification of the pulp process.

B. Topology Generation

Explanatory Model Approach: We generated topology graphs using the steps shown below.

ISCAS-85 circuits:

1) Based on the evidence of power laws with cutoffs in degree distribution and wire length, the SPA model, combining PA with the constraint of spatial layout, is a plausible candidate for the topology generation of circuits. The OPT model can give rise to power laws with cutoffs in both degree distribution and wire length distribution under appropriate λ. We, along with Barthelemy [94], have found that, under appropriate parameters, the SPA model can generate structures similar to that of the OPT model. However, the computational cost of model generation using the OPT model is significantly higher than that using the SPA model, so we use the SPA model as an efficient alternative to the OPT model in practical applications. The SWG model is also a possible choice to fit the SWG pattern observed in circuits. The SWG model naturally has a sharp cutoff in its exponential degree distribution and can vary the tail length of degree distribution in a limited range.

2) We automatically optimized parameters in each model to match the μ(T) of real circuits. Experiments showed that the two selected models can both match real circuits with appropriate parameters. For example, the typical circuit C432 can be matched by the SWG model with p_r ≈ 0.28 [21], as shown in Fig. 11(a), and the SPA model with α ≈ 3.7, as shown in Fig. 11(b). Fig. 11(c)–(f) shows the results of some other circuits.

3) Since both SPA and SWG models fit the real circuits well in terms of μ(T), we can further refine the model selection by other topological metrics, such as degree distribution P_k. As shown in Fig. 12, the SPA model can match real circuits better than the SWG model in terms of P_k.

Pulp mill:

1) Since the pulp mill displays a power law degree distribution with cutoff, the SPA model is a natural candidate for topology generation. According to the connection pattern shown in Fig. 10(a) and (c), the SWG model seems another possible candidate. Due to the large characteristic path length L̄, the rewiring probability p_r of the SWG model should take a relatively small value.

2) We automatically optimized parameters in each model to match the μ(T) of the above two types of topology graphs of the pulp mill benchmark, respectively.

Topology graph of physical components: Fig. 13(a) shows that the pulp mill benchmark can be matched by the SWG model with p_r ≈ 0.15. However, as shown in Fig. 13(b), the corresponding SPA model cannot match the low μ(T) of the pulp mill benchmark. By increasing α, the SPA model can generally produce graphs with sharper cutoffs in degree distributions and consequently lower μ(T), but as shown in Fig. 13(b), tuning α becomes counterproductive when α > 5 due to the limited size of the pulp mill benchmark. As shown in Fig. 14(a), the SPA model can match the pulp mill benchmark better than the SWG model in terms of P_k. The data in Fig. 14(a) are averaged over 100 graph instances and demonstrate that the SPA model can generally closely match the degree distribution of the pulp mill benchmark, although a small fraction of graph instances, with overly long tails in their degree distributions, contribute to a high μ(T).

Topology graph of physical components and control interfaces: The results are similar to those of the topology graph of physical components. Fig. 13(c) shows that the pulp mill benchmark can be matched by the SWG model with p_r ≈ 0.08. As shown in Fig. 13(d), the SPA model cannot match the low μ(T) of the pulp mill benchmark. Fig. 14(b) shows similar results on P_k as well.
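The parameter-matching step for the SWG model can be illustrated with the following minimal sketch, which is not the authors' procedure. It assumes the standard Watts-Strogatz construction for the SWG generator, a greedy min-degree elimination ordering as a stand-in estimate of the largest clique size μ(T) (it yields an upper bound on the largest clique of a triangulation), and a plain grid search over candidate p_r values. All function names are hypothetical.

```python
import random

def watts_strogatz(n, k, p, rng):
    """Ring lattice (each node linked to its k nearest neighbors, k even, n > k);
    each lattice edge is rewired to a random new endpoint with probability p."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(1, k // 2 + 1):
            adj[i].add((i + j) % n)
            adj[(i + j) % n].add(i)
    for j in range(1, k // 2 + 1):
        for i in range(n):
            old = (i + j) % n
            if old in adj[i] and rng.random() < p:
                choices = [w for w in range(n) if w != i and w not in adj[i]]
                if choices:
                    new = rng.choice(choices)
                    adj[i].discard(old)
                    adj[old].discard(i)
                    adj[i].add(new)
                    adj[new].add(i)
    return adj

def max_clique_after_elimination(adj):
    """Largest clique created by greedy min-degree elimination (a heuristic
    upper bound on the largest clique size of a triangulated graph)."""
    g = {u: set(vs) for u, vs in adj.items()}
    largest = 0
    while g:
        u = min(g, key=lambda x: (len(g[x]), x))
        nbrs = g.pop(u)
        largest = max(largest, len(nbrs) + 1)
        for a in nbrs:
            g[a].discard(u)
            g[a] |= nbrs - {a}   # connect the eliminated node's neighbors
    return largest

def fit_rewiring_probability(n, k, target, candidates, runs=20, seed=0):
    """Grid search: pick the p whose mean clique-size estimate is closest to target."""
    rng = random.Random(seed)
    best_p, best_gap = None, float("inf")
    for p in candidates:
        mean = sum(max_clique_after_elimination(watts_strogatz(n, k, p, rng))
                   for _ in range(runs)) / runs
        gap = abs(mean - target)
        if gap < best_gap:
            best_p, best_gap = p, gap
    return best_p, best_gap
```

A call such as `fit_rewiring_probability(n, k, target, [0.05 * i for i in range(11)], runs=100)` would mirror the averaging over 100 graph instances mentioned in the text. Here `n`, `k`, and `target` would have to come from the real topology graph being matched.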
Fig. 11. We use the SWG and SPA models to create synthetic circuits with the number of nodes and edges identical to that of real circuits, and analyze the corresponding average-case inference complexity and average maximum degree (averaged over 100 runs). We automatically optimize the appropriate parameter (rewiring probability parameter p_r of the SWG model or spatial constraint parameter α of the SPA model) to match the inference complexity (measured by maximum clique size) of real circuits. Experiments show that the circuit depicted can be matched by the SWG/SPA model with corresponding p_r/α parameters in terms of maximum clique size. For each plot, we show the generator type and the parameter value. (a) C432, SWG (p_r ≈ 0.28). (b) C432, SPA (α ≈ 3.7). (c) C499, SWG (p_r ≈ 0.18). (d) C499, SPA (α ≈ 5.4). (e) C880, SWG (p_r ≈ 0.14). (f) C880, SPA (α ≈ 5.8).

dK-Series Approach:

ISCAS-85 circuits: As shown in [95]–[97], when d = 1, the dK-series model can match the topological properties of ISCAS-85 circuits well.

Pulp mill: As shown in [97], even when d = 3, the dK-series model cannot capture the basic topological properties of the pulp mill benchmark, although the same approach can perfectly match all the regular topological metrics of the Internet,
