A petri net model for hardware software codesign

Design Automation for Embedded Systems, 4, 243–310 (1999)
© 1999 Kluwer Academic Publishers, Boston. Manufactured in The Netherlands.

A Petri Net Model for Hardware/Software Codesign

PAULO MACIEL
Departamento de Informática, Universidade de Pernambuco, CEP. 50732-970, Recife, Brazil

EDNA BARROS
Departamento de Informática, Universidade de Pernambuco, CEP. 50732-970, Recife, Brazil

WOLFGANG ROSENSTIEL
Fakultaet fuer Informatik, Universitaet Tuebingen, D-7207 Tuebingen, Germany

Abstract. This work presents Petri nets as an intermediate model for hardware/software codesign. The main reason for using Petri nets is to provide a model that allows formal qualitative and quantitative analysis in order to perform hardware/software partitioning. Using Petri nets as an intermediate model allows one to analyze properties of the specification and to formally compute the performance indices used in the partitioning process. This paper highlights methods for computing the load balance, mutual exclusion degree and communication cost of a behavioral description in order to perform the initial allocation and the partitioning. It also describes a method for estimating hardware area and presents an overview of the general partitioning method considering multiple software components.

Keywords: Petri nets, hardware/software codesign, quantitative analysis, estimation

1. Introduction

Because of the growing complexity of digital systems and the availability of technologies, many systems nowadays are mixed hardware/software systems. Hardware/software codesign is the design of systems composed of two kinds of components: application-specific components (often referred to as hardware) and general programmable ones (often referred to as software). Although such systems have been designed ever since hardware and software first came into being, there is a lack of CAD tools to support the development of such heterogeneous systems.
The progress obtained by CAD tools at the level of algorithm synthesis, the advances in some key enabling technologies, the increasing diversity and complexity of applications employing embedded systems, and the need to decrease the costs of designing and testing such systems all make techniques for supporting hardware/software codesign an important research topic [21, 22, 23, 24, 25].

The hardware/software codesign problem consists in implementing the functionality of a given system in a set of interconnected hardware and software components, taking design constraints into account. When an implementation on a microprocessor (a cheap programmable software component) does not meet the timing constraints [2], specific hardware devices must be implemented. On the other hand, to keep costs down, implementing components in software should be considered.
Furthermore, a short time-to-market is an important factor. Delays in product launch cause serious profit reductions, since it is much simpler to sell a product when there is little or no competition. Hence, facilitating the re-use of previous designs, faster design exploration, qualitative analysis/verification in an early phase of the design, prototyping, and a reduction of the required time-to-test all reduce the overall time from a specification to the final product.

One of the main tasks when implementing such systems is the partitioning of the description. Partitioning approaches have been proposed by De Micheli [20], Ernst [18], Wolf [4] and Barros [17].

The hardware/software cosynthesis method developed by De Micheli et al. assumes that the system-level functionality is specified in HardwareC as a set of interacting processes. Vulcan-II partitions the system into portions to be implemented either as dedicated hardware modules or as sequences of instructions on processors. This choice must be based on the feasibility of satisfying externally imposed data-rate constraints. Partitioning is carried out by analyzing the feasibility of partitions obtained by gradually moving hardware functions to software. A partition is considered feasible when it implements the original specification and satisfies the performance indices and constraints. The algorithm is greedy and is not designed to find global minima.

Cosyma is a hardware/software codesign system for embedded controllers developed by Ernst et al. at the University of Braunschweig. This system uses a superset of the C language, called C∗, in which some constructs are used for specifying timing constraints and parallelism. Hardware/software partitioning in Cosyma is solved with simulated annealing. The cost function is based on estimates of hardware and software runtime, of hardware/software communication time, and on trace data.
The main restriction of this partitioning method is that the hardware and software parts are not allowed to execute concurrently.

The approach proposed by Wolf uses an object-oriented structure to partition functionality considering distributed CPUs. The specification is described at two levels of granularity. The system is represented by a network of objects which send messages to one another to implement tasks. Each object is described as a collection of data variables and methods. The partitioning algorithm splits and recombines objects in order to speed up critical operations, although splitting is only considered for the variable sets and does not explore the code sections.

One of the challenges of hardware/software partitioning approaches is the analysis of a great variety of implementation alternatives. The approach proposed by Barros [14, 17, 16, 15] partitions a description into hardware and software components by using a clustering algorithm that considers distinct implementation alternatives. Clustering is carried out by taking one particular implementation alternative as the current one. Analyzing distinct implementation alternatives during partitioning allows the choice of a good implementation with respect to time constraints and area allocation. However, this method does not provide a formal approach for performing quantitative analysis, and it only considers a single-processor architecture.

Another very important aspect is the formal basis used in each phase of the design process. Layout models have a good formal foundation based on set and graph theory, since layout deals with placement and connectivity [94]. Logic synthesis is based on Boolean algebra because most of its algorithms deal with Boolean transformation and minimization of expressions [95]. In high-level synthesis a lot of effort has been devoted to this topic [96, 97, 98]. However, since high-level synthesis is associated with a number of different areas, varying from behavioral description to metric estimation, it does not have a formal theory of its own [9]. Software generation is based on programming languages, which range from logic to functional paradigms. Since hardware/software codesign systems take a behavioral specification and produce a partitioned system as input for high-level synthesis tools and software compilers, they must consider design/implementation aspects of both the hardware and the software paradigms. Therefore, a formal intermediate format able to handle behavioral specifications, and relevant to both hardware synthesis and software generation, is an important challenge.

Petri nets are a powerful family of formal description techniques able to model a large variety of problems. However, Petri nets are not restricted to design modeling. Several methods have been developed for the qualitative analysis of Petri net models. These methods [28, 29, 37, 26, 27, 30, 31, 1] allow for the qualitative analysis of properties such as deadlock freedom, boundedness, safeness, liveness, reversibility and so on [42, 37, 87, 39].

This work extends Barros's approach by using timed Petri nets as a formal intermediate format [61, 59] and by taking a multi-processor architecture into account. Here, timed Petri nets serve as an intermediate format mainly for computing metrics such as execution time, load balance [72], communication cost [70, 69], mutual exclusion degree [73] and area estimates [75]. These metrics guide the hardware/software partitioning.

The next section is an introduction to the PISH codesign methodology. Section 3 introduces the description language adopted.
Section 4 describes the hardware/software partitioning approach. Section 5 is an introduction to Petri nets [1, 64, 56, 58]. Section 6 presents the occam/timed Petri net translation method [64, 59, 61, 62, 40]. Section 7 describes the approach adopted for performing qualitative analysis. A method for computing the critical path time, the minimal time and the likely time based on structural Petri net methods is presented in Section 8. Section 9 presents the extended model and an algorithm for estimating the number of processors needed. Section 10 describes the delay estimation method adopted in this work. Sections 11, 12, 13 and 14 describe methods for computing communication cost, load balance, mutual exclusion degree and area estimates, respectively. Section 15 presents an example, and finally some conclusions and perspectives for future work follow.

2. The PISH Co-Design Methodology: An Overview

The PISH co-design methodology, being developed by a research group at the Universidade de Pernambuco, uses occam as its specification language and comprises all phases of the design process, as depicted in Figure 1. The occam specification is partitioned into a set of processes, some of which are implemented in software and others in hardware. The partitioning is carried out in such a way that the partitioned description is formally proved to have the same semantics as the original one. The partitioning task is guided by the metrics computed in the analysis phase. Processes for communication purposes are also generated in the partitioning phase. A more detailed description of the partitioning approach is given in Section 4.

Figure 1. PISH design flow.

After partitioning, the processes to be implemented in hardware are synthesized and the software processes are compiled. The technique proposed for software compilation is also formally verified [93]. For hardware synthesis, a set of commercial tools has been used. The first system prototype has been generated by using two distinct prototyping environments: the HARP board developed by Sundance, and a transputer plus FPGA boards, both connected through a PC bus. Once the system is validated, either the hardware components or the whole system can be implemented as an ASIC. For this purpose, layout synthesis techniques for sea-of-gates technology have been used [94].

3. A Language for Specifying Communicating Processes

The goal of this section is to present the language that is used both to describe the applications and to reason about the partitioning process itself. This language is a representative subset of occam, defined by the BNF-style syntax given below, where [clause] has the usual meaning of an optional item. The optional argument rep stands for a replicator. A detailed description of these constructors can be found in [62]. This subset of occam has been extended with two new constructors: BOX and CON. Their syntax is BOX P and CON P, where P is a process. These constructors have no semantic effect; they are just annotations, useful for the classification and clustering phases. A process included within a BOX constructor is not split, and its cost is analyzed as a whole in the clustering phase. The constructor CON is an annotation for a controlling process. Such processes act as an interface between other processes.
The occam subset is described in BNF format:

    P ::= SKIP | STOP | x := e                              no-op, deadlock and assignment
        | ch ? x | ch ! e                                   input and output
        | IF(c1 p1, ..., cn pn) | ALT(g1 p1, ..., gn pn)    conditional and non-deterministic conditional
        | SEQ(p1, ..., pn) | PAR(p1, ..., pn)               sequential and parallel combiners
        | WHILE(c P)                                        while loop
        | VAR x: P                                          variable declaration
        | CHAN ch: P                                        channel declaration

Occam obeys a set of algebraic laws [62] which define its semantics. For example, the laws PAR(p1, p2) = PAR(p2, p1) and SEQ(p1, p2, ..., pn) = SEQ(p1, SEQ(p2, ..., pn)) define the symmetry of the PAR constructor and the associativity of the SEQ constructor, respectively.

4. The Hardware/Software Partitioning

Due to the computational complexity of optimal solution strategies, a need has arisen for a simplified, suboptimal approach to the partitioning problem. In [20, 14, 18, 10, 11] some heuristic algorithms for the partitioning problem are presented. Recently, some works [12, 13] have suggested the use of formal methods for the partitioning process. However, although these approaches apply formal methods to hardware/software partitioning, neither of them includes a formal verification that the partitioning preserves the semantics of the original description.

The work reported in [16] presents some initial ideas towards a partitioning approach whose emphasis is correctness. This was the basis for the partitioning strategy presented here. As mentioned, the proposed approach uses occam [62] as a description language and suggests that the partitioning of an occam program should be performed by applying a series of algebraic transformations to the program. The main reason for using occam is that, being based on CSP [60], occam has a simple and elegant semantics, given in terms of algebraic laws. In [63] the work is further developed, and a complete formalization of one of the partitioning phases, the splitting, is presented.
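Transformations justified by algebraic laws can be mechanized on an abstract syntax tree. The following is a minimal sketch, assuming a hypothetical Python AST that covers only assignment, SEQ and PAR (the names Assign, Seq, Par, normalize and equivalent are ours, not part of the PISH tools). It rewrites both processes to a canonical form using SEQ associativity and the commutativity of PAR, then compares them. It is a sufficient check only: processes equal under other occam laws may still be reported as different.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Hypothetical AST for a fragment of the occam subset (assignment, SEQ, PAR).
@dataclass(frozen=True)
class Assign:
    var: str
    expr: str  # kept as source text; expressions are not evaluated here

@dataclass(frozen=True)
class Seq:
    body: Tuple["Proc", ...]

@dataclass(frozen=True)
class Par:
    body: Tuple["Proc", ...]

Proc = Union[Assign, Seq, Par]

def normalize(p: Proc) -> Proc:
    """Canonical form: nested SEQs are flattened (SEQ associativity) and
    PAR branches are sorted (PAR is commutative)."""
    if isinstance(p, Seq):
        flat = []
        for q in map(normalize, p.body):
            flat.extend(q.body if isinstance(q, Seq) else (q,))
        return Seq(tuple(flat))
    if isinstance(p, Par):
        return Par(tuple(sorted(map(normalize, p.body), key=repr)))
    return p

def equivalent(p: Proc, q: Proc) -> bool:
    """Sufficient (not complete) test for equality under the two laws."""
    return normalize(p) == normalize(q)

a, b, c = Assign("x", "1"), Assign("y", "2"), Assign("z", "x + y")
print(equivalent(Seq((a, Seq((b, c)))), Seq((a, b, c))))  # associativity
print(equivalent(Par((a, b)), Par((b, a))))                # symmetry
print(equivalent(Seq((a, b)), Seq((b, a))))                # SEQ is not commutative
```

The three calls print True, True and False. Sorting PAR branches by `repr` is only a convenient deterministic order; any total order on processes would do.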
The main idea of the partitioning strategy is to treat the hardware/software partitioning problem as a program transformation task in all its phases. To accomplish this, the strategy proposed in [16] has been extended with new algebraic transformation rules that deal with some more complex occam constructors, such as replicators. Moreover, the proposed method is based on clustering and takes multiple software components into account.

The partitioning method uses Petri nets as a common formalism [41]. This formalism supports both the quantitative analysis needed to perform the partitioning and the qualitative analysis of the system, so that errors can be detected in an early phase of the design [33].
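As an illustration of the kind of qualitative check meant here, the sketch below performs a generic breadth-first reachability analysis. It was written for this text and is not the PISH tool, which combines reduction rules, matrix algebra and reachability/coverability methods. The sketch enumerates the markings of a bounded place/transition net, reports markings in which no transition is enabled (deadlocks), and checks whether any place ever holds more than one token (safeness). Both the example net (a loop that forks into two parallel branches and joins them) and its faulty variant are hypothetical. Note that this naive check also reports the final marking of an intentionally terminating process as a deadlock.

```python
from collections import deque

# A net maps each transition name to (pre, post): dicts of place -> arc weight.
def enabled(marking, pre):
    """A transition is enabled if each input place holds at least the arc weight."""
    return all(marking.get(p, 0) >= w for p, w in pre.items())

def fire(marking, pre, post):
    """Marking reached by firing: remove input tokens, then add output tokens."""
    m = dict(marking)
    for p, w in pre.items():
        m[p] = m.get(p, 0) - w
    for p, w in post.items():
        m[p] = m.get(p, 0) + w
    return m

def explore(net, m0, limit=10_000):
    """Breadth-first reachability analysis; assumes the net is bounded."""
    def key(m):
        return tuple(sorted((p, n) for p, n in m.items() if n))
    seen, queue = {key(m0)}, deque([m0])
    deadlocks, safe = [], True
    while queue:
        m = queue.popleft()
        safe = safe and all(n <= 1 for n in m.values())
        successors = [fire(m, pre, post) for pre, post in net.values() if enabled(m, pre)]
        if not successors:
            deadlocks.append(m)
        for m2 in successors:
            if key(m2) not in seen:
                if len(seen) >= limit:
                    raise RuntimeError("state limit reached; the net may be unbounded")
                seen.add(key(m2))
                queue.append(m2)
    return len(seen), deadlocks, safe

# Hypothetical net: fork into two branches, run them, join, and loop.
LOOP_PAR = {
    "t_fork": ({"p0": 1}, {"p1": 1, "p2": 1}),
    "t_a":    ({"p1": 1}, {"p3": 1}),
    "t_b":    ({"p2": 1}, {"p4": 1}),
    "t_join": ({"p3": 1, "p4": 1}, {"p0": 1}),
}
# Faulty variant: branch b is missing, so the join can never fire.
BROKEN = {t: arcs for t, arcs in LOOP_PAR.items() if t != "t_b"}

print(explore(LOOP_PAR, {"p0": 1}))  # (5, [], True)
states, dead, safe = explore(BROKEN, {"p0": 1})
print(states, len(dead), safe)       # 3 1 True
```

The correct net has five reachable markings and no deadlock. The faulty one gets stuck after branch a finishes, with tokens in p2 and p3.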
Figure 2. Task diagram.

4.1. The Partitioning Approach: An Overview

This section presents an overview of the proposed partitioning approach. The partitioning method is based on clustering techniques and formal transformations, and it is guided by the results of the quantitative analysis phase [61, 69, 70, 72, 73, 75]. The task flow of the partitioning approach can be seen in Figure 2. This work extends the method presented in [15] by allowing a more generic target architecture to be taken into account. This architecture must be defined by the designer by using the architecture generator, a graphical interface for specifying the number of software components, their interconnection and the memory organisation. The number and architecture of the hardware components are defined during the partitioning phase.

The first step in the partitioning approach is the splitting phase. The original description is transformed into a set of concurrent simple processes. This transformation is carried out by applying a set of formal rewriting rules which ensure that the semantics is preserved [63]. Due to the concurrent nature of the processes, communication must be introduced among sequential, data-dependent processes. Splitting the original description into simple processes allows a better exploration of the design space in order to find the best partitioning.

In the classification phase, a set of implementation alternatives is generated. The set of implementation alternatives is represented by a set of class values concerning some features of the program, such as the degree of parallelism and pipeline implementation. The choice of an implementation alternative as the current one can be made either manually or automatically. When the choice is automatic, the alternatives are selected so as to balance the degree of parallelism among the various statements and to minimize the area-delay cost function.

Next, the occam/timed Petri net translation takes place. In this phase the split program is translated into a timed Petri net.

In the qualitative analysis phase, reduction rules, the matrix algebra approach and reachability/coverability methods are used cooperatively on the Petri net model in order to analyze the properties of the description. This is an important aspect of the approach, because it allows errors to be detected in an early phase of the design and, consequently, decreases design costs [33].

After that, cost analysis takes place in order to find a partition of the set of processes that fulfills the design constraints. In this work, quantitative analysis is carried out on the timed Petri net representation of the design description, which is obtained by an occam/timed Petri net translation tool. Using a powerful formal model such as Petri nets as the intermediate format allows metrics to be computed formally and a more accurate cost estimation to be performed. Additionally, it makes the computation of metrics independent of any particular specification language.
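The automatic choice made in the classification phase can be framed as a small optimization problem. The code below is an illustrative formulation and does not reproduce the paper's cost function. Each process (hypothetical names P1 and P2) has alternatives described by a parallelism degree, an area and a delay. The search picks one alternative per process, minimizing the sum of the area × delay products plus `balance_weight` times the spread of the chosen parallelism degrees, which stands in for "a balanced degree of parallelism". All numbers are invented, and exhaustive search is only practical for a handful of processes.

```python
from itertools import product

# Hypothetical alternatives per process: (label, parallelism degree, area, delay).
ALTERNATIVES = {
    "P1": [("parallel", 4, 40, 1), ("partial", 2, 22, 2), ("sequential", 1, 12, 4)],
    "P2": [("parallel", 4, 60, 1), ("partial", 2, 19, 2), ("sequential", 1, 10, 4)],
}

def choose(alternatives, balance_weight):
    """Exhaustively pick one alternative per process, minimising the summed
    area*delay products plus a penalty on the spread of parallelism degrees."""
    names = sorted(alternatives)
    best, best_cost = None, float("inf")
    for combo in product(*(alternatives[n] for n in names)):
        area_delay = sum(area * delay for _, _, area, delay in combo)
        degrees = [deg for _, deg, _, _ in combo]
        cost = area_delay + balance_weight * (max(degrees) - min(degrees))
        if cost < best_cost:
            best, best_cost = dict(zip(names, (c[0] for c in combo))), cost
    return best, best_cost

print(choose(ALTERNATIVES, 0))  # ({'P1': 'parallel', 'P2': 'partial'}, 78)
print(choose(ALTERNATIVES, 3))  # ({'P1': 'partial', 'P2': 'partial'}, 82)
```

With no balance penalty, the cheapest area-delay mix wins. A penalty of 3 per unit of spread makes the search prefer the pair of partially parallel implementations.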
In the quantitative analysis, a set of methods for performance, area, load balance and mutual exclusion estimation has been developed in the context of this work. The results of this analysis are used to reason about alternatives for implementing each process.

After this, the initial allocation takes place. The term initial allocation describes the initial assignment of one process to each processor. The criteria used to perform the initial allocation are the minimization of inter-processor communication, the balancing of the workload, and the mutual exclusion degree among processes. One of the main problems in systems with multiple processors is the degradation in throughput caused by saturation effects [68, 66]. In distributed and multiple-processor systems [65, 67], doubling the number of processors would be expected to double the throughput of the system. However, experience has shown that throughput increases significantly only for the first few additional processors. In fact, at some point throughput begins to decrease with each additional processor. This decrease is mainly caused by excessive inter-processor communication. The initial allocation method is based on a clustering algorithm.

In the clustering phase, the processes are grouped into clusters considering one implementation alternative for each process. The result of the clustering process is an occam description representing the obtained clustering sequence, with additional information indicating whether processes kept in the same cluster should be serialized or not. Serialization eliminates the unnecessary communication introduced during the splitting phase.

A correct partitioned description is obtained after the joining phase, where transformation rules are applied again in order to combine processes kept in the same cluster and to eliminate communication among sequential processes. Each phase of the partitioning process is explained in more detail below.

4.2. The Splitting Strategy

As mentioned, verifying the partitioning involves characterizing the partitioning process as a program transformation task. This comprises the formalization of the splitting and joining phases already mentioned.

The goal of the splitting phase is to permit a flexible analysis of the possibilities for combining processes in the clustering phase. During the splitting phase, the original description is transformed into a set of simple processes, all of them in parallel. Since PAR is a commutative operator in occam, combining processes in parallel allows an analysis of all the permutations. The simple processes obey the following normal form:

    CHAN ch1, ch2, ..., chn: PAR(P1, P2, ..., Pk)

where each Pi, 1 ≤ i ≤ k, is a simple process.

Definition 4.1 (Simple Process). A process P is simple if it is primitive (SKIP, STOP, x := e, ch ? x, ch ! e), or has one of the following forms:
(i) ALT(bk & gk ck := true);
(ii) IF(bk ck := true);
(iii) BOX Q, HBOX Q and SBOX Q, where Q is an arbitrary process;
(iv) IF(c Q, TRUE SKIP), where Q is primitive or a process as in (i), (ii) or (iii);
(v) …, where Q is simple and … are communication commands, possibly combined in sequence or in parallel;
(vi) WHILE c Q, where Q is simple;
(vii) VAR x: Q, where Q is simple;
(viii) CON Q, where Q is simple.

This normal form expresses the granularity required by the clustering phase, and this transformation permits a flexible analysis of the possibilities for combining processes in that phase. A complete formalization of the splitting phase can be found in [63].

To perform each of the splitting and joining phases, a reduction strategy is necessary. The splitting strategy developed during the PISH project has two main steps:

1. the simplification of IF and ALT processes;
2. the parallelisation of the intermediate description generated by Step 1.

The goal of Step 1 is to transform the original program into one in which all IFs and ALTs are simple processes. To accomplish this, occam laws have been applied. Two of these rules can be seen in Figure 3. Rule 4.1.1 deals with conditionals. It transforms an arbitrary conditional into a sequence of IFs, with the aim of achieving the granularity required by the normal form. Applying this rule allows a flexible analysis of each sub-process of a conditional in the clustering phase.

Note that the role of the first IF operator on the right-hand side of Rule 4.1.1 is to make the choice and to allow the subsequent conditionals to be carried out in sequence. This is why the fresh variables are necessary: otherwise, the execution of one conditional could interfere with the condition of a subsequent one.

Figure 3. Two splitting rules.

After Step 1, all IF and ALT processes are simple, but the PAR, SEQ and WHILE processes are not necessarily simple. To be simple, the sub-process of a conditional must either be a primitive process or include only ALT, IF and BOX processes. Rule 4.1.2 distributes IF over SEQ and guarantees that no IF will include a SEQ operator as its internal process.

Figure 4. Splitting rule.
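Figure 3 is not reproduced here, so what follows is only our reading of Rule 4.1.1. We reconstructed it from the text above and from forms (ii) and (iv) of Definition 4.1: IF(b1 P1, ..., bn Pn) becomes a sequence that first records the chosen branch in fresh flags c1, ..., cn and then runs IF(ck Pk, TRUE SKIP) for each k. Initialising the flags to false is our assumption. The Python encoding (tuples for processes, functions for conditions) and the names `run` and `split_if` are hypothetical. The example conditional shows why the fresh variables matter: its first branch sets x to -1. Without the flags, running IF(x > 0 ..., TRUE SKIP) followed by IF(x <= 0 ..., TRUE SKIP) from x = 3 would therefore execute both branches.

```python
# Processes as tuples: ("skip",), ("assign", var, fn), ("seq", [procs]),
# ("if", [(cond, proc), ...]); fn and cond are functions of the state dict.
def run(p, s):
    """Tiny interpreter for the fragment above; returns the final state."""
    kind = p[0]
    if kind == "skip":
        return s
    if kind == "assign":
        return {**s, p[1]: p[2](s)}
    if kind == "seq":
        for q in p[1]:
            s = run(q, s)
        return s
    if kind == "if":
        for cond, q in p[1]:
            if cond(s):
                return run(q, s)
        raise RuntimeError("no condition holds: IF behaves like STOP")
    raise ValueError(f"unknown process {kind!r}")

def split_if(p, prefix="c"):
    """IF(b1 P1,...,bn Pn) -> SEQ(ck := false..., IF(bk ck := true),
    IF(c1 P1, TRUE SKIP), ..., IF(cn Pn, TRUE SKIP))."""
    branches = p[1]
    flags = [f"{prefix}{k}" for k in range(1, len(branches) + 1)]
    init = [("assign", f, lambda s: False) for f in flags]
    choice = ("if", [(b, ("assign", f, lambda s: True))
                     for (b, _), f in zip(branches, flags)])
    guarded = [("if", [(lambda s, f=f: s[f], q), (lambda s: True, ("skip",))])
               for (_, q), f in zip(branches, flags)]
    return ("seq", init + [choice] + guarded)

# Hypothetical conditional whose first branch falsifies its own guard.
original = ("if", [(lambda s: s["x"] > 0, ("assign", "x", lambda s: -1)),
                   (lambda s: s["x"] <= 0, ("assign", "y", lambda s: 1))])
for x in (3, 0):
    before = {"x": x, "y": 0}
    after_split = run(split_if(original), before)
    print(run(original, before), {v: after_split[v] for v in ("x", "y")})
```

For both initial values, the split program leaves x and y exactly as the original conditional does. The fresh flags remain in the state only because this toy interpreter has no VAR scoping.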
The goal of Step 2 is to transform the intermediate description generated by Step 1 into the simple form stated by Definition 4.1, in which all processes are kept in parallel. This can be carried out by using four additional rules. Rule 2 is an example of these rules; it puts two originally sequential processes in parallel.

In this rule, z is a list of local variables of SEQ(P1, P2), x1 is a list of variables used and assigned in Pi, and xi is a list of variables assigned in Pi. Assigned variables must be passed because one of the Pi may contain a conditional command including an assignment that may or may not be executed.

The process annotated with the operator CON is a controlling process. This operator has no semantic effect and is useful for the clustering phase. The controlling process acts as an interface between the processes under its control and the rest of the program. Observe that the introduction of local variables and communication is essential, because processes can originally share variables, whereas occam does not allow variable sharing between parallel processes. Obviously, there are other ways of introducing communication than the one expressed in Rule 2. This strategy, however, keeps the introduced communication under control, which makes the joining phase easier.

Moreover, although introducing communication into the system may seem expensive, it is necessary to allow a complete parallelisation of simple processes. The joining phase must be able to eliminate unnecessary communication between the final processes. After Step 2, the original program has been transformed into a program obeying the normal form.

4.3. The Partitioning Algorithm

The technique for hardware/software process partitioning is based on the approach proposed by Barros [14, 15], which includes a classification phase followed by clustering of processes. These phases are explained in more detail in this section.
The considered versionof such an approach did not cope with hierarchy in the initial specification, and only fixedsize loops had been taken into account in the cost analysis. Additionally, the underlyingarchitecture template considered only one software component (i.e. only one microprocessoror microcontroller), which limits the design space exploration. This work addresses someof these lacks by using Petri nets as an intermediate format, which support abstractionconcepts and provide a framework for an accurate estimation of communication, area anddelay cost, as well as load balance and mutual exclusion between processes. Additionally,the partitioning can taken into account more complexes architectures, which can be specifiedby the designer through a graphical environment. The occam/Petri net translation methodand the techniques for a quantitative analysis will be later explained, respectively.4.3.1. ClassificationIn this phase a set of implementation possibilities represented as values of predefinedattributes is attached to each process. The attributes were defined in order to capture thekind of communication performed by the process, the degree of parallelism inside the
process (PAR and REPPAR constructors), whether the assignments in the process could be performed in pipeline (in the case of the REPSEQ constructor), and the multiplicity of each process. As an example, Figure 5a illustrates some implementation alternatives for a process that has a REPPAR constructor. Concerning distinct degrees of parallelism, all assignments inside this process can be implemented completely in parallel, partially in parallel or completely sequentially.

Figure 5. Classification and clustering phases.

Although a set of implementation possibilities is attached to each process, only one is taken into account during the clustering phase, and the choice can be made automatically or manually. In the case of an automatic choice, the main goal is to balance the degree of parallelism of all implementations.

4.3.2. The Clustering Algorithm

Once a current implementation alternative has been chosen for each process, the clustering process takes place. The first step is the choice of the processes to be implemented in the software components. This phase is called initial allocation [74] and may be controlled by the user or automatically guided by the minimization of a cost function. Taking into account the Petri net model, a set of metrics is computed and used for allocating
one process to each available software component. The allocation is a very critical task, since the partitioning of processes into software or hardware clusters depends on this initial allocation. The developed allocation method is also based on clustering techniques and groups processes into clusters by considering criteria such as communication cost [69, 70], functional similarity, mutual exclusion degree [73] and load balancing [72]. The number of resulting clusters is equal to the number of software components in the architecture. From each cluster obtained, one process is chosen to be implemented in each software component. In order to implement this strategy, techniques for calculating the degree of mutual exclusion between processes, the workload of processors, the communication cost and the functional similarity of processes have been defined and implemented.

The partitioning of processes has been performed using a multi-stage hierarchical clustering algorithm [74, 15]. First, a clustering tree is built as depicted in Figure 5.b. This is performed by considering criteria like similarity among processes, communication cost, mutual exclusion degree and resource sharing. In order to measure the similarity between processes, a metric has been defined which quantifies the similarity between two processes by analyzing the communication cost, the degree of parallelism of their current implementation, the possibility of implementing both processes in pipeline and the multiplicity of their assignments [15]. The cluster set is generated by cutting the clustering tree at the level (see Figure 5.b) that minimizes a cost function. This cost function considers the communication cost as well as area and delay estimations [75, 72, 61, 59, 15].

A basic clustering algorithm is described below. First, a clustering tree is built based on a distance matrix.
This matrix provides the distance between each pair of objects (rows and columns are indexed by $p_1, p_2, p_3, p_4$; only the upper triangle is shown, since the distance is symmetric):

$$D = \begin{pmatrix} 0 & d_1 & d_2 & d_3 \\ & 0 & d_4 & d_5 \\ & & 0 & d_6 \\ & & & 0 \end{pmatrix}$$

Algorithm

1. Begin with a disjoint clustering having level $L(0) = 0$ and sequence number $m = 0$.
2. Find the least dissimilar pair $(r, s)$ of clusters in the current clustering, according to $d(r, s) = \min\{d(o_i, o_j)\}$.
3. Increment the sequence number ($m \leftarrow m + 1$). Merge the clusters $r$ and $s$ into a single cluster in order to form the next clustering $m$. Set the level of this clustering to $L(m) = d(r, s)$.
4. Update the distance matrix by deleting the rows and columns corresponding to the clusters $r$ and $s$ and adding a row and column for the new cluster. The distance between the new cluster $(r, s)$ and an old cluster $k$ may be defined as $d(k, (r, s)) = \max\{d(k, r), d(k, s)\}$, $d(k, (r, s)) = \min\{d(k, r), d(k, s)\}$ or $d(k, (r, s)) = \mathrm{average}\{d(k, r), d(k, s)\}$.
5. If all objects are in one cluster, stop. Otherwise, go to step 2.
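The five steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `agglomerate` and `cut`, the dictionary representation of the distance matrix and the example distances standing in for $d_1, \ldots, d_6$ are all hypothetical. The `linkage` parameter selects one of the three update rules of step 4 (max, min or average), and `cut` mimics a horizontal cut-line at a fixed height; the cost-function-driven choice of the cut level is not modeled.

```python
from itertools import combinations
from statistics import mean  # usable as the "average" update rule of step 4


def agglomerate(objects, dist, linkage=max):
    """Hierarchical clustering following steps 1-5 (illustrative sketch).

    objects -- labels of the objects to cluster (e.g. processes "p1".."p4")
    dist    -- {(a, b): distance}, one entry per unordered pair of objects
    linkage -- rule for d(k, (r, s)): max, min or mean of [d(k, r), d(k, s)]
    Returns the merge history as a list of (level L(m), merged cluster).
    """
    # Step 1: disjoint clustering (one object per cluster), L(0) = 0, m = 0.
    clusters = [frozenset([o]) for o in objects]
    d = {frozenset([frozenset([a]), frozenset([b])]): v for (a, b), v in dist.items()}
    history = []
    while len(clusters) > 1:  # Step 5: stop once a single cluster remains.
        # Step 2: least dissimilar pair (r, s) in the current clustering.
        r, s = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        # Step 3: merge r and s; the new clustering m gets level L(m) = d(r, s).
        merged = r | s
        history.append((d[frozenset([r, s])], merged))
        # Step 4: drop r and s, then compute distances to the merged cluster.
        clusters = [c for c in clusters if c not in (r, s)]
        for k in clusters:
            d[frozenset([k, merged])] = linkage([d[frozenset([k, r])], d[frozenset([k, s])]])
        clusters.append(merged)
    return history


def cut(history, objects, height):
    """Clusters left when the tree is cut by a horizontal line at `height`."""
    clusters = [frozenset([o]) for o in objects]
    for level, merged in history:
        if level > height:  # levels are non-decreasing for max, min and mean
            break
        clusters = [c for c in clusters if not c <= merged] + [merged]
    return clusters


if __name__ == "__main__":
    procs = ["p1", "p2", "p3", "p4"]
    # Hypothetical values for d1..d6 (upper triangle of D).
    D = {("p1", "p2"): 2, ("p1", "p3"): 6, ("p1", "p4"): 10,
         ("p2", "p3"): 5, ("p2", "p4"): 9, ("p3", "p4"): 4}
    tree = agglomerate(procs, D)  # complete linkage (max rule)
    print([level for level, _ in tree])                     # [2, 4, 10]
    print([level for level, _ in agglomerate(procs, D, linkage=mean)])  # [2, 4, 7.5]
    print(sorted(sorted(c) for c in cut(tree, procs, 5)))  # [['p1', 'p2'], ['p3', 'p4']]
```

With the max rule, {p1, p2} merges at level 2, {p3, p4} at level 4 and everything at level 10, so a cut-line at height 5 leaves two clusters. Frozensets are used as cluster identifiers so that merged clusters can serve directly as keys of the distance table.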
After the construction of the clustering tree, this tree is cut by a cut-line. The cut-line forms clusters according to a cut-function.

In order to allow the formal generation of a partitioned occam description in the joining phase, the clustering output is an occam description with annotations, which reflects the structure of the clustering tree and indicates whether resources should be shared or not.

4.4. The Joining Strategy

Based on the result of the clustering and on the description obtained in the splitting, the joining phase generates a final description of the partitioned program in the form:

$$\text{chan } ch_1, ch_2, \ldots, ch_n : \text{PAR}(S_1, \ldots, S_s, H_1, \ldots, H_r)$$

where each $S_i$ and $H_j$ is a generated cluster. Each of these is either one simple process ($P_k$) of the split phase or a combination of some $P_k$'s. Also, none of the $P_k$'s is left out: each one is included in exactly one cluster. In this way, the precise mathematical notion of partition is captured. By convention, each $S_i$ stands for a software process and each $H_j$ for a hardware process.

Like the splitting phase, the joining phase should be designed as a set of transformation rules. Some of these rules are simply the inverses of the ones used in the splitting phase, but new rules for eliminating unnecessary communication between processes of the same cluster have been implemented [63]. The goal of the joining strategy is to eliminate irrelevant communication within the same cluster and afterwards to join branches of conditionals and ALT processes, as well as to reduce sequences of parallel and sequential processes.

The joining phase is inherently more difficult than splitting. The combination of simple processes results in a process which is no longer simple. Therefore, no obvious normal form is obtained to characterize the result of the joining in general.
Also, one must take great care not to introduce deadlock when serializing processes in a given cluster. Moreover, in this phase some sequential processes are carried out in parallel by applying transformation rules.

5. A Brief Introduction to Petri Nets

Petri nets are a powerful family of formal description techniques with a graphical and mathematical representation, and they have powerful methods that allow for performing qualitative and quantitative analysis [37, 28, 29, 64]. This section presents a brief introduction to Place/Transition nets and Timed Petri nets.

5.1. Place/Transition Nets

Place/Transition nets are bipartite graphs with two types of vertices, called places (circles) and transitions (rectangles), interconnected by directed arcs (see Figure 6).
Figure 6. Simple Petri net.

Place/Transition nets can be defined in terms of matrices as follows:

Definition 5.1 (Place/Transition Net). A Place/Transition net is a 5-tuple $N = (P, T, I, O, M_0)$, where $P$ is a finite set of places, which represents the state variables; $T$ is a set of transitions, which represents the set of actions; $I: P \times T \to \mathbb{N}$ is the input matrix, which represents the pre-conditions; $O: P \times T \to \mathbb{N}$ is the output matrix, which represents the post-conditions; and $M_0: P \to \mathbb{N}$ is the initial marking, which represents the initial state.

Definition 5.2 (Firing Rule). A transition $t_j$ is enabled to fire if, and only if, all its input places $p \in P$ satisfy $M(p) \ge I(p, t_j)$. The firing of the transition changes the marking of the net ($M_0[t_j\rangle M'$). The new marking is obtained as follows:

$$M'(p) = M_0(p) - I(p, t_j) + O(p, t_j), \quad \forall p \in P.$$

The execution of actions is represented by transition firing. However, there are two particular cases where the firing rule is different. The first case is represented by source transitions: a source transition does not have any input places, so it is always enabled. In the second case, the transition does not have any output places; such a transition is called a sink transition, and its firing does not create any tokens.

Using the matrix representation, the structure of the net is represented by a set of places, a set of transitions, an input matrix (pre-conditions) and an output matrix (post-conditions). When a transition $t$ fires, the difference between the markings is given by $O(p, t) - I(p, t), \forall p \in P$. The matrix $C = O - I$ is called the incidence matrix. This matrix represents the structure of the net, provided the net does not have self-loops.

Definition 5.3 (Incidence Matrix). Let $N = (P, T, I, O)$ be a net. The incidence matrix represents the relation $C: P \times T \to \mathbb{Z}$ defined by $C(p, t) = O(p, t) - I(p, t), \forall p \in P$.
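Definitions 5.1 to 5.3 translate almost directly into code. The sketch below uses plain Python lists of lists (rows are places, columns are transitions) to implement the enabling test, the firing rule and the incidence matrix for a small hypothetical net; the function names and the net itself are illustrative assumptions. The net deliberately gives $t_2$ a self-loop on $p_2$, which shows why $C$ alone cannot describe nets with self-loops: the column of $t_2$ in $C$ is all zeros even though $t_2$ needs a token in $p_2$ to fire.

```python
def enabled(M, I, t):
    """Definition 5.2: t is enabled iff M(p) >= I(p, t) for every place p.

    A source transition (all-zero column in I) is therefore always enabled.
    """
    return all(M[p] >= I[p][t] for p in range(len(M)))


def fire(M, I, O, t):
    """Fire t and return the new marking M'(p) = M(p) - I(p, t) + O(p, t)."""
    if not enabled(M, I, t):
        raise ValueError(f"t{t} is not enabled at marking {M}")
    return [M[p] - I[p][t] + O[p][t] for p in range(len(M))]


def incidence(I, O):
    """Definition 5.3: C(p, t) = O(p, t) - I(p, t)."""
    return [[o - i for i, o in zip(row_i, row_o)] for row_i, row_o in zip(I, O)]


# Hypothetical net: t0 moves a token p0 -> p1, t1 moves it back and adds one to p2,
# and t2 has a self-loop on p2 (it needs a token in p2 and puts it straight back).
#       t0 t1 t2
I = [[1, 0, 0],   # p0
     [0, 1, 0],   # p1
     [0, 0, 1]]   # p2
O = [[0, 1, 0],   # p0
     [1, 0, 0],   # p1
     [0, 1, 1]]   # p2
M0 = [1, 0, 0]

if __name__ == "__main__":
    print(incidence(I, O))        # [[-1, 1, 0], [1, -1, 0], [0, 1, 0]]
    M1 = fire(M0, I, O, 0)        # [0, 1, 0]
    print(enabled(M1, I, 2))      # False: p2 is empty
    M2 = fire(M1, I, O, 1)        # [1, 0, 1]
    print(fire(M2, I, O, 2))      # [1, 0, 1]: unchanged, as the zero column of C suggests
```

Keeping $I$ and $O$ separate, rather than storing only $C$, is what allows the enabling test to see the self-loop.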
A net that has self-loops may be represented by the incidence matrix if each self-loop is refined using a dummy pair [37].

The state equation describes the behavior of the system and also allows for the analysis of properties of the models. Using the matrix representation, the transition $t_j$ is represented by
a vector $s(t_j)$ whose components are all zero except for the $j$-th component, which is one. It is therefore possible to represent the new marking either as

$$M'(p) = M_0(p) - I(p, t_j) + O(p, t_j), \quad \forall p \in P,$$

or, in matrix form, as

$$M' = M_0 - I \cdot s(t_j)^T + O \cdot s(t_j)^T = M_0 + C \cdot s(t_j)^T.$$

Applying the sequence $sq = t_0, \ldots, t_k, \ldots, t_0, \ldots, t_n$ of transitions to the equation above, the following equation is obtained:

$$M' = M_0 + C \cdot \bar{s},$$

where $\bar{s} = \sum_{t \in sq} s(t)^T$ is called the Parikh vector; its $j$-th component counts the occurrences of $t_j$ in $sq$.

Place/Transition nets can be divided into several classes [28]. Each class has distinct modeling power. The use of subclasses improves the decision power of the models without excessively decreasing their modeling power.

5.2. Analysis

The methods used to analyze Petri nets may be divided into three classes. The first method is graph-based and builds on the reachability graph (reachability tree). The reachability graph depends on the initial marking and is therefore used to analyse behavioral properties [39]. The main problem in the use of the reachability tree is its high computational complexity [43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55], even if some interesting techniques are used, such as reduced graphs, graph symmetries and symbolic graphs [35, 36]. The second method is based on the state equation [34, 37, 84, 86]. The main advantage of this method over the reachability graph is the existence of simple linear algebraic equations that aid in determining properties of the nets. Nevertheless, when applied to general Petri nets, it gives only necessary or sufficient conditions for the analysis of properties. The third method is based on reduction laws [37, 38]. The reduction-law-based method provides a set of transformation rules that reduce the size of the models while preserving their properties. However, for a given system and a given set of rules, the reduction may not be complete.
From a pragmatic point of view, it is fair to suggest that a better, more efficient and more comprehensive analysis can be achieved by a cooperative use of these techniques. Moreover, for some subclasses of Petri nets, necessary and sufficient conditions can be obtained by applying matrix algebra [33].
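As a rough illustration of combining the graph-based and state-equation methods, the sketch below explores the reachability graph of a small bounded net breadth-first, derives place bounds, dead markings and reversibility from it, and evaluates the state equation $M' = M_0 + C \cdot \bar{s}$ for a given Parikh vector. The example is a hypothetical closed fork/join net ($t_3$ is assumed to return the token to $p_0$), the function names are illustrative, and `max_states` is an arbitrary safeguard because the reachability graph of an unbounded net is infinite. Note that a marking satisfying the state equation is not necessarily reachable; the equation alone gives only a necessary condition.

```python
from collections import deque


def reachability_graph(M0, I, O, max_states=10_000):
    """Explore reachable markings breadth-first; returns {marking: [(t, successor)]}."""
    n_p, n_t = len(I), len(I[0])
    start = tuple(M0)
    graph, seen, queue = {}, {start}, deque([start])
    while queue:
        M = queue.popleft()
        graph[M] = []
        for t in range(n_t):
            if all(M[p] >= I[p][t] for p in range(n_p)):  # firing rule
                M2 = tuple(M[p] - I[p][t] + O[p][t] for p in range(n_p))
                graph[M].append((t, M2))
                if M2 not in seen:
                    if len(seen) >= max_states:
                        raise RuntimeError("state limit reached: the net may be unbounded")
                    seen.add(M2)
                    queue.append(M2)
    return graph


def place_bounds(graph):
    """Largest token count of each place over all reachable markings."""
    return [max(column) for column in zip(*graph)]


def dead_markings(graph):
    """Reachable markings in which no transition is enabled (deadlocks)."""
    return [M for M, successors in graph.items() if not successors]


def is_reversible(graph, M0):
    """True iff M0 can be reached again from every reachable marking."""
    predecessors = {M: [] for M in graph}
    for M, successors in graph.items():
        for _, M2 in successors:
            predecessors[M2].append(M)
    start = tuple(M0)
    back, queue = {start}, deque([start])
    while queue:
        for prev in predecessors[queue.popleft()]:
            if prev not in back:
                back.add(prev)
                queue.append(prev)
    return len(back) == len(graph)


def state_equation(M0, C, parikh):
    """M' = M0 + C . s, where s is the Parikh vector (firing count per transition)."""
    return [M0[p] + sum(C[p][t] * parikh[t] for t in range(len(parikh)))
            for p in range(len(M0))]


# Hypothetical closed fork/join net: t0 forks p0 into p1, p2; t1: p1 -> p3;
# t2: p2 -> p4; t3 joins p3, p4 and (by assumption) returns the token to p0.
#       t0 t1 t2 t3
I = [[1, 0, 0, 0],  # p0
     [0, 1, 0, 0],  # p1
     [0, 0, 1, 0],  # p2
     [0, 0, 0, 1],  # p3
     [0, 0, 0, 1]]  # p4
O = [[0, 0, 0, 1],
     [1, 0, 0, 0],
     [1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 1, 0]]
M0 = [1, 0, 0, 0, 0]
C = [[o - i for i, o in zip(ri, ro)] for ri, ro in zip(I, O)]

if __name__ == "__main__":
    g = reachability_graph(M0, I, O)
    # 5 markings; every place bounded by 1 (safe); no dead marking; reversible.
    print(len(g), place_bounds(g), dead_markings(g), is_reversible(g, M0))
    print(state_equation(M0, C, [1, 1, 0, 0]))  # [0, 0, 1, 1, 0], reached by t0 t1
```

The exhaustive search only terminates for bounded nets, which is exactly the limitation of the graph-based method that motivates the structural techniques.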
5.3. Timed Petri Nets

So far, Petri nets have been used to model the logical point of view of systems; no formal attention has been given to temporal relations and constraints [71, 78, 76]. The first temporal approach was proposed by Ramchandani [57]. Today, there are at least three approaches that associate time with nets: stochastic, firing times specified by intervals, and deterministic. In stochastic nets, a probability distribution of the firing time is assigned to the model [80, 79].

Time can be associated with places (Place-Timed Models) [83], tokens (Token-Timed Models) [82] and transitions (Transition-Timed Petri Models). The approach proposed in [71] associates a time interval $(d_{min}, d_{max})$ with each transition (Transition-Time Petri Nets).

In deterministic timed nets, transition firing can also be represented by two policies: the first is the three-phase firing semantics and the second is the atomic firing semantics. In the deterministic timed net with three-phase firing semantics, the time information represents the duration of the transition's firing [57]. The deterministic timed net with atomic firing may be represented by the approach proposed in [71], where the time interval bounds are the same, $(d_i, d_i)$ (Transition-Time Petri Nets).

This section deals with Timed Petri Nets, that is, Petri net extensions in which the time information is expressed by a duration (deterministic timed net with three-phase firing semantics) and is associated with the transitions.

Definition 5.4 (Timed Petri Nets). A timed Petri net is a pair $N_t = (N, D)$, where $N = (P, T, I, O, M_0)$ is a Petri net and $D: T \to \mathbb{R}^+ \cup \{0\}$ is a vector which associates with each transition $t_i$ its firing duration $d_i$.
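The three-phase semantics, in which tokens are removed when a firing starts and produced once its duration $d_i$ has elapsed, can be illustrated with a small discrete-event simulation. This is a sketch under stated assumptions rather than a definitive implementation: enabled transitions are started greedily in index order as a simple conflict-resolution policy, every transition is assumed to have at least one input place, and the fork/join net and its durations are hypothetical. A transition with enough tokens for several firings is started several times at the same instant, so concurrent firings of the same transition overlap in time.

```python
import heapq


def simulate(M0, I, O, D, horizon=float("inf")):
    """Deterministic timed net with three-phase firing (illustrative sketch).

    Phase 1: when a firing starts at time x, its input tokens are removed.
    Phase 2: the firing is in progress for D[t] time units.
    Phase 3: at x + D[t] its output tokens are added.
    Assumptions: enabled transitions start greedily in index order, every
    transition has an input place, and no circuit has zero total duration.
    Stops before completions later than `horizon`. Returns (marking, time, starts).
    """
    M = list(M0)
    n_p, n_t = len(I), len(I[0])
    pending = []            # heap of (completion time, tie-breaker, transition)
    now, starts = 0.0, []
    while True:
        progress = True
        while progress:     # start every firing possible at the current instant
            progress = False
            for t in range(n_t):
                if all(M[p] >= I[p][t] for p in range(n_p)):
                    for p in range(n_p):
                        M[p] -= I[p][t]                          # phase 1
                    heapq.heappush(pending, (now + D[t], len(starts), t))
                    starts.append((now, t))
                    progress = True
        if not pending:
            return M, now, starts
        now = pending[0][0]                                     # phase 2
        if now > horizon:
            return M, now, starts
        while pending and pending[0][0] == now:                 # phase 3
            _, _, t = heapq.heappop(pending)
            for p in range(n_p):
                M[p] += O[p][t]


# Hypothetical fork/join net, durations d(t0)=0, d(t1)=3, d(t2)=5, d(t3)=1;
# t3 deposits its token in a final place p5.
#       t0 t1 t2 t3
I = [[1, 0, 0, 0],  # p0
     [0, 1, 0, 0],  # p1
     [0, 0, 1, 0],  # p2
     [0, 0, 0, 1],  # p3
     [0, 0, 0, 1],  # p4
     [0, 0, 0, 0]]  # p5
O = [[0, 0, 0, 0],
     [1, 0, 0, 0],
     [1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1]]

if __name__ == "__main__":
    M, end, starts = simulate([1, 0, 0, 0, 0, 0], I, O, [0, 3, 5, 1])
    print(M, end)    # [0, 0, 0, 0, 0, 1] 6.0
    print(starts)    # [(0.0, 0), (0.0, 1), (0.0, 2), (5.0, 3)]
```

In the example, the join transition $t_3$ can only start at time 5, when the slower branch finishes, so the run ends at $\max(3, 5) + 1 = 6$.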
A state in a Timed Petri Net is defined as a 3-tuple of functions: the first is the marking of the places, the second is the distribution of tokens in the firing transitions, and the last is the remaining firing time [77]. For instance, if the number of tokens in a firing transition $t_i$ is $mt(t_i) = k$, then the remaining firing time is represented by a vector $RT(t_i) = (rt(t_i)_1, \ldots, rt(t_i)_k)$. More formally:

Definition 5.5 (State of a Timed Petri Net). Let $N_t = (N, D)$ be a Timed Petri Net. A state $S$ of $N_t$ is defined by a 3-tuple $S = (M, MT, RT)$, where $M: P \to \mathbb{N}$ is the marking, $MT: T \to \mathbb{N}$ is the distribution of tokens in the firing transitions, and $RT: K \to \mathbb{R}_+^{T}$ is the remaining firing time function, which assigns the remaining firing time to each independent firing of a transition, for each transition with $mt(t) \neq 0$. $K$ is the number of tokens in a firing transition $t_i$ ($mt(t_i) = K$). $RT$ is undefined for the transitions $t_i$ with $mt(t_i) = 0$.

A transition $t_i$ obtaining concession at time $x$ is obliged to fire at that time if there is no conflict.

Definition 5.6 (Enabling Rule). Let $N_t = (N, D)$ be a Timed Petri Net and $S = (M, MT, RT)$ the state of $N_t$. We say that a bag of transitions $BT$ is enabled at the instant
$x$ if, and only if,

$$M(p) \ge \sum_{t_i \in BT} \#BT(t_i) \times I(p, t_i), \quad \forall p \in P,$$

where $BT(t_i) \subseteq BT$ and $\#BT(t_i) \in \mathbb{N}$.

If a bag of transitions $BT$ is enabled at the instant $x$, then at that instant it removes the corresponding number of tokens ($\#BT(t_i) \times I(p, t_i)$) from the input places of the bag $BT$, and at time $x + d_i$ ($d_i$ is the duration related to the transition $t_i$ in the bag $BT$) it adds the respective number of tokens to the output places. The number of tokens stored in an output place is equal to the product of the output arc weight ($O(p, t)$) and the multiplicity in the bag of each transition $t$ whose duration equals $d_i$ ($\#BT(t)$, $BT(t) \subseteq BT$), plus the number of tokens already "inside" the firing transitions whose durations have elapsed during the present bag firing.

Definition 5.7 (Firing Rule). Let $N_t = (N, D)$ be a Timed Petri Net and $S = (M, MT, RT)$ the state of $N_t$. If a bag of transitions $BT$ is enabled at the instant $x$, then at the instant $x$ the number $\sum_{t_i \in BT} \#BT(t_i) \times I(p, t_i)$ of tokens is removed from each input place $p \in P$ of the bag $BT$. At time $x + d$, the reached marking is

$$M'(p) = M(p) - \sum_{t_i \in BT} \#BT(t_i) \times I(p, t_i) + \sum_{t_i \in BT \mid d_i = d} \#BT(t_i) \times O(p, t_i) + \sum_{t_j \in T,\ MT(t_j) > 0} MT(t_j) \times O(p, t_j), \quad \forall p \in P,$$

if no other transition $t \in T$ has been fired in the interval $(x, x + d)$.

The inclusion of time in Petri net models allows for performance analysis of systems. Section 8 introduces performance analysis in a Petri net context, paying special attention to structural approaches for deterministic models.

6. The Occam-Petri Net Translation Method

The development of a Petri net model of occam opens a wide range of applications based on qualitative analysis methods. To analyze performance aspects, one requires a timed net model that represents the occam language.

In our context, an occam program is a behavioral description which has to be implemented by combining software and hardware components.
Thus, the timing of each operation of the description is already known, for either a hardware or a software implementation.

This section presents a timed Petri net model of the occam language. This model allows the execution time of the activities to be computed using the methods described in Sections 10 and 8.

The occam-Petri net translation method [61] receives an occam description and translates it into a timed Petri net model. The occam programming language is derived from CSP [60] and allows the specification of parallel algorithms as a set of concurrent processes. An occam program is constructed by using primitive processes and process combiners, as described in Section 3.

The simplest occam processes are the assignment, the input action, the output action, the skip process and the stop process. Due to space restrictions, only one primitive
process and one combiner are described here. A more detailed description can be found in [59, 61, 64].

Figure 7. Communication.

6.1. Primitive Communication Process: Input and Output

Occam processes can send and receive messages synchronously through channels by using input (?) and output (!) operations. When a process receives a value through a channel, this value is assigned to a variable.

Figure 7.b gives a net representing the input and output primitive processes of the example in Figure 7.a. The synchronous communication is correctly represented by the net. It should be observed that the communication action is represented by the transition $t_0$, which only fires when both the sender and the receiver processes are ready, as represented by tokens in the places $p_0$ and $p_1$. When a value is sent by an output primitive process through $ch_1$, it is received and assigned to the variable $x$, which is represented in the net by the data part of the model. Observe that the transition $t_1$ can only be fired when the places $p_2$ and $p_3$ have tokens. After that, both processes become enabled to execute their next actions.
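To illustrate the rendezvous behaviour, the following sketch encodes a hypothetical reconstruction of the control part of Figure 7.b. Only the enabling conditions of $t_0$ (places $p_0$ and $p_1$) and $t_1$ (places $p_2$ and $p_3$) come from the text; the output places of both transitions and the roles of $p_2$ to $p_5$ are assumptions. Pre- and post-sets are stored as dictionaries, a sparse alternative to the $I$ and $O$ matrices.

```python
# Hypothetical reconstruction of the control part of Figure 7.b (place roles assumed):
# p0 = sender ready to output on ch1, p1 = receiver ready to input from ch1,
# p2, p3 = marked by the communication t0 and required by t1,
# p4, p5 = sender and receiver ready for their next actions.
PRE = {"t0": {"p0": 1, "p1": 1}, "t1": {"p2": 1, "p3": 1}}
POST = {"t0": {"p2": 1, "p3": 1}, "t1": {"p4": 1, "p5": 1}}


def enabled(marking, t):
    """A transition is enabled when every input place holds enough tokens."""
    return all(marking.get(p, 0) >= w for p, w in PRE[t].items())


def fire(marking, t):
    """Return the marking reached by firing t (the input marking is not modified)."""
    if not enabled(marking, t):
        raise ValueError(f"{t} is not enabled")
    new = dict(marking)
    for p, w in PRE[t].items():
        new[p] -= w
    for p, w in POST[t].items():
        new[p] = new.get(p, 0) + w
    return new


if __name__ == "__main__":
    print(enabled({"p0": 1}, "t0"))     # False: the receiver is not ready yet
    m = fire({"p0": 1, "p1": 1}, "t0")  # rendezvous on ch1
    m = fire(m, "t1")
    print({p: n for p, n in m.items() if n})  # {'p4': 1, 'p5': 1}
```

Because $t_0$ consumes a token from both $p_0$ and $p_1$, neither side can proceed alone, which is the synchronous behaviour of occam channels.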
Figure 8. Parallel.

6.2. Parallelism

The combiner PAR is one of the most powerful of the occam language. It permits concurrent process execution. The combined processes are executed simultaneously, and the combination only finishes when every one of them has finished.

Figure 8.a shows a program containing two processes $t_1$ and $t_2$. Figure 8.b shows a net that represents the control of this program.

One token in the place $p_0$ enables the transition $t_0$. When this transition fires, the token is removed from the input place ($p_0$) and one token is stored in each output place ($p_1$ and $p_2$). This marking enables the concurrent firings of the transitions $t_1$ and $t_2$. After these transitions fire, one token is stored in the places $p_3$ and $p_4$, respectively. This marking allows the firing of the transition $t_3$, which represents the end of the parallel execution.

7. Qualitative Analysis

This section describes the proposed method for performing qualitative analysis. The quantitative analysis of behavioral descriptions is described in the following sections.

Concurrent systems are usually difficult to manage and understand. Therefore, misunderstandings and mistakes are frequent during the design cycle [87]. At the same time, there is a need to decrease the cost and time-to-market of products. Thus, it is fair to suggest formalizing the properties of systems so that errors can be detected in an early phase of the design.
There are many properties which could be analyzed in a Petri net model [37]. Among these, some are very important in a control-system context: boundedness, safeness, liveness and reversibility. Boundedness and safeness imply the absence of capacity overflow. For instance, there may be buffers and queues, represented by places, with finite capacity. If a place is bounded, there will be no overflow: a place bounded by $n$ never stores more than $n$ tokens. Safeness is a special case of boundedness, with $n$ equal to one. Liveness is related to the absence of deadlock. If a system is live, it is deadlock-free, but a deadlock-free system is not necessarily live. Liveness guarantees that every activity of the system can always be executed again at some later point. A deadlock-free system may fail to be live when the model does not have any dead state but there exists at least one activity which is never executed. Liveness is a fundamental property in many real systems and is also very complex to analyze in a broad sense. Therefore, many authors have divided liveness into levels in order to make it easier to analyze. Another very important property is reversibility. If a system is reversible, it has a cyclic behavior: its initial state can be reached again from any reachable state.

The analysis of large nets is not a trivial problem; therefore, from a pragmatic point of view, a cooperative use of reduction rules, structural analysis and reachability/coverability methods is important.

The first step of the adopted analysis approach is the application of the 'closure' [32] by the introduction of a fictitious transition $t$, whose time duration is 0. The second step is the application of transformation rules that preserve properties such as boundedness and liveness of the models.
Such rules should be applied in order to reduce the model size. After that, matrix algebra and reachability methods can be applied in order to obtain the behavioral and structural properties of the model. The sequence of steps adopted to carry out the qualitative analysis is the following:

• Application of a 'closure' to the net by the introduction of the transition $t$;
• Application of reduction rules to the net;
• Application of structural methods; and
• Application of reachability/coverability methods.

8. Performance Analysis

The calculation of execution time is a very important issue in hardware/software codesign. It is necessary for performance optimization and for validating timing constraints [3]. Formal performance analysis methods provide upper and/or lower (worst/best case) bounds instead of a single value.

The execution time of a given program is data dependent and depends on the actual instruction trace that is executed. Branch and loop control-flow constructs result in an exponential number of possible execution paths. Thus, specific statistics must be collected by considering a sample data set in order to determine the actual branch and
loop counts. This approach, however, can be used only in probabilistic analysis. In some limited applications, it is possible to determine the conditionals and loop iteration counts by applying symbolic data-flow analysis.

Another important aspect is the cost of formal performance analysis. Performance analysis does not intend to replace simulation; instead, besides a model for simulation, an accurate model for performance analysis must be provided.

Worst/best case timing analysis may be carried out by considering path analysis techniques [19, 6]. In worst/best case analysis, the worst/best case choice is assumed for each branch and loop. An alternative method allows the designer to provide the execution count of certain statements, which helps to specify the total number of iterations of nested loops. Methods based on Max-Plus algebra have also been applied to performance evaluation, but for mean-case performance they suffer from great complexity and only work for some special classes of problems [81].

This section presents one technique for static performance analysis of behavioral descriptions based on structural Petri net methods. Static analysis refers to methods that use information collected at compilation time and may also include information collected in profiling runs [90].

Real-time systems can be classified either as hard or as soft real-time systems. Hard real-time systems cannot tolerate any missed deadline. Thus, the performance metric of interest is the extreme case, typically the worst case, although the best case may also be of interest. In a soft real-time system, an occasional missed deadline is tolerable. In this case, a probabilistic performance measure [80, 79, 89] that guarantees a high probability of meeting the time constraints suffices.

The inclusion of time in the Petri net model allows for performance analysis of systems [85].
Max-Plus algebra has been applied to Petri net models for system analysis [92], but such approaches can only be considered for very restricted Petri net subclasses [5]. One very well-known method for computing the cycle time of a Petri net is based on the timed reachability graph with a path-finder algorithm. However, the computation of the timed reachability graph is very complex, and for unbounded systems the graph is infinite. The pragmatic point of view suggests exploiting the structure of the nets (transforming or modeling the systems in terms of Petri net subclasses) and applying linear programming methods [86].

Structural methods avoid deriving the state space and thus the "state explosion" problem of reachability-based methods; nevertheless, they cannot provide as much information as the reachability approaches do. However, some performance measures, such as the minimal time of the critical path, can be obtained by applying structural methods. Another factor that has influenced us to apply structural methods is that every other metric used in our proposed method to perform the initial allocation and the partitioning is computed by using structural methods; in particular, the communication cost and mutual exclusion degree computation algorithms are based on t-invariant and p-invariant methods.

First, this section presents an algorithm to compute the cycle time of a specific deterministic timed Petri net subclass called Marked Graphs [1]. After that, an extended approach is presented in order to compute extreme and average cases of behavioral descriptions.

First of all, it is important to define two concepts: directed circuits and directed elementary circuits. A directed circuit is a directed path from one vertex (place or transition) back to
itself. A directed elementary circuit is a directed circuit in which no vertex appears more than once.

THEOREM 8.1 For a strongly connected marked graph $N$, the token count $\sum M(p_i)$ in any directed circuit remains the same after any firing sequence $s$.

THEOREM 8.2 For a marked graph $N$, the minimum cycle time is given by

$$D_m = \max_k \{T_k / N_k\}$$

for every circuit $k$ in the net, where $T_k$ is the sum of the transition delays in circuit $k$ and $N_k = \sum_{p_i \in k} M(p_i)$ is the number of tokens in the circuit.

To compute all directed circuits in the net, we first obtain all minimum p-invariants and then use them to compute the directed circuits.

Figure 9. Timed marked graph.

Figure 9 shows a marked graph, adopted from [1], to which we apply the method described previously. The reader should observe that each place has exactly one input and one output transition.

Applying well-known methods to compute minimum p-invariants [37, 84], the following supports are obtained: $sp_1 = \{p_0, p_2, p_4, p_6\}$, $sp_2 = \{p_0, p_3, p_5, p_6\}$, $sp_3 = \{p_1, p_2, p_4\}$ and $sp_4 = \{p_1, p_3, p_5\}$. After obtaining the minimum p-invariants, it is easy to compute the directed circuits: $c_1 = \{p_0, t_1, p_2, t_2, p_4, t_4, p_6, t_5\}$, $c_2 = \{p_0, t_1, p_3, t_3, p_5, t_4, p_6, t_5\}$, $c_3 = \{p_1, t_1, p_2, t_2, p_4, t_4\}$ and $c_4 = \{p_1, t_1, p_3, t_3, p_5, t_4\}$. The cycle times associated with $c_1, \ldots, c_4$ are 12, 16, 10 and 12, respectively. Therefore, the minimum cycle time is $D_m = \max_k \{12, 16, 10, 12\} = 16$.

If, instead of a Petri net model in which the delay of each operation is attached to the transition, a model is adopted in which each transition has a time interval $(t_{min}, t_{max})$ attached to it (see Section 5.3), we may compute the lower bound of the cycle time: if the delay of each transition is replaced by its lower bound ($t_{min}$), the value obtained is the lower bound of the cycle time [87].
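Theorem 8.2 can be evaluated directly on small marked graphs. The sketch below enumerates elementary circuits with a plain depth-first search instead of deriving them from minimum p-invariants as described above, so it is a swapped-in technique for illustration only and is practical just for small nets. A marked graph is given by its places, each as (input transition, output transition, tokens); integer delays are assumed so that exact fractions can be used, and the three-transition example is hypothetical (it is not the net of Figure 9, whose delays are not listed in the text).

```python
from fractions import Fraction


def elementary_circuits(places, n_transitions):
    """Yield every elementary circuit of a marked graph as a list of place indices.

    places -- list of (input transition, output transition, tokens); in a marked
              graph each place has exactly one input and one output transition.
    Each circuit is reported once, starting from its lowest-numbered transition.
    """
    succ = [[] for _ in range(n_transitions)]
    for idx, (src, dst, _) in enumerate(places):
        succ[src].append((idx, dst))

    def extend(start, node, visited, path):
        for idx, nxt in succ[node]:
            if nxt == start:
                yield path + [idx]
            elif nxt > start and nxt not in visited:
                yield from extend(start, nxt, visited | {nxt}, path + [idx])

    for start in range(n_transitions):
        yield from extend(start, start, {start}, [])


def cycle_time(places, delays):
    """Theorem 8.2: D_m = max over circuits k of T_k / N_k (integer delays assumed)."""
    best = Fraction(0)
    for circuit in elementary_circuits(places, len(delays)):
        T_k = sum(delays[places[i][0]] for i in circuit)  # one delay per transition on k
        N_k = sum(places[i][2] for i in circuit)          # tokens on k (Theorem 8.1)
        if N_k == 0:
            raise ValueError(f"circuit {circuit} holds no token: the net is not live")
        best = max(best, Fraction(T_k, N_k))
    return best


# Hypothetical strongly connected marked graph with transitions t0, t1, t2.
places = [(0, 1, 0), (1, 0, 1),   # p0: t0 -> t1, p1: t1 -> t0 (1 token)
          (1, 2, 0), (2, 1, 1),   # p2: t1 -> t2, p3: t2 -> t1 (1 token)
          (2, 0, 2)]              # p4: t2 -> t0 (2 tokens)
delays = [2, 3, 4]

if __name__ == "__main__":
    for c in elementary_circuits(places, 3):
        print(c)                       # [0, 1], then [0, 2, 4], then [2, 3]
    print(cycle_time(places, delays))  # 7  (ratios 5/1, 9/2 and 7/1)
```

Here the circuit through $t_1$ and $t_2$ dominates with $T_k / N_k = 7/1$, even though the circuit through all three transitions has the largest total delay, because the latter carries two tokens.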
Here, a structural approach is presented for computing extreme and average cases of behavioral descriptions. The algorithm presented computes the minimal execution time, the minimal critical-path time and the likely minimal time related to branch probabilities in partially repetitive or consistent strongly connected nets covered by p-invariants. This approach is based on the method presented previously; however, it is not restricted to marked graphs.

First, we present some definitions and theorems of Petri net theory which are important for the proposed approach [37].

A Petri net $N$ is said to be repetitive if there is a marking and a firing sequence from this marking such that every transition occurs infinitely often. More formally:

Definition 8.1 (Repetitive net). Let $N = (R; M_0)$ be a marked Petri net and $s$ a firing sequence. $N$ is said to be repetitive if there exists a sequence $s$ such that $M_0[s\rangle M_i$ and every transition $t_i \in T$ fires infinitely often in $s$.

THEOREM 8.3 A Petri net $N$ is repetitive if, and only if, there exists a vector $X$ of positive integers such that $C \cdot X \ge 0$, $X \neq 0$.

A Petri net is said to be consistent if there is a marking and a firing sequence from this marking back to the same marking such that every transition occurs at least once. More formally:

Definition 8.2 (Consistent net). Let $N = (R; M_0)$ be a marked Petri net and $s$ a firing sequence. $N$ is said to be consistent if there exists a sequence $s$ such that $M_0[s\rangle M_0$ and every transition $t_i \in T$ fires at least once in $s$.

THEOREM 8.4 A Petri net $N$ is consistent if, and only if, there exists a vector $X$ of positive integers such that $C \cdot X = 0$, $X \neq 0$.

The proofs of these theorems can be found in [37].

The method presented previously is based on computing the directed circuits by using the minimum p-invariants. In other words, first the minimum p-invariants are computed; then, based on these results, the directed circuits (sub-nets) are computed.
The cycle time of a marked graph is the maximum delay over the circuits of the net.

This method cannot be applied to nets with choices, because the circuits computed by using the minimum p-invariants will provide sub-nets in which the simple summation of the delays attached to each transition yields a number that adds up all the branches of the choice. Another aspect which has to be considered is that concurrent descriptions with choice may lead to deadlock. Thus, we have to avoid these paths in order to compute the minimal execution time of the model.

In this method, we use the minimum p-invariants in order to compute the circuits of the net, and the minimum t-invariants to eliminate from the net the transitions which do not appear in any t-invariant. In partially consistent models, these transitions are never
fired. We also have to eliminate from the net every transition which belongs to a choice in the underlying untimed model but does not model a real choice in the timed model. For instance, consider a net in which the output bag of the place $p_k$ is $O(p_k) = \{t_i, t_j\}$. In the underlying untimed model these transitions describe a choice. However, if in the timed model these transitions have the time constraints $d_i$ and $d_j$ with $d_j > d_i$, a conflict resolution policy is applied such that transition $t_i$ must be fired. After obtaining this new net (a consistent net), each sub-net obtained from each minimum p-invariant must be analyzed. In such sub-nets, the transitions have to belong to only one minimum t-invariant. The sub-nets which do not satisfy this condition have to be decomposed into sub-nets whose transitions belong to only one t-invariant. The number of sub-nets obtained has to be minimal; in other words, the number of transitions of each sub-net in each t-invariant should be maximal.

Definition 8.3 (Shared Element). Let $N = (P, T, I, O)$ be a net, $S_1, S_2 \subset N$ two subnets, and $P'$ a set of places such that $\forall p_z \in P'$, $p_z \in S_1$ and $p_z \in S_2$. The set of places $P'$ is said to be a shared element if, and only if, its places are strongly connected, $\exists p_m \in P', t_i \in S_1, t_k \in S_2$ such that $t_i, t_k \in O(p_m)$, and $\exists p_l \in P', t_j \in S_1, t_r \in S_2$ such that $t_j, t_r \in I(p_l)$.

THEOREM 8.5 Let $N = (P, T, I, O)$ be a pure, strongly connected, either partially consistent or consistent net covered by p-invariants without shared elements; let $SN_k \subset N$ be a subnet obtained from a minimum p-invariant $Ip_i$ and covered by one minimum t-invariant $It_j$; let $SN_c \subset N$ be a subnet obtained from a minimum p-invariant $Ip_i$ and not covered by only one minimum t-invariant $It_j$; and, if $\exists p_k$ such that $\#O(p_k) > 1$, then $l_j > h_i, \forall t_i, t_j \in O(p_k)$.
The time related to the critical path of the net $N$ is given by $CT(N) = \max\{d_{SN_k}, d_{SN_j}\}$, $\forall SN_k, SN_j \subset N$, where each $SN_j$ is obtained by a decomposition of $SN_c$ such that each $SN_j \subset SN_c$ is covered by only one t-minimum invariant.

THEOREM 8.6 Let $N = (P, T, I, O)$ be a pure, strongly connected, partially consistent or consistent net covered by p-invariants without shared elements; let $SN_k \subset N$ be a subnet obtained from a p-minimum invariant $Ip_i$ and covered by one t-minimum invariant $It_j$; let $SN_c \subset N$ be a subnet obtained from a p-minimum invariant $Ip_i$ and not covered by only one t-minimum invariant $It_j$; and, if $\exists p_k$ such that $\#O(p_k) > 1$, then $l_j > h_i$, $\forall t_i, t_j \in O(p_k)$. Each $SN_j \subset SN_c$, obtained by a decomposition of $SN_c$, is covered by only one t-minimum invariant. The minimal time of the net $N$ is given by $MT(N) = \max\{d_{SN_k}, d_{SN_c}\}$, $\forall SN_c \subset N$, where each $d_{SN_c} = \min\{d_{SN_j}\}$, $\forall SN_j \subset SN_c$.

The proofs of both theorems can be found in [72].

The main difference between these metrics lies in the execution time of the parts of a program in which choices must be considered. The minimal time of the whole program takes into account the choice branch that provides the minimum delay, whereas the critical path time takes every branch into account and is therefore determined by the branch with the maximum delay.

Another important measure is the likely minimal time, which takes into account the execution probabilities of the branches. Based on collected statistics, the designer may provide the execution probability of each branch in the description.

Definition 8.4 (Strongly Connected Component Probability Matrix). Let $N = (P, T, I, O, D)$ be a net, $SN_k \subset N$ a strongly connected component and $b_i$ a branch related to a choice of the net. $SCPM(SN_k, t_j)$ is the execution probability of the transition $t_j$ in the strongly connected component $SN_k$:

$$SCPM(SN_k, t_j) = \begin{cases} 0 & \text{if } t_j \notin SN_k,\\ pr_i,\ 0 \le pr_i \le 1 & \text{if } t_j \in SN_k \text{ and } t_j \in b_i,\\ 1 & \text{if } t_j \in SN_k \text{ and } t_j \notin b_i,\ \forall b_i \in SN_k. \end{cases}$$

Definition 8.5 (Probability Execution Vector). Let $N = (P, T, I, O, D)$ be a net and $SCPM$ the strongly connected component probability matrix related to $N$. $PEV: T \to [0, 1]$ is a vector with $\#PEV = |T|$, one component per transition; each component $pev(t_j)$, $\forall t_j \in T$, is obtained from the values $SCPM(SN_k, t_j)$ of all components $SN_k$.

The likely minimal time related to the net $N$ is given by $MLT = \max\{mlt_{SN_k}\}$, $\forall SN_k$, where $mlt_{SN_k} = \sum_{t_j \in SN_k} d(t_j) \times pev(t_j)$, $d(t_j)$ is the lower bound of the time related to the transition $t_j$, and $SN_k$ is a strongly connected component.

Similar results are obtained for partially repetitive or repetitive nets covered by p-invariants. For nets with these properties, instead of computing the t-minimum invariants, we have to compute the supports of the solutions of $C \cdot X \ge 0$, $X \ne 0$.

The algorithm to compute the minimal time, the critical path time and the likely minimal time is the following:

• Input: a net $N = (P, T, I, O, D)$; the branch probabilities $b_i = \{pr_i; t_j\}$, $\forall b_i \subset N$ and $\forall t_j \in b_i$.
• Output: the minimal time; the critical path time; the likely minimal time.
• Algorithm:
  1. Compute the t-minimum invariants ($It$) of the net $N$.
  2. Remove from $N$ every transition $t_l$ whose component is zero in every t-minimum invariant ($it_l = 0$, $\forall It$), together with the places $p_g \in I(t_l)$ and/or $p_g \in O(t_l)$, obtaining $N' = (P', T', I', O')$ with $P' = P \setminus \{p_g\}$ and $T' = T \setminus \{t_l\}$.
  3. Compute the p-minimum invariants ($Ip$) of the net $N'$.
  4. Compute the probability execution vector ($PEV$).
  5. Compute the likely execution time of each component $SN_k \subset N'$: $mlt_{SN_k} = \sum_{t_j \in SN_k} d(t_j) \times pev(t_j)$.
  6. Compute the strongly connected components $SN_k$ obtained from each $Ip_k$.
  7. Compute the strongly connected components $SN_k$ obtained from $Ip_k$ and covered by one t-minimum invariant $It$.
  8. For each strongly connected component obtained from $Ip_k$ and not covered by one t-minimum invariant ($SN_c$), decompose $SN_c$ into subnets $SN_j$ such that each subnet $SN_j$ is covered by one t-minimum invariant $It_j$.
  9. Compute the execution time of each component $SN_k$ and of each $SN_j \subset SN_c$, $\forall SN_k, SN_j \subset N'$.
  10. Compute the critical path time $CT(N) = \max\{d_{SN_k}, d_{SN_j}\}$, $\forall SN_k, SN_j \subset N'$.
  11. Compute the minimal time $MT(N) = \max\{d_{SN_k}, d_{SN_c}\}$, $\forall SN_k, SN_c \subset N'$, where $d_{SN_c} = \min\{d_{SN_j}\}$, $\forall SN_j \subset SN_c$.
  12. Compute the likely minimal time $MLT(N) = \max\{mlt_{SN_k}\}$, $\forall SN_k \subset N'$.

Figure 10. Example.

The method is applied to a small example (see Figure 10): a concurrent description with two choices, one of which leads to a deadlock. In the qualitative analysis phase, we found that the model is strongly connected, partially consistent, covered by semi-positive p-invariants, structurally bounded and safe.
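Once the invariants and the subnet delays are known, steps 2, 5 and 10 to 12 reduce to set, sum and max/min operations. The Python sketch below illustrates only those steps; it assumes the t-invariant supports and the subnet delays have already been computed (for instance by a Petri net analysis tool), identifies transitions by their indices, and uses hypothetical function names that are not part of the published method. Following the last case of Definition 8.4, `pev` defaults to 1 for transitions that lie outside every branch.

```python
from typing import Dict, Iterable, List, Set


def uncovered_transitions(n_transitions: int, supports: Iterable[Set[int]]) -> Set[int]:
    """Step 2: indices of the transitions that belong to no t-minimum invariant support."""
    covered: Set[int] = set().union(*supports)
    return set(range(n_transitions)) - covered


def likely_time(subnet: Iterable[int], delay: Dict[int, float], pev: Dict[int, float]) -> float:
    """Step 5: mlt(SN_k) = sum of d(t_j) * pev(t_j) over the transitions of SN_k.
    Transitions missing from `pev` are assumed to be outside every branch (pev = 1)."""
    return sum(delay[t] * pev.get(t, 1.0) for t in subnet)


def critical_path_time(covered: List[float], decomposed: List[List[float]]) -> float:
    """Step 10: max over the covered subnets SN_k and every piece SN_j of each SN_c."""
    return max(covered + [d for pieces in decomposed for d in pieces])


def minimal_time(covered: List[float], decomposed: List[List[float]]) -> float:
    """Step 11: max over the covered subnets and, for each SN_c, its fastest piece."""
    return max(covered + [min(pieces) for pieces in decomposed])


def likely_minimal_time(mlt: List[float]) -> float:
    """Step 12: MLT(N) = max of mlt(SN_k)."""
    return max(mlt)


# Supports of the two t-minimum invariants of the example (transitions t0..t18).
st1 = {0, 2, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 17}
st2 = {0, 2, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 18}
print(sorted(uncovered_transitions(19, [st1, st2])))

covered = [12, 7, 10, 15, 8]   # T(1), T(3), T(4), T(5), T(6)
decomposed = [[20, 22]]        # T(21), T(22): the two pieces of Subnet2
print(critical_path_time(covered, decomposed), minimal_time(covered, decomposed))
print(likely_minimal_time([12, 21, 7, 10, 15, 8]))
```

Fed with the supports and subnet times reported for the example that follows (assuming 19 transitions, t0 to t18), the sketch lists t1, t3, t4 and t5 as uncovered and returns CT = 22, MT = 20 and MLT = 21.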
A partially consistent system has transitions that either cannot be fired or, if fired, would lead to a deadlock.

The first step of the algorithm is the computation of the t-minimum invariants. Their supports are $st_1 = \{t_0, t_2, t_6, t_7, t_8, t_9, t_{10}, t_{11}, t_{12}, t_{13}, t_{14}, t_{15}, t_{17}\}$ and $st_2 = \{t_0, t_2, t_6, t_7, t_8, t_9, t_{10}, t_{11}, t_{12}, t_{13}, t_{14}, t_{16}, t_{18}\}$. These invariants show that the transitions $t_1, t_3, t_4, t_5$ do not belong to any t-minimum invariant.

The second step eliminates the transitions that do not belong to any t-minimum invariant, together with the places that are inputs and/or outputs of these transitions. The resulting net is presented in Figure 11.

Figure 11. Transformed net.

An execution probability of 0.5 was assigned to the transitions of the branches of the choice ($t_{11}$, $t_{16}$, $t_{17}$ and $t_{18}$).

The next step is the computation of the p-minimum invariants of the transformed net. Their supports are $sp_1 = \{p_0, p_1, p_5, p_8, p_{16}, p_{18}\}$, $sp_2 = \{p_0, p_3, p_{12}, p_{15}, p_{17}, p_{18}, p_{19}, p_{20}\}$, $sp_3 = \{p_0, p_1, p_5, p_{10}, p_{18}\}$, $sp_4 = \{p_0, p_2, p_9, p_{10}, p_{18}\}$, $sp_5 = \{p_0, p_2, p_8, p_9, p_{16}, p_{18}\}$ and $sp_6 = \{p_0, p_3, p_{11}, p_{13}, p_{14}, p_{17}, p_{18}\}$.

Then the strongly connected components ($Subnet_i$) are obtained from these invariants. Their transitions are $Subnet_1 = \{t_0, t_2, t_6, t_7, t_{13}, t_{14}\}$, $Subnet_2 = \{t_0, t_{10}, t_{11}, t_{12}, t_{13}, t_{14}, t_{16}, t_{17}, t_{18}\}$, $Subnet_3 = \{t_0, t_2, t_6, t_{13}, t_{14}\}$, $Subnet_4 = \{t_0, t_6, t_8, t_{13}, t_{14}\}$, $Subnet_5 = \{t_0, t_6, t_7, t_8, t_{13}, t_{14}\}$ and $Subnet_6 = \{t_0, t_9, t_{10}, t_{11}, t_{12}, t_{13}, t_{14}, t_{15}\}$.

Next, the likely minimal time of each strongly connected component, $mlt(SN_k) = \sum_{t_j \in SN_k} d(t_j) \times pev(t_j)$, is computed: $mlt(1) = 12$, $mlt(2) = 21$, $mlt(3) = 7$, $mlt(4) = 10$, $mlt(5) = 15$ and $mlt(6) = 8$.

Each component that is not covered by only one t-minimum invariant is decomposed into subnets, each covered by only one t-minimum invariant. Therefore, $Subnet_2 = \{t_0, t_{10}, t_{11}, t_{12}, t_{13}, t_{14}, t_{16}, t_{17}, t_{18}\}$ is decomposed into $Subnet_{21} = \{t_0, t_{10}, t_{11}, t_{12}, t_{13}, t_{14}, t_{17}\}$ and $Subnet_{22} = \{t_0, t_{10}, t_{12}, t_{13}, t_{14}, t_{16}, t_{18}\}$.

After that, we compute the time of each subnet of the whole model. The resulting values are $T(1) = 12$, $T(21) = 20$, $T(22) = 22$, $T(3) = 7$, $T(4) = 10$, $T(5) = 15$ and $T(6) = 8$. Hence $CT(N) = \max\{T(1), T(21), T(22), T(3), T(4), T(5), T(6)\} = 22$, $MT(N) = \max\{T(1), \min\{T(21), T(22)\}, T(3), T(4), T(5), T(6)\} = 20$ and, finally, $MLT(N) = \max\{mlt(1), mlt(2), mlt(3), mlt(4), mlt(5), mlt(6)\} = 21$.

For execution time computation in systems with data-dependent loops, the designer has to assign the most likely number of iterations to the net. This number may come from the designer's previous knowledge, in which case he/she may directly associate it with the net through an annotation; alternatively, the net can be simulated on several sets of samples, recording how often the various loops were executed.

This method has been applied to several examples and compared with the results obtained by the Petri net tool INA. The results are equivalent, but INA only provides the minimal time.

9. Extending the Model for Hardware/Software Codesign

This section presents a method for estimating the number of processors needed to execute a program with the best performance, taking into account the resource constraints (number of processors) provided by the designer and disregarding the topology of the interconnection network.

First, let us consider a model extension that captures the number of processors of the proposed architecture. The extended model consists of the net $N = (P, T, I, O, M_0, D)$, which describes the program; a set of places $P'$, each place $p'$ of which is a processor type adopted in the proposed architecture, the marking of each of these places representing the number of processors of type $p'$; and the input and output arcs that interconnect the places of $P'$ to the transitions representing arithmetic/logical operations ($ALUop \subset T$).

In the extended model the number of conflicts in the net increases due to the competition of operations for processors [32]. These conflicts require a pre-selection policy that assigns equal probabilities to the output arcs from processor places to the enabled transitions $t_j \in ALUop$ ($O(p', t_j)$, $p' \in P'$) in each reachable marking $M_z$. More formally:

Definition 9.1 (Extended Model). Let $N = (P, T, I, O, M_0, D)$ be a program model and $P'$ a set of places representing the processor types adopted in the architecture, such that $P \cap P' = \emptyset$ and
$M_0(p')$, $p' \in P'$, the number of processors of type $p'$. Let $N_e = (P_e, T_e, I_e, O_e, M_0^e, D_e, f)$ be the extended model such that $P_e = P \cup P'$, $T_e = T$, $I_e(p', t_j) = 1$ and $O_e(p', t_j) = 1$, $\forall t_j \in ALUop$, $\forall p' \in P'$; otherwise $I_e(p, t_j) = I(p, t_j)$ and $O_e(p, t_j) = O(p, t_j)$; $M_0^e: P \cup P' \to \mathbb{N}$ and $D_e = D$. Let $M_z$ be a marking reachable from $M_0^e$ and $f: O_e(p', t_j) \to [0, 1]$ a probability attached to each output arc $O_e(p', t_j)$, where $\sum_{t_j \in T} f(O_e(p', t_j)) = 1$ over the transitions $t_j$ such that $M_z[t_j\rangle$, with $p' \in P'$.

In such a model, concurrency is constrained by the number of available processors ($M_0(p')$, $p' \in P'$) provided by the designer. The main goal of the proposed approach is to estimate the minimal number of processors that achieves the best performance, taking into account the upper bound on available processors already specified. The designer provides the number of available processors in the architecture generator, and the execution time ($CT$) is computed by reachability-based methods. The next step reduces the number of processors ($M(p') = M(p') - 1$, $p' \in P'$) in order to compute a new execution time ($CT'$). If $CT' > CT$, the necessary number of processors has been reached. This number of processors is used in the proposed method for the initial allocation.

The proposed algorithm to estimate the needed number of processors is:

• Input: a net $N_e = (P_e, T_e, I_e, O_e, M_0^e, D_e, f)$; the number of available processors ($M_0(p')$, $\forall p' \in P'$).
• Output: the optimum number of processors $M_{opt}(p')$, $p' \in P'$, taking into account the resource constraints provided by the designer; the minimal execution time ($CT$) for $M_{opt}(p')$.
• Algorithm:
  1. Compute the execution time $CT(N_e)$.
  2. $CT = CT(N_e)$.
  3. For each place $p' \in P'$, do:
       $M(p') = M(p') - 1$
       Compute a new execution time $CT(N_e)$
       if $CT(N_e) \le CT$:
         $CT = CT(N_e)$
         $M_{opt}(p') = M(p')$, $\forall p' \in P'$
       else, or if $M(p') = 0$, $\forall p' \in P'$: end.
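The reduction loop can be sketched for a single processor type. In this hedged Python sketch, `ct_of` is a hypothetical placeholder for the reachability-based computation of $CT(N_e)$, which is not reproduced here; the loop stops before the processor place would become empty, since no ALU operation could fire with zero processors.

```python
from typing import Callable, Dict, Tuple


def estimate_processors(m0: int, ct_of: Callable[[int], float]) -> Tuple[int, float]:
    """Decrement the marking of one processor place while CT does not increase.

    m0    -- designer's upper bound M0(p') on processors of this type
    ct_of -- stand-in for the reachability-based analysis: returns CT(N_e)
             for a given number of processors (hypothetical interface)
    """
    m_opt, best_ct = m0, ct_of(m0)       # steps 1-2
    m = m0
    while m > 1:                         # never evaluate an empty processor place
        m -= 1                           # M(p') = M(p') - 1
        new_ct = ct_of(m)
        if new_ct <= best_ct:            # same (or better) time with fewer processors
            best_ct, m_opt = new_ct, m
        else:                            # time grew: the previous count was needed
            break
    return m_opt, best_ct


# Critical path times of Table 1, standing in for the Petri net analysis.
TABLE_1: Dict[int, float] = {4: 7, 3: 7, 2: 7, 1: 10}
print(estimate_processors(4, lambda m: TABLE_1[m]))
```

With the critical path times of Table 1 as input, the sketch prints `(2, 7)`, matching $M_{opt}(p_{13}) = 2$ reported for the example. Several processor types would need one such loop per place $p' \in P'$.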
The necessary number of processors can also be determined from the speed up, the efficiency or the efficacy provided by the use of multiple processors. The gain in execution time obtained by executing a program ($N_e$) in parallel can be defined as the ratio between the execution time of $N_e$ on only one processor and its execution time on $\#pr$ processors.
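The speed up described in the preceding paragraph, together with the efficiency and efficacy formalized in Definitions 9.2 to 9.4 below, reduces to a few ratios of execution times. A minimal Python sketch, assuming a single processor type and taking the critical path times of Table 1 as input (the function names are illustrative), reproduces the values of Table 2:

```python
def speed_up(ct_one: float, ct_m: float) -> float:
    """S(N_e, M) = CT(N_e, 1) / CT(N_e, M)."""
    return ct_one / ct_m


def efficiency(ct_one: float, ct_m: float, m: int) -> float:
    """E(N_e, M) = S(N_e, M) / M."""
    return speed_up(ct_one, ct_m) / m


def efficacy(ct_one: float, ct_m: float, m: int) -> float:
    """EA(N_e, M) = E / CT(N_e, M) * CT(N_e, 1) = S * E = S**2 / M."""
    return speed_up(ct_one, ct_m) * efficiency(ct_one, ct_m, m)


ct = {1: 10, 2: 7, 3: 7, 4: 7}   # CT(N_e, M(p13)) taken from Table 1
for m in sorted(ct):
    print(m, round(speed_up(ct[1], ct[m]), 4),
          round(efficiency(ct[1], ct[m], m), 4),
          round(efficacy(ct[1], ct[m], m), 4))
```

Because adding processors beyond two does not shorten the critical path, the speed up stays at 10/7 while efficiency and efficacy fall, which is why two processors are retained.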
Definition 9.2 (Speed up). Let $N_e$ be an extended model, $P'$ a set of places representing the processor types adopted in the architecture, $M(p')$, $p' \in P'$, the number of processors of type $p'$, $CT(N_e, 1)$ the execution time of $N_e$ carried out by only one processor, and $CT(N_e, M(p'))$, $\forall p' \in P'$, the execution time of $N_e$ considering $M(p')$ processors. $S(N_e, M(p')) = CT(N_e, 1)/CT(N_e, M(p'))$, $\forall p' \in P'$, is defined as the speed up due to $M(p')$ processors.

The speed up $S(N_e, M(p'))$ provided by the use of $M(p')$ processors to execute $N_e$, divided by the number of processors, defines the efficiency. More formally:

Definition 9.3 (Efficiency). Let $N_e$ be an extended model and $S(N_e, M(p'))$ the speed up provided by $M(p')$, $\forall p' \in P'$, processors to execute $N_e$. The efficiency is defined by $E(N_e, M(p')) = S(N_e, M(p'))/M(p')$, $\forall p' \in P'$.

Considering the execution time $CT(N_e, M(p'))$ as a cost measure and the efficiency $E(N_e, M(p'))$ as a benefit, the efficacy can be defined as the ratio between this benefit and this cost, scaled by $CT(N_e, 1)$.

Definition 9.4 (Efficacy). Let $N_e$ be an extended model, $E(N_e, M(p'))$ the efficiency, $S(N_e, M(p'))$ the speed up, $CT(N_e, 1)$ the execution time of $N_e$ carried out by only one processor and $CT(N_e, M(p'))$, $\forall p' \in P'$, the execution time of $N_e$ carried out by $M(p')$ processors.

$$EA(N_e, M(p')) = \frac{E(N_e, M(p'))}{CT(N_e, M(p'))} \times CT(N_e, 1) = S(N_e, M(p')) \times E(N_e, M(p')) = \frac{S(N_e, M(p'))^2}{M(p')}, \quad \forall p' \in P',$$

is defined as the efficacy.

A small example illustrates the proposed method. A Petri net represents the control flow of an occam program obtained by the translation method proposed in [64, 59, 61].
This net describes a program composed of three subprocesses (see Figure 12). The process $PR_1$ is represented by the set of places $P_{PR_1} = \{p_1, p_4, p_5, p_6, p_7\}$, the set of transitions $T_{PR_1} = \{t_1, t_2, t_3, t_4, t_5\}$, their input and output bags (multisets) and the initial markings of its places. The process $PR_2$ is composed of the set of places $P_{PR_2} = \{p_2, p_8, p_{10}\}$, the set of transitions $T_{PR_2} = \{t_6, t_8\}$, its input and output bags and the markings of its places. Finally, the process $PR_3$ is composed of the set of places $P_{PR_3} = \{p_3, p_9, p_{11}\}$, the set of transitions $T_{PR_3} = \{t_7, t_8\}$, its input and output bags and the markings of its places. A duration ($d$) is attached to each transition.

The extended model is represented by the net in Figure 13. The place $p_{13}$ represents a processor type, and its marking represents the number of available processors. Only the transitions that describe ALU operations ($t \in ALUop$) are connected to $p_{13}$, in order to find out the number of functional units (ALUs; we assume that each processor has only one ALU) needed to execute the program with the best performance. These arcs are drawn as dotted lines. Since each of these transitions is connected to the place $p_{13}$ by an input and an output arc, the pair is drawn, for short, as a bidirectional arc.

Suppose, for instance, that the designer has specified, in the architecture generator, an upper bound of $n = 4$ available processors. Thus, the execution time of the extended
model should be analyzed for $n = 4$, $n = 3$, $n = 2$ and $n = 1$, and the configuration with the smallest execution time and the smallest number of processors must be chosen. Applying the proposed algorithm yields the results in Table 1. Thus, the number of processors needed to execute the program is $M_{opt}(p_{13}) = 2$. This number should be used in the initial allocation process.

Figure 12. Description.

Table 1. Critical path time.
M(p13)    CT(Ne)
4         7
3         7
2         7
1         10

In the following sections, techniques for computing other useful metrics for hardware/software codesign are presented. The next section describes a method for delay estimation. A method for computing communication cost is described in Section 11. The workload of processors is calculated by the method detailed in Section 12. The degree of mutual
exclusion among processes is calculated by the method described in Section 13. The silicon area, in terms of logic blocks, of the hardware and software implementation of each process is computed by the technique described in Section 14.

Figure 13. The extended model.

Table 2. Speed up, efficiency and efficacy.
Processors    S         E         EA
1             1         1         1
2             1.4286    0.7143    1.0204
3             1.4286    0.4762    0.6803
4             1.4286    0.3571    0.5102

10. Delay Estimation

Previous sections described methods to compute the execution time (cycle time) of occam behavioral descriptions translated into timed Petri nets by the translation method proposed in [64, 61].
However, the method used to estimate the delay of the arithmetic and logical expressions performed in the assignments still needs to be defined.

The delay of an expression is estimated in terms of the control steps needed to perform it. The expression execution time (delay) is a function of two factors: the delay of the arithmetic and logical operations of the assignment and the delay of reading and writing variables. The operations in one expression are assumed to be executed sequentially.

$$D_{ex}(e) = D_{RW}(e) + D_{OP}(e)$$

$$D_{RW}(e) = \begin{cases} \sum_{\forall v_u \in e} D_{Rh}(v_u) + \sum_{\forall v_d \in e} D_{Wh}(v_d) & \text{if } e \text{ is implemented in hardware}\\ \sum_{\forall v_u \in e} D_{Rs}(v_u) + \sum_{\forall v_d \in e} D_{Ws}(v_d) & \text{if } e \text{ is implemented in software} \end{cases}$$

The variables $v_u$ and $v_d$ are the used and defined variables of the expression $e$.

$$D_{OP}(e) = \begin{cases} \sum_{\forall op_i \in e} D_{OPh}(op_i) \times \#op_i & \text{if } e \text{ is implemented in hardware}\\ \sum_{\forall op_i \in e} D_{OPs}(op_i) \times \#op_i & \text{if } e \text{ is implemented in software} \end{cases}$$

11. Communication Cost

This section presents the method proposed for computing the communication cost between processes by using Petri nets [69, 70]. The communication cost related to a process depends on two factors: the number of bits transferred by each communication action and how many times the communication action is executed (here referred to as the number of communications).

Since we are dealing with behavioral descriptions that are translated into Petri nets, the number of bits transferred in each execution of each communication action is already defined.

Definition 11.1 (Number of Transferred Bits in a Communication Action). $Nbc: nbc \to \mathbb{N}$, where $\#Nbc = |T|$ and $T$ is the set of transitions. Each component $nbc$ associated with a transition that represents a communication action defines the number of bits transferred in the respective communication action; otherwise it is zero.
However, we still have to define methods to compute how many times each communication action of a process is executed, the communication cost of each process, the communication cost of the whole description, the communication cost between two sets of processes and, finally, the normalized communication cost.

The communication cost of each process ($CC(p_i)$) is the product of the number of bits transferred in each communication action ($Nbc$) and the number of communications ($NC(p_i)$).
Definition 11.2 (Communication Cost for each Process). Let $Nbc$ be the number of bits transferred in the communication actions and $NC(p_i)^T$ a vector that represents the number of communications. The communication cost of each process $p_i$ is defined by $CC(p_i) = Nbc \times NC(p_i)^T$.

The number of communications ($NC(p_i)$) is a vector in which each component $nc(p_i)$ associated with a transition that represents a communication action of the process $p_i$ is the number of executions of that communication action; every component associated with a transition that does not represent a communication action of the process $p_i$ is zero. More formally:

Definition 11.3 (Number of Communications). $NC(p_i): nc(p_i) \to \mathbb{N}$, where $\#NC(p_i) = |T|$ and $T$ is the set of transitions. Each component $nc(p_i) = \max(nc_{X_k}(p_i))$, $\forall X_k$, where $X_k$ is a vector of non-negative integers such that either $C \cdot X = 0$ or $C \cdot X \ge 0$, and $NC_{X_k}(p_i): nc_{X_k}(p_i) \to \mathbb{N}$, with $\#NC_{X_k}(p_i) = |T|$.

Each vector $X_k$ is a minimum support that can be obtained by solving either $C \cdot X = 0$ (in which case the $X_k$ are minimum t-invariants) or $C \cdot X \ge 0$. The number of communications of an action $a_i$ (represented by a transition $t_i$) is the value of the component $x_i$ of the corresponding $X_k$. The other components, which do not represent communication actions of the process $p_i$, are zero.

According to the results obtained in the qualitative analysis, the method used to compute the number of communications ($NC(p_i)$) can be chosen by considering its complexity. If the net (system) is consistent, we first compute the minimum t-invariants and then obtain $NC(p_i)$, $\forall p_i \in D$.
However, if the net is not consistent but is repetitive, we first obtain the minimal supports $X_k$ from the solutions $X$ of $C \cdot X \ge 0$, $X \ne 0$, and then compute $NC(p_i)$, $\forall p_i \in D$. If the net is not repetitive but can be transformed into a repetitive or consistent net by inserting a transition $t_f$ such that $I(t_f) = \{p_f\}$ and $O(t_f) = \{p_0\}$, we apply the same method to compute $X$ and then obtain $NC(p_i)$, $\forall p_i \in D$. These places ($p_0$ and $p_f$) are well defined, because one token in the place $p_0$ (initial place) enables the execution of the process, and the arrival of one token in the place $p_f$ (final place) means that the execution has finished. Otherwise, if the net cannot be transformed into a repetitive or consistent one (such a system seems not to have interesting properties, but the designer may not intend to modify it), we can compute $X$ and then $NC(p_i)$ either from the reachability graph or by solving the system $C \cdot X = M_{final} - M_0$, where $M_{final}$ and $M_0$ are the final and the initial markings, respectively. However, the reader should remember that the state equation may provide spurious solutions for some Petri net subclasses [33].

THEOREM 11.1 Let $N$ be a consistent net and $X_k$ a minimum t-invariant of the net. Considering every minimum t-invariant of the net ($\forall X_k \in N$), the maximum value obtained for each vector component is the minimum firing number of the corresponding transition.
THEOREM 11.2 Let $N$ be a repetitive net and $X_k$ a minimum support of the net, obtained from a solution $X$ of $C \cdot X \ge 0$. Considering every minimum support $X_k$ of the net, the maximum value obtained for each vector component is the minimum firing number of the corresponding transition.

The proofs of both theorems can be found in [70].

The communication cost between two processes ($p_i$ and $p_j$) is the product of the number of communications between the processes and the number of bits transferred in each communication action. More formally:

Definition 11.4 (Communication Cost between Processes). Let $Nbc: nbc \to \mathbb{N}$ be the number of bits transferred in a communication action and $NC(p_i, p_j)$ a vector that represents the number of communications between the processes $p_i$ and $p_j$. The communication cost between the processes is defined by $CC(p_i, p_j) = Nbc \times NC(p_i, p_j)^T$.

The number of executions of communications between two processes ($p_i$ and $p_j$) is represented by a vector in which each component associated with a transition that represents a communication action between both processes defines the number of executions of the respective action.

Definition 11.5 (Number of Communications between two Processes). $NC(p_i, p_j): nc(p_i, p_j) \to \mathbb{N}$, where $\#NC(p_i, p_j) = |T|$ and $nc(p_i, p_j) = \min(nc(p_i), nc(p_j))$.

To compute the communication cost in systems with data-dependent loops, the designer has to assign the most likely number of iterations to each loop of the net. This number can be determined in two ways: the designer may either directly associate a number with the loop through an annotation in the model, based on previous knowledge, or simulate the model on several sets of samples and record how often the various loops were executed.
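For a consistent net, Definitions 11.2 to 11.5 and Theorem 11.1 can be sketched end to end: compute the minimum t-invariants, take the component-wise maximum over them for the communication transitions of each process, and weight by the bits per action. The paper does not say how the invariants are computed; the sketch below swaps in the classic Farkas (Fourier-Motzkin) elimination followed by a minimal-support filter, which is exponential in the worst case. The net, the process-to-transition assignment and the bit counts in the usage lines are hypothetical.

```python
from functools import reduce
from math import gcd
from typing import List, Set

Matrix = List[List[int]]


def minimal_t_invariants(C: Matrix) -> List[List[int]]:
    """Minimal-support semi-positive solutions X of C . X = 0.
    C[p][t] = O(p, t) - I(p, t): rows are places, columns are transitions."""
    n_p, n_t = len(C), len(C[0])
    # Each row pairs the residue C . X (one entry per place) with X itself.
    rows = [([C[p][t] for p in range(n_p)], [int(k == t) for k in range(n_t)])
            for t in range(n_t)]
    for p in range(n_p):
        nxt = [r for r in rows if r[0][p] == 0]
        pos = [r for r in rows if r[0][p] > 0]
        neg = [r for r in rows if r[0][p] < 0]
        for a_res, a_x in pos:
            for b_res, b_x in neg:
                wa, wb = -b_res[p], a_res[p]   # positive weights that cancel place p
                res = [wa * u + wb * v for u, v in zip(a_res, b_res)]
                x = [wa * u + wb * v for u, v in zip(a_x, b_x)]
                g = reduce(gcd, res + x)      # keep the vectors primitive
                nxt.append(([v // g for v in res], [v // g for v in x]))
        rows = nxt
    supports = [frozenset(i for i, v in enumerate(x) if v) for _, x in rows]
    result = []
    for i, (_, x) in enumerate(rows):
        # Drop X if another candidate has a strictly smaller (or identical, earlier) support.
        if any(s < supports[i] or (s == supports[i] and j < i) for j, s in enumerate(supports)):
            continue
        result.append(x)
    return result


def number_of_communications(invariants: List[List[int]], comm: Set[int]) -> List[int]:
    """NC(p_i): component-wise maximum over the minimum t-invariants (Theorem 11.1),
    kept only for the communication transitions `comm` of process p_i."""
    return [max(x[t] for x in invariants) if t in comm else 0
            for t in range(len(invariants[0]))]


def pair_communications(nc_i: List[int], nc_j: List[int]) -> List[int]:
    """NC(p_i, p_j): component-wise minimum of NC(p_i) and NC(p_j) (Definition 11.5)."""
    return [min(a, b) for a, b in zip(nc_i, nc_j)]


def communication_cost(nbc: List[int], nc: List[int]) -> int:
    """CC = Nbc x NC^T (Definitions 11.2 and 11.4)."""
    return sum(b * n for b, n in zip(nbc, nc))


# Hypothetical net: t0 puts two tokens in p1, so t1 fires twice per cycle,
# and t2 consumes two tokens from p2 to return the token to p0.
C = [[-1, 0, 1],
     [2, -1, 0],
     [0, 1, -2]]
inv = minimal_t_invariants(C)
nbc = [0, 16, 8]                                 # hypothetical bits per action
nc_a = number_of_communications(inv, {1})        # process A communicates on t1
nc_b = number_of_communications(inv, {1, 2})     # process B on t1 and t2
print(inv, communication_cost(nbc, nc_a), communication_cost(nbc, nc_b),
      communication_cost(nbc, pair_communications(nc_a, nc_b)))
```

For this toy net the sketch finds the single invariant `[1, 2, 1]` and prints the costs 32, 40 and 32 for process A, process B and the pair.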
Therefore, each component of the vector $NC(p_i, p_j)$ that represents a communication action between the processes $p_i$ and $p_j$ and lies inside the loop $l_k$ has to be multiplied by the most likely number of iterations ($LV(l_k)$) of the loop $l_k$. A similar procedure is applied to the vector $NC(p_i)$: each component that represents a communication action of the process $p_i$ inside the loop $l_k$ has to be multiplied by $LV(l_k)$.

The following algorithms compute the number of communications of each process ($NC(p_i)$) and between pairs of processes ($NC(p_i, p_j)$), taking data-dependent loops into account.

• Input: set of processes $\{\ldots, p_i, \ldots\}$; set of loops $\{\ldots, l_k, \ldots\}$; set of communication actions ($SCA = \{\ldots, t_j, \ldots\}$); most likely number of iterations of each loop ($LV(l_k)$); $NC(p_i)$ of each process.
• Output: $NC(p_i)$ of each process, taking data-dependent loops into account.
• Algorithm:
  For each process $p_i \in D$, do:
    For each loop $l_k \in p_i$, do:
      For each transition $t_j \in l_k$, do: $LV(t_j) = LV(l_k)$
      For each loop $l_m \in p_i$, do:
        If $l_m \subseteq l_k$, do:
          For each transition $t_j \in SCA$, do: $LV(t_j) = LV(t_j) \times LV(l_m)$
        Else:
          For each transition $t_j \in SCA$, do: $LV(t_j) = \min\{LV(t_j), LV(l_m)\}$
    For each transition $t_j \in SCA$, do: $NC(p_i, t_j) = NC(p_i, t_j) \times LV(t_j)$

• Input: set of processes $\{\ldots, p_i, \ldots\}$; set of communication actions ($SCA = \{\ldots, t_j, \ldots\}$); $NC(p_i)$ of each process, taking data-dependent loops into account; $NC(p_i, p_j)$ of each pair of processes.
• Output: $NC(p_i, p_j)$ of each pair of processes, taking data-dependent loops into account.
• Algorithm:
  For each pair of processes $(p_i, p_j) \in D$, do:
    For each transition $t_k \in SCA$, do:
      If $NC(p_i, p_j, t_k) \ne 0$, do: $NC(p_i, p_j, t_k) = \min\{NC(p_i, t_k), NC(p_j, t_k)\}$

The communication cost of the behavioral description is given by summing the communication costs between each pair of processes in the description. More formally:

Definition 11.6 (Communication Cost). $CC(D) = \sum_{\forall (p_i, p_j) \in D} CC(p_i, p_j)$.

In order to compare distinct metric values, we have adopted two kinds of normalization: local and global. Normalization scales metric values to a number between 0 and 1. It makes it possible to combine different units, such as communication cost and mutual exclusion degree, into a single value, and it also provides an absolute closeness measure between objects: a closeness value of 0.95 between two objects is clearly high, whereas nothing can be asserted about a value such as 25 [7].
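The loop adjustment and the description cost can be sketched in a few lines. This hedged sketch follows the rule stated in prose before the algorithms (each communication component inside loop $l_k$ is multiplied by $LV(l_k)$, so a transition inside nested loops is scaled by every enclosing loop); it does not reproduce the min-based branch of the pseudocode for loops that are not nested. Representing a loop by the set of transition indices in its body, including those of nested loops, is an assumption, and the loop data in the usage lines are hypothetical; only the pairwise costs are taken from Table 3.

```python
from typing import Dict, List, Set, Tuple


def loop_factor(t: int, loops: Dict[str, Set[int]], lv: Dict[str, int]) -> int:
    """Product of LV(l_k) over every loop whose body contains transition t."""
    factor = 1
    for name, body in loops.items():
        if t in body:
            factor *= lv[name]
    return factor


def adjust_process(nc: List[int], loops: Dict[str, Set[int]], lv: Dict[str, int]) -> List[int]:
    """Loop-adjusted NC(p_i): each component multiplied by its loop factor."""
    return [n * loop_factor(t, loops, lv) for t, n in enumerate(nc)]


def adjust_pair(nc_pair: List[int], nc_i: List[int], nc_j: List[int]) -> List[int]:
    """Loop-adjusted NC(p_i, p_j): for non-zero components, the minimum of the
    adjusted per-process counts, as in the second algorithm above."""
    return [min(a, b) if n != 0 else 0 for n, a, b in zip(nc_pair, nc_i, nc_j)]


def description_cost(pair_costs: Dict[Tuple[str, str], int]) -> int:
    """CC(D): sum of CC(p_i, p_j) over all pairs of processes (Definition 11.6)."""
    return sum(pair_costs.values())


# Hypothetical process: t1 lies in an outer loop, t3 in a loop nested inside it.
loops = {"outer": {1, 2, 3}, "inner": {3}}
lv = {"outer": 10, "inner": 4}
nc_i = adjust_process([1, 1, 0, 1], loops, lv)
nc_j = adjust_process([0, 1, 0, 0], {"l": {1}}, {"l": 10})
print(nc_i, nc_j, adjust_pair([0, 1, 0, 0], nc_i, nc_j))
print(description_cost({("p1", "p2"): 32, ("p1", "p3"): 32, ("p2", "p3"): 32}))
```

The nested transition t3 is scaled by 10 × 4 = 40, and the pairwise costs of Table 3 add up to the description cost of 96.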
The global normalized communication cost between two processes is defined by the communication cost between these processes divided by the communication cost of the whole description.

Definition 11.7 (Global Normalized Communication Cost). Let $CC(p_i, p_j)$ be the communication cost between the processes $p_i$ and $p_j$, and $CC(D)$ the communication cost of the whole behavioral description. The global normalized communication cost is defined by $NCC(p_i, p_j) = CC(p_i, p_j)/CC(D)$.

The local normalized communication cost between two processes is defined by the communication cost between both processes divided by the sum of the communication costs of each process.

Definition 11.8 (Local Normalized Communication Cost). Let $CC(p_i, p_j)$ be the communication cost between the processes $p_i$ and $p_j$, and $CC(p_i)$ and $CC(p_j)$ the communication costs of the processes $p_i$ and $p_j$, respectively. The local normalized communication cost is defined by $LCC(p_i, p_j) = CC(p_i, p_j)/(CC(p_i) + CC(p_j))$.

The global normalized communication cost must also be defined for each process. This metric is defined by the communication cost of the process divided by the communication cost of the whole description.

Definition 11.9 (Global Normalized Communication Cost of a Process). Let $CC(p_i)$ be the communication cost of each process and $CC(D)$ the communication cost of the whole behavioral description. The global normalized communication cost of each process is defined by $NCC(p_i) = CC(p_i)/CC(D)$.

The local normalized communication cost of each process $p_i$ is defined by the communication cost of the process divided by the sum of the communication costs of the processes $p_j$ with which it communicates ($CC(p_i, p_j) \ne 0$). The formal definition follows.

Definition 11.10 (Local Normalized Communication Cost of a Process). Let $CC(p_i)$ be the communication cost of each process $p_i$, and $p_j$ a process such that $CC(p_i, p_j) \ne 0$.
The local normalized communication cost of each process is defined by $LCC(p_i) = CC(p_i)/\sum_{p_j} CC(p_j)$, where the sum ranges over the processes $p_j \in D$, $p_j \ne p_i$, with $CC(p_i, p_j) \ne 0$.

The algorithm below computes the communication costs and their normalizations, using the number of communications between pairs of processes ($NC(p_i, p_j)$) and of each process ($NC(p_i)$), both taking data-dependent loops into account:

1. Compute the number of communications of each process ($NC(p_i)$).
2. Compute the communication cost of each process $p_i$ ($CC(p_i) = Nbc \times NC(p_i)^T$).
3. Compute the number of communications between each pair of processes of the whole description ($NC(p_i, p_j)$).
4. Compute the communication cost between each pair of processes ($CC(p_i, p_j) = Nbc \times NC(p_i, p_j)^T$).
5. Compute the communication cost of the whole description ($CC(D) = \sum_{\forall (p_i, p_j) \in D} CC(p_i, p_j)$).
6. Compute the global normalized communication cost of each process ($NCC(p_i) = CC(p_i)/CC(D)$).
7. Compute the local normalized communication cost of each process ($LCC(p_i) = CC(p_i)/\sum_{p_j} CC(p_j)$, over the processes $p_j$ with $CC(p_i, p_j) \ne 0$).
8. Compute the global normalized communication cost of each pair of processes ($NCC(p_i, p_j) = CC(p_i, p_j)/CC(D)$).
9. Compute the local normalized communication cost of each pair of processes ($LCC(p_i, p_j) = CC(p_i, p_j)/(CC(p_i) + CC(p_j))$).

Applying the proposed algorithm to the example shown in Figure 14.a yields the results in Table 3 and Table 4 (assuming that an integer is represented by 32 bits). Figure 14.b shows the net that represents the control flow of the described algorithm.

Table 3. Communication cost CC(pi, pj).
Processes      CC    NCC         LCC
p1, p2         32    0.333333    0.333333
p1, p3         32    0.333333    0.500000
p2, p3         32    0.333333    0.333333
Description    96    -           -

Table 4. Communication cost of processes.
Process    CC    NCC         LCC
p1         32    0.333333    0.333333
p2         64    0.666667    1
p3         32    0.333333    0.333333
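Steps 5 to 9 only need the per-process and pairwise costs. The following Python sketch takes $CC(p_i)$ and $CC(p_i, p_j)$ as given inputs rather than recomputing them (the function name and data layout are illustrative assumptions). Excluding $p_i$ itself from the denominator of $LCC(p_i)$, as Definition 11.10 states, is what reproduces Table 4; for instance $LCC(p_2) = 64/(32 + 32) = 1$.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]


def normalized_costs(cc_proc: Dict[str, float], cc_pair: Dict[Pair, float]):
    """Return CC(D) and the (CC, NCC, LCC) rows for the pairs and for the processes."""
    cc_d = sum(cc_pair.values())                                # Definition 11.6
    pairs = {
        (i, j): (c, c / cc_d, c / (cc_proc[i] + cc_proc[j]))   # Definitions 11.7, 11.8
        for (i, j), c in cc_pair.items()
    }
    procs = {}
    for p, c in cc_proc.items():
        # Processes that actually communicate with p, i.e. CC(p, q) != 0.
        partners = [q for (i, j), v in cc_pair.items() if v != 0 and p in (i, j)
                    for q in (i, j) if q != p]
        total = sum(cc_proc[q] for q in partners)
        procs[p] = (c, c / cc_d, c / total if total else 0.0)  # Definitions 11.9, 11.10
    return cc_d, pairs, procs


cc_proc = {"p1": 32, "p2": 64, "p3": 32}                            # CC(p_i), Table 4
cc_pair = {("p1", "p2"): 32, ("p1", "p3"): 32, ("p2", "p3"): 32}    # CC(p_i, p_j), Table 3
cc_d, pairs, procs = normalized_costs(cc_proc, cc_pair)
print(cc_d)
for key, (c, ncc, lcc) in {**pairs, **procs}.items():
    print(key, c, round(ncc, 6), round(lcc, 6))
```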
