
Rule Evaluation on a Motorola SIMD



An implementation of fuzzy inference, the second part of a fuzzy logic system, on a Motorola Single Instruction Multiple Data machine.



Meltin Bell: 512-505-8125, & Rod Goke: 512-505-8121
Motorola Parallel Scalable Processors/Center for Emerging Computer Technology
505 Barton Spgs. Rd. Suite 1055, MD: F30, Austin, TX 78704
FAX: 512-505-8100

ABSTRACT

Fuzzification, rule evaluation and defuzzification in most fuzzy logic systems are computationally expensive tasks. Many systems using a sequential processor will scan the rules/knowledge base and fetch or recompute the fuzzy inputs even if one of them is zero. Because of the nature of fuzzy AND-OR inference processing, this leads to unnecessary fetches and/or computations that hurt execution time and hardware resources. This paper presents an algorithm applied to the Association Engine (AE) Single Instruction Multiple Data (SIMD) machine that attempts to make this fuzzy inference process more efficient by minimizing the number of fetches and computations when fuzzy inputs are zero. Although this algorithm may be applied to fuzzy logic systems using sequential processors, analyzing the fuzzy inputs before scanning the knowledge base will highlight the scalable computing power of the AE as well as support Motorola's data oriented processing excellence in the fuzzy logic market.

BACKGROUND

Although fuzzy logic has been around for more than 20 years, it has taken a long time for it to gain acceptance in the engineering community. Over time, many people have addressed the potential drawbacks of fuzzy logic, so that it is now seen as an invaluable tool in many of today's systems. Even though fuzzy logic is not generally suited for use in linear systems, it is projected that the fuzzy logic market will grow by 76% every year into a billion dollar business through 1998 [St93]. The factors responsible for such market projections are related to what makes fuzzy logic invaluable in many nonlinear systems: faster and lower cost development, adaptiveness, smoother and simpler controls, fault tolerance, improved product performance, maintenance and extensibility, etc.

Fuzzy logic is also popular because it more closely emulates our reasoning abilities and knowledge modelling capabilities than traditional boolean logic systems [Ba93]. The first stage of a typical fuzzy logic system, fuzzification, deals with finding the degree/grade to which crisp system inputs fit within the membership functions (MF) of the fuzzy inputs. The second part, rule evaluation or fuzzy inference, uses these fuzzy input grades and the rules describing the desired behavior of the system to produce fuzzy output grades. This is the key stage of the process that models our knowledge reasoning capabilities and, consequently, is responsible for much of the computation in most fuzzy systems. The last part of a fuzzy logic system, defuzzification, takes the fuzzy output data of the second stage and converts it into a crisp output.

The process of taking the usually small set of fuzzy input grades and combining them with the rules to produce fuzzy outputs closely matches our reasoning abilities and partly explains why fuzzy logic systems often take less code and/or execute faster than traditional boolean logic systems. The basis for this second stage of the fuzzy logic process is the fuzzy MIN-MAX inference method most frequently applied to fuzzy set logical computation [Ar92]. This method computes the fuzzy AND of multiple fuzzy input grades by taking the minimum grade of the individual fuzzy input types used in a rule. The rule weight giving the grade of one of the fuzzy outputs for such a rule is the same as the minimum grade value of the fuzzy inputs. The method then computes the fuzzy OR of multiple rule weights by taking the maximum of the rule weights associated with a particular fuzzy output. Mathematically, this method may be summarized by

• fuzzy out typeX.ruleY = MIN(ruleY.fuzzy in type1.grade, ..., ruleY.fuzzy in typeN.grade)
• fuzzy out typeX = MAX(fuzzy out typeX.rule1, ..., fuzzy out typeX.ruleN)

where rule.fuzzy in type.grade is the grade of a particular fuzzy input type associated with a rule, fuzzy out type.rule is the grade of a fuzzy output type associated with a particular rule, and fuzzy out type is the highest grade for a particular fuzzy output type.
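The two MIN-MAX formulas can be illustrated with a minimal Python sketch. The dictionary layout for grades and rules, and the names `min_max_inference`, `Grades` and `Rule`, are assumptions made for illustration and are not the paper's data format. The sample grades are hypothetical. The rules are taken from the Inverted Pendulum rule list given later.

```python
from typing import Dict, List, Tuple

# grades[input][mf] -> fuzzy input grade; a missing entry counts as zero
Grades = Dict[str, Dict[str, float]]
# rule: ({input: mf, ...}, output mf), i.e. IF ... AND ... THEN output IS mf
Rule = Tuple[Dict[str, str], str]


def min_max_inference(grades: Grades, rules: List[Rule]) -> Dict[str, float]:
    """Fuzzy AND = MIN within a rule, fuzzy OR = MAX across rules per output MF."""
    out: Dict[str, float] = {}
    for antecedents, out_mf in rules:
        # fuzzy out typeX.ruleY = MIN(grades of the fuzzy inputs ruleY uses)
        weight = min(grades[inp].get(mf, 0.0) for inp, mf in antecedents.items())
        # fuzzy out typeX = MAX(fuzzy out typeX.rule1, ..., ruleN)
        out[out_mf] = max(out.get(out_mf, 0.0), weight)
    return out


rules: List[Rule] = [
    ({"A": "NL", "AC": "ZZ"}, "PL"),  # rule (1): uses a zero grade
    ({"A": "NS", "AC": "ZZ"}, "PS"),  # rule (3)
    ({"A": "NS", "AC": "PS"}, "PS"),  # rule (4)
    ({"A": "ZZ", "AC": "ZZ"}, "ZZ"),  # rule (7)
    ({"A": "ZZ", "AC": "PS"}, "NS"),  # rule (8)
]
grades: Grades = {"A": {"NS": 0.75, "ZZ": 0.25}, "AC": {"ZZ": 0.6, "PS": 0.4}}
print(min_max_inference(grades, rules))
# {'PL': 0.0, 'PS': 0.6, 'ZZ': 0.25, 'NS': 0.25}
```

Rule (1) contributes nothing because one of its fuzzy inputs is zero. A sequential scan still pays for fetching and comparing that grade, and this is the waste the paper targets.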
MOTIVATION

Many fuzzy logic systems spend most of their computation time in the fuzzy inference stage because of the large number of fuzzy inputs and rules that must be scanned during the fuzzy AND-OR operations. A fuzzy input grade of zero in a rule means a zero fuzzy output value for that rule, and 75% of the fuzzy input grades of many fuzzy systems are characteristically zero. As a result, significant computation time and resources are wasted scanning the rules and performing fuzzy AND-OR/MIN-MAX operations on zero values. This paper addresses this significant drawback of typical fuzzy logic systems with an algorithm written for a Motorola SIMD. The algorithm improves a performance factor that directly affects Motorola's ability to compete in the expanding fuzzy logic market.

The example fuzzy logic application for this algorithm is the Inverted Pendulum Problem, while the target architecture is the AE. The Inverted Pendulum Problem fuzzy logic parameters are given in the following section and derived in the reference [Ko92]. The section after the Inverted Pendulum Problem description gives information on the AE related to the example. The next section covers the specifics of the algorithm itself (the sorting of the fuzzy inputs, the representation of the rules/knowledge base format, and the knowledge base scanning/generation of fuzzy outputs) and illustrates data oriented processing's effect on algorithm design. The section following the algorithm description analyzes and summarizes the performance of this algorithm for the Inverted Pendulum Problem as well as for larger fuzzy logic applications. The last section acknowledges those who have contributed to this paper.

INVERTED PENDULUM PROBLEM

Balancing an inverted pendulum in two dimensions is a classic control problem. A motor is used to move the base of the inverted pendulum. For this example, motion is assumed in only one dimension, which simplifies the problem to two inputs. These inputs are the angle the pendulum makes with the vertical (A) and the per second rate at which the angle changes (AC). The output that balances the pendulum is the positive or negative amount of current (C) supplied to the motor. The system is shown in the following figure:

[Figure 1: Inverted Pendulum]

There are seven triangular membership functions per input for this example. Three of the membership functions represent positive values: Positive_Large (PL), Positive_Medium (PM), and Positive_Small (PS). Three more membership functions represent negative values: Negative_Large (NL), Negative_Medium (NM) and Negative_Small (NS). The last membership function is Zero (ZZ). No edge of these membership functions may overlap with more than one other membership function edge, so each crisp system input will be described by no more than 2 nonzero fuzzy inputs (out of 7 possible). Three points are enough to define triangular membership functions. However, this example uses four points (P1, P2, P3, and P4) so that the application is general enough for fuzzy logic systems with trapezoidal membership functions as well as triangular ones. Unlike the input membership functions, the seven output membership functions (PL, PM, PS, ZZ, NL, NM, NS) are singletons, so only one point (P1) is needed.

With the input and output membership functions defined, common sense and some engineering analysis may be used to generate the rules and membership function point values describing the behavior of the system. For example, if the pendulum falls to the right, a negative current should make the motor compensate. Conversely, if the pendulum falls to the left, the output current should be positive. If the pendulum is balanced at the vertical, the output current should be zero. The full set of rules describing the behavior of the system follows:

(1) IF A IS NL AND AC IS ZZ THEN C IS PL
(2) IF A IS NM AND AC IS ZZ THEN C IS PM
(3) IF A IS NS AND AC IS ZZ THEN C IS PS
(4) IF A IS NS AND AC IS PS THEN C IS PS
(5) IF A IS ZZ AND AC IS NL THEN C IS PL
(6) IF A IS ZZ AND AC IS NM THEN C IS PM
(7) IF A IS ZZ AND AC IS ZZ THEN C IS ZZ
(8) IF A IS ZZ AND AC IS PS THEN C IS NS
(9) IF A IS ZZ AND AC IS PM THEN C IS NM
(10) IF A IS ZZ AND AC IS PL THEN C IS NL
(11) IF A IS PS AND AC IS NS THEN C IS NS
(12) IF A IS PS AND AC IS ZZ THEN C IS NS
(13) IF A IS PM AND AC IS ZZ THEN C IS NM
(14) IF A IS PL AND AC IS ZZ THEN C IS NL
(15) IF A IS ZZ AND AC IS NS THEN C IS PS

The following tables apply engineering analysis techniques to relate the crisp system input or output points to their respective membership functions:

Table 1: ANGLE MF POINTS

    MF     P1     P2     P3     P4
    NL    -90    -90    -54    -36
    NM    -54    -36    -36    -16
    NS    -36    -19    -18      0
    ZZ    -18      0      0    +20
    PS      0    +17    +18    +36
    PM    +18    +36    +36    +56
    PL    +36    +56    +90    +90

Table 2: ANGLE CHANGE MF POINTS

    MF     P1     P2     P3     P4
    NL    -90    -90    -72    -49
    NM    -72    -49    -48    -25
    NS    -48    -25    -24     -1
    ZZ    -24     -1      0    +23
    PS      0    +23    +24    +47
    PM    +24    +47    +48    +71
    PL    +48    +71    +90    +90

Table 3: CURRENT MF POINTS

    MF     P1
    NL    -18
    NM    -12
    NS     -6
    ZZ      0
    PS     +6
    PM    +12
    PL    +18

To summarize, the Inverted Pendulum Problem may be described as a 2-input, 1-output fuzzy logic system with 7 membership functions per input or output, a maximum of 4 nonzero fuzzy inputs and a total of 15 rules.
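The four-point membership functions can be illustrated with a minimal Python fuzzification sketch that uses the Table 2 angle-change points. It reads P1 to P4 as a shape that rises from P1 to P2, stays at 1 from P2 to P3, and falls from P3 to P4. That reading, and the names `grade` and `fuzzify`, are our assumptions. The grades are exact fractions for clarity; on the AE they would be 8-bit values, and that scaling is not shown.

```python
from fractions import Fraction
from typing import Dict, Tuple

# Table 2: ANGLE CHANGE MF POINTS as (P1, P2, P3, P4)
ANGLE_CHANGE_MF: Dict[str, Tuple[int, int, int, int]] = {
    "NL": (-90, -90, -72, -49),
    "NM": (-72, -49, -48, -25),
    "NS": (-48, -25, -24, -1),
    "ZZ": (-24, -1, 0, 23),
    "PS": (0, 23, 24, 47),
    "PM": (24, 47, 48, 71),
    "PL": (48, 71, 90, 90),
}


def grade(x: int, p1: int, p2: int, p3: int, p4: int) -> Fraction:
    """Membership of x: rises P1->P2, is 1 from P2 to P3, falls P3->P4, else 0."""
    if x < p1 or x > p4:
        return Fraction(0)
    if p2 <= x <= p3:
        return Fraction(1)
    if x < p2:  # rising edge, so p1 < p2 here
        return Fraction(x - p1, p2 - p1)
    return Fraction(p4 - x, p4 - p3)  # falling edge, so p3 < p4 here


def fuzzify(x: int, mfs: Dict[str, Tuple[int, int, int, int]]) -> Dict[str, Fraction]:
    """Return only the nonzero fuzzy input grades for the crisp input x."""
    grades = {name: grade(x, *points) for name, points in mfs.items()}
    return {name: g for name, g in grades.items() if g > 0}


print(fuzzify(-30, ANGLE_CHANGE_MF))
# {'NM': Fraction(5, 23), 'NS': Fraction(18, 23)}
```

With these points, each edge overlaps only its neighbor's edge. Every crisp angle change in [-90, +90] therefore produces at most two nonzero fuzzy inputs, which is the property the algorithm relies on.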
THE ASSOCIATION ENGINE

The AE is a single-chip SIMD coprocessor intended for data oriented processing environments and parallel computing applications that require significant compute power, such as pattern recognition, image compression and decompression, neural networks, and fuzzy logic [AE93]. Many AEs may be linked together in arrays for MIMD and/or large SIMD processing, but only one AE is required for the Inverted Pendulum example. This example demonstrates two engines. The scalar engine handles sequential program execution, process control, exception processing and other traditional scalar operations. The vector engine consists of 64 processing elements (PEs) for efficient execution of parallel or vector processing algorithms. The following figures show all of the major AE modules explained in this section:

[Figure 2: Modules of the AE]
[Figure 3: A Vector Engine Row]
[Figure 4: The Scalar Engine PE]

Each of the scalar and vector PEs (65 per AE) contains a dedicated 8-bit ALU. This enables each AE to deliver 1.3 billion signed, unsigned or multibyte operations per second at a 20MHz clock frequency (65 PEs x 20MHz). The PEs receive their commands from the Sequence Controller, which in turn fetches them from the 256 byte Instruction Cache. Vector engine PEs execute the same instruction simultaneously, in lock-step. Each vector PE accesses the Input Data Register (IDR), the Coefficient Memory Array (CMA), or the vector data registers (v0-v7) associated with it. Meanwhile, the scalar engine PE executes instructions that access the IDR, the CMA, and the scalar global and pointer registers (g0-g7, p0-p7).

In combination with the scalar and vector engines, the CMA and IDR are other major AE modules that demonstrate the AE's flexibility. The 64 by 64 (=4K) bytes of CMA SRAM serve as general memory storage for instructions, stack space, jump tables, working data and data arrays. A row of 64 bytes is allocated to each of the 64 PEs, so a CMA column of 64 bytes is available for vector/parallel operations. The CMA can also interact with the IDR when the AE is in Run (vs. Host) mode (i.e. when the AE is processing instructions instead of interacting with a host processor for random and/or stream accesses).

The IDR is the only input data path for the AE when the AE is in Run mode. An input tagging feature allows the IDR to access individual bytes of data out of a byte stream, while an input replication feature allows the individual bytes to be copied to more than one of the 64 IDR elements. These individual bytes enter from any of the 4 AE ports (North, South, East and West) and go directly into the IDR. Up to 64 bytes of data may then be accessed from the IDR by the scalar and vector engines during AE program execution. The scalar engine can access one element/byte of the IDR, while the vector engine can access all 64 elements/bytes of the IDR.

The AE has other features, including many control registers not defined here and a rich instruction set in which many operations take 1 clock cycle. Only the Vector Process Control Register (VPCR) and the instructions listed in this section are used to solve the fuzzy inference portion of the Inverted Pendulum Problem. A VPCR is contained in each of the 64 PEs of the vector engine, and only two of its 8 bits apply to this example. The Vector Conditional True (VT) bit is usually used to evaluate if-then-else conditions. The locmax instruction, however, uses it to deactivate PEs that don't have the highest value among all vector register (v0-v7 and IDR) elements. The Valid Input Data (VID) bit indicates that the associated IDR element has data that is valid for use.

Besides the locmax instruction, the following instructions may be used for implementing efficient rule evaluation on AEs:

• vmov
• movi
• mov
• dskip
• skipne
• skipnvt
• repeat
• repeate
• vwritel
• locmin
• rowmin
• rowmax
• colmin
• colmax
• bra
• vifgt
• vifne
• vifeq
• vendif
• vor
• add
• getpe
• get
• put
• inc
• dec
• dsrot

The reader should consult the reference [AE93] for instruction execution times and further explanation of instructions, registers, or other AE features.

THE ALGORITHM

As the second stage of a fuzzy logic system, rule evaluation requires

• the fuzzy input grades of the first stage and
• the rules describing the mapping of fuzzy inputs to the fuzzy outputs

in order to generate the fuzzy output weights required for the third stage. As implied earlier, most fuzzy logic systems start the fuzzy AND-OR/MIN-MAX operations by scanning the rules and then fetching or computing the fuzzy inputs. Rule processing is therefore proportional not only to the number of rules but also to the number of fuzzy inputs possible in a system. The Inverted Pendulum Problem has 7 membership functions per system input, 2 system inputs and 15 rules, so rule processing is proportional to 7 * 2 * 15 = 210 membership function * rules, even though a majority of the fuzzy inputs are zero.

This data oriented processing exercise analyzes the fuzzy inputs and their impact on the fuzzy AND-OR operations before the rules are scanned. It changes the focus of computing from scanning all the rules and performing fuzzy MIN-MAX computations on every fuzzy input to determining the useful fuzzy inputs and then minimizing the amount of computation performed on them. With a maximum of 2 nonzero membership functions per system input, 2 system inputs and 15 rules, this data oriented paradigm bounds rule processing in proportion to 2 * 2 * 15 = 60 membership function * rules.

The data oriented processing emphasis of this algorithm is atypical of many fuzzy logic systems. The reason is that the processing and space limitations of Single Instruction Single Data (SISD) chips, no matter how well or highly pipelined, require the single sequential processor to perform all fuzzy AND-OR computations, the scanning of rules, and the recomputation or storing and retrieving of intermediate results. During all phases of this algorithm, the data flow architecture of the AE and the compute power of its 65 processors underscore the performance improvement that Motorola data oriented processing engines, such as AEs, offer over SISD chips for fuzzy logic solutions.

The first part of this algorithm sorts the fuzzy inputs and tracks the relationship of each fuzzy input to its membership function, so that the nonzero fuzzy inputs and the rules using them allow efficient scanning of the knowledge base. The second part of this algorithm generates the fuzzy outputs from the sorted fuzzy inputs by efficiently scanning the rules/knowledge base. Because that part is closely related to the rule knowledge base format, the format is described after the sorting and before the scanning.
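The 210 versus 60 comparison can be expressed as a small Python cost model. The function names are hypothetical. The model counts work in the paper's unit of "membership function * rules" and does not count AE clock cycles. The grades are made-up Inverted Pendulum values with two nonzero fuzzy inputs per system input.

```python
from typing import Dict, List


def conventional_cost(mfs_per_input: List[int], n_rules: int) -> int:
    """Every rule is scanned against every possible fuzzy input MF, zero or not."""
    return sum(mfs_per_input) * n_rules


def data_oriented_cost(grades_per_input: List[Dict[str, float]], n_rules: int) -> int:
    """Only fuzzy input MFs with a nonzero grade are carried into the rule scan."""
    nonzero = sum(1 for grades in grades_per_input for g in grades.values() if g > 0)
    return nonzero * n_rules


mfs = ["NL", "NM", "NS", "ZZ", "PS", "PM", "PL"]
angle = {mf: 0.0 for mf in mfs}
angle.update(NS=0.75, ZZ=0.25)          # hypothetical angle grades
angle_change = {mf: 0.0 for mf in mfs}
angle_change.update(ZZ=0.6, PS=0.4)     # hypothetical angle change grades

print(conventional_cost([7, 7], 15), data_oriented_cost([angle, angle_change], 15))
# 210 60
```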
For the remainder of this discussion, the fuzzy input grades from the fuzzification stage are stored in 14 elements of a vector register and in the IDR. The fuzzy input membership functions are stored in a vector register, and the rules are stored in a 7 by 14 byte space within the CMA. The fuzzy output rule weights are computed in a second vector register. The register map for these and other values used for intermediate calculations follows:

Table 4: Register Map

    Fuzzy Input Grades                           v1, IDR
    PE With the Largest Fuzzy Input Grade        p3
    The Largest Fuzzy Input Grade                g4
    Sorted Fuzzy Input Grades                    v0
    Sorted Fuzzy Input Grades Index Pointer      p4
    Number of Nonzero Fuzzy Inputs               g5
    Tracked Fuzzy Input MFs                      v2
    Fuzzy Input MF Pointer Into CMA              p0
    Rules                                        CMA[0,3] to CMA[6,16]
    Fuzzy Input MF Column Offset Into CMA        g7
    Number of Fuzzy Input MFs for Example        g6
    Zero                                         g3
    Pointer Into IDR/Fuzzy Input Grades          p2
    Latches Bit Vector                           v3
    Fuzzy Output Weights                         v4

Sorting The Fuzzy Inputs

With the instructions listed above, many sorting options are available on the AE. Some of the options apply theory from sorting algorithms for conventional SISD processors but offer a significant performance improvement when applied to the AE. For example, a good sorting algorithm for a sequential processor has performance proportional to O(N * log(N)), where N is the number of items to be sorted. There are theoretically faster sorting algorithms for sequential processors, but their hardware or software overhead usually makes them undesirable or inefficient for small N. Applying an O(N * log(N)) conventional SISD sorting algorithm to an AE, however, can result in linear performance, O(N), which is practically impossible to achieve on any conventional sequential processor [Kn73, Be93]. Either linear sorting algorithm would be sufficient for the Inverted Pendulum Problem. Instead, a routine based on the locmax instruction is used to demonstrate the diversity and uniqueness of the AE instruction set and architecture.

The first part of this routine initializes the Sorted Fuzzy Input Grades vector (v0), Tracked Fuzzy Input MFs vector (v2), Zero global (g3), and Sorted Fuzzy Input Grades Index Pointer (p4) registers to zero. The next part of the routine is a loop with three steps. It selects the largest fuzzy input grade from the Fuzzy Input Grades vector (v1) register, inserts that value into successive locations of the Sorted Fuzzy Input Grades vector (v0), and then replaces the largest fuzzy input grade in the Fuzzy Input Grades vector (v1) with zero. The AE assembly code of this descending values sorting routine follows:

            vmov    #0, v0            ; Sorted Fuzzy Input Grades = 0
            vmov    #0, v2            ; Tracked Fuzzy Input MFs = 0
            movi    #0, g3            ; Zero
            movi    #0, p4            ; Sorted Fuzzy Input Grades Index Pointer = 0
    TOP:    locmax  #8, v1            ; mark the PE holding the largest grade in v1
            skipnvt
            bra     BOTTOM
            getpe   p3                ; PE With the Largest Fuzzy Input Grade
            get     v1, pe[p3], g4    ; The Largest Fuzzy Input Grade
            put     g4, pe[p4], v0    ; append it to the sorted grades
            put     p3, pe[p4], v2    ; track its MF (PE) number
            put     g3, pe[p3], v1    ; replace it in v1 with zero
            inc     #1, p4
            vendif
            bra     TOP
    BOTTOM:
The locmax-based sorting routine given above may easily be modified to sort across multiple AEs. To do so, substitute the rowmax, rowmin, colmax, or colmin instructions for locmax, and then write the result out to a port for further processing by other AEs or other hardware.

Rules Knowledge Base Format

The format for representing the rules of a fuzzy logic application written for the AE further illustrates one of the edges that data oriented processing has over conventional function oriented processors. This format was chosen to make the storage of the knowledge base very compact and the scanning of its rules highly efficient. Each rule stored within the CMA takes up a subrow of bits in the CMA. The length of the subrow is the number of fuzzy input MFs. All subrows contributing to a fuzzy output must be grouped together in one CMA row, so up to 8 rules (one per bit of a CMA byte) may affect a fuzzy output. For the Inverted Pendulum example, relating the fuzzy input MFs to the fuzzy output MFs requires 14 columns and 7 rows of CMA space. This space can represent a maximum of 7 fuzzy outputs * 8 rules per fuzzy output = 56 rules, with the limitation that no more than 8 rules contribute to any fuzzy output.

Since only 15 rules describe the Inverted Pendulum Problem, 56 - 15 = 41 subrows will not contribute to a fuzzy output. These excess subrows must be filled with 1's to support a latching mechanism described later. The other subrows, which identify the fuzzy input MFs contributing to a fuzzy output MF, are filled with 1's and 0's. In each of the 15 rule subrows, a bit set to 1 identifies a fuzzy input MF that contributes to the fuzzy output. A bit set to 0 marks a fuzzy input MF that does not contribute to the fuzzy output for that subrow/rule. For this example, exactly two bits are set in each rule's subrow because each rule uses both fuzzy inputs. The following bitmap of the CMA represents this format for the Inverted Pendulum rules. It shows the CMA columns and subrows that identify the fuzzy inputs and associated MFs contributing to each fuzzy output:

[Table 5: Inverted Pendulum Problem Rules Knowledge Base. A bitmap of CMA rows 0 through 6 (fuzzy outputs NL, NM, NS, ZZ, PS, PM, PL) by CMA columns 3 through 16 (fuzzy input MFs Angle NL, NM, NS, ZZ, PS, PM, PL, then Angle Change NL, NM, NS, ZZ, PS, PM, PL). Each CMA row is drawn as its 8 subrows of bits (subrows 0 to 55 overall), followed by the resulting hexadecimal byte for each column.]
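The CMA bytes for this format can be generated from the rule list with a minimal Python sketch. The bit assignment is an assumption: each row's rules take the low bits in rule order, and the unused subrows take the high bits as 1's. The paper states that the order of subrows within a row does not matter. The name `build_knowledge_base` is hypothetical.

```python
from typing import List, Sequence, Tuple

MFS = ["NL", "NM", "NS", "ZZ", "PS", "PM", "PL"]
# Rules (1)-(15) as (angle MF, angle change MF, current MF)
RULES: List[Tuple[str, str, str]] = [
    ("NL", "ZZ", "PL"), ("NM", "ZZ", "PM"), ("NS", "ZZ", "PS"), ("NS", "PS", "PS"),
    ("ZZ", "NL", "PL"), ("ZZ", "NM", "PM"), ("ZZ", "ZZ", "ZZ"), ("ZZ", "PS", "NS"),
    ("ZZ", "PM", "NM"), ("ZZ", "PL", "NL"), ("PS", "NS", "NS"), ("PS", "ZZ", "NS"),
    ("PM", "ZZ", "NM"), ("PL", "ZZ", "NL"), ("ZZ", "NS", "PS"),
]


def build_knowledge_base(rules: Sequence[Tuple[str, str, str]]) -> List[List[int]]:
    """Return 7 CMA rows (one per output MF) of 14 bytes (one per input MF).

    Bit k of every byte in a row is subrow k of that row. A rule's subrow has a 1
    in the columns of the two input MFs it uses; unused subrows are all 1's.
    """
    kb = []
    for out_mf in MFS:
        mine = [r for r in rules if r[2] == out_mf]
        if len(mine) > 8:
            raise ValueError(f"{out_mf}: more than 8 rules for one fuzzy output")
        # assumption: unused subrows occupy the high bits of every byte
        row = [(0xFF << len(mine)) & 0xFF] * (2 * len(MFS))
        for k, (a_mf, ac_mf, _) in enumerate(mine):
            row[MFS.index(a_mf)] |= 1 << k              # angle MF column
            row[len(MFS) + MFS.index(ac_mf)] |= 1 << k  # angle change MF column
        kb.append(row)
    return kb


for out_mf, row in zip(MFS, build_knowledge_base(RULES)):
    print(out_mf, " ".join(f"{b:02X}" for b in row))
```

With these rules, every output has at most 3 rules. The upper 4 bits of every byte are therefore 1's (hex F), which matches the paper's observation that this makes the hexadecimal bytes easier to generate.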
Each subrow contributing to a fuzzy output is placed at the end of a CMA row, but the actual order of subrows within a CMA row is not important (only that all subrows affecting a fuzzy output be grouped in the same row). With so many excess subrows, however, it helps in generating the hexadecimal CMA bytes if the upper or lower 4 bits are all 1's (i.e. F).

Besides being a compact representation of the rules knowledge base, this format allows a latching mechanism to be used when scanning the rules. Bits within the latch are set for excess subrows and when fuzzy input MFs contribute to a fuzzy output MF. The bits within the latch are never cleared, so a fuzzy output MF weight is known once all bits in its latch byte are set to 1. This weight, however, may not be the correct weight, because more than one weight is possible for a fuzzy output MF and the fuzzy OR operation requires that the highest weight be chosen.

Generating Fuzzy Output MF Weights From Rules and Sorted Fuzzy Input Grades

As stated above, the latching mechanism supported by the rules knowledge base format is not enough on its own to guarantee that the correct fuzzy output weight will be generated when scanning the knowledge base. This problem is part of the reason the fuzzy input grades are sorted from highest value to lowest value. The main concept behind this phase of the algorithm is to use the fuzzy inputs, the sorted fuzzy inputs and their associated MFs to scan the knowledge base efficiently, so that the fuzzy AND-OR operations are preserved and the correct fuzzy output weight is generated for each fuzzy output MF.

Since a majority of the fuzzy input grades are zero, this method must evaluate all of the rules that depend on these zero fuzzy input grades and generate zero fuzzy output weights appropriately. Because the IDR holds a copy of the fuzzy input grades, this operation is relatively easy to perform and understand compared to calculating the nonzero fuzzy output weights. Finding the zero fuzzy output weights also makes it easier to calculate the nonzero ones. For these reasons, this part of the fuzzy MIN/AND evaluation is performed first when generating the fuzzy output weights.

Just as with the sorting routine, the first part of generating the fuzzy output weights initializes a number of registers with appropriate values. The Fuzzy Output Weights (v4) vector, the Pointer Into IDR/Fuzzy Input Grades (p2), and the Latches Bit Vector (v3) registers are initialized to zero. The Fuzzy Input MF Column Offset Into CMA (g7) global and Fuzzy Input MF Pointer Into CMA (p0) registers are set to 3, and the Number of Fuzzy Input MFs for Example (g6) global register is set to 14. After the initializations, the CMA is scanned, and bits within the Latches Bit Vector (v3) register are set to reflect fuzzy input MFs with zero weights as well as excess subrows that do not contribute to a fuzzy output MF weight. Any PE whose Latches Bit Vector (v3) element/byte has all bits set is deactivated, so that rule weights of zero will not be changed by subsequent processing. The AE assembly code for the first part of generating the fuzzy output MF weights (the fuzzy MIN/AND operation) follows:

            vmov    #0, v4            ; Fuzzy Output Weights = 0
            movi    #3, g7            ; Fuzzy Input MF Column Offset Into CMA = 3
            movi    g7, p0            ; Fuzzy Input MF Pointer Into CMA = 3
            movi    #0, p2            ; Pointer Into IDR/Fuzzy Input Grades = 0
            vmov    #0, v3            ; Latches Bit Vector = 0
            movi    #14, g6           ; Number of Fuzzy Input MFs for Example = 14
            repeate #2, g6            ; repeat the next 2 instructions g6 times
            vifeq   IDR[p2++], v4     ;   where this fuzzy input grade is zero,
            vor     CMA[p0++], v3     ;   latch the rules that use its MF
            vifne   #-1, v3           ; deactivate PEs whose latch byte is all 1's

The next part of generating the fuzzy output MF weights initializes the Number of Nonzero Fuzzy Inputs (g5) global register. It also sets the Sorted Fuzzy Input Grades Index Pointer (p4) register to point to the last element (i.e. the lowest grade) of the Sorted Fuzzy Input Grades vector (v0) register. These initializations allow the Sorted Fuzzy Input Grades (v0) vector to be traversed from smallest grade to largest grade as part of this algorithm's fuzzy AND-OR/MIN-MAX inference processing.
The fuzzy AND-OR/MIN-MAX inference processing loop involves

• extracting the MF number of the lowest fuzzy input grade not yet processed into the Fuzzy Input MF Pointer Into CMA (p0) register,
• extracting the lowest fuzzy input grade not yet processed (a continuation of the fuzzy MIN/AND operation that started with computing the zero fuzzy output membership function weights),
• adding the Fuzzy Input MF Column Offset Into CMA (g7) register to the MF number of the lowest fuzzy input grade not yet processed (p0) register,
• ORing the rules that use the lowest fuzzy input MF into the Latches Bit Vector (v3) register,
• moving the lowest fuzzy input grade into the active elements of the Fuzzy Output Weights (v4) vector register,
• setting the Sorted Fuzzy Input Grades Index Pointer (p4) register to point to the next lowest fuzzy input grade not yet processed, and
• deactivating the PEs with all the bits set in their Latches Bit Vector (v3) register (fuzzy MAX/OR operation).

The AE assembly code for this last part of generating the fuzzy output MF weights follows:

            mov     p4, g5            ; Number of Nonzero Fuzzy Inputs = sort count
            dec     #1, p4            ; point p4 at the lowest sorted grade
            repeat  #7, g5            ; repeat the next 7 instructions g5 times
            get     v2, pe[p4], p0    ;   MF number of the lowest unprocessed grade
            get     v0, pe[p4], g4    ;   the lowest unprocessed grade
            add     g7, p0            ;   add the CMA column offset
            vor     CMA[p0], v3       ;   latch the rules that use this MF
            vmov    g4, v4            ;   write the grade to the active output weights
            dec     #1, p4            ;   move to the next lowest grade
            vifne   #-1, v3           ;   deactivate PEs whose latch byte is all 1's
            vendif                    ; reactivate all PEs

The vendif reactivates all the PEs that were deactivated during the fuzzy AND-OR/MIN-MAX inference processing, so the third stage of fuzzy logic processing, defuzzification, doesn't have to worry about the state of the PEs.

It should also be noted that the theoretical execution time estimate given earlier assumed that the knowledge base would be scanned only once. In this last phase of the algorithm, the rules knowledge base is scanned twice: once for processing zero fuzzy output weights and once for processing nonzero fuzzy output weights. The theoretical execution time is therefore proportionally bounded by 2 nonzero membership functions per system input * 2 system inputs * 15 rules * 2 = 120 membership function * rules. This is still 210/120 = 1.75 times, or just under twice, as fast as is possible on a conventional processor. In practice, however, the theoretical execution time can be proportional to as little as 60 membership function * rules when there is only 1 nonzero membership function per system input. For comparison's sake, assume that the average theoretical execution time of this algorithm is proportional to (120 + 60) / 2 = 90 membership function * rules. This represents a theoretical speedup of 210/90, or about 2.33 times, over most fuzzy logic systems.

PERFORMANCE AND SUMMARY

The performance estimates given above for the algorithm are impressive, but they do not give the exact time the algorithm takes to execute on the AE. Nor do they illustrate the AE's suitability for solving fuzzy logic problems of varying sizes. This section gives the algorithm's worst-case execution time in clock cycles for the Inverted Pendulum Problem and for larger fuzzy logic systems, based on an unpipelined AE and the instruction cycle times given in the reference [AE93]. The calculations used to generate the number of clock cycles in the following table reduce to 2 * I * IO + 72 * I + 41. Here I is the number of system inputs for the fuzzy logic problem, and IO is the number of fuzzy input or output membership functions per system input or output. The maximum number of rules supported is IO * O * 8, where O is the number of system outputs. The number of CMA columns accessed by this algorithm depends only on the number of fuzzy input MFs. As an added benefit, the execution time therefore stays constant when the number of rules is less than the maximum number of rules supported.

Table 6: Performance of Algorithm for Different Fuzzy Logic Systems

    Fuzzy Logic System     I    O    IO    Max Rules    Cycles
    Inverted Pendulum      2    1     7           56       213
    2/1                    2    1     8           64       217
    4/2                    4    2     8          128       393
    6/3                    6    3     8          192       569
    8/4                    8    4     8          256       745
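The closed-form cycle count and the rule limit can be checked against every row of Table 6 with a short Python sketch. The function names are hypothetical. The only inputs are the two expressions quoted above and the rule that each fuzzy output MF occupies one of the AE's 64 CMA rows.

```python
def worst_case_cycles(n_inputs: int, mfs_per_io: int) -> int:
    """Closed form quoted above: 2 * I * IO + 72 * I + 41 clock cycles."""
    return 2 * n_inputs * mfs_per_io + 72 * n_inputs + 41


def max_rules(n_outputs: int, mfs_per_io: int) -> int:
    """IO * O fuzzy output MFs, each a CMA row holding 8 rule subrows."""
    return mfs_per_io * n_outputs * 8


def cma_rows_used(n_outputs: int, mfs_per_io: int) -> int:
    """One CMA row (PE) per fuzzy output MF, out of the AE's 64 rows."""
    return mfs_per_io * n_outputs


# Table 6 rows: (system, I, O, IO, max rules, cycles)
TABLE_6 = [
    ("Inverted Pendulum", 2, 1, 7, 56, 213),
    ("2/1", 2, 1, 8, 64, 217),
    ("4/2", 4, 2, 8, 128, 393),
    ("6/3", 6, 3, 8, 192, 569),
    ("8/4", 8, 4, 8, 256, 745),
]

for name, i, o, io, rules, cycles in TABLE_6:
    print(f"{name:18} max rules {max_rules(o, io):4}  cycles {worst_case_cycles(i, io):4}")
```

With IO fixed at 8, adding two inputs costs 2 * 2 * 8 + 72 * 2 = 176 cycles. The largest system, 8/4, uses 32 of the 64 CMA rows.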
The difference between the 2/1 and 4/2, the 4/2 and 6/3, and the 6/3 and 8/4 fuzzy logic systems in the above table is exactly 176 clock cycles. This data shows that the AE scales linearly with the size of fuzzy logic systems, and it provides an excellent example of a chip well designed for scalable computing performance. Note also that even for the largest fuzzy logic system, 8/4, half of the CMA rows are empty (its 8 * 4 = 32 fuzzy output MFs occupy 32 of the 64 rows). This implies that the AE can support larger fuzzy logic applications requiring more rules and/or fuzzy output MFs with slightly modified code, if it needs modifying at all. This is important because, although the problem size may increase, the code size may well stay the same without adding significantly to execution time.

In summary, this algorithm is a strong example of the data oriented processing enhancements available to applications that use the AE. It shows how solving smaller parts of a fuzzy logic problem on the AE with elegant data oriented partitioning creates an interdependence among all phases of a problem solution. That interdependence allows greater overall efficiency and scalability than can be attained with conventional processors. These factors will give Motorola a clear performance advantage in fuzzy logic markets.

ACKNOWLEDGEMENTS

The authors would like to recognize William Archibald as the first and only other individual (to the authors' knowledge) to develop the basic algorithm and apply it to any other hardware [Ar92], and to thank him for his time in helping us to understand the algorithm. Alex DeCastro also provided the figure used in this paper and the source for one of the references.

BIBLIOGRAPHY

[AE93] Motorola Parallel Scalable Processors Group, "Association Engine (AE) Software Manual", Motorola MCTG Publications, 1993.
[Ar92] Archibald, W., "FLIPPER Architectural and Algorithmic Notes", Not yet published.
[Ba93] Barron, J., "Putting Fuzzy Logic Into Focus", Byte, April 1993, pp. 111-118.
[Be93] Bell, M., "Sorting on the AE", Not yet published.
[Kn73] Knuth, D., "Sorting and Searching", The Art of Computer Programming, Vol. 3, Addison-Wesley Publishing Company, Menlo Park, CA, 1973.
[Ko92] Kosko, B., "Neural Networks and Fuzzy Systems", Prentice-Hall, Inc., Englewood Cliffs, NJ, 1992.
[St93] Stevens, T., "Fuzzy Logic Makes Sense", Industry Week, March 1, 1993, pp. 36-42.