Published in: Education, Technology, Business

  1. 1. Mounica Vempa, Swetha Varadarajan
  2. 2. TRENDS in EMBEDDED SYSTEMS: The need for high-performance functionality is increasing day by day, and so is the number of chips in a system. Because this increase is inevitable, designers are switching from Systems-on-Chip (SoCs) to Multiprocessor Systems-on-Chip (MPSoCs).
  3. 3. An SoC integrates all the components of a computing system on a single chip (IC).
  4. 4. Issues with traditional bus and crossbar networks: • Latency • Priority • Length of the wires • Bandwidth limitation • Cost
  5. 5. Routing congestion-aware and communication power-aware mapping in MPSoCs: A core graph is given in which each communication flow carries a latency constraint (in units of time). Because the bandwidth (BW) of each communication link is finite, latency depends on network congestion. Latency constraint violations can therefore sometimes be fixed by mapping cores farther apart, so that less congested routing paths can be found. Assume multiple link insertions are not allowed.
  6. 6.
        1. Arrange the task graph in decreasing order of the severity of the latency constraint.
        2. Divide the entire list into 'm' child search windows.
        3. Search each window for cores that occur more than 4 times (because a core can have at most 4 adjacent neighbors).
        4. Name such cores prime nodes.
        5. For each prime node, choose a tile with the maximum number of neighbors and place the node on that tile.
        6. The four neighbors of the prime node are the cores with tight latency constraints, so they sit one hop away from the node.
        7. Repeat these steps for the rest of the cores in decreasing order of occurrences.
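Steps 1 to 5 above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' code: the names `find_prime_nodes` and `best_free_tile`, the `(src, dst, latency)` flow tuples, and the example flows are all hypothetical; a smaller latency bound is assumed to be the more severe constraint; and the tiles are assumed to form a rectangular 2D mesh.

```python
from collections import Counter

def find_prime_nodes(flows, m, max_neighbors=4):
    """Return cores that occur more than max_neighbors times in a search window.

    flows: list of (src, dst, latency_constraint) tuples (hypothetical format).
    Assumption: a smaller latency bound is the more severe constraint.
    """
    if not flows:
        return []
    ordered = sorted(flows, key=lambda flow: flow[2])  # most severe first
    window_size = -(-len(ordered) // m)                # ceiling division
    prime_nodes = []
    for start in range(0, len(ordered), window_size):
        counts = Counter()
        for src, dst, _latency in ordered[start:start + window_size]:
            counts[src] += 1
            counts[dst] += 1
        for core, occurrences in counts.most_common():
            if occurrences > max_neighbors and core not in prime_nodes:
                prime_nodes.append(core)
    return prime_nodes

def best_free_tile(rows, cols, occupied):
    """Pick the free mesh tile with the most adjacent tiles (first one on ties)."""
    def neighbor_count(tile):
        r, c = tile
        return sum(0 <= r + dr < rows and 0 <= c + dc < cols
                   for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)))
    free = [(r, c) for r in range(rows) for c in range(cols)
            if (r, c) not in occupied]
    return max(free, key=neighbor_count)

# Hypothetical flows: core 0 takes part in five of the eight flows.
flows = [(0, 1, 10), (0, 2, 12), (0, 3, 15), (0, 4, 20),
         (0, 5, 25), (1, 2, 30), (3, 4, 40), (5, 6, 50)]
print(find_prime_nodes(flows, m=1))   # [0]
print(best_free_tile(3, 3, set()))    # (1, 1), the centre of a 3x3 mesh
```

With `m=2` the same flows yield no prime node, because core 0 appears only four times in the first window; the choice of window count therefore affects which cores are treated as prime.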
  7. 7. Fig 1. Given CTG for 9 cores. Fig 2. Sorted latency table in decreasing order. Fig 3. Initial mapping.
  8. 8. • A congestion detection (CD) mechanism is used to detect whether a route is congested. • A CD packet is sent from the source to the router, with a threshold 'k'. If this threshold is exceeded, the router is assumed to be congested and is marked in the CD matrix. • The new path allocation is explained on the following slide.
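The thresholding step can be illustrated with a short Python sketch. The slide does not say which quantity is compared against 'k', so this sketch assumes one measured value per router (for example, the delay seen by the CD packet); the name `build_cd_matrix` and the sample measurements are hypothetical.

```python
def build_cd_matrix(measured, k):
    """Mark a router as congested (1) when its measured value exceeds k.

    measured: 2D list with one value per router (assumed, e.g. CD packet delay).
    """
    return [[1 if value > k else 0 for value in row] for row in measured]

# Hypothetical measurements for a 3x3 mesh, with threshold k = 5.
measured = [[2, 9, 3],
            [1, 4, 8],
            [5, 5, 5]]
cd_matrix = build_cd_matrix(measured, k=5)
print(cd_matrix)  # [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
```

A value equal to the threshold is treated as not congested here, since the slide says the threshold must be exceeded.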
  9. 9. New Path Allocation Algorithm:
        1. Determine whether the destination router lies east or west of the source by comparing their 'x' positions.
        2. Traverse the CD matrix in that direction, i.e., hold the y coordinate fixed and vary the x coordinate until it reaches the x coordinate of the destination router.
        3. If a congested router ('1' in the matrix) is found on this path, shift the search to the north or south direction.
        4. Continue the search until it reaches the y coordinate of the destination router.
        5. If the path is clear and the y coordinate is reached, continue east or west to reach the destination.
        6. If the path is not clear, i.e., a congested router is reached again, repeat steps 2-5 to find the new path.
        7. Calculate the latency of the new path and determine whether the constraint is satisfied or violated.
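One way to read these steps is as X-first routing that switches axis whenever the next router is congested. The Python sketch below follows that reading under several assumptions that are not in the slides: routers are addressed as `(x, y)` with the CD matrix indexed as `cd[y][x]`, every move goes toward the destination, the search gives up (returns `None`) when both candidate moves are congested, and latency is taken as hop count times a fixed per-hop latency. The names `allocate_path` and `path_latency` are hypothetical.

```python
def allocate_path(cd, src, dst):
    """Find a path from src to dst on a mesh, avoiding congested routers.

    cd[y][x] == 1 marks a congested router. Returns a list of (x, y) tiles,
    or None when both moves toward the destination are congested.
    """
    (x, y), (dst_x, dst_y) = src, dst
    path = [(x, y)]
    prefer_x = True  # start along the x axis (east or west)
    while (x, y) != (dst_x, dst_y):
        step_x = (dst_x > x) - (dst_x < x)   # -1, 0 or +1
        step_y = (dst_y > y) - (dst_y < y)
        x_move = (x + step_x, y, True)
        y_move = (x, y + step_y, False)
        candidates = [x_move, y_move] if prefer_x else [y_move, x_move]
        for nx, ny, along_x in candidates:
            if (nx, ny) != (x, y) and cd[ny][nx] == 0:
                x, y, prefer_x = nx, ny, along_x  # keep going on this axis
                path.append((x, y))
                break
        else:
            return None  # blocked: the slides do not specify a fallback
    return path

def path_latency(path, hop_latency):
    """Assumed latency model: number of hops times a fixed per-hop latency."""
    return (len(path) - 1) * hop_latency

# Hypothetical 3x3 CD matrix with the router at x=1, y=0 congested.
cd = [[0, 1, 0],
      [0, 0, 0],
      [0, 0, 0]]
path = allocate_path(cd, src=(0, 0), dst=(2, 2))
print(path)                   # [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2)]
print(path_latency(path, 2))  # 8
```

Because each move reduces the distance to the destination, the loop always terminates; the price is that detours away from the destination are never tried.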
  10. 10. Example: Route from tile 7 to tile 2. Fig 4. CD matrix. Fig 5. New path allocation.
  11. 11. Given an MPSoC of heterogeneous cores (with 2 types of cores) and a task graph, assign each task to a core type so that a chip-wide power constraint and a chip-wide performance constraint (in terms of total delay) are both satisfied. The assignment also gives the number of cores needed of each type. Assume: each task is suited to run on one core type for better performance and on the other core type for better power.
  12. 12.
        1. Each task has 4 parameters (power and delay on each of the two core types), so the program uses 4 arrays of length "n", where n is the number of tasks.
        2. The chip-wide power constraint (Pcons) is taken to be the average of P1 and P2, where P1 is the sum of the powers of all tasks on a Type 1 core and P2 is the sum of the powers of all tasks on a Type 2 core.
        3. The chip-wide delay constraint (Tcons) is taken to be the average of T1 and T2, where T1 is the sum of the delays of all tasks on a Type 1 core and T2 is the sum of the delays of all tasks on a Type 2 core.
  13. 13.
        1. The power and delay of each task should each be less than or equal to 1/n of the corresponding chip-wide constraint. If both conditions hold, the corresponding core type is mapped to the task.
        2. If case 1 is not satisfied, the task is tested against each constraint separately, and the mapping is made according to whichever constraint is satisfied.
        3. If neither constraint is satisfied, i.e., neither the power nor the delay of the task is within 1/n of the constraints, the core type whose parameters are nearer to the constraints is assigned to the task.
  14. 14. From the above test cases, a Violation Factor (VF) is derived: the sum of the task's power and delay, each divided by the corresponding average chip-wide power or delay. The factor is calculated for both core types for a given task. If the value is less than 1, the mapping is good; otherwise, the mapping may violate the imposed constraints. Let Pavg and Tavg be the average power and delay, defined as:
        Pavg = Pcons / n
        Tavg = Tcons / n
        VF[j][i] = (P[j][i] / Pavg) + (T[j][i] / Tavg)
        where j = 1 or 2 is the core type and i = 1 to n is the task index.
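As a worked illustration of the formula, the block below evaluates VF for task 1 of the first test case in this deck (5 mW and 50 ms on type 1, 48 mW and 8 ms on type 2, with Pcons = 190.5 mW, Tcons = 286 ms and n = 5). The subscript notation VF with indices (core type, task) is only a typeset form of VF[j][i]; values are rounded to three decimals.

```latex
\begin{align*}
P_{avg} &= \frac{190.5}{5} = 38.1\ \text{mW}, \qquad
T_{avg} = \frac{286}{5} = 57.2\ \text{ms} \\
VF_{1,1} &= \frac{5}{38.1} + \frac{50}{57.2} \approx 0.131 + 0.874 = 1.005 \\
VF_{2,1} &= \frac{48}{38.1} + \frac{8}{57.2} \approx 1.260 + 0.140 = 1.400
\end{align*}
```

Neither value is below 1, so neither core type is a clearly good mapping for this task, but type 1 has the smaller VF.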
  15. 15.
        1. Store the given set of parameters from the task graph in the 4 arrays.
        2. Calculate the chip-wide power and performance constraints, Pcons and Tcons.
        3. Calculate the VF for the given task.
        4. Compare the VFs corresponding to the different core types and select the core type for which the VF is least.
        5. Calculate the number of tasks assigned to each core type.
        6. Decide whether the constraints can be met.
        7. Print the results.
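The steps above can be sketched in Python as follows. This is a minimal sketch, not the authors' program: the function name `map_tasks`, the keys of the returned dictionary, and the tie-break toward type 1 are assumptions. Total power and total delay are taken as plain sums over the selected core type of each task, which matches the totals reported for the test cases in this deck. The input data is the first test case.

```python
def map_tasks(p1, t1, p2, t2):
    """Assign each task to the core type with the smaller Violation Factor.

    p1, t1: per-task power and delay on a type 1 core.
    p2, t2: per-task power and delay on a type 2 core.
    """
    n = len(p1)
    p_cons = (sum(p1) + sum(p2)) / 2     # chip-wide power constraint
    t_cons = (sum(t1) + sum(t2)) / 2     # chip-wide delay constraint
    p_avg, t_avg = p_cons / n, t_cons / n

    assignment = []
    total_power = 0
    total_delay = 0
    for i in range(n):
        vf1 = p1[i] / p_avg + t1[i] / t_avg
        vf2 = p2[i] / p_avg + t2[i] / t_avg
        if vf1 <= vf2:                   # assumption: ties go to type 1
            assignment.append(1)
            total_power += p1[i]
            total_delay += t1[i]
        else:
            assignment.append(2)
            total_power += p2[i]
            total_delay += t2[i]

    return {
        "assignment": assignment,
        "type1": assignment.count(1),
        "type2": assignment.count(2),
        "p_cons": p_cons,
        "t_cons": t_cons,
        "total_power": total_power,
        "total_delay": total_delay,
        "met": total_power <= p_cons and total_delay <= t_cons,
    }

# First test case from the deck (power in mW, delay in ms).
result = map_tasks(p1=[5, 15, 20, 13, 2], t1=[50, 145, 187, 108, 20],
                   p2=[48, 80, 97, 78, 23], t2=[8, 15, 23, 12, 4])
print(result["type1"], result["type2"])            # 3 2
print(result["total_delay"], result["total_power"])  # 216 197
print("Constraints met" if result["met"] else "Constraints not met")
```

For this input the sketch selects type 1 for tasks 1, 4 and 5 and type 2 for tasks 2 and 3; the total power of 197 mW exceeds the 190.5 mW constraint, so the constraints are not met.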
  16. 16.
        TASK   POWER 1   TIME 1   POWER 2   TIME 2
        1      5 mW      50 ms    48 mW     8 ms
        2      15 mW     145 ms   80 mW     15 ms
        3      20 mW     187 ms   97 mW     23 ms
        4      13 mW     108 ms   78 mW     12 ms
        5      2 mW      20 ms    23 mW     4 ms

        P1 = 55 mW, P2 = 326 mW, Pcons = 190.5 mW, Pavg = 38.1 mW
        T1 = 510 ms, T2 = 62 ms, Tcons = 286 ms, Tavg = 57.2 ms

        Results:
        # of type 1 cores: 3
        # of type 2 cores: 2
        Delay constraint: 286.000000
        Total delay: 216.000000
        Power constraint: 190.500000
        Total power: 197.000000
        Constraints not met
  17. 17.
        TASK   POWER 1   TIME 1   POWER 2   TIME 2
        1      5 mW      30 ms    48 mW     8 ms
        2      15 mW     105 ms   55 mW     15 ms
        3      20 mW     117 ms   97 mW     23 ms
        4      13 mW     78 ms    40 mW     12 ms
        5      2 mW      20 ms    23 mW     4 ms

        Results:
        # of type 1 cores: 3
        # of type 2 cores: 2
        Delay constraint: 206.000000
        Total delay: 194.000000
        Power constraint: 159.000000
        Total power: 122.000000
        Constraints met
  18. 18. • We would like to extend the mapping and routing algorithm to an NoC with voltage islands. • Here we have implemented a sequential search; in the future we would like to implement a parallel search, thereby improving the performance of the chip. • For mapping, we would like to add incremental swapping to reduce tight latencies for the least prioritized cores.
  19. 19. Things learnt: • NoC router architecture • Routing algorithms • Mapping techniques • Congestion detection algorithms • Current trends in NoC architecture
  20. 20. Thank you