4. Introduction
• Hardware / Software Partitioning
o Used in systems with reconfigurable hardware (FPGA) operated in
conjunction with a software processor
o Hardware and Software tasks can execute concurrently
o Partitioning divides the task graph into HW-executed and SW-executed tasks
to reduce time to completion
5. Introduction
• Partial Reconfiguration
o ‘Columns’ of FPGA can be configured independently
o Hardware mapped to other columns continues to run during
reconfiguration
• Partial Dynamic Reconfiguration
o Allows reuse of FPGA resources
o However, feasibility of placement is no longer guaranteed
6. Target System Architecture
[Figure: block diagram with General Purpose Memory, Software, Hardware (Partial RTR) and Shared Memory]
• Software: A processor running
software tasks
• Hardware: An FPGA accelerator
that supports partial
reconfiguration
• Shared Memory: Dedicated
memory used to transfer
input/output data between tasks
7. Target System Architecture
• Shared Memory can be implemented as on-chip or
off-chip dedicated memory
• Tasks mapped to the same device have negligible
communication overhead
• Tasks mapped to different devices incur a HW/SW
communication overhead
• Primary advantage: FPGA task placement reduces
to simple linear placement
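The communication rule above can be sketched as a small cost function. The device labels ("HW", "SW") and the constant per-edge overhead are illustrative assumptions, not values from the slides.

```python
from typing import Dict, List, Tuple

def total_comm_overhead(
    edges: List[Tuple[str, str]],
    mapping: Dict[str, str],
    hw_sw_cost: float,
) -> float:
    """Sum the HW/SW communication overhead over all task-graph edges.

    Edges between tasks on the same device cost nothing (negligible overhead);
    each edge that crosses the HW/SW boundary costs `hw_sw_cost`, which is
    assumed constant here.
    """
    return sum(hw_sw_cost for src, dst in edges if mapping[src] != mapping[dst])

# Hypothetical three-task graph: edges a->b and a->c cross the boundary, b->c does not.
edges = [("a", "b"), ("b", "c"), ("a", "c")]
mapping = {"a": "SW", "b": "HW", "c": "HW"}
print(total_comm_overhead(edges, mapping, hw_sw_cost=2.0))  # 4.0
```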
8. Criticality of Task Placement
• Each HW task occupies one or more adjacent
FPGA columns
• Placement feasibility is not guaranteed even with
an exact algorithm
• An infeasible implementation can result if
scheduling conflicts are not considered during
placement
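A minimal sketch of the feasibility check implied above, assuming each placed task is described by a leftmost column, a width and an occupancy interval (this data model is an assumption for illustration). It shows how enough free columns can exist in total while no adjacent run of them is free at the required time.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Placed:
    left: int     # leftmost column index
    width: int    # number of adjacent columns occupied
    start: float  # time the columns become occupied
    end: float    # time the columns are released

def find_left_column(placed: List[Placed], num_cols: int,
                     width: int, start: float, end: float) -> Optional[int]:
    """Return the leftmost feasible column for a new task, or None if none exists."""
    for left in range(num_cols - width + 1):
        conflict = any(
            p.start < end and start < p.end                           # overlap in time
            and p.left < left + width and left < p.left + p.width     # overlap in columns
            for p in placed
        )
        if not conflict:
            return left
    return None

# Four columns; one task holds columns 1-2 during [0, 10).
placed = [Placed(left=1, width=2, start=0, end=10)]
# Columns 0 and 3 are free, but they are not adjacent, so a 2-column task cannot start at t=5.
print(find_left_column(placed, num_cols=4, width=2, start=5, end=8))    # None
# Once the columns are released, the same task fits at column 0.
print(find_left_column(placed, num_cols=4, width=2, start=10, end=12))  # 0
```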
9. Criticality of Task Placement
[Figure labels: Infeasible; Task Graph]
10. Criticality of Task Placement
[Figure labels: Feasible; Task Graph]
11. Criticality of Task Placement
[Figure: Infeasible placement]
12. Heterogeneous Implementations
• FPGAs contain heterogeneous components:
o Memory Blocks
o Hardware Multipliers
o Embedded Processors
• Placement should consider multiple hardware
implementations of tasks
• Problem: Resources are limited and available only at
specific locations on the FPGA
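The location constraint can be illustrated with a small sketch. The column type names, the two implementations and the rule that a span must contain every required resource type are assumptions made for this example only.

```python
from typing import List, Set

def feasible_positions(column_types: List[str], width: int,
                       required: Set[str]) -> List[int]:
    """Leftmost columns where a `width`-wide span contains every required resource type."""
    return [
        left
        for left in range(len(column_types) - width + 1)
        if required <= set(column_types[left:left + width])
    ]

# Hypothetical FPGA with one memory-block column and one multiplier column.
columns = ["logic", "logic", "bram", "logic", "mult", "logic"]
implementations = {
    "logic_only": (3, set()),    # wider, needs no special resource
    "with_mult": (2, {"mult"}),  # narrower, needs a hardware multiplier column
}
for name, (width, required) in implementations.items():
    print(name, feasible_positions(columns, width, required))
```

In this toy layout the multiplier-based implementation is narrower but fits only at columns 3 and 4, while the logic-only implementation fits at columns 0 through 3.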
14. Proposed Approach
• Exact Algorithm: Integer Linear Programming
o Optimizes a linear objective subject to linear constraints
o Constraints: Traditional HW/SW partitioning + Contiguous placement +
Configuration Prefetch
o Implementation on a commercial ILP solver (CPLEX) is very slow
• Heuristic Formulation:
o Modified KLFM approach
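A minimal sketch of how a few of the ILP constraints listed above could be written. The notation is assumed and does not appear in the slides: $x_i = 1$ if task $i$ runs in hardware, $y_{i,k} = 1$ if its leftmost column is $k$, $w_i$ is its width in columns, $C$ is the number of FPGA columns, $s_i$ is its start time, $e_i$ is its execution time, $t_i^{HW}$ and $t_i^{SW}$ are its hardware and software execution times, and $E$ is the set of task-graph edges. Communication overhead, non-overlap of column spans in time and configuration prefetch are omitted.

```latex
\begin{align}
  \min\; & T \\
  \text{s.t.}\quad
  & \sum_{k=1}^{C - w_i + 1} y_{i,k} = x_i && \forall i \\
  & e_i = x_i\, t_i^{HW} + (1 - x_i)\, t_i^{SW} && \forall i \\
  & s_j \ge s_i + e_i && \forall (i, j) \in E \\
  & T \ge s_i + e_i && \forall i \\
  & x_i,\, y_{i,k} \in \{0, 1\}, \quad s_i \ge 0 && \forall i, k
\end{align}
```

Under these assumptions a hardware task receives exactly one leftmost column $k$ and occupies columns $k, \dots, k + w_i - 1$, which is what makes the placement contiguous.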
15. Basic KLFM Heuristic
KLFM Loop:
    While (more unlocked tasks)
        select best task to switch between HW/SW
        move & lock best task
        update best partition if new partition is better
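The loop above can be sketched in Python as follows. This is an illustration under assumptions, not the authors' implementation: the cost function is injected by the caller, and `toy_cost` with its timing tables is a hypothetical stand-in that ignores dependencies and placement.

```python
from typing import Callable, Dict

Partition = Dict[str, str]  # task name -> "HW" or "SW"

def flipped(partition: Partition, task: str) -> Partition:
    """Return a copy of `partition` with `task` moved to the other side."""
    moved = dict(partition)
    moved[task] = "SW" if moved[task] == "HW" else "HW"
    return moved

def klfm_pass(initial: Partition, cost: Callable[[Partition], float]) -> Partition:
    """One KLFM pass: move every task exactly once, keep the best partition seen."""
    current = dict(initial)
    best, best_cost = current, cost(current)
    unlocked = sorted(current)
    while unlocked:
        # Select the unlocked task whose switch gives the lowest cost.
        task = min(unlocked, key=lambda t: cost(flipped(current, t)))
        # Move and lock it, even if the cost gets worse for now.
        current = flipped(current, task)
        unlocked.remove(task)
        if cost(current) < best_cost:
            best, best_cost = current, cost(current)
    return best

# Hypothetical cost: SW tasks run serially on the processor, HW tasks serially on
# the FPGA, both devices run concurrently, and dependencies are ignored.
SW_TIME = {"a": 8, "b": 6, "c": 4}
HW_TIME = {"a": 2, "b": 3, "c": 4}

def toy_cost(p: Partition) -> float:
    sw = sum(SW_TIME[t] for t in p if p[t] == "SW")
    hw = sum(HW_TIME[t] for t in p if p[t] == "HW")
    return max(sw, hw)

print(klfm_pass({"a": "SW", "b": "SW", "c": "SW"}, toy_cost))
# {'a': 'HW', 'b': 'HW', 'c': 'SW'}
```

Accepting a move that worsens the cost is what distinguishes this style of pass from plain greedy descent; the best partition seen so far is kept separately.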
16. Basic KLFM Heuristic
KLFM Loop:
    While (more unlocked tasks)
        for (each unlocked task)
            for (each alternate implementation)
                calculate makespan by physically aware list scheduling
        select & lock best (task, implementation point)
        update best partition if new partition is better
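A hedged sketch of this modified loop, in which a move selects a (task, implementation point) pair. The `makespan` callable stands in for physically aware list scheduling, which is not reproduced here; `toy_makespan`, the implementation names and the 3-column budget are hypothetical.

```python
from typing import Callable, Dict, List

Assignment = Dict[str, str]  # task name -> chosen implementation point

def modified_klfm_pass(initial: Assignment,
                       options: Dict[str, List[str]],
                       makespan: Callable[[Assignment], float]) -> Assignment:
    """One pass: each task is re-assigned once to its best alternate implementation."""
    current = dict(initial)
    best, best_len = dict(current), makespan(current)
    unlocked = sorted(current)
    while unlocked:
        candidates = []
        for task in unlocked:
            for impl in options[task]:
                if impl == current[task]:
                    continue  # only alternate implementations count as moves
                trial = {**current, task: impl}
                candidates.append((makespan(trial), task, impl))
        if not candidates:
            break  # no unlocked task has an alternate implementation
        length, task, impl = min(candidates)
        current[task] = impl   # select and lock the best (task, implementation point)
        unlocked.remove(task)
        if length < best_len:
            best, best_len = dict(current), length
    return best

# Hypothetical data: execution time and column width per implementation point.
TIME = {"a": {"SW": 8, "HW_small": 5, "HW_fast": 2}, "b": {"SW": 4, "HW_small": 3}}
COLS = {"SW": 0, "HW_small": 1, "HW_fast": 3}
MAX_COLS = 3

def toy_makespan(assign: Assignment) -> float:
    if sum(COLS[impl] for impl in assign.values()) > MAX_COLS:
        return float("inf")  # treated as an infeasible placement
    sw = sum(TIME[t][i] for t, i in assign.items() if i == "SW")
    hw = sum(TIME[t][i] for t, i in assign.items() if i != "SW")
    return max(sw, hw)

options = {task: list(impls) for task, impls in TIME.items()}
print(modified_klfm_pass({"a": "SW", "b": "SW"}, options, toy_makespan))
# {'a': 'HW_fast', 'b': 'SW'}
```

Returning an infinite makespan for assignments that exceed the column budget is one simple way to keep infeasible partitions from ever becoming the best partition.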
18. Experiments on Feasibility
              Placement-unaware       Placement-aware
Test          T_ILP    Feasibility    T_ILP    T_HEU
tg1           10       Y              10       11
tg5           25       NO             26       26
Mean-value    21       Y              21       21
tg7           20       Y              20       20
tg10          27       NO             28       29
FFT           25       Y              25       25
tg11          36       NO             38       41
tg12          14       NO             15       18
4-band eq     27       Y              27       27
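To read the table quantitatively, the relative gap between the heuristic and the placement-aware ILP result can be computed per test. This assumes the two placement-aware columns are directly comparable quantities (the slides do not state their unit); the numbers are copied from the table.

```python
# (test, placement-aware T_ILP, T_HEU) copied from the table above
rows = [("tg1", 10, 11), ("tg5", 26, 26), ("Mean-value", 21, 21), ("tg7", 20, 20),
        ("tg10", 28, 29), ("FFT", 25, 25), ("tg11", 38, 41), ("tg12", 15, 18),
        ("4-band eq", 27, 27)]

def gap_percent(t_ilp: int, t_heu: int) -> float:
    """Relative excess of the heuristic result over the ILP result, in percent."""
    return 100.0 * (t_heu - t_ilp) / t_ilp

for name, t_ilp, t_heu in rows:
    print(f"{name:<11}{gap_percent(t_ilp, t_heu):5.1f}%")
```

On these numbers the heuristic matches the placement-aware ILP value in five of the nine tests, and the largest gap is 20% (tg12).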
19. Case Study: JPEG Encoder
• Resource constraint of 8 columns
• Total area occupied by tasks: 11 columns
• Data collected for a 256x256 color image
Experiment                                               Schedule length (ms)
HW-SW partitioning, no partial RTR                       16.74
HW-SW, partial RTR                                       9.9
HW-SW, partial RTR, perfect prefetch                     9.04
Finer-grain graph                                        7.21
Multiple implementations, single heterogeneous column    6.82
Best implementation points only                          9.58
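The schedule lengths can be turned into speedups over the first row. Treating "HW-SW partitioning, no partial RTR" as the baseline is a choice made for this illustration; the values are copied from the table.

```python
baseline_ms = 16.74  # HW-SW partitioning, no partial RTR
schedule_ms = {
    "HW-SW, partial RTR": 9.9,
    "HW-SW, partial RTR, perfect prefetch": 9.04,
    "Finer-grain graph": 7.21,
    "Multiple implementations, single heterogeneous column": 6.82,
    "Best implementation points only": 9.58,
}

def speedup(baseline: float, length: float) -> float:
    """How many times shorter the schedule is than the baseline."""
    return baseline / length

for name, length in schedule_ms.items():
    print(f"{name}: {speedup(baseline_ms, length):.2f}x")
```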
20. Conclusions
• Existing techniques neglect one or more of the following placement and
scheduling issues:
o Configuration prefetch
o Feasibility of partition
o Single reconfiguration controller bottleneck
o Multiple Implementations
o Heterogeneous Architecture
• Integer Linear Programming: Exact solution, but very long run-time
• Modified KLFM Heuristic: Near-optimal solution, run-time of minutes for
task graphs with hundreds of nodes
21. Issues in Placement
• Resource bottleneck of a single reconfiguration
controller
• It may not be possible to hide the reconfiguration
overhead for all tasks
• Rectangular packing algorithms cannot be applied because
of gaps in the schedule (caused by dependencies)
22. EST Computation Algorithm
find earliest time slot where task can be placed
reconfig start = earliest time instant that space and controller are available together
if ((reconfig start + reconfig time) < dependency time)
    EST = earliest time parent dependencies are satisfied
else
    EST = end of reconfiguration
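The EST (earliest start time) rule above can be sketched as a small function. It assumes the free times of the target columns and of the single reconfiguration controller are already known, and that both stay free afterwards, so the earliest common instant is simply their maximum; the parameter names are hypothetical.

```python
def earliest_start_time(space_free_at: float, controller_free_at: float,
                        reconfig_time: float, dependency_time: float) -> float:
    """Earliest start time of a HW task under the rule in the pseudocode above."""
    # Reconfiguration needs both the target columns and the controller.
    reconfig_start = max(space_free_at, controller_free_at)
    reconfig_end = reconfig_start + reconfig_time
    if reconfig_end < dependency_time:
        # Reconfiguration finishes before the parents do, so its overhead is hidden.
        return dependency_time
    # Otherwise the task must wait for the reconfiguration to finish.
    return reconfig_end

print(earliest_start_time(0, 3, 4, 10))  # 10: overhead fully hidden
print(earliest_start_time(5, 3, 4, 6))   # 9: task waits for reconfiguration
```

The first call corresponds to a successful configuration prefetch; the second shows a case where the reconfiguration overhead cannot be hidden.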
23. Thank You!
For more information, click the link below:
http://vibranttechnologies.co.in/embedded-system-classes-in-mumbai.html