3rd 3DDRESD: DReAMS
  • Good morning, everybody, and thank you for being here. I am… I’m going to present my thesis work, which is entitled…

    • 1. Design Methodologies for Dynamic Reconfigurable Multi-FPGA Systems, by Alessandro Panella [email_address]. 3-Day DRESD, 07/28–08/01/2008, Hotel Villa Gina, Goglio, Italy
    • 2. About this thesis (1/2)
      • PROBLEM STATEMENT:
      • Extend the range of application of dynamic reconfigurability techniques from the single-FPGA case to multi-FPGA systems
      • NOVELTY
      • Methodology for the design of multi-FPGA systems
        • Dynamic reconfigurability
          • Seen as a solution for implementing applications whose area requirements exceed the available resources
          • Only used “when needed”
        • Regularity-driven partitioning for run-time reuse
    • 3. About this thesis (2/2)
      • Major contribution:
        • Development of a multi-FPGA system design flow that exploits dynamic reconfigurability for block reuse.
      • Useful contributions:
        • Creation of an intermediate representation for structural and hierarchical circuits.
        • Creation of a framework for the extraction of the design from VHDL.
        • Design and implementation of static global layout algorithms.
        • Exploitation of hierarchy information for regular-pattern extraction.
    • 4. Outline
      • Context definition
        • FPGA
        • Multi-FPGA Systems (MFS)
        • Dynamic reconfigurability
      • Related works
        • MFS design flows
        • Dynamically reconfigurable MFSs
      • Proposed methodology
        • Design extraction
        • Global layout
        • Reuse and Dynamic reconfigurability
      • Experimental results
      • Conclusion and future works
    • 5. Field Programmable Gate Array
      • Reprogrammable semi-custom hardware
        • Low non-recurring engineering (NRE) costs
        • Good performance
        • High flexibility
      • Composed of Configurable Logic Blocks (CLBs)
        • Xilinx Virtex CLB:
          • 2 slices, each containing two 4-input Look-Up Tables (LUTs)
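To make the LUT concept concrete, here is a minimal behavioral sketch (not vendor code) of a 4-input LUT as a 16-bit truth table: the four inputs form an address that selects one configuration bit. The names `LUT4` and `lut_from_function` are hypothetical, and the bit ordering (input `a` as the least significant address bit) is an assumption made for illustration.

```python
class LUT4:
    """Behavioral model of a 4-input look-up table (16 configuration bits)."""

    def __init__(self, truth_table: int):
        # Bit k of truth_table is the output for input pattern k (0..15)
        if not 0 <= truth_table < (1 << 16):
            raise ValueError("a 4-input LUT holds exactly 16 configuration bits")
        self.truth_table = truth_table

    def evaluate(self, a: int, b: int, c: int, d: int) -> int:
        # The four inputs form a 4-bit address; a is the least significant bit
        index = a | (b << 1) | (c << 2) | (d << 3)
        return (self.truth_table >> index) & 1


def lut_from_function(f) -> LUT4:
    """Compute the 16-bit configuration that makes a LUT4 implement f."""
    bits = 0
    for index in range(16):
        a, b, c, d = (index >> 0) & 1, (index >> 1) & 1, (index >> 2) & 1, (index >> 3) & 1
        if f(a, b, c, d):
            bits |= 1 << index
    return LUT4(bits)


# "Reprogramming" the LUT only means loading a different 16-bit pattern
xor4 = lut_from_function(lambda a, b, c, d: a ^ b ^ c ^ d)
and4 = lut_from_function(lambda a, b, c, d: a & b & c & d)
print(hex(xor4.truth_table), xor4.evaluate(1, 0, 1, 1))  # 0x6996 1
print(hex(and4.truth_table), and4.evaluate(1, 1, 1, 1))  # 0x8000 1
```

The point of the model is that any 4-input Boolean function fits in the same hardware; only the stored bits change.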
    • 6. Multi-FPGA Systems (MFS)
      • Ensembles of multiple FPGAs (from 2 to thousands)
      • Motivations:
        • Massively parallel computing
        • Need to implement large applications
        • General trend in VLSI towards multi-core computers
      • Applications:
        • Supercomputing
        • Logic emulation
        • Neural networks, …
      • Terminology:
        • Architecture : physical cluster of FPGAs
        • Application : programmed functionality
        • System : architecture + application
    • 7. MFS topologies (1/2)
      • Connections:
        • Hardwired vs. programmable
        • Dedicated (point-to-point) vs. shared (bus)
      • Complete graph (clique)
        • PRO: direct connection between any two chips
        • CON: planarity problems; high pin requirements
      • Mesh: 4- (or 8-) neighbor pattern
        • PRO: expandability
        • CON: path length between chips is not fixed
        • CON: communication logic needed in intermediate chips
    • 8. MFS topologies (2/2)
      • Crossbar: logic-bearing chips and routing chips
        • Total (one routing chip)
        • Partial (several routing chips)
        • Equal communication delays
        • Low scalability
      • Hybrid: combines the benefits of both approaches
        • Example: Hybrid Complete-Graph Partial-Crossbar (HCGP) (from Khalid, M.: Routing Architecture and Layout Synthesis for Multi-FPGA Systems, Ph.D. Thesis, University of Toronto, 1999)
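To compare these topologies quantitatively, the sketch below builds hop-count distance matrices of the kind a placement step could use, and counts inter-chip links. It assumes unit-cost links, a 4-neighbor mesh, and a crossbar in which any two logic chips communicate through exactly one routing chip (two hops); all function names are hypothetical.

```python
def clique_distances(n):
    # Complete graph: every pair of distinct chips shares a direct link (1 hop)
    return [[0 if i == j else 1 for j in range(n)] for i in range(n)]


def crossbar_distances(n):
    # Assumption: any two logic chips communicate through one routing chip (2 hops),
    # so every pair is equidistant
    return [[0 if i == j else 2 for j in range(n)] for i in range(n)]


def mesh_distances(rows, cols):
    # 4-neighbor mesh, chips numbered row-major: hops = Manhattan distance
    coords = [(r, c) for r in range(rows) for c in range(cols)]
    return [[abs(r1 - r2) + abs(c1 - c2) for (r2, c2) in coords]
            for (r1, c1) in coords]


def clique_links(n):
    # One dedicated link per pair of chips: grows quadratically (pin pressure)
    return n * (n - 1) // 2


def mesh_links(rows, cols):
    # Horizontal plus vertical neighbor links: grows linearly with the chip count
    return rows * (cols - 1) + cols * (rows - 1)


mesh = mesh_distances(4, 4)
print(clique_links(16), mesh_links(4, 4))  # 120 24
print(max(max(row) for row in mesh))       # 6 (corner to corner)
```

For 16 chips the clique needs five times as many links as the 4x4 mesh, while the mesh pays with paths of up to 6 hops, which mirrors the PRO/CON lists above.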
    • 9. Reconfigurability
      • Reconfiguration: altering the location or functionality of a system element (G. Estrin, 1960)
      • FPGAs: a suitable physical platform
      • Partial vs. Total
      • (Partial) Dynamic vs. Static:
        • Only some parts of the system take part in each reconfiguration
        • The execution of the system does not cease
      • Motivations and applications
        • Provide a larger virtual area
        • React to sudden and frequent changes in application needs
        • Fault tolerance
    • 10. Dynamically Reconfigurable MFSs
      • Rationale: expand the capabilities of static MFSs
        • Going beyond MFS physical limitations
        • Provide a high level of flexibility
          • E.g. in logic emulation: dynamic fault fixing
      • Partial vs. Total reconfiguration in MFS
      • Two main scenarios (not exclusive)
        • Reconfiguration of logic chips
        • Reconfiguration of routing chips
          • The interconnections are dynamically mutable
          • Components can be reused
    • 11. Design hierarchy
      • Application composed of:
        • Blocks
          • Can have sub-blocks
        • Nets
          • Block-to-block
          • Block-to-interface
      • Advantages:
        • Manage the complexity of the design
        • Reuse of modules
          • IP-core libraries
      [Figure: example of a block-to-block net and a block-to-interface net]
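A hierarchical design of this kind maps naturally onto a tree of blocks carrying their own nets. The sketch below is illustrative only: the `Block`/`Net` classes, the `INTERFACE` marker for the enclosing block's ports, and the `round`/`sbox`/`perm` component names are hypothetical, not the thesis's data structures. It shows how to distinguish the two net kinds and how to flatten the hierarchy into leaf blocks.

```python
from dataclasses import dataclass, field

INTERFACE = "<interface>"  # hypothetical marker for the enclosing block's ports


@dataclass
class Net:
    name: str
    endpoints: tuple  # child-block names, or INTERFACE for the parent's ports


@dataclass
class Block:
    name: str
    component: str                                # component this block instantiates
    children: list = field(default_factory=list)  # sub-blocks (empty for a leaf)
    nets: list = field(default_factory=list)      # nets among children / to interface

    def leaves(self, prefix=""):
        """Flatten the hierarchy into hierarchical paths of leaf blocks."""
        path = f"{prefix}/{self.name}" if prefix else self.name
        if not self.children:
            yield path
        for child in self.children:
            yield from child.leaves(path)


def net_kind(net):
    return "block-to-interface" if INTERFACE in net.endpoints else "block-to-block"


def make_round(name):
    # Two instances of the same component: a natural candidate for module reuse
    return Block(name, "round",
                 children=[Block("sbox", "sbox"), Block("perm", "perm")],
                 nets=[Net("a", (INTERFACE, "sbox")),
                       Net("b", ("sbox", "perm")),
                       Net("c", ("perm", INTERFACE))])


top = Block("core", "cipher", children=[make_round("r1"), make_round("r2")],
            nets=[Net("x", ("r1", "r2"))])
print(list(top.leaves()))
# ['core/r1/sbox', 'core/r1/perm', 'core/r2/sbox', 'core/r2/perm']
print([net_kind(n) for n in top.children[0].nets])
# ['block-to-interface', 'block-to-block', 'block-to-interface']
```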
    • 12. What’s next
      • Context definition
        • FPGA
        • Multi-FPGA Systems (MFS)
        • Dynamic reconfigurability
      • Related works
        • MFS design flows
        • Dynamically reconfigurable MFSs
      • Proposed methodology
        • Design extraction
        • Global layout
        • Reuse and Dynamic reconfigurability
      • Experimental results
      • Conclusion and future works
    • 13. Related works - MFS design flow
      • All MFS design flows have a similar structure
        • Different algorithms used in each phase
      • Examples: Hauck (a) and Khalid (b)
      • Global layout tasks: partitioning, placement and routing
      • Hauck, S.: Multi-FPGA Systems, Ph.D. Thesis, University of Washington, 1995
      • Khalid, M.: Routing Architecture and Layout Synthesis for Multi-FPGA Systems, Ph.D. Thesis, University of Toronto, 1999
    • 14. Complete MFS design flows (a)
      • Integrated solution to partitioning, placement and routing
        • Recursive bi-partitioning
          • Multilevel approach
            • Clustering and refinement phases
        • Partition orderings for placement
          • Identify the bottlenecks in the architecture
          • Assign the two initial partitions to the least connected parts of the architecture, and so on recursively
        • The connections are routed as the bisections are computed
      • PROS: the architecture is considered
      • CONS: no routing flexibility once partitioning and placement are fixed
    • 15. Complete MFS design flows (b)
      • Partitioning: recursive bisection using Fiduccia-Mattheyses heuristic
      • Placement: dependent on the topology
        • Mesh: force-directed
        • Crossbar: trivial task, since all FPGAs are equidistant
      • Routing: two approaches
        • General (obtain a graph from the architecture)
        • Specific (fitted to the particular MFS topology)
      • PROS: uses existing, effective and robust algorithms
      • CONS: stress on routing and topology evaluation
    • 16. Partial MFS design flows
      • Address only some phases of the design
        • Usually partitioning and placement
      • Iterative approaches
        • Genetic algorithm [Hidalgo et al., DSD ‘02]
        • Simulated annealing [Roy et al., ICCAD ’93; Vicente et al., FPL ’99]
      • Hierarchical approaches
        • Exploit the design hierarchy in partitioning
        • Behrens et al., ICCAD ‘96
          • Hierarchy exploration heuristic
        • Fang et al., TODAES ‘00
          • Hierarchy extraction from Verilog specification
          • Set-covering procedure
    • 17. Dynamic Reconfigurable MFS
      • Extraction of a directed task graph from VHDL
      • Task graph divided into time segments
        • Using a non-linear programming model
      • Each segment is spatially partitioned
      [Ouaiss et al., An Integrated Partitioning and Synthesis System for Dynamically Reconfigurable Multi-FPGA Architectures, 1998]
      • Dynamic?
    • 18. What’s next
      • Context definition
        • FPGA
        • Multi-FPGA Systems (MFS)
        • Dynamic reconfigurability
      • Related works
        • MFS design flows
        • Dynamically reconfigurable MFSs
      • Proposed methodology
        • Design extraction
        • Global layout
        • Reuse and Dynamic reconfigurability
      • Experimental results
      • Conclusion and future works
    • 19. Proposed methodology
      • Multi-FPGA design flow
      • Three main phases
        • Design extraction
        • Static Global Physical Layout
          • Partitioning
          • Placement
          • Routing
        • Reuse through Dynamic Reconfigurability
      • Reuse introduces extra delays
        • Reconf. times, sequential execution…
        • Only adopted when needed
        • In such cases, the introduced delay has to be minimized
    • 20. Design Extraction
      • Input: VHDL description
      • Output: intermediate representation
        • Purpose-built (ad hoc) data structure
      • Two sub-phases:
        • VHDL preprocessing
        • VHDL structural parsing
    • 21. Intermediate representation
      • C++ data structure
      • Contains both structural and hierarchical information
      • Graphs implemented using the Boost Graph Library
      • Container class provides an API
    • 22. VHDL Parsing
      • VHDL preprocessing: obtain a purely structural VHDL description
        • Features of each component are retrieved using vendor synthesis tools (e.g., Xilinx XST, Synplify Pro)
      • Create the intermediate representation from the purely structural VHDL description
    • 23. Example: DES encryption core (part of the 3DES core circuit), hierarchy and flattened views
    • 24. Static Global Layout
      • This phase addresses Partitioning and Placement
      • Two implemented approaches:
        • Integrated P&P
        • Sequential P&P
    • 25. Integrated P&P
      • Simulated annealing algorithm
        • Iterative randomized approach
          • Suitable for coping with high-dimensionality problems
          • Partitioning + placement is such a problem
        • Aim: minimize a cost function f
        • The algorithm starts with a “high” temperature T
        • At each iteration
          • M random moves are performed
          • Each move is accepted (Metropolis criterion)
            • Always if the cost decreases or remains equal
            • With probability $e^{-\Delta f / T}$ if the cost increases ($\Delta f > 0$)
          • T is decreased by a cooling factor α ($T \leftarrow \alpha T$)
        • Stop after S consecutive non-accepted moves
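The annealing loop on this slide can be written generically. The sketch below is an illustration under stated assumptions, not the thesis implementation: the default values for T, α, M (`moves_per_iter`) and S (`stop_after`) are hypothetical, and `max_temperature_steps` is an extra safety cap not mentioned on the slide, added because equal-cost moves are always accepted and could otherwise keep the loop running.

```python
import math
import random


def simulated_annealing(initial, cost, random_move, t0=10.0, alpha=0.95,
                        moves_per_iter=100, stop_after=1000,
                        max_temperature_steps=500, seed=None):
    """Generic annealing loop; returns the best solution seen and its cost.

    cost(solution) -> float; random_move(solution, rng) -> new solution.
    moves_per_iter plays the role of M and stop_after the role of S.
    """
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    temperature = t0
    rejected_in_a_row = 0
    for _ in range(max_temperature_steps):  # safety cap, not part of the slide
        for _ in range(moves_per_iter):
            candidate = random_move(current, rng)
            candidate_cost = cost(candidate)
            delta = candidate_cost - current_cost
            # Metropolis criterion: accept if not worse, else with prob. exp(-delta/T)
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                current, current_cost = candidate, candidate_cost
                rejected_in_a_row = 0
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
            else:
                rejected_in_a_row += 1
        if rejected_in_a_row >= stop_after:  # checked once per temperature step
            break
        temperature *= alpha  # geometric cooling: T <- alpha * T
    return best, best_cost


best, best_cost = simulated_annealing(
    initial=50,
    cost=lambda x: (x - 7) ** 2,
    random_move=lambda x, rng: x + rng.choice((-1, 1)),
    seed=1,
)
print(best, best_cost)
```

The toy usage minimizes a one-dimensional cost; for partitioning and placement, the solution would instead be an assignment of nodes to FPGAs.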
    • 26. Annealing implementation
      • Solution: array $[c_i]$, where node $i$ is placed in FPGA $c_i$
      • Cost: Weighted Estimated Wire Length (WEWL)
      • Random move: single-node move or swap of two nodes, with equal probability
      • Constraints:
        • Area constraint
        • I/O pin constraint
        • Handled with penalties
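The ingredients listed above (assignment array, WEWL cost, two move types, penalty-based constraints) can be sketched as follows. The slide does not give the exact formulas, so the forms used here are assumptions: WEWL as the sum of edge weight times inter-FPGA distance, linear penalties with hypothetical weights `k_area` and `k_pin`, and one pin consumed per unit of cut-edge weight on each side. All function names are hypothetical.

```python
import random


def wewl(assign, edges, dist):
    # Assumed form of WEWL: sum of weight * distance between the hosting FPGAs
    return sum(w * dist[assign[i]][assign[j]] for i, j, w in edges)


def penalty(assign, area, edges, n_fpga, area_cap, pin_cap, k_area=100.0, k_pin=100.0):
    # Area constraint: total node area per FPGA must not exceed area_cap
    used = [0] * n_fpga
    for node, fpga in enumerate(assign):
        used[fpga] += area[node]
    # I/O pin constraint: a cut edge consumes `weight` pins on both of its FPGAs
    pins = [0] * n_fpga
    for i, j, w in edges:
        if assign[i] != assign[j]:
            pins[assign[i]] += w
            pins[assign[j]] += w
    over_area = sum(max(0, u - area_cap) for u in used)
    over_pins = sum(max(0, p - pin_cap) for p in pins)
    return k_area * over_area + k_pin * over_pins


def make_cost(area, edges, dist, area_cap, pin_cap):
    n_fpga = len(dist)
    return lambda assign: (wewl(assign, edges, dist)
                           + penalty(assign, area, edges, n_fpga, area_cap, pin_cap))


def random_move(assign, n_fpga, rng):
    new = list(assign)  # work on a copy so the caller can still reject the move
    if rng.random() < 0.5:
        # Single-node move: send one node to a random FPGA
        node = rng.randrange(len(new))
        new[node] = rng.randrange(n_fpga)
    else:
        # Swap: exchange the FPGAs of two randomly chosen nodes
        a, b = rng.randrange(len(new)), rng.randrange(len(new))
        new[a], new[b] = new[b], new[a]
    return new


# Four unit-area nodes in a chain, two FPGAs at distance 1, capacity 2 nodes each
cost = make_cost(area=[1, 1, 1, 1], edges=[(0, 1, 1), (1, 2, 1), (2, 3, 1)],
                 dist=[[0, 1], [1, 0]], area_cap=2, pin_cap=10)
print(cost([0, 0, 1, 1]), cost([0, 1, 0, 1]), cost([0, 0, 0, 1]))  # 1.0 3.0 101.0
```

With every inter-FPGA distance set to 1, this assumed WEWL reduces to the weighted number of cut edges, consistent with the note on slide 38 that the integrated approach then acts as a pure partitioner. The move function can be passed to any annealing loop that expects a `random_move(solution, rng)` callback, e.g. via `lambda s, rng: random_move(s, n_fpga, rng)`.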
    • 27. Sequential P&P
      • Partitioning: bottom-up clustering
      • 1-to-1 Placement: annealing
        • Simplified version of the integrated P&P algorithm
      • CLUSTERING:
      • Initialization: each node is considered as a cluster
      • At each iteration
        • Choose two nodes on the basis of a metric
        • Collapse them
      • Stop when
        • Only one cluster is left
        • No further clusters can be formed due to
          • Area constraint
          • I/O pin constraint
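The clustering loop can be sketched as a greedy agglomeration. This is a minimal sketch, not the thesis code: it assumes the netlist is a weighted graph of `(i, j, weight)` edges, uses plain connection weight as the closeness metric, and estimates a cluster's pin usage as the weight of edges leaving it. The function name `bottom_up_clustering` is hypothetical; the returned merge history is the dendrogram that the later slides cut.

```python
from itertools import combinations


def bottom_up_clustering(area, edges, area_cap, pin_cap):
    """Greedy bottom-up clustering; returns final clusters and the merge history.

    area: area of each node; edges: (i, j, weight) tuples.
    """
    clusters = [frozenset([n]) for n in range(len(area))]  # each node is a cluster

    def connection(a, b):
        # Assumed closeness metric: total edge weight between clusters a and b
        return sum(w for i, j, w in edges
                   if (i in a and j in b) or (i in b and j in a))

    def external_weight(c):
        # Edge weight leaving cluster c, used as its I/O pin estimate
        return sum(w for i, j, w in edges if (i in c) != (j in c))

    history = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            merged = a | b
            if sum(area[n] for n in merged) > area_cap:
                continue  # would violate the area constraint
            if external_weight(merged) > pin_cap:
                continue  # would violate the I/O pin constraint
            score = connection(a, b)
            if best is None or score > best[0]:
                best = (score, a, b)
        if best is None:
            break  # no feasible merge is left
        _, a, b = best
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)  # collapse the two clusters
        history.append((a, b))
    return clusters, history


clusters, history = bottom_up_clustering(
    area=[1, 1, 1, 1], edges=[(0, 1, 5), (2, 3, 5), (1, 2, 1)], area_cap=2, pin_cap=10)
print(sorted(sorted(c) for c in clusters))  # [[0, 1], [2, 3]]
```

The greedy choice (always merging the best-scoring feasible pair) is what makes the procedure fast, and also what the future-work slide refers to as its inherent greediness.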
    • 28. Clustering metrics
      • Connection :
      • Communication Ratio :
        • Internal comm.
        • External comm.
      • Communication density :
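The formulas for these metrics did not survive in this transcript. The sketch below only illustrates how metrics with these names could be computed. The definitions are assumptions chosen to match the labels, not the thesis's formulas: connection as edge weight between two clusters, communication ratio as internal over external communication of the merged cluster, and density as connection divided by merged area.

```python
def connection(a, b, edges):
    # Assumed: total weight of edges running between clusters a and b
    return sum(w for i, j, w in edges if (i in a and j in b) or (i in b and j in a))


def communication_ratio(a, b, edges):
    # Assumed: internal / external communication of the merged cluster a | b
    merged = a | b
    internal = sum(w for i, j, w in edges if i in merged and j in merged)
    external = sum(w for i, j, w in edges if (i in merged) != (j in merged))
    return float("inf") if external == 0 else internal / external


def communication_density(a, b, edges, area):
    # Assumed: connection normalized by the area of the merged cluster
    return connection(a, b, edges) / sum(area[n] for n in a | b)


edges = [(0, 1, 5), (2, 3, 5), (1, 2, 1)]
area = [1, 1, 1, 1]
a, b = frozenset({0}), frozenset({1})
print(connection(a, b, edges),
      communication_ratio(a, b, edges),
      communication_density(a, b, edges, area))  # 5 5.0 2.5
```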
    • 29. Blocks reuse
      • Problem: application does not fit onto the architecture
        • Reuse similar parts of the circuit in order to save space
      • Def: dynamically interconnected structure
      • Architectural scenarios
        • Bus
        • Crossbar
    • 30. Isomorphic clusters
      • Which parts of the structure should be considered for reuse?
      • Def. Isomorphic Clusters
        • Substructures which contain the same blocks having the same connections
        • Example
      • Two subproblems
        • Finding isomorphic clusters
        • Selecting the ones to reuse (and how many times)
    • 31. Isomorphic clusters extraction (1/2)
      • Regularity driven clustering
      • Def. Type of a node: the component of which the node is an instance
      • If two nodes selected for collapsing have the same parent
        • Look in the hierarchy for other nodes of the same type as the parent
        • Execute the same collapsing operation on their corresponding children
        • Assign the same type to the newly created clusters
      • Clustering itself benefits from this enhancement
        • Problem of standard clustering: lack of a global metric
        • Regularity provides global information
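The regularity-driven step can be sketched on a small hierarchy. In this hypothetical sketch, a node's type is the component it instantiates, "corresponding children" in another instance are matched by local instance name (an assumption), and fresh cluster types are named `cluster0`, `cluster1`, and so on. The `round`/`sbox`/`perm`/`keymix` names are illustrative only.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Node:
    name: str                                     # local instance name inside its parent
    component: str                                # the node's "type"
    children: dict = field(default_factory=dict)  # local name -> Node


_new_type_ids = count()


def instances_of(root, component):
    """All nodes in the hierarchy whose type is `component`."""
    found = [root] if root.component == component else []
    for child in root.children.values():
        found.extend(instances_of(child, component))
    return found


def collapse(parent, x, y, cluster_type):
    """Replace children x and y of `parent` with one cluster node."""
    a, b = parent.children.pop(x), parent.children.pop(y)
    name = f"{x}+{y}"
    parent.children[name] = Node(name, cluster_type, {x: a, y: b})
    return name


def regular_collapse(root, parent, x, y):
    """Collapse x and y under `parent`, then repeat the same collapse in every other
    instance of the parent's type, giving all new clusters one shared type."""
    cluster_type = f"cluster{next(_new_type_ids)}"
    for instance in instances_of(root, parent.component):
        if x in instance.children and y in instance.children:
            collapse(instance, x, y, cluster_type)
    return cluster_type


def make_round(name):
    return Node(name, "round", {"sbox": Node("sbox", "sbox"),
                                "perm": Node("perm", "perm"),
                                "key": Node("key", "keymix")})


top = Node("core", "cipher", {"r1": make_round("r1"), "r2": make_round("r2")})
t = regular_collapse(top, top.children["r1"], "sbox", "perm")
print(sorted(top.children["r2"].children))  # ['key', 'sbox+perm']
print(top.children["r1"].children["sbox+perm"].component == t)  # True
```

Because both new clusters share one type, they are isomorphic clusters by construction, which is what makes them candidates for reuse.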
    • 32. Isomorphic clusters extraction (2/2)
      • The key feature is the assignment of a “type” to clusters
      • Example:
    • 33. Blocks reuse choices
      • Choose which blocks to reuse
      • Difficulty: high complexity due to hierarchical clusters
          • Some clusters contain others
      • Solution
        • ILP model, fast even for a large number of nodes
        • Run the ILP model on each “cut” of the dendrogram
        • Each cut is a flattened structural view of the application
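Given the merge history produced by clustering, the flat views can be enumerated as follows. This sketch assumes that cut k is the partition obtained after applying the first k merges; the thesis's exact cut definition is not shown here, and `dendrogram_cuts` is a hypothetical name. The ILP itself is not reproduced.

```python
def dendrogram_cuts(n_nodes, merges):
    """Yield (k, flat_partition) for each cut of the dendrogram.

    merges: (a, b) pairs of frozensets, in the order the clustering collapsed them.
    Cut k is the flat view obtained after applying the first k merges.
    """
    clusters = {frozenset([n]) for n in range(n_nodes)}
    yield 0, set(clusters)
    for k, (a, b) in enumerate(merges, start=1):
        clusters -= {a, b}
        clusters.add(a | b)
        yield k, set(clusters)


merges = [(frozenset({0}), frozenset({1})), (frozenset({2}), frozenset({3}))]
for k, partition in dendrogram_cuts(4, merges):
    # Each flat view is where a per-cut reuse model (e.g. the ILP) would be solved
    print(k, sorted(sorted(c) for c in partition))
# 0 [[0], [1], [2], [3]]
# 1 [[0, 1], [2], [3]]
# 2 [[0, 1], [2, 3]]
```

Working cut by cut avoids having to reason about clusters that contain other clusters, since every cut is a plain partition of the nodes.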
    • 34. ILP model for blocks reuse
      • $x_i$: number of times cluster type $t_i$ is reused (= number of needed reconfigurations)
    • 35. What’s next
      • Context definition
        • FPGA
        • Multi-FPGA Systems (MFS)
        • Dynamic reconfigurability
      • Related works
        • MFS design flows
        • Dynamically reconfigurable MFSs
      • Proposed methodology
        • Design extraction
        • Global layout
        • Reuse and Dynamic reconfigurability
      • Experimental results
      • Conclusion and future works
    • 36. Experiments
      • Test circuit description (slide 37)
      • Integrated vs. Sequential partitioning & placement
        • Methodologically, both approaches are valid
        • They are compared from a numerical point of view
          • Partitioning evaluation (slide 38)
          • Placement evaluation (slide 39)
      • Sequential P&P vs. Metis (slide 40)
        • Provide a comparison with an external approach
      • Blocks reuse evaluation (slide 41)
        • Execution time
        • Example of application
    • 37. Results: test circuits
      • Triple-DES encryption+decryption core (3DES)
      • Finite Impulse Response filter (FIR)
      • Noekeon cipher (NOEK)
      • Composed module FIR+3DES
    • 38. Integrated vs. Sequential P&P (1/2)
      • Partitioning evaluation
      NOTE: by setting the distance between any two FPGAs to 1, the integrated annealing approach reduces to a pure partitioning algorithm
    • 39. Integrated vs. Sequential P&P (2/2)
      • Placement evaluation (on mesh architectures)
      • Integrated P&P
      • Sequential P&P
    • 40. Clustering vs. Metis
    • 41. Results: ILP model solving (timing results)
      • ILP result, example:
      • 3DES-FIR circuit
      • Connection metric
      • 4 FPGAs of 600 slices each are needed
      • Only 3 are available
      • Adopt reuse
      • Dendrogram cuts 2-7 provide the lowest estimated reconfiguration time
    • 42. What’s next
      • Context definition
        • FPGA
        • Multi-FPGA Systems (MFS)
        • Dynamic reconfigurability
      • Related works
        • MFS design flows
        • Dynamically reconfigurable MFSs
      • Proposed methodology
        • Design extraction
        • Global layout
        • Reuse and Dynamic reconfigurability
      • Experimental results
      • Conclusion and future works
    • 43. Conclusion: contributions
      • Major contribution:
        • Development of a multi-FPGA systems design flow which exploits dynamic reconfigurability for blocks reuse while minimizing the estimated execution time.
      • Useful contributions:
        • Creation of an intermediate representation for structural and hierarchical circuits.
        • Creation of a framework for the extraction of the design from VHDL.
        • Design and implementation of static global layout algorithms.
        • Exploitation of hierarchy information for regular-pattern extraction.
      • The proposed approaches have been validated through experimental evaluations
    • 44. Conclusion: future works
      • Improvements
        • Go beyond the inherent greediness of clustering
        • More powerful closeness metrics
        • More accurate time estimation function for blocks reuse
      • Additions
        • Development of a robust and effective routing algorithm for both static and dynamic implementations
        • Partitioning and placement for dynamically-interconnected structures
        • Binding and scheduling of application blocks on the instantiated clusters
    • 45. The end.
      • Questions?
    • 46. That’s all folks!
      • Thank you.
      • How ‘bout a funny joke?