DReAMS Dynamic Reconfigurability Applied to Multi-FPGA Systems Matteo Murgida, Alessandro Panella {matteo.murgida, alessandro.panella}@dresd.org
DReAMS Dynamic Reconfigurability Applied to Multi-FPGA Systems Branch of DRESD project Inherits architectures and tools Automatic workflow from VHDL system description to FPGA implementation VHDL parsing and system simulation System creation over a specific architecture Bitstream creation and download onto FPGAs
Workflow
SPartA A novel algorithm for multi-FPGA partitioning Alessandro Panella [email_address]
Outline Problem description Project goals and contributions What is partitioning? Existing approaches Going deeper into the problem SPartA The framework The idea The algorithm Experimental results Future work
Problem description Multi-FPGA - RATIONALE Large designs do not fit into a single chip High performance parallelized applications Our case: apply dynamic reconfigurability Need to break the initial design into several blocks One block corresponds to a single FPGA chip Which inputs/outputs? Which objectives? Which techniques?
Project goals and contributions Analyze existing approaches Obtain a deep knowledge of this well-explored field Extract basic ideas for a new approach Obtain some terms of comparison Define precisely which problem(s) we address Contextualize the problem Focus on our needs Develop a new solution Theoretical background Implementation and evaluation
What is partitioning? Goal Divide a set of interrelated objects into a set of subsets Optimize one or more specific objectives K-way partitioning Given a graph G=(V,E), partition it into k subsets V_1, ..., V_k such that the subsets are pairwise disjoint and their union is V. Balance constraint: |V_i| ≈ |V|/k Aims at minimizing (or maximizing) an objective function Edge-cut Other objectives In general: NP-complete Several heuristics that provide good results have been developed
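The definition above can be made concrete with a minimal Python sketch. It computes the edge-cut and a simple balance ratio for a given k-way partition. The function names (`edge_cut`, `imbalance`) and the toy graph are assumptions made for illustration; they do not come from the slides.

```python
from collections import Counter

def edge_cut(edges, part):
    """Number of edges whose endpoints are assigned to different subsets."""
    return sum(1 for u, v in edges if part[u] != part[v])

def imbalance(part, k):
    """Largest subset size divided by the ideal size |V|/k (1.0 = perfectly balanced)."""
    sizes = Counter(part.values())
    ideal = len(part) / k
    return max(sizes.get(i, 0) for i in range(k)) / ideal

# Hypothetical toy graph: two triangles joined by the single edge (c, d).
edges = [("a", "b"), ("b", "c"), ("c", "a"),
         ("d", "e"), ("e", "f"), ("f", "d"),
         ("c", "d")]

# A 2-way partition: every vertex maps to exactly one subset index,
# so the subsets are pairwise disjoint and their union is V.
part = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

cut = edge_cut(edges, part)
balance = imbalance(part, 2)
```

Here only the bridging edge (c, d) crosses the partition, and both subsets hold exactly |V|/k = 3 vertices.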
Existing approaches - a glance Traditional methods Kernighan-Lin and Fiduccia-Mattheyses heuristics Iterative-improvement algorithms Begin with an initial partition and iteratively improve it O(n^3) complexity Iterative algorithms Genetic Simulated annealing Multilevel algorithms Clustering -> Initial partitioning -> Refining METIS/hMETIS suite: best current results for partitioning large flattened graphs
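To illustrate the iterative-improvement idea, the sketch below performs one greedy swap step in the spirit of Kernighan-Lin, using the standard swap gain g(a, b) = D_a + D_b - 2·c(a, b), where D_v is the external minus the internal edge weight of v. This is a simplified illustration, not the full heuristic: the complete Kernighan-Lin pass also locks swapped vertices and keeps the best prefix of tentative swaps. The names `d_value` and `best_swap` and the example graph are assumptions.

```python
def d_value(v, adj, part):
    """External minus internal edge weight of vertex v under the current bisection."""
    external = sum(w for u, w in adj[v].items() if part[u] != part[v])
    internal = sum(w for u, w in adj[v].items() if part[u] == part[v])
    return external - internal

def best_swap(adj, part):
    """Return (gain, a, b) for the swap that reduces the cut the most, or (0, None, None)."""
    best = (0, None, None)
    side0 = [v for v in adj if part[v] == 0]
    side1 = [v for v in adj if part[v] == 1]
    for a in side0:
        for b in side1:
            gain = d_value(a, adj, part) + d_value(b, adj, part) - 2 * adj[a].get(b, 0)
            if gain > best[0]:
                best = (gain, a, b)
    return best

# Hypothetical graph: two triangles (a, b, c) and (d, e, f) joined by edge c-d,
# stored as a symmetric weighted adjacency map with unit weights.
adj = {
    "a": {"b": 1, "c": 1},
    "b": {"a": 1, "c": 1},
    "c": {"a": 1, "b": 1, "d": 1},
    "d": {"c": 1, "e": 1, "f": 1},
    "e": {"d": 1, "f": 1},
    "f": {"d": 1, "e": 1},
}

# A deliberately poor starting bisection: c and d sit on the wrong sides (cut = 5).
part = {"a": 0, "b": 0, "d": 0, "c": 1, "e": 1, "f": 1}

gain, a, b = best_swap(adj, part)
part[a], part[b] = part[b], part[a]  # apply the swap; the cut drops by `gain`
```

Swapping d and c removes four cut edges at once and leaves only the bridging edge c-d in the cut.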
Going deeper into the problem Two kinds of multi-FPGA partitioning Topology-aware Architecture topology is an input No optimization of the number of FPGAs needed Main task: association between the (larger) system graph and the (smaller) architectural graph Topology-free Architecture topology is not provided Input: dimension and communication features of FPGAs Minimization of the number of FPGAs Place and route after partitioning At the moment, we deal with the Topology-free problem
SPartA: the framework Input: VHDL system description Output: several VHDL files, one for each block (FPGA) Three main phases: Extract design from VHDL description “Real” partitioning phase (core) Build VHDL files
SPartA: the idea Structural approach Fully exploits the design hierarchy Modules can be treated as single blocks Bases for expansions toward dynamic reconfigurability Objectives Minimize cutsize Minimize the number of FPGAs used Preserve module integrity (*) (*) From Wen-Jong Fang and Allen C.-H. Wu, Multiway FPGA Partitioning by Fully Exploiting Design Hierarchy
SPartA: the algorithm  1/2 Recursive algorithm (deals with trees) Starts from TOP node Precondition No leaves with dimension > FPGA size At every moment, a node can be: COVERED, UNCOVERED or PARTIALLY COVERED Stop condition Node TOP is COVERED
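The slide names the three node states and the precondition but does not define them formally. The sketch below shows one plausible reading, stated as an assumption: a leaf counts as covered once it is assigned to a partition, and an internal node is COVERED when all of its leaves are assigned, UNCOVERED when none are, and PARTIALLY COVERED otherwise. The `Node` class and the helper names are hypothetical, not SPartA's actual data structures.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    name: str
    size: int = 0                            # resource demand; meaningful for leaves
    children: List["Node"] = field(default_factory=list)
    partition: Optional[int] = None          # FPGA index once a leaf is assigned

def leaves(node):
    """All leaf modules in the subtree rooted at node."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]

def state(node):
    """Assumed definition of the three coverage states."""
    assigned = [leaf.partition is not None for leaf in leaves(node)]
    if all(assigned):
        return "COVERED"
    if not any(assigned):
        return "UNCOVERED"
    return "PARTIALLY COVERED"

def precondition_holds(top, fpga_size):
    """Precondition from the slide: no leaf is larger than one FPGA."""
    return all(leaf.size <= fpga_size for leaf in leaves(top))

# Hypothetical hierarchy: TOP -> A (a1, a2) and leaf B.
a1, a2, b = Node("a1", 3), Node("a2", 2), Node("B", 4)
a = Node("A", children=[a1, a2])
top = Node("TOP", children=[a, b])

a1.partition = 0  # only a1 has been placed so far
```

Under this reading, the stop condition "Node TOP is COVERED" holds exactly when every leaf has been assigned to some FPGA.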
SPartA: the algorithm  2/2 OPEN ISSUE: Selecting the first node to be inserted into an empty partition Random node Node with overall max communication Node with max communication with its siblings
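The three candidate policies can be compared with a small sketch. The communication map `comm`, the `siblings_of` map, and the policy names are illustrative assumptions; the slides do not specify how communication is measured or how ties are broken (here `max` keeps the first candidate in list order).

```python
import random

def total_comm(node, comm):
    """Communication weight between node and every other node."""
    return sum(w for pair, w in comm.items() if node in pair)

def sibling_comm(node, siblings, comm):
    """Communication weight between node and its siblings only."""
    return sum(w for (x, y), w in comm.items()
               if (x == node and y in siblings) or (y == node and x in siblings))

def pick_first(candidates, comm, siblings_of, policy, rng=random):
    """Choose the first node for an empty partition under one of three policies."""
    if policy == "random":
        return rng.choice(candidates)
    if policy == "max_total":
        return max(candidates, key=lambda n: total_comm(n, comm))
    if policy == "max_sibling":
        return max(candidates, key=lambda n: sibling_comm(n, siblings_of[n], comm))
    raise ValueError(f"unknown policy: {policy}")

# Hypothetical example: u1, u2, u3 share one parent; v1, v2 share another.
candidates = ["u1", "u2", "u3", "v1", "v2"]
siblings_of = {
    "u1": {"u2", "u3"}, "u2": {"u1", "u3"}, "u3": {"u1", "u2"},
    "v1": {"v2"}, "v2": {"v1"},
}
comm = {("u1", "u2"): 4, ("u1", "u3"): 3, ("u1", "v1"): 1,
        ("u2", "v1"): 10, ("v1", "v2"): 2}

by_total = pick_first(candidates, comm, siblings_of, "max_total")
by_sibling = pick_first(candidates, comm, siblings_of, "max_sibling")
by_random = pick_first(candidates, comm, siblings_of, "random", rng=random.Random(0))
```

In this example the two deterministic policies disagree: u2 has the largest overall communication because of its heavy link to v1, while u1 communicates most with its own siblings.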
Results 1/3 Complexity: exponential, due to the recursive nature of the algorithm Execution time is nevertheless low (tens of seconds for a reasonably large design) EXAMPLE ORIGINAL TREE PARTITIONED TREE
Results  2/3 Evaluation metrics EDGECUT, FILLING and SPLITS Evaluation of the three policies for node selection 18 different trees of varying size
Results  3/3
Future work Algorithm improvement Balancing of last partition First node selection policies More refined “score” function for node selection Use closeness metrics Comparisons with existing algorithms Expansion SPartA framework development Topology-aware partitioning
The end ANY QUESTIONS?
Chimera Multi-FPGAs Architecture Definition Matteo Murgida [email_address]
Murgida - Outline Introduction Problem description Project Goals State of the Art Project in detail Contributions Development Results Future Works Demo
Problem Description Architectural description of a distributed FPGA environment 3-layer architecture
Project Goals Design the architecture of the most generic distributed system Node definition Interface definition Communication channel definition Design a communication protocol Essential protocol Interrupt based protocol Timeout improvement
State of the Art CONFigurable ElecTronic TIssue (CONFETTI) by EPFL Cellular based architecture PROs: high degree of parallelism, high computational power CONs: no flexibility, oversized for small problems, small architectural customizations imply big cost/effort Splash 2 by IDA Supercomputing Center Architecture composed of a Sun SPARCstation host, an interface board and “Splash Array” boards PROs: again high parallelism and power CONs: a central host coordinates the computational units, no fault tolerance, no flexibility
Contributions The proposed architecture: Allows several Spartan-3 Starter Boards to communicate and exchange data It is portable to different FPGAs with minimum effort It is the basic infrastructure that will allow  external  partial dynamic reconfiguration
Board Study How to use resources like switches, LEDs and connectors on the board How to map an IP-Core port to a physical pin of the board Choice of the A2 Expansion Connector to connect two boards
MicroBlaze Communication Communication between two MicroBlaze soft-processors Development of a display controller to visualize the data flow
GPIO Insertion Higher architecture portability through the use of the GPIO IP-Core.
Interrupt Controller Insertion Communication protocol improvement by interrupt handling to prevent the processor from busy waiting Interrupt Controller is included in the architecture to permit multi-interrupt detection and handling
Timeout Malfunctions due to interference on the communication channel lead to deadlocks Communication protocol is not reliable at all Counter implementation, including the driver used by the processor to clear raised interrupts Development of a simple application to verify the correctness of the proposed approach
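The timeout idea can be modeled in software with a short sketch. This is a behavioral Python simulation under stated assumptions, not the actual counter IP-Core or its MicroBlaze driver: `TimeoutCounter`, `receive`, and the per-tick channel model are all hypothetical. A counter runs while the receiver waits, raises an interrupt flag when it reaches its limit, and the driver clears the flag so the receiver can give up instead of deadlocking.

```python
class TimeoutCounter:
    """Counts clock ticks and raises an interrupt flag when the limit is reached."""

    def __init__(self, limit):
        self.limit = limit
        self.count = 0
        self.running = False
        self.irq = False

    def start(self):
        self.count = 0
        self.running = True
        self.irq = False

    def stop(self):
        self.running = False

    def tick(self):
        if self.running:
            self.count += 1
            if self.count >= self.limit:
                self.irq = True
                self.running = False

    def clear(self):
        """Driver-side acknowledgement: lower the raised interrupt."""
        self.irq = False


def receive(channel, counter):
    """Wait for a word on the channel; return None if the timeout fires first.

    `channel` yields one item per clock tick: a data word, or None when the
    line is idle (for example because interference swallowed the transfer).
    """
    counter.start()
    for word in channel:
        if word is not None:
            counter.stop()
            return word
        counter.tick()
        if counter.irq:
            counter.clear()
            return None
    counter.stop()
    return None


# The word arrives after two idle ticks, well inside the limit of 5.
delivered = receive(iter([None, None, 0xAB]), TimeoutCounter(limit=5))

# The word never arrives: the counter fires after 3 ticks instead of deadlocking.
lost = receive(iter([None] * 10), TimeoutCounter(limit=3))
```

Returning a sentinel on timeout lets the caller decide whether to retry or report the failure, which is the property the unprotected protocol lacks.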
Results A short Demo ...
Demo
Future Works Development of a SystemC/VHDL Co-Simulation Framework Expert system integration
The end ANY QUESTIONS?

3D-DRESD DReAMS
