On using BS to improve the reliability and availability of reconfigurable hardware

Talk delivered to PhD students at the Tallinn Technical University in May 2009

Presentation Transcript

  • On using BS to improve the reliability and availability of reconfigurable hardware
    • Tallinn Technical University :: May 4th 2009 / May 5th 2009
    • This presentation is available at http://www.slideshare.net/josemmf
    • J. M. Martins Ferreira [ jmf@fe.up.pt ], FEUP / DEEC - Rua Dr. Roberto Frias, 4200-537 Porto - PORTUGAL
    • M. G. Gericota, G. R. Alves, M. Silva, J. M. Ferreira, “Reliability and Availability in Reconfigurable Computing: A Basis for a Common Solution,” IEEE Transactions on VLSI Systems, Vol. 16, No. 11, pp. 1545-1558, Nov. 2008.
  • Outline of this talk
    • Introduction
    • Concurrent replication of active CLBs
    • On-line structural concurrent test (better reliability)
    • Defragmentation (better availability)
    • Conclusion
  • Introduction
    • Motivation
    • Causes of failure in FPGAs
  • Motivation: An old problem becomes more important
    • Dynamically reconfigurable FPGAs:
      • Production tests cannot guarantee fault-free operation
      • Application areas include mission-critical systems
      • The cost / benefit of spatial redundancy is different from static implementations
  • Causes of failure in FPGAs
    • Post-production failure modes may be permanent or temporary, for example:
      • Electromigration phenomena may lead to permanent physical damage
      • Single-event upsets (SEUs) may cause permanent malfunction if not mitigated (modification of SRAM contents changes design and data information)
  • Concurrent replication of active CLBs
    • The principle
    • How it works
    • Resources required (time, space)
  • Concurrent replication of CLBs: The principle
    • The basic idea underlying release-to-test strategies is to replicate a given functional block in another area (non-intrusively) and to make the original resources available for test
  • Concurrent replication of CLBs: The principle
    • Concurrent fault detection based on release-to-test approaches must provide functional and state replication
    • Replication at CLB-level
      • Facilitates state transfer and requires a minimal amount of spare resources
      • The relative position of the replicated CLB and its replica has an impact on propagation delay
    [Figure: device layout with CLB and IOB labels]
  • Concurrent replication of CLBs: How it works
    • General replication principle, phase one:
      • Copy the internal configuration of the replicated CLB into the replica CLB and place the inputs of both CLBs in parallel
  • Concurrent replication of CLBs: How it works
    • General replication principle, phase two:
      • Place the outputs of both CLBs in parallel (the replicated CLB may then be disconnected and made available for testing)
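The two phases above can be sketched as a small model. This is only an illustration under stated assumptions: the `CLB` class, its fields, and the `replicate` function are hypothetical names, and a real relocation is carried out by partial reconfiguration of the device, not by copying Python objects.

```python
from dataclasses import dataclass, field


@dataclass
class CLB:
    """Toy CLB (hypothetical model): a configuration word plus connected nets."""
    config: int = 0
    inputs: set = field(default_factory=set)
    outputs: set = field(default_factory=set)


def replicate(original: CLB, replica: CLB) -> None:
    """Move a function from `original` to `replica` in two phases."""
    # Phase one: copy the internal configuration and place the inputs in parallel.
    replica.config = original.config
    replica.inputs = set(original.inputs)
    # Phase two: place the outputs in parallel.
    replica.outputs = set(original.outputs)
    # The replicated CLB may now be disconnected and released for test.
    original.outputs.clear()
    original.inputs.clear()


original = CLB(config=0xBEEF, inputs={"a", "b"}, outputs={"q"})
spare = CLB()
replicate(original, spare)
print(spare)
```

The ordering is the point of the sketch: the outputs are only paralleled once the replica already computes the same function from the same inputs, and the original is only disconnected after both sets of connections are in parallel.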
  • Concurrent replication of CLBs: Replication aid block
    • Supports state transfer in synchronous gated-clock circuits
  • Replication flow: Time & space needed
    • Steps, with number of bytes and time (ms):
      • 1. Copy the internal logic functionality and place the input signals in parallel: 11 289 bytes; 9.705 ms
      • 2. BY_C=1 & CC=1: 441 bytes; 0.379 ms
      • 3. CC=0: 277 bytes; 0.238 ms
      • 4. BY_C=0: 277 bytes; 0.238 ms
      • 5. Connect the clock enable inputs of both CLBs: 2 145 bytes; 1.844 ms
      • 6. Disconnect all the auxiliary relocation circuit signals: 2 217 bytes; 1.906 ms
      • 7. Place the CLB outputs in parallel: 4 129 bytes; 3.550 ms
      • 8. Disconnect the original CLB outputs: 1 333 bytes; 1.146 ms
      • 9. Disconnect the original CLB inputs: 3 986 bytes; 3.438 ms
      • Total: 26 094 bytes; 22.444 ms
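As a quick consistency check, the per-step figures of the replication flow can be summed and compared with the stated totals. The values below are transcribed from the slide (decimal commas read as decimal points); the shortened step labels and variable names are illustrative.

```python
# (step, bytes, time in ms), transcribed from the replication flow table.
STEPS = [
    ("Copy logic, place inputs in parallel", 11289, 9.705),
    ("BY_C=1 & CC=1", 441, 0.379),
    ("CC=0", 277, 0.238),
    ("BY_C=0", 277, 0.238),
    ("Connect clock enable inputs of both CLBs", 2145, 1.844),
    ("Disconnect auxiliary relocation circuit signals", 2217, 1.906),
    ("Place CLB outputs in parallel", 4129, 3.550),
    ("Disconnect original CLB outputs", 1333, 1.146),
    ("Disconnect original CLB inputs", 3986, 3.438),
]

total_bytes = sum(size for _, size, _ in STEPS)
total_ms = round(sum(ms for _, _, ms in STEPS), 3)
print(total_bytes, total_ms)
```

The first step alone accounts for a little over 40% of both the bytes and the time, so the copy of the internal logic dominates the cost of relocating one CLB.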
  • On-line structural concurrent test
    • Fault model, test configurations
    • Test application
    • Rotation and release for test strategy
    • Fault detection latency
  • Fault model and test configurations
    • A hybrid fault model (stuck-at / functional) was adopted and the two CLB slices (each with 13 inputs and 6 outputs) are tested in parallel
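    • Test configurations, with number of test vectors, number of bytes and time (ms):
      • 1st: 16 test vectors; 18 392 bytes; 15.813 ms
      • 2nd: 16 test vectors; 3 115 bytes; 2.678 ms
      • 3rd: 2 test vectors; 623 bytes; 0.536 ms
      • 4th: 2 test vectors; 634 bytes; 0.545 ms
      • 5th: 2 test vectors; 613 bytes; 0.527 ms
      • 6th: 2 test vectors; 512 bytes; 0.440 ms
      • Total: 40 test vectors; 23 889 bytes; 20.539 ms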
  • Test application
    • CLB testing via BS:
      • Test vector application is done through a 13-bit user test data register
      • Response capturing takes place through unused BS cells
  • Rotation strategy
    • Vertical rotation has an advantage in the case of arithmetic circuits that use the dedicated carry interconnection between (vertically) adjacent CLBs
    • In the general case, we should consider such factors as the number of circuits with high fanout and the shape / orientation of the implementation
  • Replicate and release-to-test in a 24-bit counter (example)
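  • Rotation strategy: ITC’99 benchmarks
    • Per circuit: logic (# PI, # PO, # gates, # FF) and carry logic (lines, segments)
      • B01: 2+2 PI; 2 PO; 47 gates; 5 FF; carry logic: 0 lines, 0 segments
      • B02: 1+2 PI; 1 PO; 29 gates; 4 FF; carry logic: 0 lines, 0 segments
      • B03: 4+2 PI; 4 PO; 150 gates; 30 FF; carry logic: 0 lines, 0 segments
      • B04: 11+2 PI; 8 PO; 606 gates; 66 FF; carry logic: 4 lines, 14 segments
      • B05: 1+2 PI; 36 PO; 977 gates; 34 FF; carry logic: 4 lines, 16 segments
      • B06: 2+2 PI; 6 PO; 61 gates; 9 FF; carry logic: 0 lines, 0 segments
      • B07: 1+2 PI; 8 PO; 422 gates; 49 FF; carry logic: 2 lines, 6 segments
      • B08: 9+2 PI; 4 PO; 168 gates; 21 FF; carry logic: 0 lines, 0 segments
      • B09: 1+2 PI; 1 PO; 160 gates; 28 FF; carry logic: 0 lines, 0 segments
      • B10: 11+2 PI; 6 PO; 190 gates; 17 FF; carry logic: 0 lines, 0 segments
      • B11: 7+2 PI; 6 PO; 484 gates; 31 FF; carry logic: 1 line, 4 segments
      • B12: 5+2 PI; 6 PO; 1037 gates; 121 FF; carry logic: 0 lines, 0 segments
      • B13: 10+2 PI; 10 PO; 343 gates; 53 FF; carry logic: 1 line, 4 segments
      • B14: 32+2 PI; 54 PO; 4787 gates; 245 FF; carry logic: 11 lines, 150 segments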
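  • Rotation strategy: ∆f and size for the ITC’99 circuits
    • Per circuit: maximum ∆f (%), size of the reconfiguration files (bytes), and ratio of the size of the reconfiguration files by CLB (%) (horizontal > vertical)
      • B01: max ∆f -5.5 vertical, 0.0 horizontal; size 48 350 vertical, 56 102 horizontal; ratio 16.0
      • B02: max ∆f 0.0 vertical, 0.0 horizontal; size 7 016 vertical, 10 623 horizontal; ratio 51.4
      • B03: max ∆f -1.9 vertical, -4.9 horizontal; size 120 705 vertical, 138 484 horizontal; ratio 14.7
      • B04: max ∆f -6.1 vertical, -29.3 horizontal; size 548 595 vertical, 665 419 horizontal; ratio 21.3
      • B05: max ∆f -17.3 vertical, -36.9 horizontal; size 1 130 985 vertical, 1 286 031 horizontal; ratio 13.7
      • B06: max ∆f -2.7 vertical, 0.0 horizontal; size 45 291 vertical, 53 503 horizontal; ratio 18.1
      • B07: max ∆f -23.6 vertical, -37.8 horizontal; size 354 367 vertical, 425 214 horizontal; ratio 20.0
      • B08: max ∆f -5.8 vertical, -5.8 horizontal; size 150 093 vertical, 178 339 horizontal; ratio 18.8
      • B09: max ∆f -1.8 vertical, -4.9 horizontal; size 112 107 vertical, 129 855 horizontal; ratio 15.8
      • B10: max ∆f -7.5 vertical, -7.6 horizontal; size 195 571 vertical, 245 455 horizontal; ratio 25.5
      • B11: max ∆f -10.5 vertical, -36.0 horizontal; size 500 261 vertical, 614 093 horizontal; ratio 22.8
      • B12: max ∆f 0.0 vertical, -1.2 horizontal; size 1 275 804 vertical, 1 631 953 horizontal; ratio 27.9
      • B13: max ∆f -4.3 vertical, -42.8 horizontal; size 258 827 vertical, 332 954 horizontal; ratio 28.6
      • B14: max ∆f -13.5 vertical, -47.8 horizontal; size 5 195 444 vertical, 6 070 485 horizontal; ratio 16.8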
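The ratio column of this table appears to be the percentage by which the horizontal reconfiguration files exceed the vertical ones. The sketch below checks that reading for three of the circuits; the function name is illustrative, and the formula is an interpretation of the "(horizontal>vertical)" header rather than something the slides spell out.

```python
def size_increase_percent(vertical_bytes: int, horizontal_bytes: int) -> float:
    """Percentage by which the horizontal file size exceeds the vertical one."""
    return round((horizontal_bytes - vertical_bytes) / vertical_bytes * 100, 1)


# (vertical, horizontal) reconfiguration file sizes in bytes, from the table.
SIZES = {
    "B02": (7016, 10623),
    "B13": (258827, 332954),
    "B14": (5195444, 6070485),
}

ratios = {ref: size_increase_percent(v, h) for ref, (v, h) in SIZES.items()}
print(ratios)
```

For these three rows the computed values agree with the table (51.4, 28.6 and 16.8), which supports reading the smaller size in each pair as the vertical rotation.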
  • Fault detection latency
    • The duration of a complete rotation cycle depends on the device size and on the reconfiguration and test times
    • The fault detection latency alternates between a minimum and a maximum value, according to the rotation direction:
      • FDL_MAX = [(#CLB_ROWS × #CLB_COLS) − 1] × 2 × (Δ_RECONF + Δ_TEST)
      • FDL_MIN = 2 × (Δ_RECONF + Δ_TEST)
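The two latency bounds translate directly into a small helper. The function and parameter names are illustrative, and the 4 × 4 array with 30 ms of reconfiguration and 10 ms of test per CLB is a made-up example, not a figure from the talk.

```python
from typing import Tuple


def fdl_bounds(rows: int, cols: int, t_reconf_ms: float, t_test_ms: float) -> Tuple[float, float]:
    """Return (minimum, maximum) fault detection latency in ms."""
    per_clb = t_reconf_ms + t_test_ms           # reconfiguration plus test time for one CLB
    fdl_min = 2 * per_clb                       # FDL_MIN = 2 x (reconf + test)
    fdl_max = (rows * cols - 1) * 2 * per_clb   # FDL_MAX = [(rows x cols) - 1] x 2 x (reconf + test)
    return fdl_min, fdl_max


# Hypothetical 4 x 4 CLB array, 30 ms reconfiguration and 10 ms test per CLB.
fdl_min, fdl_max = fdl_bounds(4, 4, 30.0, 10.0)
print(fdl_min, fdl_max)
```

Since only the maximum carries the (rows × cols − 1) factor, the worst-case latency grows roughly linearly with the number of CLBs while the minimum is independent of the device size.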
  • Fault detection latency
    • Synchronous circuits with clock enable [with the replication aid circuit], # of bytes and time (ms) at 20 MHz TCK:
      • Copy logic functionality and parallel input signals: 11 289 bytes; 9.705 ms
      • BY_C=1 & CC=1: 441 bytes; 0.379 ms
      • CC=0: 277 bytes; 0.238 ms
      • BY_C=0: 277 bytes; 0.238 ms
      • Connect the clock enable inputs of both CLBs: 2 145 bytes; 1.844 ms
      • Disconnect all the auxiliary relocation circuit signals: 2 217 bytes; 1.906 ms
      • Place the CLB outputs in parallel: 4 129 bytes; 3.550 ms
      • Disconnect the original CLB outputs: 1 333 bytes; 1.146 ms
      • Disconnect the original CLB inputs and set up the test configuration: 18 392 bytes; 15.813 ms
      • Total: 40 500 bytes; 34.820 ms
    • Synchronous circuits with free-running clock and combinational circuits [without the replication aid circuit], # of bytes and time (ms) at 20 MHz TCK:
      • Copy the internal logic functionality and place the input signals in parallel: 12 163 bytes; 10.457 ms
      • Place the CLB outputs in parallel: 3 993 bytes; 3.433 ms
      • Disconnect the original CLB outputs: 1 073 bytes; 0.923 ms
      • Disconnect the original CLB inputs and set up the test configuration: 18 392 bytes; 15.813 ms
      • Total: 35 621 bytes; 30.625 ms
  • Worst-case fault detection latency (XCV200)
    • The mean time to test the full CLB matrix is also the worst-case fault detection latency
    • File size and reconfiguration time of the test configurations (# of bytes; time in ms at 20 MHz TCK):
      • 2nd: 3 115 bytes; 2.678 ms
      • 3rd: 623 bytes; 0.536 ms
      • 4th: 634 bytes; 0.545 ms
      • 5th: 613 bytes; 0.527 ms
      • 6th: 512 bytes; 0.440 ms
      • Total: 5 497 bytes; 4.726 ms
    • Shifting time for test vector application:
      • 40 test vectors; length 13 bits; total 520 bits; 0.066 ms at 20 MHz TCK
    • Shifting time for the test vector responses from a CLB under test:
      • 1 022 cells in the BS register of a XCV200; 40 test vectors; 4.088 ms at 20 MHz TCK
    • Mean time for the test of a 1176 CLBs matrix (occupation type: 25% synchronous, 50% combinational, 25% empty):
      • 43 679.188 ms @ TCK = 20 MHz
      • 26 472.235 ms @ TCK = 33 MHz
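The two mean test times quoted for the 1176-CLB matrix are consistent with a time that scales inversely with the TCK frequency. The sketch below checks this; the function name is illustrative, and the inverse scaling is an assumption that happens to match the two quoted figures, not a statement from the slides.

```python
def scale_time_ms(time_ms: float, tck_from_mhz: float, tck_to_mhz: float) -> float:
    """Rescale a TCK-bound time, assuming it is inversely proportional to TCK."""
    return time_ms * tck_from_mhz / tck_to_mhz


mean_20mhz_ms = 43679.188  # mean time for the full matrix at TCK = 20 MHz
mean_33mhz_ms = round(scale_time_ms(mean_20mhz_ms, 20, 33), 3)
print(mean_33mhz_ms)
```

In other words, a full pass over the matrix takes roughly 44 s at 20 MHz and roughly 26 s at 33 MHz.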
  • Defragmentation
    • The importance of floor planning
    • Why (de)fragmentation?
    • Can concurrent replication help?
  • Availability vs. floor planning performance
    • Good dynamic floor planning management may enable the implementation of applications that in total would require more than 100% of the FPGA resources
  • Fragmentation: Why?
    • The absence of faults does not guarantee acceptable availability, particularly when function swapping / partial reconfiguration occurs frequently
    • Insufficient contiguous resources will delay incoming functions
  • Can concurrent replication help?
    • Concurrent replication of active CLBs may be used to defragment the FPGA and minimise the implementation delay to incoming functions
      • Defragmentation is performed concurrently with all running functions (no need to halt their execution)
      • Coherency of the register contents is guaranteed, preserving all state information
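A toy model helps to show why fragmentation delays incoming functions and what defragmentation buys. It is a deliberately simplified sketch: the FPGA is reduced to a one-dimensional row of CLB slots, `None` marks a free slot, and both function names are hypothetical. On a real device each move would go through the concurrent replication procedure, one CLB at a time, and the talk notes that the higher-level strategy is still an open topic.

```python
from typing import List, Optional

Slots = List[Optional[str]]


def largest_free_run(slots: Slots) -> int:
    """Length of the longest run of contiguous free slots."""
    best = run = 0
    for owner in slots:
        run = run + 1 if owner is None else 0
        best = max(best, run)
    return best


def defragment(slots: Slots) -> Slots:
    """Pack the occupied slots to one side, keeping their relative order."""
    used = [owner for owner in slots if owner is not None]
    return used + [None] * (len(slots) - len(used))


layout: Slots = ["A", None, "B", None, None, "C", None, None]
before = largest_free_run(layout)             # free slots are scattered
after = largest_free_run(defragment(layout))  # the same free slots, now contiguous
print(before, after)
```

In this layout five slots are free in total, but an incoming function that needs four contiguous slots only fits after the occupied slots have been packed together.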
  • Conclusion
    • Summary
    • Research topics
  • Summary
    • Concurrent replication offers a powerful and non-intrusive solution to improve reliability and availability of reconfigurable hardware
    • Paralleling CLB inputs and outputs doesn’t create any problem
    • Boundary-scan provides a valuable contribution to implement an on-line concurrent structural test strategy
  • Research topics
    • Concurrent replication of active CLBs offers a powerful tool for defragmentation purposes, but the higher-level strategy is still missing
    • All aspects of the proposed solutions were validated in practice (lab experimentation), but a software tool to fully automate the reconfiguration process is still missing
  • Thanks for your attention!
    • J. M. Martins Ferreira [ jmf@fe.up.pt ]