
bluespec talk


Bluespec is a language based on Haskell for designing VHDL and Verilog hardware.

Published in: Technology, Design

  1. 1. Synthesis of Synchronous Assertions with Guarded Atomic Actions. Authors: MIT and Bluespec, Inc. Presented by: Suman Karumuri, Andy Bartholomew
  2. 2. Life Cycle of a chip
  3. 3. Quarks to Parallel Universes … <ul><li>PFET/NFET </li></ul><ul><li>Transistor ( 2 FETS) </li></ul><ul><li>NAND / OR / NOT gates. </li></ul><ul><li>Circuits </li></ul><ul><li>Modules </li></ul><ul><li>Integrated Circuits (IC) </li></ul><ul><li>ASIC’s / Chip </li></ul>
  4. 4. Birth of a chip <ul><li>Requirements </li></ul><ul><li>Design </li></ul><ul><li>Coding </li></ul><ul><li>Testing and simulation. </li></ul><ul><li>Formal verification. </li></ul><ul><li>Synthesis </li></ul>25 % time 75 % time
  5. 5. Birth of a chip <ul><li>Requirements </li></ul><ul><li>Design </li></ul><ul><li>Coding </li></ul><ul><ul><li>HDL, RTL. </li></ul></ul><ul><ul><li>Verilog HDL </li></ul></ul><ul><ul><li>VHDL </li></ul></ul><ul><ul><ul><li>System (transistor level) </li></ul></ul></ul><ul><ul><ul><li>Behavioral (expressions) </li></ul></ul></ul><ul><ul><ul><li>Structural (functions) </li></ul></ul></ul><ul><ul><ul><li>OO (Regular Languages) </li></ul></ul></ul><ul><ul><li>SystemC </li></ul></ul><ul><ul><li>Lava, Bluespec (High level languages). </li></ul></ul><ul><li>Testing and simulation </li></ul><ul><li>Formal verification. </li></ul><ul><li>Synthesis </li></ul>
  6. 6. Birth of a chip <ul><li>Requirements </li></ul><ul><li>Design </li></ul><ul><li>Coding </li></ul><ul><li>Testing and simulation </li></ul><ul><ul><li>Software </li></ul></ul><ul><ul><li>Hardware </li></ul></ul><ul><li>Formal verification </li></ul><ul><ul><li>Model Checking </li></ul></ul><ul><ul><li>Proving programs </li></ul></ul><ul><li>Synthesis </li></ul>3% of test space This paper Turing Award 2008
  7. 7. Birth of a chip <ul><li>Requirements </li></ul><ul><li>Design </li></ul><ul><li>Coding </li></ul><ul><li>Testing and simulation </li></ul><ul><li>Formal verification </li></ul><ul><li>Synthesis (Burn the design on FPGA) </li></ul><ul><ul><li>Chip Area </li></ul></ul><ul><ul><li>Power consumption </li></ul></ul><ul><ul><li>Minimal number of transistors </li></ul></ul><ul><ul><li>Speed </li></ul></ul>
  8. 8. Bluespec
  9. 9. Motivation <ul><li>SystemC lessons </li></ul><ul><ul><li>Single assignment. </li></ul></ul><ul><ul><li>No state. </li></ul></ul><ul><ul><li>No destructive assignment. </li></ul></ul><ul><ul><li>Chaining of states. </li></ul></ul><ul><ul><li>Weak type system. </li></ul></ul><ul><li>Lava lessons. </li></ul><ul><ul><li>Haskell (functional, monads, polymorphic type inference). </li></ul></ul><ul><ul><li>Modules. </li></ul></ul><ul><li>Bluespec </li></ul><ul><ul><li>Full-fledged language instead of Haskell modules. </li></ul></ul><ul><ul><li>“Behavioral model is atomic actions with guards on state”. </li></ul></ul><ul><ul><ul><li>Data flow model </li></ul></ul></ul><ul><ul><li>OO for reuse. </li></ul></ul>
  10. 10. Bluespec -> Chip. Stages in the figure: Extended Haskell, TRS, Verilog or C, RTL, Synthesis. Figure notes: concurrency and atomicity; correct programs. TRS: Term Rewriting System. RTL: Register Transfer Language.
  11. 11. Bluespec Language <ul><li>Extended Haskell + Bit Vectors (Data types) </li></ul><ul><li>No clocks. </li></ul><ul><li>Modules for OO. </li></ul><ul><ul><li>Rules </li></ul></ul><ul><ul><li>Methods </li></ul></ul><ul><ul><li>Scheduler </li></ul></ul><ul><li>Data Flow language. </li></ul><ul><ul><li>Guarded atomic actions. </li></ul></ul>
  12. 12. Rules <ul><li>Atomic Expressions. </li></ul><ul><li>Execute when the guard is true. </li></ul><ul><li>Run for 1 clock cycle. </li></ul><ul><li>Local to a module (private methods). </li></ul><ul><li>Can call methods. </li></ul><ul><li>rule sync_cache(state == Synchronize); </li></ul><ul><li>case (cache[index]) matches </li></ul><ul><li> tagged Valid {.tag, .data, .isDirty}: </li></ul><ul><li> if (isDirty) begin </li></ul><ul><li>writeToMemory({index, tag}, data); </li></ul><ul><li> end </li></ul><ul><li>default: </li></ul><ul><li> noAction; </li></ul><ul><li>endcase </li></ul><ul><li>endrule </li></ul>Annotations: the guard is (state == Synchronize); the method call is writeToMemory.
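The rule semantics on this slide can be modeled in a few lines of Python. This is only a sketch of the idea, not Bluespec itself: the `Rule` class, the `fire` helper, and the state fields `mode`, `dirty`, and `writes` are hypothetical names chosen for illustration. A rule fires only when its guard holds, and its action computes all updates from the old state before any of them is applied.

```python
from dataclasses import dataclass
from typing import Callable, Dict

State = Dict[str, object]


@dataclass
class Rule:
    name: str
    guard: Callable[[State], bool]    # when may the rule fire?
    action: Callable[[State], State]  # updates computed from the old state


def fire(rule: Rule, state: State) -> State:
    """Fire one rule atomically; return the state unchanged if the guard is false."""
    if not rule.guard(state):
        return state
    updates = rule.action(state)  # reads only the old state
    new_state = dict(state)       # the old state is never mutated
    new_state.update(updates)     # all updates become visible together
    return new_state


# Hypothetical stand-in for the sync_cache rule: write back a dirty line.
sync_cache = Rule(
    name="sync_cache",
    guard=lambda s: s["mode"] == "Synchronize",
    action=lambda s: {"dirty": False, "writes": s["writes"] + 1} if s["dirty"] else {},
)
```

Calling `fire(sync_cache, state)` with `mode` set to anything other than `"Synchronize"` returns the state untouched, which mirrors a rule whose guard is false.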
  13. 13. Methods <ul><li>Set of commands invoked by a rule or other methods. </li></ul><ul><li>Like public methods in C++. </li></ul><ul><li>Perform an Action, Value or ActionValue. </li></ul><ul><li>method Action get_data(Address addr) </li></ul><ul><li>if (state == Ready); </li></ul><ul><li>Index i = get_index(addr); </li></ul><ul><li>case (cache[i]) matches </li></ul><ul><li>tagged Valid {.tag, .data, .isDirty}: </li></ul><ul><li>if (tag == get_tag(addr)) </li></ul><ul><li> sendToProc(addr, data); //hit </li></ul><ul><li>else //conflict miss </li></ul><ul><li> getFromMemory(addr); </li></ul><ul><li>endcase </li></ul><ul><li>endmethod </li></ul>Annotation: the if (state == Ready) clause is another way of adding guards.
  14. 14. Modules <ul><li>Consists of Interfaces, Rules and Method implementation. </li></ul><ul><li>Enables Reuse. </li></ul><ul><li>interface CacheController; </li></ul><ul><li>method Action get_data(Address addr); </li></ul><ul><li>method Action write_data(Address addr,Value v); </li></ul><ul><li>method Action sync(); </li></ul><ul><li>method Action flush(); </li></ul><ul><li>endinterface </li></ul>
  15. 15. Summary: Bluespec. All state (e.g., registers, FIFOs, RAMs, ...) is explicit. Behavior is expressed in terms of guarded atomic actions on the state: Rule: condition → action. Rules can manipulate state in other modules only via their interfaces. [Figure labels: interface, module]
  16. 16. Scheduler <ul><li>Generates a static schedule by looking at guard conditions on rules and methods. </li></ul><ul><li>Ensures atomicity. </li></ul><ul><li>Runs non-conflicting rules concurrently. </li></ul><ul><li>Rules are scheduled locally. </li></ul><ul><li>Methods are scheduled globally. </li></ul>
  17. 17. Compiler model. The compiler generates a scheduler to pick a non-conflicting subset of “ready” rules. [Figure: modules (current state) feed the guard and action of rules 1..n; the scheduler turns the “CAN_FIRE” signals into “WILL_FIRE” signals; muxing for each state element produces the modules' next state.]
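A rough Python sketch of the CAN_FIRE / WILL_FIRE idea follows. It makes two assumptions that are not from the slides: two rules conflict when one writes state that the other reads or writes, and earlier rules in the list have priority. The rule names and the `count` and `timer` state elements are made up, and the real compiler's conflict analysis is more refined than this.

```python
def conflicts(a, b):
    """Assumed conflict test: one rule writes state the other reads or writes."""
    return bool(a["writes"] & (b["reads"] | b["writes"])
                or b["writes"] & (a["reads"] | a["writes"]))


def schedule(rules, state):
    """Return (CAN_FIRE names, WILL_FIRE names) for one clock cycle."""
    can_fire = [r for r in rules if r["guard"](state)]
    will_fire = []
    for rule in can_fire:  # assumption: list order is the priority order
        if not any(conflicts(rule, chosen) for chosen in will_fire):
            will_fire.append(rule)
    return [r["name"] for r in can_fire], [r["name"] for r in will_fire]


# Hypothetical rules over a two-entry FIFO counter and an independent timer.
RULES = [
    {"name": "enq", "guard": lambda s: s["count"] < 2,
     "reads": {"count"}, "writes": {"count"}},
    {"name": "deq", "guard": lambda s: s["count"] > 0,
     "reads": {"count"}, "writes": {"count"}},
    {"name": "tick", "guard": lambda s: True,
     "reads": {"timer"}, "writes": {"timer"}},
    {"name": "flush", "guard": lambda s: s["count"] == 2,
     "reads": {"count"}, "writes": {"count"}},
]

can_fire, will_fire = schedule(RULES, {"count": 1, "timer": 0})
```

With `count` equal to 1, `enq`, `deq`, and `tick` are all ready, but `deq` loses to `enq` because both touch `count`, while `tick` fires alongside `enq` since it touches only `timer`.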
  18. 18. SVA + Bluespec = BSV
  19. 19. SystemVerilog Assertions (SVA) <ul><li>A temporal logic. </li></ul><ul><li>Validate behavior of a design. </li></ul><ul><li>Uses: test benches, formal verifiers, simulation. </li></ul><ul><li>Sequences, Properties. </li></ul>
  20. 20. Sequence <ul><li>Simple Sequence </li></ul><ul><li>sequence seq; </li></ul><ul><ul><li>(x ##1 y) </li></ul></ul><ul><ul><li>or </li></ul></ul><ul><ul><li>(x ##1 y ##1 z); </li></ul></ul><ul><li>endsequence </li></ul>
  21. 21. Sequence <ul><li>Simple Sequence </li></ul><ul><li>sequence seq; </li></ul><ul><ul><li>(x ##1 y) </li></ul></ul><ul><ul><li>or </li></ul></ul><ul><ul><li>(x ##1 y ##1 z); </li></ul></ul><ul><li>endsequence </li></ul><ul><li>True on CC1. </li></ul>
  22. 22. Sequence <ul><li>Simple Sequence </li></ul><ul><li>sequence seq; </li></ul><ul><ul><li>(x ##1 y) </li></ul></ul><ul><ul><li>or </li></ul></ul><ul><ul><li>(x ##1 y ##1 z); </li></ul></ul><ul><li>endsequence </li></ul><ul><li>True on CC2. </li></ul>
  23. 23. Complex Sequence <ul><li>sequence reqack; </li></ul><ul><li>req && data_in == 0 </li></ul><ul><li>##1 data_in > 0 [*3:5] </li></ul><ul><li>##1 ack && data_in == 0; </li></ul><ul><li>endsequence </li></ul>Annotations: the first line holds in the first clock cycle; starting in the second clock cycle, data_in > 0 holds for the next 3-5 clock cycles; finally, data_in is low when we get an ack.
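To make the timing of `reqack` concrete, here is a small Python matcher over a recorded trace. It is an illustrative sketch, not how BSV compiles the sequence: the `cyc` helper and the list-of-dicts trace format are assumptions made for the example.

```python
def cyc(req=False, ack=False, data_in=0):
    """One clock cycle of sampled signals (hypothetical trace format)."""
    return {"req": req, "ack": ack, "data_in": data_in}


def match_reqack(trace, start):
    """Return the cycles at which reqack, started at `start`, completes."""
    ends = []
    if start >= len(trace):
        return ends
    first = trace[start]
    if not (first["req"] and first["data_in"] == 0):
        return ends                # first clock cycle does not match
    for n in range(3, 6):          # [*3:5]: 3, 4 or 5 repetitions
        last = start + 1 + n       # cycle where the ack is expected
        if last >= len(trace):
            break
        body = trace[start + 1:last]
        if (all(c["data_in"] > 0 for c in body)
                and trace[last]["ack"] and trace[last]["data_in"] == 0):
            ends.append(last)
    return ends


# req, then four cycles of non-zero data, then ack with data_in back at 0.
trace = [cyc(req=True)] + [cyc(data_in=7)] * 4 + [cyc(ack=True)]
```

Because the final step needs `data_in == 0` while the repeated step needs `data_in > 0`, at most one repetition count can match from a given start cycle.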
  24. 24. Properties <ul><li>Made up of sequences. </li></ul><ul><li>Implication operator |-> . </li></ul><ul><li>sequence |-> property </li></ul><ul><li>property goodbuffer; </li></ul><ul><li>(req ##1 data_in > 0) </li></ul><ul><li>|-> !fifo_in.full; </li></ul><ul><li>endproperty </li></ul><ul><li>sequence |=> property </li></ul>
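The difference between the two implication operators is only where the consequent is checked: `|->` checks it in the cycle where the antecedent sequence ends, `|=>` one cycle later. The Python sketch below shows this for a `goodbuffer`-style property with a single-cycle consequent. The trace format and helper names are assumptions for illustration, and attempts that run past the end of the trace are simply left undecided.

```python
def antecedent_end(trace, t):
    """Cycle where `req ##1 data_in > 0` ends if started at t, else None."""
    if trace[t]["req"] and t + 1 < len(trace) and trace[t + 1]["data_in"] > 0:
        return t + 1
    return None


def failures(trace, overlapped):
    """Start cycles at which `antecedent |-> / |=> !fifo_full` fails."""
    failed = []
    for t in range(len(trace)):
        end = antecedent_end(trace, t)
        if end is None:
            continue                       # antecedent did not match: vacuous pass
        check_at = end if overlapped else end + 1
        if check_at >= len(trace):
            continue                       # undecided at the end of the trace
        if trace[check_at]["fifo_full"]:
            failed.append(t)
    return failed


trace = [
    {"req": True,  "data_in": 0, "fifo_full": False},
    {"req": False, "data_in": 3, "fifo_full": True},
    {"req": False, "data_in": 0, "fifo_full": False},
]
```

On this trace the FIFO is full exactly in the cycle where the antecedent ends, so the overlapped form fails while the non-overlapped form passes.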
  25. 25. Assertions <ul><li>Properties are checked via assertions. </li></ul><ul><li>always assert property (goodbuffer); </li></ul>
  26. 26. Bluespec SystemVerilog (BSV)
  27. 27. Challenges <ul><li>SVA model is clocked. </li></ul><ul><li>Bluespec model is not. </li></ul><ul><li>Some schedules will not be valid; designer intervention required. </li></ul><ul><ul><li>Achieved through scheduler configuration. </li></ul></ul>
  28. 28. Compiling assertions <ul><li>Sequences and properties are compiled into FSMs. </li></ul><ul><li>An assertion is turned into a module. </li></ul><ul><li>Assertions are run as rules. </li></ul><ul><li>We can use the same Bluespec compilation techniques as before. </li></ul>
  29. 29. Compiling sequences <ul><li>x ##1 y </li></ul>[Figure: FSM with states x, y, end]
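One plausible reading of the FSM for `x ##1 y` is sketched below in Python. The state names `x`, `y`, and `end` follow the figure labels; the explicit `fail` state and the `run` helper are assumptions added for the example.

```python
def step(state, x, y):
    """Advance the FSM for `x ##1 y` by one clock cycle."""
    if state == "x":             # waiting for x in the first cycle
        return "y" if x else "fail"
    if state == "y":             # x was seen; y must hold one cycle later
        return "end" if y else "fail"
    return state                 # "end" and "fail" are absorbing


def run(trace):
    """Run a single attempt over a list of (x, y) samples, one per cycle."""
    state = "x"
    for x, y in trace:
        state = step(state, x, y)
    return state
```

A trace of `[(True, False), (False, True)]` drives the machine from `x` to `y` to `end`.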
  30. 30. Assertions in hardware <ul><li>Properties can run across multiple clock cycles. </li></ul><ul><li>In software, we just spawn a concurrent thread to check the assertion. </li></ul><ul><li>You can’t do that in hardware. </li></ul><ul><li>Instead we create multiple copies of the same FSM along the length of the sequence. </li></ul>
  31. 31. Assertions in hardware <ul><li>always assert x ##1 y </li></ul>[Figure, t=0: staggered copies of the FSM (x, y, end) joined by “or”]
  32. 32. Assertions in hardware <ul><li>always assert x ##1 y </li></ul>[Figure, t=1: staggered copies of the FSM (x, y, end) joined by “or”]
  33. 33. Assertions in hardware <ul><li>always assert x ##1 y </li></ul>[Figure, t=2: staggered copies of the FSM (x, y, end) joined by “or”]
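The three frames above can be approximated in software by starting a fresh copy of the sequence FSM every cycle and advancing all live copies together. The Python sketch below does that for a chain of signals separated by `##1`. The `completions` function and the trace format are hypothetical, and only successful matches are reported, so nothing is assumed about how failures are signalled.

```python
def completions(seq, trace):
    """Return (start, end) cycle pairs for every copy that matches `seq`.

    `seq` is a list of signal names, each one cycle after the previous
    (e.g. ["x", "y"] for `x ##1 y`); `trace` is a list of per-cycle dicts.
    """
    copies = []   # (start cycle, index of the next expected signal)
    done = []
    for t, cycle in enumerate(trace):
        copies.append((t, 0))              # a new attempt starts every cycle
        survivors = []
        for start, pos in copies:
            if cycle[seq[pos]]:
                if pos + 1 == len(seq):
                    done.append((start, t))
                else:
                    survivors.append((start, pos + 1))
        copies = survivors                 # copies that missed are dropped
    return done


trace = [
    {"x": True,  "y": False},
    {"x": True,  "y": True},
    {"x": False, "y": True},
    {"x": False, "y": False},
]
```

Between cycles there is at most one live copy per position in the sequence, which is why a fixed number of FSM copies along the length of the sequence is enough in hardware.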
  34. 34. Composing sequences <ul><li>Simple booleans can be generalized into a sequence module. </li></ul>
  35. 35. Other combinations
  36. 39. General model of an assertion
  37. 40. Coverage <ul><li>Several SVA productions are not covered in BSV. </li></ul><ul><li>Recursion </li></ul><ul><ul><li>In general, generating FSMs would require solving the halting problem. </li></ul></ul><ul><ul><li>Can be used when the recursion depth can be statically determined. </li></ul></ul><ul><li>Disable iff and other properties. </li></ul>
  38. 41. Case study
  39. 42. Functional assertion <ul><li>“On a write request only one cache-way is written” </li></ul><ul><li>property goodWriteRequest; </li></ul><ul><li>write_request |=> </li></ul><ul><li>if (cache_tag_resp.next_evict_way0) </li></ul><ul><li>isWrite(way0_req) </li></ul><ul><li> && !isWrite(way1_req) </li></ul><ul><li>else isWrite(way1_req) </li></ul><ul><li> && !isWrite(way0_req); </li></ul><ul><li>endproperty </li></ul>
  40. 43. Performance assertion <ul><li>“When a CPU request is made, a cache memory read is made in the same cycle. For read requests, either main memory is read or the result is returned in the next cycle.” </li></ul><ul><li>property cpu_read_perf; </li></ul><ul><li>read_request |-> </li></ul><ul><li> isRead(way0_req) </li></ul><ul><li> && isRead(way1_req) </li></ul><ul><li> && isRead(tag_req) </li></ul><ul><li> ##1 isRead(c2memory_req) </li></ul><ul><li> || isRead(c2p_data); </li></ul><ul><li>endproperty </li></ul>
  41. 44. Statistic-gathering assertion! <ul><li>You couldn’t do this before! </li></ul><ul><li>Counting read hits </li></ul><ul><li>property count_read_hits; </li></ul><ul><li>read_request |=> </li></ul><ul><li> isValid(c2p_data); </li></ul><ul><li>endproperty </li></ul><ul><li>always assert property (count_read_hits) </li></ul><ul><li>read_hits <= read_hits + 1; </li></ul><ul><li>else </li></ul><ul><li>read_misses <= read_misses + 1; </li></ul>
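The counting behavior can be mimicked in Python over a recorded trace. This is a sketch under assumptions: the trace format and the `c2p_valid` field (standing in for `isValid(c2p_data)`) are invented here, and only attempts whose antecedent `read_request` is true are counted, so cycles without a read request touch neither counter.

```python
def count_read_hits(trace):
    """Count hits and misses for `read_request |=> isValid(c2p_data)`."""
    hits = misses = 0
    # Pair each cycle with the next one: |=> checks one cycle later.
    for now, nxt in zip(trace, trace[1:]):
        if now["read_request"]:
            if nxt["c2p_valid"]:
                hits += 1      # pass action: read_hits <= read_hits + 1
            else:
                misses += 1    # else action: read_misses <= read_misses + 1
    return hits, misses


trace = [
    {"read_request": True,  "c2p_valid": False},
    {"read_request": True,  "c2p_valid": True},
    {"read_request": False, "c2p_valid": False},
    {"read_request": True,  "c2p_valid": False},
]
```

The request in the last cycle is not counted because the trace ends before its result could be observed.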
  42. 45. Advantages <ul><li>Code reuse. High-level semantics. </li></ul><ul><li>High-level programming constructs from Bluespec + the temporal logic à la SVA. </li></ul><ul><li>More tests. Hardware simulation is a lot faster (1000x) than software simulation. </li></ul><ul><li>Dynamic testing. </li></ul><ul><li>Statistics gathering. </li></ul>
  43. 46. Misgivings <ul><li>Ad-hoc design (from a theoretical viewpoint). </li></ul><ul><li>Guards may reduce concurrency. </li></ul><ul><li>Correct concurrent behavior can’t be guaranteed. </li></ul><ul><li>No public docs. </li></ul><ul><li>Tweaking the scheduler for the clocked model can be problematic. </li></ul><ul><li>Only a subset of SVA is supported. </li></ul>
  44. 47. Extensions <ul><li>BSV could be extended to assertions checked at specific times instead of always. </li></ul><ul><li>Further coverage of SVA </li></ul><ul><li>Constraint-guided scheduler. </li></ul>
  45. 48. Compiling Guards <ul><li>Before compilation </li></ul><ul><li>rule r1 (fifo1) </li></ul><ul><li>… do r1 </li></ul><ul><li>… call r2 </li></ul><ul><li>rule r2 (fifo2) </li></ul><ul><li>… do r2 </li></ul><ul><li>After Compilation </li></ul><ul><li>rule r3 </li></ul><ul><li>(fifo1 and </li></ul><ul><li> fifo2) </li></ul><ul><li>… do r1 </li></ul><ul><li>… do r2 </li></ul>
  46. 49. Better model <ul><li>rule r1 </li></ul><ul><li>(if fifo1) </li></ul><ul><li>… do r1 </li></ul><ul><li>(if fifo2) </li></ul><ul><li>… do r2 </li></ul>Now another rule can use fifo1 while fifo2 is being used by r1.
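The contrast between the compiled rule r3 (one conjoined guard) and the proposed per-action guards can be shown with two small Python functions. This is one reading of the slides, not a statement of how the compiler works: the function names are hypothetical and each action is reduced to a label.

```python
def lifted_guard(fifo1_ready, fifo2_ready):
    """Compiled rule r3: both guards are conjoined, so it is all or nothing."""
    if fifo1_ready and fifo2_ready:
        return ["do r1", "do r2"]
    return []


def per_action_guards(fifo1_ready, fifo2_ready):
    """Proposed model: each action carries its own guard."""
    performed = []
    if fifo1_ready:
        performed.append("do r1")
    if fifo2_ready:
        performed.append("do r2")
    return performed
```

With `fifo1` ready and `fifo2` not, the lifted version does nothing, while the per-action version still performs `do r1`.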
  47. 50. Guards <ul><li>No correctness guarantees. </li></ul><ul><li>Reduced concurrency. </li></ul>Solution: Transactions in Bluespec.
  48. 51. Questions?
  49. 52. BSV code compilation
      function Vector#(64, Complex) ifft (Vector#(64, Complex) in_data);
        //Declare vectors
        Vector#(4,Vector#(64, Complex)) stage_data;
        stage_data[0] = in_data;
        for (Integer stage = 0; stage < 3; stage = stage + 1)
          stage_data[stage+1] = stage_f (stage,stage_data[stage]);
        return (stage_data[3]);
      endfunction
      The compiler unrolls the loop:
        stage_data[1] = stage_f (0,stage_data[0]);
        stage_data[2] = stage_f (1,stage_data[1]);
        stage_data[3] = stage_f (2,stage_data[2]);
      stage_f can be inlined now. But the number of transistors has tripled.
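The loop and its unrolled form compute the same thing, which a short Python analogue can demonstrate. The body of `stage_f` below is a made-up placeholder (the real one is an IFFT stage), and the three calls in `ifft_unrolled` correspond to three separate copies of the stage hardware.

```python
def stage_f(stage, data):
    """Placeholder for one IFFT stage (hypothetical arithmetic)."""
    return [2 * v + stage for v in data]


def ifft_loop(in_data):
    """Loop form, mirroring the BSV source."""
    stage_data = [in_data, None, None, None]
    for stage in range(3):
        stage_data[stage + 1] = stage_f(stage, stage_data[stage])
    return stage_data[3]


def ifft_unrolled(in_data):
    """What remains after unrolling: three distinct uses of stage_f."""
    s1 = stage_f(0, in_data)
    s2 = stage_f(1, s1)
    s3 = stage_f(2, s2)
    return s3
```

Both functions return the same result for any input; only the amount of replicated logic differs once this is hardware.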
  50. 53. Folding <ul><li>Reuse a block over multiple cycles </li></ul>We expect: throughput to decrease (less parallelism); area to decrease (reusing a block). Speeding up the clock to compensate gives a hyper-linear increase in energy. [Figure: blocks f and g, with f reused]
  51. 54. 802.11a Transmitter Synthesis results (only the IFFT block is changing). The same source code is used for every design. TSMC 0.18 micron; numbers reported are before place and route. All these designs were done in less than 24 hours!
      IFFT Design              | Area (mm^2) | Throughput Latency (CLKs/sym) | Min. Freq Required
      Pipelined                | 5.25        | 04                            | 1.0 MHz
      Combinational            | 4.91        | 04                            | 1.0 MHz
      Folded (16 Bfly-4s)      | 3.97        | 04                            | 1.0 MHz
      Super-Folded (8 Bfly-4s) | 3.69        | 06                            | 1.5 MHz
      SF (4 Bfly-4s)           | 2.45        | 12                            | 3.0 MHz
      SF (2 Bfly-4s)           | 1.84        | 24                            | 6.0 MHz
      SF (1 Bfly4)             | 1.52        | 48                            | 12 MHz
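The minimum-frequency figures are consistent with a simple calculation: clocks per symbol times the symbol rate. The Python check below assumes the standard 802.11a OFDM symbol period of 4 microseconds (250,000 symbols per second), which the slide does not state explicitly.

```python
SYMBOLS_PER_SECOND = 250_000  # assumption: 802.11a symbol period of 4 us


def min_frequency_hz(clocks_per_symbol):
    """Lowest clock rate that still delivers one symbol every 4 us."""
    return clocks_per_symbol * SYMBOLS_PER_SECOND


# Clocks per symbol taken from the Throughput Latency figures on the slide.
required = {clks: min_frequency_hz(clks) for clks in (4, 6, 12, 24, 48)}
```

For example, the most heavily folded design needs 48 clocks per symbol and therefore a 12 MHz clock, twelve times the rate of the 4-clock designs.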