Reconfigurable Computing


  • [20060325] Changed: Systolix entry, since RadioScape acquired Systolix. [20051023] Dropped: Triscend, which was acquired by Xilinx in March 2004. [20051024] Dropped: Cognigine; its website can no longer be found on the Internet. There are many references to its work, but all date to the early 2000s.
  • RadioScape uses Systolix's DSP expertise to expand its licensable intellectual property portfolio for Layer-1 wireless baseband development.
  • With this platform in place, algorithms are mapped onto the array. This is done by describing the signal flow across the array in an HDL such as Verilog, in a higher-level language like Handel-C, or in MATLAB. Need an 8-bit adder? Use two 4-bit ALUs. A 32-bit adder? Eight ALUs. Perhaps an Add/Compare/Select (ACS) unit? Again, just a few ALUs.
  • Reconfigurable Computing

    1. 1. March 25, 2006 Reconfigurable Computing Dr. Partha Pratim Das Head of Engineering, Interra Systems (India) Pvt. Ltd. Emerging Architectures for Embedded Systems
    2. 2. In memory of … <ul><li>Ben Sloman, 1967 - 2002 </li></ul><ul><li>Founder Vice President of </li></ul><ul><li>Corporate Development & </li></ul><ul><li>Software Engineering </li></ul><ul><li>Elixent Ltd., UK. </li></ul><ul><li>In 2001, Ben introduced me to the wonderful world of Reconfigurable Computing. </li></ul>
    3. 3. Source & Disclaimer <ul><li>Information about the Paradigms, Architectures, Applications, Tools, Capacity, Advantages & Scope of various RC Architectures has been borrowed from the respective sites of the companies. The speaker bears no responsibility for its correctness. Neither does he promote or demote any specific company on the merit or otherwise of its technology. </li></ul>
    4. 4. Outline <ul><li>Setting the Stage – Why Reconfigurable Computing? </li></ul><ul><li>Reconfigurable Computing – Leading Companies </li></ul><ul><li>Reconfigurable Computing – Case Study of New Computing Machines </li></ul><ul><ul><li>D-Fabrix – based on Reconfigurable Array Architecture </li></ul></ul><ul><ul><li>ACM – based on SRGA </li></ul></ul><ul><ul><li>XPP – based on Dataflow Computing </li></ul></ul><ul><ul><li>PulseDSP™ – based on Systolic Computing </li></ul></ul><ul><li>Programming RAP – Overview of Development Tools </li></ul><ul><li>Sum Up </li></ul><ul><li>Questions </li></ul>
    5. 5. Setting the Stage What are Embedded Systems? & Why Reconfigurable Computing for them?
    6. 6. What are Embedded Systems? <ul><li>Any device that includes a Computer but is not itself a General-Purpose Computer </li></ul><ul><li>Computers as Components </li></ul>
    7. 7. A Perfect Example!
    8. 8. Other Examples <ul><li>Personal Convenience </li></ul><ul><ul><li>Calculator </li></ul></ul><ul><ul><li>Alarm (Talking) Clock </li></ul></ul><ul><ul><li>Radio </li></ul></ul><ul><ul><li>CD / MP3 Player </li></ul></ul><ul><ul><li>Personal Digital Assistant (PDA) </li></ul></ul><ul><ul><li>Photocopier </li></ul></ul><ul><li>Personal Communication </li></ul><ul><ul><li>Cordless Phone </li></ul></ul><ul><ul><li>Answering Machine </li></ul></ul><ul><ul><li>Cell Phone </li></ul></ul><ul><ul><li>Fax </li></ul></ul><ul><li>Public Utilities </li></ul><ul><ul><li>Automatic Teller Machine </li></ul></ul><ul><ul><li>Electronic Voting Machine </li></ul></ul><ul><li>Camera </li></ul><ul><ul><li>Analog </li></ul></ul><ul><ul><li>Digital </li></ul></ul><ul><ul><li>Handycam </li></ul></ul><ul><li>Automobile </li></ul><ul><ul><li>Engine </li></ul></ul><ul><ul><li>Brake </li></ul></ul><ul><ul><li>Dash </li></ul></ul><ul><ul><li>Car Stereo </li></ul></ul><ul><li>Aviation </li></ul><ul><li>Television </li></ul><ul><ul><li>Analog TV: Channel Selection </li></ul></ul><ul><ul><li>Digital TV: Decompression </li></ul></ul><ul><ul><li>CAS: De-scrambling </li></ul></ul><ul><li>Household Appliances </li></ul><ul><ul><li>Microwave Oven </li></ul></ul><ul><ul><li>Washing Machine </li></ul></ul><ul><ul><li>Air-conditioner </li></ul></ul><ul><li>Surveillance Systems </li></ul><ul><ul><li>Burglar Alarm </li></ul></ul><ul><ul><li>CCTV </li></ul></ul><ul><ul><li>Metal Detector </li></ul></ul><ul><ul><li>Biometric Identification System </li></ul></ul><ul><ul><li>Secure ID </li></ul></ul><ul><li>Control & Automation </li></ul><ul><ul><li>Railway Signaling </li></ul></ul><ul><ul><li>Steel Industry – Blast Furnace </li></ul></ul><ul><ul><li>Aluminum Extraction </li></ul></ul><ul><ul><li>Fire Alarm </li></ul></ul><ul><ul><li>Industrial Process Control </li></ul></ul><ul><li>Medical Systems </li></ul><ul><ul><li>Pacemaker </li></ul></ul><ul><ul><li>Laparoscopic Appliances </li></ul></ul><ul><ul><li>Monitors – ECG, EEG, PET </li></ul></ul><ul><li>Computer Accessories </li></ul><ul><ul><li>Printer </li></ul></ul><ul><ul><li>Plotter </li></ul></ul><ul><ul><li>Scanner </li></ul></ul><ul><li>Networking </li></ul><ul><ul><li>NICs </li></ul></ul><ul><ul><li>N/w Components – Hub, Router, Switch </li></ul></ul><ul><ul><li>Modem </li></ul></ul><ul><li>Global Positioning System </li></ul><ul><ul><li>Navigation </li></ul></ul><ul><ul><li>Exploration </li></ul></ul>
    9. 9. Characteristics of Embedded Systems <ul><li>Real-Time Operation (always ?) </li></ul><ul><li>Low Manufacturing Cost </li></ul><ul><li>Low Power </li></ul><ul><li>Universal & Market Driven </li></ul><ul><li>Sophisticated Functionality </li></ul><ul><li>Application Dependent Processor </li></ul><ul><li>Restricted Memory </li></ul><ul><li>Fault Tolerant </li></ul><ul><li>Safe </li></ul><ul><li>Domain Specific & Technology Driven </li></ul>
    10. 10. Embedded Systems Market Segments Source: The Death of the DSP by Nick Tredennick , August 2000
    11. 11. The zero-cost segment <ul><li>To a first approximation represents almost all of the embedded systems market </li></ul><ul><li>The segment for which low cost is the overriding consideration. </li></ul><ul><li>Consumer appliances that generally have minimal processing needs </li></ul><ul><ul><li>microwave ovens, electric razors, blenders, toasters, washing machines, … </li></ul></ul><ul><li>Sells in high volumes (millions of units to tens of millions of units). </li></ul><ul><li>Characterized by intense price competition - ideal would be zero cost to implement. </li></ul>
    12. 12. The zero-power segment <ul><li>To a first approximation represents a few percent of the embedded systems market </li></ul><ul><li>The segment for which zero power dissipation represents the ideal. </li></ul><ul><li>Consumer items that are expected to run on a single button-size battery or on weak ambient light </li></ul><ul><ul><li>smoke detectors, basic cellular phones, pagers, pacemakers, hearing aids, MP3 players, pocket calculators, etc. </li></ul></ul><ul><li>Minimum product cost remains a concern. </li></ul>
    13. 13. The zero-delay segment <ul><li>To a first approximation represents a little more than zero percent of the embedded systems market </li></ul><ul><li>The segment for which zero delay from data in to result out represents the ideal. </li></ul><ul><li>Consumer items </li></ul><ul><ul><li>high-end printers, scanners, copiers, and fax machines </li></ul></ul><ul><li>Processing power and throughput are important </li></ul><ul><li>Minimum product cost is still the criterion </li></ul>
    14. 14. The zero-volume segment <ul><li>To more than a first approximation, represents zero percent of the embedded systems market </li></ul><ul><li>The segment for which the application potential is nearly zero. </li></ul><ul><ul><li>Production volumes and profits will also be close to zero </li></ul></ul><ul><li>Why did Intel design the 80960MX microprocessor? </li></ul><ul><li>The only known application was the YF-22 aircraft. </li></ul><ul><li>Later the only prototype of the YF-22 crashed & the application volume for the ’960MX actually went to zero. </li></ul><ul><li>Could Intel have expected to sell more than a few thousand ’960MX processors? </li></ul><ul><li>There must be some other reason to capture the application. </li></ul><ul><li>One motive is public relations. </li></ul>
    15. 15. YF-22
    16. 16. The Leading Edge Wedge
    17. 17. The Leading Edge Wedge <ul><li>Handheld devices </li></ul><ul><ul><li>digital cameras, mobile phones, GPS receivers, PDAs, etc. </li></ul></ul><ul><ul><li>drive more computing into portable devices. </li></ul></ul><ul><li>Being consumer devices, they fall into the zero-cost segment . </li></ul><ul><li>Having high computing requirements, they fall into the zero-delay segment . </li></ul><ul><li>Being portable, they fall into the zero-power segment . </li></ul><ul><li>Target: cheap, highly capable devices that give us instant answers and that work on weak ambient light. </li></ul><ul><li>The overlap of the zero-cost, zero-delay, and zero-power segments is the leading-edge wedge . </li></ul>
    18. 18. (Mobile) Technology Road Map
    19. 19. Approach to Computing <ul><li>TASK : First we think of some task or function that we wish to perform </li></ul><ul><li>ALGORITHM : Next we define an algorithm that describes how to perform the task </li></ul><ul><li>MAP : Ultimately we map our algorithm into some kind of physical implementation that will execute the task </li></ul>
    20. 20. Computing Models <ul><li>ASIC – Functionality fixed during fab </li></ul><ul><ul><li>Custom </li></ul></ul><ul><ul><li>SoC </li></ul></ul><ul><ul><li>Structured ASIC </li></ul></ul><ul><li>FPGA – Programmable functionality </li></ul><ul><ul><li>Embedded Processor Core </li></ul></ul><ul><ul><li>Embedded Custom ASIC (peripherals) </li></ul></ul><ul><li>µP – General Purpose programmable device </li></ul><ul><li>DSP – Specialized µP </li></ul>
    21. 21. Limitations Of ASICs <ul><li>Prolonged design cycle </li></ul><ul><li>High NRE </li></ul><ul><li>Algorithm is frozen into h/w – no flexibility </li></ul><ul><li>Each function needs its own implementation in h/w – more area and more power </li></ul><ul><li>Design based on HDLs – not good at representing algorithms </li></ul>
    22. 22. Limitations of FPGAs <ul><li>Reconfiguration is slow </li></ul><ul><li>Reconfiguration is power consuming </li></ul><ul><li>Inefficient use of available logic </li></ul><ul><li>Often leads to combinatorial explosion </li></ul><ul><li>Good for Rapid Prototyping </li></ul><ul><li>Design based on HDLs – not good at representing algorithms </li></ul>
    23. 23. Limitations Of DSP / µP <ul><li>Algorithm has to be artificially partitioned </li></ul><ul><li>Constrained to meet the physical bus width </li></ul><ul><li>Mapped on the instruction set of the target device </li></ul><ul><li>Inefficient utilization of available resources </li></ul><ul><li>Loss of inherent parallelism of the algorithm </li></ul>
    24. 24. Summary Observations <ul><li>ASIC – Hardware: Rigid; Algorithms: Rigid </li></ul><ul><ul><li>The h/w is fixed. </li></ul></ul><ul><ul><li>The algorithm frozen in h/w. </li></ul></ul><ul><ul><li>Design tools not good for algorithms. </li></ul></ul><ul><li>FPGA – Hardware: Pseudo-Dynamic; Algorithms: Rigid </li></ul><ul><ul><li>Power-hungry </li></ul></ul><ul><ul><li>Slow to re-configure </li></ul></ul><ul><ul><li>Inefficient design tools </li></ul></ul><ul><li>µP / DSP – Hardware: Rigid; Algorithms: Pseudo-Dynamic </li></ul><ul><ul><li>General purpose h/w – fixed & inefficient </li></ul></ul><ul><ul><li>Algorithms changeable but artificially partitioned & constrained to match h/w </li></ul></ul><ul><li>RC – Hardware: Dynamic; Algorithms: Dynamic </li></ul><ul><ul><li>Algorithm-friendly design tools </li></ul></ul><ul><ul><li>Low Power </li></ul></ul><ul><ul><li>Re-configurability 100,000 times / sec </li></ul></ul>
    25. 25. Search for New Machines <ul><li>Should be Low Power </li></ul><ul><li>Should be Low Area </li></ul><ul><li>Should keep pace with evolving standards </li></ul><ul><li>Should fit the algorithms – Implement efficiently </li></ul><ul><li>Should be Low cost to design </li></ul><ul><li>Should be Fast to market </li></ul><ul><li>Dynamic algorithms implemented on dynamic resources </li></ul>
    26. 26. Reconfigurable Architecture Requirements <ul><li>Scalable hardware-like performance </li></ul><ul><ul><li>Massive instruction-level parallelism </li></ul></ul><ul><ul><li>Balance between compute, memory and communication </li></ul></ul><ul><ul><li>Tunable number and type of resources </li></ul></ul><ul><li>Software-like flexibility </li></ul><ul><ul><li>Support multiple applications </li></ul></ul><ul><ul><li>Commit function any time after silicon fabrication </li></ul></ul><ul><ul><li>Fast function changes for multi-mode products </li></ul></ul><ul><li>Silicon efficiency </li></ul><ul><ul><li>Price/performance overhead must be low </li></ul></ul>
    27. 27. Alternate Nomenclature <ul><li>Reconfigurable Computing </li></ul><ul><li>Configurable Computing </li></ul><ul><li>Reconfigurable Array Architecture </li></ul><ul><li>Self Reconfiguring Architecture </li></ul><ul><li>Adaptive Computing Machine </li></ul><ul><li>Reconfigurable Algorithmic Process (RAP) </li></ul><ul><li>Dataflow Computing </li></ul>
    28. 28. Reconfigurable Computing Leading Companies
    29. 29. FPGA Leaders <ul><li>Xilinx </li></ul><ul><li>Altera </li></ul><ul><li>Actel </li></ul><ul><li>Quicklogic </li></ul><ul><li>Lattice </li></ul><ul><li>In a way an FPGA may also be an RC architecture. </li></ul><ul><li>Reconfiguration in an FPGA is slow, power-hungry and less flexible. </li></ul><ul><li>In this lecture we will stay away from reviewing FPGA companies & technologies while we look for more dynamic architectural options. </li></ul>
    30. 30. RC Runners <ul><li>Elixent </li></ul><ul><li>QuickSilver </li></ul><ul><li>Pact Corp </li></ul><ul><li>Systolix (wholly owned subsidiary of RadioScape since Jan 2002) </li></ul><ul><li>Xilinx (acquired Triscend in Mar 2004) </li></ul><ul><li>picoChip </li></ul><ul><ul><li>Let us first review these companies before taking up case studies for some of their architectures </li></ul></ul>
    31. 31. Elixent Ltd. <ul><li>Supplies Reconfigurable Algorithmic Processors (RAP) </li></ul><ul><ul><li>emerging IP business space </li></ul></ul><ul><ul><li>significant competitive edge for </li></ul></ul><ul><ul><ul><li>1st tier OEMs, </li></ul></ul></ul><ul><ul><ul><li>Application Specific Standard Part vendors and </li></ul></ul></ul><ul><ul><ul><li>IC integrators. </li></ul></ul></ul><ul><li>Elixent technology addresses the top three customer needs: </li></ul><ul><ul><li>Increased functionality </li></ul></ul><ul><ul><li>Reduced Time To Market </li></ul></ul><ul><ul><li>Reduced Design Costs </li></ul></ul>
    32. 32. Elixent Ltd. <ul><li>UK based Company </li></ul><ul><ul><li>Offices in UK, US & Japan </li></ul></ul><ul><li>Spin-off from HP Research Laboratories </li></ul><ul><li>Investors </li></ul><ul><ul><li>VC Firm 3i </li></ul></ul><ul><ul><li>Hewlett Packard </li></ul></ul><ul><ul><li>Actel </li></ul></ul><ul><li>Partners </li></ul><ul><ul><li>Interra – HDL entry </li></ul></ul><ul><ul><li>Celoxica – Handel-C entry </li></ul></ul><ul><ul><li>AccelChip – MATLAB entry </li></ul></ul><ul><ul><li>Others in the DSP Design Space </li></ul></ul>
    33. 33. QuickSilver Technology <ul><li>Adaptive Computing Machine (ACM) </li></ul><ul><ul><li>Functionality adapts on the fly by downloading s/w applications </li></ul></ul><ul><ul><li>A single device can run various media-rich apps </li></ul></ul><ul><ul><li>Has its own language/design tools </li></ul></ul><ul><ul><ul><li>Silverware </li></ul></ul></ul><ul><ul><li>Strongly suggests that HDLs are not the way to go </li></ul></ul><ul><ul><ul><li>this could be a marketing position to cover the non-availability of an HDL solution </li></ul></ul></ul>
    34. 34. QuickSilver <ul><li>Mobile Communication company </li></ul><ul><ul><li>Founded 1998 </li></ul></ul><ul><ul><li>San Jose based company </li></ul></ul><ul><ul><li>Offices in San Diego, Seattle, U.K. and Japan </li></ul></ul><ul><ul><li>Received 13 million in new funding in April 2002 </li></ul></ul><ul><ul><li>Investors: </li></ul></ul><ul><ul><ul><li>TechFund Capital </li></ul></ul></ul><ul><ul><ul><li>JP Morgan Partners </li></ul></ul></ul><ul><ul><ul><li>Portview Communication Partners </li></ul></ul></ul><ul><ul><ul><li>Selby Venture Partners </li></ul></ul></ul><ul><ul><ul><li>BellSouth Cellular </li></ul></ul></ul><ul><ul><ul><li>Kyocera Corp </li></ul></ul></ul>
    35. 35. Pact Corp Technology <ul><li>eXtreme Processing Platform (XPP) </li></ul><ul><ul><li>Set of ALU, RAM and I/O elements along with a Configuration Manager </li></ul></ul><ul><ul><li>Programmed in a mix of a C subset and Native Mapping Language (NML) </li></ul></ul><ul><ul><ul><li>Has a C compiler (XPP-VC) </li></ul></ul></ul><ul><ul><ul><li>Simulation and development tools around C and NML </li></ul></ul></ul><ul><ul><li>NML is a sort of structural language </li></ul></ul><ul><ul><ul><li>High-level components such as counters are available </li></ul></ul></ul>
    36. 36. Pact Corp <ul><li>Fabless semiconductor and IP vendor </li></ul><ul><ul><li>Offers IP and ASSPs based on its XPP architecture </li></ul></ul><ul><ul><li>Simulation and Development tools available </li></ul></ul><ul><ul><li>Corporate office in Germany </li></ul></ul><ul><ul><ul><li>Sales/Marketing office in San Jose </li></ul></ul></ul><ul><ul><li>Recently teamed with Quicklogic </li></ul></ul><ul><ul><li>Is soundly funded </li></ul></ul>
    37. 37. PulseDSP™: Systolix <ul><li>Highly Scalable Architecture </li></ul><ul><ul><li>Multiple Data Widths, 8 to 64 bits internally </li></ul></ul><ul><ul><li>Multiple Array Sizes, 32 to 14,000 processing elements </li></ul></ul><ul><ul><li>Multiple I/O Ports, up to 52 channels at 200 MSPS each </li></ul></ul><ul><li>Sustained performance of up to 200 GMAC/s for 16-bit operations </li></ul><ul><li>Real-time signal processing up to video rates </li></ul><ul><li>Supports multiple independent signal data streams </li></ul><ul><li>Integrated control and data processing functions </li></ul><ul><li>Supports Linear and Non-Linear systems </li></ul><ul><li>Dynamically or Statically programmed </li></ul><ul><li>Full application development environment </li></ul>
    38. 38. Systolix PulseDSP Ltd <ul><li>A wholly owned subsidiary of RadioScape Ltd . </li></ul><ul><li>Founded in 1998 and is based in Liverpool, UK. </li></ul><ul><li>Acquired by RadioScape in Jan 2002. </li></ul><ul><li>Specializes in the development and commercial licensing of advanced DSP technologies and associated software tools. </li></ul><ul><li>Introduced PulseDSP technology </li></ul><ul><ul><li>A multiprocessor architecture that provides very low cost, high performance programmable DSP. </li></ul></ul><ul><li>PulseDSP technology is ideal for </li></ul><ul><ul><li>high MAC rates and </li></ul></ul><ul><ul><li>rapid data throughput </li></ul></ul><ul><ul><li>Example – digital IF, baseband signal processing, software radio solutions. </li></ul></ul><ul><li>Licensed the first PulseDSP technology to Analog Devices. </li></ul>
    39. 39. Reconfigurable Architectures Case Study of New Computing Machines
    40. 40. Case Study <ul><li>D-Fabrix Array </li></ul><ul><li>ACM – Adaptive Computing Machine </li></ul><ul><ul><li>Based on SRGA – Self-Reconfigurable Gate Array Architecture </li></ul></ul><ul><li>XPP – eXtreme Processing Platform </li></ul><ul><li>PulseDSP™ – Systolic Architecture </li></ul>
    41. 41. D-Fabrix Array Elixent Source : DFA1000 RISC Accelerator Data Sheet & Website
    42. 42. D-Fabrix Array <ul><li>Massive instruction-level-parallelism </li></ul><ul><li>Regular tiled structure: </li></ul><ul><ul><li>low design cost, </li></ul></ul><ul><ul><li>configurable, </li></ul></ul><ul><ul><li>portable </li></ul></ul><ul><li>Rapid re-configuration </li></ul> 
    43. 43. D-Fabrix Array <ul><li>Components are: </li></ul><ul><ul><li>4-bit ALUs, </li></ul></ul><ul><ul><li>registers and </li></ul></ul><ul><ul><li>the &quot;switchbox&quot;. </li></ul></ul> 
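    44. 44. Basic D-Fabrix Array Element <ul><li>ALU ports: 4-bit inputs A and B, 4-bit INSTR, 1-bit C IN /CONTROL, 1-bit C OUT, 4-bit output F </li></ul><ul><li>Typical instructions: A + B, A - B, A & B, A | B, A xor B, Cin ? A : B, A == B, A > B, not A, not B </li></ul><ul><li>Output register (REG) options: </li></ul><ul><ul><li>Transparent </li></ul></ul><ul><ul><li>Reset: 0000, 1111 </li></ul></ul><ul><ul><li>Clocked: always, when enabled, never </li></ul></ul>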
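The element described above can be approximated in a few lines of Python. This is only a behavioural sketch under stated assumptions: the instruction names (`"add"`, `"mux"`, and so on) and the helper `add_wide` are hypothetical and do not come from the Elixent data sheet. It shows how a carry chain lets two 4-bit ALUs form an 8-bit adder and eight form a 32-bit adder, as the speaker notes suggest.

```python
MASK4 = 0xF  # each ALU in this sketch works on 4-bit operands


def alu4(instr, a, b, cin=0):
    """Toy model of one 4-bit ALU. Returns (f, cout).

    Instruction names are hypothetical labels for a subset of the
    "typical instructions" listed on the slide.
    """
    a, b = a & MASK4, b & MASK4
    if instr == "add":
        total = a + b + cin
        return total & MASK4, total >> 4
    if instr == "and":
        return a & b, 0
    if instr == "or":
        return a | b, 0
    if instr == "xor":
        return a ^ b, 0
    if instr == "not_a":
        return ~a & MASK4, 0
    if instr == "mux":  # Cin ? A : B
        return (a if cin else b), 0
    raise ValueError(f"unknown instruction: {instr}")


def add_wide(a, b, n_alus):
    """Chain n_alus 4-bit ALUs through carry-out/carry-in to add wider words."""
    result, carry = 0, 0
    for i in range(n_alus):
        nibble_a = (a >> (4 * i)) & MASK4
        nibble_b = (b >> (4 * i)) & MASK4
        f, carry = alu4("add", nibble_a, nibble_b, carry)
        result |= f << (4 * i)
    return result, carry


# 8-bit add on two ALUs, 32-bit add on eight ALUs
print(add_wide(200, 100, 2))
print(add_wide(0x12345678, 0x11111111, 8))
```

The carry is rippled serially here for clarity; how the real array propagates carries between tiles is not described in the slides.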
    45. 45. D-Fabrix Array: Tile <ul><li>Combine two of each into the &quot;tile&quot;. </li></ul> 
    46. 46. D-Fabrix Array: Tiling & Memory <ul><li>Combine 100’s or 1000’s of tiles to create the D-Fabrix array. </li></ul><ul><li>Memory is distributed to give fast, local storage with massive bandwidth. </li></ul> 
    47. 47. D-Fabrix: Routing <ul><li>16 4-bit busses cross each ALU horizontally and vertically for short and long connections </li></ul><ul><li>4-bit connections are made by setting a configuration bit to ‘1’ </li></ul><ul><li>Each ALU connects to 8 others via just one switch delay </li></ul><ul><li>Under 128 configuration bits per ALU+switchbox => Can configure 512 ALUs (64 kbits) in ~20 µs </li></ul>
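The configuration figures on this slide can be checked with a little arithmetic. The sketch below takes the slide's numbers at face value (128 bits per ALU as an upper bound, 512 ALUs, about 20 µs per load); the function name `config_budget` and the derived quantities are illustrative assumptions, not vendor specifications.

```python
def config_budget(bits_per_alu=128, n_alus=512, load_time_s=20e-6):
    """Derive array-level configuration figures from the per-ALU numbers."""
    total_bits = bits_per_alu * n_alus
    return {
        "total_kbits": total_bits / 1024,                   # size of one full configuration
        "bandwidth_gbit_s": total_bits / load_time_s / 1e9,  # rate needed to load it in time
        "loads_per_second": 1 / load_time_s,                 # back-to-back full reloads
    }


budget = config_budget()
for name, value in budget.items():
    print(f"{name}: {value:.4g}")
```

Under these assumptions a full load is 64 kbits, needs roughly 3.3 Gbit/s of configuration bandwidth, and could be repeated about 50,000 times per second if the array did nothing but reload.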
    48. 48. D-Fabrix Array: Virtual Hardware <ul><li>Once the math units are in place, the switchboxes link them together. They are part of a rich interconnect, providing both local and global connectivity. </li></ul><ul><li>The algorithm is implemented in &quot;Virtual Hardware&quot;: it is processed on a hardware implementation, yet it is loaded like software. </li></ul><ul><li>A new set of &quot;virtual hardware&quot; can be loaded at any time. In microseconds you can switch to the next hardware configuration. </li></ul><ul><li>Some applications have one configuration, and only alter it when the standards change or the specification creeps. </li></ul><ul><li>Others re-use the silicon dynamically, switching modes, or even &quot;folding&quot; algorithms to use smaller arrays. </li></ul> 
    49. 49. DFA 1000: D-Fabrix based RISC Accelerator
    50. 50. DFA 1000: D-Fabrix based RISC Accelerator <ul><li>Pre-configured D-Fabrix implementation for accelerating RISC Systems </li></ul><ul><li>A peripheral set to facilitate its integration into SOC designs </li></ul><ul><li>High-speed data interfaces to the D-Fabrix core array </li></ul><ul><ul><li>low latency and </li></ul></ul><ul><ul><li>no overhead on the system bus. </li></ul></ul><ul><li>The AMBA bus interface </li></ul><ul><ul><li>Programming the array, </li></ul></ul><ul><ul><li>Transferring data to and from the host RISC </li></ul></ul><ul><ul><li>Much lower bandwidth control and configuration path. </li></ul></ul><ul><li>Local high-speed RAMs, directly accessible by the array or by the RISC; </li></ul><ul><li>D-Fabrix array itself. </li></ul>
    51. 51. DFA 1000: Advantages <ul><li>Over FPGAs </li></ul><ul><ul><li>An order of magnitude better in die size for a given performance </li></ul></ul><ul><ul><li>Retains homogeneous architecture – the whole array can be utilized for any given task </li></ul></ul><ul><li>Over DSPs </li></ul><ul><ul><li>An order of magnitude improvement in most algorithms, simply by matching the computing to the algorithm </li></ul></ul><ul><li>Over RISCs </li></ul><ul><ul><li>RISC architectures are not typically optimized for the dataflow algorithms common in DSP and media processing </li></ul></ul>
    52. 52. DFA 1000: Applications & Benchmarks <ul><li>5th Order CIC Filter: 400 Msamples/sec </li></ul><ul><li>UMTS Viterbi: ~1024 voice channels </li></ul><ul><li>Floyd-Steinberg Color Dither: 400 Mpixels/sec (16 image lines in parallel) </li></ul><ul><li>8x8 DCT: 400 Mpixels/sec (four 8x8 DCTs in parallel) </li></ul><ul><li>JPEG Encoder: 200 Mpixels/sec (two macro-blocks in parallel) </li></ul>
    53. 53. DFA 1000: Imaging Application <ul><li>one port captures data, </li></ul><ul><li>the second displays it, </li></ul><ul><li>the AHB is used for control </li></ul>
    54. 54. DFA 1000: Software Defined Radio Application <ul><li>I/O ports used to transfer data to the antenna </li></ul><ul><li>AHB used for data output to the host RISC. </li></ul>
    55. 55. ACM – Adaptive Computing Machine Based on SRGA – Self-Reconfigurable Gate Array Architecture QuickSilver Source: A Self-Reconfigurable Gate Array Architecture , Reetinder Sidhu et al, 10th International Workshop on Field Programmable Logic and Applications, August 2000 . A look into QuickSilver's ACM architecture, Paul Master, CTO QuickSilver, EE Times, September 12, 2002 (4:39 p.m. EST) Website
    56. 56. Self Reconfigurable Gate Array Architecture <ul><li>Logic adapts itself as the computation proceeds, based on input and intermediate results </li></ul><ul><li>Device needs to store multiple configuration contexts and switch between them </li></ul><ul><li>Self-reconfiguration is achieved by modifying the configuration memory </li></ul>
    57. 57. SRGA Architecture <ul><li>A Self Reconfigurable device is characterized by the following features: </li></ul><ul><ul><li>Fast Context Switching </li></ul></ul><ul><ul><li>Fast Random Access of Configuration Memory </li></ul></ul><ul><li>An efficient architecture should allow single-cycle context switching as well as single-cycle random memory access </li></ul>
    58. 58. SRGA Architecture <ul><li>Consists of a rectangular gate array of PEs </li></ul><ul><li>Each PE consists of a logic cell and a memory block </li></ul><ul><li>Logic cell contains a LUT and a flip-flop </li></ul><ul><li>Each PE is connected to neighboring PEs and switches </li></ul>
    59. 59. SRGA Architecture <ul><li>A configuration context contains the bits that configure all the logic cells and switches in the mesh-of-trees network </li></ul><ul><li>Configuration contexts are stored in the memory blocks </li></ul><ul><li>A memory access operation transfers data between rows and columns of PEs </li></ul><ul><li>Each memory block is implemented as a random access memory that can read/write a single bit every clock cycle </li></ul>
    60. 60. SRGA: Context Switch Operation <ul><li>For a context switch to occur, some logic on the currently active context needs to write the address of the context to switch to into a specified memory </li></ul><ul><li>In each memory block, the configuration bits are loaded in the first half of the next clock cycle </li></ul><ul><li>During the second half of the next clock cycle the current context is saved </li></ul>
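The context-switch idea above can be illustrated with a toy model. This is a sketch under simplifying assumptions, not the SRGA design from the cited paper: the class `ToyPE`, its method names, and the 2-input LUT encoding are all hypothetical, and the two half-cycle load/save phases are collapsed into a single step at the clock edge.

```python
class ToyPE:
    """One processing element with several stored configuration contexts.

    Each context is a 2-input LUT encoded as 4 bits: bit (2*a + b) of the
    value is the output for inputs (a, b).
    """

    def __init__(self, contexts):
        self.contexts = list(contexts)  # per-PE configuration memory
        self.active = 0                 # index of the active context
        self.pending = None             # context address written by the logic

    def request_switch(self, context_index):
        # Models logic on the active context writing the target context address.
        self.pending = context_index

    def write_config(self, context_index, lut_bits):
        # Self-reconfiguration: the running design rewrites configuration memory.
        self.contexts[context_index] = lut_bits & 0xF

    def clock(self, a, b):
        # A pending switch takes effect on the next clock cycle.
        if self.pending is not None:
            self.active = self.pending
            self.pending = None
        lut = self.contexts[self.active]
        return (lut >> ((a << 1) | b)) & 1


AND_LUT, XOR_LUT, OR_LUT = 0b1000, 0b0110, 0b1110

pe = ToyPE([AND_LUT, XOR_LUT])
print(pe.clock(1, 1))        # evaluated with the AND context
pe.request_switch(1)
print(pe.clock(1, 1))        # evaluated with the XOR context one cycle later
pe.write_config(0, OR_LUT)   # context 0 is rewritten while context 1 is active
```

The point of the sketch is that switching costs one clock in this model because every context is already resident next to the logic; only the index of the active context changes.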
    61. 61. ACM: Adaptive Computing Machine <ul><li>An SRGA based Heterogeneous Architecture </li></ul><ul><li>Five types of nodes: </li></ul><ul><ul><li>Arithmetic – different, variable width, linear arithmetic functions like a FIR filter, a DCT, an FFT </li></ul></ul><ul><ul><li>Bit manipulation – different, variable-width bit-manipulation functions like LFSR, Walsh code generator, GOLD code generator, TCP/IP packet discriminator </li></ul></ul><ul><ul><li>Finite state machine , </li></ul></ul><ul><ul><li>Scalar – execute legacy code </li></ul></ul><ul><ul><li>Configurable input/output – I/O in the form of a UART or bus interfaces such as PCI, USB, Firewire and other I/O-intensive actions </li></ul></ul>
    62. 62. ACM: Fractal Architecture
    63. 63. ACM: Advantages <ul><li>Any node can be adapted to perform a new function, clock cycle by clock cycle. </li></ul><ul><li>Rather than passing data from function to function, the data can remain resident in a node while the function of the node changes on a clock cycle-by-clock cycle basis. </li></ul><ul><li>Adaptable hundreds of thousands of times a second </li></ul><ul><li>Only the portions of an algorithm that are actually being executed need to be resident in the chip at any one time. </li></ul><ul><li>Heavy silicon reuse </li></ul><ul><li>Tremendous reductions in silicon area & power consumption. </li></ul>
    64. 64. XPP: eXtreme Processing Platform Pact Corp Source: The XPP White Paper
    65. 65. XPP: eXtreme Processing Platform <ul><li>The XPP idea consists of: </li></ul><ul><ul><li>Data stream processing </li></ul></ul><ul><ul><li>Configurable ALUs communicating via a packet oriented, automatically synchronized communication network </li></ul></ul><ul><ul><li>User transparent configuration management </li></ul></ul><ul><li>Works in the Dataflow Computing Paradigm </li></ul>
    66. 66. XPP: How it Works
    67. 67. XPP: Configurations <ul><li>Configurations are basic parallel calculation modules which are derived from a data flow graph of an algorithm. </li></ul><ul><li>Nodes of the data flow graph are mapped to the fundamental machine operations such as multiplication, addition etc. </li></ul><ul><li>Graph’s edges are the connections between the nodes. </li></ul><ul><li>As long as data packets stream through a single configuration, the graph remains static - no opcodes and connections are changed. </li></ul>
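A configuration in this sense can be sketched as a fixed graph that a stream of data packets flows through. The example below is a minimal illustration, assuming a hypothetical two-node graph (`prod` and `y`) and a hypothetical helper `run_configuration`; it is not NML and does not model the XPP packet network or its synchronization.

```python
import operator

# A "configuration": each node is a fixed machine operation, and the two
# names after it are the edges feeding its inputs.
CONFIG = {
    "prod": (operator.mul, "x", "coeff"),
    "y": (operator.add, "prod", "offset"),
}
ORDER = ["prod", "y"]  # evaluation order that respects the graph's edges


def run_configuration(config, order, packets):
    """Stream packets through a static graph; opcodes and edges never change."""
    for packet in packets:
        values = dict(packet)
        for node in order:
            op, lhs, rhs = config[node]
            values[node] = op(values[lhs], values[rhs])
        yield values[order[-1]]


stream = [
    {"x": 1, "coeff": 2, "offset": 3},
    {"x": 4, "coeff": 2, "offset": 3},
]
print(list(run_configuration(CONFIG, ORDER, stream)))
```

Nothing in `CONFIG` is touched while the stream is processed, which mirrors the statement that the graph stays static for as long as packets flow through one configuration; loading a different dictionary would correspond to a reconfiguration.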
    68. 68. XPP: Configurations – Vector * Matrix
    69. 69. XPP: Configuration Flow: Decoupling of data processing & configuration <ul><li>Replace </li></ul><ul><ul><li>Von-Neumann instruction stream by </li></ul></ul><ul><ul><li>a configuration stream </li></ul></ul><ul><li>Process </li></ul><ul><ul><li>streams of data instead of </li></ul></ul><ul><ul><li>single machine words . </li></ul></ul>
    70. 70. PulseDSP™ Systolix Source : Website
    71. 71. PulseDSP™ <ul><li>A radical new approach to implementing programmable signal processing functions. </li></ul><ul><li>Comes from Systolix. </li></ul><ul><li>Extraordinary performance </li></ul><ul><li>High level of flexibility </li></ul><ul><li>Very low manufacturing cost. </li></ul>
    72. 72. PulseDSP™: Architecture <ul><li>A large number of highly efficient processors arranged in a &quot;systolic array&quot; </li></ul><ul><li>Numerical data is pumped around the processors in the same way that blood is pumped around the body. </li></ul><ul><li>Massive Parallelism helps exploit the parallelism in most DSP algorithms. </li></ul><ul><li>Many billions of multiply accumulate calculations can be performed every second.  </li></ul>
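The systolic idea can be shown with a small simulation of a FIR filter on a linear array of multiply-accumulate cells. This is a generic textbook-style sketch, not the PulseDSP cell design: the function name `systolic_fir` is hypothetical, and for brevity the products of all cells are summed in one step rather than through a pipelined chain of partial sums.

```python
def systolic_fir(weights, samples):
    """Simulate a linear array of MAC cells, one locally stored weight per cell.

    On every clock tick each cell hands its sample to its right-hand
    neighbour and a new sample enters at the left, so data is "pumped"
    through the array while all cells multiply in parallel.
    """
    regs = [0] * len(weights)  # sample register inside each cell
    outputs = []
    for x in samples:
        regs = [x] + regs[:-1]  # shift samples one cell to the right
        # All cells multiply at once; the products are summed (not pipelined here).
        outputs.append(sum(w * r for w, r in zip(weights, regs)))
    return outputs


# An impulse input reads the weights back out, one per clock tick.
print(systolic_fir([1, 2, 3], [1, 0, 0, 0]))
```

With N cells the array performs N multiply-accumulates per tick, which is the source of the high MAC rates the slide refers to.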
    73. 73. PulseDSP™: Cells <ul><li>Instructions and data are all held locally - hence no memory bottlenecks. </li></ul><ul><li>Inherently synchronous </li></ul><ul><li>Guaranteed Performance </li></ul><ul><li>Application independent. </li></ul><ul><li>Simplifies and speeds development </li></ul><ul><li>Enables the designer to make full use of the processing power available. </li></ul>
    74. 74. PulseDSP™: Tools <ul><li>Directly maps the original algorithm signal flow to the PulseDSP array </li></ul><ul><li>User does not need to have any understanding of the underlying architecture. </li></ul><ul><li>The compiler takes full advantage of the inherent parallelism in DSP algorithms to provide true parallel processing solution. </li></ul><ul><li>The PulseDSP architecture doesn't just provide two or three parallel MAC units – it provides thousands, all fully utilized!  </li></ul>
    75. 75. PulseDSP™: Summary <ul><li>Highly Scalable Architecture </li></ul><ul><ul><li>Multiple Data Widths, 8 to 64 bits internally </li></ul></ul><ul><ul><li>Multiple Array Sizes, 32 to 14,000 processing elements </li></ul></ul><ul><ul><li>Multiple I/O Ports, up to 52 channels at 200 MSPS each </li></ul></ul><ul><li>Sustained performance of up to 200 GMAC/s for 16-bit operations </li></ul><ul><li>Real-time signal processing up to video rates </li></ul><ul><li>Supports multiple independent signal data streams </li></ul><ul><li>Integrated control and data processing functions </li></ul><ul><li>Supports Linear and Non-Linear systems </li></ul><ul><li>Dynamically or Statically programmed </li></ul><ul><li>Full application development environment </li></ul>
    76. 76. Programming RAP Overview of Development Tools Source : Websites of Respective Companies
    77. 77. Summary of Tools (Company – Tools; Entry; Target) <ul><li>Elixent – Tools: D-Sign; Entry: Verilog / VHDL / Handel-C / MATLAB; Target: D-Fabrix </li></ul><ul><li>Systolix – Tools: Systolix Design System (SDS); Entry: Schematic Capture; Target: PulseDSP cores </li></ul><ul><li>Triscend – Tools: FastChip Development System; Entry: VHDL / Verilog / Schematic Capture; Target: A7S CSoC </li></ul><ul><li>QuickSilver – Tools: Silverware; Entry: C + temporal & spatial extensions; Target: ACM </li></ul><ul><li>Pact Corp – Tools: XDS – Software Development Suite, XPP-VC – Vectorizing C-Compiler; Entry: C, NML (Native Mapping Language); Target: XPP Core / NML / System C </li></ul><ul><li>picoChip – Tools: picoTools; Entry: Signal Flow Description / Embedded C; Target: picoArray </li></ul><ul><li>Cognigine – Tools: Intelligent Network Processor Software Development Environment (SDE); Entry: Data Flow Description / C; Target: CGN16XXX </li></ul>
    78. 78. D-Sign: Elixent [Figure: D-Sign tool flow. Labels as extracted: NOM Generator; NOM2EOM Converter; Synthesis; Physical RDA Generator; AIM Lib; ADM Lib; Concorde Macros; Optimisations; Verilog / VHDL Design; De-Compiler; Intermediate Design in Verilog / VHDL / XML; IDM Lib; Library Maker; Arch-I Adaptor; Bit-twiddling; Nibble align; AIM; ADM; IDM; PRO rules]
    79. 79. XPP: Applications Development <ul><li>The XPP development suite provides program development and debugging support. </li></ul><ul><li>XPP-VC can perform </li></ul><ul><ul><li>instruction level parallelism </li></ul></ul><ul><ul><li>pipelining </li></ul></ul><ul><ul><li>automatic resource management </li></ul></ul><ul><ul><li>multi-threading </li></ul></ul>
    80. 80. XDS – XPP Dev. Tool: Pact Corp
    81. 81. XPP-VC – XPP C Compiler: Pact Corp
    82. 82. Sum Up <ul><li>Discussed the issues in Architectural Choice for Embedded Algorithm Implementation </li></ul><ul><li>Reviewed a few Reconfigurable Architectures </li></ul><ul><ul><li>Data Flow Paradigm – Petri Nets </li></ul></ul><ul><ul><li>Systolic Arrays </li></ul></ul><ul><ul><li>Configuration Streaming </li></ul></ul><ul><ul><li>SIMD </li></ul></ul><ul><li>Looked at Tools Availability </li></ul>
    83. 83. Questions <ul><li>? </li></ul>
    84. 84. <ul><li>Thank You </li></ul>