Computing Without Computers - Oct08

A design methodology and language framework that together provide a solid, scalable basis for developing next-generation silicon-based systems.


    1. 1. Computing Without Computers Ian Page Business Development Director, Seven Spires Investments Founder, Celoxica Ltd. Visiting Professor, Cass Business School
    2. 2. A Personal Story - Background <ul><li>Trained as an electronic engineer, but seduced by software </li></ul><ul><li>Working first in industry, then academia </li></ul><ul><li>Building hardware and software to support fast user interfaces </li></ul><ul><ul><li>Software: silicon compiler, parallel graphics algorithms </li></ul></ul><ul><ul><li>Hardware: microcoded, SIMD, MIMD and ASIC processors </li></ul></ul><ul><li>1990, Oxford academic – ‘road to Damascus’ experience </li></ul><ul><ul><li>Saw my first FPGA – and the future! </li></ul></ul><ul><ul><li>All previous threads came together - simultaneously </li></ul></ul><ul><ul><li>HLLs, regular architectures, algorithms in hardware, parallelism, real-time, design automation, communications, hardware o/s, program algebra, … </li></ul></ul>
    3. 3. A Personal Story – A pattern emerges <ul><li>I had been trying for many years to build complex algorithms (graphics and highly interactive user interfaces) into hardware </li></ul><ul><li>I tried: </li></ul><ul><ul><li>User micro-coding </li></ul></ul><ul><ul><li>Massively parallel, SIMD array processing </li></ul></ul><ul><ul><li>Custom designed silicon </li></ul></ul><ul><ul><li>MIMD networks of transputers </li></ul></ul><ul><li>All were short-term successes, but long-term failures - I hadn’t realised that what I was mostly doing was fighting Moore’s Law </li></ul><ul><li>None of these hardware platforms that I built or used stayed around long enough to be a stable platform </li></ul><ul><li>The largest investment - in the software – was written off each time Moore’s Law made yet another architecture redundant </li></ul>
    4. 4. Moore’s Law – just a reminder <ul><li>A reminder of what an amazing industry we are embedded in </li></ul><ul><li>A doubling of transistor count every two years </li></ul><ul><li>First published 1965 and it's still driving the industry </li></ul><ul><li>It still has many more years to run </li></ul><ul><li>It is completely pervasive. Nothing escapes its influence </li></ul><ul><li>The Opportunity: </li></ul><ul><ul><li>4,000 transistors per circuit in 1970 </li></ul></ul><ul><ul><li>1 billion transistors by 2005 </li></ul></ul><ul><ul><li>$1/transistor in 1968 to $1/50 million transistors today </li></ul></ul><ul><li>The Problems: </li></ul><ul><ul><li>Rock's Law - foundries double in cost each generation </li></ul></ul><ul><ul><ul><li>A 300mm foundry costs $3 Billion (Intel pushing for 450mm) </li></ul></ul></ul><ul><ul><ul><li>A 65nm mask set is around $3m </li></ul></ul></ul><ul><ul><li>Somebody has to design these chips </li></ul></ul>
    5. 5. A Personal Story – What does it all mean? <ul><li>Moore’s Law continues to force entry ticket prices up, to drive ever greater integration, and to reduce the number of different chip solutions available </li></ul><ul><li>What are tomorrow’s commodity chips? </li></ul><ul><ul><li>FPGAs will be around for decades </li></ul></ul><ul><ul><li>10<sup>6</sup> LUTs available soon </li></ul></ul><ul><li>I see FPGA fabric as the world’s first, truly stable, parallel processing substrate </li></ul><ul><li>(though the ‘grid’ may be some sort of competition) </li></ul><ul><li>1990 – believing that FPGAs change the nature of the game, an act of faith: “One day, most hardware designs will be done through programming languages and FPGAs” </li></ul><ul><li>And the research question was: </li></ul><ul><li>“What do we have to do to make it come true?” </li></ul>
    6. 6. The Design Problem – statistics of failure <ul><li>18% of all projects are cancelled within 5 months* </li></ul><ul><li>58% are late to market* </li></ul><ul><li>20% of products are not within 50% of specification* </li></ul><ul><li>15% of deep sub-micron designs require up to four re-spins </li></ul><ul><li>Of the products that do get to market: </li></ul><ul><ul><li>On time and 50% over budget earn only 4% less profit over 5 years † </li></ul></ul><ul><ul><li>6 months late and on budget earn 33% less profit over 5 years † </li></ul></ul><ul><li>Every 4 weeks delay in product launch equals 14% loss in market share‡ </li></ul>* Source : Current and Emerging Embedded Markets and Opportunities † Source: McKinsey & Co. ‡ Source: John Chambers, CEO Cisco
    7. 7. The Design Problem – The Design Gap <ul><li>Moore’s Law : Chip complexity grows at over 40% CAGR (Compound Annual Growth Rate). </li></ul><ul><li>Designer productivity has historically grown at 21% CAGR* </li></ul><ul><li>The difference is the Design Gap </li></ul><ul><li>It is the gap between what you can design (with fixed resources) </li></ul><ul><li>and what you must design (to stay in business) </li></ul><ul><li>The Design Gap increases by around 20% CAGR </li></ul>* Source: Gartner Group
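The Design Gap arithmetic can be checked with a few lines of C. This is only a sketch: the function names are made up for illustration, and it assumes the gap is read as the ratio of required to achievable complexity. On that reading, 40% against 21% compounds to a factor of about 1.157 per year, so the gap roughly doubles in five years; the slide's "around 20%" matches the simple difference between the two rates (40 - 21 = 19 points).

```c
#include <assert.h>

/* Compound growth factor after `years` at a rate given as a fraction
   (0.40 means 40% per year). Hypothetical helper, not from the slides. */
double growth_factor(double cagr, int years)
{
    assert(years >= 0);
    double factor = 1.0;
    for (int i = 0; i < years; i++)
        factor *= 1.0 + cagr;
    return factor;
}

/* Design gap expressed as a ratio: the complexity you must design divided
   by the complexity a fixed team can design. 1.0 means no gap. */
double design_gap_ratio(double complexity_cagr, double productivity_cagr,
                        int years)
{
    return growth_factor(complexity_cagr, years)
         / growth_factor(productivity_cagr, years);
}
```

For example, `design_gap_ratio(0.40, 0.21, 5)` gives the factor by which the gap has widened after five years under these assumptions.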
    8. 8. The Design Problem - Complexity <ul><li>Rapidly increasing complexity is the root of the problem </li></ul><ul><li>The only practical way to handle complexity is to raise the level of design abstraction </li></ul><ul><li>We are guided by previous shifts in design methodology which raised the level of abstraction: </li></ul><ul><ul><li>from schematics to HDLs </li></ul></ul><ul><ul><li>from assembler code to HLLs </li></ul></ul>
    9. 9. Handel-C solution: treat hardware like software <ul><li>Exploit the massive leverage created by the software industry </li></ul><ul><li>A rapid and simple flow from program to implementation </li></ul><ul><ul><li>Compile/P&R, run, edit – in minutes, just like with software </li></ul></ul><ul><li>Hardware and software development use same methodology </li></ul><ul><li>Hardware development in less time with a smaller team </li></ul><ul><li>Enables hardware development by system architects and software engineers as well as hardware engineers; these skills all converge </li></ul><ul><li>This might be the only design option for really complex designs </li></ul>
    10. 10. Choosing a Programming Language <ul><li>Hardware implementations need efficiently to use both time and space (= parallelism) </li></ul><ul><li>Q: Why not compile ordinary C++/C programs into hardware? </li></ul><ul><li>A: Nobody knows how to write a compiler that efficiently and reliably invents the parallelism that the designer didn’t specify </li></ul><ul><li>Conclusion: We require a language that allows (forces) the designer explicitly to denote the parallelism required in the computation </li></ul><ul><li>Q: Why not use a language such as occam, Java, …? </li></ul><ul><li>A: Nobody knows how to write a compiler that efficiently and reliably invents the timing specifications that the designer didn’t specify </li></ul><ul><li>Conclusion: We require a language that allows (forces) the designer explicitly to denote the time that computations take </li></ul><ul><li>These might appear to be two backwards steps – but NO! </li></ul>
    11. 11. The Handel Solution <ul><li>No existing language met the basic requirements, so the Handel model of programming was created </li></ul><ul><li>Handel-C is the embedding of the Handel model in C language </li></ul><ul><li>Handel-C is a language for programming applications </li></ul><ul><ul><li>Handel-C is not an HDL. Nor is it C used as an HDL </li></ul></ul><ul><ul><li>Handel-C is meaningful to both s/w and h/w engineers </li></ul></ul><ul><ul><li>Handel-C is exceptionally easy to learn and use </li></ul></ul><ul><li>The par command gives control over space </li></ul><ul><li>The single clock assignment rule gives control over time </li></ul>
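A rough way to see how `par` and the single clock assignment rule trade space against time is to count clock cycles for a small statement tree. The sketch below is a simplified model written in plain C, not Handel-C and not Celoxica's tooling: it assumes every assignment costs exactly one clock cycle, a sequential block costs the sum of its parts, and a `par` block costs as much as its slowest branch. Channels, delays and loops are ignored, and the `Stmt` type and `cycles` function are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { ASSIGN, SEQ, PAR } Kind;

/* A statement is a single assignment, or a sequential or parallel
   block of child statements. */
typedef struct Stmt {
    Kind kind;
    const struct Stmt *const *children; /* used by SEQ and PAR only */
    size_t n_children;
} Stmt;

/* Clock cycles taken by a statement under the simplified model:
   ASSIGN = 1, SEQ = sum of children, PAR = maximum of children. */
unsigned cycles(const Stmt *s)
{
    assert(s != NULL);
    if (s->kind == ASSIGN)
        return 1;

    unsigned total = 0;
    for (size_t i = 0; i < s->n_children; i++) {
        unsigned c = cycles(s->children[i]);
        if (s->kind == SEQ)
            total += c;          /* one after another: time adds up */
        else if (c > total)
            total = c;           /* side by side: slowest branch wins */
    }
    return total;
}
```

Under this model, three assignments in sequence take three cycles, while the same three inside a `par` block take one cycle and three times the hardware, which is the sense in which the designer controls both time and space.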
    12. 12. Handel-C in brief <ul><li>Handel-C is based on ANSI-C </li></ul><ul><li>It has well-defined semantics </li></ul><ul><li>Similar to occam in spirit, but adding timing and replacing pseudo-parallelism with true parallelism </li></ul><ul><li>Other additions: </li></ul><ul><ul><li>channels for communications between parallel processes </li></ul></ul><ul><ul><li>flexible bit-widths and better logical operators </li></ul></ul><ul><ul><li>constructs for RAM, ROM, interfacing, etc. </li></ul></ul>
    13. 13. Handel-C Example A Windowed Display System <ul><li>par { </li></ul><ul><li>sync_generator (sx, sy); // process 1 </li></ul><ul><li>while (1) // process 2 </li></ul><ul><li>if (inside (window1, sx, sy)) </li></ul><ul><li>video = contents (window1, sx, sy); </li></ul><ul><li>else if (inside (window2, sx, sy)) </li></ul><ul><li>video = contents (window2, sx, sy); </li></ul><ul><li>else video = background_colour; </li></ul><ul><li>while (1) … mouse; update window1, 2 … // process 3 </li></ul><ul><li>} </li></ul>
    14. 14. Our first FPGA Platform – HARP, 1991 <ul><li>FPGA + SRAM </li></ul><ul><li>Transputer + DRAM </li></ul><ul><li>Four fast serial links for expansion </li></ul><ul><li>Physically stackable (TRAM) module for arbitrary expansion </li></ul><ul><li>I confidently predicted that Xilinx and Altera would be building things like this as single chips by 1995! </li></ul>
    15. 15. SW HW
    16. 16. Company ‘E’ : Redesign of a Failing Project <ul><li>A team of 2 software engineers developed the core component of an IPv6 router in 2 man-months using Handel-C </li></ul><ul><li>A team of 3 hardware engineers failed to produce the design using VHDL in over 36 man-months </li></ul><ul><li>Handel-C design: 33 MHz, 15% of a V1000 FPGA, 20 pages of code </li></ul><ul><li>VHDL design: not completed, >100% of a V1000 FPGA, >400 pages of code </li></ul>[Chart: IPv6 router code, actual months, axis 0 to 15]
    17. 17. Company ‘L’ : Algorithm Acceleration Trial <ul><li>A team of 2 software engineers (with no previous HW experience) transferred an algorithm from a CPU to an FPGA </li></ul><ul><li>Run-time was 21 seconds on a 600MHz Pentium III </li></ul><ul><li>23 times performance improvement after 42 man-days </li></ul>[Chart: signal processing algorithm run-time (seconds) against man-days of effort; labelled values are > 700 s, 28 s, 16 s and 0.9 s, with the 600 MHz CPU run-time and the company training session marked]
    18. 18. Customer ‘C’ : Internal Design Competition <ul><li>Competition to design MP3 encoder between: </li></ul><ul><ul><li>Traditional hardware design team using HDL-based approach and </li></ul></ul><ul><ul><li>Small group of software designers using Celoxica technology </li></ul></ul><ul><li>Handel-C group </li></ul><ul><ul><li>Converted existing software implementation of MP3 encoder to Handel-C </li></ul></ul><ul><ul><li>Produced optimized, working hardware that beat design specifications in 7 weeks (including training time) </li></ul></ul>In the same time, the hardware group had not completed writing the specification!
    19. 19. Xilinx Design Challenge <ul><li>A Xilinx-specified “Design Challenge” </li></ul><ul><li>To implement JPEG2000 using conventional HDL and Handel-C approaches </li></ul><ul><li>Comparison made between Handel-C and HDL approach </li></ul><ul><li>See Article in Xcell Volume 46 </li></ul><ul><li>Online at www.xilinx.com/publications/xcellonline/xcell_46/xc_celoxica46.htm </li></ul>
    20. 20. JPEG2000 Architecture and Communication Model [Diagram: Original Image, Pre processing, RGB to YUV conversion, DWT (Wavelet Transform), Quantisation, Tier-1 Encoder, Tier-2 Encoder, Rate Control, Coded Image; blocks are marked as hardware models or software models]
    21. 21. JPEG2000 Project overview <ul><li>Xilinx project benchmark to validate FPGA system tools </li></ul><ul><ul><li>Start with C description of JPEG2000 algorithm </li></ul></ul><ul><ul><li>Use Software-Compiled System Design methodology </li></ul></ul><ul><ul><li>Partition and Implement JPEG2000 Design </li></ul></ul><ul><ul><li>Compare results against original VHDL design performance </li></ul></ul>[Top level block diagram for JPEG2000 operation: Original Image, Pre processing, RGB to YUV conversion, Wavelet Transform, Quantisation, Tier-1 Encoder, Tier-2 Encoder, Rate Control, Coded Image]
    22. 22. JPEG2000 Case Study results <table><tr><th>Metric</th><th>HDL</th><th>DK Design Suite 1st pass</th><th>DK Design Suite 2nd pass</th><th>DK Design Suite final</th></tr><tr><td>Slices</td><td>800</td><td>646</td><td>546</td><td>758 (includes IP block insertion)</td></tr><tr><td>Device utilization</td><td>7%</td><td>6%</td><td>5%</td><td>7%</td></tr><tr><td>Speed (MHz)</td><td>128</td><td>110</td><td>130</td><td>151</td></tr><tr><td>Lines of code</td><td>435</td><td>386</td><td>395</td><td>395</td></tr><tr><td>Design time (days)</td><td>20 (doesn’t include partitioning spec. development)</td><td>6</td><td>7 (6+1)</td><td>7 (6+1)</td></tr></table>Speed figures: Lena image used as test-bench throughout, input bit width = 12, max 1K image width. <ul><li>Observations </li></ul><ul><ul><li>Comparable </li></ul></ul><ul><ul><li>HC faster </li></ul></ul><ul><ul><li>HC quicker </li></ul></ul><ul><ul><li>Expert vs Novice </li></ul></ul><ul><li>Rapid Handel-C (HC) implementation by an engineer with no prior knowledge of JPEG2000. Primary design focus was area efficiency. </li></ul><ul><li>A common language base made porting the DWT source to hardware easy, and DSM allowed partitioning, co-verification and easy movement of data between HW and SW </li></ul><ul><li>Optimizations included using signals instead of registers, maximum use of dual ported memory and reduction in routing logic by syntax duplication in Handel-C. Place & Route tools were configured to optimize the implementation for area efficiency </li></ul><ul><li>Final implementation integrated an existing HDL IP block into the design flow for maximum design re-use value (black boxing) </li></ul>
    23. 23. Does it work? - Demonstrations <ul><li>RC100 Board: </li></ul><ul><ul><li>Single Xilinx XC2S200 FPGA </li></ul></ul><ul><ul><li>28 x 42 = 1176 CLBs (2352 slices, 4704 LUTs) </li></ul></ul><ul><ul><li>Flash memory with stored configurations </li></ul></ul><ul><ul><li>PLD to reload the FPGA from the flash memory </li></ul></ul><ul><ul><li>Digital/Analogue converter to create video signal </li></ul></ul><ul><li>All demos fit in 1200 CLBs – some in under 500 </li></ul><ul><li>A few of them use external memory </li></ul><ul><li>No computer. No software. No operating system </li></ul><ul><li>Cheapest FPGAs: over 340 LUTs/$ (Oct08, one-off price) </li></ul>
    24. 24. Agility Design Solutions: shortening the time to develop and deploy complex image processing systems <ul><li>Solutions for Algorithm Design </li></ul><ul><ul><li>Algorithm acceleration </li></ul></ul><ul><ul><li>Rapid Prototyping </li></ul></ul><ul><ul><li>SW & FPGA Implementation </li></ul></ul><ul><li>Technologies for Algorithm to Implementation </li></ul><ul><ul><li>MATLAB to C </li></ul></ul><ul><ul><li>C to FPGA </li></ul></ul><ul><ul><li>System Prototyping Boards </li></ul></ul><ul><ul><li>IP Libraries </li></ul></ul><ul><ul><li>Implementation Services </li></ul></ul><ul><li>Over 100 customers worldwide </li></ul>
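    25. 25. Proven Customer Success <ul><li>Lockheed: Hubble Telescope </li></ul><ul><li>Canon: PowerShot Digital Camera </li></ul><ul><li>Toyota: Prius Hybrid </li></ul><ul><li>Aeroastro: Vision Recognition </li></ul><ul><li>Harris: Satellite Communications </li></ul><ul><li>Raytheon: Airborne Systems & NLOS </li></ul>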
    26. 26. Thank You Computing Without Computers Ian Page Business Development Director, Seven Spires Investments Founder, Celoxica Ltd. Visiting Professor, Cass Business School
