Computing Without Computers - Oct08
A design methodology and language framework that contributes to a solid, scalable basis for developing next-generation silicon-based systems.


Published in: Technology


    • 1. Computing Without Computers. Ian Page: Business Development Director, Seven Spires Investments; Founder, Celoxica Ltd.; Visiting Professor, Cass Business School
    • 2. A Personal Story - Background
      • Trained as electronic engineer, but seduced by software
      • Working first in industry, then academia
      • Building hardware and software to support fast user interfaces
        • Software: silicon compiler, parallel graphics algorithms
        • Hardware: microcoded, SIMD, MIMD and ASIC processors
      • 1990, Oxford academic – ‘road to Damascus’ experience
        • Saw my first FPGA – and the future!
        • All previous threads came together - simultaneously
        • HLLs, regular architectures, algorithms in hardware, parallelism, real-time, design automation, communications, hardware o/s, program algebra, …
    • 3. A Personal Story – A pattern emerges
      • I had been trying for many years to build complex algorithms (graphics and highly interactive user interfaces) into hardware
      • I tried:
        • User micro-coding
        • Massively parallel, SIMD array processing
        • Custom designed silicon
        • MIMD networks of transputers
      • All were short-term successes but long-term failures: I hadn’t realised that what I was mostly doing was fighting Moore’s Law
      • None of the hardware platforms that I built or used stayed around long enough to be a stable platform
      • The largest investment, in the software, was written off each time Moore’s Law made yet another architecture redundant
    • 4. Moore’s Law – just a reminder
      • A reminder of what an amazing industry we are embedded in
      • A doubling of transistor count every two years
      • First published in 1965, and still driving the industry
      • It still has many more years to run
      • It is completely pervasive. Nothing escapes its influence
      • The Opportunity:
        • 4,000 transistors per circuit in 1970
        • 1 billion transistors by 2005
        • From $1 per transistor in 1968 to $1 per 50 million transistors today
      • The Problems:
        • Rock's Law - foundries double in cost each generation
          • A 300mm foundry costs $3 billion (Intel pushing for 450mm)
          • A 65nm mask set is around $3m
        • Somebody has to design these chips
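The slide's figures can be cross-checked with a little integer arithmetic. A minimal C sketch, assuming a strict doubling every two years (the function names are hypothetical, not from the talk): starting from 4,000 transistors in 1970, 18 doublings are needed to pass one billion, which lands in 2006, within one doubling period of the slide's 2005.

```c
#include <assert.h>
#include <stdint.h>

/* Number of doublings needed for a transistor count `start` to reach
   at least `target`. Precondition: start > 0. */
unsigned doublings_needed(uint64_t start, uint64_t target)
{
    unsigned n = 0;
    assert(start > 0);          /* a zero count would never grow */
    while (start < target) {
        start *= 2;             /* one Moore's Law period */
        n++;
    }
    return n;
}

/* Year in which `target` is first reached, assuming one doubling
   every `period_years` years from `start_year`. */
unsigned year_reached(uint64_t start, unsigned start_year,
                      uint64_t target, unsigned period_years)
{
    return start_year + period_years * doublings_needed(start, target);
}
```

For example, `year_reached(4000, 1970, 1000000000, 2)` returns 2006.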
    • 5. A Personal Story – What does it all mean?
      • Moore’s Law continues to push entry-ticket prices up, to force ever greater integration, and to reduce the number of different chip solutions available
      • What are tomorrow’s commodity chips?
        • FPGAs will be around for decades
        • 10⁶ LUTs available soon
      • I see FPGA fabric as the world’s first, truly stable, parallel processing substrate
      • (though the ‘grid’ may be some sort of competition)
      • 1990: believing that FPGAs change the nature of the game, I made an act of faith: “One day, most hardware designs will be done through programming languages and FPGAs”
      • And the research question was:
      • “What do we have to do to make it come true?”
    • 6. The Design Problem – statistics of failure
      • 18% of all projects are cancelled within 5 months*
      • 58% are late to market*
      • 20% of products are not within 50% of specification*
      • 15% of deep sub-micron designs require up to four re-spins
      • Of the products that do get to market:
        • On time and 50% over budget earn only 4% less profit over 5 years †
        • 6 months late and on budget earn 33% less profit over 5 years †
      • Every 4 weeks delay in product launch equals 14% loss in market share‡
      * Source: Current and Emerging Embedded Markets and Opportunities; † Source: McKinsey & Co.; ‡ Source: John Chambers, CEO, Cisco
    • 7. The Design Problem – The Design Gap
      • Moore’s Law: chip complexity grows at over 40% CAGR (Compound Annual Growth Rate)
      • Designer productivity has historically grown at 21% CAGR*
      • The difference is the Design Gap
      • It is the gap between what you can design (with fixed resources) and what you must design (to stay in business)
      • The Design Gap increases by around 20% CAGR
      * Source: Gartner Group
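Because both growth rates compound, the Design Gap is naturally expressed as a ratio of required to achievable design capacity. A minimal C sketch, assuming exactly 40% and 21% CAGR (the function names are hypothetical): the ratio then grows by 1.40 / 1.21 − 1 ≈ 15.7% a year and roughly doubles in five years. The simple difference of the two rates is 19 percentage points, and complexity growth above 40% (the slide says "over 40%") pushes the compounded figure up towards the "around 20%" quoted.

```c
/* Annual growth of the Design Gap, modelled as the ratio between the
   design capacity required (chip complexity) and the capacity a fixed
   team can deliver (designer productivity). Rates are fractions,
   e.g. 0.40 for 40% CAGR. */
double annual_gap_growth(double complexity_cagr, double productivity_cagr)
{
    return (1.0 + complexity_cagr) / (1.0 + productivity_cagr) - 1.0;
}

/* Ratio of required to achievable design capacity after `years` years,
   starting from a ratio of 1 (no gap). */
double gap_ratio(double complexity_cagr, double productivity_cagr, int years)
{
    double ratio = 1.0;
    for (int i = 0; i < years; i++)
        ratio *= (1.0 + complexity_cagr) / (1.0 + productivity_cagr);
    return ratio;
}
```

With these assumptions, `gap_ratio(0.40, 0.21, 5)` comes out at about 2.07.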
    • 8. The Design Problem – Complexity
      • Rapidly increasing complexity is the root of the problem
      • The only practical way to handle complexity is to raise the level of design abstraction
      • We are guided by previous shifts in hardware and software design methodology that raised the level of abstraction:
        • from schematics to HDLs
        • from assembler code to HLLs
    • 9. Handel-C solution: treat hardware like software
      • Exploit the massive leverage created by the software industry
      • A rapid and simple flow from program to implementation
        • Compile/P&R, run, edit – in minutes, just like with software
      • Hardware and software development use same methodology
      • Hardware development in less time with a smaller team
      • Enables hardware development by system architects and software engineers as well as hardware engineers; these skills all converge
      • This might be the only design option for really complex designs
    • 10. Choosing a Programming Language
      • Hardware implementations need to use both time and space (= parallelism) efficiently
      • Q: Why not compile ordinary C++/C programs into hardware?
      • A: Nobody knows how to write a compiler that efficiently and reliably invents the parallelism that the designer didn’t specify
      • Conclusion: We require a language that allows (forces) the designer explicitly to denote the parallelism required in the computation
      • Q: Why not use a language such as occam, Java, …?
      • A: Nobody knows how to write a compiler that efficiently and reliably invents the timing specifications that the designer didn’t specify
      • Conclusion: We require a language that allows (forces) the designer explicitly to denote the time that computations take
      • These might appear to be two backwards steps – but NO!
    • 11. The Handel Solution
      • No existing language met the basic requirements, so the Handel model of programming was created
      • Handel-C is the embedding of the Handel model in C language
      • Handel-C is a language for programming applications
        • Handel-C is not an HDL. Nor is it C used as an HDL
        • Handel-C is meaningful to both s/w and h/w engineers
        • Handel-C is exceptionally easy to learn and use
      • The par command gives control over space
      • The single clock assignment rule gives control over time
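The two rules on this slide are enough to predict how many clock cycles a fragment of Handel-C takes. The C sketch below is a hypothetical, simplified timing model, not the Handel-C compiler's: it assumes each assignment takes one cycle, a sequence takes the sum of its parts, and a par block ends when its slowest branch ends. Loops, conditions and channel communication are not modelled, and the type and function names are illustrative only.

```c
#include <stddef.h>

/* Hypothetical statement tree: an assignment, a sequential block,
   or a par block with child statements. */
typedef enum { STMT_ASSIGN, STMT_SEQ, STMT_PAR } StmtKind;

typedef struct Stmt {
    StmtKind kind;
    const struct Stmt **children;  /* sub-statements for SEQ and PAR */
    size_t count;                  /* number of children */
} Stmt;

/* Clock cycles taken by `s` under the simplified model. */
unsigned cycles(const Stmt *s)
{
    unsigned total = 0;
    switch (s->kind) {
    case STMT_ASSIGN:
        return 1;                                  /* single clock assignment rule */
    case STMT_SEQ:
        for (size_t i = 0; i < s->count; i++)
            total += cycles(s->children[i]);       /* one after another */
        return total;
    case STMT_PAR:
        for (size_t i = 0; i < s->count; i++) {
            unsigned c = cycles(s->children[i]);   /* branches run side by side */
            if (c > total)
                total = c;                         /* finish with the slowest branch */
        }
        return total;
    }
    return 0;
}
```

Under this model, `par { x = 1; y = 2; }` costs one cycle, the same two assignments written in sequence cost two, and a par whose branches take two cycles and one cycle costs two.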
    • 12. Handel-C in brief
      • Handel-C is based on ANSI-C
      • It has well-defined semantics
      • Similar to occam in spirit, but adding timing and replacing pseudo-parallelism with true parallelism
      • Other additions:
        • channels for communications between parallel processes
        • flexible bit-widths and better logical operators
        • constructs for RAM, ROM, interfacing, etc.
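Flexible bit-widths let each Handel-C variable be exactly as wide as the data it holds, so a 5-bit counter costs five flip-flops rather than the 32 bits of a typical C int. In plain C the closest equivalent is masking. A minimal sketch (the function name is hypothetical) of the wrap-around behaviour that such a narrow unsigned variable exhibits:

```c
#include <assert.h>
#include <stdint.h>

/* Emulates storing `value` in an unsigned variable `width` bits wide
   (1 to 32): only the low `width` bits survive, so arithmetic wraps
   modulo 2^width, as it would in a hardware register of that width. */
uint32_t wrap_unsigned(uint64_t value, unsigned width)
{
    assert(width >= 1 && width <= 32);
    uint64_t mask = (UINT64_C(1) << width) - 1;   /* e.g. width 5 -> 0x1F */
    return (uint32_t)(value & mask);
}
```

For example, incrementing a 5-bit value of 31 wraps to 0: `wrap_unsigned(31 + 1, 5)` returns 0.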
    • 13. Handel-C Example: A Windowed Display System
      • par {
      •     sync_generator(sx, sy);                  // process 1
      •     while (1)                                // process 2
      •         if (inside(window1, sx, sy))
      •             video = contents(window1, sx, sy);
      •         else if (inside(window2, sx, sy))
      •             video = contents(window2, sx, sy);
      •         else video = background_colour;
      •     while (1) … mouse; update window1, 2 …   // process 3
      • }
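Process 2 is a priority selector: window1 is drawn over window2, which is drawn over the background. A minimal C sketch of that selection for a single scan position, assuming rectangular windows filled with one colour; the `Window` type and the simplification of `contents` to a fill colour are hypothetical. In the Handel-C version the choice is re-evaluated on every iteration of the while (1) loop, running in parallel with the sync generator and the mouse process.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical window: a rectangle filled with a single colour. */
typedef struct {
    unsigned x, y, width, height;
    uint32_t colour;
} Window;

/* True if scan position (sx, sy) falls inside window w. */
static bool inside(const Window *w, unsigned sx, unsigned sy)
{
    return sx >= w->x && sx < w->x + w->width &&
           sy >= w->y && sy < w->y + w->height;
}

/* One evaluation of process 2: the first window containing the
   scan position wins; otherwise the background colour is shown. */
uint32_t pixel_colour(const Window *window1, const Window *window2,
                      uint32_t background_colour, unsigned sx, unsigned sy)
{
    if (inside(window1, sx, sy))
        return window1->colour;
    else if (inside(window2, sx, sy))
        return window2->colour;
    else
        return background_colour;
}
```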
    • 14. Our first FPGA Platform – HARP, 1991
      • FPGA + SRAM
      • Transputer + DRAM
      • Four fast serial links for expansion
      • Physically stackable (TRAM) module for arbitrary expansion
      • I confidently predicted that Xilinx and Altera would be building things like this as single chips by 1995!
    • 15. [Figure: SW and HW]
    • 16. Company ‘E’ : Redesign of a Failing Project
      • A team of 2 software engineers developed the core component of an IPv6 router in 2 man-months using Handel-C
      • Team of 3 hardware engineers failed to produce the design using VHDL in over 36 man-months
      • Handel-C design: 33 MHz, 15% of a V1000 FPGA, 20 pages of code
      • VHDL design: not completed, >100% of a V1000 FPGA, >400 pages of code
      [Chart: IPv6 Router Code, against actual months (0 to 15)]
    • 17. Company ‘L’ : Algorithm Acceleration Trial
      • A team of 2 software engineers (with no previous HW experience) transferred an algorithm from a CPU to an FPGA
      • Run-time was 21 seconds on a 600MHz Pentium III
      • 23 times performance improvement after 42 man-days
      [Chart: Signal Processing Algorithm, algorithm run-time (seconds) against man-days; labels include 600 MHz CPU and Company Training Session]
    • 18. Customer ‘C’ : Internal Design Competition
      • Competition to design an MP3 encoder between:
        • Traditional hardware design team using HDL-based approach and
        • Small group of software designers using Celoxica technology
      • Handel-C group
        • Converted existing software implementation of MP3 encoder to Handel-C
        • Optimized, working hardware that beat design specifications in 7 weeks (including training time)
      In the same time, the hardware group had not completed writing the specification!
    • 19. Xilinx Design Challenge
      • A Xilinx-specified “Design Challenge”
      • To implement JPEG2000 using conventional HDL and Handel-C approaches
      • Comparison made between Handel-C and HDL approach
      • See the article in Xcell Volume 46
      • Online at
    • 20. JPEG2000 Architecture and Communication Model
      [Diagram: Original Image → Pre-processing (RGB to YUV conversion) → DWT (Wavelet Transform) → Quantisation → Tier-1 Encoder → Tier-2 Encoder → Coded Image, plus a Rate Control block; stages are split between hardware models and software models]
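The slide does not say which colour transform the pre-processing stage uses; JPEG 2000 Part 1 defines an irreversible colour transform (ICT) and a reversible one (RCT). A minimal C sketch, assuming the RCT: it needs only integer adds, subtracts and a floor division by 4 (a shift in hardware), which is why it maps cheaply onto FPGA logic, and it is exactly invertible. Type and function names are hypothetical.

```c
/* Hypothetical pixel types for the RGB and YUV sides of the transform. */
typedef struct { int r, g, b; } Rgb;
typedef struct { int y, u, v; } Yuv;

/* floor(x / 4): C's '/' truncates toward zero, so round negatives down. */
static int floor_div4(int x)
{
    return (x >= 0) ? x / 4 : -((-x + 3) / 4);
}

/* Forward RCT: Y = floor((R + 2G + B) / 4), U = B - G, V = R - G. */
Yuv rct_forward(Rgb p)
{
    Yuv q;
    q.y = floor_div4(p.r + 2 * p.g + p.b);
    q.u = p.b - p.g;
    q.v = p.r - p.g;
    return q;
}

/* Inverse RCT: exact, because Y = G + floor((U + V) / 4). */
Rgb rct_inverse(Yuv q)
{
    Rgb p;
    p.g = q.y - floor_div4(q.u + q.v);
    p.r = q.v + p.g;
    p.b = q.u + p.g;
    return p;
}
```

For example, the pixel (R, G, B) = (255, 0, 128) maps to (Y, U, V) = (95, 128, 255), and `rct_inverse` recovers it exactly.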
    • 21. JPEG2000 Project Overview
      • Xilinx project benchmark to validate FPGA system tools
        • Start with a C description of the JPEG2000 algorithm
        • Use the Software-Compiled System Design methodology
        • Partition and implement the JPEG2000 design
        • Compare results against the original VHDL design performance
      [Top-level block diagram for JPEG2000 operation: Original Image → Pre-processing (RGB to YUV conversion) → Wavelet Transform → Quantisation → Tier-1 Encoder → Tier-2 Encoder → Coded Image, plus a Rate Control block]
    • 22. JPEG2000 Case Study results
      • Results (HC passes built with DK Design Suite):

          Metric              | HC 1st pass | HC 2nd pass | HC final | HDL
          Slices              | 646         | 546         | 758 †    | 800
          Device utilization  | 6%          | 5%          | 7%       | 7%
          Speed (MHz) *       | 110         | 130         | 151      | 128
          Lines of code       | 386         | 395         | 395      | 435
          Design time (days)  | 6           | 7 (6+1)     | 7 (6+1)  | 20 ‡

      * Lena image used as test bench throughout; input bit width = 12; maximum image width 1K
      † Includes IP block insertion
      ‡ Does not include partitioning specification development
      • Rapid Handel-C (HC) implementation by an engineer with no prior knowledge of JPEG2000. Primary design focus was area efficiency.
      • The common language base made it easy to port the DWT source to hardware, and DSM allowed partitioning, co-verification and easy movement of data between HW and SW
      • Optimizations included using signals instead of registers, maximum use of dual-ported memory, and reduced routing logic through syntax duplication in Handel-C. Place-and-route tools were configured to optimize the implementation for area efficiency
      • The final implementation integrated an existing HDL IP block into the design flow for maximum design re-use value (black boxing)
      • Observations
        • Comparable
        • HC faster
        • HC quicker
        • Expert vs Novice
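The DWT stage that was ported to hardware is the core of JPEG 2000. The slides do not name the filter; JPEG 2000 Part 1 specifies a reversible 5/3 wavelet and an irreversible 9/7 wavelet. A minimal C sketch of one level of the 5/3 transform in lifting form, assuming an even-length signal and symmetric extension at the edges (function names are hypothetical): each step uses only adds and floor divisions by 2 or 4, i.e. shifts in hardware, and the inverse undoes the steps in reverse order, so reconstruction is exact.

```c
#include <assert.h>
#include <stddef.h>

/* floor(a / b) for b > 0; C's '/' truncates toward zero. */
static int floor_div(int a, int b)
{
    int q = a / b;
    if (a % b != 0 && a < 0)
        q--;
    return q;
}

/* Forward 5/3 lifting on x[0..n-1] (n even, n >= 2).
   low[] and high[] each receive n/2 coefficients. */
void dwt53_forward(const int *x, size_t n, int *low, int *high)
{
    assert(n >= 2 && n % 2 == 0);
    size_t half = n / 2;
    /* Predict: odd sample minus the mean of its even neighbours. */
    for (size_t i = 0; i < half; i++) {
        int right = (2 * i + 2 < n) ? x[2 * i + 2] : x[n - 2];  /* mirror x[n] -> x[n-2] */
        high[i] = x[2 * i + 1] - floor_div(x[2 * i] + right, 2);
    }
    /* Update: even sample plus a rounded quarter of neighbouring details. */
    for (size_t i = 0; i < half; i++) {
        int left = (i > 0) ? high[i - 1] : high[0];              /* mirror d[-1] -> d[0] */
        low[i] = x[2 * i] + floor_div(left + high[i] + 2, 4);
    }
}

/* Inverse 5/3 lifting: undo update, then undo predict. */
void dwt53_inverse(const int *low, const int *high, size_t n, int *x)
{
    assert(n >= 2 && n % 2 == 0);
    size_t half = n / 2;
    for (size_t i = 0; i < half; i++) {
        int left = (i > 0) ? high[i - 1] : high[0];
        x[2 * i] = low[i] - floor_div(left + high[i] + 2, 4);
    }
    for (size_t i = 0; i < half; i++) {
        int right = (2 * i + 2 < n) ? x[2 * i + 2] : x[n - 2];
        x[2 * i + 1] = high[i] + floor_div(x[2 * i] + right, 2);
    }
}
```

On a linear ramp (0, 2, 4, 6, 8, 10, 12, 14) the interior high-pass coefficients are zero, which is what makes the transform useful for compression.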
    • 23. Does it work? - Demonstrations
      • RC100 Board:
        • Single Xilinx XC2S200 FPGA
        • 28 x 42 = 1176 CLBs (2352 LUTs)
        • Flash memory with stored configurations
        • PLD to reload the FPGA from the flash memory
        • Digital/Analogue converter to create video signal
      • All demos fit in 1200 CLBs – some in under 500
      • A few of them use external memory
      • No computer. No software. No operating system
      • Cheapest FPGAs: over 340 LUTs/$ (Oct08, one-off price)
    • 24. Agility Design Solutions – Shortening the time to develop and deploy complex image processing systems
      • Solutions for Algorithm Design
        • Algorithm acceleration
        • Rapid Prototyping
        • SW & FPGA Implementation
      • Technologies for Algorithm to Implementation
        • MATLAB to C
        • C to FPGA
        • System Prototyping Boards
        • IP Libraries
        • Implementation Services
      • Over 100 customers worldwide
    • 25. Proven Customer Success
      • Lockheed: Hubble Telescope
      • Canon: PowerShot Digital Camera
      • Toyota: Prius Hybrid
      • Aeroastro: Vision Recognition
      • Harris: Satellite Communications
      • Raytheon: Airborne Systems & NLOS
    • 26. Thank You. Computing Without Computers. Ian Page: Business Development Director, Seven Spires Investments; Founder, Celoxica Ltd.; Visiting Professor, Cass Business School