Low Power Processors at CSEM
Christian Piguet
CSEM, Neuchâtel, Switzerland
History of the quartz electronic watch
• "The watch, almost more than the steam engine, was the real protagonist
of the Industrial Revolution“ (Lewis Mumford, American social philosopher)
• In December 2007, we celebrated the 40th anniversary of the first
electronic watch, a Swiss quartz watch named Beta, developed by the
Centre Electronique Horloger (CEH)
• The first quartz watch was a Swiss wristwatch presented in 1967
• It was exactly 20 years after the invention of the transistor
• In Switzerland, the competencies in low-power electronics come directly
from the watch industry
Research in 1962-67: Time Base
• Development of a quartz resonator, very risky project.
• The main problem was the miniaturization of such a resonator at 8 kHz
1967: Beta 2
On the left: electromechanical part
On the right: the printed circuit with the IC and the quartz
Beta 21 from OMEGA
First quartz watch: "OMEGA Constellation" Electroquartz, f 8192 Hz, vibrating motor 256 Hz, caliber 1301 (Beta 21)
Common project from 20 Swiss watchmakers
Analog display, 1972, width: 37 mm, steel, ref.: A-32121
Price: Euro 980.--
CEH Research Projects after 1967
• Digital adjustment of the quartz oscillator, called Beta 3 and 4, for which
some pulses were removed to achieve exactly 32’768 Hz
• Quartz oscillators, static and dynamic frequency dividers designed as
asynchronous speed-independent circuits
• New quartz, such as the ZT, as well as new displays composed of LED for
analog displays
• ROM, RAM and EEPROM memories and the first RISC-like watch
microprocessors, before the name “RISC” was introduced in 1980 by
Berkeley
Single bipolar integrated circuit, called ODC-04, feature size 6 µm,
containing about 110 components
MOS transistor in the CEH CMOS 6 technology
Copyright 2007 CSEM | Titre | Auteur | Page 6
A Swiss History: Watch Microcontrollers
• A watch circuit had 2’000 MOS transistors at the time
• In 1971, the first microprocessor (4004)
• In 1974: microprocessors for watches too? An open question!
• In 1976, a conference in Switzerland on “the arrival of microprocessors”!
• In 1978, a uP working group with CEH, Uni Ne, EPFL and watchmakers
• Goal: to find a uP architecture specific to electronic watches, mainly with very low power consumption
Gate Matrix Logic
CSEM 1985
Binary Decision Machine (BDM)
• BDD (Binary Decision Diagram)
• EPFL research of Prof. D. Mange, LSL
• Two instructions: IF and DO
• One can add: CALL and RETURN
• Karnaugh Table either in hardware or in software
• In software: BDD, executed by a BDM
• Very simple uP Architecture
[Figure: Karnaugh table of z(a, b, c), repeated four times on the slide, and the corresponding binary decision tree]
      ab   00  01  11  10
c = 0       1   1   0   1
c = 1       0   1   1   0
Decision tree: test nodes a, c, c, b, c, b; leaves 1 0 1 1 0 0 1
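A minimal sketch of how a binary decision machine could evaluate the function z(a, b, c) of the Karnaugh table using only IF and DO instructions. The program encoding (tuples, addresses, the names `PROGRAM` and `run_bdm`) is hypothetical and only illustrates the principle; the two table rows are read as c = 0 and c = 1, which is an assumption.

```python
# Hypothetical BDM program for z(a, b, c); the encoding is illustrative only.
# ("IF", var, addr_if_0, addr_if_1) tests one input bit and branches.
# ("DO", value) outputs z and ends the evaluation.
PROGRAM = {
    0: ("IF", "a", 1, 5),
    1: ("IF", "c", 2, 3),   # a = 0
    2: ("DO", 1),
    3: ("IF", "b", 4, 2),   # a = 0, c = 1: z = b
    4: ("DO", 0),
    5: ("IF", "b", 6, 7),   # a = 1
    6: ("IF", "c", 2, 4),   # a = 1, b = 0: z = not c
    7: ("IF", "c", 4, 2),   # a = 1, b = 1: z = c
}


def run_bdm(program, inputs):
    """Follow IF instructions from address 0 until a DO instruction is reached."""
    pc = 0
    while True:
        instr = program[pc]
        if instr[0] == "DO":
            return instr[1]
        _, var, addr_if_0, addr_if_1 = instr
        pc = addr_if_1 if inputs[var] else addr_if_0


# Karnaugh table from the slide: rows indexed by c, columns ab = 00, 01, 11, 10.
KARNAUGH = {0: [1, 1, 0, 1], 1: [0, 1, 1, 0]}
AB_ORDER = [(0, 0), (0, 1), (1, 1), (1, 0)]


def z_from_table(a, b, c):
    """Look up z directly in the Karnaugh table (the 'hardware' view)."""
    return KARNAUGH[c][AB_ORDER.index((a, b))]
```

Each evaluation executes at most three IF instructions and one DO, which is why such a machine needs very few clock cycles per decision.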
Binary Decision Machine
• Instruction format: a single but very long word (similar to RISC, earlier)
• But it was already the case for the first mainframe computers of the fifties
• RISC today is simply re-discovering old technology
[Figure: BDM block diagram with ROM (very long word), PC with +1 incrementer, STACK, address MUX, TEST MUX on the INPUTS, and hardware (HW) blocks]
Number of clock cycles: benchmark
• An analysis showed that these watch processors executed about 100 clock cycles
for a watch application (each second, to increment the seconds, minutes, hours, …),
while a conventional microprocessor such as the Intel 8048 needed about 2'000 clock
cycles for the same task.
• So in energy: 70 times more efficient than an 8048 uP
• Watch uP: single instruction word, from 12 bits to 18 bits
• Instruction sets of 6 to 20 instructions (true RISC!!)
• About 20'000 MOS transistors
• In the 6-micron technology of the time, this was about 50 mm2 of silicon, a very big
circuit!
Architecture
• It was a BDM and one datapath
• First uP: failure
• Layout too difficult!
• Too complex
• Sagrada Familia
[Figure: block diagram of the first uP: sequencer with 4 PCs, ROM 512*16, RAM 30*8, ALU, 8-bit bus (BUS 7:0), instruction register IR15:0, input and output ports, clock phases Ø1, Ø3 and Ø4]
The 1st microprocessor: COMBO (1983)
• 800 instructions
• 16-bit instructions
• 7-bit data
• 20K MOS
• 40 mm2
• 6 microns
• 1.5 Volt
• 0.4 µA at 16 kHz
Comparison
Year   Micro   Techn        Nb MOS    Address
1971   4004    P-MOS 8      2’300     4K
1972   8008    P-MOS 8      3’500     16K
1974   8080    N-MOS 6      5’000     64K
1976   8085    N-MOS 4      6’000     64K
1978   8086    N-MOS 3      29’000    1M
1982   80286   N-MOS 2.3    130’000   16M
1985   80386   CMOS 2       275’000   4096M
At CSEM:
1985: 2nd uP in 4 µm with 20 instructions of 17 bits, 24K MOS, 20 mm2, 0.4 µA at 1.5 Volt.
1987: uP ETA, 35'000 MOS
1990: other uP with 100'000 MOS.
Competition
• Competitors (AMI, Eurosil, Hewlett Packard, Intel, Mitsubishi, National, RCA
and Sharp) designed watch microprocessors generally consuming more
than 4 or 5 µA, so 10 to 100 times more than CEH watch microprocessors
• Electronic digital watches appeared around 1975.
• In 1977, the price of a digital watch dropped from more than $100 to $10.
• For Christmas 1976, TI sold LED watches with 5 functions for $9.95.
• Profits disappeared, as in the calculator market, and only 3 very big
companies remained in this market: Casio, Seiko and Texas Instruments. TI
cut its prices to push Casio and Seiko out.
• Twenty years later, Intel President Gordon Moore still had an old Microma watch
made by Intel ("my 30-million-dollar watch", he said) to remember this
lesson.
• But Seiko and Casio were better able than TI to cut their prices, and it was
TI that was forced to leave this market.
Watch microprocessor PUNCH (1990-1993)
• Swiss watchmakers decided to design a common watch uP that was to be the heart of all Swiss watches
• The chosen architecture is a multitask machine
• This allows us to define several independent tasks and to execute them in pseudo-parallelism
• As soon as a task has to start, it is started immediately
[Figure: tasks of a watch application: motor control; time counters S, M, H, D and 1/100, 1/10, S, M; crown state machine; modes state machine; 1 Hz and 100 Hz time bases]
[Figure: parallel tasks in a watch application (TIME, MODES, MOTOR) dispatched by a hardware scheduler to the microprocessor as Task 1 to Task 4]
• Estimation: about 20% fewer executed instructions
• 103 assembly instructions (18 bits)
• 8-bit data
• The uP core contains 11'000 MOS
• A complete microcontroller with its memories contains about 150'000 MOS
• 800 MIPS/watt
Tasks execution
• The originality of the PUNCH lies in tasks that are executed in pseudo-parallelism, by executing one instruction of task 1, then one of task 2, and so on, then back to task 1.
• It is possible to define 1 to 4 tasks, so a conventional monotask uP is also possible.
[Figure: principle of the multitask architecture, with delayed starting of tasks in a single processor: task instructions executed continuously (1 task); instructions of 2 tasks executed alternately; same scheme as above for 3 tasks]
Architecture of the multitask PUNCH
[Figure: architecture of the multitask PUNCH: ROM (N x 18 bits), four program counters (Pc 0 to Pc 3), instruction register, process control (scheduler, stack pointers, router), EventBank, IOCommunication, ExtEvents, and a datapath with ALU, WR, four accumulators (Ac0 to Ac3) and four index registers (Ix0 to Ix3)]
One has to quadruple:
- the PC (program counter)
- the AC (accumulator)
- the IX (index register)
This results in a reasonable cost.
In monotask mode, the 4 PCs are used as a stack.
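The interleaved execution described for the PUNCH can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-instruction set (`ADD`, `INC_IX`, `HALT`), the `TaskContext` class and the `run_interleaved` function are hypothetical, not the PUNCH instruction set. What it shows is the principle: each task owns its PC, AC and IX, and the scheduler executes one instruction of each active task in turn.

```python
from dataclasses import dataclass


@dataclass
class TaskContext:
    pc: int = 0          # program counter, one per task
    ac: int = 0          # accumulator, one per task
    ix: int = 0          # index register, one per task
    active: bool = True


def run_interleaved(programs, max_rounds=100):
    """Execute one instruction of each active task per round (round robin)."""
    if not 1 <= len(programs) <= 4:
        raise ValueError("the slides describe 1 to 4 tasks")
    contexts = [TaskContext() for _ in programs]
    trace = []
    for _ in range(max_rounds):
        if not any(ctx.active for ctx in contexts):
            break
        for task_id, (ctx, program) in enumerate(zip(contexts, programs)):
            if not ctx.active:
                continue
            op, arg = program[ctx.pc]
            trace.append((task_id, op))
            if op == "HALT":
                ctx.active = False       # the task stops; others keep running
                continue
            if op == "ADD":
                ctx.ac = (ctx.ac + arg) & 0xFF   # 8-bit data
            elif op == "INC_IX":
                ctx.ix = (ctx.ix + arg) & 0xFF
            ctx.pc += 1
    return contexts, trace


task1 = [("ADD", 1), ("ADD", 2), ("HALT", 0)]
task2 = [("INC_IX", 5), ("HALT", 0)]
contexts, trace = run_interleaved([task1, task2])
```

Because no register is shared between tasks, switching from one task to the next costs no save or restore cycles; the price is the replicated PC, AC and IX.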
Test Chip
• Used in wrist watches
• Other applications
• Belongs to the watchmakers, so it is difficult to give licenses to other customers
• We think we can do even better at reducing power consumption
→ CoolRISC
Punch-based Tissot Two Timer
• It is my watch
8-bit CoolRISC Microprocessor
• RISC instructions (single 22-bit word)
• Load/store architecture
• Bank of 16 registers (not possible to implement multitask, 4*16 reg. is too much)
• More than 100 instructions
• Hardware stack, but also Branch & Link (call-return in software)
• 3-stage pipeline, CPI=1 (Clocks Per Instruction)
• Gated-Clock Technique (not to clock unused blocks)
• Synthesized by Synopsys (I.P. core in VHDL, then logic synthesis)
• CoolRISC core: 20’000 MOS
COOLRISC 816 INSTRUCTION SET
Instruction formats (22-bit words, bits 21 to 0):
- Fields op<3>, cc<3>, addr<16>
    JUMP addr;    PC0 <-- addr
    JCC addr;     if cc then PC0 <-- addr
- Fields op<6>, addr<16>
    CALL addr;    PCn <-- PCn-1, PC1 <-- PC0+1, PC0 <-- addr
    CALLS addr;   IP <-- PC0+1, PC0 <-- addr
- Fields op<9>, remaining bits set to 1
    CALL IP;      PCn <-- PCn-1, PC1 <-- PC0+1, PC0 <-- IP
    CALLS IP;     IP <-- PC0+1, PC0 <-- IP
    RET;          PCn-1 <-- PCn
    RETI;         PCn-1 <-- PCn
    PUSH;         PCn <-- PCn-1, PC1 <-- IP, PC0 <-- PC0+1
    POP;          IP <-- PC1, PCn-1 <-- PCn, PC0 <-- PC0+1
- Fields op<3>, cc<3>, remaining bits set to 1
    JUMP IP;      PC0 <-- IP
    JCC IP;       if cc then PC0 <-- IP
    RETS;         PC0 <-- IP
- Fields op<6>, alu<4>, reg<4>, data<8>
    ALU reg, #data;          reg <-- reg alu-op data
- Fields op<5>, alu<5>, reg<4>, addr<8>
    ALU reg, addr;           reg <-- reg alu-op data-mem(addr)
- Fields op<5>, alu<5>, reg2<4>, reg1<4>, regr<4>
    ALU regr, reg1, reg2;    regr <-- reg2 alu-op reg1
- Fields op<3>, IX, alu<5>, reg<4>, offset<8>
    ALU reg, (IX, offset);   three variants:
        reg <-- reg alu-op data-mem(IX+offset)
        reg <-- reg alu-op data-mem(IX), IX <-- IX + offset
        reg <-- reg alu-op data-mem(IX-offset), IX <-- IX - offset
- Fields op<5>, IX, alu<5>, reg2<4>, remaining bits set to 1
    ALU reg, (IX, R3);       reg <-- reg alu-op data-mem(IX+R3)
- Fields op<16>, div<4>
    FREQ div; HALT; NOP;

ALU operations:
    MOVE    load/store             CMOVE   conditional
    SHL     shift left             SHLC    with carry
    SHR     shift right            SHRC    with carry
    CPL     complement
    INC     increment              INCC    with carry
    DEC     decrement              DECC    with carry
    AND     logical and
    OR      logical or
    XOR     logical xor
    ADD     addition               ADDC    with carry
    SUBD    op1 - op2              SUBDC   with carry
    SUBS    op2 - op1              SUBSC   with carry
    MUL     multiply               MULA    2-compl. mult
    MSHL    multiple shift
    MSHR    multiple shift         MSHRA   2-compl.
    CMP     compare                CMPA    2-compl. cmp
    TSTB    bit test
    SETB    bit set
    CLRB    bit reset
    INVB    bit invert

Registers:
    16 bits (15..0): PC, IP (high, low), IX0 to IX3 (high, low)
    8 bits (7..0): ACC, R0, R1, R2, R3, status
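The register-transfer notation for CALL, RET, CALLS and RETS can be read as operations on a small shift register of program counters. The following Python sketch models that reading; the class name `PcStack`, the method names and the default depth are assumptions made for illustration, not part of the CoolRISC documentation.

```python
class PcStack:
    """PC0 is the current program counter; PC1..PCn hold return addresses."""

    def __init__(self, depth=4):
        # depth is an assumed number of return-address registers (PC1..PCn)
        self.pc = [0] * (depth + 1)
        self.ip = 0  # link register used by the software call/return pair

    def call(self, addr):
        # PCn <-- PCn-1, PC1 <-- PC0+1, PC0 <-- addr
        return_addr = (self.pc[0] + 1) & 0xFFFF
        self.pc[2:] = self.pc[1:-1]
        self.pc[1] = return_addr
        self.pc[0] = addr & 0xFFFF

    def ret(self):
        # PCn-1 <-- PCn: every entry moves down one place, PC1 becomes PC0
        self.pc[:-1] = self.pc[1:]

    def calls(self, addr):
        # IP <-- PC0+1, PC0 <-- addr (branch and link, no hardware push)
        self.ip = (self.pc[0] + 1) & 0xFFFF
        self.pc[0] = addr & 0xFFFF

    def rets(self):
        # PC0 <-- IP
        self.pc[0] = self.ip


stack = PcStack()
stack.pc[0] = 0x0010
stack.call(0x0100)    # the return address 0x0011 is now held in PC1
stack.ret()           # execution resumes at 0x0011
```

CALLS and RETS leave the hardware stack untouched, which is what allows call-return to be handled in software (Branch & Link) when the hardware stack depth is not sufficient.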
CoolRISC Pipeline
[Figure: CoolRISC pipeline: branch instructions are fetched and executed in 1 clock cycle (fetch & branch); arithmetic instructions are fetched in 1 clock cycle, then executed, then the result is stored]
- 3-stage pipeline
- no load delay
- no branch delay
Branch Instruction executed in one pipeline stage
[Figure: fetch & branch completes within 1 clock cycle, while the branch condition from the ALU is available]
Critical path:
- ROM precharge
- ROM read
- Branch decode
- Address multiplexer
However, at 20 MHz, the clock cycle time is 50 ns, and all of this can be executed within it.
With CPI=1, at 20 MHz, one has 20 MIPS, which is very good.
In 0.18 um, about 100 MHz, consequently 100 MIPS.
Bypass in the CoolRISC pipeline
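[Figure: bypass in the CoolRISC pipeline. A branch instruction is fetched and executed in one clock, with the condition code ready; an arithmetic instruction goes through Fetch, Dec/RAM, ALU and Write, and the ALU result is bypassed to the following instruction. RISC one-word instructions]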
CoolRISC 816
[Figure: CoolRISC 816 core block diagram.
First pipeline stage: branch unit with PC <16>, hardware stack (PC0, PC1, PC 2..9), +1 incrementer, ROM index <16>, CALL to interrupt address.
Second and third stages: instruction registers IR1 <22> and IR2 <22> with their control units, gated clocks, 8-bit ALU (CY, Z flags), multiplier, ACC, REG0 to REG3, status register, RAM index registers 0 to 3 (H and L), ABus <8>, BBus <8>, SBus <8>.
Interfaces: program ROM, max 64 K instructions (RomAddr <16>, RomInstr <22>); data RAM, data ROM and peripherals, max 64K bytes (RamAddr <16>, DataIn <8>, DataOut <8>, ReadNWrite, ChipSelect)]
Microphotography of the CoolRISC
• Technology 1 µm, Nov. 1995
• In 0.5 µm, about 3000 MIPS/watt at 3.0 Volts (with memories), compared to 100 MIPS/watt for an Intel C51
• In 0.25 µm, only the core (20’000 MOS):
• TSMC 0.25 µm, 2.5 Volt, 60 MIPS
• Power: at 1.05 V, 10 µW per MHz, 100’000 MIPS/watt
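The link between the "µW per MHz" and "MIPS/watt" figures follows directly from CPI = 1 (one instruction per clock cycle). A short Python sketch of the conversion; the function name `mips_per_watt` is an illustrative choice.

```python
def mips_per_watt(microwatt_per_mhz, cpi=1.0):
    """Convert a power figure in µW/MHz into MIPS/watt for a given CPI."""
    mips_per_mhz = 1.0 / cpi                   # instructions per clock cycle
    watt_per_mhz = microwatt_per_mhz * 1e-6    # µW -> W
    return mips_per_mhz / watt_per_mhz


core_efficiency = mips_per_watt(10.0)   # core figure quoted above: 10 µW per MHz
```

With 10 µW per MHz and CPI = 1, this gives about 100'000 MIPS/watt, the value quoted for the core alone.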
CPI for some microprocessors
For a given routine: shifting out 8-bit data & clock (synchronous)

Microcontroller   instr. (code)   bits (code)   exec. instr.   exec. clocks   CPI
ST62xx                 12             152            60            2704        45
COP800                 12             120            60            2000        33
8048                    8             112            35            1125        32
Z86Cxx                  8             168            35             692        20
68HC05                 11             160            59             226 *       4 *
PIC16C5x               11             132            59             300         5
Punch                  12             216            74             296         4
CoolRisc 81            12             192            74              74         1
CoolRisc 88            10             180            58              58         1
CoolRisc 816           10             220            58              58         1

* referred to the internal E frequency, which is 2 times slower than the oscillator frequency
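The CPI column is the ratio of executed clocks to executed instructions. A small Python sketch that recomputes it from the benchmark figures of this slide; the dictionary layout and function name are illustrative choices.

```python
# (executed instructions, executed clocks) for the "shift out 8-bit data" routine
BENCHMARK = {
    "ST62xx": (60, 2704),
    "COP800": (60, 2000),
    "8048": (35, 1125),
    "Z86Cxx": (35, 692),
    "68HC05": (59, 226),      # clocks counted at the internal E frequency
    "PIC16C5x": (59, 300),
    "Punch": (74, 296),
    "CoolRisc 81": (74, 74),
    "CoolRisc 88": (58, 58),
    "CoolRisc 816": (58, 58),
}


def cpi(exec_instr, exec_clocks):
    """Average number of clock cycles per executed instruction."""
    return exec_clocks / exec_instr


cpi_table = {name: round(cpi(*values)) for name, values in BENCHMARK.items()}
```

Rounded to the nearest integer, the computed ratios reproduce the CPI column of the slide, including CPI = 1 for the three CoolRisc variants.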
Number of executed clock cycles
Number of instructions (code), executed instructions and executed clocks:

                           CoolRisc 88                     PIC 16C5x
Routine                    code   exec. instr   clocks     code   exec. instr   clocks
8-bit multiply linear        30        30          30        35        37          148
8-bit multiply looped        14        56          56        16        71          284
16-bit multiply linear      127       127         128       240       233          932
16-bit multiply looped       31       170         170        33       333         1332
16-bit division linear      194       162         162       243       180          760
16-bit division looped       36       213         213        27       227         1108
Wisenet Chip uses the CoolRISC
Conclusion about Watch Microprocessors
• Huge impact of electronic watches on the development of microelectronics in
Switzerland
• The same can be said of the development of microprocessors in
Switzerland.
• This history shows quite well that the first Swiss microcontrollers were
designed for electronic watches, before being used for other applications
requiring low power consumption
• What is the largest unused computation power in the world? The answer of
D. Lando, Lucent Technologies: it is all the electronic watches in the world!!!!
Low-Power DSP/MCU Cores
C. Piguet
CSEM Centre Suisse d’Electronique et Microtechnique SA
Digital design
• CSEM has a long history of designing low-power processors
• CoolRISC, licensed by Semtech, Swatch group, TI, ...
• Watch processors: PUNCH (1993), µPUS, Combo (1982), ...
• Powerful new processors with ultra low power consumption
• 2005: Macgic, a 16/24-bit DSP (4 MAC)
• 2006: icyflex1 , a flexible processor for DSP/control applications
• 2009: icyflex2, a smaller processor for control applications
• 2009: icyflex4, a scalable processor for DSP/control applications
Macgic and icyflex are registered trademarks of CSEM
CSEM DSP/MCU jan 2009 | C. Piguet | Page
Digital design – low-power processors
• customizable (in VHDL)
• configurable (at run-time)
• Macgic 16/24-bit DSP
• complex datapath (quad MAC)
• very high parallelism (1.5k cycles for a 256 FFT)
• assembler, debugger
• 170 uW/MHz at 1.0 V in 180 nm
• 150’000 equiv NAND gates, 2.1 mm2 in 180 nm
• icyflex1 flexible 32-bit processor
• includes DSP functions (dual MAC)
• high parallelism (ex: 2.6k cycles for a 256 FFT)
• C compiler (gcc), debugger (gdb),...
• 120 uW/MHz at 1.0 V in 180 nm
• 110’000 equiv NAND gates, 1.6 mm2 in 180 nm
icyflexTM
Ongoing processor development
• icyflex2 :
• 50% less area than icyflex1 (removed DSP characteristics)
• Higher frequency (longer pipeline)
• Lower power consumption for control type applications
• Optimized for C compiler
• icyflex4 :
• Scalable architecture for much higher throughput
• Higher frequency (longer pipeline)
• Lower power consumption for DSP/control type applications
Processor positioning
[Figure: processor positioning from control to DSP along a computing power axis (1 MUL, 2 MAC, 4 MAC, … 36 MAC): icyflex2, icyflex1, Macgic, icyflex4]
Processor roadmap
[Figure: processor roadmap 2005 to 2010, showing the development phase, first silicon (1st Si) and first production (1st prod) for Macgic (first production: Abilis), icyflex1, icyflex2 and icyflex4]
icyflex processor family overview
Name       MAC (MUL)   C compiler   Processor   Pipeline length   Instruction width   Status
Macgic     4           -            DSP         3                 32                  Prod.
icyflex1   2           Yes          DSP/MCU     3                 32                  Prod.
icyflex2   (1)         Yes          MCU         5                 32                  Dev.
icyflex4   4 + 4*N     Yes          DSP/MCU     5-7               64                  Dev.
MACGIC: Mobile TV Chip (DVB-T/H) by Abilis
This chip contains three MACGIC cores
• Abilis: to become the world-leading supplier of
semiconductor solutions for multimode digital TV receivers
and broadband wireless connectivity for mobile terminals
myTV
World first single die DVB demodulator
• Abilis Systems (Kudelski group), Switzerland
• World first single die programmable DVB-T/H demodulator, Aug 2007
• Unique Software Defined Radio architecture
• Manufactured by IBM, RF-CMOS 90 nm technology world leader
• Ultra-low power DSP technology by CSEM
• Multi-band silicon tuner
• World’s smallest DVB-T/H receiver: 5 x 5 mm
• Performance
• Dynamic Echo Handling for best indoors/mobile reception
• Adaptive demodulation
• Meets MBRAI, exceeds NorDig 1.0.3
• Up to -100 dBm sensitivity (8k, 8MHz, QPSK, ½)
World first single die DVB demodulator (cont’d)
[Figure: block diagram of the chip: RF receivers (S-band, DVB-T/H), reconfigurable AD/DA converters, programmable OFDM engine with three Macgic DSPs and hardware accelerators (including Cordic) for DVB-T/H, WiFi, WiMAX and other standards, and an MCU subsystem (link layer) with RISC core and HW accelerator delivering the encoded MPEG stream to the host. Processing chain: RF tuner, A/D conversion, channel estimation & correction & decoding, link layer]
Overview of the icyflex1 processor
Digital design – icyflex1 architecture
Optimized for minimal power consumption:
• 32-bit instructions
• 3-stage pipeline
• Load/store RISC architecture
• Configurable instructions
High level of parallelism with 32-bit instruction words (datapath and load/store in parallel):
• Datapath: up to 10 simple operations in parallel (2×MUL, 2×ACC, …)
• Load/store: up to 6 operations in parallel (2 times load 2 data-words in parallel, store, address generation, …)
• Total: up to 16 operations executed in parallel in a single 32-bit instruction
icyflex instruction set and addressing modes
• Hardware loop and repeat instructions
• Standard: MUL, ADD, MAC, CMP, MAX, AND,….
• SIMD (Single Instruction Multiple Data): ADD2, MUL2, MAC2, …
• e.g. 2 independent fixed-point MAC in parallel
• Instructions/addressing modes to support C compiler
• Configurable instructions
• Addressing modes for DSP type processing and for a C compiler
• A large variety of addressing modes:
• Ranging from the basic addressing modes: indirect, ±1, ±offset, modulo
• To very complex addressing modes (configurable):
– for instance: an <= (an + om + 8 × OFFA ) % mp
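The complex addressing mode quoted above can be written as a one-line address update. The Python sketch below is illustrative only: the reading of `an` as an address register, `om` as an offset register, `OFFA` as an instruction field and `mp` as a modulo register is an assumption, and the function name is hypothetical.

```python
def next_address(an, om, offa, mp):
    """Post-modify update: an <= (an + om + 8 * OFFA) % mp."""
    return (an + om + 8 * offa) % mp


# Walk through a circular buffer of 64 words in steps of 9 (om = 1, OFFA = 1).
address = 60
visited = []
for _ in range(4):
    visited.append(address)
    address = next_address(address, om=1, offa=1, mp=64)
```

The modulo keeps the pointer inside the buffer without any compare-and-branch instruction, which is what makes such modes attractive for filters and FFTs.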
Performance: benchmarks of the icyflex processor
Algorithm on the icyflex processor Clock Cycles
Sum of a vector of N values ~ N/2
Addition or multiplication of 2 vectors of N values ~ N
Norm/mean/standard deviation/clipping of a vector ~ N/2
Minimum/maximum of a vector ~ N/2
Multiplication of 2 matrices of N×M values ~ (N × M) × (N/2+2)
Matrix transposition ~ N × M × (5/8)
FIR filter/convolution ~ ½ per tap
FIR filter/convolution, complex data ~ 2 per tap
IIR filter (biquad) ~ 2 per tap
Complex FFT of N= 64 values ~ 440
Complex FFT of N=256 values ~ 2.6 k
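The cycle counts in this table can be turned into rough execution-time estimates. The Python sketch below uses the approximate formulas listed above; the function names, the `samples` parameter of the FIR estimate (cycles per tap taken as per output sample) and the 50 MHz example clock are assumptions.

```python
def cycles_vector_sum(n):
    """Sum of a vector of N values: about N/2 cycles."""
    return n / 2


def cycles_matrix_multiply(n, m):
    """Multiplication of 2 matrices of N x M values: about (N*M) * (N/2 + 2)."""
    return (n * m) * (n / 2 + 2)


def cycles_fir(taps, samples=1):
    """FIR filter: about 1/2 cycle per tap, assumed here per output sample."""
    return 0.5 * taps * samples


# Complex FFT cycle counts quoted in the table, indexed by N.
CYCLES_COMPLEX_FFT = {64: 440, 256: 2600}


def execution_time_us(cycles, clock_mhz):
    """Execution time in microseconds at a given clock frequency in MHz."""
    return cycles / clock_mhz


fft256_time_us = execution_time_us(CYCLES_COMPLEX_FFT[256], clock_mhz=50)
```

At the example clock of 50 MHz, a 256-point complex FFT at about 2.6 k cycles takes about 52 µs.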
Performance: Comparison with other DSPs
Company / Processor                  FIR filter                Complex FFT 256 points
                                     (clock cycles per tap)    (clock cycles)
CSEM / Macgic Audio-I                ~1/4                      1.5 k
CSEM / icyflex                       ~1/2                      2.6 k
Analog Devices / Blackfin BF531      ~1/2                      3.2 k
Texas Instruments / TMS320VC5501     ~1/2                      5.5 k
Philips / CoolFlux DSP               ~1/2                      5.5 k
Analog Devices / ADSP2191M           ~1                        7.4 k
Motorola / M56F8323                  ~1                        12 k
MicroChip / dsPIC30                  ~1                        ~19 k
Texas Instruments / MSP430F14x       ~28                       ~53 k
CoolRISC 8-bit                       -                         ~60 k
MicroChip / PIC18F4220               ~160                      3.2 M
Processors designs optimized for energy efficiency
Features                            Starcore    Macgic    icyflex1    CoolFlux
Bits per instruction                128-bit     32-bit    32-bit      32-bit
Data word width                     16-bit      24-bit    32-bit      24-bit
Number of MAC                       4           4         2           2
Memory transfers                    8           8         4           2
Operations per cycle                32          32        16          8
Number of equivalent NAND gates     600k        150k      115k        45k
Clock cycles for FFT 256            ** 1'614    1'410     * 2’600     * 5’500
Average power per MHz @ 1V          * 350 µW    170 µW    * 115 µW    * 75 µW
Power per MHz @ 1V for FFT          * 600 µW    300 µW    * 200 µW    * 130 µW
Normalized energy for FFT @ 1V      2.3         1         1.2         1.7
** single precision   * estimated
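The last row of the table follows from the two rows above it: energy for one FFT is the number of clock cycles times the energy per cycle, and 1 µW/MHz equals 1 pJ per cycle. A Python sketch that recomputes the normalized values from the table figures; the dictionary layout and function name are illustrative.

```python
# (clock cycles for a 256-point FFT, power per MHz at 1 V for the FFT in µW)
PROCESSORS = {
    "Starcore": (1614, 600),
    "Macgic": (1410, 300),
    "icyflex1": (2600, 200),
    "CoolFlux": (5500, 130),
}


def fft_energy_nj(cycles, microwatt_per_mhz):
    """Energy of one FFT in nJ: cycles x pJ per cycle (1 µW/MHz = 1 pJ/cycle)."""
    return cycles * microwatt_per_mhz / 1000.0


energies = {name: fft_energy_nj(*values) for name, values in PROCESSORS.items()}
normalized = {name: round(e / energies["Macgic"], 1) for name, e in energies.items()}
```

The Macgic FFT comes out at about 423 nJ, and the ratios to that value reproduce the 2.3, 1, 1.2 and 1.7 of the table: a lower cycle count only wins if the power per MHz does not grow faster.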
Silicon area of the processor core
Processor                Equiv NAND gates   Process 0.18 µm   Process 0.13 µm   Process 0.09 µm
icyflex 32-bit (*)       110’000            1.75 mm2          0.70 mm2          0.34 mm2
Macgic Audio-I 24-bit    150’000            2.1 mm2           0.85 mm2          0.41 mm2
Silicon area is dominated by memories in most applications,
or by analog / RF blocks in very deep submicron processes.
* using CSEM’s thick-gate standard cell library
Software development tools
• GNU C compiler (gcc)
• software implementation of IEEE floating-point standard
• icyflex instruction parallelism supported by latest releases of gcc
• successful pass of whole GNU test suite for all optimization levels
• GNU assembler / linker (binutils)
• BFD / ELF32 object file format
• Binary, SREC, IHEX memory image file formats
• icyflex instruction set simulator (ISS), written in C++
• Phase-accurate, pipelined
• Wrappers to SystemC, VHDL (Modelsim), Matlab/Simulink
• GNU debugger (gdb)
• Mode 1: instruction set simulator of the icyflex core
• Mode 2: On-Chip Debug (OCD) through a JTAG interface
• Eclipse integrated development environment
• CDT C/C++ IDE plug-in
• icyflex plug-in
• Using library of subroutines and DSP subroutines with optimized minimal number of instructions
icyflex1 toolchain
[Figure: icyflex1 toolchain flow; the legend distinguishes GNU-based tools from non-GNU tools and libraries]
Tools: icfx-gcc, icfx-dasp, icfx-as, icfx-ld, icfx-ar/icfx-ranlib, icfx-objdump, icfx-objcopy, icfx-run, icfx-run_srec, icfx-gdb, icfx-stim_mk2, icfx-tv_exec
Libraries: libsim.a, libsimicfx.a
Files: C code (.c, .h), Assembly (.asm, .i), Assembly (.s), Object (.o), Library (.a), ELF (.bin), Listing (.lst), Hex file (.srec, .ihex), data (.ext), Testvect (.idx, .vhd), Waveform (.vcd)
Overview of the icyflex2 processor
icyflex2 : a trimmed down processor for control apps
[Figure: block diagram of the processor, with four units:
- Data Move Unit: GP registers r0 to r7; X AGU and Y AGU with their pointer, modulo and configuration registers (px0 to px7, mx0 to mx6, cx0 to cx6, py0, py1, my0, cy0)
- Data Processing Unit: accumulate datapath & registers, micro-operation datapath & registers, coprocessor registers; 2 ALU, 2 multipliers
- Program Sequencing Unit: PC, branch, flag, exception, instruction exec/xfer, HW loop (lbeg, lend), HW stack
- Host and Debug Unit: host side and core side, step (stepc), host register access (hrs, hrd), debug engine (dcr, ddr), config/status (csr), P, X and Y breakpoints]
Area comparison: icyflex1 vs icyflex2
Overview of the icyflex4 processor
icyflex4: a scalable processor architecture
• Modular architecture
• Four processing units
• Scalable
• PSU: Program Sequencing Unit
• 8 vectorized interrupts
• 64-bit instruction bundles
• DMU: Data Move & Processing Unit
• 16 GP +16 index registers
• 4 MUL, 4 ALU, 4 SHIFT
• VPU: Vector Processing Unit
• 1, 2, 4 or 8 VPU slices
• HDU: Host and Debug Unit
• Hardware breakpoint engines
Examples of icyflex-based System-on-Chip
icyfirst : first integration of icyflex1 in Nov 2006
• This integration targets very low
leakage for applications requiring
limited processing power
• Standard cell library: thick gates for
leakage reduction by 800x
• Measured speed: 400 kHz @ 1.1 V
• Avg dyn power: 140 µW/MHz @ 1.1 V
• Core: 115k eq. gates, 1.75 mm2
• Peripherals: 110k eq. gates, 1.5 mm2
• Memory: 10.7 mm2
• TSMC 180 nm generic CMOS
SRAM 2 kiWords of 32-bit
icyfirst : first integration of icyflex1 in Nov 2006 (cont’d)
• icyflex1 core
• 128 KiBytes SRAM
• Clock generator
• Voltage regulator
• POR, watchdog, timers
• Request controller
• DMA and bus controllers
• 2 x 16 bit GPIO
• 2 x I2C
• 2 x SPI
• 2 x I2S
• JTAG controller
icyflex1
icycam: a System-on-Chip for vision applications
• icyflex1 runs at up to 50 MHz
• QVGA CMOS pixel array (320 x 240)
• 14 um pixel pitch
• logarithmic encoding of luminance
• close to 7 decades of intra-scene dynamic
range encoded on 10 bits
• graphical coprocessor
• SRAM: 128 KiBytes
• DMA, SPI, PPI, GPIO, UART, SDRAM, JTAG
• Tower Semiconductor, 180 nm, CIS
icycom: a System-on-Chip for RF applications
• icyflex1 runs at up to 3.2 MHz
• RF: 865 ~ 915 MHz, FSK (incl. MSK, GFSK), 4FSK, OOK, OQPSK
• TX: 10 dBm
• RX: -105 dBm at 200 kb/s (BER = 10^-3)
• Power management
• Power supplies for external devices
• Low power modes: multiple standby modes
• 10 bit ADC
• SRAM: 64 KiBytes (with MBIST)
• DMA, RTC, Timers, Watchdog, I2C, SPI, I2S, GPIO, UART, JTAG
• TSMC, 180 nm, generic
icycom: a System-on-Chip for RF applications (cont’d)
[Figure: icycom chip block diagram: power management (In: 1.0 to 1.8 V or 2.2 to 3.6 V; Out: Vin, 2.7 V, 1.2 to Vin -0.1), icyflex1, program & data memory, A/D, interfaces, RF, IO and IO supply; external components: EEPROM, IO]
Thank you for your attention.

More Related Content

Similar to piguet_sesion_2_09.pdf

ch2 -A Computer Evolution and Performance updated.pdf
ch2 -A Computer Evolution and Performance updated.pdfch2 -A Computer Evolution and Performance updated.pdf
ch2 -A Computer Evolution and Performance updated.pdfKhizarKhizar8
 
Microcontrollers and intro to real time programming 1
Microcontrollers and intro to real time programming 1Microcontrollers and intro to real time programming 1
Microcontrollers and intro to real time programming 1SSGMCE SHEGAON
 
IBM and ASTRON 64bit μServer for DOME
IBM and ASTRON 64bit μServer for DOMEIBM and ASTRON 64bit μServer for DOME
IBM and ASTRON 64bit μServer for DOMEIBM Research
 
Sistem mikroprosessor
Sistem mikroprosessorSistem mikroprosessor
Sistem mikroprosessorfahmihafid
 
Lect_1_Evolution of Processors.pptx
Lect_1_Evolution of Processors.pptxLect_1_Evolution of Processors.pptx
Lect_1_Evolution of Processors.pptxvarshaks3
 
They're Not Making Smaller Atoms
They're Not Making Smaller AtomsThey're Not Making Smaller Atoms
They're Not Making Smaller AtomsIan Phillips
 
They're Not Making Smaller Atoms (v2)
They're Not Making Smaller Atoms (v2)They're Not Making Smaller Atoms (v2)
They're Not Making Smaller Atoms (v2)Ian Phillips
 
Ch 1 Introduction(1).docx
Ch 1 Introduction(1).docxCh 1 Introduction(1).docx
Ch 1 Introduction(1).docxRadhikasaud
 
Computer System Architecture Lecture Note 2: History
Computer System Architecture Lecture Note 2: HistoryComputer System Architecture Lecture Note 2: History
Computer System Architecture Lecture Note 2: HistoryBudditha Hettige
 
02 computer evolution and performance.ppt [compatibility mode]
02 computer evolution and performance.ppt [compatibility mode]02 computer evolution and performance.ppt [compatibility mode]
02 computer evolution and performance.ppt [compatibility mode]bogi007
 
Lecture 1 - introduction to computer systems architecture 2018 / 2019
Lecture 1 - introduction to computer systems architecture 2018 / 2019Lecture 1 - introduction to computer systems architecture 2018 / 2019
Lecture 1 - introduction to computer systems architecture 2018 / 2019Mousuf Zaman C
 
VLSI unit 1 Technology - S.ppt
VLSI unit 1 Technology - S.pptVLSI unit 1 Technology - S.ppt
piguet_sesion_2_09.pdf

  • 1. Low Power Processors at CSEM Christian Piguet CSEM, Neuchâtel, Switzerland
  • 2. History of the quartz electronic watch • "The watch, almost more than the steam engine, was the real protagonist of the Industrial Revolution" (Lewis Mumford, American social philosopher) • In December 2007, we celebrated the 40th anniversary of the first electronic watch, a Swiss quartz watch named Beta, developed by the Centre Electronique Horloger (CEH) • The first quartz watch was a Swiss wristwatch presented in 1967 • That was exactly 20 years after the invention of the transistor • In Switzerland, competence in low-power electronics comes directly from the watch industry
  • 3. Research in 1962-67: Time Base • Development of a quartz resonator, a very risky project • The main problem was the miniaturization of such a resonator at 8 kHz • 1967: Beta 2. On the left: the electromechanical part. On the right: the printed circuit with the IC and the quartz
  • 4. Beta 21 from OMEGA • First quartz watch "OMEGA Constellation", Electroquartz, f = 8192 Hz, vibrating motor 256 Hz, caliber 1301 (Beta 21) • Common project of 20 Swiss watchmakers • Analog display, 1972, width: 37 mm, steel, ref.: A-32121 • Price: Euro 980.--
  • 5. CEH Research Projects after 1967 • Digital adjustment of the quartz oscillator, called Beta 3 and 4, in which some pulses were removed to achieve exactly 32’768 Hz • Quartz oscillators, static and dynamic frequency dividers designed as asynchronous speed-independent circuits • New quartz, such as the ZT, as well as new displays composed of LEDs for analog display • ROM, RAM and EEPROM memories and the first RISC-like watch microprocessors, before the name “RISC” was introduced in 1980 by Berkeley • [Figure: single bipolar integrated circuit, called ODC-04, feature size 6 µm, containing about 110 components]
  • 6. MOS transistor in the CEH CMOS 6 technology
  • 7. Copyright 2007 CSEM | Titre | Auteur | Page 6 A Swiss History: Watch Microcontrollers • A watch circuit was 2’000 MOS at the time • In 1971, the 1st microprocessor (4004) • In 1974, also for watches? Question! • In 1976, a conference in Switzerland on the "arrival of microprocessors"! • In 1978, a uP working group with CEH, Uni Ne, EPFL and watchmakers • Goal: to find a uP architecture specific to electronic watches, above all with very low power consumption • [Figure: Gate Matrix Logic, CSEM 1985]
  • 8. Binary Decision Machine (BDM) • BDD (Binary Decision Diagram) • EPFL research of Prof. D. Mange, LSL • Two instructions: IF and DO • One can add CALL and RETURN • A Karnaugh table can be implemented either in hardware or in software • In software: a BDD, executed by a BDM • Very simple uP architecture • [Figure: Karnaugh map of an output z as a function of inputs a, b and c, and the equivalent binary decision diagram]
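The idea of a BDD executed by a two-instruction machine can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the CEH or EPFL design: the tuple encoding of IF and DO, the function name `run_bdm`, and the example function (a 3-input majority vote, not the Karnaugh map of the slide) are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical instruction encoding (not taken from the slides):
#   ("IF", input_name, addr_if_0, addr_if_1)  tests one input bit and branches
#   ("DO", value)                             outputs a value and ends the path
Instruction = Tuple

def run_bdm(program: List[Instruction], inputs: Dict[str, int]) -> int:
    """Walk the decision diagram stored in `program`, starting at address 0."""
    pc = 0
    while True:
        instr = program[pc]
        if instr[0] == "IF":
            _, name, addr_if_0, addr_if_1 = instr
            pc = addr_if_1 if inputs[name] else addr_if_0
        elif instr[0] == "DO":
            return instr[1]
        else:
            raise ValueError(f"unknown instruction {instr[0]!r}")

# Example BDD: z = majority(a, b, c), chosen only for illustration.
MAJORITY = [
    ("IF", "a", 1, 2),  # 0: test a
    ("IF", "b", 4, 3),  # 1: a = 0, so z = b AND c
    ("IF", "b", 3, 5),  # 2: a = 1, so z = b OR c
    ("IF", "c", 4, 5),  # 3: the result now depends on c only
    ("DO", 0),          # 4: z = 0
    ("DO", 1),          # 5: z = 1
]

if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            for c in (0, 1):
                print(a, b, c, run_bdm(MAJORITY, {"a": a, "b": b, "c": c}))
```

Each evaluation tests at most three inputs, one per IF, which is why such a machine needs neither an ALU nor a general datapath to evaluate logic functions.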
  • 9. Binary Decision Machine • Instruction format: a single but very long word (similar to RISC, but earlier) • This was already the case for the first mainframe computers of the fifties • RISC today is simply rediscovering old technology • [Figure: BDM block diagram with ROM, stack, program counter (PC) with +1 incrementer, multiplexers, test unit and inputs, driven by a very long instruction word]
  • 10. Number of clock cycles: benchmark • An analysis showed that these watch processors executed about 100 clock cycles for a watch application (each second, to increment the seconds, minutes, hours, …), while a conventional microprocessor such as the Intel 8048 needed about 2'000 clock cycles for the same task • So in energy: 70 times more efficient than an 8048 uP • Watch uP: single instruction word, from 12 bits to 18 bits • Instruction sets of 6 to 20 instructions (true RISC!!) • Transistor count: about 20'000 MOS • In the 6-micron technology of the time, this was about 50 mm2 of silicon, a very big circuit!
  • 11. Architecture • It was a BDM and one datapath • First uP: failure • Layout too difficult! • Too complex • Sagrada Familia • [Figure: block diagram of the first uP: sequencer with 4 PCs, stack pointer and test multiplexer, ROM 512*16 with instruction register IR15:0, RAM 30*8, ALU, 8-bit bus (BUS 7:0), inputs and outputs, clock phases Ø1 to Ø4]
  • 12. The 1st microprocessor: COMBO (1983) • 800 instructions • 16-bit instructions • 7-bit data • 20K MOS • 40 mm2 • 6 microns • 1.5 Volt • 0.4 µA at 16 kHz
  • 13. Comparison
      Year | Micro | Technology | Feature size (µm) | Nb MOS | Address space
      1971 | 4004 | P-MOS | 8 | 2’300 | 4K
      1972 | 8008 | P-MOS | 8 | 3’500 | 16K
      1974 | 8080 | N-MOS | 6 | 5’000 | 64K
      1976 | 8085 | N-MOS | 4 | 6’000 | 64K
      1978 | 8086 | N-MOS | 3 | 29’000 | 1M
      1982 | 80286 | N-MOS | 2.3 | 130’000 | 16M
      1985 | 80386 | CMOS | 2 | 275’000 | 4096M
      At CSEM: • 1985: 2nd uP in 4 µm with 20 instructions of 17 bits, 24K MOS, 20 mm2, 0.4 µA at 1.5 Volt • 1987: uP ETA, 35'000 MOS • 1990: other uP with 100'000 MOS
  • 14. Competition • Competitors (AMI, Eurosil, Hewlett Packard, Intel, Mitsubishi, National, RCA and Sharp) designed watch microprocessors generally consuming more than 4 or 5 µA, so 10 to 100 times more than CEH watch microprocessors • Electronic digital watches appeared around 1975 • In 1977, the price of a digital watch dropped from more than $100 to $10 • For Christmas 1976, TI sold LED watches with 5 functions for $9.95 • Profits disappeared, as in the calculator market, and only 3 very big companies remained in this market: Casio, Seiko and Texas Instruments. TI cut its prices to push Casio and Seiko out • 20 years later, Intel President Gordon Moore still had an old Microma watch made by Intel ("my $30 million watch", he said) to remember this lesson • But Seiko and Casio were better able than TI to cut their prices, and it was TI that was forced to leave this market
  • 15. Watch microprocessor PUNCH (1990-1993) • Swiss watchmakers decided to design a common watch uP to be the heart of all Swiss watches • The architecture chosen is a multitask machine • This allows several independent tasks to be defined and executed in pseudo-parallelism • As soon as a task has to start, it is started immediately • [Figure: watch application with a motor control block, time counters (1/100 s, 1/10 s, seconds, minutes, hours, days) clocked at 1 Hz and 100 Hz, a crown state machine and a modes state machine]
  • 16. Hardware Scheduler • Estimation: about 20% fewer executed instructions • 103 assembly instructions (18 bits) • 8-bit data • The uP core contains 11'000 MOS • A complete microcontroller with its memories has about 150'000 MOS • 800 MIPS/watt • [Figure: parallel tasks in a watch application (TIME, MODES, MOTOR) passed by a hardware scheduler to the microprocessor as Task 1 to Task 4]
  • 17. Tasks execution • The originality of the PUNCH lies in tasks that are executed in pseudo-parallelism: one instruction of task 1 is executed, then one of task 2, and so on, then back to task 1 • From 1 to 4 tasks can be defined, so it is also possible to have a conventional monotask uP • [Figure: principle of the multitask architecture: instructions of a single task executed continuously; instructions of 2 tasks executed alternately; the same scheme for 3 tasks; delayed starting of tasks in a single processor]
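The instruction-by-instruction interleaving described for the PUNCH can be sketched as a round-robin loop with one program counter per task. This is a simplified model under assumptions: the function name `interleave`, the representation of a task as a list of instruction labels, and the rule that a finished task is simply skipped are hypothetical; the real hardware scheduler is event-driven and is not modelled here.

```python
from typing import List, Tuple

def interleave(tasks: List[List[str]]) -> List[Tuple[int, str]]:
    """Return the execution trace (task id, instruction) of a round-robin
    machine that executes one instruction of each active task in turn."""
    if not 1 <= len(tasks) <= 4:
        raise ValueError("this sketch follows the slides: 1 to 4 tasks")
    pcs = [0] * len(tasks)  # one program counter per task
    trace = []
    while any(pc < len(task) for pc, task in zip(pcs, tasks)):
        for tid, task in enumerate(tasks):
            if pcs[tid] < len(task):  # a finished task is skipped
                trace.append((tid, task[pcs[tid]]))
                pcs[tid] += 1
    return trace

if __name__ == "__main__":
    # Two tasks: their instructions alternate until the shorter one ends.
    print(interleave([["t0", "t1", "t2"], ["m0", "m1"]]))
    # One task: the machine behaves like a conventional monotask uP.
    print(interleave([["t0", "t1", "t2"]]))
```

With a single task the trace is just the sequential program, which matches the remark that a conventional monotask uP is a special case.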
  • 18. Architecture of the multitask PUNCH • One has to quadruple the PC (program counter), the AC (accumulator) and the IX (index register) • This results in a reasonable cost • In monotask mode, the 4 PCs are used as a stack • [Figure: block diagram with a ROM of N x 18 bits, Pc 0 to Pc 3, instruction register, process control (scheduler, stack pointers, router), event bank, I/O communication, external events, and a datapath with ALU, WR, Ac0 to Ac3 and Ix0 to Ix3]
  • 19. Test Chip • Used in wrist watches • Other applications • Belongs to the watchmakers, so it is difficult to give licenses to other customers • We think we can do even better at reducing power consumption → CoolRISC
  • 20. Punch-based Tissot Two Timer • It is my watch
  • 21. 8-bit CoolRISC Microprocessor • RISC instructions (single 22-bit word) • Load/store architecture • Bank of 16 registers (multitasking is not possible: 4*16 registers is too much) • More than 100 instructions • Hardware stack, but also Branch & Link (call-return in software) • 3-stage pipeline, CPI = 1 (clocks per instruction) • Gated-clock technique (unused blocks are not clocked) • Synthesized with Synopsys (I.P. core in VHDL, then logic synthesis) • CoolRISC core: 20’000 MOS
  • 22. COOLRISC 816 INSTRUCTION SET (each instruction is one 22-bit word, bits 0 to 21)
      Branch and call formats:
      - op<3> cc<3> addr<16>: JUMP addr (PC0 <-- addr); JCC addr (if cc then PC0 <-- addr)
      - op<3> cc<3> followed by 16 bits set to 1: JUMP IP (PC0 <-- IP); JCC IP (if cc then PC0 <-- IP); RETS (PC0 <-- IP)
      - op<6> addr<16>: CALL addr (PCn <-- PCn-1, PC1 <-- PC0+1, PC0 <-- addr); CALLS addr (IP <-- PC0+1, PC0 <-- addr)
      - op<9> followed by 13 bits set to 1: CALL IP (PCn <-- PCn-1, PC1 <-- PC0+1, PC0 <-- IP); CALLS IP (IP <-- PC0+1, PC0 <-- IP); RET (PCn-1 <-- PCn); RETI (PCn-1 <-- PCn); PUSH (PCn <-- PCn-1, PC1 <-- IP, PC0 <-- PC0+1); POP (IP <-- PC1, PCn-1 <-- PCn, PC0 <-- PC0+1)
      - op<16> div<4>: FREQ div; HALT; NOP
      ALU formats:
      - op<6> data<8> alu<4> reg<4>: ALU reg, °data (reg <-- reg alu-op data)
      - op<5> addr<8> alu<5> reg<4>: ALU reg, addr (reg <-- reg alu-op data-mem(addr))
      - op<5> alu<5> reg2<4> reg1<4> regr<4>: ALU regr, reg1, reg2 (regr <-- reg2 alu-op reg1)
      - op<3> offset<8> alu<5> reg<4> IX: ALU reg, (IX, offset), in three variants (reg <-- reg alu-op data-mem(IX+offset); reg <-- reg alu-op data-mem(IX), IX <-- IX + offset; reg <-- reg alu-op data-mem(IX-offset), IX <-- IX - offset)
      - op<5> alu<5> reg2<4> IX followed by 6 bits set to 1: ALU reg, (IX, R3) (reg <-- reg alu-op data-mem(IX+R3))
      ALU operations: MOVE (load/store), CMOVE (conditional), SHL (shift left), SHLC (with carry), SHR (shift right), SHRC (with carry), CPL (complement), INC (increment), INCC (with carry), DEC (decrement), DECC (with carry), AND (logical and), OR (logical or), XOR (logical xor), ADD (addition), ADDC (with carry), SUBD (op1 - op2), SUBDC (with carry), SUBS (op2 - op1), SUBSC (with carry), MUL (multiply), MULA (2-compl. mult), MSHL (multiple shift), MSHR (multiple shift), MSHRA (2-compl.), CMP (compare), CMPA (2-compl. cmp), TSTB (bit test), SETB (bit set), CLRB (bit reset), INVB (bit invert)
      Registers: 16-bit PC, IP and IX0 to IX3 (IP and IX0 to IX3 each split into a high and a low byte); 8-bit ACC, R0 to R3 and status
  • 23. CoolRISC Pipeline • 3-stage pipeline • No load delay • No branch delay • [Figure: pipeline timing. Branch instructions: fetch & branch in 1 clock cycle. Arithmetic instructions: fetch, execute, store result, 1 clock cycle per stage]
  • 24. Branch instruction executed in one pipeline stage • Critical path: ROM precharge, ROM read, branch decode, address multiplexer • However, at 20 MHz the clock cycle time is 50 ns, which is enough to execute all of this • With CPI = 1 at 20 MHz, one gets 20 MIPS, which is very good • In 0.18 µm, about 100 MHz, hence 100 MIPS • [Figure: timing of the 1-clock-cycle fetch & branch stage against the fetch and ALU stages, marking when the branch condition is available]
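The figures on this slide follow from two simple relations, cycle time = 1 / f and MIPS = f / CPI. The short Python sketch below just reproduces that arithmetic; the function names are illustrative and do not come from the slides.

```python
def cycle_time_ns(f_mhz: float) -> float:
    """Clock period in nanoseconds for a clock frequency given in MHz."""
    return 1000.0 / f_mhz

def mips(f_mhz: float, cpi: float) -> float:
    """Millions of instructions per second at f_mhz with a given CPI."""
    return f_mhz / cpi

if __name__ == "__main__":
    print(cycle_time_ns(20.0))  # 50.0 ns, the budget for the critical path
    print(mips(20.0, 1.0))      # 20.0 MIPS with CPI = 1
    print(mips(100.0, 1.0))     # 100.0 MIPS at about 100 MHz in 0.18 um
```

The whole branch critical path (ROM precharge, ROM read, branch decode, address multiplexer) therefore has to fit within the 50 ns period for CPI = 1 to hold.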
  • 25. Bypass in the CoolRISC pipeline • [Figure: pipeline stages (fetch and branch in one clock, decode, RAM read, ALU, write) for an 18-bit one-word RISC instruction, showing the bypass path and the point at which the branch condition code is ready]
  • 26. CoolRISC 816 • [Figure: block diagram of the CoolRisc 816 core. First pipeline stage: 16-bit PC (PC0, PC1, PC 2..9 with +1 incrementer and multiplexer), branch unit, CALL to interrupt address, program ROM of at most 64 K instructions (RomAddr <16>, RomInstr <22>). Second and third stages: instruction registers IR1 <22> and IR2 <22> with their control units, 8-bit ALU with CY and Z flags, multiplier, ACC, REG0 to REG3, status register, ROM index and RAM index 0 to 3 (low and high bytes), 8-bit ABus, BBus and SBus. External interface: data RAM, data ROM and peripherals of at most 64K bytes (RamAddr <16>, DataIn <8>, DataOut <8>, ReadNWrite, ChipSelect). Gated clocks are used throughout]
  • 27. Microphotography of the CoolRISC • Technology 1 µm, Nov. 1995 • In 0.5 µm, about 3000 MIPS/watt at 3.0 Volts (with memories), compared to 100 MIPS/watt for an Intel C51 • In 0.25 µm, the core only (20’000 MOS): • TSMC 0.25 µm, 2.5 Volt, 60 MIPS • Power: at 1.05 V, 10 µW per MHz, 100’000 MIPS/watt
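The last bullet links two ways of stating efficiency. With CPI = 1, one MHz of clock delivers one MIPS, so a consumption quoted in µW per MHz converts directly into MIPS per watt. The sketch below shows that conversion; the function name is illustrative, and CPI = 1 is taken from the CoolRISC slides.

```python
def mips_per_watt(uw_per_mhz: float, cpi: float = 1.0) -> float:
    """Convert a consumption in microwatts per MHz into MIPS per watt.

    One MHz delivers 1/cpi MIPS and costs uw_per_mhz * 1e-6 W, so the
    ratio is 1e6 / (cpi * uw_per_mhz).
    """
    return 1e6 / (cpi * uw_per_mhz)

if __name__ == "__main__":
    # 10 uW per MHz with CPI = 1 corresponds to 100'000 MIPS/watt.
    print(mips_per_watt(10.0))
```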
  • 28. CPI for some microprocessors, for a given routine: shifting out 8-bit data & clock (synchronous)
      Microcontroller | Code size (instr.) | Code size (bits) | Executed instr. | Executed clocks | CPI
      ST62xx | 12 | 152 | 60 | 2704 | 45
      COP800 | 12 | 120 | 60 | 2000 | 33
      8048 | 8 | 112 | 35 | 1125 | 32
      Z86Cxx | 8 | 168 | 35 | 692 | 20
      68HC05 | 11 | 160 | 59 | 226 * | 4 *
      PIC16C5x | 11 | 132 | 59 | 300 | 5
      Punch | 12 | 216 | 74 | 296 | 4
      CoolRisc 81 | 12 | 192 | 74 | 74 | 1
      CoolRisc 88 | 10 | 180 | 58 | 58 | 1
      CoolRisc 816 | 10 | 220 | 58 | 58 | 1
      * referred to the internal E frequency, which is 2 times slower than the oscillator frequency
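The CPI row of this table is simply the executed clocks divided by the executed instructions, rounded to the nearest integer. The following Python sketch recomputes it from the two rows of the slide; the variable and function names are illustrative.

```python
# (microcontroller, executed instructions, executed clocks), from the slide.
SLIDE_ROWS = [
    ("ST62xx", 60, 2704),
    ("COP800", 60, 2000),
    ("8048", 35, 1125),
    ("Z86Cxx", 35, 692),
    ("68HC05", 59, 226),   # clocks referred to the internal E frequency
    ("PIC16C5x", 59, 300),
    ("Punch", 74, 296),
    ("CoolRisc 81", 74, 74),
    ("CoolRisc 88", 58, 58),
    ("CoolRisc 816", 58, 58),
]

def cpi(executed_clocks: int, executed_instr: int) -> float:
    """Average number of clock cycles per executed instruction."""
    return executed_clocks / executed_instr

if __name__ == "__main__":
    for name, instr, clocks in SLIDE_ROWS:
        print(f"{name:13s} CPI = {cpi(clocks, instr):5.2f}")
```

For the same routine, the spread in executed clocks (2704 for the ST62xx against 58 for the CoolRisc 88 and 816) comes almost entirely from CPI, since the executed instruction counts are all within a factor of about two.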
  • 29. Number of instructions and executed clock cycles: CoolRisc 88 versus PIC 16C5x
      Routine | CoolRisc 88 code (instr.) | CoolRisc 88 executed instr. | CoolRisc 88 clocks | PIC 16C5x code (instr.) | PIC 16C5x executed instr. | PIC 16C5x clocks
      8-bit multiply, linear | 30 | 30 | 30 | 35 | 37 | 148
      8-bit multiply, looped | 14 | 56 | 56 | 16 | 71 | 284
      16-bit multiply, linear | 127 | 127 | 128 | 240 | 233 | 932
      16-bit multiply, looped | 31 | 170 | 170 | 33 | 333 | 1332
      16-bit division, linear | 194 | 162 | 162 | 243 | 180 | 760
      16-bit division, looped | 36 | 213 | 213 | 27 | 227 | 1108
  • 30. Wisenet Chip uses the CoolRISC
  • 31. Conclusion about Watch Microprocessors • Electronic watches had a huge impact on the development of microelectronics in Switzerland • One can say the same of the development of microprocessors in Switzerland • This history shows quite well that the first Swiss microcontrollers were designed for electronic watches, before being used for other applications requiring low power consumption • What is the largest unused computation power in the world? The answer of D. Lando, Lucent Technologies: it is all the electronic watches in the world!
  • 32. Low-Power DSP/MCU Cores C. Piguet CSEM Centre Suisse d’Electronique et Microtechnique SA
  • 33. Digital design • CSEM has a long history of designing low-power processors • CoolRISC, licensed by Semtech, Swatch group, TI, ... • Watch processors: PUNCH (1993), µPUS, Combo (1982), ... • Powerful new processors with ultra low power consumption • 2005: Macgic, a 16/24-bit DSP (4 MAC) • 2006: icyflex1 , a flexible processor for DSP/control applications • 2009: icyflex2, a smaller processor for control applications • 2009: icyflex4, a scalable processor for DSP/control applications Macgic and icyflex are registered trademarks of CSEM CSEM DSP/MCU jan 2009 | C. Piguet | Page
  • 34. Digital design – low-power processors • Customizable (in VHDL) • Configurable (at run-time) • Macgic 16/24-bit DSP: complex datapath (quad MAC); very high parallelism (1.5k cycles for a 256-point FFT); assembler, debugger; 170 µW/MHz at 1.0 V in 180 nm; 150’000 equiv. NAND gates, 2.1 mm2 in 180 nm • icyflex1 flexible 32-bit processor: includes DSP functions (dual MAC); high parallelism (e.g. 2.6k cycles for a 256-point FFT); C compiler (gcc), debugger (gdb), ...; 120 µW/MHz at 1.0 V in 180 nm; 110’000 equiv. NAND gates, 1.6 mm2 in 180 nm
  • 35. Ongoing processor development • icyflex2: 50% less area than icyflex1 (DSP features removed); higher frequency (longer pipeline); lower power consumption for control-type applications; optimized for the C compiler • icyflex4: scalable architecture for much higher throughput; higher frequency (longer pipeline); lower power consumption for DSP/control-type applications
  • 36. Processor positioning • [Figure: icyflex2, icyflex1, Macgic and icyflex4 positioned between control and DSP applications and by computing power, on a scale of 1 MUL, 2 MAC, 4 MAC, … 36 MAC]
  • 37. Processor roadmap • [Figure: timeline from 2005 to 2010 showing the development phase, first silicon (1st Si) and first production (1st prod) of Macgic, icyflex1, icyflex2 and icyflex4, including a first production at Abilis]
  • 38. icyflex processor family overview
      Name | MAC (MUL) | C compiler | Processor | Pipeline length | Instruction width | Status
      Macgic | 4 | - | DSP | 3 | 32 | Prod.
      icyflex1 | 2 | Yes | DSP/MCU | 3 | 32 | Prod.
      icyflex2 | (1) | Yes | MCU | 5 | 32 | Dev.
      icyflex4 | 4 + 4*N | Yes | DSP/MCU | 5-7 | 64 | Dev.
  • 39. MACGIC: Mobile TV Chip (DVB-T/H) by Abilis
  • 40. This chip contains three MACGIC cores • Abilis's goal: to become the world's leading supplier of semiconductor solutions for multimode digital TV reception and broadband wireless connectivity in mobile terminals • myTV
  • 41. World's first single-die DVB demodulator • Abilis Systems (Kudelski group), Switzerland • World's first single-die programmable DVB-T/H demodulator, Aug 2007 • Unique Software Defined Radio architecture • Manufactured by IBM, world leader in 90 nm RF-CMOS technology • Ultra-low-power DSP technology by CSEM • Multi-band silicon tuner • World’s smallest DVB-T/H receiver: 5 x 5 mm • Performance: dynamic echo handling for best indoor/mobile reception; adaptive demodulation; meets MBRAI, exceeds NorDig 1.0.3; up to -100 dBm sensitivity (8k, 8MHz, QPSK, ½)
  • 42. World's first single-die DVB demodulator (cont’d) • [Figure: chip block diagram: RF receivers (DVB-T/H and S-band) with RF tuner, reconfigurable AD/DA converters, a programmable OFDM engine built from three Macgic DSPs and hardware accelerators (including Cordic) for channel estimation, correction and decoding, and an MCU subsystem (RISC core) for the link layer delivering the encoded MPEG stream to the host; DVB-T/H, WiFi, WiMAX and other standards are shown as targets]
  • 43. Overview of the icyflex1 processor
  • 44. Digital design – icyflex1 architecture • Optimized for minimal power consumption: 32-bit instructions; 3-stage pipeline; load/store RISC architecture; configurable instructions
  • 45. High level of parallelism with 32-bit instruction words • Datapath: up to 10 simple operations in parallel (2×MUL, 2×ACC, …) • Load/store: up to 6 operations in parallel (2 loads of 2 data words each, parallel store, address generation, …) • Total, datapath plus load/store: up to 16 operations executed in parallel in a single 32-bit instruction
  • 46. icyflex instruction set and addressing modes • Hardware loop and repeat instructions • Standard: MUL, ADD, MAC, CMP, MAX, AND, … • SIMD (Single Instruction Multiple Data): ADD2, MUL2, MAC2, …, e.g. 2 independent fixed-point MACs in parallel • Instructions and addressing modes to support a C compiler • Configurable instructions • Addressing modes for DSP-type processing and for a C compiler • A large variety of addressing modes, ranging from the basic ones (indirect, ±1, ±offset, modulo) to very complex, configurable ones, for instance: an <= (an + om + 8 × OFFA) % mp
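The configurable addressing mode quoted on this slide, an <= (an + om + 8 × OFFA) % mp, can be illustrated with a short Python sketch. The reading of the symbols is an assumption made for this example only: `an` is taken to be an address register, `om` an offset register, `offa` a small offset field scaled by 8, and `mp` a modulo register giving the length of a circular buffer; the slides do not define them.

```python
def next_address(an: int, om: int, offa: int, mp: int) -> int:
    """Post-modify an address register as an <= (an + om + 8 * offa) % mp.

    Assumed meaning (not confirmed by the slides): an = address register,
    om = offset register, offa = offset field, mp = modulo (buffer length).
    """
    if mp <= 0:
        raise ValueError("mp must be positive")
    return (an + om + 8 * offa) % mp

if __name__ == "__main__":
    # Walk a 16-entry circular buffer in steps of 3, starting at address 8.
    address = 8
    for _ in range(6):
        print(address)
        address = next_address(address, om=3, offa=0, mp=16)
```

Folding the wrap-around into the address update is what lets a circular delay line, as used in FIR filtering, advance without explicit compare-and-reset instructions.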
  • 47. Performance: benchmarks of the icyflex processor
    Algorithm on the icyflex processor | Clock cycles
    Sum of a vector of N values | ~ N/2
    Addition or multiplication of 2 vectors of N values | ~ N
    Norm/mean/standard deviation/clipping of a vector | ~ N/2
    Minimum/maximum of a vector | ~ N/2
    Multiplication of 2 matrices of N×M values | ~ (N × M) × (N/2 + 2)
    Matrix transposition | ~ N × M × (5/8)
    FIR filter/convolution | ~ ½ per tap
    FIR filter/convolution, complex data | ~ 2 per tap
    IIR filter (biquad) | ~ 2 per tap
    Complex FFT of N = 64 values | ~ 440
    Complex FFT of N = 256 values | ~ 2.6 k
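    The approximate cycle counts on this slide can be turned into a quick cycle budget. The sketch below is a rough estimator under stated assumptions: the formulas are applied as-is for any N and M, "per tap" is read as cycles per tap per output sample, loop setup overhead is ignored, and the function names and the 50 MHz clock in the example are illustrative choices rather than anything specified for the benchmark.

```python
def vector_sum_cycles(n: int) -> float:
    """Sum of a vector of N values: about N/2 cycles."""
    return n / 2


def matrix_multiply_cycles(n: int, m: int) -> float:
    """Multiplication of two N x M matrices: about (N*M) * (N/2 + 2) cycles."""
    return (n * m) * (n / 2 + 2)


def fir_cycles(taps: int, samples: int, complex_data: bool = False) -> float:
    """FIR filter: about 1/2 cycle per tap (2 per tap for complex data),
    assumed here to be per output sample."""
    per_tap = 2.0 if complex_data else 0.5
    return per_tap * taps * samples


def time_us(cycles: float, clock_mhz: float) -> float:
    """Convert a cycle count to microseconds at the given clock frequency."""
    return cycles / clock_mhz


if __name__ == "__main__":
    # A 64-tap real FIR over 1000 samples, then a 256-point complex FFT
    # (about 2600 cycles per the table), both at an assumed 50 MHz clock.
    print(time_us(fir_cycles(64, 1000), 50.0))
    print(time_us(2600, 50.0))
```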
  • 48. Performance: Comparison with other DSPs
    Company / Processor | FIR filter (clock cycles per tap) | Complex FFT, 256 points (clock cycles)
    CSEM / Macgic Audio-I | ~1/4 | 1.5 k
    CSEM / icyflex | ~1/2 | 2.6 k
    Analog Devices / Blackfin BF531 | ~1/2 | 3.2 k
    Texas Instruments / TMS320VC5501 | ~1/2 | 5.5 k
    Philips / CoolFlux DSP | ~1/2 | 5.5 k
    Analog Devices / ADSP2191M | ~1 | 7.4 k
    Motorola / M56F8323 | ~1 | 12 k
    MicroChip / dsPIC30 | ~1 | ~19 k
    Texas Instruments / MSP430F14x | ~28 | ~53 k
    CoolRISC 8-bit | - | ~60 k
    MicroChip / PIC18F4220 | ~160 | 3.2 M
  • 49. Processor designs optimized for energy efficiency
    Features | Starcore | Macgic | icyflex1 | CoolFlux
    Bits per instruction | 128-bit | 32-bit | 32-bit | 32-bit
    Data word width | 16-bit | 24-bit | 32-bit | 24-bit
    Number of MAC | 4 | 4 | 2 | 2
    Memory transfers | 8 | 8 | 4 | 2
    Operations per cycle | 32 | 32 | 16 | 8
    Number of equivalent NAND gates | 600k | 150k | 115k | 45k
    Clock cycles for FFT 256 | ** 1’614 | 1’410 | * 2’600 | * 5’500
    Average power per MHz @ 1 V | * 350 µW | 170 µW | * 115 µW | * 75 µW
    Power per MHz @ 1 V for FFT | * 600 µW | 300 µW | * 200 µW | * 130 µW
    Normalized energy for FFT @ 1 V | 2.3 | 1 | 1.2 | 1.7
    ** single precision
    * estimated
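    The "Normalized energy for FFT" row appears to follow from two other rows of this slide: clock cycles for the 256-point FFT multiplied by the power per MHz during the FFT, divided by the Macgic result. That reading is an assumption, but it reproduces the slide's values. Since 1 µW/MHz equals 1 pJ per clock cycle, the product is directly an energy in picojoules. The names below are illustrative.

```python
# (clock cycles for FFT 256, power per MHz @ 1 V for FFT in uW/MHz),
# copied from the slide; several of these figures are marked "estimated".
PROCESSORS = {
    "Starcore": (1614, 600),
    "Macgic": (1410, 300),
    "icyflex1": (2600, 200),
    "CoolFlux": (5500, 130),
}


def fft_energy_pj(cycles: int, uw_per_mhz: float) -> float:
    """Energy of one FFT in pJ: cycles * (uW/MHz), since 1 uW/MHz = 1 pJ/cycle."""
    return cycles * uw_per_mhz


def normalized_energy(reference: str = "Macgic") -> dict:
    """Energy of each processor relative to the reference, rounded to 1 decimal."""
    ref = fft_energy_pj(*PROCESSORS[reference])
    return {
        name: round(fft_energy_pj(*values) / ref, 1)
        for name, values in PROCESSORS.items()
    }


if __name__ == "__main__":
    for name, ratio in normalized_energy().items():
        print(f"{name}: {ratio}")
```

    Under this reading, a processor with more parallel units can spend more power per cycle and still come out ahead on energy, because it finishes the FFT in fewer cycles.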
  • 50. Silicon area of the processor core
    Processor | Equiv. NAND gates | 0.18 µm process | 0.13 µm process | 0.09 µm process
    icyflex 32-bit (*) | 110’000 | 1.75 mm² | 0.70 mm² | 0.34 mm²
    Macgic Audio-I 24-bit | 150’000 | 2.1 mm² | 0.85 mm² | 0.41 mm²
    (*) using CSEM’s thick-gate standard cell library
    Silicon area is dominated by memories in most applications, or by analog / RF blocks in very deep submicron processes.
  • 51. Software development tools
    • GNU C compiler (gcc)
      • Software implementation of the IEEE floating-point standard
      • icyflex instruction parallelism supported by the latest releases of gcc
      • Passes the whole GNU test suite at all optimization levels
    • GNU assembler / linker (binutils)
      • BFD / ELF32 object file format
      • Binary, SREC, IHEX memory image file formats
    • icyflex instruction set simulator (ISS), written in C++
      • Phase-accurate, pipelined
      • Wrappers to SystemC, VHDL (Modelsim), Matlab/Simulink
    • GNU debugger (gdb)
      • Mode 1: instruction set simulator of the icyflex core
      • Mode 2: On-Chip Debug (OCD) through a JTAG interface
    • Eclipse integrated development environment
      • CDT C/C++ IDE plug-in
      • icyflex plug-in
    • Library of general-purpose and DSP subroutines, optimized for a minimal number of instructions
  • 52. icyflex1 toolchain
    [Toolchain flow diagram; the legend distinguishes GNU-based tools from non-GNU tools and libraries.]
    • Tools: icfx-gcc, icfx-dasp, icfx-as, icfx-ld, icfx-ar/icfx-ranlib, icfx-run, icfx-gdb, icfx-objdump, icfx-run_srec, icfx-objcopy, icfx-stim_mk2, icfx-tv_exec
    • Libraries: libsim.a, libsimicfx.a
    • File types: C code (.c, .h), assembly (.asm, .i, .s), object (.o), library (.a), ELF (.bin), hex file (.srec, .ihex), listing (.lst), data (.ext), test vectors (.idx, .vhd), waveform (.vcd)
  • 53. Overview of the icyflex2 processor
  • 54. icyflex2: a trimmed-down processor for control apps
    [Block diagram of the icyflex2 core. Labels:]
    • Program Sequencing Unit: PC, branch, flag, exception, instruction exec/xfer, HW loop, HW stack
    • Data Move Unit: GP registers r0 to r7, X AGU and Y AGU with their pointer, modulo and configuration registers (px, mx, cx, py, my, cy)
    • Data Processing Unit: 2 ALU, 2 multipliers, accumulate datapath & registers, micro-operation datapath & registers, coprocessor registers
    • Host and Debug Unit (host side and core side): step, host register access, debug engine, config/status, P break, X break, Y break
  • 55. Area comparison: icyflex1 vs icyflex2
  • 56. Overview of the icyflex4 processor
  • 57. icyflex4: a scalable processor architecture
    • Modular architecture
      • Four processing units
      • Scalable
    • PSU: Program Sequencing Unit
      • 8 vectorized interrupts
      • 64-bit instruction bundles
    • DMU: Data Move & Processing Unit
      • 16 GP + 16 index registers
      • 4 MUL, 4 ALU, 4 SHIFT
    • VPU: Vector Processing Unit
      • 1, 2, 4 or 8 VPU slices
    • HDU: Host and Debug Unit
      • Hardware breakpoint engines
  • 58. Examples of icyflex-based System-on-Chip
  • 59. icyfirst: first integration of icyflex1 in Nov 2006
    • This integration targets very low leakage for applications requiring limited processing power
    • Standard cell library: thick gates for leakage reduction by 800x
    • Measured speed: 400 kHz @ 1.1 V
    • Avg dyn power: 140 µW/MHz @ 1.1 V
    • Core: 115k eq. gates, 1.75 mm²
    • Peripherals: 110k eq. gates, 1.5 mm²
    • Memory: 10.7 mm²
    • TSMC 180 nm generic CMOS
    [Chip photo label: SRAM, 2 kiWords of 32-bit]
  • 60. icyfirst: first integration of icyflex1 in Nov 2006 (cont’d)
    • icyflex1 core
    • 128 KiBytes SRAM
    • Clock generator
    • Voltage regulator
    • POR, watchdog, timers
    • Request controller
    • DMA and bus controllers
    • 2 x 16 bit GPIO
    • 2 x I2C
    • 2 x SPI
    • 2 x I2S
    • JTAG controller
  • 61. icycam: a System-on-Chip for vision applications
    • icyflex1 runs at up to 50 MHz
    • QVGA CMOS pixel array (320 x 240)
      • 14 µm pixel pitch
      • logarithmic encoding of luminance
      • close to 7 decades of intra-scene dynamic range encoded on 10 bits
    • graphical coprocessor
    • SRAM: 128 KiBytes
    • DMA, SPI, PPI, GPIO, UART, SDRAM, JTAG
    • Tower Semiconductor, 180 nm, CIS
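    To see what "close to 7 decades encoded on 10 bits" implies, the sketch below maps a luminance value to a 10-bit code with a plain base-10 logarithm. This is an idealized model and not the actual icycam pixel response: the reference level l_min, the exact 7.0-decade span, the rounding and the function names are all assumptions made for illustration.

```python
import math


def encode_log(luminance: float, l_min: float = 1.0,
               decades: float = 7.0, bits: int = 10) -> int:
    """Map luminance in [l_min, l_min * 10**decades] to an integer code.

    Values outside the range are clamped. Idealized model only.
    """
    max_code = 2 ** bits - 1
    ratio = min(max(luminance / l_min, 1.0), 10.0 ** decades)
    return round(math.log10(ratio) / decades * max_code)


def relative_step(decades: float = 7.0, bits: int = 10) -> float:
    """Relative luminance change represented by one code step."""
    return 10.0 ** (decades / (2 ** bits - 1)) - 1.0


if __name__ == "__main__":
    for lum in (1.0, 1e3, 1e7):
        print(lum, encode_log(lum))
    print(f"one code step is about {relative_step() * 100:.2f} % in luminance")
```

    In this model each code step corresponds to a constant relative change in luminance of roughly 1.6 %, so dark and bright regions of the same scene are resolved with similar contrast resolution, which a linear 10-bit encoding could not do over 7 decades.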
  • 62. icycom: a System-on-Chip for RF applications
    • icyflex1 runs at up to 3.2 MHz
    • RF: 865 to 915 MHz, FSK (incl. MSK, GFSK), 4FSK, OOK, OQPSK
      • TX: 10 dBm
      • RX: -105 dBm at 200 kb/s (BER = 10^-3)
    • Power management
      • Power supplies for external devices
      • Low power modes: multiple standby modes
    • 10 bit ADC
    • SRAM: 64 KiBytes (with MBIST)
    • DMA, RTC, Timers, Watchdog, I2C, SPI, I2S, GPIO, UART, JTAG
    • TSMC, 180 nm, generic
  • 63. icycom: a System-on-Chip for RF applications (cont’d)
    [Block diagram of the icycom chip. Labels: power management (In: 1.0 to 1.8 V or 2.2 to 3.6 V; Out: Vin, 2.7 V, 1.2 to Vin - 0.1), icyflex1, program & data memory, A/D, interfaces, RF, IO supply, IO; off-chip: external component, EEPROM.]
  • 64. References
    • C. Piguet, "Binary-decision and RISC-like machines for semicustom design", Microprocessors and Microsystems, Vol. 14, No. 4, May 1990, pp. 231-240.
    • J.-F. Perotto, C. Lamothe, C. Arm, C. Piguet, E. Dijkstra, S. Fink, E. Sanchez, J.-P. Wattenhofer, M. Cecchini, "An 8-bit Multitask Micropower RISC Core", IEEE JSSC, Vol. 29, No. 8, August 1994, pp. 986-991.
    • C. Piguet, J.-M. Masgonty, C. Arm, S. Durand, T. Schneider, F. Rampogna, C. Scarnera, C. Iseli, J.-P. Bardyn, R. Pache, E. Dijkstra, "Low-Power Design of 8-bit Embedded CoolRISC Microcontroller Cores", IEEE JSSC, Vol. 32, No. 7, July 1997, pp. 1067-1078.
    • C. Piguet, "The First Quartz Electronic Watch", invited talk at PATMOS, Sevilla, Spain, September 11-13, 2002.
    • C. Arm, J.-M. Masgonty, M. Morgan, C. Piguet, P.-D. Pfister, F. Rampogna, P. Volet, "Low-Power Quad MAC 170 µW/MHz 1.0 V MACGIC DSP Core", ESSCIRC 2006, Sept. 19-22, 2006, Montreux, Switzerland.
  • 65. References (cont’d)
    • C. Arm, S. Gyger, J.-M. Masgonty, M. Morgan, J.-L. Nagel, C. Piguet, F. Rampogna, P. Volet, "Low-Power 32-bit Dual-MAC 120 µW/MHz 1.0 V icyflex DSP/MCU Core", ESSCIRC 2008, Sept. 15-19, 2008, Edinburgh, Scotland, U.K.
    • C. Piguet, "History of the Development of Swiss Watch Microprocessors", IEEE SSCS News, Summer 2008, Vol. 13, No. 3, pp. 50-55.
    • C. Piguet, J.-L. Nagel, V. Peiris, S. Gyger, D. Séverac, M. Morgan, J.-M. Masgonty, "Low-Power Heterogeneous Systems-on-Chips", Journal of Low Power Electronics (JOLPE), Vol. 4, No. 2, pp. 111-126, August 2008.
  • 66. Thank you for your attention.