Prepared and Presented By:
Hossam Hassan
MSIS LAB, CBNU
An Ultra-Low Power Asynchronous-Logic In-Situ Self-Adaptive VDD System for Wireless Sensor Networks
Authors: Tong Lin, Kwen-Siong Chong, Joseph S. Chang, and Bah-Hwee Gwee
Journal: IEEE Journal of Solid-State Circuits, vol. 48, no. 2, 2013
Outline
• Preliminaries
• Wireless Sensor Network
• Node Architecture
• Proposed Idea for Low Power Design
• Self-Adaptive VDD System for Wireless Sensor Networks
• Adaptive Vdd Scaling Systems
• System Design
• Results And Benchmarking
Preliminaries
• What Is Asynchronous Logic?
• Traditional way of Sequencing and Computation is the use of a
global time reference (“the clock”)
• Can we compute without a clock?
• Yes!: “asynchronous” or “clockless” logic
• Also “self-timed” or “speed-independent”
• Asynchronous system: collection of modules communicating
by handshake protocols
• Can we compute without a clock and without delay
assumptions?
• Quasi-delay-insensitive (QDI) logic
Adopted from:
Alain J. Martin, California Institute of Technology
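The handshake idea can be sketched in a few lines. The slides do not say which protocol is meant, so the four-phase (return-to-zero) Request/Acknowledge sequence below is an assumption chosen for illustration, and the function name is hypothetical.

```python
# Four-phase (return-to-zero) handshake between a sender and a receiver.
# Assumption: the protocol variant is chosen for illustration only.

def four_phase_transfer(data):
    """Return the received value and the ordered (req, ack) states seen."""
    trace = []
    req, ack = 0, 0
    trace.append((req, ack))   # idle
    req = 1                    # sender: data is valid, raise Request
    trace.append((req, ack))
    received = data            # receiver takes the data
    ack = 1                    # receiver: raise Acknowledge
    trace.append((req, ack))
    req = 0                    # sender: withdraw Request
    trace.append((req, ack))
    ack = 0                    # receiver: withdraw Acknowledge, back to idle
    trace.append((req, ack))
    return received, trace


value, trace = four_phase_transfer(0b1011)
print(value, trace)
```

No clock appears anywhere: each step happens only after the previous signal change has been observed.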
Preliminaries
• Why Asynchronous and QDI Logic?
• No clock
• Up to 50% of clock power recovered
• Automatic shut-off of idle parts
• Perfect clock gating
• No glitches (spurious transitions)
• Up to 50% of power in combinational circuits
• Automatic adaptation to parameter variations
• Voltage scaling: Perfect exchange of delay against energy through voltage scaling
• Flexibility of asynchronous interfaces:
• Better use of concurrency
• Robustness to PVT Variations: Variations of physical parameters all affect timing.
Adopted from:
Alain J. Martin, California Institute of Technology
Preliminaries
• Disadvantages of Async
• Size overhead (more transistors) (i.e. Handshaking)
• Poorly understood and rarely taught
• No industrial CAD tools (yet) (i.e. Custom Design)
• No well-developed testing procedure (yet) (i.e. Custom Design)
Preliminaries
• Static Logic vs Dynamic Logic
Preliminaries
• NULL Convention Logic
• NCL is a delay-insensitive (DI) asynchronous (i.e. clockless) paradigm, which means that NCL
circuits will operate correctly regardless of when circuit inputs become available; therefore
NCL circuits are said to be correct-by-construction (i.e. no timing analysis is necessary for
correct operation). NCL circuits utilize dual-rail or quad-rail logic to achieve delay-insensitivity.
Preliminaries
• Pre-Charge Static Logic (PCSL):
• It is an asynchronous-logic Quasi-Delay-Insensitive architecture
based on Static-Logic, featuring full-range Dynamic Voltage Scaling,
including robust operation in the sub-threshold voltage regime,
with simultaneously low hardware overheads, high speed and
low power dissipation.
• The PCSL logic circuit achieves this by integration of the Request
sub-circuit into the Static-Logic cell.
• During the initial phase, the output of Static-Logic cell (within the
PCSL logic circuit) is pre-charged.
• During the evaluate phase, the Static-Logic cell computes the input
and the PCSL logic circuit outputs the computation.
[Figure: Pre-Charged Static-Logic (PCSL) architecture, with annotations "enable the circuit" and "state retention (i.e. store the logic output value)".]
Preliminaries
• Muller C-elements:
• It is a small digital block widely used in design of asynchronous circuits and systems.
• In a synchronous circuit, the role of the clock is to define points in time where signals are stable and valid. Between
clock ticks, signals may exhibit hazards and may make multiple transitions as the combinational circuit
stabilizes.
• In an asynchronous system, the situation is different. With no clock, signals must be valid at all times: every
transition has a meaning, and consequently hazards and races must be avoided.
[Figure: Muller C-element and corresponding CMOS implementation; truth table for the Muller C-element.]
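The behaviour of a two-input Muller C-element can be sketched as follows: the output follows the inputs when they agree and holds its previous value when they differ. The class name and the initial output value are assumptions for illustration.

```python
class CElement:
    """Behavioural model of a two-input Muller C-element."""

    def __init__(self, initial=0):
        self.out = initial   # assumed reset value

    def update(self, a, b):
        if a == b:           # inputs agree: output follows them
            self.out = a
        return self.out      # inputs differ: output holds its state


c = CElement()
outputs = [c.update(a, b) for a, b in [(0, 1), (1, 1), (1, 0), (0, 0)]]
print(outputs)  # [0, 1, 1, 0]
```

The hold state is what lets the element wait for all of its inputs before signalling.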
Preliminaries
• Filter bank
• In signal processing, a filter bank is an array of band-pass filters that separates the input
signal into multiple components, each one carrying a single frequency sub-band of the
original signal.
• The process of decomposition performed by the filter bank is called analysis (meaning
analysis of the signal in terms of its components in each sub-band); the output of analysis is
referred to as a sub-band signal with as many sub-bands as there are filters in the filter bank.
• The reconstruction process is called synthesis, meaning reconstitution of a complete signal
resulting from the filtering process.
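Analysis and synthesis can be illustrated with the simplest two-channel case. The averaging/differencing (Haar) split below is an assumption chosen for brevity; it is not the FRM filter bank of the paper, and the function names are hypothetical.

```python
def analysis(x):
    """Split an even-length signal into low and high sub-band signals."""
    low = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x), 2)]
    return low, high


def synthesis(low, high):
    """Rebuild the full signal from its two sub-band signals."""
    x = []
    for l, h in zip(low, high):
        x.extend([l + h, l - h])
    return x


signal = [4.0, 2.0, 5.0, 5.0, 1.0, 3.0]
low, high = analysis(signal)
print(low, high)
print(synthesis(low, high) == signal)
```

The low band carries the local average and the high band the local difference; together they reconstruct the input exactly.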
Preliminaries
• Frequency Response Masking (FRM):
• Frequency-response masking is a technique to design sharp low-pass, high-pass,
bandpass and band-stop filters with arbitrary passband bandwidth.
• Furthermore, linear-phase FIR filters are generated, which have advantages such as
guaranteed stability and freedom from phase distortion.
• However, the problem with FIR filters is their high complexity for sharp filters.
• With the frequency-response masking technique, the resulting filter has very sparse
coefficients.
• Since only a very small fraction of its coefficient values are nonzero, its complexity is very
much lower than that of the infinite word-length minimax optimum filter.
• With an additional multiplier-less design method, the complexity is reduced further.
• In linear-phase FIR filters, phase is a linear function of frequency.
• They have a symmetric impulse response.
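The sparsity and symmetry points can be sketched with an interpolated prototype filter: replacing each delay by M delays inserts M-1 zeros between taps, which sharpens the transition band without adding multiplications. The prototype coefficients and the factor M = 4 below are hypothetical values, not taken from the paper.

```python
def interpolate_coeffs(h, m):
    """Insert m-1 zeros between taps, i.e. H(z) -> H(z**m)."""
    out = []
    for i, tap in enumerate(h):
        out.append(tap)
        if i < len(h) - 1:
            out.extend([0.0] * (m - 1))
    return out


def is_symmetric(h):
    """A symmetric impulse response gives a linear-phase FIR filter."""
    return h == h[::-1]


prototype = [0.1, 0.25, 0.3, 0.25, 0.1]   # hypothetical symmetric prototype
sparse = interpolate_coeffs(prototype, 4)
nonzero = sum(1 for c in sparse if c != 0.0)
print(len(sparse), nonzero, is_symmetric(sparse))  # 17 5 True
```

The filter grows from 5 to 17 taps but still needs only 5 multiplications per output, and it keeps its symmetry.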
Preliminaries
• Dynamic frequency scaling
• It is a technique in computer architecture whereby the frequency of a microprocessor can be
automatically adjusted "on the fly", either to conserve power or to reduce the amount of
heat generated by the chip.
• It is commonly used in laptops and other mobile devices, where energy comes from a battery
and thus is limited.
• Dynamic voltage scaling:
• It is another power conservation technique that is often used in conjunction with frequency
scaling, as the frequency that a chip may run at is related to the operating voltage.
• Raising the voltage or frequency raises power use, which may in turn raise the temperature
and hence increase system power demands further.
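Why voltage and frequency are scaled together can be seen from the usual first-order CMOS switching-power relation, P = activity x C x VDD^2 x f. The sketch below covers switching power only (leakage is ignored), and the capacitance, voltages and frequencies are hypothetical values.

```python
def dynamic_power(c_eff, vdd, freq, activity=1.0):
    """First-order CMOS switching power in watts; leakage is not modelled."""
    return activity * c_eff * vdd ** 2 * freq


base = dynamic_power(c_eff=1e-12, vdd=1.2, freq=1e6)      # hypothetical operating point
scaled = dynamic_power(c_eff=1e-12, vdd=0.6, freq=0.5e6)  # half the voltage and frequency
ratio = scaled / base
print(ratio)
```

Halving both VDD and frequency cuts switching power to roughly one eighth, because the voltage enters quadratically.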
Preliminaries
Impact of DVS
Wireless Sensor Network
• Spatially distributed autonomous sensors
• Monitor physical or environmental conditions
• Temperature, sound, etc.
• Pass their data through the network to a main location
• Modern networks are bi-directional, also enabling control of sensor activity
• Applications
• Battlefield surveillance
• Industrial process monitoring
Wireless Sensor Network
• The WSN is built of "nodes"
• a few to several hundreds or even thousands
• each node is connected to one (or sometimes several) sensors
• Each such sensor network node has typically several parts
• a radio transceiver
• a microcontroller
• an electronic circuit for interfacing with the sensors
• an energy source, usually a battery
• As the WSN is typically designed for a multiple-year operational life-span,
power is carefully budgeted and, where pertinent, modules are energized only when
required, such that the overall average power is typically 10–100 µW.
• Goal: achieve the lowest possible power operation for the prevailing throughput
and circuit conditions (VDD adjusted to within 50 mV of the minimum
voltage), yet with high operational robustness and minimal overheads for a WSN.
Node Architecture
Proposed Idea for Low Power Design
• Signal processor accounts for ~50% of total power
consumption
• ‘Sub-threshold Self-Adaptive VDD Scaling’ (SSAVS)
• Circuits work in sub-threshold region
• Supply voltage is adjusted dynamically depending on
the processing speed required by external environment
• Adopting the Quasi-Delay-Insensitive (QDI)
asynchronous-logic protocols where the circuits
therein are self-timed,
• Embodiment of Subthreshold Pre-Charged-Static-
Logic (PCSL) design approach.
• Async SSAVS system has been benchmarked against
its conventional sync DVFS system counterpart.
Proposed Idea for Low Power Design
• Asynchronous logic implementation
• Pre-charged Static Logic (PCSL)
• Superior to existing asynchronous-logic approaches in energy, delay and chip area.
Self-Adaptive VDD System for Wireless
Sensor Networks
• As the WSN is typically designed for a multiple-year operational life-span, power is carefully
budgeted and, where pertinent, modules are energized only when required, such that the overall average
power is typically 10–100 µW.
• In our WSN depicted in Fig. 1, its overall active/passive operation ratio is approximately 20/80. In
the passive mode, only the Sensor Front-End module is continuously energized. The Sensor and
the Conditioning Circuits therein are powered directly by the VDD_BAT (2.8 V) battery, via a
Low-Dropout (LDO) Regulator.
• The Simple Processor is powered by VDD_NOM (1.2 V) via a power-efficient Buck DC-DC
Converter.
• The Simple Processor ascertains if the input is possibly useful, and if it is, the WSN goes into
active mode where it signals the Power Management module to energize the Signal Processor
module via VDD_ADJ .
Self-Adaptive VDD System for Wireless
Sensor Networks
• The voltage of VDD_ADJ, typically in the sub-threshold voltage (sub-Vt) range, is self-adjusted
such that the lowest possible voltage is used—to enable ultra-low power operation.
• Signal Processor Module:
• The Signal Processor module buffers (via a FIFO) the output of the Simple Processor and filters
it before final computation by the Microcontroller Unit (MCU).
• When the MCU ascertains that the filtered signal is useful, the Wireless Transceiver is energized and the
processed signal is subsequently transmitted wirelessly.
• With the wireless transmission expected to be 0.01% active and with a 20/80 WSN active/passive
operation, 50% of the overall power is attributed to the Signal Processor module, which is of interest in
terms of power dissipation.
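The way duty cycles shape the power budget can be sketched as a weighted sum. Every power figure below is hypothetical and chosen only to make the arithmetic easy to read; only the 20/80 active/passive ratio and the 0.01% transmission activity come from the slide.

```python
def average_power(blocks):
    """blocks maps name -> (power when on in uW, fraction of time on)."""
    return {name: p_on * duty for name, (p_on, duty) in blocks.items()}


# Hypothetical per-module figures (NOT from the paper).
blocks = {
    "sensor_front_end": (10.0, 1.0),        # always on
    "signal_processor": (60.0, 0.20),       # on during the 20% active time
    "transceiver": (20000.0, 0.0001),       # 0.01% active
}

avg = average_power(blocks)
total = sum(avg.values())
share = avg["signal_processor"] / total
print(avg, total, share)
```

With these made-up inputs the Signal Processor is the largest contributor, which is the kind of situation that makes it the natural target for power reduction.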
Self-Adaptive VDD System for Wireless
Sensor Networks
• The approaches taken to minimize power involve all levels of the design space including
algorithmic design and at the hardware level.
• Frequency Response Masking (FRM) technique
• In the algorithmic design, the filtering in the Signal Processor module embodies the Frequency
Response Masking (FRM) technique.
• This involves the Interpolated Finite Impulse Response (IFIR) Filter and the FRM Filter Bank (FB), and is
computationally more efficient than the usual FIR and IIR filter approaches.
• At the hardware level, among ultra-low power design techniques, operation in the sub-Vt region is one
of the most effective.
• This is particularly applicable because the speed of the digital circuits in the Signal Processor is
modest—the clocking speed ranges from 1.4 kHz to 1.4 MHz for a sampling rate range from 0.1
kSamples/s (kS/s) to 100 kS/s.
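A quick check of the quoted ranges, assuming the endpoints pair up (lowest clock with lowest sampling rate, highest with highest), shows the same number of clock cycles per sample at both ends. The function name is hypothetical.

```python
def clocks_per_sample(clock_hz, sample_rate_hz):
    """Clock cycles available to process one input sample."""
    return clock_hz / sample_rate_hz


low_end = clocks_per_sample(1_400, 100)             # 1.4 kHz at 0.1 kS/s
high_end = clocks_per_sample(1_400_000, 100_000)    # 1.4 MHz at 100 kS/s
print(low_end, high_end)  # 14.0 14.0
```

Under that pairing assumption, the processing budget is a fixed 14 cycles per sample, so the clock simply tracks the sampling rate.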
Self-Adaptive VDD System for Wireless
Sensor Networks
• Despite the potential advantages of sub-Vt operation, this region of operation is challenging here
for several reasons.
• First, the WSN is designed to work in a wide range of conditions, including extreme environments
(−55 °C to +125 °C).
• Second, Process, Voltage and Temperature (PVT) variations for fine-dimensioned CMOS processes
increase dramatically in sub-Vt operation, and the ensuing delay variations are very severe, possibly
intractable. Typically, a very large delay safety margin (for synchronous-logic (sync) circuits) would need
to be allowed for.
• Third, the input signal to the Signal Processor module is variable. From a robust operation perspective,
the circuits would need to be designed to meet the worst-case conditions— the fastest input rate and
extreme temperatures.
• To design the WSN for ultra-low power operation, a self-adjusting VDD approach operating
in the sub-Vt region is adopted. It is termed ‘Sub-threshold Self-Adaptive VDD Scaling’ (SSAVS),
and VDD is dynamically self-adjusted in situ.
Self-Adaptive VDD System for Wireless
Sensor Networks
• The operation involves ‘dialing up’ VDD when the need for computation increases or when the
operating conditions are less favorable, and VDD is ‘dialed-down’ when the conditions are the
converse.
• Put simply, the lowest VDD is used where possible because in general the lower the VDD, the lower is
the power dissipation due to dynamic and leakage currents.
• The novel self-adjustment is obtained very simply—by exploiting (and comparing) the existing
Request and Acknowledge signals of the QDI protocol signaling, and thereafter adjusting the
VDD_ADJ accordingly. The ensuing overhead is hence very low.
Adaptive Vdd Scaling Systems
• The general modality of adaptive VDD scaling systems to reduce power is to adaptively adjust VDD as
low as possible (with appropriate timing margin) while meeting the throughput requirement for the
prevailing operating conditions (including PVT variations).
• This largely requires the pertinent circuit delay variations to be tracked, observed, or inferred.
• There are many reported techniques, but it can be argued that these tracking, observation
and inference techniques are inadequate in terms of robustness, particularly in sub-Vt operation.
Further, the hardware/computation overheads are considerable, including the need to scale VDD
together with the clock frequency, i.e. Dynamic Voltage and Frequency Scaling (DVFS).
• The proposed idea is to measure the delay directly and compare it against the throughput for the
prevailing conditions; VDD is thereafter adjusted accordingly.
• To enable this, self-timed async QDI logic is adopted, whose dual-rail encoding includes the
Request signal, which indicates that the input sample is ready, and the Acknowledge signal, which indicates
the completion of the computation.
Adaptive Vdd Scaling Systems
• By counting the number of Requests against Acknowledges within a given period, we ascertain if
the delay of the circuit is excessive, or otherwise, with respect to the throughput for the
prevailing conditions.
• VDD is thereafter adjusted accordingly such that the delay is just slightly less than the delay between
input samples, thereby satisfying the throughput.
• Further, as the Acknowledge signal is inherent in QDI async protocols, the computation is uninterrupted
while VDD is transitioning during its self-adjustment; in reported adaptive scaling systems, circuit
operation typically ceases when VDD is transitioning.
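The counting comparison can be sketched as a single decision per period. The slides do not give the exact control rule, so the policy below (dial up when Acknowledges lag Requests, otherwise dial down) and all names are assumptions for illustration.

```python
def count_in_period(timestamps, start, end):
    """Number of events with start <= t < end."""
    return sum(1 for t in timestamps if start <= t < end)


def vdd_adjustment(req_times, ack_times, start, end):
    """Return +1 to dial VDD up or -1 to dial it down (assumed policy)."""
    requests = count_in_period(req_times, start, end)
    acknowledges = count_in_period(ack_times, start, end)
    if acknowledges < requests:
        return +1   # computation is falling behind the input rate
    return -1       # computation keeps up: a lower VDD may suffice


# Hypothetical event times in milliseconds within one 100 ms period.
reqs = [0, 10, 20, 30, 40]
slow_acks = [15, 45, 75]              # only three completions
fast_acks = [2, 12, 22, 32, 42]       # every sample completed
print(vdd_adjustment(reqs, slow_acks, 0, 100))   # 1
print(vdd_adjustment(reqs, fast_acks, 0, 100))   # -1
```

Since both signals already exist in the handshake, the comparison needs little more than two counters, which is consistent with the low overhead claimed.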
System Design
• Fig. 2 depicts the proposed SSAVS system
within the Power Management module
embodying the SSAVS Controller and its
associated adjustable VDD means (a Buck
DC-DC Converter), and the PCSL-based 8x8-
Bit Quad-Channel Async QDI FRM FB within
the FRM FB.
• There are two voltage rails in the overall
proposed SSAVS system: a fixed VDD_NOM
and a variable VDD_ADJ whose sub-Vt
voltage typically ranges from 150 mV to
400 mV.
• For ease of illustration, the specific VDD rail
is shown in parenthesis for the supply rails
and for signals of the various modules.
System Design
• In Fig. 2, the input and Request signals are first stepped down from VDD_NOM = 1.2 V to
VDD_ADJ by the Step-Down Level Converter, and are thereafter buffered by the Async FIFO
Buffer (depth of 50) before being input (Input_FB and Req_FB) to the async FRM FB.
• The FB outputs (Output 1–4) and their associated Acknowledges (combined from Ack 1–4 via the
Completion Detection Circuit) are output to the MCU for further processing.
• Acknowledge is also fed back to the Async FIFO Buffer.
• The Request and Acknowledge signals are input to the Power Management module, and
Acknowledge is stepped up from VDD_ADJ to VDD_NOM.
• The SSAVS Controller within the Power Management module monitors the number of requests
and Acknowledge signals in each period (a 10 Hz clock generated by the Update VDD Clock
Generator for a target throughput of 1 kS/s).
• The VDD_Code is a 5-bit code that sets one of 24 voltage levels (in the Buck DC-DC Converter)
ranging from ‘00000’=50 mV to ‘10111’=1.2 V (in 50 mV steps) for VDD_ADJ.
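The code-to-voltage mapping stated above is linear, so it can be written down directly. The function name and the choice to pass the code as a bit string are assumptions for illustration.

```python
def vdd_code_to_mv(code_bits):
    """Map a 5-bit VDD_Code string to the VDD_ADJ level in millivolts."""
    code = int(code_bits, 2)
    if len(code_bits) != 5 or not 0 <= code <= 23:
        raise ValueError("VDD_Code must be a 5-bit string from '00000' to '10111'")
    return 50 + 50 * code   # 24 levels in 50 mV steps


print(vdd_code_to_mv("00000"), vdd_code_to_mv("10111"))  # 50 1200
```

Codes '11000' to '11111' are rejected because only 24 of the 32 possible 5-bit values are used.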
System Design
• Fig. 3 graphically depicts an example of the self-adjustment of VDD_ADJ.
• When the WSN is first initiated, the SSAVS Controller outputs VDD_Code = ‘10111’, equivalently
VDD_ADJ = 1.2 V, and the speed of the FB would far exceed the required computation.
• The voltage of VDD_ADJ of the FB is in-situ adaptively self-adjusted to be as low as possible
(within 50 mV) to meet the throughput for the prevailing operating conditions, and on average,
the voltage of VDD_ADJ is slightly higher than the actual required minimum.
• Hence, the FB is ultra-low power and highly power-efficient.
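The dial-down behaviour can be mimicked with a toy loop. Everything in this sketch is an assumption: the filter bank is modelled as keeping up whenever the code is at or above a hypothetical threshold, and the controller holds for a few periods after dialing up before probing lower again. It is not the authors' controller.

```python
def simulate(min_code_needed, periods, start_code=23, hold=3):
    """Return the VDD_Code used in each update period (toy model)."""
    code, hold_left, history = start_code, 0, []
    for _ in range(periods):
        history.append(code)
        if code < min_code_needed:     # Acknowledges lag Requests
            code = min(code + 1, 23)   # dial up one 50 mV step
            hold_left = hold           # then stay put for a while
        elif hold_left > 0:
            hold_left -= 1
        else:
            code = max(code - 1, 0)    # keeping up: probe one step lower
    return history


history = simulate(min_code_needed=5, periods=28)
print(history[:3], history[18:])
```

Starting from the highest code, the loop steps down each period, then settles within one step of the lowest code that still meets the throughput.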
System Design
• In view of the need for sub-Vt operation, it is imperative to adopt circuits based on the static-logic
family to mitigate the effects of critical transistor sizing; dynamic- and pass-logic families are
inappropriate.
• ‘Pre-Charged Static-Logic’ (PCSL).
• The basic architecture comprises an Inverting Static-Logic Cell, three transistors (for output pre-charging
during the reset phase/evaluation during the computation phase), and two inverters (for output
buffering). The outputs are Q.T (Output True) and Q.F (Output False).
The basic architecture of the proposed async cells, coined ‘Pre-Charged Static-Logic’ (PCSL).
System Design
• In PCSL cells, when Request is ‘0’, both outputs are ‘0’. On the other hand, when Request is ‘1’
(indicating that an operation is ready) and when the input signals are valid, the operation
commences and an ensuing output is obtained.
• The architecture of the PCSL cell integrates the sub-circuit associated with the Request
signal and a buffer (on each output) into the standard static-logic library cell (redesigned for
dual-rail async), thereby sharing (common) transistors.
• This reduces the number of transistors, resulting in simultaneous lower power/energy dissipation,
faster speed and smaller IC area.
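The input/output behaviour described for a PCSL cell can be sketched for a dual-rail AND. This is a behavioural model only, with inputs written as (true rail, false rail) pairs; it says nothing about the transistor-level circuit, and the function name is hypothetical.

```python
def pcsl_and(request, a, b):
    """Behavioural dual-rail AND; returns (Q.T, Q.F)."""
    if not request:
        return (0, 0)                  # Request = 0: both outputs are 0
    a_valid = a[0] ^ a[1]              # exactly one rail asserted
    b_valid = b[0] ^ b[1]
    if not (a_valid and b_valid):
        return (0, 0)                  # inputs not yet valid: no output
    q = a[0] & b[0]
    return (q, 1 - q)                  # Q.T is a AND b, Q.F is its complement


ONE, ZERO, NULL = (1, 0), (0, 1), (0, 0)
print(pcsl_and(0, ONE, ONE))    # (0, 0)
print(pcsl_and(1, ONE, ONE))    # (1, 0)
print(pcsl_and(1, ONE, ZERO))   # (0, 1)
```

Q.F is the NAND of the inputs, so one cell delivers both the AND and NAND results.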
System Design
• To depict the hardware advantage of the proposed PCSL approach, the 2-input AND/NAND gate
can be compared to the same gate realized by three reported static-logic QDI approaches:
a) Delay-Insensitive Minterm Synthesis (DIMS) approach,
b) NULL Convention Logic (NCL) with complex gates (denoted NCL1), and
c) NCL with fast-reset complex gates (denoted NCL2).
System Design
• On the basis of simulations (130 nm CMOS), the delay and IC area of six basic cells of the various
approaches are compared. The competing cells are normalized to the PCSL cells, whose actual values are shown
within parentheses. The average attributes are tabulated in the last row.
• Cells embodying the proposed PCSL approach simultaneously exhibit the lowest energy dissipation, shortest delay
and smallest IC area.
System Design
• With the proposed PCSL QDI realization approach, an 8x8-Bit Quad-Channel Async QDI FRM FB is
designed.
• A semicustom design flow is adopted.
• Each FB channel comprises an Async Read/Write Controller, an 8x8-Bit Coefficient Memory, an 8x8-Bit
Data Memory, an 8-Bit PCSL Multiplier, and a 20-Bit PCSL Adder.
• To preserve the QDI protocol and proper async handshaking, Datapath Completion Detection (DCD) and
Latch Completion Detection (LCD) circuits are included with Muller C-elements (denoted by a ‘C’).
[Figure labels: Latch Completion Detection (LCD); Datapath Completion Detection (DCD).]
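Completion detection over a dual-rail word can be sketched behaviourally: each bit is valid when either rail is asserted, and the per-bit results are merged with C-element behaviour. The class below is a hypothetical model, not the DCD or LCD circuit of the paper.

```python
class CompletionDetector:
    """Per-bit OR of the two rails, merged across bits like a C-element."""

    def __init__(self):
        self.done = 0

    def update(self, word):
        valid = [t | f for t, f in word]
        if all(valid):
            self.done = 1   # every bit carries data
        elif not any(valid):
            self.done = 0   # every bit has returned to NULL
        return self.done    # mixed state: hold the previous value


cd = CompletionDetector()
steps = [
    [(0, 0), (0, 0)],   # all NULL
    [(1, 0), (0, 0)],   # first bit arrives
    [(1, 0), (0, 1)],   # word complete
    [(0, 0), (0, 1)],   # first bit resets
    [(0, 0), (0, 0)],   # all NULL again
]
print([cd.update(w) for w in steps])  # [0, 0, 1, 1, 0]
```

The hold in the mixed state is what keeps the completion signal from dropping until the whole word has returned to NULL.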
Results And Benchmarking
• Scenario 1: the sync DVFS system embodies a temperature sensor and, on the basis of the measured temperature and pre-characterization of the sync filter, the clocking frequency is selected accordingly.
Results And Benchmarking
• Scenario 2: the sync DVFS system is much simpler; the clocking frequency is fixed (to the worst case) to accommodate all conditions.
• In Scenario 1, neither FB is advantageous throughout: the sync DVFS FB and the async SSAVS FB are
advantageous in different conditions.
• Nevertheless, the sync FB may be disadvantageous if the temperature sensor overheads
associated with DVFS for Scenario 1 are considered.
• In Scenario 2, the async FB is advantageous in terms of reduced delay with respect to VDD and usually
lower Eper with respect to VDD; in terms of power dissipation, it is advantageous in some
conditions (while the sync FB is advantageous in others).
• Further, in the context of continuous circuit operation and overheads associated with DVS, the
proposed SSAVS is advantageous over the conventional DVFS in terms of uninterrupted circuit
operation and not requiring external intervention (such as changing clock rate, pre-
characterization, etc.).
Results And Benchmarking
Ultra-Low Power Asynchronous Logic Wireless Sensor Network
Ultra-Low Power Asynchronous Logic Wireless Sensor Network

More Related Content

What's hot

Gate Diffusion Input Technology (Very Large Scale Integration)
Gate Diffusion Input Technology (Very Large Scale Integration)Gate Diffusion Input Technology (Very Large Scale Integration)
Gate Diffusion Input Technology (Very Large Scale Integration)Ashwin Shroff
 
COMBINATIONAL PLD-BASED STATE MACHINES
COMBINATIONAL PLD-BASED STATE MACHINESCOMBINATIONAL PLD-BASED STATE MACHINES
COMBINATIONAL PLD-BASED STATE MACHINESdaxesh chauhan
 
Analog to Digital converter in ARM
Analog to Digital converter in ARMAnalog to Digital converter in ARM
Analog to Digital converter in ARMAarav Soni
 
VLSI Design Final Project - 32 bit ALU
VLSI Design Final Project - 32 bit ALUVLSI Design Final Project - 32 bit ALU
VLSI Design Final Project - 32 bit ALUSachin Kumar Asokan
 
Programmable array logic
Programmable array logicProgrammable array logic
Programmable array logicGaditek
 
I2C programming with C and Arduino
I2C programming with C and ArduinoI2C programming with C and Arduino
I2C programming with C and Arduinosato262
 
Design and Fabrication of 4-bit processor
Design and Fabrication of  4-bit processorDesign and Fabrication of  4-bit processor
Design and Fabrication of 4-bit processorPriyatham Bollimpalli
 
fpga programming
fpga programmingfpga programming
fpga programmingAnish Gupta
 
Ppt on interfacing led and 7 segment with 8951
Ppt on interfacing led and 7 segment with 8951 Ppt on interfacing led and 7 segment with 8951
Ppt on interfacing led and 7 segment with 8951 pooja jaiswal
 
An application of 8085 register interfacing with LCD
An application  of 8085 register interfacing with LCDAn application  of 8085 register interfacing with LCD
An application of 8085 register interfacing with LCDTaha Malampatti
 
Programmable Logic Devices Plds
Programmable Logic Devices PldsProgrammable Logic Devices Plds
Programmable Logic Devices PldsGaditek
 
Interfacing to the analog world
Interfacing to the analog worldInterfacing to the analog world
Interfacing to the analog worldIslam Samir
 
Report no.5(microprocessor)
Report no.5(microprocessor)Report no.5(microprocessor)
Report no.5(microprocessor)Ronza Sameer
 

What's hot (20)

Gate Diffusion Input Technology (Very Large Scale Integration)
Gate Diffusion Input Technology (Very Large Scale Integration)Gate Diffusion Input Technology (Very Large Scale Integration)
Gate Diffusion Input Technology (Very Large Scale Integration)
 
COMBINATIONAL PLD-BASED STATE MACHINES
COMBINATIONAL PLD-BASED STATE MACHINESCOMBINATIONAL PLD-BASED STATE MACHINES
COMBINATIONAL PLD-BASED STATE MACHINES
 
Lcd interfacing
Lcd interfacingLcd interfacing
Lcd interfacing
 
LCD interfacing
LCD interfacingLCD interfacing
LCD interfacing
 
CPLD xc9500
CPLD xc9500CPLD xc9500
CPLD xc9500
 
PLDs
PLDsPLDs
PLDs
 
Analog to Digital converter in ARM
Analog to Digital converter in ARMAnalog to Digital converter in ARM
Analog to Digital converter in ARM
 
Gdi cell
Gdi cellGdi cell
Gdi cell
 
VLSI Design Final Project - 32 bit ALU
VLSI Design Final Project - 32 bit ALUVLSI Design Final Project - 32 bit ALU
VLSI Design Final Project - 32 bit ALU
 
Programmable array logic
Programmable array logicProgrammable array logic
Programmable array logic
 
I2C programming with C and Arduino
I2C programming with C and ArduinoI2C programming with C and Arduino
I2C programming with C and Arduino
 
Design and Fabrication of 4-bit processor
Design and Fabrication of  4-bit processorDesign and Fabrication of  4-bit processor
Design and Fabrication of 4-bit processor
 
fpga programming
fpga programmingfpga programming
fpga programming
 
Altera flex
Altera flexAltera flex
Altera flex
 
Ppt on interfacing led and 7 segment with 8951
Ppt on interfacing led and 7 segment with 8951 Ppt on interfacing led and 7 segment with 8951
Ppt on interfacing led and 7 segment with 8951
 
An application of 8085 register interfacing with LCD
An application  of 8085 register interfacing with LCDAn application  of 8085 register interfacing with LCD
An application of 8085 register interfacing with LCD
 
Lpc2148 i2c
Lpc2148 i2cLpc2148 i2c
Lpc2148 i2c
 
Programmable Logic Devices Plds
Programmable Logic Devices PldsProgrammable Logic Devices Plds
Programmable Logic Devices Plds
 
Interfacing to the analog world
Interfacing to the analog worldInterfacing to the analog world
Interfacing to the analog world
 
Report no.5(microprocessor)
Report no.5(microprocessor)Report no.5(microprocessor)
Report no.5(microprocessor)
 

Viewers also liked

Embedded c c++ programming fundamentals master
Embedded c c++ programming fundamentals masterEmbedded c c++ programming fundamentals master
Embedded c c++ programming fundamentals masterHossam Hassan
 
Public Seminar_Final 18112014
Public Seminar_Final 18112014Public Seminar_Final 18112014
Public Seminar_Final 18112014Hossam Hassan
 
On Being A Successful Graduate Student In The Sciences
On Being A Successful Graduate Student In The SciencesOn Being A Successful Graduate Student In The Sciences
On Being A Successful Graduate Student In The SciencesHossam Hassan
 
multi standard multi-band receivers for wireless applications
multi standard  multi-band receivers for wireless applicationsmulti standard  multi-band receivers for wireless applications
multi standard multi-band receivers for wireless applicationsHossam Hassan
 
Assistencia geologica
Assistencia geologicaAssistencia geologica
Assistencia geologicacrom68
 
Embedded linux barco-20121001
Embedded linux barco-20121001Embedded linux barco-20121001
Embedded linux barco-20121001Marc Leeman
 
The move from a hardware centric design to a software centric design: GStream...
The move from a hardware centric design to a software centric design: GStream...The move from a hardware centric design to a software centric design: GStream...
The move from a hardware centric design to a software centric design: GStream...Marc Leeman
 
sasikumarj_resume
sasikumarj_resumesasikumarj_resume
sasikumarj_resumeSasi Kumar
 
OTT in Azerbaijan - Project Brief
OTT in Azerbaijan - Project BriefOTT in Azerbaijan - Project Brief
OTT in Azerbaijan - Project BriefFarhad Shahrivar
 
Ensoft dvb 1
Ensoft dvb 1Ensoft dvb 1
Ensoft dvb 1sarge
 
An introduction to digital signal processors 1
An introduction to digital signal processors 1An introduction to digital signal processors 1
An introduction to digital signal processors 1Hossam Hassan
 
Capria no_video_ship_detection_with_dvbt_software_defined_passive_radar
 Capria no_video_ship_detection_with_dvbt_software_defined_passive_radar Capria no_video_ship_detection_with_dvbt_software_defined_passive_radar
Capria no_video_ship_detection_with_dvbt_software_defined_passive_radargrssieee
 
Buildin a small linux kernel
Buildin a small linux kernelBuildin a small linux kernel
Buildin a small linux kerneltrx2001
 
Standard java coding convention
Standard java coding conventionStandard java coding convention
Standard java coding conventionTam Thanh
 
10 ways hardware engineers can make software integration easier
10 ways hardware engineers can make software integration easier10 ways hardware engineers can make software integration easier
10 ways hardware engineers can make software integration easierChris Simmonds
 
Dot matrix display design using fpga
Dot matrix display design using fpgaDot matrix display design using fpga
Dot matrix display design using fpgaHossam Hassan
 

Viewers also liked (20)

Embedded c c++ programming fundamentals master
Embedded c c++ programming fundamentals masterEmbedded c c++ programming fundamentals master
Embedded c c++ programming fundamentals master
 
Public Seminar_Final 18112014
Public Seminar_Final 18112014Public Seminar_Final 18112014
Public Seminar_Final 18112014
 
On Being A Successful Graduate Student In The Sciences
On Being A Successful Graduate Student In The SciencesOn Being A Successful Graduate Student In The Sciences
On Being A Successful Graduate Student In The Sciences
 
multi standard multi-band receivers for wireless applications
multi standard  multi-band receivers for wireless applicationsmulti standard  multi-band receivers for wireless applications
multi standard multi-band receivers for wireless applications
 
Assistencia geologica
Assistencia geologicaAssistencia geologica
Assistencia geologica
 
Embedded linux barco-20121001
Embedded linux barco-20121001Embedded linux barco-20121001
Embedded linux barco-20121001
 
The move from a hardware centric design to a software centric design: GStream...
The move from a hardware centric design to a software centric design: GStream...The move from a hardware centric design to a software centric design: GStream...
The move from a hardware centric design to a software centric design: GStream...
 
Linux Workshop , Day 3
Linux Workshop , Day 3Linux Workshop , Day 3
Linux Workshop , Day 3
 
sasikumarj_resume
sasikumarj_resumesasikumarj_resume
sasikumarj_resume
 
How To Handle An IRD Audit - Atainz
How To Handle An IRD Audit - AtainzHow To Handle An IRD Audit - Atainz
How To Handle An IRD Audit - Atainz
 
OTT in Azerbaijan - Project Brief
OTT in Azerbaijan - Project BriefOTT in Azerbaijan - Project Brief
OTT in Azerbaijan - Project Brief
 
Ensoft dvb 1
Ensoft dvb 1Ensoft dvb 1
Ensoft dvb 1
 
DVB-T/H Solution
DVB-T/H  SolutionDVB-T/H  Solution
DVB-T/H Solution
 
An introduction to digital signal processors 1
An introduction to digital signal processors 1An introduction to digital signal processors 1
An introduction to digital signal processors 1
 
Capria no_video_ship_detection_with_dvbt_software_defined_passive_radar
 Capria no_video_ship_detection_with_dvbt_software_defined_passive_radar Capria no_video_ship_detection_with_dvbt_software_defined_passive_radar
Capria no_video_ship_detection_with_dvbt_software_defined_passive_radar
 
Buildin a small linux kernel
Buildin a small linux kernelBuildin a small linux kernel
Buildin a small linux kernel
 
Standard java coding convention
Standard java coding conventionStandard java coding convention
Standard java coding convention
 
Embedded Linux
Embedded LinuxEmbedded Linux
Embedded Linux
 
10 ways hardware engineers can make software integration easier
10 ways hardware engineers can make software integration easier10 ways hardware engineers can make software integration easier
10 ways hardware engineers can make software integration easier
 
Dot matrix display design using fpga
Dot matrix display design using fpgaDot matrix display design using fpga
Dot matrix display design using fpga
 

Ultra-Low Power Asynchronous Logic Wireless Sensor Network

  • 1. Prepared and Presented By: Hossam Hassan, MSIS LAB, CBNU • An Ultra-Low Power Asynchronous-Logic In-Situ Self-Adaptive VDD System for Wireless Sensor Networks • Authors: Tong Lin, Kwen-Siong Chong, Joseph S. Chang, and Bah-Hwee Gwee • Journal: IEEE Journal of Solid-State Circuits, vol. 48, no. 2, 2013
  • 2. Outline • Preliminaries • Wireless Sensor Network • Node Architecture • Proposed Idea for Low Power Design • Self-Adaptive VDD System for Wireless Sensor Networks • Adaptive Vdd Scaling Systems • System Design • Results And Benchmarking
  • 3. Preliminaries • What Is Asynchronous Logic? • Traditional way of Sequencing and Computation is the use of a global time reference (“the clock”) • Can we compute without a clock? • Yes!: “asynchronous” or “clockless” logic • Also “self-timed” or “speed-independent” • Asynchronous system: collection of modules communicating by handshake protocols • Can we compute without a clock and without delay assumptions? • Quasi-delay-insensitive (QDI) logic Adopted from: Alain J. Martin, California Institute of Technology
  • 4. Preliminaries • Why Asynchronous and QDI Logic? • No clock • Up to 50% of clock power recovered • Automatic shut-off of idle parts • Perfect clock gating • No glitches (spurious transitions) • Up to 50% of power in combinational circuits • Automatic adaptation to parameter variations • Voltage scaling: Perfect exchange of delay against energy through voltage scaling • Flexibility of asynchronous interfaces: • Better use of concurrency • Robustness to PVT Variations: Variations of physical parameters all affect timing. Adopted from: Alain J. Martin, California Institute of Technology
  • 5. Preliminaries • Disadvantages of Async • Size overhead (more transistors) (i.e. Handshaking) • Poorly understood and rarely taught • No industrial CAD tools (yet) (i.e. Custom Design) • No well-developed testing procedure (yet) (i.e. Custom Design)
  • 7. Preliminaries • NULL Convention Logic • NCL is a delay-insensitive (DI) asynchronous (i.e. clockless) paradigm, which means that NCL circuits will operate correctly regardless of when circuit inputs become available; therefore NCL circuits are said to be correct-by-construction (i.e. no timing analysis is necessary for correct operation). NCL circuits utilize dual-rail or quad-rail logic to achieve delay-insensitivity.
  • 8. Preliminaries • Pre-Charged Static-Logic (PCSL): • It is an asynchronous-logic Quasi-Delay-Insensitive architecture based on Static-Logic, featuring full-range Dynamic Voltage Scaling, including robust operation in the sub-threshold voltage regime, with simultaneously low hardware overheads, high speed and low power dissipation. • The PCSL logic circuit achieves this by integrating the Request sub-circuit into the Static-Logic cell. • During the initial phase, the output of the Static-Logic cell (within the PCSL logic circuit) is pre-charged. • During the evaluate phase, the Static-Logic cell computes on the inputs and the PCSL logic circuit outputs the result. [Figure: Pre-Charged Static-Logic (PCSL) architecture, annotated: enables circuit state retention (i.e. stores the logic output value)]
  • 9. Preliminaries • Muller C-elements: • It is a small digital block widely used in the design of asynchronous circuits and systems. • In a synchronous circuit, the role of the clock is to define points in time where signals are stable and valid. In between the clock ticks, signals may exhibit hazards and may make multiple transitions as the combinational circuit stabilizes. • In an asynchronous system, the situation is different. The absence of a clock means signals must be valid all the time: every transition has a meaning, and consequently hazards and races must be avoided. [Figures: Muller C-element and corresponding CMOS implementation; truth table for the Muller C-element]
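
The state-holding behaviour of a two-input Muller C-element can be sketched in a few lines of Python. This is only a behavioural illustration, assuming the usual definition (the output follows the inputs when they agree and holds its previous value when they differ); the class name `MullerCElement` is hypothetical and is not taken from the slides.

```python
class MullerCElement:
    """Behavioural model of a 2-input Muller C-element (hypothetical helper)."""

    def __init__(self, initial: int = 0) -> None:
        # The gate stores one bit of state: its last output value.
        self.out = initial

    def update(self, a: int, b: int) -> int:
        # When both inputs agree, the output takes their value.
        # When they differ, the previous output is held.
        if a == b:
            self.out = a
        return self.out


c = MullerCElement()
inputs = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
trace = [c.update(a, b) for a, b in inputs]
print(trace)  # [0, 0, 1, 1, 0]
```

The output only rises after both inputs have risen and only falls after both have fallen, which is why the element is useful for joining handshake signals.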
  • 10. Preliminaries • Filter bank • In signal processing, a filter bank is an array of band-pass filters that separates the input signal into multiple components, each one carrying a single frequency sub-band of the original signal. • The process of decomposition performed by the filter bank is called analysis (meaning analysis of the signal in terms of its components in each sub-band); the output of analysis is referred to as a sub-band signal with as many sub-bands as there are filters in the filter bank. • The reconstruction process is called synthesis, meaning reconstitution of a complete signal resulting from the filtering process.
  • 11. Preliminaries • Frequency Response Masking (FRM): • Frequency-response masking is a technique to design sharp low-pass, high-pass, band-pass and band-stop filters with arbitrary passband bandwidth. • Furthermore, linear-phase FIR filters are generated, which have advantages such as guaranteed stability and freedom from phase distortion. • However, the problem with FIR filters is the high complexity of sharp filters. • With the frequency-response masking technique, the resulting filter has very sparse coefficients. • Since only a very small fraction of its coefficient values are nonzero, its complexity is very much lower than that of the infinite word-length minimax optimum filter. • With an additional multiplier-less design method, the complexity is reduced to a minimum. • In linear-phase FIR filters, phase is a linear function of frequency. • They have a symmetric impulse response.
  • 12. Preliminaries • Dynamic frequency scaling • It is a technique in computer architecture whereby the frequency of a microprocessor can be automatically adjusted "on the fly", either to conserve power or to reduce the amount of heat generated by the chip. • It is commonly used in laptops and other mobile devices, where energy comes from a battery and thus is limited. • Dynamic voltage scaling: • It is another power conservation technique that is often used in conjunction with frequency scaling, as the frequency that a chip may run at is related to the operating voltage. • Raising voltage or frequency raises system power demand, and higher power use may in turn raise the temperature.
  • 14. Wireless Sensor Network • Spatially distributed autonomous sensors • Monitor physical or environmental conditions • Temperature, sound, etc. • Pass their data through the network to a main location • Modern networks are bi-directional, also enabling control of sensor activity • Applications • Battlefield surveillance • Industrial process monitoring
  • 15. Wireless Sensor Network • The WSN is built of "nodes" • a few to several hundreds or even thousands • each node is connected to one (or sometimes several) sensors • Each such sensor network node typically has several parts • a radio transceiver • a microcontroller • an electronic circuit for interfacing with the sensors • an energy source, usually a battery • As the WSN is typically designed for a multiple-year operational life-span, power is carefully budgeted and, where pertinent, modules are energized only when required, such that the overall average power is typically 10–100 µW. • Goal: achieve the lowest possible power operation for the prevailing throughput and circuit conditions (VDD adjusted to within 50 mV of the minimum voltage), yet with high operational robustness and minimal overheads for a WSN.
  • 17. Proposed Idea for Low Power Design • Signal processor accounts for ~50% of total power consumption • ‘Sub-threshold Self-Adaptive VDD Scaling’ (SSAVS) • Circuits work in the sub-threshold region • Supply voltage is adjusted dynamically depending on the processing speed required by the external environment • Adoption of the Quasi-Delay-Insensitive (QDI) asynchronous-logic protocols, where the circuits therein are self-timed • Embodiment of the sub-threshold Pre-Charged Static-Logic (PCSL) design approach • The async SSAVS system has been benchmarked against its conventional sync DVFS system counterpart.
  • 18. Proposed Idea for Low Power Design • Asynchronous logic implementation • Pre-Charged Static-Logic (PCSL) • Superior to existing asynchronous logic approaches in energy, delay and chip area.
  • 19. Self-Adaptive VDD System for Wireless Sensor Networks • As the WSN is typically designed for a multiple-year operational life-span, power is carefully budgeted and, where pertinent, modules are energized only when required, such that the overall average power is typically 10–100 µW. • In our WSN depicted in Fig. 1, the overall active/passive operation ratio is approximately 20/80. In the passive mode, only the Sensor Front-End module is continuously energized. The Sensor and the Conditioning Circuits therein are powered directly by the VDD_BAT (2.8 V) battery, via a Low-Dropout (LDO) Regulator. • The Simple Processor is powered by VDD_NOM (1.2 V) via a power-efficient Buck DC-DC Converter. • The Simple Processor ascertains if the input is possibly useful, and if it is, the WSN goes into active mode, where it signals the Power Management module to energize the Signal Processor module via VDD_ADJ.
  • 20. Self-Adaptive VDD System for Wireless Sensor Networks • The voltage of VDD_ADJ, typically in the sub-threshold voltage (sub-Vt) range, is self-adjusted such that the lowest possible voltage is used—to enable ultra-low power operation. • Signal Processor Module: • The Signal Processor module buffers (via a FIFO) the output of the Simple Processor, filters the output signal before final computation by the Microcontroller Unit (MCU). • When the MCU ascertains that the filtered signal is useful, the Wireless Transceiver is energized and the processed signal is subsequently transmitted wirelessly. • With the wireless transmission expected to be 0.01% active and with a 20/80 WSN active/passive operation, 50% of the overall power is attributed to the Signal Processor module, which is of interest in terms of power dissipation.
  • 21. Self-Adaptive VDD System for Wireless Sensor Networks • The approaches taken to minimize power involve all levels of the design space, including algorithmic design and the hardware level. • Frequency Response Masking (FRM) technique • In the algorithmic design, the filtering in the Signal Processor module embodies the Frequency Response Masking (FRM) technique. • This involves the Interpolated Finite Impulse Response (IFIR) Filter and the FRM Filter Bank (FB), and is computationally more efficient than the usual FIR and IIR filter approaches. • Among ultra-low power design techniques at the hardware level, operation in the sub-Vt region is one of the most effective. • This is particularly applicable because the speed of the digital circuits in the Signal Processor is modest: the clocking speed ranges from 1.4 kHz to 1.4 MHz for a sampling rate range from 0.1 kSamples/s (kS/s) to 100 kS/s.
  • 22. Self-Adaptive VDD System for Wireless Sensor Networks • Despite the potential advantages of sub-Vt operation, this region of operation is challenging here for several reasons. • First, the WSN is designed to work in a wide range of conditions, including extreme environments (−55 °C to +125 °C). • Second, Process, Voltage and Temperature (PVT) variations for fine-dimensioned CMOS processes increase dramatically in sub-Vt operation, and the ensuing delay variations are very severe, possibly intractable. Typically, a very large delay safety margin (for synchronous-logic (sync) circuits) would need to be allowed for. • Third, the input signal to the Signal Processor module is variable. From a robust operation perspective, the circuits would need to be designed to meet the worst-case conditions: the fastest input rate and extreme temperatures. • To design the WSN for ultra-low power operation, a self-adjusting VDD approach operating in the sub-Vt region is adopted. It is termed ‘Sub-threshold Self-Adaptive VDD Scaling’ (SSAVS), and the VDD is dynamically self-adjusted in situ.
  • 23. Self-Adaptive VDD System for Wireless Sensor Networks • The operation involves ‘dialing up’ VDD when the need for computation increases or when the operating conditions are less favorable, and VDD is ‘dialed-down’ when the conditions are the converse. • Put simply, the lowest VDD is used where possible because in general the lower the VDD, the lower is the power dissipation due to dynamic and leakage currents. • The novel self-adjustment is obtained very simply—by exploiting (and comparing) the existing Request and Acknowledge signals of the QDI protocol signaling, and thereafter adjusting the VDD_ADJ accordingly. The ensuing overhead is hence very low.
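
To give a feel for why the lowest workable VDD is attractive, the sketch below applies the standard first-order CMOS dynamic-power relation P ≈ α·C·VDD²·f. This relation is textbook background rather than something stated on the slide, it ignores leakage, and it is only a rough guide in the sub-threshold regime; the function name and the 0.3 V operating point are illustrative assumptions.

```python
def dynamic_power_ratio(vdd_new: float, vdd_old: float,
                        f_new: float = 1.0, f_old: float = 1.0) -> float:
    """Return P_new / P_old under P = alpha * C * VDD^2 * f.

    Assumes the activity factor alpha and switched capacitance C are unchanged.
    Leakage power is deliberately not modelled.
    """
    if vdd_old <= 0 or f_old <= 0:
        raise ValueError("reference voltage and frequency must be positive")
    return (vdd_new / vdd_old) ** 2 * (f_new / f_old)


# Illustrative only: scaling from VDD_NOM = 1.2 V down to an assumed 0.3 V
# at the same switching rate.
ratio = dynamic_power_ratio(0.3, 1.2)
print(f"{ratio:.4f}")  # 0.0625
```

Under these assumptions, a fourfold reduction in VDD cuts dynamic power by a factor of sixteen at a fixed switching rate, which is the motivation for 'dialing down' whenever throughput allows.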
  • 24. Adaptive Vdd Scaling Systems • The general modality of adaptive VDD scaling systems to reduce power is to adaptively adjust VDD as low as possible (with appropriate timing margin) to meet the throughput requirement for the prevailing operating conditions (including PVT variations). • This largely requires the pertinent circuit delay variations to be tracked, observed, or inferred. • There are many reported techniques, but it can be argued that these reported tracked, observed and inferred techniques are inadequate in terms of robustness, particularly in sub-Vt operation. Further, the hardware/computation overheads are considerable, including the need to scale VDD with the scaling of the clock frequency, i.e. Dynamic Voltage Frequency Scaling (DVFS). • The proposed idea is to directly measure the delay and compare it against the throughput for the prevailing conditions; VDD is thereafter adjusted accordingly. • To enable this, the self-timed async QDI approach is adopted: its dual-rail encoding includes the Request signal, which indicates that the input sample is ready, and the Acknowledge signal, which indicates the completion of the computation.
  • 25. Adaptive Vdd Scaling Systems • By counting the number of Requests against Acknowledges within a given period, we ascertain if the delay of the circuit is excessive, or otherwise, with respect to the throughput for the prevailing conditions. • VDD is thereafter adjusted accordingly such that the delay is just slightly less than the delay between input samples, thereby satisfying the throughput. • Further, as is inherent in QDI async protocols, the computation is uninterrupted while VDD is transitioning during its self-adjustment; in reported adaptive VDD scaling systems, circuit operation typically ceases when VDD is transitioning.
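
The Request/Acknowledge counting idea can be sketched as a simple per-period update rule. The slides do not give the controller's exact policy, so everything below is an assumption made for illustration: the one-step-up/one-step-down rule, the function name `next_vdd_code`, and the toy circuit model in which the filter bank keeps up only at or above a made-up code of 5. The 100 requests per period follows from the 1 kS/s target throughput and 10 Hz update clock mentioned in the deck.

```python
MIN_CODE, MAX_CODE = 0, 23  # 24 VDD levels, as described for VDD_Code


def next_vdd_code(code: int, requests: int, acks: int) -> int:
    """Assumed policy: raise VDD one step if computation fell behind, else lower it."""
    if acks < requests:
        # Fewer completions than input samples: the circuit is too slow.
        return min(code + 1, MAX_CODE)
    # All samples were processed: try one step lower to save power.
    return max(code - 1, MIN_CODE)


# Toy model (hypothetical): the circuit meets throughput only when code >= 5.
MIN_WORKING_CODE = 5
REQUESTS_PER_PERIOD = 100  # 1 kS/s input rate, 10 Hz update clock

code = MAX_CODE  # start-up at the highest voltage
history = []
for _ in range(30):
    acks = REQUESTS_PER_PERIOD if code >= MIN_WORKING_CODE else REQUESTS_PER_PERIOD - 1
    code = next_vdd_code(code, REQUESTS_PER_PERIOD, acks)
    history.append(code)

print(history[:3], history[-4:])
```

In this toy run the code walks down from the start-up value and then hovers around the lowest level that sustains the throughput; the buffering that absorbs a momentarily slow period is not modelled here.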
  • 26. System Design • Fig. 2 depicts the proposed SSAVS system within the Power Management module, embodying the SSAVS Controller and its associated adjustable VDD means (a Buck DC-DC Converter), and the PCSL-based 8x8-Bit Quad-Channel Async QDI FRM FB within the FRM FB. • There are two voltage rails in the overall proposed SSAVS system: a fixed VDD_NOM and a variable VDD_ADJ whose sub-Vt voltage typically ranges from 150 mV to 400 mV. • For ease of illustration, the specific VDD rail is shown in parentheses for the supply rails and for signals of the various modules.
  • 27. System Design • In Fig. 2, the input and request signals are first stepped down from VDD_NOM = 1.2 V to VDD_ADJ by the Step-Down Level Converter, and are thereafter buffered by the Async FIFO Buffer (depth of 50) before being input (as Input_FB and Req_FB) to the async FRM FB. • The FB outputs (Output 1–4) and their associated Acknowledges (combined from Ack 1–4 via the Completion Detection Circuit) are output to the MCU for further processing. • Acknowledge is also fed back to the Async FIFO Buffer. • The Request and Acknowledge signals are input to the Power Management module, and Acknowledge is stepped up from VDD_ADJ to VDD_NOM. • The SSAVS Controller within the Power Management module monitors the number of Request and Acknowledge signals in each period (a 10 Hz clock generated by the Update VDD Clock Generator for a target throughput of 1 kS/s). • The VDD_Code is a 5-bit code that sets one of 24 voltage levels (in the Buck DC-DC Converter) ranging from ‘00000’ = 50 mV to ‘10111’ = 1.2 V (in 50 mV steps) for VDD_ADJ.
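
The VDD_Code mapping described above (24 levels, 50 mV steps, ‘00000’ = 50 mV, ‘10111’ = 1.2 V) can be written out as a small decoder. The helper names `vdd_code_to_mv` and `mv_to_vdd_code` are hypothetical, and treating codes above ‘10111’ as invalid is an assumption, since the slides do not say how the unused 5-bit codes are handled.

```python
VDD_STEP_MV = 50   # voltage step per code, from the slide
NUM_LEVELS = 24    # '00000' .. '10111'


def vdd_code_to_mv(code: str) -> int:
    """Decode a 5-bit VDD_Code string into a VDD_ADJ target in millivolts."""
    if len(code) != 5 or set(code) - {"0", "1"}:
        raise ValueError("VDD_Code must be a 5-character binary string")
    level = int(code, 2)
    if level >= NUM_LEVELS:
        # Assumption: codes beyond '10111' are unused and rejected here.
        raise ValueError("VDD_Code out of range")
    return (level + 1) * VDD_STEP_MV


def mv_to_vdd_code(millivolts: int) -> str:
    """Encode a voltage (a multiple of 50 mV within 50..1200 mV) as a VDD_Code."""
    if millivolts % VDD_STEP_MV or not VDD_STEP_MV <= millivolts <= NUM_LEVELS * VDD_STEP_MV:
        raise ValueError("voltage must be a multiple of 50 mV in 50..1200 mV")
    return format(millivolts // VDD_STEP_MV - 1, "05b")


print(vdd_code_to_mv("00000"))  # 50
print(vdd_code_to_mv("10111"))  # 1200
print(mv_to_vdd_code(300))      # 00101
```

With this mapping, the typical sub-Vt operating range of 150 mV to 400 mV corresponds to codes ‘00010’ through ‘00111’.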
  • 28. System Design • Fig. 3 graphically depicts an example of the self-adjustment of VDD_ADJ. • When the WSN is first initiated, the SSAVS Controller outputs VDD_Code = ‘10111’, equivalently VDD_ADJ = 1.2 V, and the speed of the FB would far exceed the required computation. • The voltage of VDD_ADJ of the FB is in-situ adaptively self-adjusted to be as low as possible (within 50 mV) to meet the throughput for the prevailing operating conditions, and on average, the voltage of VDD_ADJ is slightly higher than the actual required minimum. • Hence, the FB is ultra-low power and highly power-efficient.
  • 29. System Design • In view of the need for sub-Vt operation, it is imperative to adopt circuits based on the static-logic family to mitigate the effects of critical transistor sizing; dynamic- and pass-logic families are inappropriate. • The basic architecture of the proposed async cells, coined ‘Pre-Charged Static-Logic’ (PCSL), comprises an Inverting Static-Logic Cell, three transistors (for output pre-charging during the reset phase/evaluation during the computation phase), and two inverters (for output buffering). The outputs are Q.T (Output True) and Q.F (Output False).
  • 30. System Design • In PCSL cells, when Request is ‘0’, both outputs are ‘0’. On the other hand, when Request is ‘1’ (indicating that an operation is ready) and when the input signals are valid, the operation commences and an ensuing output is obtained. • The architecture of the PCSL cell integrates the sub-circuit associated with the Request signal and a buffer (for each output) into the standard static-logic library cell (redesigned for dual-rail async), thereby sharing common transistors. • This reduces the number of transistors, resulting in simultaneously lower power/energy dissipation, faster speed and smaller IC area.
  • 31. System Design • To depict the hardware advantage of the proposed PCSL approach, the 2-input AND/NAND gate can be compared to the same gate realized by three reported static-logic QDI approaches: a) the Delay-Insensitive-Minterm-Synthesis (DIMS) approach, b) NULL Convention Logic (NCL) with complex gates (denoted NCL1), and c) NCL with fast-reset complex gates (denoted NCL2).
  • 32. System Design • On the basis of simulations (130 nm CMOS), the energy, delay and IC area of six basic cells of the various approaches are compared. The competing cells are normalized to the PCSL cells, whose actual values are shown within parentheses. The average attributes are tabulated in the last row. • Cells embodying the proposed PCSL approach simultaneously exhibit the lowest energy per operation (Eper), shortest delay and smallest IC area.
  • 33. System Design • With the proposed PCSL QDI realization approach, an 8x8-Bit Quad-Channel Async QDI FRM FB is designed. • A semicustom design flow is adopted. • Each FB channel comprises an Async Read/Write Controller, an 8x8-Bit Coefficient Memory, an 8x8-Bit Data Memory, an 8-Bit PCSL Multiplier, and a 20-Bit PCSL Adder. • To preserve the QDI protocol and proper async handshaking, Datapath Completion Detection (DCD) and Latch Completion Detection (LCD) circuits are included with Muller C-elements (denoted by a ‘C’).
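
The Request-gated, dual-rail behaviour described for PCSL cells can be illustrated at the logic level with a 2-input AND/NAND cell. This is a behavioural sketch only, not a transistor-level model: the (true rail, false rail) tuple encoding, the helper names, and the choice to keep both outputs at ‘0’ when an input is not yet valid are assumptions made for illustration.

```python
from typing import Tuple

Rail = Tuple[int, int]   # (true rail, false rail), e.g. (Q.T, Q.F)
NULL: Rail = (0, 0)      # both rails low: no data present


def encode(bit: int) -> Rail:
    """Dual-rail encode a single bit: 1 -> (1, 0), 0 -> (0, 1)."""
    return (1, 0) if bit else (0, 1)


def is_valid(x: Rail) -> bool:
    """A dual-rail value is valid when exactly one rail is high."""
    return x in ((1, 0), (0, 1))


def pcsl_and_nand(request: int, a: Rail, b: Rail) -> Rail:
    """Return (Q.T, Q.F) for a 2-input AND/NAND cell.

    Request = 0 models the pre-charge/reset phase: both outputs are 0.
    Request = 1 with valid inputs models the evaluate phase.
    """
    if request == 0 or not (is_valid(a) and is_valid(b)):
        return NULL
    q_true = a[0] & b[0]          # AND of the true rails
    return (q_true, 1 - q_true)   # Q.F carries the complement (NAND)


print(pcsl_and_nand(0, encode(1), encode(1)))  # (0, 0)
print(pcsl_and_nand(1, encode(1), encode(1)))  # (1, 0)
print(pcsl_and_nand(1, encode(1), encode(0)))  # (0, 1)
```

Because exactly one output rail goes high once evaluation finishes, a downstream completion detector can tell that the result is ready without any clock.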
  • 34. Results and Benchmarking • In Scenario 1, the sync DVFS system embodies a temperature sensor, and on the basis of the measured temperature and pre-characterization of the sync filter, the clocking frequency is selected accordingly.
  • 35. Results and Benchmarking • In Scenario 2, the sync DVFS system is much simpler: the clocking frequency is fixed (to the worst case) to accommodate all conditions.
  • 36. Results and Benchmarking • In Scenario 1, no specific FB is particularly advantageous: the sync DVFS FB and the async SSAVS FB are advantageous in different conditions. • Nevertheless, the sync FB may be disadvantageous if the temperature sensor overheads associated with DVFS for Scenario 1 are considered. • In Scenario 2, the async FB is advantageous in terms of reduced delay with respect to VDD and usually lower Eper with respect to VDD; in terms of power dissipation, it is advantageous in some conditions (while the sync FB is advantageous in others). • Further, in the context of continuous circuit operation and overheads associated with DVS, the proposed SSAVS is advantageous over the conventional DVFS in terms of uninterrupted circuit operation and not requiring external intervention (such as changing clock rate, pre-characterization, etc.).

Editor's Notes

  1. Can we compute without a clock and without delay assumptions? Quasi-delay-insensitive (QDI) logic
  2. ‹ There is another class of logic gates which relies on the use of a clock signal. This class of circuit is known as dynamic circuits. The clock signal is used to divide the gate operation into two halves. In the first half, the output node is pre-charged to a high or low logic state. In the second half of a clock cycle, the circuit evaluates the correct output state. ‹ When Ø is low, Z is charged to high. When Ø is high, n logic block evaluates input, and conditionally discharges Z. This circuit adds series resistance to the pull-down n-channel transistor, therefore the fall time is increased slightly. ‹ This circuit is dynamic because during evaluation, the output high level at Z is maintained by the stray capacitance at the output node. If Ø stays high (i.e. evaluation period) for a long time, Z may eventually discharge to a low logic level.
  3. In a Synchronous Circuit, the role of the clock is to define points in time where signals are stable and valid. In between the clock ticks, signals may exhibit hazards and may make multiple transitions as combo circuit stabilizes. In Asynchronous System, situation is different. The absence of clock means signals are valid all the time, every transition has a meaning and consequently any hazard and races must be avoided. In the synchronous world, OR Gate only indicates that both inputs are LOW, when HIGH it does not indicate which one signal made a transition. Similarly AND gate only indicates when both inputs are HIGH but does not indicate which one does LOW when the output of AND gate is LOW. Knowing this transition is very important for Asynchronous circuits as these transitions may have a reverse impact or hazard/ Race condition and should be avoided. So a better circuit in this respect is Muller C Element shown in Figure 2.
  4. Dynamic voltage scaling is a power management technique in computer architecture, where the voltage used in a component is increased or decreased, depending upon circumstances. Dynamic voltage scaling to increase voltage is known as overvolting; dynamic voltage scaling to decrease voltage is known as undervolting. Undervolting is done in order to conserve power, particularly in laptops and other mobile devices,[1] where energy comes from a battery and thus is limited, or in rare cases, to increase reliability. Overvolting is done in order to increase computer performance.
  5. The approaches taken to minimize power involve all levels of the design space including algorithmic design and at the hardware level. In the former, the filtering in the Signal Processor module embodies the Frequency Response Masking (FRM) technique [4]. This involves the Interpolated Finite Impulse Response (IFIR) Filter and the FRM Filter Bank (FB), and is computationally more efficient than the usual FIR and IIR filter approaches. Ultra-low power design techniques in the latter are extensively reported in literature [5]–[15] and of these, operation in the sub- region is one of the most effective. This is particularly applicable here because the speed of the digital circuits in the Signal Processor is modest—the clocking speed ranges from 1.4 kHz to 1.4 MHz for a sampling rate range from 0.1 kSamples/s (kS/s) to 100 kS/s.
  6. The modus operandi involves ‘dialing up’ when the need for computation increases or when the operating conditions are less favorable, and is ‘dialed-down’ when the conditions are the converse. Put simply, the lowest is used where possible because in general the lower the , the lower is the power dissipation due to dynamic and leakage currents. In this paper, we describe an SSAVS system for the Signal Processor module in a WSN based on a proposed methodology within the Quasi-Delay-Insensitive (QDI) asynchronous-logic (async) approach [6], [12], [14], [16], and with a novel in-situ self-adjusting means. The proposed design methodology, coined ‘Pre-Charged Static-Logic’ (PCSL) [17], is essentially a static-logic library cell architecture that exploits the fast reset feature and is appropriate for full-range Dynamic Voltage Scaling (DVS) [18]—for ranging from nominal voltage to deep sub- . The proposed SSAVS system for the WSN is demonstrated by means of application to the FRM FB. The novel self-adjustment is obtained very simply—by exploiting (and comparing) the existing Request and Acknowledge signals of the QDI protocol signaling, and thereafter adjusting the accordingly (see Section III later). The ensuing overhead is hence very low. This paper is organized as follows. Section II reviews adaptive scaling systems. Section III presents the design of the proposed system. Section IV presents the measurement results of prototype ICs and benchmarking thereof. Finally, conclusions are drawn in Section V.
  7. The general modality of adaptive scaling systems to reduce power is to adaptively adjust as low as possible (with appropriate timing margin) to meet the throughput requirement for the prevailing operating conditions (including PVT variations). This largely requires the pertinent circuit delay variations to be tracked, observed, or inferred. A reported delay tracking technique is based on a Look-Up Table [19], [20] comprising tabulated pre-characterized throughput versus data according to critical path circuit delay(s) under worst-case PVT conditions for the given throughput. To avoid excessive timingmargins, Statistical Static Timing Analysis [19] may be employed mostly to account for local (within-die) variations. Another reported technique [21] attempts to track real-time variations by adding PVT sensors. However, in sub- operation, because of the exponential relationship of sub- delay with PVT, even small errors in these sensor readings could lead to large circuit delay uncertainties, and the overheads associated with the sensors may defeat any advantage. The reported critical path delay matching [22]–[26] involves a ring oscillator matched to the critical path delay to set the clock frequency, and is subsequently adjusted. For improved matching, the entire logic of the critical path may be replicated at high hardware cost [24]. Although this may be able to mitigate the delay uncertainties issues associated with global PVT variations, it may not comprehensively account for local variations, particularly in sub- operation. Another reported technique employs timing error detection/correction [27]–[30], where VDD is reduced until the ensuing computation is erroneous. VDD is thereafter increased and the computation repeated. The applicability of this technique is arguably limited due to the severe/intractable PVT variations in suboperation, to possibly severe meta-stability issues due to the lack of timing margin, and to the need for re-computations. 
Another reported technique [31], [32] attempts to ascertain the circuit delay indirectly by measuring the variations in the supply current drawn to infer the ‘duration’ of the computation, and VDD is subsequently adjusted. This technique is likely to be ambiguous in sub-Vt operation, where the ratio of the current during computation to the current during idle is small. On the basis of the aforesaid review, it can be argued that these reported tracked, observed and inferred techniques are inadequate in terms of robustness, particularly in sub-Vt operation. Further, the hardware/computation overheads are considerable, including the need to scale VDD with the scaling of the clock frequency, i.e. Dynamic Voltage Frequency Scaling (DVFS). We instead propose a definitive means: directly measuring the delay and comparing it against the throughput for the prevailing conditions, with VDD thereafter adjusted accordingly. To enable this, we adopt the self-timed async QDI approach (vis-à-vis the conventional sync approach), whose dual-rail encoding includes the Request signal, which indicates that the input sample is ready, and the Acknowledge signal, which indicates the completion of the computation.
  8. By counting the number of Requests against Acknowledges within a given period, we ascertain if the delay of the circuit is excessive, or otherwise, with respect to the throughput for the prevailing conditions. VDD is thereafter adjusted accordingly such that the delay is just slightly less than the delay between input samples, thereby satisfying the throughput. Further, as is inherent in QDI async protocols, the computation is uninterrupted while VDD is transitioning during its self-adjustment; in reported adaptive VDD scaling systems, circuit operation typically ceases when VDD is transitioning [20]. Of specific interest, note that the delay is definitive because it is the delay ascertained for the prevailing operating conditions, and we will show later that the associated hardware to adjust VDD is very modest. At this juncture, to the best of our knowledge, ultra-low power QDI circuits with self-adaptive VDD, operating in the sub-Vt region and in extreme environments (hence requiring extremely high reliability), have yet to be reported or demonstrated. Further, it would be interesting to compare their attributes, including IC area, delay, energy/operation and power dissipation, against their conventional sync DVFS counterpart and under various conditions (see Section IV later).
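The counting rule on this slide can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions, not the authors' controller: the function name `vdd_step` and the example counts are hypothetical, and the behavior when Acknowledges outnumber Requests (treated here the same as the "keeping up" case) is an assumption.

```python
def vdd_step(requests: int, acknowledges: int) -> int:
    """Return +1 to raise VDD by one step, or -1 to lower it by one step.

    requests     -- number of Request events counted in the update period
    acknowledges -- number of Acknowledge events counted in the same period
    """
    if requests > acknowledges:
        # More inputs arrived than computations completed:
        # the circuit delay is excessive, so VDD is raised.
        return +1
    # Computations kept pace with the inputs (assumption: this also covers
    # acknowledges > requests), so VDD can be lowered to save power.
    return -1


# Illustrative counts only (hypothetical values, not measured data).
print(vdd_step(requests=100, acknowledges=100))  # circuit keeps up
print(vdd_step(requests=100, acknowledges=97))   # circuit falls behind
```

Because the decision uses only the two handshake counts, no replica path, sensor, or look-up table is needed in this model, which is consistent with the slide's remark that the adjustment hardware is very modest.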
  9. FB within the FRM FB. There are two voltage rails in the overall proposed SSAVS system: a fixed rail and a variable rail whose sub-Vt voltage typically ranges from 150 mV to 400 mV. For ease of illustration, the specific rail is shown in parentheses for the supply rails and for signals of the various modules. In Fig. 2, the voltage of the input signals is first adjusted from the fixed rail to the variable rail by the Step-Down Level Converter, and the signals are thereafter buffered by the Async FIFO Buffer (depth of 50) before input to the async FRM FB. The FB outputs (channels 1–4) and their associated Acknowledge signal (combined from channels 1–4 via the Completion Detection Circuit) are output to the MCU for further processing. The Acknowledge signal is also fed back to the Async FIFO Buffer. The Request and Acknowledge signals are input to the Power Management module, one of which is stepped up from the variable rail to the fixed rail. The SSAVS Controller within the Power Management module monitors the number of Request and Acknowledge signals in each period (a 10 Hz clock generated by the Update Clock Generator for a target throughput of 1 kS/s). The controller output is a 5-bit code that sets one of 24 voltage levels (in the Buck DC-DC Converter), in 50 mV steps, for the variable rail.
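The figures on this slide can be checked with a short Python sketch. The code-to-voltage mapping below is an assumption inferred from the two pairs quoted for Fig. 3 on the following slide (‘10110’ at 1.15 V and ‘00011’ at 200 mV); the slides do not state the mapping explicitly, and the names `code_to_millivolts`, `STEP_MV`, and `NUM_LEVELS` are hypothetical.

```python
STEP_MV = 50      # voltage step per code increment (from the slide)
NUM_LEVELS = 24   # number of selectable levels (from the slide)


def code_to_millivolts(code_bits: str) -> int:
    """Map a 5-bit code string to the variable-rail voltage in mV.

    Assumed mapping: voltage = (code + 1) * 50 mV, chosen so that
    '10110' gives 1150 mV and '00011' gives 200 mV.
    """
    if len(code_bits) != 5:
        raise ValueError("expected a 5-bit code")
    code = int(code_bits, 2)
    if not 0 <= code < NUM_LEVELS:
        raise ValueError("code outside the 24 defined levels")
    return (code + 1) * STEP_MV


# Input samples per update period: 1 kS/s throughput, 10 Hz update clock.
samples_per_period = 1000 // 10

print(code_to_millivolts("10110"))
print(code_to_millivolts("00011"))
print(samples_per_period)
```

Under this assumed mapping, each update period compares roughly 100 Requests against the Acknowledges counted in the same window before the code moves by one step.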
  10. and the speed of the FB would far exceed the required computation. In this scenario, the number of FB Acknowledge clocks will be equal to the number of Request clocks in each period. In the next period, the SSAVS Controller will subsequently decrement the 5-bit code by 1 to ‘10110’ and correspondingly reduce the variable VDD by 50 mV to 1.15 V. The process continues, with the code repeatedly decremented and the variable VDD commensurately reduced. Eventually, at a later period in Fig. 3, the code is decremented to ‘00010’, equivalently 150 mV. This is the juncture where the speed of the FRM FB is just slightly slower than the data rate for the prevailing conditions; the number of Request clocks hence exceeds the number of Acknowledge clocks in one period. Although the speed of the FRM FB is slightly too slow, no error occurs because the unconsumed inputs are stored in the Async FIFO Buffer (Fig. 2). In the next period, the SSAVS Controller reacts accordingly by incrementing the code by 1 to ‘00011’, and the variable VDD is correspondingly increased by 50 mV to 200 mV. With VDD increased, the speed of the FRM FB now slightly exceeds the required computation and the unconsumed inputs stored in the FIFO buffer are in turn computed at a slightly faster rate than the data rate. Consequently, the number of Request clocks is now less than the number of Acknowledge clocks and at the end of this
  11. cells, when the Request signal is ‘0’, both outputs are ‘0’. On the other hand, when the Request signal is ‘1’ (indicating that an operation is ready) and when the input signals are valid, the operation commences and an ensuing output is obtained. The architecture of the PCSL cell involves an integration of the subcircuit associated with the Request signal and a buffer (to each output) into the standard static-logic library cell (redesigned for dual-rail async), thereby sharing (common) transistors. This reduces the number of transistors, resulting in simultaneously lower power/energy dissipation, faster speed and smaller IC area (see Table II later). On the basis of this architecture, Figs. 4(b)–(g) depict the schematics of six basic PCSL cells (all with a 3-transistor limit in any stack). To depict the hardware advantage of the proposed PCSL approach, the 2-input AND/NAND gate in Fig. 4(b) can be compared to the same gate realized by three reported static-logic QDI approaches in Figs. 5(a)–(c): (a) the Delay-Insensitive Minterm Synthesis (DIMS) approach [33], (b) NULL Convention Logic (NCL) with complex gates [34] (denoted NCL1), and (c) NCL with fast-reset complex gates [35] (denoted NCL2). On the basis of simulations (130 nm CMOS), Table II benchmarks the energy dissipation, delay and IC area of the aforesaid six basic cells of the various approaches. The competing cells are normalized to the PCSL cells, whose actual values are shown within parentheses. The average attributes are tabulated in the last row.
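The cell behavior described above can be illustrated with a small behavioral model in Python. This is a logic-level sketch only, not the transistor-level circuit of Fig. 4(b): the dual-rail convention used here, (1, 0) for logic 1, (0, 1) for logic 0 and (0, 0) for "no data", is the common one and is assumed rather than taken from the slides, and the function names are hypothetical.

```python
from typing import Tuple

Rail = Tuple[int, int]   # (true_rail, false_rail)
NULL: Rail = (0, 0)      # both rails low: no valid data


def is_valid(x: Rail) -> bool:
    """A dual-rail value is valid when exactly one rail is high."""
    return x in ((1, 0), (0, 1))


def pcsl_and_nand(req: int, a: Rail, b: Rail) -> Rail:
    """Behavioral model of a 2-input dual-rail AND/NAND cell.

    The true rail carries AND(a, b); the false rail carries NAND(a, b).
    With req == 0 both outputs are 0. Holding the outputs at 0 while an
    input is not yet valid is an assumption of this sketch.
    """
    if req == 0 or not (is_valid(a) and is_valid(b)):
        return NULL
    true_rail = a[0] & b[0]    # high only when both inputs are logic 1
    false_rail = a[1] | b[1]   # high when either input is logic 0
    return (true_rail, false_rail)


ONE, ZERO = (1, 0), (0, 1)
print(pcsl_and_nand(0, ONE, ONE))    # reset phase
print(pcsl_and_nand(1, ONE, ONE))    # computation phase, 1 AND 1
print(pcsl_and_nand(1, ONE, ZERO))   # computation phase, 1 AND 0
```

In this model a valid output (exactly one rail high) doubles as the completion indication, which is what allows a completion detection stage to generate the Acknowledge signal without a clock.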
  12. It is apparent from Table II that the cells embodying the proposed PCSL approach feature the lowest energy dissipation, save the simple AND/NAND and OR/NOR gates of NCL1. On average, the energy dissipation of cells embodying the reported DIMS, NCL1, and NCL2 approaches is significantly higher: 4.0×, 1.6×, and 1.9× respectively. It is also apparent that the cells embodying the proposed PCSL approach feature the shortest delay (the sum of two components, the computation-phase delay and the reset-phase delay, averaged over all input combinations), save the simple AND/NAND and OR/NOR gates of NCL1. On average, the reported DIMS, NCL1, and NCL2 cells are significantly slower: 4.1×, 1.8×, and 1.9× respectively. It is also apparent that the cells embodying the proposed PCSL approach require the smallest IC area; the layouts are based on the standard-cell approach where the cell height is fixed at 4 µm and the cell width is in multiples of 0.4 µm. On average, the IC area required for cells embodying the reported DIMS, NCL1, and NCL2 approaches is significantly larger: 4.7×, 2.6×, and 2.7× respectively; from the perspective of dual-rail async versus (single-rail) sync circuits, the smaller IC area is worthwhile because the IC area overhead of the former is somewhat mitigated. In short, cells embodying the proposed PCSL approach simultaneously exhibit the lowest energy dissipation, shortest delay and smallest IC area. With the proposed PCSL QDI realization approach, an 8 × 8-Bit Quad-Channel Async QDI FRM FB is designed. A semicustom design flow is adopted, where the front-end is designed using an assortment of in-house design tools and commercial synthesis tools based on a flow similar to NCL-X [34]. The back-end implementation, on the other hand, is based on commercial EDA tools with our customized library cells (including the proposed PCSL).
  13. For Scenario 1, we will use a (delay) point along the 3σ plot of the pertinent temperature and adjust that point for 10% VDD variation; the 10% VDD variation is congruous with the International Technology Roadmap for Semiconductors.