Ultra-Low Power Asynchronous Logic Wireless Sensor Network

Prepared and Presented By:
Hossam Hassan
MSIS LAB, CBNU
An Ultra-Low Power Asynchronous-Logic In-Situ Self-Adaptive VDD System for Wireless Sensor Networks
Authors: Tong Lin, Kwen-Siong Chong, Joseph S. Chang, and Bah-Hwee Gwee
Journal: IEEE Journal of Solid-State Circuits, vol. 48, no. 2, 2013

Outline
• Preliminaries
• Wireless Sensor Network
• Node Architecture
• Proposed Idea for Low Power Design
• Self-Adaptive VDD System for Wireless Sensor Networks
• Adaptive Vdd Scaling Systems
• System Design
• Results And Benchmarking

Preliminaries
• What Is Asynchronous Logic?
• The traditional approach to sequencing and computation uses a
global time reference (“the clock”)
• Can we compute without a clock?
• Yes!: “asynchronous” or “clockless” logic
• Also “self-timed” or “speed-independent”
• Asynchronous system: collection of modules communicating
by handshake protocols
• Can we compute without a clock and without delay
assumptions?
• Quasi-delay-insensitive (QDI) logic
Adopted from:
Alain J. Martin, California Institute of Technology

Preliminaries
• Why Asynchronous and QDI Logic?
• No clock
• Up to 50% of clock power recovered
• Automatic shut-off of idle parts
• Perfect clock gating
• No glitches (spurious transitions)
• Up to 50% of power in combinational circuits
• Automatic adaptation to parameter variations
• Voltage scaling: Perfect exchange of delay against energy through voltage scaling
• Flexibility of asynchronous interfaces:
• Better use of concurrency
• Robustness to PVT Variations: Variations of physical parameters all affect timing.
Adopted from:
Alain J. Martin, California Institute of Technology

Preliminaries
• Disadvantages of Async
• Size overhead (more transistors) (i.e. Handshaking)
• Poorly understood and rarely taught
• No industrial CAD tools (yet) (i.e. Custom Design)
• No well-developed testing procedure (yet) (i.e. Custom Design)

Preliminaries
• Static Logic vs Dynamic Logic

Preliminaries
• NULL Convention Logic
• NCL is a delay-insensitive (DI) asynchronous (i.e. clockless) paradigm, which means that NCL
circuits operate correctly regardless of when circuit inputs become available. NCL circuits
are therefore said to be correct-by-construction (i.e. no timing analysis is necessary for
correct operation). NCL circuits use dual-rail or quad-rail logic to achieve delay-insensitivity.

Preliminaries
• Pre-Charge Static Logic (PCSL):
• It is an asynchronous-logic Quasi-Delay-Insensitive architecture
based on Static-Logic, featuring full-range Dynamic Voltage Scaling,
including robust operation in the sub-threshold voltage regime,
with simultaneously low hardware overheads, high speed and
low power dissipation.
• The PCSL logic circuit achieves this by integration of the Request
sub-circuit into the Static-Logic cell.
• During the initial phase, the output of Static-Logic cell (within the
PCSL logic circuit) is pre-charged.
• During the evaluate phase, the Static-Logic cell evaluates its inputs
and the PCSL logic circuit outputs the result.
Figure: Pre-Charged Static-Logic (PCSL) architecture, with annotations for enabling the circuit and for state retention (i.e. storing the logic output value).

Preliminaries
• Muller C-elements:
• It is a small digital block widely used in design of asynchronous circuits and systems.
• In a synchronous circuit, the role of the clock is to define points in time where signals are stable and valid. In
between the clock ticks, signals may exhibit hazards and may make multiple transitions as the combinational circuit
stabilizes.
• In an asynchronous system, the situation is different. The absence of a clock means that signals are required to be valid all the time: every
transition has a meaning, and consequently any hazards and races must be avoided.
Figure: Muller C-element and corresponding CMOS implementation, with the truth table for the Muller C-element.
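The behaviour of the C-element can be sketched in a few lines of Python. This is a behavioural model only, not the CMOS circuit from the slide; the class name `MullerC` and the method `update` are illustrative. The output goes to 1 when both inputs are 1, goes to 0 when both inputs are 0, and otherwise holds its previous value.

```python
class MullerC:
    """Behavioural model of a 2-input Muller C-element (illustrative names)."""

    def __init__(self, initial: int = 0) -> None:
        self.out = initial  # stored output value (the element holds state)

    def update(self, a: int, b: int) -> int:
        # The output follows the inputs only when they agree.
        if a == b:
            self.out = a
        # When the inputs differ, the previous output is held.
        return self.out


c = MullerC()
trace = [c.update(a, b) for a, b in [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]]
print(trace)  # [0, 0, 1, 1, 0]
```

The hold behaviour is what makes the element useful for handshaking: an output transition only happens once every input has made its transition.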

Preliminaries
• Filter bank
• In signal processing, a filter bank is an array of band-pass filters that separates the input
signal into multiple components, each one carrying a single frequency sub-band of the
original signal.
• The process of decomposition performed by the filter bank is called analysis (meaning
analysis of the signal in terms of its components in each sub-band); the output of analysis is
referred to as a sub-band signal with as many sub-bands as there are filters in the filter bank.
• The reconstruction process is called synthesis, meaning reconstitution of a complete signal
resulting from the filtering process.

Preliminaries
• Frequency Response Masking (FRM):
• Frequency-response masking is a technique for designing sharp low-pass, high-pass,
bandpass and band-stop filters with arbitrary passband bandwidth.
• Furthermore, linear-phase FIR filters are generated, which have advantages such as
guaranteed stability and freedom from phase distortion.
• However, the problem with FIR filters is the high complexity of sharp filters.
• With the frequency-response masking technique, the resulting filter has very sparse
coefficients.
• Since only a very small fraction of its coefficient values are nonzero, its complexity is very
much lower than that of the infinite word-length minimax optimum filter.
• With an additional multiplier-less design method, the complexity is reduced to a minimum.
• In linear-phase FIR filters, phase is a linear function of frequency.
• They have a symmetric impulse response.
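The sparse coefficients mentioned above arise because FRM builds its sharp band-edge filter from a prototype filter in which every delay is replaced by M delays, i.e. H(z) becomes H(z^M). A minimal sketch of that interpolation step alone follows; the masking filters of a complete FRM design are omitted, and the function name `interpolate_taps` and the prototype taps are hypothetical.

```python
def interpolate_taps(taps, m):
    """Insert m-1 zeros between adjacent taps, i.e. H(z) -> H(z**m)."""
    out = []
    for i, tap in enumerate(taps):
        out.append(tap)
        if i < len(taps) - 1:
            out.extend([0.0] * (m - 1))
    return out


prototype = [0.25, 0.5, 0.25]            # hypothetical symmetric prototype filter
sparse = interpolate_taps(prototype, 4)  # interpolation factor M = 4
nonzero = sum(1 for t in sparse if t != 0.0)

print(sparse)                # [0.25, 0.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0, 0.25]
print(len(sparse), nonzero)  # 9 3
```

The interpolated filter is longer, but the number of multiplications stays at the number of nonzero taps, and the impulse response remains symmetric (linear phase).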

Preliminaries
• Dynamic frequency scaling
• It is a technique in computer architecture whereby the frequency of a microprocessor can be
automatically adjusted "on the fly", either to conserve power or to reduce the amount of
heat generated by the chip.
• It is commonly used in laptops and other mobile devices, where energy comes from a battery
and thus is limited.
• Dynamic voltage scaling:
• It is another power conservation technique that is often used in conjunction with frequency
scaling, as the frequency that a chip may run at is related to the operating voltage.
• Increases in voltage or frequency increase power use, which in turn may raise the
chip temperature and increase system power demands further.
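The benefit of scaling voltage together with frequency can be illustrated with the standard first-order model of CMOS switching power, P = a·C·VDD²·f. The sketch below uses that textbook model with hypothetical numbers (capacitance, voltages, frequencies); it ignores leakage and is not taken from the paper.

```python
def dynamic_power(c_eff, vdd, freq, activity=1.0):
    """First-order CMOS switching power in watts: activity * C * VDD^2 * f."""
    return activity * c_eff * vdd ** 2 * freq


C_EFF = 1e-9  # hypothetical effective switched capacitance, in farads

p_full = dynamic_power(C_EFF, vdd=1.2, freq=100e6)   # nominal operating point
p_scaled = dynamic_power(C_EFF, vdd=0.6, freq=50e6)  # half voltage, half frequency

print(p_scaled / p_full)  # roughly 0.125, i.e. one eighth of the nominal power
```

Halving the frequency alone would only halve the power; the quadratic dependence on VDD is what makes voltage scaling the more effective of the two.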

Wireless Sensor Network
• Spatially distributed autonomous sensors
• Monitor physical or environmental conditions
• Temperature, sound, etc.
• Pass their data through the network to a main location
• Modern networks are bi-directional, also enabling control of sensor activity
• Applications
• Battlefield surveillance
• Industrial process monitoring

Wireless Sensor Network
• The WSN is built of "nodes"
• a few to several hundreds or even thousands
• each node is connected to one (or sometimes several) sensors
• Each such sensor network node has typically several parts
• a radio transceiver
• a microcontroller
• an electronic circuit for interfacing with the sensors
• an energy source, usually a battery
• As the WSN is typically designed for a multiple-year operational life-span,
power is carefully budgeted and, where pertinent, modules are energized only when
required, such that the overall average power is typically 10–100 µW.
• Design goal: achieve the lowest possible power operation for the prevailing throughput
and circuit conditions (VDD adjusted to within 50 mV of the minimum
voltage), with high operational robustness and minimal overheads for a WSN.

Proposed Idea for Low Power Design
• Signal processor accounts for ~50% of total power
consumption
• ‘Sub-threshold Self-Adaptive VDD Scaling’ (SSAVS)
• Circuits work in sub-threshold region
• Supply voltage is adjusted dynamically depending on
the processing speed required by external environment
• Adoption of the Quasi-Delay-Insensitive (QDI)
asynchronous-logic protocols, where the circuits
therein are self-timed
• Embodiment of the sub-threshold Pre-Charged Static-Logic
(PCSL) design approach
• The async SSAVS system has been benchmarked against
its conventional sync DVFS system counterpart

Proposed Idea for Low Power Design
• Asynchronous logic implementation
• Pre-charged Static Logic (PCSL)
• Superior to existing asynchronous-logic approaches in energy, delay and chip area.

Self-Adaptive VDD System for Wireless Sensor Networks
• As the WSN is typically designed for a multiple-year operational life-span, power is carefully
budgeted and, where pertinent, modules are energized only when required, such that the overall average
power is typically 10–100 µW.
• In our WSN depicted in Fig. 1, the overall active/passive operation ratio is approximately 20/80. In
the passive mode, only the Sensor Front-End module is continuously energized. The Sensor and
the Conditioning Circuits therein are powered directly by the VDD_BAT (2.8 V) battery, via a Low-Dropout
(LDO) Regulator.
• The Simple Processor is powered by VDD_NOM (1.2 V) via a power-efficient Buck DC-DC
Converter.
• The Simple Processor ascertains if the input is possibly useful, and if it is, the WSN goes into
active mode, where it signals the Power Management module to energize the Signal Processor
module via VDD_ADJ.

Self-Adaptive VDD System for Wireless Sensor Networks
• The voltage of VDD_ADJ, typically in the sub-threshold voltage (sub-Vt) range, is self-adjusted
such that the lowest possible voltage is used—to enable ultra-low power operation.
• Signal Processor Module:
• The Signal Processor module buffers (via a FIFO) the output of the Simple Processor and filters the
signal before final computation by the Microcontroller Unit (MCU).
• When the MCU ascertains that the filtered signal is useful, the Wireless Transceiver is energized and the
processed signal is subsequently transmitted wirelessly.
• With the wireless transmission expected to be 0.01% active and with a 20/80 WSN active/passive
operation, 50% of the overall power is attributed to the Signal Processor module, which is of interest in
terms of power dissipation.

Self-Adaptive VDD System for Wireless Sensor Networks
• The approaches taken to minimize power involve all levels of the design space, including
the algorithmic design and the hardware level.
• Frequency Response Masking (FRM) technique
• In the algorithmic design, the filtering in the Signal Processor module embodies the Frequency
Response Masking (FRM) technique.
• This involves the Interpolated Finite Impulse Response (IFIR) Filter and the FRM Filter Bank (FB), and is
computationally more efficient than the usual FIR and IIR filter approaches.
• Among ultra-low power design techniques at the hardware level, operation in the sub-Vt region is one
of the most effective.
• This is particularly applicable because the speed of the digital circuits in the Signal Processor is
modest—the clocking speed ranges from 1.4 kHz to 1.4 MHz for a sampling rate range from 0.1
kSamples/s (kS/s) to 100 kS/s.

Self-Adaptive VDD System for Wireless Sensor Networks
• Despite the potential advantages of sub-Vt operation, this region of operation is challenging here
for several reasons.
• First, the WSN is designed to work in a wide range of conditions, including extreme environments (-55 °C
to +125 °C).
• Second, Process, Voltage and Temperature (PVT) variations for fine-dimensioned CMOS processes
increase dramatically in sub-Vt operation, and the ensuing delay variations are very severe, possibly
intractable. Typically, a very large delay safety margin (for synchronous-logic (sync) circuits) would need
to be allowed for.
• Third, the input signal to the Signal Processor module is variable. From a robust operation perspective,
the circuits would need to be designed to meet the worst-case conditions— the fastest input rate and
extreme temperatures.
• To design the WSN for ultra-low power operation, a self-adjusting VDD approach operating
in the sub-Vt region is adopted. It is termed ‘Sub-threshold Self-Adaptive VDD Scaling’ (SSAVS), and the VDD is
dynamically self-adjusted in situ.

Self-Adaptive VDD System for Wireless Sensor Networks
• The operation involves ‘dialing up’ VDD when the need for computation increases or when the
operating conditions are less favorable, and VDD is ‘dialed-down’ when the conditions are the
converse.
• Put simply, the lowest VDD is used where possible because in general the lower the VDD, the lower is
the power dissipation due to dynamic and leakage currents.
• The novel self-adjustment is obtained very simply—by exploiting (and comparing) the existing
Request and Acknowledge signals of the QDI protocol signaling, and thereafter adjusting the
VDD_ADJ accordingly. The ensuing overhead is hence very low.

Adaptive Vdd Scaling Systems
• The general modality of adaptive VDD scaling systems to reduce power is to adaptively adjust VDD as
low as possible (with appropriate timing margin) to meet the throughput requirement for the
prevailing operating conditions (including PVT variations).
• This largely requires the pertinent circuit delay variations to be tracked, observed, or inferred.
• There are many reported techniques, but it can be argued that these reported tracked, observed
and inferred techniques are inadequate in terms of robustness, particularly in sub-Vt operation.
Further, the hardware/computation overheads are considerable, including the need to scale VDD
with the scaling of the clock frequency, i.e. Dynamic Voltage Frequency Scaling (DVFS).
• The proposed idea is to directly measure the delay and compare it against the throughput for the
prevailing conditions; VDD is thereafter adjusted accordingly.
• To enable this, the self-timed async QDI approach is adopted, where the dual-rail encoding includes the
Request signal, which indicates that the input sample is ready, and the Acknowledge signal, which indicates
the completion of the computation.

Adaptive Vdd Scaling Systems
• By counting the number of Requests against Acknowledges within a given period, we ascertain if
the delay of the circuit is excessive, or otherwise, with respect to the throughput for the
prevailing conditions.
• VDD is thereafter adjusted accordingly such that the delay is just slightly less than the delay between
input samples, thereby satisfying the throughput.
• Further, as the Acknowledge signal is inherent in QDI async protocols, the computation is uninterrupted
while VDD is transitioning during its self-adjustment; in reported adaptive scaling systems, circuit
operation typically ceases when VDD is transitioning.
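The counting idea above can be sketched as a single decision per update period. The slides do not give the controller's exact rule, so the policy below is an assumption: raise VDD by one step when fewer Acknowledges than Requests were counted (the circuit is falling behind), and otherwise lower it by one step. The function name `next_vdd_level` and the level bounds are illustrative.

```python
def next_vdd_level(level, requests, acknowledges, min_level=0, max_level=23):
    """Return the VDD level to use in the next update period.

    Assumed policy (not taken from the paper): step up when the circuit
    completed fewer computations than were requested, otherwise step down
    to probe for a lower workable voltage.
    """
    if acknowledges < requests:
        return min(level + 1, max_level)  # too slow: raise VDD
    return max(level - 1, min_level)      # keeping up: try a lower VDD


print(next_vdd_level(10, requests=100, acknowledges=92))   # 11
print(next_vdd_level(10, requests=100, acknowledges=100))  # 9
print(next_vdd_level(23, requests=100, acknowledges=50))   # 23 (already at maximum)
```

Because the decision uses only two counters fed by handshake signals that already exist, the added hardware can stay small, which is the point the slide makes about low overhead.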

System Design
• Fig. 2 depicts the proposed SSAVS system
within the Power Management module,
embodying the SSAVS Controller and its
associated adjustable VDD means (a Buck
DC-DC Converter), and the PCSL-based 8x8-Bit
Quad-Channel Async QDI FRM FB within
the FRM FB.
• There are two voltage rails in the overall
proposed SSAVS system: a fixed VDD_NOM
and a variable VDD_ADJ, whose sub-Vt
voltage typically ranges from 150 mV to
400 mV.
• For ease of illustration, the specific VDD rail
is shown in parenthesis for the supply rails
and for signals of the various modules.

System Design
• In Fig. 2, the voltages of the input and request signals are first stepped down from VDD_NOM = 1.2 V to
VDD_ADJ by the Step-Down Level Converter, and the signals are thereafter buffered by the Async FIFO
Buffer (depth of 50) before being input (as Input_FB and Req_FB) to the async FRM FB.
• The FB outputs (Output 1–4) and their associated Acknowledges (combined from Ack 1–4 via the
Completion Detection Circuit) are output to the MCU for further processing.
• Acknowledge is also fed back to the Async FIFO Buffer.
• The Request and Acknowledge signals are input to the Power Management module, and
Acknowledge is stepped up from VDD_ADJ to VDD_NOM.
• The SSAVS Controller within the Power Management module monitors the number of Request
and Acknowledge signals in each period (set by a 10 Hz clock generated by the Update VDD Clock
Generator for a target throughput of 1 kS/s).
• The VDD_Code is a 5-bit code that sets one of 24 voltage levels (in the Buck DC-DC Converter)
ranging from ‘00000’=50 mV to ‘10111’=1.2 V (in 50 mV steps) for VDD_ADJ.
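The code-to-voltage mapping follows from the two endpoints and the 50 mV step: code value n corresponds to 50 × (n + 1) mV. A small sketch, assuming the mapping is linear across all 24 codes; the function name `vdd_code_to_mv` is illustrative.

```python
def vdd_code_to_mv(code: str) -> int:
    """Convert a 5-bit VDD_Code string to the VDD_ADJ voltage in millivolts."""
    if len(code) != 5 or not set(code) <= {"0", "1"}:
        raise ValueError("VDD_Code must be a 5-character binary string")
    value = int(code, 2)
    if value > 23:  # only 24 levels, '00000' to '10111', are defined
        raise ValueError("VDD_Code out of range")
    return 50 * (value + 1)  # 50 mV steps starting at 50 mV


print(vdd_code_to_mv("00000"))  # 50
print(vdd_code_to_mv("10111"))  # 1200
print(vdd_code_to_mv("00111"))  # 400
```

Under this mapping, the typical sub-Vt operating range of 150 mV to 400 mV corresponds to codes ‘00010’ to ‘00111’.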

System Design
• Fig. 3 graphically depicts an example of the self-adjustment of VDD_ADJ.
• When the WSN is first initiated, the SSAVS Controller outputs VDD_Code = ‘10111’, equivalently
VDD_ADJ = 1.2 V, and the speed of the FB would far exceed the required computation.
• The voltage of VDD_ADJ of the FB is in-situ adaptively self-adjusted to be as low as possible
(within 50 mV) to meet the throughput for the prevailing operating conditions, and on average,
the voltage of VDD_ADJ is slightly higher than the actual required minimum.
• Hence, the FB is ultra-low power and highly power-efficient.

System Design
• In view of the need for sub-Vt operation, it is imperative to adopt circuits based on the static-logic
family to mitigate the effects of critical transistor sizing; dynamic- and pass-logic families are
inappropriate.
• ‘Pre-Charged Static-Logic’ (PCSL):
• The basic architecture comprises an Inverting Static-Logic Cell, three transistors (for output pre-charging
during the reset phase/evaluation during the computation phase), and two inverters (for output
buffering). The outputs are Q.T (Output True) and Q.F (Output False).
Figure: The basic architecture of the proposed async cells, coined ‘Pre-Charged Static-Logic’ (PCSL).

System Design
• In PCSL cells, when Request is ‘0’, both outputs are ‘0’. On the other hand, when Request is ‘1’
(indicating that an operation is ready) and when the input signals are valid, the operation
commences and an ensuing output is obtained.
• The architecture of the PCSL cell integrates the sub-circuit associated with the Request
signal and a buffer (for each output) into the standard static-logic library cell (redesigned for dual-rail
async), thereby sharing (common) transistors.
• This reduces the number of transistors, resulting in simultaneous lower power/energy dissipation,
faster speed and smaller IC area.
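The Request-controlled behaviour described above can be sketched as a behavioural model of a 2-input AND/NAND cell. This is a logic-level illustration only, not the transistor-level circuit: it assumes the inputs `a` and `b` are already valid whenever Request is ‘1’, and the function name `pcsl_and2` is hypothetical.

```python
def pcsl_and2(request, a, b):
    """Return the dual-rail outputs (Q.T, Q.F) of a 2-input AND/NAND cell."""
    if request == 0:
        return (0, 0)  # Request is '0': both outputs are '0' (no valid data)
    result = a & b     # Request is '1': the static-logic function is evaluated
    return (result, 1 - result)  # exactly one rail is high for valid data


print(pcsl_and2(0, 1, 1))  # (0, 0)
print(pcsl_and2(1, 1, 1))  # (1, 0)
print(pcsl_and2(1, 1, 0))  # (0, 1)
```

With both rails low between computations, a downstream stage can tell "no data yet" apart from a valid logic ‘0’, which is what lets completion be detected without a clock.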

System Design
• To depict the hardware advantage of the proposed PCSL approach, the 2-input AND/NAND gate
can be compared to the same gate realized by three reported static-logic QDI approaches:
a) Delay-Insensitive Minterm Synthesis (DIMS) approach,
b) NULL Convention Logic (NCL) with complex gates (denoted NCL1), and
c) NCL with fast-reset complex gates (denoted NCL2).

System Design
• On the basis of simulations (130 nm CMOS), the delay and IC area of six basic cells are compared across the various
approaches. The competing cells are normalized to the PCSL cells, whose actual values are shown
within parentheses. The average attributes are tabulated in the last row.
• Cells embodying the proposed PCSL approach simultaneously exhibit the lowest energy dissipation, shortest delay
and smallest IC area.

System Design
• With the proposed PCSL QDI realization approach, an 8x8-Bit Quad-Channel Async QDI FRM FB is
designed.
• A semicustom design flow is adopted.
• Each FB channel comprises an Async Read/Write Controller, an 8x8-Bit Coefficient Memory, an 8x8-Bit
Data Memory, an 8-Bit PCSL Multiplier, and a 20-Bit PCSL Adder.
• To preserve the QDI protocol and proper async handshaking, Datapath Completion Detection (DCD) and
Latch Completion Detection (LCD) circuits are included with Muller C-elements (denoted by a ‘C’).
Figure: FB channel architecture, with the Latch Completion Detection (LCD) and Datapath Completion Detection (DCD) circuits labelled.
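Completion detection for dual-rail data is commonly built by checking each bit for validity (one of its two rails is high) and combining the per-bit results with C-elements. The sketch below models that generic scheme at the behavioural level; it is not the paper's DCD or LCD circuit, and the class name `CompletionDetector` is illustrative.

```python
class CompletionDetector:
    """Generic dual-rail completion detection (behavioural sketch)."""

    def __init__(self):
        self.done = 0  # held output, as in a multi-input C-element

    def update(self, word):
        """word is a list of (true_rail, false_rail) pairs, one per bit."""
        valid = [t | f for t, f in word]  # a bit is valid when either rail is high
        if all(valid):
            self.done = 1  # every bit carries data: computation complete
        elif not any(valid):
            self.done = 0  # every bit is back to null: ready for the next one
        return self.done   # mixed state: hold the previous value


det = CompletionDetector()
steps = [
    [(0, 0), (0, 0)],  # all null
    [(1, 0), (0, 0)],  # first bit arrived
    [(1, 0), (0, 1)],  # both bits valid
    [(0, 0), (0, 1)],  # partially reset
    [(0, 0), (0, 0)],  # fully reset
]
print([det.update(w) for w in steps])  # [0, 0, 1, 1, 0]
```

The detector output can serve as the Acknowledge for a stage: it rises only after the slowest bit has arrived, regardless of how long that takes.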

Results And Benchmarking
• Scenario 1: the sync DVFS system embodies a temperature sensor and, on the basis of the measured temperature and pre-characterization of the sync filter, the clocking frequency is selected accordingly.

Results And Benchmarking
• Scenario 2: the sync DVFS system is much simpler, where the clocking frequency is fixed (to the worst-case) to accommodate all conditions.

Results And Benchmarking
• In Scenario 1, neither FB is particularly advantageous: the sync DVFS FB and the async SSAVS FB are
advantageous in different conditions.
• Nevertheless, the sync FB may be disadvantageous if the temperature sensor overheads
associated with DVFS for Scenario 1 are considered.
• In Scenario 2, the async FB is advantageous in terms of reduced delay with respect to VDD and usually
lower Eper with respect to VDD; in terms of power dissipation, it is advantageous in some
conditions (while the sync FB is advantageous in others).
• Further, in the context of continuous circuit operation and the overheads associated with DVS, the
proposed SSAVS is advantageous over the conventional DVFS in terms of uninterrupted circuit
operation and not requiring external intervention (such as changing the clock rate, pre-characterization,
etc.).
