2. The Mismatch Noise Cancellation (MNC) Top Level
[Figure 1 (block diagram): the "MNC Top Level" block. Signals shown: Clock1, Clock2, Reset, En_Shuffle, Gama1[1:0], Gama2[1:0], Gama12[1:0], OV_P, OV_M, RawOut_PN[11:0], RawOut_Corr[11:0], PRN1_0 to PRN1_2, PRN2_0 to PRN2_2, PRN3_0 to PRN3_2. Connection labels: "From the first stage of the Pipelined ADC" (twice), "To the first stage of the Pipelined ADC", "To the second stage of the Pipelined ADC", "To the third stage of the Pipelined ADC", "The output of the rest of the pipeline after", "The output of the pipeline after Mismatch noise cancellation".]
Figure 1: Top level of the MNC architecture
3. The MNC Architecture
[Figure 2 (block diagram): the MNCTOPLEVEL block and its components.
- Pseudo Random Generator. Inputs: Reset, En_Shuffle, Clock1, Clock2. Outputs: PN1_1, PN1_2, PN1_3, PN2_1, PN2_2, PN2_3, PN3_1, PN3_2, PN3_3, Dither[10:0].
- MNC Mismatch Estimation. Inputs: Clock1/2, Gama1[1:0], Gama2[1:0], Gama12[1:0], PN1, PN2, PN3, Dither, RawOut_PN[11:0]. Outputs: X1_EST[15:0], X2_EST[15:0], X3_EST[15:0], qvoffset[??:0].
- MNC Noise Cancellation. Inputs: Clock1/2, Reset, RawOut_PN[11:0], Gama1[1:0], Gama2[1:0], Gama12[2:0], PN1, PN2, PN3, X1[15:0], X2[15:0], X3[15:0], qvoffset. Output: RawOut_Corr[11:0].
- Delay equalizers: PN_DELAY_EQUALIZER_EC (PN1_EC, PN2_EC, PN3_EC), Gama_Delay_Equalizer_EC (Gama1_EC, Gama2_EC, Gama12_EC), PN_DLY_EQ_EE, Gama_DLY_EQ_EE.
- Other signals: OV_P, OV_M.]
Figure 2: The component blocks of the MNC architecture
4. Components of the Mismatch Noise Cancellation
Figure 1 and Figure 2 illustrate the top level of the MNC architecture and its component blocks, respectively.
The main components of the MNC Architecture are the following blocks/components:
1- Pseudo Random Generator
This block generates random binary sequences for use by the components of the ADC pipeline and by the other MNC blocks.
2- Mismatch Estimation
This block is responsible for the estimation of the mismatches.
3- Noise Cancellation
This block is responsible for correcting the effect of the mismatches.
4- Synchronization Elements
Synchronization and delay elements synchronize the MNC circuit to the rest of the pipeline and synchronize the data flow within the MNC architecture.
6. I- Pseudo Random Generator
The random number generator is implemented as a type-II Linear Feedback Shift Register (LFSR), in which one output is fed back to many points (the taps) across the LFSR.
[Block diagram: Pseudo Random Generator. Inputs: Reset, En_Shuffle, Clock1, Clock2. Outputs: PN1_1, PN1_2, PN1_3, PN2_1, PN2_2, PN2_3, PN3_1, PN3_2, PN3_3, Dither[10:0].]
7. TYPE-II LFSR
The Linear Feedback Shift Register (LFSR) topology used in the MNC architecture has a generator polynomial of degree 31 and produces a maximal-length binary sequence of length 2^31 - 1.
[Diagram: type-II LFSR with register bits b0, b1, b2, b3, b4, b5, b6, ..., bn-2, bn-1 and XOR (+) gates at the tap positions.]
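The type-II structure described above (a single output bit fed back into several tap positions at once) can be sketched in a few lines. This is a minimal illustration, not the generator used in the design: the source does not give the degree-31 polynomial, so `ASSUMED_MASK_31` is a hypothetical tap choice, and the function names are invented for this example. The maximal-length property is demonstrated on a small degree-4 register, where the full period can be counted quickly.

```python
def lfsr_step(state, taps_mask):
    """Advance a type-II (Galois) LFSR by one clock.

    Returns (new_state, output_bit).
    """
    out = state & 1          # the single output bit
    state >>= 1              # shift the register by one position
    if out:                  # feed the output back into every tap at once
        state ^= taps_mask
    return state, out


def lfsr_period(taps_mask, seed=1):
    """Count the clocks needed for the register to return to its seed state."""
    state, steps = seed, 0
    while True:
        state, _ = lfsr_step(state, taps_mask)
        steps += 1
        if state == seed:
            return steps


# Degree-4 demonstration register: this mask gives the maximal period 2**4 - 1.
SMALL_MASK = 0b1100

# Hypothetical degree-31 tap mask (exponents 31 and 28). This is an assumption
# made for illustration; the polynomial is NOT taken from the source document.
ASSUMED_MASK_31 = (1 << 30) | (1 << 27)
```

Calling `lfsr_period(SMALL_MASK)` walks through all 15 non-zero states of the 4-bit register. A degree-31 register behaves the same way, but its period of 2^31 - 1 is too long to count in a quick check.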
The MNC noise cancellation architecture is pipelined to meet the system clock requirement and the throughput of the ADC. The noise cancellation block must maintain the same throughput as the ADC, since it corrects the output of the ADC every cycle.
$$ ECX2 = PN2 \cdot X2 \cdot \gamma_2 \tag{1} $$
$$ ECX3 = PN3 \cdot X3 \cdot \gamma_3 \tag{2} $$
$$ ECX12 = PN1 \cdot PN2 \cdot X2 \cdot \gamma_{12} \tag{3} $$
$$ ECX13 = PN1 \cdot PN3 \cdot X3 \cdot \gamma_{12} \tag{4} $$
$$ ECXSUM = ECX1 + ECX2 + ECX3 + ECX12 + ECX13 \tag{5} $$
$$ RawOutDAC = RawOutPN - (ECX1 + ECX2 + ECX3 + ECX12 + ECX13) \cdot 2^{(m2 - 1)} \tag{6} $$
$$ MismatchTerm = PN1 \cdot X1 + PN2 \cdot X2 - PN3 \cdot X3 + PN1 \cdot PN2 \cdot X2 + PN1 \cdot PN3 \cdot X3 \tag{7} $$
$$ TotCorrecfactor = 1 - MismatchTerm \tag{8} $$
$$ RawOutCorrect = RawOutDAC \cdot TotCorrecfactor \tag{9} $$
The signal flow graph for the computations shown in (1) through (9) is illustrated in Figure 3.
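The chain of computations in (1) through (9) can be sketched as a floating-point reference model. This is only an illustration under stated assumptions: ECX1 is taken from equation (15) later in the document, the signs in the MismatchTerm expression follow the order in which they appear in the source, `m2` is passed in as a parameter because the source does not give its value, and the function name and argument layout are invented for this example.

```python
def mnc_noise_cancel(raw_out_pn, pn, x, gamma, m2):
    """Floating-point sketch of equations (1) to (9).

    pn    = (PN1, PN2, PN3)
    x     = (X1, X2, X3)
    gamma = (gamma1, gamma2, gamma3, gamma12)
    m2    = exponent parameter of equation (6); value not given in the source
    """
    pn1, pn2, pn3 = pn
    x1, x2, x3 = x
    g1, g2, g3, g12 = gamma

    ecx1 = pn1 * x1 * g1              # equation (15)
    ecx2 = pn2 * x2 * g2              # equation (1)
    ecx3 = pn3 * x3 * g3              # equation (2)
    ecx12 = pn1 * pn2 * x2 * g12      # equation (3)
    ecx13 = pn1 * pn3 * x3 * g12      # equation (4), gamma12 as printed
    ecx_sum = ecx1 + ecx2 + ecx3 + ecx12 + ecx13          # equation (5)

    raw_out_dac = raw_out_pn - ecx_sum * 2 ** (m2 - 1)    # equation (6)

    # Equation (7), with the signs read in the order they appear in the source.
    mismatch_term = (pn1 * x1 + pn2 * x2 - pn3 * x3
                     + pn1 * pn2 * x2 + pn1 * pn3 * x3)

    tot_correc_factor = 1 - mismatch_term                 # equation (8)
    return raw_out_dac * tot_correc_factor                # equation (9)
```

With all mismatch values set to zero the model returns its input unchanged, which is a convenient sanity check before comparing against a fixed-point implementation.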
12. Arithmetic Operations used in the Noise Cancellation architecture
The operations used in the MNC noise cancellation blocks are listed in Table 1.

Table 1: The list of operations used in the MNC noise cancellation architecture.

| Operation | Symbol | Definition |
|---|---|---|
| Scaling | scale | Scales a 2’s complement number; corresponds to multiplication or division by an integer power of 2. As implemented in this architecture, scaling has no hardware cost. |
| Sign Extension | SXT | Sign-extends a 2’s complement number. For 2’s complement variables, sign extension does not change the value of the variable. |
| Addition | + | Adder; adds the values of 2’s complement numbers. |
| Subtraction | - | Subtracter; subtracts 2’s complement numbers. |
| Simple multiplication | * | This multiplication is done using logic and not a full parallel multiplier. |
| Rounding | Round | Performs rounding to nearest on a 2’s complement value. |
| Parallel multiplication | * | Full parallel multiplier. |
| Carry Save Adder | ∑ | Reduces the problem of adding three numbers to that of adding just two numbers, and performs this reduction within a time delay independent of the word size. |
14. Architectural innovations and contributions
1- The use of extended precision without sacrificing area or speed.
The architecture presented here employs an innovative approach to increase the precision of the computation without sacrificing area or delay. The approach relies on a bit-true C-level model of the architecture, which gives in-depth insight into all the intermediate variables and their upper and lower ranges. It has allowed us to use precision equivalent to 19 bits while having only 16 physical bits. This amounts to increasing the precision of the computations by a factor of 8.
To represent a signed fixed-point number in 2’s complement format, we use the representation in (10):
$$ S\langle wsize,\ bp,\ t \rangle \tag{10} $$
This notation represents a number with word size $wsize$. The binary-point bit position within the fixed-point word is $bp$, and the “t” signifies that this signed number is represented in 2’s complement format. The real value of a fixed-point number represented as in (10) is given in (11).
$$ V = 2^{-bp} \cdot \left( -b_{wsize-1} \cdot 2^{wsize-1} + \sum_{i=0}^{wsize-2} b_i \cdot 2^{i} \right) \tag{11} $$
The binary-point bit position within the fixed-point word determines the precision with which the fixed-point word can represent real numbers.
For the Mismatch Noise Cancellation, the inputs to the computation include three variables that require high precision. These variables, X1, X2 and X3, represent linear combinations of the capacitance mismatches in a pipeline stage of the ADC.
The fixed-point representation shown in Table 2 illustrates the format S<16,15,t>. This format has a precision of $P = 2^{-15}$ and the corresponding value range $-1 \le V < 1$.
Insight into the range of values of the capacitance mismatches in a typical submicron CMOS or BiCMOS process enables us to extend the precision of the computation up to $P = 2^{-19}$, as shown in Table 3.

Table 2: 2’s complement representation of a fixed-point fraction represented by a wsize of 16 bits.

| Format | Precision | Range | Pictorial representation |
|---|---|---|---|
| S<16,15,t> | $P = 2^{-15}$ | $-1 \le V < 1$ | b15 b14 b13 b12 ... b3 b2 b1 b0, with bp = 15 |

Table 3: 2’s complement representation of a fixed-point fraction represented by a wsize of 16 bits and extended precision.

| Format | Precision | Range | Pictorial representation |
|---|---|---|---|
| S<16,19,t> | $P = 2^{-19}$ | $-0.0625 \le V < 0.0625$ | b15 b14 b13 b12 ... b3 b2 b1 b0, with bp = 19 |
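The value formula (11) and the ranges listed in Tables 2 and 3 can be checked with a short sketch. The function names are invented for this example; the code simply evaluates the 2’s complement weighting for a given word size and binary-point position.

```python
def fixed_point_value(word, wsize, bp):
    """Real value of a wsize-bit 2's complement word with binary point bp."""
    word &= (1 << wsize) - 1                       # keep only wsize bits
    sign_bit = (word >> (wsize - 1)) & 1           # b[wsize-1]
    magnitude = word & ((1 << (wsize - 1)) - 1)    # b[wsize-2] .. b[0]
    integer = magnitude - (sign_bit << (wsize - 1))
    return integer * 2.0 ** (-bp)


def fixed_point_range(wsize, bp):
    """Smallest and largest representable values of the format S<wsize,bp,t>."""
    lowest = fixed_point_value(1 << (wsize - 1), wsize, bp)
    highest = fixed_point_value((1 << (wsize - 1)) - 1, wsize, bp)
    return lowest, highest
```

For S<16,15,t> the range evaluates to -1 up to 1 - 2^-15, and for S<16,19,t> to -0.0625 up to 0.0625 - 2^-19. The same 16 physical bits cover a narrower range with a finer step, which is acceptable only because the mismatch values are known to be small.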
2- Performing tree-height reduction on the signal flow graph to minimize the delay through addition or subtraction trees.
We can identify two segments in the signal flow graph of the MNC noise cancellation computation where the associative and commutative properties of addition can be used to reduce the delay. The first is the computation in Equation (5), which corresponds to the signal flow graph segment shown in Figure 4.
As can be seen in Figure 5, this segment of the data flow graph can be rearranged. Mathematically, the two computations are equivalent. However, the computation in Figure 5 has less delay, since there are only three adders on the critical path, as opposed to four for the computation in Figure 4.
[Figure 4 (signal flow graph): ECX1, ECX2, ECX3, ECX12 and ECX13 are summed by four adders in series to produce ECXSUM.]
Figure 4: Computation of the variable ECXSUM prior to the application of tree-height reduction.
[Figure 5 (signal flow graph): ECX1, ECX12, ECX13, ECX2 and ECX3 are summed by a rebalanced tree of four adders to produce ECXSUM.]
Figure 5: Computation of the variable ECXSUM after the application of tree-height reduction. The critical path through the transformed computation tree has one adder delay less than the one without the tree-height reduction optimization.
[Figure 6 (signal flow graph): PN1*X1, PN2*X2, PN3*X3, PN1*PN2*X2 and PN1*PN3*X3 are combined by adders and subtracters in series to produce MismatchTerm.]
Figure 6: Computation of the variable MismatchTerm prior to the application of tree-height reduction.
[Figure 7 (signal flow graph): PN1*X1, PN1*PN3*X3, PN1*PN2*X2, PN2*X2 and PN3*X3 are combined by a rebalanced tree of adders and subtracters to produce Mismatch_Term.]
Figure 7: Computation of the variable Mismatch_Term after the application of tree-height reduction. The critical path through the transformed computation tree has one adder delay less than the one without the tree-height reduction optimization.
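The effect of tree-height reduction on a five-operand sum can be illustrated with a small sketch that tracks, alongside the value, how many adders lie on the longest path. The function names are invented for this example, and the pairing order in `tree_sum` is one possible balanced arrangement rather than the exact grouping drawn in the figures.

```python
def chain_sum(terms):
    """Add operands one after another; returns (value, adders on the critical path)."""
    total, depth = terms[0], 0
    for term in terms[1:]:
        total, depth = total + term, depth + 1
    return total, depth


def tree_sum(terms):
    """Add operands pairwise, level by level; returns (value, adder levels)."""
    level = [(term, 0) for term in terms]       # (partial sum, depth so far)
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level) - 1, 2):
            (a, depth_a), (b, depth_b) = level[i], level[i + 1]
            next_level.append((a + b, max(depth_a, depth_b) + 1))
        if len(level) % 2:                      # odd operand passes through
            next_level.append(level[-1])
        level = next_level
    return level[0]
```

For five operands, `chain_sum` reports four adders on the critical path and `tree_sum` reports three, with identical results. This matches the one-adder saving described for Figure 5 and Figure 7.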
3- Identification of the arithmetic operators that can benefit from carry-save transformations.
Carry-save adder architectures are used to reduce the area and delay of the different addition/subtraction trees.
In addition to tree-height reduction, the carry-save transformation can be applied to the addition tree segments shown in Figure 4 and Figure 6 to further reduce the delay.
The idea of the carry-save transformation is to reduce the addition of three numbers to the addition of just two numbers, and to achieve this reduction in constant time; that is, the delay of the transformation is independent of the word size.
Mathematically, the carry-save transformation accepts three n-bit numbers, such as x, y and z in Figure 8, and produces two output numbers u and v such that:
$$ x + y + z = u + v \tag{12} $$
[Figure 8 (symbol): a carry-save adder drawn as a ∑ block with inputs x, y, z and outputs u, v.]
Figure 8: Symbol for Carry Save Adder
$$ u_i = \mathrm{parity}(x_i, y_i, z_i) \tag{13} $$
$$ v_{i+1} = \mathrm{majority}(x_i, y_i, z_i) \tag{14} $$
These hold for i = 0, 1, 2, ..., n-1, with bit $v_0$ being zero.
Since no carry propagation is involved in this computation, the values of $v_{i+1}$ and $u_i$ can be computed for all values of i in parallel. This allows execution in constant time, independent of the bit-width of the operands. Both the parity and the majority function can be implemented by simple logic similar in cost and delay to that of a full adder.
The computation for ECXSUM, illustrated in Figure 4, is transformed using carry-save addition, and the result of the transformation is illustrated in Figure 9.
Similarly, Figure 10 illustrates the computation of MismatchTerm after the carry-save transformation.
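[Figure 9 (signal flow graph): three carry-save adders (∑) reduce ECX1, ECX13, ECX12, ECX3 and ECX2 to two operands, which a single adder (+) then sums.]
Figure 9: The computation of ECXSUM after the carry-save adder transformation.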
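The carry-save relations (12) to (14) map directly onto bitwise operations, as the sketch below shows. It is a behavioural illustration on Python integers, not a gate-level description of the design, and the function names are invented for this example.

```python
def carry_save_add(x, y, z):
    """Reduce three integers to two (u, v) with x + y + z == u + v.

    Every bit position is handled independently, so no carry propagates.
    """
    u = x ^ y ^ z                              # per-bit parity
    v = ((x & y) | (x & z) | (y & z)) << 1     # per-bit majority, shifted so v_0 = 0
    return u, v


def sum_five(a, b, c, d, e):
    """Sum five operands with three carry-save stages and one final adder."""
    u1, v1 = carry_save_add(a, b, c)
    u2, v2 = carry_save_add(u1, v1, d)
    u3, v3 = carry_save_add(u2, v2, e)
    return u3 + v3                             # the only carry-propagating addition
```

`sum_five` mirrors the structure described for Figure 9: three reductions followed by a single conventional adder. Because Python integers behave as sign-extended 2’s complement values under bitwise operators, the identity also holds for negative operands.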
4- Reducing the number of full multipliers in the architecture to just one.
Through careful investigation of the ranges of values of the variables within the signal flow graph, we removed all the unnecessary multipliers from the implementation and replaced them with much simpler (smaller-area) and faster logic that implements the multiplication. The multiplications suitable for such replacement are those whose two operands differ widely in bit-width.
[Figure 11 (diagram): the multiplier that forms ECX1 [S<16,19,t>] from PN1 * X1 [S<16,19,t>] and γ1 [S<2,0,t>] is replaced with simplified equivalent logic, which results in a reduction in both area and delay.]
Figure 11: ECX1 computation using simplified logic. The area required is about 28% of the area of a full multiplier that performs the same operation, and the delay is about 45% of the delay required for a full multiplier.
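To see why a full multiplier is unnecessary when one operand is only two bits wide, consider the format S<2,0,t>: it can take only the values -2, -1, 0 and 1, so the product is a selection among 0, x, -x and -2x. The sketch below is an assumed illustration of that idea using a select, a shift and a subtraction; it is not the logic actually used in the design, and the function name is invented.

```python
def mul_by_2bit(x, code):
    """Multiply integer x by a 2-bit 2's complement code without a multiplier.

    code bit 0 has weight +1 and code bit 1 (the sign bit) has weight -2.
    """
    bit0 = code & 1
    bit1 = (code >> 1) & 1
    result = x if bit0 else 0     # select x or 0
    if bit1:
        result -= x << 1          # subtract x shifted left by one position
    return result
```

The four codes 0b00, 0b01, 0b10 and 0b11 give 0, x, -2x and -x respectively, which agrees with ordinary multiplication by 0, 1, -2 and -1.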
5- Using a single binary random number generator to generate all the binary random numbers as well as the dither signal.
This was made possible by developing a bit-true C model of the random number generator and extracting the cross-correlation information. The cross-correlation information verified that the same random number generator can produce several random binary sequences as well as the dither signal. Moreover, a simulation environment was created that made it possible to verify the C model against the hardware model.
6- Fixed-point optimization
The fixed-point algorithm is designed such that overflow and quantization do not affect the signal processing of the algorithm. Optimizing the sign format and the word length at the various internal points (i.e. internal variables) of the signal flow graph enables us to tailor the hardware to the required computations, so that no excess hardware is used. Extensive C-level simulations as well as VHDL simulations are performed to ensure the proper operation of the hardware under fixed-point conditions, and to optimize the hardware so that it does not sacrifice any signal-to-quantization-noise ratio.
Figure 12 illustrates an abstraction of the verification methodology used to enable the fixed-point optimizations in this architecture.
[Figure 12 (diagram): each internal variable from the RTL simulation is compared with its counterpart from the C-level simulation:
PN1_X1_VHD vs. PN1_X1_C
PN2_X2_VHD vs. PN2_X2_C
PN3_X3_VHD vs. PN3_X3_C
ECX1_VHDL vs. ECX1_C
ECX2_VHDL vs. ECX2_C
ECX12_VHDL vs. ECX12_C
ecxSUM_VHD vs. ecxSUM_C]
Figure 12: Verification methodology for fixed-point optimization of internal variables.
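The per-variable comparison in Figure 12 can be sketched as a small trace checker. This is a hypothetical helper, not part of the described simulation environment: it assumes that each simulation has already dumped every internal variable as a list of integer samples keyed by a common variable name, and the function names are invented.

```python
def first_mismatch(c_trace, rtl_trace):
    """Index of the first sample where two traces differ, or None if they agree."""
    for index, (c_value, rtl_value) in enumerate(zip(c_trace, rtl_trace)):
        if c_value != rtl_value:
            return index
    if len(c_trace) != len(rtl_trace):
        return min(len(c_trace), len(rtl_trace))   # one trace ended early
    return None


def compare_dumps(c_dump, rtl_dump):
    """Compare every variable of the C-level dump against the RTL dump."""
    return {name: first_mismatch(c_dump[name], rtl_dump[name]) for name in c_dump}
```

A result of `None` for every variable means the two models agree bit for bit on the samples supplied. Any other value points to the first cycle at which that variable diverges, which narrows the search to a specific operator in the signal flow graph.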
26. Noise Cancellation Architecture
In this section we illustrate the computation of the different segments of the signal flow graph for the MNC noise cancellation, using the operators defined in Table 1.
$$ ECX1 = PN1 \cdot X1 \cdot \gamma_1 \tag{15} $$
In performing the computations illustrated in Figure 13, we replace the full multipliers with faster and smaller logic.
Similar optimizations are performed for the multiplications illustrated in Figure 14, Figure 15, Figure 16, and Figure 17.
[Figure 13 (signal flow graph): PN1 [S<2,0,t>] multiplies X1 [S<16,19,t>] to give PN1 * X1 [S<16,19,t>], which is then multiplied by γ1 [S<2,0,t>] to give ECX1 [S<16,19,t>].]
Figure 13: Computation of PN1*X1 and ECX1
31. Noise Cancellation Architecture (Continued)
$$ MismatchTerm = PN1 \cdot X1 + PN2 \cdot X2 - PN3 \cdot X3 + PN1 \cdot PN2 \cdot X2 + PN1 \cdot PN3 \cdot X3 \tag{20} $$
$$ TotCorrecfactor = 1 - MismatchTerm \tag{21} $$
[Figure 18 (signal flow graph): the products PN1*X1, PN2*X2, PN3*X3, PN1*PN2*X2 and PN1*PN3*X3 (each S<16,19,t>) are sign-extended (SXT) to SE1<17,19,t> and combined by an adder/subtracter tree into Mismatch_Term [S<17,19,t>]. Mismatch_Term is sign-extended to <21,19,t> and subtracted from the scaled constant "Integer = 1"; the S<21,19,t> result is rounded (Round) to give Tot_Correc_factor [S<17,15,t>].]
Figure 18: Computation of the Total_Correc_factor. Detailed implementation transformations, such as the carry-save transformation, are not shown.
32. Noise Cancellation Architecture (Continued)
$$ RawOutCorrect = RawOutDAC \cdot TotCorrecfactor \tag{22} $$
[Figure 19 (signal flow graph): a full parallel pipelined multiplier (*) multiplies RawOutDAC [S<13,0,t>] by Tot_Correc_factor [SE1<17,15,t>] to give RawOut_Corr_int [SE3<30,15,t>], which is rounded (Round) to RawOut_Corr [S<12,0,t>].]
Figure 19: Computing the final output RawOut_Corr after MNC noise cancellation. This is the only point in the signal flow graph where a full parallel multiplier is used.
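The final multiply-and-round step can be sketched with integer arithmetic. The sketch assumes that Tot_Correc_factor carries 15 fractional bits, as its format label indicates, and that "rounding to nearest" is implemented by adding half an LSB and discarding the fractional bits, so ties round upward. The tie-breaking rule and the function name are assumptions, and saturation to the 12-bit output width is not modelled.

```python
FRAC_BITS = 15  # fractional bits of Tot_Correc_factor (binary point at 15)


def correct_output(raw_out_dac, tot_correc_factor_fixed):
    """Multiply an integer sample by a fixed-point factor and round to an integer.

    tot_correc_factor_fixed is the factor scaled by 2**FRAC_BITS.
    """
    product = raw_out_dac * tot_correc_factor_fixed   # has FRAC_BITS fractional bits
    half_lsb = 1 << (FRAC_BITS - 1)
    return (product + half_lsb) >> FRAC_BITS          # round to nearest, ties upward
```

A factor of exactly 1.0 is represented as `1 << 15`, and in that case the sample passes through unchanged. A factor slightly below 1.0 scales the sample down and the result is rounded to the nearest integer code.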
The output of the MNC is finally combined with the output of the first stage of the pipelined ADC to form the final 14-bit output of the pipelined analog-to-digital converter. This is illustrated in Figure 20.
[Figure 20 (block diagram). Blocks: First ADC Pipeline Stage; the rest of the pipeline stages (pipe stages 2, 3, 4, 5, 6); Digital Error Correction; MNC TOP LEVEL; MNC ON/OFF selector (off/on); Delay Stages (two blocks); 2’s Complement (four blocks); Digital Error Correction (final stage). Signals: RawOut_PN[11:0], RawOut_Corr[11:0], PN1_1, PN1_2, PN1_3, gama1, gama2, gama12, "Output of the first stage", "Final output of the ADC". Bus widths marked in the diagram: 3, 12 and 14.]
Figure 20: The MNC architecture as it is used in the 14-bit ADC pipeline to cancel the mismatch noise
34. MNC Mismatch Estimation
The MNC mismatch estimation block is illustrated in Figure 21. Its inputs PN1, PN2 and PN3 (the binary random sequences) and its “Dither” input are outputs of the random generator block after proper delay equalization.
The inputs γ1 (Gama1) and γ2 (Gama2) come from the analog ADC. The MNC mismatch estimation block generates the estimated values of the variables X1, X2, X3 and qvoffset. All these variables are used by the MNC noise cancellation block to correct the output of the ADC for mismatch noise. Figure 22 illustrates the block diagram of the MNC estimation logic.
[Figure 21 (block diagram): MNC Mismatch Estimation. Inputs: Clock 1/2, Gama1[1:0], Gama2[1:0], PN1, PN2, PN3, Dither. Outputs: X1_EST[15:0], X2_EST[15:0], X3_EST[15:0], qvoffset[10:0].]
Figure 21: MNC mismatch estimation
35. MNC Mismatch Estimation Architecture
[Figure 22 (block diagram):
- The dither generator output is added (+) to RawOut_PN to form RawOut_PN_Dither.
- RawOut_PN_Dither is quantized to RawOut_PN_Quant: 1 when > 1024, -1 when < -1024, 0 otherwise.
- RawOut_PN_Quant is multiplied (*) by PN1, PN2 and PN3 to form V1, V2 and V3; an Averager block produces qvoffset_Ave.
- For k = 1, 2, 3: if γk = 1, Sum_Gamak = Sum_Gamak - Vk; if γk = -1, Sum_Gamak = Sum_Gamak + Vk; then Countk_Sum = Countk_Sum + 1.
- If Countk_Sum = 2^n: Xk_Ave = Sum_Gamak / Countk_Sum, giving Average(X1), Average(X2) and Average(X3).]
Figure 22: Block diagram of the MNC mismatch estimation
The computed average values of X1, X2, X3 and qvoffset are scaled and rounded before they are fed to the MNC noise cancellation block.
The average is computed with hardware complexity in mind: the accumulated values for X1, X2, X3 and qvoffset are divided by a number that is a power of 2, so the division reduces to a shift.
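Averaging over a power-of-two number of samples can be sketched as an accumulator plus a shift. This is a behavioural illustration under assumptions: the class name is invented, the samples are taken to be integers, and the shift truncates toward minus infinity, whereas the design scales and rounds the averages in a later step.

```python
class PowerOfTwoAverager:
    """Accumulate 2**n samples, then divide by shifting right by n bits."""

    def __init__(self, n):
        self.n = n
        self.total = 0
        self.count = 0

    def add(self, sample):
        """Add one sample; return the average once 2**n samples are in, else None."""
        self.total += sample
        self.count += 1
        if self.count == (1 << self.n):
            average = self.total >> self.n   # division by 2**n as a shift
            self.total = 0                   # restart for the next window
            self.count = 0
            return average
        return None
```

With `n = 2` the averager stays silent for three samples and emits the mean on the fourth. In hardware the comparison of the counter against 2^n is a single bit test and the shift is free wiring, which is the reason for choosing a power-of-two window.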
Due to the statistical nature of this computation, the accumulation takes a large number of cycles before the average values of the computed mismatches become stable and converge to within a reasonable error band around the correct values.
Figure 23 illustrates the behavior of the MNC estimation block as the number of cycles used for averaging increases. Once the number of cycles reaches a certain point, the error in the estimation of the mismatches decreases to within an error band of around +/- 10%. The percentage error in estimating the mismatches is illustrated in Figure 24. The results of such simulations guide the choice of the number of cycles needed to obtain a good separation of the cross-correlation between the different contributors to the mismatch (namely X1, X2 and X3, which represent the capacitance mismatches).
Finally, rounding and scaling are performed on these average values, which are then used as inputs to the MNC noise cancellation block.
Figure 23: Simulation of the convergence behavior of the MNC estimation versus the number of cycles used for averaging.
Figure 24: Percentage error of the mismatch estimation versus the number of cycles used for averaging.
39. FFT Comparison
[Figure 25 (plots): three FFT panels, Graph(I), Graph(II) and Graph(III).]
Figure 25: FFT comparison for the following cases: (I) double-precision C-level simulation, (II) hardware simulation, (III) double-precision C simulation followed by rounding.
Graph(I) in Figure 25 illustrates the FFT of the output of the ADC when the error correction/cancellation is performed using C simulation with double precision.
Graph(II) in Figure 25 illustrates the FFT of the output of the ADC when the error correction/cancellation is performed using the architecture described here (simulated in VHDL), and the se
Graph(III) in Figure 25 illustrates the FFT of the output of the ADC when the error correction/cancellation is performed using C simulation, but the correction result is rounded before it is added to the pipeline output, corresponding to the real ADC resolution.
Comparing the hardware simulation with the C simulation with rounding shows that the result obtained from the hardware is within 1 dB of that obtained from the C simulation with rounding.

Table 4: Comparison of the simulation results for the MNC algorithm.

| Source | Graph | SQNR |
|---|---|---|
| C simulation, ideal case | Graph(I) | 85.0130 dB |
| VHDL/hardware simulation | Graph(II) | 81.5997 dB |
| C simulation with rounding | Graph(III) | 82.5849 dB |
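As a quick arithmetic check of the 1 dB statement, using only the three SQNR values reported above:

$$ 82.5849\ \text{dB} - 81.5997\ \text{dB} = 0.9852\ \text{dB} $$
$$ 85.0130\ \text{dB} - 81.5997\ \text{dB} = 3.4133\ \text{dB} $$

The first difference is the gap between the hardware simulation and the C simulation with rounding, which is just under 1 dB. The second is the gap between the hardware simulation and the ideal double-precision case.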