VLSI Architecture Design for Particle Filtering in Real-time
A. Pasciaroni∗†, J. A. Rodríguez†, F. Masson∗†, P. Julián∗†, E. Nebot‡
∗Dep. Ing. Eléctrica y Computadoras, Universidad Nacional del Sur
Av. Alem 1253, Bahía Blanca, Argentina
†CONICET, Argentina
‡Australian Centre for Field Robotics, University of Sydney, Australia
Abstract—Particle Filter is an algorithm that provides system
state estimation even for non-linear and non-Gaussian systems.
For applications that require a large number of particles, the
real-time constraint is hard to meet since the algorithm is
computationally expensive and the resampling step becomes a
bottleneck. In this work, a VLSI architecture for particle filtering
in real time is presented. The proposed design implements a
fraction of the processing using piecewise linear functions and
allocates them as global resources. In this way, a large number of
processing elements (PEs) working in parallel can be instantiated
in the design. An example based on range-only localization
using Radio-Frequency Identification (RFID) tags is developed to
illustrate the approach. The received signal strength indicator
(RSSI) is used to estimate the distance between transmitter
and receiver. A VHDL RTL model of the processing data flow
is implemented and compared to Matlab simulations, showing
similar results.
Index Terms—Particle Filter, VLSI Design, RFID, RTL.
I. INTRODUCTION
Particle Filters (PF) [1] are a method to perform statistical
dynamic state estimation. The probability density function of
a given state is represented by a set of weighted entities or
particles which is updated iteratively according to sensor mea-
surements and a dynamic system model. The three main steps
of the particle filter are: sampling, update and resampling.
This last step presents high data dependency between particles,
becoming the major bottleneck in the execution time of the
filter.
There exist applications that require real-time estimation of
non-linear and non-Gaussian systems, such as robot localization and
visual tracking [2], [3], [4]. These applications are well suited
for particle filtering, but a large number of particles is required
to provide accurate estimates. Since the PF algorithm is
computationally expensive and the resampling step cannot
be fully parallelized, particle filter computation in real time
is limited by the available computational resources. In this
context, a VLSI implementation that exploits the data-level
parallelism of the algorithm will allow particle filtering in real time.
Previous works have addressed particle filter implementa-
tions for real time applications [5] [6] [7] [8]. In [5] a PF
architecture composed of multiple processing elements and a
central unit for the bearing-only tracking problem is presented
and implemented in FPGA. Particle filter steps are performed
locally on each processing element (PE). After resampling, a
central unit controls the particle exchange among processors
in order to reduce performance degradation. Several commu-
nication schemes are introduced including a fixed particle
exchange among processors. In [7] a VLSI design of the
processing element is presented which also includes a pipeline
dataflow that deals with logic blocks of variable latency. In [8]
a central unit that performs communication schemes, intro-
duced in [5], for an architecture composed of four processing
elements is designed and a VLSI implementation is presented.
In [6] a parallel pipelined design is presented. The number of
replicated pipeline stages is variable. Taking into account the
rate of each stage an optimal number of replicated stages is
determined. However, a VLSI implementation that takes full
advantage of the data-level parallelism present in the algorithm
has not yet been developed.
In this work a VLSI architecture for particle filtering in real
time applications is presented. It is composed of processing
clusters, each with one resampling module and an array of PEs. Each
PE performs, in a pipelined fashion, the steps of the PF operation
that do not present data dependency. Therefore,
if more PEs can be instantiated in a given silicon area, more
particles can be effectively processed in parallel, increasing the
throughput. Afterwards, the resampling modules gather the PE outputs
so that resampling is performed in groups. In addition, to
reduce the PE area, a fraction of the PE data processing is
time-multiplexed, so the hardware dedicated to this processing is
instantiated once and shared by multiple PEs.
The application chosen to illustrate the approach is target
tracking based on Received Signal Strength Indicator (RSSI)
of Radio Frequency Identification devices (RFID).
The paper is organized as follows. Section II presents the
localization framework and the RSSI sensor model. The architecture
and microarchitecture design is presented in Section
III. The execution time of the proposed architecture is presented in
Section IV. Simulation results comparing the VHDL RTL and
Matlab models are presented in Section V. Finally, Section VI
is dedicated to the conclusions.
II. LOCALIZATION FRAMEWORK
In sensor networks, radio-frequency based localization
systems have gained importance in environments where
Global Positioning System (GPS) based solutions do not perform well
due to poor satellite availability or multipath issues [9]
[10]. This is a possible situation for the chosen target application:
truck localization in opencast mining environments [9].
Fig. 1: Two Ray Model for a communication link of 433 MHz in a
rural environment (average signal strength [dBm] versus distance [m]).
The RFID technology comprises the receivers, antennas
and RFID tags. The tags send their identification number to
the receivers. Using the RSSI, it is possible to estimate
the distance between a tag and a receiver, since RSSI values
decrease with distance according to a known law. Due to several
factors that affect the propagation of electromagnetic waves in
a medium (refraction, reflection, scattering), the received
power versus distance relation varies with the obstacles in the
environment, the height and direction of the antenna, and
the power of the transmitted signal. This results in a
non-bijective and thus multimodal sensor function.
Figure 1 shows a typical two-ray propagation model of RF
signals [11] for a rural environment and a communication
frequency of 433 MHz and transmitter and receiver height
of 2.5 m. It shows the average signal strength of the received
power versus distance. For a given distance the distribution
of RF signal is considered Gaussian and its variance varies
with the signal strength [9]. It can be observed that for
a received power of −70 dBm there exist multiple distance
values (8 m, 15.5 m, 20 m and 43.1 m), one of which is the
true distance to the tag. This example shows the multimodal
probability density function associated with the RFID
sensor.
RSSI based localization can be performed using the particle
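filter algorithm. Consider a hypothetical scenario of one RFID
tag moving in 2-D and one antenna located at the origin. Let
$p_k^i$ denote the $i$th particle, where $p_k^i = [x \;\; \dot{x} \;\; y \;\; \dot{y}]'$. The
target system evolution is given by
$$f(p_{k-1}^i, v_x, v_y) = \begin{bmatrix} 1 & \Delta T & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & \Delta T \\ 0 & 0 & 0 & 1 \end{bmatrix} \cdot p_{k-1}^i + \begin{bmatrix} 0.5 \cdot \Delta T^2 & 0 \\ \Delta T & 0 \\ 0 & 0.5 \cdot \Delta T^2 \\ 0 & \Delta T \end{bmatrix} \cdot \begin{bmatrix} v_x \\ v_y \end{bmatrix}, \tag{1}$$
where $v_x$ and $v_y$ are drawn from a uniform distribution
$U[0, Q]$.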
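As a minimal sketch of the motion model in (1), the following Python function propagates a single particle. The function name `propagate`, its argument names and the use of Python's `random.uniform` as the noise source are illustrative assumptions and are not part of the proposed hardware design.

```python
import random


def propagate(p, dt, q, rng=random):
    """Apply the motion model (1) to one particle p = [x, x_dot, y, y_dot].

    dt is the sampling period (Delta T) and q is the upper bound Q of the
    uniform noise. Names are hypothetical, chosen only for this sketch.
    """
    x, x_dot, y, y_dot = p
    # Noise inputs v_x and v_y drawn from U[0, Q], as stated in the text.
    v_x = rng.uniform(0.0, q)
    v_y = rng.uniform(0.0, q)
    return [
        x + dt * x_dot + 0.5 * dt ** 2 * v_x,  # first row of (1)
        x_dot + dt * v_x,                      # second row of (1)
        y + dt * y_dot + 0.5 * dt ** 2 * v_y,  # third row of (1)
        y_dot + dt * v_y,                      # fourth row of (1)
    ]
```

With `q = 0` the model reduces to constant-velocity motion, which gives a simple way to check the matrix entries by hand.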
The pseudocode of the Particle Filter algorithm for the
chosen application and for a set of N particles is described
below:
random initialization of particles;
for i ← 1 to N do
    p_k^i = f(p_{k−1}^i, v_x, v_y);   // sampling
    d^i = sqrt(p_k^i(1)^2 + p_k^i(3)^2);
    Pot^i = F_sensor(d^i);
    w^i = 1/sqrt(2π·σ^2) · exp(−(Pot^i − Pot_measurement)^2 / (2·σ^2));   // update
end
[ŵ, p̂] = resampling(w, p_k);
where $Pot_{measurement}$ is the power measurement of the
received signal, whose variance is $\sigma^2$, and $F_{sensor}(d)$ is the
mathematical expression of the two-ray propagation model
whose characteristic is shown in Figure 1. Depending on
the obstacles present in the environment, a more complex
sensor model can be utilized. For the resampling step there
exist several algorithms [12], [5], [13]. Position estimation is
computed by the following equation:
$$\tilde{x} = \sum_{i=1}^{N} \hat{p}^i \cdot \hat{w}^i \tag{2}$$
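The update step and the estimate in (2) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the weights are normalized before the weighted sum (which (2) appears to assume implicitly), and `log_distance_placeholder` is a simple stand-in sensor curve, not the two-ray model used in the paper.

```python
import math


def log_distance_placeholder(d):
    """Hypothetical stand-in for F_sensor(d); NOT the two-ray model."""
    return -40.0 - 20.0 * math.log10(max(d, 1e-3))


def update_weights(particles, pot_measurement, sigma, f_sensor):
    """Gaussian likelihood of the measured power for each particle."""
    weights = []
    for x, _, y, _ in particles:
        d = math.sqrt(x * x + y * y)          # range to the antenna at the origin
        pot = f_sensor(d)                     # expected power at that range
        w = math.exp(-(pot - pot_measurement) ** 2 / (2.0 * sigma ** 2))
        weights.append(w / math.sqrt(2.0 * math.pi * sigma ** 2))
    return weights


def estimate(particles, weights):
    """Weighted mean of the particles, as in (2), with normalized weights."""
    total = sum(weights)
    dims = len(particles[0])
    return [sum(w * p[j] for p, w in zip(particles, weights)) / total
            for j in range(dims)]
```

For instance, with `sigma = 2.5` a particle whose predicted power equals the measurement receives a weight of about 0.16, the maximum of the Gaussian likelihood for that standard deviation.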
III. DESIGN
A. Architecture
The PF algorithm does not present data dependence
between particles except for the resampling step. When the
number of particles increases, the resampling execution time
can become a bottleneck. A strategy to reduce the resampling
execution time is to divide the total number of particles into
groups so that the level of parallelism is increased [5]. Each particle
group is processed by a dedicated processor. Since the resampling
step is sequentially executed, there exists a trade-off between
the number of processors and the estimation error: as the
number of particle groups increases, so does the degradation of
the filter [14]. In order to reduce this performance degradation,
a particle exchange must be performed among processors. In
[15] an optimization of the particles exchange procedure is
presented. A formal analysis, applying the Kullback-Leibler
divergence, proves that the exchange of particles with largest
weights between adjacent processors results in better accuracy
than random particle mixing. In [14] this exchange is
performed after resampling, so the selection of the particles
with the largest weights is avoided. The analysis of the algorithm
parallelization carried out in [14] allows the selection
of an optimal configuration. Once one filter iteration has been
performed, the estimates of the processors are combined in order
to provide a global estimate [15].
The system consists of two modules: the measurement unit
and the processing unit. The system block diagram is shown
in Fig. 2-a. The measurement unit sets up the RSSI value and
computes the reciprocal of the noise variance. The processing
unit performs the PF algorithm and provides an estimated
position.
In order to process thousands of particles in real-time the
processing unit architecture must exploit data level parallelism
and at the same time take into account the strategy described
above. A hierarchy of parallelism levels is adopted. The first
level is obtained by introducing multiple processing elements
(PEs), each one performing the PF algorithm steps that do not
present data dependency. The second level consists of gathering
PEs in clusters, so that the input data for the resampling step is
made up of the processed particle and weight of each PE inside
a cluster. For the final position estimate, the estimates of
the clusters are combined as previously mentioned. Particle
exchange among clusters is also performed.
The proposed VLSI design implements the most area-consuming
operations in external (out of the array) look-up tables
(LUTs). These LUTs are taken out of the processing
element dataflow and placed as global resources. For each
table there is a Broadcast module that sequentially reads the
table and performs interpolation. The interpolated value and
interpolation address are broadcast to all PEs through buses.
Each PE locally computes its required interpolation address
and compares it with the current value in the bus. If an
equivalence is found, the corresponding data value is acquired
by the PE.
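The broadcast mechanism described above can be illustrated with a small behavioural model in Python. It is a sketch, not the RTL: the function name `broadcast_capture`, the one-address-per-cycle timing and the use of a Python callable in place of the LUT plus interpolator are assumptions made for illustration.

```python
def broadcast_capture(table_fn, addr_bits, pe_addresses):
    """Behavioural model of one Broadcast module feeding several PEs.

    table_fn     -- callable standing in for the LUT read plus interpolation
    addr_bits    -- width of the interpolation address bus
    pe_addresses -- address each PE has computed locally
    Returns the value captured by each PE and the cycle of the capture.
    """
    captured = [None] * len(pe_addresses)
    capture_cycle = [None] * len(pe_addresses)
    for cycle in range(1 << addr_bits):
        bus_addr = cycle                 # address currently on the bus
        bus_data = table_fn(bus_addr)    # interpolated value on the data bus
        for i, local_addr in enumerate(pe_addresses):
            # Each PE compares its own address with the one on the bus.
            if captured[i] is None and local_addr == bus_addr:
                captured[i] = bus_data
                capture_cycle[i] = cycle
    return captured, capture_cycle
```

In this model a single sweep of the table serves every PE, and the cycle at which a PE obtains its value depends on the address it is waiting for, which is why the stage latency is variable.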
Figure 2-b shows a more detailed architecture of the processing
unit. It has 4 clusters with 4 PEs each. The sensor measurement
and the reciprocal of its variance, $1/\sigma^2$, are communicated
to all PEs. Four global resources are introduced: Square,
Sqrt, Sensor and Normal LUT. Each broadcast module has
two independent buses: an interpolation address bus and a data bus.
Resampling, pseudo-random number generator (PRNG) and
Word-to-memory modules inside each cluster are also in-
troduced. All modules are explained in further subsections.
Communication among clusters is not shown to simplify the
diagram.
Each cluster has its own local memory and works without
data dependence on the others, except when the particle exchange
is performed. Processing elements belonging to a cluster share
local memory.
Regarding control logic, each cluster has its own control
logic that manages main memory reading and writing as well as
global control signals. Furthermore, each processing element
in a cluster has a dataflow pipeline whose control
is distributed. Since each pipeline stage has a variable delay
that depends on the instant when the corresponding value is
present on the data bus, global pipeline control is not practical.
Therefore, each stage has local control logic driven by
data events.
B. Cluster Operation
Cluster operation proceeds as follows: while
in execution, each PE inside a cluster reads a particle from
memory. Each Broadcast module sequentially reads its corresponding
LUT, interpolates, and broadcasts the interpolated value
and interpolation address to all PEs. Since the PE dataflow
is pipelined, a single table read is used to process several
particles. Main memory has two ports, so memory reading and
writing are performed simultaneously.
Two arrays, one made up of particles and another one
of processed weights from each PE, are the input for the
Resampling module. Once the arrays have been completely updated,
resampling is performed. The elements of the resampling
arrays are processed sequentially. As soon as one element is
resampled, it is immediately updated by the corresponding PE.
Once all data from local memory has been processed,
communication among clusters is performed.
Fig. 2: a) Block Diagram of the VLSI architecture for proposed
tracking system, b) architecture of the processing unit.
C. LUTs design
The functions implemented in the LUTs are: square, square
root, two ray propagation model (as shown in Fig. 1) and
normal distribution. All of them are evaluated with a piecewise
linear function with uniform segmentation. By performing
interpolation, a reduction in table size is achieved. At a point
$x \in [a, b]$, a linear interpolation is calculated as follows:
$$\tilde{f}(x) = \frac{f(b) - f(a)}{b - a} \cdot (x - a) + f(a) \tag{3}$$
This operation is performed by the broadcast module shown
in Fig. 3. A counter generates $2^{N+M}$ words, where the N most
significant bits are used for LUT addressing and the remaining
M bits for interpolation.
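The addressing scheme can be sketched in Python as follows. The helper names `build_lut` and `pwl_eval` are hypothetical, floating-point values are used instead of quantized ones, and the table is assumed to hold $2^N + 1$ breakpoints so that the last segment has a right endpoint; the actual memory layout of the design may differ.

```python
def build_lut(f, x_max, n_bits):
    """Sample f at 2**n_bits + 1 uniformly spaced breakpoints on [0, x_max]."""
    segments = 1 << n_bits
    return [f(x_max * k / segments) for k in range(segments + 1)]


def pwl_eval(lut, word, m_bits):
    """Evaluate the piecewise linear approximation for an (N+M)-bit word.

    The N most significant bits select the segment and the M least
    significant bits give the position inside it, following (3).
    """
    index = word >> m_bits                   # segment address (N MSBs)
    frac = word & ((1 << m_bits) - 1)        # interpolation bits (M LSBs)
    f_a, f_b = lut[index], lut[index + 1]    # both segment endpoints
    return f_a + (f_b - f_a) * frac / (1 << m_bits)
```

With `n_bits = 9` and `m_bits = 2`, for example, the counter word runs from 0 to 2047 and a word `w` corresponds to the input `x_max * w / 2048`.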
The introduced dataflow is composed of several tabulated
functions and interpolations in cascade. When the interpolated
value from a broadcast module is captured by the correspond-
ing pipeline stage, it becomes the interpolation address for the
next tabulated function. It is desirable to find an appropriate
word length for LUT addressing, interpolation and function
value quantization. This length should maximise the ratio
between interpolation address word length and interpolated
value word length. At the same time, the approximation errors
should be reduced since they are propagated through the
dataflow. In this regard, the accuracy analysis introduced in
[16] for practical implementation of piecewise linear functions
is adopted. Table I shows the setup chosen for each piecewise
linear function implementation, where N, M and Q are the numbers
of bits assigned to segmentation, interpolation and function
value quantization. R and S are the output data resolution and
the number of discarded input bits. The table also includes the
error introduced by each interpolation, calculated as the mean of
the absolute relative error over one thousand samples of the
evaluation interval, i.e.,
$$error(x) = mean\left(\left|\frac{f(x) - f_{interp}(x)}{f(x)}\right|\right). \tag{4}$$
Fig. 3: Broadcast Module
TABLE I: Piecewise Linear Function Setup
Function  N   M  Q   R   S  Size [Kbits]  Range X    Interp. Error
Square    9   2  14  17  -  7             [0, 40]    5 · 10^−4
Sqrt      10  2  11  13  5  11            [0, 3200]  3 · 10^−4
Sensor    10  2  10  12  1  10            [0, 113]   4 · 10^−4
Normal    9   1  10  11  3  5             [0, 5]     5 · 10^−3
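As a quick consistency check of Table I, the following Python sketch recomputes the table sizes and the length of one broadcast sweep from N, M and Q, and gives a direct transcription of the error metric in (4). It assumes that each table stores $2^N$ entries of Q bits, an assumption that happens to reproduce the Size column; the function names are illustrative.

```python
# (N, M, Q) per function, copied from Table I.
SETUP = {
    "Square": (9, 2, 14),
    "Sqrt": (10, 2, 11),
    "Sensor": (10, 2, 10),
    "Normal": (9, 1, 10),
}


def lut_size_kbits(n_bits, q_bits):
    """Assumed storage: 2**N entries of Q bits each, expressed in Kbits."""
    return (1 << n_bits) * q_bits // 1024


def broadcast_cycles(n_bits, m_bits):
    """Number of words generated by the counter in one table sweep."""
    return 1 << (n_bits + m_bits)


def mean_relative_error(f, f_interp, samples):
    """Mean of |(f(x) - f_interp(x)) / f(x)| over the samples, as in (4)."""
    return sum(abs((f(x) - f_interp(x)) / f(x)) for x in samples) / len(samples)


sizes = {name: lut_size_kbits(n, q) for name, (n, m, q) in SETUP.items()}
cycles = {name: broadcast_cycles(n, m) for name, (n, m, q) in SETUP.items()}
```

Under this assumption `sizes` evaluates to 7, 11, 10 and 5 Kbits for Square, Sqrt, Sensor and Normal, and the longest sweep is 4096 words. Samples where `f(x)` is zero must be excluded before calling `mean_relative_error`.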
The normal distribution implementation requires evaluating
normal distributions with different values of variance. Any
normal distribution can be obtained from the standard normal
distribution. If a distribution with mean $\mu$ and variance $\sigma^2$
must be evaluated for a value $t$, the following equations allow
the calculation using only the standard normal distribution
function:
$$z = \frac{t - \mu}{\sigma}, \tag{5}$$
$$p_{Normal} = \frac{1}{\sigma} \cdot p_{StandardNormal}(z), \tag{6}$$
where
$$p_{StandardNormal}(z) = \frac{1}{\sqrt{2 \cdot \pi}} \cdot \exp\left(\frac{-z^2}{2}\right). \tag{7}$$
Moreover, as the function is symmetric around the mean,
only half of the evaluation interval needs to be stored,
further reducing the LUT size.
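Equations (5) to (7) and the symmetry argument can be checked with a short Python sketch. Here `std_normal_pdf` plays the role of the tabulated function and is evaluated with `math.exp` rather than a LUT; both function names are assumptions made for this illustration.

```python
import math


def std_normal_pdf(z):
    """Standard normal density (7); stands in for the tabulated function."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)


def normal_pdf_via_standard(t, mu, sigma, std_pdf=std_normal_pdf):
    """Evaluate N(mu, sigma^2) at t using only the standard normal."""
    z = (t - mu) / sigma                 # normalization (5)
    # Only z >= 0 is looked up, exploiting the symmetry around the mean.
    return std_pdf(abs(z)) / sigma       # scaling (6)
```

Because only `abs(z)` is passed to the table, the stored interval covers non-negative arguments alone, which corresponds to the half-interval storage mentioned in the text.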
The architecture comprises dual-port memories, so the two
values for interpolation can be obtained simultaneously.
Fig. 4: PE Micro-architecture
D. PE Micro Architecture
Each PE sequentially performs the two algorithm steps that
do not present data dependency: sampling and update. Processing
is divided into several modules in order to implement
a module-level pipeline: Sampling, Acquisition Square Value,
Acquisition Sqrt Value, Acquisition Sensor Value and
Acquisition Normal Value. Figure 4 shows the pipelined dataflow
microarchitecture.
1) Sampling Unit: The sampling unit processes data from
the current main memory location. The memory word is 48
bits wide, with 12 bits for each particle component. The ranges for
position and velocity are [−40, 40] m and [−25, 25] m/s, respectively. This
unit performs a translation in the plane by using a simplified
version of the dynamic model detailed in (1). This simplification
allows a reduction in the number of multiplications. For
this design the dynamic model is fixed, but future designs will
consider a programmable model. The translated positions and
velocities are computed as follows
$$p_x(k) = p_x(k-1) + v_x(k-1) \cdot \Delta T + \frac{1}{2} \cdot n_x \tag{8}$$
$$p_y(k) = p_y(k-1) + v_y(k-1) \cdot \Delta T + \frac{1}{2} \cdot n_y \tag{9}$$
$$v_x(k) = v_x(k-1) + n_x \tag{10}$$
$$v_y(k) = v_y(k-1) + n_y \tag{11}$$
where $n_x$ and $n_y$ are drawn from a uniform distribution
$U[0, W]$. Depending on the value of the $\Delta T$ parameter, the
W value should be adjusted in order to provide accelerations
similar to those of the original model. The random noise is
generated by a 16-bit linear feedback shift register (LFSR) [17]
with internal XORs and a reconfigurable seed. This
pseudo-random number generator is a shared resource inside
a cluster. Each PE takes a number at its corresponding turn.
The eight most significant bits are used for the $n_x$ component
and the eight least significant bits for the $n_y$ component. Each
noise component is pre-multiplied by the variance value Q.
Both the Q and $\Delta T$ registers are programmable and 8 bits long.
The output of the sampling unit has the same data width as
its input.
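The sampling unit can be sketched behaviourally in Python. The feedback polynomial of the LFSR is not given in the text, so the mask `0xB400` below (a commonly used 16-bit polynomial in the internal-XOR, or Galois, form) is an assumption, as are the function names. The noise values are passed to `sample_simplified` already scaled, so the fixed-point scaling by Q is not modelled.

```python
def lfsr16_step(state, mask=0xB400):
    """Advance a 16-bit internal-XOR (Galois) LFSR by one step.

    The mask is an assumed feedback polynomial, not taken from the paper.
    """
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= mask          # XOR gates placed inside the register chain
    return state & 0xFFFF


def split_noise(word):
    """Upper byte feeds the n_x component, lower byte the n_y component."""
    return (word >> 8) & 0xFF, word & 0xFF


def sample_simplified(p, dt, n_x, n_y):
    """Simplified model (8)-(11) for a particle p = [p_x, v_x, p_y, v_y]."""
    p_x, v_x, p_y, v_y = p
    return [
        p_x + v_x * dt + 0.5 * n_x,   # (8)
        v_x + n_x,                    # (10)
        p_y + v_y * dt + 0.5 * n_y,   # (9)
        v_y + n_y,                    # (11)
    ]
```

Compared with (1), the noise terms here are not multiplied by $\Delta T$ or $\Delta T^2$, which is what removes multiplications from the datapath and why W has to be adjusted when $\Delta T$ changes. The seed must be non-zero, since the all-zero state is a fixed point of the register.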
2) Acquisition Value Units: All acquisition units detect
when their data input is equal to the current value on the interpolation
address bus. This detection is performed with a bitwise XOR
operation. When a match is detected, the data present
on the data bus is acquired.
The Acquisition Square Value unit computes the sum of the
squared inputs. When x or y is negative, its two's complement is
taken. Thus |x| and |y| have an 11-bit word length and are
compared to the interpolation address bus. Once the squared
value is captured for both components, the sum is performed with
a 17-bit output data width. The broadcast module for the Sqrt
function provides a 12-bit interpolation address bus. Therefore,
the 5 least significant bits of $x^2 + y^2$ are discarded when the
Acquisition Sqrt Value unit compares its data input with the
value present on the interpolation address bus. The same occurs
for the Acquisition Sensor Value block, with the least significant
bit discarded from its input word.
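The word-length handling of the first acquisition units can be illustrated with integer arithmetic in Python. The helper names are hypothetical, and saturating the most negative 12-bit value to 11 bits is an assumption, since the text does not state how that corner case is treated.

```python
def magnitude_12bit(raw):
    """Return |value| of a 12-bit two's-complement word as an 11-bit word."""
    raw &= 0xFFF
    if raw & 0x800:               # sign bit set: the value is negative
        raw = (-raw) & 0xFFF      # two's-complement negation
    return min(raw, 0x7FF)        # assumption: -2048 saturates to 2047


def matches(local_addr, bus_addr):
    """Equality detection through a bitwise XOR, as in the acquisition units."""
    return (local_addr ^ bus_addr) == 0


def sqrt_address(sum_of_squares, discarded_bits=5):
    """Drop the least significant bits of the 17-bit sum x^2 + y^2 to form
    the 12-bit address compared against the Sqrt interpolation address bus."""
    return (sum_of_squares & 0x1FFFF) >> discarded_bits
```

The same pattern with `discarded_bits=1` would describe the Acquisition Sensor Value block, which drops one bit of its 13-bit input.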
The Acquisition Normal Value unit generates a word using
(5) with $\mu$ equal to $Pot_{measurement}$. Once a match is
detected, the data present on the bus is acquired. Finally, it is
multiplied by the reciprocal of the standard deviation as stated
in (6). The reciprocal of the variance has an 8-bit width, as does
the power measurement. In order to perform the subtraction in (5),
the 5 least significant bits of the input word are discarded. The
word length after this equation is 16 bits. According to Table
I, the tabulated normal function requires 1 interpolation bit;
therefore the 6 least significant bits are not taken into account,
resulting in a 10-bit word length. Once the data value is captured
by the PE, it is multiplied by the reciprocal of the variance,
resulting in a 19-bit word length.
E. Resampling unit
The resampling algorithm selected for implementation is the
modified Independent Metropolis-Hastings (IMH) algorithm [12], which
replaces the division operation with a comparison and
processes particles and their weights sequentially. The algorithm
is summarized in the following pseudocode:
w_prev = w_1^k;
for i ← 2 to NUM_PARTICLES do
    u ∼ U(0, 1);
    if (u · w_prev > w_i^k) then
        w_prev = w_prev; resample = 1;
    else
        w_prev = w_i^k; resample = 0;
    end
end
Algorithm 1: Implemented resampling algorithm
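A software rendering of Algorithm 1 in Python may help to see which particle is written back in each case. The function name `imh_resample` and the injectable `rng` argument are assumptions for this sketch; the handling of the two branches follows the description of the resample signal in the text.

```python
import random


def imh_resample(particles, weights, rng=random):
    """Sequential resampling following Algorithm 1.

    The test u * w_prev > w_i replaces the division w_i / w_prev by a
    multiplication and a comparison.
    """
    out = [particles[0]]                 # the first particle is always kept
    prev_p, prev_w = particles[0], weights[0]
    for p, w in zip(particles[1:], weights[1:]):
        u = rng.random()                 # u ~ U(0, 1)
        if u * prev_w > w:
            out.append(prev_p)           # resample = 1: replicate the held particle
        else:
            prev_p, prev_w = p, w        # resample = 0: accept the current particle
            out.append(p)
    return out
```

A particle whose weight is at least as large as the held weight is always accepted, while a lighter one is accepted with probability equal to the ratio of the two weights.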
Figure 5 shows the architecture of the resampling and word-to-memory
modules. The particle array is filled with the output
particles from the sampling unit. Both arrays, the particle array
and the weight array, must be fully updated before the
resampling operation starts. The first particle of the whole set is always
resampled. Subsequent particles are stored in memory
depending on the comparison between their weight and $w_{prev}$.
The random number generator is implemented with a 16-bit
LFSR.
The resample signal controls the data stored in memory. If
the value of resample is 1, the data present in the particle register
is written to memory; otherwise the currently processed particle is selected
and $w_{prev}$ is updated.
In order to synchronize the translated particle with the
pipeline time schedule, it must be delayed by as many stages as the
number of pipeline stages between the sampling unit and the
Acquisition Normal Value unit. Each PE reads a particle from
a memory location and, once the particle is resampled, the
word-to-memory unit stores it at the same location. Since a
dual-port memory is used and the architecture is pipelined,
memory reading and writing are done simultaneously. Control
is achieved with a read address counter and a write address counter. The
former depends on control signals from the sampling unit
and the latter on control signals from the word-to-memory
unit.
Fig. 5: Word-To-memory Architecture
IV. EXECUTION TIME
Since the execution time of each module is variable, each
PE will complete its processing at different times. The resam-
pling module begins operation when the first PE has finished
processing its particle. Figure 6 shows the execution time of
the dataflow for a cluster made up of two processing elements.
Pipeline delay between output data values is given by the
slowest stage. In the presented design this corresponds to
the stage with the widest interpolation address bus, since it
takes $2^{N+M}$ cycles to acquire the last interpolated
value. This is the case of the Sqrt function. In the worst case,
a new particle is processed every 4096 cycles.
As resampling takes one cycle to process each particle, the
number of cycles to finish the resampling operation depends
on the number of PEs in a cluster. Therefore, the last element
of the resampling array will be updated every $2^{N+M} + P$ cycles,
where P is the number of PEs in the cluster.
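The cycle counts above lend themselves to a rough back-of-the-envelope estimate, sketched below in Python. The function names are hypothetical, and the estimate ignores pipeline fill time and inter-cluster communication; the 100 MHz clock in the usage note is purely an assumed figure, not a result from the paper.

```python
import math


def cluster_update_cycles(n_bits, m_bits, pes_per_cluster):
    """Worst-case cycles between updates of the last resampling element:
    2**(N+M) for the slowest broadcast sweep plus P for resampling."""
    return (1 << (n_bits + m_bits)) + pes_per_cluster


def iteration_cycles(total_particles, clusters, pes_per_cluster, n_bits, m_bits):
    """Rough cycle count for one filter iteration (assumes all clusters
    run in parallel and each PE handles one particle per round)."""
    rounds = math.ceil(total_particles / (clusters * pes_per_cluster))
    return rounds * cluster_update_cycles(n_bits, m_bits, pes_per_cluster)
```

For the Sqrt table (N = 10, M = 2) and P = 4, `cluster_update_cycles` gives 4100 cycles. With 4096 particles spread over 4 clusters of 4 PEs, `iteration_cycles` gives 1,049,600 cycles, which at an assumed 100 MHz clock would be about 10.5 ms per iteration.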
V. RESULTS
A. Simulation Results
A VHDL RTL model of the processing element was de-
veloped. The implementation flow was the following: first a
fixed point Matlab implementation of the processing element
described above was generated and compared to its floating
point counterpart to prove its proper operation. Second, an
RTL model that matches the fixed point Matlab implementation
was developed. At this stage of the implementation,
the RSSI measurement, the reciprocal of the noise standard
deviation and the position estimation are generated off-line.
Fig. 6: Filter execution time.
Fig. 7: Weights vs. distance (Matlab model vs. RTL model; weight versus distance [m]).
Figure 7 shows the distribution of weights vs distance for
the floating point Matlab implementation and the RTL model,
in the case where the measured power is −55.42 dBm and
σ = 2.5 dBm. Since RSSI measurements are 8-bit quantized,
the normal distribution is also quantized. It can be noticed that
the RTL model provides similar results to the floating point
Matlab implementation.
A 2-D tracking scenario can be simulated to show the
dynamic performance. In this case, the fixed point Matlab
implementation is used, instead of the RTL model, in order
to reduce simulation time. The scenario is composed of a unit
moving at nearly constant velocity and three fixed antennas
s1, s2 and s3 placed at positions: [0, 0], [−20, 0], [0, 20],
respectively. The position of the target unit evolves with time
according to (1).
The mobile initial state is x0 = [−8 m, 12 m/s, 10 m, −2 m/s]
and ∆T = 0.1 s. The total
number of particles used is 4096, which are uniformly
distributed over a region delimited by the intervals [−20 m, 20 m]
and [0, π] radians at the beginning of the simulation. Particle
velocities have been randomly initialised with a uniform
distribution in the interval [17, 7] for $\dot{x}$ and [7, −3] for $\dot{y}$.
Figure 8 shows the trajectory of the target unit (green
line) and simulation results for the floating point Matlab model and the
fixed point Matlab model (which stands in for the RTL model), in red
and black lines, respectively. Both models
provide very close results.
Fig. 8: Tracking of a moving target with three antennas (legend: floating point Matlab model, fixed point Matlab model, target trajectory, antenna).
TABLE II: Synthesis Results
Module              Area [µm²]
Sampling            37453
Acq. Square Value   6144
Acq. Sqrt Value     2268
Acq. Sensor Value   1932
Acq. Normal Value   13293
Total Area PE       87086
B. Synthesis Results
The RTL model of the processing element described in
Section III was synthesized using Synopsys Design Compiler and a
0.13 µm CMOS technology. Since the array is composed of
several processing elements, it is useful to know the area
required by this basic unit. Table II shows the area of the
processing element and its modules.
VI. CONCLUSIONS
A VLSI architecture for particle filtering in real time was
presented. This architecture exploits the data level parallelism
in the algorithm and also takes into account performance
degradation due to resampling parallelization. Introducing
global resources allows an increase in concurrent hardware.
Processing dataflow was described along with a piecewise lin-
ear function implementation. An RTL model of the proposed
design was generated. Simulation shows that the architecture
correctly implements the PF adapted to the specific applica-
tion. Further work is needed to choose an optimal number of
PEs per cluster.
VII. ACKNOWLEDGMENTS
The results of this paper were partially supported by PICT
2010-2657 3D Gigascale Integrated Circuits for Nonlinear
Computation, Filter and Fusion with Applications in Industrial
Field Robotics of Agencia Nacional de Promoción Científica y
Tecnológica (ANPCyT) of the Argentine Ministry of Science
and Technology (MINCYT).
REFERENCES
[1] N. Gordon, D. Salmond, and A. F. M. Smith, “Novel approach to
nonlinear/non-gaussian bayesian state estimation,” IEE Proc. Of Radar
and Signal Processing, vol. 140, no. 2, pp. 107–113, 1993.
[2] M. Isard and A. Blake, “Condensation - conditional density propagation
for visual tracking,” International Journal of Computer Vision, vol. 29,
no. 1, pp. 5–28, 1998.
[3] D. Fox, “KLD-sampling: Adaptive particle filters and mobile robot
localization,” in Advances in Neural Information Processing Systems
14, vol. 2, 2001, pp. 713–720.
[4] C. Kwok, D. Fox, and M. Meila, “Real-time particle filters,” Proceedings
of the IEEE, vol. 92, no. 3, pp. 469–484, Mar 2004.
[5] M. Bolic, P. M. Djuric, and S. Hong, “Resampling algorithms and
architectures for distributed particle filters,” IEEE Transactions on Signal
Processing, vol. 53, no. 7, pp. 2442–2450, July 2005.
[6] A. C. Sankaranarayanan, A. Srivastava, and R. Chellappa, “Algorithmic
and architectural optimizations for computationally efficient particle
filtering,” IEEE Transactions on Image Processing, vol. 17, no. 5, pp.
737–748, May 2008.
[7] S.-S. Chin and S. Hong, “VLSI design of high-throughput processing
element for real-time particle filtering,” in Signals, Circuits and Systems,
vol. 2, 2003, pp. 617–620.
[8] S. Hong, S. S. Chin, M. Bolic, and P. M. Djuric, “Design and implementation
of flexible resampling mechanism for high-speed parallel particle
filters,” Journal of VLSI signal processing systems for signal, image and
video technology, vol. 44, pp. 47–62, 2006.
[9] G. Kloos, J. E. Guivant, E. M. Nebot, and F. Masson, “Range based
localisation using rf and the application to mining safety,” in Proceedings
of the 2006 IEEE/RSJ International Conference on Intelligent Robots
and Systems, Oct 2006, pp. 1304–1311.
[10] S. Sanudo and F. R. Masson, “Desempeño del filtro de partículas acotado
en una aplicación de localización y seguimiento de camiones en una
explotación minera,” in XIV Reunion de Trabajo en Procesamiento de
la Informacion y Control, vol. 1, 2011, pp. 712–717.
[11] H. Xia, H. L. Bertoni, L. Maciel, A. Lindsay-Stewart, and R. Rowe,
“Radio propagation characteristics for line-of-sight microcellular and
personal communications,” IEEE Transactions on Antennas and Propa-
gation, vol. 41, no. 10, pp. 1439–1447, Oct 1993.
[12] L. Miao, J. J. Zhang, C. Chakrabarti, and A. Papandreou-Suppappola,
“Algorithm and parallel implementation of particle filtering and its use
in waveform-agile sensing.” Signal Processing Systems, vol. 65, no. 2,
pp. 211–227.
[13] M. Bolic, P. M. Djuric, and S. Hong, “Resampling algorithms for particle
filters: A computational complexity perspective,” EURASIP Journal on
Applied Signal Processing, vol. 15, pp. 2267–2277, 2004.
[14] A. Pasciaroni, S. Sanudo, J. Rodriguez, F. Masson, and P. Julian,
“Modelling and analysis of parallel particle filters,” in XV Reunion de
Trabajo en Procesamiento de la Informacion y Control, vol. 1, no. 1,
2013, pp. 1–6.
[15] B. Balasingam, M. Bolic, P. Djuric, and J. Miguez, “Efficient distributed
resampling for particle filters,” in IEEE Int. Conf. on Acoustics, Speech
and Signal Processing (ICASSP), 2011, pp. 3772–3775.
[16] O. Lischitz, P. Julian, J. Rodriguez, and O. Agamennoni, “Accuracy
analysis for an on-chip digital pwl realization,” in XIV Reunion de
Trabajo en Procesamiento de la Informacion y Control, 2011, pp. 429–
434.
[17] Z. Barzilai, D. Coppersmith, and A. L. Rosenberg, “Exhaustive gen-
eration of bit patterns with applications to vlsi self-testing,” IEEE
Transactions on Computers, vol. C-32, no. 2, pp. 190–194, Feb 1983.