Generalized Division-Free Architecture and Compact Memory Structure for Resampling in Particle Filters
Syed Asad Alam and Oscar Gustafsson
{syed.asad.alam, oscar.gustafsson}@liu.se
Department of Electrical Engineering, Linköping University, Sweden
Aims and Objectives
• The most basic form of resampling algorithm in a particle filter has a high hardware cost
• Requires a normalized and ordered data set for implementation → multinomial resampling
• Alternative algorithms used to avoid multinomial → stratified and systematic resampling
• Aim
– An architecture for multinomial resampling free from the need for ordering and normalization
– Memory optimization for the weights and random numbers
Particle Filters
• Model-based filtering
– State transition and observation models may be non-linear and the noise non-Gaussian
• Purpose → Estimation of a state from a set of
observations corrupted by noise
• Applications → Target tracking, computer vision,
channel estimation . . .
• Steps → time update, weight computation, and resampling (one iteration is sketched after Figure 1)
Figure 1: Overall structure of the particle filter (time update, weight computation, and resampling; input observations y_n, particle states x_n, weights w_n, resampled outputs x̃_n, w̃_n).
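The three steps can be made concrete with a minimal sketch of one filter iteration, written here in Python. The state-transition function `f`, the `likelihood` callback, and the additive Gaussian noise are hypothetical placeholders for a generic state-space model, not part of the proposed architecture.

```python
import numpy as np

def particle_filter_step(particles, weights, y, f, likelihood, rng):
    """One particle-filter iteration: time update, weight computation, resampling.

    f and likelihood are placeholder model callbacks; any state-space model
    with additive noise fits this skeleton.
    """
    # (i) Time update: propagate particles through the state model plus noise
    particles = f(particles) + rng.normal(size=particles.shape)
    # (ii) Weight computation: reweight by the likelihood of the observation y
    weights = weights * likelihood(y, particles)
    # (iii) Resampling: draw M particles with probability proportional to weight
    M = len(particles)
    idx = rng.choice(M, size=M, p=weights / weights.sum())
    return particles[idx], np.full(M, 1.0 / M)
```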
Figure 2: Basic architecture of the particle filter, comprising (i) a time update unit with sample and particle memory, (ii) weight memory and random number generation with a normalization and cumulative sum unit, and (iii) a resampling unit with a comparator and a memory for replicated/discarded particles and replication factors, all coordinated by a control unit.
Resampling
Resampling prevents degeneration of the particle set and improves estimation by discarding particles with low weights and replicating particles with high weights. The total number of particles, M, remains the same.
Figure 3: Uniformly distributed samples on [0, 1) for M = 10, under systematic, stratified, and multinomial resampling.
Standard resampling algorithms from the literature (sample generation for each is sketched after this list):
• Multinomial → Uniform random numbers – U[0, 1)
• Stratified → Partition U[0, 1) into M regions, one
sample from each interval with a random offset
• Systematic → similar to stratified resampling, but the offset is fixed
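A compact way to see the difference between the three schemes is how each draws its M points on [0, 1). The sketch below, assuming standard NumPy random generators, produces the sample sets illustrated in Figure 3.

```python
import numpy as np

def uniform_samples(M, scheme, rng):
    """Generate the M sample points on [0, 1) used by each resampling scheme."""
    if scheme == "multinomial":
        # M independent uniforms; sorted here only to ease a single-pass comparison
        return np.sort(rng.uniform(0.0, 1.0, M))
    if scheme == "stratified":
        # One independent random offset inside each stratum [k/M, (k+1)/M)
        return (np.arange(M) + rng.uniform(0.0, 1.0, M)) / M
    if scheme == "systematic":
        # A single random offset shared by all M strata
        return (np.arange(M) + rng.uniform()) / M
    raise ValueError(f"unknown scheme: {scheme}")
```

For example, `uniform_samples(10, "systematic", np.random.default_rng())` reproduces the evenly spaced pattern of Figure 3, while `"multinomial"` gives ten independent draws.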
Proposed Idea
Background
• Complexity of multinomial resampling can be reduced from M² to approximately 2M by generating ordered random numbers → high hardware cost
• Accumulation and normalization provide an intrinsic ordering (a running sum of positive numbers is monotonically increasing), reducing the hardware cost
• The comparison needed for replicating and discarding particles can be formulated as:
\[
\frac{W_K}{W_M} \;\lesseqgtr\; \frac{U_K}{U_M} \tag{1}
\]
where \(W_K = \sum_{j=0}^{K} w_j\) and \(U_K = \sum_{j=0}^{K} u_j\) are the running sums, and \(W_M\) and \(U_M\) are the cumulative sums over all \(M\) particles.
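As a minimal software sketch (not the hardware implementation), the single-pass resampling loop below applies comparison (1) directly. Note that the running sum of positive uniforms is already ordered, so no explicit sort of the random numbers is needed.

```python
def resample_counts(w, u):
    """Replication count per particle via the normalized comparison (1).

    w: particle weights; u: unordered positive uniform random numbers.
    WK and UK play the roles of W_K and U_K in (1).
    """
    M = len(w)
    WM, UM = sum(w), sum(u)      # cumulative sums W_M and U_M
    counts = [0] * M
    k, WK, UK = 0, w[0], 0.0
    for j in range(M):           # one output particle per random number
        UK += u[j]               # accumulation yields ordered sample points
        while k < M - 1 and WK / WM < UK / UM:  # comparison (1): needs division
            k += 1
            WK += w[k]
        counts[k] += 1           # particle k is replicated once more
    return counts
```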
Division-Free Architecture
Reformulation of (1) gives:
\[
\underbrace{W_K \times U_M}_{W'_K} \;\lesseqgtr\; \underbrace{U_K \times W_M}_{U'_K} \tag{2}
\]
• No normalization required
• Equally efficient for non-powers-of-two M
• Independent of generating ordered random numbers
• Can be used for stratified and systematic resampling with appropriate random number generators (the division-free loop is sketched after this list)
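In the loop sketched under (1), the division-free test (2) only changes the inner comparison: each ratio test is replaced by a cross-multiplication. The variant below is a sketch mirroring the earlier function, not the poster's RTL.

```python
def resample_counts_division_free(w, u):
    """Same single-pass loop, but with comparison (2):
    cross-multiplication removes the divisions and the normalization."""
    M = len(w)
    WM, UM = sum(w), sum(u)
    counts = [0] * M
    k, WK, UK = 0, w[0], 0.0
    for j in range(M):
        UK += u[j]
        # (2): W'_K = WK * UM is compared against U'_K = UK * WM
        while k < M - 1 and WK * UM < UK * WM:
            k += 1
            WK += w[k]
        counts[k] += 1
    return counts
```

Since W_M and U_M are positive, cross-multiplying preserves the inequality, so both versions return the same counts (up to floating-point rounding); in fixed point the products are exact, which is what makes the hardware division-free.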
Figure 4: Memory and data generation for resampling with stored cumulative sum. Accumulated weights and random values are written to the memories, so the word lengths grow to B_w + log2(M) and B_r + log2(M) bits; the outputs W'_K and U'_K are formed from the stored sums and the totals W_M and U_M.
Memory Optimization
• Storing the cumulative sum of the data increases the word length required for the two memories
• Word lengths can be reduced from B_w + log2(M) and B_r + log2(M) to B_w and B_r, respectively, by on-the-fly accumulation
• Accumulators are placed after each memory
• The extra hardware cost is a multiplexer and its associated control logic (see the behavioural sketch after this list)
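A behavioural Python model of the difference (not the RTL): the stored-sum design (Figure 4) writes the accumulated values into memory, so every word carries the extra log2(M) bits, while the online design (Figure 5) stores the raw B_w- or B_r-bit values and forms the running sum as words are read.

```python
from itertools import accumulate

def stored_sum_memory(values):
    """Stored-sum scheme (Fig. 4): the memory holds running sums,
    so each word needs B + log2(M) bits."""
    return list(accumulate(values))

def online_sum_read(memory):
    """Online scheme (Fig. 5): the memory holds raw B-bit values;
    an accumulator register after the memory forms the sum on the fly."""
    acc = 0
    for value in memory:  # one memory read per cycle
        acc += value      # accumulator (REG) placed after the memory
        yield acc
```

Both produce the same sequence of running sums W_K (or U_K); only where the accumulation happens, and hence the stored word length, differs.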
Figure 5: Memory and data generation for resampling with on-the-fly cumulative sum. The memories store the original B_w-bit weights and B_r-bit random values, and an accumulator placed after each memory forms the running sums.
Results
Complexity – Standard Cells
Table 1: Complexity, in terms of area (mm²), of architectures based on stored and online sum. Bit growth is ⌈log2(M)⌉, the extra word length needed to store the cumulative sum.

Particle count   Bit growth   Stored sum   Online sum   Savings (%)
10               4            0.022        0.014        36.36
20               5            0.035        0.019        45.71
100              7            0.112        0.088        21.43
128              7            0.114        0.088        22.81
200              8            0.220        0.153        30.45
256              8            0.220        0.154        30.00
512              9            0.441        0.291        34.01
1000             10           0.833        0.550        33.97
1024             10           0.857        0.555        35.24
2000             11           1.703        1.103        35.23
2048             11           1.731        1.105        36.16
Complexity – FPGA
Figure 6: Number of look-up tables (LUTs) used by architectures based on stored and online sum, for particle counts M from 512 to 20k.
Figure 7: Number of 36 kb block RAMs used by architectures based on stored and online sum, for particle counts M from 512 to 20k.
Summary
• Proposed a generalized division-free architecture for the resampling stage
• Works for any number of particles (including non-powers-of-two) and requires neither normalization nor the generation of ordered random numbers
• Achieved by using two multipliers and accumulators
• Memory optimization reduces area by up to 45% and memory usage by up to 50%
• Achieved by on-the-fly accumulation of particle weights and random numbers
• Each memory holds only the original particle weights and random numbers
• This reduces the word length required for each memory