1. Space, Time, Power: Evolving Concerns for Parallel Algorithms February 2008
2. Real and Abstract Parallel Systems • Space: where are the processors located? • Time: how does location affect the time of algorithms? • Power: what happens when power is a constraint?
3. Some Real Systems: IBM BlueGene/L • 212,992 CPUs • 478 Tflops • #1 supercomputer since 11/04 • At Lawrence Livermore Nat’l Lab • ≈ $200 Million • 3-d toroidal interconnect • Max distance: (# proc)^(1/3)
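A rough check of the max-distance figure, as a minimal sketch. It assumes a k × k × k torus with one processor per node and wraparound links in each dimension; the function names are illustrative and the real BlueGene/L dimensions are not taken from the slide. Under that assumption the largest hop count is 3·⌊k/2⌋, which grows like (# proc)^(1/3).

```python
from itertools import product


def torus_distance(a, b, dims):
    """Hop count between nodes a and b on a torus with side lengths dims."""
    # In each dimension the shorter way round the ring is taken.
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


def torus_diameter(dims):
    """Largest hop count from the origin; by symmetry this is the diameter."""
    origin = (0,) * len(dims)
    return max(
        torus_distance(origin, node, dims)
        for node in product(*(range(d) for d in dims))
    )


if __name__ == "__main__":
    for k in (4, 8, 16):
        print(k ** 3, "processors, diameter", torus_diameter((k, k, k)))
```

Multiplying the processor count by 8 only doubles the diameter, which is the cube-root behaviour the slide refers to.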
4. Another Real System: ZebraNet PI: M. Martonosi
5. Location, Location, Location • Processors may only be able to communicate with nearby processors • or, time to communicate is a function of distance • or, many processors trying to communicate to ones far away can create communication bottleneck • Feasible, efficient programs need to take location into account
6. What if Space is actually Computers? Cellular Automata • Finite automata, next state depends on current state and neighbors’ states: location matters! • ≈ 1950 von Neumann used as a model of parallelism and interaction in space • Other research: Burks & al. at UM, Conway, Wolfram,… • Can model leaf growth, traffic flow, etc.
7. Parallel Algorithms: Time • Maze of black/white pixels, one per processor in CA. Can I get out? • Nature-like propagation algorithm: time linear in area • Beyer, Levialdi ≈ 1970: time linear in edgelength • CA as parallel computer, not just nature simulator
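The nature-like propagation algorithm can be sketched as a synchronous update in which every open pixel looks only at its four neighbours. This is an illustrative simulation, not the Beyer/Levialdi algorithm; the maze encoding ('.' open, '#' wall) and the function name are assumptions made for the example.

```python
def propagate(maze, start, goal):
    """Synchronous propagation; returns the number of steps to reach goal, or None.

    maze: list of equal-length strings, '.' = open pixel, '#' = wall.
    start, goal: (row, col) tuples of open pixels.
    """
    rows, cols = len(maze), len(maze[0])
    reached = {start}
    steps = 0
    while goal not in reached:
        # One CA step: an open cell becomes reached if any neighbour is reached.
        new = {
            (r, c)
            for r in range(rows)
            for c in range(cols)
            if maze[r][c] == "."
            and (r, c) not in reached
            and any(
                n in reached
                for n in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            )
        }
        if not new:
            return None  # the wavefront died out: no way through
        reached |= new
        steps += 1
    return steps


# A serpentine corridor forces the wavefront to visit most of the area.
SERPENTINE = [
    ".....",
    "####.",
    ".....",
    ".####",
    ".....",
]

if __name__ == "__main__":
    print(propagate(SERPENTINE, (0, 0), (4, 4)))
```

On the 5 × 5 serpentine maze the wavefront needs many more steps than the edge length, which is why the simple propagation is bounded by the area rather than by the edge length.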
Reduced Complexity Transfer Function Computation for Complex Indoor Channels ...Ramoni Adeogun, PhD
This slide set presents our work on channel modelling for multi-room complex environments. The content has also been presented at and published in IEEE PIMRC, 2018. The paper, titled Transfer Function Computation for Complex Indoor Channels Using Propagation Graphs, can be downloaded here: https://vbn.aau.dk/en/publications/transfer-function-computation-for-complex-indoor-channels-using-p
TLDR (Twin Learning for Dimensionality Reduction) is an unsupervised dimensionality reduction method that combines neighborhood embedding learning with the simplicity and effectiveness of recent self-supervised learning losses.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Presentation on
S.M. LaValle and J.J Kuffner. Rapidly-exploring random trees: Progress and prospects. In Robotics: The Algorithmic Perspective. 4th Int. Workshop on the Algorithmic Foundations of Robotics., Hanover, NH, 2000. A. K. Peters.
MIMO System Performance Evaluation for High Data Rate Wireless Networks usin...IJMER
Space–time block coding is used for data communication over fading channels with multiple
transmit antennas. Message data is encoded with a space–time block code, and the encoded
data is split into n streams that are transmitted simultaneously through n transmit antennas. The
received signal is the superposition of the n transmitted signals, distorted by noise.
For data recovery, a maximum-likelihood decoding scheme is applied by decoupling the signals
transmitted from different antennas instead of detecting them jointly. The scheme
exploits the orthogonal structure of the space–time block code (OSTBC) and yields a maximum-likelihood
decoding algorithm based on linear processing at the receiver. In this paper, a model based on
orthogonal space–time block codes is developed in Matlab/Simulink to obtain the maximum diversity
order for a given number of transmit and receive antennas with a simple decoding algorithm.
The Simulink OSTBC block is applied with and without Gray coding.
OSTBCs give the maximum possible transmission rate for any number of
transmit antennas using any arbitrary real constellation, such as an M-PSK array. For different complex
M-PSK constellations, space–time block codes are applied that achieve 1/2 and 3/4 of the maximum
possible transmission rate for MIMO transmit antennas.
Here, the power analysis for FFTs of different point sizes is carried out and the results are presented. These results were tested in the tool. Although it may not cover the entire concept, it covers the necessary material.
Happy reading.
Hardware Architecture of Complex K-best MIMO Decoder (CSCJournals)
This paper presents a hardware architecture for a complex K-best Multiple Input Multiple Output (MIMO) decoder that reduces the complexity of the Maximum Likelihood (ML) detector. We develop a novel low-power VLSI design of a complex K-best decoder for MIMO and the 64-QAM modulation scheme. The use of Schnorr-Euchner (SE) enumeration and a new parameter, Rlimit, in the design reduces the complexity of calculating the K-best nodes to a certain level with increased performance. A total word length of only 16 bits has been adopted for the hardware design, limiting the bit error rate (BER) degradation to 0.3 dB with list size K and Rlimit both equal to 4. The proposed VLSI architecture is modeled in Verilog HDL using Xilinx and synthesized using Synopsys Design Vision in 45 nm CMOS technology. According to the synthesis results, it achieves 1090.8 Mbps throughput with a power consumption of 782 mW and a latency of 0.33 us. The maximum frequency of the proposed design is 181.8 MHz.
2. Rationale and Innovation
Problem statement
Given a reconfigurable architecture, find an on-chip
position for each functional unit
Innovative contribution: taking into account
Target Device Heterogeneity
Target Device reconfiguration capabilities
Inter-FU Communication
3. Aims
Considering the area assignment problem tailored for
reconfigurable architectures, provide
a formalization of the problem, and
an approach (in 3 algorithms) for solving
6. Reconfigurable Architectures - I
On FPGAs
Reconfigurable Devices
Heterogeneous
Reconfiguration Limits
Different types of
Reconfigurable Architectures:
Total
Partial (Static)
Partial (Dynamic)
10. Area Assignment Problem
Consider a Reconfigurable Architecture
Given a scheduled task graph (TG) of the application
Node: Reconfigurable Functional Unit (RFU) [*],
A netlist obtained after post synthesis and technology
mapping (i.e., before placement and routing)
Aim: find an area assignment for each RFU
[*] K. Bazargan, R. Kastner, M.S.: 3-d floorplanning: Simulated annealing and greedy placement methods for
reconfigurable computing systems. IEEE Rapid Systems Prototyping (1999)
11. Related Works - I
[*] introduced the concept of 3D floorplanning for
reconfigurable systems
SA in order to solve HW/SW codesign problem
For each task choose between
HW implementation
SW implementation
Limits
No device limits considered
No communication infrastructure
[*] K. Bazargan, R. Kastner, M.S.: 3-d floorplanning: Simulated annealing and greedy placement methods for
reconfigurable computing systems. IEEE Rapid Systems Prototyping (1999)
12. Related Works - II
[*] is the state of the art in 3D floorplanning
Simulated Annealing over Transitive Closure Graph
Takes into account device reconfiguration limits
Limits
No heterogeneity considered
High-overhead communication infrastructure solution [**]
[*] Ping-Hung Yuh, Chia-Lin Yang, Yao-Wen Chang, Hsin-Lung Chen: Temporal Floorplanning Using 3D-subTCG,
Design Automation Conference, 2004
[**] S. P. Fekete, E. Kohler, and J. Teich: Optimal FPGA Module Placement with Temporal Precedence
Constraints, Proc. DATE, 2001.
14. Floorplanning vs. Placement
Characteristic      | Floorplanning                      | Placement
# items             | <100                               | >10,000
Items (for FPGAs)   | IP-Core                            | Slice, CLB
Aim                 | Find a position for each item      | Find a position for each item
Objective function  | Depends mainly on area             | Depends mainly on wirelength
Constraints         | Items can be positioned everywhere | There is a set of possible positions
[Figure: floor plan vs. placement]
15. Floorplacement - I
Hierarchical Approach (Floorplanning + Partitioning)
S. N. Adya, I. L. Markov: Fixed-outline Floorplanning: Enabling Hierarchical Design, IEEE Transactions on VLSI
Systems, 2002
S. N. Adya, S. Chaturvedi, J. A. Roy, D. A. Papa, I. L. Markov: Unification of Partitioning, Placement and
Floorplanning, IEEE Intl. Conf. on CAD, 2004
16. Floorplacement - II
Reconfigurable Functional Unit (RFU)
A netlist obtained after post synthesis and technology
mapping (i.e., before placement and routing)
Reconfigurable Region (RR)
17. Floorplacement - III
Resource Aware (i.e., not all positions are feasible)
Device heterogeneity
Device reconfiguration capabilities
19. Proposed Problem Definition
Aim
Define RRs
For each task, find
an RR
a position inside the RR
Objective Function
Min. Fragmentation
Constraints
Communication issues
Device limits
20. Target Devices: Xilinx Virtex 4 - 5
Target architecture based on EAPR design flow
Target Architecture and Devices
23. 1st Algorithm: Partitioning into RR
Aim: identify the RRs and associate each RFU to one RR
How: partitioning the TG minimizing resource
requirement variance of the RRs (moving and swapping
nodes)
Formula legend: resource of type t required by RFU n at static photo p
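The move-and-swap idea can be illustrated with a small local search. This is a simplified sketch, not the algorithm from the slides: it assumes a single resource type, ignores static photos and the task-graph schedule, and takes the requirement of an RR to be the sum of its RFUs' requirements. All names (`need`, `rr_variance`, `partition`) are hypothetical.

```python
from statistics import pvariance


def rr_variance(assign, need, k):
    """Variance of the total requirement over the k RRs (assumed objective)."""
    loads = [0] * k
    for rfu, rr in assign.items():
        loads[rr] += need[rfu]
    return pvariance(loads)


def partition(need, k):
    """Greedy local search: accept a move or a swap only if variance drops."""
    rfus = sorted(need)
    assign = {rfu: i % k for i, rfu in enumerate(rfus)}  # round-robin start
    best = rr_variance(assign, need, k)
    improved = True
    while improved:
        improved = False
        # Move: reassign one RFU to a different RR.
        for rfu in rfus:
            for rr in range(k):
                old = assign[rfu]
                if rr == old:
                    continue
                assign[rfu] = rr
                v = rr_variance(assign, need, k)
                if v < best:
                    best, improved = v, True
                else:
                    assign[rfu] = old  # undo
        # Swap: exchange the RRs of two RFUs.
        for a in rfus:
            for b in rfus:
                if assign[a] == assign[b]:
                    continue
                assign[a], assign[b] = assign[b], assign[a]
                v = rr_variance(assign, need, k)
                if v < best:
                    best, improved = v, True
                else:
                    assign[a], assign[b] = assign[b], assign[a]  # undo
    return assign, best


if __name__ == "__main__":
    print(partition({"a": 4, "b": 3, "c": 2, "d": 1}, k=2))
```

Swaps matter here: in the four-RFU example no single move improves the round-robin start, but swapping two RFUs balances the two RRs.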
24. 2nd Algorithm: TFiRR - I
Temporal Floorplacement inside RR (TFiRR)
Aim: for each RR find a set of feasible width-height pairs
How: floorplacing RFUs inside corresponding RR
Assumption: RFUs’ height = height of the RR they belong to
Pseudo Code:
31. 3rd Algorithm: RR floorplacement - I
Simulated Annealing
Objective Function
Data Structure
4 Constraint Lists (one per row)
32. 3rd Algorithm: RR floorplacement - II
Simulated Annealing: moves
Swap two RRs
Move one RR
Span over one more row
Un-span over one fewer row
After each move packing is performed
(i.e., the floorplacement is compressed)
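A minimal sketch of an annealing loop with swap and move steps. The assumptions are loud ones: each RR occupies exactly one of four rows (span and un-span moves are left out), packing is modelled by keeping each row as a left-compacted ordered list, and the cost is the width of the widest row, which is a stand-in because the slides do not state the objective function. Names and parameter values are illustrative.

```python
import math
import random


def cost(rows, width):
    """Assumed objective: width of the widest row after packing."""
    return max(sum(width[rr] for rr in row) for row in rows)


def neighbour(rows, rng):
    """Return a copy of rows with one random swap or move applied."""
    new = [list(row) for row in rows]
    filled = [i for i, row in enumerate(new) if row]
    if rng.random() < 0.5 and sum(len(row) for row in new) >= 2:
        # Swap two RRs (possibly in different rows).
        i, j = rng.choice(filled), rng.choice(filled)
        a, b = rng.randrange(len(new[i])), rng.randrange(len(new[j]))
        new[i][a], new[j][b] = new[j][b], new[i][a]
    else:
        # Move one RR to the end of a randomly chosen row.
        i = rng.choice(filled)
        rr = new[i].pop(rng.randrange(len(new[i])))
        rng.choice(new).append(rr)
    # Rows are ordered lists with no gaps, so the result is already packed.
    return new


def anneal(width, n_rows=4, iters=5000, t0=10.0, alpha=0.999, seed=0):
    """Simulated annealing over row assignments; returns (rows, cost)."""
    rng = random.Random(seed)
    cur = [[] for _ in range(n_rows)]
    cur[0] = sorted(width)  # start with every RR in the first row
    cur_cost = cost(cur, width)
    best, best_cost = cur, cur_cost
    t = t0
    for _ in range(iters):
        cand = neighbour(cur, rng)
        cand_cost = cost(cand, width)
        # Always accept improvements; accept worse states with Boltzmann probability.
        if cand_cost <= cur_cost or rng.random() < math.exp((cur_cost - cand_cost) / t):
            cur, cur_cost = cand, cand_cost
            if cur_cost < best_cost:
                best, best_cost = cur, cur_cost
        t *= alpha  # geometric cooling
    return best, best_cost


if __name__ == "__main__":
    widths = {"A": 4, "B": 3, "C": 3, "D": 2, "E": 2, "F": 2}
    print(anneal(widths))
```

Because `neighbour` always returns a fresh copy, the best state found so far is never mutated by later moves.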
39. Partitioning’s impact on TFiRR
TFiRR on
Partitioned TG
TFiRR on TG
Execution Time 125ms 114ms 4m54s
Width (normalized) 1.00 1.19 1.04
Increasing the number of RFUs decreases the probability
of picking the right one
Partitioning is a precondition of the 3rd algorithm in order
to better exploit the FPGA’s area (2D Floorplacement)
40. Tests performed directly floorplacing RFUs
Execution time about 100 ms (100K iterations)
Floorplacement – Success Rate
43. State of the art
Authors         | Comm. Infrastructure      | Resource Aware | Reconfiguration Aware | Device Limits Aware
Bazargan et al. | No                        | No             | Yes                   | No
Yuh et al.      | Limited, w/ High Overhead | No             | Yes                   | Yes
Singhal et al.  | No                        | No             | Yes                   | No
Feng et al.     | No                        | Yes            | No                    | No
44. Notes
The comparison is performed with respect to the
description given by Yuh et al. in [*]
Yuh’s approach does not support
Multiple resources
Existence of a static side
In order to perform the comparison
the case study has been chosen so as to avoid
the multiple-resource limitation
Yuh’s approach has been extended to support a static
side
[*] Ping-Hung Yuh, Chia-Lin Yang, Yao-Wen Chang, Hsin-Lung Chen: Temporal Floorplanning Using 3D-subTCG,
Design Automation Conference, 2004
45. The Case Study
A Reconfigurable Architecture (for Biomedical Purpose)
on XC5VLX30T
1. Collecting data from sensor
2. Elaborating them
3. Sending them to a host computer through the network
49. Conclusions - I
An algorithm for the identification of area constraints
for reconfigurable architectures has been introduced
Novelties: taking into account
Target device heterogeneity
Target device reconfiguration capabilities
Communication issues
50. Conclusions - II
Results have been published
A. Montone, M.D. Santambrogio, D. Sciuto,
A Design Workflow for the Identification of Area
Constraints in Dynamic Reconfigurable Systems,
IEEE International Symposium on Electronic Design, Test
and Applications (DELTA), 2008
A. Montone, M.D. Santambrogio,
Area Constraint Evaluation for FPGAs,
The Syndicated Q1-2008, A technical newsletter for
FPGA, ASIC Verification and DSP Designers,
Synplicity Incorporation
Under revision:
A Reconfiguration-aware Floorplacer for FPGAs,
IEEE Field Programmable Logic (FPL), 2008
51. Future Works
Take into consideration IOBs and inter-module
communications
Partitioning considering clock regions