1) The document discusses combining parallel computing techniques with hardware/software co-design to implement high-performance remote sensing applications in real time.
2) A methodology is proposed that applies parallelization techniques to remote sensing algorithms and, through hardware/software co-design, maps the computational tasks onto super-systolic array co-processor architectures.
3) Case studies demonstrate how techniques such as loop optimization, tiling, and space-time mapping are applied to the matrix-vector multiplication algorithm, and how the results are implemented on FPGA and VLSI platforms.
1. Aggregation of Parallel Computing and Hardware/Software Co-Design Techniques for High-Performance Remote Sensing Applications
Presenter: Dr. Alejandro Castillo Atoche
2011/07/25, IGARSS’11
School of Engineering, Autonomous University of Yucatan, Merida, Mexico.
4. Introduction: Radar Imagery, Facts
Advanced high-resolution remote sensing (RS) operations are computationally complex. Because of this complexity, recently developed RS image reconstruction/enhancement techniques are clearly unsuitable for (near) real-time implementation on conventional platforms. In previous works, the algorithms were implemented as conventional simulations on personal computers (normally in MATLAB), on digital signal processing (DSP) platforms, or on clusters of PCs.
5. Introduction: HW/SW Co-design, Facts
Why hardware/software (HW/SW) co-design? HW/SW co-design is a hybrid method aimed at increasing the flexibility of the implementations and improving the overall design process.
Why systolic arrays? They are extremely fast and form an easily scalable architecture.
Why parallel techniques? They optimize and improve the performance of the loops, which generally account for most of the execution time in RS algorithms.
6. MOTIVATION
First, novel RS imaging applications now require a (near) real-time response in areas such as target detection for military purposes, wildfire tracking, and oil-spill monitoring. Also, virtual remote sensing laboratories have been developed in previous works. Now, we intend to design efficient HW architectures that achieve real-time operation.
7. CONTRIBUTIONS
First, applying parallel computing techniques based on loop optimization transformations yields efficient super-systolic array (SSA)-based co-processor units for the selected reconstructive signal processing (SP) subtasks. Second, the HW/SW co-design methodology addressed here targets an efficient HW implementation of the enhancement/reconstruction regularization methods using the proposed SSA-based co-processor architectures.
9. HW/SW Co-design: Methodology
10. Algorithmic Reference Implementation
11. Algorithmic Reference Implementation
12. Algorithmic Reference Implementation
13. Partitioning Phase
19. Aggregation of Parallel Computing Techniques
CASE STUDY: Matrix-Vector Multiplication
The matrix-vector multiplication operation is described by the following sum:

$$ u_i = \sum_{j=0}^{n-1} a_{ij}\, v_j, \qquad 0 \le i < m $$

where:
a: input matrix of dimensions m × n
v: input vector of dimensions n × 1
u: result vector of dimensions m × 1
i: row index, 0 ≤ i < m
j: column index, 0 ≤ j < n
22. This operation is called index matching: the coordinate [0] is added to u and v so that every variable is indexed over the same two-dimensional (i, j) space.

```c
for (i = 0; i < m; i++) {
    u[i][0] = 0;
    for (j = 0; j < n; j++) {
        /* S(i,j): the single statement of the loop body */
        u[i][0] = u[i][0] + a[i][j] * v[0][j];
    }
}
```

NOTE: The algorithm has not been changed in any way; adding the coordinate [0] has no effect on the computation with respect to the previous form of the algorithm.

Inputs:
a[i][j] = A[i][j], 0 ≤ i < m, 0 ≤ j < n
v[0][j] = V[j], 0 ≤ j < n

Outputs:
U[i] = u[i][0], 0 ≤ i < m
26. In this processor array, the entire matrix-vector multiplication algorithm takes only 9 time cycles, and at most 5 processors are in use in any single cycle.
27. If at most 5 processors are ever in use at the same time, why should we build an array of 25?
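A rough utilization count makes the question concrete. Assuming each of the 25 processors executes exactly one S(i,j) over T = 9 cycles, and assuming a hypothetical linear array of P = 5 processors keeps the same 9-cycle schedule (an assumption, not a figure reported on the slides), the fraction of processor-cycles doing useful work is:

$$
U_{25} = \frac{mn}{mn \cdot T} = \frac{25}{25 \cdot 9} = \frac{1}{9} \approx 11.1\%,
\qquad
U_{5} = \frac{mn}{P \cdot T} = \frac{25}{5 \cdot 9} = \frac{5}{9} \approx 55.6\%
$$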
30. Now, if we plot the information in the table on [t, p] axes, we can see that the polytope defined by this selection table is bounded by the inequalities p ≥ 0, p ≥ t − n, p ≤ t and p ≤ m, which combine into the following lower and upper bounds on p:

$$ \max(0,\ t - n) \;\le\; p \;\le\; \min(m,\ t) \qquad \text{for all } t $$
32. Here, t is the time step at which the processor at a given coordinate p is activated in the transformed algorithm.
36. The bit-level super-systolic architecture is a high-speed, highly pipelined structure that can be implemented as a co-processor unit or even as a stand-alone VLSI ASIC.
37. FPGA-based Super-Systolic Architecture
38. Bit-level SSA design on a high-speed VLSI architecture
39. Bit-level SSA design on a high-speed VLSI architecture
The chip was designed using a standard-cell library in a 0.6 µm CMOS process. The resulting integrated circuit core measures 7.4 mm × 3.5 mm. The total gate count is about 32K, using approximately 185K transistors. The 72-pin chip will be packaged in an 80-lead CQFP.
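For a sense of scale, the figures above give the following derived quantities; they are approximate because the gate and transistor counts are themselves rounded:

$$
A_{\text{core}} = 7.4\ \text{mm} \times 3.5\ \text{mm} = 25.9\ \text{mm}^2,
\qquad
\frac{185\text{K transistors}}{32\text{K gates}} \approx 5.8\ \text{transistors per gate},
\qquad
\frac{32\text{K gates}}{25.9\ \text{mm}^2} \approx 1.2\text{K gates/mm}^2
$$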
40. Performance Analysis: VLSI
41. Performance Analysis: FPGA
42. Conclusions
The principal result of this study is the aggregation of parallel computing with regularized RS techniques in super-systolic array (SSA) architectures, which are integrated via the HW/SW co-design paradigm on FPGA or VLSI platforms for the real-time implementation of RS algorithms. The authors consider that the bit-level implementation of specialized SSA processors on VLSI/FPGA platforms represents an emerging research field for real-time RS data processing in new geospatial applications.
44. A. Castillo Atoche, D. Torres, Y. V. Shkvarko, “Descriptive Regularization-Based Hardware/Software Co-Design for Real-Time Enhanced Imaging in Uncertain Remote Sensing Environment,” EURASIP Journal on Advances in Signal Processing (JASP), Hindawi, vol. 2010, 31 pages, 2010. ISSN: 1687-6172, e-ISSN: 1687-6180, doi:10.1155/ASP. JCR.
45. Y. V. Shkvarko, A. Castillo Atoche, D. Torres, “Near Real Time Enhancement of Geospatial Imagery via Systolic Implementation of Neural Network-Adapted Convex Regularization Techniques,” Pattern Recognition Letters, Elsevier, 2011. JCR. In press.
46. Thanks for your attention.
Dr. Alejandro Castillo Atoche
Email: acastill@uady.mx