Using Many-Core Processors to Improve the Performance of Space Computing Platforms

IEEE Aerospace Conference 2011

  1. 1. Faculty of Informatics, Chair of Computer Architectures. Fisnik Kraja, PhD Candidate. 2011 IEEE Aerospace Conference, 5-12 March 2011, Big Sky, Montana
  2. 2. • Subject: New computing architecture for future satellites.
        • Purpose: To introduce many-core and other COTS technologies in the design process.
        • Main points will be:
          – State of the art of space applications and computing platforms
          – Proposed system architecture
          – Performance estimations (benchmarking)
          – Discussions and conclusions
        3/12/2011
  3. 3. • On-board computers offer minimal functionality.
        • Constraints such as power, size, and heat.
        • High-reliability requirements because of radiation effects:
          – Total Ionizing Dose (TID)
          – Single Event Upset (SEU)
          – Single Event Transient (SET)
          – Single Event Latch-up (SEL)
        • New space applications ask for improved on-board processing abilities, in terms of
          – high processing power and throughput
          – without losing the required reliability.
  4. 4. • HRWS SAR (high-resolution wide-swath synthetic aperture radar).
        • Used to reduce the amount of data to be transmitted to ground
        • Uses separate apertures to transmit and receive
        • Uses multiple phase centers in receive
        • Each panel represents an independent phase center
        • 7 panels are used, each consisting of 12 tiles
  5. 5. Parallelism of the algorithm:
        • 7 independent panel processing tasks
        • 12x7=84 independent tile processing tasks
        Requirements:
        • 1 Tera 16-bit fixed-point Ops/s (complex multiply and add)
        • Peak sample rate: 8 Gbps
        • Full antenna average raw data rate: 603.1 Gbps
        It is impossible to fulfill these requirements with currently available technology for space.
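The tile-level parallelism on this slide can be sketched as follows. This is only an illustration of the decomposition, not the on-board implementation: `process_tile` is a hypothetical placeholder for the real per-tile processing, and the thread pool and `max_workers` value are assumptions chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_PANELS = 7          # from the slide: 7 independent panels
TILES_PER_PANEL = 12    # from the slide: 12 tiles per panel


def enumerate_tiles(num_panels=NUM_PANELS, tiles_per_panel=TILES_PER_PANEL):
    """Return every (panel, tile) index pair; each pair is an independent work unit."""
    return [(p, t) for p in range(num_panels) for t in range(tiles_per_panel)]


def process_tile(tile_id):
    """Hypothetical placeholder for per-tile processing; it only echoes its input."""
    panel, tile = tile_id
    return {"panel": panel, "tile": tile, "done": True}


def process_all_tiles(max_workers=4):
    """Process all tiles concurrently; no tile depends on another tile's result."""
    tiles = enumerate_tiles()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_tile, tiles))
```

Because the 84 work units share no state, the same list could be handed to any other worker pool (processes, cores, or nodes) without changing the decomposition.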
  6. 6. • To efficiently apply the upcoming many-core processors and other COTS products to improve the on-board processing power.
        • Reliability of the system should be addressed by:
          – traditional hardware techniques (TMR)
          – software-implemented fault-tolerance techniques
            • Thread/process/service replication
        • This system should provide other important features:
          – flexibility
          – scalability
          – portability
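A software analogue of the replication idea on this slide might look like the sketch below: run a task several times and accept the result only if a strict majority of replicas agree. The function names are hypothetical, the replicas run sequentially for simplicity, and results are assumed to be hashable values.

```python
from collections import Counter


def majority_vote(results):
    """Return the value reported by a strict majority of replicas, else raise."""
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        raise RuntimeError("no majority among replica results")
    return value


def run_replicated(task, arg, replicas=3):
    """Run `task` `replicas` times (sequentially here) and vote on the outputs."""
    return majority_vote([task(arg) for _ in range(replicas)])
```

With three replicas this mirrors the voting step of TMR: one corrupted result is outvoted, while three mutually different results are reported as an error instead of being silently accepted.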
  7. 7. [Figure slide; no text recoverable]
  8. 8. [Architecture diagram; labels: I/O, RHPU, Memory (x3), Reliable Local Bus, Bus interfacing]
  9. 9. • A solution to the tradeoff between performance and reliability might be the rotating consistency check: at any one time only some processes are replicated and have their results checked for consistency, but over a longer period all of them get verified.
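One way to picture the rotating consistency check is as a round-robin schedule over the process list. The sketch below is an assumption about how such a rotation could be organized, not the scheme from the talk; `rotating_schedule` and `rounds_for_full_coverage` are hypothetical names.

```python
def rotating_schedule(process_ids, checked_per_round):
    """Yield, round after round, the subset of processes to replicate and check."""
    n = len(process_ids)
    start = 0
    while True:
        yield [process_ids[(start + i) % n] for i in range(min(checked_per_round, n))]
        start = (start + checked_per_round) % n


def rounds_for_full_coverage(num_processes, checked_per_round):
    """Number of rounds after which every process has been checked at least once."""
    return -(-num_processes // checked_per_round)  # ceiling division
```

Only `checked_per_round` processes pay the replication cost in any given round, which is the performance side of the tradeoff; the rotation guarantees that every process is still verified within a bounded number of rounds.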
  10. 10. Why SSCA#3?
          • Computationally taxing
          • Large block data transfers
          • Stressful memory access patterns
          • Scalable to mimic different problem sizes
          1. The Synthetic Data Generation stage produces approximations of raw SAR data, similar to what would be obtained from a real SAR system.
          2. The SAR Sensor Processing stage reconstructs a SAR image using a wavefront spotlight SAR reconstruction method known as 2D Fourier Matched Filtering and Interpolation.
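The matched-filtering idea behind the second stage can be illustrated in one dimension. The sketch below is a deliberately simplified stand-in, not the SSCA#3 kernel: it uses a naive DFT instead of an FFT, a single synthetic chirp echo, and no interpolation step. All function names and the chirp parameters are assumptions made for the example.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (adequate for a tiny example)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def matched_filter(received, reference):
    """Circularly correlate `received` with `reference` via the frequency domain."""
    rx_f = dft(received)
    ref_f = dft(reference)
    product = [a * b.conjugate() for a, b in zip(rx_f, ref_f)]
    return dft(product, inverse=True)


def make_chirp(length, total, rate=0.05):
    """Linear-FM pulse of `length` samples, zero-padded to `total` samples."""
    pulse = [cmath.exp(1j * cmath.pi * rate * m * m) for m in range(length)]
    return pulse + [0j] * (total - length)


def estimate_delay(received, reference):
    """Index of the strongest matched-filter response, i.e. the echo delay."""
    response = matched_filter(received, reference)
    return max(range(len(response)), key=lambda k: abs(response[k]))
```

Multiplying the received spectrum by the conjugate of the reference spectrum compresses the echo into a sharp peak at its delay. The benchmark applies the same principle in two dimensions and on much larger arrays, which is where the FFT cost and the memory access patterns become stressful.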
  11. 11. SDG: Synthetic SAR returns from a uniform grid of point reflectors
          Kernel 1: Reconstructed SAR image
  12. 12. The symmetric SMA (UMA)
          – 1 Nehalem CPU: Intel Core i7 CPU 920
          – 2.67 GHz frequency
          – 8 MB L3 Smart Cache
          – 4 cores (8 threads with Hyper-Threading)
          – 130 W power consumption
          – 24 gigabytes of DDR3 RAM
          – 4.8 giga-transfers/s QPI
          The distributed SMA (NUMA)
          – 2 Nehalem CPUs: Intel Xeon CPU X5670
          – 2.93 GHz processor frequency
          – 12 MB L3 Smart Cache
          – 6 cores/CPU
          – 95 W power consumption
          – 36 (18x2) gigabytes of DDR3 RAM
          – 6.4 giga-transfers/s QPI
  13. 13. UMA-SMA architectures offer flexibility, but they tend to have memory bottlenecks.
          NUMA-SMA architectures avoid bottleneck problems in memories, but require manual/pinned allocation of memory for each thread.
  14. 14. [Optimization steps shown on the slide: Sequential FFT; Multithreaded FFT; Parallelized Loops with OpenMP; Tiling Technique; Threaded FFT using OpenMP; GOMP_CPU_AFFINITY="0-11"; More Private Variables]
  15. 15. Most important optimizations:
          • Thread pinning (first-touch policy of memory)
          • Private data (stack, local) / shared data (remote, cached, evicted)
          • Scheduling:
            – Static for loops with regular workloads
            – Dynamic for loops with non-regular ones
          Outlook
          • The SAR data generation and image formation are scalable to
            • 4 cores in UMA (Uniform Memory Access)
            • 12 cores in NUMA-2x[6Cores, 16GB RAM]
          • Speedup is almost linear in these SMA architectures
          • This code is expected to scale to bigger numbers of cores
          • Further parallelization paradigms are planned:
            • MPI (Message Passing Interface) for clusters
            • CUDA for GPGPUs
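The scheduling advice on this slide can be checked with a small idealized model. The sketch below is not OpenMP code; it only simulates how long a loop takes under a static split (contiguous, near-equal blocks per thread) versus a dynamic one (each idle thread takes the next iteration). The cost lists and function names are assumptions, and scheduling overhead is ignored.

```python
import heapq


def static_makespan(costs, num_threads):
    """Finish time when iterations are split into contiguous, near-equal blocks."""
    n = len(costs)
    base, extra = divmod(n, num_threads)
    finish, start = 0, 0
    for t in range(num_threads):
        size = base + (1 if t < extra else 0)
        finish = max(finish, sum(costs[start:start + size]))
        start += size
    return finish


def dynamic_makespan(costs, num_threads):
    """Finish time when each idle thread grabs the next iteration (chunk size 1)."""
    free_at = [0] * num_threads       # min-heap of the times threads become idle
    heapq.heapify(free_at)
    for cost in costs:
        earliest = heapq.heappop(free_at)
        heapq.heappush(free_at, earliest + cost)
    return max(free_at)
```

For a regular workload such as `[2] * 8` on two threads both functions return 8, so the cheaper static split is sufficient. For an irregular one such as `[8, 8, 8, 8, 1, 1, 1, 1]` the static split leaves one thread with all the expensive iterations, while the dynamic model balances them. In a real OpenMP program the dynamic schedule also adds per-chunk overhead, which this model leaves out.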
  16. 16. By combining many-core processors and other COTS products with radiation-hardened specific components, one can benefit from:
          • A speedup by a factor of 10 to 100.
          • Improved reliability and robustness of the system.
          • Efficient and faster application development via already familiar programming models.
          • Ability to port applications directly to the space environment.
          • Minimization of the non-recurring development time and costs for future missions.
          • Efficient, flexible and portable software fault-tolerance techniques that can be applied in the space environment.
          • Portability to future advances in technology.
  17. 17. Thank you for your attention!
          Fisnik Kraja
          LRR - Lehrstuhl für Rechnertechnik und Rechnerorganisation
          Technische Universität München
          kraja@in.tum.de
