Using Many-Core Processors to Improve the Performance of Space Computing Platforms


IEEE Aerospace Conference 2011



  • 1. Faculty of Informatics, Chair of Computer Architectures. Fisnik Kraja, PhD Candidate. 2011 IEEE Aerospace Conference, 5-12 March 2011, Big Sky, Montana
  • 2.
      • Subject: New computing architecture for future satellites.
      • Purpose: To introduce many-core and other COTS technologies in the design process.
      • Main points will be:
          – State of the art of space applications and computing platforms
          – Proposed system architecture
          – Performance estimations (benchmarking)
          – Discussions and conclusions
      3/12/2011
  • 3.
      • On-board computers offer minimal functionality.
      • Constraints such as power, size, and heat
      • High reliability requirements, because of radiation effects:
          – Total Ionizing Dose (TID)
          – Single Event Upset (SEU)
          – Single Event Transient (SET)
          – Single Event Latch-up (SEL)
      • New space applications ask for improved on-board processing abilities, in terms of
          – high processing power and throughput
          – without losing the required reliability.
  • 4.
      • HRWS SAR (high-resolution wide-swath synthetic aperture radar)
      • Used to reduce the amount of data to be transmitted to ground
      • Uses separate apertures to transmit and receive
      • Uses multiple phase centers in receive
      • Each panel represents an independent phase center
      • 7 panels are used, each consisting of 12 tiles
  • 5. Parallelism of the algorithm:
      • 7 independent panel processing
      • 12x7=84 independent tile processing
    Requirements:
      • 1 Tera 16-bit fixed-point Ops/s (complex multiply and add)
      • Peak sample rate: 8 Gbps
      • Full antenna average raw data rate: 603.1 Gbps
    It is impossible to fulfill these requirements with currently available technology for space.
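
As a rough illustration of what the 84-way tile parallelism implies, the following sketch divides the stated requirements evenly across panels and tiles. The even split is an assumption made only for illustration; the slides do not say how the load is actually distributed.

```python
# Figures taken from the slide. The even split across tiles and panels
# is an assumption for illustration, not something the slide states.
PANELS = 7
TILES_PER_PANEL = 12
TOTAL_OPS_PER_S = 1e12        # 1 Tera 16-bit fixed-point ops/s
RAW_DATA_RATE_GBPS = 603.1    # full-antenna average raw data rate

tiles = PANELS * TILES_PER_PANEL
ops_per_tile = TOTAL_OPS_PER_S / tiles
rate_per_tile_gbps = RAW_DATA_RATE_GBPS / tiles
rate_per_panel_gbps = RAW_DATA_RATE_GBPS / PANELS

print(f"independent tiles:       {tiles}")
print(f"ops/s per tile:          {ops_per_tile / 1e9:.2f} G")
print(f"raw data rate per tile:  {rate_per_tile_gbps:.2f} Gbps")
print(f"raw data rate per panel: {rate_per_panel_gbps:.2f} Gbps")
```

Under this assumption each tile would still need roughly 12 GOps/s and about 7 Gbps of input bandwidth.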
  • 6.
      • To efficiently apply the upcoming many-core processors and other COTS products to improve the on-board processing power.
      • Reliability of the system should be addressed by:
          – traditional hardware techniques (TMR)
          – software-implemented fault-tolerant techniques
              • Thread/process/service replication
      • This system should provide other important features:
          – flexibility
          – scalability
          – portability
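
Thread replication with majority voting can be sketched in a few lines. This is a minimal illustration of the general idea, not the presenter's implementation: `run_replicated` and `make_faulty_square` are hypothetical names, and the injected fault (one flipped bit in one replica) only simulates an upset.

```python
import itertools
import threading
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def run_replicated(task, arg, replicas=3):
    """Run task(arg) in `replicas` threads and majority-vote the results.

    Returns (value, agreed); `agreed` is False if any replica disagreed
    with the majority. Results must be hashable.
    """
    with ThreadPoolExecutor(max_workers=replicas) as pool:
        results = list(pool.map(task, [arg] * replicas))
    value, votes = Counter(results).most_common(1)[0]
    if votes <= replicas // 2:
        raise RuntimeError("no majority among replicas")
    return value, votes == replicas


def make_faulty_square(faulty_call):
    """Hypothetical fault injector: one chosen invocation returns a corrupted value."""
    counter = itertools.count()
    lock = threading.Lock()

    def square(x):
        with lock:
            n = next(counter)
        result = x * x
        # Flip the lowest bit on the selected call to simulate an upset.
        return result ^ 1 if n == faulty_call else result

    return square


value, agreed = run_replicated(make_faulty_square(1), 12)
print(value, agreed)
```

With three replicas a single corrupted result is outvoted, and the `agreed` flag lets the caller log that a disagreement was detected.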
  • 7. [Figure only; no slide text]
  • 8. [Figure: block diagram with the labels I/O, RHPU, Memory (three blocks), Reliable Local Bus, Bus interfacing]
  • 9. A solution to the tradeoff between performance and reliability might be the rotating consistency check: at any given time only some processes are replicated and their results checked for consistency, but over a longer period all of them get verified.
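
The rotating consistency check can be sketched as follows. This is one possible reading of the slide, under stated assumptions: tasks are plain callables, a check is a second execution compared with the first, and the window of checked tasks advances by `window` positions each round. The function name and parameters are hypothetical.

```python
def rotating_consistency_check(tasks, data, rounds, window=1):
    """Run all tasks each round, but replicate and compare only a rotating subset.

    Returns (verified, mismatches): the set of task indices checked at least
    once, and a list of (round, task index) pairs where the replica disagreed.
    """
    n = len(tasks)
    verified = set()
    mismatches = []
    for r in range(rounds):
        start = (r * window) % n
        checked = {(start + k) % n for k in range(window)}
        for i, task in enumerate(tasks):
            primary = task(data)
            if i in checked:
                replica = task(data)  # replicated execution for this round only
                verified.add(i)
                if replica != primary:
                    mismatches.append((r, i))
    return verified, mismatches


tasks = [sum, min, max, len]
data = [3, 1, 4, 1, 5]

partial, _ = rotating_consistency_check(tasks, data, rounds=2)
full, errors = rotating_consistency_check(tasks, data, rounds=4)
print(sorted(partial), sorted(full), errors)
```

With `window=1` and four tasks, each round costs one extra execution instead of four, and every task has been verified after four rounds.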
  • 10. Why SSCA#3?
      • Computationally taxing
      • Large block data transfers
      • Stressful memory access patterns
      • Scalable to mimic different problem sizes
    1. The Synthetic Data Generation stage is used to produce approximations of raw SAR data, similar to what would be obtained from a real SAR system.
    2. The SAR Sensor Processing stage reconstructs a SAR image using a wavefront spotlight SAR reconstruction method known as 2D Fourier Matched Filtering and Interpolation.
  • 11. [Figure] SDG: synthetic SAR returns from a uniform grid of point reflectors. Kernel 1: reconstructed SAR image.
  • 12.
      • The symmetric SMA (UMA)
          – 1 Nehalem CPU: Intel Core i7 CPU 920
          – 2.67 GHz processor frequency
          – 8 MB L3 Smart Cache
          – 4 cores (8 threads with Hyper-Threading)
          – 130 W power consumption
          – 24 gigabytes of DDR3 RAM
          – 4.8 gigatransfers/s QPI
      • The distributed SMA (NUMA)
          – 2 Nehalem CPUs: Intel Xeon CPU X5670
          – 2.93 GHz processor frequency
          – 12 MB L3 Smart Cache
          – 6 cores/CPU
          – 95 W power consumption
          – 36 (18x2) gigabytes of DDR3 RAM
          – 6.4 gigatransfers/s QPI
  • 13.
      • UMA-SMA architectures offer flexibility, but they tend to have memory bottlenecks.
      • NUMA-SMA architectures avoid bottleneck problems in memories, but require manual/pinned allocation of memory for each thread.
  • 14.
      • Sequential FFT
      • Multithreaded FFT
      • Parallelized loops with OpenMP
      • Tiling technique
      • Threaded FFT using OpenMP
      • GOMP_CPU_AFFINITY="0-11"
      • More private variables
  • 15. Most important optimizations:
      • Thread pinning (first-touch policy of memory)
      • Private data (stack, local) / shared data (remote, cached, evicted)
      • Scheduling
          – Static for loops with regular workloads
          – Dynamic for loops with non-regular ones
    Outlook
      • The SAR data generation and image formation are scalable to
          – 4 cores in UMA (Uniform Memory Access)
          – 12 cores in NUMA-2x[6Cores, 16GB RAM]
      • Speedup is almost linear in these SMA architectures
      • This code is expected to scale to larger numbers of cores
      • Further parallelization paradigms are planned:
          – MPI (Message Passing Interface) for clusters
          – CUDA for GPGPUs
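
The scheduling advice can be illustrated with a small simulation. It does not use OpenMP itself: it is a hedged model that assigns loop iterations to workers the way a static schedule (contiguous, near-equal blocks) and a dynamic schedule with chunk size 1 (next iteration goes to the first free worker) roughly would, and it ignores scheduling overhead. The function names and workloads are made up for illustration.

```python
import heapq


def static_makespan(costs, workers):
    """Finish time when iterations are split into contiguous, near-equal blocks."""
    base, extra = divmod(len(costs), workers)
    makespan, start = 0, 0
    for w in range(workers):
        size = base + (1 if w < extra else 0)
        makespan = max(makespan, sum(costs[start:start + size]))
        start += size
    return makespan


def dynamic_makespan(costs, workers):
    """Finish time when each iteration goes to whichever worker is free first."""
    free_at = [0] * workers
    heapq.heapify(free_at)
    for cost in costs:
        t = heapq.heappop(free_at)
        heapq.heappush(free_at, t + cost)
    return max(free_at)


regular = [1] * 12               # every iteration costs the same
irregular = list(range(1, 13))   # cost grows with the iteration index

for name, costs in (("regular", regular), ("irregular", irregular)):
    print(name, static_makespan(costs, 4), dynamic_makespan(costs, 4))
```

In this model both schedules tie on the regular loop, so the lower bookkeeping cost of a static schedule makes it the natural choice there, while on the irregular loop the dynamic assignment finishes noticeably earlier.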
  • 16. By combining many-core processors and other COTS products with radiation-hardened specific components, one can benefit from:
      • A speedup by a factor of 10 to 100.
      • Improved reliability and robustness of the system.
      • Efficient and faster application development via already familiar programming models.
      • Ability to port applications directly to the space environment.
      • Minimization of the non-recurring development time and costs for future missions.
      • Efficient, flexible and portable software fault-tolerance techniques that can be applied in the space environment.
      • Portability to future advances in technology.
  • 17. Thank you for your attention! Fisnik Kraja, LRR - Lehrstuhl für Rechnertechnik und Rechnerorganisation, Technische Universität München