Muda Proposal

Loading...

Flash Player 9 (or above) is needed to view presentations.
We have detected that you do not have it on your computer. To install it, go here.

0 comments

Post a comment

    Post a comment
    Embed Video
    Edit your comment Cancel

    Favorites, Groups & Events

    Muda Proposal - Presentation Transcript

    1. MUDA MUltiple Data Accelerator language Project Overview Feb 24, 2008 Syoyo FUJITA
    2. ?
    3. Nikkei 225 index
    4. ?
    5. GPU slumps CPU soars Geforce 9800 GX2 rumor 1 TFlops?( 3x of G80) 500 GFlops? (+50% of G80) ? No update ! PS3 Mac Pro octa 179.2 Gflops +800 % 204 Gflops 2007 Feb/2008
    6. Nikkei 225 index
    7. Subprime shock! Nikkei 225 index Credit boom ends! US economy declines! Green IT! Future of GPU trend
    8. Accelerated computing many-core GPGPU CPU GPU
    9. Accelerated computing many-core GPGPU NO! CPU GPU GPGPU was dead!! GPU will be dead soon!!
    10. Why GPU -> GPGPU is BAD • Larger latency : host <-> PCI-ex • Internal architecture is black box • Only GPU maker knows it • Larger cost of branching • Debugger? • Program only runs on specific GPU maker’s GPU • Not portable.
    11. Why CPU -> Accelerated computing is GOOD • Easy to program • CPU maker provides good internal spec documentation • Fast execution of branching • gdb :-) • Portable & Versatile
    12. Accelerated computing many-core MUDA CPU
    13. MUDA’s goal • Withdraw CPU’s maximum floating point performance for large data • SIMD • Cache optimized computation
    14. MUDA example MUDA code vec sqrtmu(vec x) { vec y0, y0x, y0xhalf; vec oneish = bit(0x3f800001); y0 = rsqrt(x); y0x = y0 * x; y0xhalf = 0.5 * y0x; return ((oneish - y0 * y0x) * y0xhalf + y0x); }
    15. __m128 sqrtmu (const __m128 * x) { x86/SSE output __m128 y0 ; __m128 y0x ; __m128 y0xhalf ; const __m128 t_vec4 = (__m128)_mm_set1_epi32( 1065353217) ; __m128 oneish = t_vec4 ; const __m128 t_vec6 = (*x) ; const __m128 t_vec5 = _mm_rsqrt_ps( t_vec6) ; y0 = t_vec5 ; const __m128 t_vec8 = y0 ; const __m128 t_vec9 = (*x) ; const __m128 t_vec7 = _mm_mul_ps( t_vec8 , t_vec9 ) ; y0x = t_vec7 ; const float t_float13 = 0.5 ; const float t_float12 = t_float13 ; const __m128 t_vec10 = _mm_set_ps1( t_float12 ) ; const __m128 t_vec14 = y0x ; const __m128 t_vec11 = _mm_mul_ps( t_vec10 , t_vec14 ) ; y0xhalf = t_vec11 ; const __m128 t_vec19 = oneish ; const __m128 t_vec20 = y0 ; const __m128 t_vec21 = y0x ; const __m128 t_vec15 = _mm_mul_ps( t_vec20 , t_vec21 ) ; const __m128 t_vec16 = _mm_sub_ps( t_vec19 , t_vec15 ) ; const __m128 t_vec22 = y0xhalf ; const __m128 t_vec17 = _mm_mul_ps( t_vec16 , t_vec22 ) ; const __m128 t_vec23 = y0x ; const __m128 t_vec18 = _mm_add_ps( t_vec17 , t_vec23 ) ; return t_vec18 ; }
    16. Why MUDA?
    17. No unified way to describe SIMD op • SSE: _mm_add_ps() • AltiVec: vec_add • SPE: spu_add
    18. CPU ISA changes frequently • SSE2(2000), SSE3(2004), SSE4(2006) • SSE5 and Coming New CPU design(?) • 8-element SIMD?, no SIMD in the future CPU? • Keeping up with them is hard and not productive. Waste of your time.
    19. SSE2 C code SSE4 C code MUDA MUDA compiler VMX C code Portable, CPU independent description LLVM IR CPU or Arch dependent code
    20. Status • SSE2 backend : 75 % • SSE4 backend : 0 % • VMX backend : 20 % • LLVM IR backend : 30 % • SIMD math function for MUDA : 5 % • Automatic optimizer : TODO = I’m currently working on
    21. Future direction • Cache miss analysis and memory access optimization • Valgrind, Cache Miss Equation(CME) • Automatic optimization • Such like FFTW, ATLAS and Spiral are doing • Automatic error measurement for floating point computation • Interval Arithmetic, Affine Arithmetic, Gappa
    22. Performance gap 100 75 Better 50 Scalar:SIMD cache miss:cache hit 25 = = 1:4 1:100 0 SIMD Memory
    23. Performance gap 100 Optimizing memory access is much 75 more important than SIMDization Better 50 Scalar:SIMD cache miss:cache hit 25 = = 1:4 1:100 0 SIMD Memory

    + syoyosyoyo, 2 years ago

    custom

    2188 views, 0 favs, 6 embeds more stats

    More info about this document

    © All Rights Reserved

    Go to text version

    • Total Views 2188
      • 1665 on SlideShare
      • 523 from embeds
    • Comments 0
    • Favorites 0
    • Downloads 26
    Most viewed embeds
    • 451 views on http://lucille.atso-net.jp
    • 67 views on http://d.hatena.ne.jp
    • 2 views on http://74.125.153.132
    • 1 views on http://209.85.135.104
    • 1 views on http://72.14.235.104

    more

    All embeds
    • 451 views on http://lucille.atso-net.jp
    • 67 views on http://d.hatena.ne.jp
    • 2 views on http://74.125.153.132
    • 1 views on http://209.85.135.104
    • 1 views on http://72.14.235.104
    • 1 views on http://72.14.235.132

    less

    Flagged as inappropriate Flag as inappropriate
    Flag as inappropriate

    Select your reason for flagging this presentation as inappropriate. If needed, use the feedback form to let us know more details.

    Cancel
    File a copyright complaint
    Having problems? Go to our helpdesk?

    Categories