Go Native : Squeeze the juice out of your 64-bit processor using C++


Go Native: Squeeze the juice out of your 64-bit processor using C/C++

Who am I
-> Fernando Moreira ( @fpmore )
-> MSc student @ FEUP
-> Undergraduate Researcher @ Porto Interactive Center
-> Microsoft Student Partner Lead @ M$ PT
-> I’ve been doing C++ for over… 5 years

Who are you?
-> Norte . Centro . Sul . Açores . Madeira . FMI
-> Who has experience with C? And with C++?
-> Who has experience with 64-bit native dev?

Talk’s Schedule
    int main( int argc, char **argv ) {
        try {
            introducing_x64();
            advantagesOver_x86();
            nativeDev_x64( const Topic &t );  // Promising not to change the topic.
            codeAnalysis_and_DebugTools();
            costProspectionOn_x64Dev();
        } catch( Timeout &e ) {
            return -1;
        }
        return 0;
    }

introducing_x64()
-> The names: x64, x86-64, AMD64, Intel 64 (formerly EM64T), etc.
-> Notice: IA-64 (Itanium) ≠ AMD64. IA-64 is a separate instruction set, not another name for x64
-> AMD64 extends x86 and is backwards compatible with it (IA-64 isn’t an extension of x86)
-> Some hardware: Phenom, Athlon 64, Core i3/i5/i7, Core 2, …
-> Some OSes: Windows (XP, Vista, 7), OS X, several Linux distros
This talk will focus on the AMD64 architecture.

advantagesOver_x86()
-> Address space: theoretical limit of 16 exabytes (2^64 bytes); current implementations use 48-bit virtual addresses (2^48 bytes = 256 TiB)
-> More registers: 16 general-purpose and 16 XMM registers instead of 8 of each (and there’s one called RIP: the instruction pointer, which enables RIP-relative addressing)
-> Larger instruction set with emphasis on SIMD
-> SSE and SSE2 are always there (SSE3 and later must be checked for)
-> One calling convention per platform (Microsoft x64 on Windows, System V AMD64 ABI on Linux and OS X), with the first arguments passed in registers
-> Can run x86 binaries under x64 environments (on Windows, through WOW64)
-> On Windows:
   . 32-bit processes can’t load 64-bit DLLs for execution
   . 64-bit processes can’t load 32-bit DLLs for execution

nativeDev_x64( how_it_looks_like )
-> A valid, yet useless, 64-bit application:
    int main( int argc, char **argv ) {
        return 0;
    }
-> A compilable, yet useless and dangerous, 64-bit application:
    #include <cstddef>   // std::size_t
    #include <cstdint>   // SIZE_MAX
    int main( int argc, char **argv ) {
        std::size_t external_debt = SIZE_MAX;
        // size_t is 8 bytes on x64, int only 4: C++ rejects this without the cast,
        // and writing through ptr breaks strict aliasing (undefined behaviour).
        int *ptr = reinterpret_cast<int *>( &external_debt );
        *ptr = 0;   // typically clears only the low 4 bytes: the debt survives
        return 0;
    }

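The `*ptr = 0` write on the slide above is undefined behaviour, so the compiler may do anything with it, but its usual effect is easy to reproduce with well-defined code. A minimal sketch, assuming a 64-bit `size_t` and a little-endian target as on x64: masking off the low 32 bits mimics what writing a 4-byte `int` over the start of the variable typically does there. The helper `clear_low_32` and the constant `remaining_debt` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>

// Assumes an x64 target: size_t is 64 bits wide under both LLP64 and LP64.
static_assert(sizeof(std::size_t) == 8, "this sketch assumes a 64-bit target");

// Hypothetical helper: a well-defined equivalent of storing a 4-byte int 0
// over the low-order half of a size_t (the first 4 bytes on little-endian x64).
constexpr std::size_t clear_low_32(std::size_t value) {
    return value & ~static_cast<std::size_t>(0xFFFFFFFFu);
}

// SIZE_MAX with only its low half cleared: 0xFFFFFFFF00000000, not 0.
constexpr std::size_t remaining_debt = clear_low_32(SIZE_MAX);

// The fix is to write through the variable's real type:
//   std::size_t external_debt = SIZE_MAX;
//   external_debt = 0;
```

The high 32 bits are untouched, which is exactly the class of bug that stays invisible on x86, where `size_t` and `int` have the same size.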
nativeDev_x64( data_model )
-> On Microsoft Win64: LLP64 model
-> On Linux: LP64 model (also OS X and most other Unix-like systems)
-> Type sizes in bytes:
               short   int   long   long long   pointer
     LLP64       2      4      4        8          8
     LP64        2      4      8        8          8
Can you see the data portability problem? long changes size between the two models.
Suggestions: Use conditional compilation and type aliasing. Make conscious use of the sizeof operator.

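The two suggestions on this slide can be combined in a few lines. A minimal sketch, not the author's code: the data model is detected from compiler-defined macros (`_WIN64` from MSVC and MinGW-w64, `__LP64__`/`_LP64` from GCC and Clang), project-wide aliases are backed by fixed-width types, and `static_assert` turns `sizeof` assumptions into build errors. The names `kDataModel`, `file_offset_t`, `record_id_t` and `RecordHeader` are hypothetical.

```cpp
#include <cstdint>

// Conditional compilation: detect the data model from compiler-defined macros.
#if defined(_WIN64)
constexpr const char *kDataModel = "LLP64";          // long is 4 bytes
#elif defined(__LP64__) || defined(_LP64)
constexpr const char *kDataModel = "LP64";           // long is 8 bytes
#else
constexpr const char *kDataModel = "ILP32 or other"; // likely a 32-bit target
#endif

// Type aliasing: hypothetical project-wide names backed by fixed-width types,
// so their size no longer depends on what `long` means on this platform.
using file_offset_t = std::int64_t;
using record_id_t   = std::uint32_t;

// A record written by one build and read by another (x86 or x64, Windows or
// Linux). Fixed-width members only: no long, size_t or pointers.
struct RecordHeader {
    record_id_t   id;
    std::uint32_t flags;
    file_offset_t payload_offset;
};

// Conscious use of sizeof: fail the build instead of corrupting data.
static_assert(sizeof(file_offset_t) == 8, "file offsets must be 64-bit");
static_assert(sizeof(RecordHeader) == 16, "on-disk layout changed");
```

Putting the 8-byte member last after two 4-byte ones keeps the layout free of padding on both 32-bit and 64-bit ABIs.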
nativeDev_x64( data_model )
-> On x86: ptr( 4 ), size_t( 4 ), ptrdiff_t( 4 )
-> On x64: ptr( 8 ), size_t( 8 ), ptrdiff_t( 8 )
These will increase memory usage (every pointer doubles in size)… but, matching the native register width, they are usually the better choice performance-wise for indices and address arithmetic.

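The memory cost shows up in structures holding pointers, and member order decides how much padding the compiler adds around them. A minimal sketch, assuming the usual x64 ABIs where a pointer is 8 bytes and 8-byte aligned (and 4 bytes, 4-byte aligned on x86); the names `Unsorted`, `Sorted` and `count_owned` are hypothetical. The loop uses `std::size_t` as its index, the pointer-sized type from the slide.

```cpp
#include <cstddef>

// Members in an arbitrary order: padding keeps the pointer aligned.
// x86: 1 + 3 pad + 4 + 1 + 3 pad = 12 bytes; x64: 1 + 7 + 8 + 1 + 7 = 24 bytes.
struct Unsorted {
    char  tag;
    void *owner;
    char  state;
};

// Same members sorted by size, largest first.
// x86: 4 + 1 + 1 + 2 pad = 8 bytes; x64: 8 + 1 + 1 + 6 pad = 16 bytes.
struct Sorted {
    void *owner;
    char  tag;
    char  state;
};

// Pointer-sized index: matches the register width on x64 and can index
// any array the process is able to hold.
inline std::size_t count_owned(const Sorted *items, std::size_t n) {
    std::size_t owned = 0;
    for (std::size_t i = 0; i < n; ++i)
        if (items[i].owner != nullptr) ++owned;
    return owned;
}
```

Reordering does not stop the pointer from growing, but on x64 it saves a third of the size of this particular struct.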
nativeDev_x64( common_pitfalls )
-> Magic numbers & bit-wise ops: 0x7fffffff, 0xffffffff and friends assume 32-bit types
-> Functions with a variable number of arguments: printf (e.g. passing a size_t to "%u")
-> Virtual functions: an override taking unsigned int where the base takes size_t silently stops overriding on x64
-> Data exchange between x86 and x64 apps: long, size_t and pointers change size, and so do the structs containing them
-> Data misalignment: aligned SSE loads and stores require 16-byte alignment

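Three of these pitfalls and their usual fixes fit in a short sketch. It assumes a C++11 compiler (for `%zu` and `std::size_t{1}`); `kMaxOffset`, `with_bit` and `format_count` are hypothetical names, not from the talk.

```cpp
#include <cstddef>
#include <cstdio>
#include <limits>

// Magic numbers: 0x7fffffff is "the largest signed value" only for 32-bit types.
// Ask the type instead, so the constant grows to 64 bits along with ptrdiff_t.
constexpr std::ptrdiff_t kMaxOffset = std::numeric_limits<std::ptrdiff_t>::max();

// Bit-wise ops: in (1 << bit) the literal 1 is a 32-bit int, so shifting it by
// 32 or more is undefined even when the result is stored in a size_t.
// Widen the left operand before shifting.
constexpr std::size_t with_bit(std::size_t mask, unsigned bit) {
    return mask | (std::size_t{1} << bit);
}

// Variadic functions: printf cannot check its arguments. Passing a size_t
// where "%u" expects an unsigned int is undefined behaviour, and on x64 the
// two differ in size. "%zu" is the conversion meant for size_t.
inline int format_count(char *buf, std::size_t len, std::size_t count) {
    return std::snprintf(buf, len, "%zu items", count);
}
```

Each fix lets the type carry its own width instead of hard-coding 32 bits, so the same source stays correct on x86 and x64.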
nativeDev_x64( optimization_tips )
-> Use native (pointer-sized) types for loop counters and data-intensive code
-> Use 16-byte alignment for SSE loads and stores
-> Heap allocations on Win64 and Xbox 360 are 16-byte aligned
-> *Use* intrinsics: #include <immintrin.h>
-> Unroll loops, and sort an object’s data members by size to reduce padding

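The alignment and intrinsics tips meet in the simplest SIMD loop. A minimal sketch, assuming all three arrays are 16-byte aligned (required by the aligned `_mm_load_ps`/`_mm_store_ps`) and the length is a multiple of 4; `add_arrays_sse` is a hypothetical helper, and the `assert`s document those preconditions in debug builds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <immintrin.h>   // x86 SIMD intrinsics; SSE and SSE2 are always available on x64

// Hypothetical helper: out[i] = a[i] + b[i], four floats per instruction.
// Preconditions: a, b and out are 16-byte aligned and n is a multiple of 4.
void add_arrays_sse(const float *a, const float *b, float *out, std::size_t n) {
    assert(reinterpret_cast<std::uintptr_t>(a) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(b) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(out) % 16 == 0);
    assert(n % 4 == 0);
    for (std::size_t i = 0; i < n; i += 4) {
        __m128 va = _mm_load_ps(a + i);             // aligned load of 4 floats
        __m128 vb = _mm_load_ps(b + i);
        _mm_store_ps(out + i, _mm_add_ps(va, vb));  // aligned store of 4 sums
    }
}
```

Stack or static arrays can be declared with `alignas(16)`; on Win64, heap blocks from `malloc` already meet the requirement, as the slide notes.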
nativeDev_x64( real-world_tips )
-> Don’t sacrifice your software architecture.
-> Don’t use it if you don’t know how to.
-> Don’t go into premature optimization.
-> Do it at the lower levels and then hide it behind a clean interface.
-> Trust your compiler to help you do the job.

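"Do it at lower levels and then hide it" can look like this: callers see a plain function, while the SIMD path and its fallback stay inside. A minimal sketch, not the author's code: the SSE path is selected with `__SSE2__` (GCC/Clang) or `_M_X64` (MSVC), and the function `scale` and macro `SCALE_USE_SSE` are hypothetical names.

```cpp
#include <cstddef>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>   // SSE2 header; also provides the SSE intrinsics used below
#define SCALE_USE_SSE 1
#endif

// Public interface: plain C++ types, no intrinsics, no alignment requirements.
// Callers do not need to know which path runs underneath.
void scale(float *data, std::size_t n, float factor) {
    std::size_t i = 0;
#if defined(SCALE_USE_SSE)
    const __m128 f = _mm_set1_ps(factor);          // factor in all 4 lanes
    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_loadu_ps(data + i);         // unaligned load: any address works
        _mm_storeu_ps(data + i, _mm_mul_ps(v, f));
    }
#endif
    for (; i < n; ++i)                             // leftover elements, or the
        data[i] *= factor;                         // whole array without SSE2
}
```

Using the unaligned load and store keeps the interface free of alignment rules; if profiling shows it matters, an aligned path can be added later without touching any caller.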
codeAnalysis_and_DebugTools()
-> Your IDE: LEARN to fu**** use it!
-> Conditional breakpoints, call stack
-> Free tool: Cppcheck (command line, Eclipse, Code::Blocks, …)
-> State-of-the-art tool: PVS-Studio (Visual Studio 2005, 2008, 2010)
-> Do pair programming and peer review if possible

costProspectionOn_x64Dev()
-> Hardware & software (IDE + plugins + tools + libs)
-> You’ll need to teach the developers (theory & practice)
-> A port takes time, adds bugs, and isn’t creative work
-> … plus you’ll probably have to maintain two code paths
-> A full new implementation leaves room for creativity, but takes much more time and will add many more bugs.

Let’s go state-of-the-art!

Questions?
