Using Graphics Cards to Break Passwords


  1. Using Graphics Cards to Break Passwords. Andrey Belenko, a.belenko@elcomsoft.com
  2. Why use GPUs?
  3. Core i7 die layout. Transistor count: 1.17B
  4. Core i7 die layout (labeled blocks: L3 cache, I/O & QPI, queue, six cores, memory controller). Transistor count: 1.17B
  5. Core layout (labeled blocks: Branch pred., Fetch & L1, Paging, L2, Decode & μ-code, Mem. L1, Sched., Exec)
  6. Core i7 die layout. Transistor count: 1.17B
  7. CPU dedicates 1/10 of its resources to calculations (chart: 10% vs. 90%)
  8. GTX 480 die layout. Transistor count: 3B
  9. GTX 480 die layout. Transistor count: 3B
  10. GPU resource split (chart: 30% vs. 70%)
     • GPU dedicates 1/3 of its resources to calculations
     • 2.5x more transistors than the CPU
     • 7x more computing power overall
  11. PBKDF2-SHA1 with 2000 iterations (bar chart, axis from 0K to 200K; the transcript gives no unit)
     • i7-970: 15.5K
     • GTX 480: 60K
     • GTX 580: 68K
     • HD 5970: 195K
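
To make the benchmarked workload in slide 11 concrete, here is a minimal CPU-side sketch of a PBKDF2-SHA1 password check with 2000 iterations, built on Python's standard `hashlib.pbkdf2_hmac`. The function names, the salt and the candidate list are illustrative assumptions and do not come from the talk.

```python
import hashlib


def derive_key(password: bytes, salt: bytes, iterations: int = 2000) -> bytes:
    """PBKDF2-HMAC-SHA1; the output length defaults to the SHA-1 digest size."""
    return hashlib.pbkdf2_hmac("sha1", password, salt, iterations)


def find_password(candidates, salt: bytes, target: bytes, iterations: int = 2000):
    """Return the first candidate whose derived key equals target, else None."""
    for candidate in candidates:
        if derive_key(candidate, salt, iterations) == target:
            return candidate
    return None


if __name__ == "__main__":
    salt = b"example-salt"                      # hypothetical salt
    target = derive_key(b"hunter2", salt)       # pretend this came from a file
    print(find_password([b"123456", b"hunter2"], salt, target))
```

Each candidate costs 2000 chained HMAC-SHA1 computations, which is why throughput is counted in thousands rather than millions per second.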
  12. How to use GPUs?
  13. Basics
     • GPUs are SIMD and excel at data-parallel tasks
     • A program for the GPU is called a ‘kernel’
     • A kernel runs in instances called threads
     • Hardware takes care of thread scheduling
     • A typical GPU has 100s of processors
     • 1000s of threads are needed to fully utilize the GPU
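  14. Example: C = A + B
     Kernel (simplified pseudocode, not a specific GPU API):
         void sum (int c[], int a[], int b[]) {
             int Index = getThreadId();          // each thread handles one element
             c[Index] = a[Index] + b[Index];
         }
     Adding vectors:
         int A[10], B[10], C[10];
         sum<<10>> (C, A, B);                    // launch 10 threads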
  15. Example: MD5
     Kernel:
         void md5 (uint8 *dataIn, uint8 *dataOut) {
             int Index = getThreadId();
             uint8 *in = dataIn + MD5_BLOCK_SIZE * Index;    // this thread's input block
             uint8 *out = dataOut + MD5_HASH_SIZE * Index;   // this thread's output slot
             MD5( out, in, MD5_BLOCK_SIZE );                 // hash this thread's own block
         }
     Computing hashes:
         uint8 Src[10 * MD5_BLOCK_SIZE];
         uint8 Dst[10 * MD5_HASH_SIZE];
         md5<<10>> (Src, Dst);                               // launch 10 threads
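
The per-thread indexing in the MD5 example can be tried out on a CPU. The sketch below is a plain Python simulation and involves no GPU: `launch` is a hypothetical stand-in for a kernel launch that simply calls the kernel once per thread index, and `hashlib.md5` hashes each 64-byte slice as a complete message (padding included), which may differ from what the slide's `MD5` routine does.

```python
import hashlib

MD5_BLOCK_SIZE = 64   # bytes of input handled by each thread
MD5_HASH_SIZE = 16    # bytes of output written by each thread


def md5_thread(index, data_in, data_out):
    """One simulated thread: hash its own input slice into its own output slot."""
    src = index * MD5_BLOCK_SIZE
    dst = index * MD5_HASH_SIZE
    block = bytes(data_in[src:src + MD5_BLOCK_SIZE])
    data_out[dst:dst + MD5_HASH_SIZE] = hashlib.md5(block).digest()


def launch(kernel, num_threads, *args):
    """Hypothetical launcher: a GPU would run these instances in parallel."""
    for index in range(num_threads):
        kernel(index, *args)


if __name__ == "__main__":
    src = bytearray(i % 251 for i in range(10 * MD5_BLOCK_SIZE))
    dst = bytearray(10 * MD5_HASH_SIZE)
    launch(md5_thread, 10, src, dst)
    print(dst[:MD5_HASH_SIZE].hex())
```

Because every thread derives its input and output offsets from its own index, no two threads touch the same bytes, which is what makes the task data-parallel.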
  16. GPU Computing Stack: High-level Language → (translation, no optimizations) → Intermediate Language → (optimization goes here) → ISA → GPU Hardware
  17. GPU Computing Stack: GPU world is bipolar
              NVIDIA               ATI
     HLL      CUDA C, OpenCL       OpenCL
     IL       PTX                  IL
     ISA      Not documented       Documented for RV700 (48xx)
     HW       G80 (8xxx) and up    RV670 (38xx) and up
  18. Breaking passwords the CPU way: Generate password → H(p) → Verify hash. Computing H(p) takes the most time, so offload it to the GPU
  19. Breaking passwords the GPU way: CPU generates passwords → GPU computes many H(p) → CPU verifies hashes
  20. Breaking passwords the GPU way: CPU generates passwords → GPU computes H(p) → CPU verifies hashes
     • If H(p) is fast, PCIe data transfers are the bottleneck
     • E.g. if H(p) is SHA-1, theoretical peak is ~200M p/s
     Solution is to offload everything to GPU
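
The transcript does not show how the ~200M p/s figure was obtained. One rough way to reason about such a ceiling is to divide usable PCIe bandwidth by the bytes moved per password. In the sketch below, the 4 GB/s effective bandwidth and the 20 bytes per candidate (one SHA-1 digest returned to the CPU) are assumptions picked for illustration, not figures from the talk.

```python
def pcie_bound(bandwidth_bytes_per_s: float, bytes_per_password: float) -> float:
    """Upper bound on passwords per second when every candidate crosses the bus."""
    return bandwidth_bytes_per_s / bytes_per_password


if __name__ == "__main__":
    # Assumed: 4e9 bytes/s usable bandwidth, 20 bytes transferred per candidate.
    rate = pcie_bound(4e9, 20)
    print(f"{rate / 1e6:.0f}M p/s")
```

Whatever the exact inputs, the bound is independent of how fast the GPU hashes, so a faster H(p) cannot help once the bus is saturated.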
  21. Breaking passwords the GPU way: GPU generates passwords → GPU computes H(p) → GPU verifies hashes
     • If H(p) is fast, PCIe data transfers are the bottleneck
     • E.g. if H(p) is SHA-1, theoretical peak is ~200M p/s
     Solution is to offload everything to GPU
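
Offloading everything means each thread has to produce its own candidate without receiving it over PCIe. A common way to do that, sketched here on the CPU as an assumption about how such a scheme could look (the talk does not specify one), is to treat the thread index as a number in base `len(charset)` and convert it to a password. The function names and the charset are hypothetical.

```python
import hashlib


def index_to_password(index: int, charset: str, length: int) -> str:
    """Map an index in [0, len(charset)**length) to a fixed-length candidate."""
    base = len(charset)
    if not 0 <= index < base ** length:
        raise ValueError("index outside keyspace")
    chars = []
    for _ in range(length):
        index, digit = divmod(index, base)
        chars.append(charset[digit])
    return "".join(reversed(chars))   # most significant character first


def search_keyspace(target_digest: bytes, charset: str, length: int):
    """Generate, hash and verify per index; only a match leaves the 'device'."""
    for index in range(len(charset) ** length):   # one GPU thread per index
        candidate = index_to_password(index, charset, length)
        if hashlib.sha1(candidate.encode()).digest() == target_digest:
            return candidate
    return None


if __name__ == "__main__":
    digest = hashlib.sha1(b"cab").digest()
    print(search_keyspace(digest, "abc", 3))
```

With this layout the host only sends a starting index and a target hash, and receives back a match if one exists, so bus traffic no longer scales with the number of candidates.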
  22. How to use GPUs? Implementation considerations
  23. GPU Computing Stack
              NVIDIA               ATI
     HLL      CUDA C, OpenCL       OpenCL
     IL       PTX                  IL
     ISA      Not documented       Documented for RV700 (48xx)
     HW       G80 (8xxx) and up    RV670 (38xx) and up
  24. Choosing language: CUDA C vs. PTX
     • C code translates into PTX without optimizations
     • Optimization is done when compiling PTX
     • Intrinsics for device-specific instructions
     No real reason for developing in PTX
  25. Choosing language: OpenCL
     • Portability requires compilation at runtime
        • May take significant time and resources
        • Compiler is part of driver ➯ testing hell
        • Requires source code in HLL ➯ IP issues
     • Implementations are not complete and vary across vendors
     Not mature enough
  26. Choosing language: ATI IL
     • The only viable option if you love your users
        • Access to device-specific instructions
        • Best performance
     • Not an option if you love your developers
        • Poor documentation, poor samples
        • Meaningless compiler errors, no debugger
  27. Achieving performance
     • Minimize data transfers
     • Minimize memory accesses
        • Or at least plan them carefully
     • Minimize number of registers used
        • Fewer registers used means more threads will run simultaneously
     • Schedule enough threads to keep GPU processors busy
     • Avoid thread divergence
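
The link between register use and the number of simultaneous threads can be shown with simple arithmetic. The sketch below is a simplification that ignores allocation granularity and other limits such as shared memory. The default figures (32768 registers and at most 1536 resident threads per multiprocessor) are assumed values for a Fermi-class NVIDIA part and are not taken from the slides.

```python
def resident_threads(registers_per_thread: int,
                     registers_per_sm: int = 32768,
                     max_threads_per_sm: int = 1536) -> int:
    """Threads one multiprocessor can keep resident, limited by the register file."""
    if registers_per_thread <= 0:
        raise ValueError("registers_per_thread must be positive")
    return min(max_threads_per_sm, registers_per_sm // registers_per_thread)


if __name__ == "__main__":
    for regs in (20, 32, 63):
        print(regs, "registers/thread ->", resident_threads(regs), "threads")
```

Under these assumptions, a kernel that grows from 32 to 63 registers per thread roughly halves the number of threads available to hide memory latency.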
  28. Porting crypto to GPU
     • Usually pretty straightforward
        • MD5, SHA1 and the like require little to no changes
     • Can be tricky sometimes
        • RC4 requires many memory accesses, so careful layout is needed
        • DES requires table lookups, which are very expensive
  29. Porting crypto to GPU: DES
     • Table lookups (s-boxes) are the bottleneck
     • Avoid them by using bitslicing
        • S-boxes replaced with logic functions
        • 32 encryptions in parallel
        • Requires many registers
        • Performance depends on compiler heuristics
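
The idea behind bitslicing can be seen on a toy example. The sketch below replaces a lookup table with a logic function evaluated on 32 independent instances at once, with bit j of every 32-bit word belonging to instance j. The table is a 3-input majority function chosen for brevity; it is an assumption for illustration and not a real DES s-box, whose logic form is far larger.

```python
WIDTH = 32
MASK = (1 << WIDTH) - 1

# Toy "s-box": majority of three bits, indexed by (a << 2) | (b << 1) | c.
TABLE = [0, 0, 0, 1, 0, 1, 1, 1]


def lookup(a: int, b: int, c: int) -> int:
    """Table-based version: one memory access per instance."""
    return TABLE[(a << 2) | (b << 1) | c]


def pack(bits) -> int:
    """Put bit j of the result word under the control of instance j."""
    word = 0
    for lane, bit in enumerate(bits):
        word |= (bit & 1) << lane
    return word


def unpack(word: int):
    return [(word >> lane) & 1 for lane in range(WIDTH)]


def bitsliced_majority(a: int, b: int, c: int) -> int:
    """Logic-only version: 32 instances evaluated with a handful of bitwise operations."""
    return ((a & b) | (a & c) | (b & c)) & MASK


if __name__ == "__main__":
    a = [(i >> 2) & 1 for i in range(WIDTH)]
    b = [(i >> 1) & 1 for i in range(WIDTH)]
    c = [i & 1 for i in range(WIDTH)]
    sliced = unpack(bitsliced_majority(pack(a), pack(b), pack(c)))
    print(sliced == [lookup(x, y, z) for x, y, z in zip(a, b, c)])
```

Every intermediate value in a bitsliced cipher occupies a full word, which is where the high register demand mentioned on the slide comes from.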
  30. How to use GPUs? Real-world problems
  31. Scalability: not all GPUs are created equal
     1. Program should scale nicely with the number of processors on the GPU
        • Query processor count from the driver
        • Partition task accordingly: numThreads = F(numProcessors)
        • Also helps to avoid triggering the watchdog and freezing the screen
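
One possible shape for F(numProcessors) is sketched below: the batch size of each kernel launch grows with the processor count, and the keyspace is cut into bounded launches so that no single launch runs long enough to trip the display watchdog. The function name and the `threads_per_processor` tuning value are hypothetical, and the processor count would come from the driver in a real program.

```python
def plan_launches(keyspace_size: int, num_processors: int,
                  threads_per_processor: int = 256):
    """Split [0, keyspace_size) into (start, end) ranges, one per kernel launch."""
    if num_processors <= 0 or threads_per_processor <= 0:
        raise ValueError("counts must be positive")
    num_threads = num_processors * threads_per_processor   # F(numProcessors)
    launches = []
    start = 0
    while start < keyspace_size:
        end = min(start + num_threads, keyspace_size)
        launches.append((start, end))
        start = end
    return launches


if __name__ == "__main__":
    # Assumed device with 15 processors; each launch covers 15 * 256 candidates.
    print(plan_launches(10_000, num_processors=15))
```

A device with more processors receives proportionally larger launches, while the time per launch stays roughly constant.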
  32. Scalability: 8 GPUs in a system are not uncommon
     2. Program should scale nicely with the number of GPUs
        • Query device count from the driver
        • Spawn CPU threads to control each device
        • Partition task accordingly
     Speedup should be linear unless you hit PCIe limits
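
The multi-GPU recipe above can be sketched with one controlling CPU thread per device. No real GPU API is used here: `worker` is a hypothetical callable standing in for the code that drives one device over its share of the keyspace, and `num_devices` would come from the driver in a real program.

```python
import threading


def split_range(total: int, parts: int):
    """Divide [0, total) into `parts` contiguous ranges of near-equal size."""
    base, extra = divmod(total, parts)
    ranges = []
    start = 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def run_on_devices(total: int, num_devices: int, worker):
    """Spawn one CPU thread per device; collect each worker's return value."""
    results = [None] * num_devices

    def control(device_id, start, end):
        results[device_id] = worker(device_id, start, end)

    threads = [
        threading.Thread(target=control, args=(device_id, start, end))
        for device_id, (start, end) in enumerate(split_range(total, num_devices))
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    # Stand-in worker that only reports how many candidates it was given.
    print(run_on_devices(10, 3, lambda dev, start, end: end - start))
```

An equal split suits identical cards; with mixed hardware the shares would need to be weighted by each device's measured speed.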
  33. Compatibility: not everyone’s got Fermi. Yet.
     • New hardware offers great new features
        • Cache on Fermi
        • bitalign instruction on RV770
     • May require different optimization strategy
     • May require separate codebase
     • Support for legacy hardware shouldn’t be dropped
     Be prepared to handle this sort of complexity
  34. Including GPU code. Option 1: include PTX/IL code in your program
     Pros
        • Recommended way
        • Forward compatibility
        • No hardware required
     Cons
        • Compilation at runtime
        • Can’t test all hardware
        • IP issues
  35. Including GPU code. Option 2: include pre-compiled GPU binaries
     Pros
        • No dependency on users’ driver
        • No compilation at runtime
        • Better IP protection
     Cons
        • May not work with future devices
        • Need to precompile for every supported GPU
        • No precompiled binary for GPU = no support
  36. Questions?
  37. Thank you
  38. Using Graphics Cards to Break Passwords. Andrey Belenko, a.belenko@elcomsoft.com
