History
• Ian Buck, Director of GPU Computing, received
  his PhD from Stanford in 2004 for his research
  on GPPM
• He then joined Nvidia to commercialize
  GPU computing

• First release: in 2006, Nvidia released CUDA
  v1.0 for the G80
• In spring 2008, CUDA 2.0 was released together
  with GT200
About
• With CUDA, ordinary applications can be
  ported to the GPU for higher performance

• No low-level or 3D programming knowledge
  is required; CUDA works with C
CPU vs GPU
• A CPU core can execute 4 32-bit instructions per
  clock, whilst a GPU can execute 3200 32-bit
  instructions per clock


• A CPU is designed primarily to be an executive
  and make decisions
• A GPU is different: it has a large number of
  ALUs (Arithmetic/Logic Units), far more than a
  CPU.
Structure
• In CUDA, you must specify the number of
  blocks and the number of threads in each
  block.

• One block can contain up to 512 threads
  (1024 on devices of compute capability 2.0
  and later).
• Each thread in each block is executed
  separately.
Syntax

• Key parts:
• Identifying a GPU function (__global__,
  __device__)

• Calling a GPU function, specifying the number
  of blocks and threads per block:
  function<<<block_nr, thread_nr>>>(param);
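The two qualifiers and the launch syntax above can be sketched as follows. This is a minimal illustration, not code from the original slides: the names square, square_all and square_on_gpu are hypothetical, and the sketch assumes the array holds exactly block_nr * thread_nr integers.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// __device__: runs on the GPU and can only be called from GPU code.
__device__ int square(int x) { return x * x; }

// __global__: a kernel. It runs on the GPU and is launched from CPU code.
__global__ void square_all(int *data) {
    int i = threadIdx.x + blockIdx.x * blockDim.x;  // unique index per thread
    data[i] = square(data[i]);
}

// Hypothetical host helper: squares block_nr * thread_nr integers in place.
// Returns true if the result was copied back successfully.
bool square_on_gpu(int *host_data, int block_nr, int thread_nr) {
    size_t bytes = (size_t)block_nr * thread_nr * sizeof(int);
    int *dev_data = nullptr;
    if (cudaMalloc(&dev_data, bytes) != cudaSuccess) return false;
    cudaMemcpy(dev_data, host_data, bytes, cudaMemcpyHostToDevice);
    square_all<<<block_nr, thread_nr>>>(dev_data);  // function<<<block_nr, thread_nr>>>(param)
    cudaError_t err = cudaMemcpy(host_data, dev_data, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev_data);
    return err == cudaSuccess;
}
```

Calling square_on_gpu(values, 2, 4) on an array of 8 integers launches 2 blocks of 4 threads, one thread per element.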
Syntax

• CPU Code:




• Calling function:
Syntax

• GPU Code:




• Calling function:
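The code shown on these two slides is not preserved in the transcript. As a stand-in, here is a sketch of the same task written once for the CPU and once for the GPU; the task (adding two arrays) and all names (add_cpu, add_kernel, add_gpu) are assumptions chosen for illustration.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// CPU code: one core walks the whole array.
void add_cpu(const float *a, const float *b, float *out, int n) {
    for (int i = 0; i < n; ++i) out[i] = a[i] + b[i];
}

// GPU code: each thread handles a single element.
__global__ void add_kernel(const float *a, const float *b, float *out, int n) {
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i < n) out[i] = a[i] + b[i];  // guard: the last block may be partly idle
}

// Calling function for the GPU version (hypothetical helper).
bool add_gpu(const float *a, const float *b, float *out, int n) {
    size_t bytes = (size_t)n * sizeof(float);
    float *da = nullptr, *db = nullptr, *dout = nullptr;
    bool ok = cudaMalloc(&da, bytes) == cudaSuccess &&
              cudaMalloc(&db, bytes) == cudaSuccess &&
              cudaMalloc(&dout, bytes) == cudaSuccess;
    if (ok) {
        cudaMemcpy(da, a, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(db, b, bytes, cudaMemcpyHostToDevice);
        int threads = 256;                         // threads per block
        int blocks = (n + threads - 1) / threads;  // round up to cover n
        add_kernel<<<blocks, threads>>>(da, db, dout, n);
        ok = cudaMemcpy(out, dout, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(da); cudaFree(db); cudaFree(dout);
    return ok;
}
```

The CPU version is called as add_cpu(a, b, out, n); the GPU version needs the extra copy steps because the kernel can only read device memory.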
Bruteforce
• Because a lot of information is processed at
  the same time, parallel programming has a
  big impact on bruteforce

• The number of tries in a given time is far
  higher on a GPU than on a CPU
Examples
• Let’s say we have a password to break,
  and the only thing we know is that its
  length is 3

• A simple bruteforce would be:
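The original slide code is not preserved, so the following is only a sketch of what a simple CPU bruteforce could look like. It assumes a lowercase a-z alphabet and compares each guess directly with the target string (a real attack would compare hashes); bruteforce_cpu is a hypothetical name.

```cuda
#include <cstring>

// Tries every lowercase 3-letter string in order.
// On a match, copies it into found (at least 4 bytes) and returns the
// number of candidates tested; returns -1 if nothing matched.
int bruteforce_cpu(const char *target, char *found) {
    int tries = 0;
    char guess[4] = {0, 0, 0, 0};
    for (char a = 'a'; a <= 'z'; ++a)
        for (char b = 'a'; b <= 'z'; ++b)
            for (char c = 'a'; c <= 'z'; ++c) {
                guess[0] = a; guess[1] = b; guess[2] = c;
                ++tries;
                if (strcmp(guess, target) == 0) {
                    strcpy(found, guess);
                    return tries;
                }
            }
    return -1;
}
```

In the worst case the loops test all 26 * 26 * 26 = 17,576 candidates one after another.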
Examples
• A GPU bruteforce:




• Called like this:
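The slide code is missing here as well. One plausible first attempt, sketched under the same assumptions (lowercase a-z, direct comparison with the target), gives each of 26 threads one first letter and lets it loop over the other two. The names bruteforce_kernel and bruteforce_gpu are hypothetical.

```cuda
#include <cuda_runtime.h>

// Each thread fixes the first letter and loops over the remaining two.
__global__ void bruteforce_kernel(const char *target, char *found) {
    char a = (char)('a' + threadIdx.x);
    for (char b = 'a'; b <= 'z'; ++b)
        for (char c = 'a'; c <= 'z'; ++c)
            if (a == target[0] && b == target[1] && c == target[2]) {
                found[0] = a; found[1] = b; found[2] = c;
            }
}

// Hypothetical host helper. target must hold at least 3 characters and
// found at least 4 bytes. Returns true if the target was recovered.
bool bruteforce_gpu(const char *target, char *found) {
    char *d_target = nullptr, *d_found = nullptr;
    if (cudaMalloc(&d_target, 3) != cudaSuccess) return false;
    if (cudaMalloc(&d_found, 4) != cudaSuccess) { cudaFree(d_target); return false; }
    cudaMemcpy(d_target, target, 3, cudaMemcpyHostToDevice);
    cudaMemset(d_found, 0, 4);                        // all zero means "no match"
    bruteforce_kernel<<<1, 26>>>(d_target, d_found);  // 1 block, 26 threads
    cudaError_t err = cudaMemcpy(found, d_found, 4, cudaMemcpyDeviceToHost);
    cudaFree(d_target); cudaFree(d_found);
    return err == cudaSuccess && found[0] != 0;
}
```

Each thread still performs 26 * 26 = 676 sequential comparisons, so only a small part of the GPU is used.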
Examples
• A more efficient GPU bruteforce:




• Called like this:
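A way to spread the work further, again only a sketch since the slide code is not preserved, is to give every candidate its own thread. The version below assumes a lowercase a-z alphabet and uses a 26 x 26 grid of blocks with 26 threads each, so the two block indices and the thread index pick the three letters. The names bruteforce_kernel_wide and bruteforce_gpu_wide are hypothetical.

```cuda
#include <cuda_runtime.h>

// One candidate per thread: block (x, y) picks letters 1 and 2,
// the thread index picks letter 3. No loops are needed.
__global__ void bruteforce_kernel_wide(const char *target, char *found) {
    char a = (char)('a' + blockIdx.x);
    char b = (char)('a' + blockIdx.y);
    char c = (char)('a' + threadIdx.x);
    if (a == target[0] && b == target[1] && c == target[2]) {
        found[0] = a; found[1] = b; found[2] = c;
    }
}

// Hypothetical host helper. target must hold at least 3 characters and
// found at least 4 bytes. Returns true if the target was recovered.
bool bruteforce_gpu_wide(const char *target, char *found) {
    char *d_target = nullptr, *d_found = nullptr;
    if (cudaMalloc(&d_target, 3) != cudaSuccess) return false;
    if (cudaMalloc(&d_found, 4) != cudaSuccess) { cudaFree(d_target); return false; }
    cudaMemcpy(d_target, target, 3, cudaMemcpyHostToDevice);
    cudaMemset(d_found, 0, 4);  // all zero means "no match"
    dim3 grid(26, 26);          // 676 blocks
    bruteforce_kernel_wide<<<grid, 26>>>(d_target, d_found);  // 17,576 threads
    cudaError_t err = cudaMemcpy(found, d_found, 4, cudaMemcpyDeviceToHost);
    cudaFree(d_target); cudaFree(d_found);
    return err == cudaSuccess && found[0] != 0;
}
```

The launch creates 26 * 26 * 26 = 17,576 threads, and each one performs a single comparison.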
Real Life
• Let’s say we have an MD5 hash and a wordlist
  of 1,000,000 words
• A simple bruteforce would be:
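The slide code is not preserved; the sketch below shows the general shape of a sequential wordlist attack. To stay short and self-contained it does not implement MD5: a 32-bit FNV-1a hash (toy_hash) stands in for it, which is an assumption of this example and not what a real attack would use. crack_cpu is a hypothetical name.

```cuda
#include <cstdint>

// Stand-in for MD5 (assumption): 32-bit FNV-1a, used only to keep the sketch short.
uint32_t toy_hash(const char *s) {
    uint32_t h = 2166136261u;        // FNV offset basis
    for (; *s; ++s) {
        h ^= (unsigned char)*s;
        h *= 16777619u;              // FNV prime
    }
    return h;
}

// Hashes the words one by one and compares each result with the target.
// Returns the index of the matching word, or -1 if none matches.
int crack_cpu(const char *const *words, int count, uint32_t target) {
    for (int i = 0; i < count; ++i)
        if (toy_hash(words[i]) == target) return i;
    return -1;
}
```

With a 1,000,000-word list, the loop may have to compute up to 1,000,000 hashes in sequence.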
Real Life
• A GPU bruteforce would be:




• Called like this:

• threadIdx.x+blockIdx.x*blockDim.x is the thread
  ID (ranging from 0 to 999,999)
• 2000 blocks * 500 threads = 1,000,000 threads
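The thread-ID scheme above can be sketched as follows, with one thread per word. The original kernel is not preserved, so several things here are assumptions: a 32-bit FNV-1a hash (toy_hash) stands in for MD5, words are packed into fixed 16-byte slots (SLOT), and crack_kernel and crack_gpu are hypothetical names.

```cuda
#include <cuda_runtime.h>
#include <cstdint>
#include <cstring>
#include <vector>

#define SLOT 16  // bytes per word, NUL-padded (hypothetical layout)

// Stand-in for MD5 (assumption): 32-bit FNV-1a over at most SLOT bytes.
__host__ __device__ uint32_t toy_hash(const char *s) {
    uint32_t h = 2166136261u;
    for (int i = 0; i < SLOT && s[i]; ++i) {
        h ^= (unsigned char)s[i];
        h *= 16777619u;
    }
    return h;
}

// Thread number id hashes word number id.
__global__ void crack_kernel(const char *words, int count, uint32_t target, int *match) {
    int id = threadIdx.x + blockIdx.x * blockDim.x;
    if (id < count && toy_hash(words + id * SLOT) == target) *match = id;
}

// Returns the index of the matching word, or -1. Words longer than
// SLOT - 1 characters are truncated when packed.
int crack_gpu(const char *const *list, int count, uint32_t target, int blocks, int threads) {
    std::vector<char> packed((size_t)count * SLOT, 0);
    for (int i = 0; i < count; ++i)
        strncpy(&packed[(size_t)i * SLOT], list[i], SLOT - 1);
    char *d_words = nullptr;
    int *d_match = nullptr;
    int match = -1;
    if (cudaMalloc(&d_words, packed.size()) != cudaSuccess) return -1;
    if (cudaMalloc(&d_match, sizeof(int)) != cudaSuccess) { cudaFree(d_words); return -1; }
    cudaMemcpy(d_words, packed.data(), packed.size(), cudaMemcpyHostToDevice);
    cudaMemcpy(d_match, &match, sizeof(int), cudaMemcpyHostToDevice);
    crack_kernel<<<blocks, threads>>>(d_words, count, target, d_match);
    cudaMemcpy(&match, d_match, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_words); cudaFree(d_match);
    return match;
}
```

For the 1,000,000-word list, the call would be crack_gpu(list, 1000000, target, 2000, 500), matching the 2000 blocks of 500 threads described above.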
NVidia CUDA for Bruteforce Attacks - DefCamp 2012
