Bitcoin 
Natalie , Joe, Nattapon
Aim and Objectives 
● Aim 
o Increase hash rate through hardware 
● Objectives 
o Implement Bitcoin hashing algorithm 
 hardware (Zedboard ) 
 software 
 offline and online miner 
● Result 
o compare past block hash performance
Prototype 1 and 2 
● Laptop 
o java SHA256 and API 
server connection 
● Zedboard 
o SHA256 and Mmap 
● Laptop 
o Scripting connection 
● Zedboard 
o Java and HW - SHA256 
o C - Mmap
Purpose of P1 and P2 
● Prototype 1 
o testing Java SHA256 and Bitcoin API 
o Serial connection with Zedboard 
● Prototype 2 
o testing java SHA256 on Zedboard 
o testing Server Connection Scripting 
o testing Mmap
Final Design 
Lightweight Miner 
- no blockchain 
- Java handle 
Connection 
- API, JSON-RPC 
- Submit and Get block 
- SHA256 on C
SHA-256 pseudocode (1) 
Pre-processing: 
append the bit '1' to the message 
append k bits '0', where k is the minimum number >= 0 such that the resulting 
message 
length (modulo 512 in bits) is 448. 
append length of message (without the '1' bit or padding), in bits, as 64-bit big-endian 
integer 
(this will make the entire post-processed length a multiple of 512 bits)
SHA-256 pseudocode (2) 
Process the message in successive 512-bit chunks: 
break message into 512-bit chunksfor each chunk 
create a 64-entry message schedule array w[0..63] of 32- 
bit words (The initial values in w[0..63] don't matter, so 
many implementations zero them here) 
copy chunk into first 16 words w[0..15] of the message 
schedule array Extend the first 16 words into the remaining 
48 words w[16..63] of the message schedule array: 
for i from 16 to 63 
s0 := (w[i-15] rightrotate 7) xor (w[i-15] rightrotate 
18) xor (w[i-15] rightshift 3) 
s1 := (w[i-2] rightrotate 17) xor (w[i-2] rightrotate 
19) xor (w[i-2] rightshift 10) 
w[i] := w[i-16] + s0 + w[i-7] + s1 
Initialize working variables to current hash value: a := h0 
b := h1 
c := h2 
d := h3 
e := h4 
f := h5 
g := h6 
h := h7 
Compression function main loop: 
for i from 0 to 63 
S1 := (e rightrotate 6) xor (e rightrotate 11) xor (e 
rightrotate 25) 
ch := (e and f) xor ((not e) and g) 
temp1 := h + S1 + ch + k[i] + w[i] 
S0 := (a rightrotate 2) xor (a rightrotate 13) xor (a 
rightrotate 22) 
maj := (a and b) xor (a and c) xor (b and c) 
temp2 := S0 + maj
SHA-256 - Pseudocode (3) 
h := g 
g := f 
f := e 
e := d + temp1 
d := c 
c := b 
b := a 
a := temp1 + temp2 
Add the compressed chunk to the current hash value: 
h0 := h0 + a 
h1 := h1 + b 
h2 := h2 + c 
h3 := h3 + d 
h4 := h4 + e 
h5 := h5 + f 
h6 := h6 + g 
h7 := h7 + hProduce the final hash value (big-endian):digest := hash := h0 append h1 append h2 append h3 append h4 
append h5 append h6 append h7
Hardware - SHA256 Algorithm 
1024bits header (608bits +32bits 
nonce+info) 
512 bits 512bits(128 bits+’1’+zeros + 
64bits length info) 
message chunk 
hashing 
Initialized hash 
values 
message chunk 
hashing 
256bits hash 
output(first rnd) 
512bits(256bits +‘1’+zeros 
+ 64 bits length info) 
message chunk 
Initialized hash hashing 
values 
256bits hash output 
(second round)
Hardware - SHA256 Algorithm 
● Brifely consists of 3 message chunk hashing 
message chunk 
hashing 
message chunk 
hashing 
message chunk 
hashing 
h0-h7 
initial 
h0-h7 
initial 
h0-h7 
Messagechunk 1 
Messagechunk 2 Messagechunk 3 
unit 
1. with some inputs from outputs of preprocessing
Hardware - Structural sha256 
● Saving resources e.g 
(LUT/slices) 
1. multiple array64vectors of 32 bits: word,k 
2. Parallel calculations not entirely feasible 
● trade-off : time 
1. One round of MessageChunk will be 
completed in 64 clk cycles. So SHA 256 
will be completed in 64x3 = 192 ckls. 
2. Search for the right nonce can be very 
exhaustive effort
Hardware - Structural sha256 
wordext(0) 
wordext(1) 
wordext(2) 
wordext(3) 
… 
… 
… 
… 
… 
… 
.. 
wordext(47) 
compress(0) 
… 
… 
… 
… 
… 
… 
.. 
compress(16) 
compress(17) 
compress(18) 
compress(19) 
… 
… 
… 
compress(63) 
hashout 
initial hash values 
Component C 
48 64 1 
Total = (48+64+1)*3=452 
Each instance does hugh 
amount of calculation
Hardware - Structural sha256 
512-bit message 
div in 16 32-bit 
Mem( 
word[0...63]) 
wordext 
Shift_regs(15) 
Shift_regs(14) 
… 
Shift_regs(9) 
… 
Shift_regs(1) 
Shift_regs(0) 
h0 h7 
……... 
mux(0) mux(7) 
compress 
h0 h7 
Component C 
… 
hash output 
Total 3 instances chunk 
Each hugh Mem ( 
k[0...63]) 
……... 
a …….. h 
.
Top Level 
● Increment Nonce from 0 to 0xFFFFFFFF 
big_endian_32 
SHA Algorithm 
Top 
big_endian_256 
Cnt 
Compare with target 
Header Nonce 
PS 
● If the value less than target then output ‘succeed’ signal 
● else if cnt less than 0xFFFFFFFF increment cnt by 1 
● else output ‘fail’ signal
Problems & Future Improvement 
● Hardware is working in simulation but not working 
correctly with MemoryMap from C 
● Data cannot be correctly read from the desired 
address 
● Create a structural vhdl model that can take 
advantage of parallel calculations further 
Time management 
Task allocation
Gantt Chart
Results & Evaluation 
SW miner hashrate: for 2329 hashes 
Zedboard (667 MHz): ~ 0.35 
khash/s 
Laptop (i5, 2.50GHz): ~ 7.40 
kHhWas Mh/isner 
If applied 667Mhz frequency. 
~ 347 khash/s(192ccl for 1 hash)

Implementation of Bitcoin Miner on SW and HW

  • 1.
    Bitcoin Natalie ,Joe, Nattapon
  • 2.
    Aim and Objectives ● Aim o Increase hash rate through hardware ● Objectives o Implement Bitcoin hashing algorithm  hardware (Zedboard )  software  offline and online miner ● Result o compare past block hash performance
  • 3.
    Prototype 1 and2 ● Laptop o java SHA256 and API server connection ● Zedboard o SHA256 and Mmap ● Laptop o Scripting connection ● Zedboard o Java and HW - SHA256 o C - Mmap
  • 4.
    Purpose of P1and P2 ● Prototype 1 o testing Java SHA256 and Bitcoin API o Serial connection with Zedboard ● Prototype 2 o testing java SHA256 on Zedboard o testing Server Connection Scripting o testing Mmap
  • 5.
    Final Design LightweightMiner - no blockchain - Java handle Connection - API, JSON-RPC - Submit and Get block - SHA256 on C
  • 6.
    SHA-256 pseudocode (1) Pre-processing: append the bit '1' to the message append k bits '0', where k is the minimum number >= 0 such that the resulting message length (modulo 512 in bits) is 448. append length of message (without the '1' bit or padding), in bits, as 64-bit big-endian integer (this will make the entire post-processed length a multiple of 512 bits)
  • 7.
    SHA-256 pseudocode (2) Process the message in successive 512-bit chunks: break message into 512-bit chunksfor each chunk create a 64-entry message schedule array w[0..63] of 32- bit words (The initial values in w[0..63] don't matter, so many implementations zero them here) copy chunk into first 16 words w[0..15] of the message schedule array Extend the first 16 words into the remaining 48 words w[16..63] of the message schedule array: for i from 16 to 63 s0 := (w[i-15] rightrotate 7) xor (w[i-15] rightrotate 18) xor (w[i-15] rightshift 3) s1 := (w[i-2] rightrotate 17) xor (w[i-2] rightrotate 19) xor (w[i-2] rightshift 10) w[i] := w[i-16] + s0 + w[i-7] + s1 Initialize working variables to current hash value: a := h0 b := h1 c := h2 d := h3 e := h4 f := h5 g := h6 h := h7 Compression function main loop: for i from 0 to 63 S1 := (e rightrotate 6) xor (e rightrotate 11) xor (e rightrotate 25) ch := (e and f) xor ((not e) and g) temp1 := h + S1 + ch + k[i] + w[i] S0 := (a rightrotate 2) xor (a rightrotate 13) xor (a rightrotate 22) maj := (a and b) xor (a and c) xor (b and c) temp2 := S0 + maj
  • 8.
    SHA-256 - Pseudocode(3) h := g g := f f := e e := d + temp1 d := c c := b b := a a := temp1 + temp2 Add the compressed chunk to the current hash value: h0 := h0 + a h1 := h1 + b h2 := h2 + c h3 := h3 + d h4 := h4 + e h5 := h5 + f h6 := h6 + g h7 := h7 + hProduce the final hash value (big-endian):digest := hash := h0 append h1 append h2 append h3 append h4 append h5 append h6 append h7
  • 9.
    Hardware - SHA256Algorithm 1024bits header (608bits +32bits nonce+info) 512 bits 512bits(128 bits+’1’+zeros + 64bits length info) message chunk hashing Initialized hash values message chunk hashing 256bits hash output(first rnd) 512bits(256bits +‘1’+zeros + 64 bits length info) message chunk Initialized hash hashing values 256bits hash output (second round)
  • 10.
    Hardware - SHA256Algorithm ● Brifely consists of 3 message chunk hashing message chunk hashing message chunk hashing message chunk hashing h0-h7 initial h0-h7 initial h0-h7 Messagechunk 1 Messagechunk 2 Messagechunk 3 unit 1. with some inputs from outputs of preprocessing
  • 11.
    Hardware - Structuralsha256 ● Saving resources e.g (LUT/slices) 1. multiple array64vectors of 32 bits: word,k 2. Parallel calculations not entirely feasible ● trade-off : time 1. One round of MessageChunk will be completed in 64 clk cycles. So SHA 256 will be completed in 64x3 = 192 ckls. 2. Search for the right nonce can be very exhaustive effort
  • 12.
    Hardware - Structuralsha256 wordext(0) wordext(1) wordext(2) wordext(3) … … … … … … .. wordext(47) compress(0) … … … … … … .. compress(16) compress(17) compress(18) compress(19) … … … compress(63) hashout initial hash values Component C 48 64 1 Total = (48+64+1)*3=452 Each instance does hugh amount of calculation
  • 13.
    Hardware - Structuralsha256 512-bit message div in 16 32-bit Mem( word[0...63]) wordext Shift_regs(15) Shift_regs(14) … Shift_regs(9) … Shift_regs(1) Shift_regs(0) h0 h7 ……... mux(0) mux(7) compress h0 h7 Component C … hash output Total 3 instances chunk Each hugh Mem ( k[0...63]) ……... a …….. h .
  • 14.
    Top Level ●Increment Nonce from 0 to 0xFFFFFFFF big_endian_32 SHA Algorithm Top big_endian_256 Cnt Compare with target Header Nonce PS ● If the value less than target then output ‘succeed’ signal ● else if cnt less than 0xFFFFFFFF increment cnt by 1 ● else output ‘fail’ signal
  • 15.
    Problems & FutureImprovement ● Hardware is working in simulation but not working correctly with MemoryMap from C ● Data cannot be correctly read from the desired address ● Create a structural vhdl model that can take advantage of parallel calculations further Time management Task allocation
  • 16.
  • 17.
    Results & Evaluation SW miner hashrate: for 2329 hashes Zedboard (667 MHz): ~ 0.35 khash/s Laptop (i5, 2.50GHz): ~ 7.40 kHhWas Mh/isner If applied 667Mhz frequency. ~ 347 khash/s(192ccl for 1 hash)

Editor's Notes

  • #4 - to fairly compare - increase SW performance and for portability, migrate all software applications to Zedboard