This presentation describes the design of Cointerra's first generation Bitcoin mining ASIC and system hardware. This was presented by Dr. Javed Barkatullah at Hot Chips 26 Symposium on August 12, 2014 in Cupertino, CA.
(ANIKA) Wanwadi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
GOLDSTRIKETM 1: COINTERRA’S FIRST GENERATION CRYPTO-CURRENCY PROCESSOR FOR BITCOIN MINING MACHINES
1. GOLDSTRIKETM 1:
COINTERRA’S FIRST GENERATION
CRYPTO-CURRENCY PROCESSOR
FOR BITCOIN MINING MACHINES
Javed Barkatullah, Ph.D., MBA
Timo Hanke, Ph.D.
Ravi Iyengar
Ricky Lewelling
Jim O’Connor
2. BITCOIN MINING WORK
For a message m find a nonce n such
that the 256-bit result
Sha-256 (Sha-256 (m || n) )
has a specified number of leading
zero bits (“target”)
• Sha-256 is a cryptographic hash function
• Cryptographic hash functions are “one way” functions
• This search problem is best solved by trial-and-error
Version
Previous
Hash
Merkle
Root
Target Time Nonce Hash
Double
Sha
Tx_id_0 Tx_id_1 Tx_id_n Tx_id_n+1
Tx_id_0,1 Tx_id_n,n+1
Merkle
Root
Version
Previous
Hash
Merkle
Root
Target Time Nonce
Message
Hash
Double
Sha
Entry m
Entry m+1
Transaction 0 Transaction 1
Double
Sha
Double
Sha
Block Header
3. HISTORY OF BITCOIN MINING HARDWARE
GPU based
Platform
FPGA based
Platform
Graph source: http://bitcoin.sipa.be
Custom ASIC
based Platform
First Bitcoin
Network & CPU
based Platform
↑Hashing
Capacity
↑Network
Difficulty
↑Platform
Hashing
Power
Network Difficulty
adjusted every 2016
blocks mined
1e-06
1e-04
1e-02
1
100
1e04
THash/s
1e-06
1e-04
1e-02
1
100
1e04
Difficultyx106
4. GOLDSTRIKE™ 1 DEVELOPMENT TIMELINE
• 4 months from RTL start to tape out!
• 49 days from tape out to first silicon
• Packaged silicon arrived on Dec. 28, 2013
• First system shipped to customer around mid January, 2014
Cointerra, Inc.
Founded
May
2013
June
2013
July
2013
August
2013
Sept.
2013
Oct.
2013
Nov.
2013
Dec
2013
Jan.
2014
RTL
Start
RTL
Freeze
Tape
Out
First
Silicon
System
Shipped!
Board &
System Design
Start
Feb.
2014
5. • Motorola compatible 4-pin
SPI Port
• PLL with simple bit-bang
interface
• 120 Hash Engines arranged
into 16 super-pipes
• 128 deep Input Work FIFO
• 128 deep Output Status FIFO
• 384-bit Pipe Control Register
(PCR) to enable/disable
individual hash engine
• Low I/O bandwidth
requirement
• New work (384 bits) every 232 clock cycles
per engine
GOLDSTRIKE™ 1 ARCHITECTURE
6. • Two rounds of SHA-256
processing
• Searches for a result in 232
nonce range
• Each round consists of 64
iterations
• Fully unrolled iterations
• Two parallel but connected
pipelines – message &
compressor
• Generates a result out only
if target criteria met
HASH ENGINE
Input Interface
Output Interface
Hash_out
Work_in
Nonce Counter
Comparator
Stage 0
Stage 1
Stage 2
Stage 63
Stage 62
Compressor
Round1
Stage 0
Stage 1
Stage 2
Stage 63
Stage 62
Message
Round2
Stage 0
Stage 1
Stage 2
Stage 63
Stage 62 Message
Round1
Stage 0
Stage 1
Stage 2
Stage 63
Stage 62
Compressor
Round2
7. COMPRESSOR STAGE OF SHA-256 PIPELINE
S0 Maj S1 Ch
+ +
+
+ +
A DB C E F HG
Ain Bin Cin Din Ein Fin Gin Hin Kin Win
Ain Bin Cin Ein Fin Gin
Aout Bout Cout Dout Eout Fout Gout Hout
S0(A) = ( A >>> s1) ( A >>> s2 )
( A >>> s3)
S1(E) = ( E >>> s4) ( E >>> s5 )
( E >>> s6)
Ch(E,F,G) = (E F) (~E G)
Maj(A,B,C) = (A B) (B C)
(C A)
All registers are 32-bits wide
9. • Global Foundries HKMG
28nm HPP process
• 9 metal layers
• 120 hash engines in
11x11 array (grey boxes)
• Top level logic block in
the center
DIE MICROGRAPH
10. • 37.5 x 37.5 mm FCBGA
package
• 4 bare dies per package
• 1296 pins
• > 500 GH/s @ 1.05GHz & 0.7v
GOLDSTRIKE™-1 (GS1) PACKAGE
11. HEAT DISSIPATION CHALLENGE
• Cooling options examined:
• Heat sink + Airflow Common
in CPU applications
• Liquid Cooling Popular
among over-clockers
• Immersion Efficient for data
centers
• Liquid cooling with direct
attach cooling head selected
• Enable a common platform for
both home & data center
customers
Air Temps on Plane 3mm
above PCB Top
12. Up to 2TH/s hash rate per appliance
• Dual PCB with 4 GS1 packages total
• Power budget to meet household outlet capacity
Layout - 4U chassis Design
Driven by cooling requirements
Radiator cross-section
Fan Size
Similar design for TerraMiner IV data
center and home models
Push pull airflow design for maximum
performance
Fans chosen for balance between cost,
performance & audible noise
Dual 1U power supplies for minimal
volume impact
TERRAMINER APPLIANCE
15. ASIC DESIGN CHALLENGES & CHOICES
Challenges:
• High power density and high node toggle rates
• Power delivery
• Heat dissipation
• IR drop and di/dt noise
• Very high sequential cell count
• Reduce die area and power consumption
• Very short (4-month) schedule from RTL start to tape out
• Very small design team
Choices:
• Optimize common core blocks
• Maximize design repeat & reuse
• Utilize highly experienced design team
16. CONCLUDING REMARKS
• Continued demand for higher performance and lower power
appliance
• Maintain Cointerra’s leadership position in Bitcoin mining
industry
• New designs with increased power efficiency and
performance