This document provides an overview of several key concepts in Bitcoin internals. It begins with an introduction of the presenter. It then discusses binary protocols, hashing and collision probability, Merkle trees, Bloom filters, and the CAP theorem as they relate to peer-to-peer networks and Bitcoin specifically. For each topic, it provides a brief high-level explanation and examples to illustrate the concepts. The document is intended to educate about some of the technical underpinnings of how Bitcoin works at a foundational level.
5. What is a binary protocol?
A binary protocol is a protocol which is intended or expected to
be read by a machine rather than a human being, as opposed
to a plain text protocol such as IRC, SMTP, or HTTP. Binary
protocols have the advantage of terseness, which translates into
speed of transmission and interpretation.
https://en.wikipedia.org/wiki/Binary_protocol
7. Our own binary protocol?
We can define our own “Sandwich” protocol as
1) a 32-bit integer for the number of cheese slices
followed by
2) a 32-bit integer for the number of ham slices
So our binary protocol (assuming big-endian byte order) for a 1,1 sandwich would be:
00000000 00000000 00000000 00000001 00000000 00000000 00000000 00000001
This is a fixed format. There are no variable sized parts.
https://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats
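As a sketch, the two fixed-size integers above can be encoded and decoded with Python’s struct module (used here purely as an illustration; the “Sandwich” protocol itself defines only the two integers):

```python
import struct

def encode_sandwich(cheese: int, ham: int) -> bytes:
    # ">" = big-endian, "I" = unsigned 32-bit integer.
    return struct.pack(">II", cheese, ham)

def decode_sandwich(data: bytes) -> tuple[int, int]:
    return struct.unpack(">II", data)

payload = encode_sandwich(1, 1)
print(payload.hex())             # 0000000100000001
print(len(payload))              # 8 bytes, as described above
print(decode_sandwich(payload))  # (1, 1)
```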
8. Binary Protocol Efficiency
Our “Sandwich” protocol uses 8 bytes to transmit the cheese
and ham information.
Compare this to JSON, where we might have
{"cheese":1,"ham":1}
This is 20 bytes.
In this example, the binary encoding is 60% smaller (8 bytes versus 20).
However, sometimes you can’t read a binary protocol: printed to a
terminal, our message would appear as “”, since all 8 bytes are unprintable control characters.
There are 8 bytes here honestly!
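The size comparison is easy to check (a quick sketch; the json module is used only for illustration):

```python
import json
import struct

binary = struct.pack(">II", 1, 1)  # the 8-byte "Sandwich" encoding
text = json.dumps({"cheese": 1, "ham": 1}, separators=(",", ":"))

print(len(binary))  # 8
print(len(text))    # 20
print(binary)       # b'\x00\x00\x00\x01\x00\x00\x00\x01' - nothing human-readable
```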
9. Variable length binary protocol
The “Message” protocol:
1) a 32-bit integer (the length)
followed by
2) a variable number of bytes (chars)
So our binary output for 5,“hello” would be
00000000 00000000 00000000 00000101 01101000 01100101 01101100 01101100 01101111
https://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats
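A sketch of a length-prefixed encoder and decoder for this “Message” protocol (Python again, only as an illustration):

```python
import struct

def encode_message(text: str) -> bytes:
    data = text.encode("ascii")
    # 32-bit big-endian length prefix, followed by the raw bytes.
    return struct.pack(">I", len(data)) + data

def decode_message(payload: bytes) -> str:
    (length,) = struct.unpack(">I", payload[:4])
    return payload[4 : 4 + length].decode("ascii")

msg = encode_message("hello")
print(msg.hex())            # 0000000568656c6c6f
print(decode_message(msg))  # hello
```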
11. What is hashing?
A computational function that takes an arbitrary sized input, and produces a fixed
size output.
e.g. sha256(“hello”) produces
“2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824”
Hashing has certain properties which we find useful:
• It’s extremely hard to reverse, and calculate the original data from the hash.
• If the input data changes even slightly the hash output is completely different.
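Both the digest above and the “completely different output” property can be checked directly with Python’s hashlib (used here as an illustration):

```python
import hashlib

digest = hashlib.sha256(b"hello").hexdigest()
print(digest)
# 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824

# A one-character change to the input yields an unrelated-looking hash.
print(hashlib.sha256(b"hellp").hexdigest())
```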
12. Hashing collision probability
If 2 pieces of input data produce the same output hash, we have a
“collision”.
Given the random nature of hashing, what is the probability that for any 2
pieces of input data we would generate identical hashes?
If the output of the hashing function is a single byte
e.g. 01101011 or 01111111 or 01110001
We can see that there is a 1 in 256 (2^8) chance of getting a collision.
This can be generalised to 1/(2^n), where n is the number of output bits.
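To make the single-byte case concrete, we can truncate sha256 to its first byte and watch a collision appear quickly (a sketch only; real systems never truncate a hash this far):

```python
import hashlib

def tiny_hash(data: bytes) -> int:
    """Keep only the first byte of sha256 - a 1-byte hash with 256 possible values."""
    return hashlib.sha256(data).digest()[0]

# With only 256 possible outputs, the pigeonhole principle guarantees a
# collision within 257 distinct inputs (the birthday effect finds one
# much sooner in practice).
seen: dict[int, int] = {}
collision = None
for i in range(257):
    h = tiny_hash(str(i).encode())
    if h in seen:
        collision = (seen[h], i)
        break
    seen[h] = i

print(collision)  # two small integers whose 1-byte hashes match
```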
16. Bloom Filters
Let’s assume we have 2 hash functions “f” and “g”
f(x) and g(x) produce 2 random outputs
e.g.
f(“hello”) => 123
g(“hello”) => 192
https://en.wikipedia.org/wiki/Bloom_filter
17. Bloom Filters
We have an array of bits, let’s say 8 (this will fit in a single byte),
such that the empty bitset looks like this:
00000000
https://en.wikipedia.org/wiki/Bloom_filter
18. Bloom Filters
By performing the modulus (%) of each hash output with 8 (the size of
the bitset) we should get the following:
123 % 8 = 3
192 % 8 = 0
We now mark positions 0 and 3 as “1” bits
https://en.wikipedia.org/wiki/Bloom_filter
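Putting the last three slides together, a minimal Bloom filter can be sketched as follows (the hash functions f and g here are stand-ins built from salted sha256, not the ones on the slide, which were hypothetical):

```python
import hashlib

M = 8  # size of the bitset, in bits

def f(x: str) -> int:
    return hashlib.sha256(b"f:" + x.encode()).digest()[0]

def g(x: str) -> int:
    return hashlib.sha256(b"g:" + x.encode()).digest()[0]

class BloomFilter:
    def __init__(self) -> None:
        self.bits = [0] * M

    def add(self, item: str) -> None:
        # Mark the bit positions f(x) % m and g(x) % m, as on slide 18.
        self.bits[f(item) % M] = 1
        self.bits[g(item) % M] = 1

    def might_contain(self, item: str) -> bool:
        # True may be a false positive; False is always definitive.
        return self.bits[f(item) % M] == 1 and self.bits[g(item) % M] == 1

bf = BloomFilter()
bf.add("hello")
print(bf.bits)
print(bf.might_contain("hello"))  # True
```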
22. Bloom Filters (error rate)
Let:
m = the number of bits in our bitset (here m = 8)
k = the number of hash functions (here k = 2: f and g)
n = the number of items represented (here n = 1: “hello”)
After inserting one item with k hash functions, the probability of a single
bit NOT being set is (1 - 1/8)^2, or more generally (1 - 1/m)^k.
As n grows, this becomes (1 - 1/m)^(kn).
https://en.wikipedia.org/wiki/Bloom_filter
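Plugging the slide’s numbers into the formula (a quick check, nothing beyond the formula itself):

```python
m = 8  # bits in the bitset
k = 2  # hash functions
n = 1  # items inserted

# Probability that one particular bit is still 0 after n insertions.
p_unset = (1 - 1 / m) ** (k * n)
print(p_unset)  # (7/8)^2 = 0.765625
```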
23. Bloom Filter properties
• Memory compaction (lots of items in a small space)
• Possible existence (and false positives)
• Collision probability determined by number of items/number
of bits
25. CAP theorem
• Consistency
• Availability
• Partition Tolerance
https://en.wikipedia.org/wiki/CAP_theorem
The CAP theorem is a negative result that says you cannot
simultaneously achieve all three goals in the presence of
errors. Hence, you must pick one objective to give up.
http://cacm.acm.org/blogs/blog-cacm/83396-errors-in-database-systems-eventual-consistency-and-the-cap-theorem/fulltext
26. CAP theorem in P2P networks
(Diagram: a node in the Bitcoin network)