This document provides an overview of several key concepts in Bitcoin internals. It begins with an introduction of the presenter. It then discusses binary protocols, hashing and collision probability, Merkle trees, Bloom filters, and the CAP theorem as they relate to peer-to-peer networks and Bitcoin specifically. For each topic, it provides a brief high-level explanation and examples to illustrate the concepts. The document is intended to educate about some of the technical underpinnings of how Bitcoin works at a foundational level.
5. What is a binary protocol?
A binary protocol is a protocol which is intended or expected to
be read by a machine rather than a human being, as opposed
to a plain text protocol such as IRC, SMTP, or HTTP. Binary
protocols have the advantage of terseness, which translates into
speed of transmission and interpretation.
https://en.wikipedia.org/wiki/Binary_protocol
7. Our own binary protocol?
We can define our own “Sandwich” protocol as
1) a 32-bit integer for the number of cheese slices
followed by
2) a 32-bit integer for the number of ham slices
So our binary protocol (assuming big-endian byte order) for a 1,1 sandwich would be:
00000000 00000000 00000000 00000001 00000000 00000000 00000000 00000001
This is a fixed format. There are no variable sized parts.
https://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats
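As a sketch, the two fixed-size integers above can be encoded and decoded with Python’s struct module (used here purely as an illustration; the “Sandwich” protocol itself defines only the two integers):

```python
import struct

def encode_sandwich(cheese: int, ham: int) -> bytes:
    # ">" = big-endian, "I" = unsigned 32-bit integer.
    return struct.pack(">II", cheese, ham)

def decode_sandwich(data: bytes) -> tuple[int, int]:
    return struct.unpack(">II", data)

payload = encode_sandwich(1, 1)
print(payload.hex())             # 0000000100000001
print(len(payload))              # 8 bytes, as described above
print(decode_sandwich(payload))  # (1, 1)
```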
8. Binary Protocol Efficiency
Our “Sandwich” protocol uses 8 bytes to transmit the cheese
and ham information.
Compare this to JSON, where we might have
{"cheese":1,"ham":1}
This is 20 bytes.
In this example, the binary encoding is 60% smaller (8 bytes versus 20).
However, sometimes you can’t read a binary protocol: printed to a
terminal, our message would appear as “”, since all 8 bytes are unprintable control characters.
There are 8 bytes here honestly!
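The size comparison is easy to check (a quick sketch; the json module is used only for illustration):

```python
import json
import struct

binary = struct.pack(">II", 1, 1)  # the 8-byte "Sandwich" encoding
text = json.dumps({"cheese": 1, "ham": 1}, separators=(",", ":"))

print(len(binary))  # 8
print(len(text))    # 20
print(binary)       # b'\x00\x00\x00\x01\x00\x00\x00\x01' - nothing human-readable
```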
9. Variable length binary protocol
The “Message” protocol:
1) a 32-bit integer (the length)
followed by
2) a variable number of bytes (chars)
So our binary output for 5,“hello” would be
00000000 00000000 00000000 00000101 01101000 01100101 01101100 01101100 01101111
https://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats
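A sketch of a length-prefixed encoder and decoder for this “Message” protocol (Python again, only as an illustration):

```python
import struct

def encode_message(text: str) -> bytes:
    data = text.encode("ascii")
    # 32-bit big-endian length prefix, followed by the raw bytes.
    return struct.pack(">I", len(data)) + data

def decode_message(payload: bytes) -> str:
    (length,) = struct.unpack(">I", payload[:4])
    return payload[4 : 4 + length].decode("ascii")

msg = encode_message("hello")
print(msg.hex())            # 0000000568656c6c6f
print(decode_message(msg))  # hello
```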
11. What is hashing?
A computational function that takes an arbitrary sized input, and produces a fixed
size output.
e.g. sha256(“hello”) produces
“2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824”
Hashing has certain properties which we find useful:
• It’s extremely hard to reverse, and calculate the original data from the hash.
• If the input data changes even slightly the hash output is completely different.
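Both the digest above and the “completely different output” property can be checked directly with Python’s hashlib (used here as an illustration):

```python
import hashlib

digest = hashlib.sha256(b"hello").hexdigest()
print(digest)
# 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824

# A one-character change to the input yields an unrelated-looking hash.
print(hashlib.sha256(b"hellp").hexdigest())
```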
12. Hashing collision probability
If 2 pieces of input data produce the same output hash, we have a
“collision”.
Given the random nature of hashing, what is the probability that for any 2
pieces of input data we would generate identical hashes?
If the output of the hashing function is a single byte
e.g. 01101011 or 01111111 or 01110001
We can see that there is a 1 in 256 (2^8) chance of getting a collision.
This can be generalised to 1/(2^n), where n is the number of output bits.
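To make the single-byte case concrete, we can truncate sha256 to its first byte and watch a collision appear quickly (a sketch only; real systems never truncate a hash this far):

```python
import hashlib

def tiny_hash(data: bytes) -> int:
    """Keep only the first byte of sha256 - a 1-byte hash with 256 possible values."""
    return hashlib.sha256(data).digest()[0]

# With only 256 possible outputs, the pigeonhole principle guarantees a
# collision within 257 distinct inputs (the birthday effect finds one
# much sooner in practice).
seen: dict[int, int] = {}
collision = None
for i in range(257):
    h = tiny_hash(str(i).encode())
    if h in seen:
        collision = (seen[h], i)
        break
    seen[h] = i

print(collision)  # two small integers whose 1-byte hashes match
```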
16. Bloom Filters
Let’s assume we have 2 hash functions “f” and “g”
f(x) and g(x) produce 2 random outputs
e.g.
f(“hello”) => 123
g(“hello”) => 192
https://en.wikipedia.org/wiki/Bloom_filter
17. Bloom Filters
We have an array of bits, let’s say 8 (this will fit in a single byte),
such that the empty bitset looks like this:
00000000
https://en.wikipedia.org/wiki/Bloom_filter
18. Bloom Filters
By performing the modulus (%) of each hash output with 8 (the size of
the bitset) we should get the following:
123 % 8 = 3
192 % 8 = 0
We now mark positions 0 and 3 as “1” bits
https://en.wikipedia.org/wiki/Bloom_filter
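Putting the last three slides together, a minimal Bloom filter can be sketched as follows (the hash functions f and g here are stand-ins built from salted sha256, not the ones on the slide, which were hypothetical):

```python
import hashlib

M = 8  # size of the bitset, in bits

def f(x: str) -> int:
    return hashlib.sha256(b"f:" + x.encode()).digest()[0]

def g(x: str) -> int:
    return hashlib.sha256(b"g:" + x.encode()).digest()[0]

class BloomFilter:
    def __init__(self) -> None:
        self.bits = [0] * M

    def add(self, item: str) -> None:
        # Mark the bit positions f(x) % m and g(x) % m, as on slide 18.
        self.bits[f(item) % M] = 1
        self.bits[g(item) % M] = 1

    def might_contain(self, item: str) -> bool:
        # True may be a false positive; False is always definitive.
        return self.bits[f(item) % M] == 1 and self.bits[g(item) % M] == 1

bf = BloomFilter()
bf.add("hello")
print(bf.bits)
print(bf.might_contain("hello"))  # True
```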
22. Bloom Filters (error rate)
Let:
m = the number of bits in our bitset (here m = 8)
k = the number of hash functions (here k = 2: f and g)
n = the number of items represented (here n = 1: “hello”)
After inserting one item with k hash functions, the probability of a single
bit NOT being set is (1 - 1/8)^2, or more generally (1 - 1/m)^k.
As n grows, this becomes (1 - 1/m)^(kn).
https://en.wikipedia.org/wiki/Bloom_filter
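Plugging the slide’s numbers into the formula (a quick check, nothing beyond the formula itself):

```python
m = 8  # bits in the bitset
k = 2  # hash functions
n = 1  # items inserted

# Probability that one particular bit is still 0 after n insertions.
p_unset = (1 - 1 / m) ** (k * n)
print(p_unset)  # (7/8)^2 = 0.765625
```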
23. Bloom Filter properties
• Memory compaction (lots of items in a small space)
• Possible existence (and false positives)
• Collision probability determined by number of items/number
of bits
25. CAP theorem
• Consistency
• Availability
• Partition Tolerance
https://en.wikipedia.org/wiki/CAP_theorem
The CAP theorem is a negative result that says you cannot
simultaneously achieve all three goals in the presence of
errors. Hence, you must pick one objective to give up.
http://cacm.acm.org/blogs/blog-cacm/83396-errors-in-database-systems-eventual-consistency-and-the-cap-theorem/fulltext
26. CAP theorem in P2P networks
(Diagram: a node in the Bitcoin network)