2. Cryptographic Hash Functions - Overview
Properties of Hash Functions
• Input
• String of any size
• Output
• Fixed-size output
• Efficiently computable
• Computing the hash of an n-bit string should take O(n) time
Properties of Cryptographic Hash Functions
• Collision Resistance
• Very hard to find distinct inputs m1 ≠ m2 with H(m1) = H(m2)
• Hiding
• Commitments
• Puzzle-Friendliness
• Search Puzzle
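The basic properties (input of any size, fixed-size output, efficiently computable) are easy to see with Python's standard `hashlib`. This sketch assumes SHA-256 as the example hash function because the slide does not name one. The helper `digest_bits` is a hypothetical name used only for illustration.

```python
import hashlib

def digest_bits(message: bytes) -> int:
    """Size in bits of the SHA-256 digest of `message`."""
    return len(hashlib.sha256(message).digest()) * 8

# Inputs of very different lengths all map to the same fixed output size.
for message in (b"", b"abc", b"a" * 1_000_000):
    print(f"{len(message):>9} bytes in -> {digest_bits(message)} bits out")

# SHA-256 processes its input in 512-bit blocks, so the work grows linearly with input length.
print(hashlib.sha256(b"abc").hexdigest())
```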
3. Property - Collision Resistance
What is a Collision?
• Collisions exist
• Input space is larger than output space (pigeonhole principle)
• Methods exist to find collisions
[Figure: message digest flow. Initial step: input/message → hash function → hash value. Later stage: input/message → hash function → hash value. Hash values matching? Yes / No.]
Application – Message Digest
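The message-digest flow in the figure can be sketched as follows. In the initial step, a file's digest is computed and stored. In the later stage, the digest is recomputed and compared with the stored one. This assumes SHA-256 and local files. `file_digest` and `unchanged` are hypothetical helper names.

```python
import hashlib
from pathlib import Path

def file_digest(path: Path, chunk_size: int = 1 << 16) -> str:
    """Initial step: hash a file in chunks so large files never need to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def unchanged(path: Path, stored_digest: str) -> bool:
    """Later stage: matching digests mean (by collision resistance) the contents are the same."""
    return file_digest(path) == stored_digest
```

Usage: store `d = file_digest(Path("report.pdf"))` now (the filename is hypothetical). Later, call `unchanged(Path("report.pdf"), d)`. Only the 64-character digest needs to be kept, not the whole file.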
4. Methods for Detecting Collisions
General Algorithm
• For a hash function with a 256-bit output: pick 2^256 + 1 distinct inputs; by the pigeonhole principle, at least two of them must collide
• Randomly choose 2^130 + 1 inputs: 99.8% chance of a collision (birthday paradox)
• Problem – it takes a very long time to find a collision
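The randomized general algorithm can be demonstrated on a deliberately weakened hash. This sketch truncates SHA-256 to its first 24 bits, an assumption made only so the search finishes quickly. It then hashes distinct inputs until an output repeats. For 24 bits the expected number of tries is on the order of 2^12, a few thousand. For a full 256-bit output the same search needs about 2^128 tries, which is why the slide calls it impractical.

```python
import hashlib
import itertools

def truncated_hash(data: bytes, bits: int) -> int:
    """Toy hash: the first `bits` bits of SHA-256 (deliberately weak, for demonstration)."""
    digest = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return digest >> (256 - bits)

def find_collision(bits: int) -> tuple[bytes, bytes, int]:
    """Hash distinct inputs until two outputs repeat; return both inputs and the number tried."""
    seen: dict[int, bytes] = {}  # output -> first input that produced it
    for i in itertools.count():
        message = str(i).encode()
        h = truncated_hash(message, bits)
        if h in seen:
            return seen[h], message, i + 1
        seen[h] = message

m1, m2, tries = find_collision(24)
print(m1, m2, tries)
```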
Algorithms to efficiently find collisions
• Exist for some specific hash functions
• Example – H(x) = x mod 2^256
• Accepts any length input and returns a 256-bit output
• 3 and 3 + 2^256 would collide
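For the example function H(x) = x mod 2^256, finding a collision requires no search at all. The sketch treats inputs as non-negative integers, which is an assumption for illustration.

```python
def weak_hash(x: int) -> int:
    """H(x) = x mod 2^256: takes any non-negative integer, returns a value below 2^256."""
    return x % 2**256

def collision_for(x: int) -> tuple[int, int]:
    """No search needed: x and x + 2^256 always map to the same output."""
    return x, x + 2**256

a, b = collision_for(3)
print(a != b, weak_hash(a) == weak_hash(b))
```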
5. Property - Hiding
• Assertion
• Given an output of the hash function, there should be no feasible way to find the input
• Big idea
• Hide an input that is not spread out by concatenating it with a value that is spread out
• Defining Hiding
• A hash function H is hiding if, when a secret value r is chosen from a probability distribution that has high min-entropy, then given H(r ‖ x) it is infeasible to find x.
• Min-entropy is a measure of how predictable an outcome is; high min-entropy means the distribution is very spread out
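The coin-toss example from the notes shows why the nonce matters. Hashing "heads" directly hides nothing, because an observer can simply hash both possible outcomes. Publishing H(r ‖ x) with a random 256-bit r defeats that check. This is a minimal sketch that assumes SHA-256 for H and Python's `secrets` module as the high min-entropy source.

```python
import hashlib
import secrets
from typing import Optional

OUTCOMES = (b"heads", b"tails")  # the whole input space of a coin toss

def H(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def brute_force(published: str) -> Optional[bytes]:
    """Try every possible coin-toss outcome against a published hash."""
    return next((g for g in OUTCOMES if H(g) == published), None)

x = b"heads"                   # secret result
leaky = H(x)                   # published as H(x): input not spread out
r = secrets.token_bytes(32)    # 256-bit nonce drawn from a high min-entropy distribution
hidden = H(r + x)              # published as H(r || x)

print(brute_force(leaky))      # recovers the coin-toss result
print(brute_force(hidden))     # finds nothing without knowing r
```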
6. Application - Commitments
• Commit Procedure
• Generate a random nonce value (e.g. 256 bits)
• Concatenate the nonce value with the message
• Hash the concatenated value (this is the commitment)
• Publish the commitment
• Verify Procedure
• The message and nonce are shared
• Concatenate the given nonce and message
• Find the commit (hash of the concatenated value)
• Compare the value found with the commitment received earlier
• Verified if the values match
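The two procedures above map directly onto two functions. This sketch makes three assumptions: SHA-256 as the hash, `secrets.token_bytes` for the 256-bit nonce, and `hmac.compare_digest` for the final comparison. `commit` and `verify` are illustrative names.

```python
import hashlib
import hmac
import secrets

def commit(message: bytes) -> tuple[bytes, bytes]:
    """Commit procedure: random 256-bit nonce, hash nonce || message."""
    nonce = secrets.token_bytes(32)
    commitment = hashlib.sha256(nonce + message).digest()
    return commitment, nonce  # publish `commitment`; keep `nonce` secret until the reveal

def verify(commitment: bytes, nonce: bytes, message: bytes) -> bool:
    """Verify procedure: recompute hash(nonce || message) and compare with the earlier commitment."""
    recomputed = hashlib.sha256(nonce + message).digest()
    return hmac.compare_digest(recomputed, commitment)

com, key = commit(b"I predict heads")
print(verify(com, key, b"I predict heads"))
print(verify(com, key, b"I predict tails"))
```

Because the nonce always has a fixed length of 32 bytes, there is only one way to split nonce ‖ message back into its two parts.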
7. Property – Puzzle Friendliness
• Definition
• A hash function H is puzzle-friendly if for every possible n-bit output value y, if k is chosen from a distribution with high min-entropy, then it is infeasible to find x such that H(k ‖ x) = y in time significantly less than 2^n
• Search Puzzle
• A large space where the only way to find a valid solution is to search the space
• The size of the target set Y determines the hardness of the puzzle
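A search puzzle asks for an x such that H(id ‖ x) ∈ Y. This sketch makes two assumptions. H is SHA-256, and Y is the set of outputs whose first `zero_bits` bits are zero, so |Y| = 2^(256 - zero_bits). With `zero_bits = 0` every output qualifies and the puzzle is trivial. Each additional required bit halves Y and doubles the expected search time. The function names and the puzzle id are hypothetical.

```python
import hashlib
import itertools

def in_target_set(digest: bytes, zero_bits: int) -> bool:
    """Y = all 256-bit outputs whose first `zero_bits` bits are zero."""
    return int.from_bytes(digest, "big") >> (256 - zero_bits) == 0

def solve_search_puzzle(puzzle_id: bytes, zero_bits: int) -> int:
    """No shortcut: try x = 0, 1, 2, ... until H(id || x) lands in Y."""
    for x in itertools.count():
        digest = hashlib.sha256(puzzle_id + x.to_bytes(8, "big")).digest()
        if in_target_set(digest, zero_bits):
            return x

x = solve_search_puzzle(b"example puzzle id", 16)  # about 2^16 tries expected
print(x)
```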
8. SHA-256
• Converts a hash function that works on fixed-length inputs (the compression function) into a hash function that works on arbitrary-length inputs – the Merkle-Damgård transform
• Merkle-Damgård Transform
• Compression function input size m
• Output size n (m > n)
• Input divided into blocks of size m - n
• Pass each block, together with the output of the previous block, into the compression function
• The first block has no previous output, so we use an initialization vector (IV) instead
• The output from the last block is the hash
• Example (SHA-256) – m = 768 bits, n = 256 bits, so message blocks are 512 bits
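The sketch below walks through the transform with m = 768 and n = 256. It is a toy, not real SHA-256. The compression function here is SHA-256 applied to the 96-byte input (chaining value ‖ block), and the all-zero IV is a hypothetical choice. Real SHA-256 uses its own compression function and fixed IV constants. The padding step (append 0x80, zeros, then the 64-bit message length) is the usual Merkle-Damgård strengthening, which the slide does not show.

```python
import hashlib

N_BYTES = 32        # n = 256-bit chaining value and final output
BLOCK_BYTES = 64    # m - n = 768 - 256 = 512-bit message blocks
IV = bytes(N_BYTES) # hypothetical IV (all zeros)

def compress(chaining: bytes, block: bytes) -> bytes:
    """Toy fixed-length compression function: 768-bit input -> 256-bit output."""
    assert len(chaining) == N_BYTES and len(block) == BLOCK_BYTES
    return hashlib.sha256(chaining + block).digest()

def pad(message: bytes) -> bytes:
    """Append 0x80, zero bytes, and the 64-bit bit length so the total is a multiple of 64 bytes."""
    bit_len = 8 * len(message)
    padded = message + b"\x80"
    padded += b"\x00" * ((-len(padded) - 8) % BLOCK_BYTES)
    return padded + bit_len.to_bytes(8, "big")

def md_hash(message: bytes) -> bytes:
    """Feed each block plus the previous output into the compression function; start from the IV."""
    state = IV
    padded = pad(message)
    for i in range(0, len(padded), BLOCK_BYTES):
        state = compress(state, padded[i:i + BLOCK_BYTES])
    return state  # output of the last block is the hash

print(md_hash(b"an input of any length").hex())
```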
10. Centralization vs Decentralization
• Decentralized technologies: the Internet, and email, which is decentralized at its core (SMTP)
• Centralized technologies: social networking
• Decentralization is not all or nothing
• Email is primarily decentralized, yet dominated by a few companies (centralized)
• Bitcoin is decentralized, but wallet software and other Bitcoin management software can be centralized
• Extent of decentralization in Bitcoin
• Peer-to-peer network is decentralized
• Mining is technically decentralized, but the capital cost to mine is high (largely centralized)
• Updates to Bitcoin nodes are provided by trusted developers (centralized to a large extent)
11. Distributed Consensus
• Traditional application
• Reliability in distributed systems
• Other applications
• Building a massive distributed key-value store (maps arbitrary keys to values)
Distributed consensus protocol: there are n nodes that each have an input value. Some of these nodes are faulty or malicious.
A distributed consensus protocol has two properties:
• It must terminate with all honest nodes in agreement on the value
• The value must have been generated by an honest node
12. Distributed consensus in Bitcoin
• Every transaction is broadcast to all nodes
• The recipient receives the coins whether or not they are running a node
• What are we trying to reach consensus on?
• Nodes must agree on which transactions were broadcast and in what order
• Essentially, a global ledger
• Each node has
• a ledger containing a sequence of blocks (each block is a list of transactions) on which it has reached consensus
• a pool of outstanding transactions (consensus not reached yet)
13. Method for consensus
• Every x minutes, each node proposes its own pool of outstanding transactions as the next block
• Each node runs the consensus protocol, with its proposed block as input
• If successful, a valid block is chosen as the next block
• Problems with this approach:
• Nodes might crash or be outright malicious
• Network is highly imperfect
• High latency, due to its distributed nature
• No notion of global time possible
14. Impossibility results
The lack of global time constrains which consensus algorithms can be used
Byzantine generals problem
• An army is divided into divisions, each led by a general
• Some generals are traitors
• The goal is to unify the loyal generals on a plan without letting the traitors mislead them into a bad plan
• Proven impossible to achieve if one-third or more of the generals are traitors
Fischer-Lynch-Paterson impossibility result
• Under some conditions (including nodes acting deterministically), consensus is impossible with even a single faulty process
15. Distributed Consensus - Breaking Traditional Assumptions
• Bitcoin violates many assumptions built into the previous impossibility results
• Introduces incentives, which encourage participants to be honest
• Embraces the notion of randomization
• No notion of a fixed start and end point for consensus
• Consensus happens over a long time
• A node can never be certain that a block has entered the ledger
• Over time, the probability that your view of a block matches the consensus view increases
17. Why do we need IDs?
• Sybil Attack
Pseudonymity is a goal of Bitcoin.
A malicious adversary can make it look like there are many different participants that are actually all controlled by the same adversary.
• Security and Instructions
Without IDs, it is difficult to write protocol instructions and to derive security properties.
What can be done?
Select a random node, instead of relying on node IDs, to propose the next block.
18. Lottery/Raffle:
We assume that an ID/token is given to a node randomly.
Implicit Consensus:
There is no consensus algorithm or voting of any kind to select blocks or nodes.
The randomly chosen node unilaterally proposes the next block.
What if that node is malicious?
Other nodes implicitly accept or reject that block.
Accept: extend the blockchain by including that block.
Reject: ignore that block and extend the blockchain by building on the previous block they accepted.
19. Bitcoin Consensus Algorithm
1. New transactions are broadcast to all nodes
2. Each node collects new transactions into a block
3. In each round, a random node gets to broadcast its block
4. Other nodes accept the block only if all transactions in it are valid
5. Nodes express their acceptance of the block by including its hash in the next block they create
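Steps 4 and 5 can be modeled with hash pointers, as in the sketch below. Random selection of the proposer (step 3) is not modeled. Several things here are hypothetical: the `Block` class, the JSON encoding, and the `toy_rule` validity check, which stands in for real signature and double-spend checks. A node accepts a proposed block only if it points at the node's current tip and every transaction is valid. Tampering with an earlier block breaks every later pointer.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Callable

@dataclass
class Block:
    prev_hash: str           # hash pointer to the block this one extends
    transactions: list[str]  # toy transactions, plain strings here

    def digest(self) -> str:
        payload = json.dumps({"prev": self.prev_hash, "txs": self.transactions})
        return hashlib.sha256(payload.encode()).hexdigest()

def accept(chain: list[Block], proposed: Block, is_valid_tx: Callable[[str], bool]) -> bool:
    """Step 4: accept only if the block extends our tip and every transaction is valid."""
    if proposed.prev_hash != chain[-1].digest():
        return False
    if not all(is_valid_tx(tx) for tx in proposed.transactions):
        return False
    chain.append(proposed)  # step 5: our next block will point to proposed.digest()
    return True

def is_linked(chain: list[Block]) -> bool:
    """Every block must contain the hash of its predecessor."""
    return all(chain[i].prev_hash == chain[i - 1].digest() for i in range(1, len(chain)))

def toy_rule(tx: str) -> bool:
    """Hypothetical validity rule standing in for signature and double-spend checks."""
    return not tx.startswith("INVALID")

genesis = Block(prev_hash="0" * 64, transactions=["coinbase -> alice"])
chain = [genesis]
print(accept(chain, Block(genesis.digest(), ["alice -> bob"]), toy_rule))
print(accept(chain, Block(chain[-1].digest(), ["INVALID alice -> carol"]), toy_rule))
print(is_linked(chain))
```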
21. Three ways an adversary could disrupt the process
1. Stealing Bitcoins
Can Alice steal bitcoins belonging to another user at a different address?
No, because she cannot forge that user's signature.
As long as the underlying cryptography is solid, this is not possible.
2. Denial of Service
Alice decides not to include transactions coming from a particular user in the block she proposes.
This is overcome when the user waits for an honest node to propose a block containing their transaction.
So even this is not an effective attack.
22. 3. Double-Spend Attack
Alice pays Bob in exchange for a service, Bob provides the service, and the payment transaction is in the most recently proposed block.
If Alice gets the chance to propose the next block, she can propose a block that ignores the payment to Bob and instead points to the previous block.
In that block, Alice can include a transaction that transfers the coins she was sending to Bob to an address she controls.
23. How do we know if double spend succeeds?
• It depends on which block ends up on the long-term blockchain
• Honest nodes follow the longest valid branch
• There is no right answer as to which block the next block will extend
• From a moral point of view we know which transaction is malicious, but technically both look the same
• If the double spend succeeds, the block containing Bob's payment becomes an orphan block.
How can Bob prevent it?
Zero-Confirmation Transaction
• Bob allows the download even before the transaction gets any confirmation
• Greater chance of a successful double spend
24. [Figure: double-spend attempt, shown after 1 confirmation and after 3 confirmations]
• If Bob is more cautious, he waits for confirmations; if he sees the block with his payment orphaned, he denies the software download.
• The probability of a successful double spend decreases exponentially with the number of confirmations
• Wait for 6 confirmations!
• Protection against this attack relies purely on consensus.
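The claim that double-spend risk falls off with the number of confirmations can be made concrete. The sketch below translates the attacker-success calculation from Section 11 of the Bitcoin whitepaper. That calculation comes from the whitepaper, not from these slides. Here q is the attacker's fraction of hash power, and z is the number of blocks the honest chain is ahead, which corresponds to Bob's confirmations.

```python
import math

def attacker_success_probability(q: float, z: int) -> float:
    """Probability that an attacker with hash-power fraction q ever overtakes
    an honest chain that is z blocks ahead (Bitcoin whitepaper, section 11)."""
    p = 1.0 - q
    if q >= p:
        return 1.0  # a majority attacker eventually catches up
    lam = z * (q / p)  # expected attacker progress while z honest blocks are found
    total = 1.0
    for k in range(z + 1):
        poisson = math.exp(-lam) * lam ** k / math.factorial(k)
        total -= poisson * (1.0 - (q / p) ** (z - k))
    return total

for z in (0, 1, 3, 6):
    print(z, round(attacker_success_probability(0.1, z), 7))
```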
25. Incentives
Bitcoin's decentralization comes from a technical mechanism plus incentives.
• If participants have financial incentives to subvert the process, we can't assume nodes will be honest.
Can we penalize malicious nodes?
• No, as we don't know the node IDs.
Can we reward honest nodes?
• We still don't know their IDs to send a cash reward, but we can use the digital currency itself
• We can pay honest nodes in units of the digital currency.
26. Incentives
1. Block Reward:
• The node that creates a block can include a special coin-creation transaction and choose itself as the recipient address.
• The value of the reward is fixed and halves roughly every 4 years (every 210,000 blocks): at first 50 bitcoins, later 25.
But how does this reward honest behavior?
• The coin-creation transaction, and therefore the reward, is accepted only if the block ends up on the long-term consensus branch.
• Because the total number of bitcoins is capped at 21 million, the block reward will run out around 2140.
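The 21 million cap follows from a geometric series: 210,000 × 50 × (1 + 1/2 + 1/4 + ...) = 21,000,000. The sketch sums the block rewards era by era and assumes that each halving rounds down to whole satoshis (1 BTC = 10^8 satoshis). Because of that rounding, the real total lands just below 21 million.

```python
SATOSHIS_PER_BTC = 100_000_000
HALVING_INTERVAL = 210_000  # blocks between halvings (roughly 4 years)

def total_supply() -> tuple[int, int]:
    """Sum all block rewards: 50 BTC per block, halved (rounding down to whole satoshis) each era."""
    reward = 50 * SATOSHIS_PER_BTC
    total, eras = 0, 0
    while reward > 0:
        total += HALVING_INTERVAL * reward
        reward //= 2
        eras += 1
    return total, eras

satoshis, eras = total_supply()
print(satoshis / SATOSHIS_PER_BTC, eras)
```

The loop runs for 33 eras of 210,000 blocks. At about 10 minutes per block, that is roughly 132 years after 2009, which is where the 2140 estimate comes from.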
27. Incentives
Therefore, we need another incentive mechanism
2. Transaction Fee:
• A transaction's creator can make the total value of its outputs less than the total value of its inputs; the difference is the fee.
• Voluntary, like a tip.
• Whoever creates the block containing that transaction collects the transaction fee.
• Three major issues remain:
• How do we pick a random node?
• How do we avoid a free-for-all where everybody wants to run a Bitcoin node for the rewards?
• How do we prevent Sybil attacks?
28. Proof of Work
In the Proof of Work method, miners compete against each other to solve a computationally difficult mathematical problem called a hash puzzle
29. Proof of Work
• Instead of picking a random node, select nodes in proportion to a resource that cannot be monopolized
• In PoW, resource = computing power
• In PoS (Proof of Stake), resource = ownership of the currency
31. Hash Puzzles
• To create a new block, the miner is required to find a nonce that satisfies the following condition:
• H(nonce ‖ prev_hash ‖ tx ‖ tx ‖ ... ‖ tx) < target
• The only way to solve this hash puzzle is simply to try enough nonces, one by one, until one succeeds.
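The condition above turns into a short mining loop and a one-hash verification check. This is a simplified sketch with three assumptions. It hashes nonce ‖ prev_hash ‖ tx ‖ ... ‖ tx once with SHA-256. Real Bitcoin instead double-hashes an 80-byte header that commits to the transactions through a Merkle root. The 16-bit-difficulty target and the transaction strings are hypothetical.

```python
import hashlib

def header_hash(nonce: int, prev_hash: bytes, txs: list[bytes]) -> int:
    """H(nonce || prev_hash || tx || ... || tx) as a 256-bit integer (toy encoding)."""
    data = nonce.to_bytes(8, "big") + prev_hash + b"".join(txs)
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def mine(prev_hash: bytes, txs: list[bytes], target: int) -> int:
    """Try nonces one by one until the hash falls below the target."""
    nonce = 0
    while header_hash(nonce, prev_hash, txs) >= target:
        nonce += 1
    return nonce

def verify(nonce: int, prev_hash: bytes, txs: list[bytes], target: int) -> bool:
    """Anyone can check a published nonce with a single hash."""
    return header_hash(nonce, prev_hash, txs) < target

TARGET = 2 ** (256 - 16)  # about 1 in 65,536 hashes succeeds
prev = hashlib.sha256(b"previous block").digest()
txs = [b"alice->bob:1", b"bob->carol:2"]
nonce = mine(prev, txs, TARGET)
print(nonce, verify(nonce, prev, txs, TARGET))
```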
33. Properties of Hash Puzzles
• Difficult to compute
• Parameterizable cost
• Simple to verify
34. Difficult to compute
• By the end of 2014, the difficulty level was roughly 10^20 hashes per block
• Only some nodes bother to compete: these are the miners
• Bitcoin mining: the process of continuously attempting to solve these hash puzzles
35. Parameterizable cost
• Nodes automatically recalculate the target every 2 weeks (every 2016 blocks)
• Goal: Bitcoin's network takes around 10 minutes on average to produce each new block
• If more blocks than expected were added over the 2-week period, the target is lowered (the puzzle gets harder), and vice versa
• Prob(Alice wins the next block) = fraction of global hash power she controls
• Why do we want to maintain this 10-minute invariant?
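The recalculation can be sketched as scaling the old target by (actual time / expected time) for the last 2016 blocks. Blocks that arrive too fast shrink the target, and blocks that arrive too slowly enlarge it. The factor-of-4 limit per adjustment follows Bitcoin's rule. Bitcoin also caps the target at a protocol maximum and stores it in a compact encoding, and both of those details are omitted here. `retarget` is a hypothetical name.

```python
EXPECTED_TIMESPAN = 2016 * 10 * 60  # seconds: 2016 blocks x 10 minutes = two weeks

def retarget(old_target: int, actual_timespan: int) -> int:
    """Scale the target by actual/expected time for the last 2016 blocks.
    Blocks came too fast -> smaller target -> harder puzzle, and vice versa."""
    # Limit each adjustment to a factor of 4 in either direction.
    clamped = max(EXPECTED_TIMESPAN // 4, min(actual_timespan, EXPECTED_TIMESPAN * 4))
    return old_target * clamped // EXPECTED_TIMESPAN

old = 2 ** 224
print(retarget(old, EXPECTED_TIMESPAN // 2) == old // 2)  # blocks twice as fast -> target halves
```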
36. Key security assumption
• Previously: assume that at least 50% of the nodes are honest
• Instead, after PoW:
• Many attacks on Bitcoin are infeasible if the majority of miners, weighted by hash power, follow the protocol
• Then, at all times, there is at least a 50% probability that the next block comes from an honest node, since nodes compete to propose the next block in proportion to their hash power
37. Game Theory Point of View
• Game-theoretic view: we don't split nodes into honest and malicious
• Each node tries to maximize its payoff
• Active area of research:
• Whether the standard behavior of miners is a stable state in which no miner can obtain a bigger payout by acting dishonestly
38. Proof of Work
Problem → Solution
• How to pick a random node? → Nodes ∝ computing power
• How to avoid a free-for-all due to rewards? → Enable nodes to engage in competition
• How to prevent Sybil attacks? → Make it moderately hard to create new identities
39. Distributed Consensus
• Solving the hash puzzle is probabilistic
• Try nonce after nonce until you succeed
• Bernoulli trials
• 2 possible outcomes: the hash falls within the target or not
• The probability of success is fixed across successive trials
• Since nodes try so many nonces, the discrete probability process can be approximated by a continuous one: a Poisson process
40. Distributed Consensus
• Mean time to find the next block = 10 minutes / fraction of hash power
• With 0.1% of the total network hash power, a miner finds a block roughly once every 10,000 minutes (about a week)
• Trivial to verify:
• The nonce is published as part of the block
• Anyone can hash the block contents together and verify that the output is less than the target
• No central authority is needed to verify
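The mean-time formula and the Bernoulli-trial view can be checked numerically. The sketch evaluates 10 minutes divided by the hash-power fraction. It also simulates repeated trials with a hypothetical per-nonce success probability p = 1/1000 and a fixed seed. The average number of trials should come out close to 1/p. The function names are illustrative.

```python
import random

def mean_minutes_to_block(hash_power_fraction: float, network_interval: float = 10.0) -> float:
    """Mean time for one miner to find a block = 10 minutes / fraction of hash power."""
    return network_interval / hash_power_fraction

def trials_until_success(p: float, rng: random.Random) -> int:
    """One run of Bernoulli trials: count nonces until one hash falls below the target."""
    trials = 1
    while rng.random() >= p:
        trials += 1
    return trials

print(mean_minutes_to_block(0.001))  # roughly 10,000 minutes, about a week

rng = random.Random(42)
p = 1 / 1000
runs = [trials_until_success(p, rng) for _ in range(2000)]
print(sum(runs) / len(runs))  # close to 1 / p
```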
43. What happens when consensus fails?
1. Steal bitcoins
2. Suppress transactions
3. Change the block reward
4. Destroy confidence in Bitcoin
Editor's Notes
Cryptocurrencies, like fiat currencies, require security measures to prevent malicious attacks.
In cryptocurrencies, the entire mechanism must be decentralized and based solely on technology.
Cryptographic hash functions provide a mechanism for encoding the rules of a cryptocurrency system within the system itself.
Requirements for a function to be a hash function
Input size
Output size
Efficiently computable
A hash function is a mathematical function with the properties mentioned above
For hash functions to be cryptographically secure, we need 3 additional properties
Collision resistance
Hiding
Puzzle-friendliness
We will also discuss an application of each property
We first look at collision resistance
First, let us understand what a collision in a cryptographic hash function (CHF) is
If two distinct inputs, when passed into the hash function, yield the same hash output, we have encountered a collision
You can see it pictorially represented in the diagram to the left
Our goal is to make it infeasible to find a collision
Collisions definitely exist because there are more possible inputs than outputs; the pigeonhole principle guarantees this
We are trying to make it infeasible to find a collision, yet there are methods to find collisions, as shown on the slide
Explain the slide contents
The general algorithm is very slow: it would take an extremely long time to compute enough hash values to find a collision
The second example shows that for some specific hash functions, such as H(x) = x mod 2^256, collisions can be found efficiently; such functions are not collision resistant
Application:
We use hash outputs as message digests
If two hashes match, we can conclude that the inputs are the same; if the inputs were different, that would violate the collision-resistance property
File system example
Compute hashes now and compare them later to see if they match
This saves storage (only the hash needs to be remembered), and we can compare large files very efficiently
The second property that we will be discussing is Hiding
Consider the example of a coin toss: we toss a coin and give the hash of the result to a player who does not know the result. They can easily figure out whether it was heads or tails from the hash, because the input comes from a tiny set that is not spread out (they simply hash both possibilities)
We need to make sure that no input value is particularly likely
When the input is not spread out, we concatenate it with another input that is spread out and then compute the hash
Min-entropy is a measure of how predictable an outcome is; high min-entropy captures the idea that the distribution is very spread out
The application of hiding is commitments
A nonce is a randomly generated value
When we commit to a value, it cannot be changed after the commitment is made
The third property we will look at is puzzle-friendliness
Suppose someone wants the hash function to produce a particular output (or any output from a target set). If part of the input has been chosen suitably at random, it should be very difficult to find a value for the rest of the input that yields the target
The application of this property is the search puzzle
A mathematical problem that requires searching over a very large space
In a search puzzle there are no shortcuts: the only way to find a valid solution is to search the large space
The difficulty of the puzzle is determined by the size of the target set Y:
If Y is the set of all n-bit strings, the solution is trivial
If Y contains only 1 element, the puzzle is maximally hard
If any value of id were particularly likely, people could cheat by precomputing solutions
Important to generate this id value in a suitable random way
SHA-256 is a commonly used hash function
The underlying fixed-length, collision-resistant hash function is called the compression function
Explain that the compression function's input size stays (m - n) + n = m as each block passes through
Now we move on to chapter 2 and discuss distributed consensus
First, we tackle centralization vs. decentralization, with common examples of both
* A massive key-value store would in turn enable other applications, such as a distributed DNS that maps human-readable names to IP addresses
Now let us look at what happens when a transaction is made
The lack of global time heavily constrains the set of algorithms that can be used in the consensus protocols. In fact, because of these constraints, much of the literature on distributed consensus is somewhat pessimistic
Explain the 2 impossibility results that have been proven
Bitcoin currently works better in practice than in theory.
Bitcoin sidesteps many impossibility results because it violates many of the strict assumptions those results make
It is however important to gain a strong theoretical understanding of the subject, as this will help the currency gain stability and security
Probability that your view and the global view diverge goes down exponentially over time
Distributed consensus in Bitcoin differs from the traditional setting because in this P2P system there is no central authority to give persistent identities to participants. In real life and online, we have IDs such as usernames and passports to identify us.
Pseudonymity: nobody is forced to reveal their real-life ID, name, or IP address to participate.
A Sybil attack aims to undermine the authority or power of others by gaining the majority of influence in the network through many fake identities.
Instructions: for example, "the node with the lowest ID should perform some step".
Security: with node IDs, we could know the number of nodes and derive security properties from that.
Without IDs, consensus is harder
The ability to create a new block is proportional to processing power
We compensate for the lack of IDs by making a weaker assumption.
This assumption leads to something called implicit consensus.
Assume the token generation and distribution algorithm is smart: if an adversary creates multiple nodes, it assigns one ID to all of the Sybils.
Reject: each block contains the hash of the block it extends, so nodes can build on an earlier block
2. New transactions are those not yet in the blockchain
3. Multiple rounds; a raffle system selects a random node
4. Validity check (unspent outputs, no double-spend attempt, valid signatures): accept / reject
5. Acceptance is expressed by proposing the next block.
2. Alice dislikes Bob. This is just annoyance.
A transaction contains Alice's signature, Bob's public key, and a hash.
This hash is a pointer to the previous transaction output that Alice received and is now spending.
Only one of the two transactions can end up in the chain
The node that proposes the next block may decide to build on either of them, and this choice largely determines whether the double spend succeeds
In the P2P network, nodes usually extend the block they heard about first; due to latency, some nodes may hear about the double-spend block first and decide to extend it
Bob sees the transaction before anyone else
After k confirmations, the probability of a successful double spend goes down exponentially as a function of k.
6 is a good trade-off between the time to wait and the guarantee that the transaction will stay in the chain
Protection against invalid transaction – crypto + consensus
There is no 100% guarantee, but waiting for 6 confirmations is pretty good.
So far we have looked at the technical aspects; now we look at incentives
We picked a random node and assumed it is honest at least 50% of the time. But this assumption of honesty is a problem
Consider whether we could penalize the node that created the double-spend transaction
Reward nodes if their block ends up in the long-term consensus chain
2 Mechanisms
Payment to a node in exchange for creating an honest block in the consensus chain.
This makes the node behave the way other nodes approve of, so that they extend its block
Geometric series: limits the supply of bitcoins to 21 million. The block reward is the only way to create bitcoins, which is why the total stops at 21 million
So will the blockchain become insecure and stop in 2140? No, because there is another mechanism
As the block reward runs out, transaction fees will become more important and effectively mandatory, even though they are like tips
How the system will evolve depends on game theory, which is a research topic.
For example, if there are 200 transactions in a block, the block creator collects 200 transaction fees.
2nd problem: we created it ourselves by introducing incentives, since everyone wants to collect the rewards
Security: an adversary should not be able to create fake nodes and take over more than 50% of block creation.
A prerequisite for this is a healthy mining ecosystem of honest, protocol-following nodes
A prerequisite for that health is the value of the currency, which determines the incentives; the larger the incentives, the greater the temptation to commit fraud
The value of the currency depends on security: if the system is secure, more people buy bitcoins and its value increases
Bootstrapping is the tricky process in which all 3 are interdependent
51% attack
1. With solid cryptography, signatures can't be forged. If the attacker builds a longer branch containing an invalid block, honest nodes will not accept it and will keep adding valid blocks.
This creates a fork. If a stolen coin is used to pay Bob, he sees that the transaction is invalid even though it is on the longest branch.
2. A transaction can be prevented from entering the blockchain, but it will still be visible on the P2P network, since transactions are broadcast
3. Not possible, because the attacker does not control the copies of the software that honest nodes run
4. Possible