Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Blockchain Technologies for Data Science

400 views

Published on

Bitcoin has brought about a true revolution in how we think about money. In one fell stroke it solved the main problems that afflicted previous attempts at a truly digital currency: distributed consensus, double spending, and external attacks. Perhaps more importantly, it provided the first working version of a blockchain or distributed ledger. However, despite their relative simplicity, the underlying concepts on which these technologies are built are not well known and often obscured by hype and technical jargon.

Since the days of Bitcoin’s founding, many other crypto-currencies have been proposed and released. This tutorial will introduce these technologies in an intuitive way for data scientists, explaining their driving algorithms, motivations, and data structures.

Published in: Economy & Finance
  • Be the first to comment

  • Be the first to like this

Blockchain Technologies for Data Science

  1. 1. Bruno Gonçalves www.data4sci.com Blockchain Technologies for Data Scientists
 
 www.data4sci.com https://github.com/DataForScience/blockchain-data
  2. 2. www.data4sci.com@bgoncalves The views and opinions expressed in this article are those of the authors and do not necessarily reflect the official policy or position of my employers. The examples provided with this tutorial were chosen for their didactic value and are not mean to be representative of my day to day work. Disclaimer
  3. 3. www.bgoncalves.com@bgoncalves References
  4. 4. www.data4sci.com@bgoncalves How Money Works • Many different types of currency have been used throughout the years • Gold • Cattle • Salt • Rocks • Sea Shells • etc… • Fundamental functions (and properties) of money: • Medium of Exchange (Fungibility, Portability) • Unit of Account (Cognizability) • Store of Value (Durability, Stability of Value / 
 Scarcity) https://www.continentalcurrency.ca/wp-content/uploads/2015/06/continental-currency-exchange.jpg https://upload.wikimedia.org/wikipedia/commons/2/2b/Olaus_Magnus_-_On_Trade_Without_Using_Money.jpg
  5. 5. www.data4sci.com@bgoncalves Money Flower https://www.bis.org/publ/qtrpdf/r_qt1709.pdf https://upload.wikimedia.org/wikipedia/commons/2/23/US_one_dollar_bill%2C_obverse%2C_series_2009.jpg https://upload.wikimedia.org/wikipedia/commons/thumb/d/d3/49024-SOS-ATM.JPG/1600px-49024-SOS-ATM.JPG https://pixabay.com/p-3125488/?no_redirect
  6. 6. www.data4sci.com@bgoncalves The Ingredients of a Crypto-Currency • Fundamental functions (and properties) of money: • Medium of Exchange (Fungibility, Portability) • Unit of Account (Cognizability) • Store of Value (Durability, Stability of Value / Scarcity) • Digital currencies have obvious advantages: • Fungibility - A bit is a bit • Portability - We can easily cary terabytes in our pockets or digitally transfer them across the world • Cognizability - It’s just a number • Durability - Information can easily be stored for decades or even centuries • And obvious problems: • Stability of Value / Scarcity • There is no limit to how many you can create • They can be easily duplicated (no trust) • etc… https://pixabay.com/photos/bank-money-finance-shares-save-2907728/
  7. 7. www.data4sci.com@bgoncalves The Ingredients of a Crypto-Currency • How to avoid the need for a centralized authority? • Distributed Ledger • How to avoid double spending? • Consensus algorithm • How to prevent cheaters? • Reward miners for being honest https://pixabay.com/photos/bank-money-finance-shares-save-2907728/ https://news.bitcoin.com/10-years-ago-bitcoins-genesis-block-changed-the-course-of-history/
  8. 8. www.data4sci.com@bgoncalves The road to Bitcoin https://queue.acm.org/detail.cfm?id=3136559
  9. 9. www.data4sci.com@bgoncalves How does it work? • According to “Satoshi Nakamoto”: • Essentially: • Transactions are processed in blocks • Miners verify the transactions in each block [ Mining Reward ] • Blocks are cryptographically signed to prevent tampering [ Proof of Work ] • Each block points to the previous block to form a chain • The longest chain is defined as the correct one [ Consensus ] https://bitcoin.org/bitcoin.pdf
  10. 10. www.data4sci.com@bgoncalves Linked List Header 0: Null Pointer DATA Header 1: Pointer to 0 DATA Header 2: Pointer to 1 DATA Header N: Pointer to N-1 DATA⋯
  11. 11. www.data4sci.com@bgoncalves Blockchain Header 0: Null Hash Header 1: Hash of 0 Header 2: Hash of 1 Header N: Hash of N-1 DATA DATA DATA DATA⋯ • The hashing function ensures that any changes to a validated previous block invalidates it.
  12. 12. www.data4sci.com@bgoncalves Blockchain Header 0: Null Hash Header 1: Hash of Header 0 Header 2: Hash of Header 1 Header N: Hash of Header N-1 ⋯ Tx 0 Tx 1 Tx 2 Tx 3 Tx N ⋮ Tx 0 Tx 1 Tx 2 Tx 3 Tx N ⋮ Tx 0 Tx 1 Tx 2 Tx 3 Tx N ⋮ Tx 0 Tx 1 Tx 2 Tx 3 Tx N ⋮ • The hashing function ensures that any changes to a validated previous block invalidates it. • Keep hash only the previous header for efficiency
  13. 13. www.data4sci.com@bgoncalves Blockchain Header 0: Null Hash Txs Hash Header 1: Hash of Header 0 Txs Hash Header 2: Hash of Header 1 Txs Hash Header N: Hash of Header N-1 Txs Hash ⋯ Tx 0 Tx 1 Tx 2 Tx 3 Tx N ⋮ Tx 0 Tx 1 Tx 2 Tx 3 Tx N ⋮ Tx 0 Tx 1 Tx 2 Tx 3 Tx N ⋮ Tx 0 Tx 1 Tx 2 Tx 3 Tx N ⋮ • The hashing function ensures that any changes to a validated previous block invalidates it. • Keep hash only the previous header for efficiency • Header must include hash of data
  14. 14. www.data4sci.com@bgoncalves Proof Of Work • Each block is identified by a hash value • To control the rate at which blocks can be verified, and avoid malicious attacks the hash value of the block must obey certain conditions • Miners add values to the data in the header (a nonce) until the hash fulfills these conditions • The cryptographic properties of the hashing function guarantee that each value of the nonce is a roll of the dice (fairness) • The “difficulty” level of the chain establishes how hard it is to obtain a valid hash • Any change to the block data invalidates the hash • The difficulty is adjusted every two weeks to account
 for the hashing capability of the network
  15. 15. www.data4sci.com@bgoncalves Blockchain Security www.data4sci.com https://spectrum.ieee.org/computing/ networks/the-future-of-the-web-looks-a- lot-like-bitcoin
  16. 16. www.data4sci.com@bgoncalves Blockchain Header 0: Null Hash Difficulty 0 Nonce 0 Txs Hash Header 1: Hash of Header 0 Difficulty 1 Nonce 1 Txs Hash Header 2: Hash of Header 1 Difficulty 2 Nonce 2 Txs Hash Header N: Hash of Header N-1 Difficulty 3 Nonce N Txs Hash ⋯ Tx 0 Tx 1 Tx 2 Tx 3 Tx N ⋮ Tx 0 Tx 1 Tx 2 Tx 3 Tx N ⋮ Tx 0 Tx 1 Tx 2 Tx 3 Tx N ⋮ Tx 0 Tx 1 Tx 2 Tx 3 Tx N ⋮ • The hashing function ensures that any changes to a validated previous block invalidates it. • Keep hash only the previous header for efficiency • Header must include hash of data
  17. 17. www.data4sci.com@bgoncalves Blockchain Header 0: Null Hash Difficulty 0 Nonce 0 Merkle Root Header 1: Hash of Header 0 Difficulty 1 Nonce 1 Merkle Root Header 2: Hash of Header 1 Difficulty 2 Nonce 2 Merkle Root Header N: Hash of Header N-1 Difficulty N Nonce N Merkle Root ⋯ Tx 0 Tx 1 Tx 2 Tx 3 Tx N ⋮ Tx 0 Tx 1 Tx 2 Tx 3 Tx N ⋮ Tx 0 Tx 1 Tx 2 Tx 3 Tx N ⋮ Tx 0 Tx 1 Tx 2 Tx 3 Tx N ⋮ • The hashing function ensures that any changes to a validated previous block invalidates it. • Keep hash only the previous header for efficiency • Header must include hash of data
  18. 18. www.data4sci.com@bgoncalves Hash 44 T4 + T4 Hash 4 Tx 4 Merkle Tree Hash 0 Tx 0 Hash 1 Tx 1 Hash 3 Tx 3 Hash 2 Tx 2 Hash 4 Tx 4 Hash 01 T0 + T1 Hash 23 T2 + T3 Hash 0123 Hash 01 + Hash 23 Hash 4444 Hash 44 + Hash 44 Hash 44 T4 + T4 Merkle Root Hash 01234444 Hash 0123 + Hash 4444
  19. 19. @bgoncalves Merkle Tree def get_root(transactions): missing = len(transactions) next_transactions = [] while missing > 1: print(missing) next_transactions = [] # If it's odd, repeat the last transaction if missing % 2 == 1: transactions.append(transactions[-1]) missing += 1 for i in range(0, missing, 2): input = transactions[i] + transactions[i+1] output = doubleSha256_new(input) next_transactions.append(output) print(i, i+1, next_transactions[-1]) missing = len(next_transactions) transactions = next_transactions[::] return invert_byteorder(next_transactions[0]) merkle_proof.py
  20. 20. www.data4sci.com@bgoncalves Merkle Proof - Tx 2 Hash 0 Tx 0 Hash 1 Tx 1 Hash 3 Tx 3 Hash 2 Tx 2 Hash 4 Tx 4 Hash 4 Tx 4 Hash 01 T0 + T1 Hash 23 T2 + T3 Hash 0123 Hash 01 + Hash 23 Hash 4444 Hash 44 + Hash 44 Hash 44 T4 + T4 Hash 44 T4 + T4 Merkle Root Hash 01234444 Hash 0123 + Hash 4444
  21. 21. www.data4sci.com@bgoncalves Merkle Proof - Tx 2 Hash 3 Tx 3 Hash 2 Tx 2 Hash 01 T0 + T1 Hash 23 T2 + T3 Hash 0123 Hash 01 + Hash 23 Hash 4444 Hash 44 + Hash 44 Merkle Root Hash 01234444 Hash 0123 + Hash 4444 • In general, with N transactions you need only hashes to verify if a transaction is part of the tree. • Building the tree, requires computing….. total hashes log2 (N)
  22. 22. @bgoncalves Merkle Proof def get_root(transactions): missing = len(transactions) next_transactions = [] while missing > 1: print(missing) next_transactions = [] # If it's odd, repeat the last transaction if missing % 2 == 1: transactions.append(transactions[-1]) missing += 1 for i in range(0, missing, 2): input = transactions[i] + transactions[i+1] output = doubleSha256_new(input) next_transactions.append(output) print(i, i+1, next_transactions[-1]) missing = len(next_transactions) transactions = next_transactions[::] return invert_byteorder(next_transactions[0]) merkle_proof.py
  23. 23. www.data4sci.com@bgoncalves Transactions https://en.bitcoin.it/wiki/Transaction Transaction inputs outputs Transaction inputs outputs Transaction inputs outputs
  24. 24. www.data4sci.com@bgoncalves Transactions Transaction HUF inputs outputs Transaction TYU inputs Tx AFD 3 Tx HUF 1 outputs Addr asd: 2.76 Addr otu: 5.50 Transaction AFD inputs outputs 1 03 2 1 0 01 5.15 B 3.14 B • All input values must be spent • Each input is a complete output • 0 or more inputs • 1 or more outputs • Inputs point to previous transaction outputs • Outputs are addresses Fee = ∑ in Vin − ∑ o u t Vo u t ≥ 0 • Difference between inputs and outputs is paid as a fee to the block validator Block Reward
  25. 25. www.data4sci.com@bgoncalves Transactions https://en.bitcoin.it/wiki/Transaction • All input values must be spent • Each input is a complete output • 0 or more inputs • 1 or more outputs • Inputs point to previous transaction outputs • Outputs are addresses Transaction inputs outputs • First transaction in a block is the block reward and goes to the miners (coinbase transaction) • coinbase transactions include all fees for the transactions included in the block • Base block reward is how new Bitcoins are produced
  26. 26. www.data4sci.com@bgoncalves Transactions https://en.bitcoin.it/wiki/Transaction
  27. 27. www.data4sci.com@bgoncalves Bitcoin transactions around the world http://www.vo.elte.hu/papers/2017/bitcoin/
  28. 28. www.data4sci.com@bgoncalves
  29. 29. www.data4sci.com@bgoncalves
  30. 30. www.data4sci.com@bgoncalves
  31. 31. www.data4sci.com@bgoncalves www.data4sci.com@bgoncalves
  32. 32. www.data4sci.com@bgoncalves How much is it worth?
  33. 33. www.data4sci.com@bgoncalves Bitcoin interest over time 0 25 50 75 100 2009-022009-072009-122010-052010-102011-032011-082012-012012-062012-112013-042013-092014-022014-072014-122015-052015-102016-032016-082017-012017-062017-112018-042018-092019-02 blockchain bitcoin
  34. 34. www.data4sci.com@bgoncalves Bitcoin price
  35. 35. Questions?
  36. 36. www.data4sci.com@bgoncalves

×