Successfully reported this slideshow.
Your SlideShare is downloading. ×

Module: Content Addressing in IPFS

Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Loading in …3
×

Check these out next

1 of 33 Ad

Module: Content Addressing in IPFS

Download to read offline

Learn all about the most foundational principle of the IPFS architecture: the IPFS Content Identifier, or CID. Through this module, you will understand:
- how IPFS addresses files,
- how IPFS transforms files into Merkle DAGs, as well as
- the detailed anatomy of a CID!

Learn all about the most foundational principle of the IPFS architecture: the IPFS Content Identifier, or CID. Through this module, you will understand:
- how IPFS addresses files,
- how IPFS transforms files into Merkle DAGs, as well as
- the detailed anatomy of a CID!

Advertisement
Advertisement

More Related Content

Slideshows for you (20)

Similar to Module: Content Addressing in IPFS (20)

Advertisement

Recently uploaded (20)

Module: Content Addressing in IPFS

  1. 1. Module: Content Addressing in IPFS ResNetLab on Tour
  2. 2. Agenda ➔ Content Addressing Motivation ◆ What is it and why it’s good ➔ Content Addressing in IPFS ◆ Chunking ◆ Linking Chunks in Merkle DAGs ◆ Going from Data to Data Structures with IPLD ➔ Anatomy of the IPFS Content Identifier (CID)
  3. 3. Motivation for Content Addressing What is it and why it’s good Motivation Content Addressing in IPFS Anatomy of a CID
  4. 4. Why location addressing fails us ● The URL only points to a single copy, stored in a single location. ● If that copy disappears, there is no way to know where other copies are. ● It is not possible for a user to validate the integrity of the content ○ A malicious actor can poison DNS, or change the copy’s location, without the end user noticing. ○ HTTPS is an improvement, but only secures the transport, not the content. ● No request aggregation, resulting in duplication of effort and bandwidth waste (i.e. no options for multicast in the wild).
  5. 5. Location- vs content-addressing location addressing: ask exactly one host: IP(dns(URL)) content addressing: ask any host, choose closest host The address of content (i.e. its fingerprint) is derived from the content itself and not from the location where it is stored.
  6. 6. /ipfs/QmW98pJrc6FZ6/cat.png IP:120.1.11.22 http://example.com/cat.png http://10.20.30.40/cat.png ipfs://QmW98pJrc6FZ6/cat.png IP:10.20.30.40 IP:15.25.35.45 location content
  7. 7. CIDs are: ● the most fundamental ingredient of the IPFS architecture ● used for content addressing ● used to name every piece of data in IPFS ● a hash with some metadata ● self describing Content Identifier CIDv0: QmS4ustL54uo8FzR9455qaxZwuMiUhyvMcX9Ba8nUH4uVv CIDv1: bafybeibxm2nsadl3fnxv2sxcxmxaco2jl53wpeorjdzidjwf5aqdg7wa6u
  8. 8. CIDs are Immutable links Deduplication Identical data can be verified by its address -> Caching -> saves resources and provides faster access to content Self-certification Content is authenticated by its address, not by a Certificate Authority -> Decentralisation Immutability If the content changes, its address also changes -> Integrity Checking
  9. 9. Why Immutability? fingerprint data Integrity Checking hash
  10. 10. == != Integrity Checking Why Immutability? 😊 👍 😤 👎
  11. 11. a CONTENT ADDRESSING CONTENT DISCOVERY & ROUTING CONTENT EXCHANGE ● Chunking ● Linking Chunks in Merkle DAGs ● From Data to Data Structures with IPLD ● Anatomy of the IPFS CID ● Routing & Provider Records ● DHT-based Routing ● Gossip-based Routing ● Bitswap ● GraphSync MUTABLE NAMES & MESSAGE DELIVERY ● Dynamic Data ● IPNS ● PubSub ● CRDTs IPFS Components
  12. 12. Content Addressing 1 Chunking 2 Linking Chunks 3 Going from Data to Data Structures Motivation Content Addressing in IPFS Anatomy of a CID
  13. 13. a CONTENT ADDRESSING CONTENT DISCOVERY & ROUTING CONTENT EXCHANGE ● Chunking ● Linking Chunks in Merkle DAGs ● From Data to Data Structures with IPLD ● Anatomy of the IPFS CID ● Routing & Provider Records ● DHT-based Routing ● Gossip-based Routing ● Bitswap ● GraphSync MUTABLE NAMES & MESSAGE DELIVERY ● Dynamic Data ● IPNS ● PubSub ● CRDTs IPFS Components
  14. 14. Content Addressing Chunking ● Deduplication ● Piecewise Transfer ● Random Access Each chunk is individually addressed and identified by its own hash File Chunked File
  15. 15. Chunking ● Deduplication ● Piecewise Transfer ● Random Access Optimise storage requirements Content Addressing File Chunked File
  16. 16. ● Deduplication ● Piecewise Transfer ● Random Access Identify errors without having to fetch the whole file Discard parts that arrived in error Chunking Content Addressing File Chunked File
  17. 17. ● Deduplication ● Piecewise Transfer ● Random Access Optimise bandwidth requirements Fetch the parts you need only Chunking Content Addressing File Chunked File
  18. 18. a CONTENT ADDRESSING CONTENT DISCOVERY & ROUTING CONTENT EXCHANGE ● Chunking ● Linking Chunks in Merkle DAGs ● From Data to Data Structures with IPLD ● Anatomy of the IPFS CID ● Routing & Provider Records ● DHT-based Routing ● Gossip-based Routing ● Bitswap ● GraphSync MUTABLE NAMES & MESSAGE DELIVERY ● Dynamic Data ● IPNS ● PubSub ● CRDTs IPFS Components
  19. 19. • Files are split in chunks that are connected into a Merkle Directed Acyclic Graph (Merkle DAG). • In Merkle DAGs, similarly to Merkle Trees: • Every chunk is represented as a node. • The hash of the chunk is the address of the node. • The address of every node is embedded in its parent, as a link. • What is not possible with Merkle Trees is for a node to have multiple parents. • Each node in the DAG is addressed and can be accessed independently, making it an entity of its own. Content Addressing in IPFS
  20. 20. File Chunks: UnixFS File: (merkle-link) (a hash) (merkle-tree) 0-200 200-350 0-100 100-200 200-300 300-350 Linking Chunks in a DAG Content Addressing
  21. 21. File Chunks: UnixFS File: (merkle-link) (a hash) (merkle-tree-dag) - directed acyclic graph 0-200 200-350 0-100 100-200 200-300 300-350 Merkle DAGs are graph data structures where each node is content-addressed Visit: dag.ipfs.io Linking Chunks in a DAG Content Addressing
  22. 22. a CONTENT ADDRESSING CONTENT DISCOVERY & ROUTING CONTENT EXCHANGE ● Chunking ● Linking Chunks in Merkle DAGs ● From Data to Data Structures with IPLD ● Anatomy of the IPFS CID ● Routing & Provider Records ● DHT-based Routing ● Gossip-based Routing ● Bitswap ● GraphSync MUTABLE NAMES & MESSAGE DELIVERY ● Dynamic Data ● IPNS ● PubSub ● CRDTs IPFS Components
  23. 23. • IPLD provides the standards and formats to build data structures based on Merkle-DAGs. • With IPLD we go from Data to Data Structures. • You can think of an IPLD Graph as the tree of folders in your file system. It starts from the root and splits to directories and files. • Recall: • Every point in the tree is a node in the IPLD graph • Every node in the IPLD graph has its own CID with the root node having the root CID • But also: every point in the tree points and is linked to from its parent node. InterPlanetary Linked Data (IPLD)
  24. 24. LOCATION-BASED ADDRESSING CONTENT-BASED ADDRESSING IPLD-Powered Merkle-DAGs Content Addressing with Merkle DAGs: From Data to Data Structures with IPLD LOCATION-BASED IDENTIFIER CONTENT-BASED IDENTIFIER IPLD-Powered Merkle-DAGs http://example.com corresponds to ipfs://Qm<hash> Only one host can serve this Anyone with the content can serve this IPLD lets you seamlessly link and traverse various types of content-linked data
  25. 25. a CONTENT ADDRESSING CONTENT DISCOVERY & ROUTING CONTENT EXCHANGE ● Chunking ● Linking Chunks in Merkle DAGs ● From Data to Data Structures with IPLD ● Anatomy of the IPFS CID ● Routing & Provider Records ● DHT-based Routing ● Gossip-based Routing ● Bitswap ● GraphSync MUTABLE NAMES & MESSAGE DELIVERY ● Dynamic Data ● IPNS ● PubSub ● CRDTs IPFS Components
  26. 26. Anatomy of the IPFS CID Motivation Content Addressing in IPFS Anatomy of the IPFS CID
  27. 27. CIDs are: ● used for content addressing ● used to name every piece of data in IPFS ● a hash with some metadata ● self describing Content Identifier Basic Concepts CIDv0: QmS4ustL54uo8FzR9455qaxZwuMiUhyvMcX9Ba8nUH4uVv CIDv1: bafybeibxm2nsadl3fnxv2sxcxmxaco2jl53wpeorjdzidjwf5aqdg7wa6u Self-Certification: Content is authenticated by its address Immutability: If the content changes, its address also changes De-duplication: Same data can be identified by its address
  28. 28. Anatomy of a CID <base>base(<cid-version><multicodec><multihash>) Multihash A self-describing hash digest. Multicodec A pre-set number to uniquely identify a format, or protocol. Multibase A self-describing base-encoded string.
  29. 29. Anatomy of a CID name, tag, code, description identity, multihash, 0x00, raw binary ip4, multiaddr, 0x04, dccp, multiaddr, 0x21, dnsaddr, multiaddr, 0x38, protobuf, serialization, 0x50, Protocol Buffers cbor, serialization, 0x51, CBOR raw, ipld, 0x55, raw binary ... Multihash A self-describing hash digest: ● Hash Function (multicodec) ● Hash Digest Length ● Hash Digest • Multicodec: a pre-set number to uniquely identify a format, or protocol. Multibase A self-describing base encoding. ● A multibase prefix. ■ b - base32 ■ z - base58 ■ f - base16 ● Followed by the base encoded data. Full list: https://github.com/multiformats/multicodec
  30. 30. Anatomy of a CID Binary Breakdown 110010010… 1000000000000010 01110000 00000001 00010010 CID Version Multicodec Hash function sha-256 (0x12) Length Multicodec Actual Content Hash! CID-V1 How to interpret the data dag-pb (0x70) 128 | 2 How to interpret the CID CID-V0 or CID-V1 bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi CID-V1 QmbWqxBEKC3P8tqsKc98xmWNzrzDtRLMiMPL8wBuTGsMnR CID-V0 Visit: cid.ipfs.io <- IPLD encoding -> <base>base(<cid-version><multicodec><multihash>)
  31. 31. Anatomy of a CID Common Misconception The CID and in particular the Multihash part (hash digest) of the CID applies to the DAG representation of the file - not the file itself. CID file bytes DAG representation of file bytes
  32. 32. Thank you for watching Reach out in case you have questions or comments!

×