Module: InterPlanetary
Linked Data (IPLD)
ResNetLab on Tour
Agenda
➔ Motivation: Why do we need IPLD?
◆ What is it and why it exists
➔ IPFS & IPLD
◆ What’s the relationship?
➔ Fundamental Concepts
◆ Graphs & Linked Data
◆ Merkle DAGs
◆ Merkle Roots
◆ Links as the heart of IPLD
➔ Beyond File Data with IPLD
◆ The IPLD Data Model
◆ IPLD-native codecs
◆ Distributed Data Structures
IPLD: InterPlanetary Linked Data
Why it’s needed
Why IPLD?
The Data Layer for content-addressed systems
Can we extract a re-usable data layer from IPFS that can be
used to build other types of content-addressed data
systems?
“Building the next Git should take hours, not days!”
IPLD as Leverage
➔ How can we scale the size and complexity of the data that
we share peer to peer?
➔ Can we unify disparate content addressed formats and link
between them? Git, blockchains, IPFS, etc.
➔ Can we build distributed data structures that we can
interact with like we do with hosted databases, while taking
advantage of the benefits of content addressing?
IPFS & IPLD
What’s the relationship?
IPFS & IPLD
IPLD is the data layer of IPFS
⬍
IPFS is a block store for IPLD
IPFS & IPLD
File Data
➔ At its most fundamental, IPFS is a collection of:
● binary blobs of data - “blocks”;
● their associated content identifiers - CIDs
➔ Only the smallest files in IPFS are stored as a single blob: to
keep block sizes practical, larger files are split into chunks,
spread across multiple blocks, and linked together into a single
graph
➔ Directories are graphs of named links pointing to files, forming
graphs that address other graphs
Linking Chunks in a DAG: Content Addressing
[Diagram: a UnixFS file (ranges 0-200, 200-350) split into chunks (0-100, 100-200, 200-300, 300-350), joined by merkle-links (hashes) into a merkle-tree-dag (a directed acyclic graph).]
Merkle DAGs are graph data structures where each node is content-addressed.
Visit: dag.ipfs.io
Linking Chunks in a DAG: Content Addressing
[Diagram: leaf chunks (Leaf 1, Leaf 2, Leaf 3), each with its own CID, linked via child CIDs (Child 1, Child 2) up to a single Root CID, e.g. bafybeigdyrzt5s… (CIDv1) or QmbWqxBEKC3P8qs… (CIDv0).]
Merkle DAGs are graph data structures where each node is content-addressed.
Visit: dag.ipfs.io
Fundamental Concepts
Links as the heart of IPLD
Merkle DAGs
● Recall: “DAG” is “Directed Acyclic Graph”
● Ralph Merkle formalised the hash tree pattern in 1979
Content being hashed may also contain the hash digests of other
content; therefore, any content “address” authenticates the
content “linked” below it in the tree, via the inclusion of those
digests.
Git
Merkle Roots
[Diagram: a Git repository as a merkle DAG — Commit nodes (Author + TS, Committer + TS, Message, Tree (hash), Parent (hash)) link to earlier Commits and to Tree nodes; Trees link to further Trees and to Blobs of bytes.]
Dynamic Data with Static Structures
Merkle Mutability
All variations of mutability are supported in the same way:
• Additions
• Deletions
• Modifications (or: Deletion + Addition)
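A toy block-store sketch of this idea (the `put` helper and JSON encoding are assumptions for illustration): a modification writes a new leaf block and a new root block, while every unchanged block keeps its address and is shared between both versions.

```javascript
const crypto = require('crypto')
const sha256 = (b) => crypto.createHash('sha256').update(b).digest('hex')

// toy content-addressed block store
const store = new Map()
function put (obj) {
  const bytes = JSON.stringify(obj)
  const addr = sha256(bytes)
  store.set(addr, bytes)
  return addr
}

// version 1: two leaf blocks and a root linking them
const a = put({ file: 'a.txt v1' })
const b = put({ file: 'b.txt v1' })
const root1 = put({ links: [a, b] })

// "modify" a.txt: a new leaf and a new root; b is untouched and shared
const a2 = put({ file: 'a.txt v2' })
const root2 = put({ links: [a2, b] })

console.log(root1 !== root2) // a new root addresses the new version
console.log(store.size)      // 5 blocks total: a, b, a2, and two roots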
Git
Merkle Roots
[Diagram: the same commit/tree/blob DAG, addressed in its entirety by a single Root: e9c8097d...]
Git
Merkle Roots
[Diagram: after a mutation, a new Root 3a8cb91d... addresses the updated DAG, while the previous Root e9c8097d... still addresses the old version.]
Git
Merkle Roots
[Diagram: a further mutation yields a third Root 9c2542a8..., alongside the earlier Roots e9c8097d... and 3a8cb91d...]
Links as the heart of IPLD
● CIDs are hashes with descriptions: they tell you the hash function
used, as well as the codec that can be used to interpret the binary
data being linked to.
● The CID’s hash digest is represented as a “multihash”, which uses
numbers that identify hash algorithms, such as SHA2-256 (0x12) or
Blake2b-312 (0xb227).
● The CID’s “codec”, or IPLD format code, tells you how to decode the
data when you locate it and load its bytes. This could be as simple as
JSON (0x0200), CBOR (0x51) or even raw bytes (0x55).
● Multihash, Multicodec and CIDs are part of the Multiformats system
for self-describing values.
Connecting Graphs: Anatomy of a CID
[Diagram: bafybeigdyrzt5sfp7udm7hu7… decomposed into its binary fields — the CID version (CID-V1, 0x01), a multicodec for the IPLD format (dag-pb, 0x70), then the multihash: a multicodec for the hash function (sha-256, 0x12), the digest length, and the hash digest itself.]
Visit: multiformats.io
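A hedged sketch of reading those fields from a CID’s binary form (assuming single-byte varints, which holds for these small codes; real CIDs use unsigned varints, and the string form adds a multibase prefix — the placeholder digest here is an illustrative assumption):

```javascript
// Read the leading fields of a binary CIDv1 by hand.
function decodeCidPrefix (bytes) {
  return {
    version: bytes[0],  // 0x01 = CIDv1
    codec: bytes[1],    // IPLD format, e.g. 0x70 = dag-pb
    hashFn: bytes[2],   // multihash function, e.g. 0x12 = sha2-256
    hashLen: bytes[3],  // digest length in bytes, e.g. 0x20 = 32
    digest: bytes.slice(4, 4 + bytes[3])
  }
}

const digest = new Uint8Array(32) // placeholder digest for illustration
const cidBytes = Uint8Array.from([0x01, 0x70, 0x12, 0x20, ...digest])
const cid = decodeCidPrefix(cidBytes)
console.log(cid.version, cid.codec.toString(16), cid.hashFn.toString(16), cid.hashLen)
// 1 70 12 32
```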
Links as the heart of IPLD
CIDs are the native link format for IPLD; this is what distinguishes it
from a simple data representation system.
● Most data serialization formats, such as JSON and CBOR, don’t have a native
way of representing links to content-addressed data, so they have no built-in
way to form graphs of linked data.
● IPLD brings its own formats that represent CIDs natively in the encoded
bytes.
● IPLD can also be used as a lens through which to view other content-
addressed formats, such as Git or Bitcoin, from which we can derive CIDs by
assumption.
Beyond File Data with IPLD
Scalable peer-to-peer data structures
IPLD and File Data in IPFS
● The primary IPLD codec IPFS uses for file and directory data is
called DAG-PB: a dedicated Protobuf format for representing a set
of named links and a binary blob of data.
● IPFS additionally interprets the binary blob of data using a
second Protobuf format, called UnixFS, to derive metadata about
files.
Visit: docs.ipfs.io/concepts/file-systems/
Beyond File Data
IPFS file data: fixed layouts with additional properties to
represent directory structures and basic file metadata.
What else do we need? Can we address and organise
complex and large data structures with IPLD blocks without
having to make everything a file?
Beyond File Data: The IPLD Data Model
● IPLD defines a Data Model that details the forms data can
take in memory, and through which a codec transforms that
memory to and from encoded bytes.
● The Data Model includes Booleans, Strings, Ints, Floats,
Null, Arrays and Maps, but also Bytes and Links (CIDs).
● An IPLD “block” is in this way similar to the in-memory form
of a JSON data structure, but with Links and Bytes.
Beyond File Data: Flexible Encoding Formats
Two additional codecs designed for IPLD, which can
represent the full IPLD Data Model in a flexible way, are:
● DAG-JSON - based on JSON, but with special forms to
encode Links (CIDs) and Bytes
● DAG-CBOR - based on CBOR, but with the addition of a tag
to represent CIDs and additional strictness requirements for
deterministic encoding
The Data Model: A Familiar Data Interface
The Data Model includes the common fundamentals available in
most programming languages.
const data = {
  string: "☺ we can do strings!",
  ints: 1337,
  floats: 13.37,
  booleans: true,
  arrays: [1, 2, 3],
  bytes: new Uint8Array([0x01, 0x03, 0x03, 0x07]),
  links: CID(bafyreidykglsfhoixmivffc5uwhcgshx4j465xwqntbmu43nb2dzqwfvae)
}
The Data Model: DAG-JSON
DAG-JSON extends JSON, adding determinism, a format for
Bytes and a format for Links.
{
  "arrays": [1, 2, 3],
  "booleans": true,
  "bytes": { "/": { "bytes": "AQMDBw" } },
  "floats": 13.37,
  "ints": 1337,
  "links": { "/": "bafyreidykglsfhoixmivffc5uwhcgshx4j465xwqntbmu43nb2dzqwfvae" },
  "string": "☺ we can do strings!"
}
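The `"bytes": "AQMDBw"` form above is the unpadded base64 of the four bytes from the Data Model example; a small sketch of producing it (the padding-stripping helper is an illustrative assumption, not a library API):

```javascript
// DAG-JSON represents Bytes as unpadded base64 under a {"/": {"bytes": …}}
// wrapper. Node's Buffer emits padded base64, so strip the '=' padding.
const bytes = new Uint8Array([0x01, 0x03, 0x03, 0x07])
const base64 = Buffer.from(bytes).toString('base64').replace(/=+$/, '')
const dagJsonBytes = { '/': { bytes: base64 } }
console.log(JSON.stringify(dagJsonBytes)) // {"/":{"bytes":"AQMDBw"}}
```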
The Data Model: DAG-CBOR
DAG-CBOR is a strict subset of CBOR with determinism and a
special tag to identify CIDs.
a764696e74731905396562797465734401030307656c696e6b73d82a58250001711220785197229dc8bb115294
5da58e2348f7e279eeded06cc2ca736d0e879858b501666172726179738301020366666c6f617473fb402abd70
a3d70a3d66737472696e67781ae298baefb88f202077652063616e20646f20737472696e67732168626f6f6c65
616e73f5

a7                                           # map(7)
64                                           # string(4)
696e7473                                     #   "ints"
19 0539                                      # uint(1337)
65                                           # string(5)
6279746573                                   #   "bytes"
44                                           # bytes(4)
01030307                                     #   "\x01\x03\x03\x07"
65                                           # string(5)
6c696e6b73                                   #   "links"
d8 2a                                        # tag(42) - the CID tag
58 25                                        # bytes(37)
0001711220785197229dc8bb1152945da58e2348f7   #   CID bytes...
e279eeded06cc2ca736d0e879858b501             #   ...continued
66                                           # string(6)
617272617973                                 #   "arrays"
83                                           # array(3)
01                                           # uint(1)
02                                           # uint(2)
03                                           # uint(3)
66                                           # string(6)
666c6f617473                                 #   "floats"
fb 402abd70a3d70a3d                          # float(13.37)
66                                           # string(6)
737472696e67                                 #   "string"
78 1a                                        # string(26)
e298baefb88f202077652063616e20646f2073747269 #   "☺ we can do stri"
6e677321                                     #   "ngs!"
68                                           # string(8)
626f6f6c65616e73                             #   "booleans"
f5                                           # true
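To see how the annotation lines are derived, here is a hedged sketch decoding the first few bytes of the dump by hand (CBOR packs a major type in the top 3 bits of each head byte and a small argument in the low 5 bits):

```javascript
// Decode the opening bytes of the DAG-CBOR dump: a7 64 69 6e 74 73 19 05 39
const buf = Buffer.from('a764696e7473190539', 'hex')

const majorType = (b) => b >> 5   // top 3 bits
const argument = (b) => b & 0x1f  // low 5 bits

console.log(majorType(buf[0]), argument(buf[0])) // 5 7  -> map(7)
console.log(majorType(buf[1]), argument(buf[1])) // 3 4  -> string(4)
console.log(buf.slice(2, 6).toString('utf8'))    // "ints"
// argument 25 (0x19) means "uint in the next 2 bytes": 0x0539 = 1337
console.log(buf.readUInt16BE(7))                 // 1337
```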
The Data Model: Round-trip Through the Data Model
IPLD’s Data Model is a stable system for addressing,
constructing, encoding and decoding data for a content-
addressed world.
const data = {
  string: "☺ we can do strings!",
  ints: 1337,
  floats: 13.37,
  booleans: true,
  arrays: [1, 2, 3],
  bytes: new Uint8Array([0x01, 0x03, 0x03, 0x07]),
  links: CID(bafyreidykglsfhoixmivffc5uwhcgshx4j465xwqntbmu43nb2dzqwfvae)
}
Recap
IPLD is ... the Data Layer for content-addressed systems: a suite of
technologies for representing and traversing hash-linked data,
including:
● A Data Model
● Mechanisms for deterministic translation from binary data to the Data
Model and back (codecs)
● Addressing and data-description primitives (CIDs / multiformats)
● Tools to address, traverse and mutate large graphs of linked blocks of
data
Distributed Data Structures
● Persistent and immutable data structures are not new.
Functional Programming leans heavily on the same concepts.
● Standard libraries for Scala, Clojure, Haskell, etc. are full of data
structures that translate (almost) directly to the distributed,
content-addressed world.
● Algorithms for building, traversing and mutating content-
addressed data structures require careful consideration of the
trade-offs.
● Directed, acyclic graphs of immutable pieces of data can
be challenging to wrangle, but scale powerfully.
Example: Super-large array
Scaling addressable datasets
[ e1, e2, e3, e4, e5 ]
● A single block with 5 elements, and one CID for that block
Example: Super-large array
Height: 2: [ L1.1, L1.2 ]
Height: 1: [ e1, e2, e3, e4, e5 ] [ e6, e7, e8 ]
● Three distinct content-addressed blocks
● Three CIDs
● Two leaf nodes containing our data
● One root to address all content in the DAG
Example: Super-large array
Height: 2: [ L1.1, L1.2, L1.3, L1.4, L1.5 ]
Height: 1: [ e1, e2, e3, e4, e5 ] [ e6, e7, e8, e9, e10 ] [ e11, e12, e13, e14, e15 ] [ e16, e17, e18, e19, e20 ] [ e21, e22, e23, e24, e25 ]
Example: Super-large array
Height: 3: [ L2.1, L2.2 ]
Height: 2: [ L1.1, L1.2, L1.3, L1.4, L1.5 ] [ L1.6 ]
Height: 1: [ e1, e2, e3, e4, e5 ] [ e6, e7, e8, e9, e10 ] [ e11, e12, e13, e14, e15 ] [ e16, e17, e18, e19, e20 ] [ e21, e22, e23, e24, e25 ] [ e26 ]
Example: Super-large array
Algorithms for “Advanced Data Layouts”
[Diagram: the height-3 DAG from the previous slide, traversed by the code below; width is the maximum number of entries per node (5 here).]

class Node {
  getElementAt (index) {
    if (this.height > 1) {
      const childIndex = Math.floor(index / (width ** (this.height - 1)))
      const newIndex = index % (width ** (this.height - 1))
      // load and traverse into a child
      return this.getChildAt(childIndex).getElementAt(newIndex)
    }
    // read directly from this node's data array
    return this.elements[index]
  }
}
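The traversal sketch can be made runnable with a couple of illustrative assumptions: a fixed `WIDTH` of 5 and children held in memory rather than loaded by CID.

```javascript
// Runnable sketch of the height-based traversal: width-5 nodes,
// children held in memory instead of being fetched from a block store.
const WIDTH = 5

class Node {
  constructor (height, children, elements) {
    this.height = height
    this.children = children // child Nodes (height > 1)
    this.elements = elements // data elements (height === 1)
  }

  getChildAt (i) { return this.children[i] }

  getElementAt (index) {
    if (this.height > 1) {
      const span = WIDTH ** (this.height - 1) // elements under each child
      const childIndex = Math.floor(index / span)
      const newIndex = index % span
      return this.getChildAt(childIndex).getElementAt(newIndex)
    }
    return this.elements[index]
  }
}

// the 8-element example from the slides: two leaves and one root
const leaf1 = new Node(1, null, ['e1', 'e2', 'e3', 'e4', 'e5'])
const leaf2 = new Node(1, null, ['e6', 'e7', 'e8'])
const root = new Node(2, [leaf1, leaf2], null)
console.log(root.getElementAt(6)) // 'e7' (second element of the second leaf)
```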
Thank you for watching
Reach out if you have questions or comments!
