Interplanetary File System
A new power to the distributed web
A presentation by Gitam Gadtaula
Kathmandu University
Outline
❖ Overview
❖ Introduction
❖ Why do we need IPFS?
❖ How IPFS works
❖ How data is stored
❖ Challenges
❖ Conclusion
Overview
❖ Today, the Internet is based on HyperText Transfer Protocol (HTTP).
❖ HTTP relies on location addressing which uses IP addresses to identify the specific
server that is hosting the requested information.
❖ This means that the information has to be fetched from the origin server or a
server within the CDN every time it is requested.
❖ As the internet evolved from Web1 to Web2, and from Web2 to Web3, the need to
transfer huge amounts of data, be it audio or video content, became commonplace.
❖ But HTTP wasn’t originally designed to handle data transfer at this scale.
This led to the adoption of more efficient system architectures, such as
peer-to-peer architecture.
Introduction
❖ IPFS is a peer-to-peer hypermedia protocol designed to preserve and grow humanity's
knowledge by making the web upgradeable, resilient, and more open.
❖ IPFS uses content-addressing to uniquely identify each file in a global namespace
connecting IPFS hosts.
❖ This peer-to-peer transfer of data is not new! It has been an age-old quest. Recall, or look up,
the Napster and Gnutella media sharing services, and the BitTorrent services that
underlie many of our current data services.
❖ Juan Benet in the IPFS white paper describes it as a “Content Addressed, Versioned, P2P
Filesystem”.
❖ IPFS is an implementation of a decentralized network. One of the most popular
decentralized systems is Git, the version control software.
IPFS works by connecting all computing devices with the same system of files via a
system of nodes. It uses a “distributed hash table, an incentivized block exchange, and a
self-certifying namespace.”
Architecture
A high-level view of the architecture
of IPFS: files are held in distributed storage,
and a distributed hash table uses the
hash of a file as the key to return
the location of that file.
Once the location is determined,
the transfer takes place peer-to-peer
as a decentralized transfer.
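
The lookup step above can be sketched as a toy provider table. This is only an illustration, assuming a single in-memory dictionary stands in for the distributed hash table; ProviderTable, announce, and find_providers are hypothetical names, not IPFS APIs.

```python
import hashlib
from collections import defaultdict


def content_key(data: bytes) -> str:
    """Derive the lookup key from the content itself (SHA-256 hex digest)."""
    return hashlib.sha256(data).hexdigest()


class ProviderTable:
    """Toy stand-in for a distributed hash table: key -> set of peer IDs."""

    def __init__(self) -> None:
        self._providers = defaultdict(set)

    def announce(self, key: str, peer_id: str) -> None:
        # A peer declares that it holds the content behind this key.
        self._providers[key].add(peer_id)

    def find_providers(self, key: str) -> set:
        # Return a copy so callers cannot mutate the table.
        return set(self._providers.get(key, set()))


table = ProviderTable()
key = content_key(b"hello distributed web")
table.announce(key, "peer-A")
table.announce(key, "peer-B")
providers = table.find_providers(key)  # peers to contact for the transfer
```

In a real network the table is itself spread across many nodes; the sketch only shows the hash-to-location mapping.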
Content addressable
❖ A way to store information so it can be retrieved based on its content, not its name
or location.
❖ In simpler terms, it acts similarly to a torrent system, except that instead of sharing
and exchanging media files, IPFS exchanges Git-like objects (blocks that are linked by their hashes).
❖ This means that the whole system is based around a simple key-value data store.
Any type of content can be inserted, and it will give back a key that can be used to
retrieve the content again at any time.
❖ This is what allows for content addressing instead of location addressing: the key is
completely independent of the origin of the information, so the content can be hosted
anywhere.
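
The key-value idea in the bullets above can be sketched in a few lines. This is a minimal sketch, assuming a plain SHA-256 hex digest as the key; real CIDs wrap the digest in additional encoding, and ContentStore is a hypothetical name used only for illustration.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: the key is derived from the content."""

    def __init__(self) -> None:
        self._blocks = {}

    def put(self, content: bytes) -> str:
        # The store, not the caller, decides the key.
        key = hashlib.sha256(content).hexdigest()
        self._blocks[key] = content
        return key

    def get(self, key: str) -> bytes:
        content = self._blocks[key]
        # Re-hash on retrieval so a corrupted block is never returned.
        if hashlib.sha256(content).hexdigest() != key:
            raise ValueError("content does not match its key")
        return content


store = ContentStore()
key = store.put(b"any type of content")
same_key = store.put(b"any type of content")  # same content, same key
```

Because the key depends only on the bytes, any host holding the same bytes can answer a request for that key.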
Why IPFS?
Today's web is inefficient and expensive
HTTP downloads files from one server at a time — but peer-to-peer IPFS retrieves
pieces from multiple nodes at once, enabling substantial bandwidth savings. With up to
60% savings for video, IPFS makes it possible to efficiently distribute high volumes of
data without duplication.
Today's web can't preserve humanity's history
The average lifespan of a web page is 100 days before it's gone forever. The medium of
our era shouldn't be this fragile. IPFS makes it simple to set up resilient networks for
mirroring data, and thanks to content addressing, files stored using IPFS are
automatically versioned.
Today's web is addicted to the backbone
IPFS powers the creation of diversely resilient
networks that enable persistent availability —
with or without internet backbone connectivity.
This means better connectivity for the
developing world, during natural disasters, or
just when you're on flaky coffee shop wi-fi.
Today's web is centralized, limiting opportunity
The Internet has turbocharged innovation by being one of the great equalizers in human
history — but increasing consolidation of control threatens that progress. IPFS stays true
to the original vision of an open, flat web by delivering technology to make that vision a
reality.
How IPFS works
When we add a file to IPFS, the file is split into smaller chunks, cryptographically hashed,
and given a unique fingerprint called a content identifier (CID). This CID acts as a
permanent record of your file as it exists at that point in time.
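
A rough sketch of the split-and-hash step follows. It assumes fixed-size chunks of 256 KB (the figure given on the storage slide) and a root identifier computed by hashing the ordered list of chunk hashes; the helper names are hypothetical, and real CIDs use a richer encoding than a hex digest.

```python
import hashlib

CHUNK_SIZE = 256 * 1024  # assumed chunk size, taken from the storage slide


def split_into_chunks(data: bytes, size: int = CHUNK_SIZE) -> list:
    """Cut the data into fixed-size pieces; empty input yields one empty chunk."""
    return [data[i:i + size] for i in range(0, len(data), size)] or [b""]


def chunk_ids(chunks: list) -> list:
    """Hash every chunk on its own."""
    return [hashlib.sha256(chunk).hexdigest() for chunk in chunks]


def root_id(ids: list) -> str:
    """Hash the ordered list of chunk IDs to get one ID for the whole file."""
    return hashlib.sha256("\n".join(ids).encode("ascii")).hexdigest()


data = b"x" * (CHUNK_SIZE * 2 + 10)
chunks = split_into_chunks(data)
ids = chunk_ids(chunks)
file_id = root_id(ids)
```

The root identifier changes if any chunk changes, which is why it can serve as a fingerprint of the file at one point in time.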
When other nodes look up our file, they ask their peer nodes who's storing the content
referenced by the file's CID. When they view or download our file, they cache a copy —
and become another provider of our content until their cache is cleared.
A node can pin content in order to keep (and provide) it forever, or discard content it
hasn't used in a while to save space. This means each node in the network stores only
content it is interested in, plus some indexing information that helps figure out which
node is storing what.
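
Pinning and discarding can be pictured with a toy node. This is a sketch under simple assumptions: garbage collection removes every block that is not pinned, and Node, add, and collect_garbage are hypothetical names rather than an IPFS interface.

```python
import hashlib


class Node:
    """Toy node that caches blocks and keeps only pinned ones after cleanup."""

    def __init__(self) -> None:
        self.blocks = {}
        self.pinned = set()

    def add(self, data: bytes, pin: bool = False) -> str:
        cid = hashlib.sha256(data).hexdigest()
        self.blocks[cid] = data
        if pin:
            self.pinned.add(cid)
        return cid

    def collect_garbage(self) -> int:
        """Drop every unpinned block and report how many were removed."""
        unpinned = [cid for cid in self.blocks if cid not in self.pinned]
        for cid in unpinned:
            del self.blocks[cid]
        return len(unpinned)


node = Node()
kept = node.add(b"my thesis", pin=True)
cached = node.add(b"a page I viewed once")
removed = node.collect_garbage()
```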
If you add a new version of your file to IPFS, its cryptographic hash is different, and so it
gets a new CID. This means files stored on IPFS are resistant to tampering and censorship
— any changes to a file don't overwrite the original, and common chunks across files can
be reused in order to minimize storage costs.
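
The reuse of common chunks can be shown with two versions of a small file. This sketch assumes fixed-size chunking and uses a tiny 4-byte chunk size purely so the data stays readable; chunk_ids is a hypothetical helper.

```python
import hashlib


def chunk_ids(data: bytes, size: int) -> list:
    """Hash each fixed-size chunk of the data."""
    return [hashlib.sha256(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)]


SIZE = 4  # tiny chunk size, for illustration only
v1 = chunk_ids(b"AAAABBBBCCCC", SIZE)
v2 = chunk_ids(b"AAAABBBBDDDD", SIZE)  # only the last chunk changed

shared = set(v1) & set(v2)        # chunks both versions have in common
stored_once = set(v1) | set(v2)   # distinct chunks a store must keep
```

Both versions remain retrievable, yet only four distinct chunks need to be stored instead of six.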
However, this doesn't mean you need to remember a long string of CIDs — IPFS can find
the latest version of your file using the IPNS decentralized naming system, and DNSLink
can be used to map CIDs to human-readable DNS names.
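
The idea of a stable name that points at the latest CID can be sketched as a mutable lookup table. This is a deliberately simplified picture: real IPNS names are derived from public keys and their records are signed, none of which is modelled here, and NameRegistry is a hypothetical name.

```python
import hashlib


def cid_of(data: bytes) -> str:
    """Stand-in for a CID: a plain SHA-256 hex digest."""
    return hashlib.sha256(data).hexdigest()


class NameRegistry:
    """Toy mutable pointer: stable name -> most recently published CID."""

    def __init__(self) -> None:
        self._records = {}

    def publish(self, name: str, cid: str) -> None:
        self._records[name] = cid

    def resolve(self, name: str) -> str:
        return self._records[name]


names = NameRegistry()
names.publish("my-site", cid_of(b"version 1"))
first = names.resolve("my-site")
names.publish("my-site", cid_of(b"version 2"))
latest = names.resolve("my-site")  # same name, new CID
```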
How data is stored in IPFS
❖ Data is stored in chunks of up to 256 KB, called IPFS objects. Files larger than that are split into as many IPFS
objects as are needed to hold the file. One IPFS object per file contains links to all of the other
IPFS objects that make up that file.
❖ When a file is added to the IPFS network it is given a unique hash-based ID, called the content
ID, or CID (in the original CIDv0 format, a 46-character string beginning with "Qm"). That’s how it is identified and referenced within the IPFS network. Recalculating the hash
when the file is retrieved verifies the integrity of the file. If the check fails, the file has been modified or corrupted.
❖ When files are legitimately updated, IPFS handles the versioning of files. That means the new version
of the file is stored along with the previous version. IPFS operates like a distributed file system, and
this concept of versioning provides a degree of immutability to that file system.
❖ Let’s say you store a file in IPFS on your node, and someone called Prasun requests it and
downloads it to their node. The next person that asks for that file might get it from you,
or from Prasun, or in a torrent-like way with parts of the file coming from your node and
from Prasun’s node.
❖ The more people who download the file, the more nodes there are to chip in and help
with subsequent file requests.
❖ Garbage collection will periodically remove cached IPFS objects. If you want to
permanently store a file you can pin it to your node. That means it won’t be cleaned out
during garbage collection.
❖ You can pay for storage with cloud storage providers that expose your data to the IPFS
network and keep it permanently pinned, and there are services specifically tailored
to hosting websites that are IPFS accessible.
❖ If something on your website goes viral and drives massive waves of traffic to your
website, the pages will be cached in all the nodes that retrieve those pages. Those
cached pages will be used to help service further page requests, helping you ride the
wave and satisfy demand.
❖ Of course, all of this depends on a sufficient number of nodes being on and
available, and with enough pinned and cached data. And that requires participants.
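
The torrent-like retrieval in the bullets above, with chunks arriving from more than one node and each one checked against its hash, can be sketched as follows. The peers are plain dictionaries here and fetch_file is a hypothetical helper; real nodes exchange blocks over the network.

```python
import hashlib


def cid_of(data: bytes) -> str:
    """Stand-in for a CID: a plain SHA-256 hex digest."""
    return hashlib.sha256(data).hexdigest()


def fetch_file(chunk_cids: list, peers: dict) -> bytes:
    """Take each chunk from the first peer that holds a valid copy of it."""
    parts = []
    for cid in chunk_cids:
        for blocks in peers.values():
            data = blocks.get(cid)
            # Accept the chunk only if re-hashing it gives the requested ID.
            if data is not None and cid_of(data) == cid:
                parts.append(data)
                break
        else:
            raise LookupError(f"no peer holds a valid copy of {cid}")
    return b"".join(parts)


chunks = [b"part-one ", b"part-two ", b"part-three"]
cids = [cid_of(c) for c in chunks]

peers = {
    "prasun": {cids[1]: chunks[1], cids[2]: b"tampered"},
    "you": {cids[0]: chunks[0], cids[2]: chunks[2]},
}
result = fetch_file(cids, peers)
```

The tampered copy is skipped because its hash does not match, and the file is still assembled from the remaining valid copies.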
Challenges with IPFS
Lack of Strong Economic Incentives
Since the start, IPFS has been built as a community product that will benefit from
everyone’s contributions. As a result, no economic incentives were put in place.
While the lack of economic incentives does not weaken the technology or what IPFS aims
to do, it makes IPFS an impractical solution for long-term use, especially for
storing private and enterprise data.
Challenges with IPFS
Unreliable with Private Data
Be it BitTorrent or IPFS, both are perhaps suitable in a voluntary, collaborative space
where the need is academic research or sharing a particular type of data (be it music, movies, or
books).
But this doesn't mean that the data is reliable.
Challenges with IPFS
Lack of on-chain proofs
With IPFS, which also lacks an incentive layer, it is close to impossible to verify
that peers keep storing the data over time. Peers do not have to submit proofs that they
are indeed storing the data, or proofs of their uptime.
Currently, this is being solved by paying centralized gateway providers to pin your
data, but this leads to a central point of failure and defeats the purpose of using a
decentralized storage provider.
Filecoin and IPFS
❖ Filecoin is an open-source, public cryptocurrency and digital payment system
intended to be a blockchain-based cooperative digital storage and data retrieval
method.
❖ Filecoin is an innovation that sprang from the applications of IPFS. What it does
differently is that it introduces an economic model and incentives to IPFS.
❖ In simpler terms, Filecoin can be considered an electronic currency or
cryptocurrency, much like Bitcoin. Users can provide a portion of the unused
storage on their hard drives in exchange for currency.
Conclusion
❖ HTTP is a technology that is more than thirty years old and needs to be replaced in order for the
Internet to keep up with technology.
❖ IPFS allows for decreased latency, enabling us to utilize the higher memory densities
and faster processing speeds that are continually being developed.
❖ It provides faster overall Internet speed, increased security, and the decentralization
of virtual information.
❖ It brings the power of the web and technology back to the people.
Thank you


Editor's Notes

  • #4 What happens when we download a file?
  • #5 Git is a distributed system because every developer who has cloned a repository has a copy of the entire repository, including the history, on their computer. If the central repository is wiped out, any copy of the repository can be used to restore it. IPFS takes that distributed concept and applies it to file storage and data retrieval.