0
P2P Online Storage
     http://wua.la

Web 2.0 Expo Berlin, 2007
  Dominik Grolimund
large, reliable, and secure
distributed online storage

harness idle resources of
participating computers
old dream of computer science
“The design of a world-wide, fully transparent
distributed file system for simultaneous use by
millions of mobile and frequ...
lots of research projects

OceanStore (UC Berkeley)
Past (Microsoft Research)
       CFS (MIT)
we were inspired by them

wanted to make it work

 first step: closed alpha
upload any file in any size
upload any file in any size

 access from anywhere
upload any file in any size

   access from anywhere
share with friends and groups
upload any file in any size

   access from anywhere
share with friends and groups
    publish to the world
free and simple application
(Win, Mac, and soon Linux)
free and simple application
(Win, Mac, and soon Linux)


start with 1 GB of storage
      provided by us
free and simple application
 (Win, Mac, and soon Linux)


  start with 1 GB of storage
        provided by us


  if you w...
online storage
with the “power of P2P”
online storage
with the “power of P2P”


    fast downloads
online storage
with the “power of P2P”


    fast downloads
   no file size limit
online storage
with the “power of P2P”


    fast downloads
   no file size limit
    no traffic limit
privacy
privacy


all files are encrypted on your computer
privacy


 all files are encrypted on your computer
your password never leaves your computer
privacy


 all files are encrypted on your computer
your password never leaves your computer
 so no one, not even we, can s...
screens
screens
screens
screens
how does it work?
data stored in the p2p network


users’s computer can be offline


  how to ensure availability
    (persistent storage)?
two approaches
two approaches


1. make sure the data is always
        in the network
two approaches


1. make sure the data is always
          in the network
move the data when a computer goes offline
two approaches


1. make sure the data is always
           in the network
move the data when a computer goes offline
bad i...
two approaches


1. make sure the data is always
           in the network
move the data when a computer goes offline
bad i...
redundany = replication?




  p = node availability
 k = redundancy factor
  prep = file availability
redundany = replication?




       example
        p = 0.25
         k=5
                      not enough
      prep = 0.763
redundany = replication?




       example
        p = 0.25
         k = 24      unrealistic
      prep = 0.999
erasure codes


    encode m fragments into n
need any m out of n to reconstruct

        reed-solomon (optimal codes)
   ...
availability




               p = 0.25
m = 100, n = 517, k = n/m = 5.17
             pec = 0.999
k = n/m = 5.17 vs. k = ...
y   d points




               x
y   d points
    polynomial of degree d




                             x
y   d points
    polynomial of degree d
    any d points




                             x
alice stores a file




   roadtrip.mpg
alice drags roadtrip.mpg into wuala
1. encrypted on alice’s computer (128 bit AES)
1. encrypted on alice’s computer (128 bit AES)


    2. encoded into redundant fragments
1. encrypted on alice’s computer (128 bit AES)


    2. encoded into redundant fragments




                p2p network

...
1. encrypted on alice’s computer (128 bit AES)


    2. encoded into redundant fragments




                p2p network

...
alice shares the file with bob

        alice and bob have friendship key
alice encrypts file key and exchanges it with bob
...
p2p network
p2p network




1. download subset of fragments (m)
p2p network

                                      if necessary, get
                                      the remaining
 ...
p2p network




1. download subset of fragments (m)
2. decode the file




           p2p network




1. download subset of fragments (m)
3. decrypt the file


         2. decode the file




           p2p network




1. download subset of fragments (m)
bob plays roadtrip.mpg


         2. decode the file




           p2p network




1. download subset of fragments (m)
p2p network
maintenance




 p2p network
maintenance

alice’s computer checks and maintains her files




                  p2p network
maintenance

alice’s computer checks and maintains her files
  if necessary, it constructs new fragments and uploads them

...
maintenance

alice’s computer checks and maintains her files
  if necessary, it constructs new fragments and uploads them

...
maintenance

alice’s computer checks and maintains her files
  if necessary, it constructs new fragments and uploads them

...
p2p network
put
      p2p network
put                 get
      p2p network
distributed hash table (DHT)




put                         get
         p2p network
super nodes
storage nodes
client nodes
get
get
get
get
get
download of fragments (in parallel)
routing
routing

napster: centralized :-(
routing

napster: centralized :-(
 gnutella: flooding :-(
routing

          napster: centralized :-(
           gnutella: flooding :-(

chord, tapestry: structured overlay networks
routing

          napster: centralized :-(
           gnutella: flooding :-(

chord, tapestry: structured overlay networks...
routing

          napster: centralized :-(
           gnutella: flooding :-(

chord, tapestry: structured overlay networks...
routing

           napster: centralized :-(
            gnutella: flooding :-(

chord, tapestry: structured overlay networ...
super node
super node
connected to direct neighbors
super node
connected to direct neighbors
   plus some random links
super node
connected to direct neighbors
   plus some random links

       random links?
super node
connected to direct neighbors
   plus some random links

        random links?
piggy-pack routing information
number of hops depends on
number of hops depends on

  size of the network (n)
number of hops depends on

   size of the network (n)
size of the routing table (R)
number of hops depends on

   size of the network (n)
size of the routing table (R)

 which itself depends on the traffic
number of hops depends on

     size of the network (n)
  size of the routing table (R)

    which itself depends on the t...
simulation results
simulation results

     n=   106
simulation results

      n= 106

R = 1,000: < 3 hops
simulation results

      n= 106

R = 1,000: < 3 hops
 R = 100: 5 hops
simulation results

             n=    106

       R = 1,000: < 3 hops
        R = 100: 5 hops

reasonable already with mo...
small world effects
           (see milgram, watts & strogatz, kleinberg)

  regular graph




high diameter :-(
high clus...
small world effects
           (see milgram, watts & strogatz, kleinberg)

  regular graph           random graph




high...
small world effects
           (see milgram, watts & strogatz, kleinberg)

  regular graph           random graph         ...
routing table
n = 109, R = 10,000
incentives, fairness
incentives, fairness
prevent free-riding
incentives, fairness
prevent free-riding

  local disk space
incentives, fairness
prevent free-riding

  local disk space
    online time
incentives, fairness
prevent free-riding

  local disk space
    online time

upload bandwidth
online storage = local disk space * online time
online storage = local disk space * online time
      example: 10 GB disk space, 70% online --> 7 GB
online storage = local disk space * online time
      example: 10 GB disk space, 70% online --> 7 GB

        we have diff...
online storage = local disk space * online time
      example: 10 GB disk space, 70% online --> 7 GB

        we have diff...
trading storage
trading storage


only if you want to (you start with 1 GB)
trading storage


 only if you want to (you start with 1 GB)
you must be online at least 17% of the time
trading storage


 only if you want to (you start with 1 GB)
you must be online at least 17% of the time
         (⋲ 4 hou...
trading storage


 only if you want to (you start with 1 GB)
you must be online at least 17% of the time
          (⋲ 4 ho...
upload bandwidth

the more upload bandwidth you provide,
 the more download bandwidth you get
“client”   storage node
“client”             storage node




      asymmetric interest
“client”               storage node




      asymmetric interest
   tit-for-tat doesn’t work :-(
“client”                storage node




            asymmetric interest
         tit-for-tat doesn’t work :-(

believe th...
distributed reputation system
that is not susceptible to false reports
     and other forms of cheating




              ...
distributed reputation system
 that is not susceptible to false reports
      and other forms of cheating

     must scale...
Havelaar, NetEcon 2006
1. lots of transactions
    “observations”




                          Havelaar, NetEcon 2006
2. every round (e.g., a week)
                send observations to
       pre-determined neighbors (hash code)




1. lots...
2. every round (e.g., a week)
                send observations to
       pre-determined neighbors (hash code)

          ...
2. every round (e.g., a week)
                send observations to
       pre-determined neighbors (hash code)

          ...
2. every round (e.g., a week)
                send observations to
       pre-determined neighbors (hash code)

          ...
2. every round (e.g., a week)
                send observations to
       pre-determined neighbors (hash code)

          ...
local approximation of contribution




                       Havelaar, NetEcon 2006
“client”   storage node
“client”   storage node
“client”   storage node
“client”   storage node
“client”   storage node
“client”   storage node
“flash crowd”


“client”         storage node
content distribution
           similar to bittorrent
“client”
                tit-for-tat

            some differences d...
encryption
encryption

128 bit AES for encryption
encryption

  128 bit AES for encryption
2048 bit RSA for authentication
encryption

     128 bit AES for encryption
   2048 bit RSA for authentication

all data is encrypted (file + meta data)
encryption

         128 bit AES for encryption
       2048 bit RSA for authentication

     all data is encrypted (file + ...
Cryptree, SRDS 2006
access control




                 Cryptree, SRDS 2006
access control

cryptographic tree structure




                        Cryptree, SRDS 2006
access control

cryptographic tree structure
     untrusted storage




                        Cryptree, SRDS 2006
access control

cryptographic tree structure
     untrusted storage
doesn’t reveal who has access




                    ...
access control

   cryptographic tree structure
        untrusted storage
  doesn’t reveal who has access
very efficient fo...
access control

   cryptographic tree structure
        untrusted storage
  doesn’t reveal who has access
very efficient fo...
vacation     roadtrip.mpg
                                 switzerland.mpg
               videos
                         ...
bob doesn’t see that
                  claire has also access
                      and vice versa

                   bob...
bob doesn’t see that
granting access to this
and all subfolders takes claire has also access
                             ...
demo
thank you!




sign up for the closed alpha
       http://wua.la
Wuala, P2P Online Storage
Wuala, P2P Online Storage
Wuala, P2P Online Storage
Wuala, P2P Online Storage
Wuala, P2P Online Storage
Wuala, P2P Online Storage
Wuala, P2P Online Storage
Wuala, P2P Online Storage
Wuala, P2P Online Storage
Wuala, P2P Online Storage
Wuala, P2P Online Storage
Wuala, P2P Online Storage
Wuala, P2P Online Storage
Wuala, P2P Online Storage
Wuala, P2P Online Storage
Wuala, P2P Online Storage
Wuala, P2P Online Storage
Wuala, P2P Online Storage
Wuala, P2P Online Storage
Wuala, P2P Online Storage
Wuala, P2P Online Storage
Upcoming SlideShare
Loading in...5
×

Wuala, P2P Online Storage

9,422

Published on

Speaker: Dominik Grolimund

Published in: Technology
1 Comment
4 Likes
Statistics
Notes
No Downloads
Views
Total Views
9,422
On Slideshare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
346
Comments
1
Likes
4
Embeds 0
No embeds

No notes for slide

Transcript of "Wuala, P2P Online Storage"

  1. 1. P2P Online Storage http://wua.la Web 2.0 Expo Berlin, 2007 Dominik Grolimund
  2. 2. large, reliable, and secure distributed online storage harness idle resources of participating computers
  3. 3. old dream of computer science
  4. 4. “The design of a world-wide, fully transparent distributed file system for simultaneous use by millions of mobile and frequently disconnected users is left as an exercise for the reader.” A. Tanenbaum, Distributed Operating System, 1995
  5. 5. lots of research projects OceanStore (UC Berkeley) Past (Microsoft Research) CFS (MIT)
  6. 6. we were inspired by them wanted to make it work first step: closed alpha
  7. 7. upload any file in any size
  8. 8. upload any file in any size access from anywhere
  9. 9. upload any file in any size access from anywhere share with friends and groups
  10. 10. upload any file in any size access from anywhere share with friends and groups publish to the world
  11. 11. free and simple application (Win, Mac, and soon Linux)
  12. 12. free and simple application (Win, Mac, and soon Linux) start with 1 GB of storage provided by us
  13. 13. free and simple application (Win, Mac, and soon Linux) start with 1 GB of storage provided by us if you want more, you can trade local disk space for more online storage
  14. 14. online storage with the “power of P2P”
  15. 15. online storage with the “power of P2P” fast downloads
  16. 16. online storage with the “power of P2P” fast downloads no file size limit
  17. 17. online storage with the “power of P2P” fast downloads no file size limit no traffic limit
  18. 18. privacy
  19. 19. privacy all files are encrypted on your computer
  20. 20. privacy all files are encrypted on your computer your password never leaves your computer
  21. 21. privacy all files are encrypted on your computer your password never leaves your computer so no one, not even we, can see your files
  22. 22. screens
  23. 23. screens
  24. 24. screens
  25. 25. screens
  26. 26. how does it work?
  27. 27. data stored in the p2p network users’s computer can be offline how to ensure availability (persistent storage)?
  28. 28. two approaches
  29. 29. two approaches 1. make sure the data is always in the network
  30. 30. two approaches 1. make sure the data is always in the network move the data when a computer goes offline
  31. 31. two approaches 1. make sure the data is always in the network move the data when a computer goes offline bad idea for lots of data and high churn rate
  32. 32. two approaches 1. make sure the data is always in the network move the data when a computer goes offline bad idea for lots of data and high churn rate 2. introduce redundancy
  33. 33. redundany = replication? p = node availability k = redundancy factor prep = file availability
  34. 34. redundany = replication? example p = 0.25 k=5 not enough prep = 0.763
  35. 35. redundany = replication? example p = 0.25 k = 24 unrealistic prep = 0.999
  36. 36. erasure codes encode m fragments into n need any m out of n to reconstruct reed-solomon (optimal codes) RAID storage systems (vs. low-density-parity-check need (1+e) * m, where e is a fixed, small constant)
  37. 37. availability p = 0.25 m = 100, n = 517, k = n/m = 5.17 pec = 0.999 k = n/m = 5.17 vs. k = 24 using replication
  38. 38. y d points x
  39. 39. y d points polynomial of degree d x
  40. 40. y d points polynomial of degree d any d points x
  41. 41. alice stores a file roadtrip.mpg
  42. 42. alice drags roadtrip.mpg into wuala
  43. 43. 1. encrypted on alice’s computer (128 bit AES)
  44. 44. 1. encrypted on alice’s computer (128 bit AES) 2. encoded into redundant fragments
  45. 45. 1. encrypted on alice’s computer (128 bit AES) 2. encoded into redundant fragments p2p network 3. uploaded into the p2p network
  46. 46. 1. encrypted on alice’s computer (128 bit AES) 2. encoded into redundant fragments p2p network 4. m fragments uploaded onto our servers (boostrap, backup) 3. uploaded into the p2p network
  47. 47. alice shares the file with bob alice and bob have friendship key alice encrypts file key and exchanges it with bob bob wants to download the file
  48. 48. p2p network
  49. 49. p2p network 1. download subset of fragments (m)
  50. 50. p2p network if necessary, get the remaining fragments from our servers 1. download subset of fragments (m)
  51. 51. p2p network 1. download subset of fragments (m)
  52. 52. 2. decode the file p2p network 1. download subset of fragments (m)
  53. 53. 3. decrypt the file 2. decode the file p2p network 1. download subset of fragments (m)
  54. 54. bob plays roadtrip.mpg 2. decode the file p2p network 1. download subset of fragments (m)
  55. 55. p2p network
  56. 56. maintenance p2p network
  57. 57. maintenance alice’s computer checks and maintains her files p2p network
  58. 58. maintenance alice’s computer checks and maintains her files if necessary, it constructs new fragments and uploads them p2p network
  59. 59. maintenance alice’s computer checks and maintains her files if necessary, it constructs new fragments and uploads them p2p network
  60. 60. maintenance alice’s computer checks and maintains her files if necessary, it constructs new fragments and uploads them p2p network
  61. 61. p2p network
  62. 62. put p2p network
  63. 63. put get p2p network
  64. 64. distributed hash table (DHT) put get p2p network
  65. 65. super nodes
  66. 66. storage nodes
  67. 67. client nodes
  68. 68. get
  69. 69. get
  70. 70. get
  71. 71. get
  72. 72. get
  73. 73. download of fragments (in parallel)
  74. 74. routing
  75. 75. routing napster: centralized :-(
  76. 76. routing napster: centralized :-( gnutella: flooding :-(
  77. 77. routing napster: centralized :-( gnutella: flooding :-( chord, tapestry: structured overlay networks
  78. 78. routing napster: centralized :-( gnutella: flooding :-( chord, tapestry: structured overlay networks O(log n) hops :-)
  79. 79. routing napster: centralized :-( gnutella: flooding :-( chord, tapestry: structured overlay networks O(log n) hops :-) n = # super nodes
  80. 80. routing napster: centralized :-( gnutella: flooding :-( chord, tapestry: structured overlay networks O(log n) hops :-) n = # super nodes vulnerable to attacks (partitioning) :-(
  81. 81. super node
  82. 82. super node connected to direct neighbors
  83. 83. super node connected to direct neighbors plus some random links
  84. 84. super node connected to direct neighbors plus some random links random links?
  85. 85. super node connected to direct neighbors plus some random links random links? piggy-pack routing information
  86. 86. number of hops depends on
  87. 87. number of hops depends on size of the network (n)
  88. 88. number of hops depends on size of the network (n) size of the routing table (R)
  89. 89. number of hops depends on size of the network (n) size of the routing table (R) which itself depends on the traffic
  90. 90. number of hops depends on size of the network (n) size of the routing table (R) which itself depends on the traffic we have lots of traffic due to erasure coding
  91. 91. simulation results
  92. 92. simulation results n= 106
  93. 93. simulation results n= 106 R = 1,000: < 3 hops
  94. 94. simulation results n= 106 R = 1,000: < 3 hops R = 100: 5 hops
  95. 95. simulation results n= 106 R = 1,000: < 3 hops R = 100: 5 hops reasonable already with moderate traffic
  96. 96. small world effects (see milgram, watts & strogatz, kleinberg) regular graph high diameter :-( high clustering :-)
  97. 97. small world effects (see milgram, watts & strogatz, kleinberg) regular graph random graph high diameter :-( low diameter :-) high clustering :-) low clustering :-(
  98. 98. small world effects (see milgram, watts & strogatz, kleinberg) regular graph random graph mix high diameter :-( low diameter :-) low diameter :-) high clustering :-) low clustering :-( high clustering :-)
  99. 99. routing table n = 109, R = 10,000
  100. 100. incentives, fairness
  101. 101. incentives, fairness prevent free-riding
  102. 102. incentives, fairness prevent free-riding local disk space
  103. 103. incentives, fairness prevent free-riding local disk space online time
  104. 104. incentives, fairness prevent free-riding local disk space online time upload bandwidth
  105. 105. online storage = local disk space * online time
  106. 106. online storage = local disk space * online time example: 10 GB disk space, 70% online --> 7 GB
  107. 107. online storage = local disk space * online time example: 10 GB disk space, 70% online --> 7 GB we have different mechanisms to measure
  108. 108. online storage = local disk space * online time example: 10 GB disk space, 70% online --> 7 GB we have different mechanisms to measure and check these two variables
  109. 109. trading storage
  110. 110. trading storage only if you want to (you start with 1 GB)
  111. 111. trading storage only if you want to (you start with 1 GB) you must be online at least 17% of the time
  112. 112. trading storage only if you want to (you start with 1 GB) you must be online at least 17% of the time (⋲ 4 hours a day, running average)
  113. 113. trading storage only if you want to (you start with 1 GB) you must be online at least 17% of the time (⋲ 4 hours a day, running average) storage can be earned on multiple computers
  114. 114. upload bandwidth the more upload bandwidth you provide, the more download bandwidth you get
  115. 115. “client” storage node
  116. 116. “client” storage node asymmetric interest
  117. 117. “client” storage node asymmetric interest tit-for-tat doesn’t work :-(
  118. 118. “client” storage node asymmetric interest tit-for-tat doesn’t work :-( believe the software? hack it (kazaa lite) :-(
  119. 119. distributed reputation system that is not susceptible to false reports and other forms of cheating Havelaar, NetEcon 2006
  120. 120. distributed reputation system that is not susceptible to false reports and other forms of cheating must scale well with number of transactions we have lots of small transactions due to erasure coding Havelaar, NetEcon 2006
  121. 121. Havelaar, NetEcon 2006
  122. 122. 1. lots of transactions “observations” Havelaar, NetEcon 2006
  123. 123. 2. every round (e.g., a week) send observations to pre-determined neighbors (hash code) 1. lots of transactions “observations” Havelaar, NetEcon 2006
  124. 124. 2. every round (e.g., a week) send observations to pre-determined neighbors (hash code) 3. discard ego-reports, median, etc. 1. lots of transactions “observations” Havelaar, NetEcon 2006
  125. 125. 2. every round (e.g., a week) send observations to pre-determined neighbors (hash code) 3. discard ego-reports, median, etc. 4. next round, aggregate 1. lots of transactions “observations” Havelaar, NetEcon 2006
  126. 126. 2. every round (e.g., a week) send observations to pre-determined neighbors (hash code) 3. discard ego-reports, median, etc. 4. next round, aggregate 1. lots of transactions 5. update reputation “observations” of storage nodes Havelaar, NetEcon 2006
  127. 127. 2. every round (e.g., a week) send observations to pre-determined neighbors (hash code) 3. discard ego-reports, median, etc. 4. next round, aggregate 1. lots of transactions 5. update reputation “observations” of storage nodes rewarding: upload bandwidth proportional to reputation Havelaar, NetEcon 2006
  128. 128. local approximation of contribution Havelaar, NetEcon 2006
  129. 129. “client” storage node
  130. 130. “client” storage node
  131. 131. “client” storage node
  132. 132. “client” storage node
  133. 133. “client” storage node
  134. 134. “client” storage node
  135. 135. “flash crowd” “client” storage node
  136. 136. content distribution similar to bittorrent “client” tit-for-tat some differences due to erasure codes
  137. 137. encryption
  138. 138. encryption 128 bit AES for encryption
  139. 139. encryption 128 bit AES for encryption 2048 bit RSA for authentication
  140. 140. encryption 128 bit AES for encryption 2048 bit RSA for authentication all data is encrypted (file + meta data)
  141. 141. encryption 128 bit AES for encryption 2048 bit RSA for authentication all data is encrypted (file + meta data) all cryptographic operations performed locally (i.e., on your computer)
  142. 142. Cryptree, SRDS 2006
  143. 143. access control Cryptree, SRDS 2006
  144. 144. access control cryptographic tree structure Cryptree, SRDS 2006
  145. 145. access control cryptographic tree structure untrusted storage Cryptree, SRDS 2006
  146. 146. access control cryptographic tree structure untrusted storage doesn’t reveal who has access Cryptree, SRDS 2006
  147. 147. access control cryptographic tree structure untrusted storage doesn’t reveal who has access very efficient for typical operations Cryptree, SRDS 2006
  148. 148. access control cryptographic tree structure untrusted storage doesn’t reveal who has access very efficient for typical operations (grant access, move, etc.) Cryptree, SRDS 2006
  149. 149. vacation roadtrip.mpg switzerland.mpg videos europe.mpg root alice Cryptree, SRDS 2006
  150. 150. bob doesn’t see that claire has also access and vice versa bob vacation roadtrip.mpg claire switzerland.mpg videos europe.mpg root alice Cryptree, SRDS 2006
  151. 151. bob doesn’t see that granting access to this and all subfolders takes claire has also access and vice versa just one operation all subkeys can be bob derived from that parent key vacation roadtrip.mpg claire switzerland.mpg garfield videos europe.mpg root alice Cryptree, SRDS 2006
  152. 152. demo
  153. 153. thank you! sign up for the closed alpha http://wua.la
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×