1. dumpFS
dumpFS
A Distributed Storage Solution
Carnegie Mellon University
Project for Distributed Systems
• Bruno Garrancho
• Eugénio Pinto
• Nuno Loureiro
Distributed Systems 1
Tuesday, December 21, 2010
2. dumpFS
Acknowledgements
• Prof. António Casimiro
• Prof. Bill Nace
4. dumpFS
Motivation
• Current demand for massive storage
• Commodity Hardware
• Simple semantics of web context
• Alternative solutions: too generic, too complex, extra overhead, too expensive
• Not end user demand
5. dumpFS
Goals
• Availability
• Performance
• Scalability
6. dumpFS
How it works
• Black box Storage
• API/Middleware for developers
• Web, Web & Web...
• Streams, Streams & Streams...
• WORM (write once, read many)
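The write-once, read-many and stream-oriented behaviour listed above can be illustrated with a minimal in-memory sketch. The class name `WormStore`, its method names, and the chunk size are hypothetical and are not part of the dumpFS API; the point is only that a file identifier can be written once and then streamed back any number of times.

```python
class WormStore:
    """Hypothetical in-memory write-once, read-many store keyed by file id."""

    def __init__(self):
        self._files = {}

    def put(self, file_id, stream):
        """Consume an iterable of byte chunks and store it exactly once."""
        if file_id in self._files:
            # WORM: an existing file can never be overwritten.
            raise FileExistsError(f"{file_id} is already stored")
        self._files[file_id] = b"".join(stream)

    def get(self, file_id, chunk_size=4):
        """Yield the stored file back as a stream of byte chunks."""
        data = self._files[file_id]
        for start in range(0, len(data), chunk_size):
            yield data[start:start + chunk_size]


# Usage: write once, read back as a stream.
store = WormStore()
store.put("photo-1", [b"hello ", b"world"])
content = b"".join(store.get("photo-1"))
```

A second `put` for the same identifier raises `FileExistsError`, which is the behaviour a WORM store needs to guarantee.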
7. dumpFS
Architecture
[Figure: dumpFS architecture diagram showing End Users, an Application, the API, and the dumpFS components Cerebrum, Monitor, and Storage, with "(...)" after Cerebrum and after Storage.]
13. dumpFS
Architecture
[Figure: the dumpFS architecture diagram (End Users, Application, API, Cerebrum, Monitor, Storage), with the Cerebrum and Storage labels each shown twice.]
17. dumpFS
Architecture - PUT
[Figure: dumpFS architecture diagram (End Users, Application, API, Cerebrum, Monitor, Storage) illustrating a PUT operation.]
23. dumpFS
Architecture - GET
[Figure: dumpFS architecture diagram (End Users, Application, API, Cerebrum, Monitor, Storage) illustrating a GET operation.]
31. dumpFS
Revisiting the goals
• Availability
• Performance
• Scalability
How do we provide these properties?
32. dumpFS
Monitoring
• Heartbeat (between all nodes)
  - Detection of Failures
• Distributed System State (local node state sent to cerebrums)
  - CPU Load
  - Disk Space
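Heartbeat-based failure detection can be sketched as a table of last-seen timestamps. This is an illustration under stated assumptions, not the dumpFS implementation: the class `HeartbeatMonitor`, the 15-second timeout, and the node names are all hypothetical, and time is passed in explicitly so the example stays deterministic.

```python
class HeartbeatMonitor:
    """Hypothetical failure detector: a node is online if it was heard from recently."""

    def __init__(self, timeout_secs=15.0):
        # Assumed timeout; the slides do not state one.
        self.timeout_secs = timeout_secs
        self._last_seen = {}

    def beat(self, node, now):
        """Record that a heartbeat from `node` arrived at time `now` (seconds)."""
        self._last_seen[node] = now

    def online_nodes(self, now):
        """Nodes whose last heartbeat is within the timeout window."""
        return sorted(
            node for node, seen in self._last_seen.items()
            if now - seen <= self.timeout_secs
        )

    def failed_nodes(self, now):
        """Nodes that have been silent for longer than the timeout."""
        return sorted(
            node for node, seen in self._last_seen.items()
            if now - seen > self.timeout_secs
        )


# Usage: storage-1 was last heard at t=0, cerebrum-1 at t=10; we check at t=20.
monitor = HeartbeatMonitor(timeout_secs=15.0)
monitor.beat("storage-1", now=0.0)
monitor.beat("cerebrum-1", now=10.0)
online = monitor.online_nodes(now=20.0)
failed = monitor.failed_nodes(now=20.0)
```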
33. dumpFS
Distributed System State
[Figure: Storage and Cerebrum nodes, each made up of HTTP API, Server, and Monitor components. Every 5 secs a {load; disk} state message is sent; the final frame adds a chart with a 0 to 100 axis next to each Cerebrum.]
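One way to picture the distributed system state is a table, kept by each Cerebrum, holding the latest {load; disk} report from every node. The sketch below assumes both values are percentages and uses hypothetical names (`NodeReport`, `SystemState`, `disk_weights`); only the 5-second interval and the two reported fields come from the slides.

```python
from dataclasses import dataclass

REPORT_INTERVAL_SECS = 5  # from the slide: state is sent every 5 secs


@dataclass(frozen=True)
class NodeReport:
    """One {load; disk} message, as a hypothetical record."""
    node: str
    load: float       # CPU load in percent (assumed unit)
    disk_free: float  # available disk space in percent (assumed unit)


class SystemState:
    """Hypothetical Cerebrum-side view: most recent report per node."""

    def __init__(self):
        self._latest = {}

    def record(self, report):
        """Keep only the newest report for each node."""
        self._latest[report.node] = report

    def latest(self, node):
        """Return the newest report received from `node`."""
        return self._latest[node]

    def disk_weights(self):
        """Available disk space per node, usable later for PUT selection."""
        return {node: r.disk_free for node, r in self._latest.items()}


# Usage: two consecutive reports from the same node, 5 seconds apart.
state = SystemState()
state.record(NodeReport("storage-1", load=20.0, disk_free=57.0))
state.record(NodeReport("storage-1", load=35.0, disk_free=56.0))
state.record(NodeReport("storage-2", load=10.0, disk_free=47.0))
weights = state.disk_weights()
```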
38. dumpFS
Availability
• Crash Failures & Broken Links
  - Heartbeat: only online nodes are selected
  - Replicated Files
  - Replicated Components
• Tolerance to failures
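The combination of "only online nodes are selected" and replicated files suggests a read path that skips nodes the heartbeat considers down and falls back to another replica when a link breaks mid-request. The sketch below is an assumption about how such a failover could look; `read_with_failover`, `fake_fetch`, and the node names are hypothetical.

```python
def read_with_failover(replicas, online, fetch):
    """Return (node, data) from the first online replica that answers.

    replicas: nodes holding a copy of the file, in preference order
    online:   set of nodes the heartbeat currently reports as alive
    fetch:    callable that reads the file from one node
    """
    tried = []
    for node in replicas:
        if node not in online:
            continue  # crashed node: never selected
        try:
            return node, fetch(node)
        except ConnectionError:
            # Broken link that the heartbeat has not noticed yet.
            tried.append(node)
    raise RuntimeError(f"no replica reachable, tried: {tried}")


def fake_fetch(node):
    """Stand-in for a network read; storage-2 simulates a broken link."""
    if node == "storage-2":
        raise ConnectionError("broken link")
    return b"data from " + node.encode()


# Usage: storage-1 is offline, storage-2 fails mid-request, storage-3 answers.
served_by, data = read_with_failover(
    replicas=["storage-1", "storage-2", "storage-3"],
    online={"storage-2", "storage-3"},
    fetch=fake_fetch,
)
```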
39. dumpFS
Tolerance to failures
[Figure: dumpFS architecture diagram (End Users, Application, API, Cerebrum, Monitor, Storage) illustrating tolerance to failures.]
51. dumpFS
Tolerance to failures
[Figure: dumpFS architecture diagram (End Users, Application, API, Cerebrum, Monitor, Storage) with an "LB" label, presumably a load balancer, added next to the Monitor.]
56. dumpFS
Performance
• Cerebrums give the API only the location of a file, never the data itself
• The primary storage node replicates the file in parallel while it is still receiving data (PUT)
• Probabilistic weighted node selection for PUT and GET operations
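Replicating a file while it is still being received can be sketched with one queue and one worker thread per replica: the primary writes each incoming chunk locally and hands it to every replica immediately, without waiting for the upload to finish. Plain lists stand in for the local disk and the remote storage nodes, and the function names are hypothetical.

```python
import queue
import threading

_END = object()  # sentinel marking the end of the incoming stream


def _replica_writer(inbox, sink):
    """Worker: drain chunks from the queue into one replica until the sentinel."""
    while True:
        chunk = inbox.get()
        if chunk is _END:
            return
        sink.append(chunk)


def put_with_replication(stream, local_sink, replica_sinks):
    """Store a chunk stream locally and forward each chunk to all replicas as it arrives."""
    inboxes = [queue.Queue() for _ in replica_sinks]
    workers = [
        threading.Thread(target=_replica_writer, args=(inbox, sink))
        for inbox, sink in zip(inboxes, replica_sinks)
    ]
    for worker in workers:
        worker.start()
    for chunk in stream:
        local_sink.append(chunk)   # primary copy
        for inbox in inboxes:
            inbox.put(chunk)       # replicas receive the chunk concurrently
    for inbox in inboxes:
        inbox.put(_END)
    for worker in workers:
        worker.join()


# Usage: a three-chunk upload replicated to two other nodes.
primary, replica_a, replica_b = [], [], []
put_with_replication(iter([b"ab", b"cd", b"ef"]), primary, [replica_a, replica_b])
```

Because each replica has its own queue, a slow replica delays only its own copy; the primary keeps accepting data from the client.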
57. dumpFS
Performance
Probabilistic weighted node selection
• PUT uses Available Disk Space
• GET uses CPU Load
Node A: available disk space 57%
Node B: available disk space 47%
Should node A always be selected in PUT operations?
58. dumpFS
Performance
Probabilistic weighted node selection
Node A: available disk space 57%, Rand(A) = Rand(1..57)
Node B: available disk space 47%, Rand(B) = Rand(1..47)
Rand(B) can be greater than Rand(A), but the probability of that happening is < 50% (about 40% for independent uniform integer draws).
Use Rand(Node) instead of the direct value!
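The selection rule can be sketched as follows: draw Rand(1..weight) for every candidate node and pick the highest draw. The function names are hypothetical, and independent uniform integer draws are an assumption; under that assumption the second function computes the exact probability that node B beats node A. For GET the weight would be derived from CPU load, and the slides do not say how.

```python
import random
from fractions import Fraction


def pick_node(weights, rng=random):
    """Draw Rand(1..weight) per node and return the node with the highest draw.

    weights: mapping of node name to a positive integer weight,
             for example available disk space in percent for PUT.
    Ties go to the node listed first in `weights`.
    """
    draws = {node: rng.randint(1, weight) for node, weight in weights.items()}
    return max(draws, key=draws.get)


def prob_b_beats_a(a, b):
    """Exact P(Rand(1..b) > Rand(1..a)) for independent uniform integer draws."""
    wins = sum(1 for x in range(1, a + 1) for y in range(1, b + 1) if y > x)
    return Fraction(wins, a * b)


# Usage with the values from the slide: A has 57% free disk, B has 47%.
p = prob_b_beats_a(57, 47)          # Fraction(23, 57), roughly 0.40
rng = random.Random(0)              # seeded only to make the demo repeatable
picks = [pick_node({"A": 57, "B": 47}, rng) for _ in range(10000)]
share_a = picks.count("A") / len(picks)
```

Node A is chosen more often, but node B still receives a substantial share of PUT operations, so the node with the most free space is not the only one taking writes.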
59. dumpFS
Scalability
dumpFS allows:
• Redundant DB
• Partitioning for “infinite” growth
• Straightforward storage addition
• Clusters of Clusters
60. dumpFS
Technology
• REST / HTTP
• Erlang !!! - Server
• .NET - Client API
61. dumpFS
What didn’t work
• Our graphic design skills
• HDD I/O
• Time
62. dumpFS
Future work
• Delete & Garbage collection
• Read Operations at arbitrary locations in files
63. dumpFS
The END!
Questions?