Performance comparison of Distributed File Systems on 1Gbit networks

Compare the performance of a few Distributed File Systems on 1Gbit networks:
- GlusterFS
- XtreemFS
- FhgFS


Transcript of "Performance comparison of Distributed File Systems on 1Gbit networks"

  1. Performance comparison of Distributed File Systems
     GlusterFS, XtreemFS, FhgFS
     Marian Marinov, CEO of 1H Ltd.
  2. What have I tested?
     ➢ GlusterFS - http://glusterfs.org
     ➢ XtreemFS - http://www.xtreemfs.org/
     ➢ FhgFS (Fraunhofer) - http://www.fhgfs.com/cms/
     ➢ Tahoe-LAFS - http://tahoe-lafs.org/
     ➢ PlasmaFS - http://blog.camlcity.org/blog/plasma4.html
  3. What will be compared?
     ➢ Ease of install and configuration
     ➢ Sequential write and read (large file)
     ➢ Sequential write and read (many small files of the same size)
     ➢ Copy from local to distributed
     ➢ Copy from distributed to local
     ➢ Copy from distributed to distributed
     ➢ Creating many files of random sizes (real cases)
     ➢ Creating many links (cp -al)
  4. Why only on 1Gbit/s?
     ➢ It is considered commodity
     ➢ 6-7 years ago it was considered high performance
     ➢ Some projects started around that time
     ➢ And last, I only had 1Gbit/s switches available for the tests
  5. Let's get the theory first
     1Gbit/s has ~950Mbit/s of usable bandwidth (Wikipedia - Ethernet
     frame), which is 118.75 MBytes/s of usable speed.
     iperf tests: 512Mbit/s -> 65MByte/s
     (there are many 1Gbit/s adapters that cannot go beyond 70k pps)
     iperf tests: 938Mbit/s -> 117MByte/s
     hping3 TCP pps tests: 50096 PPS (75MBytes/s) - 62964 PPS (94MBytes/s)
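The ~950Mbit/s figure follows from Ethernet framing overhead; a rough sketch of the arithmetic, assuming the standard 1500-byte MTU:

    # Per-frame overhead: 7 (preamble) + 1 (SFD) + 14 (header)
    #                     + 4 (FCS) + 12 (inter-frame gap) = 38 bytes
    # Payload efficiency: 1500 / (1500 + 38) ~ 0.975
    # => ~975Mbit/s of raw payload; TCP/IP headers take a bit more,
    #    leaving ~950Mbit/s, i.e. 950 / 8 = 118.75 MBytes/s

The iperf numbers can be reproduced with a plain TCP test (the address and duration below are illustrative):

    server# iperf -s
    client# iperf -c 192.168.0.1 -t 30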
  6. Verify what the hardware can deliver locally
     # echo 3 > /proc/sys/vm/drop_caches
     # time dd if=/dev/zero of=test1 bs=XX count=1000
     # time dd if=test1 of=/dev/null bs=XX

     bs     Local write           Local read
     1M     141MB/s (0m7.493s)    228MB/s (0m4.605s)
     100K   141MB/s (0m7.639s)    226MB/s (0m4.596s)
     1K     126MB/s (0m8.354s)    220MB/s (0m4.770s)

     * most distributed filesystems write at the speed of the slowest
       member node
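A minimal sketch of how the local run can be scripted, so the same loop can later be pointed at a mounted cluster; conv=fdatasync is an assumption added here to keep the page cache from inflating the write numbers:

    # run as root, inside the directory (or mount point) under test
    for bs in 1K 100K 1M; do
        echo 3 > /proc/sys/vm/drop_caches
        dd if=/dev/zero of=test1 bs=$bs count=1000 conv=fdatasync
        echo 3 > /proc/sys/vm/drop_caches
        dd if=test1 of=/dev/null bs=$bs
    done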
  7. Linux Kernel Tuning
     sysctl:
     net.core.netdev_max_backlog=2000 (default 1000)
     Congestion control - selective acknowledgments:
     net.ipv4.tcp_sack=0
     net.ipv4.tcp_dsack=0
     (default: enabled)
  8. Linux Kernel Tuning
     TCP memory optimizations:
     net.ipv4.tcp_mem=41460 42484 82920    (min / pressure / max)
     net.ipv4.tcp_rmem=8192 87380 6291456  (min / default / max)
     net.ipv4.tcp_wmem=8192 87380 6291456  (min / default / max)
     Double the TCP memory
  9. Linux Kernel Tuning
     ➢ net.ipv4.tcp_syncookies=0 (default 1)
     ➢ net.ipv4.tcp_timestamps=0 (default 1)
     ➢ net.ipv4.tcp_app_win=40 (default 31)
     ➢ net.ipv4.tcp_early_retrans=1 (default 2)
     * For more information - Documentation/networking/ip-sysctl.txt
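A sketch of applying the settings from the last three slides and making them persistent (the values are the ones above; the config file location may differ between distributions):

    # cat >> /etc/sysctl.conf <<'EOF'
    net.core.netdev_max_backlog=2000
    net.ipv4.tcp_sack=0
    net.ipv4.tcp_dsack=0
    net.ipv4.tcp_mem=41460 42484 82920
    net.ipv4.tcp_rmem=8192 87380 6291456
    net.ipv4.tcp_wmem=8192 87380 6291456
    net.ipv4.tcp_syncookies=0
    net.ipv4.tcp_timestamps=0
    net.ipv4.tcp_app_win=40
    net.ipv4.tcp_early_retrans=1
    EOF
    # sysctl -p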
  10. More tuning :)
  11. Ethernet Tuning
      ➢ TSO (TCP segmentation offload)
      ➢ GSO (generic segmentation offload)
      ➢ GRO/LRO (generic/large receive offload)
      ➢ TX/RX checksumming
      ➢ ethtool -K ethX tx on rx on tso on gro on lro on
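The current offload state can be checked before and after (lowercase -k queries, uppercase -K sets; ethX is a placeholder for the real interface):

    # ethtool -k ethX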
  12. GlusterFS setup
      1. gluster peer probe nodeX
      2. gluster volume create NAME replica/stripe 2
         node1:/path/to/storage node2:/path/to/storage
      3. gluster volume start NAME
      4. mount -t glusterfs nodeX:/NAME /mnt
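A concrete two-node replica example of the steps above (the volume name and brick paths are illustrative):

    node1# gluster peer probe node2
    node1# gluster volume create testvol replica 2 \
               node1:/storage/brick node2:/storage/brick
    node1# gluster volume start testvol
    node1# mount -t glusterfs node1:/testvol /mnt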
  13. XtreemFS setup
      1. Configure and start the directory server(s)
      2. Configure and start the metadata server(s)
      3. Configure and start the storage server(s)
      4. mkfs.xtreemfs localhost/myVolume
      5. mount.xtreemfs localhost/myVolume /some/local/path
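With the official packages, steps 1-3 usually come down to starting three init scripts; the service names below are an assumption based on the XtreemFS packages, shown single-node for brevity:

    # /etc/init.d/xtreemfs-dir start
    # /etc/init.d/xtreemfs-mrc start
    # /etc/init.d/xtreemfs-osd start
    # mkfs.xtreemfs localhost/myVolume
    # mount.xtreemfs localhost/myVolume /mnt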
  14. FhgFS setup
      1. Configure /etc/fhgfs/fhgfs-*
      2. /etc/init.d/fhgfs-client rebuild
      3. Start the daemons: fhgfs-mgmtd, fhgfs-meta, fhgfs-storage,
         fhgfs-admon, fhgfs-helperd (see the sketch below)
      4. Configure the local client on all machines
      5. Start the local client: fhgfs-client
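A sketch of starting all the server-side daemons on a single node (in a real cluster the roles are normally split across machines):

    for svc in fhgfs-mgmtd fhgfs-meta fhgfs-storage fhgfs-admon fhgfs-helperd; do
        /etc/init.d/$svc start
    done
    /etc/init.d/fhgfs-client start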
  15. Tahoe-LAFS setup
      ➢ Download
      ➢ python setup.py build
      ➢ export PATH="$PATH:$(pwd)/bin"
      ➢ Install sshfs
      ➢ Set up an ssh RSA key
  16. Tahoe-LAFS setup
      ➢ mkdir /storage/tahoe
      ➢ cd /storage/tahoe && tahoe create-introducer .
      ➢ tahoe start .
      ➢ cat /storage/tahoe/private/introducer.furl
      ➢ mkdir /storage/tahoe-storage
      ➢ cd /storage/tahoe-storage && tahoe create-node .
      ➢ Add the introducer.furl to tahoe.cfg
      ➢ Add an [sftpd] section to tahoe.cfg (example below)
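An example [sftpd] section (the port and key file paths are assumptions following the Tahoe-LAFS SFTP frontend documentation):

    [sftpd]
    enabled = true
    port = tcp:8022:interface=0.0.0.0
    host_pubkey_file = private/ssh_host_rsa_key.pub
    host_privkey_file = private/ssh_host_rsa_key
    accounts.file = private/accounts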
  17. Tahoe-LAFS setup
      ➢ Configure the shares:
        shares.needed = 2
        shares.happy = 2
        shares.total = 2
      ➢ Add accounts to the accounts file:
        # This is a password line: (username, password, cap)
        alice password URI:DIR2:ioej8xmzrwilg772gzj4fhdg7a:wtiizszzz2rgmczv4wl6bqvbv33ag4kvbr6prz3u6w3geixa6m6a
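With the SFTP frontend running and an account in place, the grid can be mounted through sshfs, which is why sshfs and the RSA key were set up earlier (host and port are illustrative):

    # sshfs -p 8022 alice@node1:/ /mnt/tahoe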
  18. Statistics
  19. Sequential write
      dd if=/dev/zero of=test1 bs=1M count=1000
      dd if=/dev/zero of=test1 bs=100K count=10000
      dd if=/dev/zero of=test1 bs=1K count=1000000
      [bar chart: throughput in MBytes/s for GlusterFS, XtreemFS and
      FhgFS at bs=1K, 100K and 1M; plotted values: 467, 358, 342,
      112.6, 106.3, 59.83, 43.53, 13.7 and 1.7; higher is better]
  20. Sequential read
      dd if=/mnt/test1 of=/dev/null bs=XX
      [bar chart: throughput in MBytes/s for GlusterFS, XtreemFS and
      FhgFS at bs=1K, 100K and 1M; plotted values: 225, 214.6, 209,
      185.3, 181.3, 179.6, 105.6, 105 and 74.6; higher is better]
  21. Sequential write (local to cluster)
      dd if=/tmp/test1 of=/mnt/test1 bs=XX
      [bar chart: throughput in MBytes/s for GlusterFS, XtreemFS,
      FhgFS and Tahoe-LAFS at bs=1K, 100K and 1M; plotted values:
      96.33, 93.7, 87.26, 76.7, 70.3, 57.96, 43.7, 11.36 and 5.41;
      higher is better]
  22. Sequential read (cluster to local)
      dd if=/mnt/test1 of=/tmp/test1 bs=XX
      [bar chart: throughput in MBytes/s for GlusterFS, XtreemFS and
      FhgFS at bs=1K, 100K and 1M; plotted values: 85.4, 83.76,
      82.56, 77.5, 74.83, 72.56, 67.13 and 66.1; higher is better]
  23. Sequential read/write (cluster to cluster)
      dd if=/mnt/test1 of=/mnt/test2 bs=XX
      [bar chart: throughput in MBytes/s for GlusterFS, XtreemFS and
      FhgFS at bs=1K, 100K and 1M; plotted values: 103.96, 94.4,
      93.73, 62.7, 59.6, 40.7, 36 and 11.8; higher is better]
  24. Joomla tests (local to cluster)
      # for i in {1..100}; do time cp -a /tmp/joomla /mnt/joomla$i; done
      28MB, 6384 inodes
      [bar chart: copy time in seconds for GlusterFS, XtreemFS and
      FhgFS; plotted values: 62.83, 31.42 and 19.26; lower is better]
  25. Joomla tests (cluster to local)
      # for i in {1..100}; do time cp -a /mnt/joomla /tmp/joomla$i; done
      28MB, 6384 inodes
      [bar chart: copy time in seconds for GlusterFS, XtreemFS and
      FhgFS; plotted values: 200.73, 39.7 and 19.26; lower is better]
  26. Joomla tests (cluster to cluster)
      # for i in {1..100}; do time cp -a joomla joomla$i; done
      # for i in {1..100}; do time cp -al joomla joomla$i; done
      28MB, 6384 inodes
      [bar chart: copy and link times in seconds for GlusterFS,
      XtreemFS and FhgFS; plotted values: 265.02, 113.46, 89.52,
      76.44, 51.31 and 22.53; lower is better]
  27. Conclusion
      ➢ Distributed FS for large file storage - FhgFS
      ➢ General purpose distributed FS - GlusterFS
  28. QUESTIONS?
      Marian Marinov
      <mm@1h.com>
      http://www.1h.com
      http://hydra.azilian.net
      irc.freenode.net: hackman
      ICQ: 7556201
      Jabber: hackman@jabber.org