GlusterFS Presentation FOSSCOMM2013 HUA, Athens, GR


  1. Store your trillions of bytes using commodity hardware and open source (GlusterFS)
     Theophanis K. Kontogiannis, RHC{SA,E,EV,ESM,I,X}
  2. The problem
     ● Data growth beyond manageable sizes
     ● Data growth beyond cost-effective sizes
     How much would it cost to store 100 PB of unstructured data on a traditional storage array?
  3. The idea
     Create a scalable data storage infrastructure, uniformly presented to clients, using:
     ● Commodity (even off-the-shelf) hardware
     ● Open standards
  4. The concept
  5. The vision
     GlusterFS: Open, Unified, Extensible; Scalable, Manageable, Reliable
     A scale-out Network Attached Storage (NAS) software solution
     for on-premise, virtualized, and cloud environments
  6. The implementation
     ● Open source, distributed file system capable of scaling to thousands of petabytes (actually, 72 brontobytes!) and handling thousands of clients.
     Recapping the units:
       1024 Terabytes  = 1 Petabyte
       1024 Petabytes  = 1 Exabyte
       1024 Exabytes   = 1 Zettabyte
       1024 Zettabytes = 1 Yottabyte
       1024 Yottabytes = 1 Brontobyte
     ● Clusters storage building blocks together over InfiniBand RDMA or TCP/IP interconnects, aggregating disk and memory resources and managing data in a single global namespace.
     ● Based on a stackable user-space design; delivers exceptional performance for diverse workloads.
     ● Self-healing
     ● Not tied to I/O profiles, hardware, or OS
     - The question is: how much is a brontobyte?
     - The question is: WHO CARES?
  7. Can it really support that much?
     Yes it can!
       2^32 (max subvolumes of the distribute translator)
       × 18 exabytes (max XFS volume size)
       = 72 brontobytes
       (or 89,131,682,828,547,379,792,736,944,128 bytes)
     GlusterFS supports 2^128 (UUID) inodes.
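The slide's arithmetic can be checked with exact integers; a quick sketch in Python, assuming the binary units the slide uses (1 exabyte = 2^60 bytes, 1 brontobyte = 2^90 bytes):

```python
# Capacity claimed on the slide: max subvolumes of the distribute
# translator times the max size of one XFS brick (binary units).
MAX_SUBVOLUMES = 2**32            # distribute translator limit
XFS_MAX_VOLUME = 18 * 2**60       # 18 exabytes per brick
BRONTOBYTE = 2**90                # 1024 yottabytes

total = MAX_SUBVOLUMES * XFS_MAX_VOLUME
print(total)                      # 89131682828547379792736944128
print(total // BRONTOBYTE)        # 72 brontobytes
```

The number matches the slide digit for digit, so the 72-brontobyte figure is internally consistent.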
  8. And this is how it goes
  9. A bit of (business as usual) history
     ● Gluster Inc. was founded in 2005
     ● Focused on public & private cloud storage
     ● Main product GlusterFS was written by Anand Babu Periasamy, Gluster's founder and CTO
     ● Received $8.5M in 2010 via VC funding
     ● Acquired for $136M by Red Hat in 2011
  10. GlusterFS <--> Red Hat Storage
      ● Redirects to RHS pages
      ● Actively supported by Red Hat
      What is important is the integration of technologies in ways that demonstrably benefit the customers.
  11. Components
      ● brick: the storage filesystem that has been assigned to a volume.
      ● client: the machine which mounts the volume (this may also be a server).
      ● server: the machine (virtual or bare metal) which hosts the actual filesystem in which data will be stored.
      ● subvolume: a brick after being processed by at least one translator.
      ● volume: the final share, after it passes through all the translators.
      ● translator: code that interprets the files' geometry/location/distribution on the disks comprising a volume, and is responsible for the perceived performance.
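How a brick becomes a subvolume and finally a volume is easiest to see in a volfile, where each `volume ... end-volume` stanza is one translator and `subvolumes` stacks it on the one below. This is a minimal, hypothetical sketch (names and the brick path are made up; in practice glusterd generates volfiles for you from the CLI):

```
# lowest layer: the brick itself (storage translator)
volume brick1-posix
  type storage/posix
  option directory /export/brick1      # hypothetical brick path
end-volume

# stack a locks translator on top: brick1-posix is now a subvolume
volume brick1-locks
  type features/locks
  subvolumes brick1-posix
end-volume

# expose the stacked result over the network (the final share)
volume brick1-server
  type protocol/server
  option transport-type tcp
  subvolumes brick1-locks
end-volume
```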
  12. The Outer Atmosphere View
  13. The 100,000 ft view
      Storage Node
  14. The 50,000 ft view
  15. The 10,000 ft view
  16. The ground-level view
  17. ...and the programmer's view

      if (!(xl->fops = dlsym (handle, "fops"))) {
              gf_log ("xlator", GF_LOG_WARNING, "dlsym(fops) on %s",
                      dlerror ());
              goto out;
      }

      if (!(xl->cbks = dlsym (handle, "cbks"))) {
              gf_log ("xlator", GF_LOG_WARNING, "dlsym(cbks) on %s",
                      dlerror ());
              goto out;
      }

      if (!(xl->init = dlsym (handle, "init"))) {
              gf_log ("xlator", GF_LOG_WARNING, "dlsym(init) on %s",
                      dlerror ());
              goto out;
      }

      if (!(xl->fini = dlsym (handle, "fini"))) {
              gf_log ("xlator", GF_LOG_WARNING, "dlsym(fini) on %s",
                      dlerror ());
              goto out;
      }
  18. Course of action
      ● Partition, format, and mount the bricks:
        - Format the partition
        - Mount the partition as a Gluster "brick"
        - Add an entry to /etc/fstab
      ● Install the Gluster packages on all nodes
      ● Run the gluster peer probe command
      ● Configure your Gluster volume (and the translators)
      ● Test using the volume
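The steps above can be sketched with the actual commands. The hostnames (server1, server2), device (/dev/sdb1), paths, and volume name are assumptions for illustration; an RPM-based distro of the era is assumed:

```shell
# On each node: format the partition and mount it as a brick
mkfs.xfs -i size=512 /dev/sdb1
mkdir -p /export/brick1
echo "/dev/sdb1 /export/brick1 xfs defaults 0 0" >> /etc/fstab
mount /export/brick1

# Install the packages and start the management daemon
yum install glusterfs-server
service glusterd start

# From one node, build the trusted pool
gluster peer probe server2

# Create and start a two-brick replicated volume, then mount it from a client
gluster volume create myvol replica 2 server1:/export/brick1 server2:/export/brick1
gluster volume start myvol
mount -t glusterfs server1:/myvol /mnt/gluster
```

From the client's point of view, /mnt/gluster is then just a POSIX filesystem.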
  19. Translators?
      Translator type: functional purpose
      ● Storage: lowest-level translator; stores and accesses data on the local file system.
      ● Debug: provides an interface and statistics for errors and debugging.
      ● Cluster: handles distribution and replication of data as it relates to writing to and reading from bricks & nodes.
      ● Encryption: extension translators for on-the-fly encryption/decryption of stored data.
      ● Protocol: interface translators for client/server authentication and communications.
      ● Performance: tuning translators to adjust for workload and I/O profiles.
      ● Bindings: add extensibility, e.g. the Python interface written by Jeff Darcy to extend API interaction with GlusterFS.
      ● System: system-access translators, e.g. interfacing with file system access control.
      ● Scheduler: I/O schedulers that determine how to distribute new write operations across clustered systems.
      ● Features: add additional features such as quotas, filters, locks, etc.
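In day-to-day use, the performance and features translators are toggled through volume options rather than by editing volfiles. A few real option names, applied to a hypothetical volume called myvol:

```shell
# io-cache translator: enlarge the read cache
gluster volume set myvol performance.cache-size 256MB

# io-threads translator: more worker threads per brick
gluster volume set myvol performance.io-thread-count 16

# features/quota translator: enable quotas and cap a directory
gluster volume quota myvol enable
gluster volume quota myvol limit-usage /data 10GB
```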
  20. Not flexible with the command line?
  21. Benchmarks?
      Method and platforms are pretty much standard:
      ● Multiple dd runs of varying block sizes are read and written from multiple clients simultaneously.
      ● GlusterFS brick configuration (16 bricks):
        Processor - dual Intel(R) Xeon(R) CPU 5160 @ 3.00 GHz
        RAM - 8 GB FB-DIMM
        Disk - SATA-II 500 GB
        HCA - Mellanox MHGS18-XT/S InfiniBand HCA
      ● Client configuration (64 clients):
        RAM - 4 GB DDR2 (533 MHz)
        Processor - single Intel(R) Pentium(R) D CPU 3.40 GHz
        Disk - SATA-II 500 GB
        HCA - Mellanox MHGS18-XT/S InfiniBand HCA
      ● Interconnect switch: Voltaire port InfiniBand switch (14U)
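One iteration of the dd method looks like this. The mount point is an assumption (it defaults to /tmp here so the commands can be tried anywhere), and the sizes are scaled down; in the real runs, bs and count vary per iteration and many clients run concurrently:

```shell
# write test: stream zeros through the mounted volume, forcing a final fsync
MOUNT=${MOUNT:-/tmp}
dd if=/dev/zero of="$MOUNT/bench.dat" bs=1M count=8 conv=fsync

# read test: pull the file back through dd, discarding the data
dd if="$MOUNT/bench.dat" of=/dev/null bs=1M
rm -f "$MOUNT/bench.dat"
```

dd reports the elapsed time and throughput of each transfer on stderr, which is what the following slides aggregate.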
  22. Size does not matter....
  23. ...the number of participants does
  24. Suck the throughput. You can!
  25. And you can geo-distribute it :)
      Multi-site cascading
  26. Enough with the food for thought...
      ● www.gluster.org
      Now back to your consoles!!!!
      Thank you...