[RakutenTechConf2013] [D-3_1] LeoFS - Open the New Door


Rakuten Technology Conference 2013
"LeoFS - Open the New Door"
Yosuke Hara (Rakuten)



  1. The Lion of Storage Systems: Open the New Door. Yosuke Hara, Oct 26, 2013 (rev 2.2)
  2. LeoFS is "Unstructured Big Data Storage for the Web": a highly available, distributed, eventually consistent storage system. Organizations can use LeoFS to store lots of data efficiently, safely, and inexpensively. The OSS project started on July 4, 2012. www.leofs.org
  3. Motivation
  4. Motivation (as of 2010): get away from using "expensive H/W-based storages". Expensive storage problems: 1. high costs (initial costs, running costs); 2. possibility of a "SPOF"; 3. does not scale easily - storage expansion is difficult during periods of increasing data.
  5. LeoFS, "The Lion of Storage Systems": 3 Vs in 3 HIGHs. HIGH Availability: non-stop. HIGH Cost-Performance Ratio, Velocity: low latency with minimum resources. HIGH Scalability, Volume: petabyte/exabyte; Variety: photos, movies, unstructured data. REST-API / AWS S3-API.
  6. Overview
  7. LeoFS Overview: requests from web applications/browsers arrive over REST-API / S3-API (REST over HTTP, ports 80/443) through a load balancer to the LeoFS-Gateway nodes (ports 4000, 4010, 4020). The LeoFS-Manager monitors the cluster over RPC (port 4369) and provides a GUI console. LeoFS-Storage nodes (RPC 4369; ports 10020, 10021) each run a Storage Engine/Router with metadata and an object store. No master, no SPOF.
  8. Gateway
  9. LeoFS Overview - Gateway: handles HTTP requests and responses, with a built-in "Object Cache Mechanism" (memory cache, disk cache); the gateway is a stateless proxy plus object cache. Choosing replica target node(s): KEY = "bucket/leofs.key", Hash = md5(filename); the RING spans 2^128 (MD5) and # of replicas = 3. Consistent hashing decides the primary node, followed by Secondary-1 and Secondary-2 in the storage cluster.
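The primary-node decision on this slide can be sketched as a consistent-hash ring. This is an illustrative Python model, not LeoFS's actual (Erlang) implementation; the node names and virtual-node count are assumptions:

```python
import bisect
import hashlib

class Ring:
    """Minimal consistent-hash ring: MD5 maps virtual node IDs and object
    keys onto the same 2^128 space; a key belongs to the first node
    clockwise from its hash."""

    def __init__(self, nodes, vnodes=8):
        self._points = []                       # sorted (hash, node) pairs
        for node in nodes:
            for i in range(vnodes):
                vid = f"{node}_{i}".encode()
                h = int(hashlib.md5(vid).hexdigest(), 16)
                bisect.insort(self._points, (h, node))

    def replica_nodes(self, key, n=3):
        """Primary + (n - 1) secondaries: walk clockwise from the key's
        hash, skipping virtual nodes of already-chosen physical nodes."""
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        idx = bisect.bisect(self._points, (h, ""))
        chosen = []
        for step in range(len(self._points)):
            node = self._points[(idx + step) % len(self._points)][1]
            if node not in chosen:
                chosen.append(node)
            if len(chosen) == n:
                break
        return chosen

ring = Ring(["storage_0", "storage_1", "storage_2", "storage_3"])
replicas = ring.replica_nodes("bucket/leofs.key")   # primary + 2 secondaries
```

Because node IDs and keys hash onto the same 2^128 MD5 space, attaching or detaching a node only remaps the keys between its ring neighbors instead of rehashing everything.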
  10. Storage
  11. LeoFS Overview - Storage: automatically replicates an object and its metadata to remote node(s). As at the gateway, KEY = "bucket/leofs.key", Hash = md5(filename), the RING spans 2^128 (MD5), and # of replicas = 3 (Primary, Secondary-1, Secondary-2). "Consistent Hashing" is used for replication within the storage cluster ("P2P").
  12. LeoFS Overview - Storage: the Storage Engine consists of "Object Storage" and "Metadata Storage", with a built-in "Replicator" and "Recoverer" backed by queues for eventual consistency. Metadata keeps an in-memory index of all data; the Object Container manages a "Log Structured File".
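The queue-backed "Replicator"/"Recoverer" idea can be sketched as follows; the class names and failure model here are hypothetical, showing only how queued retries yield eventual consistency:

```python
from collections import deque

class Node:
    """A replica target; 'up' simulates reachability."""
    def __init__(self):
        self.up = True
        self.store = {}

class Replicator:
    """Queue-backed repair sketch (names are hypothetical): a write to an
    unreachable replica is queued instead of failing the request, and a
    recoverer worker replays it later, so replicas converge eventually."""
    def __init__(self, nodes):
        self.nodes = nodes
        self.queue = deque()                    # pending (node, key, value)

    def put(self, key, value):
        for node in self.nodes:
            if node.up:
                node.store[key] = value         # replica write succeeded
            else:
                self.queue.append((node, key, value))   # repair later

    def recover(self):
        """Replay queued writes to nodes that are reachable again."""
        for _ in range(len(self.queue)):
            node, key, value = self.queue.popleft()
            if node.up:
                node.store[key] = value
            else:
                self.queue.append((node, key, value))   # still down, re-queue

a, b = Node(), Node()
b.up = False                                    # one replica is unreachable
rep = Replicator([a, b])
rep.put("bucket/leofs.key", b"photo-bytes")     # succeeds on a, queued for b
b.up = True
rep.recover()                                   # b catches up
```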
  13. LeoFS Storage Engine - Retrieve an object from the storage: a Storage Engine worker looks up the <METADATA> record (ID, filename, offset, size, checksum (MD5), version#) in the Metadata Storage, then reads the object (header, file body, footer) from the Object Container at that offset.
  14. LeoFS Storage Engine - Insert an object into the storage: a Storage Engine worker appends the object (header, file body, footer) to the Object Container and inserts a metadata record (ID, filename, offset, size, checksum (MD5), version#) into the Metadata Storage.
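The write path above (append the object, insert its metadata) is the classic log-structured pattern; a minimal in-memory sketch with hypothetical names, omitting the per-object header/footer and checksums:

```python
import io

class ObjectContainer:
    """Log-structured sketch: objects are appended to a single container
    (a BytesIO stands in for the container file) and an in-memory metadata
    index maps filename -> (offset, size), as the slides describe."""
    def __init__(self):
        self.log = io.BytesIO()
        self.metadata = {}                      # filename -> (offset, size)

    def put(self, filename, data):
        offset = self.log.seek(0, io.SEEK_END)  # append-only: write at end
        self.log.write(data)
        self.metadata[filename] = (offset, len(data))

    def get(self, filename):
        offset, size = self.metadata[filename]  # retrieve reuses the index
        self.log.seek(offset)
        return self.log.read(size)

c = ObjectContainer()
c.put("bucket/a.jpg", b"first")
c.put("bucket/b.jpg", b"second")
```

Appending rather than updating in place keeps writes sequential; the cost is that superseded versions linger in the container until compaction.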
  15. LeoFS Storage Engine - Remove unnecessary objects from the storage: a Storage Engine worker compacts the old object container/metadata into a new object container/metadata.
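Compaction as on this slide can be sketched as copying only the live records into a fresh container; the record and index shapes below are simplified assumptions:

```python
def compact(old_log, index):
    """Compaction sketch: 'old_log' is a list of (filename, bytes) records
    in append order; 'index' maps each live filename to the position of its
    latest record. Only live records are copied into the new log, which
    reclaims the space held by deleted and superseded versions."""
    new_log, new_index = [], {}
    for filename, pos in index.items():
        new_index[filename] = len(new_log)
        new_log.append(old_log[pos])
    return new_log, new_index

# Two versions of a.jpg plus a since-deleted b.jpg in the old container:
old = [("a.jpg", b"v1"), ("b.jpg", b"bytes"), ("a.jpg", b"v2")]
live = {"a.jpg": 2}                  # only a.jpg's latest record survives
new_log, new_index = compact(old, live)
```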
  16. LeoFS Overview - Storage - Data Structure/Relationship: a <Metadata> record (to retrieve an object) holds {VNodeId, Key}, KeySize, Custom-Meta Size, File Size, Offset, Version, Timestamp, and Checksum. A <Needle> (a file/object, for sync) holds a fixed-length header (Checksum, KeySize, User-Meta Size, DataSize, Offset, Version, Timestamp), then {VNodeId, Key}, User-Meta, the actual file body (variable length), and a Footer (8B). An <Object Container> is a super-block followed by needles (Needle-1 ... Needle-5).
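The needle layout can be sketched with Python's struct module; the field order follows the slide, but the byte widths and footer payload below are assumptions, not LeoFS's real on-disk format:

```python
import hashlib
import struct
import time

# One "needle" record from the slide: fixed-length header (checksum, key
# size, user-meta size, data size, offset, version, timestamp), then the
# key, variable-length body, and an 8-byte footer. Widths are assumptions.
HEADER = struct.Struct(">16sHHIQIQ")
FOOTER = struct.Struct(">Q")

def pack_needle(key: bytes, body: bytes, offset: int, version: int = 1) -> bytes:
    header = HEADER.pack(
        hashlib.md5(body).digest(),             # body checksum (MD5)
        len(key), 0, len(body),                 # key / user-meta / data sizes
        offset, version, int(time.time()))
    footer = FOOTER.pack(offset)                # placeholder footer payload
    return header + key + body + footer

rec = pack_needle(b"bucket/leofs.key", b"photo-bytes", offset=0)
```

A fixed-length header is what lets a sync or recovery pass scan the container sequentially: read the header, learn the key and body sizes, skip ahead to the next needle.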
  17. LeoFS Overview - Storage - Large Object Support: to equalize disk usage of every storage node and to realize high I/O efficiency and high availability, a WRITE of a large object from a client is split at the gateway into chunked objects (chunk-0 ... chunk-3). Every chunked object and its metadata are replicated in the storage cluster, along with the original object's metadata (original object name, original object size, # of chunks).
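The WRITE-side chunking can be sketched like this; the chunk-naming scheme and the 5 MB chunk size are illustrative assumptions, not LeoFS's actual scheme:

```python
def chunk_object(key, data, chunk_size=5 * 1024 * 1024):
    """Gateway-side chunking sketch: a large upload is split into fixed-size
    chunks, each stored (and replicated) as its own object, plus a parent
    metadata record with the original name, size, and chunk count."""
    chunks = [(f"{key}\n{i}", data[off:off + chunk_size])
              for i, off in enumerate(range(0, len(data), chunk_size))]
    parent = {"name": key, "size": len(data), "num_chunks": len(chunks)}
    return parent, chunks

parent, chunks = chunk_object("bucket/movie.mp4", b"x" * (12 * 1024 * 1024))
```

Because each chunk hashes to its own position on the ring, a large object's load spreads across many nodes instead of concentrating on the three replicas of a single key.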
  18. Manager
  19. LeoFS Overview - Manager: operates the LeoFS Gateway and Storage Cluster via the "RING Monitor" and "Node State Monitor". The Manager(s) monitor the RING and node state of the gateway(s) and storage cluster, and operate the storage cluster: status, suspend, resume, detach, whereis, ...
  20. New Features
  21. "Insight"
  22. New Features - LeoInsight (v1.0): gives insight into the state of LeoFS. 1. To control requests from clients to LeoFS. 2. To check and see "traffic info" and the "state of every node" in order to keep availability.
  23. New Features - LeoInsight (v1.0): a Notifier receives traffic info and node notifications from the Gateway/Storage/Manager; messages are consumed from a distributed queue (ElkDB), calculated statistics data is persisted to a time-series DB (Savannah), and results are retrieved via a REST API (JSON).
  24. More Scalability & More Availability
  25. New Features - Multi Data Center Data Replication (v1.0): HIGH scalability and HIGH availability, plus easy operation for admins, across regions (Tokyo, Singapore, US, Europe). No SPOF, no performance degradation.
  26. v1.0 - Multi Data Center Data Replication [3 Regions & 5 Replicas]. DC-1 configuration: consistency level local-quorum [N=3, W=2, R=1, D=2]; # of target DC(s): 2; # of replicas per DC: 1; total replicas: 5. Method of MDC replication: Async (bulked transfer) or Sync+Tran (consensus algorithm). Client application(s) request the target region. Storage clusters: DC-1 [replicas:3], DC-2 [replicas:1], DC-3 [replicas:1]. The Manager cluster monitors and replicates each "RING" and "System Configuration" across the "Leo Storage Platform".
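The local-quorum consistency level on this slide reads as: an operation succeeds once the required number of replica acknowledgements arrives. A minimal sketch using the slide's values:

```python
def quorum_ok(acks: int, quorum: int) -> bool:
    """An operation commits once enough replica acknowledgements arrive."""
    return acks >= quorum

# Local-quorum values from the slide: N=3 replicas in the local region,
# W=2 acks to commit a write, R=1 to serve a read, D=2 for a delete.
N, W, R, D = 3, 2, 1, 2

write_committed = quorum_ok(2, W)    # write succeeds with 2 of 3 replicas
read_served = quorum_ok(1, R)        # a single replica can serve a read
```

With W=2 and R=1 a read may briefly see a stale replica (W + R is not greater than N), which fits the system's eventually consistent design: availability and latency are favored, and the queue-backed recoverer converges the replicas.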
  27. v1.0 - Multi Data Center Data Replication [3 Regions & 5 Replicas], step 1: 3 replicas are written in the "Local Region" (DC-1 [replicas:3]), under the same configuration: local-quorum [N=3, W=2, R=1, D=2], 2 target DCs, 1 replica per remote DC, 5 replicas in total.
  28. v1.0 - Multi Data Center Data Replication [3 Regions & 5 Replicas], step 2: sync (or async) replication to the other region(s). Node roles: DC1.node_0 (Leader/Primary), DC1.node_1 and DC1.node_2 (local followers), DC2.node_3 and DC3.node_4 (remote followers).
  29. v1.0 - Multi Data Center Data Replication [3 Regions & 5 Replicas], step 3: replication for geographical optimization. Each region (DC-1 Tokyo, DC-2 Singapore, DC-3 US, DC-4 Europe) acts as the local region for nearby client application(s), with its remote replicas placed in the other regions.
  30. "Center"
  31. New Features - LeoCenter: a web-based administrative console for inspecting and manipulating LeoFS Storage Clusters and the LeoFS Gateway. Admin tools: access-log analysis and LeoFS operation.
  32. Access Log Analysis (β)
  33. LeoFS, "The Lion of Storage Systems": 3 Vs in 3 HIGHs. HIGH Availability: non-stop. HIGH Cost-Performance Ratio, Velocity: low latency with minimum resources. HIGH Scalability, Volume: petabyte/exabyte; Variety: photos, movies, unstructured data. REST-API / AWS S3-API.
  34. Set Sail for "Cloud Storage". Website: www.leofs.org / Twitter: @LeoFastStorage / Facebook: www.facebook.com/org.leofs
