Scaling and High Performance Storage System: LeoFS
LeoFS is an Unstructured Object Storage for the Web and a highly available, distributed, eventually consistent storage system.

Transcript

  • 1. Scaling and High Performance Storage System. Yosuke Hara (@yosukehara), Researcher at R.I.T. and LeoFS Tech Lead, with Hiroki Matsue, LeoFS Support and Software Engineer at Rakuten
  • 2. LeoFS is "Unstructured Big Data Storage for the Web": a highly available, distributed, eventually consistent storage system. Organizations can use LeoFS to store large amounts of data efficiently, safely, and inexpensively. LeoFS has been available as OSS since July 2012: leo-project.net/leofs
  • 3. Agenda: Overview; Brief Benchmark Report; Multi Data Center Replication; LeoFS Administration at Rakuten; Future Plans ("NFS" support and more)
  • 4. Overview Tokyo, Japan
  • 5. LeoFS, the Lion of Storage Systems: 3 Vs in 3 HIGHs. HIGH Availability: non-stop. HIGH Cost-Performance Ratio: minimum resources. HIGH Scalability. Velocity: low latency; Volume: petabyte / exabyte; Variety: photos, movies, unstructured data
  • 6. LeoFS Overview: requests from web applications / browsers arrive over HTTP (REST-API / S3-API) through a load balancer at the Gateway; the Gateway talks to the Storage cluster over Erlang RPC, where each storage node combines a Storage Engine/Router, Metadata store, and Object Storage; the Manager operates and monitors the system (Erlang RPC), with a monitor GUI console (TCP/IP, SNMP). Design goals: keeping high availability, keeping high performance, easy administration
  • 7. Gateway
  • 8. LeoFS Overview - Gateway: a stateless proxy with a built-in object cache [memory cache, disc cache]. Clients send HTTP requests (REST-API / S3-API) to the gateway(s), which forward them to the storage cluster. Built on the fast Cowboy HTTP server with an API handler and an object cache mechanism; uses consistent hashing to decide the primary node
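The two-tier cache the slide describes can be pictured with a small sketch. This is a minimal illustration only - helper names such as fetch_from_storage are hypothetical, and the real gateway cache is implemented in Erlang inside LeoFS:

```python
# Minimal sketch of a two-tier (memory + disc) object cache in front of
# a storage cluster, in the spirit of the LeoFS gateway. All names here
# are hypothetical; the real gateway is implemented in Erlang.
import os
import hashlib

MEM_CACHE = {}                        # hot objects kept in RAM
DISC_CACHE_DIR = "/tmp/leofs-cache"   # colder objects spilled to disc

def cache_path(key: str) -> str:
    digest = hashlib.md5(key.encode()).hexdigest()
    return os.path.join(DISC_CACHE_DIR, digest)

def get(key: str, fetch_from_storage) -> bytes:
    # 1) memory cache hit: no I/O at all
    if key in MEM_CACHE:
        return MEM_CACHE[key]
    # 2) disc cache hit: a local read still avoids the network round trip
    path = cache_path(key)
    if os.path.exists(path):
        with open(path, "rb") as f:
            body = f.read()
        MEM_CACHE[key] = body
        return body
    # 3) miss: ask the storage cluster, then populate both tiers
    body = fetch_from_storage(key)
    MEM_CACHE[key] = body
    os.makedirs(DISC_CACHE_DIR, exist_ok=True)
    with open(path, "wb") as f:
        f.write(body)
    return body
```

Serving a hit from memory or local disc skips the round trip to the storage cluster, which is what reduces Gateway-Storage network traffic in the benchmark later in this deck.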
  • 9. Storage
  • 10. LeoFS Overview - Storage: the storage cluster uses "consistent hashing" for data operations and for choosing the replica target node(s). The RING spans 2 ^ 128 (MD5); with # of replicas = 3, a key such as KEY = "bucket/leofs.key" is hashed (Hash = md5(Filename)) to locate the primary node and two secondaries in the P2P cluster. WRITE: automatic replication. READ: automatic, asynchronous repair of an inconsistent object
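As a rough illustration of the placement rule above, here is a minimal consistent-hash ring in Python. The vnode count of 128 per node and the node names are illustrative assumptions, not LeoFS's defaults:

```python
# Minimal sketch of replica placement on a consistent-hash ring: a
# 2^128 ring keyed by MD5, with the primary node at the first point
# clockwise of md5(key) and N-1 distinct successors as secondaries.
import bisect
import hashlib

RING_SIZE = 2 ** 128
N_REPLICAS = 3

def h(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

def build_ring(nodes, vnodes=128):
    # Each physical node owns many virtual points to even out load.
    return sorted((h(f"{node}:{i}"), node) for node in nodes for i in range(vnodes))

def replica_nodes(ring, key):
    # Walk clockwise from the key's hash, collecting distinct nodes:
    # the first is the primary, the rest are secondaries.
    points = [p for p, _ in ring]
    start = bisect.bisect(points, h(key) % RING_SIZE)
    picked = []
    for i in range(len(ring)):
        node = ring[(start + i) % len(ring)][1]
        if node not in picked:
            picked.append(node)
        if len(picked) == N_REPLICAS:
            break
    return picked

ring = build_ring(["storage-1", "storage-2", "storage-3", "storage-4", "storage-5"])
print(replica_nodes(ring, "bucket/leofs.key"))  # primary + 2 secondaries
```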
  • 11. LeoFS Overview - Storage: each LeoFS Storage node serves requests from the Gateway through a Storage Engine that consists of an Object Storage and a Metadata Storage, and includes a Replicator and a Recoverer for eventual consistency
  • 12. LeoFS Overview - Storage - Data Structure: robust and high performance. The Metadata Storage keeps <Metadata> per key: Offset (necessary for GC), Version, Timestamp, Checksum (for sync), KeySize, Custom Meta Size, and File Size (for retrieving an object). The Object Storage is an <Object Container>: a Super-block followed by needles (Needle-1, Needle-2, Needle-3, ...). Each <Needle> has a fixed-length Header (Checksum, KeySize, DataSize, Offset, Version, Timestamp, User-Meta Size), a variable-length Body (Key, User-Meta, Actual File), and a Footer (8B)
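A minimal sketch of packing one such needle, just to make the header / body / footer layout concrete. The field widths, field order, and CRC32 checksum here are assumptions for illustration, not LeoFS's actual on-disk format:

```python
# Minimal sketch of packing one "needle" (header + body + footer) for
# an append-only object container, following the layout on the slide.
import struct
import time
import zlib

def pack_needle(key: bytes, user_meta: bytes, data: bytes,
                offset: int, version: int = 1) -> bytes:
    checksum = zlib.crc32(data)        # checksum "for sync" (algorithm assumed)
    header = struct.pack(
        ">IHIQIQH",                    # big-endian; widths are assumptions
        checksum,                      # Checksum of the stored object
        len(key),                      # KeySize
        len(data),                     # DataSize (file size, for retrieval)
        offset,                        # Offset within the container (needed for GC)
        version,                       # Version
        int(time.time()),              # Timestamp
        len(user_meta),                # User-Meta Size (custom meta size)
    )
    body = key + user_meta + data      # variable-length part
    footer = struct.pack(">Q", checksum)  # 8-byte footer
    return header + body + footer
```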
  • 13. LeoFS Overview - Storage - Large Object Support: to equalize disk usage on every storage node and to realize high I/O efficiency and high availability, a large object is split into chunked objects (chunk-0, chunk-1, chunk-2, chunk-3, ...) during the WRITE operation. The original object's metadata records the original object name, original object size, and # of chunks. Every chunked object and its metadata are replicated in the cluster; see the sketch below
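A minimal sketch of the WRITE-side chunking, assuming a hypothetical 5 MB chunk size and chunk-naming scheme (the slide specifies neither):

```python
# Minimal sketch of large-object chunking: split a large object into
# fixed-size chunks plus a parent metadata record, so chunks spread
# across storage nodes and equalize disk usage. Chunk size and naming
# are illustrative assumptions.
CHUNK_SIZE = 5 * 1024 * 1024

def split_into_chunks(name: str, body: bytes):
    chunks = [body[i:i + CHUNK_SIZE] for i in range(0, len(body), CHUNK_SIZE)]
    # The parent metadata records the original name, size, and chunk
    # count, so a READ can fetch and reassemble the chunks in order.
    parent_meta = {"name": name, "size": len(body), "num_chunks": len(chunks)}
    named_chunks = [(f"{name}/chunk-{i}", c) for i, c in enumerate(chunks)]
    return parent_meta, named_chunks
```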
  • 14. Manager
  • 15. LeoFS Overview - Manager: the Manager(s) operate LeoFS - the Gateway(s) and the Storage cluster - and monitor the RING and node state ("RING monitor" and "node-state monitor") with commands such as status, suspend, resume, detach, whereis, ...
  • 16. Brief Benchmark Report Hokkaido, Japan
  • 17. Brief Benchmark Report - summary of the benchmark results: LeoFS kept stable performance throughout the benchmark; the bottleneck is disk I/O; the cache mechanism contributed to reducing network traffic between Gateway and Storage
  • 18. Brief Benchmark Report. 1st case: group of value ranges (HDD); Storage: 5, Gateway: 1, Manager: 2; R:W = 9:1 (source: https://github.com/leo-project/notes/tree/master/leofs/benchmark/leofs/20140605/tests/1m_r9w1_240min). 2nd case: group of value ranges (HDD); Storage: 5, Gateway: 1, Manager: 2; R:W = 8:2 (source: https://github.com/leo-project/notes/tree/master/leofs/benchmark/leofs/20140605/tests/1m_r8w2_120min)
  • 19. Brief Benchmark Report - server specs. Gateway: CPU: Intel(R) Xeon(R) X5650 @ 2.67GHz x 2 (12 cores / 24 threads); memory: 96GB; disk: HDD 240GB RAID0; network: 10G Ethernet. Storage x 5: CPU: Intel(R) Xeon(R) X5650 @ 2.67GHz x 2 (12 cores / 24 threads); memory: 96GB; disk: HDD 240GB RAID0 (system) + HDD 2TB RAID0 (data); network: 10G Ethernet
  • 20. Brief Benchmark Report - 1st case (HDD / R:W = 9:1). Environment: network: 10Gbps; OS: CentOS release 6.5 (Final); Erlang: OTP R16B03-1; LeoFS: v1.0.2. System consistency level: [ N:3, W:2, R:1, D:2 ]. Benchmark configuration: duration: 4.0h; R:W = 9:1; # of concurrent processes: 64; # of keys: 100,000. Value size distribution: 1,024-10,240 bytes: 24%; 10,241-102,400 bytes: 30%; 102,401-819,200 bytes: 30%; 819,201-1,572,864 bytes: 16%
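The runs themselves used a benchmark tool (see the linked benchmark notes); purely as an illustration of the value-size distribution above, a sampler could look like this:

```python
# Minimal sketch reproducing this benchmark's value-size distribution.
# This is only an illustration of the workload shape, not the tool
# that produced the published results.
import random

VALUE_SIZE_BUCKETS = [
    ((1_024, 10_240), 0.24),
    ((10_241, 102_400), 0.30),
    ((102_401, 819_200), 0.30),
    ((819_201, 1_572_864), 0.16),
]

def sample_value_size() -> int:
    ranges, weights = zip(*VALUE_SIZE_BUCKETS)
    lo, hi = random.choices(ranges, weights=weights, k=1)[0]
    return random.randint(lo, hi)

print(sample_value_size())  # e.g. 734_210
```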
  • 21. Brief Benchmark Report - 1st case (HDD / R:W = 9:1): [throughput and latency charts] about 1,500 ops/sec at roughly 50ms latency, with no errors (source: https://github.com/leo-project/notes/tree/master/leofs/benchmark/leofs/20140601/tests/1m_r9w1_240min)
  • 22. Brief Benchmark Report - 1st case / network traffic: [charts: rxbyt/s and txbyt/s for the gateway and storage-1 through storage-5 over the 4-hour run] the gateway peaks at roughly 7.0Gbps on a 10.0Gbps link and the storage nodes at roughly 5.0-6.0Gbps; a 60% annotation marks the Storage/Gateway traffic ratio
  • 23. Brief Benchmark Report - 1st case / memory and CPU: [charts: memory usage and 5-min CPU load for the gateway and storage-1 through storage-5] CPU load stays around 1.0 throughout the run
  • 24. Brief Benchmark Report - 2nd case (HDD / R:W = 8:2). Environment: network: 10Gbps; OS: CentOS release 6.5 (Final); Erlang: OTP R16B03-1; LeoFS: v1.0.2. System consistency level: [ N:3, W:2, R:1, D:2 ]. Benchmark configuration: duration: 2.0h; R:W = 8:2; # of concurrent processes: 64; # of keys: 100,000. Value size distribution: 1,024-10,240 bytes: 24%; 10,241-102,400 bytes: 30%; 102,401-819,200 bytes: 30%; 819,201-1,572,864 bytes: 16%
  • 25. Brief Benchmark Report - 2nd case (HDD / R:W = 8:2): [throughput and latency charts] about 1,000 ops/sec at roughly 60-70ms and 80-90ms latency, with no errors
  • 26. Compare 1st case with 2nd case
  • 27. Brief Benchmark Report - network traffic, 1st case vs. 2nd case: [charts: rxbyt/s and txbyt/s for the gateway and storage-1 through storage-5 in each case] both cases peak around 6.0-7.0Gbps, with the 2nd case about 0.7Gbps lower
  • 28. Brief Benchmark Report - disk util%, 1st case vs. 2nd case: [charts: disk utilization for storage-1 through storage-5 in each case] disk utilization in the 2nd case is about 1.8x higher than in the 1st case
  • 29. Brief Benchmark Report - CPU load (5min), 1st case vs. 2nd case: [charts: 5-min CPU load in each case, around a 1.0 baseline] CPU load in the 2nd case is about 1.6x higher than in the 1st case
  • 30. Brief Benchmark Report - conclusion: LeoFS kept stable performance throughout the benchmark; the bottleneck is disk I/O; the cache mechanism contributed to reducing network traffic between Gateway and Storage
  • 31. Multi Data Center Replication Hokkaido, Japan
  • 32. Multi Data Center Replication - Tokyo, US, Europe, Singapore: HIGH Scalability + HIGH Availability, easy operation for admins, no SPOF, no performance degradation
  • 33. Multi Data Center Replication - designed to be as simple as possible: 1. Easy operation to build multiple clusters. 2. Asynchronous data replication between clusters: stacked data is transferred to remote cluster(s). 3. Eventual consistency
  • 34. Multi Data Center Replication - preparing MDC replication: executing "join cluster DC-2 and DC-3" on the Manager console connects DC-1 [# of replicas: 3] to DC-2 and DC-3 [# of replicas: 1 each] over leo_rpc; the manager clusters monitor and replicate each other's "RING" and "System Configuration" across the "Leo Storage Platform"
  • 35. Multi Data Center Replication - stacking objects: applications send requests to their target region (DC-1, # of replicas: 3), and objects bound for the remote clusters are temporarily stacked into containers. One container's capacity is 32MB (the default; an optional value can be set), and when a container is full it is sent to the remote cluster(s) over leo_rpc
  • 36. Multi Data Center Replication - transferring stacked objects: each object is stacked together with its metadata, the stacked container is compressed with LZ4, and the objects are then replicated to DC-2 and DC-3 over leo_rpc; see the sketch below
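Slides 35-36 together describe a stack-and-forward pipeline: buffer writes locally, and once a container reaches 32MB, compress the batch and ship it. A minimal sketch, assuming a hypothetical send_to_cluster callback and the third-party lz4 Python package; LeoFS itself does this in Erlang over leo_rpc:

```python
# Minimal sketch of stack-and-forward MDC replication: append objects
# (with their metadata) to a local container, and when it reaches the
# 32MB capacity, LZ4-compress it and send it to each remote cluster.
import lz4.frame  # pip install lz4

CONTAINER_CAPACITY = 32 * 1024 * 1024  # default; configurable in LeoFS

class Stacker:
    def __init__(self, remote_clusters, send_to_cluster):
        self.remote_clusters = remote_clusters
        self.send = send_to_cluster       # callable(cluster, payload); assumed
        self.buf = bytearray()

    def stack(self, obj_with_metadata: bytes):
        self.buf += obj_with_metadata
        if len(self.buf) >= CONTAINER_CAPACITY:
            self.flush()

    def flush(self):
        payload = lz4.frame.compress(bytes(self.buf))
        for cluster in self.remote_clusters:
            self.send(cluster, payload)   # asynchronous in the real system
        self.buf = bytearray()
```

Batching and compressing before transfer is what keeps cross-DC replication from degrading foreground performance: remote clusters see large, infrequent, compressed transfers instead of one RPC per write.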
  • 37. Multi Data Center Replication - investigating stored objects: 1) receive metadata of the stored objects, 2) compare them at the local cluster, 3) fix inconsistent objects - all over leo_rpc
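A minimal sketch of that three-step check, with read_local_metadata and fetch_object as hypothetical stand-ins for the real cluster-local operations:

```python
# Minimal sketch of the consistency check: the receiving cluster gets
# metadata (key, checksum) for the transferred objects, compares each
# entry against its local copy, and re-fetches anything missing or stale.
def investigate(received_metadata, read_local_metadata, fetch_object):
    inconsistent = []
    for meta in received_metadata:            # 1) receive metadata
        local = read_local_metadata(meta["key"])
        if local is None or local["checksum"] != meta["checksum"]:
            inconsistent.append(meta["key"])  # 2) compare at the local cluster
    for key in inconsistent:                  # 3) fix inconsistent objects
        fetch_object(key)
    return inconsistent
```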
  • 38. LeoFS Administration at Rakuten - presented by Hiroki Matsue, Rakuten Software Engineer. Tokyo, Japan / Kyoto, Japan
  • 39. LeoFS Administration at Rakuten: Storage Platform; File Sharing Service; others - portal site, photo storage, backend storage for OpenStack
  • 40. Storage Platform
  • 41. Storage Platform - scaling the storage platform (Movie): reduce costs, high reliability, easy to scale, S3-API
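Because the platform speaks the S3-API, any standard S3 client can talk to a LeoFS gateway. A minimal sketch with boto3, where the endpoint URL, bucket name, and credentials are placeholders (real values come from your LeoFS manager's user and bucket setup):

```python
# Minimal sketch of talking to a LeoFS gateway through its S3-API with
# boto3. Endpoint, bucket, and credentials below are placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://leofs-gateway.example.com:8080",  # gateway, not AWS
    aws_access_key_id="YOUR_ACCESS_KEY",
    aws_secret_access_key="YOUR_SECRET_KEY",
)

s3.put_object(Bucket="photos", Key="2014/lion.jpg", Body=b"...")
obj = s3.get_object(Bucket="photos", Key="2014/lion.jpg")
print(obj["ContentLength"])
```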
  • 42. Storage Platform - scaling the storage platform: used by various services - e-commerce, blog, insurance, calendar, recruiting, review, photo share, portal & contents, bookmark, movie. Total usage: 450TB; # of files: 600 million; daily growth: 100GB; daily requests: 13 million
  • 43. Storage Platform - system layout: requests from web applications / browsers (HTTP over S3-API) pass through a load balancer / cache servers to Gateway x 4 and Storage x 14, with Manager x 2 (Erlang RPC) and a monitor GUI console (TCP/IP, SNMP). Total disk space: 600TB; number of files: 600 million; access stats: 800Mbps (max) / 400Mbps (avg)
  • 44. Storage Platform - monitor: Ganglia and Nagios agents run on Gateway x 4, Storage x 14, and Manager x 2 - status collection (Ganglia), status checks (Nagios), port + threshold checks, and mail alerts
  • 45. Storage Platform - spreading globally: covering all services with Multi DC Replication
  • 46. File Sharing Service: LeoFS + ownCloud (https://owncloud.com/)
  • 47. File Sharing Service - required targets: reduce costs, handle confidential files, store large files, scale easily
  • 48. File Sharing Service - usage: share docs and movies with group companies - over 20 companies, over 10 countries, over 4,000 users, over 10,000 teams
  • 49. File Sharing Service - system layout: a web GUI file browser in front of LeoFS (Manager x 2, Erlang RPC; monitor GUI console over TCP/IP, SNMP), with LDAP to authenticate users, configuration management, and a KVS to manage login sessions
  • 50. File Sharing Service - future plans: cover 25 countries/regions, over 20,000 users
  • 51. Empowering the services and the users through cloud storage
  • 52. Future Plans Tokyo, Japan / Hokkaido, Japan
  • 53. Future Plans - "NFS" support, and Data-HUB: centralize unstructured data in LeoFS - photo storage, logs / event data, and many other kinds of data - feeding data loading, search / analysis, data analysis, stream processing, and PaaS / IaaS
  • 54. Future Plans - LeoFS + LeoInsight: SavannaDB for statistics data. LeoInsight retrieves metrics and stats from SavannaDB's agents on the Gateway, the Storage cluster, and the Manager, operates LeoFS over REST-API (JSON), and notifies a message when the number of requests exceeds a threshold
  • 55. Set Sail for "Cloud Storage" - Website: leo-project.net / Twitter: @LeoFastStorage / Facebook: www.facebook.com/org.leofs