Colony for-openstack-summit

  1. Inter-cloud object storage: Colony. 15/Oct/2012, NTT DATA INTELLILINK, Motonobu Ichimura (@famao)
      Copyright © 2012 NTT DATA Corporation
  2. EtherPad: http://etherpad.openstack.org/grizzly-colony
      Copyright © 2012 NTT DATA INTELLILINK Corporation
  3. Agenda
      • What is Colony?
        – Our goal
        – Use case
      • How to make Swift network (or region) aware
        – Problems with the original Swift code
        – Our modification
        – Investigation
        – Conclusion
      • Future plan
        – Problems to tackle (and being tackled)
        – Collaboration
  4. What is Colony?
  5. Goal: academic community cloud
      [Diagram: an academic community cloud in which education clouds, research clouds and university clouds (Univ.-A, Univ.-B, ..., Univ.-X) are linked by intercloud services over the Science Information Network.]
  6. Intercloud object storage service
      Colony federates cloud object storage services, like Swift, to achieve an intercloud object storage service.
      [Diagram: several clouds, each running Nova, Glance and Swift; each cloud has a Swift for local use and a Swift for intercloud use.]
  7. Users' points of view
      [Diagram: Cloud-A and Cloud-B each keep their own containers in a local Swift (Swift-A holds containers A1 to A3, Swift-B holds containers B1 to B3), and both clouds also see the same inter-cloud containers (I1, I4, ...) stored in a geographically distributed Swift-I. Caption: Inter-cloud object storage service: Colony.]
  8. Colony achieves the federation
      The user authenticates with a Shibboleth IdP, and Colony provides seamless access to multiple Swifts.
      [Diagram: a Cloud-A user authenticates with the Shibboleth IdP; Colony (Apache with mod_wsgi and mod_shib, Colony-Horizon, Colony-Keystone, Colony-Dispatcher, Squid and slapd on Ubuntu) sits in front of Swift-I and Swift-A, each of which has its own Colony-Keystone and slapd.]
  9. Use case
      We plan to use Colony as:
      • Object storage for cloud-to-cloud migration
      • Object storage to deliver VM images around Japan
      • Object storage to store big data
  10. Developed software components in Colony
      • Colony-Horizon: based on diablo/stable Horizon, with some enhancements
        – Multi-region support: users can choose which Swift is used to store/retrieve objects
        – Swift container ACL and metadata support
        – Swift object metadata support
        – Upload support for objects larger than 5 GB (segmented upload)
        – …
      • Colony-Keystone: based on diablo/stable Keystone, with some enhancements
        – Authentication with Shibboleth
        – %{tenant_name} can be used in endpointTemplates, in addition to %{tenant_id}, to federate cloud services
      • Colony-Dispatcher (new)
        – Relays requests to multiple object storage services (and merges the responses for clients)
        – Relays requests to a specific object storage service indicated by the URI
        – Chooses the "nearest" swift-proxy server to relay requests to
        – Copies objects among different Swifts
      • Utilities (new)
        – Tools to simplify the admin tasks needed to federate object storage services
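To make the endpointTemplates point concrete, here is a minimal Python sketch of placeholder expansion. It assumes the `%{tenant_id}` / `%{tenant_name}` syntax exactly as written on the slide; `expand_endpoint` and the example URLs are hypothetical, not Colony-Keystone code. It only illustrates why a name-based placeholder can help federation: a tenant can presumably share one name across clouds even when each cloud's Keystone assigns it a different ID.

```python
import re

# Placeholder syntax assumed from the slide: %{tenant_id} and %{tenant_name}.
_PLACEHOLDER = re.compile(r"%\{(tenant_id|tenant_name)\}")


def expand_endpoint(template: str, tenant_id: str, tenant_name: str) -> str:
    """Substitute tenant placeholders in an endpoint template (illustrative only)."""
    values = {"tenant_id": tenant_id, "tenant_name": tenant_name}
    return _PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


if __name__ == "__main__":
    # Hypothetical endpoints: the same tenant name resolves to a path in two
    # clouds, even though each cloud gave the tenant a different ID.
    print(expand_endpoint("https://swift-a.example.org/v1/AUTH_%{tenant_name}", "7f3c", "univ-x"))
    print(expand_endpoint("https://swift-i.example.org/v1/AUTH_%{tenant_name}", "a912", "univ-x"))
```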
  11. Colony-Horizon: users can choose the Swift (Swift-I or Swift-A).
      [Screenshot: Swift selector in Colony-Horizon]
  12. Colony-Keystone
      Modifications to Keystone:
      • Add an ePPN (eduPersonPrincipalName) field to the Keystone schema
      • Add REST API services to create a token by ePPN (/token_by/eppn) and by email address (/token_by/email)
      • Add a REST API service to register/update an ePPN (/users/{user_id}/eppn)
      Authentication flow:
      0-1. User registration by mail address (mail_addr)
      0-2. Association of the ePPN with mail_addr on initial access
      1. The user sends ID/password to the Shibboleth IdP
      2. The IdP releases the attributes ePPN and mail_addr to the Shibboleth SP (Colony-Horizon)
      3. Colony-Horizon passes the ePPN attribute to Colony-Keystone
      4. Colony-Keystone returns an auth_token
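The data relationships behind these services can be sketched in memory as below, assuming one ePPN per user. `EppnDirectory` and its methods are hypothetical stand-ins for the REST services listed above (/users/{user_id}/eppn, /token_by/eppn, /token_by/email); the real Colony-Keystone stores users in its own schema and issues real Keystone tokens.

```python
import secrets
from typing import Dict, Optional


class EppnDirectory:
    """Hypothetical in-memory model of ePPN-based token issuance."""

    def __init__(self) -> None:
        self.users_by_email: Dict[str, str] = {}  # mail_addr -> user_id
        self.eppn_of_user: Dict[str, str] = {}    # user_id -> ePPN
        self.tokens: Dict[str, str] = {}          # token -> user_id

    def register_user(self, user_id: str, mail_addr: str) -> None:
        # Step 0-1: user registration by mail address.
        self.users_by_email[mail_addr] = user_id

    def associate_eppn(self, mail_addr: str, eppn: str) -> None:
        # Step 0-2: bind the ePPN to the registered user on initial access
        # (modelled on the /users/{user_id}/eppn service).
        user_id = self.users_by_email[mail_addr]
        self.eppn_of_user[user_id] = eppn

    def _issue(self, user_id: str) -> str:
        token = secrets.token_hex(16)
        self.tokens[token] = user_id
        return token

    def token_by_eppn(self, eppn: str) -> Optional[str]:
        # Steps 3-4: the SP passes the ePPN, and a token comes back.
        for user_id, known in self.eppn_of_user.items():
            if known == eppn:
                return self._issue(user_id)
        return None

    def token_by_email(self, mail_addr: str) -> Optional[str]:
        user_id = self.users_by_email.get(mail_addr)
        return self._issue(user_id) if user_id is not None else None
```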
  13. Colony-Dispatcher
      1. A Swift client can send requests to Swift-A and Swift-I through the Swift Dispatcher.
      2. The Swift Dispatcher merges the responses from each Swift and sends the result to the Swift client.
      • Requests modified for merging responses: account info, container list, X-Copy-From/To
      • Colony Dispatcher uses a prefix to indicate which Swift stores each container: A:container1 and A:container2 live in Swift-A (local), I:container1 and I:container2 in Swift-I (intercloud).
      [Diagram: Swift client to Colony Dispatcher to the Swift proxies of Swift-A (local) and Swift-I (intercloud).]
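The prefix scheme can be sketched as below. This is an illustrative assumption about how a dispatcher could merge container listings and route a prefixed name back to one Swift, not Colony-Dispatcher's actual code; `merge_container_lists` and `route` are hypothetical names, and the one-letter prefixes follow the slide's example.

```python
from typing import Dict, List, Tuple


def merge_container_lists(listings: Dict[str, List[str]]) -> List[str]:
    """Merge per-cluster container listings, tagging each name with its prefix.

    `listings` maps a cluster prefix (e.g. "A" for the local Swift, "I" for the
    intercloud Swift) to the container names that cluster returned.
    """
    merged: List[str] = []
    for prefix in sorted(listings):
        merged.extend(f"{prefix}:{name}" for name in listings[prefix])
    return merged


def route(prefixed_name: str) -> Tuple[str, str]:
    """Split "I:container1" into ("I", "container1") to pick the target Swift."""
    prefix, sep, name = prefixed_name.partition(":")
    if not sep or not name:
        raise ValueError(f"no cluster prefix in {prefixed_name!r}")
    return prefix, name
```

Splitting on the first colon only keeps container names that themselves contain a colon intact.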
  14. Caching
      The Swift Dispatcher can use a cache proxy (like Squid) per Swift proxy to retrieve objects from remote Swifts.
      [Diagram: the Swift client talks to the Colony Dispatcher, which reaches Swift-A (local) and, through a cache proxy, Swift-I (intercloud); containers A:container1, A:container2, I:container1 and I:container2.]
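A read-through cache in front of remote fetches can be sketched as follows. It stands in for the per-proxy Squid instances on the slide; `ReadThroughCache`, the `fetch` callable and the count-based LRU eviction are assumptions made for illustration (Squid has its own replacement policies and HTTP cache semantics).

```python
from collections import OrderedDict
from typing import Callable, Tuple

Key = Tuple[str, str]  # (cluster prefix, object path)


class ReadThroughCache:
    """Hypothetical LRU read-through cache keyed by (cluster, path)."""

    def __init__(self, fetch: Callable[[str, str], bytes], capacity: int = 128) -> None:
        self.fetch = fetch          # performs the real (slow) remote GET
        self.capacity = capacity    # maximum number of cached objects
        self._store: "OrderedDict[Key, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, cluster: str, path: str) -> bytes:
        key = (cluster, path)
        if key in self._store:
            self._store.move_to_end(key)      # mark as most recently used
            self.hits += 1
            return self._store[key]
        self.misses += 1
        body = self.fetch(cluster, path)      # go to the remote Swift proxy
        self._store[key] = body
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)   # evict the least recently used
        return body
```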
  15. How to make Swift network aware
  16. Current implementation
  17. Problems with the original Swift code
      • PUT/GET performance
        – The Swift proxy waits until the object has been written to all storage servers.
        – The Swift proxy randomly chooses the node to retrieve an object from.
  18. Test environment
      • Servers (type 1): CPU Intel(R) Xeon(R) CPU E7-8870 (40 cores); memory 126 GB; NIC 1000baseT/Full x2
      • Servers (type 2): CPU AMD Opteron 6128 2000 MHz (16 cores); memory 32 GB; NIC 10000baseT/Full x2
      • Network (diagram labels): 900MBps (0.1 ms); Sapporo-Tokyo link 200-850 Mbps (18 ms); Tokyo 9900MBps
  19. PUT operation
      An object PUT is always limited by the worst case (the slowest replica write).
      [Diagram: a client and proxy on the Tokyo side; the proxy writes replicas to storage servers in both Tokyo and Sapporo.]
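A toy model, not a measurement, shows why: if the proxy waits for every replica, the PUT finishes only when the slowest replica write finishes. `put_time` and the bandwidth figures below are hypothetical.

```python
from typing import Sequence


def put_time(size_bytes: float, replica_bandwidths: Sequence[float]) -> float:
    """Seconds to PUT one object when the proxy waits for all replicas.

    replica_bandwidths: bytes/sec available towards each replica's storage node.
    The slowest replica (the worst case) determines the total time.
    """
    return max(size_bytes / bw for bw in replica_bandwidths)


if __name__ == "__main__":
    # Hypothetical: two local replicas at 100 MB/s, one remote replica at 25 MB/s.
    print(put_time(100_000_000, [100e6, 100e6, 25e6]))
```

With these made-up numbers, the single remote replica alone sets the PUT time to 4 seconds, four times what the local replicas need.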
  20. Object locations (number of replicas at each site)
      Name    @Tokyo  @Sapporo
      1K           1         2
      1M           2         1
      10M          1         2
      100M         2         1
      1G           2         1
  21. PUT object throughput @Tokyo (bytes/sec)
      Size              1            2            3            4            5
      1K            4,857        5,596        2,384          405        7,844
      1M        1,109,196    1,161,519    1,157,529    1,092,685    1,162,359
      10M       2,052,541    1,935,695    2,066,010    2,065,412    2,068,340
      100M      9,425,346    9,411,894    9,441,722    9,427,770    9,432,213
      1G       47,020,441   47,032,115   47,667,067   47,083,438   47,852,594
  22. GET operation
      The proxy retrieves the object from one replica chosen at random, so each replica is picked with probability 1/replicas.
      [Diagram: storage servers in Sapporo and Tokyo, a proxy and a client; the links within each site are labeled high-bandwidth, low-latency.]
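A companion toy model for GET, using the same hypothetical bandwidth figures as above: with a uniform random replica choice, the expected time is the average over replicas, so remote replicas drag the average down even when a local copy exists. `expected_get_time` and `sample_get_time` are illustrative names, not Swift functions.

```python
import random
from typing import Sequence


def expected_get_time(size_bytes: float, replica_bandwidths: Sequence[float]) -> float:
    """Mean GET time when each replica is picked with probability 1/replicas."""
    n = len(replica_bandwidths)
    return sum(size_bytes / bw for bw in replica_bandwidths) / n


def sample_get_time(size_bytes: float, replica_bandwidths: Sequence[float],
                    rng: random.Random) -> float:
    """One simulated GET: the proxy picks a replica uniformly at random."""
    return size_bytes / rng.choice(replica_bandwidths)
```

For example, one local replica at 100 MB/s and two remote ones at 25 MB/s give an expected 3 seconds for 100 MB, against 1 second if the local copy were always read first.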
  23. Object locations (number of replicas at each site)
      Name          @Tokyo  @Sapporo
      1K                 1         2
      1M                 2         1
      10M                1         2
      100M               2         1
      1G                 2         1
      1.txt (1G)         3         0
      5.txt (1G)         0         3
  24. GET object throughput @Tokyo (bytes/sec)
      Size                            1            2            3            4            5
      1K                          8,859        8,165        8,225       11,455       11,504
      1M                      1,222,259    1,172,193    1,149,629    1,148,493   49,542,924
      10M                    96,848,249   97,777,529    2,098,071  100,899,319   99,814,948
      100M                  104,857,600    9,670,414    9,672,893    9,658,095    9,657,313
      1G                    117,490,592  115,273,333   51,117,116   51,109,464   51,099,616
      1.txt (worst case)     51,085,780   44,245,222   50,812,419   50,923,435   51,066,880
      5.txt (best case)     117,473,740  115,216,645  115,340,248  115,288,545  114,347,285
      Performance degrades because of the network between Sapporo and Tokyo.
  25. Our modification
  26. How to solve it: basic idea
      • Limitations
        – Don't modify data structures (including the ring)
        – Minimize customization
      • Add some rules to the ring's data structure
        – Zone information is treated as a decimal number, so the difference between zone A and zone B represents the distance between zone A and zone B
      • Add zone hints to the Swift proxy servers
      • Change the order of nodes used by the proxy server
  27. How to solve it
      Proxy configuration:
          [app:proxy-server]
          nearby_mode = false
          own_zone = 100
          near_distance = 10
      • Sapporo (storage zones 200-202): a proxy with zone info 200 and zone distance 10 considers storage servers in zones 200 to 210 to be located near it.
      • Tokyo (storage zones 100-102): a proxy with zone info 100 and zone distance 10 considers storage servers in zones 100 to 110 to be located near it.
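The proxy settings and the nearby rule can be sketched in Python with the standard `configparser` module. `load_nearby_settings` and `is_nearby` are hypothetical helpers, and `nearby_mode` is set to `true` in the sample to enable the feature (the slide shows `false`). Note that the condition used by `isnearby()` in the ring.py code on slide 32 is `own_zone <= zone < own_zone + near_distance`, so zone 110 itself falls outside a Tokyo proxy's near range.

```python
import configparser
from typing import Tuple

SAMPLE_CONF = """
[app:proxy-server]
nearby_mode = true
own_zone = 100
near_distance = 10
"""


def load_nearby_settings(text: str) -> Tuple[bool, int, int]:
    """Return (nearby_mode, own_zone, near_distance) from a proxy config."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["app:proxy-server"]
    return (section.getboolean("nearby_mode", fallback=False),
            section.getint("own_zone"),
            section.getint("near_distance"))


def is_nearby(own_zone: int, zone: int, near_distance: int) -> bool:
    # Same test as isnearby() in the ring.py patch: a half-open zone range.
    return own_zone <= zone < own_zone + near_distance
```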
  28. PUT operation
      The proxy initially puts objects to the nearest storage servers, using the zone information and zone distance. The object replicator then asynchronously replicates them to their proper positions.
      [Diagram: a client and a proxy in Tokyo (zone_info: 100, zone_distance: 10) write to Tokyo storage servers A, B and C rather than to the Sapporo storage servers.]
  29. PUT operation
      This is the same situation as when all storage servers located in Sapporo are broken: the writes land on Tokyo servers through hinted handoff.
      [Diagram: Sapporo storage servers D, E and F marked as failed (×); the proxy writes to Tokyo storage servers A, B and C via hinted handoff.]
  30. GET operation
      1. First, try to retrieve the object from a storage server near the proxy.
      2. After that, try to retrieve the object from the storage servers indicated as the primary zones.
      [Diagram: storage servers in Sapporo and Tokyo; the proxy and client are in Tokyo.]
  31. DELETE operation
      1. First, try to delete the object from a storage server near the proxy.
      2. After that, try to delete the object from the storage servers indicated as the primary zones.
      [Diagram: storage servers in Sapporo and Tokyo; the proxy and client are in Tokyo.]
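The GET/DELETE order on slides 30 and 31 can be sketched as follows. Near nodes (which may include handoff nodes holding a freshly written copy) come first, followed by primary nodes in zones not already covered, and the proxy falls through to the next node on failure. This mirrors the list reordering in the proxy/server.py change on slide 32, but `read_order`, `first_success` and the use of `OSError` to model a failed node are assumptions for illustration.

```python
from typing import Callable, Dict, List

Node = Dict[str, int]


def read_order(near_nodes: List[Node], primary_nodes: List[Node]) -> List[Node]:
    """Near nodes first, then primaries whose zone is not already covered."""
    near_zones = {n["zone"] for n in near_nodes}
    return near_nodes + [n for n in primary_nodes if n["zone"] not in near_zones]


def first_success(nodes: List[Node], request: Callable[[Node], bytes]) -> bytes:
    """Try each node in order and return the first successful response."""
    for node in nodes:
        try:
            return request(node)
        except OSError:
            continue  # stands in for a failed request (e.g. not found, timeout)
    raise OSError("no node could serve the request")
```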
  32. Code
      Add get_near_nodes() to ring.py:

          def get_near_nodes(self, account, container, obj, own_zone, near_distance):
              """
              Get the partition and nodes, same as get_nodes, but keep only
              nodes whose zone is near the proxy.

              :param account: account name
              :param container: container name
              :param obj: object name
              :param own_zone: first zone number of the proxy's own area
              :param near_distance: zones from own_zone up to (but not
                  including) own_zone + near_distance are treated as near
              :returns: a tuple of (partition, list of node dicts)
              """
              part, nodes = self.get_nodes(account, container, obj)

              def isnearby(one, other, distance):
                  return one <= other < one + distance

              near_nodes = []
              for node in nodes:
                  if isnearby(own_zone, node['zone'], near_distance):
                      near_nodes.append(node)
              # Top up from handoff nodes only while fewer than
              # replica_count near nodes have been found.
              if len(near_nodes) < self.replica_count:
                  for node in self.get_more_nodes(part):
                      if isnearby(own_zone, node['zone'], near_distance):
                          near_nodes.append(node)
                      if len(near_nodes) >= self.replica_count:
                          break
              return part, near_nodes

      Then modify proxy/server.py to use get_near_nodes() in each method, for example POST:

          @@ -1044,6 +1056,14 @@ def POST(self, req):
                   container_partition, containers, _junk, req.acl, _junk = \
                       self.container_info(self.account_name, self.container_name,
                           account_autocreate=self.app.account_autocreate)
          +        if self.app.nearby_mode:
          +            partition, near_nodes = self.app.object_ring.get_near_nodes(
          +                self.account_name, self.container_name, self.object_name,
          +                self.app.own_zone, self.app.near_distance)
          +            print "before nodes: %s" % containers
          +            containers = near_nodes + \
          +                [cont for cont in containers if cont['zone'] not in [c['zone'] for c in near_nodes]]
          +            print "after nodes: %s" % containers
                   if 'swift.authorize' in req.environ:
                       aresp = req.environ['swift.authorize'](req)
                       if aresp:
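To exercise the ring.py logic outside Swift, the sketch below runs the same selection against a toy ring. `ToyRing` is a hypothetical stand-in for Swift's ring with one partition, a fixed primary list and a fixed handoff list. Only the `get_near_nodes` logic follows the slide, with the top-up loop running only while fewer than `replica_count` near nodes have been found.

```python
from typing import Dict, Iterator, List, Tuple

Node = Dict[str, int]


def isnearby(own_zone: int, zone: int, distance: int) -> bool:
    return own_zone <= zone < own_zone + distance


class ToyRing:
    """Hypothetical stand-in for a Swift ring: one partition, fixed node lists."""

    def __init__(self, primaries: List[Node], handoffs: List[Node]) -> None:
        self.primaries = primaries
        self.handoffs = handoffs
        self.replica_count = len(primaries)

    def get_nodes(self, account: str, container: str, obj: str) -> Tuple[int, List[Node]]:
        return 0, list(self.primaries)

    def get_more_nodes(self, part: int) -> Iterator[Node]:
        return iter(self.handoffs)

    def get_near_nodes(self, account: str, container: str, obj: str,
                       own_zone: int, near_distance: int) -> Tuple[int, List[Node]]:
        part, nodes = self.get_nodes(account, container, obj)
        # Keep primary nodes that are near the proxy.
        near_nodes = [n for n in nodes if isnearby(own_zone, n["zone"], near_distance)]
        # Fill the remaining slots with near handoff nodes.
        if len(near_nodes) < self.replica_count:
            for node in self.get_more_nodes(part):
                if isnearby(own_zone, node["zone"], near_distance):
                    near_nodes.append(node)
                    if len(near_nodes) >= self.replica_count:
                        break
        return part, near_nodes
```

With primaries in zones 100, 200 and 201, a Tokyo proxy (own_zone 100) keeps zone 100 and tops up from near handoff nodes, while a Sapporo proxy (own_zone 200) needs only one handoff.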
  33. Investigation
      [Charts: average PUT throughput (bytes/sec) at Sapporo and at Tokyo for object sizes 1K, 1M, 10M, 100M and 1G, comparing Original and Patched Swift.]
  34. Using a cache
      What about the case where all objects are located in remote areas?
      [Diagram: storage servers in Sapporo, Tokyo and Kyusyu, each area with its own proxy; the client's objects are stored only in remote areas.]
  35. 35. Colony-Dispatcher as a cacheColony-Dispatcher can be a swift-proxy-proxy withcache mechanism Copyright © 2012 NTT DATA INTELLILINK Corporation 35
  36. Investigation: cache effectiveness
      Using Colony-Dispatcher as a cache, retrieving objects from a remote area can perform well.
      [Charts: average GET throughput (bytes/sec) at Sapporo and at Tokyo for object sizes 1K, 1M, 10M, 100M and 1G.]
  37. Conclusion
      • Reordering the nodes by region in the proxy resolves the GET/PUT performance issues.
        – This feature can be implemented with minimal customization (fewer than 50 lines of code).
      • Using a cache is a good idea for inter-cloud use.
  38. Our future plan
  39. Problems to tackle
      • Object location
        – Adding region concepts to the ring structure might help
        – Primary nodes isolated by region
      • Replication performance
        – Key factor: we aggressively used the hinted-handoff mechanism
        – Using UDT instead of TCP for replication
        – Using pyinotify for I/O-event-driven replication
        – Separating the network used for replication
        – Hop-by-hop replication
  40. Are you interested in Colony?
      • Please contact me if you are interested in the Colony project.
        – We want to collaborate with people who want to use or develop Swift as an inter-cloud object store.
  41. Are you interested in academic clouds?
      • If you are interested in how to integrate clouds using dodai and Colony:
        – My colleague (Guan-san) will give a presentation about dodai (Cluster as a Service) at 17:20 in Manchester A.
        – Yokoyama-san (a member of NII) might talk about integrating Colony and dodai in a lightning talk (LT).
  42. Thank you.
  43. Q&A
      • Please phrase your question using simple grammar if possible.
  44. Copyright © 2011 NTT DATA Corporation
