Is It Time to Go Global with Cloud Performance Management?

Federated clouds have been under discussion for several years, yet in practice there is very little federation across large infrastructure providers. One of the main causes of this stagnation is an insufficient understanding of how to share responsibility across data centers, providers, and so on. This study shows that understanding cloud performance at such a large scale is a crucial part of information support in federated clouds. Cloud performance measurement and modeling, as well as several practical ongoing projects and works in progress, are also discussed.

Published in: Technology

  1. Mission Statement
     1. federated clouds = diversification
     2. many DCs and/or cloud providers
     3. we care mostly about performance
     4. practical solutions are needed
     Marat Zhanikeev -- maratishe@gmail.com Is It Time to Go Global with Cloud Performance Management? -- http://tinyurl.com/marat140328
  2. Example: BizStore
  3. BizStore: One DC is Not Enough
     • remember June 2013?
     • most services today use vertical integration -- no diversity
     • Hitachi does not share DCs with NEC
     • regional diversity of one provider is bad
       ◦ how many Amazon DCs in Japan?
     (The only possible) solution is to sign contracts with multiple DCs and manage on the client side
       ◦ to be officially presented/released in April [01]
     [01] myself+0 "High Availability Cloud Storage ... Social Graph ... Smart Distribution" NS研 (April 2014)
  4. BizStore: One DC is Not Enough
     [Figure: locations (Kansai, Kyushu, Okinawa) with data centers DC1 and DC2, the Osaka and Naha offices, and Employee A on a business trip, arranged by network distance over a storage network. The proposed software comprises Store APIs, a High Availability Data Store, and Content / Social Metadata.]
  5. BizStore: Store Diversification
     • in software: not a priority list -- optimization engine!
     • realtime performance monitoring, read/write optimization, etc.
     • sub-file data unit -- chunks
     [Figure: storage options at growing network distance from the user: SSD, HDD, network, DC1, DC2, ...]
  6. BizStore: Socially Aware Store
     • content relevance based on social graph
     • relevance is a distribution
     • individual redundancy based on distribution
     • other link types: same time, location, filetype, ...
     • link strength != 1
     [Figure: relevance distribution in descending order, marking redundancy (user setting), the physical limit of redundancy, and the end of content. Table labels: "There is a link", "Between", "When a file is ..." (created, viewed, edited, deleted).]
  7. Example: Cloud Streaming
  8. Cloud Streaming: Fixing Problems
     [Figure: comparison of traditional streaming, P2P streaming, cloud streaming, and adaptive streaming. Problems listed: congestion (Flash Crowds), unreliable throughput, unreliable sources; four entries are marked "Fixed".]
  9. Cloud Streaming: Design [02]
     [Figure: P2P streaming (service provider (SP), tracker, parent peers, client) next to cloud streaming (service provider (SP), VM population, current sources, client).]
     [02] myself+0 "Multi-Source Stream Aggregation in the Cloud" Wiley Book on ACDN, Chapter 10 (2014)
  10. Practical Solutions for Federated Clouds
  11. A Shortlist of (S)olutions
      1. S1: Nextgen traffic processors at DCs
      2. S2: QoS Context and Performance Visualization at DCs
      3. S3: Performance Modeling for Federated Clouds
      4. S4: Client Side Traffic Boosting
      5. ... definitely not a complete list
  12. Solution (S) 1: Nextgen Traffic Processors at DCs (work in progress)
  13. S1: Multicore Packet Capture
      • multicore is the key
      • multicore != traditional parallel processing [03]
      • on-demand capture, DPI, heterogeneous tasks [04]
      [Figure: global networks reach the data center internals through a gateway switch; a mirror feeds a capture manager that drives multiple CPUs and storage.]
      [03] myself+0 "...Multicore Capture in Data Center Forensics" ACM AISACCS-SFCS (June 2014)
      [04] myself+0 "A Lock-Free Shared Memory Design for ... Multicore Packet Traffic Capture" IJNM (in print)
  14. S1: Multicore Hates Memory Locks
      • lock-free design [04]: no messages, no memory locks
      [Figure: a manager and capture processes on Core 1 ... Core X, each with PF_RING, around a shared memory. Timeline labels for the manager (one thread): create, fork, assign, stale check; for a capture process: lifespan, process/wrap, wrap wait. Shared memory is organized as a double-linked list (DLL).]
      [04] myself+0 "A Lock-Free Shared Memory Design for ... Multicore Packet Traffic Capture" IJNM (in print)
  15. Solution (S) 2: DC Performance APIs
  16. S2: E2E QoS, M2M Patterns
      • clean slate: capture QoS context [05]
      • visualize user communities
      • export via APIs to users and/or service providers
      [Figure: a probe between users and clients exports over UDP to an analysis machine (meter, merger, per-flow statistics, analyzer with history and state, profiler), which feeds a web application.]
      [05] myself+0 "A holistic community-based architecture for measuring E2E QoS at data centres" IJCSE (in print)
  17. S2: Has to be a Clean Slate
      • has to be a clean slate!
      • cisco, ntop, sflow are not feasible
      • QoS context is something new
      • (figure is vector, so, zoom in!)
      [Figure, probe: sits at a router in the data center infrastructure. Packet fields (source IP, timestamp, source port, dest IP, dest port, protocol, packet size) are hashed with CRC24 into a key for a hash table with 2^24 slots (0 ... 2^24); entries #01 ... #07 are also chained in a DLL. Flows are exported over UDP or via a file.]
      [Figure, export record (32-bit rows): source port, dest port; source IP; destination IP; start time (s); start time (us); then data units. Each data unit holds psize (packet size) and pspace (packet space, us). In the UDP export the leading field of a data unit is marked "*"; in the file export it is D (direction, 0 or 1). Field widths shown: 1, 11.]
      [Figure, analysis: UDP RX buffer (5s); the merger finds the flow from the opposite direction; the analyzer reads and updates history and state, a ring buffer of data units per IP on internal networks.]
      Statistics (per source-dest pair):
      • MinOWD: global minimum OWD
      • MaxBatch: max byte count of a packet burst
      • Bulks: throughputs in flows
  18. S2: But Payoff is Great!
      [Plot: x-axis batch size (bytes), 0 to 19200; y-axis OWD (ms) + TX time (x0.1 ms), 0 to 4000.]
  19. Solution (S) 3: Cloud Weather System (work in progress)
  20. S3: Cloud Weather System
      • continents: user, services [07]
      • water: network
      • weather, clouds, etc.: changes in performance
      • droughts: insufficiency of infrastructure, users do not get enough capacity
      • typhoons: basically, Flash Crowds in services, going viral, ...
      • forecasting: possible with enough performance monitoring, similar to stock market
      [Figure labels: (high/low) pressure front, typhoon, drought, good weather, bad weather.]
      [07] myself+0 "Cloud Weather System as a Futuristic Performance Model" IEICE総合大会 (March 2013)
  21. Solution (S) 4: Mobile Throughput Boosters
  22. S4: Mobile Throughput Booster
      • so far, only possible in wireless -- WiFi Direct
      [Figure: matrix of single connection / multipath against singular connectivity / multiple connectivity. Cells: traditional applications, traditional multipath, no known cases (wasted potential), group communication. This proposal: 3G/LTE/* + WiFi Direct.]
  23. S4: Group Resource Pooling
      [Figure: a main client and delegated clients share local connectivity; the delegated clients reach the content provider over remote connectivity (3G/LTE/* access).]
  24. S4: Converged Wireless Campus
      [Figure: numbered steps on a campus. (1) the university develops and secures APP + CODE; (2) distributes APP + CODE and API tokens to a student and another student; (3) students meet and delegate; (4) tokens are passed at delegation.]
  25. Solution (S) 5: Over-the-Network Indexing
  26. S5: Indexing in Clouds
      [Figure: traditional design (data, indexer, index, network, client) next to the Stringex design (data, indexer, index; read and write over the network by the Stringex client).]
  27. S5: Over-the-Network Optimization
      • in short: throughput-centric network storage optimization [08]
      [Figure: Stringex index and Stringex client with a sync engine, optimization, and a local cache; steps: (1) check, (2) use.]
      [08] myself+0 "A New Practical Design for Browsable Over-the-Network Indexing" ISEEE (April 2014)
  28. S5: Performance
      [Plot: x-axis index size (log), 3.15 to 6.65; y-axis throughput (log of bytes/doc), 2.55 to 3.25; series: Lucene, Stringex.]
  29. That’s all, thank you ...
  30. References
      [01] myself+0 (April 2014) "High Availability Cloud Storage ... Social Graph ... Smart Distribution" NS研
      [02] myself+0 (2014) "Multi-Source Stream Aggregation in the Cloud" Wiley Book on ACDN, Chapter 10
      [03] myself+0 (June 2014) "...Multicore Capture in Data Center Forensics" ACM AISACCS-SFCS
      [04] myself+0 (in print) "A Lock-Free Shared Memory Design for ... Multicore Packet Traffic Capture" IJNM
      [05] myself+0 (in print) "A holistic community-based architecture for measuring E2E QoS at data centres" IJCSE
      [06] myself+0 (May 2014) "Towards a Practical Method for Interactive Traffic Visualizations in Data Centers" SC研
      [07] myself+0 (March 2013) "Cloud Weather System as a Futuristic Performance Model" IEICE総合大会
      [08] myself+0 (April 2014) "A New Practical Design for Browsable Over-the-Network Indexing" ISEEE
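
The "BizStore: Socially Aware Store" slide says that relevance is a distribution and that per-file redundancy follows it, bounded by a user setting and a physical limit. The slides do not give the mapping, so the following is only a minimal sketch under assumed rules: the function name `replicas_by_relevance`, the linear scaling, and the minimum of one copy are all hypothetical.

```python
from typing import Dict


def replicas_by_relevance(
    relevance: Dict[str, float], user_max: int, physical_limit: int
) -> Dict[str, int]:
    """Map each file's relevance score to a replica count.

    Assumed rule (not from the slides): the most relevant file gets the
    largest allowed number of copies, every file keeps at least one copy,
    and the rest scale linearly with relevance.
    """
    if not relevance:
        return {}
    # The cap is the smaller of the user setting and the physical limit
    # (for example, the number of contracted data centers).
    cap = max(1, min(user_max, physical_limit))
    top = max(relevance.values())
    plan: Dict[str, int] = {}
    # Walk files in descending order of relevance, as in the slide's figure.
    for name, score in sorted(relevance.items(), key=lambda kv: kv[1], reverse=True):
        share = score / top if top > 0 else 0.0
        plan[name] = max(1, round(share * cap))
    return plan


plan = replicas_by_relevance(
    {"report.doc": 0.9, "photo.jpg": 0.3, "old.log": 0.05},
    user_max=4,
    physical_limit=3,
)
```

Here the user asks for up to four copies but only three data centers are assumed to exist, so the most relevant file is capped at three copies and the least relevant one still keeps a single copy.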

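The "S2: Has to be a Clean Slate" slide shows a probe that hashes packet fields with CRC24 into a table key and keeps flows in a double-linked list, recording a packet size (psize) and packet space (pspace, in microseconds) per data unit. The sketch below illustrates that idea only. It assumes the OpenPGP CRC-24 parameters from RFC 4880, hashes the 5-tuple alone, uses an `OrderedDict` in place of the DLL, ignores hash collisions, and invents the `stale_after` timeout; none of these details are confirmed by the slides.

```python
import socket
import struct
from collections import OrderedDict


def crc24(data: bytes) -> int:
    """CRC-24 with the OpenPGP parameters from RFC 4880 (an assumption)."""
    crc = 0xB704CE
    for byte in data:
        crc ^= byte << 16
        for _ in range(8):
            crc <<= 1
            if crc & 0x1000000:
                crc ^= 0x1864CFB
    return crc & 0xFFFFFF


class FlowTable:
    """Flow records keyed by a 24-bit hash, ordered by last update."""

    def __init__(self, stale_after: float = 5.0):
        # OrderedDict stands in for hash table + DLL: oldest flow first.
        self.flows = OrderedDict()
        self.stale_after = stale_after  # hypothetical idle timeout, seconds

    @staticmethod
    def key(src_ip, src_port, dst_ip, dst_port, proto) -> int:
        packed = (
            socket.inet_aton(src_ip)
            + socket.inet_aton(dst_ip)
            + struct.pack("!HHB", src_port, dst_port, proto)
        )
        return crc24(packed)

    def add_packet(self, ts, src_ip, src_port, dst_ip, dst_port, proto, size) -> int:
        k = self.key(src_ip, src_port, dst_ip, dst_port, proto)
        flow = self.flows.get(k)
        if flow is None:
            flow = {"start": ts, "last": ts, "units": []}
            self.flows[k] = flow
        # One data unit per packet: (psize in bytes, pspace in microseconds).
        pspace_us = int(round((ts - flow["last"]) * 1_000_000))
        flow["units"].append((size, pspace_us))
        flow["last"] = ts
        self.flows.move_to_end(k)  # most recently updated flow goes last
        return k

    def export_stale(self, now):
        """Remove and return flows idle for at least stale_after seconds."""
        out = []
        while self.flows:
            k = next(iter(self.flows))
            if now - self.flows[k]["last"] < self.stale_after:
                break
            out.append(self.flows.popitem(last=False))
        return out


table = FlowTable(stale_after=5.0)
k1 = table.add_packet(0.0, "10.0.0.1", 1234, "10.0.0.2", 80, 6, 100)
k2 = table.add_packet(0.001, "10.0.0.1", 1234, "10.0.0.2", 80, 6, 200)
```

Because flows are kept in update order, the stale check only ever inspects the head of the list, which is the property a DLL gives the probe in the slide's design.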
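
The "S4: Group Resource Pooling" slide shows a main client that delegates part of a download to nearby clients, each with its own 3G/LTE access. The slides do not say how the work is divided, so this is a sketch under an assumed policy: byte ranges are split in proportion to each client's measured throughput. The function name `split_ranges` and the client names are hypothetical.

```python
from typing import Dict, List, Tuple


def split_ranges(
    total_bytes: int, throughput: Dict[str, float]
) -> List[Tuple[str, int, int]]:
    """Split [0, total_bytes) into contiguous ranges, one per client.

    Each range is (client, start, end) with end exclusive. Range sizes are
    proportional to throughput; the last client absorbs rounding leftovers.
    """
    active = {name: rate for name, rate in throughput.items() if rate > 0}
    if total_bytes <= 0 or not active:
        return []
    total_rate = sum(active.values())
    names = list(active)
    ranges: List[Tuple[str, int, int]] = []
    start = 0
    for i, name in enumerate(names):
        if i == len(names) - 1:
            end = total_bytes
        else:
            end = start + int(total_bytes * active[name] / total_rate)
        if end > start:  # skip clients whose share rounds down to nothing
            ranges.append((name, start, end))
        start = end
    return ranges


ranges = split_ranges(1000, {"main": 2.0, "delegate_a": 1.0, "delegate_b": 1.0})
```

With these inputs the main client fetches the first half and each delegated client fetches a quarter, which the main client would then collect over the local (WiFi Direct) link.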