Your SlideShare is downloading. ×
0
Towards Massive Server 
Consolidation 
Filipe Manco, João Martins, Felipe Huici 
{filipe.manco,joao.martins,felipe.huici}@...
Agenda 
1. Use Cases and Goals 
2. Baseline Measurements 
3. Hard Limitations 
4. Performance Limitations 
5. Conclusions ...
Use Cases and Goals 
3 Towards Massive Server Consolidation 18 August 2014
The Super Fluid Cloud 
● Target: remove barriers in current cloud 
deployments 
– Extremely flexible infrastructure 
– Mil...
Recent trend: specialized guests 
● ClickOS, OSv, Mirage, Erlang on Xen, etc 
– Small memory footprints 
– Relatively fast...
Wouldn't it be Nice if... 
● Thousands of guests on a single server 
– Short-term target: 10K 
– Medium-term target: 100K ...
Baseline Measurements 
7 Towards Massive Server Consolidation 18 August 2014
Experiment Setup 
● Freshly installed Xen/Debian system 
– Xen 4.2 
– Linux 3.6.10 
– Debian squeeze 
● Commodity server 
...
Baseline Test 
Boot as many guests as possible before system breaks 
● Using ClickOS guests 
– 8 MB of RAM 
– 1 VIF 
● Gue...
Didn't Work Quite Well... 
● Stopped test after 4K guests 
– Took ~ 5 days 
– Up to ~ 100 seconds for creation of last gue...
Domain Creation Time 
11 Towards Massive Server Consolidation 18 August 2014
Domain Creation Time 
12 Towards Massive Server Consolidation 18 August 2014 
92 s
Two Types of Problems 
● Hard limitations 
– Prevent guests from booting correctly 
– Only ~300 guests fully usable 
● Per...
Hard Limitations 
14 Towards Massive Server Consolidation 18 August 2014
Issues 
● Cannot access guests' console 
– Only first ~300 guests have accessible console 
● Guests' VIF is not created 
–...
Number of File Descriptors 
● xenconsoled opens 3 FD per guest 
– /dev/xenbus; /dev/ptmx; /dev/pts/<id>; 
● Fix 
– Linux c...
Number of PTYs 
● xenconsoled opens 1 PTY per guest 
● Fix 
– Linux can easily handle > 100K PTY 
– Tune kernel.tty.max 
●...
Number of Event Channels 
● 3 Interdomain evtchn per guest 
– xenstore; console; VIF 
– 64bit Dom0: NR_EVTCHNS == 4096 
– ...
Number of IRQs 
● Linux runs out of IRQs to map evtchn 
– Limited by NR_CPUS 
● Fix 
– Build with: MAXSMP=y; NR_CPUS=4096 ...
vSwitch Ports 
● Currently back-end switch supports up to few 
thousand ports 
– Linux bridge: 1K 
– Open vSwitch: 64K 
● ...
Summarizing 
● Xen 4.4; Linux 3.14 
● fs.file-max; nofile ulimit 
● kernel.tty.max 
● MAXSMP=y; NR_CPUS=4096 
● Not yet fi...
Performance Limitations 
22 Towards Massive Server Consolidation 18 August 2014
Issues 
● Overall system becomes too slow 
– oxenstored 
● CPU fully utilized after a few dozen guests 
– xenconsoled 
● C...
“Blind” optimizations 
● 4 Core Dom0 
– 1 core for oxenstored 
– 1 core for xenconsoled 
– 2 cores for remaining processes...
Tools' Optimizations 
● xl toolstack 
– Disable xl background process (xl create -e) 
– Disable memory ballooning on Dom0 ...
Creation Times with Optimizations 
26 Towards Massive Server Consolidation 18 August 2014
Creation Times with Optimizations 
27 Towards Massive Server Consolidation 18 August 2014 
2.3 s
How better is it? 
28 Towards Massive Server Consolidation 18 August 2014
With Optimizations 
● Improvement: system is still usable after 10K 
guests 
– Although domain creation time is far from i...
xenconsoled 
● Two major optimizations 
– Move from poll to epoll 
– On INTRODUCE_DOMAIN, search from last domid 
● Avoid ...
What Bottlenecks Remain? 
31 Towards Massive Server Consolidation 18 August 2014
Domain Creation Breakdown 
32 Towards Massive Server Consolidation 18 August 2014
Let's Look at the Toolstack Again 
● The domain creation process is too complex for 
our specialized VMs 
– Also makes the...
xcl: XenCtrl Light 
● A very simplified toolstack 
● Small abstraction on top of libxc (~600 LOC) 
– Optimized for our use...
xl vs xcl 
35 Towards Massive Server Consolidation 18 August 2014
xl vs xcl 
36 Towards Massive Server Consolidation 18 August 2014 
2.3 s 
0.1 s
With xcl 
● Much better 
● But reducing the number of Xenstore entries is only a palliative 
– Eventually the issue will c...
lixs: LIghtweight XenStore 
● Work in progress (< 2 weeks) 
● Written from scratch but compatible with the 
Xenstore proto...
lixs vs oxenstored 
Cumulative time: ~ 11 min 
Cumulative time: ~ 8 min 
39 Towards Massive Server Consolidation 18 August...
Breakdown with lixs 
40 Towards Massive Server Consolidation 18 August 2014
lixs: Future Work 
● Optimize protocol 
– Make Xenstore more specialized 
– Avoid all possible listing operations 
● Optim...
Conclusions 
42 Towards Massive Server Consolidation 18 August 2014
Where are we? 
● Usable system running 10K guests 
● 10K guests actually working 
– Although idle most of the time 
● Lowe...
Will it work? Can we reach 100K? 
● There are no fundamental issues with Xen 
– But we only tested it up to 10K guests 
● ...
Future work 
● Improve lixs and Xenstore protocol 
● Multi thousand-port vSwitch 
● Have guests doing useful work 
● Sched...
Our Open Source Corner 
https://cnp.neclab.eu 
46 Towards Massive Server Consolidation 18 August 2014
Questions? 
47 Towards Massive Server Consolidation 18 August 2014
Upcoming SlideShare
Loading in...5
×

XPDS14 - Towards Massive Server Consolidation - Filipe Manco, NEC

1,329

Published on

In recent years Xen has seen the development of many minimalistic or specialized virtual machines (e.g., OSv, Mirage, ClickOS, Erlang on Xen, etc.). Thanks in part to a small CPU and memory footprints, these VMs allow for running thousands or more on a single, inexpensive commodity server. Doing so could save cloud and network operators vast amounts of money.
Attempts to do so are already underway and have discovered important bottlenecks in Xen. While some of these have already been addressed by the community (e.g., limited number of event channels or memory grants) others still remain. In this talk we describe our experience when trying to run up to 10,000 MiniOS-based VMs, including bottlenecks in the XenStore, toolchain and network pipe. We further report on prototypical solutions, and on our implementation of suspend/resume for MiniOS that allows us tens of milliseconds migrations.

Published in: Technology
0 Comments
3 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
1,329
On Slideshare
0
From Embeds
0
Number of Embeds
7
Actions
Shares
0
Downloads
22
Comments
0
Likes
3
Embeds 0
No embeds

No notes for slide

Transcript of "XPDS14 - Towards Massive Server Consolidation - Filipe Manco, NEC"

  1. 1. Towards Massive Server Consolidation Filipe Manco, João Martins, Felipe Huici {filipe.manco,joao.martins,felipe.huici}@neclab.eu NEC Europe Ltd. Xen Developer Summit 2014
  2. 2. Agenda 1. Use Cases and Goals 2. Baseline Measurements 3. Hard Limitations 4. Performance Limitations 5. Conclusions 6. Q&A 2 Towards Massive Server Consolidation 18 August 2014
  3. 3. Use Cases and Goals 3 Towards Massive Server Consolidation 18 August 2014
  4. 4. The Super Fluid Cloud ● Target: remove barriers in current cloud deployments – Extremely flexible infrastructure – Milliseconds instantiation and migration of resources – Thousands of concurrent units running ● This would allow new use cases – On the fly deployment of middleboxes – Flash crowds – Energy consumption reduction – Your use case here... 4 Towards Massive Server Consolidation 18 August 2014
  5. 5. Recent trend: specialized guests ● ClickOS, OSv, Mirage, Erlang on Xen, etc – Small memory footprints – Relatively fast boot times – Provide the basic functionality to make use cases a reality ● Our work focuses on ClickOS – Targets network processing using the Click modular router software 5 Towards Massive Server Consolidation 18 August 2014
  6. 6. Wouldn't it be Nice if... ● Thousands of guests on a single server – Short-term target: 10K – Medium-term target: 100K ● Extremely fast domain creation, destruction and migration – Tens of milliseconds – Constant as number of guests increases 6 Towards Massive Server Consolidation 18 August 2014
  7. 7. Baseline Measurements 7 Towards Massive Server Consolidation 18 August 2014
  8. 8. Experiment Setup ● Freshly installed Xen/Debian system – Xen 4.2 – Linux 3.6.10 – Debian squeeze ● Commodity server – 64 Cores @ 2.1GHz [4 x AMD Opteron 6376] – 128GB RAM DDR3 @ 1333MHz 8 Towards Massive Server Consolidation 18 August 2014
  9. 9. Baseline Test Boot as many guests as possible before system breaks ● Using ClickOS guests – 8 MB of RAM – 1 VIF ● Guests are mostly idle – Running arp responder configuration – Only arping guests to check they're working 9 Towards Massive Server Consolidation 18 August 2014
  10. 10. Didn't Work Quite Well... ● Stopped test after 4K guests – Took ~ 5 days – Up to ~ 100 seconds for creation of last guest (normally ClickOS boots in ~30 milliseconds) ● All the domains were running, but: – Only first ~300 guests fully functional ● System got extremely slow – Dom0 unusable 10 Towards Massive Server Consolidation 18 August 2014
  11. 11. Domain Creation Time 11 Towards Massive Server Consolidation 18 August 2014
  12. 12. Domain Creation Time 12 Towards Massive Server Consolidation 18 August 2014 92 s
  13. 13. Two Types of Problems ● Hard limitations – Prevent guests from booting correctly – Only ~300 guests fully usable ● Performance limitations – Decreasing system performance – System unusable after just a few hundred guests 13 Towards Massive Server Consolidation 18 August 2014
  14. 14. Hard Limitations 14 Towards Massive Server Consolidation 18 August 2014
  15. 15. Issues ● Cannot access guests' console – Only first ~300 guests have accessible console ● Guests' VIF is not created – Only first ~1300 guests have usable VIF ● Guests cannot access the Xenstore – Only first ~1300 guests have access to it ● The back-end switch doesn't provide enough ports – Only 1024 available 15 Towards Massive Server Consolidation 18 August 2014
  16. 16. Number of File Descriptors ● xenconsoled opens 3 FD per guest – /dev/xenbus; /dev/ptmx; /dev/pts/<id>; ● Fix – Linux can easily handle > 300K FD – Tune fs.file-max; nofile ulimit; 16 Towards Massive Server Consolidation 18 August 2014
  17. 17. Number of PTYs ● xenconsoled opens 1 PTY per guest ● Fix – Linux can easily handle > 100K PTY – Tune kernel.tty.max ● Future – Only create PTY when user connects to console – This also reduces number of FD to 1 per guest 17 Towards Massive Server Consolidation 18 August 2014
  18. 18. Number of Event Channels ● 3 Interdomain evtchn per guest – xenstore; console; VIF – 64bit Dom0: NR_EVTCHNS == 4096 – Dom0 runs out after ~1300 guests ● Fix – Upgrade to Xen 4.4 + Linux 3.14: ● NR_EVTCHNS == 128K – Split services into stub domains 18 Towards Massive Server Consolidation 18 August 2014
  19. 19. Number of IRQs ● Linux runs out of IRQs to map evtchn – Limited by NR_CPUS ● Fix – Build with: MAXSMP=y; NR_CPUS=4096 – NR_IRQS == 256K 19 Towards Massive Server Consolidation 18 August 2014
  20. 20. vSwitch Ports ● Currently back-end switch supports up to few thousand ports – Linux bridge: 1K – Open vSwitch: 64K ● Workaround – Create multiple bridges ● Longer-term fix – Develop a purpose-built back-end switch 20 Towards Massive Server Consolidation 18 August 2014
  21. 21. Summarizing ● Xen 4.4; Linux 3.14 ● fs.file-max; nofile ulimit ● kernel.tty.max ● MAXSMP=y; NR_CPUS=4096 ● Not yet fixed: – Back-end switch ports 21 Towards Massive Server Consolidation 18 August 2014
  22. 22. Performance Limitations 22 Towards Massive Server Consolidation 18 August 2014
  23. 23. Issues ● Overall system becomes too slow – oxenstored ● CPU fully utilized after a few dozen guests – xenconsoled ● CPU bound after ~ 2K guests ● Domain creation takes too long – Affects migration too 23 Towards Massive Server Consolidation 18 August 2014
  24. 24. “Blind” optimizations ● 4 Core Dom0 – 1 core for oxenstored – 1 core for xenconsoled – 2 cores for remaining processes ● Pin all vCPUs to pCPUs ● Round robin remaining 60 cores for guests ● Put everything in a ramfs 24 Towards Massive Server Consolidation 18 August 2014
  25. 25. Tools' Optimizations ● xl toolstack – Disable xl background process (xl create -e) – Disable memory ballooning on Dom0 – Never use domain name ● This causes xl to retrieve all guest names from the Xenstore – Use specialized VIF hotplug script – Don't retrieve domain list on creation [PATCH] ● oxenstored – Use more recent version of Xenstore from: ● https://github.com/mirage/ocaml-xenstore 25 Towards Massive Server Consolidation 18 August 2014
  26. 26. Creation Times with Optimizations 26 Towards Massive Server Consolidation 18 August 2014
  27. 27. Creation Times with Optimizations 27 Towards Massive Server Consolidation 18 August 2014 2.3 s
  28. 28. How better is it? 28 Towards Massive Server Consolidation 18 August 2014
  29. 29. With Optimizations ● Improvement: system is still usable after 10K guests – Although domain creation time is far from ideal ● However... – xenstored still CPU heavy – xenconsoled still CPU heavy 29 Towards Massive Server Consolidation 18 August 2014
  30. 30. xenconsoled ● Two major optimizations – Move from poll to epoll – On INTRODUCE_DOMAIN, search from last domid ● Avoid listing all existing domains ● CPU usage down to ~ 10% max. 30 Towards Massive Server Consolidation 18 August 2014
  31. 31. What Bottlenecks Remain? 31 Towards Massive Server Consolidation 18 August 2014
  32. 32. Domain Creation Breakdown 32 Towards Massive Server Consolidation 18 August 2014
  33. 33. Let's Look at the Toolstack Again ● The domain creation process is too complex for our specialized VMs – Also makes the profiling really difficult and inaccurate – A lot of unnecessary Xenstore entries ● Some checks take a lot of time – Mainly checking for duplicate names 33 Towards Massive Server Consolidation 18 August 2014
  34. 34. xcl: XenCtrl Light ● A very simplified toolstack ● Small abstraction on top of libxc (~600 LOC) – Optimized for our use case ● Only boots PV and PVH domains ● Only supports VIFs – Reduced Xenstore usage ● From 37 to 17 entries per guest ● Less Xenstore operations – Doesn't check domain name 34 Towards Massive Server Consolidation 18 August 2014
  35. 35. xl vs xcl 35 Towards Massive Server Consolidation 18 August 2014
  36. 36. xl vs xcl 36 Towards Massive Server Consolidation 18 August 2014 2.3 s 0.1 s
  37. 37. With xcl ● Much better ● But reducing the number of Xenstore entries is only a palliative – Eventually the issue will come back as we increase the number of guests ● Xenstore remains a major bottleneck 37 Towards Massive Server Consolidation 18 August 2014
  38. 38. lixs: LIghtweight XenStore ● Work in progress (< 2 weeks) ● Written from scratch but compatible with the Xenstore protocol ● Currently ~1800 LOC ● C++ 38 Towards Massive Server Consolidation 18 August 2014
  39. 39. lixs vs oxenstored Cumulative time: ~ 11 min Cumulative time: ~ 8 min 39 Towards Massive Server Consolidation 18 August 2014
  40. 40. Breakdown with lixs 40 Towards Massive Server Consolidation 18 August 2014
  41. 41. lixs: Future Work ● Optimize protocol – Make Xenstore more specialized – Avoid all possible listing operations ● Optimize implementation – Remove unix sockets – Generic storage backend ● std::map; noSQL DB; <your backend here>; ● 10K guests with std::map took 10m 3s ● 10K guests with boost::unordered_map took 7m 54s 41 Towards Massive Server Consolidation 18 August 2014
  42. 42. Conclusions 42 Towards Massive Server Consolidation 18 August 2014
  43. 43. Where are we? ● Usable system running 10K guests ● 10K guests actually working – Although idle most of the time ● Lower domain creation times – First domain: < 10ms – With 10K domains: < 100ms 43 Towards Massive Server Consolidation 18 August 2014
  44. 44. Will it work? Can we reach 100K? ● There are no fundamental issues with Xen – But we only tested it up to 10K guests ● Xenstore protocol needs work – Make Xenstore more specialized – With 10K+ guests we need to avoid listings 44 Towards Massive Server Consolidation 18 August 2014
  45. 45. Future work ● Improve lixs and Xenstore protocol ● Multi thousand-port vSwitch ● Have guests doing useful work ● Scheduling – Number of guests much bigger than number of cores – With that many guests we'll have scheduling issues ● Reducing Memory Usage – Smaller image sizes – Share memory between guests booting same image 45 Towards Massive Server Consolidation 18 August 2014
  46. 46. Our Open Source Corner https://cnp.neclab.eu 46 Towards Massive Server Consolidation 18 August 2014
  47. 47. Questions? 47 Towards Massive Server Consolidation 18 August 2014
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×