Puma: Pooling Unused Memory in Virtual Machines
for I/O intensive applications
Maxime Lorrillere, Julien Sopena, Sébastien...

Kernel Recipes 2015: Puma: Pooling Unused Memory in Virtual Machines for I/O intensive applications


With the advent of cloud architectures, virtualization has become a key mechanism. In clouds, virtual machines (VMs) offer both isolation and flexibility. This is the foundation of cloud elasticity, but it induces fragmentation of the physical resources, including memory. While each VM's memory needs evolve over time, existing mechanisms for dynamically adjusting VM memory are inefficient, and it is currently impossible to take advantage of the unused memory of VMs hosted on another host.

This presentation is about Puma, a remote cache mechanism that improves the performance of I/O intensive applications by letting a VM entrust clean page-cache pages to other VMs that have unused memory. By reusing the existing page-cache data structures, Puma can efficiently reclaim the memory lent to another VM. Because it is distributed, Puma increases memory consolidation at the scale of a data center. In our evaluations we show that Puma can significantly boost performance without impacting potential activity peaks on the lender.

Maxime Lorrillere has been a PhD student at Pierre and Marie Curie University (UPMC) since September 2012. He works within the REGAL Inria research team at Laboratoire d’Informatique de Paris 6 (LIP6). His Ph.D. is about adapting cooperative caches to cloud computing and virtualized environments. He teaches operating systems and Linux kernel programming at UPMC (master level).

Published in: Software


  1. Puma: Pooling Unused Memory in Virtual Machines for I/O intensive applications. Maxime Lorrillere, Julien Sopena, Sébastien Monnet and Pierre Sens. Contact: maxime.lorrillere@lip6.fr. Kernel Recipes 2015.
  2. Introduction / Context. Problem: memory fragmentation
     [Diagram: Host 1 and Host 2, connected by 10GB Ethernet. On each host, applications use memory split between anonymous pages and the page cache; the page cache sits in front of the disk and is managed by the PFRA (page frame reclaiming algorithm).]
  3. Introduction / Context. Problem: memory fragmentation
     [Diagram: Host 1 runs VM1 and VM2, Host 2 runs VM3 and VM4. Each VM runs applications with its own page cache and PFRA on top of a KVM hypervisor with virtio devices; the hosts are connected by 10GB Ethernet. A Swap label appears in the last animation step.]
     • Virtualization allows more flexibility and isolation
     • Problem: it fragments available memory
       ⇒ Sharing resources like CPU time is straightforward
       ⇒ Memory cannot be reassigned as efficiently as CPU time
  5. Introduction / Related work. Solution: Memory Ballooning [OSDI’02]
     [Diagram: same two-host KVM setup as the previous slide, animated with Balloon, I/O and Swap labels on the VMs.]
     • The host asks a VM to inflate its balloon to return free memory
     • The host asks a VM to deflate its balloon to get more memory
     • Limitations
       ⇒ page cache is still fragmented
       ⇒ slow to recover
  9. Introduction / Related work. Memory Ballooning: time to recover memory
     • Step 1: make a lot of I/O on the first VM
     • Step 2: try to allocate the memory (malloc) on the second VM
     [Plot: Baseline vs. Auto-ballooning]
     ⇒ Memory allocations are 20× slower than the baseline
     ⇒ When it does not crash! (OOM-kill)
  11. Introduction / Related work. Our contribution: a cooperative page cache
     [Diagram: same two-host KVM setup, with a Puma component in the VMs and a remote page cache. Labels: virtio, TCP (~100µs), ~10ms.]
     Puma’s approach:
     • Relies on a fast network between VMs and physical machines
     • Hypervisor, filesystem and block device agnostic
     • Handles only clean cache pages
       ⇒ Writes are generally non-blocking
       ⇒ Simple consistency scheme
       ⇒ Fast to recover memory!
  12. Puma design / Basics. Local page cache eviction: put operation
     [Diagram, animated in five steps between VM1 and VM2, each with its own PFRA, with metadata on VM1: (1) alloc(), (2) Reclaim, (3) put(P31), (4) transfer of P31, (5) Store page P31.]
     • Typically triggered by a memory allocation
     • Puma is integrated into the PFRA to detect page cache eviction
     • Pages are sent asynchronously to avoid slowdowns
     • Remote pages are stored into the system page cache
  17. Puma design / Basics. Local page cache miss: get operation
     [Diagram, animated in five steps between VM1 and VM2 after a miss on P24: (1) get(P24), (2) Hit? (metadata check), (3) req(P24), (4) Lookup, (5) P24 is returned.]
     • Integrated into the page cache to detect local cache misses
     • A local cache miss leads to a (synchronous) get operation
     • Local metadata are used to know if and where a page is in the cache
     • Exclusive and non-inclusive caching strategies
  23. Puma design / Sequential I/O. Filtering sequential I/O
     [Diagram, two animated panels. "VM1 - get": a miss triggers get(P24,32), the metadata lookup reports !Hit, and P24 is tagged S in the metadata. "VM1 - put": alloc() triggers Reclaim of P24, then put(P24), which is filtered because P24 is tagged S.]
     • Sequential reads are detected through the read-ahead algorithm
     • “Sequential pages” are tagged into the metadata
     • When evicted, sequential pages are simply discarded
  28. Puma design / Details and optimisations. Implementation details and optimisations
     • Response time
       ⇒ Puma is temporarily disabled if the response time becomes too high
     • Memory footprint
       ⇒ Metadata: amortized 64 bits/page, 2 MB of metadata per GB of cache
     • Memory recovery
       ⇒ Remote cache pages are discarded when reclaimed
     • Memory management: avoiding deadlocks
       ⇒ Atomic memory allocations
       ⇒ Use of pre-allocated memory pools
     • Consistency
       ⇒ Dirty pages are written to disk before being sent to the cache
     [Diagram: the put operation from the earlier slide, with an alloc() on the path that handles the incoming page P31.]
  29. Evaluation / Overview
     Experiment setup on KVM:
     • Puma server: provides from 512 MB to 12 GB of cache
     • Puma client: 1 GB
     • Baseline: a single VM without additional cache
     • Hosts: Intel Xeon E5-2660v2, 5 × 600GB SAS in RAID-0
     • Benchmarks: Filebench, BLAST, TPC-C, TPC-H, Postmark
     Experiments:
     • Varying workload on server side
     • Co-located VMs with a paravirtualised network (virtio)
     • Latency injection
  30. Evaluation / Varying workload. Dynamic memory balancing: comparison with memory ballooning
     [Plot: Baseline, Auto-ballooning, Puma]
     • Memory ballooning incurs high latencies to reclaim memory (avg: 20ms)
     • Puma reclaims memory at a small cost (avg: 1.8ms)
  31. Evaluation / Performance evaluation. Sequential I/O filtering
     • Unfiltered large sequences may severely degrade performance
     • Filtering sequential I/O allows us to focus on random accesses
  32. Evaluation / Performance evaluation. Performance improvement on database benchmarks
     • I/Os are a mix of random accesses and medium sized sequences
       ⇒ Concurrent accesses: sequential accesses are interleaved → slow
       ⇒ Non-inclusive strategy: pages are kept in cache even if accessed sequentially
  33. Evaluation / Latency injection. Network latency management
     • Latency injection with Netem [LCA’05]
     • Speedup decreases as we inject network latency between nodes
     • When the response time is too high, Puma disables itself to avoid a performance drop
  34. Conclusion. Summary
     ⇒ Virtualization leads to a fragmentation of the available cache
     ⇒ Memory ballooning techniques are not able to manage the distribution of the VMs’ page cache
     Puma: Pooling Unused memory in virtual MAchines
     ⇒ It is based on an efficient kernel-level remote caching mechanism
     ⇒ It handles clean cache pages to quickly recover the memory
     ⇒ It works with co-located VMs and remote VMs
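
The put operation, the get operation and the sequential I/O filter described in the design slides can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not Puma's kernel code: the `Lender` and `Client` classes, their method names, the FIFO eviction order and the `sequential` flag passed by the caller are all hypothetical stand-ins (the slides say Puma hooks into the PFRA and detects sequential reads through the read-ahead algorithm).

```python
from dataclasses import dataclass, field


@dataclass
class Lender:
    """Hypothetical model of the VM that lends unused memory (VM2 in the slides)."""
    store: dict = field(default_factory=dict)

    def put(self, page_id, data):
        self.store[page_id] = data

    def get(self, page_id):
        # Exclusive strategy: the page leaves the remote cache when handed back.
        return self.store.pop(page_id, None)

    def reclaim(self):
        # Remote pages are clean, so the lender can simply discard them.
        self.store.clear()


@dataclass
class Client:
    """Hypothetical model of the VM that borrows cache space (VM1 in the slides)."""
    lender: Lender
    capacity: int
    local: dict = field(default_factory=dict)     # local page cache, insertion-ordered
    remote_ids: set = field(default_factory=set)  # metadata: pages believed to be remote
    sequential: set = field(default_factory=set)  # metadata: pages tagged as sequential

    def read(self, page_id, disk, sequential=False):
        if page_id in self.local:                 # local hit
            return self.local[page_id]
        data = None
        if page_id in self.remote_ids:            # metadata says the page may be remote
            self.remote_ids.discard(page_id)
            data = self.lender.get(page_id)       # None if the lender reclaimed it
        if data is None:
            data = disk[page_id]                  # fall back to the block device
        if sequential:
            self.sequential.add(page_id)
        self._insert(page_id, data)
        return data

    def _insert(self, page_id, data):
        while len(self.local) >= self.capacity:
            victim = next(iter(self.local))       # FIFO eviction, an assumption of this sketch
            victim_data = self.local.pop(victim)
            if victim in self.sequential:
                self.sequential.discard(victim)   # sequential pages are simply discarded
            else:
                self.lender.put(victim, victim_data)  # put operation
                self.remote_ids.add(victim)
        self.local[page_id] = data


disk = {i: f"block-{i}" for i in range(10)}
lender = Lender()
vm1 = Client(lender=lender, capacity=2)
for page in (0, 1, 2):
    vm1.read(page, disk)
print(sorted(lender.store))  # prints [0]: page 0 was evicted and put remotely
```

Because only clean pages are sent, `Lender.reclaim()` can drop everything at once; a later `read` on the client then finds nothing remotely and falls back to the disk, which is the property the slides rely on for fast memory recovery.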

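
The memory-footprint figure quoted on the implementation-details slide (amortized 64 bits of metadata per page, 2 MB per GB of cache) can be checked with a short calculation. The 4 KiB page size is an assumption of this sketch (it is the usual base page size on x86-64, but the slides do not state it), and the function name is hypothetical.

```python
PAGE_SIZE = 4096             # bytes; assumed 4 KiB pages, not stated in the slides
METADATA_BITS_PER_PAGE = 64  # amortized figure quoted in the slides


def metadata_bytes(cache_bytes, page_size=PAGE_SIZE, bits_per_page=METADATA_BITS_PER_PAGE):
    """Return the metadata size in bytes needed to track cache_bytes of remote cache."""
    pages = cache_bytes // page_size
    return pages * bits_per_page // 8


GIB = 1024 ** 3
MIB = 1024 ** 2
print(metadata_bytes(GIB) / MIB, "MiB of metadata per GiB of cache")
```

Under that assumption, one GiB of cache holds 262,144 pages, and at 8 bytes each the metadata comes to 2 MiB, which matches the figure on the slide.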