• Like
Reconsidering Custom Memory Allocation
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

Reconsidering Custom Memory Allocation

  • 2,973 views
Published

This talk presents an extensive experimental study that shows that a good general-purpose allocator is better than almost all commonly-used custom allocators, with one exception: regions (a.k.a., …

This talk presents an extensive experimental study that shows that a good general-purpose allocator is better than almost all commonly-used custom allocators, with one exception: regions (a.k.a., pools, arenas, zones). However, it shows that regions consume much more memory than necessary. The talk then introduces reaps (regions + heaps), which combine the flexibility and space efficiency of heaps with the performance of regions.

Published in Business , Education
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
    Be the first to like this
No Downloads

Views

Total Views
2,973
On SlideShare
0
From Embeds
0
Number of Embeds
0

Actions

Shares
Downloads
64
Comments
0
Likes
0

Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide

Transcript

  • 1. Reconsidering Custom Memory Allocation Emery Berger, Ben Zorn, Kathryn McKinley UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE
  • 2. Custom Memory Allocation Programmers replace Very common practice   new/delete, bypassing Apache, gcc, lcc, STL,  system allocator database servers… Language-level Reduce runtime – often   support in C++ Expand functionality – sometimes  Widely recommended Reduce space – rarely   “Use custom allocators” UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 2
  • 3. Drawbacks of Custom Allocators Avoiding system allocator:  More code to maintain & debug  Can’t use memory debuggers  Not modular or robust:  Mix memory from custom  and general-purpose allocators → crash! Increased burden on programmers  Are custom allocators really a win? UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 3
  • 4. Overview Introduction  Perceived benefits and drawbacks  Three main kinds of custom allocators  Comparison with general-purpose allocators  Advantages and drawbacks of regions  Reaps – generalization of regions & heaps  UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 4
  • 5. (1) Per-Class Allocators Recycle freed objects from a free list  a = new Class1; Class1 Fast free list b = new Class1; + c = new Class1; Linked list operations + a delete a; Simple + delete b; Identical semantics b + delete c; C++ language support + a = new Class1; c Possibly space-inefficient b = new Class1; - c = new Class1; UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 5
  • 6. (II) Custom Patterns Tailor-made to fit allocation patterns  Example: 197.parser (natural language parser)  db a c char[MEMORY_LIMIT] end_of_array end_of_array end_of_array end_of_array end_of_array a = xalloc(8); Fast + b = xalloc(16); Pointer-bumping allocation + c = xalloc(8); - Brittle xfree(b); - Fixed memory size xfree(c); - Requires stack-like lifetimes d = xalloc(8); UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 6
  • 7. (III) Regions Separate areas, deletion only en masse  regioncreate(r) r regionmalloc(r, sz) regiondelete(r) - Risky Fast + - Dangling Pointer-bumping allocation + references Deletion of chunks + - Too much space Convenient + One call frees all memory + Increasingly popular custom allocator  UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 7
  • 8. Overview Introduction  Perceived benefits and drawbacks  Three main kinds of custom allocators  Comparison with general-purpose allocators  Advantages and drawbacks of regions  Reaps – generalization of regions & heaps  UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 8
  • 9. Custom Allocators Are Faster… Runtime - Custom Allocator Benchmarks Custom Win32 1.75 Normalized Runtime non-regions regions 1.5 1.25 1 0.75 0.5 0.25 0 r he er lle ze m c c vp gc lc rs si ud ac ee 5. 6. d- pa m 17 ap br 17 xe 7. c- bo 19 As good as and sometimes much faster than Win32  UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 9
  • 10. Not So Fast… Runtime - Custom Allocator Benchmarks Custom Win32 DLmalloc 1.75 non-regions regions Normalized Runtime 1.5 1.25 1 0.75 0.5 0.25 0 lle e r he c r m c vp e lc z gc si ud rs ee ac 5. d- 6. pa m br 17 ap 17 xe 7. c- bo 19 DLmalloc: as fast or faster for most benchmarks  UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 10
  • 11. The Lea Allocator (DLmalloc 2.7.0) Mature public-domain general-purpose  allocator Optimized for common allocation patterns  Per-size quicklists ≈ per-class allocation  Deferred coalescing  (combining adjacent free objects) Highly-optimized fastpath  Space-efficient  UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 11
  • 12. Space Consumption: Mixed Results Space - Custom Allocator Benchmarks Custom DLmalloc 1.75 non-regions regions Normalized Space 1.5 1.25 1 0.75 0.5 0.25 0 lle e r he c r sim c vp e lc z gc ud rs ee ac 5. d- 6. pa m br 17 ap 17 xe 7. c- bo 19 UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 12
  • 13. Overview Introduction  Perceived benefits and drawbacks  Three main kinds of custom allocators  Comparison with general-purpose allocators  Advantages and drawbacks of regions  Reaps – generalization of regions & heaps  UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 13
  • 14. Regions – Pros and Cons Fast, convenient, etc. + Avoid resource leaks (e.g., Apache) + Tear down memory for terminated connections  No individual object deletion - Unbounded memory consumption  (producer-consumer, long-running computations, off-the-shelf programs)  Apache: vulnerable to DoS, memory leaks UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 14
  • 15. Reap Hybrid Allocator Reap = region + heap  Adds individual object deletion & heap  reapcreate(r) r reapmalloc(r, sz) reapfree(r,p) reapdelete(r) Can reduce memory consumption + Fast + Adapts to use (region or heap style) + Cheap deletion + UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 15
  • 16. Reap Runtime Runtime - Custom Allocation Benchmarks Custom Win32 DLmalloc Reap 1.75 Normalized runtime non-regions regions 1.5 1.25 1 0.75 0.5 0.25 0 lle e r he c r im c vp e lc z gc ud rs ee -s ac 5. 6. pa d m br 17 ap 17 xe 7. c- bo 19 UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 16
  • 17. Reap Space Space - Custom Allocator Benchmarks Custom DLmalloc Reap 1.75 non-regions regions Normalized Space 1.5 1.25 1 0.75 0.5 0.25 0 lle e r he c r sim c vp e lc z gc ud rs ee ac 5. d- 6. pa m br 17 ap 17 xe 7. c- bo 19 UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 17
  • 18. Reap: Best of Both Worlds Allows mixing of regions and new/delete  Case study:   New Apache module “mod_bc” bc: C-based arbitrary-precision calculator  Changed 20 lines out of 8000  Benchmark: compute 1000th prime  With Reap: 240K  Without Reap: 7.4MB  UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 18
  • 19. Conclusions and Future Work Empirical study of custom allocators  Lea allocator often as fast or faster  Non-region custom allocation ineffective  Reap: region performance without drawbacks  Future work:  Reduce space with per-page bitmaps  Combine with scalable general-purpose  allocator (e.g., Hoard) UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 19
  • 20. Software http://www.cs.umass.edu/~emery (Reap: part of Heap Layers distribution) http://g.oswego.edu (DLmalloc 2.7.0) UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 20
  • 21. If You Can Read This, I Went Too Far UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 21
  • 22. Backup Slides UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 22
  • 23. Experimental Methodology Comparing to general-purpose  allocators Same semantics: no problem  E.g., disable per-class allocators  Different semantics: use emulator  Uses general-purpose allocator   Adds bookkeeping to support region semantics UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 23
  • 24. Why Did They Do That? Recommended practice  Premature optimization  Microbenchmarks vs. actual performance  Drift  Not bottleneck anymore  Improved competition  Modern allocators are better  UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 24
  • 25. Reaps as Regions: Runtime Runtime - Region-Based Benchmarks Custom Win32 DLmalloc Reap 1.75 1.5 Normalized Runtime 1.25 1 0.75 0.5 0.25 0 lcc mudlle Reap performance nearly matches regions  UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 25
  • 26. Using Reap as Regions Runtime - Region-Based Benchmarks Original Win32 DLmalloc WinHeap Vmalloc Reap 4.08 2.5 2 Normalized Runtime 1.5 1 0.5 0 lcc mudlle Reap performance nearly matches regions UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 26
  • 27. Drawbacks of Regions Can’t reclaim memory within regions  Bad for long-running computations,  producer-consumer patterns, “malloc/free” programs unbounded memory consumption  Current situation for Apache:  vulnerable to denial-of-service  limits runtime of connections  limits module programming  UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 27
  • 28. Use Custom Allocators? Strongly recommended by practitioners  Little hard data on performance/space  improvements Only one previous study [Zorn 1992]  Focused on just one type of allocator  Custom allocators: waste of time  Small gains, bad allocators  Different allocators better? Trade-offs?  UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 28
  • 29. Kinds of Custom Allocators Three basic types of custom allocators  Per-class  Fast  Custom patterns  Fast, but very special-purpose  Regions  Fast, possibly more space-efficient  Convenient  Variants: nested, obstacks  UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 29
  • 30. Optimization Opportunity Time Spent in Memory Operations Memory Operations Other 100 % of runtime 80 60 40 20 0 sim ll e lcc ze cc e e pr r se ag h ud v g e ac d- 5. ar 6. re er m ap 17 xe 17 p b Av 7. c- bo 19 UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 30