Reconsidering Custom Memory Allocation

Emery Berger, Ben Zorn, Kathryn McKinley




This talk presents an extensive experimental study that shows that a good general-purpose allocator is better than almost all commonly-used custom allocators, with one exception: regions (a.k.a., pools, arenas, zones). However, it shows that regions consume much more memory than necessary. The talk then introduces reaps (regions + heaps), which combine the flexibility and space efficiency of heaps with the performance of regions.


Reconsidering Custom Memory Allocation

  1. Reconsidering Custom Memory Allocation
     Emery Berger, Ben Zorn, Kathryn McKinley
     UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE
  2. Custom Memory Allocation
     • Programmers replace new/delete, bypassing system allocator
        • Reduce runtime – often
        • Expand functionality – sometimes
        • Reduce space – rarely
     • Very common practice
        • Apache, gcc, lcc, STL, database servers…
     • Language-level support in C++
     • Widely recommended
        • “Use custom allocators”
  3. Drawbacks of Custom Allocators
     • Avoiding system allocator:
        • More code to maintain & debug
        • Can’t use memory debuggers
        • Not modular or robust: mix memory from custom and general-purpose allocators → crash!
     • Increased burden on programmers
     • Are custom allocators really a win?
  4. Overview
     • Introduction
     • Perceived benefits and drawbacks
     • Three main kinds of custom allocators
     • Comparison with general-purpose allocators
     • Advantages and drawbacks of regions
     • Reaps – generalization of regions & heaps
  5. (I) Per-Class Allocators
     • Recycle freed objects from a free list
     • Example (objects a, b, c move on and off the Class1 free list):
       a = new Class1; b = new Class1; c = new Class1;
       delete a; delete b; delete c;
       a = new Class1; b = new Class1; c = new Class1;
     + Fast
     + Linked list operations
     + Simple
     + Identical semantics
     + C++ language support
     - Possibly space-inefficient
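The free-list recycling on slide 5 can be sketched with C++'s class-specific operator new and operator delete. This is a minimal illustration and not code from the talk: the `value` field, the `reuse_count` counter, and the `FreeNode` layout are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical class whose freed objects are recycled through a free list.
class Class1 {
public:
    double value = 0.0;  // assumed payload, only there to give the object a size

    // Number of allocations served from the free list (illustration only).
    static std::size_t reuse_count;

    static void* operator new(std::size_t size) {
        assert(size == sizeof(Class1));  // this sketch ignores derived classes
        if (free_list_ != nullptr) {
            // Fast path: pop the most recently freed object.
            FreeNode* node = free_list_;
            free_list_ = node->next;
            ++reuse_count;
            return node;
        }
        // Slow path: fall back to the general-purpose allocator.
        return ::operator new(size);
    }

    static void operator delete(void* p) noexcept {
        if (p == nullptr) return;
        // Push the dead object's storage onto the free list. The memory is
        // never returned to the system, hence "possibly space-inefficient".
        free_list_ = new (p) FreeNode{free_list_};
    }

private:
    struct FreeNode { FreeNode* next; };
    static FreeNode* free_list_;
};

std::size_t Class1::reuse_count = 0;
Class1::FreeNode* Class1::free_list_ = nullptr;

static_assert(sizeof(Class1) >= sizeof(void*),
              "an object must be large enough to hold a free-list link");
```

Callers keep writing plain `new Class1` and `delete p`, which is the "identical semantics" point on the slide; only the class definition changes.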
  6. (II) Custom Patterns
     • Tailor-made to fit allocation patterns
     • Example: 197.parser (natural language parser)
     • Diagram: a char[MEMORY_LIMIT] array holding blocks a, b, c, and later d, with the end_of_array marker moving after each call
     • Example:
       a = xalloc(8); b = xalloc(16); c = xalloc(8);
       xfree(b); xfree(c);
       d = xalloc(8);
     + Fast
     + Pointer-bumping allocation
     - Brittle
     - Fixed memory size
     - Requires stack-like lifetimes
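A stack-like allocator in the spirit of slide 6 might look like the sketch below. It is not 197.parser's actual code: the `Header` bookkeeping, the value of `MEMORY_LIMIT`, and the rule that freed blocks are reclaimed only once they reach the top of the stack are assumptions chosen to reproduce the slide's sequence, where `d` ends up in the space `b` used to occupy.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

constexpr std::size_t MEMORY_LIMIT = 4096;  // assumed; the slide gives no value

// Per-block bookkeeping (an assumption of this sketch).
struct Header {
    Header* prev;  // block allocated just before this one, or nullptr
    bool freed;    // set by xfree; storage is reclaimed only from the top
};

constexpr std::size_t align_up(std::size_t n) {
    return (n + alignof(std::max_align_t) - 1) / alignof(std::max_align_t)
           * alignof(std::max_align_t);
}

constexpr std::size_t HEADER_SIZE = align_up(sizeof(Header));

alignas(std::max_align_t) unsigned char memory[MEMORY_LIMIT];
unsigned char* end_of_array = memory;  // next free byte
Header* top = nullptr;                 // most recently allocated block

void* xalloc(std::size_t size) {
    if (size > MEMORY_LIMIT) return nullptr;
    const std::size_t total = HEADER_SIZE + align_up(size);
    const std::size_t available =
        static_cast<std::size_t>(memory + MEMORY_LIMIT - end_of_array);
    if (total > available) return nullptr;  // fixed memory size: no growth
    top = new (end_of_array) Header{top, false};
    end_of_array += total;                  // pointer-bumping allocation
    return reinterpret_cast<unsigned char*>(top) + HEADER_SIZE;
}

void xfree(void* p) {
    if (p == nullptr) return;
    auto* h = reinterpret_cast<Header*>(
        static_cast<unsigned char*>(p) - HEADER_SIZE);
    h->freed = true;
    // Pop every freed block that now sits at the top of the stack.
    while (top != nullptr && top->freed) {
        end_of_array = reinterpret_cast<unsigned char*>(top);
        top = top->prev;
    }
}
```

Freeing `b` before `c` reclaims nothing until `c` is freed too, which illustrates why this pattern needs stack-like lifetimes: a long-lived block at the top pins everything beneath it.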
  7. (III) Regions
     • Separate areas, deletion only en masse
     • API: regioncreate(r); regionmalloc(r, sz); regiondelete(r)
     + Fast
     + Pointer-bumping allocation
     + Deletion of chunks
     + Convenient
     + One call frees all memory
     - Risky
     - Dangling references
     - Too much space
     • Increasingly popular custom allocator
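The region interface on slide 7 can be sketched as a small C++ class. The class name, the default chunk size, and the `chunk_count` helper are assumptions for illustration; `allocate` stands in for regionmalloc and the destructor (or `release`) for regiondelete.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical region: objects are carved out of large chunks by bumping a
// pointer, and can only be freed all at once.
class Region {
public:
    explicit Region(std::size_t chunk_size = 4096) : chunk_size_(chunk_size) {}
    Region(const Region&) = delete;
    Region& operator=(const Region&) = delete;
    ~Region() { release(); }  // plays the role of regiondelete(r)

    // Plays the role of regionmalloc(r, sz).
    void* allocate(std::size_t size) {
        size = align_up(size == 0 ? 1 : size);
        if (size > remaining_) {
            // Current chunk exhausted: its tail is wasted and a new chunk is
            // requested from the system allocator.
            const std::size_t chunk = size > chunk_size_ ? size : chunk_size_;
            chunks_.push_back(nullptr);  // reserve the slot before malloc
            void* fresh = std::malloc(chunk);
            if (fresh == nullptr) { chunks_.pop_back(); return nullptr; }
            chunks_.back() = fresh;
            bump_ = static_cast<unsigned char*>(fresh);
            remaining_ = chunk;
        }
        void* result = bump_;
        bump_ += size;  // pointer-bumping allocation
        remaining_ -= size;
        return result;
    }

    // One call frees all memory; every pointer into the region now dangles.
    void release() {
        for (void* chunk : chunks_) std::free(chunk);
        chunks_.clear();
        bump_ = nullptr;
        remaining_ = 0;
    }

    std::size_t chunk_count() const { return chunks_.size(); }

private:
    static std::size_t align_up(std::size_t n) {
        const std::size_t a = alignof(std::max_align_t);
        return (n + a - 1) / a * a;
    }

    std::size_t chunk_size_;
    std::vector<void*> chunks_;
    unsigned char* bump_ = nullptr;
    std::size_t remaining_ = 0;
};
```

There is deliberately no per-object free: memory held by dead objects stays in the region until `release`, which is the "too much space" drawback listed on the slide.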
  8. Overview
     • Introduction
     • Perceived benefits and drawbacks
     • Three main kinds of custom allocators
     • Comparison with general-purpose allocators
     • Advantages and drawbacks of regions
     • Reaps – generalization of regions & heaps
  9. Custom Allocators Are Faster…
     [Figure: Runtime - Custom Allocator Benchmarks. Normalized runtime of Custom vs. Win32 for 197.parser, boxed-sim, c-breeze, 175.vpr, 176.gcc, apache, lcc, and mudlle, grouped into non-regions and regions.]
     • As good as and sometimes much faster than Win32
  10. Not So Fast…
      [Figure: Runtime - Custom Allocator Benchmarks. Normalized runtime of Custom, Win32, and DLmalloc for 197.parser, boxed-sim, c-breeze, 175.vpr, 176.gcc, apache, lcc, and mudlle, grouped into non-regions and regions.]
      • DLmalloc: as fast or faster for most benchmarks
  11. The Lea Allocator (DLmalloc 2.7.0)
      • Mature public-domain general-purpose allocator
      • Optimized for common allocation patterns
         • Per-size quicklists ≈ per-class allocation
         • Deferred coalescing (combining adjacent free objects)
      • Highly-optimized fastpath
      • Space-efficient
  12. Space Consumption: Mixed Results
      [Figure: Space - Custom Allocator Benchmarks. Normalized space of Custom vs. DLmalloc for 197.parser, boxed-sim, c-breeze, 175.vpr, 176.gcc, apache, lcc, and mudlle, grouped into non-regions and regions.]
  13. Overview
      • Introduction
      • Perceived benefits and drawbacks
      • Three main kinds of custom allocators
      • Comparison with general-purpose allocators
      • Advantages and drawbacks of regions
      • Reaps – generalization of regions & heaps
  14. Regions – Pros and Cons
      + Fast, convenient, etc.
      + Avoid resource leaks (e.g., Apache)
         • Tear down memory for terminated connections
      - No individual object deletion
         • Unbounded memory consumption (producer-consumer, long-running computations, off-the-shelf programs)
         • Apache: vulnerable to DoS, memory leaks
  15. Reap Hybrid Allocator
      • Reap = region + heap
      • Adds individual object deletion & heap
      • API: reapcreate(r); reapmalloc(r, sz); reapfree(r, p); reapdelete(r)
      + Can reduce memory consumption
      + Fast
      + Adapts to use (region or heap style)
      + Cheap deletion
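One way to picture the hybrid on slide 15 is a region that also accepts individual frees. The sketch below is a simplified assumption and not the Reap implementation shipped with Heap Layers: it prefixes each object with its size and keeps one free list per rounded size in a `std::map`, reusing freed objects before bumping the pointer again.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>
#include <new>
#include <vector>

constexpr std::size_t reap_align_up(std::size_t n) {
    return (n + alignof(std::max_align_t) - 1) / alignof(std::max_align_t)
           * alignof(std::max_align_t);
}

// Hypothetical reap: region-style bump allocation plus per-object free.
class Reap {
public:
    Reap() = default;
    Reap(const Reap&) = delete;
    Reap& operator=(const Reap&) = delete;
    ~Reap() { clear(); }  // plays the role of reapdelete(r)

    void* allocate(std::size_t size) {  // reapmalloc(r, sz)
        size = reap_align_up(size < sizeof(FreeNode) ? sizeof(FreeNode) : size);
        // Heap style: reuse an individually freed object of the same size.
        auto it = free_lists_.find(size);
        if (it != free_lists_.end() && it->second != nullptr) {
            FreeNode* node = it->second;
            it->second = node->next;
            return node;
        }
        // Region style: bump a pointer, prefixing the object with its size.
        const std::size_t total = kHeader + size;
        if (total > remaining_) {
            std::size_t chunk = kChunk;
            if (total > chunk) chunk = total;
            chunks_.push_back(nullptr);  // reserve the slot before malloc
            void* fresh = std::malloc(chunk);
            if (fresh == nullptr) { chunks_.pop_back(); return nullptr; }
            chunks_.back() = fresh;
            bump_ = static_cast<unsigned char*>(fresh);
            remaining_ = chunk;
        }
        new (bump_) std::size_t(size);  // size header read back by deallocate
        void* result = bump_ + kHeader;
        bump_ += total;
        remaining_ -= total;
        return result;
    }

    void deallocate(void* p) {  // reapfree(r, p)
        if (p == nullptr) return;
        const std::size_t size = *reinterpret_cast<std::size_t*>(
            static_cast<unsigned char*>(p) - kHeader);
        FreeNode*& head = free_lists_[size];
        head = new (p) FreeNode{head};
    }

    void clear() {  // cheap deletion: drop every chunk at once
        for (void* chunk : chunks_) std::free(chunk);
        chunks_.clear();
        free_lists_.clear();
        bump_ = nullptr;
        remaining_ = 0;
    }

    std::size_t chunk_count() const { return chunks_.size(); }

private:
    struct FreeNode { FreeNode* next; };
    static constexpr std::size_t kHeader = reap_align_up(sizeof(std::size_t));
    static constexpr std::size_t kChunk = 4096;  // assumed chunk size

    std::vector<void*> chunks_;
    std::map<std::size_t, FreeNode*> free_lists_;
    unsigned char* bump_ = nullptr;
    std::size_t remaining_ = 0;
};
```

A program that never calls `deallocate` gets plain region behavior, while one that frees objects individually sees that memory reused, which is the "adapts to use" point on the slide.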
  16. Reap Runtime
      [Figure: Runtime - Custom Allocation Benchmarks. Normalized runtime of Custom, Win32, DLmalloc, and Reap for 197.parser, boxed-sim, c-breeze, 175.vpr, 176.gcc, apache, lcc, and mudlle, grouped into non-regions and regions.]
  17. Reap Space
      [Figure: Space - Custom Allocator Benchmarks. Normalized space of Custom, DLmalloc, and Reap for 197.parser, boxed-sim, c-breeze, 175.vpr, 176.gcc, apache, lcc, and mudlle, grouped into non-regions and regions.]
  18. Reap: Best of Both Worlds
      • Allows mixing of regions and new/delete
      • Case study:
         • New Apache module “mod_bc”
         • bc: C-based arbitrary-precision calculator
         • Changed 20 lines out of 8000
         • Benchmark: compute 1000th prime
            • With Reap: 240K
            • Without Reap: 7.4MB
  19. Conclusions and Future Work
      • Empirical study of custom allocators
         • Lea allocator often as fast or faster
         • Non-region custom allocation ineffective
         • Reap: region performance without drawbacks
      • Future work:
         • Reduce space with per-page bitmaps
         • Combine with scalable general-purpose allocator (e.g., Hoard)
  20. Software
      • http://www.cs.umass.edu/~emery (Reap: part of Heap Layers distribution)
      • http://g.oswego.edu (DLmalloc 2.7.0)
  21. If You Can Read This, I Went Too Far
  22. Backup Slides
  23. Experimental Methodology
      • Comparing to general-purpose allocators
         • Same semantics: no problem
            • E.g., disable per-class allocators
         • Different semantics: use emulator
            • Uses general-purpose allocator
            • Adds bookkeeping to support region semantics
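The emulator idea on slide 23 can be illustrated as follows. The talk does not show the emulator's code, so this is only one plausible shape: every object comes from malloc, and a hypothetical per-object `Node` header links the objects together so that deleting the region can free each one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical region emulator layered on the general-purpose allocator.
class EmulatedRegion {
    struct Node { Node* next; };
    // Header size rounded up so the payload keeps malloc's alignment.
    static constexpr std::size_t kHeader =
        (sizeof(Node) + alignof(std::max_align_t) - 1)
        / alignof(std::max_align_t) * alignof(std::max_align_t);

public:
    EmulatedRegion() = default;
    EmulatedRegion(const EmulatedRegion&) = delete;
    EmulatedRegion& operator=(const EmulatedRegion&) = delete;
    ~EmulatedRegion() { clear(); }

    // Region-style allocation served by malloc, plus one link of bookkeeping.
    void* allocate(std::size_t size) {
        void* raw = std::malloc(kHeader + size);
        if (raw == nullptr) return nullptr;
        head_ = new (raw) Node{head_};
        ++live_;
        return static_cast<unsigned char*>(raw) + kHeader;
    }

    // Region-style deletion: walk the list and free every object.
    void clear() {
        while (head_ != nullptr) {
            Node* next = head_->next;
            std::free(head_);
            head_ = next;
        }
        live_ = 0;
    }

    std::size_t live_objects() const { return live_; }

private:
    Node* head_ = nullptr;
    std::size_t live_ = 0;
};
```

Swapping a real region for a wrapper like this lets the same program run on a general-purpose allocator, at the cost of the extra header and the per-object free during deletion.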
  24. Why Did They Do That?
      • Recommended practice
      • Premature optimization
         • Microbenchmarks vs. actual performance
      • Drift
         • Not bottleneck anymore
      • Improved competition
         • Modern allocators are better
  25. Reaps as Regions: Runtime
      [Figure: Runtime - Region-Based Benchmarks. Normalized runtime of Custom, Win32, DLmalloc, and Reap for lcc and mudlle.]
      • Reap performance nearly matches regions
  26. Using Reap as Regions
      [Figure: Runtime - Region-Based Benchmarks. Normalized runtime of Original, Win32, DLmalloc, WinHeap, Vmalloc, and Reap for lcc and mudlle; one bar runs off the scale and is labeled 4.08.]
      • Reap performance nearly matches regions
  27. Drawbacks of Regions
      • Can’t reclaim memory within regions
         • Bad for long-running computations, producer-consumer patterns, “malloc/free” programs: unbounded memory consumption
      • Current situation for Apache:
         • vulnerable to denial-of-service
         • limits runtime of connections
         • limits module programming
  28. Use Custom Allocators?
      • Strongly recommended by practitioners
      • Little hard data on performance/space improvements
         • Only one previous study [Zorn 1992]
         • Focused on just one type of allocator
         • Custom allocators: waste of time
            • Small gains, bad allocators
      • Different allocators better? Trade-offs?
  29. Kinds of Custom Allocators
      • Three basic types of custom allocators
         • Per-class
            • Fast
         • Custom patterns
            • Fast, but very special-purpose
         • Regions
            • Fast, possibly more space-efficient
            • Convenient
            • Variants: nested, obstacks
  30. Optimization Opportunity
      [Figure: Time Spent in Memory Operations. Percentage of runtime split into Memory Operations and Other for 197.parser, boxed-sim, c-breeze, 175.vpr, 176.gcc, apache, lcc, mudlle, and the average.]
