Parallel Memory Allocation
Steven Saunders
Steven Saunders 2 Parallel Memory Allocation
Introduction
• Fallacy: all dynamic memory allocators are either scalable, effective or efficient...
• Truth: no allocator exists that handles all situations, allocation patterns and memory hierarchies best (serial or parallel)
• Research results are difficult to compare
– no standard benchmarks, simulators vs. real tests
• Distributed memory, garbage collection and locality are not considered in this talk
Definitions
• Heap
– pool of memory available for allocation or deallocation of arbitrarily-sized blocks, in arbitrary order, that will live an arbitrary amount of time
• Dynamic Memory Allocator
– used to request memory blocks from the heap or return them to it
– aware only of the size of a memory block, not its type or value
– tracks which parts of the heap are in use and which parts are available for allocation
Design of an Allocator
• Strategy - consider regularities in program behavior and memory requests to determine a set of acceptable policies
• Policy - decide where to allocate a memory block within the heap
• Mechanism - implement the policy using a set of data structures and algorithms
Emphasis has been on policies and mechanisms!
Strategy
• Ideal Serial Strategy
– “put memory blocks where they won’t cause fragmentation later”
– Serial program behavior:
  • ramps, peaks, plateaus
• Parallel Strategies
– “minimize unrelated objects on the same page”
– “bound memory blowup and minimize false sharing”
– Parallel program behavior:
  • SPMD, producer-consumer
Policy
• Common Serial Policies:
– best fit, first fit, worst fit, etc.
• Common Techniques:
– splitting - break large blocks into smaller pieces to satisfy smaller requests
– coalescing - merge adjacent free blocks to satisfy bigger requests
  • immediate - upon deallocation
  • deferred - wait until requests cannot be satisfied
Mechanism
• Each block contains header information
– size, in-use flag, pointer to next free block, etc.
• Free List - list of memory blocks available for allocation
– Singly/Doubly Linked List - each free block points to the next free block
– Boundary Tag - size info at both ends of block
  • positive indicates free, negative indicates in use
– Quick Lists - multiple linked lists, where each list contains blocks of equal size
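The boundary-tag mechanism, together with the splitting and immediate coalescing techniques from the Policy slide, can be sketched on a toy heap. This is a minimal illustration, not any particular allocator: the word-array heap, the names `bt_init`, `bt_alloc`, `bt_free`, and the constants are all hypothetical. The sign convention (positive tag means free, negative means in use) follows the slide.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy heap: an array of words. Each block is laid out as
   [tag | payload ... | tag], where tag = +size if free, -size if in use,
   and size counts every word of the block, both tags included. */
#define HEAP_WORDS 64
#define MIN_BLOCK   3            /* two tags plus one payload word */

static int heap[HEAP_WORDS];

static void set_tags(int start, int size, int in_use) {
    int tag = in_use ? -size : size;
    heap[start] = tag;               /* header */
    heap[start + size - 1] = tag;    /* footer */
}

/* Must be called once before bt_alloc/bt_free. */
void bt_init(void) { set_tags(0, HEAP_WORDS, 0); }

/* First fit with splitting. Returns the payload index, or -1 on failure. */
int bt_alloc(int payload_words) {
    int need = payload_words + 2;
    if (payload_words <= 0) return -1;
    for (int b = 0; b < HEAP_WORDS; b += abs(heap[b])) {
        int size = heap[b];
        if (size == 0) return -1;               /* heap was never initialised */
        if (size < need) continue;              /* in use (negative) or too small */
        if (size - need >= MIN_BLOCK) {         /* split: remainder stays free */
            set_tags(b, need, 1);
            set_tags(b + need, size - need, 0);
        } else {
            set_tags(b, size, 1);               /* remainder too small to split off */
        }
        return b + 1;
    }
    return -1;
}

/* Free with immediate coalescing. The neighbours' tags sit directly next
   to this block, so both merges take constant time. */
void bt_free(int payload) {
    int b = payload - 1;
    int size = -heap[b];
    assert(size >= MIN_BLOCK);                  /* must be an in-use block */
    int next = b + size;
    if (next < HEAP_WORDS && heap[next] > 0)    /* following block is free */
        size += heap[next];
    if (b > 0 && heap[b - 1] > 0) {             /* preceding block is free (its footer) */
        b -= heap[b - 1];
        size += heap[b];
    }
    set_tags(b, size, 0);
}

/* Size of the first block if it is free, else 0 (for inspection). */
int bt_first_free_size(void) { return heap[0] > 0 ? heap[0] : 0; }
```

Allocating three blocks and freeing them in any order should leave a single free block spanning the whole heap again, which is the point of keeping a size tag at both ends.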
Performance
• Speed - comparable to serial allocators
• Scalability - scale linearly with processors
• False Sharing - avoid actively causing it
• Fragmentation - keep it to a minimum
Performance (False Sharing)
• Multiple processors inadvertently share data that happens to reside in the same cache line
– padding solves the problem, but greatly increases fragmentation
[Figure: memory and the caches of processors 0 and 1, with data for both processors in the same cache line]
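The padding trade-off can be made concrete with a small sketch. It assumes a 64-byte cache line (real hardware varies) and uses hypothetical names throughout; it only measures layout and does not time anything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64   /* assumed line size; real hardware varies */

/* Unpadded: adjacent counters are sizeof(int) bytes apart, so several
   of them share one cache line. */
struct counter { int value; };

/* Padded: the member is aligned to a full line, so each object occupies
   a line of its own (C11 _Alignas). */
struct padded_counter { _Alignas(CACHE_LINE) int value; };

/* Index of the cache line holding p, counted from base. base is assumed
   to be line-aligned. */
size_t line_of(const void *base, const void *p) {
    return (size_t)((uintptr_t)p - (uintptr_t)base) / CACHE_LINE;
}

/* Bytes lost to padding per object: the fragmentation cost of the fix. */
size_t padding_waste(void) {
    return sizeof(struct padded_counter) - sizeof(struct counter);
}
```

Two neighbouring `struct counter` objects land on the same line, so two processors updating them would contend; two neighbouring `struct padded_counter` objects do not, at the cost of `padding_waste()` bytes each.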
Performance (Fragmentation)
• Inability to use available memory
• External - available blocks cannot satisfy requests (e.g., too small)
• Internal - block used to satisfy a request is larger than the requested size
Performance (Blowup)
• Out of control fragmentation
– unique to parallel allocators
– allocator correctly reclaims freed memory but fails to reuse it for future requests
  • available memory not seen by allocating processor
• For p threads that each execute ... x = malloc(s); free(x); ...
– p threads that serialize one after another require O(s) memory
– p threads that execute in parallel require O(ps) memory
– p threads that execute interleaved on 1 processor require O(ps) memory...
• Example: blowup from exchanging memory (Producer-Consumer)
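The producer-consumer case can be illustrated with a single-threaded model that only counts bytes. Everything here is a hypothetical sketch: the `heap_model` struct and the `return_to_owner` switch are illustrative names, not part of any allocator discussed in the talk.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one heap: it only counts bytes. */
struct heap_model {
    size_t free_bytes;     /* reclaimed memory sitting on this heap's free list */
    size_t from_os;        /* total bytes this heap has requested from the OS */
};

/* Allocate s bytes: reuse local free memory if possible, else grow. */
void model_malloc(struct heap_model *h, size_t s) {
    if (h->free_bytes >= s) h->free_bytes -= s;
    else h->from_os += s;
}

/* Free s bytes onto heap h (whichever heap the policy chooses). */
void model_free(struct heap_model *h, size_t s) { h->free_bytes += s; }

/* The producer allocates and the consumer frees, for `rounds` rounds.
   return_to_owner = 0: pure private heaps, the block lands on the
                        consumer's heap where the producer never sees it.
   return_to_owner = 1: the block goes back to the producer's heap.
   Returns the total bytes obtained from the OS. */
size_t producer_consumer(size_t rounds, size_t s, int return_to_owner) {
    assert(s > 0);
    struct heap_model producer = {0, 0}, consumer = {0, 0};
    for (size_t i = 0; i < rounds; i++) {
        model_malloc(&producer, s);
        model_free(return_to_owner ? &producer : &consumer, s);
    }
    return producer.from_os + consumer.from_os;
}
```

In this model at most s bytes are live at any time, yet with pure private heaps the footprint grows with the number of rounds, because the freed memory accumulates where the allocating side cannot reuse it.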
Parallel Allocator Taxonomy
• Serial Single Heap
– global lock protects the heap
• Concurrent Single Heap
– multiple free lists or a concurrent free list
• Multiple Heaps
– processors allocate from any of several heaps
• Private Heaps
– processors allocate exclusively from a local heap
• Global & Local Heaps
– processors allocate from a local and a global heap
Serial Single Heap
• Make an existing serial allocator thread-safe
– utilize a single global lock for every request
• Performance
– high speed, assuming a fast lock
– scalability limited
– false sharing not considered
– fragmentation bounded by serial policy
• Typically production allocators
– IRIX, Solaris, Windows
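A minimal sketch of the single-global-lock approach follows. It assumes a C11 `atomic_flag` spinlock as a stand-in for the "fast lock" and lets the C library's `malloc`/`free` play the role of the existing serial allocator; the `locked_*` names and the `live_blocks` counter are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* One lock for the whole heap. A spinlock stands in for the "fast lock"
   assumed above; a production allocator would more likely use a mutex. */
static atomic_flag heap_lock = ATOMIC_FLAG_INIT;
static size_t live_blocks;        /* hypothetical bookkeeping guarded by the lock */

static void lock_heap(void) {
    while (atomic_flag_test_and_set_explicit(&heap_lock, memory_order_acquire)) {
        /* spin until the current holder releases the lock */
    }
}

static void unlock_heap(void) {
    atomic_flag_clear_explicit(&heap_lock, memory_order_release);
}

/* Stand-ins for an existing serial allocator; here they simply forward
   to the C library. */
static void *serial_malloc(size_t n) { return malloc(n); }
static void  serial_free(void *p)    { free(p); }

void *locked_malloc(size_t n) {
    lock_heap();
    void *p = serial_malloc(n);
    if (p) live_blocks++;
    unlock_heap();
    return p;
}

void locked_free(void *p) {
    if (!p) return;
    lock_heap();
    assert(live_blocks > 0);
    serial_free(p);
    live_blocks--;
    unlock_heap();
}

size_t locked_live_blocks(void) {
    lock_heap();
    size_t n = live_blocks;
    unlock_heap();
    return n;
}
```

Every request from every thread passes through `lock_heap`, which is why speed can be good for one thread while scalability stays limited.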
Concurrent Single Heap
• Apply concurrency to existing serial allocators
– use quick list mechanism, with a lock per list
• Performance
– moderate speed, could require many locks
– scalability limited by number of requested sizes
– false sharing not considered
– fragmentation bounded by serial policy
• Typically research allocators
– Buddy System, MFLF (multiple free list first), NUMAmalloc
Concurrent Single (Buddy System)
• Policy/Mechanism
– one free list per memory block size: 1, 2, 4, …, 2^i
– blocks on list i are recursively split into 2 buddies for list i-1 in order to satisfy smaller requests
– only buddies are coalesced to satisfy larger requests
– each free list can be individually locked
– trade speed for reduced fragmentation
  • if free list empty, a thread’s malloc enters a wait queue
  • malloc could be satisfied by another thread freeing memory or by breaking a higher list’s block into buddies (whichever finishes first)
  • reducing the number of splits reduces fragmentation by leaving more large blocks for future requests
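The arithmetic behind the buddy lists can be sketched as follows. This shows only the size-class and buddy-address calculations, not the locking or wait queue described above; the function names are hypothetical, and offsets are measured in allocation units from the start of the heap.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest i such that 2^i >= n: the index of the free list that serves
   a request of n units. */
unsigned order_for(size_t n) {
    unsigned i = 0;
    while (((size_t)1 << i) < n) i++;
    return i;
}

/* Offset of the buddy of the block at `offset` on list i. The two buddies
   produced by splitting a block of size 2^(i+1) differ only in bit i. */
size_t buddy_of(size_t offset, unsigned i) {
    return offset ^ ((size_t)1 << i);
}

/* Number of splits needed to carve a block for list `want` out of a block
   taken from list `have`. Each split leaves one free buddy behind. */
unsigned splits_needed(unsigned have, unsigned want) {
    assert(have >= want);
    return have - want;
}

/* Units wasted inside the block by rounding the request up to a power
   of two (internal fragmentation). */
size_t internal_waste(size_t n) {
    return ((size_t)1 << order_for(n)) - n;
}
```

A request for 5 units is served from the size-8 list and wastes 3 units, which is where the buddy system's internal fragmentation comes from; because a buddy's address is one XOR away, coalescing needs no search.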
Concurrent Single (Buddy System)
• Performance
– moderate speed, complicated locking/queueing, although buddy split/coalesce code is fast
– scalability limited by number of requested sizes
– false sharing very likely
– high internal fragmentation
[Figure: two snapshots of the free lists for block sizes 1, 2, 4, 8, 16, with Thread 1 running x = malloc(8); y = malloc(8); and Thread 2 running x = malloc(5); free(x);]
Concurrent Single (MFLF)
• Policy/Mechanism
– set of quick lists to satisfy small requests exactly
  • malloc takes first block in appropriate list
  • free returns block to head of appropriate list
– set of misc lists to satisfy large requests quickly
  • each list labeled with range of block sizes, low…high
  • malloc takes first block in list where request < low
    (trades linear search for internal fragmentation)
  • free returns blocks to list where low < block size < high
– each list can be individually locked and searched
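List selection in an MFLF-style allocator can be sketched as below. The grain size, the quick-list limit and the misc ranges are illustrative assumptions, as are the function names. The sketch also assumes half-open ranges (low <= size < high) on the free side so that boundary sizes have a home; the malloc side uses the strict `request < low` test from the slide.

```c
#include <assert.h>
#include <stddef.h>

#define GRAIN        8      /* assumed size step between quick lists */
#define MAX_QUICK  128      /* assumed largest size served by a quick list */

struct misc_range { size_t low, high; };   /* holds blocks with low <= size < high */

/* Hypothetical misc lists; the ranges are illustrative only. */
static const struct misc_range misc[] = {
    {128, 256}, {256, 512}, {512, 1024}, {1024, 4096},
};
#define N_MISC (sizeof misc / sizeof misc[0])

/* Quick list index for a small request. Requests are rounded up to a
   multiple of GRAIN, and every block on a quick list has that size. */
size_t quick_index(size_t request) {
    assert(request > 0 && request <= MAX_QUICK);
    return (request + GRAIN - 1) / GRAIN - 1;
}

/* malloc side: first misc list whose low bound exceeds the request, so
   the first block on it fits without a search. Returns -1 if none. */
int misc_index_for_malloc(size_t request) {
    for (size_t i = 0; i < N_MISC; i++)
        if (request < misc[i].low) return (int)i;
    return -1;
}

/* free side: the list whose range contains the block's size.
   Returns -1 if no range matches. */
int misc_index_for_free(size_t size) {
    for (size_t i = 0; i < N_MISC; i++)
        if (size >= misc[i].low && size < misc[i].high) return (int)i;
    return -1;
}
```

A 200-unit block is freed onto the 128…256 list, but a 200-unit request is served from the 256…512 list: the allocator skips the search of the first list and accepts a larger block instead.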
Concurrent Single (MFLF)
• Performance
– high speed
– scalability limited by number of requested sizes
– false sharing very likely
– high internal fragmentation
• Approach similar to current state-of-the-art serial allocators
Concurrent Single (NUMAmalloc)
• Strategy - minimize co-location of unrelated objects on the same page
– avoid page level false sharing (DSM/software DSM)
• Policy - place same-sized requests in the same page (heuristic hypothesis)
– basically MFLF on the page level
• Performance
– high speed
– scalability limited by number of requested sizes
– false sharing: helps page level but not cache level
– high internal fragmentation
Multiple Heaps
• List of multiple heaps
– individually growable, shrinkable and lockable
– threads scan list looking for first available (trylock)
– threads may cache result to reduce next lookup
• Performance
– moderate speed, limited by the number of list scans and locking
– scalability limited by number of heaps and traffic
– false sharing unintentionally reduced
– blowup increased (up to O(p))
• Typically production allocators
– ptmalloc (Linux), HP-UX
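The scan-with-trylock step can be sketched as follows. This is an illustration under assumptions, not ptmalloc's actual code: the heap count is fixed at four, the per-thread cached result is modelled as a `hint` passed by pointer, and all names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define N_HEAPS 4                      /* assumed fixed heap count */

struct heap {
    atomic_flag busy;                  /* set while some thread owns the heap */
    size_t allocations;                /* requests served, for inspection */
};

static struct heap heaps[N_HEAPS] = {
    {ATOMIC_FLAG_INIT, 0}, {ATOMIC_FLAG_INIT, 0},
    {ATOMIC_FLAG_INIT, 0}, {ATOMIC_FLAG_INIT, 0},
};

/* trylock: returns 1 if the caller now owns the heap, 0 if it was busy. */
static int heap_trylock(struct heap *h) {
    return !atomic_flag_test_and_set_explicit(&h->busy, memory_order_acquire);
}

void heap_unlock(struct heap *h) {
    atomic_flag_clear_explicit(&h->busy, memory_order_release);
}

/* Scan the list starting at the caller's cached hint, take the first heap
   whose trylock succeeds, and remember it for next time. Returns the index
   of the locked heap, or -1 if every heap is busy (a real allocator would
   then create a new heap or wait). */
int acquire_heap(int *hint) {
    assert(*hint >= 0 && *hint < N_HEAPS);
    for (int k = 0; k < N_HEAPS; k++) {
        int i = (*hint + k) % N_HEAPS;
        if (heap_trylock(&heaps[i])) {
            *hint = i;
            heaps[i].allocations++;
            return i;
        }
    }
    return -1;
}
```

Because each thread tends to return to the heap it used last, threads spread out across heaps; memory freed on one heap is not visible to threads working on another.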
Private Heaps
• Processors exclusively utilize a local private heap for all allocation and deallocation
– eliminates need for locks
• Performance
– extremely high speed
– scalability unbounded
– reduced false sharing
  • unless memory is passed to another thread
– blowup unbounded
• Both research and production allocators
– CILK, STL
Global & Local Heaps
• Processors generally utilize a local heap
– reduces most lock contention
– private memory is acquired from or returned to the global heap (which is always locked) as necessary
• Performance
– high speed, less lock contention
– scalability limited by number of locks
– low false sharing
– blowup bounded (O(1))
• Typically research allocators
– VH, Hoard
Global & Local Heaps (VH)
• Strategy - exchange memory overhead for improved scalability
• Policy
– memory broken into stacks of size m/2
– global heap maintains a LIFO of stack pointers that local heaps can use
  • global push releases a local heap’s stack
  • global pop acquires a local heap’s stack
– local heaps maintain an Active and Backup stack
  • local operations private, i.e. don’t require locks
• Mechanism - serial free list within stacks
Global & Local Heaps (VH)
• Memory Usage = M + m*p
– M = amount of memory in use by program
– p = number of processors
– m = private memory (2 stacks of size m/2)
  • higher m reduces number of global heap lock operations
  • lower m reduces memory usage overhead
[Figure: a global heap and private heaps 0, 1, ..., p, each private heap holding stacks labeled A and B]
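How two local stacks keep most operations away from the global lock can be sketched with a counting model. This is one plausible reading of the Active/Backup scheme described on these slides, not code from the VH paper: the struct, the function names and the exact swap rules are assumptions, and stack capacity is measured in blocks rather than bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical local heap: only block counts are modelled. */
struct local_heap {
    size_t active;       /* free blocks on the Active stack */
    size_t backup;       /* free blocks on the Backup stack (0 or cap here) */
    size_t global_ops;   /* times the locked global heap was touched */
};

/* Free one block locally. cap is the stack capacity in blocks (m/2). */
void vh_free(struct local_heap *h, size_t cap) {
    assert(cap > 0);
    if (h->active == cap) {            /* Active is full */
        if (h->backup == cap)          /* Backup is full too: release it */
            h->global_ops++;           /* global push (takes the lock) */
        h->backup = h->active;         /* Active becomes the new Backup */
        h->active = 0;
    }
    h->active++;
}

/* Allocate one block locally. */
void vh_malloc(struct local_heap *h, size_t cap) {
    assert(cap > 0);
    if (h->active == 0) {
        if (h->backup > 0) {           /* fall back to Backup, no lock needed */
            h->active = h->backup;
            h->backup = 0;
        } else {
            h->global_ops++;           /* global pop (takes the lock) */
            h->active = cap;
        }
    }
    h->active--;
}

/* Bound from the slide: memory usage = M + m*p. */
size_t vh_memory_bound(size_t M, size_t m, size_t p) { return M + m * p; }
```

In this model a local heap never holds more than two full stacks, which corresponds to the m term per processor, and it touches the global heap at most once per `cap` local operations, which is why a larger m means fewer lock operations.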
Global & Local Heaps (Hoard)
• Strategy - bound blowup and minimize false sharing
• Policy
– memory broken into superblocks of size S
  • all blocks within a superblock are of equal size
– global heap maintains a set of available superblocks
– local heaps maintain local superblocks
  • malloc satisfied by local superblock
  • free returns memory to original allocating superblock (lock!)
  • superblocks acquired as necessary from global heap
  • if local usage drops below a threshold, superblocks are returned to the global heap
• Mechanism - private superblock quick lists
Global & Local Heaps (Hoard)
• Memory Usage = O(M + p)
– M = amount of memory in use by program
– p = number of processors
• False Sharing
– since malloc is satisfied by a local superblock, and free returns memory to the original superblock, false sharing is greatly reduced
– worst case: a non-empty superblock is released to global heap and another thread acquires it
  • set superblock size and emptiness threshold to minimize this
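The emptiness threshold can be sketched as a simple predicate. The slides do not spell out the test; the version below follows the emptiness invariant as I understand it from the Hoard paper (a heap keeps its superblocks while usage stays within K superblocks of what it holds, or above a fraction 1 - f of it). The struct, the function name and the parameter values used with it are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct hoard_heap {
    size_t in_use;      /* u: bytes handed out and not yet freed */
    size_t held;        /* a: bytes in superblocks owned by this heap */
};

/* Returns 1 if the heap is "too empty" and should hand a superblock back
   to the global heap, 0 otherwise.
   S = superblock size in bytes, K = slack measured in superblocks,
   empty_num/empty_den = emptiness fraction f (for example 1/4). */
int should_release(const struct hoard_heap *h, size_t S, size_t K,
                   size_t empty_num, size_t empty_den) {
    assert(empty_den > 0 && empty_num <= empty_den);
    assert(h->in_use <= h->held);
    /* u < a - K*S, rearranged to avoid unsigned underflow */
    int beyond_slack = h->in_use + K * S < h->held;
    /* u < (1 - f) * a, in integer arithmetic */
    int mostly_empty = h->in_use * empty_den < (empty_den - empty_num) * h->held;
    return beyond_slack && mostly_empty;
}
```

A larger S or a smaller f makes releases rarer, which lowers the chance that a partly used superblock migrates to another thread, at the price of holding more unused memory locally.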
Global & Local Heaps (Compare)
• VH proves a tighter memory bound than Hoard and only requires global locks
• Hoard has a more flexible local mechanism and considers false sharing
• They’ve never been compared!
– Hoard is production quality and would likely win
Summary
• Memory allocation is still an open problem
– strategies addressing program behavior still uncommon
• Performance Tradeoffs
– speed, scalability, false sharing, fragmentation
• Current Taxonomy
– serial single heap, concurrent single heap, multiple heaps, private heaps, global & local heaps
References
Serial Allocation
1) Paul Wilson, Mark Johnstone, Michael Neely, David Boles.
Dynamic Storage Allocation: A Survey and Critical Review. 1995 International Workshop on
Memory Management.
Shared Memory Multiprocessor Allocation
Concurrent Single Heap
2) Arun Iyengar. Scalability of Dynamic Storage Allocation Algorithms. Sixth Symposium on the
Frontiers of Massively Parallel Computing. October 1996.
3) Theodore Johnson, Tim Davis. Space Efficient Parallel Buddy Memory Management. The Fourth
International Conference on Computing and Information (ICCI'92). May 1992.
4) Jong Woo Lee, Yookun Cho. An Effective Shared Memory Allocator
for Reducing False Sharing in NUMA Multiprocessors. IEEE Second International Conference
on Algorithms & Architectures for Parallel Processing (ICAPP'96).
Multiple Heaps
5) Wolfram Gloger. Dynamic Memory Allocator Implementations in Linux System Libraries.
Global & Local Heaps
6) Emery Berger, Kathryn McKinley, Robert Blumofe, Paul Wilson. Hoard: A Scalable Memory
Allocator for Multithreaded Applications. The Ninth International Conference on Architectural
Support for Programming Languages and Operating Systems (ASPLOS-IX). November 2000.
7) Voon-Yee Vee, Wen-Jing Hsu. A Scalable and Efficient Storage Allocator
on Shared-Memory Multiprocessors. The International Symposium on Parallel Architectures,
Algorithms, and Networks (I-SPAN'99). June 1999.

More Related Content

What's hot

Project Presentation Final
Project Presentation FinalProject Presentation Final
Project Presentation FinalDhritiman Halder
 
Cache Memory Computer Architecture and organization
Cache Memory Computer Architecture and organizationCache Memory Computer Architecture and organization
Cache Memory Computer Architecture and organizationHumayra Khanum
 
Elements of cache design
Elements of cache designElements of cache design
Elements of cache designRohail Butt
 
Cache memory and virtual memory
Cache memory and virtual memoryCache memory and virtual memory
Cache memory and virtual memoryPrakharBansal29
 
Cache performance considerations
Cache performance considerationsCache performance considerations
Cache performance considerationsSlideshare
 
Cache memory by Foysal
Cache memory by FoysalCache memory by Foysal
Cache memory by FoysalFoysal Mahmud
 
Memory Hierarchy Design, Basics, Cache Optimization, Address Translation
Memory Hierarchy Design, Basics, Cache Optimization, Address TranslationMemory Hierarchy Design, Basics, Cache Optimization, Address Translation
Memory Hierarchy Design, Basics, Cache Optimization, Address TranslationFarwa Ansari
 
Cache memory and cache
Cache memory and cacheCache memory and cache
Cache memory and cacheVISHAL DONGA
 

What's hot (20)

Project Presentation Final
Project Presentation FinalProject Presentation Final
Project Presentation Final
 
Cache Memory Computer Architecture and organization
Cache Memory Computer Architecture and organizationCache Memory Computer Architecture and organization
Cache Memory Computer Architecture and organization
 
Elements of cache design
Elements of cache designElements of cache design
Elements of cache design
 
Cachememory
CachememoryCachememory
Cachememory
 
Cache memory
Cache memoryCache memory
Cache memory
 
Cache Memory
Cache MemoryCache Memory
Cache Memory
 
Cache memory and virtual memory
Cache memory and virtual memoryCache memory and virtual memory
Cache memory and virtual memory
 
computer-memory
computer-memorycomputer-memory
computer-memory
 
cache memory
 cache memory cache memory
cache memory
 
Cache performance considerations
Cache performance considerationsCache performance considerations
Cache performance considerations
 
Lecture2
Lecture2Lecture2
Lecture2
 
Memory (Computer Organization)
Memory (Computer Organization)Memory (Computer Organization)
Memory (Computer Organization)
 
Cache memory by Foysal
Cache memory by FoysalCache memory by Foysal
Cache memory by Foysal
 
Cache memory
Cache memoryCache memory
Cache memory
 
Memory Hierarchy Design, Basics, Cache Optimization, Address Translation
Memory Hierarchy Design, Basics, Cache Optimization, Address TranslationMemory Hierarchy Design, Basics, Cache Optimization, Address Translation
Memory Hierarchy Design, Basics, Cache Optimization, Address Translation
 
cache memory management
cache memory managementcache memory management
cache memory management
 
Cache memory ...
Cache memory ...Cache memory ...
Cache memory ...
 
Cache memory
Cache memoryCache memory
Cache memory
 
Cache memory and cache
Cache memory and cacheCache memory and cache
Cache memory and cache
 
Cache memory
Cache memoryCache memory
Cache memory
 

Similar to 3parallel memoryallocation

247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).pptssuser5c9d4b1
 
Terminologies Used In Big data Environments,G.Sumithra,II-M.sc(computer scien...
Terminologies Used In Big data Environments,G.Sumithra,II-M.sc(computer scien...Terminologies Used In Big data Environments,G.Sumithra,II-M.sc(computer scien...
Terminologies Used In Big data Environments,G.Sumithra,II-M.sc(computer scien...sumithragunasekaran
 
Memory allocation for real time operating system
Memory allocation for real time operating systemMemory allocation for real time operating system
Memory allocation for real time operating systemAsma'a Lafi
 
storage & file strucure in dbms
storage & file strucure in dbmsstorage & file strucure in dbms
storage & file strucure in dbmssachin2690
 
Computer Memory Hierarchy Computer Architecture
Computer Memory Hierarchy Computer ArchitectureComputer Memory Hierarchy Computer Architecture
Computer Memory Hierarchy Computer ArchitectureHaris456
 
Operating system 34 contiguous allocation
Operating system 34 contiguous allocationOperating system 34 contiguous allocation
Operating system 34 contiguous allocationVaibhav Khanna
 
Survey paper _ lakshmi yasaswi kamireddy(651771619)
Survey paper _ lakshmi yasaswi kamireddy(651771619)Survey paper _ lakshmi yasaswi kamireddy(651771619)
Survey paper _ lakshmi yasaswi kamireddy(651771619)Lakshmi Yasaswi Kamireddy
 
Scalability Considerations
Scalability ConsiderationsScalability Considerations
Scalability ConsiderationsNavid Malek
 
Distributed Shared Memory Systems
Distributed Shared Memory SystemsDistributed Shared Memory Systems
Distributed Shared Memory SystemsArush Nagpal
 
Ct213 memory subsystem
Ct213 memory subsystemCt213 memory subsystem
Ct213 memory subsystemSandeep Kamath
 

Similar to 3parallel memoryallocation (20)

12-6810-12.ppt
12-6810-12.ppt12-6810-12.ppt
12-6810-12.ppt
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
 
Class notesfeb27
Class notesfeb27Class notesfeb27
Class notesfeb27
 
Chapter 5 b
Chapter 5  bChapter 5  b
Chapter 5 b
 
22CS201 COA
22CS201 COA22CS201 COA
22CS201 COA
 
Terminologies Used In Big data Environments,G.Sumithra,II-M.sc(computer scien...
Terminologies Used In Big data Environments,G.Sumithra,II-M.sc(computer scien...Terminologies Used In Big data Environments,G.Sumithra,II-M.sc(computer scien...
Terminologies Used In Big data Environments,G.Sumithra,II-M.sc(computer scien...
 
Dos unit3
Dos unit3Dos unit3
Dos unit3
 
Memory allocation for real time operating system
Memory allocation for real time operating systemMemory allocation for real time operating system
Memory allocation for real time operating system
 
Chap 4
Chap 4Chap 4
Chap 4
 
storage & file strucure in dbms
storage & file strucure in dbmsstorage & file strucure in dbms
storage & file strucure in dbms
 
Computer Memory Hierarchy Computer Architecture
Computer Memory Hierarchy Computer ArchitectureComputer Memory Hierarchy Computer Architecture
Computer Memory Hierarchy Computer Architecture
 
Memory management
Memory managementMemory management
Memory management
 
Operating system 34 contiguous allocation
Operating system 34 contiguous allocationOperating system 34 contiguous allocation
Operating system 34 contiguous allocation
 
Survey paper _ lakshmi yasaswi kamireddy(651771619)
Survey paper _ lakshmi yasaswi kamireddy(651771619)Survey paper _ lakshmi yasaswi kamireddy(651771619)
Survey paper _ lakshmi yasaswi kamireddy(651771619)
 
Week5
Week5Week5
Week5
 
Scalability Considerations
Scalability ConsiderationsScalability Considerations
Scalability Considerations
 
Distributed Shared Memory Systems
Distributed Shared Memory SystemsDistributed Shared Memory Systems
Distributed Shared Memory Systems
 
Ct213 memory subsystem
Ct213 memory subsystemCt213 memory subsystem
Ct213 memory subsystem
 
Parallel databases
Parallel databasesParallel databases
Parallel databases
 
Distributed shared memory ch 5
Distributed shared memory ch 5Distributed shared memory ch 5
Distributed shared memory ch 5
 

Recently uploaded

Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetEnjoy Anytime
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 

Recently uploaded (20)

Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 

3parallel memoryallocation

  • 1. Parallel Memory AllocationParallel Memory Allocation Steven Saunders
  • 2. Steven Saunders 2 Parallel Memory Allocation Introduction  Fallacy: All dynamic memory allocators are either scalable, effective or efficient...  Truth: no allocator exists that handles all situations, allocation patterns and memory hierarchies best (serial or parallel)  Research results are difficult to compare – no standard benchmarks, simulators vs. real tests  Distributed memory, garbage collection, locality not considered in this talk
  • 3. Steven Saunders 3 Parallel Memory Allocation Definitions  Heap – pool of memory available for allocation or deallocation of arbitrarily-sized blocks in arbitrary order that will live an arbitrary amount of time  Dynamic Memory Allocator – used to request or return memory blocks to the heap – aware only of the size of a memory block, not its type and value – tracks which parts of the heap are in use and which parts are available for allocation
  • 4. Steven Saunders 4 Parallel Memory Allocation Design of an Allocator  Strategy - consider regularities in program behavior and memory requests to determine a set of acceptable policies  Policy - decide where to allocate a memory block within the heap  Mechanism - implement the policy using a set of data structures and algorithms Emphasis has been on policies and mechanisms!
  • 5. Steven Saunders 5 Parallel Memory Allocation Strategy  Ideal Serial Strategy – “put memory blocks where they won’t cause fragmentation later” – Serial program behavior:  ramps, peaks, plateaus  Parallel Strategies – “minimize unrelated objects on the same page” – “bound memory blowup and minimize false sharing” – Parallel program behavior:  SPMD, producer-consumer
  • 6. Steven Saunders 6 Parallel Memory Allocation Policy  Common Serial Policies: – best fit, first fit, worst fit, etc.  Common Techniques: – splitting - break large blocks into smaller pieces to satisfy smaller requests – coalescing - free blocks to satisfy bigger requests  immediate - upon deallocation  deferred - wait until requests cannot be satisfied
  • 7. Mechanism
      • Each block contains header information
          – size, in-use flag, pointer to next free block, etc.
      • Free List - list of memory blocks available for allocation
          – Singly/Doubly Linked List - each free block points to the next free block
          – Boundary Tag - size info at both ends of block
              • positive indicates free, negative indicates in use
          – Quick Lists - multiple linked lists, where each list contains blocks of equal size
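The boundary-tag mechanism can be sketched in C as follows. This is a minimal illustration, not the layout of any particular allocator: the word-array heap, the `word_t` type and the function names are all assumptions made for the example. It shows why a size tag at both ends of a block helps: the footer of the preceding block sits directly before the current header, so a free neighbor can be found and merged without walking a list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: a block is [tag][payload words...][tag]. Both tags
 * hold the payload length in words, positive when the block is free and
 * negative when it is in use (the sign convention from the slide). */
typedef long word_t;

/* Write matching tags at both ends of the block whose header is heap[h]. */
static void set_tags(word_t *heap, size_t h, long payload, int in_use) {
    assert(payload > 0);
    word_t tag = in_use ? -payload : payload;
    heap[h] = tag;
    heap[h + 1 + (size_t)payload] = tag;
}

/* Payload length of the block at header index h, regardless of its state. */
static long payload_words(const word_t *heap, size_t h) {
    return heap[h] < 0 ? -heap[h] : heap[h];
}

/* Mark the block at header index h free and merge it with a free
 * predecessor, if there is one. Returns the header index of the result. */
static size_t free_and_coalesce_prev(word_t *heap, size_t h) {
    long n = payload_words(heap, h);
    if (h > 0 && heap[h - 1] > 0) {           /* previous footer says "free" */
        long prev = heap[h - 1];
        size_t prev_header = h - 2 - (size_t)prev;
        n += prev + 2;                        /* absorb payload plus two tags */
        h = prev_header;
    }
    set_tags(heap, h, n, 0);
    return h;
}
```

Merging with the following block works the same way using the header found just past the current footer; it is omitted here to keep the sketch short.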
  • 8. Performance
      • Speed - comparable to serial allocators
      • Scalability - scale linearly with processors
      • False Sharing - avoid actively causing it
      • Fragmentation - keep it to a minimum
  • 9. Performance (False Sharing)
      • Multiple processors inadvertently share data that happens to reside in the same cache line
          – padding solves the problem, but greatly increases fragmentation
      [Figure: processors 0 and 1, memory and cache]
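The padding trade-off can be made concrete with a small C11 sketch. The 64-byte line size and all names here are assumptions for illustration; real cache-line sizes vary by processor. Aligning each per-processor object to a full line keeps two processors off the same line, at the cost of the unused bytes in every padded object.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64   /* assumed line size; check the target hardware */

/* Unpadded: neighboring counters may land in the same cache line. */
struct counter_packed { long value; };

/* Padded: the alignment forces each counter onto its own cache line. */
struct counter_padded { alignas(CACHE_LINE) long value; };

/* Bytes wasted per padded counter (the fragmentation cost of padding). */
static size_t padding_waste(void) {
    return sizeof(struct counter_padded) - sizeof(long);
}

/* Nonzero if the two addresses fall within the same cache line. */
static int same_line(const void *a, const void *b) {
    assert(a != NULL && b != NULL);
    return (uintptr_t)a / CACHE_LINE == (uintptr_t)b / CACHE_LINE;
}
```

With an 8-byte `long`, each padded counter wastes 56 of its 64 bytes, which is the fragmentation the slide warns about.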
  • 10. Performance (Fragmentation)
      • Inability to use available memory
      • External - available blocks cannot satisfy requests (e.g., too small)
      • Internal - block used to satisfy a request is larger than the requested size
  • 11. Performance (Blowup)
      • Out of control fragmentation
          – unique to parallel allocators
          – allocator correctly reclaims freed memory but fails to reuse it for future requests
              • available memory not seen by allocating processor
          – p threads that serialize one after another require O(s) memory
          – p threads that execute in parallel require O(ps) memory
          – p threads that execute interleaved on 1 processor require O(ps) memory...
      [Code on slide: ... x = malloc(s); free(x); ...]
      • Example: blowup from exchanging memory (Producer-Consumer)
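The producer-consumer case can be illustrated with a toy model in C. Nothing here allocates real memory: `heap_model`, `os_bytes` and the function names are hypothetical, and the model only counts bytes. A producer allocates a block of size s and a consumer frees it. If the freed block lands in the consumer's private heap, the producer never sees it again and must take fresh memory on every round, even though at most s bytes are live at any time.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of one heap: it only counts reusable bytes. */
struct heap_model {
    size_t free_bytes;
};

static size_t os_bytes;   /* total memory requested from the system */

static void model_malloc(struct heap_model *h, size_t s) {
    if (h->free_bytes >= s)
        h->free_bytes -= s;   /* reuse memory this heap already holds */
    else
        os_bytes += s;        /* nothing reusable: the footprint grows */
}

static void model_free(struct heap_model *h, size_t s) {
    h->free_bytes += s;       /* freed memory stays with this heap */
}

/* Producer allocates and consumer frees, `rounds` times. When
 * frees_return_to_producer is 0, each thread has a strictly private heap.
 * Returns the total footprint in bytes. */
static size_t run_producer_consumer(size_t s, int rounds,
                                    int frees_return_to_producer) {
    struct heap_model producer = {0}, consumer = {0};
    struct heap_model *target =
        frees_return_to_producer ? &producer : &consumer;
    assert(rounds >= 0);
    os_bytes = 0;
    for (int i = 0; i < rounds; i++) {
        model_malloc(&producer, s);
        model_free(target, s);
    }
    return os_bytes;
}
```

In this model the footprint with strictly private heaps grows linearly with the number of rounds, while returning freed memory to the heap that allocated it keeps the footprint at s.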
  • 12. Parallel Allocator Taxonomy
      • Serial Single Heap
          – global lock protects the heap
      • Concurrent Single Heap
          – multiple free lists or a concurrent free list
      • Multiple Heaps
          – processors allocate from any of several heaps
      • Private Heaps
          – processors allocate exclusively from a local heap
      • Global & Local Heaps
          – processors allocate from a local and a global heap
  • 13. Serial Single Heap
      • Make an existing serial allocator thread-safe
          – utilize a single global lock for every request
      • Performance
          – high speed, assuming a fast lock
          – scalability limited
          – false sharing not considered
          – fragmentation bounded by serial policy
      • Typically production allocators
          – IRIX, Solaris, Windows
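A serial single heap amounts to wrapping every request in one lock, which can be sketched with POSIX threads. The wrapper names are hypothetical, and the C library's `malloc` and `free` are used only as stand-ins for an arbitrary serial allocator (on most systems they are already thread-safe). The point is that every thread serializes on `heap_lock`, which is what limits scalability.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* One lock for the whole heap: every request serializes here. */
static pthread_mutex_t heap_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical wrapper; malloc() stands in for a serial allocator. */
static void *locked_malloc(size_t size) {
    int rc = pthread_mutex_lock(&heap_lock);
    assert(rc == 0);
    void *p = malloc(size);
    rc = pthread_mutex_unlock(&heap_lock);
    assert(rc == 0);
    (void)rc;   /* silence unused warnings when asserts are disabled */
    return p;
}

/* Hypothetical wrapper; free() stands in for the serial deallocator. */
static void locked_free(void *p) {
    int rc = pthread_mutex_lock(&heap_lock);
    assert(rc == 0);
    free(p);
    rc = pthread_mutex_unlock(&heap_lock);
    assert(rc == 0);
    (void)rc;
}
```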
  • 14. Concurrent Single Heap
      • Apply concurrency to existing serial allocators
          – use quick list mechanism, with a lock per list
      • Performance
          – moderate speed, could require many locks
          – scalability limited by number of requested sizes
          – false sharing not considered
          – fragmentation bounded by serial policy
      • Typically research allocators
          – Buddy System, MFLF (multiple free list fit), NUMAmalloc
  • 15. Concurrent Single (Buddy System)
      • Policy/Mechanism
          – one free list per memory block size: 1, 2, 4, …, 2^i
          – blocks recursively split on list i into 2 buddies for list i-1 in order to satisfy smaller requests
          – only buddies are coalesced to satisfy larger requests
          – each free list can be individually locked
          – trade speed for reduced fragmentation
              • if free list empty, a thread’s malloc enters a wait queue
              • malloc could be satisfied by another thread freeing memory or by breaking a higher list’s block into buddies (whichever finishes first)
              • reducing the number of splits reduces fragmentation by leaving more large blocks for future requests
  • 16. Concurrent Single (Buddy System)
      • Performance
          – moderate speed, complicated locking/queueing, although buddy split/coalesce code is fast
          – scalability limited by number of requested sizes
          – false sharing very likely
          – high internal fragmentation
      [Figure: free lists for block sizes 1, 2, 4, 8, 16, with an example in which Thread 1 and Thread 2 issue x = malloc(8); x = malloc(5); y = malloc(8); free(x);]
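The arithmetic behind a buddy system can be sketched in a few lines of C. The function names are hypothetical and the sketch covers only the size and address calculations, not the locking or wait queues described above. It shows why split and coalesce are fast (a block's buddy is found with one XOR) and where the internal fragmentation comes from (every request is rounded up to a power of two).

```c
#include <assert.h>
#include <stddef.h>

/* Smallest power of two >= n: the block size actually handed out. */
static size_t buddy_block_size(size_t n) {
    size_t size = 1;
    while (size < n)
        size <<= 1;
    return size;
}

/* Index i of the free list that holds blocks of size 2^i. */
static unsigned buddy_list_index(size_t n) {
    unsigned i = 0;
    while (((size_t)1 << i) < n)
        i++;
    return i;
}

/* Offset of the buddy of the block at `offset`, where `size` is a power of
 * two and offsets are measured from the start of the heap. The two halves
 * of a split block differ only in the bit equal to their size. */
static size_t buddy_of(size_t offset, size_t size) {
    assert(size != 0 && (size & (size - 1)) == 0);
    assert(offset % size == 0);
    return offset ^ size;
}

/* Bytes lost to internal fragmentation for a request of n bytes. */
static size_t internal_waste(size_t n) {
    return buddy_block_size(n) - n;
}
```

For the `malloc(5)` request in the slide's example, this gives an 8-byte block from list 3 and 3 wasted bytes.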
  • 17. Concurrent Single (MFLF)
      • Policy/Mechanism
          – set of quick lists to satisfy small requests exactly
              • malloc takes first block in appropriate list
              • free returns block to head of appropriate list
          – set of misc lists to satisfy large requests quickly
              • each list labeled with range of block sizes, low…high
              • malloc takes first block in list where request < low
                  – trades linear search for internal fragmentation
              • free returns blocks to list where low < request < high
          – each list can be individually locked and searched
  • 18. Concurrent Single (MFLF)
      • Performance
          – high speed
          – scalability limited by number of requested sizes
          – false sharing very likely
          – high internal fragmentation
      • Approach similar to current state-of-the-art serial allocators
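The misc-list lookup can be sketched as follows. The list bounds are invented for the example, and the half-open ranges (list i holds sizes from `misc_low[i]` up to, but not including, `misc_low[i + 1]`) are an assumption about how the low…high labels tile the size space. Choosing the first list whose lower bound exceeds the request means any block on that list fits, so no search within the list is needed, but the block may be considerably larger than requested.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed lower bounds of the misc lists, in bytes. Requests below the
 * first bound would be served by the quick lists, which are not shown. */
static const size_t misc_low[] = { 64, 128, 256, 512, 1024 };
#define NUM_MISC (sizeof misc_low / sizeof misc_low[0])

/* malloc side: first list whose every block is larger than the request,
 * i.e. the first list with request < low. Returns -1 if there is none. */
static int misc_list_for_malloc(size_t request) {
    assert(request > 0);
    for (size_t i = 0; i < NUM_MISC; i++)
        if (request < misc_low[i])
            return (int)i;
    return -1;
}

/* free side: the list whose range contains the block's size.
 * Returns -1 for blocks smaller than every misc list. */
static int misc_list_for_free(size_t block_size) {
    for (size_t i = NUM_MISC; i-- > 0; )
        if (block_size >= misc_low[i])
            return (int)i;
    return -1;
}
```

Under these assumed bounds, a 100-byte request is served from the list of 128 to 255 byte blocks, so up to 155 bytes of a single block can go unused.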
  • 19. Concurrent Single (NUMAmalloc)
      • Strategy - minimize co-location of unrelated objects on the same page
          – avoid page level false sharing (DSM/software DSM)
      • Policy - place same-sized requests in the same page (heuristic hypothesis)
          – basically MFLF on the page level
      • Performance
          – high speed
          – scalability limited by number of requested sizes
          – false sharing: helps page level but not cache level
          – high internal fragmentation
  • 20. Multiple Heaps
      • List of multiple heaps
          – individually growable, shrinkable and lockable
          – threads scan list looking for first available (trylock)
          – threads may cache result to reduce next lookup
      • Performance
          – moderate speed, limited by number of list scans and locking
          – scalability limited by number of heaps and traffic
          – false sharing unintentionally reduced
          – blowup increased (up to O(p))
      • Typically production allocators
          – ptmalloc (Linux), HP-UX
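The scan-with-trylock policy can be sketched with C11 atomics. This is an illustration only, not ptmalloc's code: the fixed heap count, the `atomic_flag` used as a try-lock and all names are assumptions. A thread starts at a cached hint (the heap it used last) and takes the first heap it can lock without blocking.

```c
#include <assert.h>
#include <stdatomic.h>

#define NUM_HEAPS 4   /* assumed fixed number of heaps for the sketch */

/* Hypothetical heap record; real allocator state would live alongside. */
struct heap {
    atomic_flag busy;
};

static struct heap heaps[NUM_HEAPS] = {
    { ATOMIC_FLAG_INIT }, { ATOMIC_FLAG_INIT },
    { ATOMIC_FLAG_INIT }, { ATOMIC_FLAG_INIT },
};

/* Non-blocking lock attempt: nonzero on success. */
static int heap_trylock(struct heap *h) {
    return !atomic_flag_test_and_set(&h->busy);
}

/* Scan the list starting at `hint` and lock the first available heap.
 * Returns its index, or -1 if every heap is busy (a real allocator would
 * need a fallback here; this sketch only reports the failure). */
static int acquire_heap(int hint) {
    assert(hint >= 0 && hint < NUM_HEAPS);
    for (int k = 0; k < NUM_HEAPS; k++) {
        int i = (hint + k) % NUM_HEAPS;
        if (heap_trylock(&heaps[i]))
            return i;
    }
    return -1;
}

static void release_heap(int i) {
    assert(i >= 0 && i < NUM_HEAPS);
    atomic_flag_clear(&heaps[i].busy);
}
```

A caller would store the returned index and pass it as the hint on its next request, which corresponds to the cached lookup mentioned on the slide.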
  • 21. Private Heaps
      • Processors exclusively utilize a local private heap for all allocation and deallocation
          – eliminates need for locks
      • Performance
          – extremely high speed
          – scalability unbounded
          – reduced false sharing
              • pass memory to another thread
          – blowup unbounded
      • Both research and production allocators
          – CILK, STL
  • 22. Global & Local Heaps
      • Processors generally utilize a local heap
          – reduces most lock contention
          – private memory is acquired from and returned to the global heap (which is always locked) as necessary
      • Performance
          – high speed, less lock contention
          – scalability limited by number of locks
          – low false sharing
          – blowup bounded (O(1))
      • Typically research allocators
          – VH, Hoard
  • 23. Global & Local Heaps (VH)
      • Strategy - exchange memory overhead for improved scalability
      • Policy
          – memory broken into stacks of size m/2
          – global heap maintains a LIFO of stack pointers that local heaps can use
              • global push releases a local heap’s stack
              • global pop acquires a local heap’s stack
          – local heaps maintain an Active and Backup stack
              • local operations private, i.e. don’t require locks
      • Mechanism - serial free list within stacks
  • 24. Global & Local Heaps (VH)
      • Memory Usage = M + m*p
          – M = amount of memory in use by program
          – p = number of processors
          – m = private memory (2 stacks of size m/2)
              • higher m reduces number of global heap lock operations
              • lower m reduces memory usage overhead
      [Figure: global heap and private heaps 0, 1, …, p, each with an Active (A) and Backup (B) stack]
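As a worked example of the VH bound, take hypothetical values p = 16 processors and m = 64 KiB of private memory per processor (two 32 KiB stacks each). Neither value comes from the talk; they only show how the overhead term scales.

```latex
\[
\text{Memory usage} = M + m \cdot p,
\qquad
m \cdot p = 64\,\text{KiB} \times 16 = 1024\,\text{KiB} = 1\,\text{MiB}
\]
```

Under these assumptions the allocator holds at most 1 MiB beyond what the program itself uses; halving m halves that overhead but makes trips to the locked global heap more frequent.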
  • 25. Global & Local Heaps (Hoard)
      • Strategy - bound blowup and minimize false sharing
      • Policy
          – memory broken into superblocks of size S
              • all blocks within a superblock are of equal size
          – global heap maintains a set of available superblocks
          – local heaps maintain local superblocks
              • malloc satisfied by local superblock
              • free returns memory to original allocating superblock (lock!)
              • superblocks acquired as necessary from global heap
              • if local usage drops below a threshold, superblocks are returned to the global heap
      • Mechanism
          – private superblock quick lists
  • 26. Global & Local Heaps (Hoard)
      • Memory Usage = O(M + p)
          – M = amount of memory in use by program
          – p = number of processors
      • False Sharing
          – since malloc is satisfied by a local superblock, and free returns memory to the original superblock, false sharing is greatly reduced
          – worst case: a non-empty superblock is released to global heap and another thread acquires it
              • set superblock size and emptiness threshold to minimize
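The "usage drops below a threshold" rule can be sketched as a check run after each free. The slides do not give the exact test, so the form used here is an assumption modeled on the emptiness invariant described in the Hoard paper: a local heap gives a superblock back only when it holds more than K superblocks of slack and is less than a fraction (1 - f) full. The constants S, K and f and all names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

#define SUPERBLOCK_SIZE   8192  /* S: assumed superblock size in bytes */
#define SLACK_SUPERBLOCKS 2     /* K: assumed slack a heap may always keep */
#define EMPTY_NUM         1     /* f = EMPTY_NUM / EMPTY_DEN = 1/4, assumed */
#define EMPTY_DEN         4

/* Hypothetical per-processor bookkeeping. */
struct local_heap {
    size_t in_use;   /* bytes currently allocated to the program */
    size_t held;     /* bytes of superblocks owned by this heap */
};

/* Nonzero if the heap should return a superblock to the global heap:
 * in_use < held - K*S  and  in_use < (1 - f) * held. */
static int should_release_superblock(const struct local_heap *h) {
    assert(h->in_use <= h->held);
    int beyond_slack =
        h->in_use + (size_t)SLACK_SUPERBLOCKS * SUPERBLOCK_SIZE < h->held;
    int mostly_empty =
        h->in_use * EMPTY_DEN < h->held * (EMPTY_DEN - EMPTY_NUM);
    return beyond_slack && mostly_empty;
}
```

Because each heap can hold only a bounded amount of unused memory before it must hand superblocks back, the total overhead grows with the number of heaps rather than with the program's allocation history, which is consistent with the O(M + p) bound on the slide.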
  • 27. Global & Local Heaps (Compare)
      • VH proves a tighter memory bound than Hoard and only requires global locks
      • Hoard has a more flexible local mechanism and considers false sharing
      • They’ve never been compared!
          – Hoard is production quality and would likely win
  • 28. Summary
      • Memory allocation is still an open problem
          – strategies addressing program behavior still uncommon
      • Performance Tradeoffs
          – speed, scalability, false sharing, fragmentation
      • Current Taxonomy
          – serial single heap, concurrent single heap, multiple heaps, private heaps, global & local heaps
  • 29. References
      Serial Allocation
          1) Paul Wilson, Mark Johnstone, Michael Neely, David Boles. Dynamic Storage Allocation: A Survey and Critical Review. 1995 International Workshop on Memory Management.
      Shared Memory Multiprocessor Allocation
        Concurrent Single Heap
          2) Arun Iyengar. Scalability of Dynamic Storage Allocation Algorithms. Sixth Symposium on the Frontiers of Massively Parallel Computing. October 1996.
          3) Theodore Johnson, Tim Davis. Space Efficient Parallel Buddy Memory Management. The Fourth International Conference on Computing and Information (ICCI'92). May 1992.
          4) Jong Woo Lee, Yookun Cho. An Effective Shared Memory Allocator for Reducing False Sharing in NUMA Multiprocessors. IEEE Second International Conference on Algorithms & Architectures for Parallel Processing (ICAPP'96).
        Multiple Heaps
          5) Wolfram Gloger. Dynamic Memory Allocator Implementations in Linux System Libraries.
        Global & Local Heaps
          6) Emery Berger, Kathryn McKinley, Robert Blumofe, Paul Wilson. Hoard: A Scalable Memory Allocator for Multithreaded Applications. The Ninth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IX). November 2000.
          7) Voon-Yee Vee, Wen-Jing Hsu. A Scalable and Efficient Storage Allocator on Shared-Memory Multiprocessors. The International Symposium on Parallel Architectures, Algorithms, and Networks (I-SPAN'99). June 1999.