
Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software


Multiple core CPUs are here. Conventional wisdom holds that, to take best advantage of these processors, we now need to rewrite sequential applications to make them multithreaded. Because of the difficulty of programming correct and efficient multithreaded applications (e.g., race conditions, deadlocks, and scalability bottlenecks), this is a major challenge.

This talk presents two alternative approaches that bring the power of multiple cores to today's software.

The first approach focuses on building highly-concurrent client-server applications from legacy code. I present a system called Flux that allows users to take unmodified off-the-shelf *sequential* C and C++ code and build concurrent applications. The Flux compiler combines the Flux program and the sequential code to generate a deadlock-free, high-concurrency server. Flux also generates discrete event simulators that accurately predict actual server performance under load. While the Flux language was initially targeted at servers, we have found it to be a useful abstraction for sensor networks, and I will briefly talk about our use of an energy-aware variant of Flux in a deployment on the backs of endangered turtles.

The second approach uses the extra processing power of multicore CPUs to make legacy C/C++ applications more reliable. I present a system called DieHard that uses randomization and replication to transparently harden programs against a wide range of errors, including buffer overflows and dangling pointers. Instead of crashing or running amok, DieHard lets programs continue to run correctly in the face of memory errors with high probability.

This is joint work with Brendan Burns, Kevin Grimaldi, Alex Kostadinov, Jacob Sorber, and Mark Corner (University of Massachusetts Amherst), and Ben Zorn (Microsoft Research).



  1. Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software
     Emery Berger, University of Massachusetts Amherst
  2. Research Overview
     • High-performance memory managers
       • Hoard allocator for concurrent apps [ASPLOS-IX]
       • Heap Layers infrastructure [PLDI 01]
       • Reaps (regions + heaps) [OOPSLA 02]
     • Cooperative memory management (OS + GC)
       • Bookmarking: GC without paging [PLDI 04]
       • CRAMM VM + any GC, max thruput [ISMM 04, OSDI 06]
     • And:
       • Memory management studies
         • Custom allocation [OOPSLA 02], GC vs. malloc [OOPSLA 05]
       • Support for contributory applications
         • Transparent contribution: memory, disk [USENIX 06, FAST 07]
       • Plus other compiler & runtime stuff
     Transparently improving performance, robustness & reliability (PL + OS)
  3. Concurrent Memory Allocators
     • Previous allocators unsuitable for multithreaded apps
       • Serialized heap
         • Protected by lock
       • Allocator-induced false sharing
       • Poor space bounds: blowup
         • O(P), O(T), or unbounded increase in memory
     [Figure: “pure private heaps” (STL, Cilk, others). Operations shown across processor 0 and processor 1: x1 = malloc(1), free(x1), x2 = malloc(1), free(x2), x3 = malloc(1), free(x3). Key: in use, processor 0; free, on heap 1.]
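The unbounded blowup with pure private heaps can be illustrated with a toy model. This is a minimal sketch, not code from the talk: it assumes a producer on processor 0 that only allocates and a consumer on processor 1 that only frees, and that freed blocks land on the freeing processor's heap. The function name and the block counting are hypothetical.

```python
# Toy model of "pure private heaps": every processor owns a free list, and a
# freed block goes to the heap of the processor that calls free().
# Assumption: processor 0 only allocates, processor 1 only frees.

def simulate_producer_consumer(rounds):
    free_blocks = {0: 0, 1: 0}  # free blocks sitting on each private heap
    from_os = 0                 # blocks requested from the OS so far

    for _ in range(rounds):
        # Processor 0: x = malloc(1). Reuse a local free block if there is one.
        if free_blocks[0] > 0:
            free_blocks[0] -= 1
        else:
            from_os += 1
        # Processor 1: free(x). The block lands on heap 1, where processor 0
        # never looks, so it is stranded.
        free_blocks[1] += 1

    return from_os, free_blocks[1]


if __name__ == "__main__":
    total, stranded = simulate_producer_consumer(1000)
    print(f"blocks from OS: {total}, stranded on heap 1: {stranded}")
```

At most one block is live at any time in this model, yet the memory taken from the OS grows linearly with the number of rounds.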
  4. Hoard Memory Allocator
     • Hoard
       • Scalable heap
         • Provably low synch overhead
       • Optimal space consumption: blowup = O(1)
       • Avoids false sharing
     • www.hoard.org
       • 40,000+ downloads
       • AOL, BT, Philips, Credit Suisse, Novell, etc.
  5. The Cores Have Arrived
     • Hurray! Now what?
     • Multithreading problems:
       • Data races
       • Deadlock & livelock
       • Scalability bottlenecks
     • Automatic Parallelization?
  6. Exploit Multicores Now!
     • Taking advantage of multicores without rewriting a line of code:
       • Build scalable applications from parts
         • Flux: “glue” language for easily building highly-concurrent servers [USENIX 06]
       • Increase reliability
         • DieHard: lets C/C++ programs run correctly in face of memory errors with high probability [PLDI 06]
  7. Flux: A Language for Programming High-Performance Servers
     Joint work with Brendan Burns, Kevin Grimaldi, Alex Kostadinov, Mark Corner (University of Massachusetts Amherst)
  8. Motivating Example: Image Server
     • Client
       • Requests image @ desired quality, size
     • Server
       • Images: RAW
       • Compresses to JPG
       • Caches requests
       • Sends to client
     [Figure: client requests http://server/Easter-bunny/200x100/75 from the image server; response shown: not found]
  9. Problem: Concurrency
     • Could write sequential code but…
       • More clients (latency)
       • Bigger server
         • Multicores, multiprocessors
     • One approach: threads
       • Risk deadlock, etc.
       • Mixes program logic & concurrency control: ties to runtime (threads?!)
     [Figure: many clients connecting to the image server]
  10. The Flux Programming Language
      • Unmodified C, C++ (or Java): black boxes
      • Compose with Flux program
        • Assume #clients » #cores
      • High-quality server + performance tools:
        • Statically enforces atomicity w/o deadlock
        • Path profiling
        • Discrete event simulator
      High-performance & deadlock-free concurrent programming w/ sequential components
  11. Flux Server “Main”
      • Source nodes originate flows
        • Conceptually in separate thread
        • Executes inside implicit infinite loop
          • Initiates flow (“thread”) for each image request
      source Listen → Image;
      [Figure: Listen feeding concurrent image server flows (ReadRequest, Compress, Write, Complete)]
  12. Flux Image Server
      • Basic image server requires:
        • HTTP parsing (http)
        • Socket handling (socket)
        • JPEG compression (libjpeg)
        • All UNIX-style C libraries
      • Abstract node = flow across nodes
        • Concrete or abstract
      Image = ReadRequest → Compress → Write → Complete;
      [Figure: image server nodes ReadRequest, Compress, Write, Complete, labeled with the libraries http, socket, libjpeg]
  13. Control Flow
      • Direct flow via user-supplied predicate types
        • Type test applied to output
          • Note: no variables; dispatch on output “type”
        • Here: cache frequently requested images
      Image = ReadRequest → Handler → Write → Complete;
      typedef hit TestInCache;
      Handler:[_,_,hit] = ;
      Handler:[_,_,_] = ReadInFromDisk → Compress → StoreInCache;
      [Figure: Listen, ReadRequest, CheckCache, then on hit straight to Write, otherwise ReadInFromDisk, Compress, StoreInCache, then Write, Complete]
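The dispatch rule on this slide can be mimicked in ordinary code. The following is a minimal sketch under stated assumptions, not how the Flux compiler generates code: `in_cache` stands in for the `TestInCache` predicate, and the three node functions are hypothetical placeholders for the real C components.

```python
# Sketch of predicate-type dispatch: the handler inspects its input with a
# predicate and picks a flow. All node bodies below are placeholders.

cache = {}


def read_in_from_disk(name):
    return "raw:" + name


def compress(raw):
    return raw.replace("raw:", "jpg:")


def store_in_cache(name, data):
    cache[name] = data
    return data


def in_cache(name):
    """Stand-in for the TestInCache predicate behind the 'hit' type."""
    return name in cache


def handler(name):
    # Handler:[_,_,hit] = ;   (cache hit: the flow is empty)
    if in_cache(name):
        return cache[name], "hit"
    # Handler:[_,_,_] = ReadInFromDisk -> Compress -> StoreInCache;
    data = store_in_cache(name, compress(read_in_from_disk(name)))
    return data, "miss"


if __name__ == "__main__":
    print(handler("bunny"))  # first request goes to disk
    print(handler("bunny"))  # second request is served from the cache
```

The point of the design is that the node functions never mention the cache policy; only the dispatch rule does.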
  14. Supporting Concurrency
      • Many clients = concurrent flows
        • Must keep cache consistent
      • Atomicity constraints
        • Same name = mutual exclusion (2PL)
        • Apply to nodes or whole flow (abstract node)
      atomic CheckCache {  };
      atomic Complete {  ,  };
      atomic StoreInCache {  };
      [Figure: Listen feeding several concurrent copies of the image server flow (ReadRequest, CheckCache, ReadInFromDisk, Compress, StoreInCache, Write, Complete)]
  15. More Atomicity
      • Reader / writer constraints
        • Multiple readers or single writer (default)
          • atomic ReadList: {listAccess?};
          • atomic AddToList: {listAccess!};
      • Per-session constraints
        • User-supplied function ≈ hash on source
          • Added to flow ≈ chooses from array of locks
      atomic AddHasChunk: {chunks(session)};
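Per-session constraints can be pictured as lock striping. This is a minimal sketch assuming a fixed-size lock array and Python's built-in `hash` as the user-supplied session function; `NUM_LOCKS`, `session_id`, and `lock_for` are hypothetical names, not part of Flux.

```python
import threading

# Assumption: a small fixed array of locks guards the "chunks" constraint.
NUM_LOCKS = 8
chunk_locks = [threading.Lock() for _ in range(NUM_LOCKS)]


def session_id(source):
    """Hypothetical user-supplied function: hash on the flow's source."""
    return hash(source)


def lock_for(source):
    # Flows from the same session always map to the same lock, so they
    # exclude each other; different sessions usually proceed in parallel.
    return chunk_locks[session_id(source) % NUM_LOCKS]


if __name__ == "__main__":
    peer = ("10.0.0.1", 4000)
    with lock_for(peer):
        print("updating chunk list for", peer)
```

Two sessions that hash to the same slot still serialize, which is safe but costs some concurrency; a larger array lowers that chance.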
  16. Preventing Deadlock
      • Naïve execution can deadlock
      • Establish canonical lock order
        • Partial order
        • Alphabetic by name
      Before: atomic A: {z,y}; atomic B: {y,z};
      After:  atomic A: {y,z}; atomic B: {y,z};
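Canonical lock ordering is easy to try out by hand. The sketch below assumes named locks held in a dictionary and acquires them alphabetically, as the slide describes; `run_atomic` and `canonical` are hypothetical helpers, and the real system does this in generated code rather than at run time.

```python
import threading
from contextlib import ExitStack

# Assumption: every constraint name maps to one lock.
locks = {"y": threading.Lock(), "z": threading.Lock()}


def canonical(names):
    """Return constraint names in the canonical (alphabetic) order."""
    return sorted(set(names))


def run_atomic(names, body):
    # Acquire all locks in canonical order, run the node, then release them
    # in reverse order when the ExitStack unwinds.
    with ExitStack() as stack:
        for name in canonical(names):
            stack.enter_context(locks[name])
        return body()


if __name__ == "__main__":
    # A declared {z,y} and B declared {y,z}; both now lock y first, then z.
    print(run_atomic(["z", "y"], lambda: "A ran"))
    print(run_atomic(["y", "z"], lambda: "B ran"))
```

Because every node takes `y` before `z`, no two nodes can each hold one lock while waiting for the other.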
  17. Preventing Deadlock, II
      • Harder with abstract nodes
      Before: A = B; C = D; atomic A:{z}; atomic B:{y}; atomic C:{y,z};
      After:  A = B; C = D; atomic A:{y,z}; atomic B:{y}; atomic C:{y,z};
      • Solution: Elevate constraints; fixed point
      [Figure: lock acquisition for nodes A, B, C, shown with A:{z} and C:{y} before elevation and with A:{y,z} after]
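Constraint elevation can be sketched as a small fixed-point computation. This is an illustration under one assumption drawn from the slide's example: an abstract node must hold every constraint of the nodes it contains. The function name and the dictionary encoding are hypothetical.

```python
def elevate(flows, constraints):
    """Propagate constraints from contained nodes up to abstract nodes.

    flows:       abstract node -> list of nodes it contains
    constraints: node -> set of constraint (lock) names
    """
    result = {node: set(names) for node, names in constraints.items()}
    changed = True
    while changed:  # iterate until no constraint set grows (fixed point)
        changed = False
        for parent, children in flows.items():
            current = result.get(parent, set())
            merged = set(current)
            for child in children:
                merged |= result.get(child, set())
            if merged != current:
                result[parent] = merged
                changed = True
    return result


if __name__ == "__main__":
    flows = {"A": ["B"], "C": ["D"]}
    constraints = {"A": {"z"}, "B": {"y"}, "C": {"y", "z"}}
    for node, names in sorted(elevate(flows, constraints).items()):
        print(node, sorted(names))
```

On the slide's example, `A` picks up `y` from `B`, so `A` and `C` both acquire `y` before `z` under the alphabetic order.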
  18. Almost Complete Flux Image Server
      • Concise, readable expression of server logic
        • No threads, etc.: simplifies programming, debugging
      source Listen → Image;
      Image = ReadRequest → CheckCache → Handler → Write → Complete;
      Handler:[_,_,hit] = ;
      Handler:[_,_,_] = ReadInFromDisk → Compress → StoreInCache;
      atomic CheckCache: {cacheLock};
      atomic StoreInCache: {cacheLock};
      atomic Complete: {cacheLock};
      handle error ReadInFromDisk → FourOhFour;
      [Figure: image server flow graph: Listen, ReadRequest, CheckCache, hit or ReadInFromDisk, Compress, StoreInCache, then Write, Complete]
  19. Flux Outline
      • Intro to Flux: building a server
        • Components, flow
        • Atomicity, deadlock avoidance
      • Performance results
        • Server performance
        • Performance prediction
      • Future work
  20. Flux Results
      • Four servers:
        • Image server [23]
          • + libjpeg
        • Multi-player game [54]
        • BitTorrent [84]
          • 2 undergrads: 1 week!
        • Web server [36]
          • + PHP
      • Evaluation
        • Benchmark: variant of SPECweb99
        • Compared to Capriccio [SOSP03], SEDA [SOSP01]
      [Figure: the image server flow mapped onto three runtimes: thread-per-connection, event-driven, thread pool]
  21. Web Server
  22. Performance Prediction
      [Figure: observed parameters]
  24. Flux Conclusion
      • Flux language & system
        • Concurrency made easier
        • Build high-performance servers from sequential parts
          • Deadlock-free
        • Predict & debug performance before deployment
  25. Future Work: eFlux
      • Wood turtle (Clemmys insculpta)
      • eFlux: language for perpetual computing
        • Sensors ≈ client-server!
        • Energy-aware language
          • Flows decorated with power states (e.g., “high”, “low”)
          • Provide different levels of service depending on available & predicted energy
  26. DieHard: Probabilistic Memory Safety for Unsafe Programming Languages
      Joint work with Ben Zorn (Microsoft Research)
  27. Problems with Unsafe Languages
      • C, C++: pervasive apps, but langs. memory unsafe
      • Numerous opportunities for security vulnerabilities, errors
        • Double free
        • Invalid free
        • Uninitialized reads
        • Dangling pointers
        • Buffer overflows (stack & heap)
  28. Current Approaches
      • Unsound, may work or abort
        • Windows, GNU libc, etc., Rx [Zhou]
      • Unsound, will definitely continue
        • Failure oblivious [Rinard]
      • Sound, definitely aborts (fail-safe)
        • CCured [Necula], CRED [Ruwase & Lam], SAFECode [Dhurjati, Kowshik & Adve], &c.
          • Slowdowns: 30% - 20X
          • Requires C source, programmer intervention
          • Garbage collection or partially sound (pools)
        • Good for debugging, less for deployment
  29. Soundness for “Erroneous” Programs
      • Normally: memory errors ⇒ ? …
      • Consider infinite-heap allocator:
        • All “new”s fresh; ignore “delete”
          • No dangling pointers, invalid frees, double frees
        • Every object infinitely large
          • No buffer overflows, data overwrites
      • Transparent to correct program
      • “Erroneous” programs sound
  30. Probabilistic Memory Safety
      • Approximate ∞ with M-heaps (e.g., M=2)
      • Naïve: pad allocations, defer deallocations
        • No protection from larger overflows
          • pad = 8 bytes, overflow = 9 bytes…
        • Deterministic: overflow crashes everyone
      • DieHard: randomize M-heap
        • Probabilistic memory safety
          • Independent across heaps
        • Efficient implementation…
  31. Implementation Choices
      • Conventional, freelist-based heaps
        • Hard to randomize, protect from errors
          • Double frees, heap corruption
      • What about bitmaps? [Wilson90]
        • Catastrophic fragmentation
          • Each small object likely to occupy one page
      [Figure: small objects scattered one per page]
  32. Randomized Heap Layout
      • Bitmap-based, segregated size classes
        • Bit represents one object of given size
          • i.e., one bit = 2^(i+3) bytes, etc.
        • Prevents fragmentation
      [Figure: metadata bitmaps 00000001, 1010, 10 over heap regions for object sizes 2^(i+3), 2^(i+4), 2^(i+5)]
  33. Randomized Allocation
      • malloc(8):
        • compute size class = ceil(log_2 sz) - 3
        • randomly probe bitmap for zero-bit (free)
      • Fast: runtime O(1)
        • M=2 ⇒ E[# of probes] ≤ 2
      [Figure: metadata bitmaps over heap regions for object sizes 2^(i+3), 2^(i+4), 2^(i+5); the first bitmap changes from 00000001 to 00010001 after the allocation]
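The two allocation steps on this slide can be sketched directly. This is an illustration only, not DieHard's code: the bitmap is a Python list, requests under 8 bytes are assumed to round up to class 0, and the fullness check exists just to keep the sketch from looping forever.

```python
import random


def size_class(sz):
    # ceil(log2(sz)) - 3, computed exactly with integer arithmetic.
    # Assumption: requests smaller than 8 bytes use class 0.
    return max((sz - 1).bit_length() - 3, 0)


def random_probe(bitmap, rng):
    """Pick a random free (zero) slot, mark it allocated, count the probes."""
    if all(bitmap):
        # Sketch-only guard; an M-heap is kept at most 1/M full.
        raise MemoryError("size class is full")
    probes = 0
    while True:
        probes += 1
        i = rng.randrange(len(bitmap))
        if bitmap[i] == 0:
            bitmap[i] = 1
            return i, probes


if __name__ == "__main__":
    rng = random.Random()
    bitmap = [1, 0] * 512  # half full, the M=2 worst case
    trials = [random_probe(list(bitmap), rng)[1] for _ in range(10_000)]
    print("size class of malloc(8):", size_class(8))
    print("average probes:", sum(trials) / len(trials))
```

With the region at most half full, each probe succeeds with probability at least 1/2, so the number of probes is geometric with mean at most 2, which is where the slide's bound comes from.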
  35. Randomized Deallocation
      • free(ptr):
        • Ensure object valid: aligned to right address
        • Ensure allocated: bit set
        • Resets bit
      • Prevents invalid frees, double frees
      [Figure: metadata bitmaps over heap regions for object sizes 2^(i+3), 2^(i+4), 2^(i+5); the first bitmap changes from 00010001 to 00000001 after the free]
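The checks that `free` performs can be sketched for a single size-class region. This is a hedged illustration: addresses are plain integers, and `SizeClassRegion` with its methods is a hypothetical structure, not DieHard's actual layout.

```python
class SizeClassRegion:
    """One size class: a base address, a fixed object size, and a bitmap."""

    def __init__(self, base, object_size, slots):
        self.base = base
        self.object_size = object_size
        self.bitmap = [0] * slots

    def mark_allocated(self, index):
        """Set the bit for a slot and return the object's address."""
        self.bitmap[index] = 1
        return self.base + index * self.object_size

    def free(self, addr):
        """Return True if the free was valid; otherwise ignore it."""
        offset = addr - self.base
        in_range = 0 <= offset < len(self.bitmap) * self.object_size
        # Invalid free: outside the region or not on an object boundary.
        if not in_range or offset % self.object_size != 0:
            return False
        index = offset // self.object_size
        # Double free: the bit is already clear.
        if self.bitmap[index] == 0:
            return False
        self.bitmap[index] = 0  # reset the bit
        return True


if __name__ == "__main__":
    region = SizeClassRegion(base=0x1000, object_size=8, slots=8)
    p = region.mark_allocated(3)
    print(region.free(p + 4))  # misaligned pointer: ignored
    print(region.free(p))      # valid free
    print(region.free(p))      # double free: ignored
```

Because the metadata lives in a bitmap rather than in headers next to the objects, a bad `free` has nothing nearby to corrupt.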
  38. Randomized Heaps & Reliability
      • Objects randomly spread across heap
      • Different run = different heap
        • Errors across heaps independent
      [Figure: two randomized heap layouts with object sizes 2^(i+3) and 2^(i+4). My Mozilla: “malignant” overflow. Your Mozilla: “benign” overflow.]
  39. DieHard software architecture
      • “Output equivalent”: kill failed replicas
      • Each replica has different allocator
      [Figure: input is broadcast to replicas running as separate processes (replica 1 with seed 1, replica 2 with seed 2, replica 3 with seed 3); the replicas execute, and a vote produces the output]
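The voting step in this architecture can be sketched as follows. It is a simplified illustration with assumptions of its own: each replica's output is a byte string, a crashed replica is represented by `None`, and output is committed only when at least two replicas agree. The function `vote` is hypothetical.

```python
from collections import Counter


def vote(outputs):
    """Pick the agreed output and list the replicas to kill.

    outputs: replica id -> output bytes, or None if the replica crashed.
    """
    live = {rid: out for rid, out in outputs.items() if out is not None}
    if not live:
        raise RuntimeError("every replica failed")
    winner, votes = Counter(live.values()).most_common(1)[0]
    # Assumption of this sketch: require at least two matching outputs.
    if votes < 2:
        raise RuntimeError("no two replicas agree")
    losers = sorted(rid for rid, out in outputs.items() if out != winner)
    return winner, losers


if __name__ == "__main__":
    print(vote({1: b"ok", 2: b"ok", 3: b"garbage"}))  # replica 3 disagrees
    print(vote({1: b"ok", 2: None, 3: b"ok"}))        # replica 2 crashed
```

Since each replica runs with a different random seed, the same memory error is unlikely to corrupt two replicas in the same way, so agreement is evidence of a correct run.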
  40. DieHard Results
      • Analytical results (pictures!)
        • Buffer overflows
        • Uninitialized reads
        • Dangling pointer errors (the best)
      • Empirical results
        • Runtime overhead
        • Error avoidance
          • Injected faults & actual applications
  41. Analytical Results: Buffer Overflows
      • Model overflow: random write of live data
        • Heap half full (max occupancy)
  44. Analytical Results: Buffer Overflows
      • Replicas: Increase odds of avoiding overflow in at least one replica
      • P(Overflow in all replicas) = (1/2)^3 = 1/8
      • P(No overflow in ≥ 1 replica) = 1 - (1/2)^3 = 7/8
  47. Analytical Results: Buffer Overflows
      • F = free space
      • H = heap size
      • N = # objects worth of overflow
      • k = replicas
      • Overflow one object
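The formula that accompanied these definitions is not present in the transcript. One expression consistent with the numbers on the preceding slide, offered here as an assumption rather than as the slide's content, treats each of the N overflowed object slots as landing in free space independently with probability F/H, and asks that at least one of the k replicas escape harm:

```latex
P(\text{overflow is benign in} \ge 1 \text{ replica})
  = 1 - \left[ 1 - \left( \frac{F}{H} \right)^{N} \right]^{k}
```

As a check against the earlier slide: with the heap half full (F/H = 1/2), one object of overflow (N = 1), and three replicas (k = 3), this gives 1 - (1/2)^3 = 7/8.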
  48. Empirical Results: Runtime
  50. Empirical Results: Error Avoidance
      • Injected faults:
        • Dangling pointers (@ 50%, 10 allocations)
          • glibc: crashes; DieHard: 9/10 correct
        • Overflows (@ 1%, 4 bytes over)
          • glibc: crashes 9/10, inf loop; DieHard: 10/10 correct
      • Real faults:
        • Avoids Squid web cache overflow
          • Crashes BDW & glibc
        • Avoids dangling pointer error in Mozilla
          • DoS in glibc & Windows
  51. DieHard Conclusion
      • Randomization + replicas = probabilistic memory safety
        • Improves over today (0%)
        • Useful point between absolute soundness (fail-stop) and unsound
        • Future work: locate & fix errors automatically
      • Trades hardware resources (RAM, CPU) for reliability
        • Hardware trends
          • Larger memories, multi-core CPUs
        • Follows in footsteps of ECC memory, RAID
  52. The End
      • http://www.cs.umass.edu/~emery/diehard
        • Linux, Solaris (stand-alone & replicated)
        • Windows (stand-alone only)
      • http://flux.cs.umass.edu
        • Hosted by Flux web server
        • Download via Flux BitTorrent
      flux: from Latin fluxus, p.p. of fluere = “to flow”
  53. Backup
  54. Handling Errors
      • What if image requested doesn’t exist?
        • Error = negative return value from component
        • Remember: nodes oblivious to Flux
      • Solution: error handlers
        • Go to alternate paths on error
        • Possible extension: can match on error paths
      handle error ReadInFromDisk → FourOhFour;
      [Figure: image server flow graph with an error path from ReadInFromDisk to FourOhFour]
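The error-handler rule can be imitated with a small flow runner. This is a sketch under assumptions: each node returns a `(status, value)` pair with a negative status meaning failure, mirroring the negative return value convention on the slide. `run_flow`, the node bodies, and the `IMAGES` table are hypothetical.

```python
IMAGES = {"bunny": "raw-bytes"}  # placeholder for images on disk


def read_in_from_disk(name):
    if name not in IMAGES:
        return -1, None  # negative status signals an error
    return 0, IMAGES[name]


def compress(raw):
    return 0, "jpg(" + raw + ")"


def four_oh_four(name):
    return "404 Not Found: " + name


def run_flow(nodes, handlers, request):
    """Run nodes in order; on error, divert to that node's handler."""
    value = request
    for node in nodes:
        status, out = node(value)
        if status < 0:
            handler = handlers.get(node)
            if handler is None:
                raise RuntimeError(node.__name__ + " failed")
            return handler(request)
        value = out
    return value


FLOW = [read_in_from_disk, compress]
# handle error ReadInFromDisk -> FourOhFour;
HANDLERS = {read_in_from_disk: four_oh_four}

if __name__ == "__main__":
    print(run_flow(FLOW, HANDLERS, "bunny"))
    print(run_flow(FLOW, HANDLERS, "easter"))
```

The nodes stay oblivious to the handler table, as on the slide: they only return a status, and the runner decides where the flow goes next.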
  55. Flux Outline
      • Intro to Flux: building a server
        • Components
        • Flows
        • Atomicity
      • Performance results
        • Server performance
        • Performance prediction
      • Future work
  56. Probabilistic Memory Safety
      • Fully-randomized memory manager
        • Increases odds of benign memory errors
        • Ensures independent heaps across users
      • Replication
        • Run multiple replicas simultaneously, vote on results
          • Detects crashing & non-crashing errors
      DieHard: correct execution in face of errors with high probability
