SlideShare a Scribd company logo
Deterministic Memory Abstraction and
Supporting Multicore System Architecture
Farzad Farshchi$, Prathap Kumar Valsan^, Renato Mancuso*, Heechul Yun$
$ University of Kansas, ^ Intel, * Boston University
Multicore Processors in CPS
• Provide high computing performance needed for intelligent CPS
• Allow consolidation reducing cost, size, weight, and power.
2
Challenge: Shared Memory Hierarchy
3
• Memory performance varies widely due to interference
• Task WCET can be extremely pessimistic
Core1 Core2 Core3 Core4
Memory Controller (MC)
Shared Cache
DRAM
Task 1 Task 2 Task 3 Task 4
I D I D I D I D
Challenge: Shared Memory Hierarchy
• Many shared resources: cache space, MSHRs, dram banks, MC buffers, …
• Each optimized for performance with no high-level insight
• Very poor worst-case behavior: >10X, >100X observed in real platforms
• even after cache partitioning is applied.
4
DRAM
LLC
Core1 Core2 Core3 Core4
subject co-runner(s)
IsolBench EEMBC, SD-VBS on Tegra TK1
- P. Valsan, et al., “Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems.” RTAS, 2016
- P. Valsan, et al., “Addressing Isolation Challenges of Non-blocking Caches for Multicore Real-Time Systems.” Real-time Systems Journal. 2017
The Virtual Memory Abstraction
• Program’s logical view of memory
• No concept of timing
• OS maps any available physical memory blocks
• Hardware treats all memory requests as same
• But some memory may be more important
• E.g., code and data memory in a time critical control loop
• Prevents OS and hardware from making informed allocation and
scheduling decisions that affect memory timing
5
New memory abstraction is needed!
Silberschatz et al.,
“Operating System Concepts.”
Outline
• Motivation
• Our Approach
• Deterministic Memory Abstraction
• DM-Aware Multicore System Design
• Implementation
• Evaluation
• Conclusion
6
Deterministic Memory (DM) Abstraction
• Cross-layer memory abstraction with bounded worst-case timing
• Focusing on tightly bounded inter-core interference
• Best-effort memory = conventional memory abstraction
7
Inter-core
interference
Inherent
timing
legend
Best-effort
memory
Deterministic
memory
Worst-case
memory
delay
DM-Aware Resource Management Strategies
• Deterministic memory: Optimize for time predictability
• Best effort memory: Optimize for average performance
8
Space allocation Request scheduling WCET bounds
Deterministic
memory
Dedicated resources Predictability
focused
Tight
Best-effort memory Shared resources Performance
focused
Pessimistic
Space allocation: e.g., cache space, DRAM banks
Request scheduling: e.g., DRAM controller, shared buses
System Design: Overview
• Declare all or part of RT task’s address space as deterministic memory
• End-to-end DM-aware resource management: from OS to hardware
• Assumption: partitioned fixed-priority real-time CPU scheduling
9
Application view (logical) System-level view (physical)
Deterministic
memory
Best-effort
memory
Deterministic Memory-Aware Memory Hierarchy
Core1 Core2 Core3 Core4
W
5
W
1
W
2
W
3
W
4
I D I D I D I D
B
1
B
2
B
3
B
4
B
5
B
6
B
7
B
8
DRAM banks
Cache ways
OS and Architecture Extensions for DM
• OS updates the DM bit in each page table entry (PTE)
• MMU/TLB and bus carries the DM bit.
• Shared memory hierarchy (cache, memory ctrl.) uses the DM bit.
10
OS
MMU/TLB
Page table entry
DM bit=1|0
LLC MCPaddr, DM Paddr, DM
Paddr, DM
Vaddr DRAM cmd/addr.
hardware
software
DM-Aware Shared Cache
• Based on cache way partitioning
• Improve space utilization via DM-aware cache replacement algorithm
• Deterministic memory: allocated on its own dedicated partition
• Best-effort memory: allocated on any non-DM lines in any partition.
• Cleanup: DM lines are recycled as best-effort lines at each OS context switch
11
Way 0 Way 1 Way 2 Way 3
Core 0
partition
Core 1
partition
best-effort line (any core, DM=0)
deterministic line (Core0, DM=1)
deterministic line (Core1, DM=1)
Way 4
shared
partition
Set 0
Set 1
Set 2
Set 3
0
DM Tag Line data
DM-Aware Memory (DRAM) Controller
• OS allocates DM pages on reserved banks, others on shared banks
• Memory-controller implements two-level scheduling (*)
12
Determinism
focused scheduling
(Round-Robin)
Throughput focused
scheduling
(FR-FCFS)
yes noDeterministic
memory request
exist?
(a) Memory controller (MC) architecture (b) Scheduling algorithm
(*) P. Valsan, et al., “MEDUSA: A Predictable and High-Performance DRAM Controller for Multicore based Embedded Systems.” CPSNA, 2015
Implementation
• Fully functional end-to-end implementation in Linux and Gem5.
• Linux kernel extensions
• Use an unused bit combination in ARMv7 PTE to indicate a DM page.
• Modified the ELF loader and system calls to declare DM memory regions
• DM-aware page allocator, replacing the buddy allocator, based on PALLOC (*)
• Dedicated DRAM banks for DM are configured through Linux’s CGROUP interface.
ARMv7 page table entry (PTE)
13
(*) H. Yun et. al, “PALLOC: DRAM Bank-Aware Memory Allocator for Performance Isolation on Multicore Platforms.” RTAS’14
Implementation
• Gem5 (a cycle-accurate full-system simulator) extensions
• MMU/TLB: Add DM bit support
• Bus: Add the DM bit in each bus transaction
• Cache: implement way-partitioning and DM-aware replacement algorithm
• DRAM controller: implement DM-aware two-level scheduling algorithm.
• Hardware implementability
• MMU/TLB, Bus: adding 1 bit is not difficult. (e.g., Use AXI bus QoS bits)
• Cache and DRAM controllers: logic change and additional storage are
minimal.
14
Outline
• Motivation
• Our Approach
• Deterministic Memory Abstraction
• DM-Aware Multicore System Design
• Implementation
• Evaluation
• Conclusion
15
Simulation Setup
• Baseline Gem5 full-system simulator
• 4 out-of-order cores, 2 GHz
• 2 MB Shared L2 (16 ways)
• LPDDR2@533MHz, 1 rank, 8 banks
• Linux 3.14 (+DM support)
• Workload
• EEMBC, SD-VBS, SPEC2006, IsolBench
16
Real-Time Benchmark Characteristics
• Critical pages: pages accessed in the main loop
• Critical (T98/T90): pages accounting 98/90% L1 misses of all L1 misses of the critical pages.
• Only 38% of pages are critical pages
• Some pages contribute more to L1 misses (hence shared L2 accesses)
• Not all memory is equally important
svm L1 miss distribution
17
Effects of DM-Aware Cache
• Questions
• Does it provide strong cache isolation for real-time tasks using DM?
• Does it improve cache space utilization for best-effort tasks?
• Setup
• RT: EEMBC, SD-VBS, co-runners: 3x Bandwidth
• Comparisons
• NoP: free-for-all sharing. No partitioning
• WP: cache way partitioning (4 ways/core)
• DM(A): all critical pages of a benchmark are marked as DM
• DM(T98): critical pages accounting 98% L1 misses are marked as DM
• DM(T90): critical pages accounting 98% L1 misses are marked as DM
18
DRAM
LLC
Core1 Core2 Core3 Core4
RT co-runner(s)
Effects of DM-Aware Cache
• Does it provide strong cache isolation for real-time tasks using DM?
• Yes. DM(A) and WP (way-partitioning) offer equivalent isolation.
• Does it improve cache space utilization for best-effort tasks?
• Yes. DM(A) uses only 50% cache partition of WP.
19
Effects of DM-Aware DRAM Controller
• Questions
• Does it provide strong isolation for real-time tasks using DM?
• Does it reduce reserved DRAM bank space?
• Setup
• RT: SD-VBS (input: CIF), co-runners: 3x Bandwidth
• Comparisons
• BA & FR-FCFS: Linux’s default buddy allocator + FR-FCFS scheduling in MC
• DM(A): DM on private DRAM banks + two-level scheduling in MC
• DM(T98): same as DM(A), except pages accounting 98% L1 misses are DM
• DM(T90): same as DM(A), except pages accounting 90% L1 misses are DM
20
Effects of DM-Aware DRAM Controller
• Does it provide strong isolation for real-time tasks using DM?
• Yes. BA&FR-FCFS suffers 5.7X slowdown.
• Does it reduce reserved DRAM bank space?
• Yes. Only 51% of pages are marked deterministic in DM(T90)
21
Conclusion
• Challenge
• Balancing performance and predictability in multicore real-time systems
• Memory timing is important to WCET
• The current memory abstraction is limiting: no concept of timing.
• Deterministic Memory Abstraction
• Memory with tightly bounded worst-case timing.
• Enable predictable and high-performance multicore systems
• DM-aware multicore system designs
• OS, MMU/TLB, bus support
• DM-aware cache and DRAM controller designs
• Implemented and evaluated in Linux kernel and gem5
• Availability
• https://github.com/CSL-KU/detmem
22
Ongoing/Future Work
• SoC implementation in FPGA
• Based on open-source RISC-V quad-core SoC
• Basic DM support in bus protocol and Linux
• Implementing DM-aware cache and DRAM controllers
• Tool support and other applications
• Finding “optimal” deterministic memory blocks
• Better timing analysis integration (initial work in the paper)
• Closing micro-architectural side-channels.
23
Thank You
Disclaimer:
Our research has been supported by the National Science
Foundation (NSF) under the grant number CNS 1718880
24

More Related Content

What's hot

Computer architecture virtual memory
Computer architecture virtual memoryComputer architecture virtual memory
Computer architecture virtual memory
Mazin Alwaaly
 
Memory management
Memory managementMemory management
Memory management
Vishal Singh
 
Operating Systems - memory management
Operating Systems - memory managementOperating Systems - memory management
Operating Systems - memory management
Mukesh Chinta
 
Operating Systems 1 (9/12) - Memory Management Concepts
Operating Systems 1 (9/12) - Memory Management ConceptsOperating Systems 1 (9/12) - Memory Management Concepts
Operating Systems 1 (9/12) - Memory Management Concepts
Peter Tröger
 
Windows memory management
Windows memory managementWindows memory management
Windows memory management
Tech_MX
 
Paging and Segmentation in Operating System
Paging and Segmentation in Operating SystemPaging and Segmentation in Operating System
Paging and Segmentation in Operating System
Raj Mohan
 
ppt
pptppt
cache
cachecache
OSCh9
OSCh9OSCh9
Memory Management
Memory ManagementMemory Management
Memory Management
Ramasubbu .P
 
Chapter 8 - Main Memory
Chapter 8 - Main MemoryChapter 8 - Main Memory
Chapter 8 - Main Memory
Wayne Jones Jnr
 
Overview of Distributed Systems
Overview of Distributed SystemsOverview of Distributed Systems
Overview of Distributed Systems
vampugani
 
Memory management OS
Memory management OSMemory management OS
Memory management OS
UmeshchandraYadav5
 
Virtual memory
Virtual memoryVirtual memory
Virtual memory
Rao Mubashar
 
Linux Memory
Linux MemoryLinux Memory
Linux Memory
Vitaly Nahshunov
 
Ch9 OS
Ch9 OSCh9 OS
Ch9 OS
C.U
 
VIRTUAL MEMORY
VIRTUAL MEMORYVIRTUAL MEMORY
VIRTUAL MEMORY
Kamran Ashraf
 
Memory managment
Memory managmentMemory managment
Memory managment
Shahbaz Khan
 
Operating Systems Part III-Memory Management
Operating Systems Part III-Memory ManagementOperating Systems Part III-Memory Management
Operating Systems Part III-Memory Management
Ajit Nayak
 
Unix Memory Management - Operating Systems
Unix Memory Management - Operating SystemsUnix Memory Management - Operating Systems
Unix Memory Management - Operating Systems
Drishti Bhalla
 

What's hot (20)

Computer architecture virtual memory
Computer architecture virtual memoryComputer architecture virtual memory
Computer architecture virtual memory
 
Memory management
Memory managementMemory management
Memory management
 
Operating Systems - memory management
Operating Systems - memory managementOperating Systems - memory management
Operating Systems - memory management
 
Operating Systems 1 (9/12) - Memory Management Concepts
Operating Systems 1 (9/12) - Memory Management ConceptsOperating Systems 1 (9/12) - Memory Management Concepts
Operating Systems 1 (9/12) - Memory Management Concepts
 
Windows memory management
Windows memory managementWindows memory management
Windows memory management
 
Paging and Segmentation in Operating System
Paging and Segmentation in Operating SystemPaging and Segmentation in Operating System
Paging and Segmentation in Operating System
 
ppt
pptppt
ppt
 
cache
cachecache
cache
 
OSCh9
OSCh9OSCh9
OSCh9
 
Memory Management
Memory ManagementMemory Management
Memory Management
 
Chapter 8 - Main Memory
Chapter 8 - Main MemoryChapter 8 - Main Memory
Chapter 8 - Main Memory
 
Overview of Distributed Systems
Overview of Distributed SystemsOverview of Distributed Systems
Overview of Distributed Systems
 
Memory management OS
Memory management OSMemory management OS
Memory management OS
 
Virtual memory
Virtual memoryVirtual memory
Virtual memory
 
Linux Memory
Linux MemoryLinux Memory
Linux Memory
 
Ch9 OS
Ch9 OSCh9 OS
Ch9 OS
 
VIRTUAL MEMORY
VIRTUAL MEMORYVIRTUAL MEMORY
VIRTUAL MEMORY
 
Memory managment
Memory managmentMemory managment
Memory managment
 
Operating Systems Part III-Memory Management
Operating Systems Part III-Memory ManagementOperating Systems Part III-Memory Management
Operating Systems Part III-Memory Management
 
Unix Memory Management - Operating Systems
Unix Memory Management - Operating SystemsUnix Memory Management - Operating Systems
Unix Memory Management - Operating Systems
 

Similar to Deterministic Memory Abstraction and Supporting Multicore System Architecture

Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems
Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time SystemsTaming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems
Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems
Heechul Yun
 
Parallelism-Aware Memory Interference Delay Analysis for COTS Multicore Systems
Parallelism-Aware Memory Interference Delay Analysis for COTS Multicore SystemsParallelism-Aware Memory Interference Delay Analysis for COTS Multicore Systems
Parallelism-Aware Memory Interference Delay Analysis for COTS Multicore Systems
Heechul Yun
 
CPU Caches - Jamie Allen
CPU Caches - Jamie AllenCPU Caches - Jamie Allen
CPU Caches - Jamie Allen
jaxconf
 
Cpu Caches
Cpu CachesCpu Caches
Cpu Caches
shinolajla
 
Lec11 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part3
Lec11 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part3Lec11 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part3
Lec11 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part3
Hsien-Hsin Sean Lee, Ph.D.
 
Memory Hierarchy PPT of Computer Organization
Memory Hierarchy PPT of Computer OrganizationMemory Hierarchy PPT of Computer Organization
Memory Hierarchy PPT of Computer Organization
2022002857mbit
 
Windows memory manager internals
Windows memory manager internalsWindows memory manager internals
Windows memory manager internals
Sisimon Soman
 
Memory access control in multiprocessor for real-time system with mixed criti...
Memory access control in multiprocessor for real-time system with mixed criti...Memory access control in multiprocessor for real-time system with mixed criti...
Memory access control in multiprocessor for real-time system with mixed criti...
Heechul Yun
 
Improving Real-Time Performance on Multicore Platforms using MemGuard
Improving Real-Time Performance on Multicore Platforms using MemGuardImproving Real-Time Performance on Multicore Platforms using MemGuard
Improving Real-Time Performance on Multicore Platforms using MemGuard
Heechul Yun
 
Memory
MemoryMemory
Spectrum Scale Memory Usage
Spectrum Scale Memory UsageSpectrum Scale Memory Usage
Spectrum Scale Memory Usage
Tomer Perry
 
MemGuard: Memory Bandwidth Reservation System for Efficient Performance Isola...
MemGuard: Memory Bandwidth Reservation System for Efficient Performance Isola...MemGuard: Memory Bandwidth Reservation System for Efficient Performance Isola...
MemGuard: Memory Bandwidth Reservation System for Efficient Performance Isola...
Heechul Yun
 
CPU Caches
CPU CachesCPU Caches
CPU Caches
shinolajla
 
isca22-feng-menda_for sparse transposition and dataflow.pptx
isca22-feng-menda_for sparse transposition and dataflow.pptxisca22-feng-menda_for sparse transposition and dataflow.pptx
isca22-feng-menda_for sparse transposition and dataflow.pptx
ssuser30e7d2
 
Virtualization for Emerging Memory Devices
Virtualization for Emerging Memory DevicesVirtualization for Emerging Memory Devices
Virtualization for Emerging Memory Devices
Takahiro Hirofuchi
 
Memory (Computer Organization)
Memory (Computer Organization)Memory (Computer Organization)
Memory (Computer Organization)
JyotiprakashMishra18
 
lecture asdkvakm;bk;dv;advvAVHD;KASV;DVKHSVDK
lecture asdkvakm;bk;dv;advvAVHD;KASV;DVKHSVDKlecture asdkvakm;bk;dv;advvAVHD;KASV;DVKHSVDK
lecture asdkvakm;bk;dv;advvAVHD;KASV;DVKHSVDK
officeaiotfab
 
network ram parallel computing
network ram parallel computingnetwork ram parallel computing
network ram parallel computing
Niranjana Ambadi
 
Memory, Big Data, NoSQL and Virtualization
Memory, Big Data, NoSQL and VirtualizationMemory, Big Data, NoSQL and Virtualization
Memory, Big Data, NoSQL and Virtualization
Bigstep
 
Kinetic basho public
Kinetic basho publicKinetic basho public
Kinetic basho public
Anton Gerasimov
 

Similar to Deterministic Memory Abstraction and Supporting Multicore System Architecture (20)

Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems
Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time SystemsTaming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems
Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems
 
Parallelism-Aware Memory Interference Delay Analysis for COTS Multicore Systems
Parallelism-Aware Memory Interference Delay Analysis for COTS Multicore SystemsParallelism-Aware Memory Interference Delay Analysis for COTS Multicore Systems
Parallelism-Aware Memory Interference Delay Analysis for COTS Multicore Systems
 
CPU Caches - Jamie Allen
CPU Caches - Jamie AllenCPU Caches - Jamie Allen
CPU Caches - Jamie Allen
 
Cpu Caches
Cpu CachesCpu Caches
Cpu Caches
 
Lec11 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part3
Lec11 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part3Lec11 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part3
Lec11 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part3
 
Memory Hierarchy PPT of Computer Organization
Memory Hierarchy PPT of Computer OrganizationMemory Hierarchy PPT of Computer Organization
Memory Hierarchy PPT of Computer Organization
 
Windows memory manager internals
Windows memory manager internalsWindows memory manager internals
Windows memory manager internals
 
Memory access control in multiprocessor for real-time system with mixed criti...
Memory access control in multiprocessor for real-time system with mixed criti...Memory access control in multiprocessor for real-time system with mixed criti...
Memory access control in multiprocessor for real-time system with mixed criti...
 
Improving Real-Time Performance on Multicore Platforms using MemGuard
Improving Real-Time Performance on Multicore Platforms using MemGuardImproving Real-Time Performance on Multicore Platforms using MemGuard
Improving Real-Time Performance on Multicore Platforms using MemGuard
 
Memory
MemoryMemory
Memory
 
Spectrum Scale Memory Usage
Spectrum Scale Memory UsageSpectrum Scale Memory Usage
Spectrum Scale Memory Usage
 
MemGuard: Memory Bandwidth Reservation System for Efficient Performance Isola...
MemGuard: Memory Bandwidth Reservation System for Efficient Performance Isola...MemGuard: Memory Bandwidth Reservation System for Efficient Performance Isola...
MemGuard: Memory Bandwidth Reservation System for Efficient Performance Isola...
 
CPU Caches
CPU CachesCPU Caches
CPU Caches
 
isca22-feng-menda_for sparse transposition and dataflow.pptx
isca22-feng-menda_for sparse transposition and dataflow.pptxisca22-feng-menda_for sparse transposition and dataflow.pptx
isca22-feng-menda_for sparse transposition and dataflow.pptx
 
Virtualization for Emerging Memory Devices
Virtualization for Emerging Memory DevicesVirtualization for Emerging Memory Devices
Virtualization for Emerging Memory Devices
 
Memory (Computer Organization)
Memory (Computer Organization)Memory (Computer Organization)
Memory (Computer Organization)
 
lecture asdkvakm;bk;dv;advvAVHD;KASV;DVKHSVDK
lecture asdkvakm;bk;dv;advvAVHD;KASV;DVKHSVDKlecture asdkvakm;bk;dv;advvAVHD;KASV;DVKHSVDK
lecture asdkvakm;bk;dv;advvAVHD;KASV;DVKHSVDK
 
network ram parallel computing
network ram parallel computingnetwork ram parallel computing
network ram parallel computing
 
Memory, Big Data, NoSQL and Virtualization
Memory, Big Data, NoSQL and VirtualizationMemory, Big Data, NoSQL and Virtualization
Memory, Big Data, NoSQL and Virtualization
 
Kinetic basho public
Kinetic basho publicKinetic basho public
Kinetic basho public
 

Recently uploaded

The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
operationspcvita
 
What is an RPA CoE? Session 1 – CoE Vision
What is an RPA CoE?  Session 1 – CoE VisionWhat is an RPA CoE?  Session 1 – CoE Vision
What is an RPA CoE? Session 1 – CoE Vision
DianaGray10
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
Jason Packer
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
MichaelKnudsen27
 
Christine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptxChristine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptx
christinelarrosa
 
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance PanelsNorthern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving
 
A Deep Dive into ScyllaDB's Architecture
A Deep Dive into ScyllaDB's ArchitectureA Deep Dive into ScyllaDB's Architecture
A Deep Dive into ScyllaDB's Architecture
ScyllaDB
 
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansBiomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Neo4j
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 
Leveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and StandardsLeveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and Standards
Neo4j
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsConnector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
DianaGray10
 
Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
c5vrf27qcz
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
saastr
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Pitangent Analytics & Technology Solutions Pvt. Ltd
 
High performance Serverless Java on AWS- GoTo Amsterdam 2024
High performance Serverless Java on AWS- GoTo Amsterdam 2024High performance Serverless Java on AWS- GoTo Amsterdam 2024
High performance Serverless Java on AWS- GoTo Amsterdam 2024
Vadym Kazulkin
 
Apps Break Data
Apps Break DataApps Break Data
Apps Break Data
Ivo Velitchkov
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
Chart Kalyan
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
Miro Wengner
 

Recently uploaded (20)

The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
 
What is an RPA CoE? Session 1 – CoE Vision
What is an RPA CoE?  Session 1 – CoE VisionWhat is an RPA CoE?  Session 1 – CoE Vision
What is an RPA CoE? Session 1 – CoE Vision
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
 
Christine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptxChristine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptx
 
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance PanelsNorthern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
 
A Deep Dive into ScyllaDB's Architecture
A Deep Dive into ScyllaDB's ArchitectureA Deep Dive into ScyllaDB's Architecture
A Deep Dive into ScyllaDB's Architecture
 
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansBiomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
Leveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and StandardsLeveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and Standards
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsConnector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
 
Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
 
High performance Serverless Java on AWS- GoTo Amsterdam 2024
High performance Serverless Java on AWS- GoTo Amsterdam 2024High performance Serverless Java on AWS- GoTo Amsterdam 2024
High performance Serverless Java on AWS- GoTo Amsterdam 2024
 
Apps Break Data
Apps Break DataApps Break Data
Apps Break Data
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
 

Deterministic Memory Abstraction and Supporting Multicore System Architecture

  • 1. Deterministic Memory Abstraction and Supporting Multicore System Architecture Farzad Farshchi$, Prathap Kumar Valsan^, Renato Mancuso*, Heechul Yun$ $ University of Kansas, ^ Intel, * Boston University
  • 2. Multicore Processors in CPS • Provide high computing performance needed for intelligent CPS • Allow consolidation reducing cost, size, weight, and power. 2
  • 3. Challenge: Shared Memory Hierarchy 3 • Memory performance varies widely due to interference • Task WCET can be extremely pessimistic Core1 Core2 Core3 Core4 Memory Controller (MC) Shared Cache DRAM Task 1 Task 2 Task 3 Task 4 I D I D I D I D
  • 4. Challenge: Shared Memory Hierarchy • Many shared resources: cache space, MSHRs, dram banks, MC buffers, … • Each optimized for performance with no high-level insight • Very poor worst-case behavior: >10X, >100X observed in real platforms • even after cache partitioning is applied. 4 DRAM LLC Core1 Core2 Core3 Core4 subject co-runner(s) IsolBench EEMBC, SD-VBS on Tegra TK1 - P. Valsan, et al., “Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems.” RTAS, 2016 - P. Valsan, et al., “Addressing Isolation Challenges of Non-blocking Caches for Multicore Real-Time Systems.” Real-time Systems Journal. 2017
  • 5. The Virtual Memory Abstraction • Program’s logical view of memory • No concept of timing • OS maps any available physical memory blocks • Hardware treats all memory requests as same • But some memory may be more important • E.g., code and data memory in a time critical control loop • Prevents OS and hardware from making informed allocation and scheduling decisions that affect memory timing 5 New memory abstraction is needed! Silberschatz et al., “Operating System Concepts.”
  • 6. Outline • Motivation • Our Approach • Deterministic Memory Abstraction • DM-Aware Multicore System Design • Implementation • Evaluation • Conclusion 6
  • 7. Deterministic Memory (DM) Abstraction • Cross-layer memory abstraction with bounded worst-case timing • Focusing on tightly bounded inter-core interference • Best-effort memory = conventional memory abstraction 7 Inter-core interference Inherent timing legend Best-effort memory Deterministic memory Worst-case memory delay
  • 8. DM-Aware Resource Management Strategies • Deterministic memory: Optimize for time predictability • Best effort memory: Optimize for average performance 8 Space allocation Request scheduling WCET bounds Deterministic memory Dedicated resources Predictability focused Tight Best-effort memory Shared resources Performance focused Pessimistic Space allocation: e.g., cache space, DRAM banks Request scheduling: e.g., DRAM controller, shared buses
  • 9. System Design: Overview • Declare all or part of RT task’s address space as deterministic memory • End-to-end DM-aware resource management: from OS to hardware • Assumption: partitioned fixed-priority real-time CPU scheduling 9 Application view (logical) System-level view (physical) Deterministic memory Best-effort memory Deterministic Memory-Aware Memory Hierarchy Core1 Core2 Core3 Core4 W 5 W 1 W 2 W 3 W 4 I D I D I D I D B 1 B 2 B 3 B 4 B 5 B 6 B 7 B 8 DRAM banks Cache ways
  • 10. OS and Architecture Extensions for DM • OS updates the DM bit in each page table entry (PTE) • MMU/TLB and bus carries the DM bit. • Shared memory hierarchy (cache, memory ctrl.) uses the DM bit. 10 OS MMU/TLB Page table entry DM bit=1|0 LLC MCPaddr, DM Paddr, DM Paddr, DM Vaddr DRAM cmd/addr. hardware software
  • 11. DM-Aware Shared Cache • Based on cache way partitioning • Improve space utilization via DM-aware cache replacement algorithm • Deterministic memory: allocated on its own dedicated partition • Best-effort memory: allocated on any non-DM lines in any partition. • Cleanup: DM lines are recycled as best-effort lines at each OS context switch 11 Way 0 Way 1 Way 2 Way 3 Core 0 partition Core 1 partition best-effort line (any core, DM=0) deterministic line (Core0, DM=1) deterministic line (Core1, DM=1) Way 4 shared partition Set 0 Set 1 Set 2 Set 3 0 DM Tag Line data
  • 12. DM-Aware Memory (DRAM) Controller • OS allocates DM pages on reserved banks, others on shared banks • Memory-controller implements two-level scheduling (*) 12 Determinism focused scheduling (Round-Robin) Throughput focused scheduling (FR-FCFS) yes noDeterministic memory request exist? (a) Memory controller (MC) architecture (b) Scheduling algorithm (*) P. Valsan, et al., “MEDUSA: A Predictable and High-Performance DRAM Controller for Multicore based Embedded Systems.” CPSNA, 2015
  • 13. Implementation • Fully functional end-to-end implementation in Linux and Gem5. • Linux kernel extensions • Use an unused bit combination in ARMv7 PTE to indicate a DM page. • Modified the ELF loader and system calls to declare DM memory regions • DM-aware page allocator, replacing the buddy allocator, based on PALLOC (*) • Dedicated DRAM banks for DM are configured through Linux’s CGROUP interface. ARMv7 page table entry (PTE) 13 (*) H. Yun et. al, “PALLOC: DRAM Bank-Aware Memory Allocator for Performance Isolation on Multicore Platforms.” RTAS’14
  • 14. Implementation • Gem5 (a cycle-accurate full-system simulator) extensions • MMU/TLB: Add DM bit support • Bus: Add the DM bit in each bus transaction • Cache: implement way-partitioning and DM-aware replacement algorithm • DRAM controller: implement DM-aware two-level scheduling algorithm. • Hardware implementability • MMU/TLB, Bus: adding 1 bit is not difficult. (e.g., Use AXI bus QoS bits) • Cache and DRAM controllers: logic change and additional storage are minimal. 14
  • 15. Outline • Motivation • Our Approach • Deterministic Memory Abstraction • DM-Aware Multicore System Design • Implementation • Evaluation • Conclusion 15
  • 16. Simulation Setup • Baseline Gem5 full-system simulator • 4 out-of-order cores, 2 GHz • 2 MB Shared L2 (16 ways) • LPDDR2@533MHz, 1 rank, 8 banks • Linux 3.14 (+DM support) • Workload • EEMBC, SD-VBS, SPEC2006, IsolBench 16
  • 17. Real-Time Benchmark Characteristics • Critical pages: pages accessed in the main loop • Critical (T98/T90): pages accounting 98/90% L1 misses of all L1 misses of the critical pages. • Only 38% of pages are critical pages • Some pages contribute more to L1 misses (hence shared L2 accesses) • Not all memory is equally important svm L1 miss distribution 17
  • 18. Effects of DM-Aware Cache • Questions • Does it provide strong cache isolation for real-time tasks using DM? • Does it improve cache space utilization for best-effort tasks? • Setup • RT: EEMBC, SD-VBS, co-runners: 3x Bandwidth • Comparisons • NoP: free-for-all sharing. No partitioning • WP: cache way partitioning (4 ways/core) • DM(A): all critical pages of a benchmark are marked as DM • DM(T98): critical pages accounting 98% L1 misses are marked as DM • DM(T90): critical pages accounting 98% L1 misses are marked as DM 18 DRAM LLC Core1 Core2 Core3 Core4 RT co-runner(s)
  • 19. Effects of DM-Aware Cache • Does it provide strong cache isolation for real-time tasks using DM? • Yes. DM(A) and WP (way-partitioning) offer equivalent isolation. • Does it improve cache space utilization for best-effort tasks? • Yes. DM(A) uses only 50% cache partition of WP. 19
  • 20. Effects of DM-Aware DRAM Controller • Questions • Does it provide strong isolation for real-time tasks using DM? • Does it reduce reserved DRAM bank space? • Setup • RT: SD-VBS (input: CIF), co-runners: 3x Bandwidth • Comparisons • BA & FR-FCFS: Linux’s default buddy allocator + FR-FCFS scheduling in MC • DM(A): DM on private DRAM banks + two-level scheduling in MC • DM(T98): same as DM(A), except pages accounting 98% L1 misses are DM • DM(T90): same as DM(A), except pages accounting 90% L1 misses are DM 20
  • 21. Effects of DM-Aware DRAM Controller • Does it provide strong isolation for real-time tasks using DM? • Yes. BA&FR-FCFS suffers 5.7X slowdown. • Does it reduce reserved DRAM bank space? • Yes. Only 51% of pages are marked deterministic in DM(T90) 21
  • 22. Conclusion • Challenge • Balancing performance and predictability in multicore real-time systems • Memory timing is important to WCET • The current memory abstraction is limiting: no concept of timing. • Deterministic Memory Abstraction • Memory with tightly bounded worst-case timing. • Enable predictable and high-performance multicore systems • DM-aware multicore system designs • OS, MMU/TLB, bus support • DM-aware cache and DRAM controller designs • Implemented and evaluated in Linux kernel and gem5 • Availability • https://github.com/CSL-KU/detmem 22
  • 23. Ongoing/Future Work • SoC implementation in FPGA • Based on open-source RISC-V quad-core SoC • Basic DM support in bus protocol and Linux • Implementing DM-aware cache and DRAM controllers • Tool support and other applications • Finding “optimal” deterministic memory blocks • Better timing analysis integration (initial work in the paper) • Closing micro-architectural side-channels. 23
  • 24. Thank You Disclaimer: Our research has been supported by the National Science Foundation (NSF) under the grant number CNS 1718880 24