Fast Switching of Threads 
Between Cores 
Richard Strong & Dean Tullsen (University of California, San Diego) 
Jayaram Mudigonda, Jeffrey C. Mogul & Nathan Binkert (HP Labs) 
Ruhaim Izmeth | MS14901218 
Nipuna Pannala | MS14902208
Introduction 
● We are now in the multicore era. 
● Multicore CPUs enable inter-core communication at a cost orders of magnitude lower than in traditional multiprocessors. [This reduces the time the hardware needs to move a migrating thread's working set.] 
● But the software cost of moving a thread remains high.
Asymmetric Multicore Processor 
● Core-to-core performance asymmetry appears to be a very useful way to improve energy and area efficiency. 
● It costs relatively little performance but delivers greater throughput per watt. 
● Asymmetric multicore processors increase the need for frequent, efficient migration of threads between cores.
Fast Switching of Threads between 
Cores 
● To get good performance when switching threads between cores: 
○ The OS scheduler needs to migrate a thread from a slow core to a fast or ideal core. 
○ It is also necessary to balance the load between cores (in a symmetric or asymmetric system). 
○ All thread execution time segments should be relatively short.
Simple Cores… 
● Simple cores are normally a better match for memory-bound application code. 
○ Operating systems and OS-like code are typical memory-bound workloads.
Thread Migration Techniques 
● Migration Mechanism 1: Constantinou 
○ This work considered a variety of costs associated with thread migration, but its primary focus was the cost of warming up a thread's caches and branch predictors. 
○ It does not address the software cost of migrating threads between cores.
Thread Migration Techniques 
● Migration Mechanism 2: Choi 
○ This work addresses the specific case of migrating branch predictor state when a thread switches cores. 
○ It does not address the software overhead issues.
Thread Migration Techniques 
Shared Thread Multiprocessor: Brown & Tullsen 
● Hardware manages thread movement. 
● Thread state is represented in hardware and is shared among all cores on a chip. 
● Therefore hardware can move threads between cores without direct OS involvement.
Software Approaches to Core Switching 
Before thread T switches from core A to core B: 
• Is core B idle? 
• Is there any thread to run on core A after T switches to B? 
• Can we ensure T is the most appropriate thread to run on B? 
Then transfer the architectural state of the thread from A to B.
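The checks on this slide can be sketched as a toy model. This is only an illustration, not the kernel code from the paper: the `Core` class, `switch_core()` and the `report` keys are hypothetical names, and "transferring architectural state" is reduced to moving a thread name from one core to the other.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Core:
    """Hypothetical model of a core: a name, the running thread, and a run queue."""
    name: str
    running: Optional[str] = None
    run_queue: List[str] = field(default_factory=list)

    def is_idle(self) -> bool:
        return self.running is None and not self.run_queue


def switch_core(thread: str, src: Core, dst: Core) -> dict:
    """Move `thread` from src to dst and report the answers to the slide's questions."""
    if src.running != thread:
        raise ValueError(f"{thread} is not running on {src.name}")
    report = {
        "target_was_idle": dst.is_idle(),               # Is core B idle?
        "source_has_other_work": bool(src.run_queue),   # Anything left to run on A?
    }
    # Stand-in for transferring architectural state: the thread changes owner.
    src.running = src.run_queue.pop(0) if src.run_queue else None
    if dst.running is None:
        dst.running = thread
    else:
        dst.run_queue.append(thread)
    # Rough proxy for "T is the most appropriate thread to run on B".
    report["runs_immediately"] = dst.running == thread
    return report
```

A real scheduler would answer the third question with priorities rather than "the target had nothing else running"; that simplification is an assumption of this sketch.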
Approaches used in the research 
● V1: Linux’s thread-migration mechanism 
● V2: Modified scheduler 
● V3: Scheduler fast-paths 
● V4: Addressing IPI costs 
● V5: Cross-core wakeup from quiesce
V1: Linux Thread Migration Mechanism 
● Normally used for relatively long-term load balancing across cores. 
● The Linux thread-migration mechanism is the state of the art in core switching. 
● A dedicated thread is available to initiate the migration.
V1: Linux Thread Migration Mechanism 
● When a task wants to migrate, it puts itself on the per-core migration queue. 
● If the target core is idle, the thread wakes up from the per-core migration queue and moves to the run queue of the target core. 
● After approval from the target queue, the thread executes on the target core.
V1: Linux Thread Migration Mechanism 
Cons... 
● This migration approach involves an extra context switch between the initiating thread and the migrating thread.
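To make the extra context switch concrete, here is a toy model of the V1 flow that counts context switches per core. It is a sketch under assumptions, not Linux code: `Core`, `migrate_v1()` and the `"migration-helper"` name are hypothetical, and the real kernel path has more steps.

```python
from collections import deque


class Core:
    """Hypothetical core with a run queue and a per-core migration queue."""

    def __init__(self, core_id):
        self.core_id = core_id
        self.run_queue = deque()
        self.migration_queue = deque()  # pending (thread, target core) requests
        self.context_switches = 0

    def switch_to(self, thread):
        # Count every change of the running thread on this core.
        self.context_switches += 1
        return thread


def migrate_v1(thread, src, dst):
    """Toy V1 flow: queue a request, run the helper thread, then run on the target."""
    src.migration_queue.append((thread, dst))       # the task queues its own request
    src.switch_to("migration-helper")               # extra switch to the helper thread
    queued_thread, target = src.migration_queue.popleft()
    target.run_queue.append(queued_thread)          # helper moves the task across
    if src.run_queue:
        src.switch_to(src.run_queue.popleft())      # source core resumes other work
    return target.switch_to(target.run_queue.popleft())  # target core runs the task
```

With one other runnable thread on the source core, this model records two context switches on the source (one of them only to run the helper) and one on the target.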
Linux Thread Migration Mechanism 
Increase Efficiency 
● To remove the extra context switch: 
○ Let threads make migration decisions themselves 
○ Centralize the thread status 
○ Increase the number of per-core queues 
○ Create cross-core signals
V2: Modified scheduler 
[Figure: thread T moving from Core 0 to Core 1 in steps 1-7. Labels: Core 0 run queue (N, T); Core 1 run queue and alternative queue (AQ); SwitchCore(); schedule(); interrupt; control block of T with Core: 1.] 
● Removes the extra context switch described in V1. 
● Thread migration is initiated by the process itself.
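The alternative-queue idea can be sketched as a small simulation. The ordering of steps below is one reading of the figure labels (control block update, AQ hand-off, schedule() on the source, interrupt on the target), not the paper's actual code, and the `Core` class and `switch_core()` function are hypothetical.

```python
from collections import deque


class Core:
    """Hypothetical core with a run queue and an alternative queue (AQ)."""

    def __init__(self, core_id):
        self.core_id = core_id
        self.run_queue = deque()
        self.alt_queue = deque()
        self.running = None
        self.context_switches = 0

    def schedule(self):
        # Drain the AQ first so threads arriving from other cores become runnable here.
        while self.alt_queue:
            self.run_queue.append(self.alt_queue.popleft())
        nxt = self.run_queue.popleft() if self.run_queue else None
        if nxt != self.running:
            self.context_switches += 1
        self.running = nxt


def switch_core(control_block, src, dst):
    """Toy SwitchCore(): record the target, hand the thread to its AQ, reschedule."""
    control_block["core"] = dst.core_id             # thread records where it wants to run
    dst.alt_queue.append(control_block["name"])     # thread is placed on the target's AQ
    src.schedule()                                  # source core picks its next thread directly
    dst.schedule()                                  # stands in for the cross-core interrupt
```

In this model the source core goes straight from T to its next thread, so it records a single context switch and no helper thread runs in between.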
V3: Scheduler fast-paths 
● The original modified schedule() 
● A fast schedule source version (FSS), called to initiate a core switch 
● A fast schedule target version (FST), called at the target core in response to the cross-core signal 
FSS and FST omit a number of housekeeping functions normally done in schedule() (e.g. priority calculation). 
FSS only passes a hint to FST, so no locking takes place. 
FST checks the AQ; FSS does not.
V4: Addressing inter-processor 
interrupt (IPI) costs 
Inter-processor interrupts are sent to wake up polling or paused processors. 
The modified scheduler wakes up the target core if it is idle. 
The IPI-sending code, which sends the interrupt to all members of a specified set, was modified to be more efficient. 
schedule() is invoked on the target core by the interrupt.
Modified System Calls 
Long-running system calls were modified to initiate CoreSwitch(). 
Modified system calls: open, stat, read, write, readv, writev, select, poll, fsync, fdatasync, recvfrom, sendto and sendfile. 
4096 bytes
Simulation Environment 
The M5 simulator was used to generate detailed timelines showing when interesting events such as procedure calls, cache misses, and long-latency instructions occur. 
M5's x86 models were not debugged, so Alpha cores were simulated. 
Complex core: Alpha EV6 (21264), 64 KB L1 
Simple core: EV4-based (21064), 8 KB L1 
Shared L2 cache: 3.5 MB 
Main-memory access time: 25 ns
Simulation Environment - 
Configuration naming scheme 
sim_XXX: the number of X characters denotes the number of processors 
e.g.: 
sim_c - single processor 
sim_sC - dual processor 

Prefix    750 MHz   3 GHz 
Complex   c         C 
Simple    s         S 

Tests were run on the Linux 2.6.18 kernel. 
Only one trial was run per experiment, as the simulator is deterministic.
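The naming scheme can be decoded mechanically. The sketch below turns a configuration name into a list of core descriptions; `CORE_LETTERS` and `parse_config()` are hypothetical names introduced here, and the letter meanings come straight from the prefix table.

```python
# Letter -> (core type, clock frequency in MHz), taken from the prefix table.
CORE_LETTERS = {
    "c": ("complex", 750),
    "C": ("complex", 3000),
    "s": ("simple", 750),
    "S": ("simple", 3000),
}


def parse_config(name):
    """Decode a name such as 'sim_sC' into a list of (core type, MHz) tuples."""
    prefix, sep, letters = name.partition("_")
    if prefix != "sim" or not sep or not letters:
        raise ValueError(f"not a sim_XXX configuration name: {name!r}")
    try:
        return [CORE_LETTERS[ch] for ch in letters]
    except KeyError as exc:
        raise ValueError(f"unknown core letter {exc.args[0]!r} in {name!r}") from None
```

For example, `parse_config("sim_sC")` describes a dual-processor system with one simple 750 MHz core and one complex 3 GHz core.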
Microbenchmark results 
gettid() was modified to call coreswitch() and was run 
N = 1,000,000 times in a tight loop.
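The shape of such a microbenchmark is a timed tight loop divided by the iteration count. The sketch below shows that structure only: `time_calls()` is a hypothetical helper, and `threading.get_native_id()` (Python 3.8+) merely stands in for the modified call. It does not switch cores, and interpreter overhead dominates the result.

```python
import threading
import time


def time_calls(func, n):
    """Return the average cost of one call to func, in nanoseconds, over n calls."""
    start = time.perf_counter_ns()
    for _ in range(n):
        func()
    elapsed = time.perf_counter_ns() - start
    return elapsed / n


if __name__ == "__main__":
    # Stand-in for the modified gettid(): reports the kernel thread ID.
    # It does NOT perform a core switch.
    per_call = time_calls(threading.get_native_id, 1_000_000)
    print(f"average cost per call: {per_call:.1f} ns")
```

Averaging over a large N amortizes the cost of reading the clock, which is why the loop is timed as a whole rather than per call.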
Cross-core wakeup from quiesce 
● Idle-loop polling is inefficient. 
● Initiating a cross-CPU interrupt is slow, as a powered-down CPU needs to be awakened. 
● The kernel should dynamically decide between spinning and powering down based on recent history.
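One way a history-based decision could look is sketched below. The slides do not specify a policy, so everything here is an assumption: the `IdlePolicy` class, the moving-average rule, the 50 µs threshold and the smoothing factor are all hypothetical.

```python
class IdlePolicy:
    """Hypothetical policy: poll if recent idle periods were short, else power down."""

    def __init__(self, threshold_us=50.0, alpha=0.5):
        self.threshold_us = threshold_us  # assumed break-even idle length
        self.alpha = alpha                # weight given to the newest sample
        self.avg_idle_us = None           # no history yet

    def record_idle(self, idle_us):
        # Exponentially weighted moving average of observed idle periods.
        if self.avg_idle_us is None:
            self.avg_idle_us = float(idle_us)
        else:
            self.avg_idle_us = self.alpha * idle_us + (1 - self.alpha) * self.avg_idle_us

    def decide(self):
        # With no history, power down by default (an arbitrary choice in this sketch).
        if self.avg_idle_us is None or self.avg_idle_us > self.threshold_us:
            return "power_down"
        return "poll"
```

The intuition is that a core which has recently been woken after very short idle periods is likely to be needed again soon, so polling avoids the wake-up latency, while long idle periods favor powering down.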
Macrobenchmark results - 
Web Benchmark
Macrobenchmark results - 
Database Benchmark 
Using “TPC-B-like” example from the Berkeley DB 
distribution 
Core switch done only on fdatasync() 
Eliminated disk I/O delays by using a RAM disk on the 
real hardware, and by setting the access time to zero 
in M5’s disk simulator.
Future Work 
● Energy measurement/savings benchmarks 
for the above tests 
● Determining the best core to switch to and the best time to switch 
● An optimal mechanism for deciding whether to poll or power down a processor
Summary 
● The cost of core switching matters more when using asymmetric multicores. 
● Switching to slower OS cores on frequent, expensive system calls sometimes reduces performance. 
○ But it also provides the opportunity to power down the complex application cores.
References 
● J. Aas. Understanding the Linux 2.6.8.1 CPU Scheduler. 
http://josh.trancesoftware.com/linux/, Feb. 2005. 
● S. Balakrishnan, R. Rajwar, M. Upton, and K. Lai. The Impact of Performance 
Asymmetry in Emerging Multicore Architectures. In Proc. ISCA, pages 506–517, 
2005. 
● M. Becchi and P. Crowley. Dynamic Thread Assignment on Heterogeneous 
Multiprocessor Architectures. J. Instruction Level Parallelism, pages 1–26, June 
2008. 
● N. L. Binkert, R. G. Dreslinski, L. R. Hsu, K. T. Lim, A. G. Saidi, and S. K. Reinhardt. 
The M5 Simulator: Modeling Networked Systems. IEEE Micro, 26(4):52–60, 2006. 
● D. Brooks, V. Tiwari, and M. Martonosi. Wattch: a framework for architectural-level 
power analysis and optimizations. In Proc. ISCA, pages 83–94, Jun. 2000.
Q / A 
Thank You

Fast switching of threads between cores - Advanced Operating Systems

  • 1. Fast Switching of Threads Between Cores Richard Strong & Dean Tullsen (University San Diego) Jayaram Mudigonda, Jeffrey C. Mogul & Nathan Binkert (HP Labs) Ruhaim Izmeth | MS14901218 Nipuna Pannala | MS14902208
  • 2. Introduction ● We are now in the multicore era. ● Multicore CPUs enable inter-core communication at a cost orders of magnitude lower than in traditional multiprocessors. [This reduces the time the hardware needs to move a migrating thread's working set.] ● But the software cost of moving a thread remains high.
  • 3. Asymmetric Multicore Processor ● Core-to-core performance asymmetry appears to be a very useful way to improve energy and area efficiency. ● Relatively little performance cost, but greater throughput per watt. ● An asymmetric multicore processor increases the need for frequent, efficient migration of threads between cores.
  • 4. Fast Switching of Threads between Cores ● To get good performance when switching threads between cores: ○ The OS scheduler needs to migrate a thread from a slow core to a fast or ideal core. ○ It is also necessary to balance the load between cores (in a symmetric or asymmetric system). ○ All thread execution time segments should be relatively short.
  • 5. Simple Cores… ● Simple cores are normally a better match for memory-bound application code. ○ Operating systems and OS-like code are typically memory-bound.
  • 6. Thread Migration Techniques ● Migration Mechanism 1: Constantinou ○ This work considered a variety of costs associated with thread migration, but focused primarily on warming up the thread's caches and branch predictors. ○ It does not address the software cost of migrating threads between cores.
  • 7. Thread Migration Techniques ● Migration Mechanism 2: Choi ○ This work addresses the specific case of migrating the branch predictor state when a thread switches cores. ○ It does not address the software overhead.
  • 8. Thread Migration Techniques Shared Thread Multiprocessor: Brown & Tullsen ● Hardware manages thread movement. ● Thread state is represented in hardware and shared among all cores on a chip. ● Therefore the hardware can move threads between cores without direct OS involvement.
  • 9. Software Approaches to Core Switching • Is core B idle? • Is there a thread to run on core A after T switches to B? • Can we ensure T is the most appropriate thread to run on B? • Transfer the architectural state of the thread from A to B.
  • 10. Approaches used in the research ● V1: Linux’s thread-migration mechanism ● V2: Modified scheduler ● V3: Scheduler fast-paths ● V4: Addressing IPI costs ● V5: Cross-core wakeup from quiesce
  • 11. V1: Linux Thread Migration Mechanism ● Normally used for relatively long-term load balancing across cores. ● Linux's thread-migration mechanism is the state of the art in core switching. ● One thread is available to initiate the migration.
  • 12. V1: Linux Thread Migration Mechanism ● When a task wants to migrate, it puts itself on the per-core migration queue. ● If the target core is idle, the thread wakes up from the per-core migration queue and moves to the run queue of the target core. ● After getting approval from the target queue, the thread executes on the target core.
  • 13. V1: Linux Thread Migration Mechanism: Cons ● This migration approach involves an extra context switch between the initiating thread and the migrating thread.
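From user space on Linux, one way to trigger the kernel's migration path is to restrict a thread's CPU affinity to a single core. The following is a minimal sketch of that idea, not the code used in the research; the helper name `migrate_to_cpu` is hypothetical, and `os.sched_setaffinity` is only available on platforms that support it (Linux in particular).

```python
import os


def migrate_to_cpu(cpu: int) -> set:
    """Pin the calling thread to a single CPU and return the new affinity mask.

    Hypothetical helper for illustration: restricting the mask to one CPU
    forces the kernel to move the thread there if it is running elsewhere.
    """
    # A pid of 0 means "the calling thread".
    os.sched_setaffinity(0, {cpu})
    return set(os.sched_getaffinity(0))
```

A caller would typically pick `cpu` from `os.sched_getaffinity(0)` so that the target is a core the thread is actually allowed to run on, and restore the original mask afterwards.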
  • 14. Linux Thread Migration Mechanism: Increasing Efficiency ● To remove the extra context switch: ○ Threads can make migration decisions themselves ○ Centralize the thread status ○ Increase the number of per-core queues ○ Create cross-core signals
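  • 15. V2: Modified scheduler [Figure: Core 0 with its run queue (threads N and T) and Core 1 with its run queue and an alternative queue (AQ); thread T moves from Core 0 to Core 1 in numbered steps 1-7 involving SwitchCore(), schedule(), an interrupt, and a control block recording T and target core 1] ● Removes the extra context switch described in V1. ● Thread migration is initiated by the process itself.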
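The alternative-queue idea can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the kernel code: the `Core` class, the `switch_core` function, and the rule that the AQ is checked before the run queue are all illustrative choices, and the target core is assumed to be idle when the switch happens.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Core:
    core_id: int
    run_queue: deque = field(default_factory=deque)
    alt_queue: deque = field(default_factory=deque)  # the AQ from the slide
    current: Optional[str] = None

    def schedule(self) -> Optional[str]:
        """Pick the next thread, checking the AQ before the run queue (assumed order)."""
        if self.alt_queue:
            self.current = self.alt_queue.popleft()
        elif self.run_queue:
            self.current = self.run_queue.popleft()
        else:
            self.current = None  # nothing runnable: the core idles
        return self.current


def switch_core(thread: str, source: Core, target: Core) -> None:
    """The running thread hands itself to an idle target core, with no helper thread."""
    if source.current != thread:
        raise ValueError("only the currently running thread may switch itself")
    if target.current is not None:
        raise ValueError("this sketch assumes the target core is idle")
    target.alt_queue.append(thread)  # publish the thread on the target's AQ
    target.schedule()                # stands in for the cross-core interrupt
    source.schedule()                # the source core picks its next thread
```

With thread T running on core 0 and thread N waiting in core 0's run queue, calling `switch_core("T", core0, core1)` leaves T running on core 1 and N running on core 0.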
  • 16. V3: Scheduler fast-paths ● The original modified schedule(). ● A fast schedule source version (FSS), called to initiate a core switch. ● A fast schedule target version (FST), called at the target core in response to the cross-core signal. ● FSS and FST omit a number of housekeeping functions normally done in schedule() (e.g. priority calculation). ● FSS only gives a hint to FST, so no locking takes place. ● FST checks the AQ; FSS does not.
  • 18. V4: Addressing inter-processor interrupt (IPI) costs ● Inter-processor interrupts are sent to wake up polling or paused processors. ● The modified scheduler wakes up the target core if it is idle. ● The IPI-sending code, which sends interrupts to all members of a specified set, was modified to be more efficient. ● schedule() is invoked on the target core by the interrupt.
  • 19. Modified System Calls ● Long-running system calls were modified to initiate CoreSwitch(). ● Modified system calls: open, stat, read, write, readv, writev, select, poll, fsync, fdatasync, recvfrom, sendto and sendfile. 4096 bytes
  • 20. Simulation Environment ● The M5 simulator was used to generate detailed timelines showing when interesting events such as procedure calls, cache misses, and long-latency instructions occur. ● x86 models are not debugged in M5. ● Complex core: Alpha EV6 (21264), 64 KB L1. ● Simple core: EV4-based (21064), 8 KB L1. ● Simulated with a shared 3.5 MB L2. ● Main-memory access time of 25 ns.
  • 21. Simulation Environment - Configuration naming scheme ● Configurations are named sim_XXX, where the number of letters after the underscore gives the number of processors, e.g. sim_c is a single processor and sim_sC is a dual processor. ● Letter codes: a complex core is c at 750 MHz and C at 3 GHz; a simple core is s at 750 MHz and S at 3 GHz. ● Tests run on the Linux v2.6.18 kernel. ● Only one trial run per experiment, as the simulator is deterministic.
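The naming scheme is compact enough to decode mechanically. The sketch below maps each letter to a core type and clock speed as given on the slide; the function name `parse_config` and the tuple layout are illustrative assumptions, not part of the original tooling.

```python
# Letter codes from the slide: lowercase = 750 MHz, uppercase = 3 GHz.
CORE_CODES = {
    "c": ("complex", 750),
    "C": ("complex", 3000),
    "s": ("simple", 750),
    "S": ("simple", 3000),
}


def parse_config(name: str) -> list:
    """Decode a name such as 'sim_sC' into a list of (core_type, mhz) tuples."""
    prefix = "sim_"
    if not name.startswith(prefix):
        raise ValueError(f"expected a name starting with {prefix!r}: {name!r}")
    letters = name[len(prefix):]
    if not letters:
        raise ValueError(f"no core codes in {name!r}")
    cores = []
    for letter in letters:
        if letter not in CORE_CODES:
            raise ValueError(f"unknown core code {letter!r} in {name!r}")
        cores.append(CORE_CODES[letter])
    return cores
```

For example, `parse_config("sim_sC")` describes a dual-processor system with one simple core at 750 MHz and one complex core at 3 GHz.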
  • 22. Microbenchmark results ● gettid() was modified to call coreswitch() and was run N = 1,000,000 times in a tight loop.
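The shape of such a tight-loop microbenchmark can be sketched in user space. This is only an analogue under stated assumptions: it times an unmodified call, the helper name `average_call_ns` is hypothetical, and the measured figure includes Python's own loop and call overhead, so it says nothing about the numbers reported in the research.

```python
import threading
import time


def average_call_ns(fn, iterations: int = 1_000_000) -> float:
    """Call fn() `iterations` times in a tight loop and return the mean cost in ns."""
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    start = time.perf_counter_ns()
    for _ in range(iterations):
        fn()
    elapsed = time.perf_counter_ns() - start
    return elapsed / iterations
```

A stand-in for the gettid() loop would be `average_call_ns(threading.get_native_id)`, since that function returns the kernel-assigned ID of the calling thread.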
  • 23. Cross-core wakeup from quiesce ● Idle-loop polling is inefficient. ● Initiating a cross-CPU interrupt is slow, as a powered-down CPU needs to be awakened. ● The kernel should dynamically decide between polling and powering down based on recent history.
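The slides do not say how the history-based decision should be made. One possible policy, shown here purely as an assumption, is to keep an exponentially weighted average of recent idle durations and power down only when the predicted idle time exceeds a break-even threshold. The class name `IdlePolicy`, the threshold, and the weighting factor are all hypothetical.

```python
class IdlePolicy:
    """Choose between polling and powering down from recent idle durations."""

    def __init__(self, threshold_us: float, alpha: float = 0.5):
        self.threshold_us = threshold_us  # hypothetical break-even idle time
        self.alpha = alpha                # weight given to the newest observation
        self.predicted_us = None          # no history yet

    def record_idle(self, duration_us: float) -> None:
        """Fold one observed idle period into the running prediction."""
        if self.predicted_us is None:
            self.predicted_us = duration_us
        else:
            self.predicted_us = (
                self.alpha * duration_us + (1 - self.alpha) * self.predicted_us
            )

    def decide(self) -> str:
        """Return 'poll' for short expected idle periods, else 'power_down'."""
        if self.predicted_us is None or self.predicted_us < self.threshold_us:
            return "poll"
        return "power_down"
```

Polling by default when there is no history is a design choice of this sketch: it favors wakeup latency over energy until the core has observed at least one idle period.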
  • 24. Macrobenchmark results - Web Benchmark
  • 25. Macrobenchmark results - Database Benchmark ● Uses the “TPC-B-like” example from the Berkeley DB distribution. ● Core switching is done only on fdatasync(). ● Disk I/O delays were eliminated by using a RAM disk on the real hardware and by setting the access time to zero in M5’s disk simulator.
  • 26. Future Work ● Energy measurement and savings benchmarks for the above tests. ● Determining the best core to switch to and the best time to switch. ● An optimal mechanism for deciding whether to poll or power down a processor.
  • 27. Summary ● The cost of core switching matters more when using asymmetric multicores. ● Switching to slower OS cores on frequent, expensive system calls sometimes reduces performance ○ but it also makes it possible to power down the complex application cores.
  • 28. References ● J. Aas. Understanding the Linux 2.6.8.1 CPU Scheduler. http://josh.trancesoftware.com/linux/, Feb. 2005. ● S. Balakrishnan, R. Rajwar, M. Upton, and K. Lai. The Impact of Performance Asymmetry in Emerging Multicore Architectures. In Proc. ISCA, pages 506–517, 2005. ● M. Becchi and P. Crowley. Dynamic Thread Assignment on Heterogeneous Multiprocessor Architectures. J. Instruction Level Parallelism, pages 1–26, June 2008. ● N. L. Binkert, R. G. Dreslinski, L. R. Hsu, K. T. Lim, A. G. Saidi, and S. K. Reinhardt. The M5 Simulator: Modeling Networked Systems. IEEE Micro, 26(4):52–60, 2006. ● D. Brooks, V. Tiwari, and M. Martonosi. Wattch: A Framework for Architectural-Level Power Analysis and Optimizations. In Proc. ISCA, pages 83–94, June 2000.
  • 29. Q / A Thank You