SlideShare a Scribd company logo
1 of 24
Download to read offline
Just Enough is More:
Achieving Sustainable Performance in
Thermally-Constrained Mobile Devices
Ayse K. Coskun
Department of Electrical and Computer Engineering
Performance and Energy Aware Computing Laboratory
www.bu.edu/peaclab
Mobile Devices: Trends & Thermal Challenges
• SoC power densities have grown significantly
– Mobile CPUs adopt aggressive µArch design, clock speeds.[*]
– Single-thread CPU performance have improved by 3x-11x over
generations [Halpern et al. HPCA’16].
– Integrated GPUs further elevate power densities.
Power and TDP of S5/S6
[Halpern et al. HPCA’16]
[*] http://www.arm.com/files/pdf/AT-Exploring_the_Design_of_the_Cortex-A15.pdf
[**] http://arstechnica.com/gadgets/2013/11/when-benchmarks-arent-enough-cpu-performance-in-the-nexus-5
• Modern smartphones are thermally constrained
- Chip and skin temperatures elevate to critical levels.
- On/off-chip thermal couplings [Xie et al., ICCAD’13][Prakash et al., DAC’15]
- Power and form-factor restrictions limit cooling capabilities.
• Thermal throttling
• Unsustainable user experience over extended application use.[**]
2
In our work …
 We aim at providing sustainable performance and propose thermally-efficient
QoS management.
 We demonstrate and analyze the performance impacts of multiple thermal
constraints of modern mobile devices.
 We present runtime techniques for improving thermal efficiency:
 Fine-grained DVFS scheduling
 Criticality-driven scheduling
 CPU-GPU thermal coupling aware runtime
 We evaluate these techniques on mobile platforms
 with homogeneous and heterogeneous multi-core CPUs,
 under CPU and skin temperature constraints,
and achieve up to 8x longer durations of extended sustainable performance.
3
Outline
• Experimental Setup & Methodology
• Thermal Constraints in Smartphones
• Techniques for Sustainable Performance
• Evaluation Results
• Summary
4
Experimental Setup & Methodology
Platforms:
• CPU & skin temperature control policies.
– 90°C and 40°C thermal limits, respectively.
Applications:
• Scimark (FFT, SOR);
• SPEC CPU 2006 (H264);
• PARSEC (bodytrack)
• Gaming (Edge of Tomorrow, Real
Racing);
• WebGL (Aquarium, Pearl Boy, Rain);
• Video player apps (Mx Player, Rock
Player);
• Policy implementations:
– cpufreq interface for frequency scaling
– sched_setaffinity for thread-to-core mapping
5
Outline
• Experimental Setup & Methodology
• Thermal Constraints in Smartphones
• Techniques for Sustainable Performance
• Evaluation Results
• Summary
6
Thermal Constraints in Modern Smartphones
• CPU temperature induced throttling largely
degrades QoS.
QoS degradation over time on Odroid-XU3 due to
CPU thermal limits [Sahin et al., ICCAD’16] QoS degradation over time on Nexus 5 due to skin
temperature violation while running EoT.
Users will experience significant performance loss when the device is used for extended
durations (e.g., gaming, streaming etc.)
• Modern smartphones are also constrained
by skin temperature levels (e.g., 34°C-43°C
[Egilmez et al., DATE’15]).
7
Thermal Constraints in Modern Smartphones
• CPU vs. skin level thermal constraints.
• Throttling CPU to lower skin temperature can lead to large waste in CPU
thermal headroom.
(a) FFT application (b) MX Player application
30°C gap
8
Thermal Constraints in Modern Smartphones
• Platform-level thermal management using
additional knobs (e.g., display).
• CPU can better utilize its headroom.
(b) MX Player application
• Current scope of our work focuses on CPU level control knobs.
We propose QoS-centric thermal management to achieve sustained performance and
provide novel observations to improve thermal efficiency.
9
Outline
• Experimental Setup & Methodology
• Thermal Constraints in Smartphones
• Techniques for Sustainable Performance
• Evaluation Results
• Summary
10
Providing QoS-Temperature Tradeoff
Traditional approach:
• Maximizing performance under thermal
limits.
• Unsustainable performance if apps run
longer.
Our approach:
• Limit short-term performance to a “just
enough” level.
• Extend sustainability of acceptable QoS
levels
User
Thermal mgmt.
Power mgmt.
Application
FPS and temperature traces for the Real Racing game
on Odroid-XU3 [Sahin et al., ICCAD’16]
Trading off short term performance for sustainable performance.
11
Fine-grained DVFS State Scheduling
Fine-grained, thermally-efficient scheduling of discrete DVFS states.
Key idea: Divide high frequency interval into fine-grained bins and distribute temporally
while achieving same average frequency.
vs.
Limitations due to DVFS overhead: Real-life demonstration:
Temperature traces for two DVFS schedules
[Sahin et al., ICCAD’15]
12
Exploiting HW/SW Heterogeneity for Thermal Efficiency
QoS scaling for RP (left) EoT (middle) Rain (right)
CPU utilization increase for RP (L) Rain (M) RR (R)
• Experiments on Odroid-XU3
(left) [Sahin, ICCAD’16]
• Non-critical threads increase
CPU util.
- Accelerates heating.
Identifying and leveraging per-thread criticality in scheduling.
Key idea:
QoS Critical
Threads
Non-critical
Threads
High perf. CPU Low perf. CPU
“big” “LITTLE”
Application
13
CPU-GPU Thermal Coupling Aware Runtime Management
• Most thermally-efficient CPU cores
depend on GPU usage.
– Application dependent!
Identifying thermally-efficient cores based on CPU-GPU thermal couplings.
• Offline characterization of thermal coupling via microbenchmarks.
• Varying levels of GPU power -> Record the ordering of cores from the lowest to
highest maximum temperature.
Power (L), temperature (M) and QoS (right)
[Sahin et al., ICCAD’16]
14
CPU-GPU thermal coupling on Exynos 5 [Sahin et
al., ICCAD’16]
Outline
• Experimental Setup & Methodology
• Thermal Constraints in Smartphones
• Techniques for Sustainable Performance
• Evaluation Results
• Summary
15
Evaluation: Extended Sustained Duration
Under CPU temperature constraints on Odroid-
XU3:
Aquarium
30 FPS Constraint
40%
62%
Under skin temperature
constraints on MSM8974:
• QScale provides the longest durations of sustainable
performance.
• Larger improvements (e.g., up to 8x for bodytrack) in
Rain, bodytrack, Rock Player due to criticality
awareness in QScale.
8x
3x
• 55% longer time with
acceptable FPS
16
Evaluation: Runtime Behavior
QoS and temperature traces for Rain
application under 90% target:
Adapting to Dynamic QoS targets:
On MSM8974 running Aquarium
[Sahin et al., ICCAD’15]
On Odroid-XU3 running Edge of
Tomorrow [Sahin et al., ICCAD’16]
17
Summary & Takeaways
Performance and Energy Aware
Computing Lab
• Modern mobile devices are constrained by both skin and
chip level thermal constraints.
• Throttling leads to unsustainable performance
– Users expect consistent performance
• Thermally-efficient runtime techniques
– Reduce temperature
– Strictly adhere user performance requirements
• Up to 8x longer sustainable performance
[1] O. Sahin and A.K. Coskun. "On the Impacts of Greedy Thermal Management in Mobile Devices.", IEEE Embedded System Letters, 2015
[2] O. Sahin, P.T. Varghese and Ayse K. Coskun. "Just Enough is More: Achieving Sustainable Performance in Mobile Devices under Thermal
Limitations.", In ICCAD, 2015.
[3] O. Sahin and Ayse K. Coskun. “QScale: Thermally-Efficient QoS Management on Heterogeneous Mobile Platforms", In ICCAD, 2016.
References:
18
Backup Slides
19
Thread Criticality in Mobile Applications
20
• QoS reaches to the maximum when only a few critical threads are executed on big cores.
We identify the critical threads of an application offline for runtime mapping of application
threads among big and little cores.
• Big cluster utilization can increase when non-critical threads are assigned to big cores.
• Accelerates heating.
Closed-loop QoS Controller Design Details
 Ensures stable control
around Qtarget
 Convergences to Qtarget
21
Summary of Throttling Results on Nexus 5
22
Evaluation of QScale
23
• Policies in comparison:
– Default: Android’s Interactive DVFS
govenor + HMP scheduler.
– DVFS-only: Closed-loop DVFS
controller + HMP scheduler.
– QScale: Closed-loop DVFS controller +
criticality-aware thread mapping.
• Figures show the sustained durations for each
application under different Qos targets.
• QScale provides the longest durations of
sustainable performance.
• Larger improvements (e.g., up to 8x for
bodytrack) Rain, Bodytrack, Rock Player due
to criticality awareness in QScale.
Qscale Overview
24
We propose QScale for providing thermally-efficient QoS management: [ICCAD’16]
• Runtime monitoring of CPU-GPU thermal coupling for core allocation.
• Thread-criticality driven scheduling for big.LITTLE.
• Closed-loop runtime DVFS control to guarantee desired performance.

More Related Content

Similar to Just enough is more: Achieving sustainable performance in thermally-constrained mobile devices

Workload-Based Prediction of CPU Temperature and Usage for Small-Scale Distri...
Workload-Based Prediction of CPU Temperature and Usage for Small-Scale Distri...Workload-Based Prediction of CPU Temperature and Usage for Small-Scale Distri...
Workload-Based Prediction of CPU Temperature and Usage for Small-Scale Distri...Tarik Reza Toha
 
Performance and Energy evaluation
Performance and Energy evaluationPerformance and Energy evaluation
Performance and Energy evaluationGIORGOS STAMELOS
 
MRI Energy-Efficient Cloud Computing
MRI Energy-Efficient Cloud ComputingMRI Energy-Efficient Cloud Computing
MRI Energy-Efficient Cloud ComputingRoger Rafanell Mas
 
[EWiLi2016] Towards a performance-aware power capping orchestrator for the Xe...
[EWiLi2016] Towards a performance-aware power capping orchestrator for the Xe...[EWiLi2016] Towards a performance-aware power capping orchestrator for the Xe...
[EWiLi2016] Towards a performance-aware power capping orchestrator for the Xe...Matteo Ferroni
 
Scaling Green Instrumentation to more than 10 Million Cores
Scaling Green Instrumentation to more than 10 Million CoresScaling Green Instrumentation to more than 10 Million Cores
Scaling Green Instrumentation to more than 10 Million Coresinside-BigData.com
 
Applying Cloud Techniques to Address Complexity in HPC System Integrations
Applying Cloud Techniques to Address Complexity in HPC System IntegrationsApplying Cloud Techniques to Address Complexity in HPC System Integrations
Applying Cloud Techniques to Address Complexity in HPC System Integrationsinside-BigData.com
 
Thermal aware task assignment for multicore processors using genetic algorithm
Thermal aware task assignment for multicore processors using genetic algorithm Thermal aware task assignment for multicore processors using genetic algorithm
Thermal aware task assignment for multicore processors using genetic algorithm IJECEIAES
 
HEAT TRANSFER ANALYSIS ON LAPTOP COOLING SYSTEM BEFORE AND AFTER INTRODUCING ...
HEAT TRANSFER ANALYSIS ON LAPTOP COOLING SYSTEM BEFORE AND AFTER INTRODUCING ...HEAT TRANSFER ANALYSIS ON LAPTOP COOLING SYSTEM BEFORE AND AFTER INTRODUCING ...
HEAT TRANSFER ANALYSIS ON LAPTOP COOLING SYSTEM BEFORE AND AFTER INTRODUCING ...IRJET Journal
 
Multicore processor technology advantages and challenges
Multicore processor technology  advantages and challengesMulticore processor technology  advantages and challenges
Multicore processor technology advantages and challengeseSAT Journals
 
Opportunities of ML-based data analytics in ABCI
Opportunities of ML-based data analytics in ABCIOpportunities of ML-based data analytics in ABCI
Opportunities of ML-based data analytics in ABCIRyousei Takano
 
Virtualization for efficiency: by Kathrin Winkler, The green grid
Virtualization for efficiency: by Kathrin Winkler, The green gridVirtualization for efficiency: by Kathrin Winkler, The green grid
Virtualization for efficiency: by Kathrin Winkler, The green gridDCC Mission Critical
 
How to achieve 95%+ Accurate power measurement during architecture exploration?
How to achieve 95%+ Accurate power measurement during architecture exploration? How to achieve 95%+ Accurate power measurement during architecture exploration?
How to achieve 95%+ Accurate power measurement during architecture exploration? Deepak Shankar
 
DARPA ERI Summit 2018: The End of Moore’s Law & Faster General Purpose Comput...
DARPA ERI Summit 2018: The End of Moore’s Law & Faster General Purpose Comput...DARPA ERI Summit 2018: The End of Moore’s Law & Faster General Purpose Comput...
DARPA ERI Summit 2018: The End of Moore’s Law & Faster General Purpose Comput...zionsaint
 
AI Sustainability Mascots 23-f.pptx
AI Sustainability Mascots 23-f.pptxAI Sustainability Mascots 23-f.pptx
AI Sustainability Mascots 23-f.pptxTamar Eilam
 
Barcelona Supercomputing Center, Generador de Riqueza
Barcelona Supercomputing Center, Generador de RiquezaBarcelona Supercomputing Center, Generador de Riqueza
Barcelona Supercomputing Center, Generador de RiquezaFacultad de Informática UCM
 

Similar to Just enough is more: Achieving sustainable performance in thermally-constrained mobile devices (20)

Workload-Based Prediction of CPU Temperature and Usage for Small-Scale Distri...
Workload-Based Prediction of CPU Temperature and Usage for Small-Scale Distri...Workload-Based Prediction of CPU Temperature and Usage for Small-Scale Distri...
Workload-Based Prediction of CPU Temperature and Usage for Small-Scale Distri...
 
Performance and Energy evaluation
Performance and Energy evaluationPerformance and Energy evaluation
Performance and Energy evaluation
 
MRI Energy-Efficient Cloud Computing
MRI Energy-Efficient Cloud ComputingMRI Energy-Efficient Cloud Computing
MRI Energy-Efficient Cloud Computing
 
[EWiLi2016] Towards a performance-aware power capping orchestrator for the Xe...
[EWiLi2016] Towards a performance-aware power capping orchestrator for the Xe...[EWiLi2016] Towards a performance-aware power capping orchestrator for the Xe...
[EWiLi2016] Towards a performance-aware power capping orchestrator for the Xe...
 
Scaling Green Instrumentation to more than 10 Million Cores
Scaling Green Instrumentation to more than 10 Million CoresScaling Green Instrumentation to more than 10 Million Cores
Scaling Green Instrumentation to more than 10 Million Cores
 
Applying Cloud Techniques to Address Complexity in HPC System Integrations
Applying Cloud Techniques to Address Complexity in HPC System IntegrationsApplying Cloud Techniques to Address Complexity in HPC System Integrations
Applying Cloud Techniques to Address Complexity in HPC System Integrations
 
Thermal aware task assignment for multicore processors using genetic algorithm
Thermal aware task assignment for multicore processors using genetic algorithm Thermal aware task assignment for multicore processors using genetic algorithm
Thermal aware task assignment for multicore processors using genetic algorithm
 
HEAT TRANSFER ANALYSIS ON LAPTOP COOLING SYSTEM BEFORE AND AFTER INTRODUCING ...
HEAT TRANSFER ANALYSIS ON LAPTOP COOLING SYSTEM BEFORE AND AFTER INTRODUCING ...HEAT TRANSFER ANALYSIS ON LAPTOP COOLING SYSTEM BEFORE AND AFTER INTRODUCING ...
HEAT TRANSFER ANALYSIS ON LAPTOP COOLING SYSTEM BEFORE AND AFTER INTRODUCING ...
 
Multicore processor technology advantages and challenges
Multicore processor technology  advantages and challengesMulticore processor technology  advantages and challenges
Multicore processor technology advantages and challenges
 
Opportunities of ML-based data analytics in ABCI
Opportunities of ML-based data analytics in ABCIOpportunities of ML-based data analytics in ABCI
Opportunities of ML-based data analytics in ABCI
 
Ron hutchins ga_tech
Ron hutchins ga_techRon hutchins ga_tech
Ron hutchins ga_tech
 
Virtualization for efficiency: by Kathrin Winkler, The green grid
Virtualization for efficiency: by Kathrin Winkler, The green gridVirtualization for efficiency: by Kathrin Winkler, The green grid
Virtualization for efficiency: by Kathrin Winkler, The green grid
 
Data center m&e
Data center  m&eData center  m&e
Data center m&e
 
How to achieve 95%+ Accurate power measurement during architecture exploration?
How to achieve 95%+ Accurate power measurement during architecture exploration? How to achieve 95%+ Accurate power measurement during architecture exploration?
How to achieve 95%+ Accurate power measurement during architecture exploration?
 
HYPPO - NECSTTechTalk 23/04/2020
HYPPO - NECSTTechTalk 23/04/2020HYPPO - NECSTTechTalk 23/04/2020
HYPPO - NECSTTechTalk 23/04/2020
 
Exascale Capabl
Exascale CapablExascale Capabl
Exascale Capabl
 
DARPA ERI Summit 2018: The End of Moore’s Law & Faster General Purpose Comput...
DARPA ERI Summit 2018: The End of Moore’s Law & Faster General Purpose Comput...DARPA ERI Summit 2018: The End of Moore’s Law & Faster General Purpose Comput...
DARPA ERI Summit 2018: The End of Moore’s Law & Faster General Purpose Comput...
 
AI Sustainability Mascots 23-f.pptx
AI Sustainability Mascots 23-f.pptxAI Sustainability Mascots 23-f.pptx
AI Sustainability Mascots 23-f.pptx
 
Barcelona Supercomputing Center, Generador de Riqueza
Barcelona Supercomputing Center, Generador de RiquezaBarcelona Supercomputing Center, Generador de Riqueza
Barcelona Supercomputing Center, Generador de Riqueza
 
561_Final
561_Final561_Final
561_Final
 

More from Facultad de Informática UCM

¿Por qué debemos seguir trabajando en álgebra lineal?
¿Por qué debemos seguir trabajando en álgebra lineal?¿Por qué debemos seguir trabajando en álgebra lineal?
¿Por qué debemos seguir trabajando en álgebra lineal?Facultad de Informática UCM
 
TECNOPOLÍTICA Y ACTIVISMO DE DATOS: EL MAPEO COMO FORMA DE RESILIENCIA ANTE L...
TECNOPOLÍTICA Y ACTIVISMO DE DATOS: EL MAPEO COMO FORMA DE RESILIENCIA ANTE L...TECNOPOLÍTICA Y ACTIVISMO DE DATOS: EL MAPEO COMO FORMA DE RESILIENCIA ANTE L...
TECNOPOLÍTICA Y ACTIVISMO DE DATOS: EL MAPEO COMO FORMA DE RESILIENCIA ANTE L...Facultad de Informática UCM
 
DRAC: Designing RISC-V-based Accelerators for next generation Computers
DRAC: Designing RISC-V-based Accelerators for next generation ComputersDRAC: Designing RISC-V-based Accelerators for next generation Computers
DRAC: Designing RISC-V-based Accelerators for next generation ComputersFacultad de Informática UCM
 
Tendencias en el diseño de procesadores con arquitectura Arm
Tendencias en el diseño de procesadores con arquitectura ArmTendencias en el diseño de procesadores con arquitectura Arm
Tendencias en el diseño de procesadores con arquitectura ArmFacultad de Informática UCM
 
Introduction to Quantum Computing and Quantum Service Oriented Computing
Introduction to Quantum Computing and Quantum Service Oriented ComputingIntroduction to Quantum Computing and Quantum Service Oriented Computing
Introduction to Quantum Computing and Quantum Service Oriented ComputingFacultad de Informática UCM
 
Inteligencia Artificial en la atención sanitaria del futuro
Inteligencia Artificial en la atención sanitaria del futuroInteligencia Artificial en la atención sanitaria del futuro
Inteligencia Artificial en la atención sanitaria del futuroFacultad de Informática UCM
 
Design Automation Approaches for Real-Time Edge Computing for Science Applic...
 Design Automation Approaches for Real-Time Edge Computing for Science Applic... Design Automation Approaches for Real-Time Edge Computing for Science Applic...
Design Automation Approaches for Real-Time Edge Computing for Science Applic...Facultad de Informática UCM
 
Estrategias de navegación para robótica móvil de campo: caso de estudio proye...
Estrategias de navegación para robótica móvil de campo: caso de estudio proye...Estrategias de navegación para robótica móvil de campo: caso de estudio proye...
Estrategias de navegación para robótica móvil de campo: caso de estudio proye...Facultad de Informática UCM
 
Fault-tolerance Quantum computation and Quantum Error Correction
Fault-tolerance Quantum computation and Quantum Error CorrectionFault-tolerance Quantum computation and Quantum Error Correction
Fault-tolerance Quantum computation and Quantum Error CorrectionFacultad de Informática UCM
 
Cómo construir un chatbot inteligente sin morir en el intento
Cómo construir un chatbot inteligente sin morir en el intentoCómo construir un chatbot inteligente sin morir en el intento
Cómo construir un chatbot inteligente sin morir en el intentoFacultad de Informática UCM
 
Automatic generation of hardware memory architectures for HPC
Automatic generation of hardware memory architectures for HPCAutomatic generation of hardware memory architectures for HPC
Automatic generation of hardware memory architectures for HPCFacultad de Informática UCM
 
Hardware/software security contracts: Principled foundations for building sec...
Hardware/software security contracts: Principled foundations for building sec...Hardware/software security contracts: Principled foundations for building sec...
Hardware/software security contracts: Principled foundations for building sec...Facultad de Informática UCM
 
Jose carlossancho slidesLa seguridad en el desarrollo de software implementad...
Jose carlossancho slidesLa seguridad en el desarrollo de software implementad...Jose carlossancho slidesLa seguridad en el desarrollo de software implementad...
Jose carlossancho slidesLa seguridad en el desarrollo de software implementad...Facultad de Informática UCM
 
Redes neuronales y reinforcement learning. Aplicación en energía eólica.
Redes neuronales y reinforcement learning. Aplicación en energía eólica.Redes neuronales y reinforcement learning. Aplicación en energía eólica.
Redes neuronales y reinforcement learning. Aplicación en energía eólica.Facultad de Informática UCM
 
Challenges and Opportunities for AI and Data analytics in Offshore wind
Challenges and Opportunities for AI and Data analytics in Offshore windChallenges and Opportunities for AI and Data analytics in Offshore wind
Challenges and Opportunities for AI and Data analytics in Offshore windFacultad de Informática UCM
 

More from Facultad de Informática UCM (20)

¿Por qué debemos seguir trabajando en álgebra lineal?
¿Por qué debemos seguir trabajando en álgebra lineal?¿Por qué debemos seguir trabajando en álgebra lineal?
¿Por qué debemos seguir trabajando en álgebra lineal?
 
TECNOPOLÍTICA Y ACTIVISMO DE DATOS: EL MAPEO COMO FORMA DE RESILIENCIA ANTE L...
TECNOPOLÍTICA Y ACTIVISMO DE DATOS: EL MAPEO COMO FORMA DE RESILIENCIA ANTE L...TECNOPOLÍTICA Y ACTIVISMO DE DATOS: EL MAPEO COMO FORMA DE RESILIENCIA ANTE L...
TECNOPOLÍTICA Y ACTIVISMO DE DATOS: EL MAPEO COMO FORMA DE RESILIENCIA ANTE L...
 
DRAC: Designing RISC-V-based Accelerators for next generation Computers
DRAC: Designing RISC-V-based Accelerators for next generation ComputersDRAC: Designing RISC-V-based Accelerators for next generation Computers
DRAC: Designing RISC-V-based Accelerators for next generation Computers
 
uElectronics ongoing activities at ESA
uElectronics ongoing activities at ESAuElectronics ongoing activities at ESA
uElectronics ongoing activities at ESA
 
Tendencias en el diseño de procesadores con arquitectura Arm
Tendencias en el diseño de procesadores con arquitectura ArmTendencias en el diseño de procesadores con arquitectura Arm
Tendencias en el diseño de procesadores con arquitectura Arm
 
Formalizing Mathematics in Lean
Formalizing Mathematics in LeanFormalizing Mathematics in Lean
Formalizing Mathematics in Lean
 
Introduction to Quantum Computing and Quantum Service Oriented Computing
Introduction to Quantum Computing and Quantum Service Oriented ComputingIntroduction to Quantum Computing and Quantum Service Oriented Computing
Introduction to Quantum Computing and Quantum Service Oriented Computing
 
Computer Design Concepts for Machine Learning
Computer Design Concepts for Machine LearningComputer Design Concepts for Machine Learning
Computer Design Concepts for Machine Learning
 
Inteligencia Artificial en la atención sanitaria del futuro
Inteligencia Artificial en la atención sanitaria del futuroInteligencia Artificial en la atención sanitaria del futuro
Inteligencia Artificial en la atención sanitaria del futuro
 
Design Automation Approaches for Real-Time Edge Computing for Science Applic...
 Design Automation Approaches for Real-Time Edge Computing for Science Applic... Design Automation Approaches for Real-Time Edge Computing for Science Applic...
Design Automation Approaches for Real-Time Edge Computing for Science Applic...
 
Estrategias de navegación para robótica móvil de campo: caso de estudio proye...
Estrategias de navegación para robótica móvil de campo: caso de estudio proye...Estrategias de navegación para robótica móvil de campo: caso de estudio proye...
Estrategias de navegación para robótica móvil de campo: caso de estudio proye...
 
Fault-tolerance Quantum computation and Quantum Error Correction
Fault-tolerance Quantum computation and Quantum Error CorrectionFault-tolerance Quantum computation and Quantum Error Correction
Fault-tolerance Quantum computation and Quantum Error Correction
 
Cómo construir un chatbot inteligente sin morir en el intento
Cómo construir un chatbot inteligente sin morir en el intentoCómo construir un chatbot inteligente sin morir en el intento
Cómo construir un chatbot inteligente sin morir en el intento
 
Automatic generation of hardware memory architectures for HPC
Automatic generation of hardware memory architectures for HPCAutomatic generation of hardware memory architectures for HPC
Automatic generation of hardware memory architectures for HPC
 
Type and proof structures for concurrency
Type and proof structures for concurrencyType and proof structures for concurrency
Type and proof structures for concurrency
 
Hardware/software security contracts: Principled foundations for building sec...
Hardware/software security contracts: Principled foundations for building sec...Hardware/software security contracts: Principled foundations for building sec...
Hardware/software security contracts: Principled foundations for building sec...
 
Jose carlossancho slidesLa seguridad en el desarrollo de software implementad...
Jose carlossancho slidesLa seguridad en el desarrollo de software implementad...Jose carlossancho slidesLa seguridad en el desarrollo de software implementad...
Jose carlossancho slidesLa seguridad en el desarrollo de software implementad...
 
Do you trust your artificial intelligence system?
Do you trust your artificial intelligence system?Do you trust your artificial intelligence system?
Do you trust your artificial intelligence system?
 
Redes neuronales y reinforcement learning. Aplicación en energía eólica.
Redes neuronales y reinforcement learning. Aplicación en energía eólica.Redes neuronales y reinforcement learning. Aplicación en energía eólica.
Redes neuronales y reinforcement learning. Aplicación en energía eólica.
 
Challenges and Opportunities for AI and Data analytics in Offshore wind
Challenges and Opportunities for AI and Data analytics in Offshore windChallenges and Opportunities for AI and Data analytics in Offshore wind
Challenges and Opportunities for AI and Data analytics in Offshore wind
 

Recently uploaded

Curve setting (Basic Mine Surveying)_MI10412MI.pptx
Curve setting (Basic Mine Surveying)_MI10412MI.pptxCurve setting (Basic Mine Surveying)_MI10412MI.pptx
Curve setting (Basic Mine Surveying)_MI10412MI.pptxRomil Mishra
 
TEST CASE GENERATION GENERATION BLOCK BOX APPROACH
TEST CASE GENERATION GENERATION BLOCK BOX APPROACHTEST CASE GENERATION GENERATION BLOCK BOX APPROACH
TEST CASE GENERATION GENERATION BLOCK BOX APPROACHSneha Padhiar
 
Virtual memory management in Operating System
Virtual memory management in Operating SystemVirtual memory management in Operating System
Virtual memory management in Operating SystemRashmi Bhat
 
Input Output Management in Operating System
Input Output Management in Operating SystemInput Output Management in Operating System
Input Output Management in Operating SystemRashmi Bhat
 
List of Accredited Concrete Batching Plant.pdf
List of Accredited Concrete Batching Plant.pdfList of Accredited Concrete Batching Plant.pdf
List of Accredited Concrete Batching Plant.pdfisabel213075
 
SOFTWARE ESTIMATION COCOMO AND FP CALCULATION
SOFTWARE ESTIMATION COCOMO AND FP CALCULATIONSOFTWARE ESTIMATION COCOMO AND FP CALCULATION
SOFTWARE ESTIMATION COCOMO AND FP CALCULATIONSneha Padhiar
 
Robotics Group 10 (Control Schemes) cse.pdf
Robotics Group 10  (Control Schemes) cse.pdfRobotics Group 10  (Control Schemes) cse.pdf
Robotics Group 10 (Control Schemes) cse.pdfsahilsajad201
 
Javier_Fernandez_CARS_workshop_presentation.pptx
Javier_Fernandez_CARS_workshop_presentation.pptxJavier_Fernandez_CARS_workshop_presentation.pptx
Javier_Fernandez_CARS_workshop_presentation.pptxJavier Fernández Muñoz
 
FUNCTIONAL AND NON FUNCTIONAL REQUIREMENT
FUNCTIONAL AND NON FUNCTIONAL REQUIREMENTFUNCTIONAL AND NON FUNCTIONAL REQUIREMENT
FUNCTIONAL AND NON FUNCTIONAL REQUIREMENTSneha Padhiar
 
Turn leadership mistakes into a better future.pptx
Turn leadership mistakes into a better future.pptxTurn leadership mistakes into a better future.pptx
Turn leadership mistakes into a better future.pptxStephen Sitton
 
Mine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxMine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxRomil Mishra
 
A brief look at visionOS - How to develop app on Apple's Vision Pro
A brief look at visionOS - How to develop app on Apple's Vision ProA brief look at visionOS - How to develop app on Apple's Vision Pro
A brief look at visionOS - How to develop app on Apple's Vision ProRay Yuan Liu
 
"Exploring the Essential Functions and Design Considerations of Spillways in ...
"Exploring the Essential Functions and Design Considerations of Spillways in ..."Exploring the Essential Functions and Design Considerations of Spillways in ...
"Exploring the Essential Functions and Design Considerations of Spillways in ...Erbil Polytechnic University
 
CME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTES
CME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTESCME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTES
CME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTESkarthi keyan
 
Katarzyna Lipka-Sidor - BIM School Course
Katarzyna Lipka-Sidor - BIM School CourseKatarzyna Lipka-Sidor - BIM School Course
Katarzyna Lipka-Sidor - BIM School Coursebim.edu.pl
 
STATE TRANSITION DIAGRAM in psoc subject
STATE TRANSITION DIAGRAM in psoc subjectSTATE TRANSITION DIAGRAM in psoc subject
STATE TRANSITION DIAGRAM in psoc subjectGayathriM270621
 
multiple access in wireless communication
multiple access in wireless communicationmultiple access in wireless communication
multiple access in wireless communicationpanditadesh123
 
US Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionUS Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionMebane Rash
 
Novel 3D-Printed Soft Linear and Bending Actuators
Novel 3D-Printed Soft Linear and Bending ActuatorsNovel 3D-Printed Soft Linear and Bending Actuators
Novel 3D-Printed Soft Linear and Bending ActuatorsResearcher Researcher
 
DEVICE DRIVERS AND INTERRUPTS SERVICE MECHANISM.pdf
DEVICE DRIVERS AND INTERRUPTS  SERVICE MECHANISM.pdfDEVICE DRIVERS AND INTERRUPTS  SERVICE MECHANISM.pdf
DEVICE DRIVERS AND INTERRUPTS SERVICE MECHANISM.pdfAkritiPradhan2
 

Recently uploaded (20)

Curve setting (Basic Mine Surveying)_MI10412MI.pptx
Curve setting (Basic Mine Surveying)_MI10412MI.pptxCurve setting (Basic Mine Surveying)_MI10412MI.pptx
Curve setting (Basic Mine Surveying)_MI10412MI.pptx
 
TEST CASE GENERATION GENERATION BLOCK BOX APPROACH
TEST CASE GENERATION GENERATION BLOCK BOX APPROACHTEST CASE GENERATION GENERATION BLOCK BOX APPROACH
TEST CASE GENERATION GENERATION BLOCK BOX APPROACH
 
Virtual memory management in Operating System
Virtual memory management in Operating SystemVirtual memory management in Operating System
Virtual memory management in Operating System
 
Input Output Management in Operating System
Input Output Management in Operating SystemInput Output Management in Operating System
Input Output Management in Operating System
 
List of Accredited Concrete Batching Plant.pdf
List of Accredited Concrete Batching Plant.pdfList of Accredited Concrete Batching Plant.pdf
List of Accredited Concrete Batching Plant.pdf
 
SOFTWARE ESTIMATION COCOMO AND FP CALCULATION
SOFTWARE ESTIMATION COCOMO AND FP CALCULATIONSOFTWARE ESTIMATION COCOMO AND FP CALCULATION
SOFTWARE ESTIMATION COCOMO AND FP CALCULATION
 
Robotics Group 10 (Control Schemes) cse.pdf
Robotics Group 10  (Control Schemes) cse.pdfRobotics Group 10  (Control Schemes) cse.pdf
Robotics Group 10 (Control Schemes) cse.pdf
 
Javier_Fernandez_CARS_workshop_presentation.pptx
Javier_Fernandez_CARS_workshop_presentation.pptxJavier_Fernandez_CARS_workshop_presentation.pptx
Javier_Fernandez_CARS_workshop_presentation.pptx
 
FUNCTIONAL AND NON FUNCTIONAL REQUIREMENT
FUNCTIONAL AND NON FUNCTIONAL REQUIREMENTFUNCTIONAL AND NON FUNCTIONAL REQUIREMENT
FUNCTIONAL AND NON FUNCTIONAL REQUIREMENT
 
Turn leadership mistakes into a better future.pptx
Turn leadership mistakes into a better future.pptxTurn leadership mistakes into a better future.pptx
Turn leadership mistakes into a better future.pptx
 
Mine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxMine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptx
 
A brief look at visionOS - How to develop app on Apple's Vision Pro
A brief look at visionOS - How to develop app on Apple's Vision ProA brief look at visionOS - How to develop app on Apple's Vision Pro
A brief look at visionOS - How to develop app on Apple's Vision Pro
 
"Exploring the Essential Functions and Design Considerations of Spillways in ...
"Exploring the Essential Functions and Design Considerations of Spillways in ..."Exploring the Essential Functions and Design Considerations of Spillways in ...
"Exploring the Essential Functions and Design Considerations of Spillways in ...
 
CME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTES
CME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTESCME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTES
CME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTES
 
Katarzyna Lipka-Sidor - BIM School Course
Katarzyna Lipka-Sidor - BIM School CourseKatarzyna Lipka-Sidor - BIM School Course
Katarzyna Lipka-Sidor - BIM School Course
 
STATE TRANSITION DIAGRAM in psoc subject
STATE TRANSITION DIAGRAM in psoc subjectSTATE TRANSITION DIAGRAM in psoc subject
STATE TRANSITION DIAGRAM in psoc subject
 
multiple access in wireless communication
multiple access in wireless communicationmultiple access in wireless communication
multiple access in wireless communication
 
US Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionUS Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of Action
 
Novel 3D-Printed Soft Linear and Bending Actuators
Novel 3D-Printed Soft Linear and Bending ActuatorsNovel 3D-Printed Soft Linear and Bending Actuators
Novel 3D-Printed Soft Linear and Bending Actuators
 
DEVICE DRIVERS AND INTERRUPTS SERVICE MECHANISM.pdf
DEVICE DRIVERS AND INTERRUPTS  SERVICE MECHANISM.pdfDEVICE DRIVERS AND INTERRUPTS  SERVICE MECHANISM.pdf
DEVICE DRIVERS AND INTERRUPTS SERVICE MECHANISM.pdf
 

Just enough is more: Achieving sustainable performance in thermally-constrained mobile devices

  • 1. Just Enough is More: Achieving Sustainable Performance in Thermally-Constrained Mobile Devices Ayse K. Coskun Department of Electrical and Computer Engineering Performance and Energy Aware Computing Laboratory www.bu.edu/peaclab
  • 2. Mobile Devices: Trends & Thermal Challenges • SoC power densities have grown significantly – Mobile CPUs adopt aggressive µArch design, clock speeds.[*] – Single-thread CPU performance have improved by 3x-11x over generations [Halpern et al. HPCA’16]. – Integrated GPUs further elevate power densities. Power and TDP of S5/S6 [Halpern et al. HPCA’16] [*] http://www.arm.com/files/pdf/AT-Exploring_the_Design_of_the_Cortex-A15.pdf [**] http://arstechnica.com/gadgets/2013/11/when-benchmarks-arent-enough-cpu-performance-in-the-nexus-5 • Modern smartphones are thermally constrained - Chip and skin temperatures elevate to critical levels. - On/off-chip thermal couplings [Xie et al., ICCAD’13][Prakash et al., DAC’15] - Power and form-factor restrictions limit cooling capabilities. • Thermal throttling • Unsustainable user experience over extended application use.[**] 2
  • 3. In our work …  We aim at providing sustainable performance and propose thermally-efficient QoS management.  We demonstrate and analyze the performance impacts of multiple thermal constraints of modern mobile devices.  We present runtime techniques for improving thermal efficiency:  Fine-grained DVFS scheduling  Criticality-driven scheduling  CPU-GPU thermal coupling aware runtime  We evaluate these techniques on mobile platforms  with homogeneous and heterogeneous multi-core CPUs,  under CPU and skin temperature constraints, and achieve up to 8x longer durations of extended sustainable performance. 3
  • 4. Outline • Experimental Setup & Methodology • Thermal Constraints in Smartphones • Techniques for Sustainable Performance • Evaluation Results • Summary 4
  • 5. Experimental Setup & Methodology Platforms: • CPU & skin temperature control policies. – 90°C and 40°C thermal limits, respectively. Applications: • Scimark (FFT, SOR); • SPEC CPU 2006 (H264); • PARSEC (bodytrack) • Gaming (Edge of Tomorrow, Real Racing); • WebGL (Aquarium, Pearl Boy, Rain); • Video player apps (Mx Player, Rock Player); • Policy implementations: – cpufreq interface for frequency scaling – sched_setaffinity for thread-to-core mapping 5
  • 6. Outline • Experimental Setup & Methodology • Thermal Constraints in Smartphones • Techniques for Sustainable Performance • Evaluation Results • Summary 6
  • 7. Thermal Constraints in Modern Smartphones • CPU temperature induced throttling largely degrades QoS. QoS degradation over time on Odroid-XU3 due to CPU thermal limits [Sahin et al., ICCAD’16] QoS degradation over time on Nexus 5 due to skin temperature violation while running EoT. Users will experience significant performance loss when the device is used for extended durations (e.g., gaming, streaming etc.) • Modern smartphones are also constrained by skin temperature levels (e.g., 34°C-43°C [Egilmez et al., DATE’15]). 7
  • 8. Thermal Constraints in Modern Smartphones • CPU vs. skin level thermal constraints. • Throttling CPU to lower skin temperature can lead to large waste in CPU thermal headroom. (a) FFT application (b) MX Player application 30°C gap 8
  • 9. Thermal Constraints in Modern Smartphones • Platform-level thermal management using additional knobs (e.g., display). • CPU can better utilize its headroom. (b) MX Player application • Current scope of our work focuses on CPU level control knobs. We propose QoS-centric thermal management to achieve sustained performance and provide novel observations to improve thermal efficiency. 9
  • 10. Outline • Experimental Setup & Methodology • Thermal Constraints in Smartphones • Techniques for Sustainable Performance • Evaluation Results • Summary 10
  • 11. Providing QoS-Temperature Tradeoff Traditional approach: • Maximizing performance under thermal limits. • Unsustainable performance if apps run longer. Our approach: • Limit short-term performance to a “just enough” level. • Extend sustainability of acceptable QoS levels User Thermal mgmt. Power mgmt. Application FPS and temperature traces for the Real Racing game on Odroid-XU3 [Sahin et al., ICCAD’16] Trading off short term performance for sustainable performance. 11
  • 12. Fine-grained DVFS State Scheduling Fine-grained, thermally-efficient scheduling of discrete DVFS states. Key idea: Divide high frequency interval into fine-grained bins and distribute temporally while achieving same average frequency. vs. Limitations due to DVFS overhead: Real-life demonstration: Temperature traces for two DVFS schedules [Sahin et al., ICCAD’15] 12
  • 13. Exploiting HW/SW Heterogeneity for Thermal Efficiency QoS scaling for RP (left) EoT (middle) Rain (right) CPU utilization increase for RP (L) Rain (M) RR (R) • Experiments on Odroid-XU3 (left) [Sahin, ICCAD’16] • Non-critical threads increase CPU util. - Accelerates heating. Identifying and leveraging per-thread criticality in scheduling. Key idea: QoS Critical Threads Non-critical Threads High perf. CPU Low perf. CPU “big” “LITTLE” Application 13
  • 14. CPU-GPU Thermal Coupling Aware Runtime Management • Most thermally-efficient CPU cores depend on GPU usage. – Application dependent! Identifying thermally-efficient cores based on CPU-GPU thermal couplings. • Offline characterization of thermal coupling via microbenchmarks. • Varying levels of GPU power -> Record the ordering of cores from the lowest to highest maximum temperature. Power (L), temperature (M) and QoS (right) [Sahin et al., ICCAD’16] 14 CPU-GPU thermal coupling on Exynos 5 [Sahin et al., ICCAD’16]
  • 15. Outline • Experimental Setup & Methodology • Thermal Constraints in Smartphones • Techniques for Sustainable Performance • Evaluation Results • Summary 15
  • 16. Evaluation: Extended Sustained Duration Under CPU temperature constraints on Odroid- XU3: Aquarium 30 FPS Constraint 40% 62% Under skin temperature constraints on MSM8974: • QScale provides the longest durations of sustainable performance. • Larger improvements (e.g., up to 8x for bodytrack) in Rain, bodytrack, Rock Player due to criticality awareness in QScale. 8x 3x • 55% longer time with acceptable FPS 16
  • 17. Evaluation: Runtime Behavior QoS and temperature traces for Rain application under 90% target: Adapting to Dynamic QoS targets: On MSM8974 running Aquarium [Sahin et al., ICCAD’15] On Odroid-XU3 running Edge of Tomorrow [Sahin et al., ICCAD’16] 17
  • 18. Summary & Takeaways Performance and Energy Aware Computing Lab • Modern mobile devices are constrained by both skin and chip level thermal constraints. • Throttling leads to unsustainable performance – Users expect consistent performance • Thermally-efficient runtime techniques – Reduce temperature – Strictly adhere user performance requirements • Up to 8x longer sustainable performance [1] O. Sahin and A.K. Coskun. "On the Impacts of Greedy Thermal Management in Mobile Devices.", IEEE Embedded System Letters, 2015 [2] O. Sahin, P.T. Varghese and Ayse K. Coskun. "Just Enough is More: Achieving Sustainable Performance in Mobile Devices under Thermal Limitations.", In ICCAD, 2015. [3] O. Sahin and Ayse K. Coskun. “QScale: Thermally-Efficient QoS Management on Heterogeneous Mobile Platforms", In ICCAD, 2016. References: 18
  • 20. Thread Criticality in Mobile Applications 20 • QoS reaches to the maximum when only a few critical threads are executed on big cores. We identify the critical threads of an application offline for runtime mapping of application threads among big and little cores. • Big cluster utilization can increase when non-critical threads are assigned to big cores. • Accelerates heating.
  • 21. Closed-loop QoS Controller Design Details  Ensures stable control around Qtarget  Convergences to Qtarget 21
  • 22. Summary of Throttling Results on Nexus 5 22
  • 23. Evaluation of QScale 23 • Policies in comparison: – Default: Android’s Interactive DVFS govenor + HMP scheduler. – DVFS-only: Closed-loop DVFS controller + HMP scheduler. – QScale: Closed-loop DVFS controller + criticality-aware thread mapping. • Figures show the sustained durations for each application under different Qos targets. • QScale provides the longest durations of sustainable performance. • Larger improvements (e.g., up to 8x for bodytrack) Rain, Bodytrack, Rock Player due to criticality awareness in QScale.
  • 24. Qscale Overview 24 We propose QScale for providing thermally-efficient QoS management: [ICCAD’16] • Runtime monitoring of CPU-GPU thermal coupling for core allocation. • Thread-criticality driven scheduling for big.LITTLE. • Closed-loop runtime DVFS control to guarantee desired performance.