4 Ways to Improve Linux Performance
Michael Christofferson
Director Product Marketing, Enea
IEEE/Enea Webinar
July 23, 2013
Founded: 1968
Ten offices in North America, Europe and Asia
Revenue: 67M USD
No. of employees: 426
• Increasing data traffic in communication devices requires new and
innovative software solutions to handle bandwidth, performance and power
requirements.
• Enea software is heavily used in wireless infrastructure (macro, small
cell), gateway, terminal, military, automotive, etc.
• More than 250M of the 325M LTE population coverage is powered by Enea
solutions.
• Enea solutions run in more than 50% of the world’s 8.2M radio base
stations.
• Enea has recently released its first commercial Linux distribution,
built with Yocto and specially tailored for networking and communications.
• Global presence, global development, and headquartered in Stockholm,
Sweden.
Enea - Powering Communications
Numbers for 2011
Overview of four approaches to enhancing standard Linux performance in
embedded multicore devices:
• Linux CONFIG_PREEMPT_RT patch set
• Enea LWRT
• Open Event Machine
• Hypervisors or “thin kernel” solutions
Relative performance comparisons, as well as other metrics that reflect
the “pros and cons” of each approach
Agenda
Many measures of “performance”
• Real-time Responsiveness
– In embedded, often linked with the concept of “deterministic” response
– But not always!! … See next slide
• Throughput
– Discrete event processing bandwidth or rates
– Does not necessarily mean short or even deterministic real-time response
• High Performance Computing
– Massively compute-intensive applications such as modeling, simulation and
other mathematical computations
– Not the same as throughput
What Does Performance Mean?
=> For embedded, it’s about Real-time response and Throughput
• Real-time systems
– Have “operational deadlines from event to system response”
– Must guarantee the response to external events within strict time constraints
• Non-real-time systems
– Cannot guarantee a response time in every situation
– Are often optimized for best-effort, high throughput performance
• “Real-time response” means deterministic response
– Can mean seconds, milliseconds, microseconds.
– I.e. the times are not necessarily short, although they usually are
• Real-time system classifications:
– Hard: missing a deadline means total system failure
– Firm: infrequent misses are tolerable, but a late result is useless. QoS degrades quickly
– Soft: infrequent misses are tolerable, increased frequency degrades QoS more slowly
=> Real-time performance is OFTEN at odds with Throughput!!
What Does “Real-time” Performance Mean?
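The three classes differ mainly in what a late result is worth. A minimal Python sketch of that distinction follows; the function name, the exception and the linear decay used for the soft case are illustrative assumptions, not something the webinar defines.

```python
class HardDeadlineMissed(Exception):
    """Raised when a hard real-time deadline is missed (total system failure)."""


def result_value(kind, latency_us, deadline_us, soft_grace_us=1000.0):
    """Return the usefulness (0.0 to 1.0) of a response with the given latency.

    kind is "hard", "firm" or "soft". soft_grace_us is a hypothetical
    parameter: how long after the deadline a soft result keeps some value.
    """
    if latency_us <= deadline_us:
        return 1.0  # on time: full value in every class
    if kind == "hard":
        raise HardDeadlineMissed(f"missed by {latency_us - deadline_us} us")
    if kind == "firm":
        return 0.0  # late result is useless, but the system keeps running
    if kind == "soft":
        # Assumed linear decay: the later the result, the less it is worth.
        lateness = latency_us - deadline_us
        return max(0.0, 1.0 - lateness / soft_grace_us)
    raise ValueError(f"unknown real-time class: {kind}")
```

Note that a deterministic system with a deadline of several seconds still counts as real-time in this model; only the comparison against the deadline matters, not the absolute latency.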
Examples of real-time systems
• Hard real-time applications:
– Automotive: anti-lock brakes, car engine control
– Medical: heart pacemakers
– Industrial: process controllers, robot control
Throughput NOT an issue
• Firm real-time applications:
– 3G/4G baseband processing/signaling in base stations and radio network
controllers
– 3G/4G baseband processing/signaling in wireless modems (phones, tablets)
– Many other examples in the networking space – RRU, optical transport,
backhaul, too numerous to list
Throughput is often an issue
• Soft real-time applications:
– IP network control signaling, network servers
– Live audio-video systems on the edge or in data centers
Throughput with “good enough” real time response IS the issue
Four Ways for Better Performance in Linux:
• The PREEMPT_RT patch: rework the internals of Linux
• “Thin-kernel” or virtualization: add a thin real-time kernel
underneath Linux
• Vertical partitioning + user-mode runtime (LWRT): vertically partition
Linux in two domains
• Event Machine: partition Linux in two domains, one not running Linux
at all
CONFIG_PREEMPT_RT Patch Set
What Problem is PREEMPT_RT Trying to Solve?
Minimize Linux interrupt processing delays from external event to response.
[Figure: timeline from “External interrupt triggered” via “Interrupt taken”
to “Interrupt received in user/thread context”.
Stages shown: HW exception, “top half” / ISR, exit from IRQ, reschedule,
signal/wakeup, context switch.
Delay sources shown: critical sections with interrupts disabled; something
else executing (probably another ISR); locks (the xtime lock could be one
example); softirqs and RCUs; priority inversion/conflict; cache misses;
resource conflicts (locks, RCUs, etc.)]
The CONFIG_PREEMPT_RT patch set
• Started 10+ years ago
– Before multicore evolution; uni-core optimized technology
– Many other contributors since then
• Replaces most kernel spinlocks with mutexes that support priority
inheritance
• Moves most interrupt handling to kernel threads
– This means many drivers must be modified
• Roughly, PREEMPT_RT patches 500+ locations in the
kernel, with 11,500+ new lines of code in total.
• In a multicore device, it is “system wide in scope”
Improves real-time performance (interrupt latency)
but AT THE EXPENSE of throughput
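Priority inheritance is the property that distinguishes these mutexes from plain locks. The toy model below shows only the boosting rule: while a higher-priority task waits on a lock, the lower-priority owner runs at the waiter's priority, so it cannot be starved by medium-priority work. It is a single-threaded Python simulation with hypothetical class names (Task, PiMutex) and the assumption that a larger number means a higher priority; it is not the kernel's implementation.

```python
class Task:
    def __init__(self, name, priority):
        self.name = name
        self.base_priority = priority       # priority assigned by the designer
        self.effective_priority = priority  # may be boosted while holding a lock


class PiMutex:
    """Toy mutex with priority inheritance (single lock, no nesting)."""

    def __init__(self):
        self.owner = None
        self.waiters = []

    def lock(self, task):
        """Return True if the lock was taken, False if the task must wait."""
        if self.owner is None:
            self.owner = task
            return True
        self.waiters.append(task)
        # Inheritance: the owner is boosted to the waiter's priority.
        if task.effective_priority > self.owner.effective_priority:
            self.owner.effective_priority = task.effective_priority
        return False

    def unlock(self):
        """Release the lock and hand it to the highest-priority waiter."""
        if self.owner is None:
            raise RuntimeError("unlock of a mutex that is not held")
        # The boost ends as soon as the lock is released.
        self.owner.effective_priority = self.owner.base_priority
        if self.waiters:
            nxt = max(self.waiters, key=lambda t: t.effective_priority)
            self.waiters.remove(nxt)
            self.owner = nxt
        else:
            self.owner = None
        return self.owner
```

With a low-priority owner and a high-priority waiter, the owner's effective priority rises to the waiter's level until it unlocks, which is what bounds the priority-inversion delay.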
PREEMPT_RT Throughput/RT Tradeoff
A Very Simple Example
Linux 3.6.4:
# netperf -H localhost -t TCP_STREAM -A 16K,16K -l 120 -C -D 20
Recv    Send    Send                         Utilization     Service Demand
Socket  Socket  Message  Elapsed             Send    Recv    Send     Recv
Size    Size    Size     Time    Throughput  local   remote  local    remote
bytes   bytes   bytes    secs.   10^6bits/s  % U     % S     us/KB    us/KB
87380   16384   16384    120.00  8782.10     -1.00   84.81   -1.000   1.582
Linux 3.6.4-rt10 (PREEMPT_RT):
# netperf -H localhost -t TCP_STREAM -A 16K,16K -l 120 -C -D 20
Recv    Send    Send                         Utilization     Service Demand
Socket  Socket  Message  Elapsed             Send    Recv    Send     Recv
Size    Size    Size     Time    Throughput  local   remote  local    remote
bytes   bytes   bytes    secs.   10^6bits/s  % U     % S     us/KB    us/KB
87380   16384   16384    120.00  4185.48     -1.00   70.21   -1.000   2.748
But this is a simple example that doesn’t always apply
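Putting the two runs side by side makes the tradeoff concrete. The short Python calculation below uses only the throughput and remote service-demand figures from the netperf output above; the variable names are illustrative.

```python
# Figures copied from the two netperf runs above.
vanilla_mbps = 8782.10      # Linux 3.6.4, 10^6 bits/s
preempt_rt_mbps = 4185.48   # Linux 3.6.4-rt10, 10^6 bits/s
vanilla_demand = 1.582      # remote (receive side) service demand, us/KB
preempt_rt_demand = 2.748

throughput_ratio = preempt_rt_mbps / vanilla_mbps
throughput_drop_pct = (1.0 - throughput_ratio) * 100.0
demand_ratio = preempt_rt_demand / vanilla_demand

print(f"PREEMPT_RT throughput: {throughput_ratio:.1%} of vanilla")
print(f"Throughput drop: {throughput_drop_pct:.1f}%")
print(f"CPU cost per KB: {demand_ratio:.2f}x")
```

In this single loopback test the PREEMPT_RT kernel delivers a little under half the throughput of the unpatched kernel, at roughly 1.7 times the receive-side CPU cost per KB.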
Other CONFIG_PREEMPT_RT Characteristics
• ALL Linux Solution
– APIs / programming paradigm
– Including all tools
– BUT!! Requires driver modifications for all drivers
• Compatible with Core Isolation/Shielding techniques
– Can work reasonably well for both real-time and throughput in a
“bare metal” environment, i.e. no multithreading on isolated cores
• Linux SMP style load balancing, for what it’s worth
• Standard Linux memory protection
• Standard Linux Power Management
LWRT
LWRT and the Vertical Partitioning Concept
• Partitioning of the system into
separate real-time critical
(shielded cores) and non-critical
domains.
• It is often the Linux kernel itself
that introduces real-time
problems.
• The real-time partition does not need
the full POSIX/Linux API.
• Partitioning, combined with a
user-mode environment that avoids
using the kernel, can improve
performance and real-time
characteristics compared to
standard Linux.
“Improve performance and
realtime characteristics
under Linux by partitioning
the system into logical
domains, and by avoiding
usage of the Linux kernel and
its resources more than
necessary”
The LWRT Vertical Partitioning Concept (2)
• Configure processes and interrupts to run with core affinity
• Make minor modifications to the kernel to avoid running
unnecessary kernel threads/timers on real-time cores
• Avoid using/calling the kernel, and rely on a user-mode
execution runtime environment
Use Cases:
a. When targeting interrupt latency requirements of 3-10 μs average
and 15-30 μs worst case
b. When the application requires multi-threading
performance
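The first step, running a process with core affinity, can be tried from user space on stock Linux. A minimal sketch using Python's os.sched_setaffinity (Linux only) follows; the helper name pin_to_cpu and the choice of the highest-numbered allowed CPU as the "real-time" core are assumptions for illustration. Interrupt affinity and the kernel modifications listed above are separate steps that this sketch does not cover.

```python
import os


def pin_to_cpu(cpu, pid=0):
    """Restrict `pid` (0 = the caller) to a single CPU.

    Returns the resulting affinity set so the caller can verify it.
    """
    os.sched_setaffinity(pid, {cpu})
    return os.sched_getaffinity(pid)


if __name__ == "__main__":
    allowed = os.sched_getaffinity(0)   # CPUs the caller may currently run on
    rt_cpu = max(allowed)               # assumption: treat this as the RT core
    print("pinned to:", pin_to_cpu(rt_cpu))
    os.sched_setaffinity(0, allowed)    # restore the original mask
```

Pinning alone does not shield the core: other processes and kernel work can still be scheduled there, which is why the concept above also moves unnecessary kernel threads and timers away from the real-time cores.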
How does LWRT work?
[Figure labels: Core 0 ... Core N; Linux Kernel; LWRT Kernel Module;
Pthread; LWRT Environment; Realtime Processes; Non-realtime Processes]
LWRT partitions the system into one realtime
domain and one non-realtime domain.
LWRT adds a user-mode runtime environment,
including an optimized user-mode scheduler.
LWRT adds a kernel module to catch and forward
interrupts to the user-mode environment.
LWRT migrates some specific kernel functionality
(e.g. timers) away from the realtime domain.
What are the benefits of LWRT?
LWRT provides a solution that is unencumbered by
GPL, even for interrupt driven code which can be
placed in user-space without any major penalty.
LWRT provides very good (i.e. low-latency) interrupt
response time, all the way up to user-mode.
LWRT provides low latency and high throughput. LWRT
does not depend on the PREEMPT_RT patch, and does
not affect throughput negatively.
LWRT provides optimized APIs for realtime
applications, and allows the same application to use
the POSIX/Linux APIs when realtime doesn’t matter.
LWRT is an “all-Linux” solution, based on a single Linux
Kernel. Thus, almost all tools from the existing Linux
ecosystem will be available.
LWRT vs Linux/PREEMPT_RT Performance
Scheduling Latency – LWRT vs Pthreads
[Figure: histogram of scheduling latency for pthreads and for Enea’s
User-Space Linux Executive. One axis: clock cycles (lower is better).
Other axis: number of samples measured (ideally a single peak).]
• Much better performance, i.e. lower scheduling latency
• Much better real-time characteristics, i.e. less variance
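A latency distribution of this kind can be approximated on any Linux system by repeatedly sleeping for a fixed interval and recording how late each wake-up is. The Python sketch below does that and buckets the results into a histogram. It measures ordinary thread wake-up latency in nanoseconds, not LWRT and not clock cycles, and the function names and bucket width are assumptions.

```python
import time
from collections import Counter


def sample_wakeup_latency(n_samples=200, interval_s=0.001):
    """Sleep n_samples times and return how late each wake-up was, in ns."""
    interval_ns = int(interval_s * 1e9)
    samples = []
    for _ in range(n_samples):
        start = time.perf_counter_ns()
        time.sleep(interval_s)
        elapsed = time.perf_counter_ns() - start
        samples.append(elapsed - interval_ns)  # overshoot past the request
    return samples


def histogram(samples, bucket_ns=10_000):
    """Count samples per bucket; a narrow single peak means low jitter."""
    return Counter((s // bucket_ns) * bucket_ns for s in samples)


if __name__ == "__main__":
    data = sample_wakeup_latency()
    for bucket, count in sorted(histogram(data).items()):
        print(f"{bucket / 1000:8.0f} us  {'#' * count}")
    print("worst case:", max(data) / 1000, "us")
```

The two properties called out on the slide map directly onto the output: the position of the peak is the typical latency, and the spread of the tail is the variance that matters for real-time behavior.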
Message Passing Latency
Interrupt Latency
Throughput ≈ “Idle” Time
Based on a Real-world LTE Example
[Figure: timeline from 0 to 2000 μs showing processing for Cell 0, Cell 1
... Cell N, and “idle” time]
In our example, the “theoretical” maximum for a system with
infinitesimally little overhead is 400 μs.
Idle time (Throughput)
Other LWRT Characteristics
• NOT ALL Linux Solution
– Different APIs, programming paradigm
– Does include all Linux tools, except for LWRT thread awareness
– BUT, doesn’t require standard Linux driver modification
• Depends on Core Isolation/Shielding
• Slightly better real-time response/determinism than
PREEMPT_RT
– Interrupt handling model “cleaner”
• Better than PREEMPT_RT for Throughput
– But only if Multithreading in the application is necessary
– Not for bare metal
• No load balancing – the current vertical partitioning concept
prohibits it
• No memory protection between threading environments on a
core
– Best implementation requires ONE pthread per core
• Not standard Linux Power Management
Open Event Machine
sourceforge.net/projects/eventmachine
What does Event Machine Look Like?
[Figure labels: Core 0 ... Core N; Linux Kernel; EM instances; EM Scheduler;
EM needs a “dispatcher”; Realtime Applications; Non-realtime Processes]
EM partitions the system into one realtime
domain and one non-realtime domain, like
LWRT.
EM is a run-to-completion model for
individual “contextless” work packages. NO
threading or OS model.
EM does not necessarily need a special
interrupt handling model. It needs a “scheduler”,
either in the Linux partition OR in HW.
EM does not require kernel mods, nor core
isolation, but it can use core shielding, i.e.
non-essential Linux processes and
interrupts are migrated away from the EM
cores.
Event Machine
• An efficient (low overhead) execution
model for data plane processing.
• An “event” based programming
paradigm, replacing traditional threads
and processes.
– Events are data associated with code
– Run-to-completion model code. This means
“context-less” or “state-less” code for
processing
• New “first class” OS primitives:
queues, events, execution objects.
– Can work within an RTOS environment!! See
next slide
• A framework for distribution and
scheduling in multicore scenarios.
• A standardized API.
• HW offloading friendly API.
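To make the primitives concrete, here is a toy single-core model in Python of queues, events, execution objects and a dispatcher. All class and method names are hypothetical and are not the Open Event Machine API. The sketch only illustrates the run-to-completion idea: each receive function handles one event, keeps no per-event stack context, and returns without blocking.

```python
from collections import deque


class ExecutionObject:
    """Code that is run for each event taken from a queue bound to it."""

    def __init__(self, name, receive):
        self.name = name
        self.receive = receive  # callable(event) -> iterable of (queue, event)


class ToyEventMachine:
    def __init__(self):
        self.queues = {}  # queue name -> (deque of events, bound EO)

    def create_queue(self, name, eo):
        self.queues[name] = (deque(), eo)

    def send(self, queue_name, event):
        self.queues[queue_name][0].append(event)

    def dispatch(self):
        """Run until all queues are empty; return the number of events handled."""
        handled = 0
        progress = True
        while progress:
            progress = False
            for events, eo in list(self.queues.values()):
                if events:
                    event = events.popleft()
                    # Run to completion: no preemption, no blocking.
                    for target, new_event in eo.receive(event):
                        self.send(target, new_event)
                    handled += 1
                    progress = True
        return handled


totals = {"bytes": 0}


def parse(packet):
    return [("count", len(packet))]  # forward a new event to the next stage


def count(length):
    totals["bytes"] += length
    return []  # nothing further to send


em = ToyEventMachine()
em.create_queue("rx", ExecutionObject("parser", parse))
em.create_queue("count", ExecutionObject("counter", count))
for pkt in (b"abc", b"de", b"fghij"):
    em.send("rx", pkt)
handled = em.dispatch()
print(handled, totals["bytes"])
```

Because every unit of work is a short function call on one event, the dispatcher loop can be replicated on each core and fed by a shared scheduler, which is the arrangement the slide describes.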
[Figure: a scheduler feeding a dispatcher on each core/thread (1 ... N);
each dispatcher runs execution objects EO X and EO Y]
Push versus Pull Models
• Pull model
– Simple design
– Passive load balancing
– Offloads a majority of scheduling decisions
to HW
– Core hot-plug (power save) is easier to
implement
– Cache-cold problems on MIMO/SIMO
queues
• Push model
– Cache prefetching can be improved
– Active load balancing protocols needed
– Offloading scheduling decisions to an I/O
co-processor? I.e. smart HW queues
• Push/Pull
– Pull whenever HW can schedule I/O
– Keep it simple
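The pull model's passive load balancing can be sketched with plain Python threads: every worker pulls the next event from one shared queue whenever it becomes free, so no component has to decide placement up front. The function name and the squaring "work" are placeholders, and queue.Queue stands in for the HW or lock-free queues a real data plane would use.

```python
import queue
import threading


def run_pull_workers(events, n_workers=4):
    """Process events with workers that pull from a single shared queue.

    Returns one result list per worker, showing how the load ended up split.
    """
    shared = queue.Queue()
    for ev in events:
        shared.put(ev)
    handled = [[] for _ in range(n_workers)]

    def worker(idx):
        while True:
            try:
                ev = shared.get_nowait()   # pull: ask for work when idle
            except queue.Empty:
                return                     # nothing left, worker goes idle
            handled[idx].append(ev * ev)   # placeholder for event processing

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return handled


if __name__ == "__main__":
    per_worker = run_pull_workers(list(range(100)))
    print([len(chunk) for chunk in per_worker])
```

The split between workers is not fixed and varies from run to run, which is the point: a faster or less loaded worker simply pulls more often. A push model would instead need an explicit policy for choosing the target core of each event.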
OS + Event Machine Scheduling Model
[Figure labels: Interrupt Processes; Priority Processes; Event Scheduling
(in scheduler idle); Background Jobs; Preemption]
Other Event Machine Characteristics
• NOT ALL Linux Solution
– Different APIs, programming paradigm on EM cores
• This means tools as well
– Requires restructuring code into simple, non-preemptive, run-
to-completion models …. “Context-less” processing
• Depends on Core Isolation/Shielding
• Superior for max data plane THROUGHPUT
• Real-time response is not part of the equation
– Time to process events is not a parameter
– But it “could” result in good real-time response depending on
use case
• Designed for best load balancing on the data plane
• No memory protection between EM instances on cores
• Not standard Linux Power Management
– But not a hard problem to solve in a “Pull” model
Virtualization Techniques
• Virtualizes Linux
• Examples include hypervisors, Xenomai, RTLinux, etc.
• Provides a highly deterministic RTOS environment for RT apps
• Cannot completely utilize the Linux ecosystem (e.g. tools) in the realtime domain
• Suitable for very high real-time requirements, inherited from classic RTOS domains
Real Time Virtualization Solution
[Figure labels: Multicore SoC with CPU 0 to CPU 7; Virtual Machine; Linux;
Tools; RT OS; Bare Metal OR RT Apps; Data Plane fast path application]
Hypervisor Characteristics
• NOT ALL Linux Solution
– Different APIs, programming paradigm for real-time
cores
• This means tools as well
• Superior real-time response
• Excellent THROUGHPUT
• Memory protection across cores
But the best use case is legacy migration or
consolidation. Hypervisors are not discussed
much anymore in the embedded industry.
Four Ways for Better Performance in Linux:
• The PREEMPT_RT patch: rework the internals of Linux
• “Thin-kernel” or virtualization: add a thin real-time kernel
underneath Linux
• Vertical partitioning + user-mode runtime (LWRT): vertically partition
Linux in two domains
• Event Machine: partition Linux in two domains, one not running Linux
at all
Enea supports PREEMPT_RT, Virtualization, LWRT
Event Machine with Linux is a research topic
Thank You
Visit us at enea.com

Four Ways to Improve Linux Performance IEEE Webinar, R2.0

  • 1. 4 Ways to Improve Linux Performance Michael Christofferson Director Product Marketing, Enea IEEE/Enea Webinar July 23, 2013
  • 2. FOUNDED 1968 TEN OFFICES IN NORTH AMERICA, EUROPE AND ASIA REVENUE 67M USD NO. OF EMPLOYEES 426  Increasing data traffic in communication devices require new and innovative software solutions to handle bandwidth, performance and power requirements.  Enea software is heavily used in wireless Infrastructure (Macro, small cell), gateway, terminal, military, auto, etc.  More than 250M of the 325M LTE population coverage is powered by Enea Solutions  Enea Solutions run in more than 50% of the world’s 8.2M radio base stations.  Enea has recently released its first commercial Linux distribution, built by Yocto, and specially tailored for networking and communications  Global presence, global development, and headquartered in Stockholm, Sweden Enea - Powering Communications Numbers for 2011
  • 3. FOUNDED 1968 Overview of four approaches to enhancement of standard Linux performance in embedded multicore devices. Linux PREEMPT_RT CONFIG Patch Set Enea LWRT Open Event Machine Hypervisors or “thin kernel” solutions Relative performance comparisons, as well as other metrics that reflect “Pros and Cons” of each approach Agenda
  • 4. Many measures of “performance” •Real-time Responsiveness – In embedded, often linked with the concept of “deterministic” response – But not always!! …. See next slide •Throughput – Discreet event processing bandwidth or rates – Does not necessarily mean short or even deterministic real-time response •High Performance Computing – Massive compute intensive applications like modeling and simulation, and mathematical related computations – Not the same as throughput What Does Performance Mean? => For embedded, it’s about Real-time response and Throughput
  • 5. • Real-time systems – Have “operational deadlines from event to system response” – Must guarantee the response to external events within strict time constraints • Non-real-time systems – Cannot guarantee response time in any situation – Are often optimized for best-effort, high throughput performance • “Real-time response” means deterministic response – Can mean seconds, milliseconds, microseconds. – I.e. not necessarily short times, but usually this is the case • Real-time system classifications: – Hard: missing a deadline means total system failure – Firm: infrequent misses are tolerable, but result is useless. QoS degrades quickly – Soft: infrequent misses are tolerable, increased frequency degrades QoS more slowly => Real-time performance OFTEN is contradictory to Throughput!! What Does “Real-time” Performance Mean?
  • 6. Examples of real-time systems • Hard real-time applications: – Automotive: anti-lock brakes, car engine control – Medical: heart pacemakers – Industrial: process controllers, robot control Throughput NOT an issue • Firm real-time applications: – 3G/4G baseband processing/signaling in base stations and radio network controllers – 3G/4G baseband processing/signaling in wireless modems (phones, tablets) – Many other examples in the networking space – RRU, optical transport, backhaul, too numerous to list Throughput is often an issue • Soft real-time applications: – IP network control signaling, network servers – Live audio-video systems on the edge or in data centers Throughput with “good enough” real time response IS the issue
  • 7. Four Ways for Better Performance in Linux: Linux KernelLinux Kernel Vertically partition Linux in two domains: Linux KernelLinux Kernel Linux KernelLinux Kernel Add a thin real-time kernel underneath Linux: Rework the internals of Linux: Realtime KernelRealtime Kernel RT Runtime- LWRTRT Runtime- LWRT The PREEMPT_RT patch “Thin-kernel” or virtualization Vertical Partitioning + User mode Runtime RT appsRT apps Event Machine Partition Linux in two domains:, one not running Linux at all Linux KernelLinux Kernel Event MachineEvent Machine
  • 9. What Problem is PREEMPT_RT Trying to Solve? Minimize Linux Interrupt Processing Delays from external event to response External Interrupt Triggered Interrupt Taken Interrupt Received in User/Thread Context Critical section with interrupts disabled HW Exception “Top Half” / ISR Exit from IRQ Reschedule Context Switch Something else is executing (probably another ISR) E.g. locks (xtime lock could be one example?) Softirqs, RCUs Priority inversion/ conflict Cache misses, etc. Signal/ Wakeup Locks, RCUs, etc. Resource Conflicts
  • 10. The CONFIG_PREEMPT_RT patch set • Started 10+ years ago – Before multicore evolution; uni-core optimized technology – Many other contributors since then • Replaces most kernel spinlocks with mutexes with priority inheritance • Moves most interrupt handling to kernel threads – This means many drivers must be modified • Roughly, PREEMPT_RT patches 500+ locations in the kernel, with 11,500+ new lines of code in total. • In a multicore device, is “system wide in scope” Improves real-time performance (interrupt latency) but AT THE EXPENSE of throughput
  • 11. PREEMPT_RT Throughput/RT Tradeoff A Very Simple Example Linux 3.6.4: # netperf -H localhost -t TCP_STREAM -A 16K,16K -l 120 -C -D 20 Recv Send Send Utilization Service Demand Socket Socket Message Elapsed Send Recv Send Recv Size Size Size Time Throughput local remote local remote bytes bytes bytes secs. 10^6bits/s % U % S us/KB us/KB 87380 16384 16384 120.00 8782.10 -1.00 84.81 -1.000 1.582 Linux 3.6.4-rt10 (PREEMPT_RT): # netperf -H localhost -t TCP_STREAM -A 16K,16K -l 120 -C -D 20 Recv Send Send Utilization Service Demand Socket Socket Message Elapsed Send Recv Send Recv Size Size Size Time Throughput local remote local remote bytes bytes bytes secs. 10^6bits/s % U % S us/KB us/KB 87380 16384 16384 120.00 4185.48 -1.00 70.21 -1.000 2.748 But this is a simple example that doesn’t always apply
  • 12. Other CONFIG_PREEMPT_RT Characteristics
      • ALL Linux Solution
          – APIs / programming paradigm
          – Including all tools
          – BUT!! Requires driver modifications for all drivers
      • Compatible with Core Isolation/Shielding techniques
          – Can work reasonably well for both real-time and throughput in a “bare metal” environment, i.e. no multithreading on isolated cores
      • Linux SMP style load balancing, for what it’s worth
      • Standard Linux memory protection
      • Standard Linux Power Management
  • 13. LWRT
  • 14. LWRT and the Vertical Partitioning Concept
      • Partitioning of the system into separate real-time critical (shielded cores) and non-critical domains.
      • It is often the Linux kernel itself that introduces real-time problems.
      • The real-time partition does not need the full POSIX/Linux API.
      • Partitioning, combined with a user-mode environment that avoids using the kernel, can improve performance and real-time characteristics compared to standard Linux.
      “Improve performance and realtime characteristics under Linux by partitioning the system into logical domains, and by avoiding usage of the Linux kernel and its resources more than necessary”
  • 15. The LWRT Vertical Partitioning Concept (2)
      • Configure processes and interrupts to run with core affinity
      • Make minor modifications to the kernel to avoid running unnecessary kernel threads/timers on real-time cores
      • Avoid using/calling the kernel, and rely on a user-mode execution runtime environment
      Use Cases:
      a. When targeting interrupt latency requirements of 3-10 us average and 15-30 us worst case
      b. When the application requires multi-threading performance
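The first step above, running a process with core affinity, can be sketched with the standard Linux affinity call. This is a minimal illustration only, not LWRT code: `pin_to_core` is a hypothetical helper name, the choice of the highest-numbered allowed core as the "real-time" core is an assumption, and the sketch runs on Linux only. Interrupt affinity is configured separately (on Linux, typically through `/proc/irq/<n>/smp_affinity`, which needs root) and is not shown.

```python
import os

def pin_to_core(core_id: int) -> set:
    """Restrict the caller to a single core and return the resulting affinity set."""
    os.sched_setaffinity(0, {core_id})   # pid 0 refers to the caller
    return os.sched_getaffinity(0)

if __name__ == "__main__":
    allowed = os.sched_getaffinity(0)    # cores this process may currently use
    rt_core = max(allowed)               # assumption: treat the last core as the shielded one
    print("pinned to:", pin_to_core(rt_core))
    os.sched_setaffinity(0, allowed)     # restore the original affinity
```

Pinning alone does not shield a core: other processes, kernel threads and timers can still run there, which is why the slide also calls for migrating them away from the real-time cores.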
  • 16. How does LWRT work?
      [Diagram: Core 0 … Core N on top of one Linux kernel with an LWRT kernel module; non-realtime processes run as ordinary pthreads, realtime processes run as pthreads hosting the LWRT environment]
      • LWRT partitions the system into one realtime domain and one non-realtime domain.
      • LWRT adds a user-mode runtime environment, including an optimized user-mode scheduler.
      • LWRT adds a kernel module to catch and forward interrupts to the user-mode environment.
      • LWRT migrates some specific kernel functionality (e.g. timers) away from the realtime domain.
  • 17. What are the benefits of LWRT?
      [Diagram: same as slide 16]
      • LWRT provides a solution that is unencumbered by GPL, even for interrupt-driven code, which can be placed in user space without any major penalty.
      • LWRT provides very good (i.e. low-latency) interrupt response time, all the way up to user mode.
      • LWRT provides low latency and high throughput. LWRT does not depend on the PREEMPT_RT patch, and does not affect throughput negatively.
      • LWRT provides optimized APIs for realtime applications, and allows the same application to use the POSIX/Linux APIs when realtime doesn’t matter.
      • LWRT is an “all-Linux” solution, based on a single Linux kernel. Thus, almost all tools from the existing Linux ecosystem will be available.
  • 19. Scheduling Latency – LWRT vs Pthreads
      [Histogram: clock cycles (lower is better) against number of samples measured (ideally a single peak); series: pthreads and Enea’s User-Space Linux Executive]
      • Much better performance, i.e. lower scheduling latency
      • Much better real-time characteristics, i.e. less variance
  • 22. Throughput ≈ “Idle” Time
      Based on a real-world LTE example
      [Timeline diagram, 0 to 2000 μs in 500 μs steps: processing for Cell 0, Cell 1, … Cell N, followed by “Idle” Time]
      In our example: the “theoretical” maximum for a system with infinitesimally little overhead is 400 μs
  • 24. Other LWRT Characteristics
      • NOT ALL Linux Solution
          – Different APIs, programming paradigm
          – Does include all Linux tools, except for LWRT thread awareness
          – BUT, doesn’t require standard Linux driver modification
      • Depends on Core Isolation/Shielding
      • Slightly better real-time response/determinism than PREEMPT_RT
          – Interrupt handling model is “cleaner”
      • Better than PREEMPT_RT for throughput
          – But only if multithreading in the application is necessary
          – Not for bare metal
      • No load balancing: the current vertical partitioning concept prohibits it
      • No memory protection between threading environments on a core
          – Best implementation requires ONE pthread per core
      • Not standard Linux Power Management
  • 26. What does Event Machine Look Like?
      [Diagram: Core 0 … Core N on top of the Linux kernel; non-realtime processes on the Linux side, realtime applications running on EM instances, fed by an EM scheduler; EM needs a “dispatcher”]
      • EM partitions the system into one realtime domain and one non-realtime domain, like LWRT.
      • EM is a run-to-completion model for individual “contextless” work packages. NO threading or OS model.
      • EM does not necessarily need a special interrupt handling model. It needs a “scheduler”, either in the Linux partition OR in HW.
      • EM does not require kernel mods, nor core isolation, but it can use core shielding, i.e. non-essential Linux processes and interrupts are migrated away from the EM cores.
  • 27. Event Machine
      • An efficient (low overhead) execution model for data plane processing.
      • An “event” based programming paradigm, replacing traditional threads and processes.
          – “Events” are data associated with code
          – Run-to-completion code model. This means “context-less” or “state-less” code for processing
      • New “first class” OS primitives: queues, events, execution objects.
          – Can work within an RTOS environment!! See next slide
      • A framework for distribution and scheduling in multicore scenarios.
      • A standardized API.
      • HW offloading friendly API.
      [Diagram: a Scheduler feeding a Dispatcher on each of Core/Thread 1 … Core/Thread N, each dispatcher invoking execution objects EOX and EOY]
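The three primitives named above (queues, events, execution objects) and the run-to-completion rule can be illustrated with a toy, single-core model. This is a sketch under stated assumptions, not the Event Machine API: every class and function name here (`Event`, `ExecutionObject`, `Scheduler`, `dispatch`) is hypothetical, and the scheduling policy (first non-empty queue wins) is chosen only for brevity.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, Callable, Deque, Dict, Optional, Tuple

@dataclass
class Event:
    """Data handed to an execution object."""
    payload: Any

class ExecutionObject:
    """Hypothetical EO: a receive function that runs to completion per event."""
    def __init__(self, name: str, receive: Callable[[Event], None]):
        self.name = name
        self.receive = receive

class Scheduler:
    """Toy scheduler: each queue is bound to exactly one execution object."""
    def __init__(self) -> None:
        self.queues: Dict[str, Deque[Event]] = {}
        self.bindings: Dict[str, ExecutionObject] = {}

    def create_queue(self, name: str, eo: ExecutionObject) -> None:
        self.queues[name] = deque()
        self.bindings[name] = eo

    def send(self, queue_name: str, event: Event) -> None:
        self.queues[queue_name].append(event)

    def schedule(self) -> Optional[Tuple[ExecutionObject, Event]]:
        """Return the next (EO, event) pair, or None when all queues are empty."""
        for name, pending in self.queues.items():
            if pending:
                return self.bindings[name], pending.popleft()
        return None

def dispatch(scheduler: Scheduler) -> int:
    """Dispatcher loop for one core: fetch an event, run its EO to completion, repeat."""
    handled = 0
    while True:
        item = scheduler.schedule()
        if item is None:
            return handled
        eo, event = item
        eo.receive(event)   # no blocking, no preemption, no per-thread context
        handled += 1

if __name__ == "__main__":
    log = []
    sched = Scheduler()
    sched.create_queue("qx", ExecutionObject("EOX", lambda ev: log.append(("EOX", ev.payload))))
    sched.create_queue("qy", ExecutionObject("EOY", lambda ev: log.append(("EOY", ev.payload))))
    sched.send("qx", Event(1))
    sched.send("qy", Event(2))
    print(dispatch(sched), log)
```

Because the receive functions keep no stack between events, any dispatcher on any core could run the next event for a queue, which is what makes the model friendly to hardware schedulers and load balancing.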
  • 28. Push versus Pull Models
      • Pull model
          – Simple design
          – Passive load balancing
          – Offloads a majority of scheduling decisions to HW
          – Core hot-plug (power save) is easier to implement
          – Cache-cold problems on MIMO/SIMO queues
      • Push model
          – Cache prefetching can be improved
          – Active load balancing protocols needed
          – Offloading scheduling decisions to an I/O co-processor? I.e. smart HW queues
      • Push/Pull
          – Pull whenever HW can schedule I/O
          – Keep it simple
      [Diagram: same Scheduler/Dispatcher/EO diagram as slide 27]
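The difference between the two models can be sketched in a few lines. In the pull variant below, idle workers fetch the next event from one shared queue, so load balances passively; in the push variant a scheduler decides up front which worker gets which event, which is why it needs an active balancing policy (plain round-robin here). Both functions and their names are illustrative assumptions, with threads standing in for cores; this is not Event Machine code.

```python
import queue
import threading
from typing import Any, Callable, Iterable, List

def run_pull_model(events: Iterable[Any], num_workers: int,
                   handler: Callable[[Any], None]) -> List[int]:
    """Pull model: each idle worker takes the next event from a single shared queue."""
    shared: "queue.Queue[Any]" = queue.Queue()
    for ev in events:
        shared.put(ev)
    handled_per_worker = [0] * num_workers

    def worker(index: int) -> None:
        while True:
            try:
                ev = shared.get_nowait()
            except queue.Empty:
                return                      # nothing left: this "core" goes idle
            handler(ev)
            handled_per_worker[index] += 1  # each worker only touches its own slot

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return handled_per_worker

def push_round_robin(events: Iterable[Any], num_workers: int) -> List[List[Any]]:
    """Push model: the scheduler assigns every event to a worker in advance."""
    per_worker: List[List[Any]] = [[] for _ in range(num_workers)]
    for i, ev in enumerate(events):
        per_worker[i % num_workers].append(ev)
    return per_worker

if __name__ == "__main__":
    seen: List[int] = []
    print(run_pull_model(range(20), 4, seen.append))
    print(push_round_robin(range(5), 2))
```

In the pull sketch a worker that stops pulling (for example a core taken offline to save power) needs no coordination with the others, which matches the slide's point about hot-plug; the per-worker split it returns varies from run to run.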
  • 29. OS + Event Machine Scheduling Model
      [Diagram: preemption between scheduling levels; labels: Interrupt Processes, Priority Processes, Event Scheduling (in scheduler idle), Background Jobs]
  • 30. Other Event Machine Characteristics
      • NOT ALL Linux Solution
          – Different APIs, programming paradigm on EM cores
              • This means tools as well
          – Requires restructuring code into simple, non-preemptive, run-to-completion models …. “Context-less” processing
      • Depends on Core Isolation/Shielding
      • Superior for max data plane THROUGHPUT
      • Real-time response is not part of the equation
          – Time to process events is not a parameter
          – But it “could” result in good real-time response depending on use case
      • Designed for best load balancing on the data plane
      • No memory protection between EM instances on cores
      • Not standard Linux Power Management
          – But not a hard problem to solve in a “Pull” model
  • 32. Linux Real Time Virtualization Solution
      • Virtualizes Linux
      • Examples include hypervisors, Xenomai, RTLinux, etc.
      • Provides a highly deterministic RTOS environment for RT apps
      • Cannot completely utilize the Linux ecosystem (e.g. tools) in the realtime domain.
      • Suitable for very high real-time requirements, inherited from classic RTOS domains
      [Diagram: multicore SoC with CPU 0 … CPU 7 split between Linux (with tools), a virtual machine, an RT OS running RT apps, and bare metal running a data plane fast path application]
  • 33. Hypervisor Characteristics
      • NOT ALL Linux Solution
          – Different APIs, programming paradigm for real-time cores
              • This means tools as well
      • Superior real-time response
      • Excellent THROUGHPUT
      • Memory protection across cores
      But the best use case is legacy migration or consolidation. Hypervisors are not discussed much anymore in the embedded industry
  • 34. Four Ways for Better Performance in Linux:
      • Rework the internals of Linux: the PREEMPT_RT patch
      • Vertically partition Linux in two domains: Vertical Partitioning + User mode Runtime (RT Runtime, LWRT)
      • Partition Linux in two domains, one not running Linux at all: Event Machine
      • Add a thin real-time kernel underneath Linux: “Thin-kernel” or virtualization
      Enea supports PREEMPT_RT, Virtualization, LWRT
      Event Machine with Linux is a research topic
  • 35. Thank You Visit us at enea.com