Solving Real-World Real-Time Scheduling
    Problems With RT_PREEMPT and
       Deadline-Based Scheduler


                     Xi Wang
               Broadcom Corporation


        Embedded Linux Conference 2011
Introduction
●   Higher degree of processor sharing in embedded
    systems
     –   Before: Several task specific processors with dedicated
         tasks, often running without OS
     –   Now: Multiple tasks running on powerful application
         processors, including some DSP tasks
            ●   E.g. making a video call from a mobile phone
            ●   Common tasks include media tasks for audio/video streams,
                network traffic, wireless and security, etc.
●   Expectations of real-time scheduling have become
    higher
     –   Multiple low-latency tasks with significant CPU time usage are
         more challenging and more interesting
     –   Maintaining low latency and reaching high CPU utilization are
         often conflicting goals for traditional strict priority or rate
         monotonic scheduling
Introduction
●   This talk
     –   Part 1: Meeting real-time requirements of both VoIP
         and packet forwarding tasks by modifying Linux
         priority model with RT_PREEMPT
     –   Part 2: Take it further, how to make everyone happy
         in a three class scheduling problem (not yet a
         solution, but sharing thoughts)
Introduction
●   System in background
     –   Home gateway device with three major classes of
         tasks: Packet forwarding, VoIP and Applications
         such as UI
           ●   VoIP: Real-time
           ●   Packet forwarding: Even more so – low latency tolerance
               due to limited Rx buffers in the embedded device
●   Warning: UI applications can be
    hard real-time
     –   User's patience runs out →
         computers destroyed
           ●   If in doubt, search Internet
VoIP and Packet Forwarding
●   Problem for Part 1
     –   When the system is loaded with heavy network
         traffic, VoIP tasks cannot get enough CPU cycles

●   Incomplete solution
     –   Make voice threads run in the real-time class
           ●   Although they can preempt other normal-priority
               kernel threads at will, network traffic still has higher
               priority

●   A complete understanding of softirq behavior
    is the key to solving the problem
Native Linux Priority Model
[Diagram: priority stack, listed from lowest to highest priority]
    App:                 ksoftirqd; normal user processes and
                         kernel threads; workqueues
    VoIP:                real-time user processes and kernel threads
    Packet forwarding:   softirq (tasklets, timers)
    IRQ
●   Variable priority of softirq
     –   Very high when lightly loaded: preempts everything except hard
         irq
     –   Low, running in the ksoftirqd kernel thread, when oversubscribed
●   The goal is to provide low latency to softirq tasks without
    starving other tasks
     –   The downside is that there is no consistent priority model
Undesirable Default Priority Model
●   In the default Linux priority model, no kernel
    thread or user process can have strictly higher
    priority than network packet processing
     –   softirq is thread-like but not under the control of
         the process scheduler
     –   Moving voice processing to softirq is not practical,
         nor does it give voice tasks the highest priority
           ●   It could run in ksoftirqd too
     –   Moving to hard irq might work as an ugly solution,
         but nobody would like the side effects
     –   Layered OS is possible, but overkill
●   The problem cannot be solved without
    changing this model
Changing the Priority Model
●   The solution: Always run softirq in kernel
    thread context
     –   Now a feature in RT_PREEMPT
           ●   Called “preempt softirq” in Kconfig, but better described
               as running softirq in thread context
     –   This code is small and little-known, but important
           ●   First developed by Ingo Molnar as an independent
               patch and later included in RT_PREEMPT. Some
               RT_PREEMPT features are mainlined, but not this one
               (as of 2.6.38)
           ●   With this change, traditional softirq no longer
               exists: softirq tasks always run in ksoftirqd
Changing The Priority Model

[Diagram: priority stacks before and after the change, each listed from
lowest to highest priority]
    Before
        App:                 ksoftirqd; normal user processes and
                             kernel threads; workqueues
        VoIP:                real-time user processes and kernel threads
        Packet forwarding:   softirq (tasklets, timers)
        IRQ
    After
        App:                 normal user processes and kernel threads;
                             workqueues
        Packet forwarding    ksoftirqd (timers, tasklets)
        (NetRx/Tx):
        VoIP:                real-time user processes and kernel threads
        IRQ
Under The New Priority Model
●   Moving softirq to a process context and using
    real-time scheduling classes made a working
    system
     –   VoIP tasks are always happy
           ●   Higher strict priority over everything except hard irq
     –   Packet forwarding is mostly happy
           ●   Lower priority than VoIP but won't be affected by
               anything else
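As a rough illustration of the ordering above, the sketch below assigns SCHED_FIFO priorities so that VoIP sits above the threaded packet-forwarding work. The priority numbers, the `PRIORITY_PLAN` table and both helper functions are hypothetical and not taken from the talk. The names of the softirq threads differ between kernel versions, so the sketch takes PIDs as input instead of looking threads up by name. Changing the scheduling class normally requires root or CAP_SYS_NICE.

```python
import os

# Hypothetical priority plan; the numbers are illustrative, not from the talk.
# Only the ordering matters: VoIP above packet forwarding, both above normal tasks.
PRIORITY_PLAN = {
    "voip": 80,               # real-time VoIP threads
    "packet_forwarding": 50,  # threaded softirq work (NetRx/Tx)
}


def classes_by_priority(plan):
    """Return class names from highest to lowest real-time priority."""
    return sorted(plan, key=plan.get, reverse=True)


def apply_fifo_priority(pid, priority):
    """Try to move `pid` to SCHED_FIFO at `priority`; return True on success."""
    if not hasattr(os, "sched_setscheduler"):
        return False  # platform without the Linux scheduling API
    try:
        os.sched_setscheduler(pid, os.SCHED_FIFO, os.sched_param(priority))
        return True
    except OSError:
        # Typically missing privileges or an invalid PID.
        return False
```

A caller would pass the PIDs of its own VoIP threads and of the softirq threads on its system; a `False` result usually means the process lacks the privilege to change scheduling classes.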
On softirq
●   My observations on softirq
      –   Original purpose is trying to do everything in a non-preemptive kernel
              ●   Defer hard irq processing
              ●   Enable high priority low-latency tasks, and avoid starving other tasks
              ●   Provide preemption when kernel preemption is not enabled/supported
      –   Kernel process scheduling has included most of these features over the
          years
      –   Cost of softirq
              ●   Priority jumps up and down
              ●   Having both Interrupt context and non-interrupt context can create problems
                      –   Inflexible binding of spin lock ↔ softirq, mutex ↔ process
      –   Feature creep, complex protocols in net rx/tx are not necessarily irq related
●   Phase out softirq?
      –   This gets the author's vote, but Linux platforms are diverse, and it may be
          bad for other systems
              ●   At least packet forwarding is no worse: running Net Rx/Tx in process
                  context resulted in similar network packet throughput
      –   Keep hard irq, everything else in the process context under the control of
          process scheduler
              ●   Some special mechanisms and optimizations for ultra low latency tasks
Next Step
●   Make it even better – problem for Part 2
●   No compromise, make everyone completely
    happy
     –   Meet strict performance requirements of both VoIP
         and packet forwarding tasks without one having the
         priority over the other
     –   Provide latency guarantees to UI applications too, so
         users can enjoy a responsive UI
Next Step
●   Disclaimer: I would like to share some
    thoughts that I have not yet tested experimentally
     –   My experience with process scheduler code
           ●   Tried an early version of SCHED_DEADLINE, but didn't
               give it enough time to see results
           ●   Tried to write my own deadline-based scheduler
                 –   Not necessarily to make it a final product, but to better
                     understand the problem. Didn't find enough time to complete it
           ●   Relation of this talk to SCHED_DEADLINE
                 –   Overlapping parts may be equivalent; more investigation
                     is needed to find all the similarities and differences
●   In the next pages
     –   A bucket model from me
     –   A proposed scheduling mechanism
     –   Characteristics, problems and more features to add
Understanding Requirements
●   Real-time requirements cannot be described with
    simple priorities
     –   Max latency
           ●   Packet forwarding: 1 ms
                  –   Limited Rx buffers + wirespeed forwarding
           ●   VoIP: 10 ms class
                  –   Protocols, DMA
           ●   Application: 500 ms class
     –   CPU time asked
           ●   Packet forwarding: 0%~100%
           ●   VoIP: 0%~50%
           ●   Application: 0%~100%
     –   CPU time “priority”
           ●   Packet forwarding: Lower than VoIP, higher than application
           ●   VoIP: Gets as much as needed
           ●   Application: Shouldn't be starved (guarantee > 5%)
Making It Better
●   QoS process scheduling?
     –   Detailed requirements on multiple aspects are
         similar to network QoS
     –   Both network QoS and process scheduling are
         related to resource allocation and queuing
           ●   CFS ~ Fair Queuing
QoS Scheduling
●   Review of existing mechanisms – making everyone
    happy is difficult
     –   Strict priority can starve tasks
           ●   Another side effect is that priority inversion can lead to deadlocks
     –   CFS provides good CPU time allocation, but no latency
         guarantee
     –   Real-time group scheduling can be a solution, but with
         some drawbacks
           ●   Global scheduling period needs to be aligned with the most
               latency sensitive task, very frequent task switching possible
           ●   Fixed CPU time allocation
●   To solve the three-class QoS scheduling problem
     –   Deadline-based scheduling as the main component,
         complemented by some additional mechanisms
QoS Scheduling
●   Background on deadline-based scheduling
     –   For batch processing, clear model for Earliest Deadline
         First (EDF) scheduling
            ●   N tasks, each with known deadline
            ●   Always pick the task with earliest deadline when rescheduling
            ●   Optimal – guarantee that every job meets its deadline if
                schedulable
     –   Extended to dynamic scheduling with recurrence
            ●   Regenerate each task at a certain period, and run the
                classical EDF in each period
                  –   Alternative methods may exist
     –   See SCHED_DEADLINE and related papers
            ●   http://gitorious.org/sched_deadline/pages/Home
●   In this talk we use a continuous bucket model
    instead
     –   Turns out things can work without scheduling period
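The classical batch EDF rule described above can be sketched in a few lines. This is a minimal illustration, assuming all jobs are released at time 0 and run back to back without preemption. The function name `edf_schedule` and the job set are hypothetical; only the "always pick the earliest deadline" rule comes from the slide.

```python
def edf_schedule(jobs):
    """Run jobs back to back in earliest-deadline-first order.

    jobs: list of (name, run_time, deadline), all released at time 0.
    Returns a list of (name, finish_time, deadline_met).
    """
    schedule = []
    now = 0
    # Sorting by deadline is the whole policy: earliest deadline runs first.
    for name, run_time, deadline in sorted(jobs, key=lambda job: job[2]):
        now += run_time
        schedule.append((name, now, now <= deadline))
    return schedule


# Hypothetical job set, times in milliseconds.
jobs = [("voip", 3, 10), ("ui", 4, 500), ("packet", 1, 1)]
for name, finish, met in edf_schedule(jobs):
    print(name, finish, met)
```

For this job set the order is packet, voip, ui, and every job finishes before its deadline. Reordering the input list does not change the result, because only the deadlines decide the order.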
Bucket Model
[Figure: bucket model diagram with a "Switch" element]
    Each task can be described with an inflow rate and a bucket size
    The CPU can drain any bucket at any time, but only one at a given moment
●   It's simple
     –   Drain the right bucket at the right time; do not let any
         bucket overflow
     –   Modeled after producer → buffer → consumer problems
           ●   Can be reversed to prevent underflow for media applications
●   Continuous model
     –   No scheduling period or task recurrence period; tasks are
         incoming streams
●   Background
     –   Borrows the concept from network QoS
     –   Generic model, not limited to deadline scheduling
QoS Scheduling
●   Mapping tasks into the bucket model
      –   The first parameter, the deadline or buffer overflow time, often
          already exists in the real world
             ●   Packet forwarding: ~1 ms
             ●   VoIP: ~10 ms
             ●   UI Application: ~500 ms (patience runs out = buffer overflow)
      –   Two independent parameters
             ●   Assuming constant inflow rate and bucket size
             ●   Bucket Size, Inflow Rate → Deadline
             ●   Inflow Rate, Drain Rate → CPU Time Allocation
             ●   Only two independent parameters in this model, e.g. bucket size
                 can be normalized out. I use
                     – Bucket fill time (latency)
                     – Bucket drain time (bandwidth, determines CPU time allocation
                       when combined with bucket fill time)
●   Classful task scheduling
      –   No need to map deadline parameters to every task, deadline scheduling
          should happen at task group level
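One way to read the two parameters is sketched below. It assumes a constant inflow rate r, a constant drain rate d while the CPU serves the bucket, and a bucket size B, so that fill time = B / r and drain time = B / d. The long-run CPU share a class needs is then r / d = drain time / fill time. The fill times follow the slide; the drain times and both function names are hypothetical.

```python
from fractions import Fraction


def cpu_share(fill_time, drain_time):
    """CPU fraction needed to keep up with a bucket.

    With bucket size B, inflow rate r and drain rate d:
    fill_time = B / r and drain_time = B / d, so r / d = drain_time / fill_time.
    """
    return Fraction(drain_time) / Fraction(fill_time)


def total_share(classes):
    """Sum the CPU shares of a {name: (fill_time, drain_time)} mapping."""
    return sum((cpu_share(f, d) for f, d in classes.values()), Fraction(0))


# Fill times (ms) follow the talk; drain times are hypothetical.
classes = {
    "packet_forwarding": (1, Fraction(2, 5)),  # 40% of the CPU
    "voip": (10, 3),                           # 30%
    "ui": (500, 50),                           # 10%
}

# A total above 1 would mean the classes ask for more than one CPU can give.
print(total_share(classes) <= 1)
```

Under these assumed numbers the three classes together ask for 80% of one CPU, which leaves headroom for switching overhead and for the lower scheduling domains.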
Deadline Based Scheduling Algorithm
●   Have bucket model, will schedule
●   When to reschedule is important
      –   Multiple valid solutions to meet the goal
      –   Cannot always run earliest deadline first in a continuous
          model – results in infinitely high task switching frequency
●   A straightforward first step design with lazy switching
      –   Mechanism A
             ●   Triggered: When an inactive task's bucket is full – deadline
                 reached
             ●   Action: Switch to that task
      –   Mechanism B
             ●   Triggered: When the current task's bucket is empty
                    –   Indicated by the current task finishing or blocking
                    –   Or as calculated by the bucket model when the task hasn't finished – timeslice overrun
             ●   Action: Switch to lower scheduling domains or idle task
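Mechanisms A and B can be tried out in a small discrete-time simulation. This is only a sketch under several assumptions that are not in the talk: time advances in whole ticks instead of continuously, bucket levels are normalized to the range 0 to 1, ties between two full buckets are broken by list order, and the `Bucket` class and `simulate` function are hypothetical names. A preempted task is not resumed until its bucket fills again, which follows from applying the two triggers literally.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class Bucket:
    name: str
    fill_time: int    # ticks for an unserved bucket to go from empty to full
    drain_time: int   # ticks of CPU needed to remove one full bucket of work
    level: Fraction = Fraction(0)  # normalized fill level: 0 empty, 1 full


def simulate(buckets, ticks):
    """Return the name of the running task (or "idle") for each tick."""
    current = None  # None stands for the lower scheduling domain / idle task
    trace = []
    for _ in range(ticks):
        # Mechanism A: an inactive task's bucket is full, so switch to it.
        for bucket in buckets:
            if bucket is not current and bucket.level >= 1:
                current = bucket
                break
        # Mechanism B: the current task's bucket is empty, so give up the CPU.
        if current is not None and current.level <= 0:
            current = None
        trace.append(current.name if current is not None else "idle")
        # Advance one tick: every bucket fills, only the running one drains.
        for bucket in buckets:
            bucket.level += Fraction(1, bucket.fill_time)
            if bucket is current:
                drained = bucket.level - Fraction(1, bucket.drain_time)
                bucket.level = max(Fraction(0), drained)
    return trace


# One task whose bucket fills in 4 ticks and drains in 2 ticks of CPU time.
print(simulate([Bucket("A", fill_time=4, drain_time=2)], 16))
```

With this single bucket the trace alternates four idle ticks with four running ticks, so the task receives 50% of the CPU, matching drain time divided by fill time. The run length is not configured anywhere; it falls out of the bucket parameters, which is the adaptive time slice behavior.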
Lazy Switching Bucket Scheduler Example
[Figure: timing diagram showing, against time, the bucket fill level (0 to 1)
and the active periods of Task A and Task B, and the active periods of the
lower scheduling domain / idle task]
Discussion
●   Characteristics, good or bad
     –   Adaptive time slice
           ●   For anyone familiar with image processing, this is
               comparable to dithering, halftone patterns or error diffusion
     –   Each task is protected with latency and CPU time
         guarantees, and won't be affected by other tasks
     –   Bad mode / pattern possible
           ●   When two classes have close or equal parameters and both of
               their buckets are about to overflow, there will be very frequent
               task switching / thrashing
           ●   Needs to be prevented / mitigated with additional mechanisms
           ●   One possible solution is to look ahead to one or more future
               deadlines and switch before the bucket is full
     –   Overheads are higher than most schedulers – but we
         probably only need a handful of QoS scheduling classes
         in a given system
Discussion
●   Features to add
     –   SMP is ignored in this talk; the scheduler should be SMP-aware
     –   Soft real-time features
           ●   What was discussed so far makes fixed CPU time
               reservations. Sometimes we want to consider both the worst
               case and the average case, where tasks do not use up all
               their CPU time allocations
                 –   Two sets of bucket parameters, one for worst case guarantee,
                     one for more optimistic situation
                 –   Allow each task to have two taps to receive CPU time, a
                     primary tap at deadline scheduling domain and a second tap at
                     lower scheduling domain, e.g. CFS
Conclusion
●   A good time to experiment with process
    schedulers
     –   Task scheduling can cause a surprising number of
         problems in modern embedded systems
     –   Don't be afraid to work on it. Task scheduling may
         be less up to date than you think – in actual OSes or
         in academia
           ●   The process scheduler is deeply buried in the OS core and rarely
               gives trouble in desktop systems
           ●   Not very easy to experiment with and do research on
                  – Many ideas have probably been discussed, but not
                    necessarily connected to a real OS
     –   With the pluggable scheduler design in newer Linux
         kernels, it is now relatively easy to experiment
Conclusion
●   Do we really need more scheduling
    mechanisms?
     –   More challenging for embedded systems
           ●   More real-time requirements, less powerful processors,
               one-size-fits-all may not be enough
     –   Less challenging for embedded systems
           ●   Scalability is not a must
           ●   OK to be mission specific
                 –   Useful even if not mainlined
     –   Answer is probably yes

Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 

Recently uploaded (20)

Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 

Solving Real-Time Scheduling Problems With RT_PREEMPT and Deadline-Based Scheduler

  • 1. Solving Real-World Real-Time Scheduling Problems With RT_PREEMPT and Deadline-Based Scheduler Xi Wang Broadcom Corporation Embedded Linux Conference 2011
  • 2. Introduction ● Higher degree of processor sharing in embedded systems – Before: Several task specific processors with dedicated tasks, often running without OS – Now: Multiple tasks running on powerful application processors, including some DSP tasks ● E.g. making a video call from a mobile phone ● Common tasks include media tasks for audio/video streams, network traffic, wireless and security, etc. ● Expectations of real-time scheduling have become higher – Multiple low latency tasks with significant CPU time usage is more challenging and more interesting – Maintaining low latency and reaching high CPU utilization are often conflicting goals for traditional strict priority or rate monotonic scheduling
  • 3. Introduction ● This talk – Part 1: Meeting real-time requirements of both VoIP and packet forwarding tasks by modifying Linux priority model with RT_PREEMPT – Part 2: Take it further, how to make everyone happy in a three class scheduling problem (not yet a solution, but sharing thoughts)
  • 4. Introduction ● System in background – Home gateway device with three major classes of tasks: Packet forwarding, VoIP and Applications such as UI ● VoIP: Real-time ● Packet forwarding: Even more so – low latency tolerance due to limited Rx buffers in the embedded device ● Warning: UI application can be hard real-time – User's patience runs out → computers destroyed ● If in doubt, search Internet
  • 5. VoIP and Packet Forwarding ● Problem for Part 1 – When the system is loaded with heavy network traffic, VoIP tasks cannot get enough CPU cycles ● Incomplete solution – Make voice threads run in a real-time class ● Although they can then preempt other normal-priority kernel threads at will, network traffic still has higher priority ● Completely understanding the behavior of softirq is the key to solving the problem
  • 6. Native Linux Priority Model ● Variable priority of softirq – Very high when lightly loaded – preempt everything except hard irq – Low, running in ksoftirqd kernel thread when oversubscribed ● The goal is to provide low latency to softirq tasks without starving other tasks – The downside is that there is no consistent priority model [Diagram labels, in order of appearance: ksoftirqd; normal user processes, kernel threads and workqueues (App); real-time user processes and kernel threads (VoIP); tasklet, softirq and timers (Packet Forwarding); IRQ]
  • 7. Undesirable Default Priority Model ● In the default Linux priority model, no kernel thread or user process can have strictly higher priority than network packet processing – softirq is thread-like but not under the control of the process scheduler – Moving voice processing to softirq is not practical, nor does it give voice tasks the highest priority ● It could run in ksoftirqd too – Moving to hard irq might work as an ugly solution, but nobody would like the side effects – Layered OS is possible, but overkill ● The problem cannot be solved without changing this model
  • 8. Changing the Priority Model ● The solution: Always run softirq in kernel thread context – Now a feature in RT_PREEMPT ● Called “preempt softirq” in Kconfig, but better described as running softirq in thread context – This code is little-known and small, but important ● First developed by Ingo Molnar as an independent patch and later included in RT_PREEMPT. Some of the RT_PREEMPT features are mainlined, but not this one (as of 2.6.38) ● With this change, traditional softirq no longer exists: softirq tasks always run in ksoftirqd
  • 9. Changing The Priority Model [Diagram: two priority stacks side by side. Default model labels: ksoftirqd; normal user processes, kernel threads and workqueues (App); real-time user processes and kernel threads (VoIP); tasklet, softirq and timers (Packet Forwarding); IRQ. New model labels: normal user processes, kernel threads and workqueues (App); ksoftirqd running timers, tasklets and NetRx/Tx (Packet forwarding); real-time user processes and kernel threads (VoIP); IRQ]
  • 10. Under The New Priority Model ● Moving softirq to a process context and using real-time scheduling classes made a working system – VoIP tasks are always happy ● Higher strict priority over everything except hard irq – Packet forwarding is mostly happy ● Lower priority than VoIP but won't be affected by anything else
  • 11. On softirq ● My observations on softirq – Original purpose was to do everything in a non-preemptive kernel ● Defer hard irq processing ● Enable high priority low-latency tasks, and avoid starving other tasks ● Provide preemption when kernel preemption is not enabled/supported – Kernel process scheduling has included most of these features over the years – Cost of softirq ● Priority jumps up and down ● Having both interrupt context and non-interrupt context can create problems – Inflexible binding of spin lock ↔ softirq, mutex ↔ process – Feature creep, complex protocols in net rx/tx are not necessarily irq related ● Phase out softirq? – This gets the author's vote, but Linux platforms are diverse, and it may be bad for other systems ● At least packet forwarding is no worse: running Net Rx/Tx in process context resulted in similar network packet throughput – Keep hard irq, everything else in the process context under the control of the process scheduler ● Some special mechanisms and optimizations for ultra low latency tasks
  • 12. Next Step ● Make it even better – problem for Part 2 ● No compromise, make everyone completely happy – Meet strict performance requirements of both VoIP and packet forwarding tasks without one having the priority over the other – Provide latency guarantees to UI applications too, so users can enjoy a responsive UI
  • 13. Next Step ● Disclaimer: I would like to share some thoughts that I have not yet tested experimentally – My experience with process scheduler code ● Tried an early version of SCHED_DEADLINE, but didn't give it enough time to see results ● Tried to write my own deadline-based scheduler – Not necessarily to make it a final product, but to better understand the problem. Didn't find enough time to complete it ● Relation of this talk to SCHED_DEADLINE – Overlapping parts can be equivalent; more investigation is needed to find all the similarities and differences ● In the next pages – My bucket model – A proposed scheduling mechanism – Characteristics, problems and more features to add
  • 14. Understanding Requirements ● Real-time requirements cannot be described with simple priorities – Max latency ● Packet forwarding: 1 ms – Limited Rx buffers + wirespeed forwarding ● VoIP: 10 ms class – Protocols, DMA ● Application: 500 ms class – CPU time asked ● Packet forwarding: 0%~100% ● VoIP: 0%~50% ● Application: 0%~100% – CPU time “priority” ● Packet forwarding: Lower than VoIP, higher than application ● VoIP: Gets as much as needed ● Application: Shouldn't be starved (guarantee > 5%)
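One possible reading of the CPU time "priority" rules on slide 14 can be sketched in Python. The function name `allocate_cpu`, the demand dictionary, and the use of exactly 50% and 5% as the VoIP cap and application floor are illustrative assumptions, not something the talk specifies.

```python
def allocate_cpu(demand, voip_cap=0.50, app_floor=0.05):
    """Split one CPU among the three task classes.

    demand maps "voip", "forwarding" and "app" to the CPU fraction asked.
    voip_cap and app_floor are assumed values read off the slide's ranges.
    """
    # VoIP gets as much as it needs, up to its stated maximum.
    voip = min(demand["voip"], voip_cap)
    # Set aside the application floor before serving packet forwarding.
    reserved = min(demand["app"], app_floor)
    # Packet forwarding ranks below VoIP but above the application.
    forwarding = min(demand["forwarding"], 1.0 - voip - reserved)
    # The application takes what is left, never less than the reserved floor.
    app = min(demand["app"], 1.0 - voip - forwarding)
    return {"voip": voip, "forwarding": forwarding, "app": app}


# Heavy traffic: forwarding asks for the whole CPU, yet the app keeps its floor.
print(allocate_cpu({"voip": 0.3, "forwarding": 1.0, "app": 0.5}))
```

This only captures the CPU share side of the requirements; the maximum latency figures need a deadline-aware mechanism, which is what the later slides work toward.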
  • 15. Making It Better ● QoS process scheduling? – Detailed requirements on multiple aspects are similar to network QoS – Both network QoS and process scheduling are related to resource allocation and queuing ● CFS ~ Fair Queuing
  • 16. QoS Scheduling ● Review of existing mechanisms – making everyone happy is difficult – Strict priority can starve tasks ● Another side effect is that priority inversion can lead to deadlocks – CFS provides good CPU time allocation, but no latency guarantee – Real-time group scheduling can be a solution, but with some drawbacks ● Global scheduling period needs to be aligned with the most latency-sensitive task, very frequent task switching possible ● Fixed CPU time allocation ● To solve the three-class QoS scheduling problem – Deadline-based scheduling as the main component, complemented by some additional mechanisms
  • 17. QoS Scheduling ● Background on deadline-based scheduling – For batch processing, clear model for Earliest Deadline First (EDF) scheduling ● N tasks, each with known deadline ● Always pick the task with the earliest deadline when rescheduling ● Optimal – guarantees that every job meets its deadline if schedulable – Extended to dynamic scheduling with recurrence ● Regenerate each task at a certain period, and run the classical EDF in each period – Alternative methods may exist – See SCHED_DEADLINE and related papers ● http://gitorious.org/sched_deadline/pages/Home ● In this talk we use a continuous bucket model instead – Turns out things can work without scheduling period
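The batch EDF rule on slide 17 can be illustrated with a minimal Python sketch. It assumes all jobs are released at time 0 and run to completion without preemption; the job tuples and the name `edf_schedule` are hypothetical and are not taken from SCHED_DEADLINE.

```python
def edf_schedule(jobs):
    """Run a batch of jobs in earliest-deadline-first order.

    jobs is a list of (name, run_time, deadline) tuples, all released at
    time 0. Returns (name, finish_time, met_deadline) in execution order.
    """
    clock = 0
    result = []
    # Always pick the job with the earliest deadline next.
    for name, run_time, deadline in sorted(jobs, key=lambda job: job[2]):
        clock += run_time
        result.append((name, clock, clock <= deadline))
    return result


# Hypothetical job set, times in milliseconds.
for entry in edf_schedule([("ui", 4, 20), ("voip", 2, 5), ("net", 1, 1)]):
    print(entry)
```

If any entry reports a missed deadline under this ordering, no other ordering of the same batch would meet every deadline, which is the sense in which EDF is optimal here.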
  • 18. Bucket Model ● It's simple – Drain the right bucket at the right time, do not let any bucket overflow – Modeled after producer → buffer → consumer problems ● Can be reversed to prevent underflow for media applications ● Continuous model – No scheduling period or task recurrence period, tasks as incoming streams ● Background – Borrow concept from network QoS – Generic model, not limited to CPU deadline scheduling [Figure annotations: Each task can be described with inflow rate and bucket size. The switch can drain any bucket at any time, but only one at a given moment]
  • 19. QoS Scheduling ● Mapping tasks into bucket model – The first parameter, deadline or buffer overflow time, often already exists in the real world ● Packet forwarding: ~1 ms ● VoIP: ~10 ms ● UI Application: ~500 ms (patience runs out = buffer overflow) – Two independent parameters ● Assuming constant inflow rate and bucket size ● Bucket Size, Inflow Rate → Deadline ● Inflow Rate, Drain Rate → CPU Time Allocation ● Only two independent parameters in this model, e.g. bucket size can be normalized out. I use – Bucket fill time (latency) – Bucket drain time (bandwidth, determines CPU time allocation when combined with bucket fill time) ● Classful task scheduling – No need to map deadline parameters to every task, deadline scheduling should happen at task group level
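The relation between the two bucket parameters on slide 19 and CPU time allocation can be worked through in a short Python sketch. It assumes the bucket size is normalized to 1, that fill time is the time for an empty bucket to fill with no draining, and that drain time is the time to empty a full bucket with inflow ignored. Under those assumptions inflow rate is 1 / fill time, drain rate is 1 / drain time, and the CPU share is inflow rate / drain rate = drain time / fill time. The drain times below are made-up values for illustration; only the fill times come from the slide.

```python
def cpu_share(fill_time, drain_time):
    """CPU fraction a class needs so that its bucket never overflows."""
    inflow_rate = 1.0 / fill_time   # bucket size normalized to 1
    drain_rate = 1.0 / drain_time
    return inflow_rate / drain_rate


def is_feasible(classes):
    """Necessary condition only: the shares must fit on one CPU."""
    return sum(cpu_share(fill, drain) for fill, drain in classes.values()) <= 1.0


# (fill time, drain time) in ms; drain times are hypothetical.
classes = {
    "packet forwarding": (1.0, 0.4),
    "voip": (10.0, 2.0),
    "ui": (500.0, 50.0),
}
for name, (fill, drain) in classes.items():
    print(name, cpu_share(fill, drain))
print("fits on one CPU:", is_feasible(classes))
```

Passing this check does not by itself guarantee that every deadline is met; that depends on when the scheduler decides to switch.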
  • 20. Deadline Based Scheduling Algorithm ● Have bucket model, will schedule ● When to reschedule is important – Multiple valid solutions to meet the goal – Cannot always run earliest deadline first in a continuous model – results in infinitely high task switching frequency ● A straightforward first step design with lazy switching – Mechanism A ● Triggered: When an inactive task's bucket is full – deadline reached ● Action: Switch to that task – Mechanism B ● Triggered: When the current task's bucket is empty – Indicated by the current task finishing or blocking – As calculated by the bucket model, but the task hasn't finished – timeslice overrun ● Action: Switch to lower scheduling domains or idle task
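Mechanisms A and B can be tried out in a small discrete-time simulation. This is a sketch under several assumptions that are not in the talk: time advances in whole ticks, bucket levels are integers, inflow continues while a task runs, and when several inactive buckets are full the first one in the list wins. The names `Bucket` and `simulate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    name: str
    size: int      # bucket capacity, in work units
    inflow: int    # work units arriving every tick
    drain: int     # work units removed per tick while the task runs
    level: int = 0


def simulate(buckets, ticks):
    """Return (timeline, overflowed) for a lazy switching bucket scheduler."""
    current = None
    timeline = []
    overflowed = set()
    for _ in range(ticks):
        full = [b for b in buckets if b is not current and b.level >= b.size]
        if full:
            # Mechanism A: an inactive bucket is full, switch to that task.
            current = full[0]
        elif current is not None and current.level <= 0:
            # Mechanism B: the current bucket is empty, fall back to idle.
            current = None
        # Advance one tick: every bucket fills, the running one also drains.
        for b in buckets:
            b.level += b.inflow
        if current is not None:
            current.level = max(0, current.level - current.drain)
        # A level above the bucket size means a missed deadline.
        overflowed.update(b.name for b in buckets if b.level > b.size)
        timeline.append(current.name if current else "idle")
    return timeline, overflowed


timeline, overflowed = simulate(
    [Bucket("A", size=4, inflow=1, drain=3), Bucket("B", size=12, inflow=1, drain=3)],
    ticks=30,
)
print(" ".join(timeline))
print("overflowed:", overflowed)
```

With these example parameters each task asks for one third of the CPU (inflow / drain), neither bucket overflows, and the remaining third is left for the lower scheduling domain or the idle task. Task B is preempted by task A before its bucket is empty and only resumes when it is full again, which is the lazy part of the design.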
  • 21. Lazy Switching Bucket Scheduler Example [Figure: timeline with time on the horizontal axis, showing bucket fill level (0 to 1) and active periods for Task A and Task B, plus the active periods of the lower scheduling domain / idle task]
  • 22. Discussion ● Characteristics, good or bad – Adaptive time slice ● For anyone familiar with image processing, this is comparable to dithering, halftone patterns or error diffusion – Each task is protected with latency and CPU time guarantees, and won't be affected by other tasks – Bad mode / pattern possible ● When two classes have close or equal parameters and both of their buckets are about to overflow, there will be very frequent task switching / thrashing ● Needs to be prevented / mitigated with additional mechanisms ● One possible solution is to look ahead to one or more future deadlines and switch before the bucket is full – Overheads are higher than most schedulers – but we probably only need a handful of QoS scheduling classes in a given system
  • 23. Discussion ● Features to add – SMP ignored in this talk, should be SMP aware – Soft real-time features ● The design discussed so far makes fixed CPU time reservations; sometimes we want to consider both worst case and average case, where tasks do not use up all their CPU time allocations – Two sets of bucket parameters, one for worst case guarantee, one for more optimistic situation – Allow each task to have two taps to receive CPU time, a primary tap at the deadline scheduling domain and a second tap at a lower scheduling domain, e.g. CFS
  • 24. Conclusion ● A good time to experiment with process schedulers – Task scheduling can cause a surprising number of problems in modern embedded systems – Don't be afraid to work on it. Task scheduling may be less up to date than you think – in actual OS or academia ● Process scheduler is deeply buried in OS core and rarely gives trouble in desktop systems ● Not very easy to experiment and do research on – Many ideas should have been discussed, but not necessarily connected to real OS – With pluggable scheduler design in newer Linux kernels, it is now relatively easy to experiment
  • 25. Conclusion ● Do we really need more scheduling mechanisms? – More challenging for embedded systems ● More real-time requirements, less powerful processors, one-size-fits-all may not be enough – Less challenging for embedded systems ● Scalability is not a must ● OK to be mission specific – Useful even if not mainlined – Answer is probably yes