
Tommaso Cucinotta - Low-latency and power-efficient audio applications on Linux

Building Linux-based low-latency audio processing software for today’s multi-core devices can be cumbersome. I’ll present some of our ongoing research on the topic at the Real-Time Systems Lab of Scuola Superiore Sant’Anna, focusing on sound synthesis on Android, where power efficiency is a must.

The talk will provide basic background on how the Linux audio sub-system works, in terms of the interactions between the Linux kernel and the ALSA sound architecture, including how user-space applications normally cope with low-latency requirements, and will briefly touch on the design concepts behind the JACK low-latency framework. It will then introduce the peculiarities of the Android audio processing pipeline, along with the complications that arise on mobile, power-efficient devices. Throughout the talk, I’ll touch upon the ideas behind our research efforts on the topic, describing how properly designed real-time CPU scheduling strategies can make a difference in what is achievable in this area.

Published in: Software

  1. 1. Low-latency and power-efficient audio applications on Linux Tommaso Cucinotta tommaso.cucinotta@santannapisa.it
  2. 2. About me LinuxLab 2018 T. Cucinotta – Real-Time Systems Laboratory (RETIS) – 2 / 22
     ■ 2016-present: Associate Professor at the Real-Time Systems Laboratory (RETIS) of Scuola Superiore Sant’Anna, teaching Component-Based Software Design, Cloud Computing, Big Data, ...
     ■ 2014-2016: Software Development Engineer at AWS, improving the real-time performance and scalability of DynamoDB
     ■ 2012-2014: Researcher at Alcatel-Lucent Bell Labs, investigating the security and real-time performance of cloud infrastructures, with a focus on IMS and NFV
     ■ 2005-2012: Researcher at RETIS, investigating adaptive real-time scheduling for multimedia applications on Linux
     ■ 2001-2004: PhD in Computer Security & Smart-Card Based Authentication, RETIS
  3. 3. About the RETIS
     ■ Belongs to the Institute of Communications, Information and Perception Technologies of Scuola Superiore Sant’Anna in Pisa
     ■ Research specialties
       ◆ predictable execution of software through
         ■ mechanisms at operating system and kernel level
         ■ design methodologies and tools
         ■ performance and timing analysis
       ◆ provide real-time support for emerging computing platforms
         ■ multi-core and heterogeneous platforms (big.LITTLE, GPGPU, FPGA)
         ■ distributed infrastructures for cloud & big-data computing and NFV
       ◆ make real-time systems resource- and energy-efficient
       ◆ hard real-time use-cases: automotive, industrial automation, railroads
       ◆ soft real-time use-cases: multimedia, health-care, telecommunications
  5. 5. Introduction
     Common multimedia processing case: audio playback and video streaming
     ■ Works without particular precautions
     ■ Neither interactivity nor low-latency requirements
     ■ 100s of ms, or even seconds, of data can be pre-buffered and pre-processed
     ■ the run-time platform (user-space + kernel) only needs to ensure that pre-processed A/V frames are presented to the underlying hardware on time
     What about interactivity?
  7. 7. Problem
     Interactive multimedia processing
     ■ low-latency requirement from when a user interaction happens, to when it is reflected in the output A/V stream
     Examples
     ■ video editing: change filter(s) and/or parameters in a real-time video processing pipeline
     ■ on-line interactive services: eg, office automation, etc.
     ■ gaming, VR, AR: user interacts with environment and/or other users (eg, multi-player shooting)
     ■ software-based sound synthesis: user presses one or more instrument keys / controllers
  10. 10. Problem
      Interactive multimedia processing: how can we achieve low latency?
      ■ Digital Audio Workstation (DAW)
        ◆ DSPs do the real-time work
        ◆ the general-purpose OS and software just take care of configuring the DSP pipeline and its parameters
      ■ EXPENSIVE! → Software-based solutions
      ■ “1-system 1-function” paradigm
        ◆ device dedicated to a single application
        ◆ nothing else runs with real-time requirements
        ◆ we can use priorities to minimize interference
  12. 12. Real-time audio processing
      Commonly found guidelines for low-latency, skip-free interactive audio processing, eg, from http://jackaudio.org/faq/linux_rt_config.html
      ■ create a group of users who can gain RT priority
          groupadd audio
          cat /etc/security/limits.d/99-realtime.conf
            @audio - rtprio 99
            @audio - memlock unlimited
      ■ add the unprivileged user to the new group
          usermod -a -G audio yourUserID
      ■ install a “real-time / low-latency” kernel
      So, problem solved?
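
Whether the limits configuration above has taken effect for the current login session can be checked from user space. The following is a minimal sketch, assuming a Linux system and Python's standard `resource` module; the function names are hypothetical, and a privileged process may bypass these limits altogether.

```python
import resource


def realtime_limits():
    """Return the (soft, hard) limits relevant to real-time audio threads."""
    limits = {}
    for name in ("RLIMIT_RTPRIO", "RLIMIT_MEMLOCK"):
        # RLIMIT_RTPRIO is Linux-specific, so look it up defensively.
        res = getattr(resource, name, None)
        limits[name] = resource.getrlimit(res) if res is not None else None
    return limits


def can_request_rt_priority(priority, limits):
    """True if the soft rtprio limit allows the given SCHED_FIFO/RR priority."""
    rtprio = limits.get("RLIMIT_RTPRIO")
    if rtprio is None:
        return False
    soft = rtprio[0]
    return soft == resource.RLIM_INFINITY or soft >= priority


if __name__ == "__main__":
    current = realtime_limits()
    print(current)
    print("priority 50 allowed:", can_request_rt_priority(50, current))
```

If the user was added to the group after logging in, the new limits typically show up only in a fresh session.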
  15. 15. What about energy?
      Plenty of energy saving features in the hardware
      ■ Dynamic Voltage and Frequency Scaling (DVFS)
      ■ Performance states (P-states), Operating Performance Points (OPP)
      ■ Core idle states (C-states)
      ■ Turbo Boosting (hmmm....): spike-up CPU frequency when/if possible
      Useful in a number of cases (both battery-operated and not)
      ■ laptops, tablets, smartphones
      ■ desktop PCs, servers
      All bad for performance stability!
  16. 16. Platform stability
      Energy saving features in the hardware adversely impact performance stability and software predictability
      ■ DVFS → CPUs run at different frequencies over time
        ◆ frequency islands: groups of CPUs are constrained to the same frequency
      ■ P-states → even less control over what frequency the CPU(s) are running at
        ◆ frequency control in hardware, high-level tunables exposed to software (minPct, maxPct)
      ■ C-states → time to enter and exit an idle state is variable
        ◆ going to a deep-idle state is worthwhile only if the CPU stays there for a minimum residency time
  17. 17. C-states
      C-state   wake-up latency (µs)   residency time (µs)
      POLL        0                       0
      C1          2                       2
      C1E        10                      20
      C3         70                     100
      C6         85                     200
      C7s       124                     800
      C8        200                     800
      C9        480                    5000
      C10       890                    5000
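
The trade-off in the C-states table can be made concrete with a simplified selection rule: enter the deepest state whose residency time fits the expected idle period and whose wake-up latency is tolerable. This is only a sketch of the idea, not the actual logic of any Linux cpuidle governor; the function name is hypothetical, and all values use the table's units (microseconds in the Linux cpuidle sysfs interface).

```python
# (name, wake-up latency, residency time), copied from the C-states table.
C_STATES = [
    ("POLL", 0, 0), ("C1", 2, 2), ("C1E", 10, 20), ("C3", 70, 100),
    ("C6", 85, 200), ("C7s", 124, 800), ("C8", 200, 800),
    ("C9", 480, 5000), ("C10", 890, 5000),
]


def deepest_state(predicted_idle, latency_limit):
    """Pick the deepest state that fits both the idle time and the latency limit."""
    chosen = C_STATES[0][0]  # POLL is always acceptable
    for name, latency, residency in C_STATES:  # ordered shallow to deep
        if residency <= predicted_idle and latency <= latency_limit:
            chosen = name
    return chosen


if __name__ == "__main__":
    print(deepest_state(predicted_idle=150, latency_limit=1000))
    print(deepest_state(predicted_idle=10000, latency_limit=100))
    print(deepest_state(predicted_idle=10000, latency_limit=1))
```

With a latency limit of 1, only POLL qualifies, which is why a very tight latency constraint keeps the CPU out of every power-saving idle state.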
  20. 20. Making the platform stable
      How users typically make the computing platform (more) stable/predictable
      ■ turn off Turbo Boosting
      ■ disable DVFS (leverage it to fix the frequency), eg:
        ◆ performance governor or
        ◆ userspace governor if/when available
      ■ fix performance % with the P-state driver (minPct=maxPct)
      ■ inhibit deep-idle states
        ◆ echo 1 > /sys/devices/system/cpu/cpu<n>/cpuidle/state<s>/disable
        ◆ echo 1 > /sys/devices/system/cpu/cpu0/power/pm_qos_resume_latency_us
      ■ or, just run:
        ◆ yes > /dev/null & [times # of CPUs]
      ■ Bad for energy consumption!
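
Before changing any of these knobs, it helps to see what a given CPU is currently configured to do. The sketch below reads the relevant sysfs files without modifying anything. It assumes the usual cpufreq and cpuidle layout under `/sys/devices/system/cpu`; files that do not exist on a given platform are reported as `None`, and the function name is hypothetical.

```python
from pathlib import Path


def read_power_settings(sysfs_cpu="/sys/devices/system/cpu", cpu=0):
    """Read the governor, resume-latency limit and idle-state flags of one CPU."""

    def read(path):
        try:
            return path.read_text().strip()
        except OSError:
            return None  # file missing or unreadable on this platform

    cpu_dir = Path(sysfs_cpu) / f"cpu{cpu}"
    settings = {
        "governor": read(cpu_dir / "cpufreq" / "scaling_governor"),
        "resume_latency_us": read(cpu_dir / "power" / "pm_qos_resume_latency_us"),
        "idle_states": {},
    }
    for state_dir in sorted(cpu_dir.glob("cpuidle/state*")):
        settings["idle_states"][state_dir.name] = {
            "name": read(state_dir / "name"),
            "disabled": read(state_dir / "disable") == "1",
        }
    return settings


if __name__ == "__main__":
    print(read_power_settings())
```

Writing to the same files, as the `echo` commands on the slide do, normally requires root privileges.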
  21. 21. Why audio skips
      ∎ audio burst in playback (top)
      ∎ fill-level of audio ring buffer (middle)
      ∎ RT app thread (bottom)
      ∎ big ring buffer → high latency!
      ∎ empty ring buffer → audible glitch!
      ∎ small ring buffer periodically refilled → low latency, glitch-free playback!
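
The relation between ring-buffer size, latency and glitches can be sketched numerically. The model below is a deliberate simplification: it assumes that the application tops the buffer up completely on every wake-up, so an underrun occurs whenever the time between two refills (one period plus any wake-up delay) exceeds what the buffer holds. The 48 kHz sample rate and the frame counts are illustrative assumptions, not figures from the talk.

```python
def buffer_latency_ms(frames, sample_rate_hz):
    """Playback latency contributed by a buffer holding `frames` audio frames."""
    return 1000.0 * frames / sample_rate_hz


def count_underruns(buffer_frames, period_frames, wakeup_delays_frames):
    """Count wake-ups that arrive after the hardware has drained the buffer.

    Each entry of `wakeup_delays_frames` is how late (in frames of playback
    time) the refilling thread ran with respect to its nominal period.
    """
    underruns = 0
    for delay in wakeup_delays_frames:
        consumed = period_frames + delay  # frames played since the last refill
        if consumed > buffer_frames:
            underruns += 1
    return underruns


if __name__ == "__main__":
    print(round(buffer_latency_ms(128, 48000), 2), "ms")
    delays = [0, 0, 200, 0]  # one late wake-up
    print(count_underruns(256, 128, delays), "underrun(s) with a small buffer")
    print(count_underruns(2048, 128, delays), "underrun(s) with a big buffer")
```

The big buffer absorbs the late wake-up at the price of higher latency, while the small one glitches unless the refilling thread is scheduled on time.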
  22. 22. Android audio architecture
      Android audio applications
      ∎ android.media APIs
        ◇ playing/recording audio files, Internet streaming
        ◇ use of large buffers (no low-latency use-cases)
        ◇ regular mixer thread
      Low-latency audio applications
      ∎ native APIs (OpenSL ES, AAudio)
        ◇ low-latency audio processing
        ◇ rely on FastMixer and ALSA
      ∎ critically low-latency
        ◇ exclusive mode in AAudio / ALSA
  24. 24. Android audio architecture
      Power management in Android
      ■ schedutil selects the minimum operating performance point (OPP) able to satisfy demand
      ■ based on CPU utilization statistics
        ◆ Per-Entity Load-Tracking (PELT)
          ■ exponentially weighted task utilization
          ■ slow to detect workload changes (ramp-up, cool-down); eg, it may take 50–100 ms to detect a 90% increase in CPU demand
        ◆ Window-Assisted Load-Tracking (WALT)
          ■ max{last window util., avg util. over past N windows}; eg, over the past 3 windows of 10 ms each, we have a 10 ms spike detection latency and a 30 ms cool-down latency
          ■ it quickly forgets a task’s demand when the task is off the runqueue (rq)
      ■ WALT is more reactive than PELT, but not enough for very dynamic workloads
      ■ can we improve on that?
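
The difference in reactivity between the two trackers can be illustrated with a small model. For PELT, the sketch assumes a plain exponential average with a 32 ms half-life (the default in mainline kernels, stated here as an assumption) and a task that suddenly starts running continuously. For WALT, it applies the max{last window, average} rule quoted on the slide. Function names are hypothetical and neither function reproduces the kernel's fixed-point arithmetic.

```python
import math

PELT_HALF_LIFE_MS = 32.0  # assumed default half-life of the PELT signal


def pelt_ramp_up_ms(fraction):
    """Time for an exponential average to cover `fraction` of a step in demand."""
    return PELT_HALF_LIFE_MS * math.log2(1.0 / (1.0 - fraction))


def walt_estimate(window_utils):
    """max{last window util., avg util. over the given past windows}."""
    return max(window_utils[-1], sum(window_utils) / len(window_utils))


if __name__ == "__main__":
    print(f"PELT covers 80% of a step after {pelt_ramp_up_ms(0.8):.0f} ms")
    print(f"PELT covers 90% of a step after {pelt_ramp_up_ms(0.9):.0f} ms")
    # Three 10 ms windows: the spike is visible as soon as its window closes...
    print(walt_estimate([0.1, 0.1, 0.9]))
    # ...and its effect fades only once it drops out of the 3-window history.
    print(walt_estimate([0.9, 0.1, 0.1]))
    print(walt_estimate([0.1, 0.1, 0.1]))
```

Under these assumptions the PELT ramp-up lands in the same range of tens of milliseconds as the 50–100 ms figure quoted on the slide, while WALT reacts within one window.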
  25. 25. SCHED_DEADLINE
      SCHED_DEADLINE from RETIS+Evidence (ACTORS EU project)
      ■ mainline since v3.14 (2014)
      ■ reservation-based scheduling
      ■ a task is reserved a given runtime within a deadline every period

        // Older glibc versions provide no sched_setattr() wrapper;
        // there, invoke it via syscall(SYS_sched_setattr, ...).
        struct sched_attr attr = {
            .size = sizeof(struct sched_attr),
            .sched_policy = SCHED_DEADLINE,
            .sched_flags = 0, // RECLAIM | RESET_ON_FORK
            // the kernel expects nanoseconds
            .sched_runtime = runtime_us * 1000,
            .sched_deadline = deadline_us * 1000,
            .sched_period = period_us * 1000
        };
        if (sched_setattr(0, &attr, 0) < 0) {
            perror("setattr() failed");
            exit(-1);
        }
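
The three reservation parameters are not independent: a runtime can only be granted within a deadline that is at least as long, and the deadline must fit within the period. The sketch below checks that ordering and performs the microsecond-to-nanosecond conversion used in the slide's code. It is a user-space sanity check under those assumptions, with hypothetical function names, and it does not replace the kernel's own admission test.

```python
def dl_params_ns(runtime_us, deadline_us, period_us):
    """Validate a reservation and convert it to the nanosecond fields."""
    if not 0 < runtime_us <= deadline_us <= period_us:
        raise ValueError("need 0 < runtime <= deadline <= period")
    return {
        "sched_runtime": runtime_us * 1000,
        "sched_deadline": deadline_us * 1000,
        "sched_period": period_us * 1000,
    }


def bandwidth(params):
    """Fraction of one CPU reserved for the task: runtime / period."""
    return params["sched_runtime"] / params["sched_period"]


if __name__ == "__main__":
    # Illustrative numbers: 0.5 ms of CPU time every 2.667 ms audio period.
    params = dl_params_ns(runtime_us=500, deadline_us=2667, period_us=2667)
    print(params, f"bandwidth={bandwidth(params):.3f}")
```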
  26. 26. SCHED_DEADLINE
      How does SCHED_DEADLINE relate to POSIX RT?
      ■ any SCHED_DEADLINE task runs before any POSIX RT or CFS task
        ◆ based on resource reservations (next slide)
        ◆ throttling safeguard to avoid locking the CPU (can be disabled if needed)
      ■ any POSIX RT (FIFO/RR) task runs before any CFS task
        ◆ based on priorities
        ◆ throttling safeguard to avoid locking the CPU (can be disabled if needed)
      ■ Completely Fair Scheduler (CFS) tasks run only when no SCHED_DEADLINE or RT task can
        ◆ based on weights (weighted fair scheduler)
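
The ordering among the three scheduling classes can be sketched as a two-level sort: first by class, then by the class's own criterion. This is a toy model with hypothetical task dictionaries; it ignores throttling and multi-core placement, and it stands in for CFS's weight-based fairness with a single `vruntime` number.

```python
CLASS_RANK = {"DEADLINE": 0, "RT": 1, "CFS": 2}


def pick_next(runnable):
    """Return the name of the task that would run next, or None if idle."""

    def sort_key(task):
        cls = task["class"]
        if cls == "DEADLINE":
            order = task["deadline"]    # earliest absolute deadline first
        elif cls == "RT":
            order = -task["priority"]   # highest priority first
        else:
            order = task["vruntime"]    # least-served task first
        return (CLASS_RANK[cls], order)

    if not runnable:
        return None
    return min(runnable, key=sort_key)["name"]


if __name__ == "__main__":
    tasks = [
        {"name": "ui", "class": "CFS", "vruntime": 10},
        {"name": "mixer", "class": "RT", "priority": 3},
        {"name": "synth", "class": "DEADLINE", "deadline": 2667},
    ]
    print(pick_next(tasks))       # the SCHED_DEADLINE task wins
    print(pick_next(tasks[:2]))   # then the POSIX RT task
    print(pick_next(tasks[:1]))   # CFS runs only when nothing else can
```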
  27. 27. SCHED_DEADLINE
      Main SCHED_DEADLINE properties
      ■ based on EDF (optimal on uni-processors) and the (Hard) Constant Bandwidth Server (CBS)
      ■ temporal isolation: a task’s inability to respect its runtime doesn’t affect others
      ■ on multi-processors: anything from G-EDF (tardiness bound) to P-EDF
      When trying to exceed the runtime
      ■ the task gets throttled (original)
      ■ it opportunistically gets extra runtime (GRUB), if RECLAIM is used
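
Two of these properties lend themselves to a short numerical sketch: the uni-processor EDF admission condition (total reserved bandwidth not above 1) and hard-CBS throttling, where a task never executes for more than its runtime in any period and excess demand is deferred. The per-period model below is a simplification with hypothetical function names; it does not model deadlines postponement or GRUB reclaiming.

```python
def edf_admissible(reservations, cap=1.0):
    """Uni-processor EDF test: sum of runtime/period must not exceed `cap`."""
    return sum(runtime / period for runtime, period in reservations) <= cap


def hard_cbs_served(demands, runtime):
    """Execution time granted in each period when demand may exceed the budget.

    `demands[i]` is the work arriving in period i; anything beyond `runtime`
    is throttled and carried over as backlog to the following periods.
    """
    served, backlog = [], 0
    for demand in demands:
        backlog += demand
        run = min(backlog, runtime)
        served.append(run)
        backlog -= run
    return served


if __name__ == "__main__":
    print(edf_admissible([(1, 4), (2, 8), (1, 2)]))   # exactly 100% reserved
    print(edf_admissible([(3, 4), (1, 2)]))           # 125%: rejected
    print(hard_cbs_served([2, 9, 1, 0], runtime=4))   # overrun spread over time
```

Because the served time per period is capped at the runtime, an overrunning task delays only itself, which is the temporal isolation property named on the slide.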
  30. 30. SCHED_DEADLINE and schedutil
      The OPP chosen by schedutil depends on the overall system utilization, which includes:
      ■ SCHED_DEADLINE tasks’ utilization: runtime / period
      Dynamic workload demand changes via sched_setattr():
      ■ readily accounted for by schedutil
      Does it work? Results on a HiKey 960 board:
      ■ energy-efficient set-up: glitch-free playback at 2.67 ms latency, vs 26.67 ms for mainline Android using SCHED_FIFO and WALT, at the cost of +6.25% power consumption
      ■ low-latency set-up: at 2.67 ms latency, saved 40% energy wrt mainline Android using SCHED_FIFO and WALT
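
The link between reservations and frequency selection can be sketched as follows: the reserved bandwidth is known the moment `sched_setattr()` succeeds, so the lowest OPP able to sustain it can be chosen without waiting for a utilization tracker to ramp up. The OPP table and function names below are hypothetical, utilization is expressed relative to the maximum frequency, and the headroom and rate limiting applied by the real schedutil governor are left out.

```python
def dl_utilization(reservations):
    """Total SCHED_DEADLINE utilization: sum of runtime / period."""
    return sum(runtime / period for runtime, period in reservations)


def min_opp_khz(opp_freqs_khz, utilization):
    """Lowest frequency whose relative speed covers the given utilization."""
    f_max = max(opp_freqs_khz)
    for freq in sorted(opp_freqs_khz):
        if freq / f_max >= utilization:
            return freq
    return f_max  # demand above 100%: run as fast as possible


if __name__ == "__main__":
    opps = [500_000, 1_000_000, 1_500_000, 2_000_000]  # hypothetical table
    util = dl_utilization([(500, 2667)])  # 0.5 ms every 2.667 ms
    print(f"util={util:.3f} -> {min_opp_khz(opps, util)} kHz")
    print(min_opp_khz(opps, 0.6), "kHz for 60% utilization")
```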
  31. 31. Heterogeneous Architectures
      ARM big.LITTLE (and DynamIQ) architectures
      ■ tasks can migrate among big and LITTLE cores (same ISA)
      ■ big cores: high-performance workloads
      ■ LITTLE cores: energy-efficient workloads
      ARM Energy Aware Scheduling (EAS)
      ■ give kernel awareness of the CPU capacity associated with big and LITTLE cores
      ■ give kernel clues as to how capacity of big and LITTLE cores scales with CPU frequency
      ■ provide CFS with more informed task placement and migration decisions
  32. 32. Capacity enhancement patches
      SCHED_DEADLINE improvements to account for CPU capacity
      ■ runtime is specified in terms of the fastest CPU at the fastest frequency
        ◆ it gets automatically rescaled using the CPU capacity figures
      ■ if there’s a choice, prefer LITTLE cores before going to big ones
      ■ proper consideration of CPU capacity in schedutil
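
The rescaling idea can be written down as simple arithmetic: a runtime budget expressed for the fastest CPU at its fastest frequency takes proportionally longer on a CPU with lower capacity or running at a lower frequency. The sketch below assumes the kernel's convention of 1024 as the capacity of the fastest CPU; the function name and the example figures are hypothetical, and this is not the code of the patches themselves.

```python
SCHED_CAPACITY_SCALE = 1024  # capacity of the fastest CPU at its top frequency


def scaled_runtime_us(runtime_us, cpu_capacity, freq_khz, max_freq_khz):
    """Wall-clock time needed to deliver `runtime_us` of reference-speed work.

    `cpu_capacity` is the CPU's capacity at its own maximum frequency
    `max_freq_khz`; `freq_khz` is the frequency it is currently running at.
    """
    speed = (cpu_capacity / SCHED_CAPACITY_SCALE) * (freq_khz / max_freq_khz)
    return runtime_us / speed


if __name__ == "__main__":
    # 1 ms budget on a big core at full speed: unchanged.
    print(scaled_runtime_us(1000, 1024, 2_000_000, 2_000_000))
    # Same budget on a half-capacity LITTLE core at half its frequency: 4x.
    print(scaled_runtime_us(1000, 512, 1_000_000, 2_000_000))
```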
  33. 33. Related publications
      ∎ A. Balsini, Towards Hard and Soft Real-time Operating Systems for Multicore Heterogeneous Architectures, PhD dissertation, 2018
      ∎ A. Balsini et al., Modeling and simulation of power consumption and execution times for real-time tasks on embedded heterogeneous architectures, EWILI 2018
      ∎ T. Cucinotta et al., Improving Responsiveness of Time-Sensitive Applications by Exploiting Dynamic Task Dependencies, Wiley SPE 2018
      ∎ C. Scordino et al., Energy-aware real-time scheduling in the Linux kernel, ACM SAC 2018
      ∎ D. B. de Oliveira et al., Nested Locks in the Lock Implementation: The Real-Time Read-Write Semaphores on Linux, RTSOPS 2018
      ∎ M. Marinoni et al., Allocation and control of computing resources for real-time Virtual Network Functions, SOFTNETWORKING 2018
      ∎ T. Cucinotta et al., Adaptive Real-Time Scheduling for Legacy Multimedia Applications, ACM TECS 2012
      ∎ J. Lelli et al., An Experimental Comparison of Different Real-Time Schedulers on Multicore Systems, Elsevier JSS 2012
      ∎ T. Cucinotta et al., Virtualised e-Learning on the IRMOS Real-time Cloud, Springer SOCA’12
      ∎ T. Cucinotta et al., A robust mechanism for adaptive scheduling of multimedia applications, ACM TECS 2011
      ∎ T. Cucinotta et al., Low-Latency Audio on Linux by Means of Real-Time Scheduling, LAC’11
      ∎ T. Cucinotta et al., Virtualised e-Learning with Real-Time Guarantees on the IRMOS Platform, IEEE SOCA 2010
      ∎ T. Cucinotta and L. Palopoli, QoS Control for Pipelines of Tasks Using Multiple Resources, IEEE TOC 2010
      ∎ L. Palopoli et al., AQuoSA - Adaptive Quality of Service Architecture, Wiley SPE 2008
      ∎ L. Abeni et al., QoS Management through adaptive reservations, Springer RTSJ 2005
  34. 34. Q&A
      Thanks for listening! Questions?
      http://retis.santannapisa.it/~tommaso
      tommaso.cucinotta@santannapisa.it
