This document presents GCMA, a Guaranteed Contiguous Memory Allocator that improves upon the current Contiguous Memory Allocator (CMA) solution in Linux. CMA can have unpredictable latency and even fail when allocating contiguous memory, especially under memory pressure or with background workloads. GCMA guarantees fast latency for contiguous memory allocation, success of allocation, and reasonable memory utilization by using discardable memory as its secondary client instead of movable pages. Experimental results on a Raspberry Pi 2 show that GCMA has significantly faster allocation latency than CMA, keeps camera latency fast even with background workloads, and can improve overall system performance compared to CMA.
2. This work by SeongJae Park is licensed under the Creative Commons
Attribution-ShareAlike 3.0 Unported License. To view a copy of this
license, visit http://creativecommons.org/licenses/by-sa/3.0/.
3. These slides were presented during
EWiLi 2015 at Amsterdam
(http://syst.univ-brest.fr/ewili2015/)
4. TL;DR
● Contiguous memory allocation is a big problem for embedded systems
● Current solutions have weaknesses:
○ Too expensive for embedded systems, or
○ Wasteful of memory space, or
○ Too slow, and may even fail to allocate
● GCMA, our new solution, guarantees:
○ Fast latency for contiguous memory allocation
○ Success of contiguous memory allocation
○ and Reasonable memory utilization
● That’s it! ;)
5. Who Wants Physically Contiguous Memory?
● Virtual Memory avoids the fragmentation problem
○ Applications use virtual addresses; a virtual address can be mapped to any physical address
○ The MMU maps virtual addresses to physical addresses
● Linux builds on this with demand paging and page reclaim
Physical
Memory
Application MMUCPU
Logical address Physical address
6. We Need Physically Contiguous Memory!
● Devices using DMA
○ Many devices need large, contiguous memory for internal buffers
(e.g., a 12-megapixel camera needs a large, contiguous buffer)
○ Because the MMU sits behind the CPU, devices using DMA cannot use the virtual address space
○ Slow or failed allocation can harm user experience, or even lives
(What if your camera does not react while a kitty goes away?)
[Diagram: the device accesses physical addresses directly via DMA, bypassing the MMU]
7. We Need Physically Contiguous Memory!
● Devices using DMA
○ Lots of devices need large, contiguous memory for internal buffer
(e.g., 12 Mega-Pixel Camera needs large, contiguous buffer)
○ Because MMU is behind the CPU, devices using DMA can not use virtual address space
○ Slow latency or failure of allocation can harm user experience or life
(What if your camera does not react while a kitty go away)
● Huge pages, system-internal buffers
○ Huge pages can enhance system performance by reducing TLB misses
○ The system needs contiguous memory for internal usage,
e.g., struct page *alloc_pages(unsigned int flags, unsigned int order);
NOTE: this research focuses on the device case for embedded systems, though
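The in-kernel case above boils down to asking the buddy allocator for an order-N block. A minimal, hypothetical driver-side sketch (not code from this work; `grab_buffer` and the order value are made up for illustration):

```c
#include <linux/gfp.h>
#include <linux/mm.h>

/* Sketch: request 2^4 = 16 physically contiguous pages
 * (64 KiB with 4 KiB pages) from the buddy allocator.
 * Under fragmentation this can take long or fail -- the
 * very problem these slides discuss. */
static struct page *grab_buffer(void)
{
	return alloc_pages(GFP_KERNEL, 4); /* NULL if no order-4 run */
}

static void release_buffer(struct page *pages)
{
	__free_pages(pages, 4);
}
```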
8. H/W Solutions: IOMMU, Scatter / Gather DMA
● An IOMMU serves as an MMU for devices
● Scatter / gather DMA lets a device DMA to discontiguous memory areas
● Both consume more power and money because they require additional H/W
● Can be too expensive for low-end embedded devices
● Useless for huge pages or the kernel's own physically contiguous allocations
[Diagram: the device issues device addresses; the IOMMU translates them to physical addresses, just as the MMU does for the CPU's logical addresses]
9. Reserved Area Technique
● Reserve a sufficient amount of contiguous area at boot time and
let only contiguous memory allocations use the area
● Pros: simple and effective for contiguous memory allocation
● Cons: system memory utilization can be low if contiguous
memory allocations do not fully use the reserved area
○ Sacrifice 10% of your phone's memory for a kitty you may or may not meet?
● Widely adopted despite the utilization problem
[Diagram: normal allocations use system memory, narrowed by the reservation; contiguous allocations use the reserved area]
10. CMA: Contiguous Memory Allocator
● Software-based solution in mainline Linux
● A generously extended version of the Reserved area technique
○ Mainly focuses on memory utilization
○ Lets movable pages use the reserved area
○ If a contiguous allocation requires pages in use as movable pages,
those pages are moved out of the reserved area and the vacated area goes to the contiguous allocation
○ Solves the memory utilization problem because most pages are movable
● In other words, CMA gives different priority to clients of the reserved area
○ Primary client: contiguous memory allocation
○ Secondary client: movable page allocation
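From a driver's point of view, the primary client path is the usual DMA buffer allocation; on a CMA-enabled kernel it is backed by the CMA area. A hypothetical sketch (the function name and parameters are made up; `dma_alloc_coherent()` is the standard kernel API):

```c
#include <linux/dma-mapping.h>

/* Sketch: how a camera driver might obtain a large DMA buffer.
 * On a CMA kernel this may first migrate movable pages out of
 * the reserved area -- the source of the long, unpredictable
 * latency shown in the next slides. */
static void *camera_alloc_buf(struct device *dev, size_t size,
			      dma_addr_t *dma)
{
	return dma_alloc_coherent(dev, size, dma, GFP_KERNEL);
}
```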
11. CMA: in Real-land
● Worst-case latency for a photo shot using the Raspberry Pi 2 camera under a
background workload is:
● 1.6 seconds when configured with the Reserved area technique,
● 9.8 seconds when configured with CMA; enough time for a kitty to go away
12. The Phantom Menace Unveiled: Latency of CMA
● Latency of CMA during a photo shot using the Raspberry Pi 2 camera under a
background workload
● It is clear that the camera's latency is dominated by CMA allocation latency
13. CMA: Why So Slow?
● In short, the secondary client of CMA is not as nice as expected
○ Here, 'nice' is meant in the sense of the Unix command 'nice'
● Moving a page is an expensive task (copying, rmap updates, LRU churn, …)
● If someone is holding a page, it cannot be moved until the holder releases
it (e.g., via get_user_pages())
● Result: unexpectedly long latency, and even allocation failures
● That is why CMA has not been adopted by many devices
● Raspberry Pi does not officially support CMA because of this problem
14. GCMA: Guaranteed Contiguous Memory Allocator
● GCMA is a variant of CMA that guarantees
○ Fast latency and success of allocation
○ While keeping memory utilization reasonable
● The main idea is simple
○ Follow the primary / secondary client idea of CMA to keep memory utilization
○ Unlike CMA, accept only a nice secondary client
■ For fast latency, it should be able to vacate the reserved area as soon as required, without
any extra work such as moving content
■ For success, it should be out of kernel control
■ For memory utilization, most pages should be eligible to be the secondary client
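The "nice secondary client" idea can be shown with a toy user-space model (all names here are hypothetical; the real GCMA operates on page frames in the kernel). Secondary pages occupy reserved frames only as a cache, so a contiguous request can drop them instantly instead of migrating them:

```c
#include <stddef.h>

/* Toy model of GCMA's reserved area.  Each frame is FREE or held by a
 * discardable secondary client: a cached page whose contents already
 * live in the swap device or in storage. */
#define NFRAMES 16

enum frame_state { FRAME_FREE, FRAME_SECONDARY };

static enum frame_state frames[NFRAMES];

/* Secondary client: cache a page in any free frame (best effort).
 * Returns the frame index, or -1 if the area is full (the page is
 * then simply not cached -- no harm done). */
static int dmem_put(void)
{
	for (int i = 0; i < NFRAMES; i++) {
		if (frames[i] == FRAME_FREE) {
			frames[i] = FRAME_SECONDARY;
			return i;
		}
	}
	return -1;
}

/* Primary client: claim `count` contiguous frames starting at 0.
 * Secondary frames are discarded on the spot -- no copying, no rmap
 * work -- so the latency is bounded and success is guaranteed. */
static int gcma_alloc(int count, int *discarded)
{
	if (count < 0 || count > NFRAMES)
		return -1;
	*discarded = 0;
	for (int i = 0; i < count; i++) {
		if (frames[i] == FRAME_SECONDARY)
			(*discarded)++;
		frames[i] = FRAME_FREE; /* instant discard */
	}
	return 0; /* frames [0, count) now form the contiguous block */
}
```

The contrast with CMA is in `gcma_alloc()`: a movable page would have to be copied elsewhere (and could be pinned), while a discardable page is freed by a single state change.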
15. Secondary Client of GCMA
● Last chance cache for
○ pages being swapped out and
○ clean pages being evicted from file cache
● Pages being swapped out to a write-through swap device
○ Can be discarded immediately because their contents are written through to the swap device
○ The kernel thinks they are already swapped out
○ Most anonymous pages can be handled this way
● Clean pages being evicted from the file cache
○ Can be discarded because their contents are already in storage
○ The kernel thinks they are already evicted
○ Many file-backed pages fall into this case
● Additional pro 1: such pages are already expected to not be accessed again soon
○ Discarding secondary client pages from GCMA will not affect system performance much
● Additional pro 2: they appear only under heavy workloads
○ Under light load, zero overhead at all
16. GCMA: Workflow
● Reserve a memory area at boot time
● If a page is swapped out or evicted from the file cache, keep the page in the
reserved area
○ If the system needs the page again, give it back from the reserved area
○ If a contiguous memory allocation requires the area occupied by those pages, discard the
pages and use the area for the contiguous allocation
[Diagram: pages reclaimed or swapped out of system memory are cached in the GCMA reserved area alongside the swap device and files, and are given back on a hit; contiguous allocations take the reserved area directly]
17. GCMA Implementation
● Used the Cleancache / Frontswap interfaces for the secondary client of GCMA
rather than reinventing the wheel
● DMEM: Discardable Memory
○ Works as a last chance cache for secondary client pages, using the GCMA reserved area
○ Manages its index table with a hash table and an RB-tree
○ Evicts pages with an LRU scheme
[Diagram: the Cleancache and Frontswap hooks feed the Dmem interface and Dmem logic; GCMA logic and the CMA interface / logic share the reserved area]
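On the swap side, hooking into Frontswap amounts to registering a set of callbacks. A sketch of what that registration looks like (signatures as in Linux v3.18; the `dmem_*` names are illustrative, not the actual GCMA symbols, and the callback bodies are omitted):

```c
#include <linux/frontswap.h>

static void dmem_init(unsigned type);
static int dmem_store(unsigned type, pgoff_t offset, struct page *page);
static int dmem_load(unsigned type, pgoff_t offset, struct page *page);
static void dmem_invalidate_page(unsigned type, pgoff_t offset);
static void dmem_invalidate_area(unsigned type);

static struct frontswap_ops dmem_frontswap_ops = {
	.init = dmem_init,
	.store = dmem_store,		/* cache page in the reserved area */
	.load = dmem_load,		/* hand it back on a swap-in hit */
	.invalidate_page = dmem_invalidate_page,
	.invalidate_area = dmem_invalidate_area,
};

static int __init dmem_setup(void)
{
	frontswap_register_ops(&dmem_frontswap_ops);
	return 0;
}
```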
18. GCMA Implementation: in Short
● Implemented on Linux v3.18
● About 1,500 lines of code
● Ported, evaluated on CuBox and Raspberry Pi 2
● Available under GPL v3 at:
https://github.com/sjp38/linux.gcma/releases/tag/gcma/rfc/v2
19. GCMA Optimization for Write-through Swap
● A write-through swap device could consume I/O resources
● GCMA recommends a compressed in-memory block device as the swap device
○ Zram is a compressed in-memory block device in upstream Linux
○ Google also recommends zram as the swap device for low-memory Android devices
20. Experimental Setup
● Device setup
○ Raspberry Pi 2
○ ARM cortex-A7 900 MHz * 4 cores
○ 1 GiB LPDDR2 SDRAM
○ Class 10 SanDisk 16 GiB microSD card
● Configurations
○ Baseline: Linux rpi-v3.18.11 + 100 MiB swap +
256 MiB reserved area
○ CMA: Linux rpi-v3.18.11 + 100 MiB swap +
256 MiB CMA area
○ GCMA: Linux rpi-v3.18.11 + 100 MiB Zram swap +
256 MiB GCMA area
● Workloads
○ Contig Mem Alloc: contiguous memory allocation using CMA or GCMA
○ Blogbench: a realistic file system benchmark
○ Camera shot: repeated photo shots using the Raspberry Pi 2 camera module, with a
10-second interval between shots
21. Contig Mem Alloc without Background Workload
● Average of 30 allocations
● GCMA shows 14.89x to 129.41x faster latency than CMA
● CMA even failed once for a 32,768-page contiguous allocation, even though
there was no background task
22. 4MiB Contig Mem Alloc w/o Background Workload
● Latency of GCMA is more predictable than that of CMA
23. 4MiB Contig Mem Allocation w/ Background Task
● Background workload: Blogbench
● CMA takes 5 seconds (unacceptable!) for one allocation in the worst case
24. Camera Latency w/ Background Task
● Camera is a realistic, important workload for the Raspberry Pi 2
● Background task: Blogbench
● GCMA keeps latency as fast as Reserved area technique configuration
25. Blogbench on CMA / GCMA
● Scores are normalized to those of the Baseline configuration
● '/ cam' means the camera workload runs in the background
● The system under GCMA even outperforms the one under CMA with a realistic
workload, owing to GCMA's light overhead
● Allocation using CMA can even degrade system performance because it must
migrate pages out of the reserved area
26. Conclusion
● Contiguous memory allocation is still a big problem for embedded devices
○ H/W solutions are too expensive
○ S/W solutions like CMA are not effective
○ Most systems use the Reserved area technique despite its low memory utilization
● GCMA guarantees fast latency, success of allocation, and reasonable memory
utilization
○ Allocation latency of GCMA is as fast as that of the Reserved area technique
○ System performance under GCMA is similar to or outperforms that under CMA