Preemptable ticket spinlocks: improving consolidated performance in the cloud
Jiannan Ouyang, PhD
These slides were presented at the 9th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE '13).
When executing inside a virtual machine environment, OS-level synchronization primitives face significant challenges due to the scheduling behavior of the underlying virtual machine monitor (VMM). Operations that are guaranteed to last only a short time on real hardware can take considerably longer when running virtualized. This change in assumptions has a significant impact when an OS is executing inside a critical region that is protected by a spinlock. The interaction between OS-level spinlocks and VMM scheduling is known as the Lock Holder Preemption problem and has a significant impact on overall VM performance. However, with the use of ticket locks instead of generic spinlocks, virtual environments must also contend with waiters being preempted before they are able to acquire the lock. This has the effect of blocking access to a lock, even if the lock itself is available. We identify this scenario as the Lock Waiter Preemption problem. In order to solve both problems we introduce Preemptable Ticket spinlocks, a new locking primitive that is designed to enable a VM to always make forward progress by relaxing the ordering guarantees offered by ticket locks. We show that the use of Preemptable Ticket spinlocks improves VM performance by 5.32X on average when running on a non-paravirtual VMM, and by 7.91X when running on a VMM that supports a paravirtual locking interface, when executing a set of microbenchmarks as well as a realistic e-commerce benchmark.
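The Lock Waiter Preemption scenario described above can be made concrete with a toy model. The sketch below is not the authors' implementation: the abstract does not say how ordering is relaxed, so the "skip preempted waiters" policy, the `Waiter` class, and both function names are assumptions chosen only to illustrate why strict FIFO ordering can stall a free lock.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Waiter:
    ticket: int     # position in the ticket queue
    running: bool   # False models a vCPU that the VMM has descheduled


def next_owner_strict(waiters: List[Waiter], now_serving: int) -> Optional[int]:
    """Strict FIFO ticket lock: only the waiter holding `now_serving` may enter."""
    for w in waiters:
        if w.ticket == now_serving:
            return w.ticket if w.running else None
    return None


def next_owner_relaxed(waiters: List[Waiter], now_serving: int) -> Optional[int]:
    """Relaxed ordering (assumed policy): the lowest-ticket running waiter enters."""
    runnable = [w.ticket for w in waiters if w.running and w.ticket >= now_serving]
    return min(runnable) if runnable else None


# The lock is free and ticket 3 is next, but its vCPU has been preempted.
waiters = [Waiter(3, running=False), Waiter(4, running=True), Waiter(5, running=True)]
strict_owner = next_owner_strict(waiters, now_serving=3)
relaxed_owner = next_owner_relaxed(waiters, now_serving=3)
print(strict_owner, relaxed_owner)
```

In this model the strict lock makes no progress (`None`) although the lock is available, while the relaxed policy hands the lock to ticket 4.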
Mingbo Zhang, Rutgers University
Saman Zonouz, Rutgers University
Time-of-check-to-time-of-use (TOCTOU), also known as “race condition” or “double fetch”, is a long-standing problem. Because memory reads and writes are such common operations, they rarely trigger any security mechanism. We leverage a CPU feature called SMAP (Supervisor Mode Access Prevention) to efficiently monitor kernel accesses to user-mode memory. When user pages are being accessed by the kernel, our mitigation kicks in and protects them against further modification by other user-mode threads. We also leverage the same CPU feature to find double-fetch errors in kernel modules. A simple hypervisor is used to confine a system-wide CPU feature such as SMAP to a particular process.
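A double fetch is easiest to see in a small model. The sketch below is purely illustrative and involves no real kernel or SMAP code: `UserMemory`, `copy_double_fetch`, and `copy_single_fetch` are hypothetical names, and the "concurrent user thread" is simulated by a buffer that returns a different value on its second read.

```python
from typing import List


class UserMemory:
    """Models a user-mode length field that another thread rewrites between reads."""

    def __init__(self, values: List[int]) -> None:
        self._values = list(values)
        self.fetches = 0

    def read(self) -> int:
        # Each call models one kernel fetch from user memory.
        index = min(self.fetches, len(self._values) - 1)
        self.fetches += 1
        return self._values[index]


def copy_double_fetch(mem: UserMemory, capacity: int) -> int:
    """Vulnerable pattern: the value is fetched once to check and again to use."""
    if mem.read() > capacity:      # time of check
        raise ValueError("length exceeds capacity")
    return mem.read()              # time of use: may differ from the checked value


def copy_single_fetch(mem: UserMemory, capacity: int) -> int:
    """Safe pattern: fetch once into a local copy, then check and use that copy."""
    length = mem.read()
    if length > capacity:
        raise ValueError("length exceeds capacity")
    return length


unsafe_len = copy_double_fetch(UserMemory([8, 4096]), capacity=16)
safe_len = copy_single_fetch(UserMemory([8, 4096]), capacity=16)
print(unsafe_len, safe_len)
```

The vulnerable variant passes the check with 8 but then uses 4096, which exceeds the capacity of 16. The single-fetch variant only ever sees 8.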
Sangam 18 - Database Development: Return of the SQL Jedi
Connor McDonald
A look at the techniques that middle tier developers can employ to get greater value out of their applications, simply by having an understanding of how the database works and how to make it sing.
Talk held at DevOps Gathering 2019 in Bochum on 2019-03-13.
Abstract: This talk will address one of the most common challenges of organizations adopting Kubernetes on a medium to large scale: how to keep cloud costs under control without babysitting each and every deployment and cluster configuration? How to operate 80+ Kubernetes clusters in a cost-efficient way for 200+ autonomous development teams?
This talk provides insights on how Zalando approaches this problem with central cost optimizations (e.g. Spot), cost monitoring/alerting, active measures to reduce resource slack, and automated cluster housekeeping. We will focus on how to ingrain cost efficiency in tooling and developer workflows while balancing rigid cost control with developer convenience and without impacting availability or performance. We will show our use case running Kubernetes on AWS, but all shown tools are open source and can be applied to most other infrastructure environments.
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latency
Henning Jacobs
Talk given at JAX DevOps London on 2019-05-15.
Kubernetes has the concept of resource requests and limits. Pods get scheduled on the nodes based on their requests and are optionally limited in how much of a resource they can consume. Understanding and optimizing resource requests/limits is crucial both for reducing resource "slack" and for ensuring application performance and low latency. This talk shows our approach to monitoring and optimizing Kubernetes resources for 90+ clusters to achieve cost-efficiency and reduce the impact on latency-critical applications. All shown tools are open source and can be applied to most Kubernetes deployments. Topics covered in the talk include: understanding resource requests and limits, cgroups and CFS quota behavior, contributing factors to cluster costs (in public clouds), and best practices for managing Kubernetes resources.
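The relation between a CPU limit, the CFS quota, and resource slack can be sketched with a little arithmetic. The snippet below is a simplified model rather than anything shown in the talk: it assumes the default CFS period of 100 ms, treats throttling as an idealized "quota divided by busy threads" calculation, and all function names are hypothetical.

```python
CFS_PERIOD_US = 100_000  # assumed default CFS period (100 ms)


def cfs_quota_us(cpu_limit_millicores: int, period_us: int = CFS_PERIOD_US) -> int:
    """CPU time (microseconds) a container may consume per CFS period."""
    return cpu_limit_millicores * period_us // 1000


def wall_time_until_throttled_us(quota_us: int, busy_threads: int) -> float:
    """Idealized model: N fully busy threads burn the quota N times faster."""
    return quota_us / busy_threads


def cpu_slack_millicores(request_millicores: int, usage_millicores: int) -> int:
    """Slack: requested CPU that is reserved on the node but not actually used."""
    return max(request_millicores - usage_millicores, 0)


quota = cfs_quota_us(500)                          # limit of 500m
burst = wall_time_until_throttled_us(quota, 4)     # four busy threads
slack = cpu_slack_millicores(1000, 250)            # request 1000m, usage 250m
print(quota, burst, slack)
```

Under these assumptions a 500m limit yields 50 ms of CPU time per 100 ms period, which four busy threads exhaust after 12.5 ms of wall time. This is one reason CPU limits can hurt latency-critical applications even when average utilization is low.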
The research work that I describe in this dissertation is concerned with
the problem of shared-memory synchronization in large-scale
programs.
The difficulties of developing fine-grained lock-based synchronization
are well-known and many researchers have argued for the need of
alternative approaches.
Simply put, the main goal of my work is to provide an efficient
alternative to lock-based synchronization.
My proposal is based on Software Transactional Memory
(STM) and I implemented it in a well-known STM framework for
Java, Deuce STM.
To that end I propose a new approach that significantly lowers the
overhead caused by an STM in large-scale programs for which only a
small fraction of the memory is under contention. My solution
combines two novel optimization techniques in a synergistic way,
allowing us to get, for the first time, performance with an STM that
rivals the performance of the best lock-based approaches in some of
the more challenging benchmarks. My approach and experimental
results show that STMs may be the first efficient alternative to locks
for shared-memory synchronization in real-world-sized applications.
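For readers unfamiliar with STM, the following minimal sketch shows the general optimistic read, validate, and commit cycle. It is not Deuce STM and does not reproduce the dissertation's two optimization techniques, which the abstract does not describe. `TVar`, `Transaction`, and `atomically` are hypothetical names, and a single global commit lock is assumed to keep the example short.

```python
import threading


class TVar:
    """A shared cell whose version is bumped on every committed write."""

    def __init__(self, value):
        self.value = value
        self.version = 0


class Conflict(Exception):
    """Raised when a value read by the transaction changed before commit."""


_commit_lock = threading.Lock()  # assumption: one global lock serializes commits


class Transaction:
    def __init__(self):
        self.reads = {}    # TVar -> version observed at first read
        self.writes = {}   # TVar -> buffered new value

    def read(self, tvar):
        if tvar in self.writes:
            return self.writes[tvar]
        self.reads.setdefault(tvar, tvar.version)
        return tvar.value

    def write(self, tvar, value):
        self.writes[tvar] = value  # buffered; invisible to others until commit

    def commit(self):
        with _commit_lock:
            # Validate: abort if anything we read was committed over meanwhile.
            for tvar, seen in self.reads.items():
                if tvar.version != seen:
                    raise Conflict()
            for tvar, value in self.writes.items():
                tvar.value = value
                tvar.version += 1


def atomically(fn):
    """Run fn(tx) and retry from scratch whenever validation fails."""
    while True:
        tx = Transaction()
        try:
            result = fn(tx)
            tx.commit()
            return result
        except Conflict:
            continue


a, b = TVar(1000), TVar(0)


def transfer(tx):
    tx.write(a, tx.read(a) - 1)
    tx.write(b, tx.read(b) + 1)


def worker():
    for _ in range(50):
        atomically(transfer)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(a.value, b.value)
```

Every read and write goes through bookkeeping, which is the kind of overhead the dissertation aims to lower when only a small fraction of memory is actually contended.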
Devoxx Fr 2022 - Remèdes aux oomkill, warm-ups, et lenteurs pour des conteneu...
Jean-Philippe BEMPEL
My JVM containers are in production: oops, they get OOM-killed; oops, startup drags on; oops, they are slow all the time. We have lived through these situations.
These problems arise because a container is by nature a constrained environment. Its configuration affects the Java process, yet that process also has needs of its own in order to run.
There is a gap between the Java heap and the RSS: this is off-heap memory, and it is divided into several areas. What are they for? How should they be taken into account?
CPU configuration affects the JVM in several ways: how do the GC and the CPU influence each other? Which should you favor at startup, speed or CPU consumption?
During this session we will see how to diagnose, understand, and remedy these problems.
Artificial intelligence (AI) has already been attracting the attention of deep tech investors for some years. The reasons why are clear. In its ‘Sizing The Prize’ analysis of artificial intelligence (AI), PwC forecast that AI will contribute $15.7 trillion to the global economy by 2030, with the ‘AI boost’ available to most national economies being approximately 26%. But what investors often overlook is that AI is not singular. Many individual components must work together to create AI.
At its core artificial intelligence consists essentially of detecting statistical patterns in signals with many dimensions, such as analysis of audio frequencies (voice recognition) or high-resolution images (face recognition). The repetition of this search in order to detect these patterns is the basis of artificial intelligence.
There are usually three components to AI:
First, given a data set, learning what the patterns are.
Second, building a model that can detect these patterns.
Third, model deployment to the target environment.
Traditionally, data mining or learning was done by experts in the matter who would develop some sort of classifier or detector based on certain features, and then try to see their correlations. This process was tedious and time consuming.
https://klepsydra.com/cityam-ai-on-the-edge/
On the way to low latency (2nd edition)
Artem Orobets
This is the second edition of the story about how we struggled to meet strict latency requirements in a service implemented in Java, and how we managed to do it.
The most common latency contributors are in-process locking, thread scheduling, I/O, algorithmic inefficiencies and, of course, the garbage collector.
I will share our experience of dealing with these causes and explain what you can do to prevent them from affecting production.
Aggregating Ad Events with Kafka Streams and Interactive Queries at Invidi
HostedbyConfluent
"Invidi ad decisioning engine needs semi-realtime feedback on the performance of the ad campaigns it runs.
At the heart of this feedback loop there is a service that aggregates 1B+ daily ad tracking events and serves campaign performance time series to the ad decisioning engine over HTTP. Recently we successfully rewrote it as a pure Kafka Streams application with all data being stored in Kafka and served via Interactive Queries.
The experience was surprisingly not straightforward and we had to trade off some of the simplicity of our processing topology to increase scalability and lower resource consumption.
In this talk we plan to go over the system architecture and share the issues we faced and how we solved them.
Here are some highlights:
- The distribution of our aggregation keys is very skewed, so early repartitioning resulted in poor scalability. To mitigate this issue we used a scatter-gather approach avoiding repartitioning and combining results in IQ. To minimize memory consumption we had to combine the above approach with pre-aggregating events before re-partitioning in a lambda-architecture style.
- Due to multiple stores sharing buffer memory we had to resort to manually deleting entries in our live windowed stores to avoid premature flushing of the aggregates due to cache thrashing.
- We had to implement our own in-memory windowed store to increase IQ performance.
We hope that our findings will be helpful to a wider audience.
We also plan to file and fix the issues we discovered in the near future."
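The scatter-gather idea from the first highlight can be sketched in plain Python. This is a conceptual model only: it uses no Kafka Streams or Interactive Queries API, and the key shape, `pre_aggregate`, and `gather` are hypothetical names introduced for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, int]  # (campaign id, window start in seconds): assumed key shape


def pre_aggregate(events: Iterable[Key]) -> Counter:
    """Scatter side: each instance counts only the events it happened to receive."""
    return Counter(events)


def gather(partials: List[Counter], campaign: str) -> Dict[int, int]:
    """Gather side: sum one campaign's partial counts across all instances."""
    series: Dict[int, int] = {}
    for partial in partials:
        for (cid, window), count in partial.items():
            if cid == campaign:
                series[window] = series.get(window, 0) + count
    return dict(sorted(series.items()))


# Events for the hot key "c1" are spread over two instances, not repartitioned.
instance_a = pre_aggregate([("c1", 0), ("c1", 0), ("c2", 0)])
instance_b = pre_aggregate([("c1", 0), ("c1", 60)])
c1_series = gather([instance_a, instance_b], "c1")
print(c1_series)
```

Because a skewed key is never routed to a single owner, no instance becomes a hotspot. The cost is that every query must fan out to all instances and merge their partial results.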
Static partitioning is used to split an embedded system into multiple domains, each of them having access only to a portion of the hardware on the SoC. It is key to enable mixed-criticality scenarios, where a critical application, often based on a small RTOS, runs alongside a larger non-critical app, typically based on Linux. The two domains cannot interfere with each other.
This talk will explain how to use Xen for static partitioning. It will introduce dom0-less, a new Xen feature written for the purpose. Dom0-less allows multiple VMs to start at boot time directly from the Xen hypervisor, decreasing boot times drastically. It makes it very easy to partition the system without virtualization overhead. Dom0 becomes unnecessary.
This presentation will go into detail on how to set up a Xen dom0-less system. It will show configuration examples and explain device assignment. The talk will discuss its implications for latency-sensitive and safety-critical environments.
XPDDS19: How TrenchBoot is Enabling Measured Launch for Open-Source Platform ...
The Linux Foundation
TrenchBoot is a cross-community OSS integration project for hardware-rooted, late launch integrity of open and proprietary systems. It provides a general purpose, open-source DRTM kernel for measured system launch and attestation of device integrity to trust-centric access infrastructure. TrenchBoot closes the UEFI Measurement Gap and reduces the need to trust system firmware. This talk will introduce TrenchBoot architecture and a recent collaboration with Oracle to launch the Linux kernel directly with Intel TXT or AMD SVM Secure Launch. It will propose mechanisms for integrating the Xen hypervisor into a TrenchBoot system launch. DRTM-enabled capabilities for client, server and embedded platforms will be presented for consideration by the Xen community.
XPDDS19 Keynote: Xen in Automotive - Artem Mygaiev, Director, Technology Solu...
The Linux Foundation
Artem will briefly cover what has been done since the first talk on Xen in the automotive domain back in 2013, what is going on now, and what is still missing for broad adoption of Xen in vehicles. The following topics will be covered:
Embedded/automotive features of Xen
Collaboration with AGL and GENIVI organizations for standardization
Efforts on Functional Safety compliance
Artem will also go over typical automotive use scenarios for Xen, which may not be the same as generic computing uses of a hypervisor.
XPDDS19 Keynote: Xen Project Weather Report 2019 - Lars Kurth, Director of Op...
The Linux Foundation
In this keynote talk, we will give an overview of the state of the Xen Project, trends that impact the project, see whether challenges that surfaced last year have been addressed and how we did it, and highlight new challenges and solutions for the coming year.
In recent years unikernels have shown immense performance potential (e.g., boot times of only a few ms, image sizes of only hundreds of KBs). The fundamental drawback of unikernels is that they require applications to be manually ported to the underlying minimalistic OS, which needs both expert work and often a considerable amount of time.
The Unikraft project provides a unikernel code base and build system that significantly simplifies the building of unikernels. In addition to support for a number of CPU architectures, languages and frameworks, Unikraft provides debugging and tracing features that are generally sorely missing from unikernel projects. In this talk we will discuss these features, show a set of preliminary performance numbers, and provide a roadmap for the project's future.
XPDDS19 Keynote: Secret-free Hypervisor: Now and Future - Wei Liu, Software E...
The Linux Foundation
The idea of making Xen secret-free has been floating around since Spectre and Meltdown came to light. In this talk we will discuss what is being done and what needs to be done next.
XPDDS19 Keynote: Xen Dom0-less - Stefano Stabellini, Principal Engineer, Xilinx
The Linux Foundation
This talk will introduce Dom0-less: a new way of using Xen to build mixed-criticality solutions. Dom0-less is a Xen feature that adds a novel approach to static partitioning based on virtualization. It allows multiple domains to start at boot time directly from the Xen hypervisor, decreasing boot times dramatically. Xen userspace tools, such as xl and libvirt, become optional.
Dom0-less extends the existing device tree based Xen boot protocol to cover information required by additional domains. Binaries, such as kernels and ramdisks, are loaded by the bootloader (U-Boot) and advertised to Xen via new device tree bindings.
The audience will learn how to use Dom0-less to partition the system. U-Boot and device tree configuration details will be explained to enable the audience to get the most out of this feature. The talk will include a status update and details on future plans.
XPDDS19 Keynote: Patch Review for Non-maintainers - George Dunlap, Citrix Sys...
The Linux Foundation
As the number of contributions grows, reviewer bandwidth becomes a bottleneck, and maintainers are always asking for more help. However, ultimately maintainers must at least Ack every patch that goes in; so if you're not a maintainer, how can you contribute? Why should anyone care about your opinion?
This talk will try to lay out some advice and guidelines for non-maintainers, for how they can do code review in a way which will effectively reduce the load on maintainers when they do come to review a patch.
This talk is a follow-up to our Summit 2017 presentation in which we covered our plans for Intel VMFUNC and #VE, as well as related use-cases. This year, we will provide a report on what we have accomplished in Xen 4.12, and what remains to be addressed. We will also give a brief status update of VMI on AMD hardware. The session will end with some real-world numbers of the Hypervisor Introspection solution running on Citrix Hypervisor 8.0 with #VE enabled.
OSSJP/ALS19: The Road to Safety Certification: Overcoming Community Challeng...
The Linux Foundation
Safety certification is one of the essential requirements for software to be used in highly regulated industries. Besides technical and compliance issues (such as ISO 26262 vs IEC 61508), transitioning an existing project to become more easily safety certifiable requires significant changes to development practices within an open source project.
In this session, we will lay out some challenges of making safety certification achievable in open source and the Xen Project. We will outline the process the Xen Project has followed thus far and highlight lessons learned along the way. The talk will primarily focus on necessary process, tooling changes and community challenges that can prevent progress. We will be offering an in-depth review of how Xen Project is approaching this challenging goal and try to derive lessons for other projects and contributors.
OSSJP/ALS19: The Road to Safety Certification: How the Xen Project is Making...
The Linux Foundation
Safety certification is one of the essential requirements for software to be used in highly regulated industries. The Xen Project, a secure and stable hypervisor that is used in many different markets, has been exploring the feasibility of building safety certified products on top of Xen for a year, looking at key aspects of its code base and development practices.
In this session, we will lay out the motivation and challenges of making safety certification achievable in open source and the Xen Project. We will outline the process the project has followed thus far and highlight lessons learned along the way. The talk will cover technical enablers, necessary process and tooling changes, and community challenges, offering an in-depth review of how the Xen Project is approaching this exciting and challenging goal.
XPDDS19: Speculative Sidechannels and Mitigations - Andrew Cooper, Citrix
The Linux Foundation
2018 saw fundamental shifts in security boundaries which were previously taken for granted. A lot of work has been done in the past 2 years, and largely in secret under embargo, but there is plenty more work to be done to strengthen the existing mitigations and to try to recover some performance without reopening security holes.
This talk will look at speculative execution sidechannels, the work which has already been done to mitigate the security holes, and future work which hopes to bring some improvements.
XPDDS19: Keeping Coherency on Arm: Reborn - Julien Grall, Arm ltd
The Linux Foundation
The Arm architecture provides a set of guidelines that any software should abide by when accessing memory with the MMU off and when updating page tables. Failing to do so may result in TLB conflicts or broken coherency.
In a previous talk ("Keeping coherency on Arm"), we focused on safely updating the stage-2 (aka P2M) page tables. This talk will focus on the boot code and Xen memory management.
During this session, we will introduce some of the guidelines and when they should be used. We will also discuss how the Xen boot sequence needs to be reworked to avoid breaking the guidelines.
XPDDS19: QEMU PV Backend 'qdevification'... What Does it Mean? - Paul Durrant...
The Linux Foundation
For many years the QEMU codebase has contained PV backends for Xen guests, giving them paravirtual access to storage, network, keyboard, mouse, etc. However, these backends have not been configurable as QEMU devices, as their implementation did not fully adhere to the QEMU Object Model (QOM).
In particular, the PV storage backend not using proper QOM devices, or qdevs, meant that the QEMU block layer needed to maintain legacy code that was cluttering up the source. This was causing push-back from the maintainers, who did not want to accept any patches relating to that Xen backend until it was 'qdevified'.
In this talk, I'll explain the modifications I made to QEMU to achieve 'qdevification' of the PV storage backend, how compatibility with the libxl toolstack was maintained, and what the next steps in both QEMU and libxl development should be.
XPDDS19: Status of PCI Emulation in Xen - Roger Pau Monné, Citrix Systems R&D
The Linux Foundation
PCI is a local computer bus for attaching hardware devices in a computer, and is the main peripheral bus on modern x86 systems. As such, having a proper way to emulate it is crucial for Xen to be able to expose both fully emulated devices and passthrough devices to guests.
This talk will focus on the current status of PCI emulation in Xen, how and where it is used, what its main limitations are, and future plans to make it more robust and modular.
XPDDS19: [ARM] OP-TEE Mediator in Xen - Volodymyr Babchuk, EPAM Systems
The Linux Foundation
Volodymyr will speak about TEE mediators, a new feature in Xen which allows multiple virtual machines to interact with the Trusted Execution Environment available on a platform. He developed a mediator for one of the TEEs, namely OP-TEE.
He will give background information on why TEE is needed at all and share some implementation details.
XPDDS19: Bringing Xen to the Masses: The Story of Building a Community-driven...
The Linux Foundation
Xen is a very powerful hypervisor with a talented and diverse developer community. Despite the fact that it's almost everywhere (from the Cloud to the embedded world), it can be difficult to set up and manage as a system administrator. General purpose distros have Xen packages, but that's just a start in your Xen journey: you need some tooling and knowledge to have a working and scalable platform.
XCP-ng was built to overcome those issues: by bringing Xen to the masses with a fully turnkey distro with Xen as its core. It's the logical sequel to the XCP project, with a community focus from the start. We'll see how it happened, what we did, and what's next. Finally, we'll see the impact of XCP-ng on the Xen Project.
XPDDS19: Will Robots Automate Your Job Away? Streamlining Xen Project Contrib...
The Linux Foundation
Doug has long advocated for the Xen Project to adopt more CI/CD (Continuous Integration / Continuous Delivery) processes, from the use of Travis CI to, now, GitLab CI. This talk proposes ideas for building upon the existing process and transforming the development process to give users higher quality with each Xen Project release.
XPDDS19: Client Virtualization Toolstack in Go - Nick Rosbrook & Brendan Kerr...
The Linux Foundation
High level toolstacks for server and cloud virtualization are very mature with large communities using and supporting them. Client virtualization is a much more niche community with unique requirements when compared to those found in the server space. In this talk, we’ll introduce a client virtualization toolstack for Xen (redctl) that we are using in Redfield, a new open-source client virtualization distribution that builds upon the work done by the greater virtualization and Linux communities. We will present a case for maturing libxl’s Go bindings and discuss what advantages Go has to offer for high level toolstacks, including in the server space.
Today Xen schedules guest virtual cpus on all available physical cpus independently from each other. Recent security issues on modern processors (e.g. L1TF) require turning off hyperthreading for best security, in order to avoid leaking information from one hyperthread to the other. One way to avoid having to turn off hyperthreading is to only ever schedule virtual cpus of the same guest on one physical core at the same time. This is called core scheduling.
This presentation shows results from the effort to implement core scheduling in the Xen hypervisor. The basic modifications in Xen are presented and performance numbers with core scheduling active are shown.
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with devops.
Topics covered:
CI/CD within UiPath
End-to-end overview of CI/CD pipeline with Azure devops
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
Enhancing adoption of Open Source Libraries. A case study on Albumentations.AI
Vladimir Iglovikov, Ph.D.
Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/
This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.
Created out of a necessity for superior performance in Kaggle competitions, Albumentations has grown to become a widely used tool among data scientists and machine learning practitioners.
This case study covers various aspects, including:
People: The contributors and community that have supported Albumentations.
Metrics: The success indicators such as downloads, daily active users, GitHub stars, and financial contributions.
Challenges: The hurdles in monetizing open-source projects and measuring user engagement.
Development Practices: Best practices for creating, maintaining, and scaling open-source libraries, including code hygiene, CI/CD, and fast iteration.
Community Building: Strategies for making adoption easy, iterating quickly, and fostering a vibrant, engaged community.
Marketing: Both online and offline marketing tactics, focusing on real, impactful interactions and collaborations.
Mental Health: Maintaining balance and not feeling pressured by user demands.
Key insights include the importance of automation, making the adoption process seamless, and leveraging offline interactions for marketing. The presentation also emphasizes the need for continuous small improvements and building a friendly, inclusive community that contributes to the project's growth.
Vladimir Iglovikov brings his extensive experience as a Kaggle Grandmaster, ex-Staff ML Engineer at Lyft, sharing valuable lessons and practical advice for anyone looking to enhance the adoption of their open-source projects.
Explore more about Albumentations and join the community at:
GitHub: https://github.com/albumentations-team/albumentations
Website: https://albumentations.ai/
LinkedIn: https://www.linkedin.com/company/100504475
Twitter: https://x.com/albumentations
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
Alex Pruden
This paper presents Reef, a system for generating publicly verifiable succinct non-interactive zero-knowledge proofs that a committed document matches or does not match a regular expression. We describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Reef supports the Perl Compatible Regular Expression syntax, including wildcards, alternation, ranges, capture groups, Kleene star, negations, and lookarounds. Reef introduces a new type of automata, Skipping Alternating Finite Automata (SAFA), that skips irrelevant parts of a document when producing proofs without undermining soundness, and instantiates SAFA with a lookup argument. Our experimental evaluation confirms that Reef can generate proofs for documents with 32M characters; the proofs are small and cheap to verify (under a second).
Paper: https://eprint.iacr.org/2023/1886
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features available on those devices, but many of those features provide convenience and capability at the cost of security. This best practices guide outlines steps users can take to better protect personal devices and information.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for technology and for making things work, along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speed up fuzzing campaigns by pinpointing and eliminating the uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutils' readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format) files. Our preliminary results show that AFL+DIAR not only discovers new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
Securing your Kubernetes cluster: a step-by-step guide to success!
KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
15. And in Numbers?
                        guest time   time spent spinning
                        [s]          [s]      [%]
single kernbench        109.0        0.2      0.2%
kernbench vs while(1)   117.3        9.0      7.6%
difference              7.6%
15 How to Deal with Lock-Holder Preemption
16. What can we do about it?
17. Dealing with lock-holder preemption
LHP avoidance
  No spinlock held in userspace
  Idea: Avoid preempting guest in kernel space
  Postpone guest switch to kernel exit
  Problem: extraordinarily long critical sections, e.g. Apache using sendfile()
Helping locks
  Instead of busy waiting, switch to preempted lock-holder
  Problem: finding the preempted lock-holder
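The LHP avoidance idea above (postpone the guest switch until the VCPU leaves kernel space) could be sketched roughly as follows. This is a minimal illustration, not the scheduler code from the talk: `vcpu_state_t`, `should_preempt()` and the cap `MAX_DEFER_TICKS` are hypothetical names, and the cap is an assumed way of bounding the "extraordinarily long critical section" case that the slide lists as the problem.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed cap on how many scheduler ticks a preemption may be postponed. */
#define MAX_DEFER_TICKS 3u

/* Hypothetical per-VCPU state seen by the hypervisor scheduler. */
typedef struct {
    bool     in_kernel;       /* guest is currently executing kernel code */
    unsigned deferred_ticks;  /* ticks for which preemption was postponed */
} vcpu_state_t;

/* Called when the VCPU's time slice expires.
 * Returns true if the VCPU should be switched out now, false if the
 * switch is postponed because the guest may be holding a spinlock. */
bool should_preempt(vcpu_state_t *v)
{
    assert(v != NULL);
    if (v->in_kernel && v->deferred_ticks < MAX_DEFER_TICKS) {
        v->deferred_ticks++;   /* give the guest time to reach kernel exit */
        return false;
    }
    v->deferred_ticks = 0;     /* in user space, or the cap was reached */
    return true;
}
```

A VCPU in user space is always preemptable here, which matches the slide's observation that no spinlock is held in userspace.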
18. Helping locks: Ingredients
1) Guest kernel: new 'yield' hypercall when waiting unusually long
   ● Modify spinlock loop
2) Reasonable threshold for 'unusually long'
   ● Histograms help
3) Selecting which VCPU to switch to
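A modified spinlock loop along the lines of ingredients 1) and 2) might look like the sketch below. It is only an illustration under stated assumptions: `SPIN_THRESHOLD` is an arbitrary value (the slide suggests deriving the real one from histograms), `yield_fn` is a hypothetical stand-in for the 'yield' hypercall, and `demo_yield()` merely simulates the hypervisor running the preempted lock-holder.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Assumed threshold for "unusually long"; not a value from the talk. */
#define SPIN_THRESHOLD 1024u

/* Hypothetical stand-in for the 'yield' hypercall. */
typedef void (*yield_fn)(void *ctx);

typedef struct {
    atomic_flag locked;
} spinlock_t;

/* Spin on the lock; after SPIN_THRESHOLD failed attempts, yield the VCPU
 * and start counting again. Returns the number of yields performed. */
unsigned spin_lock_yielding(spinlock_t *lock, yield_fn yield, void *ctx)
{
    unsigned spins = 0, yields = 0;

    assert(yield != NULL);
    while (atomic_flag_test_and_set_explicit(&lock->locked,
                                             memory_order_acquire)) {
        if (++spins >= SPIN_THRESHOLD) {
            yield(ctx);        /* waited unusually long: give up the CPU */
            yields++;
            spins = 0;
        }
    }
    return yields;
}

void spin_unlock(spinlock_t *lock)
{
    atomic_flag_clear_explicit(&lock->locked, memory_order_release);
}

/* Demo-only yield: pretends the hypervisor ran the preempted lock-holder,
 * which then released the lock passed in ctx. */
void demo_yield(void *ctx)
{
    spin_unlock((spinlock_t *)ctx);
}
```

In the uncontended case the loop never reaches the threshold, so the fast path costs the same as a plain test-and-set lock.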
21. Scheduling Strategy
Good choices:
  VCPUs of the same VM to make progress locally
  (Potential) preempted lock-holders
  Cache-„near“ VCPUs
Neither/nor:
  VCPUs in user space
Bad choices:
  VCPUs which yielded recently
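One way to turn these preferences into a selection routine is a simple score per candidate VCPU, sketched below. The struct `vcpu_info_t`, the functions `vcpu_score()` and `pick_vcpu()`, and the weights 4/2/1 are all assumptions made for illustration; the slide only gives the ordering of good, neutral and bad choices, not concrete weights.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of a runnable VCPU as seen by the scheduler. */
typedef struct {
    int  vm_id;             /* VM this VCPU belongs to */
    bool in_kernel;         /* preempted in kernel space: potential lock-holder */
    bool cache_near;        /* shares cache with the yielding CPU */
    bool yielded_recently;  /* was itself waiting for a lock a moment ago */
} vcpu_info_t;

/* Score a candidate; -1 marks a bad choice. Weights are assumed. */
int vcpu_score(const vcpu_info_t *v, int waiter_vm)
{
    int score = 0;

    if (v->yielded_recently)
        return -1;                      /* bad choice: it is waiting too */
    if (v->vm_id == waiter_vm)
        score += 4;                     /* same VM: progress locally */
    if (v->in_kernel)
        score += 2;                     /* potential preempted lock-holder */
    if (v->cache_near)
        score += 1;                     /* cheaper to switch to */
    return score;                       /* user space: neither bonus nor penalty */
}

/* Return the index of the best candidate, or -1 if all are bad choices.
 * Ties go to the lowest index. */
int pick_vcpu(const vcpu_info_t *cands, size_t n, int waiter_vm)
{
    int best = -1, best_score = -1;

    assert(cands != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int s = vcpu_score(&cands[i], waiter_vm);
        if (s > best_score) {
            best_score = s;
            best = (int)i;
        }
    }
    return best;
}
```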
24. Performance
             wall clock   guest time   time spent spinning
             [s]          [s]          [s]      [%]
LHP          34.8         117.3        9.0      7.6%
yield        33.5         108.4        0.0      0.0%
difference   -3.9%        -7.6%                 -7.6%
25. Efficiency
26. Efficiency
27. Efficiency
28. Efficiency
29. Efficiency
30. Efficiency
31. Efficiency
32. Efficiency
117 sec / (117 sec + 126 sec) × 7.6 % = 3.7 %
➔ Real result of 3.9% is reasonable
➔ Highly efficient
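The estimate on this slide can be checked with a few lines of arithmetic. The sketch below reads the slide's figures as a fraction, 117 s / (117 s + 126 s), scaled by the 7.6 % of guest time spent spinning; that reading is an assumption, although it does reproduce the 3.7 % shown. The function name `expected_gain_pct()` and its parameter names are made up for the example.

```c
#include <assert.h>

/* Expected system-wide improvement, in percent, if the spinning share
 * (spin_pct) of one guest's time t_a is recovered, while a second guest
 * with time t_b gains nothing. Both times are in seconds. */
double expected_gain_pct(double t_a, double t_b, double spin_pct)
{
    assert(t_a + t_b > 0.0);
    return t_a / (t_a + t_b) * spin_pct;
}
```

With the slide's numbers, `expected_gain_pct(117.0, 126.0, 7.6)` evaluates to roughly 3.66, which rounds to the 3.7 % quoted and sits close to the measured 3.9 %.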
34. FIFO ticket spinlocks
Next ticket in dispenser: queue tail
„Now serving“ display at counter: queue head
Lock: atomic( ticket = tail++ ); while ( head != ticket );
Unlock: atomic( head++ );
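The Lock and Unlock pseudocode above maps fairly directly onto C11 atomics. The following is a minimal user-space sketch rather than the kernel implementation discussed in the talk; the type `ticket_lock_t` and the function names are chosen for the example.

```c
#include <assert.h>
#include <stdatomic.h>

/* FIFO ticket spinlock following the slide's pseudocode. */
typedef struct {
    atomic_uint tail;  /* next ticket in the dispenser */
    atomic_uint head;  /* "now serving" display at the counter */
} ticket_lock_t;

void ticket_lock_init(ticket_lock_t *l)
{
    atomic_init(&l->tail, 0);
    atomic_init(&l->head, 0);
}

/* Lock: atomically take a ticket, then spin until it is being served.
 * Returns the ticket number that was drawn. */
unsigned ticket_lock_acquire(ticket_lock_t *l)
{
    unsigned ticket = atomic_fetch_add_explicit(&l->tail, 1,
                                                memory_order_relaxed);
    while (atomic_load_explicit(&l->head, memory_order_acquire) != ticket)
        ;  /* busy wait until head == ticket */
    return ticket;
}

/* Unlock: advance "now serving" so the next ticket holder may enter. */
void ticket_lock_release(ticket_lock_t *l)
{
    /* The lock must be held, so at least one ticket is outstanding. */
    assert(atomic_load(&l->head) != atomic_load(&l->tail));
    atomic_fetch_add_explicit(&l->head, 1, memory_order_release);
}
```

Because waiters are served strictly in ticket order, a waiter whose VCPU has been preempted blocks everyone queued behind it, even when the lock itself is free.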
36. Ticket locks and virtualization
        wall clock   guest time   time spent spinning
        [s]          [s]          [s]        [%]
LHP     2825.1       22434.2      22270.4    99.3%
37. Ticket locks and virtualization
        wall clock   guest time   time spent spinning
        [s]          [s]          [s]        [%]
LHP     2825.1       22434.2      22270.4    99.3%
yield   34.1         123.6        6.6        5.4%
38. Conclusion
Lock-holder preemption quite serious:
  7.6% guest time wasted
Helping locks:
  3.9% system performance improvement!
  (Amdahl's law explains why)
New ticket spinlocks:
  30 secs kernbench takes 45 minutes
  Helping locks help here, too