This document discusses building a KVM-based hypervisor that virtualizes the key features of Heterogeneous System Architecture (HSA) for compliant systems. It describes HSA features such as shared virtual memory, I/O page faulting, and user-level queueing. It then outlines the design for virtualizing these features through techniques such as VirtIO-KFD for queues, shadow page tables for shared memory, and shadow PPR interrupts for page faults. Evaluation shows the hypervisor approach incurs an average performance overhead of 5% for GPU execution compared to native execution.
Contiguous Memory Allocator in the Linux Kernel - Kernel TLV
Agenda:
Contiguous Memory Allocator - how to allocate large contiguous memory for large-scale DMA in the kernel.
Speaker:
Mark Veltzer - CTO of Hinbit and a senior instructor at John Bryce. Mark is also a member of the Free Software Foundation and contributes to many free software projects.
Talk by Brendan Gregg for USENIX LISA 2019: Linux Systems Performance. Abstract: "
Systems performance is an effective discipline for performance analysis and tuning, and can help you find performance wins for your applications and the kernel. However, most of us are not performance or kernel engineers, and have limited time to study this topic. This talk summarizes the topic for everyone, touring six important areas of Linux systems performance: observability tools, methodologies, benchmarking, profiling, tracing, and tuning. Included are recipes for Linux performance analysis and tuning (using vmstat, mpstat, iostat, etc), overviews of complex areas including profiling (perf_events) and tracing (Ftrace, bcc/BPF, and bpftrace/BPF), and much advice about what is and isn't important to learn. This talk is aimed at everyone: developers, operations, sysadmins, etc, and in any environment running Linux, bare metal or the cloud."
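To give the observability side a concrete taste: the per-CPU counters that tools like vmstat and mpstat report come from /proc/stat. The sketch below (an illustration of the mechanism, not code from the talk) computes a CPU busy fraction from two snapshots of the aggregate "cpu" line:

```python
# Illustrative sketch (not from the talk): the counters behind vmstat and
# mpstat live in /proc/stat. Given two snapshots of the aggregate "cpu"
# line, utilization is one minus the idle share of the elapsed ticks.
# Field order: user nice system idle iowait irq softirq steal guest ...
def cpu_busy_fraction(snap_a: str, snap_b: str) -> float:
    """Fraction of time the CPUs were busy between two /proc/stat cpu lines."""
    def parse(line: str):
        fields = [int(x) for x in line.split()[1:]]
        return sum(fields), fields[3] + fields[4]   # total, idle + iowait
    total_a, idle_a = parse(snap_a)
    total_b, idle_b = parse(snap_b)
    dt = total_b - total_a
    return 1.0 - (idle_b - idle_a) / dt if dt else 0.0

if __name__ == "__main__":
    before = "cpu  100 0 50 800 50 0 0 0 0 0"
    after  = "cpu  200 0 100 850 50 0 0 0 0 0"
    print(f"busy: {cpu_busy_fraction(before, after):.0%}")   # busy: 75%
```

On a live Linux system the two snapshots would come from reading /proc/stat twice with a short sleep in between; the hard-coded lines here just make the arithmetic visible.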
New Ways to Find Latency in Linux Using Tracing - ScyllaDB
Ftrace is the official tracer of the Linux kernel. It originated from the real-time patch (now known as PREEMPT_RT), as developing an operating system for real-time use requires deep insight into the happenings of the kernel. Not only was tracing useful for debugging, it was critical for finding areas in the kernel that were causing unbounded latency. It's no wonder the ftrace infrastructure has a lot of tooling for seeking out latency. Ftrace was introduced into mainline Linux in 2008, and several talks have been given on how to utilize its tracing features. But a lot has happened in the past few years that makes the tooling for finding latency much simpler. Other talks at P99 will discuss the new ftrace tracers "osnoise" and "timerlat", but this talk will focus more on the new flexible and dynamic aspects of ftrace that facilitate finding latency issues specific to your needs. Some of this work may still be at a proof-of-concept stage, but this talk will give you the advantage of knowing what tools will be available to you in the coming year.
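To make the latency angle concrete, here is a toy post-processor for ftrace's text output (an illustration of the idea, not one of the tools the talk covers): it pairs each task's sched_wakeup with the sched_switch that puts it on a CPU and reports the runqueue latency. The line format is deliberately simplified; real trace lines carry more fields:

```python
import re

# Toy post-processor (illustrative sketch, not from the talk): measure
# runqueue latency from ftrace text output by pairing each task's
# sched_wakeup event with the sched_switch event that schedules it in.
WAKEUP = re.compile(r"(\d+\.\d+): sched_wakeup: .*\bpid=(\d+)")
SWITCH = re.compile(r"(\d+\.\d+): sched_switch: .*\bnext_pid=(\d+)")

def runqueue_latencies(trace: str) -> dict:
    """Return pid -> seconds spent runnable before being scheduled in."""
    woken, out = {}, {}
    for line in trace.splitlines():
        if (m := WAKEUP.search(line)):
            woken[int(m[2])] = float(m[1])
        elif (m := SWITCH.search(line)) and int(m[2]) in woken:
            pid = int(m[2])
            out[pid] = round(float(m[1]) - woken.pop(pid), 6)
    return out

if __name__ == "__main__":
    sample = (
        "  <idle>-0  [000] 100.000100: sched_wakeup: comm=worker pid=42 prio=120\n"
        "  <idle>-0  [000] 100.000350: sched_switch: prev_pid=0 ==> next_pid=42\n"
    )
    print(runqueue_latencies(sample))   # {42: 0.00025}
```

In practice the same measurement is what the wakeup latency tracers and trace-cmd do for you; the point of the sketch is only to show what "latency" means at the event level.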
Xen is a mature enterprise-grade hypervisor with many advanced security features that are unique to Xen. For this reason it's the hypervisor of choice for the NSA, the DoD, and the new QubesOS secure desktop project. However, while much of Xen's security is inherent in its design, many of the advanced security features, such as stub domains, driver domains, XSM, and so on, are not enabled by default. This session will describe all of the advanced security features of Xen and the best way to configure them for the cloud environment.
Static partitioning is becoming increasingly common in embedded systems. A static hypervisor, such as Xen dom0less, is employed to split the hardware resources into multiple domains and run a different OS in each domain, for instance Linux and Zephyr. Only the simplest static partitioning configurations involve no data exchange between the domains. Often, communication and data exchange between two or more environments are required to complete the data processing pipeline that implements the target application. However, the VM-to-VM communication mechanisms available in static partitioning configurations are typically more limited than in general-purpose hypervisors. For example, PV drivers are not available to Xen dom0less domains. This presentation will discuss the need for communication in static partitioning setups and present the technical challenges involved in getting traditional communication methods to work, including Xen PV drivers and VirtIO. The talk will also present simpler alternatives based on shared memory and interrupt notifications to set up domain-to-domain data streams: techniques that are easy to use both from Linux and from tiny bare-metal applications.
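The shared-memory data stream the abstract alludes to can be sketched in ordinary user space. In the toy below, Python's multiprocessing.shared_memory stands in for a hypervisor-provided region, and the segment name, 8-byte header, and ring size are illustrative assumptions; a real dom0less setup would reserve the region via the device tree and notify the peer with an event/interrupt rather than having the reader poll:

```python
from multiprocessing import shared_memory

# Sketch of a shared-memory byte stream between two "domains" (the names,
# header layout, and sizes are illustrative assumptions; a real dom0less
# deployment would carve the region out of the device tree and signal the
# peer with an interrupt instead of polling).
class ShmRing:
    """Single-producer/single-consumer byte stream over one shared segment."""
    HDR = 8  # head cursor (writer) and tail cursor (reader), 4 bytes each

    def __init__(self, name: str, size: int = 64, create: bool = False):
        self.shm = shared_memory.SharedMemory(name=name, create=create,
                                              size=self.HDR + size)
        self.size = size

    def _get(self, off: int) -> int:
        return int.from_bytes(self.shm.buf[off:off + 4], "little")

    def _set(self, off: int, v: int) -> None:
        self.shm.buf[off:off + 4] = v.to_bytes(4, "little")

    def send(self, data: bytes) -> None:
        head = self._get(0)
        for b in data:                     # copy the payload into the ring
            self.shm.buf[self.HDR + head % self.size] = b
            head += 1
        self._set(0, head)                 # publish only after the copy

    def recv(self) -> bytes:
        head, tail = self._get(0), self._get(4)
        out = bytes(self.shm.buf[self.HDR + i % self.size]
                    for i in range(tail, head))
        self._set(4, head)                 # consume everything available
        return out

if __name__ == "__main__":
    writer = ShmRing("demo_stream", create=True)
    reader = ShmRing("demo_stream")        # the "other domain" attaches by name
    writer.send(b"sensor:42")
    print(reader.recv())                   # b'sensor:42'
    reader.shm.close(); writer.shm.close(); writer.shm.unlink()
```

Error handling (a full ring, concurrent access) is omitted; the point is only that a fixed memory region plus two cursors and a notification is enough for a domain-to-domain data stream, which is why the technique works even for tiny bare-metal readers.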
Virtual File System in Linux Kernel
Note: When you view the slide deck via a web browser, the screenshots may be blurred. You can download the deck and view it offline (the screenshots are clear).
SFO15-TR9: PSCI, ACPI (and UEFI to boot)
Speaker: Bill Fletcher
Date: September 24, 2015
★ Session Description ★
An introductory, system-level overview of Power State Coordination Interface (PSCI) and ACPI
- Focus on ARMv8
- Goes top-down from ACPI
- A demo based on the current code in QEMU
- The specifications are very dynamic - what's ongoing for ACPI and PSCI
★ Resources ★
Video: https://www.youtube.com/watch?v=vXzPdpaZVto
Presentation: http://www.slideshare.net/linaroorg/sfo15tr9-psci-acpi-and-uefi-to-boot
Etherpad: pad.linaro.org/p/sfo15-tr9
Pathable: https://sfo15.pathable.com/meetings/303087
★ Event Details ★
Linaro Connect San Francisco 2015 - #SFO15
September 21-25, 2015
Hyatt Regency Hotel
http://www.linaro.org
http://connect.linaro.org
XPDDS19 Keynote: Xen Dom0-less - Stefano Stabellini, Principal Engineer, Xilinx - The Linux Foundation
This talk will introduce Dom0-less: a new way of using Xen to build mixed-criticality solutions. Dom0-less is a Xen feature that adds a novel approach to static partitioning based on virtualization. It allows multiple domains to start at boot time directly from the Xen hypervisor, decreasing boot times dramatically. Xen userspace tools, such as xl and libvirt, become optional.
Dom0-less extends the existing device-tree-based Xen boot protocol to cover information required by additional domains. Binaries, such as kernels and ramdisks, are loaded by the bootloader (U-Boot) and advertised to Xen via new device tree bindings.
The audience will learn how to use Dom0-less to partition the system. U-Boot and device tree configuration details will be explained to enable the audience to get the most out of this feature. The talk will include a status update and details on future plans.
Linux Kernel Booting Process (2) - For NLKB - shimosawa
Describes the bootstrapping part in Linux, and related architectural mechanisms and technologies.
This is part two of the slides; succeeding parts may contain errata for this one.
Persistent memory holds a lot of promise: what's not to like about vast amounts of directly-attached memory that remembers its contents over a power cycle? For some years we have been told that large persistent-memory arrays are coming; now it seems that they are about to arrive. In this lecture we will be covering the following:
- What is persistent memory: the upcoming storage-class memory (SCM) devices
- Differences between NVMe and SCM
- How to use it and emulate it
- Challenges: durability / consistency
- Remote access
- Implications for next-generation architecture
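The "emulate it" and "durability / consistency" points can be illustrated with a memory-mapped file standing in for a pmem/DAX device. The 4 KiB region and the 8-byte commit marker below are assumptions of this sketch, not details from the lecture; the point is the ordering, where the payload must be flushed before the marker that makes it valid:

```python
import mmap, struct, tempfile

# Emulating persistent memory with a memory-mapped file (a stand-in for a
# real pmem/DAX device; the 4 KiB layout and 8-byte commit marker are
# assumptions of this sketch). Durability ordering: flush the payload
# first, then the marker, so a crash between the two flushes leaves the
# record invisible rather than half-written.
def persist_record(path: str, payload: bytes) -> None:
    with open(path, "r+b") as f:
        m = mmap.mmap(f.fileno(), 4096)
        m[8:8 + len(payload)] = payload
        m.flush()                                  # payload reaches media...
        m[0:8] = struct.pack("<Q", len(payload))
        m.flush()                                  # ...before the commit marker
        m.close()

def read_record(path: str):
    with open(path, "rb") as f:
        m = mmap.mmap(f.fileno(), 4096, access=mmap.ACCESS_READ)
        (n,) = struct.unpack("<Q", m[0:8])
        data = bytes(m[8:8 + n]) if n else None    # no marker -> no record
        m.close()
        return data

if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"\0" * 4096)                      # zeroed "pmem" region
    persist_record(f.name, b"hello pmem")
    print(read_record(f.name))                     # b'hello pmem'
```

On real SCM the flushes would be cache-line flush plus fence instructions rather than msync-style calls, but the consistency obligation on the programmer is the same.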
Rootlinux17: Hypervisors on ARM - Overview and Design Choices by Julien Grall - The Linux Foundation
Hypervisors are used in a broad range of domains, ranging from embedded systems and automotive to big-iron servers. The choice of hypervisor has a strong impact on the overall design of your project and its performance. This talk introduces the state of virtualization on ARM and provides a description of three popular open source hypervisors: KVM, Jailhouse and Xen. Julien Grall explains their respective key features, technical differences, and suitability for different application domains.
Julien Grall is a Software Virtualisation Engineer at ARM.
The talk was delivered at Root Linux Conference 2017. Learn more: http://linux.globallogic.com/materials. The video recording is available at https://www.youtube.com/watch?v=jZNXtqFJpuc
PCI Pass-through - FreeBSD VM on Hyper-V (MeetBSD California 2016) - iXsystems
The slides for Kylie Liang's presentation, “PCI Pass-through - FreeBSD VM on Hyper-V”, given at MeetBSD California 2016 in Berkeley, CA.
A recording of the talk can be viewed at: http://bit.ly/2hteton.
Stupid Boot Tricks: using iPXE and Chef to get to boot management bliss - macslide
In this talk I will cover how I built a boot system using iPXE and Chef's API to create a lightweight tool for managing installs and firmware updates of hosts and network gear.
XPDDS17: Shared Virtual Memory Virtualization Implementation on Xen - Yi Liu - The Linux Foundation
Shared Virtual Memory (SVM) is a VT-d feature that allows sharing an application's address space with an I/O device. The feature works with the PCI-SIG Process Address Space ID (PASID). With SVM, the programmer gets a consistent view of memory across the host application and the device, avoiding pinning or copying overheads. We have been working on supporting SVM in Xen to enable its use in a guest when an SVM-capable device is assigned; e.g., if the IGD is assigned to a guest, applications like OpenCL would benefit if SVM is supported in the guest. SVM virtualization requires exposing a virtual VT-d to the guest. In this discussion, Yi will present the latest SVM virtualization implementation and preview future work on supporting SVM and IOVA in a single virtual VT-d.
hbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes - HBaseCon
Zhiyong Bai
As a high-performance and scalable key-value database, HBase is used at Zhihu to provide an online data store alongside MySQL and Redis. Zhihu's platform team has accumulated experience with container technology, and this time, based on Kubernetes, we built a flexible platform for online HBase: creating multiple logically isolated HBase clusters on a shared physical cluster quickly, and providing customized service for different business needs. Combined with Consul and a DNS server, we implemented highly available access to HBase using clients written mainly in Python. This presentation shares the architecture of the online HBase platform at Zhihu and some practical experience from the production environment.
Get Your GeekOn with Ron - Session One: Designing your VDI Servers - Unidesk Corporation
Join virtualization expert and industry veteran Ron Oglesby as he breaks down how to select and configure servers, including:
• Server CPU selection - they were not made equal!
• Desktop-to-core guesstimation?
• Memory - and its temperamental relationship with disk design
• Local storage options - yes, it's an option
• And, overall best practices for VDI implementation
Accelerating & Optimizing Machine Learning on VMware vSphere leveraging NVIDIA GPUs - inside-BigData.com
In this deck from the 2019 Stanford HPC Conference, Mohan Potheri from VMware presents: Accelerating & Optimizing Machine Learning on VMware vSphere leveraging NVIDIA GPUs.
"This session introduces machine learning on vSphere to the attendee and explains when and why GPUs are important for them. Basic machine learning with Apache Spark is demonstrated. GPUs can be effectively shared in vSphere environments and the various methods of sharing are addressed here. We will explore features like suspending/resuming a VM that has GPUs attached to it, for sharing of those resources. Compelling use cases were developed leveraging vSphere's GPU capabilities. These use cases showcase deep learning with GPGPUs for image processing, stock prediction and distributed training. References to various technical papers will be given."
Mohan Potheri is VCDX#98 and has more than 20 years in IT infrastructure, with in-depth experience in VMware virtualization. He currently focuses on evangelizing High Performance Computing (HPC) and Big Data virtualization on vSphere. He also has extensive experience with business-critical applications such as SAP, Oracle, SQL and Java across UNIX, Linux and Windows environments. Mohan is an expert on SAP virtualization and has been a speaker at multiple VMworld and PEX events. Prior to VMware, he worked at many large enterprises where he engineered fully virtualized HPC solutions. He has planned, designed, implemented and managed robust, highly available, DR-compliant virtual environments on UNIX and x86 platforms.
Watch the video: https://youtu.be/rDsht9NFwR0
Learn more: https://www.vmware.com/solutions/high-performance-computing.html
and
http://hpcadvisorycouncil.com/events/2019/stanford-workshop/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Best Practices for Virtualizing Apache Hadoop - Hortonworks
Join this webinar to discuss best practices for designing and building a solid, robust and flexible Hadoop platform on an enterprise virtual infrastructure. Attendees will learn the flexibility and operational advantages of virtual machines, such as fast provisioning, cloning, high levels of standardization, hybrid storage, vMotion, increased stabilization of the entire software stack, High Availability, and Fault Tolerance. This is a can't-miss presentation for anyone wanting to understand the design, configuration and deployment of Hadoop on virtual infrastructure.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT stylesheets and schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating, explaining, or refactoring code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
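One mechanical safeguard implied by the discussion of AI-generated markup can be shown in a few lines. The helper below is an assumption of this writeup, not a tool from the presentation: whatever a model emits, parse it before accepting it, so malformed or mis-rooted XML is rejected automatically rather than by eye:

```python
import xml.etree.ElementTree as ET

# Illustrative guardrail (an assumption of this summary, not a tool from
# the presentation): parse whatever the model emits before accepting it,
# so malformed or mis-rooted markup is caught mechanically.
def accept_model_markup(candidate: str, required_root: str) -> ET.Element:
    """Return the parsed root of AI-generated XML, or raise ValueError."""
    try:
        root = ET.fromstring(candidate)
    except ET.ParseError as exc:
        raise ValueError(f"model output is not well-formed XML: {exc}") from exc
    if root.tag != required_root:
        raise ValueError(f"expected <{required_root}>, got <{root.tag}>")
    return root

if __name__ == "__main__":
    doc = accept_model_markup("<article><p>Enriched text</p></article>", "article")
    print(doc.find("p").text)              # Enriched text
```

In a production pipeline this check would sit before schema validation (XSD or Schematron), which catches the structural errors that mere well-formedness does not.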
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices want to take full advantage of the features available on those devices, but many of those features trade security for convenience and capability. This best-practices guide outlines steps users can take to better protect personal devices and information.
Communications Mining Series - Zero to Hero - Session 1 - DianaGray10
This session provides an introduction to UiPath Communication Mining, its importance, and an overview of the platform. You will acquire a good understanding of the phases in Communication Mining as we walk through the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs - Alex Pruden
This paper presents Reef, a system for generating publicly verifiable succinct non-interactive zero-knowledge proofs that a committed document matches or does not match a regular expression. We describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Reef supports the Perl Compatible Regular Expression syntax, including wildcards, alternation, ranges, capture groups, Kleene star, negations, and lookarounds. Reef introduces a new type of automata, Skipping Alternating Finite Automata (SAFA), that skips irrelevant parts of a document when producing proofs without undermining soundness, and instantiates SAFA with a lookup argument. Our experimental evaluation confirms that Reef can generate proofs for documents with 32M characters; the proofs are small and cheap to verify (under a second).
Paper: https://eprint.iacr.org/2023/1886
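The skipping idea can be illustrated with a deliberately tiny special case (an intuition aid only, not Reef's SAFA construction): for a pattern made of literal chunks joined by ".*", a matcher can jump straight to each chunk rather than stepping an automaton over every byte of the document:

```python
# Intuition aid for "skipping" (a deliberately tiny special case, not
# Reef's SAFA construction): for a pattern of literal chunks joined by
# ".*", the matcher may jump directly to each chunk with find() instead
# of visiting every byte of the document.
def skipping_match(chunks: list, doc: str) -> bool:
    """Does doc match  .*c0.*c1...cn.*  for the given literal chunks?"""
    pos = 0
    for chunk in chunks:
        pos = doc.find(chunk, pos)         # skip everything before the chunk
        if pos < 0:
            return False
        pos += len(chunk)
    return True

if __name__ == "__main__":
    mail = "Received: by mx1\nFrom: alice@example.org\nSubject: hi\n"
    print(skipping_match(["From:", "@example.org"], mail))   # True
    print(skipping_match(["Subject:", "From:"], mail))       # False
```

Reef's contribution is making this kind of skipping sound inside a zero-knowledge proof, where the verifier cannot see the document, and extending it to the full PCRE feature set; the toy above only conveys why skipping cuts the work for large documents.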
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0! - SOFTTECHHUB
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024 - Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deployment Firewall and DBOM - James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for making things work and a knack for helping others understand how things work. He has around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced performance - SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
Removing Uninteresting Bytes in Software Fuzzing - Aftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux tools: libxml2's xmllint, a tool for parsing XML documents, and Binutils' readelf, an essential debugging and security-analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format) files. Our preliminary results show that AFL+DIAR not only discovers new paths more quickly but also achieves higher overall coverage. This work thus showcases how starting with lean, optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at the IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW), 2022.
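The core idea can be sketched as a greedy reduction loop. This is a simplification: DIAR's actual byte analysis is more involved, and the coverage function below is a stand-in for running the instrumented target on a seed:

```python
# Greedy sketch of the idea (a simplification; DIAR's byte analysis is
# more involved, and `coverage` here stands in for running the
# instrumented target): drop every byte whose removal leaves coverage
# unchanged, so later mutations land only on bytes the target reacts to.
def shrink_seed(seed: bytes, coverage) -> bytes:
    baseline = coverage(seed)
    out = bytearray(seed)
    i = 0
    while i < len(out):
        candidate = out[:i] + out[i + 1:]
        if coverage(bytes(candidate)) == baseline:
            out = candidate                # byte i was uninteresting: drop it
        else:
            i += 1                         # byte i matters: keep it
    return bytes(out)

if __name__ == "__main__":
    # Toy "coverage": which magic substrings the input reaches.
    cov = lambda d: (b"<xml" in d, b"elf" in d)
    print(shrink_seed(b"junk<xml>paddingelf!", cov))   # b'<xmlelf'
```

This is essentially the afl-tmin-style minimization loop; the value of a DIAR-like analysis is identifying the uninteresting bytes far more cheaply than re-executing the target once per candidate byte.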
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
Building a KVM-based Hypervisor for a Heterogeneous System Architecture Compliant System
1. Building A KVM-based Hypervisor for A
Heterogeneous System Architecture
Compliant System
National Chiao Tung University & National Tsing Hua University & National Taiwan University
Yu-Ju Huang, Hsuan-Heng Wu,
Yeh-Ching Chung, Wei-Chung Hsu
2. Agenda
• Motivation
• Background
• HSA features
• AMD’s implementation on Kaveri, the HSA-compliant platform
• Design and Implementation
• Evaluation
• Conclusion
3. Motivation
• Problem of heterogeneous computing
• Data communication between CPU & GPU
• Inefficiency
• Programmability inconvenience
• Heterogeneous System Architecture (HSA)
• Developed by HSA Foundation
• Goal
• Improving computation efficiency for heterogeneous computing
• Reducing programmability barrier
• Make virtual machines also benefit from HSA!
[Figure: a hypervisor runs multiple guest OSes, each with its own applications, on top of HSA hardware, so the guests can also use HSA]
4. HSA Features
• Shared virtual memory
• I/O page faulting
• User-level queueing
• Memory based signaling
[Figure: before HSA, the CPU and GPU have separate memories and data must be copied between them; with HSA, the CPU and GPU share one virtual address space over physical memory]
[Figure: before HSA, an application's queues pass through the operating system and GPU driver to reach the GPU; with HSA, the GPU reads the application's queues directly]
5. Shared Virtual Memory - IOMMU
• Set the process page table in the IOMMU to carry out virtual-to-physical address translation
• CPU and GPU share the same process page table

[Figure: GPU and CPU both access system memory; the IOMMU and the MMU walk the same process page table]
6. I/O Page Faulting - PPR
• PPR (peripheral page service request) is issued by the IOMMU as an interrupt
• PPR logs contain the faulting process ID and fault address
• The get_user_pages API can be used to fix the page fault

[Figure: PPR handling flow: (1) the IOMMU raises a PPR interrupt to the CPU, (2) the CPU calls the PPR handler, (3) the handler reads the PPR logs, (4) the handler fixes the fault, (5) a COMPLETE command is sent back to the IOMMU]
7. User Level Queueing - Kernel Fusion Driver (KFD)
• Helps applications register the address of their user-level queues with the GPU

[Figure: in kernel space, the KFD passes the address of the user-level queue to the GPU; at computation time the GPU reads the user-level queues in userspace directly]
11. IOMMU Snapshot During GPU Execution

ID | System             | Page table
1  | Host, process 1    | Addr of PT
2  | Guest 1, process 1 | Addr of SPT

[Figure: in the native scenario, the GPU uses entry ID=1 and the IOMMU translates HVA to MPA through the host process page table; in the guest scenario, it uses entry ID=2 and translates GVA to MPA through the SPT]
More guest processes in different guest OSes are also allowed.
16. GPU Execution Time

Achieves an average of 95% of native performance in most cases.

GPU time (sec) | BinarySearch | FastWalshTransform | BitonicSort | FloydWarshall | MatrixMultiplication | MatrixTranspose | MonteCarloAsian
Native         | 0.0108       | 0.0018             | 0.014       | 16.094        | 8.012                | 0.502           | 17.458
Guest          | 0.0113       | 0.0019             | 0.016       | 16.603        | 8.286                | 0.538           | 18.342
[Figure: flow of a small benchmark that enqueues many times: the guest application enqueues a task, kicks the GPU (world switch to host), waits for the signal, and switches back; the signal delay arises when the CPU is owned by another process at notification time]
17. Conclusion
• Successfully implemented a hypervisor virtualizing HSA features.
• Guest systems can get the benefits of HSA and carry out heterogeneous computing.
• The GPU in Kaveri is shareable between multiple guest OSes and the host OS.
Hello everyone. My name is Yu-Ju Huang. Here is the author list: this is me, my partner, and two professors. We are all from Taiwan, a country in East Asia.
This is my topic today. It’s a little long, right :D? So now, I’m going to give you a brief introduction to this work and a picture of what it does. Hope you can enjoy it!
In this work, our target is a special hardware architecture called Heterogeneous System Architecture, or HSA for short. HSA mainly focuses on making heterogeneous computing systems more powerful and more efficient.
Given the HSA-compliant HW platform, we implement a hypervisor running on top of it.
And the hypervisor tries to virtualize the features provided by HSA such that the virtual machines can also get the benefits of HSA.
In the beginning, I’ll introduce the motivation of this work, and then give a brief background on HSA, including the HSA features and AMD’s implementation on Kaveri, which is the first HSA-compliant platform and also our target platform.
After that, we can talk about our design and implementation.
And then the evaluation and conclusion.
About the motivation, we start from heterogeneous computing.
The heterogeneous computing programming model requires data communication between devices.
This communication causes inefficiency and programmability inconvenience.
So the HSA Foundation proposed the HSA architecture to resolve these problems.
As for the motivation of our work: if we believe heterogeneous computing will become more and more popular in the future, then there must be a hypervisor that lets virtual machines get the benefits of HSA.
Though our discussion here is based on HSA and the implementation on AMD’s platform, our design philosophy can also be applied to other platforms, or even other architectures that try to improve heterogeneous computing systems.
OK, let’s start to introduce HSA.
As described previously, HSA tries to solve the communication inefficiency and inconvenience. Here is HSA’s solution.
It proposes many features, and the list here covers the features that govern how a program is able to execute. These features are also what we need to virtualize.
The first is shared virtual memory. Before HSA, the CPU and GPU used different memories and address spaces, so data copies were required. With HSA, all the computing resources, like the CPU, GPU, and other HSA-aware devices, see the same virtual address space, so they can access system memory through virtual addresses. This eliminates the data copies.
The I/O page faulting feature is a requirement for shared virtual memory: once we allow I/O devices to access system memory directly, the page fault service must support them as well.
Next, user-level queueing. Before HSA, tasks could only be dispatched to the GPU by the OS, or the GPU driver.
With HSA, the GPU is able to see all the user-level queues, so dispatching jobs no longer needs to trap into the GPU driver. This design reduces the latency of dispatching jobs.
Finally, memory-based signaling is also designed to reduce OS intervention latency. Prior to HSA, once the GPU finished its task, it issued an interrupt to the CPU and let the CPU notify the user-space program. This path incurs OS intervention overhead. So HSA makes the GPU able to write to a particular memory address to signal job completion. That memory address is assigned by the application when it dispatches jobs.
Of these four features, memory-based signaling is achieved automatically once the GPU is able to access the process address space. So actually, we only have to take care to virtualize the first three features.
Well, in the following pages, I will introduce AMD’s implementation of the HSA features.
First, shared virtual memory. AMD implements an IOMMU so that the GPU and other HSA-aware devices can translate virtual addresses to physical addresses.
And since the CPU and GPU see the same process address space, the page table the IOMMU uses should be the same one the CPU MMU uses.
So by setting the page table properly, the shared virtual memory feature can be achieved.
About I/O page faulting: AMD designed a mechanism called PPR, the peripheral page service request. This request is issued by the IOMMU as an interrupt to the CPU once a failure occurs in address translation, such as the page not existing or insufficient permission to access the page.
The IOMMU also writes a log containing the faulting process ID and fault address. With this information, the Linux API get_user_pages can be used to fix the I/O page fault.
Here is the brief flow of the I/O page fault handling.
As for the user-level queueing feature, the key idea is how to make the GPU know the address of the user-level queues.
AMD designed a driver called the kernel fusion driver, or KFD, to provide this function.
During user-program initialization, the CREATE_QUEUE API sends the address of the user-level queue to the KFD, and the KFD sets this address in the GPU. After this setup, the driver’s intervention can be removed.
The driver is only used during initialization; at computation time the GPU and the user program work together directly.
Good? In the previous slides, I described what we need to virtualize. From now on, I will show you how we virtualize these HSA features.
You can see them on this page; I will elaborate more in the following pages.
One thing I need to mention is that we use a shadow page table to virtualize shared virtual memory.
You may find it strange that the SPT is adopted rather than nested paging.
This is due to a constraint of AMD’s IOMMU, and it is a little complicated, so I will not describe it in this talk. But you can still find the explanation in the proceedings and the paper.
As I described previously, the key to supporting user-level queueing is to let the GPU know the address of the user-level queue.
So we implemented VirtIO-KFD, as you can see in the slide. VirtIO-KFD helps the guest application pass the address of its queue to the real KFD, and the KFD sets it in the GPU.
This way, the GPU knows the address of the guest application’s queue.
And then, shared virtual memory.
As we know, the shadow page table guides the MMU to translate guest virtual addresses to machine physical addresses.
So in our work, we just need to find the address of the shadow page table and set it in the IOMMU when a guest application tries to use the GPU.
This is a snapshot of the GPU execution state.
The IOMMU maintains a table mapping each process address space ID to the corresponding page table address.
In this scenario, two processes use the GPU.
For native execution, the GPU runs a program dispatched by a host application, so it knows where to find the host application’s page table.
For guest execution, the GPU runs a program dispatched by a guest application, and this program is encoded in the guest virtual address space.
So the IOMMU will find the corresponding SPT to translate the GVA to an MPA.
As you can expect, this table can be extended, so in our design multiple processes from different guest OSes, or even the host OS, can share the GPU.
So we achieve a kind of GPU sharing in our work.
Finally, I/O page faulting.
One challenge in virtualizing this feature is that the PPR log region, which stores the page fault information, is inside a special I/O region.
Usually, the guest system is not allowed to access this region.
So we implemented a module called shadow-PPR. This module stores the information about guest GPU programs’ page faults.
Once a PPR occurs, the PPR handler decides whether it was caused by a guest program. If so, it stores the information into the shadow PPR.
Then the shadow PPR kicks KVM to send a virtual interrupt into the guest OS.
Inside the guest OS, we implemented a VirtIO-IOMMU to handle the I/O page fault.
It gets the page fault information from the shadow PPR and fixes the page fault.
So this is how we virtualize the I/O page faulting.
Here is the whole system architecture: VirtIO-KFD for user-level queueing, the SPT for shared virtual memory, and VirtIO-IOMMU for I/O page faulting.
About the experiments: we use the AMD SDK as our benchmark suite.
Data is shown as initialization time and execution time to evaluate our design.
The data is normalized against the native scenario.
There is about a 30% performance drop in initialization.
This drop is mainly caused by the propagation from VirtIO-KFD to the real KFD, since there is world-switch overhead on this path.
But usually an application only does this initialization once, so this performance drop is not a great concern.
Now for GPU execution time.
The major cause of the performance drop in GPU execution time is I/O page fault handling.
But as you can see, our design gets a good result, around 95% of native performance in most cases.
As for the two poor cases, FWT and BS, these two benchmarks do perform somewhat worse.
The reason is shown in this figure, which depicts the flow of an application dispatching jobs, waiting for the signal, and getting the notification when the GPU finishes the job.
It is possible that while the guest application waits for the signal, the CPU switches to another process.
So if, at that particular time, the GPU finishes the job and sends the notification, but the CPU is owned by another process rather than the guest system, the application will get the signal late. The red arrows show this delay.
And why do only these two benchmarks suffer from it? We can see the raw data here.
Because they are small benchmarks, with only about 10 ms of GPU execution time; for the long benchmarks, this signal delay can be amortized.
Another reason is that these two benchmarks enqueue many times.
So they stay inside this loop, and the overhead becomes large.
BinarySearch, though also a small benchmark, enqueues only once, so the overhead is invisible.
To conclude our work:
We implemented a hypervisor that lets guest systems also get the benefits of HSA.
And furthermore, we also achieve GPU sharing.