With large-scale and complex configurable systems, it is hard for users to choose the right combination of options (i.e., configurations) to obtain the desired trade-off between functionality and performance goals such as speed or size. Machine learning can help relate these goals to the configurable system's options, and thus predict the effect of options on the outcome, typically after a costly training step. However, many configurable systems evolve at such a rapid pace that it is impractical to retrain a model from scratch for each new version. In this paper, we propose a new method to enable transfer learning of binary size predictions among versions of the same configurable system. Taking the extreme case of the Linux kernel with its ≈14,500 configuration options, we first investigate how predictions of kernel binary size degrade over successive versions. We show that the direct reuse of an accurate prediction model from 2017 quickly becomes inaccurate as Linux evolves, reaching a 32% mean error by August 2020. We thus propose a new approach, Transfer Evolution-Aware Model Shifting (TEAMS), which leverages the structure of a configurable system to transfer an initial predictive model towards its future versions with a minimal amount of extra processing per version. We show that TEAMS vastly outperforms state-of-the-art approaches over the 3-year history of Linux kernels, from 4.13 to 5.8.
preprint: https://hal.inria.fr/hal-03720273
presented at SPLC 2022 research track
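As a rough, hypothetical illustration of the general model-shifting idea behind transfer learning across versions (not the paper's exact TEAMS algorithm), the sketch below trains a size model on synthetic "old version" data, then fits a simple linear shift from only a handful of measurements of a drifted "new version". All option counts, weights, and drift factors are made-up assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 50 binary options; kernel size is a linear
# function of the enabled options plus noise.
n_opts = 50
w = rng.uniform(0, 10, n_opts)              # per-option size contributions
X_src = rng.integers(0, 2, (1000, n_opts))  # configurations of the old version
y_src = X_src @ w + rng.normal(0, 1, 1000)

# "Source" model: ordinary least squares on the old version.
w_hat, *_ = np.linalg.lstsq(X_src, y_src, rcond=None)

# New version: same options, but sizes drifted (assumed scale + offset).
X_tgt = rng.integers(0, 2, (20, n_opts))    # only 20 cheap new measurements
y_tgt = 1.15 * (X_tgt @ w) + 300 + rng.normal(0, 1, 20)

# Model shifting: fit a 1-D linear map from old predictions to new sizes,
# instead of retraining a full model from scratch.
pred_src = X_tgt @ w_hat
a, b = np.polyfit(pred_src, y_tgt, 1)

def predict_shifted(X):
    return a * (X @ w_hat) + b

# The shifted model should track the drifted ground truth closely.
X_test = rng.integers(0, 2, (200, n_opts))
truth = 1.15 * (X_test @ w) + 300
mape = np.mean(np.abs(predict_shifted(X_test) - truth) / truth) * 100
```

The point of the sketch is the cost asymmetry: the source model needs many measured configurations, while the shift is recovered from a few.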
Linux kernels are used in a wide variety of appliances, many of them having strong requirements on kernel size due to constraints such as limited memory or instant boot. With more than nine thousand configuration options to choose from, developers and users of Linux spend significant effort to document, understand, and eventually tune (combinations of) options to meet a kernel size target. In this paper, we describe a large-scale endeavour to automate this task and predict the binary size of a given Linux kernel out of unmeasured configurations. We first show experimentally that state-of-the-art solutions specifically made for configurable systems, such as performance-influence models, cannot cope with that number of options, suggesting that software product line techniques may need to be adapted to such huge configuration spaces. We then show that tree-based feature selection can learn a model achieving low prediction errors over a reduced set of options. The resulting model, trained on 95,854 kernel configurations, is fast to compute, simple to interpret, and even outperforms the accuracy of learning without feature selection.
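A minimal sketch of tree-based feature selection in this spirit, using scikit-learn; the data here is synthetic (a few assumed influential options among many irrelevant ones), not the paper's kernel dataset:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)

# Hypothetical data: 200 binary options, but only the first 5 drive size.
X = rng.integers(0, 2, (500, 200)).astype(float)
true_w = np.zeros(200)
true_w[:5] = [500, 300, 200, 100, 50]       # assumed influential options
y = X @ true_w + rng.normal(0, 5, 500)

# Step 1: fit a forest and rank options by impurity-based importance.
forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
top = np.argsort(forest.feature_importances_)[::-1][:10]

# Step 2: retrain on the reduced option set only; the small model keeps
# the predictive power while being far easier to interpret.
reduced = RandomForestRegressor(n_estimators=50, random_state=0)
reduced.fit(X[:, top], y)
```

With real kernel configurations the ranking step is what prunes thousands of options down to the handful that matter for size.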
For certain workloads and environments: consolidation on large virtualized servers raises utilization, reduces core requirements, and lowers cost per workload.
The document summarizes the advantages of IBM LinuxONE systems over traditional x86 servers for running Linux workloads. LinuxONE systems provide massive scale with high performance, throughput, and security across many workloads like MongoDB, Docker containers, and virtual machines. They also have significantly lower total cost of ownership compared to solutions on x86 servers due to higher utilization rates and lower management costs.
This document discusses parallel and cluster computing. It begins with an introduction to cluster computing and classifications of cluster computing. It then discusses technologies used in cluster computing like Beowulf clusters and their construction. It describes how cluster computing is used in fields like bioinformatics and parallel computing through projects like Folding@Home. The document outlines different types of clusters and provides details about building a science cluster, including hardware, networking, operating systems, and parallel programming environments. It gives examples of cluster applications in science, computation, and other domains.
IBM and Erb's Presentation, Insider's Edition Event, September 2010 (Erb's Marketing)
IBM and Erb's partnership has spanned 20 years. Together we provide world-class computing and business technology solutions to growing companies across Iowa and Western Illinois. Here is a joint presentation from a recent Erb's customer event, in which Erb's and IBM review IBM's leadership in server and storage technology.
Achieving Performance Isolation with Lightweight Co-Kernels, by Jiannan Ouyang, PhD
These slides were presented at the 24th International Symposium on High-Performance Parallel and Distributed Computing (HPDC '15).
Performance isolation is emerging as a requirement for High Performance Computing (HPC) applications, particularly as HPC architectures turn to in situ data processing and application composition techniques to increase system throughput. These approaches require the co-location of disparate workloads on the same compute node, each with different resource and runtime requirements. In this paper we claim that these workloads cannot be effectively managed by a single Operating System/Runtime (OS/R). Therefore, we present Pisces, a system software architecture that enables the co-existence of multiple independent and fully isolated OS/Rs, or enclaves, that can be customized to address the disparate requirements of next generation HPC workloads. Each enclave consists of a specialized lightweight OS co-kernel and runtime, which is capable of independently managing partitions of dynamically assigned hardware resources. Contrary to other co-kernel approaches, in this work we consider performance isolation to be a primary requirement and present a novel co-kernel architecture to achieve this goal. We further present a set of design requirements necessary to ensure performance isolation, including: (1) elimination of cross OS dependencies, (2) internalized management of I/O, (3) limiting cross enclave communication to explicit shared memory channels, and (4) using virtualization techniques to provide missing OS features. The implementation of the Pisces co-kernel architecture is based on the Kitten Lightweight Kernel and Palacios Virtual Machine Monitor, two system software architectures designed specifically for HPC systems. Finally we will show that lightweight isolated co-kernels can provide better performance for HPC applications, and that isolated virtual machines are even capable of outperforming native environments in the presence of competing workloads.
The document compares several open source virtualization and container technologies. It describes experiments conducted to measure the performance and scalability of KVM, Xen, Linux-VServer, OpenVZ, VirtualBox and KQEMU. The experiments involved benchmarking the technologies using various workloads to test aspects like overhead, throughput, and ability to run multiple virtual machines concurrently. Linux-VServer showed the best overall performance, while Xen performed well except on I/O-bound workloads. The conclusions recommend using OpenVZ for network applications, Linux-VServer for general use, and KVM or VirtualBox for development environments.
Open Source Software on OpenPOWER systems.
With 100% open source system software (including the firmware), OpenPOWER is the most open server architecture in the market. Based on the IBM POWER8 chip, this new family of servers featuring the latest Nvidia NVLink technology runs all the software solutions presented at OPEN'16 with significant cost advantages. This session explains how Docker, EnterpriseDB and many others benefit from this advanced design, and how 200+ technology companies including Google and RackSpace are collaborating in an open development alliance to build the datacenter of the future.
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit (Chester Chen)
Machine Learning at the Limit
John Canny, UC Berkeley
How fast can machine learning and graph algorithms be? In "roofline" design, every kernel is driven toward the limits imposed by CPU, memory, network, etc. This can lead to dramatic improvements: BIDMach is a toolkit for machine learning that uses rooflined design and GPUs to achieve two to three orders of magnitude improvements over other toolkits on single machines. These speedups are larger than have been reported for *cluster* systems (e.g. Spark/MLlib, PowerGraph) running on hundreds of nodes, and BIDMach with a GPU outperforms these systems for most common machine learning tasks. For algorithms (e.g. graph algorithms) which do require cluster computing, we have developed a rooflined network primitive called "Kylix". We can show that Kylix approaches the roofline limits for sparse Allreduce, and empirically holds the record for distributed Pagerank. Beyond rooflining, we believe there are great opportunities from deep algorithm/hardware codesign. Gibbs Sampling (GS) is a very general tool for inference, but is typically much slower than alternatives. SAME (State Augmentation for Marginal Estimation) is a variation of GS which was developed for marginal parameter estimation. We show that it has high parallelism and a fast GPU implementation. Using SAME, we developed a GS implementation of Latent Dirichlet Allocation whose running time is 100x faster than other samplers, and within 3x of the fastest symbolic methods. We are extending this approach to general graphical models, an area where there is currently a void of (practically) fast tools. It seems at least plausible that a general-purpose solution based on these techniques can closely approach the performance of custom algorithms.
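For context, the roofline bound mentioned in the abstract is a simple formula: attainable performance is the minimum of peak compute and memory bandwidth times arithmetic intensity. The peak numbers below are illustrative assumptions, not any particular machine's specification:

```python
# Roofline model: perf(I) = min(peak_flops, peak_bandwidth * I),
# where I is arithmetic intensity in FLOP per byte moved.
peak_flops = 4.0e12   # assumed 4 TFLOP/s compute peak
peak_bw = 300.0e9     # assumed 300 GB/s memory bandwidth

def roofline(intensity_flops_per_byte):
    return min(peak_flops, peak_bw * intensity_flops_per_byte)

# A kernel at 2 FLOP/byte is bandwidth-bound under these assumptions:
# 300 GB/s * 2 FLOP/byte = 600 GFLOP/s, far below the compute peak.
bound_at_2 = roofline(2.0)

# The "ridge point" where compute takes over: peak_flops / peak_bw.
ridge = peak_flops / peak_bw
```

Kernels to the left of the ridge point are memory-bound, so optimization means moving fewer bytes; to the right, it means using the ALUs better.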
Bio
John Canny is a professor in computer science at UC Berkeley. He is an ACM dissertation award winner and a Packard Fellow. He is currently a Data Science Senior Fellow in Berkeley's new Institute for Data Science and holds an INRIA (France) International Chair. Since 2002, he has been developing and deploying large-scale behavioral modeling systems. He designed and prototyped production systems for Overstock.com, Yahoo, eBay, Quantcast, and Microsoft. He currently works on several applications of data mining for human learning (MOOCs and early language learning), health and well-being, and applications in the sciences.
This document provides an overview of IBM's portfolio of hardware, software, and services for an enterprise-grade Linux operating environment on IBM LinuxONE systems. It highlights key capabilities such as freedom and agility, standards-based design, speed of innovation, developer productivity, community collaboration, quality software, dynamic resource allocation, non-disruptive scalability, continuous business availability, and operational efficiency. The document also discusses IBM LinuxONE solutions for cloud, DevOps, analytics, elastic pricing, and announcements. It provides examples of workloads such as databases that benefit from the LinuxONE architecture.
The document discusses plans to establish an institutional high performance computing (HPC) facility at North-West University. It outlines the technical goals of building a Beowulf cluster to link existing departmental clusters and integrate with national and international computational grids. It also discusses management principles for the new HPC facility to ensure sustainability, efficiency, reliability, availability and high performance.
Interface for Performance Environment Autoconfiguration Framework, by Liang Men
The document presents PEAK (Performance Environment Autoconfiguration frameworK), a framework that helps developers and users find optimal configurations for scientific applications on HPC systems. PEAK's interface automates benchmarking of applications using different compiler, library, and environment settings. It develops a performance database to provide reference for optimal configurations to achieve speedups. The framework was tested on two HPC systems, comparing performance of applications compiled with different compilers and linked to numerical libraries including LibSci, ACML, Intel MKL, FFTW, and CRAFFT. It found default settings were not always optimal, and the framework can help select better performing configurations for applications.
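The sweep-and-record idea behind such a framework can be sketched in a few lines; the compiler and library names below are placeholders, and the benchmark is a stand-in loop rather than a real compiled application:

```python
import itertools
import time

# Hypothetical sweep: benchmark a workload under every (compiler, library)
# combination and record timings in a small "performance database".
compilers = ["gcc", "icc"]          # placeholder toolchain names
libraries = ["mkl", "fftw"]         # placeholder numerical libraries

def run_benchmark(compiler, library):
    # Stand-in for compiling with `compiler`, linking `library`,
    # and timing the real application.
    t0 = time.perf_counter()
    sum(i * i for i in range(10000))
    return time.perf_counter() - t0

perf_db = {
    (c, lib): run_benchmark(c, lib)
    for c, lib in itertools.product(compilers, libraries)
}

# The recommended configuration is simply the fastest recorded one.
best = min(perf_db, key=perf_db.get)
```

A real framework would persist `perf_db` so later users get the recommendation without re-running the sweep.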
Hands-On Workshop on Performance Optimization for Intel Xeon Phi Processor Family x200 (formerly Knights Landing) from Colfax International. More information at http://colfaxresearch.com/knl-webinar/
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storage (MayaData Inc)
Webinar Session - https://youtu.be/_5MfGMf8PG4
In this webinar, we share how the Container Attached Storage pattern makes performance tuning more tractable, by giving each workload its own storage system, thereby decreasing the variables needed to understand and tune performance.
We then introduce MayaStor, a breakthrough in the use of containers and Kubernetes as a data plane. MayaStor is the first containerized data engine available that delivers near the theoretical maximum performance of underlying systems. MayaStor performance scales with the underlying hardware and has been shown, for example, to deliver in excess of 10 million IOPS in a particular environment.
From Rack-scale Computers to Warehouse-scale Computers, by Ryousei Takano
This document discusses the transition from rack-scale computers to warehouse-scale computers through the disaggregation of technologies. It provides examples of rack-scale architectures like Open Compute Project and Intel Rack Scale Architecture. For warehouse-scale computers, it examines HP's The Machine project using application-specific cores, universal memory, and photonics fabric. It also outlines UC Berkeley's FireBox project utilizing 1 terabit/sec optical fibers, many-core systems-on-chip, and non-volatile memory modules connected via high-radix photonic switches.
Evolution of the Windows Kernel Architecture, by Dave Probert
Dave Probert is a kernel architect at Microsoft who has over 13 years of experience working on Windows kernels. He helped design key aspects of Windows such as multi-core support and user-mode scheduling. Probert provided an overview of how the Windows kernel architecture has evolved over time from Windows NT to recent versions, focusing on changes made to improve scalability, security and energy efficiency. He also discussed Microsoft's Windows Academic Program which provides universities access to Windows kernel source code and curriculum materials.
Dave Probert is a kernel architect at Microsoft who has over 13 years of experience working on Windows kernels. He helped design key aspects of kernels from Windows 2000 to Windows 7 such as multi-core support and user-mode scheduling. Probert also runs the Windows Academic Program which provides Windows kernel source code and curriculum materials to universities to help teach operating systems concepts.
This document summarizes Intel's Cluster Studio XE and Parallel Studio XE development tools. It provides an overview of the tools' capabilities for developing applications that can efficiently scale from few cores to many cores. The tools include compilers, performance profiling and analysis tools, and libraries that support parallel programming models and provide optimized math functions and algorithms.
Dave Probert is a kernel architect at Microsoft who has worked on Windows kernels for over 13 years. He manages platform-independent kernel development and works on support for multi-core and heterogeneous parallel computing. Probert also co-instigated the Windows Academic Program which provides kernel source code and curriculum materials to universities to aid in operating systems education. The document discusses differences between the UNIX and Windows NT design environments and how those influenced OS design choices. It provides an overview of the Windows kernel architecture and changes made in newer versions like Windows 7 to improve scalability and support for multi-core systems.
The OpenEBS Hangout #4 was held on 22nd December 2017 at 11:00 AM (IST and PST), where a live demo of cMotion was shown. Storage policies of OpenEBS 0.5 were also explained.
The document provides an introduction to the Linux operating system, including:
- A brief history of UNIX and Linux, describing their origins in the 1960s-1990s.
- An overview of Linux distributions, kernels, features, and structure, explaining concepts like monolithic vs. microkernel designs.
- Descriptions of key Linux components like modules, eBPF, and the roles of processes, user mode, kernel mode, and context switches.
- Discussions of ongoing developments like extended BPF which allow more dynamic programmability of the Linux kernel.
Journal Seminar: Is Singularity-based Container Technology Ready for Running ... (Kento Aoyama)
1. The study evaluates the performance of running MPI applications in Singularity containers on HPC clouds and local clusters.
2. Experiments were conducted on Chameleon Cloud using Intel Xeon Haswell nodes and a local cluster using Intel Knights Landing nodes.
3. Results show that Singularity incurs less than 8% overhead for MPI point-to-point and collective communications and for HPC applications, demonstrating its potential for efficiently running MPI workloads on HPC clouds and systems.
The Supermicro X12 product line, powered by 3rd Gen Intel® Xeon® Scalable processors, contains many innovations that give organizations more performance for a variety of workloads.
Join this webinar to learn more about the outstanding performance you can get by using Supermicro X12 servers and storage systems using the latest technologies from Intel®.
Watch the webinar: https://www.brighttalk.com/webcast/17278/514618
The document provides an overview of a training session on clustering 101 and the Rocks cluster distribution. It discusses cluster types, components, pioneers in the field, interconnect technologies, sample applications and benchmarks, cluster software options, challenges of managing clusters, and the philosophy and approach of the Rocks distribution for easily building clusters.
The document describes a system called Libra, formerly known as PROSE, which aims to run applications in isolated partitions for improved control, reliability and performance. It allows creating specialized kernels as easily as applications. Resources are shared between partitions using the 9P2000 protocol over a unified file namespace. This allows finer-grained control over system services while maintaining simplicity and reliability.
Vijayendra Shamanna from SanDisk presented on optimizing the Ceph distributed storage system for all-flash architectures. Some key points:
1) Ceph is an open-source distributed storage system that provides file, block, and object storage interfaces. It operates by spreading data across multiple commodity servers and disks for high performance and reliability.
2) SanDisk has optimized various aspects of Ceph's software architecture and components like the messenger layer, OSD request processing, and filestore to improve performance on all-flash systems.
3) Testing showed the optimized Ceph configuration delivering over 200,000 IOPS and low latency with random 8K reads on an all-flash setup.
The document discusses PROSE (Partitioned Reliable Operating System Environment), an approach that runs applications in specialized kernel partitions for finer control over system resources and improved reliability. It aims to simplify development of specialized kernels and enable resource sharing across partitions. The approach is evaluated using IBM's research hypervisor rHype, which shows PROSE can reduce noise and provide more deterministic performance than Linux. Future work focuses on running larger commercial workloads and further performance/noise experiments.
The ability to recreate computational results with minimal effort and actionable metrics provides a solid foundation for scientific research and software development. When people can replicate an analysis at the touch of a button using open-source software, open data, and methods to assess and compare proposals, it significantly eases verification of results, engagement with a diverse range of contributors, and progress. However, we have yet to fully achieve this; there are still many sociotechnical frictions.
Inspired by David Donoho's vision, this talk aims to revisit the three crucial pillars of frictionless reproducibility (data sharing, code sharing, and competitive challenges) with the perspective of deep software variability.
Our observation is that multiple layers — hardware, operating systems, third-party libraries, software versions, input data, compile-time options, and parameters — are subject to variability that exacerbates frictions but is also essential for achieving robust, generalizable results and fostering innovation. I will first review the literature, providing evidence of how the complex variability interactions across these layers affect qualitative and quantitative software properties, thereby complicating the reproduction and replication of scientific studies in various fields.
I will then present some software engineering and AI techniques that can support the strategic exploration of variability spaces. These include the use of abstractions and models (e.g., feature models), sampling strategies (e.g., uniform, random), cost-effective measurements (e.g., incremental build of software configurations), and dimensionality reduction methods (e.g., transfer learning, feature selection, software debloating).
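One of the strategies above, uniform random sampling of a constrained configuration space, can be sketched as follows; the feature names and the single constraint are invented for illustration, not taken from a real feature model:

```python
import random

# Hypothetical feature model: 6 optional features with one constraint
# (COMPRESSION requires CACHE).
features = ["CACHE", "COMPRESSION", "DEBUG", "LTO", "NET", "SOUND"]

def valid(cfg):
    # The only cross-tree constraint in this toy model.
    return not (cfg["COMPRESSION"] and not cfg["CACHE"])

def sample_valid(rng, n):
    # Rejection sampling: draw random configurations, keep valid ones.
    out = []
    while len(out) < n:
        cfg = {f: rng.random() < 0.5 for f in features}
        if valid(cfg):
            out.append(cfg)
    return out

rng = random.Random(42)
configs = sample_valid(rng, 100)
```

Rejection sampling only works when valid configurations are not too rare; heavily constrained spaces need dedicated samplers (e.g., SAT-based), which is part of what makes exploring deep variability hard.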
I will finally argue that deep variability is both the problem and solution of frictionless reproducibility, calling the software science community to develop new methods and tools to manage variability and foster reproducibility in software systems.
Invited talk at the Journées Nationales du GDR GPL 2024
Producing a variant of code is highly challenging, particularly for individuals unfamiliar with programming. This demonstration introduces a novel use of generative AI to aid end-users in customizing code. We first describe how generative AI can be used to customize code through prompts and instructions, and further demonstrate its potential in building end-user tools for configuring code. We showcase how to transform an undocumented, technical, low-level TikZ drawing into a user-friendly, Web-based customization tool, written in Python, HTML, CSS, and JavaScript, and itself configurable. We discuss how generative AI can support this transformation process and traditional variability engineering tasks, such as the identification and implementation of features, the synthesis of a template code generator, and the development of end-user configurators. We believe this is a first step towards democratizing variability programming, opening a path for end-users to adapt code to their needs.
Similar to Transfer Learning Across Variants and Versions: The Case of Linux Kernel Size
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...Chester Chen
Machine Learning at the Limit
John Canny, UC Berkeley
How fast can machine learning and graph algorithms be? In "roofline" design, every kernel is driven toward the limits imposed by CPU, memory, network etc. This can lead to dramatic improvements: BIDMach is a toolkit for machine learning that uses rooflined design and GPUs to achieve two- to three-orders of magnitude improvements over other toolkits on single machines. These speedups are larger than have been reported for *cluster* systems (e.g. Spark/MLLib, Powergraph) running on hundreds of nodes, and BIDMach with a GPU outperforms these systems for most common machine learning tasks. For algorithms (e.g. graph algorithms) which do require cluster computing, we have developed a rooflined network primitive called "Kylix". We can show that Kylix approaches the rooline limits for sparse Allreduce, and empirically holds the record for distributed Pagerank. Beyond rooflining, we believe there are great opportunities from deep algorithm/hardware codesign. Gibbs Sampling (GS) is a very general tool for inference, but is typically much slower than alternatives. SAME (State Augmentation for Marginal Estimation) is a variation of GS which was developed for marginal parameter estimation. We show that it has high parallelism, and a fast GPU implementation. Using SAME, we developed a GS implementation of Latent Dirichlet Allocation whose running time is 100x faster than other samplers, and within 3x of the fastest symbolic methods. We are extending this approach to general graphical models, an area where there is currently a void of (practically) fast tools. It seems at least plausible that a general-purpose solution based on these techniques can closely approach the performance of custom algorithms.
Bio
John Canny is a professor in computer science at UC Berkeley. He is an ACM dissertation award winner and a Packard Fellow. He is currently a Data Science Senior Fellow in Berkeley's new Institute for Data Science and holds a INRIA (France) International Chair. Since 2002, he has been developing and deploying large-scale behavioral modeling systems. He designed and protyped production systems for Overstock.com, Yahoo, Ebay, Quantcast and Microsoft. He currently works on several applications of data mining for human learning (MOOCs and early language learning), health and well-being, and applications in the sciences.
This document provides an overview of IBM's portfolio of hardware, software, and services for an enterprise-grade Linux operating environment on IBM LinuxONE systems. It highlights key capabilities such as freedom and agility, standards-based design, speed of innovation, developer productivity, community collaboration, quality software, dynamic resource allocation, non-disruptive scalability, continuous business availability, and operational efficiency. The document also discusses IBM LinuxONE solutions for cloud, DevOps, analytics, elastic pricing, and announcements. It provides examples of workloads such as databases that benefit from the LinuxONE architecture.
The document discusses plans to establish an institutional high performance computing (HPC) facility at North-West University. It outlines the technical goals of building a Beowulf cluster to link existing departmental clusters and integrate with national and international computational grids. It also discusses management principles for the new HPC facility to ensure sustainability, efficiency, reliability, availability and high performance.
Interface for Performance Environment Autoconfiguration Framework, by Liang Men
The document presents PEAK (Performance Environment Autoconfiguration frameworK), a framework that helps developers and users find optimal configurations for scientific applications on HPC systems. PEAK's interface automates benchmarking of applications using different compiler, library, and environment settings. It develops a performance database to provide reference for optimal configurations to achieve speedups. The framework was tested on two HPC systems, comparing performance of applications compiled with different compilers and linked to numerical libraries including LibSci, ACML, Intel MKL, FFTW, and CRAFFT. It found default settings were not always optimal, and the framework can help select better performing configurations for applications.
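The core loop of such an autoconfiguration framework can be sketched in a few lines: sweep the Cartesian product of build settings, time each one, and record the fastest as the reference configuration. Everything below is hypothetical (the setting names echo those in the abstract, and the timings are fabricated stand-ins for real benchmark runs), not PEAK's actual implementation.

```python
# Hedged sketch of a PEAK-style configuration sweep. The compiler and
# library names come from the abstract; benchmark() is a stand-in for a
# real timed run and returns fabricated numbers.

from itertools import product

compilers = ["gcc", "icc", "cray"]
libraries = ["MKL", "FFTW", "LibSci"]

def benchmark(compiler, library):
    """Pretend to build and run the application; return runtime in seconds."""
    fake_times = {("icc", "MKL"): 9.1, ("gcc", "FFTW"): 11.4}
    return fake_times.get((compiler, library), 13.0)  # 13.0 = "default" time

# Sweep every (compiler, library) pair and keep the fastest.
results = {cfg: benchmark(*cfg) for cfg in product(compilers, libraries)}
best = min(results, key=results.get)
assert best == ("icc", "MKL")  # the default settings were not optimal here
```

The resulting `results` dictionary plays the role of the performance database: later users can look up the best-known configuration instead of re-benchmarking.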
Hands-On Workshop on Performance Optimization for Intel Xeon Phi Processor Family x200 (formerly Knights Landing) from Colfax International. More information at http://colfaxresearch.com/knl-webinar/
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storage, by MayaData Inc
Webinar Session - https://youtu.be/_5MfGMf8PG4
In this webinar, we share how the Container Attached Storage pattern makes performance tuning more tractable, by giving each workload its own storage system, thereby decreasing the variables needed to understand and tune performance.
We then introduce MayaStor, a breakthrough in the use of containers and Kubernetes as a data plane. MayaStor is the first containerized data engine available that delivers near the theoretical maximum performance of underlying systems. MayaStor performance scales with the underlying hardware and has been shown, for example, to deliver in excess of 10 million IOPS in a particular environment.
From Rack scale computers to Warehouse scale computers, by Ryousei Takano
This document discusses the transition from rack-scale computers to warehouse-scale computers through the disaggregation of technologies. It provides examples of rack-scale architectures like Open Compute Project and Intel Rack Scale Architecture. For warehouse-scale computers, it examines HP's The Machine project using application-specific cores, universal memory, and photonics fabric. It also outlines UC Berkeley's FireBox project utilizing 1 terabit/sec optical fibers, many-core systems-on-chip, and non-volatile memory modules connected via high-radix photonic switches.
Evolution of the Windows Kernel Architecture, by Dave Probert
Dave Probert is a kernel architect at Microsoft who has over 13 years of experience working on Windows kernels. He helped design key aspects of Windows such as multi-core support and user-mode scheduling. Probert provided an overview of how the Windows kernel architecture has evolved over time from Windows NT to recent versions, focusing on changes made to improve scalability, security and energy efficiency. He also discussed Microsoft's Windows Academic Program which provides universities access to Windows kernel source code and curriculum materials.
Dave Probert is a kernel architect at Microsoft who has over 13 years of experience working on Windows kernels. He helped design key aspects of kernels from Windows 2000 to Windows 7 such as multi-core support and user-mode scheduling. Probert also runs the Windows Academic Program which provides Windows kernel source code and curriculum materials to universities to help teach operating systems concepts.
This document summarizes Intel's Cluster Studio XE and Parallel Studio XE development tools. It provides an overview of the tools' capabilities for developing applications that can efficiently scale from few cores to many cores. The tools include compilers, performance profiling and analysis tools, and libraries that support parallel programming models and provide optimized math functions and algorithms.
Dave Probert is a kernel architect at Microsoft who has worked on Windows kernels for over 13 years. He manages platform-independent kernel development and works on support for multi-core and heterogeneous parallel computing. Probert also co-instigated the Windows Academic Program which provides kernel source code and curriculum materials to universities to aid in operating systems education. The document discusses differences between the UNIX and Windows NT design environments and how those influenced OS design choices. It provides an overview of the Windows kernel architecture and changes made in newer versions like Windows 7 to improve scalability and support for multi-core systems.
The OpenEBS Hangout #4 was held on 22nd December 2017 at 11:00 AM (IST and PST), where a live demo of cMotion was shown. Storage policies of OpenEBS 0.5 were also explained.
The document provides an introduction to the Linux operating system, including:
- A brief history of UNIX and Linux, describing their origins in the 1960s-1990s.
- An overview of Linux distributions, kernels, features, and structure, explaining concepts like monolithic vs. microkernel designs.
- Descriptions of key Linux components like modules, eBPF, and the roles of processes, user mode, kernel mode, and context switches.
- Discussions of ongoing developments like extended BPF which allow more dynamic programmability of the Linux kernel.
Journal Seminar: Is Singularity-based Container Technology Ready for Running ..., by Kento Aoyama
1. The study evaluates the performance of running MPI applications in Singularity containers on HPC clouds and local clusters.
2. Experiments were conducted on Chameleon Cloud using Intel Xeon Haswell nodes and a local cluster using Intel Knights Landing nodes.
3. Results show that Singularity incurs less than 8% overhead for MPI point-to-point and collective communications and for HPC applications, demonstrating its potential for efficiently running MPI workloads on HPC clouds and systems.
The Supermicro X12 product line, powered by 3rd Gen Intel® Xeon® Scalable processors, contains many innovations that give organizations more performance for a variety of workloads.
Join this webinar to learn more about the outstanding performance you can get by using Supermicro X12 servers and storage systems using the latest technologies from Intel®.
Watch the webinar: https://www.brighttalk.com/webcast/17278/514618
The document provides an overview of a training session on clustering 101 and the Rocks cluster distribution. It discusses cluster types, components, pioneers in the field, interconnect technologies, sample applications and benchmarks, cluster software options, challenges of managing clusters, and the philosophy and approach of the Rocks distribution for easily building clusters.
The document describes a system called Libra, formerly known as PROSE, which aims to run applications in isolated partitions for improved control, reliability and performance. It allows creating specialized kernels as easily as applications. Resources are shared between partitions using the 9P2000 protocol over a unified file namespace. This allows finer-grained control over system services while maintaining simplicity and reliability.
Vijayendra Shamanna from SanDisk presented on optimizing the Ceph distributed storage system for all-flash architectures. Some key points:
1) Ceph is an open-source distributed storage system that provides file, block, and object storage interfaces. It operates by spreading data across multiple commodity servers and disks for high performance and reliability.
2) SanDisk has optimized various aspects of Ceph's software architecture and components like the messenger layer, OSD request processing, and filestore to improve performance on all-flash systems.
3) Testing showed the optimized Ceph configuration delivering over 200,000 IOPS and low latency with random 8K reads on an all-flash setup.
The document discusses PROSE (Partitioned Reliable Operating System Environment), an approach that runs applications in specialized kernel partitions for finer control over system resources and improved reliability. It aims to simplify development of specialized kernels and enable resource sharing across partitions. The approach is evaluated using IBM's research hypervisor rHype, which shows PROSE can reduce noise and provide more deterministic performance than Linux. Future work focuses on running larger commercial workloads and further performance/noise experiments.
Similar to Transfer Learning Across Variants and Versions: The Case of Linux Kernel Size (20)
The ability to recreate computational results with minimal effort and actionable metrics provides a solid foundation for scientific research and software development. When people can replicate an analysis at the touch of a button using open-source software, open data, and methods to assess and compare proposals, it significantly eases verification of results, engagement with a diverse range of contributors, and progress. However, we have yet to fully achieve this; there are still many sociotechnical frictions.
Inspired by David Donoho's vision, this talk aims to revisit the three crucial pillars of frictionless reproducibility (data sharing, code sharing, and competitive challenges) with the perspective of deep software variability.
Our observation is that multiple layers — hardware, operating systems, third-party libraries, software versions, input data, compile-time options, and parameters — are subject to variability that exacerbates frictions but is also essential for achieving robust, generalizable results and fostering innovation. I will first review the literature, providing evidence of how the complex variability interactions across these layers affect qualitative and quantitative software properties, thereby complicating the reproduction and replication of scientific studies in various fields.
I will then present some software engineering and AI techniques that can support the strategic exploration of variability spaces. These include the use of abstractions and models (e.g., feature models), sampling strategies (e.g., uniform, random), cost-effective measurements (e.g., incremental build of software configurations), and dimensionality reduction methods (e.g., transfer learning, feature selection, software debloating).
I will finally argue that deep variability is both the problem and solution of frictionless reproducibility, calling the software science community to develop new methods and tools to manage variability and foster reproducibility in software systems.
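One of the dimensionality-reduction techniques listed above, transfer learning across versions, can be illustrated with a tiny "model shifting" sketch: reuse a prediction model trained on one software version and correct it with a small linear shift learned from a handful of measurements on the new version. All data below is synthetic; this is an illustration of the general idea, not the exact algorithm of any cited work.

```python
# Hedged sketch of linear model shifting for transfer learning.
# old_preds: predictions of a model trained on version A;
# new_measures: a few costly measurements on version B.

def fit_linear_shift(old_preds, new_measures):
    """Least-squares fit of new = a * old + b from a few paired samples."""
    n = len(old_preds)
    mx = sum(old_preds) / n
    my = sum(new_measures) / n
    a = sum((x - mx) * (y - my) for x, y in zip(old_preds, new_measures)) \
        / sum((x - mx) ** 2 for x in old_preds)
    b = my - a * mx
    return a, b

# Synthetic scenario: the new version inflates sizes by 10% plus a constant.
old = [10.0, 20.0, 30.0, 40.0]
new = [1.1 * x + 2.0 for x in old]
a, b = fit_linear_shift(old, new)
assert abs(a - 1.1) < 1e-9 and abs(b - 2.0) < 1e-9
```

Only the shift (two parameters here) must be learned per version, so a few extra measurements can keep an otherwise expensive model usable as the system evolves.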
Invited talk, Journées Nationales du GDR GPL 2024
Producing a variant of code is highly challenging, particularly for individuals unfamiliar with programming. This demonstration introduces a novel use of generative AI to aid end-users in customizing code. We first describe how generative AI can be used to customize code through prompts and instructions, and further demonstrate its potential in building end-user tools for configuring code. We showcase how to transform an undocumented, technical, low-level TikZ into a user-friendly, configurable, Web-based customization tool written in Python, HTML, CSS, and JavaScript and itself configurable. We discuss how generative AI can support this transformation process and traditional variability engineering tasks, such as identification and implementation of features, synthesis of a template code generator, and development of end-user configurators. We believe it is a first step towards democratizing variability programming, opening a path for end-users to adapt code to their needs.
Slides presented at MODEVAR 2024 co-located with VaMoS 2024: https://modevar.github.io/program/ Thanks to Jessie Galasso and Chico Sundermann for the invitation Abstract: "Variability models (e.g., feature models) are widely considered in software engineering research and practice, in order to develop software product lines, configurable systems, generators, or adaptive systems. Numerous success stories of variability models have been reported in different domains (e.g., automotive, aerospace, avionics) and the applicability broadens. However, the use of variability models is not yet universally adopted… Why? Some examples: variability models are not continuously extracted from projects and artefacts; each time a variability model is used, its expressiveness and language should be specialized; learning-based models are decoupled from variability models; modeling software variability should cover different layers and their interactions, etc. The list is arguably opinionated, incomplete, and based on my own practical experience, but also on observations of existing works. I will give 24 reasons to occupy your day as a researcher or practitioner in the field of variability modeling."
Slides: https://github.com/acherm/24RWVMANYU-VaMoS-MODEVAR/blob/main/vamos2024.md (Markdown content) PDF: https://github.com/acherm/24RWVMANYU-VaMoS-MODEVAR/blob/main/VaMoS2024-MODEVAR.pdf (slides in PDF format)
Programming variability is central to the design and implementation of software systems that can adapt to a variety of contexts and requirements, providing increased flexibility and customization. Managing the complexity that arises from having multiple features, variations, and possible configurations is known to be highly challenging for software developers. In this paper, we explore how large language model (LLM)-based assistants can support the programming of variability. We report on new approaches made possible with LLM-based assistants, like: features and variations can be implemented as prompts; augmentation of variability out of LLM-based domain knowledge; seamless implementation of variability in different kinds of artefacts, programming languages, and frameworks, at different binding times (compile-time or run-time). We are sharing our data (prompts, sessions, generated code, etc.) to support the assessment of the effectiveness and robustness of LLMs for variability-related tasks.
https://inria.hal.science/hal-04153310/
The migration and reengineering of existing variants into a software product line (SPL) is an error-prone and time-consuming activity. Many extractive approaches have been proposed, spanning different activities from feature identification and naming to the synthesis of reusable artefacts. In this paper, we explore how large language model (LLM)-based assistants can support domain analysts and developers. We revisit four illustrative cases from the literature where the challenge is to migrate variants written in different formalisms (UML class diagrams, Java, GraphML, statecharts). We systematically report on our experience with ChatGPT-4, describing our strategy to prompt LLMs and documenting positive aspects but also failures. We compare the use of LLMs with a state-of-the-art approach, BUT4Reuse. While LLMs offer potential in assisting domain analysts and developers in transitioning software variants into SPLs, their intrinsic stochastic nature and restricted ability to manage large variants or complex structures necessitate a semi-automatic approach, complete with careful review, to counteract inaccuracies.
The document discusses deep software variability and the challenges of predicting software performance across different configuration layers. It provides an example of measuring the video encoder x264 across different hardware platforms, operating systems, compiler options, and input video files. The key points are that software performance prediction models may not generalize if they do not account for variability across these different layers. The document calls for more empirical studies of deep software variability in real-world systems to better understand how it manifests and which layers have the most influence. This knowledge could help reduce costs when exploring configuration spaces and transferring performance models between environments.
talk @ DiverSE coffee (informal, work in progress)
abstract: Cheating in chess is a serious issue, mainly due to the use of chess engines during play.
Chess engines like Stockfish or AlphaZero-like variants can give the cheater a decisive advantage, since almost perfect moves can be played.
Already in 2006, Topalov accused Kramnik of cheating during their world championship match.
The Sébastien Feller (https://en.wikipedia.org/wiki/S%C3%A9bastien_Feller) case at the Chess Olympiad in 2010 was a shock.
With the rise of online games and $$$, cheating is even more problematic.
A few weeks ago, Magnus Carlsen (https://en.wikipedia.org/wiki/Magnus_Carlsen) accused Hans Niemann (HN) of cheating over the board and refused to ever play him again.
You may have heard headlines about an "anal bead" supposedly helping HN.
In fact, I'm not specifically aiming to talk about chess (and cheating).
I rather want to discuss how science has been (and will be) at the heart of the anti-cheating chess problem.
I will first argue that many people (chess hobbyists/experts, data nerds, etc.), most being non-scientists, have actually done science for trying to demonstrate or refute the cheating case.
The basic idea is to confront moves played by humans (players) with those of computer engines.
With the sharing of data (analysis of chess games like those played by HN), scripts, and methods, numerous results and conclusions have emerged, getting popularized with social media (twitch, Youtube, twitter, etc.)
On the one hand, I've been quite excited to see all this energy for trying to advance our understanding and propose interesting ideas/analysis.
On the other hand, there have been some failures in the quality of some analysis or the choice of closed systems to compute unclear metrics.
In between, there have been a report by chess.com and the analysis of the computer scientist Ken Regan (https://cse.buffalo.edu/~regan/), the world-renowned specialist.
I still think the problem is open (e.g., Regan's method is too conservative and misses many cases; chess.com's methodology, though unclear and opaque at some points, is certainly effective for online cheating, but not for over-the-board detection).
I will present a variability model of the space of experiments/methods that can be considered to address the problem.
This model can be used to pilot the collaborative effort, to reproduce, replicate or reject some experiments, and to gain confidence or robustness in some conclusions.
Another last point I want to discuss is that most probably the anti-cheating chess problem cannot be resolved solely with retrospective computational analysis.
It's just too uncertain, especially if cheaters are "smart".
(Cyber-)security experts, psychologists, chess players, and of course computer science nerds/professionals can contribute to address this multidisciplinary problem.
Keynote VariVolution/VM4ModernTech@SPLC 2022
At compile-time or at runtime, varying software is a powerful means to achieve optimal functional and performance goals. An observation is that considering only the software layer might be naive when tuning the performance of the system or testing that the functionality behaves correctly. In fact, many layers (hardware, operating system, input data, build process, etc.), themselves subject to variability, can alter the performance of software configurations. For instance, configuration options may have very different effects on execution time or energy consumption when used with different input data, depending on how the software has been compiled and the hardware on which it is executed.
In this talk, I will introduce the concept of “deep software variability” which refers to the interactions of all external layers modifying the behavior or non-functional properties of a software system. I will show how compile-time options, inputs, and software evolution (versions), some dimensions of deep variability, can question the generalization of the variability knowledge of popular configurable systems like Linux, gcc, xz, or x264.
I will then argue that machine learning (ML) is particularly suited to manage very large variants space. The key idea of ML is to build a model based on sample data -- here observations about software variants in variable settings -- in order to make predictions or decisions. I will review state-of-the-art solutions developed in software engineering and software product line engineering while connecting with works in ML (e.g., transfer learning, dimensionality reduction, adversarial learning). Overall, the key challenge is to leverage the right ML pipeline in order to harness all variability layers (and not only the software layer), leading to more efficient systems and variability knowledge that truly generalizes to any usage and context.
From this perspective, we are starting an initiative to collect data, software, reusable artefacts, and body of knowledge related to (deep) software variability: https://deep.variability.io
Finally, I will open a broader discussion on how machine learning and deep software variability relate to the reproducibility, replicability, and robustness of scientific, software-based studies (e.g., in neuroimaging and climate modelling).
Software has eaten the world and will continue to impact society, spanning numerous application domains, including the way we're doing science and acquiring knowledge.
Software variability is key since developers, engineers, entrepreneurs, and scientists can explore different hypotheses, methods, and custom products to hopefully fit a diversity of needs and usages.
In this talk, I will first illustrate the importance of software variability using different concrete examples.
I will then show that, though highly desirable, software variability also introduces an enormous complexity due to the combinatorial explosion of possible variants.
For example, 10^6000 variants of the Linux kernel may be built, much more than the estimated number of atoms in the universe (10^80).
Hence, the big picture and challenge for the audience: any organization capable of mastering software variability will have a competitive advantage.
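The order of magnitude above is easy to sanity-check: even a naive bound that treats the kernel's ~14,500 options as independent booleans already yields a number of variants with thousands of decimal digits (tristate options and real option counts push the estimate higher still, toward the 10^6000 figure).

```python
# Back-of-the-envelope check of the combinatorial explosion.
# 14,500 is the approximate Linux option count; treating every option as an
# independent boolean gives a simple (under-)estimate of the variant space.

import math

options = 14_500
digits = math.floor(options * math.log10(2)) + 1  # decimal digits of 2^14500
assert digits > 4000  # vastly more than the ~10^80 atoms in the universe
```

So even this deliberately conservative bound exceeds 10^80 by thousands of orders of magnitude, which is why exhaustive exploration is hopeless and sampling plus learning is needed.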
Summer School EIT Digital, Rennes, 5th July 2022
Biology, medicine, physics, astrophysics, chemistry: all these scientific domains need to process large amounts of data with more and more complex software systems. For achieving reproducible science, there are several challenges ahead involving multidisciplinary collaboration and socio-technical innovation with software at the center of the problem. Despite the availability of data and code, several studies report that the same data analyzed with different software can lead to different results. I see this problem as a manifestation of deep software variability: many factors (operating system, third-party libraries, versions, workloads, compile-time options and flags, etc.), themselves subject to variability, can alter the results, up to the point that they can dramatically change the conclusions of some scientific studies. In this keynote, I argue that deep software variability is a threat and also an opportunity for reproducible science. I first outline some works about (deep) software variability, reporting on preliminary evidence of complex interactions between variability layers. I then link the ongoing works on variability modelling and deep software variability in the quest for reproducible science.
Most modern software systems are subject to variation or come in many variants. Web browsers like Firefox or Chrome are available on different operating systems, in different languages, while users can configure 2000+ preferences or install numerous third-party extensions (or plugins). Web servers like Apache, operating systems like the Linux kernel, or a video encoder like x264 are other examples of software systems that are highly configurable at compile-time or at run-time for delivering the expected functionality and meeting the various desires of users. Variability ("the ability of a software system or artifact to be efficiently extended, changed, customized or configured for use in a particular context") is therefore a crucial property of software systems. Organizations capable of mastering variability can deliver high-quality variants (or products) in a short amount of time and thus attract numerous customers, new use-cases or usage contexts. A hard problem for end-users or software developers is to master the combinatorial explosion induced by variability: Hundreds of configuration options can be combined, each potentially with distinct functionality and effects on execution time, memory footprint, quality of the result, etc. The first part of this course will introduce variability-intensive systems, their applications and challenges, in various software contexts. We will use intuitive examples (like a generator of LaTeX paper variants) and real-world systems (like the Linux kernel). A second objective of this course is to show the relevance of Artificial Intelligence (AI) techniques for exploring and taming such enormous variability spaces. In particular, we will introduce how (1) satisfiability and constraint programming solvers can be used to properly model and reason about variability; (2) how machine learning can be used to discover constraints and predict the variability behavior of configurable systems or software product lines.
http://ejcp2019.icube.unistra.fr/
Presented at SIGCSE 2018: https://sigcse2018.sigcse.org/
Software Product Line (SPL) engineering has emerged to provide the means to efficiently model, produce, and maintain multiple similar software variants, exploiting their common properties, and managing their variabilities (differences). With over two decades of existence, the community of SPL researchers and practitioners is thriving as can be attested by the extensive research output and the numerous successful industrial projects. Education has a key role to support the next generation of practitioners to build highly complex, variability-intensive systems. Yet, it is unclear how the concepts of variability and SPLs are taught, what are the possible missing gaps and difficulties faced, what are the benefits, or what is the material available. Also, it remains unclear whether scholars teach what is actually needed by industry. In this article we report on three initiatives we have conducted with scholars, educators, industry practitioners, and students to further understand the connection between SPLs and education, i.e., an online survey on teaching SPLs we performed with 35 scholars, another survey on learning SPLs we conducted with 25 students, as well as two workshops held at the International Software Product Line Conference in 2014 and 2015 with both researchers and industry practitioners participating. We build upon the two surveys and the workshops to derive recommendations for educators to continue improving the state of practice of teaching SPLs, aimed at both individual educators as well as the wider community.
This document proposes exploiting the enumeration of all configurations of a feature model as a new perspective for automated reasoning with distributed computing. It discusses (1) enumerating configurations in parallel to improve scalability, (2) pre-compiling configurations offline to speed up costly operations like counting, core, and dead features, and (3) how the approach is not always best and depends on the feature model, operation, and time requirements. Preliminary evaluations show the approach is more efficient for large models but not always, and future work is needed to determine when enumeration is best versus traditional solvers.
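The enumeration idea is easy to illustrate on a toy feature model: once all valid configurations are materialized, counting, core features (present in every product) and dead features (present in none) reduce to simple set queries over the enumeration. The feature model and constraints below are invented for the example; real models require the parallel/pre-compiled machinery the abstract describes.

```python
# Illustrative sketch of enumeration-based feature-model analysis on a
# hypothetical 4-feature model (brute force; only viable for tiny models).

from itertools import product

features = ["base", "gui", "cli", "themes"]

def valid(cfg):
    # Toy constraints: base is mandatory, gui xor cli, themes requires gui.
    return (cfg["base"]
            and (cfg["gui"] != cfg["cli"])
            and (not cfg["themes"] or cfg["gui"]))

# Enumerate the 2^4 candidate configurations and keep the valid ones.
configs = [dict(zip(features, bits))
           for bits in product([False, True], repeat=len(features))
           if valid(dict(zip(features, bits)))]

core = {f for f in features if all(c[f] for c in configs)}   # in every product
dead = {f for f in features if not any(c[f] for c in configs)}  # in no product
assert len(configs) == 3 and core == {"base"} and dead == set()
```

The trade-off noted in the abstract shows even here: the enumeration costs exponential work up front, but each subsequent analysis over `configs` is a cheap linear scan.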
Product Derivation is a key activity in Software Product Line Engineering. During this process, derivation operators modify or create core assets (e.g., model elements, source code instructions, components) by adding, removing or substituting them according to a given configuration. The result is a derived product that generally needs to conform to a programming or modeling language. Some operators lead to invalid products when applied to certain assets, some others do not; knowing this in advance can help to better use them. However, this is challenging, especially if we consider assets expressed in extensive and complex languages such as Java. In this paper, we empirically answer the following question: which product line operators, applied to which program elements, can synthesize variants of programs that are incorrect, correct or perhaps even conforming to test suites? We implement source code transformations, based on the derivation operators of the Common Variability Language. We automatically synthesize more than 370,000 program variants from a set of 8 real large Java projects (up to 85,000 lines of code), obtaining an extensive panorama of the sanity of the operations.
Paper was presented at SPLC'15
Many real-world product lines are only represented as non-hierarchical collections of distinct products, described by their configuration values. As the manual preparation of feature models is a tedious and labour-intensive activity, some techniques have been proposed to automatically generate boolean feature models from product descriptions. However, none of these techniques is capable of synthesizing feature attributes and relations among attributes, despite the huge relevance of attributes for documenting software product lines. In this paper, we introduce for the first time an algorithmic and parametrizable approach for computing a legal and appropriate hierarchy of features, including feature groups, typed feature attributes, domain values and relations among these attributes. We have performed an empirical evaluation by using both randomized configuration matrices and real-world examples. The initial results of our evaluation show that our approach can scale up to matrices containing 2,000 attributed features, and 200,000 distinct configurations in a couple of minutes.
Paper has been presented at SPLC'15 (Nashville, USA)
Variability is omnipresent in numerous kinds of artefacts (e.g., source code, product matrices) and in different shapes (e.g., conditional compilation, differences among product descriptions). For understanding, reasoning about, maintaining or evolving variability, practitioners usually need an explicit encoding of variability (i.e., a variability model). As a result, numerous techniques have been developed to reverse engineer variability (e.g., through the mining of features and constraints from source code) or for migrating a set of products as a variability system. In this talk we will first present tool-supported techniques for synthesizing variability models from constraints or product descriptions.
Practitioners can build Boolean feature models with an interactive environment for selecting a meaningful and sound hierarchy.
Attributes can also be synthesized for encoding numerical values and constraints among them.
We will present key results we obtain through experiments with the SPLOT repository and product comparison matrices coming from Wikipedia and BestBuy.
Finally we will introduce OpenCompare.org, a recent initiative for editing, reasoning about, and mining product comparison matrices.
This talk has been done at REVE'15 workshop co-located with SPLC'15 (software product line conference): http://www.isse.jku.at/reve2015/program.html
Pandoc (http://pandoc.org/) is a universal document converter. Taking as input Markdown/Wiki-like syntax, you can obtain an .odt, .docx, .tex, or even Beamer slides (like here)!
In this presentation we introduce Pandoc.
We then zoom on some interesting technical details (parameters, template language, ability to embed LaTeX into Markdown, ways to rewrite your AST with external bindings, etc.)
We believe Pandoc is a very useful piece of software that is worth studying (including for education/research purposes).
The slides have been written by Erwan Bousse (http://people.irisa.fr/Erwan.Bousse/) in Pandoc (of course!) and presented in the context of the DiverSE coffee (a weekly meeting of the DiverSE team: http://diverse.irisa.fr).
#1 The diversity of terminology shows the large spectrum of shapes DSLs can take.
#2 As syntax and development environment matter, DSLs should allow the user to choose the right shape according to their usage or task.
#3 A metamorphic DSL vision is proposed where DSLs can adapt to the most appropriate shape, including transitioning between shapes based on usage or task.
The document discusses customization and 3D printing from a software product line perspective. The researchers observed the Thingiverse community to see how its members interact and collaborate to customize and produce 3D models. They found that while variability concepts are present, there is no constraint modeling, and configuration leads to many issues due to huge complexity: 38 parameters across 8 tabs, yielding about 10^28 possible configurations. Software product line engineering techniques, such as variability modeling and implementation, could help address the complexity and cognitive effort faced by non-software developers customizing 3D models, but may not provide clear benefits for small communities in garages. Future work includes automated techniques to better analyze large datasets and help communities manage complexity.
Feature Models (FMs) are the de-facto standard for documenting, model checking, and reasoning about the configurations of a software system. This paper introduces WebFML, a comprehensive environment for synthesizing FMs from various kinds of artefacts (e.g., propositional formulas, dependency graphs, FMs, or product comparison matrices). A key feature of WebFML is its interactive support (through ranking lists, clusters, and logical heuristics) for choosing a sound and meaningful hierarchy. WebFML opens avenues for numerous practical applications (e.g., merging multiple product lines, slicing a configuration process, reverse engineering configurable systems).
tinyurl.com/WebFMLDemo
Transfer Learning Across Variants and Versions: The Case of Linux Kernel Size
1. Transfer Learning Across Variants and Versions: The Case of Linux Kernel Size
Hugo Martin, Mathieu Acher, Juliana Alves Pereira, Luc Lesoil, Jean-Marc Jézéquel, and Djamel Eddine Khelladi
Published at IEEE Transactions on Software Engineering (TSE) in 2021
Preprint: https://hal.inria.fr/hal-03358817
6. Challenge 1: you cannot build ≈ 10^6000 configurations; sampling and learning to the rescue, but still (very) costly!
[Figure: measured kernel sizes range from 7.1 Mb to 176.8 Mb depending on the configuration; variability in space]
7. Challenge 2: Linux evolves; (heterogeneous) transfer learning!
[Figure: kernel sizes from 7.1 Mb to 176.8 Mb across versions v4.13 to v5.8, 3 years later; variability in space and variability in time]
8. Instead of learning from scratch or reusing “as is”, we propose to transfer (adapt) a prediction model. A problem overlooked in the literature is that the feature spaces differ among versions, since options are added/removed. We propose TEAMS: transfer evolution-aware model shifting.
Key results (more details in the rest of the talk):
● Rapid degradation of prediction models due to evolution (up to 32% prediction errors)
● Effective transfer learning with TEAMS (reaching the same accuracy 3 years later and outperforming the state of the art)
● Possible to learn colossal configuration spaces such as the Linux kernel's, across variants and versions
12. A challenging case
● Targeted non-functional, quantitative property: binary size
○ of interest for maintainers/users of the Linux kernel (embedded systems, cloud, etc.)
○ challenging to predict (cross-cutting options, interplay with compilers/build systems, etc.)
● Dataset: version 4.13.3, x86_64 arch, measurements of 95K+ random configurations
○ paranoiac about deep variability since 2017: Docker to control the build environment and scale
○ build: 8 minutes on average
○ diversity: from 7 Mb to 1.9 Gb
● Do existing techniques work?
○ most work in performance prediction considers a relatively low number of options (<50)
○ Linux has 9K+ options for x86_64
14. TUXML: Sampling, Measuring, Learning
Docker for a reproducible environment, with the tools/packages needed and Python procedures inside.
Easy to launch a campaign: ”python3 kernel_generator.py 10” builds/measures 10 random configurations (information sent to a database).
https://github.com/TuxML/
16. Data: versions 4.13.3 and 4.15 (x86_64)
74K+ configurations for Linux 4.15
95K+ configurations for Linux 4.13.3
(and 15K hours of computation on a computing grid)
17. A challenging case
● Linear-based algorithms: high error rate (it's not additive!)
● Polynomial regression & performance-influence models: out of memory (too many interactions; not designed for 9K+ options)
● Tree-based algorithms & neural networks: low error rate
Mean Absolute Percentage Error (MAPE): the lower the better
N: percentage of the dataset used for training, for Linux 4.13.3
Mathieu Acher, Hugo Martin, Juliana Alves Pereira, Arnaud Blouin, Jean-Marc Jézéquel, Djamel Eddine Khelladi, Luc Lesoil, and Olivier Barais. Learning Very Large Configuration Spaces: What Matters for Linux Kernel Sizes (2019). https://hal.inria.fr/hal-02314830
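The MAPE metric used to compare these learners can be sketched in a few lines of Python; the kernel sizes below are illustrative values, not measurements from the dataset:

```python
def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent: the lower the better."""
    return 100.0 * sum(abs(t - p) / t for t, p in zip(y_true, y_pred)) / len(y_true)

# Illustrative kernel sizes (Mb) and model predictions (not real measurements)
sizes = [7.1, 35.0, 176.8]
preds = [7.8, 33.0, 160.0]
print(f"MAPE = {mape(sizes, preds):.1f}%")  # MAPE = 8.4%
```

Because the error is relative to each true size, a 2 Mb miss on a tiny kernel weighs far more than on a 1.9 Gb one, which suits the wide size range of the dataset.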
18. Challenge 2: Linux evolves; (heterogeneous) transfer learning!
[Figure: kernel sizes from 7.1 Mb to 176.8 Mb across versions v4.13 to v5.8, 3 years later; variability in space and variability in time]
19. Can we reuse a prediction model among versions?
No. Accuracy quickly decreases (prediction error):
○ 4.13: 6%
○ 4.15: 20%
○ 5.7: 35%
Insights about the evolution of (important) options in the TSE article!
20. Transfer learning to the rescue
“Inductive transfer refers to any algorithmic process by which structure or knowledge derived from a learning problem is used to enhance learning on a related problem.” - Jeremy West, in A theoretical foundation for inductive transfer
● 95K+ configuration measurements, 15,000 hours of computation
● Mission Impossible: Saving Private Model 4.13
● Heterogeneous transfer learning: the feature space is different!
21. Heterogeneity between versions
● Feature changes
○ Deleted features
○ New features
● Model compatible only with 4.13 features (only them, all of them)
○ Delete new features
○ Create dummy values for deleted features
■ Set all such features to 1
[Figure: a 4.15 configuration made compatible with Model 4.13]
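The alignment step above can be sketched as follows; the option names are hypothetical, and the dummy value 1 follows the slide's choice of enabling deleted features:

```python
def align_to_source(target_config, source_features):
    """Project a configuration from a newer kernel version onto the
    feature space of the source model (here: Linux 4.13):
    - options added after 4.13 are simply dropped;
    - options deleted since 4.13 get a dummy value (1, i.e. enabled)."""
    return {f: target_config.get(f, 1) for f in source_features}

# Hypothetical option names, for illustration only
source_features = ["CONFIG_A", "CONFIG_B", "CONFIG_OLD"]
cfg_4_15 = {"CONFIG_A": 0, "CONFIG_B": 1, "CONFIG_NEW": 1}  # OLD deleted, NEW added
print(align_to_source(cfg_4_15, source_features))
# {'CONFIG_A': 0, 'CONFIG_B': 1, 'CONFIG_OLD': 1}
```

After this projection, any configuration of the target version has exactly the 4.13 feature set, so the 4.13 model can be applied to it.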
22. Transfer learning to the rescue
● Heterogeneous transfer learning: the feature space is different
● TEAMS: transfer evolution-aware model shifting
○ train a source model (aka reuse of a prediction model)
○ “align” the feature space
○ learn the transfer function through model shifting
■ train (1) over the targeted version and (2) using the predicted value of the source
■ a linear model shift is not sufficient (intuitively, the relation “source_size * k = target_size” does not hold and is much more complex!)
● Bonus: TEAMS is compositional and you can combine transfers across successive versions
Insights about composing TEAMS in the TSE article!
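The shifting step can be sketched as below. Note that the paper uses a non-linear (tree-based) shifting model, as the slide stresses; for brevity this toy uses an ordinary least-squares shift of the source prediction, and all names and data are illustrative:

```python
def fit_linear_shift(source_preds, target_sizes):
    """Fit target ~= a * source + b by ordinary least squares.
    (A toy stand-in for the non-linear shifting model of TEAMS.)"""
    n = len(source_preds)
    mx = sum(source_preds) / n
    my = sum(target_sizes) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(source_preds, target_sizes))
    var = sum((x - mx) ** 2 for x in source_preds)
    a = cov / var
    b = my - a * mx
    return lambda s: a * s + b

# Toy source model and a few measured target-version configurations
source_model = lambda config: 2.0 * sum(config) + 5.0
target_configs = [[0, 1, 0], [1, 1, 0], [1, 1, 1]]
target_sizes = [10.0, 16.0, 22.0]  # illustrative target-version sizes

shift = fit_linear_shift([source_model(c) for c in target_configs], target_sizes)
teams_predict = lambda config: shift(source_model(config))
print(round(teams_predict([0, 0, 1]), 1))  # 10.0
```

The key idea survives the simplification: only a small budget of measured target configurations is needed to learn the shift, because the source model already encodes most of the option/size relationship.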
24. TEAMS: transfer evolution-aware model shifting
Budget: 1,000 configurations
● Model shifting: from 6.7% to 10.6% error rate
● Scratch: from 14.9% to 16.7% error rate
Budget: 5,000 configurations
● Model shifting: from 5.6% to 7.1% error rate
● Scratch: from 8.2% to 9.2% error rate
Budget: 10,000 configurations
● Model shifting: from 5.2% to 6.1% error rate
● Scratch: from 7.1% to 7.7% error rate
(5K configurations for training)
25. Kpredict
Python module for Python 3.8+ (https://github.com/HugoJPMartin/kpredict)
Works for many kernel versions and any x86_64 configuration
Error: ≃ 6.3%
97% of the predictions are below 20% error
H. Martin, M. Acher, J. A. Pereira, L. Lesoil, J.-M. Jézéquel and D. E. Khelladi, “Transfer learning across variants and versions: The case of Linux kernel size”, IEEE Transactions on Software Engineering (TSE), 2021
26. Conclusion
Learning across variants and versions at the scale of Linux
Direct perspectives for Linux:
● Considering more recent versions (negative transfer?)
● Deep variability: interplay with compilers (gcc/clang versions) or architectures (e.g., x86_32)
● Beyond binary size, targeting other quantitative properties (performance, energy, security, build time, etc.)
Beyond Linux, mobile/web apps, web browsers, compilers, database systems, cloud services, image processing, distributed streaming platforms, and data transfer tools are also:
● highly configurable (variability in space)
● continuously evolving, with the addition or removal of options and maintenance of the code base (variability in time)
Does evolution degrade performance models?
Is (heterogeneous) transfer learning (e.g., TEAMS) effective?
27. Published at IEEE Transactions on Software Engineering (TSE) in 2021
Preprint: https://hal.inria.fr/hal-03358817
29. Transfer learning
“Inductive transfer refers to any algorithmic process by which structure or knowledge derived from a learning problem is used to enhance learning on a related problem.” - Jeremy West, in A theoretical foundation for inductive transfer
● 100,000 configuration measurements, 15,000 hours of computation
● Mission Impossible: Saving Private Model 4.13
○ Budget: 5,000 configuration measurements (one night's worth of ISTIC computing power)
49. Incremental Model Shifting
Simple Model Shifting (Source + Shifting Model = Full Model):
Model 4.13 + Shifting Model 4.15 = Model 4.15
Model 4.13 + Shifting Model 4.20 = Model 4.20
Model 4.13 + Shifting Model 5.0 = Model 5.0
Model 4.13 + Shifting Model 5.4 = Model 5.4
Model 4.13 + Shifting Model 5.7 = Model 5.7
Model 4.13 + Shifting Model 5.8 = Model 5.8
Incremental Model Shifting (each shifted model becomes the source for the next version):
Model 4.13 + Shifting Model 4.15 = Model 4.15
Model 4.15 + Shifting Model 4.20 = Model 4.20
Model 4.20 + Shifting Model 5.0 = Model 5.0
Model 5.0 + Shifting Model 5.4 = Model 5.4
Model 5.4 + Shifting Model 5.7 = Model 5.7
Model 5.7 + Shifting Model 5.8 = Model 5.8
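Incremental shifting can be sketched as function composition; here models are plain callables on the predicted value, a simplification of the paper's setting, where shifting models also see the configuration, and all shifts are hypothetical toys:

```python
def compose_shifts(source_model, shifting_models):
    """Chain shifting models so each shifted model becomes the source
    for the next version (e.g. 4.13 -> 4.15 -> 4.20 -> ... -> 5.8)."""
    def predict(config):
        value = source_model(config)
        for shift in shifting_models:
            value = shift(value)
        return value
    return predict

# Toy example: a source model plus two successive (hypothetical) shifts
source = lambda config: float(sum(config))
shifts = [lambda v: 2.0 * v, lambda v: v + 1.0]  # 4.13->4.15, 4.15->4.20
model_4_20 = compose_shifts(source, shifts)
print(model_4_20([1, 1, 0]))  # 5.0
```

The chain structure is also why errors can accumulate: each shifting model inherits whatever error the previous one made, which matches the sensitivity reported in the results below.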
54. Results
Source Model 4.13 trained with a budget of 85,000 configurations:
Budget: 1,000 configurations
● Model shifting: from 6.7% to 10.6% error rate
● Scratch: from 14.9% to 16.7% error rate
● Incremental shifting: from 6.7% to 13.3%
Budget: 5,000 configurations
● Model shifting: from 5.6% to 7.1% error rate
● Scratch: from 8.2% to 9.2% error rate
● Incremental shifting: from 5.6% to 7.5%
Budget: 10,000 configurations
● Model shifting: from 5.2% to 6.1% error rate
● Scratch: from 7.1% to 7.7% error rate
● Incremental shifting: from 5.2% to 6.5%
Source Model 4.13 trained with a budget of 20,000 configurations:
Budget: 1,000 configurations
● Model shifting: from 8.5% to 11.6% error rate
● Scratch: from 14.9% to 16.7% error rate
● Incremental shifting: from 8.5% to 13.8%
Budget: 5,000 configurations
● Model shifting: from 6.7% to 7.9% error rate
● Scratch: from 8.2% to 9.2% error rate
● Incremental shifting: from 6.7% to 7.9%
Budget: 10,000 configurations
● Model shifting: from 6.2% to 6.7% error rate
● Scratch: from 7.1% to 7.7% error rate
● Incremental shifting: from 6.1% to 6.7%
62. Summary
● Model 4.13 is saved
○ An old model can be reused on new versions at a lower cost
○ Better than learning from scratch for years
● Incremental Shifting
○ More sensitive to the errors of previous models
○ Makes better use of a larger transfer budget