While the performance overhead of IPC in microkernel multiserver operating systems is no longer considered a blocker for their practical deployment (thanks to many optimization ideas that have been proposed and implemented over the years), the overhead undeniably still exists. The more fine-grained the architecture of the operating system is (which is desirable from the reliability, dependability, safety and security point of view), the more severe the performance penalties it suffers due to IPC overhead. A closely related issue is the overhead of handling hardware interrupts in user space device drivers. This talk discusses some specific hardware/software co-design ideas to improve the performance of microkernel multiserver operating systems.
One reason for the IPC overhead is the fact that current hardware and CPUs were never designed with microkernel multiserver operating systems in mind, but were rather tailored to traditional monolithic operating systems. This calls for out-of-the-box thinking when designing instruction set architecture (ISA) extensions and other hardware features that would support (a) efficient communication between isolated virtual address spaces using synchronous and asynchronous IPC primitives, and (b) treating object references (e.g. capability references) as first-class entities on the hardware level. A good testbed for evaluating such approaches (with the potential to be eventually adopted as an industry standard) is the still unspecified RV128 ISA (the 128-bit variant of RISC-V).
Lessons Learned from Porting HelenOS to RISC-V (Martin Děcký)
HelenOS is an open source operating system based on the microkernel multiserver design principles. One of its goals is to provide excellent target platform portability. From the time of its inception, HelenOS already supported 4 different hardware platforms and currently it supports platforms as diverse as x86, SPARCv9 and ARM. This talk presents practical experiences and lessons learned from porting HelenOS to RISC-V.
While the unprivileged (user space) instruction set architecture of RISC-V was declared stable in 2014, the privileged instruction set architecture is technically still allowed to change in the future. Likewise, many major design features and building blocks of HelenOS are already in place, but no official commitment to ABI or API stability has been made yet. This gives an interesting perspective on the pros and cons of both HelenOS and RISC-V. The talk also points to some possible research directions with respect to hardware/software co-design.
Hardware Implementation of Algorithm for Cryptanalysis (ijcisjournal)
Cryptanalysis of block ciphers involves massive computations which are independent of each other and can
be instantiated simultaneously so that the solution space is explored at a faster rate. With the advent of
low-cost Field Programmable Gate Arrays (FPGAs), building special-purpose hardware for computationally
intensive applications has now become possible. Here, the Data Encryption Standard (DES) is used as a
proof of concept. This paper presents the design of a hardware implementation of DES cryptanalysis on
FPGA using exhaustive key search. Two architectures, rolled and unrolled DES, are compared
and, based on experimental results, the rolled architecture is implemented on FPGA. The aim of this
work is to make cryptanalysis faster and better.
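The idea of splitting an exhaustive key search into independent pieces can be sketched in software. The example below is only an illustration under stated assumptions: `toy_encrypt` is a hypothetical 16-bit toy cipher standing in for DES (it is not DES and not the paper's design), and the "workers" are plain function calls that could each map to a separate hardware search unit.

```python
KEY_BITS = 16          # toy key size; DES itself uses 56-bit keys
MASK = 0xFFFF          # keep all values within 16 bits


def toy_encrypt(block: int, key: int) -> int:
    """Hypothetical 16-bit toy cipher, used only as a stand-in for DES."""
    x = block & MASK
    for r in range(4):
        x ^= key
        x = ((x << 3) | (x >> 13)) & MASK   # rotate left by 3 bits
        x = (x + 0x9E37 + r) & MASK         # add a round constant
    return x


def search_range(start: int, stop: int, plaintext: int, ciphertext: int) -> list:
    """Try every key in [start, stop); each range is independent of the others."""
    return [k for k in range(start, stop)
            if toy_encrypt(plaintext, k) == ciphertext]


def exhaustive_search(plaintext: int, ciphertext: int, workers: int = 8) -> list:
    """Partition the key space into `workers` ranges and search each one."""
    total = 1 << KEY_BITS
    chunk = total // workers
    candidates = []
    for w in range(workers):
        start = w * chunk
        stop = total if w == workers - 1 else start + chunk
        # In hardware, each call below could run on its own search unit.
        candidates.extend(search_range(start, stop, plaintext, ciphertext))
    return candidates
```

With a single known plaintext/ciphertext pair, more than one key may match, so the function returns all candidates; further known pairs would be needed to narrow them down.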
LO-PHI: Low-Observable Physical Host Instrumentation for Malware Analysis (Pietro De Nicolao)
Presentation of the paper "LO-PHI: Low-Observable Physical Host Instrumentation for Malware Analysis" for Prof. Stefano Zanero's course Advanced Topics in Computer Security.
Source and further information: https://github.com/pietrodn/lo-phi
In this deck, Torsten Hoefler from ETH Zurich presents: Data-Centric Parallel Programming.
"The ubiquity of accelerators in high-performance computing has driven programming complexity beyond the skill-set of the average domain scientist. To maintain performance portability in the future, it is imperative to decouple architecture-specific programming paradigms from the underlying scientific computations. We present the Stateful DataFlow multiGraph (SDFG), a data-centric intermediate representation that enables separating code definition from its optimization. We show how to tune several applications in this model and IR. Furthermore, we show a global, datacentric view of a state-of-the-art quantum transport simulator to optimize its execution on supercomputers. The approach yields coarse and fine-grained data-movement characteristics, which are used for performance and communication modeling, communication avoidance, and data-layout transformations. The transformations are tuned for the Piz Daint and Summit supercomputers, where each platform requires different caching and fusion strategies to perform optimally. We show that SDFGs deliver competitive performance, allowing domain scientists to develop applications naturally and port them to approach peak hardware performance without modifying the original scientific code."
Watch the video: https://wp.me/p3RLHQ-kup
Learn more: http://htor.inf.ethz.ch
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
IJRET: International Journal of Research in Engineering and Technology is an international peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academicians, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
An Efficient PDP Scheme for Distributed Cloud Storage (IJMER)
International Journal of Modern Engineering Research (IJMER) is a peer-reviewed online journal. It serves as an international archival forum of scholarly research related to engineering and science education.
International Journal of Modern Engineering Research (IJMER) covers all the fields of engineering and science: Electrical Engineering, Mechanical Engineering, Civil Engineering, Chemical Engineering, Computer Engineering, Agricultural Engineering, Aerospace Engineering, Thermodynamics, Structural Engineering, Control Engineering, Robotics, Mechatronics, Fluid Mechanics, Nanotechnology, Simulators, Web-based Learning, Remote Laboratories, Engineering Design Methods, Education Research, Students' Satisfaction and Motivation, Global Projects, Assessment, and many more.
Edge computing brings cloud services closer to the edge of the network, where data originates, and dramatically reduces the network latency of the cloud. It is a bridge linking clouds and users, laying the foundation for novel interconnected applications. However, edge computing still faces many challenges, such as remote configuration, the lack of a well-defined native applications model, and limited node capacity. It lacks geo-organization and a clear separation of concerns. As such, edge computing is hard to offer as a service for future real-time user-centric applications. This paper presents the dynamic organization of geo-distributed edge nodes into micro data-centers to cover any arbitrary area and expand capacity, availability, and reliability. A cloud organization is used as an influence, with adaptations for a different environment, and a model for edge applications utilizing these adaptations is presented. It is argued that the presented model can be integrated into existing solutions or used as a base for the development of future systems. Furthermore, a clear separation of concerns is given for the proposed model. With the separation of concerns setup, an edge-native applications model, and a unified node organization, we are moving towards the idea of edge computing as a service, like any other utility in cloud computing.
DIVISION AND REPLICATION OF DATA IN GRID FOR OPTIMAL PERFORMANCE AND SECURITY (ijgca)
Using Grid Storage, users can remotely store their data and enjoy on-demand, high-quality applications and services from a shared network of configurable computing resources, without the burden of local data storage and maintenance. Based on dynamic secrets, this project proposes an encryption scheme for SG wireless communication, named dynamic secret-based encryption (DSE). The basic idea of dynamic secrets is to generate a series of secrets from unavoidable transmission errors and other random factors in wireless communications. In DSE, the previous packets are coded as binary values 0 and 1 according to whether they are retransmitted due to channel error. This 0/1 sequence is called the retransmission sequence (RS), which is applied to generate the dynamic secret (DS). The dynamic encryption key (DEK) is updated by XORing the previous DEK with the current DS.
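The key update described above can be sketched as follows. This is a minimal illustration, not the project's implementation: the abstract does not say how the DS is derived from the RS, so hashing the RS with SHA-256 is an assumption made here, as are the function names and the 32-byte key length.

```python
import hashlib

KEY_BYTES = 32  # assumed key length, chosen to match the SHA-256 digest size


def retransmission_sequence(retransmitted_flags) -> list:
    """Code each previous packet as 1 if it was retransmitted, else 0."""
    return [1 if flag else 0 for flag in retransmitted_flags]


def dynamic_secret(rs: list) -> bytes:
    """Derive the DS from the RS.

    ASSUMPTION: the abstract does not specify this step; SHA-256 over the
    0/1 string is used here only to obtain a fixed-length secret.
    """
    bits = "".join(str(b) for b in rs).encode("ascii")
    return hashlib.sha256(bits).digest()


def update_dek(prev_dek: bytes, ds: bytes) -> bytes:
    """New DEK = previous DEK XOR current DS, byte by byte."""
    if len(prev_dek) != len(ds):
        raise ValueError("DEK and DS must have the same length")
    return bytes(a ^ b for a, b in zip(prev_dek, ds))
```

Both communicating parties that observe the same retransmissions would compute the same RS, hence the same DS and the same updated DEK, without sending the key itself.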
A COMPARISON BETWEEN PARALLEL AND SEGMENTATION METHODS USED FOR IMAGE ENCRYPT... (ijcsit)
Preserving confidentiality, integrity and authenticity of images is becoming very important. There are so many different encryption techniques to protect images from unauthorized access. Matrix multiplication
can be successfully used to encrypt-decrypt digital images. In this paper we made a comparison study between two image encryption techniques based on matrix multiplication namely, segmentation and parallel methods.
NEW ALGORITHM FOR WIRELESS NETWORK COMMUNICATION SECURITY (ijcisjournal)
This paper evaluates the security of wireless communication networks based on fuzzy logic in MATLAB. A new hybrid algorithm is proposed and evaluated. We highlight the valuable assets in designing a wireless network communication system based on the network simulator (NS2), which is crucial to protect the security of the systems. Block cipher algorithms are evaluated using fuzzy logic and a hybrid
algorithm is proposed. Both algorithms are evaluated in terms of the security level. Logic (AND) is used in the modelling rules and the Mamdani style is used for the evaluations.
PERFORMANCE EVALUATION OF PARALLEL INTERNATIONAL DATA ENCRYPTION ALGORITHM ON... (IJNSA Journal)
Distributed security is an evolving sub-domain of information and network security. Security applications play an important role in data exchange, where different volumes of data must be transferred from one site to another safely and at high speed. In this paper, the parallel International Data Encryption Algorithm (IDEA), which is one of these security applications, is implemented and evaluated in terms of running time, speedup, and efficiency. The parallel IDEA has been implemented using the Message Passing Interface (MPI) library, and the experiments were conducted on the IMAN1 supercomputer, where a set of simulation runs was carried out on different data sizes to determine the best number of processors for each data size and to visualize how that number changes as the data size increases. The experimental results show good performance, with reduced running time and increased speedup of the encryption and decryption processes for parallel IDEA when the number of processors ranges from 2 to 8, with achieved efficiency of 97% to 83% respectively.
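To make the reported efficiency figures concrete, the sketch below uses the usual textbook definitions (speedup = serial time / parallel time, efficiency = speedup / number of processors). It assumes the abstract uses these definitions; the function names are illustrative and no timings from the paper are reproduced.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup S = T_1 / T_p."""
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, processors: int) -> float:
    """Efficiency E = S / p, i.e. the fraction of ideal linear speedup."""
    return speedup(t_serial, t_parallel) / processors


def implied_speedup(eff: float, processors: int) -> float:
    """Invert the definition: S = E * p."""
    return eff * processors


# Under these definitions, the efficiencies quoted in the abstract
# correspond to the following speedups.
for p, e in [(2, 0.97), (8, 0.83)]:
    print(f"p={p}: efficiency {e:.0%} implies speedup of about {implied_speedup(e, p):.2f}")
```

Under these assumptions, 97% efficiency on 2 processors corresponds to a speedup of about 1.94, and 83% on 8 processors to about 6.64.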
Final Year Project Synopsis: Post Quantum Encryption using Neural Networks (JPC Hanson)
A synopsis of my final year project at Brunel University exploring the possibilities of using Neural Networks as a method of encryption immune to Shor's algorithm, i.e. a secure, 'post quantum' alternative to the NTRU algorithms.
A New Direction for Computer Architecture Research (dbpublications)
In this paper we suggest a different computing environment as a worthy new direction for computer architecture research: personal mobile computing, where portable devices are used for visual computing and personal communications tasks. Such a device supports in an integrated fashion all the functions provided today by a portable computer, a cellular phone, a digital camera and a video game. The requirements placed on the processor in this environment are energy efficiency, high performance for multimedia and DSP functions, and area-efficient, scalable designs. We examine the architectures that were recently proposed for billion-transistor microprocessors. While they are very promising for the stationary desktop and server workloads, we discover that most of them are unable to meet the challenges of the new environment and provide the necessary enhancements for multimedia applications running on portable devices.
Course: "Introductory course to HLS FPGA programming" (Mirko Mariotti)
Slides of the course: "Introductory course to HLS FPGA programming", Nov 27 – 30, 2023. ICSC National research center on HPC, big data and Quantum Computing
In this deck from the HPC User Forum in Austin, Yutaka Ishikawa from Riken AICS presents: Japan's post K Computer.
Watch the video presentation: http://wp.me/p3RLHQ-fJ6
Learn more: http://hpcuserforum.com
Performance of State-of-the-Art Cryptography on ARM-based Microprocessors (Hannes Tschofenig)
Position paper for the NIST Lightweight Cryptography Workshop, 20th and 21st July 2015, Gaithersburg, US.
The link to the workshop is available at: http://www.nist.gov/itl/csd/ct/lwc_workshop2015.cfm
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Architecting Solutions for the Manycore Future (Talbott Crowell)
This talk will steer solution architects toward thinking about parallelism when designing applications and solutions, specifically Threads vs. Tasks in TPL, LINQ vs. PLINQ, and object-oriented versus functional programming techniques. This talk will also compare programming languages, how languages differ when dealing with manycore programming, and the different advantages of these languages. Demonstrations include C#, VB, and F# features for functional programming, LINQ and TPL. A demonstration of the Concurrency Visualizer in Visual Studio 2010 will also be included.
Design of Tele command SOC-IP by AES Cryptographic Method Using VHDL (dbpublications)
The goal of this project is to implement the AES encryption system using Verilog. To do this, several separate sections of the algorithm will be coded to work together towards the end goal of performing the correct encryption routines. A telecommand is a command sent to control a remote system or systems, i.e. systems not directly connected (e.g. via wires) to the place from which the telecommand is sent. The word telecommand is derived from tele = remote (Greek) and command = to entrust/order (Latin). Systems that need remote measurement and reporting of information of interest to the system designer or operator require the counterpart of telecommand, telemetry. For a telecommand (TC) to be effective, it must be compiled into a pre-arranged format (which may follow a standard structure) and modulated onto a carrier wave, which is then transmitted with adequate power to the remote system. The remote system then demodulates the digital signal from the carrier, decodes the telecommand, and executes it.
ZCloud Consensus on Hardware for Distributed Systems (Gokhan Boranalp)
3rd Workshop on Dependability,
May 8, Monday 2017, İYTE,
https://goo.gl/fSVnZy
http://dcs.iyte.edu.tr/ws/ppt/10/presentation.pdf
In distributed applications where the number of members in the cluster increases, the
separation of the consensus related operations at the hardware level is essential for the
following reasons:
1. At the operating system level, messages broadcast on the protocol stack cause latency.
2. It is necessary to increase the number of completed transactions in the communication of
distributed system components and on the network unit (throughput).
3. For devices with limited storage and CPU computing facilities that use embedded operating
systems such as IOT devices, it is also necessary to reduce the processing burden due to
"consensus" operations.
4. A common consensus communication model is needed for different applications that need
to work together in (BFT) distributed systems.
I understand that physics and hardware emmaded on the use of finete .pdf (anil0878)
I understand that physics and hardware are embedded in the use of finite element methods to predict
fluid flow over airplane wings, and that progress is likely to continue. However, in recent years, this
progress has been achieved through greatly increased hardware complexity with the rise of
multicore and manycore processors, and this is affecting the ability of application developers to
achieve the full potential of these systems. Currently, performance is measured on a dense
matrix–matrix multiplication test, which has questionable relevance to real applications. The
incredible advances in processor technology and all of the accompanying aspects of computer
system design, such as the memory subsystem and networking
In embedded systems, hardware and software are combined so that they function together
within the system. Applications then have to be developed to
achieve the full potential of these systems on advanced processor technology.
Hardware
(1) Memory
Advances in memory technology have struggled to keep pace with the phenomenal advances in
processors. This difficulty in improving the main memory bandwidth led to the development of a
cache hierarchy with data being held in different cache levels within the processor. The idea is
that instead of fetching the required data multiple times from the main memory, it is instead
brought into the cache once and re-used multiple times. Intel allocates about half of the chip to
cache, with the largest LLC (last-level cache) being 30MB in size. IBM's new Power8 CPU has
an even larger L3 cache of up to 96MB [4]. By contrast, the largest L2 cache in NVIDIA's
GPUs is only 1.5MB. These different hardware design choices are motivated by careful
consideration of the range of applications being run by typical users.
One complication which has become more common and more important in the past few years is
non-uniform memory access. Ten years ago, most shared-memory multiprocessors would have
several CPUs sharing a memory bus to access a single main memory. A final comment on the
memory subsystem concerns the energy cost of moving data compared to performing a single
floating point computation.
(2) Processors
CPUs had a single processing core, and the increase in performance came partly from an increase
in the number of computational pipelines, but mainly through an increase in clock frequency.
Unfortunately, the power consumption is approximately proportional to the cube of the
frequency, and this led to CPUs with a power consumption of up to 250W. CPUs address memory
bandwidth limitations by devoting half or more of the chip to LLC, so that small applications can
be held entirely within the cache. They address the 200-cycle latency issue by using very
complex cores which are capable of out-of-order execution. By contrast, GPUs adopt a very
different design philosophy because of the different needs of the graphical applications they
target. A GPU usually has a number of functional u.
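The statement in the Processors section that power consumption is approximately proportional to the cube of the clock frequency can be turned into a quick calculation. The sketch below assumes an exact cubic law, which the text only gives as an approximation; the function name is illustrative.

```python
def relative_power(frequency_ratio: float) -> float:
    """Power relative to a baseline, assuming P is proportional to f**3.

    frequency_ratio is new_frequency / baseline_frequency.
    """
    return frequency_ratio ** 3


# Halving the clock frequency cuts power to one eighth under this model,
# while doubling it multiplies power by eight.
print(relative_power(0.5))  # 0.125
print(relative_power(2.0))  # 8.0
```

This is why adding more cores at a moderate clock frequency became more attractive than pushing the frequency of a single core ever higher.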
ACCELERATED DEEP LEARNING INFERENCE FROM CONSTRAINED EMBEDDED DEVICES (IAEME Publication)
Hardware looping is a feature of some processor instruction sets whose hardware can repeat the body of a loop automatically, rather than requiring software instructions which take up cycles (and therefore time) to do so. Loop unrolling is a loop transformation technique that attempts to improve a program's execution speed at the expense of its binary size, an approach known as a space–time tradeoff. A convolutional neural network is implemented with simple loops, with hardware looping, with loop unrolling, and with both hardware looping and loop unrolling, and a comparison is made to evaluate the effectiveness of hardware looping and loop unrolling. Hardware loops alone contribute to a decrease in cycle count, while the combination of hardware loops and dot product instructions decreases the clock cycle count further. The CNN is simulated on Xilinx Vivado 2021.1 running on a Zynq-7000 FPGA.
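Loop unrolling, as described above, can be illustrated on a dot product, the core operation of a convolution. This is a sketch of the transformation only, not the paper's code: the unroll factor of 4 and the function names are assumptions, and in Python the example shows the structure of the rewrite rather than a realistic speedup, which would come from compiled code or hardware.

```python
def dot_rolled(a, b):
    """Plain loop: one multiply-accumulate and one loop check per element."""
    total = 0
    for i in range(len(a)):
        total += a[i] * b[i]
    return total


def dot_unrolled4(a, b):
    """Same computation unrolled by 4: fewer loop checks, larger loop body."""
    n = len(a)
    limit = n - (n % 4)      # largest multiple of 4 not exceeding n
    total = 0
    i = 0
    while i < limit:
        total += (a[i] * b[i] + a[i + 1] * b[i + 1]
                  + a[i + 2] * b[i + 2] + a[i + 3] * b[i + 3])
        i += 4
    while i < n:             # remainder loop for the last n % 4 elements
        total += a[i] * b[i]
        i += 1
    return total
```

The unrolled version performs roughly a quarter of the loop-condition checks at the cost of a longer body, which is the space–time tradeoff the abstract refers to.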
Similar to Hardware/Software Co-Design for Efficient Microkernel Execution (20)
RISC-V is the most recent attempt (originally from UC Berkeley) to design a brand new instruction set architecture based on the reduced instruction set computing (RISC) principles. One of its goals is to be completely open and free (both as in free beer and as in free speech) for designers, users and manufacturers. HelenOS is an open source operating system designed and implemented from scratch based on the microkernel multiserver design principles. One of its goals is to provide excellent target platform portability and it currently supports 8 different hardware platforms.
Both projects are still in the process of maturing: While the unprivileged (user space) instruction set architecture of RISC-V was declared stable in 2014, the privileged instruction set architecture is still a draft and may change in the future. Likewise, many major design features and building blocks of HelenOS are already in place, but no official commitment to ABI or API stability has been made yet.
This talk introduces both projects, presents the initial lessons learned from porting HelenOS to RISC-V and evaluates the portability of HelenOS on yet another porting effort.
What Could Microkernels Learn from Monolithic Kernels (and Vice Versa) Martin Děcký
Some developers of both microkernel and monolithic operating systems view the design of their system as absolutely superior to the other design. This black-and-white thinking and "holy war" attitude, while understandable to a certain degree, makes it hard to acknowledge that one size does not necessarily fit all. Rather than striving for the unreachable goal of creating the best operating system design for all possible use cases, it is vital to understand and reflect the trade-offs of the use cases at hand. This talk focuses on a few features and properties of the current monolithic operating systems that could be an inspiration for the current microkernel operating systems and vice versa. The talk should also initiate a discussion about some "non-goals" of microkernel operating systems that are nevertheless sometimes presented as goals of microkernel operating systems, to the detriment of their own cause.
Martin Děcký, FOSDEM, February 3rd 2019
Hardware/Software Co-Design for Efficient Microkernel Execution
Who Am I
Passionate programmer and operating systems enthusiast
With a specific inclination towards multiserver microkernels
HelenOS developer since 2004
Research Scientist from 2006 to 2018
Charles University (Prague), Distributed Systems Research Group
Senior Research Engineer since 2017
Huawei Technologies (Munich), German Research Center, Central
Software Institute, OS Kernel Lab
Microkernel Multiserver Systems are better than Monolithic Systems
Monolithic OS Design is Flawed
Biggs S., Lee D., Heiser G.: The Jury Is In: Monolithic OS Design Is
Flawed: Microkernel-based Designs Improve Security, ACM 9th
Asia-Pacific Workshop on Systems (APSys), 2018
“While intuitive, the benefits of the small TCB have not been quantified to
date. We address this by a study of critical Linux CVEs, where we examine
whether they would be prevented or mitigated by a microkernel-based
design. We find that almost all exploits are at least mitigated to less than
critical severity, and 40 % completely eliminated by an OS design based
on a verified microkernel, such as seL4.”
Problem Statement
Microkernel design ideas go back as far as 1969
RC 4000 Multiprogramming System nucleus (Per Brinch Hansen)
Isolation of unprivileged processes, inter-process communication,
hierarchical control
Even after 50 years they are not fully accepted as mainstream
Hardware and software used to be designed independently
Designing CPUs used to be an extremely complicated and costly process
Operating systems used to be written after the CPUs were designed
Hardware designs used to be rather conservative
Problem Statement (2)
Mainstream ISAs used to be designed in a rather conservative way
Can you name some really revolutionary ISA features since IBM
System/370 Advanced Function?
Requirements on the new ISAs usually follow the needs of the
mainstream operating systems running on the past ISAs
No wonder microkernels suffer performance penalties compared to
monolithic systems
The more fine-grained the architecture, the more penalties it suffers
Let us design the hardware with microkernels in mind!
The Vicious Cycle
CPUs do not support microkernels properly
Microkernels suffer performance penalties
Microkernels are not in the mainstream
No requirements on CPUs from microkernels
Any Ideas?
Communication between Address Spaces
Control and data flow between subsystems
Monolithic kernel
Function calls
Passing arguments in registers and on the stack
Passing direct pointers to memory structures
Multiserver microkernel
IPC via microkernel syscalls
Passing arguments in a subset of registers
Privilege level switch, address space switch
Scheduling (in case of asynchronous IPC)
Data copying or memory sharing with page granularity
Communication between Address Spaces (2)
Is the kernel round-trip of the IPC necessary?
Suggestion for synchronous IPC: Extended Jump/Call and Return instructions
that also switch the address space
Communicating parties identified by a “call gate” (capability) containing the target
address space and the PC of the IPC handler (implicit for return)
Call gates stored in a TLB-like hardware cache (CLB)
CLB populated by the microkernel, similarly to the TLB in a TLB-only memory
management architecture
Suggestion for asynchronous IPC: Using CPU cache lines as the buffers for the
messages
Async Jump/Call, Async Return and Async Receive instructions
Using the CPU cache like an extended register stack engine
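The synchronous IPC suggestion can be illustrated with a small software model. This is only a sketch under assumptions: the names `CallGate`, `CLB`, `Cpu`, `xcall` and `xret`, the LRU replacement policy and the refill callback are hypothetical and are not taken from the talk or from any existing ISA.

```python
from collections import OrderedDict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CallGate:
    """Hypothetical call gate: target address space plus IPC handler PC."""
    address_space: int
    handler_pc: int


class CLB:
    """Toy TLB-like cache of call gates, refilled by the 'microkernel' on a miss."""

    def __init__(self, capacity, refill):
        self.capacity = capacity
        self.refill = refill          # software miss handler (the microkernel)
        self.entries = OrderedDict()  # gate id -> CallGate, kept in LRU order
        self.misses = 0

    def lookup(self, gate_id):
        if gate_id in self.entries:
            self.entries.move_to_end(gate_id)   # mark as most recently used
            return self.entries[gate_id]
        self.misses += 1
        gate = self.refill(gate_id)             # kernel involved only on a miss
        self.entries[gate_id] = gate
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)    # evict least recently used
        return gate


@dataclass
class Cpu:
    clb: CLB
    address_space: int = 0
    pc: int = 0
    return_stack: list = field(default_factory=list)

    def xcall(self, gate_id):
        """Extended call: switch the address space and jump to the handler."""
        gate = self.clb.lookup(gate_id)
        self.return_stack.append((self.address_space, self.pc + 1))
        self.address_space, self.pc = gate.address_space, gate.handler_pc

    def xret(self):
        """Extended return: the return target is implicit."""
        self.address_space, self.pc = self.return_stack.pop()


gate_table = {7: CallGate(address_space=2, handler_pc=0x4000)}
cpu = Cpu(clb=CLB(capacity=4, refill=gate_table.__getitem__),
          address_space=1, pc=0x1000)
cpu.xcall(7)                       # first call misses in the CLB
in_server = (cpu.address_space, cpu.pc)
cpu.xret()
cpu.xcall(7)                       # second call hits in the CLB
cpu.xret()
```

In this model the refill handler runs only on the first call, which mirrors the idea that the kernel round-trip disappears from the common path.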
Communication between Address Spaces (3)
Bulk data
Observation: Memory sharing is actually quite efficient for large amounts
of data (multiple pages)
Overhead is caused primarily by creating and tearing down the shared
pages
Data needs to be page-aligned
Sub-page granularity and dynamic data structures
Suggestion: Using CPU cache lines as shared buffers
Much finer granularity than pages (typically 64 to 128 bytes)
A separate virtual-to-cache mapping mechanism before the standard
virtual-to-physical mapping
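The granularity argument can be made concrete with a back-of-the-envelope calculation. The 4 KiB page size, the 64-byte cache line, the 200-byte message and the helper `shared_bytes` are assumptions chosen for illustration only.

```python
def shared_bytes(payload, granule):
    """Bytes that must be shared to cover `payload` bytes, in whole granules."""
    granules = -(-payload // granule)  # ceiling division
    return granules * granule


PAGE = 4096        # assumed page size in bytes
CACHE_LINE = 64    # assumed cache line size (the slide gives 64 to 128 bytes)

payload = 200      # a small message, in bytes
page_cost = shared_bytes(payload, PAGE)
line_cost = shared_bytes(payload, CACHE_LINE)
print(page_cost, line_cost)
```

Under these assumptions a 200-byte message occupies one whole page when shared with page granularity, but only four cache lines when shared with cache-line granularity.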
Fast Context Switching
Current microsecond-scale latency hiding mechanisms
Hardware multi-threading
Effective
Does not scale beyond a few threads
Operating system context switching
Scales for any thread count
Too slow (order of 10 µs)
Goal: Finding a sweet spot between the two mechanisms
Fast Context Switching (2)
Suggestion: Hardware cache for contexts
Again, similar mechanism to TLB-only memory management
Dedicated instructions for context store, context restore, context switch, context
save, context load
Context data could be potentially ABI-optimized
Autonomous mechanism for event-triggered context switch (e.g. external
interrupt)
Efficient hardware mechanism for latency hiding
The equivalent of fine/coarse-grained simultaneous multithreading
The software scheduler is in charge of setting the scheduler policy
The CPU is in charge of scheduling the contexts based on ALU, cache and other resource
availability
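The split of responsibilities between the software scheduler and the CPU can be sketched as a toy model. Everything here is an assumption made for illustration: the `Context` and `ContextCache` names, the single `priority` field standing in for the scheduling policy, and the `ready` flag standing in for resource availability.

```python
from dataclasses import dataclass


@dataclass
class Context:
    ctx_id: int
    priority: int        # policy input, written by the software scheduler
    ready: bool = True   # cleared by the 'hardware' while the context is stalled


class ContextCache:
    """Toy hardware context cache with a fixed number of slots."""

    def __init__(self, slots):
        self.slots = slots
        self.contexts = {}

    def load(self, ctx):
        """Software side: place a context into a free hardware slot."""
        if len(self.contexts) >= self.slots:
            raise RuntimeError("no free slot; software must save a context first")
        self.contexts[ctx.ctx_id] = ctx

    def save(self, ctx_id):
        """Software side: evict a context back to memory."""
        return self.contexts.pop(ctx_id)

    def stall(self, ctx_id):
        self.contexts[ctx_id].ready = False   # e.g. waiting on a cache miss

    def wake(self, ctx_id):
        self.contexts[ctx_id].ready = True    # e.g. data arrived, interrupt fired

    def pick(self):
        """Hardware side: choose the ready context with the highest priority."""
        ready = [c for c in self.contexts.values() if c.ready]
        return max(ready, key=lambda c: c.priority).ctx_id if ready else None


cache = ContextCache(slots=2)
cache.load(Context(ctx_id=1, priority=10))
cache.load(Context(ctx_id=2, priority=5))
first = cache.pick()       # context 1 has the higher priority
cache.stall(1)
second = cache.pick()      # context 1 is stalled, so context 2 runs
cache.wake(1)
third = cache.pick()       # context 1 is ready again
```

The software only sets priorities and decides which contexts occupy the slots; the choice among the loaded contexts is made without any software involvement.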
User Space Interrupt Processing
Extension of the fast context switching mechanism
Efficient delivery of interrupt events to user space device drivers
Without the routine microkernel intervention
An interrupt could be directly handled by a preconfigured hardware context in
user space
A clear path towards moving even the timer interrupt handler and the scheduler from
kernel space to user space
Going back to interrupt-driven handling of peripherals with extremely low latency
requirements (instead of polling)
The usual pain point: Level-triggered interrupts
Some coordination with the platform interrupt controller is probably needed
to automatically mask the interrupt source
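Why level-triggered interrupts are a pain point can be shown with a toy simulation. The `Device` and `InterruptController` classes, the `auto_mask` flag and the cycle counts are hypothetical; the sketch only models the general behaviour of a level-triggered line, not any specific interrupt controller.

```python
class Device:
    def __init__(self):
        self.pending = False      # level-triggered: the line is high while pending

    def line_asserted(self):
        return self.pending


class InterruptController:
    """Toy controller that can mask a source automatically when delivering it."""

    def __init__(self, device, auto_mask):
        self.device = device
        self.auto_mask = auto_mask
        self.masked = False
        self.deliveries = 0

    def poll(self):
        """One 'cycle': deliver if the line is high and the source is unmasked."""
        if self.device.line_asserted() and not self.masked:
            self.deliveries += 1
            if self.auto_mask:
                self.masked = True
            return True
        return False

    def unmask(self):
        self.masked = False


def run(auto_mask, cycles_before_service=5):
    dev = Device()
    pic = InterruptController(dev, auto_mask)
    dev.pending = True
    for _ in range(cycles_before_service):   # the driver context has not run yet
        pic.poll()
    dev.pending = False                       # user space driver services the device
    pic.unmask()
    pic.poll()                                # the line is low now, nothing delivered
    return pic.deliveries


with_mask = run(auto_mask=True)
without_mask = run(auto_mask=False)
```

Without automatic masking the same event is delivered on every cycle until the user space driver gets to run; with automatic masking it is delivered once.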
Capabilities as First-Class Entities
Capabilities as unforgeable object identifiers
But eventually each access to an object needs to be bound-checked and
translated into the (flat) virtual address space
Suggestion: Embedding the capability reference in pointers
RV128 (128-bit variant of RISC-V) would provide 64 bits for the capability
reference and 64 bits for object offset
128-bit flat pointers are probably useless anyway
Besides the (somewhat narrow) use in the microkernel, this could be useful
for other purposes
Simplifying the implementation of managed languages’ VMs
Working with multiple virtual address spaces at once
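The proposed pointer layout can be sketched with plain integer arithmetic. The placement of the capability reference in the upper 64 bits, the `objects` table mapping a capability reference to a base and a size, and all function names are assumptions for illustration; RV128 itself is unspecified.

```python
MASK64 = (1 << 64) - 1


def make_pointer(cap_ref, offset):
    """Pack a 64-bit capability reference and a 64-bit offset into 128 bits."""
    if not (0 <= cap_ref <= MASK64 and 0 <= offset <= MASK64):
        raise ValueError("both fields must fit in 64 bits")
    return (cap_ref << 64) | offset


def split_pointer(ptr):
    """Return (capability reference, object offset)."""
    return ptr >> 64, ptr & MASK64


def translate(ptr, objects):
    """Bounds-check the offset against the object and return a flat address."""
    cap_ref, offset = split_pointer(ptr)
    base, size = objects[cap_ref]      # KeyError models an invalid capability
    if offset >= size:
        raise IndexError("offset outside the object")
    return base + offset


objects = {0x2A: (0x8000_0000, 4096)}   # capability reference -> (base, size)
ptr = make_pointer(0x2A, 0x10)
addr = translate(ptr, objects)
```

Here the bounds check and the translation into the flat address space happen in one step on every access, which is the work the slide suggests moving into hardware.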
Prior Art
Nordström S., Lindh L., Johansson L., Skoglund T.: Application Specific
Real-Time Microkernel in Hardware, 14th IEEE-NPSS Real Time
Conference, 2005
Offloading basic microkernel operations (e.g. thread creation, context
switching) to hardware shown to improve performance by 15 % on
average and up to 73 %
This was a coarse-grained approach
Hardware message passing in Intel SCC and Tilera TILE-G64/TILE-Pro64
Asynchronous message passing with tight software integration
Prior Art (2)
Hajj I. E., Merritt A., Zellweger G., Milojicic D., Achermann R., Faraboschi
P., Hwu W., Roscoe T., Schwan K.: SpaceJMP: Programming with Multiple
Virtual Address Spaces, 21st ACM ASPLOS, 2016
Practical programming model for using multiple virtual address spaces on
commodity hardware (evaluated on DragonFly BSD and Barrelfish)
Useful for data-centric applications for sharing large amounts of memory between
processes
Intel IA-32 Task State Segment (TSS)
Hardware-based context switching
Historically, it has been used by Linux
The primary reason for removal was not performance, but portability
Prior Art (3)
Intel VT-x VM Functions (VMFUNC)
Efficient cross-VM function calls
Switching the EPT and passing register arguments
Current implementation limited to 512 entry points
Practically usable even for very fine-grained virtualization with the
granularity of individual functions
Liu Y., Zhou T., Chen K., Chen H., Xia Y.: Thwarting Memory Disclosure with
Efficient Hypervisor-enforced Intra-domain Isolation, 22nd ACM SIGSAC
Conference on Computer and Communications Security, 2015
– “The cost of a VMFUNC is similar with a syscall”
– “… hypervisor-level protection at the cost of system calls”
SkyBridge paper to appear at EuroSys 2019
Prior Art (4)
Woodruff J., Watson R. N. M., Chisnall D., Moore S., Anderson J., Davis B., Laurie
B., Neumann P. G., Norton R., Roe M.: The CHERI capability model: Revisiting RISC
in an age of risk, 41st ACM Annual International Symposium on Computer
Architecture, 2014
Hardware-based capability model for byte-granularity memory protection
Extension of the 64-bit MIPS ISA
Evaluated on an extended MIPS R4000 FPGA soft-core
32 capability registers (256 bits)
Limitation: Inflexible design mostly due to the tight backward compatibility with a 64-bit
ISA
Intel MPX
Several design and implementation issues, deemed not production-ready
Summary
Traditionally, hardware has not been designed to accommodate the
requirements of microkernel multiserver operating systems
Microkernels thus suffer performance penalties
This prevented them from replacing monolithic operating systems and closed
the vicious cycle
Hardware design is hopefully becoming more accessible and democratic
E.g. RISC-V
Co-designing the hardware and software might help us gain the benefits
of the microkernel multiserver design with no performance penalties
However, it requires some out-of-the-box thinking
Acknowledgements
OS Kernel Lab at Huawei Technologies
Javier Picorel
Haibo Chen
Huawei Dresden R&D Lab
Focusing on microkernel research, design and development
Basic research
Applied research
Prototype development
Collaboration with academia and other technology companies
Looking for senior operating system researchers, designers, developers and
experts
Previous microkernel experience is a big plus
“A startup within a large company”
Shaping the future product portfolio of Huawei
Including hardware/software co-design via HiSilicon
Q&A