This document discusses various memory consistency models for distributed shared memory systems. It begins by defining memory coherence and consistency models, which determine when data updates are propagated and acceptable levels of inconsistency. Strict consistency, also called linearizability or atomic consistency, requires the strongest guarantees where any read returns the value from the most recent write. Sequential consistency is a weaker but commonly used model where the result of an execution is equivalent to some sequential ordering of operations. Causal consistency and PRAM/processor consistency are even weaker, requiring certain reads to see causally related or local writes in order but allowing other writes to be seen in different orders. The document provides examples and discusses implementations of these memory consistency models.
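To make the difference between these models concrete, the following sketch (not from the document) enumerates every legal interleaving of the classic store-buffering litmus test and confirms that the outcome r1 = r2 = 0 can never occur under sequential consistency; relaxed models that buffer stores do permit it.

```python
# Thread 1: x = 1; r1 = y        Thread 2: y = 1; r2 = x
T1 = [("w", "x", None), ("r", "y", "r1")]
T2 = [("w", "y", None), ("r", "x", "r2")]

def interleavings(a, b):
    """Yield every interleaving of a and b that preserves per-thread order."""
    if not a:
        yield list(b); return
    if not b:
        yield list(a); return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest

outcomes = set()
for schedule in interleavings(T1, T2):
    mem, regs = {"x": 0, "y": 0}, {}
    for kind, addr, reg in schedule:
        if kind == "w":
            mem[addr] = 1          # every write goes straight to memory
        else:
            regs[reg] = mem[addr]  # every read sees the latest write
    outcomes.add((regs["r1"], regs["r2"]))

print(sorted(outcomes))  # (0, 0) never appears under sequential consistency
```

Because each read sees the most recent write in a single global order, only (0,1), (1,0), and (1,1) are reachable; a store buffer would additionally allow (0,0).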
This document discusses several hardware memory models including Total Store Order (TSO), Processor Consistency (PC), and Weak Ordering. TSO allows a load to bypass an earlier store to a different address, but it keeps loads ordered with respect to other loads and stores ordered with respect to other stores. PC is similar to TSO but does not guarantee write atomicity. Weak Ordering relaxes all instruction ordering and relies on synchronization operations such as locks and barriers to enforce ordering. The document also describes the memory barrier instructions in PowerPC that can be used to enforce ordering between memory accesses.
This document discusses implementing a parallel merge sort algorithm using MPI (Message Passing Interface). It describes the background of MPI and how it can be used for communication between processes. It provides details on the dataset used, MPI functions for initialization, communication between processes, and summarizes the results which show a decrease in runtime when increasing the number of processors.
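The document's own MPI code is not reproduced here, but the divide/sort/merge structure it describes can be sketched with Python's multiprocessing module, where Pool.map plays the role of scattering chunks to ranks and sorting locally, and the final merge corresponds to gathering at the root. All names below are illustrative, not taken from the document.

```python
from heapq import merge
from multiprocessing import Pool

def parallel_merge_sort(data, nprocs=4):
    """Split the data, sort each chunk in a worker process, then merge."""
    size = -(-len(data) // nprocs)              # ceiling division
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with Pool(len(chunks)) as pool:             # analogous to MPI_Scatter
        sorted_chunks = pool.map(sorted, chunks)
    result = sorted_chunks[0]                   # analogous to gather + merge at rank 0
    for chunk in sorted_chunks[1:]:
        result = list(merge(result, chunk))
    return result

if __name__ == "__main__":
    print(parallel_merge_sort([5, 3, 8, 1, 9, 2, 7, 4]))
```

As in the summarized results, the local sorts run concurrently, so wall-clock time drops as more workers are added until the merge and communication overhead dominate.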
This document provides an introduction to multiprocessor systems. It describes how multiprocessor systems use multiple processors together to improve performance and speed over uniprocessor systems. Multiprocessor systems can be tightly or loosely coupled. Tightly coupled systems share memory and communication while loosely coupled systems use separate processors connected via a network. The document discusses different interconnection techniques for multiprocessors like bus-oriented, crossbar, and multistage switching systems. It also covers multiprocessor operating systems and their functions in supporting parallel processing across CPUs.
This document discusses various inter-process communication (IPC) types including shared memory, mapped memory, pipes, FIFOs, message queues, sockets, and signals. Shared memory allows processes to directly read and write to the same region of memory, requiring synchronization between processes. Mapped memory permits processes to communicate by mapping the same file into memory. Pipes and FIFOs allow for sequential data transfer between related and unrelated processes. Message queues provide a way for processes to exchange messages via a common queue. Signals are used to asynchronously notify processes of events.
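As a minimal illustration of the pipe mechanism described above, the following sketch (names are illustrative, not from the document) sends one message from a child process to its parent over a pipe:

```python
from multiprocessing import Process, Pipe

def child(conn):
    conn.send("hello from child")   # write end of the pipe
    conn.close()

def demo():
    parent_conn, child_conn = Pipe()
    p = Process(target=child, args=(child_conn,))
    p.start()
    msg = parent_conn.recv()        # blocks until data arrives
    p.join()
    return msg

if __name__ == "__main__":
    print(demo())
```

The blocking recv() is the synchronization: the parent cannot proceed until the child has produced data, which is exactly the sequential-transfer behaviour the summary attributes to pipes.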
A compiler acts as a translator that converts programs written in high-level human-readable languages into machine-readable low-level languages. Compilers are needed because computers can only understand machine languages, not human languages. A compiler performs analysis and synthesis on a program, breaking the process into phases like scanning, parsing, code generation, and optimization to translate the high-level code into an executable form. The phases include lexical analysis, syntax analysis, semantic analysis, code generation, and optimization.
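The first of these phases, lexical analysis, can be sketched in a few lines; the token names below are illustrative, not taken from the document:

```python
import re

# A tiny token specification: order matters, first match wins.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("ASSIGN", r"="),
    ("PLUS",   r"\+"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(src):
    """Lexical analysis: turn a source string into (kind, text) tokens."""
    return [(m.lastgroup, m.group()) for m in MASTER.finditer(src)
            if m.lastgroup != "SKIP"]

print(tokenize("total = count + 1"))
# [('IDENT', 'total'), ('ASSIGN', '='), ('IDENT', 'count'), ('PLUS', '+'), ('NUMBER', '1')]
```

The token stream this produces is what the next phase, syntax analysis, would consume to build a parse tree.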
RPC allows a program to call a subroutine that resides on a remote machine. When a call is made, the calling process is suspended and execution takes place on the remote machine. The results are then returned. This makes the remote call appear local to the programmer. RPC uses message passing to transmit information between machines and allows communication between processes on different machines or the same machine. It provides a simple interface like local procedure calls but involves more overhead due to network communication.
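The stub mechanics behind this illusion can be sketched as follows: a client stub marshals the procedure name and arguments, sends them over a connection, and blocks for the reply, while a server stub unmarshals the request and dispatches the call. This is a simplified single-message model, not a production RPC system.

```python
import json
import socket
import threading

def serve_one(conn, procedures):
    """Server stub: unmarshal one request, call the procedure, return the result."""
    name, args = json.loads(conn.recv(4096).decode())
    conn.sendall(json.dumps(procedures[name](*args)).encode())
    conn.close()

def rpc_call(conn, name, *args):
    """Client stub: marshal the call, send it, and block for the reply."""
    conn.sendall(json.dumps([name, list(args)]).encode())
    return json.loads(conn.recv(4096).decode())

def demo():
    client_end, server_end = socket.socketpair()
    t = threading.Thread(target=serve_one,
                         args=(server_end, {"add": lambda a, b: a + b}))
    t.start()
    result = rpc_call(client_end, "add", 2, 3)  # looks local, runs "remotely"
    t.join()
    return result

print(demo())  # 5
```

The caller writes `rpc_call(conn, "add", 2, 3)` as if it were local; the marshalling, transmission, and blocking wait are hidden in the stub, which is also where the extra network overhead lives.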
IPC allows processes to communicate and share resources. There are several common IPC mechanisms, including message passing, shared memory, semaphores, files, signals, sockets, message queues, and pipes. Message passing involves establishing a communication link and exchanging fixed or variable sized messages using send and receive operations. Shared memory allows processes to access the same memory area. Semaphores are used to synchronize processes. Files provide durable storage that outlives individual processes. Signals asynchronously notify processes of events. Sockets enable two-way point-to-point communication between processes. Message queues allow asynchronous communication where senders and receivers do not need to interact simultaneously. Pipes create a pipeline between processes by connecting standard streams.
Processes communicate through interprocess communication (IPC) using two main models: shared memory and message passing. Shared memory allows processes to access the same memory regions, while message passing involves processes exchanging messages through mechanisms like mailboxes, pipes, signals, and sockets. Common IPC techniques include semaphores, shared memory, message queues, and sockets that allow processes to synchronize actions and share data in both blocking and non-blocking ways. Deadlocks can occur if processes form a circular chain while waiting for resources held by other processes.
Operating system 08: time sharing and multitasking operating system, by Vaibhav Khanna
Time sharing, or multitasking, is a logical extension of multiprogramming.
Multiple jobs are executed by the CPU switching between them, but the switches occur so frequently that the users may interact with each program while it is running.
An interactive, or hands-on, computer system provides on-line communication between the user and the system.
The user gives instructions to the operating system or to a program directly, and receives an immediate response.
Independent processes operate concurrently without affecting each other, while cooperating processes can impact one another. Inter-process communication (IPC) allows processes to share information, improve computation speed, and share resources. The two main types of IPC are shared memory and message passing. Shared memory uses a common memory region for fast communication, while message passing involves establishing communication links and exchanging messages without shared variables. Key considerations for message passing include direct vs indirect communication and synchronous vs asynchronous messaging.
This document provides an outline for a course on Parallel and Distributed Computing. The course carries 3 credit hours and has a prerequisite of Operating Systems. It aims to teach students about parallel and distributed computers, writing portable parallel programs using MPI, analytical modeling and performance analysis of parallel programs, and shared memory programming with OpenMP. The course content covers topics such as asynchronous/synchronous computation, concurrency control, fault tolerance, GPU programming, load balancing, memory models, message passing with MPI, parallel algorithms and architectures, performance analysis, programming models, scheduling, storage systems, synchronization, and tools for parallel and distributed systems. The teaching methodology incorporates lectures, assignments, labs, projects, and presentations. Students are assessed through exams, assignments, and quizzes.
Synchronization in distributed computing, by SVijaylakshmi
Synchronization in distributed systems is achieved via clocks. Physical clocks are used to adjust the time of the nodes, and each node can share its local time with the other nodes in the system. The time is set based on UTC (Coordinated Universal Time).
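One common way for a node to set its clock from a UTC time server is Cristian's algorithm, sketched below under the assumption of symmetric network delay. This is an illustration of the general idea, not necessarily the method the document uses.

```python
def cristian_adjust(t_request, t_server, t_reply):
    """Estimate the correct local time from one round trip to a UTC time server.

    t_request, t_reply : local clock readings when the request left and the
                         reply arrived
    t_server           : UTC timestamp carried in the server's reply
    """
    rtt = t_reply - t_request
    return t_server + rtt / 2.0   # assume the reply took half the round trip

# Local clock is 10 s slow: request sent at 100.0, reply received at 100.4,
# and the server reports 110.2. The corrected local time is 110.4.
print(cristian_adjust(100.0, 110.2, 100.4))
```

The accuracy of the estimate is bounded by half the round-trip time, so the method works best on low-latency links.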
This presentation gives an accessible, illustrated overview of process scheduling, one of the most important responsibilities of an operating system.
Chapter 2 Operating System Structures.ppt, by ErenJeager20
The document discusses various operating system structures and concepts. It describes different types of operating systems including batch, time-sharing, distributed, and real-time operating systems. It discusses concepts like multiprocessing, multitasking, spooling, and how operating systems provide services to users and processes. The document also covers system calls, different approaches to structuring operating systems like layered, microkernel-based, and modular structures. Popular operating systems like UNIX, Linux, Windows, Mac OS X, iOS, and Android are discussed in terms of their architectural approaches.
Concept of processes, process scheduling, operations on processes, inter-process communication, communication in client-server systems, and an overview of threads and their benefits.
The document summarizes key aspects of the MAC (Media Access Control) layer. It discusses how the MAC layer provides MAC addressing using unique identifiers for each device and provides multiple access to allow multiple devices to share the same communication channel. It describes different multiple access protocols like random access, CSMA, polling, and channelization methods including FDMA, TDMA, and CDMA that control how devices access and share the channel.
This document summarizes key aspects of the transport layer:
- The transport layer provides logical communication between application processes running on different hosts and handles reliable data transfer.
- It provides both connection-oriented and connectionless services to the application layer. Quality of service parameters like throughput and delay can be negotiated.
- Transport layer protocols like TCP and UDP are described. TCP provides reliable byte-stream delivery using connections while UDP provides best-effort unreliable datagram delivery.
The document discusses various design issues related to interprocess communication using message passing. It covers topics like synchronization methods, buffering strategies, process addressing schemes, reliability in message passing, and group communication. The key synchronization methods are blocking and non-blocking sends/receives. Issues addressed include blocking forever if the receiving process crashes, buffering strategies like null, single-message and finite buffers, and naming schemes like explicit and implicit addressing. Reliability is achieved using protocols like four-message, three-message and two-message. Group communication supports one-to-many, many-to-one and many-to-many communication with primitives for multicast, membership and different ordering semantics.
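The finite-buffer and non-blocking-send cases described above can be illustrated with a bounded queue. This is a single-process model of the buffering strategies, not the document's protocol code.

```python
import queue

buf = queue.Queue(maxsize=2)   # finite buffer: at most 2 in-transit messages

buf.put("m1")                  # succeeds immediately while space remains
buf.put("m2")
try:
    buf.put_nowait("m3")       # non-blocking send on a full buffer fails fast
except queue.Full:
    print("buffer full: a blocking send would wait here instead of failing")

print(buf.get(), buf.get())    # messages are delivered in FIFO order
```

A null buffer corresponds to maxsize handling where the sender must rendezvous with the receiver, a single-message buffer to maxsize=1, and the finite buffer shown here to any fixed capacity.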
This document discusses coordination-based distributed systems. It begins with an introduction to coordination models and a taxonomy that categorizes models based on temporal and referential coupling. Traditional architectures like JavaSpaces and TIB/Rendezvous are described, as well as peer-to-peer architectures using gossip-based publish/subscribe. Mobility coordination with Lime is covered. Key aspects of processes, communication, content-based routing, and supporting composite subscriptions in coordination systems are also summarized.
Chapter 3 discusses processes and process scheduling in operating systems. Key points include:
- A process includes the program code, program counter, stack, data, and process state information stored in a process control block (PCB).
- The operating system uses queues like ready queues and I/O queues to schedule processes between running, waiting, and ready states using long-term and short-term schedulers.
- Processes can cooperate through interprocess communication (IPC) using message passing or shared memory. Common IPC examples are producer-consumer problems and client-server systems.
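A minimal producer-consumer exchange over a bounded buffer, one of the IPC examples mentioned above, can be sketched with threads and a queue (illustrative, not from the document):

```python
import queue
import threading

def producer(q, items):
    for item in items:
        q.put(item)            # blocks if the bounded buffer is full
    q.put(None)                # sentinel: no more items

def consumer(q, out):
    while (item := q.get()) is not None:
        out.append(item * 2)   # "consume" each item

q, out = queue.Queue(maxsize=3), []
threads = [threading.Thread(target=producer, args=(q, [1, 2, 3])),
           threading.Thread(target=consumer, args=(q, out))]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(out)  # [2, 4, 6]
```

The bounded queue supplies the synchronization the producer-consumer problem requires: the producer blocks when the buffer is full and the consumer blocks when it is empty.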
A Distributed File System (DFS) is simply a classical model of a file system distributed across multiple machines. The purpose is to promote sharing of dispersed files.
This document discusses different types of compilers: single pass, two pass, and multipass. Single pass compilers directly transform source code into machine code. Two pass compilers use an intermediate representation (IR) where the front end maps source code to IR and the back end maps IR to machine code. Multipass compilers analyze and change the IR through multiple passes to reduce runtime and ensure high quality code, though they are generally slower than single pass compilers.
The document discusses deadlocks in computer systems. It defines deadlock, presents examples, and describes four conditions required for deadlock to occur. Several methods for handling deadlocks are discussed, including prevention, avoidance, detection, and recovery. Prevention methods aim to ensure deadlocks never occur, while avoidance allows the system to dynamically prevent unsafe states. Detection identifies when the system is in a deadlocked state.
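Deadlock detection is often formulated as finding a cycle in a wait-for graph, since a circular wait is one of the four required conditions. The sketch below (illustrative, not from the document) reports whether such a cycle exists.

```python
def has_deadlock(wait_for):
    """Detect a cycle in a wait-for graph given as {process: [processes it waits on]}."""
    visited, on_stack = set(), set()

    def dfs(p):
        visited.add(p)
        on_stack.add(p)                 # processes on the current DFS path
        for q in wait_for.get(p, []):
            if q in on_stack or (q not in visited and dfs(q)):
                return True             # reached a process already on the path
        on_stack.discard(p)
        return False

    return any(dfs(p) for p in wait_for if p not in visited)

# P1 waits on P2, P2 waits on P3, P3 waits on P1: circular wait, so deadlock.
print(has_deadlock({"P1": ["P2"], "P2": ["P3"], "P3": ["P1"]}))  # True
print(has_deadlock({"P1": ["P2"], "P2": ["P3"]}))                # False
```

A detection-based system would run a check like this periodically and then apply a recovery action, such as aborting one process on the cycle.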
The document discusses multiprocessor and multicore systems. It defines multiprocessors as systems with two or more CPUs sharing full access to common RAM. It describes different hardware architectures for multiprocessors like bus-based, UMA, and NUMA systems. It discusses cache coherence protocols and issues like false sharing. It also covers scheduling and synchronization challenges in multiprocessor systems like load balancing, task assignment, and avoiding priority inversions.
A PROGRESSIVE MESH METHOD FOR PHYSICAL SIMULATIONS USING LATTICE BOLTZMANN ME..., by ijdpsjournal
In this paper, a new progressive mesh algorithm is introduced to perform fast physical simulations using a lattice Boltzmann method (LBM) on a single-node multi-GPU architecture. The algorithm meshes the simulation domain automatically according to the propagation of fluids, and it can also be used to perform several other types of physical simulation. The authors associate the algorithm with a multiphase and multicomponent lattice Boltzmann model (MPMC-LBM) because it can perform various types of simulations on complex geometries. Combined with the massive parallelism of GPUs [5], the algorithm achieves very good performance compared with the static mesh method used in the literature. Several simulations are presented to evaluate the algorithm.
PROBABILISTIC DIFFUSION IN RANDOM NETWORK G..., by ijfcstjournal
In this paper, we consider a random network in which a link may exist between any two nodes with a certain probability (plink). Diffusion is the phenomenon of spreading information throughout the network, starting from an initial set of nodes (called the early adopters). Information spreads along the links with a certain probability (pdiff). Diffusion happens in rounds, with the first round involving the early adopters. Nodes that receive the information for the first time are said to be covered and become candidates for diffusion in the subsequent round. Diffusion continues until all the nodes in the network have received the information (successful diffusion) or there are no more candidate nodes to spread it while one or more nodes have yet to receive it (diffusion failure). On the basis of exhaustive simulations conducted in this paper, we observe that for given plink and pdiff values, the fraction of successful diffusion attempts does not change appreciably as the number of early adopters increases, whereas the average number of rounds per successful diffusion attempt decreases. The invariant nature of the fraction of successful diffusion attempts with respect to the number of early adopters in a random network (for fixed plink and pdiff values) is an interesting and noteworthy observation for further research that has not hitherto been reported in the literature.
The researcher focuses on studying fundamental tradeoffs between cache-obliviousness, cache-optimality, and parallelism of algorithms and data structures. Their work combines theory and experiments on topics like stencil computation, dynamic programming, and numerical algorithms. Recent work showed that optimal time and cache complexity can be achieved simultaneously for problems like longest common subsequence via a "cache-oblivious wavefront" scheduling technique. Open questions remain about applying this approach more broadly and understanding tradeoffs between time and cache complexity.
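The longest-common-subsequence problem mentioned above is computed by the classic dynamic program below; the cache-oblivious wavefront result concerns how these same table cells are scheduled, which this sequential sketch does not attempt to reproduce.

```python
def lcs_length(a, b):
    """Classic O(len(a) * len(b)) dynamic program for longest common subsequence.

    Cells along an anti-diagonal depend only on earlier diagonals, which is why
    a wavefront schedule can compute a whole diagonal in parallel.
    """
    prev = [0] * (len(b) + 1)          # previous row of the DP table
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

print(lcs_length("ABCBDAB", "BDCABA"))  # 4  (e.g. "BCBA")
```

Keeping only one row at a time already reduces memory from quadratic to linear; the cited work goes further by bounding cache misses without knowing the cache parameters.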
Direct Code Execution - LinuxCon Japan 2014, by Hajime Tazaki
Direct Code Execution (DCE) is a userspace kernel network stack that allows running real network stack code in a single process. DCE provides a testing platform that enables reproducible testing, fine-grained parameter tuning, and a development framework for network protocols. It achieves this through a virtualization core layer that runs multiple network nodes within a single process, a kernel layer that replaces the kernel with a shared library, and a POSIX layer that redirects system calls to the kernel library. This allows full control and observability for testing and debugging the network stack.
This document describes two distributed-memory parallelization schemes for efficiently parallelizing an explicit time-domain volume integral equation solver on the IBM Blue Gene/P supercomputer. The first scheme distributes the computationally intensive tested field computations among processors while storing the source field time histories on each processor, requiring all-to-all global communications. The second scheme distributes both the source fields and tested field computations, requiring sequential global communications. Numerical results show that both schemes scale well on Blue Gene/P, and the second more memory-efficient scheme allows solving problems with up to 3 million unknowns without acceleration. The parallel solver is demonstrated on the problem of light scattering from a red blood cell.
Fundamentals Of Transaction Systems - Part 2: Certainty suppresses Uncertaint..., by Valverde Computing
The document discusses transaction systems and consistency models. It summarizes that:
- Brewer's CAP theorem states that distributed systems can only achieve two of consistency, availability, and partition tolerance.
- Many financial systems achieve all three by using private networks and 3-phase commit, challenging assumptions of the CAP theorem.
- Workflow systems can help achieve consistency across inconsistent distributed systems by driving them into acceptable states.
The document discusses using a regular expression matching architecture called ReCPU for network intrusion detection systems (NIDS). ReCPU can efficiently match regular expressions in hardware and is well-suited for the high-speed regular expression matching needs of NIDS. It describes the ReCPU architecture, which uses parallel comparators to match multiple characters simultaneously, and how its design can be adapted for NIDS computation.
This document provides information about the course "G53SRP Systems and Real-Time Programming" offered in the 2009-2010 academic year. The course assumes a basic knowledge of Java programming and reviews concurrent programming and real-time concepts in Java. It introduces embedded and real-time programming, interfacing with hardware, and the specific requirements of real-time systems such as scheduling, asynchronous events, and memory management. The course involves weekly lectures and practical sessions and is assessed through a final exam.
This document discusses how work groups are scheduled for execution on GPU compute units. It explains that work groups are broken down into hardware schedulable units known as warps or wavefronts. These group threads together and execute instructions in lockstep. The document covers thread scheduling, effects of divergent control flow, predication, warp voting, and optimization techniques like maximizing occupancy.
Compositional Analysis for the Multi-Resource ServerEricsson
The Multi-Resource Server (MRS) technique has been proposed to enable predictable execution of memory intensive real-time applications on COTS multi-core platforms.
IJCER (www.ijceronline.com) International Journal of computational Engineeri...ijceronline
The document proposes implementing register files in the processor hardware to improve context switching performance in hard real-time systems. Conventionally, context switching involves saving processor registers to external memory, which takes 50-80 clock cycles. The proposed approach saves contexts to register files within the processor, requiring only 4 clock cycles. Software and a small operating system were modified to use new "save context" and "restore context" instructions. Simulation results showed contexts being saved and restored from an internal register file in 2 clock cycles each. Two test applications demonstrated the performance improvement from using internal register files versus external memory for context switching.
Performance comparison of row per slave and rows set per slave method in pvm ...eSAT Journals
Abstract Parallel computing operates on the principle that large problems can often be divided into smaller ones, which are then solved concurrently to save time by taking advantage of non-local resources and overcoming memory constraints. Multiplication of larger matrices requires a lot of computation time. This paper deals with the two methods for handling Parallel Matrix Multiplication. First is, dividing the rows of one of the input matrices into set of rows based on the number of slaves and assigning one rows set for each slave for computation. Second method is, assigning one row of one of the input matrices at a time for each slave starting from first row to first slave and second row to second slave and so on and loop backs to the first slave when last slave assignment is finished and repeated until all rows are finished assigning. These two methods are implemented using Parallel Virtual Machine and the computation is performed for different sizes of matrices over the different number of nodes. The results show that the row per slave method gives the optimal computation time in PVM based parallel matrix multiplication. Keywords: Parallel Execution, Cluster Computing, MPI (Message Passing Interface), PVM (Parallel Virtual Machine) RAM (Random Access Memory).
The document discusses different types of parallel architectures including SISD, SIMD, MISD, and MIMD. SISD refers to a single instruction single data stream and includes traditional uniprocessors. SIMD uses a single instruction on multiple data streams, like in vector processors. MISD has multiple instructions on a single data stream, like systolic arrays. MIMD uses multiple instructions and data streams, including traditional multiprocessors and networks of workstations. The document explores techniques for exploiting parallelism in different architectures and trends towards superscalar designs and instruction-level parallelism. It argues future systems will require even more parallelism to continue improving performance.
In this paper we describe paradigms for building and designing parallel computing machines. Firstly we
elaborate the uniqueness of MIMD model for the execution of diverse applications. Then we compare the
General Purpose Architecture of Parallel Computers with Special Purpose Architecture of Parallel
Computers in terms of cost, throughput and efficiency. Then we describe how Parallel Computer
Architecture employs parallelism and concurrency through pipelining. Since Pipelining improves the
performance of a machine by dividing an instruction into a number of stages, therefore we describe how
the performance of a vector processor is enhanced by employing multi pipelining among its processing
elements. Also we have elaborated the RISC architecture and Pipelining in RISC machines After comparing
RISC computers with CISC computers we observe that although the high speed of RISC computers is very
desirable but the significance of speed of a computer is dependent on implementation strategies. Only CPU
clock speed is not the only parameter to move the system software from CISC to RISC computers but the
other parameters should also be considered like instruction size or format, addressing modes, complexity of
instructions and machine cycles required by instructions. Considering all parameters will give performance
gain . We discuss Multiprocessor and Data Flow Machines in a concise manner. Then we discuss three
SIMD (Single Instruction stream Multiple Data stream) machines which are DEC/MasPar MP-1, Systolic
Processors and Wavefront array Processors. The DEC/MasPar MP-1 is a massively parallel SIMD array
processor. A wide variety of number representations and arithmetic systems for computers can be
implemented easily on the DEC/MasPar MP-1 system. The principal advantages of using such 64×64
SIMD array of 4-bit processors for the implementation of a computer arithmetic laboratory arise out of its
flexibility. After comparison of Systolic Processors with Wave front Processors we found that both of the
Systolic Processors and Wave front Processors are fast and implemented in VLSI. The major drawback of
Systolic Processors is the problem of availability of inputs when clock ticks because of propagation delays
in connection buses. The Wave front Processors combine the Systolic Processor architecture with Data
Flow machine architecture. Although the Wave front processors use asynchronous data flow computing
structure, the timing in the interconnection buses, at input and at output is not problematic..
VECTOR VS PIECEWISE-LINEAR FITTING FOR SIGNAL AND POWER INTEGRITY SIMULATIONPiero Belforte
The basic concepts of two fitting methods suitable for signal and power integrity simulation up to multi-gigabit/sec rates are presented. The traditional method is based on Vector Fitting (VF), a well known technique to approximate complex functions of frequency by a rational polynomial expression in terms of poles and residues. The second is a full time-domain approach mainly based on behavioral models supported by the Digital Wave Simulator.
PWLFIT/DWS advantages over VECTFIT/Spice can be summarized with the 3S acronym: SIMPLICITY, STABILITY and SPEED.
SIMPLICITY because the pwl fitting of a time-domain behavior is a very fast, explicit and intuitive process that doens't need the solution of implicit equations as required by Vector fitting. Time-domain S-parameter of actual devices in matched conditions shows simpler behaviors than the corresponding impedance in the frequency domain.
STABILITY because the use of Digital Wave processing is intrinsically very stable. Extracted pwl behaviors processed by fast convolution within DWS are unconditionally stable if the source behavior is stable. This means that NO numerical conditioning is required. As known Vector Fitting often require numerical conditioning to get stable results.
SPEED: time-domain pwl fitting is a very fast process. DWS simulations are also very fast even at very small time steps required by multigigabit system analysis. DWS/SPICE typical speedups are 100X for traditional VF derived RLC-TL circuits and up to 10000X when using pwl Behavioral Models in time domain.
This document summarizes key aspects of GPU hardware and the SIMT (Single Instruction Multiple Thread) architecture used in NVIDIA GPUs. It describes the evolution of NVIDIA GPU hardware, the differences between latency-oriented CPUs and throughput-oriented GPUs, how SIMT combines SIMD and threading, warp scheduling, divergence and convergence, predicated and conditional execution.
This document compares two methods for parallel matrix multiplication using PVM (Parallel Virtual Machine): the row per slave method and the rows set per slave method. It finds that the row per slave method provides optimal computation time. The row per slave method assigns each slave a single row from the first matrix to compute, while the rows set per slave method assigns each slave a set of rows. Experimental results on matrices of varying sizes show the row per slave method takes less time, with an average 50% reduction in computation time compared to the rows set per slave method.
Similar to 5.2.2. Memory Consistency Models.pptx (20)
Unit-III Correlation and Regression.pptxAnusuya123
Unit-III describes different types of relationships between variables through correlation and regression analysis. It discusses:
1) Correlation measures the strength and direction of a linear relationship between two variables on a scatter plot. Positive correlation means variables increase together, while negative correlation means one increases as the other decreases.
2) Regression analysis uses independent variables to predict outcomes of a dependent variable. A regression line minimizes the squared errors between predicted and actual values.
3) The correlation coefficient r and coefficient of determination r-squared quantify the strength and direction of linear relationships, with values between -1 and 1. Extreme scores on one measurement tend to regress toward the mean on subsequent measurements.
This document discusses different types of data and variables that are used in statistical analysis. It describes three main types of data: qualitative data which uses words, letters or codes to represent categories; ranked data which uses numbers to represent relative standing; and quantitative data which uses numbers to represent amounts or counts. Variables can be independent, dependent, discrete, continuous or confounding. The document also provides guidelines for describing data using tables, graphs, frequencies, relative frequencies and percentiles.
Basic Statistical Descriptions of Data.pptxAnusuya123
This document provides an overview of 7 basic statistical concepts for data science: 1) descriptive statistics such as mean, mode, median, and standard deviation, 2) measures of variability like variance and range, 3) correlation, 4) probability distributions, 5) regression, 6) normal distribution, and 7) types of bias. Descriptive statistics are used to summarize data, variability measures dispersion, correlation measures relationships between variables, and probability distributions specify likelihoods of events. Regression models relationships, normal distribution is often assumed, and biases can influence analyses.
Data warehousing involves integrating data from multiple sources into a single database to support analysis and decision making. It includes cleaning, integrating, and consolidating data. A data warehouse is subject-oriented, integrated, non-volatile, and time-variant. It differs from a transactional database by collecting extensive data for analytics rather than real-time transactions. A typical architecture includes data storage, an OLAP server for analysis, and front-end tools. Data is mined for patterns to devise sales and profit strategies. There are three main types: an enterprise data warehouse serving the whole organization, an operational data store refreshing in real-time, and departmental data marts.
Unit 1-Data Science Process Overview.pptxAnusuya123
The document outlines the six main steps of the data science process: 1) setting the research goal, 2) retrieving data, 3) data preparation, 4) data exploration, 5) data modeling, and 6) presentation and automation. It focuses on describing the data preparation step, which involves cleansing data of errors, integrating data from multiple sources, and transforming data into a usable format through techniques like data cleansing, transformations, and integration.
This document provides an introduction to data science, including defining data science, discussing the different types of data (structured, unstructured, natural language, machine-generated, graph-based, audio/video/images, and streaming) and tools used (Python, R, SQL, Hadoop, Spark). It also discusses benefits and uses of data science across industries and gives examples to illustrate each type of data.
This document discusses the Chord peer-to-peer protocol. Chord uses a distributed hash table to map keys to nodes, where both node IDs and data keys are mapped to the same identifier space. It maintains routing tables with O(log n) entries to allow lookups to be performed in O(log n) hops. Chord provides efficient routing as nodes join and leave the network, with only O(1) keys needing to be redistributed on average when a node fails or departs.
This document provides an overview of descriptive statistics techniques for summarizing and describing data, including both categorical and quantitative variables. It discusses frequency distributions, histograms, stem-and-leaf plots, numerical descriptions of center and variability (mean, median, standard deviation), bivariate descriptions using tables, scatterplots and correlation, and simple linear regression. The goal of descriptive statistics is to organize and summarize sample data in order to make inferences about the corresponding population parameters.
This document provides an overview of foundations of data science. It discusses how data science draws from disciplines like statistics, computing, and domain knowledge. Statistics are a central component and help make conclusions from incomplete information. Computing allows applying analysis techniques to large datasets through programming. Domain knowledge helps ask appropriate questions of data and correctly interpret answers. The document also discusses statistical techniques used in data science like hypothesis testing, estimation, and prediction. It describes how data science goes beyond statistics by leveraging computing, visualization, machine learning, and access to large datasets. Key tools recommended include Python, IPython, Jupyter notebooks, and real-world publicly available datasets. The course structure and outcomes focus on understanding statistical foundations, preprocessing raw data, exploratory data
The runtime environment handles the implementation of programming language abstractions on the target machine. It allocates storage for code and data and handles access to variables, procedure linkage and parameter passing. Storage is typically divided into code, static, heap and stack areas. The compiler generates code that maps logical addresses to physical addresses. The stack grows downward and stores activation records for procedures, while the heap grows upward and dynamically allocates memory. Procedure activations are represented by activation records on the call stack. The runtime environment implements variable scoping and access to non-local data using techniques like static scopes, access links and displays.
This document describes an active learning activity called Think-Pair-Share that was implemented in a Compiler Design course to help students understand intermediate code generation. Students first thought individually about sample intermediate code questions. They then discussed their answers in pairs before several pairs shared their concepts with the class. Most students participated actively and their understanding of intermediate code generation improved through discussing it with their peers. The activity addressed the course outcome of enabling students to understand intermediate code generation and syntax directed translation.
LEX is a tool that allows users to specify a lexical analyzer by defining patterns for tokens using regular expressions. The LEX compiler transforms these patterns into a transition diagram and generates C code. It takes a LEX source program as input, compiles it to produce lex.yy.c, which is then compiled with a C compiler to generate an executable that takes an input stream and returns a sequence of tokens. LEX programs have declarations, translation rules that map patterns to actions, and optional auxiliary functions. The actions are fragments of C code that execute when a pattern is matched.
The document discusses various operators in Python including arithmetic, comparison, bitwise, logical, and membership operators. It provides examples of using each operator and explains their functionality. The key types of operators covered are arithmetic (e.g. +, -, *, /), comparison (e.g. ==, !=, >, <), bitwise (e.g. &, |, ^), logical (e.g. and, or, not), and membership (e.g. in, not in) operators. It also discusses operator precedence and provides examples of expressions using different operators.
Literature Review Basics and Understanding Reference Management.pptxDr Ramhari Poudyal
Three-day training on academic research focuses on analytical tools at United Technical College, supported by the University Grant Commission, Nepal. 24-26 May 2024
Embedded machine learning-based road conditions and driving behavior monitoringIJECEIAES
Car accident rates have increased in recent years, resulting in losses in human lives, properties, and other financial costs. An embedded machine learning-based system is developed to address this critical issue. The system can monitor road conditions, detect driving patterns, and identify aggressive driving behaviors. The system is based on neural networks trained on a comprehensive dataset of driving events, driving styles, and road conditions. The system effectively detects potential risks and helps mitigate the frequency and impact of accidents. The primary goal is to ensure the safety of drivers and vehicles. Collecting data involved gathering information on three key road events: normal street and normal drive, speed bumps, circular yellow speed bumps, and three aggressive driving actions: sudden start, sudden stop, and sudden entry. The gathered data is processed and analyzed using a machine learning system designed for limited power and memory devices. The developed system resulted in 91.9% accuracy, 93.6% precision, and 92% recall. The achieved inference time on an Arduino Nano 33 BLE Sense with a 32-bit CPU running at 64 MHz is 34 ms and requires 2.6 kB peak RAM and 139.9 kB program flash memory, making it suitable for resource-constrained embedded systems.
Batteries -Introduction – Types of Batteries – discharging and charging of battery - characteristics of battery –battery rating- various tests on battery- – Primary battery: silver button cell- Secondary battery :Ni-Cd battery-modern battery: lithium ion battery-maintenance of batteries-choices of batteries for electric vehicle applications.
Fuel Cells: Introduction- importance and classification of fuel cells - description, principle, components, applications of fuel cells: H2-O2 fuel cell, alkaline fuel cell, molten carbonate fuel cell and direct methanol fuel cells.
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECTjpsjournal1
The rivalry between prominent international actors for dominance over Central Asia's hydrocarbon
reserves and the ancient silk trade route, along with China's diplomatic endeavours in the area, has been
referred to as the "New Great Game." This research centres on the power struggle, considering
geopolitical, geostrategic, and geoeconomic variables. Topics including trade, political hegemony, oil
politics, and conventional and nontraditional security are all explored and explained by the researcher.
Using Mackinder's Heartland, Spykman Rimland, and Hegemonic Stability theories, examines China's role
in Central Asia. This study adheres to the empirical epistemological method and has taken care of
objectivity. This study analyze primary and secondary research documents critically to elaborate role of
china’s geo economic outreach in central Asian countries and its future prospect. China is thriving in trade,
pipeline politics, and winning states, according to this study, thanks to important instruments like the
Shanghai Cooperation Organisation and the Belt and Road Economic Initiative. According to this study,
China is seeing significant success in commerce, pipeline politics, and gaining influence on other
governments. This success may be attributed to the effective utilisation of key tools such as the Shanghai
Cooperation Organisation and the Belt and Road Economic Initiative.
A review on techniques and modelling methodologies used for checking electrom...nooriasukmaningtyas
The proper function of the integrated circuit (IC) in an inhibiting electromagnetic environment has always been a serious concern throughout the decades of revolution in the world of electronics, from disjunct devices to today’s integrated circuit technology, where billions of transistors are combined on a single chip. The automotive industry and smart vehicles in particular, are confronting design issues such as being prone to electromagnetic interference (EMI). Electronic control devices calculate incorrect outputs because of EMI and sensors give misleading values which can prove fatal in case of automotives. In this paper, the authors have non exhaustively tried to review research work concerned with the investigation of EMI in ICs and prediction of this EMI using various modelling methodologies and measurement setups.
Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...University of Maribor
Slides from talk presenting:
Aleš Zamuda: Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapter and Networking.
Presentation at IcETRAN 2024 session:
"Inter-Society Networking Panel GRSS/MTT-S/CIS
Panel Session: Promoting Connection and Cooperation"
IEEE Slovenia GRSS
IEEE Serbia and Montenegro MTT-S
IEEE Slovenia CIS
11TH INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONIC AND COMPUTING ENGINEERING
3-6 June 2024, Niš, Serbia
Advanced control scheme of doubly fed induction generator for wind turbine us...IJECEIAES
This paper describes a speed control device for generating electrical energy on an electricity network based on the doubly fed induction generator (DFIG) used for wind power conversion systems. At first, a double-fed induction generator model was constructed. A control law is formulated to govern the flow of energy between the stator of a DFIG and the energy network using three types of controllers: proportional integral (PI), sliding mode controller (SMC) and second order sliding mode controller (SOSMC). Their different results in terms of power reference tracking, reaction to unexpected speed fluctuations, sensitivity to perturbations, and resilience against machine parameter alterations are compared. MATLAB/Simulink was used to conduct the simulations for the preceding study. Multiple simulations have shown very satisfying results, and the investigations demonstrate the efficacy and power-enhancing capabilities of the suggested control system.
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsVictor Morales
K8sGPT is a tool that analyzes and diagnoses Kubernetes clusters. This presentation was used to share the requirements and dependencies to deploy K8sGPT in a local environment.
Introduction- e - waste – definition - sources of e-waste– hazardous substances in e-waste - effects of e-waste on environment and human health- need for e-waste management– e-waste handling rules - waste minimization techniques for managing e-waste – recycling of e-waste - disposal treatment methods of e- waste – mechanism of extraction of precious metal from leaching solution-global Scenario of E-waste – E-waste in India- case studies.
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...IJECEIAES
Climate change's impact on the planet forced the United Nations and governments to promote green energies and electric transportation. The deployments of photovoltaic (PV) and electric vehicle (EV) systems gained stronger momentum due to their numerous advantages over fossil fuel types. The advantages go beyond sustainability to reach financial support and stability. The work in this paper introduces the hybrid system between PV and EV to support industrial and commercial plants. This paper covers the theoretical framework of the proposed hybrid system including the required equation to complete the cost analysis when PV and EV are present. In addition, the proposed design diagram which sets the priorities and requirements of the system is presented. The proposed approach allows setup to advance their power stability, especially during power outages. The presented information supports researchers and plant owners to complete the necessary analysis while promoting the deployment of clean energy. The result of a case study that represents a dairy milk farmer supports the theoretical works and highlights its advanced benefits to existing plants. The short return on investment of the proposed approach supports the paper's novelty approach for the sustainable electrical system. In addition, the proposed system allows for an isolated power setup without the need for a transmission line which enhances the safety of the electrical network
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
5.2.2. Memory Consistency Models.pptx
1. III Year – CSE
Regulation 2017
Department of Computer Science & Engineering
Ramco Institute of Technology
CS8603 DISTRIBUTED SYSTEMS
RIT/CSE/CS8603-DS/UNIT V
2. 5.2. Distributed Shared Memory
5.2.2. Memory Consistency Models
UNIT V
Source: Ajay D. Kshemkalyani & Mukesh Singhal (2010). Distributed Computing: Principles, Algorithms and Systems. Cambridge University Press.
4. 5.2.2. Memory Consistency Models
Memory coherence is the ability of the system to execute memory operations correctly.
Memory consistency models determine when data updates are propagated and what level of inconsistency is acceptable.
For example, assume the following:
n processes, and s_i memory operations per process P_i.
All the operations issued by a process are executed sequentially.
Then there are (s_1 + s_2 + ... + s_n)! / (s_1! s_2! ... s_n!) possible interleavings.
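The interleaving count above is the multinomial coefficient. A minimal sketch (assuming Python; the function name is ours, not from the slides) that evaluates it:

```python
from math import factorial

def num_interleavings(ops_per_process):
    """Number of ways to interleave the per-process operation
    sequences while keeping each process's own program order:
    (s_1 + ... + s_n)! / (s_1! * ... * s_n!)."""
    total = factorial(sum(ops_per_process))
    for s in ops_per_process:
        total //= factorial(s)
    return total

# Two processes with 2 operations each: 4! / (2! * 2!) = 6
print(num_interleavings([2, 2]))     # 6
print(num_interleavings([3, 2, 1]))  # 60
```

Even for modest s_i the count explodes, which is why a consistency model is needed to say which interleavings are acceptable.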
5. 5.2.2. Memory Consistency Models (contd.)
The memory coherence model defines which interleavings are permitted.
Traditionally, a Read returns the value written by the most recent Write. However, "most recent" Write is ambiguous in the presence of replicas and concurrent accesses.
The DSM consistency model is a contract between the DSM system and the application programmer. Hence, a clear definition of correctness is required in such a system.
The DSM system enforces a particular memory consistency model.
https://www.youtube.com/watch?v=uAqIa-mtjJ4
6. 5.2.2. Memory Consistency Models (contd.)
To use DSM, one must also implement a distributed synchronization service, which includes the use of locks, semaphores, and message passing.
In most implementations, data is read from local copies of the data, but updates to data must be propagated to the other copies.
Some of the memory consistency models are:
Strict Consistency / Atomic Consistency / Linearizability
Sequential Consistency
Causal Consistency
PRAM (Pipelined RAM) / Processor Consistency
Slow Memory
7. 5.2.2. Memory Consistency Models (contd.)
Strict Consistency / Atomic Consistency / Linearizability
It corresponds to the notion of correctness on the traditional von Neumann architecture, i.e., a uniprocessor machine: any Read to a location (variable) should return the value written by the most recent Write to that location (variable).
Two salient features:
(i) a common global time axis is implicitly available in a uniprocessor system;
(ii) each write is immediately visible to all processes.
Adapting this correctness model to a DSM system, where operations can be concurrently issued by the various processes, gives the strict consistency model, also known as the atomic consistency model.
8. 5.2.2. Memory Consistency Models (contd.)
Strict Consistency / Atomic Consistency / Linearizability
It can be formally specified as follows:
1. Any Read to a location (variable) is required to return the value written by the most recent Write to that location (variable) as per a global time reference.
For operations that do not overlap as per the global time reference, the specification is clear. For operations that overlap as per the global time reference, the following further specifications are necessary.
2. All operations appear to be executed atomically and sequentially.
3. All processors see the same ordering of events, which is equivalent to the global-time occurrence of non-overlapping events.
9. 5.2.2. Memory Consistency Models (contd.)
Strict Consistency / Atomic Consistency / Linearizability
An alternate way to specify this consistency model is in terms of the invocation and response of each Read and Write operation. Each operation takes a finite time interval; hence, different operations by different processors can overlap in time. However, the invocation and the response to each invocation can both be separately viewed as being atomic events.
An execution sequence in global time is viewed as a sequence Seq of such invocations and responses. Clearly, Seq must satisfy the following conditions:
(Liveness) Each invocation must have a corresponding response.
(Correctness) The projection of Seq on any processor i,
10. 5.2.2. Memory Consistency Models (contd.)
Strict Consistency / Atomic Consistency / Linearizability (Examples)
Figure (a): The execution is not linearizable because, although the Read by P2 begins after Write x 4, the Read returns the value that existed before the Write. Hence, a permutation Seq satisfying condition 2 above on global time order does not exist.
Figure (b): The execution is linearizable. The global order of operations (corresponding to (invocation, response) pairs in Seq), consistent with the real-time occurrence, is: Write y 2, Write x 4, Read x 4, Read y 2. This permutation Seq satisfies conditions 1 and 2.
Figure (c): The execution is not linearizable. The two dependencies, Read x 0 before Write x 4 and Read y 0 before Write y 2, cannot both be satisfied in a global order while satisfying the local order of operations at each processor. Hence, there does not exist any permutation Seq satisfying conditions 1 and 2.
Strict Consistency / Atomic Consistency / Linearizability
(Implementation)
Simulating a global time axis is expensive. Instead, assume
full replication of each variable at all processors, together
with total order broadcast support.
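This implementation strategy can be sketched as a single-process simulation (all class and method names here are hypothetical; a real system would use a sequencer or a consensus protocol for the total order broadcast). Every Read and Write is pushed through the broadcast, so all replicas apply all operations in one global order:

```python
# Hedged sketch of linearizability via full replication plus total order
# broadcast; names are hypothetical, delivery is simulated synchronously.
class TotalOrderBroadcast:
    """Delivers every message to every replica in one global order."""
    def __init__(self, replicas):
        self.replicas = replicas

    def broadcast(self, op):
        # A real system would use a sequencer or consensus; here we
        # simply deliver to all replicas in arrival order.
        for r in self.replicas:
            r.deliver(op)

class Replica:
    def __init__(self):
        self.store = {}

    def deliver(self, op):
        kind, var, val = op
        if kind == "write":
            self.store[var] = val
        # A "read" carries no update; it is broadcast only so that it
        # occupies a definite position in the global order.

class LinearizableMemory:
    """Both Reads and Writes go through the total order broadcast, so
    every replica applies every operation in the same global order."""
    def __init__(self, n_replicas=3):
        self.replicas = [Replica() for _ in range(n_replicas)]
        self.tob = TotalOrderBroadcast(self.replicas)

    def write(self, pid, var, val):
        self.tob.broadcast(("write", var, val))   # completes on delivery

    def read(self, pid, var):
        self.tob.broadcast(("read", var, None))   # ordered like a write
        return self.replicas[pid].store.get(var, 0)

mem = LinearizableMemory()
mem.write(0, "x", 4)
assert mem.read(1, "x") == 4   # a Read after a Write must see the Write
```

Because Reads are also totally ordered, no Read can return a value older than a Write that completed before the Read began, which is exactly the condition violated in Figure (a).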
Sequential Consistency Model (SC)
Linearizability is too strict for most practical purposes and
is very expensive to implement.
Programmers can deal with weaker models.
The first weaker model, sequential consistency (SC), was
proposed by Lamport; it is the strongest memory model for
DSM that is used in practice.
It uses a logical time reference instead of the global time
reference.
Sequential Consistency Model (SC)
Sequential consistency is specified as follows:
The result of any execution is the same as if all
operations of all the processors were executed in
some sequential order.
The operations of each individual processor appear
in this sequence in the local program order.
Sequential Consistency Model (SC)
More formally, a sequence Seq of invocation and
response events is sequentially consistent if there is a
permutation Seq′ of adjacent pairs of corresponding
⟨invoc, resp⟩ events satisfying:
1. For every variable v, the projection of Seq′ on v,
denoted Seq′_v, is such that every Read (adjacent
⟨invoc, resp⟩ event pair) returns the value of the most
recent Write (adjacent ⟨invoc, resp⟩ event pair) that
immediately preceded it.
2. If the response op1(resp) of operation op1 at process
Pi occurred before the invocation op2(invoc) of operation
op2 by process Pi in Seq, then op1 (adjacent ⟨invoc, resp⟩
event pair) occurs before op2 (adjacent ⟨invoc, resp⟩
event pair) in Seq′.
Sequential Consistency Model (SC)
Condition 1 is the same as that for linearizability.
Condition 2 differs from that for linearizability: it
specifies that the common permuted order must satisfy
only the local order of events at each processor,
instead of the global order of non-overlapping
events.
Sequential Consistency Model (Example)
Figure 12.4(a): The execution is sequentially consistent.
The global order Seq is: Write y 2, Read x 0, Write x 4,
Read y 2.
Figure 12.4(b): As the execution is linearizable (seen in
Section 12.2.1), it is also sequentially consistent. The
global order of operations (corresponding to the
invocation, response pairs in Seq), consistent with their
real-time occurrence, is: Write y 2, Write x 4, Read x 4,
Read y 2.
Figure 12.4(c): The execution is not sequentially
consistent (and hence not linearizable). The two
dependencies, Read x 0 before Write x 4 and Read y 0
before Write y 2, cannot both be satisfied in a global order
while also satisfying the local order of operations at each
processor.
Sequential Consistency Model (Implementation)
Sequential consistency should be easier to implement
than linearizability, because the global time ordering need
not be preserved across processes.
It is sufficient to use total order broadcasts for the
Write operations only.
In the simplified algorithm, no total order broadcast is
required for Read operations, because:
1. all consecutive operations by the same processor are
ordered in that same order (pipelining is not used);
2. Read operations by different processors are
independent of each other and need to be ordered only
with respect to the Write operations in the execution.
Implementation of SC using Local Read
Operation: A Read operation completes atomically,
whereas a Write operation does not.
Between the invocation of a Write by Pi (line 1b) and its
acknowledgement (lines 2a, 2b), there may be multiple
Write operations initiated by other processors that take
effect at Pi (line 2a).
Thus, a Write issued locally has its completion locally
delayed.
Such an algorithm is acceptable for Read-intensive
programs.
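A minimal sketch of this local-Read variant, with a synchronous stand-in for the total order broadcast (class and method names are hypothetical): Writes are applied at every replica in broadcast order, while Reads return the local copy immediately with no communication.

```python
# Hedged sketch of SC with local Reads; the total order broadcast is
# simulated by applying each Write synchronously at all replicas.
class SCLocalRead:
    """Writes use total order broadcast; Reads are local and atomic."""
    def __init__(self, n):
        self.replicas = [dict() for _ in range(n)]

    def write(self, pid, var, val):
        # Total order broadcast: all replicas apply Writes in the same
        # order. The writer's operation completes only when its own
        # broadcast is delivered back, so a local Write is delayed.
        for store in self.replicas:
            store[var] = val

    def read(self, pid, var):
        # Local, atomic Read: no broadcast needed, which is why this
        # variant suits Read-intensive programs.
        return self.replicas[pid].get(var, 0)

sc = SCLocalRead(3)
sc.write(0, "x", 4)
assert sc.read(2, "x") == 4   # every replica saw the Write in order
```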
Implementation of SC using Local Write
Operation: For Write-intensive programs, it is
desirable that a locally issued Write be
acknowledged immediately (as in lines 2a–2c),
even though the total order broadcast for the
Write, and the actual update of the variable
(line 3a), may take effect later.
The algorithm achieves this at the cost of
delaying a Read operation by a processor until
all previously issued local Write operations by
that same processor have locally taken effect
(i.e., previous locally issued Writes have
updated the local copies of the variables being
written to).
The variable counter tracks the number of Write
operations that have been locally initiated but not yet
completed at any time.
A Read operation completes only if there are no prior
locally initiated Write operations that have not yet written
to their variables (line 1a), i.e., there are no pending
locally initiated Write operations to any variable.
Otherwise, the Read operation is delayed until all
previously initiated Write operations have written to
their local variables (lines 3b–3d), which happens
after the total order broadcasts associated with those
Writes have delivered the broadcast messages locally.
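The counter mechanism can be sketched as follows (names are hypothetical; broadcast delivery is simulated by a pending queue that is drained when a Read must block):

```python
# Hedged sketch of SC with local Writes: a Write is acknowledged at
# once, and a Read blocks while the issuing processor has pending Writes.
class SCLocalWrite:
    def __init__(self, n):
        self.replicas = [dict() for _ in range(n)]
        self.counter = [0] * n   # pending local Writes per processor
        self.pending = []        # simulated in-flight broadcasts

    def write(self, pid, var, val):
        self.counter[pid] += 1                 # track the pending Write
        self.pending.append((pid, var, val))   # ack returned immediately

    def _deliver_all(self):
        # Total order delivery of all in-flight Writes at all replicas.
        for pid, var, val in self.pending:
            for store in self.replicas:
                store[var] = val
            self.counter[pid] -= 1             # that Write has completed
        self.pending.clear()

    def read(self, pid, var):
        if self.counter[pid] > 0:
            # Stand-in for blocking until the processor's own earlier
            # Writes have been delivered locally (lines 3b-3d).
            self._deliver_all()
        return self.replicas[pid].get(var, 0)

mem = SCLocalWrite(2)
mem.write(0, "x", 4)          # returns immediately
assert mem.read(0, "x") == 4  # the Read waited for the pending Write
```

A Read by a processor with no pending Writes may still return an older value, which SC permits as long as one common Write order is preserved.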
Causal Consistency
The sequential consistency model requires that Write
operations issued by different processors be seen in
some common order by all processors.
This requirement can be relaxed to demand only that
causally related Writes be seen in that same order by
all processors, whereas "concurrent" Writes may be
seen by different processors in different orders.
The resulting consistency model is the causal
consistency model.
The causality relation for shared memory systems
is defined as follows:
Local order: at a processor, the serial order of the
events defines the local causal order.
Inter-process order: a Write operation causally
precedes a Read operation issued by another
processor if the Read returns the value written by
that Write.
Transitive closure: the transitive closure of the
above two relations defines the (global) causal
order.
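One common way to realize this causal order, sketched here with hypothetical names, is a causal-order broadcast driven by vector clocks: a Write message is applied at a receiver only once every Write that causally precedes it has been applied; otherwise it is buffered.

```python
# Hedged sketch of causal consistency via vector-clock causal broadcast.
class CausalProcess:
    def __init__(self, pid, n):
        self.pid, self.n = pid, n
        self.vc = [0] * n       # Writes seen from each processor
        self.store = {}
        self.queue = []         # Writes whose dependencies are unmet

    def local_write(self, var, val):
        self.vc[self.pid] += 1
        self.store[var] = val
        return (self.pid, list(self.vc), var, val)  # message to broadcast

    def _deliverable(self, sender, vc):
        # Next-in-sequence from the sender, and all its other
        # causal dependencies already seen locally.
        return (vc[sender] == self.vc[sender] + 1 and
                all(vc[k] <= self.vc[k]
                    for k in range(self.n) if k != sender))

    def receive(self, msg):
        self.queue.append(msg)
        progress = True
        while progress:         # drain everything that became deliverable
            progress = False
            for m in list(self.queue):
                sender, vc, var, val = m
                if self._deliverable(sender, vc):
                    self.store[var] = val
                    self.vc[sender] = vc[sender]
                    self.queue.remove(m)
                    progress = True
```

For example, if P1's Write of y follows its delivery of P0's Write of x, a third processor buffers the y update until the x update arrives, so the two causally related Writes are seen in causal order everywhere.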
Causal Consistency (Example)
Figure (a): The execution is sequentially consistent (and
hence causally consistent). Both P3 and P4 see the
operations at P1 and P2 in sequential order and in causal
order.
Figure (b): The execution is not sequentially consistent but
is causally consistent. Both P3 and P4 see the operations
at P1 and P2 in causal order, because the lack of a
causality relation between the Writes by P1 and by P2
allows the values written by the two processors to be seen
in different orders in the system. The execution is not
sequentially consistent because there is no global order
satisfying the contradictory ordering requirements set by
the Reads by P3 and by P4.
Figure (c): The execution is not causally consistent
because the second Read by P4 returns 4 after P4 has
already returned 7 in an earlier Read.
PRAM (pipelined RAM) or processor consistency
Causal consistency requires all causally related
Writes to be seen in the same order by all
processors, which still places a restriction on
the application.
PRAM consistency relaxes this further: only Write
operations issued by the same processor are seen
by others in the order they were issued, while
Writes from different processors may be seen by
other processors in different orders.
All operations issued by any processor thus appear
to the other processors in a FIFO pipelined
sequence.
PRAM (pipelined RAM) or processor
consistency (Example)
In the previous Figure (c), the execution is PRAM
consistent (even though it is not causally
consistent) because, trivially, both P3 and P4 see
the updates made by P1 and P2 in FIFO order
along the channels P1 to P3 and P2 to P3, and
along P1 to P4 and P2 to P4, respectively.
PRAM (pipelined RAM) or processor
consistency (Implementation)
PRAM consistency can be implemented using FIFO
broadcast: each processor's Writes are delivered
everywhere in the order in which they were issued.
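A minimal sketch of FIFO broadcast delivery using per-sender sequence numbers (names hypothetical): an update from one sender is held back until all of that sender's earlier updates have been applied, while updates from different senders may interleave freely.

```python
# Hedged sketch of PRAM consistency via per-sender FIFO delivery.
class PRAMProcess:
    def __init__(self, n_senders):
        self.store = {}
        self.next_seq = [1] * n_senders   # next expected seq per sender
        self.held = {}                    # (sender, seq) -> (var, val)

    def receive(self, sender, seq, var, val):
        self.held[(sender, seq)] = (var, val)
        # Deliver in FIFO order per sender; other senders are unaffected.
        while (sender, self.next_seq[sender]) in self.held:
            v, x = self.held.pop((sender, self.next_seq[sender]))
            self.store[v] = x
            self.next_seq[sender] += 1

p = PRAMProcess(2)
p.receive(0, 2, "x", 4)        # out of order from sender 0: held back
assert "x" not in p.store
p.receive(0, 1, "x", 1)        # seq 1 arrives; seq 1 then 2 delivered
assert p.store["x"] == 4
```

Because there is no coordination across senders, two receivers may apply Writes from different senders in different interleavings, which is exactly what PRAM consistency permits.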
Slow memory
Slow memory is the next weaker consistency model.
It represents a location-relative weakening of the
PRAM model: only Write operations issued by the
same processor to the same memory location must
be observed in the same order by all the
processors.
Slow memory (Example)
Figure (a): The updates to each of the variables are
seen pipelined separately, in FIFO fashion per variable.
The "x" pipeline from P1 to P2 is slower than the "y"
pipeline from P1 to P2, so the overtaking effect is
allowed. However, PRAM consistency is violated,
because the FIFO property is violated over the single
common "pipeline" from P1 to P2: the update to y is
seen by P2, but the much older value x = 0 is seen by
P2 later.
Figure (b): Slow memory consistency is violated
because the FIFO property is violated for the pipeline
for variable x: "x = 7" is seen by P2 before it sees
"x = 0" and "x = 2", although 7 was written to x after
the values 0 and 2.
Slow memory (Implementation)
Slow memory can be implemented using a
broadcast primitive in which the FIFO property
needs to be satisfied only for updates to the same
variable by the same sender.
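This per-variable weakening can be sketched by keying the sequence numbers on the (sender, variable) pair rather than on the sender alone (names hypothetical), so updates to different variables from the same sender may overtake one another:

```python
# Hedged sketch of slow memory: FIFO delivery per (sender, variable).
class SlowMemoryProcess:
    def __init__(self):
        self.store = {}
        self.next_seq = {}   # (sender, var) -> next expected seq
        self.held = {}       # (sender, var, seq) -> val

    def receive(self, sender, var, seq, val):
        self.held[(sender, var, seq)] = val
        key = (sender, var)
        nxt = self.next_seq.get(key, 1)
        # Deliver in order only within this variable's own pipeline.
        while (sender, var, nxt) in self.held:
            self.store[var] = self.held.pop((sender, var, nxt))
            nxt += 1
        self.next_seq[key] = nxt

s = SlowMemoryProcess()
s.receive(0, "x", 2, 7)       # x update held: its seq 1 is missing
s.receive(0, "y", 1, 5)       # y overtakes x, which slow memory allows
assert s.store == {"y": 5}
s.receive(0, "x", 1, 0)       # now x = 0 then x = 7 are delivered
assert s.store["x"] == 7
```

This mirrors Figure (a): the "y" pipeline outruns the "x" pipeline without violating slow memory, while delivery within the "x" pipeline itself stays in Write order.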
Other models based on synchronization instructions
The behavior of a DSM differs based on its consistency
model, and the programmer's logic also depends on the
underlying consistency model.
In the models below, consistency conditions apply only
to special "synchronization" instructions, e.g., barrier
synchronization.
Non-synchronization statements may be executed in any
order by the various processors.
Some such consistency models are:
Weak consistency
Release consistency
Entry consistency
Other models based on synchronization instructions -
Weak Consistency Model
All local Writes are propagated to the other processes,
and all Writes done elsewhere are brought in locally, at a
synchronization instruction.
Properties:
Accesses to synchronization variables are sequentially
consistent.
Access to a synchronization variable is not permitted until
all Writes elsewhere have completed.
No data access is allowed until all previous
synchronization variable accesses have been
performed.
Drawback:
The memory system cannot tell whether a process is
beginning access to the shared variables (entering the
CS) or has finished accessing them (exiting the CS).
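The two-way flush at a synchronization instruction can be sketched as follows (hypothetical names; a shared dictionary stands in for the other replicas): ordinary accesses touch only the local copy, and sync() both pushes local Writes out and pulls remote Writes in.

```python
# Hedged sketch of weak consistency: ordinary accesses are local;
# propagation happens only at the sync instruction, in both directions.
class WeakConsistencyProcess:
    def __init__(self, shared):
        self.shared = shared   # stands in for the other replicas
        self.local = {}
        self.dirty = {}        # local Writes buffered until sync

    def write(self, var, val):
        self.local[var] = val
        self.dirty[var] = val

    def read(self, var):
        return self.local.get(var, 0)

    def sync(self):
        self.shared.update(self.dirty)   # propagate local Writes out
        self.dirty.clear()
        self.local.update(self.shared)   # bring remote Writes in

shared = {}
a = WeakConsistencyProcess(shared)
b = WeakConsistencyProcess(shared)
a.write("x", 4)
assert b.read("x") == 0   # not visible before any sync
a.sync()
b.sync()
assert b.read("x") == 4   # visible after both sides synchronized
```

Note that sync() performs the same full flush regardless of intent, which is exactly the drawback the release consistency model addresses.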
Other models based on synchronization instructions -
Release Consistency Model
Drawback of the weak consistency model: when a
synchronization variable is accessed, the memory does
not know whether this is being done because the process
has finished writing the shared variables (exiting the CS)
or is about to begin reading them (entering the CS).
Release consistency provides these two kinds of
synchronization accesses.
Acquire accesses tell the memory system that a critical
region is about to be entered; hence, the local replicas of
the shared variables must be made consistent with the
remote ones.
Release accesses say that a critical region has just been
exited.
The following rules apply to the protected variables:
All previously initiated Acquire operations must
complete successfully before a process can access
a protected shared variable.
All accesses to a protected shared variable must
complete before a Release operation can be
performed.
The Acquire and Release operations themselves
effectively follow the PRAM consistency model.
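These rules can be sketched with a shared "home" copy standing in for the remote replicas (names hypothetical): Acquire pulls the protected variables in, Release pushes local updates out, and all accesses in between are purely local.

```python
# Hedged sketch of release consistency: propagation only at
# Acquire (pull) and Release (push); in-between accesses are local.
class RCProcess:
    def __init__(self, home):
        self.home = home       # shared dict standing in for remote replicas
        self.local = {}

    def acquire(self):
        # Entering the critical region: make the local replica of the
        # protected variables consistent with the remote copy.
        self.local.update(self.home)

    def write(self, var, val):
        self.local[var] = val  # stays local until the Release

    def read(self, var):
        return self.local.get(var, 0)

    def release(self):
        # Exiting the critical region: propagate the protected Writes.
        self.home.update(self.local)

home = {}
p1 = RCProcess(home)
p2 = RCProcess(home)
p1.acquire()
p1.write("x", 4)
p1.release()
p2.acquire()
assert p2.read("x") == 4   # the Acquire pulled in p1's released Write
```

Compared with weak consistency, each synchronization access now does only half the work: an Acquire never pushes and a Release never pulls.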
Other models based on synchronization
instructions - Entry Consistency Model
Each ordinary shared variable is associated with
a synchronization variable (e.g., a lock or barrier).
On an Acquire/Release of a synchronization
variable, only the ordinary variables guarded by
that synchronization variable are made consistent.