This document surveys Flynn's classification of computer system architectures (SISD, SIMD, MISD, and MIMD) along with pipelining and parallel computing. In SISD, a single processor executes a single instruction stream on data from a single memory. SIMD applies the same instruction to multiple data streams simultaneously, while MIMD has multiple processors executing different instruction streams on different data at the same time. Pipelining is a technique that increases instruction throughput by splitting instruction processing into independent stages.
Almaaqal University - College of Engineering - Department of Control and Computer Engineering, Advanced Computer Architectures, CC408, 4th Year
Advanced Computer Architectures
1. Overview
Computer architecture is a fundamental concept in the field of computer science and engineering. It encompasses the design and structure of computer systems, focusing on how hardware components are organized and interconnected to execute instructions and process data efficiently.
1.1 SISD, MISD, SIMD, MIMD Architectures
Michael Flynn introduced a taxonomy of computer architectures based on the notions of instruction streams (IS) and data streams (DS). According to this taxonomy, computer architectures can be classified into four categories: Single Instruction Single Data (SISD), Multiple Instruction Single Data (MISD), Single Instruction Multiple Data (SIMD), and Multiple Instruction Multiple Data (MIMD).
Figure 1.1: Flynn's Taxonomy of Computer Architecture

Let's explore each of them:
Prepared By: Assist. Prof. Dr. Mohammed Al-Ibadi
1. SISD (Single Instruction, Single Data):
SISD architecture is the most basic and traditional form of computing. In this model, a single central processing unit (CPU) executes one instruction at a time on a single piece of data. It's a sequential, linear approach where each instruction is processed one after the other. SISD is commonly found in older, uniprocessor systems.
2. SIMD (Single Instruction, Multiple Data):
In a SIMD architecture, a single instruction is applied simultaneously to multiple data elements. This is achieved through the use of multiple processing units or cores, and each unit processes a different data element simultaneously. SIMD is prevalent in graphics processing units (GPUs) and is well-suited for tasks requiring parallel processing, like image and video processing.
3. MISD (Multiple Instruction, Single Data):
MISD architecture is the least common among the four. It involves multiple processing units, each executing its own unique instruction on the same piece of data. This type of architecture has limited practical applications and is often used for experimental or specialized purposes, such as fault-tolerant systems.
4. MIMD (Multiple Instruction, Multiple Data):
MIMD architecture is the most versatile and widely used in modern computing. In MIMD systems, multiple processors or cores independently execute different instructions on separate data. This allows for true parallelism, making it suitable for a wide range of applications, including multi-threaded software, scientific simulations, and distributed computing.
Parallel computing is a computing paradigm in which the processing required to solve a problem is carried out on more than one processor in parallel.
1. SYSTEM DESIGN
Mr. A. B. Shinde
Assistant Professor,
Electronics Engineering,
PVPIT, Budhgaon.
shindesir.pvp@gmail.com
2. CONCEPT OF SYSTEM
A system is a collection of elements or components that are organized for a common purpose.
A system is a set of interacting or interdependent components forming an integrated design.
A system has structure: it contains parts (or components) that are directly or indirectly related to each other.
A system has behavior: it exhibits processes that fulfill its function or purpose.
A system has interconnectivity: the parts and processes are connected by structural and/or behavioral relationships.
3. SYSTEM
Elements of a system:
Input: The inputs are fed to the system in order to get the output.
Output: The elements that exist in the system due to the processing of the inputs are known as output.
Processor: The operational component of a system, which processes the inputs.
Control: The control element guides the system. It is the decision-making sub-system that controls activities such as governing inputs, processing them, and generating output.
Boundary and interface: The limits that identify the system's components, processes, and interrelationships when it interfaces with another system.
4. IMPORTANCE OF SYSTEM ARCHITECTURES
A system architecture is the conceptual model that defines the structure, behavior (functioning), and other views of a system.
A system architecture can comprise:
system components,
the externally visible properties of those components,
the relationships between them.
It can provide a plan from which products can be procured, and systems developed, that will work together to implement the overall system.
5. SYSTEM ON CHIP
System-on-a-chip (SoC or SOC) refers to integrating all components of a computer or other electronic system into a single integrated circuit (chip).
It may contain digital, analog, or mixed-signal functions, all on one semiconductor chip.
7. SIMD
Single Instruction Multiple Data (SIMD) is a class of parallel computers in Flynn's taxonomy.
In computing, SIMD is a technique employed to achieve data-level parallelism.
8. SIMD
SIMD machines are capable of applying the exact same instruction stream to multiple streams of data simultaneously.
This type of architecture is perfectly suited to achieving very high processing rates, as the data can be split into many different independent pieces, and the multiple instruction units can all operate on them at the same time.
For example: each of the 64,000 processors in a Thinking Machines CM-2 would execute the same instruction at the same time, so that you could do 64,000 multiplies on 64,000 pairs of numbers at a time.
10. SIMD TYPES
Synchronous (lock-step): These systems are synchronous, meaning that they are built in such a way as to guarantee that all instruction units will receive the same instruction at the same time, and thus all will potentially be able to execute the same operation simultaneously.
Deterministic: SIMD architectures are deterministic because, at any one point in time, there is only one instruction being executed, even though multiple units may be executing it. So, every time the same program is run on the same data, using the same number of execution units, exactly the same result is guaranteed at every step in the process.
Well-suited to instruction/operation-level parallelism: The "single" in single instruction doesn't mean that there's only one instruction unit, as it does in SISD, but rather that there's only one instruction stream, and this instruction stream is executed by multiple processing units on different pieces of data, all at the same time, thus achieving parallelism.
11. SIMD (ADVANTAGES)
Consider an application where the same value is being added (or subtracted) to a large number of data points, a common operation in many multimedia applications. One example would be changing the brightness of an image.
To change the brightness, the R, G and B values are read from memory, a value is added to (or subtracted from) them, and the resulting values are written back out to memory.
The data is understood to be in blocks, and a number of values can be loaded all at once. Instead of a series of instructions saying "get this pixel, now get the next pixel", a SIMD processor will have a single instruction that effectively says "get lots of pixels". This can take much less time than "getting" each pixel individually, as with a traditional CPU design.
If the SIMD system works by loading up eight data points at once, the add operation being applied to the data will happen to all eight values at the same time.
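As a rough sketch of the idea, using NumPy's vectorized array operations as a stand-in for hardware SIMD (the pixel values below are made up for illustration):

```python
import numpy as np

# A hypothetical block of 8-bit pixel components (illustrative values).
pixels = np.array([10, 250, 100, 200, 5, 60, 130, 90], dtype=np.uint8)

# Scalar style: "get this pixel, now get the next pixel".
brightened_scalar = np.empty_like(pixels)
for i in range(len(pixels)):
    # Saturating add: clamp at 255 so bright pixels don't wrap around.
    brightened_scalar[i] = min(int(pixels[i]) + 20, 255)

# SIMD style: one operation applied to the whole block of values at once.
# NumPy dispatches this to vectorized routines, often backed by hardware SIMD.
brightened_block = np.minimum(pixels.astype(np.int16) + 20, 255).astype(np.uint8)

assert np.array_equal(brightened_scalar, brightened_block)
```

Both versions compute the same result; the block form simply expresses the whole-array add as a single operation, which is what lets SIMD hardware apply it to many values per instruction.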
12. SIMD (DISADVANTAGES)
Not all algorithms can be vectorized.
Implementing an algorithm with SIMD instructions usually requires human labor; most compilers don't generate SIMD instructions from a typical C program, for instance.
Programming with particular SIMD instruction sets can involve numerous low-level challenges.
It has restrictions on data alignment.
Gathering data into SIMD registers and scattering it to the correct destination locations is tricky and can be inefficient.
Specific instructions like rotations or three-operand addition aren't available in some SIMD instruction sets.
14. SISD
This is the oldest style of computer architecture, and still one of the most important: all personal computers fit within this category.
Single instruction refers to the fact that there is only one instruction stream being acted on by the CPU during any one clock tick; single data means, analogously, that one and only one data stream is being employed as input during any one clock tick.
15. SISD
In computing, SISD is a term referring to a computer architecture in which a single processor (uniprocessor) executes a single instruction stream, to operate on data stored in a single memory.
This corresponds to the von Neumann architecture.
Instruction fetching and pipelined execution of instructions are common examples found in most modern SISD computers.
16. CHARACTERISTICS OF SISD
Serial: Instructions are executed one after the other, in lock-step; this type of sequential execution is commonly called serial, as opposed to parallel, in which multiple instructions may be processed simultaneously.
Deterministic: Because each instruction has a unique place in the execution stream, and thus a unique time during which it and it alone is being processed, the entire execution is said to be deterministic, meaning that you (can potentially) know exactly what is happening at all times, and, ideally, you can exactly recreate the process, step by step, at any later time.
Examples:
All personal computers,
All single-instruction-unit-CPU workstations,
Mini-computers, and
Mainframes.
18. MIMD
In computing, MIMD is a technique employed to achieve parallelism.
Machines using MIMD have a number of processors that function asynchronously and independently.
At any time, different processors may be executing different instructions on different pieces of data.
MIMD architectures may be used in a number of application areas such as computer-aided design/computer-aided manufacturing, simulation, modeling, and as communication switches.
19. MIMD
MIMD machines can be of either shared memory or distributed memory categories.
Shared memory machines may be of the bus-based, extended or hierarchical type.
Distributed memory machines may have hypercube or mesh interconnection schemes.
20. MIMD: SHARED MEMORY MODEL
The processors are all connected to a "globally available" memory, via either software or hardware means. The operating system usually maintains its memory coherence.
Bus-based: MIMD machines with shared memory have processors which share a common, central memory. Here all processors are attached to a bus which connects them to memory. This setup is called bus-based; it scales poorly, because beyond a certain number of processors there is too much contention on the bus.
Hierarchical: MIMD machines with hierarchical shared memory use a hierarchy of buses to give processors access to each other's memory. Processors on different boards may communicate through inter-nodal buses, and buses support communication between boards. With this type of architecture, the machine may support over a thousand processors.
21. MIMD: DISTRIBUTED MEMORY MODEL
In distributed memory MIMD machines, each processor has its own individual memory location. Each processor has no direct knowledge of other processors' memory.
For data to be shared, it must be passed from one processor to another as a message. Since there is no shared memory, contention is not as great a problem with these machines.
It is not economically feasible to connect a large number of processors directly to each other. A way to avoid this multitude of direct connections is to connect each processor to just a few others.
The amount of time required for processors to perform simple message routing can be substantial. Systems were designed to reduce this time loss, and hypercube and mesh are two of the popular interconnection schemes.
22. MIMD: DISTRIBUTED MEMORY MODEL
Interconnection schemes:
Hypercube interconnection network: In an MIMD distributed memory machine with a hypercube system interconnection network containing four processors, a processor and a memory module are placed at each vertex of a square. The diameter of the system is the minimum number of steps it takes for one processor to send a message to the processor that is the farthest away. So, for example, in a hypercube system with eight processors, with each processor and memory module placed at a vertex of a cube, the diameter is 3. In general, in a system that contains 2^N processors, with each processor directly connected to N other processors, the diameter of the system is N.
Mesh interconnection network: In an MIMD distributed memory machine with a mesh interconnection network, processors are placed in a two-dimensional grid. Each processor is connected to its four immediate neighbors. Wrap-around connections may be provided at the edges of the mesh. One advantage of the mesh interconnection network over the hypercube is that the mesh system need not be configured in powers of two.
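The diameter relationship above can be checked with a small sketch (the function names are ours, not from the notes). Nodes in an N-cube carry N-bit labels and each hop flips one bit, so the routing distance between two nodes is the Hamming distance of their labels:

```python
def hypercube_diameter(num_processors):
    # A hypercube with 2**N processors connects each processor to N
    # neighbours; its diameter (worst-case hop count) is N.
    n = num_processors.bit_length() - 1
    if 2 ** n != num_processors:
        raise ValueError("processor count must be a power of two")
    return n

def hops(a, b):
    # Each hop flips exactly one bit of the node label, so the distance
    # between two nodes is the Hamming distance of their labels.
    return bin(a ^ b).count("1")

assert hypercube_diameter(8) == 3        # the eight-processor cube from the text
assert hops(0b000, 0b111) == 3           # opposite corners of that cube
```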
23. MIMD: CATEGORIES
The most general of all of the major categories, a MIMD machine is capable of being programmed to operate as if it were in fact any of the four.
Synchronous or asynchronous: MIMD instruction streams can potentially be executed either synchronously or asynchronously, i.e., either in tightly controlled lock-step or in a more loosely bound "do your own thing" mode.
Deterministic or non-deterministic: MIMD systems are potentially capable of deterministic behavior, that is, of reproducing the exact same set of processing steps every time a program is run on the same data.
Well-suited to block, loop, or subroutine level parallelism: The more code each processor in an MIMD assembly is given domain over, the more efficiently the entire system will operate, in general.
Multiple Instruction or Single Program: MIMD-style systems are capable of running in true "multiple-instruction" mode, with every processor doing something different, or every processor can be given the same code; this latter case is called SPMD, "Single Program Multiple Data", and is a generalization of SIMD-style parallelism.
25. MISD
In computing, MISD is a type of parallel computing architecture where many functional units perform different operations on the same data.
Pipeline architectures belong to this type.
Fault-tolerant computers executing the same instructions redundantly in order to detect and mask errors, in a manner known as task replication, may be considered to belong to this type.
Not many instances of this architecture exist, as MIMD and SIMD are often more appropriate for common data parallel techniques.
26. MISD
Another example of an MISD process is one carried out routinely at the United Nations. When a delegate speaks in a language of his/her choice, the speech is simultaneously translated into a number of other languages for the benefit of the other delegates present. Thus the delegate's speech (a single data stream) is being processed by a number of translators (processors), yielding different results.
27. MISD
MISD Examples:
Multiple frequency filters operating on a single signal stream.
Multiple cryptography algorithms attempting to crack a single coded message.
Both of these are examples of this type of processing, where multiple, independent instruction streams are applied simultaneously to a single data stream.
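The pattern can be sketched in a few lines (the "filters" here are arbitrary placeholder functions, not real signal-processing code): several independent operations are applied to one and the same data item.

```python
# Each entry stands in for an independent instruction stream, e.g. a
# different frequency filter or a different cryptanalysis attempt.
filters = [
    lambda x: x + 1,
    lambda x: x * 2,
    lambda x: x ** 2,
]

signal_sample = 3                          # the single data stream
results = [f(signal_sample) for f in filters]

# One datum in, several different results out: the MISD pattern.
assert results == [4, 6, 9]
```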
29. PIPELINING
In computing, a pipeline is a set of data processing elements connected in series, so that the output of one element is the input of the next one.
The elements of a pipeline are often executed in parallel or in time-sliced fashion.
30. PIPELINING (CONCEPT AND MOTIVATION)
Consider the washing of a car:
A car on the washing line can have only one of the three steps done at once. After the car has its washing, it moves on to drying, leaving the washing facilities available for the next car. The first car then moves on to polishing, the second car to drying, and a third car begins its washing.
If each operation needs 30 minutes, then finishing all three cars when only one car can be operated on at once would take (??????) minutes. On the other hand, using the washing line, the total time to complete all three is (?????) minutes. At this point, additional cars will come off the assembly line.
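One way to work out the blanks above is the usual pipeline timing relation (a quick sketch; the variable names are ours):

```python
stage_time = 30   # minutes per step (washing, drying, polishing)
stages = 3
cars = 3

# Sequential: each car must finish all three steps before the next starts.
sequential_total = cars * stages * stage_time        # 3 * 3 * 30 = 270 minutes

# Pipelined: the first car passes through all stages; each later car
# finishes one stage time after the previous one.
pipelined_total = (stages + cars - 1) * stage_time   # (3 + 3 - 1) * 30 = 150 minutes
```

Once the pipeline is full, a finished car emerges every 30 minutes, which is the "additional cars will come off the assembly line" observation in the slide.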
31. PIPELINING (IMPLEMENTATIONS)
Buffered, synchronous pipelines: Conventional microprocessors are synchronous circuits that use buffered, synchronous pipelines. In these pipelines, "pipeline registers" are inserted in between pipeline stages, and are clocked synchronously.
Buffered, asynchronous pipelines: Asynchronous pipelines are used in asynchronous circuits, and have their pipeline registers clocked asynchronously. Generally speaking, they use a request/acknowledge system, wherein each stage can detect when it's finished. When a stage is finished and the next stage has sent it a "request" signal, the stage sends an "acknowledge" signal to the next stage, and a "request" signal to the previous stage. When a stage receives an "acknowledge" signal, it clocks its input registers, thus reading in the data from the previous stage.
Unbuffered pipelines: Unbuffered pipelines, called "wave pipelines", do not have registers in between pipeline stages. Instead, the delays in the pipeline are "balanced" so that, for each stage, the difference between the first stabilized output data and the last is minimized.
32. INSTRUCTION PIPELINE
An instruction pipeline is a technique used in the design of computers and other digital electronic devices to increase their instruction throughput (the number of instructions that can be executed in a unit of time).
The fundamental idea is to split the processing of a computer instruction into a series of independent steps, with storage at the end of each step. This allows the computer's control circuitry to issue instructions at the processing rate of the slowest step, which is much faster than the time needed to perform all steps at once.
33. INSTRUCTION PIPELINE
For example, the classic RISC pipeline is broken into five stages with a set of flip-flops between each stage:
Instruction fetch
Instruction decode and register fetch
Execute
Memory access
Register write back
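The five stages above can be illustrated with a tiny trace sketch (stage names abbreviated; this toy model assumes no stalls or hazards, so instruction i simply occupies stage s at cycle i + s):

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]   # the classic five RISC stages

def pipeline_trace(instructions):
    # For each clock cycle, report which instruction occupies each stage
    # (None means the stage is empty that cycle).
    n_cycles = len(STAGES) + len(instructions) - 1
    trace = []
    for cycle in range(n_cycles):
        row = []
        for s in range(len(STAGES)):
            i = cycle - s
            row.append(instructions[i] if 0 <= i < len(instructions) else None)
        trace.append(row)
    return trace

trace = pipeline_trace(["add", "sub", "lw"])
assert len(trace) == 7   # 5 stages + 3 instructions - 1 cycles in total
# Cycle 2: the third instruction is fetched while the first executes.
assert trace[2] == ["lw", "sub", "add", None, None]
```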
34. PIPELINING (ADVANTAGES AND DISADVANTAGES)
Pipelining does not help in all cases. An instruction pipeline is said to be fully pipelined if it can accept a new instruction every clock cycle. A pipeline that is not fully pipelined has wait cycles that delay the progress of the pipeline.
Advantages of pipelining:
The cycle time of the processor is reduced, thus increasing the instruction issue rate in most cases.
Some combinational circuits such as adders or multipliers can be made faster by adding more circuitry. If pipelining is used instead, it can save circuitry.
Disadvantages of pipelining:
A non-pipelined processor executes only a single instruction at a time. This prevents branch delays and problems with serial instructions being executed concurrently. Consequently, the design is simpler and cheaper to manufacture.
The instruction latency in a non-pipelined processor is slightly lower than in a pipelined equivalent. This is due to the fact that extra flip-flops must be added to the data path of a pipelined processor.
A non-pipelined processor will have a stable instruction bandwidth. The performance of a pipelined processor is much harder to predict and may vary more widely between different programs.
36. PARALLEL COMPUTING
Parallel computing is a form of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved concurrently ("in parallel").
There are several different forms of parallel computing:
bit-level,
instruction-level,
data, and
task parallelism.
Parallelism has been employed for many years, mainly in high-performance computing.
As power consumption by computers has become a concern in recent years, parallel computing has become the dominant paradigm in computer architecture, mainly in the form of multicore processors.
37. PARALLEL COMPUTING
Computer Software is written for serial computation. To solve a
problem, an algorithm is constructed and implemented as a serial
stream of instructions. Only one instruction may execute at a time—
after that instruction is finished, the next is executed.
Parallel computing, on the other hand, uses multiple
processing elements simultaneously to solve a problem.
This is accomplished by breaking the problem into independent
parts so that each processing element can execute its part of the
algorithm simultaneously with the others.
The processing elements can be diverse and include resources such
as a single computer with multiple processors, several networked
computers, specialized hardware or any combination of the above.
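The idea of breaking a problem into independent parts can be sketched in Python. Here threads stand in for the processing elements, and the chunk boundaries are illustrative assumptions:

```python
# Sketch: split one problem (summing a list) into independent parts
# that two processing elements can work on concurrently.
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    # Each processing element executes its part of the algorithm.
    return sum(chunk)

data = list(range(1000))
chunks = [data[:500], data[500:]]  # two independent parts

with ThreadPoolExecutor(max_workers=2) as pool:
    total = sum(pool.map(partial_sum, chunks))

print(total)  # 499500, same answer as the serial sum(data)
```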
38. TYPES OF PARALLELISM
Bit-level parallelism:
From the advent of VLSI in the 1970s until about 1986, speed-up in computer
architecture was driven by doubling the computer word size: the amount of
information the processor can manipulate per cycle. Increasing the word size
reduces the number of instructions the processor must execute to perform an
operation on variables whose sizes are greater than the length of the word.
Instruction-level parallelism:
A computer program is, in essence, a stream of instructions executed by a processor.
These instructions can be re-ordered and combined into groups which are then
executed in parallel without changing the result of the program. This is known as
instruction-level parallelism.
Data parallelism:
Data parallelism is parallelism inherent in program loops, which focuses on
distributing the data across different computing nodes to be processed in parallel.
Task parallelism:
Task parallelism is the characteristic of a parallel program that "entirely
different calculations can be performed on either the same or different sets of data".
This contrasts with data parallelism, where the same calculation is performed on
the same or different sets of data.
39. TYPES OF PARALLELISM
Bit-level parallelism is a form of parallel computing based on increasing
processor word size.
Increasing the word size reduces the number of instructions the processor
must execute in order to perform an operation on variables whose sizes
are greater than the length of the word.
For example:
Consider a case where an 8-bit processor must add two 16-bit integers. The
processor must first add the 8 lower-order bits from each integer, then add the 8
higher-order bits, requiring two instructions to complete a single operation. A
16-bit processor would be able to complete the operation with a single instruction.
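The two-instruction sequence described above can be sketched in Python, with each step modelling one 8-bit add (the function name and the sample operands are illustrative):

```python
# Sketch of a 16-bit add on an 8-bit machine: add the low bytes first,
# then the high bytes plus the carry out of the low-byte add.
def add16_on_8bit(x, y):
    lo = (x & 0xFF) + (y & 0xFF)               # first instruction: low bytes
    carry = lo >> 8                            # carry into the high-byte add
    hi = ((x >> 8) + (y >> 8) + carry) & 0xFF  # second instruction: high bytes
    return (hi << 8) | (lo & 0xFF)

print(hex(add16_on_8bit(0x12FF, 0x0001)))  # 0x1300
```

A 16-bit processor performs the same computation in one add instruction, which is the whole point of bit-level parallelism.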
Historically, 4-bit microprocessors were replaced with 8-bit, then 16-bit, then 32-bit
microprocessors. This trend generally came to an end with the introduction of
32-bit processors, which were a standard in general-purpose computing for
two decades. Only recently, with the advent of x86-64 architectures, have
64-bit processors become commonplace.
40. TYPES OF PARALLELISM
Instruction-level parallelism (ILP) is a measure of how many of the
operations in a computer program can be performed simultaneously.
For example, consider the following program:
1. e = a + b
2. f = c + d
3. g = e * f
Here, Operation 3 depends on the results of operations 1 and 2, so it cannot
be calculated until both of them are completed. However, operations 1 and 2
do not depend on any other operation, so they can be calculated
simultaneously.
If we assume that each operation can be completed in one unit of time
then these three instructions can be completed in a total of two units of time,
giving an ILP of 3/2.
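The two-step schedule above can be written out directly. Grouping the statements by time step makes the dependency visible (the operand values are arbitrary):

```python
# The three operations above, scheduled in two time steps: operations 1
# and 2 are independent and could run in parallel; operation 3 must wait.
a, b, c, d = 1, 2, 3, 4

# Time step 1: operations 1 and 2 have no mutual dependency.
e = a + b
f = c + d

# Time step 2: operation 3 needs both e and f.
g = e * f

ilp = 3 / 2  # 3 instructions completed in 2 time units
print(g, ilp)
```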
41. TYPES OF PARALLELISM
Instruction-level parallelism (ILP):
A goal of compiler and processor designers is to identify
and take advantage of as much ILP as possible.
Ordinary programs are typically written under a sequential
execution model where instructions execute one after the
other and in the order specified by the programmer. ILP allows
the compiler and the processor to overlap the execution of
multiple instructions or even to change the order in which
instructions are executed.
How much ILP exists in programs is very application-specific. In
certain fields, such as graphics and scientific computing, the
amount can be very large. However, workloads such as
cryptography exhibit much less parallelism.
42. TYPES OF PARALLELISM
Data parallelism (also known as loop-level
parallelism) is a form of parallelization of computing
across multiple processors in parallel computing
environments.
Data parallelism focuses on distributing the data across
different parallel computing nodes.
In a multiprocessor system executing a single set of
instructions (SIMD), data parallelism is achieved when
each processor performs the same task on different
pieces of distributed data. In some situations, a single
execution thread controls operations on all pieces of
data.
43. TYPES OF PARALLELISM
Data parallelism
For instance, consider a 2-processor system (CPUs A and B) in
a parallel environment, where we wish to perform a task on some data
d. It is possible to tell CPU A to do that task on one part of d
and CPU B on another part simultaneously, thereby reducing
the duration of the execution.
The data can be assigned using conditional statements.
As a specific example, consider adding two matrices. In a
data parallel implementation, CPU A could add all
elements from the top half of the matrices, while CPU B could
add all elements from the bottom half of the matrices.
Since the two processors work in parallel, the job of
performing matrix addition would take one half the time of
performing the same operation in serial using one CPU
alone.
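The matrix-addition split can be sketched as follows. Threads stand in for CPUs A and B, and the 4x2 matrices are illustrative; each worker performs the same calculation (element-wise addition) on its half of the data, which is what makes this data parallelism:

```python
# Data-parallel matrix addition: one worker adds the top half of the
# matrices, the other adds the bottom half.
from concurrent.futures import ThreadPoolExecutor

A = [[1, 2], [3, 4], [5, 6], [7, 8]]
B = [[8, 7], [6, 5], [4, 3], [2, 1]]

def add_rows(row_range):
    lo, hi = row_range
    return [[A[i][j] + B[i][j] for j in range(len(A[i]))]
            for i in range(lo, hi)]

with ThreadPoolExecutor(max_workers=2) as pool:
    # "CPU A" gets rows 0-1 (top half), "CPU B" gets rows 2-3 (bottom half).
    top, bottom = pool.map(add_rows, [(0, 2), (2, 4)])

print(top + bottom)  # [[9, 9], [9, 9], [9, 9], [9, 9]]
```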
44. TYPES OF PARALLELISM
Task parallelism (also known as function
parallelism and control parallelism) is a form of parallelization
of computer code across multiple processors in parallel
computing environments.
Task parallelism focuses on distributing execution processes
(threads) across different parallel computing nodes.
In a multiprocessor system, task parallelism is achieved when
each processor executes a different thread (or process) on the
same or different data.
The threads may execute the same or different code. In the
general case, different execution threads communicate with one
another as they work. Communication takes place usually to
pass data from one thread to the next as part of a workflow.
45. TYPES OF PARALLELISM
Task parallelism
As a simple example, if we are running code on a 2-
processor system (CPUs "a" and "b") in a parallel
environment and we wish to do tasks "A" and "B", it is
possible to tell CPU "a" to do task "A" and CPU "b" to do
task "B" simultaneously, thereby reducing the runtime of
the execution.
The tasks can be assigned using conditional
statements.
Task parallelism emphasizes the distributed
(parallelized) nature of the processing (i.e. threads), as
opposed to the data (data parallelism).
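The contrast with data parallelism can be sketched in Python. Here two entirely different calculations (the tasks and data values are illustrative) run concurrently on the same data, with threads standing in for CPUs "a" and "b":

```python
# Task parallelism: two *different* computations run at the same time
# on the same data.
from concurrent.futures import ThreadPoolExecutor

data = [3, 1, 4, 1, 5]

def task_A(xs):
    # One calculation: total of the data.
    return sum(xs)

def task_B(xs):
    # An entirely different calculation: largest element.
    return max(xs)

with ThreadPoolExecutor(max_workers=2) as pool:
    fa = pool.submit(task_A, data)  # CPU "a" does task "A"
    fb = pool.submit(task_B, data)  # CPU "b" does task "B"

print(fa.result(), fb.result())  # 14 5
```

In data parallelism both workers would run the same function on different slices of `data`; here each worker runs a different function, which is the defining feature of task parallelism.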