Parallel computing uses multiple processing units simultaneously to solve computational problems. It can save time, solve large problems, and provide concurrency. The basic design involves memory storing both program instructions and data, and a CPU fetching instructions from memory and performing them sequentially. Flynn's taxonomy classifies computer systems by their instruction and data streams as SISD, SIMD, MISD, or MIMD. Parallel architectures can also be classified by their memory arrangement as shared memory or distributed memory systems.
For over 40 years, virtually all computers have followed a common machine model known as the von Neumann computer, named after the Hungarian mathematician John von Neumann.
A von Neumann computer uses the stored-program concept. The CPU executes a stored program that specifies a sequence of read and write operations on the memory.
1. What is Parallel Computing
•Traditionally, software has been written for serial computation:
–To be run on a single computer having a single Central Processing Unit (CPU)
–A problem is broken into a discrete series of instructions
–Instructions are executed one after another
–Only one instruction may execute at any moment in time
3. •Parallel computing is the simultaneous use of multiple compute resources to solve a computational problem.
–To be run using multiple CPUs
–A problem is broken into discrete parts that can be solved concurrently
–Each part is further broken down to a series of instructions
•Instructions from each part execute simultaneously on different CPUs
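The serial-versus-parallel contrast above can be sketched in a few lines of Python, which is used here purely as an illustration (the slides themselves name no language). In this sketch the "discrete parts" are the elements of a list and the "multiple CPUs" are worker processes from the standard-library `multiprocessing` pool.

```python
from multiprocessing import Pool

def square(x):
    # the series of instructions applied to one part of the problem
    return x * x

def serial(data):
    # serial computation: one instruction stream, one item at a time
    return [square(x) for x in data]

def parallel(data, workers=4):
    # the problem is broken into discrete parts that are
    # solved concurrently on multiple CPUs
    with Pool(workers) as pool:
        return pool.map(square, data)

if __name__ == "__main__":
    data = list(range(10))
    assert serial(data) == parallel(data)
```

Both functions compute the same result; only the execution model differs, which is exactly the point of the slide.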
6. •The primary reasons for using parallel computing:
–Save time
–Solve large problems
–Provide concurrency (do multiple things at the same time)
–Take advantage of non-local resources
–Overcome memory constraints
–Save costs
8. Basic Design
–Memory is used to store both program instructions and data
–Program instructions are coded data which tell the computer to do something
–Data is simply information to be used by the program
•A central processing unit (CPU) fetches instructions and/or data from memory, decodes the instructions, and then performs them sequentially.
10. Classification of parallel architectures
•Flynn's taxonomy
•Classification based on the memory arrangement
•Classification based on the type of interconnection
11. Flynn's Taxonomy
–The most universally accepted method of classifying computer systems
–Any computer can be placed in one of 4 broad categories:
»SISD: Single instruction stream, single data stream
»SIMD: Single instruction stream, multiple data streams
»MISD: Multiple instruction streams, single data stream
»MIMD: Multiple instruction streams, multiple data streams
13. Single Instruction, Single Data (SISD)
•A serial (non-parallel) computer
•Single instruction: only one instruction stream is being acted on by the CPU during any one clock cycle
•Single data: only one data stream is being used as input during any one clock cycle
•This is the oldest and, until recently, the most prevalent form of computer
•Examples: most PCs, single-CPU workstations and mainframes
15. Single Instruction, Multiple Data (SIMD)
•A type of parallel computer
•Single instruction: all processing units execute the same instruction at any given clock cycle
•Multiple data: each processing unit can operate on a different data element
•Best suited for specialized problems characterized by a high degree of regularity, such as image processing
•Two varieties: processor arrays and vector pipelines
•Examples:
–Processor arrays: Connection Machine CM-2, MasPar MP-1, MP-2
–Vector pipelines: IBM 9000, Cray C90, Fujitsu VP, NEC SX-2, Hitachi S820
18. Multiple Instruction, Single Data (MISD)
•A single data stream is fed into multiple processing units.
•Each processing unit operates on the data independently via independent instruction streams.
•Many functional units perform different operations on the same data.
21. Multiple Instruction, Multiple Data (MIMD)
•Currently the most common type of parallel computer; most modern computers fall into this category.
•Multiple instruction: every processor may be executing a different instruction stream
•Multiple data: every processor may be working with a different data stream
•Execution can be synchronous or asynchronous, deterministic or non-deterministic
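The MIMD idea, independent instruction streams over independent data streams, can be sketched with two threads running different functions; Python is an illustrative choice, not something the slides prescribe, and the thread scheduling is asynchronous and non-deterministic exactly as the slide describes.

```python
import threading

results = {}

def squares():
    # one instruction stream working on its own data
    results["squares"] = [x * x for x in range(5)]

def total():
    # a completely different instruction stream on different data,
    # running at the same time as squares()
    results["total"] = sum(range(10))

t1 = threading.Thread(target=squares)
t2 = threading.Thread(target=total)
t1.start(); t2.start()
t1.join(); t2.join()

assert results["squares"] == [0, 1, 4, 9, 16]
assert results["total"] == 45
```

Each thread is its own instruction stream; neither the order in which they run nor their interleaving is fixed, only the final results are.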
24. Shared Memory
•Multiple processors can operate independently but share the same memory resources.
•Changes in a memory location effected by one processor are visible to all other processors.
•Processors communicate easily by means of shared variables.
•Shared memory machines can be divided into two main classes based upon memory access times: UMA and NUMA.
27. Shared Memory: Pro and Con
•Advantages
–Global address space provides a user-friendly programming perspective to memory
–Data sharing between tasks is both fast and uniform due to the proximity of memory to CPUs
•Disadvantages
–The primary disadvantage is the lack of scalability between memory and CPUs
–Adding more CPUs can geometrically increase traffic on the shared memory
–The programmer is responsible for synchronization constructs that ensure "correct" access of global memory
–Expense: it becomes increasingly difficult and expensive to design and produce shared memory machines with ever increasing numbers of processors
29. Distributed Memory
•Processors have their own local memory.
•Memory addresses in one processor do not map to another processor, so there is no concept of global address space across all processors.
•Because each processor has its own local memory, it operates independently.
•Changes it makes to its local memory have no effect on the memory of other processors. Hence, the concept of cache coherency does not apply.
•When a processor needs access to data in another processor, it is usually the task of the programmer to explicitly define how and when data is communicated. Synchronization between tasks is likewise the programmer's responsibility.
31. Distributed Memory: Pro and Con
•Advantages
–Memory is scalable with the number of processors. Increase the number of processors and the size of memory increases proportionately.
–Each processor can rapidly access its own memory without interference and without the overhead incurred with trying to maintain cache coherency.
–Cost effectiveness: can use commodity, off-the-shelf processors and networking.
•Disadvantages
–The programmer is responsible for many of the details associated with data communication between processors.
–It may be difficult to map existing data structures, based on global memory, to this memory organization.
–Non-uniform memory access (NUMA) times
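The explicit-communication burden described above can be sketched with Python's standard-library `multiprocessing` module (an illustrative stand-in for a real message-passing system such as MPI, which the slides do not name). Each process has its own private memory, so its partial result must be sent back through an explicit channel, here a queue.

```python
from multiprocessing import Process, Queue

def partial_sum(chunk, results):
    # this process works only in its own local memory; the partial
    # sum is invisible to others until it is explicitly communicated
    results.put(sum(chunk))

def distributed_sum(data, nprocs=4):
    # the programmer splits the data and defines how and when
    # results are communicated back
    step = len(data) // nprocs
    results = Queue()
    procs = [Process(target=partial_sum,
                     args=(data[r * step:(r + 1) * step], results))
             for r in range(nprocs)]
    for p in procs:
        p.start()
    total = sum(results.get() for _ in procs)
    for p in procs:
        p.join()
    return total

if __name__ == "__main__":
    assert distributed_sum(list(range(100))) == 4950
```

Nothing is shared automatically: every byte that crosses a process boundary does so because the programmer wrote code to send it, which is both the cost and the scalability advantage of distributed memory.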
34. Scalar and vector processors
•Scalar processors are the most basic type of processor.
•They process one item at a time, typically integers or floating-point numbers (numbers with fractional parts, or too large or small to be represented as integers).
•Because each instruction is handled sequentially, basic scalar processing can take some time.
35. •Vector processors operate on an array of data points.
•Rather than handling each item individually, multiple items that all share the same instruction can be handled at once.
•This can save time over scalar processing, but it also adds complexity to a system, which can slow other functions.
•Vector processing works best when there is a large amount of data to be processed, groups of which can be handled by one instruction.
38. What a vector processor can process
•A vector data type
•The same operation applied to all elements of the vector
•No dependencies amongst elements
•Same motivation as SIMD
39. What is vector processing?
•A vector processor is one that can compute operations on entire vectors with one simple instruction.
•A vectorizing compiler will attempt to translate loops into single vector instructions.
•Example: suppose we have the following loop:
    for i = 1, n
        X(i) = Y(i) + Z(i)
    continue
•This will be translated into one long vector of length n, and a single vector add instruction will be executed.
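The same loop-to-vector translation can be seen from Python using NumPy, which is offered here only as an analogy: NumPy's array arithmetic dispatches one operation over the whole array, much as the vectorizing compiler replaces the loop with a single vector add.

```python
import numpy as np

n = 5
Y = np.arange(n, dtype=float)   # [0., 1., 2., 3., 4.]
Z = np.full(n, 10.0)            # [10., 10., 10., 10., 10.]

# scalar style: one add per iteration, as in the loop on the slide
X_scalar = np.empty(n)
for i in range(n):
    X_scalar[i] = Y[i] + Z[i]

# vector style: a single array-wide add, analogous to the one
# vector add instruction the compiler emits for the whole loop
X_vector = Y + Z

assert (X_scalar == X_vector).all()   # [10., 11., 12., 13., 14.]
```

The results are identical; what changes is that the vector form expresses all n additions as one operation, which is what lets the hardware fetch and decode a single instruction.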
40. Why is this more efficient?
#1: Because only one instruction is needed, the vector processor does not have to fetch and decode as many instructions; thus, memory bandwidth and control unit overhead are reduced considerably.
#2: The vector processor, after receiving the instruction, is told that it must fetch x pairs of operands. When received, they are passed directly to a pipelined data unit to process them.
41. There are 2 specific kinds of machines
#1: Memory-to-memory: operands are fetched from memory and passed directly to the functional unit. The results are then written back out to memory to complete the process.
#2: Register-to-register: operands are loaded into a set of vector registers; the operands are fetched from the vector registers and the results are returned to a vector register.
42. Vector Instruction Set Advantages
•Compact
–One short instruction encodes N operations
•Expressive: tells hardware that these N operations
–are independent
–use the same functional unit
–access disjoint registers
–access registers in the same pattern as previous instructions
–access a contiguous block of memory (unit-stride load/store)
–access memory in a known pattern (strided load/store)
•Scalable
–Can run the same object code on more parallel pipelines or lanes
43. Disadvantages
•Not as fast with scalar instructions
•Complexity
•Difficulties in implementing
•High price of on-chip vector memory systems
•Increased code complexity
45. Array Computers (Fall 2008, Introduction to Parallel Processing)
•An array processor is a synchronous parallel computer with multiple arithmetic logic units, called processing elements (PEs), that can operate in parallel.
•The PEs are synchronized to perform the same function at the same time.
•Only a few array computers are designed primarily for numerical computation, while the others are for research purposes.
47. •Array processors are also known as multiprocessors or vector processors. They perform computations on large arrays of data and are thus used to improve the performance of the computer.
•Two types of array processor:
–Attached array processors
–SIMD array processors
Attached Array Processors:
•An attached array processor is a processor that is attached to a general-purpose computer; its purpose is to enhance and improve the performance of that computer in numerical computational tasks.
•It achieves high performance by means of parallel processing with multiple functional units.
49. SIMD Array Processors
•SIMD is the organization of a single computer containing multiple processors operating in parallel.
•The processing units are made to operate under the control of a common control unit, thus providing a single instruction stream and multiple data streams.
•A general block diagram of an array processor is shown on the next slide.
•It contains a set of identical processing elements (PEs), each of which has a local memory M.
•Each processing element includes an ALU and registers.
•The master control unit controls all the operations of the processing elements. It also decodes the instructions and determines how each instruction is to be executed.
51. •The main memory is used for storing the program.
•The control unit is responsible for fetching the instructions.
•Vector instructions are sent to all PEs simultaneously, and results are returned to memory.
•The best-known SIMD array processor is the ILLIAC IV computer developed by the Burroughs Corporation. SIMD processors are highly specialized computers.
•They are suitable only for numerical problems that can be expressed in vector or matrix form; they are not suitable for other types of computations.
52. Why use the Array Processor
•Array processors increase the overall instruction processing speed.
•As most array processors operate asynchronously from the host CPU, they improve the overall capacity of the system.
•Array processors have their own local memory, providing extra memory for systems with low memory.