Parallel Computing in India
What is parallel computing?
 Parallel computing is a form of computation in which many instructions are carried out simultaneously. It operates on the principle that large problems can often be divided into smaller ones, which are then solved concurrently (in parallel).
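To make the idea concrete, here is a minimal Python sketch (Python and the toy problem of summing a large list are assumptions, not something from the slides): the large problem is split into chunks that worker processes solve concurrently, and the partial results are combined.

```python
# Minimal sketch: divide a large problem (summing a big list) into
# smaller sub-problems and solve them concurrently in worker processes.
from multiprocessing import Pool

def partial_sum(chunk):
    # Each worker solves one smaller sub-problem independently.
    return sum(chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    n_workers = 4
    size = len(data) // n_workers
    chunks = [data[i * size:(i + 1) * size] for i in range(n_workers)]

    with Pool(n_workers) as pool:
        # Combine the sub-results to get the answer to the large problem.
        total = sum(pool.map(partial_sum, chunks))
    print(total)  # same value as sum(data), computed in parallel
```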
Why Parallel Computing?
 The primary reasons for using parallel computing:
 Save time - wall clock time
 Solve larger problems
 Provide concurrency (do multiple things at the same time)
Other reasons might include:
 Cost savings - using multiple "cheap" computing
resources instead of paying for time on a supercomputer.
 Overcoming memory constraints - single computers
have very finite memory resources. For large problems,
using the memories of multiple computers may
overcome this obstacle.
Architecture Concepts
 von Neumann Architecture
Comprised of four main components:
• Memory
• Control Unit
• Arithmetic Logic Unit
• Input / Output
 Memory is used to store both program instructions
and data
• Program instructions are coded data which tell the
computer to do something
• Data is simply information to be used by the program
 Control unit fetches instructions/data from memory, decodes the instructions and then sequentially coordinates operations to accomplish the programmed task.
 Arithmetic Logic Unit performs basic arithmetic and logical operations
 Input / Output is the interface to the human operator
Flynn’s Classical Taxonomy
Single Instruction, Single Data (SISD):
 A serial (non-parallel) computer
 Single instruction: only one instruction stream is being acted on by the CPU during any one clock cycle
 Single data: only one data stream is being used as input during any one clock cycle
 Deterministic execution
Single Instruction, Multiple Data (SIMD):
 A type of parallel computer
 Single instruction: all processing units execute the same instruction at any given clock cycle
 Multiple data: each processing unit can operate on a different data element
 Best suited for specialized problems characterized by a high degree of regularity, such as graphics/image processing.
 Two varieties: Processor Arrays and Vector Pipelines
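A rough software illustration of the SIMD idea, assuming NumPy (not mentioned on the slide): one operation is applied uniformly to many data elements, and NumPy dispatches such element-wise operations to vectorized machine code rather than an element-by-element Python loop.

```python
# SIMD in spirit: one instruction (multiply-add, then clip) is applied
# to every element of a large array at once.
import numpy as np

pixels = np.random.rand(1920, 1080)     # e.g. an image brightness map
brightened = pixels * 1.2 + 0.05        # same instruction, every element
clipped = np.clip(brightened, 0.0, 1.0) # again: one operation, all data
print(clipped.shape, clipped.max())
```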
Multiple Instruction, Single Data (MISD):
 A single data stream is fed into multiple processing units.
 Each processing unit operates on the data independently via independent instruction streams.
 Some conceivable uses might be:
 multiple frequency filters operating on a single signal stream
 multiple cryptography algorithms attempting to crack a single coded message.
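A toy sketch of the filter-bank idea above, using Python threads as stand-ins for independent processing units: several different "instruction streams" (the filter functions, which are illustrative and not from the slides) all consume the same single data stream.

```python
# Toy MISD-style sketch: independent "instruction streams" (different
# filters) each process the SAME single data stream.
from threading import Thread

signal = [0.0, 1.0, 0.5, -0.2, 0.8, -1.0, 0.3]  # the single data stream

def low_pass(s):
    # Simple moving average of adjacent samples.
    print("low-pass :", [round((a + b) / 2, 2) for a, b in zip(s, s[1:])])

def rectifier(s):
    print("rectified:", [abs(x) for x in s])

threads = [Thread(target=f, args=(signal,)) for f in (low_pass, rectifier)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```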
Multiple Instruction, Multiple Data (MIMD):
 Multiple Instruction: every processor may be executing a different instruction stream
 Multiple Data: every processor may be working with a different data stream
 Execution can be synchronous or asynchronous, deterministic or non-deterministic
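A minimal MIMD-flavoured sketch using Python processes (an assumption; real MIMD machines are multiprocessors or clusters): each process runs a different instruction stream on a different data stream, asynchronously.

```python
# MIMD sketch: every "processor" (here, a process) runs a different
# instruction stream on a different data stream.
from multiprocessing import Process

def count_words(text):
    print("words:", len(text.split()))

def sum_numbers(numbers):
    print("sum  :", sum(numbers))

if __name__ == "__main__":
    jobs = [
        Process(target=count_words, args=("parallel computing in india",)),
        Process(target=sum_numbers, args=([1, 2, 3, 4, 5],)),
    ]
    for j in jobs:
        j.start()
    for j in jobs:
        j.join()
```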
Parallel Computer Memory Architectures:
 Shared Memory
i) Uniform Memory Access (UMA)
ii) Non-Uniform Memory Access (NUMA)
 Distributed Memory
 Hybrid Distributed-Shared Memory
Shared Memory:
 Shared memory parallel computers have in common the ability for all processors to access all memory as a global address space.
 Multiple processors can operate independently
 Changes in a memory location effected by one
processor are visible to all other processors
 Shared memory machines can be divided into two
main classes based upon memory access times: UMA
and NUMA.
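The programming model can be sketched with Python threads (a software stand-in for a shared-memory machine, not an SMP itself): all workers address one global data structure, so an update made by one is visible to the others, and synchronization remains the programmer's job.

```python
# Shared-memory model: all workers see one global address space, so a
# change made by one thread is visible to every other thread.
from threading import Thread, Lock

shared = [0] * 8          # one array in the shared address space
lock = Lock()             # synchronization is still the programmer's job

def worker(index, value):
    with lock:
        shared[index] = value   # visible to all other threads

threads = [Thread(target=worker, args=(i, i * i)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(shared)   # every update made by every thread is visible here
```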
Shared Memory: UMA
 Uniform Memory Access (UMA):
 Identical processors, Symmetric Multiprocessor (SMP)
 Equal access and access times to memory
 Sometimes called CC-UMA - Cache Coherent UMA.
Cache coherent means if one processor updates a
location in shared memory, all the other processors
know about the update.
Shared Memory: NUMA
 Non-Uniform Memory Access (NUMA):
 Often made by physically linking two or more SMPs
 One SMP can directly access memory of another SMP
 Not all processors have equal access time to all
memories
 Memory access across link is slower
 If cache coherency is maintained, then it may also be called CC-NUMA - Cache Coherent NUMA
Distributed Memory:
 Distributed memory systems require a communication
network to connect inter-processor memory
 Processors have their own local memory
 Because each processor has its own local memory, it
operates independently. Hence, the concept of cache
coherency does not apply
 When a processor needs access to data on another processor, it is usually the task of the programmer to explicitly define how and when data is communicated. Synchronization between tasks is likewise the programmer's responsibility.
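A small sketch of the distributed-memory model, using a multiprocessing Pipe as a stand-in for the communication network (real systems would more likely use MPI): each process holds private local data, and nothing moves unless the programmer explicitly sends it.

```python
# Distributed-memory model: each process has private local memory, and
# data moves only when the programmer explicitly communicates it.
from multiprocessing import Process, Pipe

def worker(conn):
    local_data = [1, 2, 3, 4]      # lives only in this process's memory
    conn.send(sum(local_data))     # explicit communication step
    conn.close()

if __name__ == "__main__":
    parent_end, child_end = Pipe()
    p = Process(target=worker, args=(child_end,))
    p.start()
    print("received over the 'network':", parent_end.recv())
    p.join()
```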
Hybrid Distributed-Shared Memory:
 The shared memory component is usually a cache coherent SMP machine. Processors on a given SMP can address that machine's memory as global.
 The distributed memory component is the networking of multiple SMPs. SMPs know only about their own memory - not the memory on another SMP. Therefore, network communications are required to move data from one SMP to another.
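A rough sketch of the hybrid arrangement: processes stand in for distributed SMP nodes (explicit message passing between them), while threads inside each process stand in for the CPUs sharing that node's memory. The node names and toy workload are illustrative, not from the slides.

```python
# Hybrid sketch: processes ~ distributed SMP nodes (message passing
# between them); threads inside a process ~ shared memory within one SMP.
from multiprocessing import Process, Queue
from threading import Thread

def smp_node(name, chunk, out):
    partials = [0, 0]                     # shared within this "SMP"

    def cpu(idx, data):
        partials[idx] = sum(data)         # threads share the node's memory

    half = len(chunk) // 2
    threads = [Thread(target=cpu, args=(0, chunk[:half])),
               Thread(target=cpu, args=(1, chunk[half:]))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    out.put((name, sum(partials)))        # "network" step between nodes

if __name__ == "__main__":
    out = Queue()
    nodes = [Process(target=smp_node, args=("node0", list(range(0, 50)), out)),
             Process(target=smp_node, args=("node1", list(range(50, 100)), out))]
    for n in nodes:
        n.start()
    results = [out.get() for _ in nodes]  # drain queue before joining
    for n in nodes:
        n.join()
    print(results, "total =", sum(v for _, v in results))
```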
The main fields that need advanced computing are:
 Computational Fluid Dynamics
 Design of large structures
 Computational physics and Chemistry
 Climate Modeling
 Vehicle Simulation
 Image processing
 Signal Processing
 Oil reservoir Management
 Seismic data processing
India’s contribution towards parallel computing
 India has made significant strides in developing high-performance parallel computers.
Major Indian parallel computing projects:
 PARAM (from CDAC, Pune)
 ANUPAM (from BARC, Bombay)
 MTPPS (from BARC, Bombay)
 PACE (from ANURAG, Hyderabad)
 CHIPPS (from CDOT, Bangalore)
 FLOSOLVER (from NAL, Bangalore)
PARAM (Centre for Development of Advanced Computing)
 India’s first supercomputer – PARAM 8000
 It was indigenously built in 1990
 1 GF (Gigaflops) parallel machine
 64-node prototype, i.e. it had 64 CPUs
 PARAM is Sanskrit and means “Supreme”
 Programming environment called PARAS
 Based on Transputers 800/805
 Theoretical peak performance was 1 Gflops
 In practice it provided 100 to 200 Mflops
 A hardware upgrade was given to PARAM 8000 to produce the new PARAM 8600
 The upgrade was the integration of the Intel i860 coprocessor with PARAM 8000
 It was a 256-CPU computer
PARAM 9000
 Mission was to deliver a teraflops-range parallel system
 This architecture emphasizes flexibility
 PARAM 9000SS is based on SuperSPARC processors
 Operating speed of the processor is 75 MHz
 PARAM 10000 has a peak speed of 6.4 Gflops
PARAM Padma
 Introduced in April 2003
 Top speed of 1024 Gflops (1 Tflops)
 Used IBM POWER4 CPUs
 Operating system was IBM AIX 5
 First Indian computer to break the 1 Tflops barrier
PARAM Yuva
 Unveiled in November 2008
 The maximum sustainable speed is 38.1 Tflops
 The peak speed is 54 Tflops
 Uses Intel 73XX processors running at 2.9 GHz each
 Storage capacity of 25 TB, expandable up to 200 TB
 PARAM Yuva II released in February 2013
 Peak performance of 524 Tflops
 Uses less power compared to its predecessor
ANUPAM
 Developed by Bhabha Atomic Research Centre (BARC), Bombay
 BARC needed 200 Mflops of sustained computing power
 Based on standard MultiBus II i860 hardware
ANUPAM 860
 First developed in December 1991
 It made use of the i860 microprocessor @ 40 MHz
 Overall sustained speed of 30 Mflops
 Upgraded version released in August 1992 had a computational speed of 52 Mflops
 A further upgrade, released in November 1992, provided a sustained speed of 110 Mflops
ANUPAM Alpha
 Developed in July 1997, having a sustained speed of 1000 Mflops
 Made use of the Alpha 21164 microprocessor @ 400 MHz
 This system used complete servers / workstations as compute nodes instead of processor boards.
 Updated version released in March 1998 had a sustained speed of 1.5 Gflops.
ANUPAM Pentium
 Started in January 1998
 Main focus of its development was the minimization of cost
 The first version, ANUPAM Pentium II/4, gave a sustained speed of 248 Mflops
 ANUPAM Pentium II was expanded in March 1999 with a sustained speed of 1.3 Gflops
 In April 2000 the system was upgraded to Pentium III/16, which gave a sustained speed of 3.5 Gflops
 ANUPAM PIV 64-node has a sustained speed of 43 Gflops
Applications
 All three versions of ANUPAM were introduced to solve computational problems at BARC.
 The main fields in which ANUPAM is used are:
◦ Molecular Dynamics Simulation
◦ Neutron Transport Calculation
◦ Gamma Ray Simulation by the Monte Carlo method
◦ Crystal Structure Analysis
PACE
 Developed by ANURAG (Advanced Numerical Research and Analysis Group) under DRDO
 Developed as a result of R&D in parallel computing
 Uses VLSI
 Started in 1998
 Motorola 68020 processor @ 16.67 MHz
 Processor for Aerodynamic Computation and Evaluation (PACE)
 Used for the computational fluid dynamics needed in aircraft design
 A further-developed version, PACE Plus 32, is used in missile development
 A more advanced version is PACE++
ANAMICA - Software
 ANURAG’s Medical Imaging and Characterization Aid (ANAMICA)
 Medical visualization software for data obtained from MRI, CT and ultrasound
 Has both two-dimensional and three-dimensional visualization
 Used for medical diagnosis etc.
DHRUVA 3
 Set up by DRDO for solving mission-critical Defence Research and Development applications
 Used in the design of aircraft
 E.g.: Advanced Medium Combat Aircraft (AMCA)
FLOSOLVER
 Started in 1986 by National Aerospace Laboratories (NAL)
 Used in numerical weather prediction
 The Varsha GCM, which runs on Flosolver, could predict the weather accurately up to two weeks in advance
 Based on 16-bit Intel 8086 and 8087 processors
 Updated versions were released to increase the performance
CHIPPS
 Developed to provide indigenous digital switching technology
 Deployed in rural exchanges and secondary switching areas
 A speed of 200 Mflops was achieved
THANK YOU
Preeti Chauhan
