Network RAM Parallel Computing
Introducing parallel computing and a detailed emphasis on Parallel Network RAM

1. SEMINAR ON PARALLEL COMPUTING
Niranjana Ambadi, B090404EC
2. What does "parallel" mean?
According to Webster, parallel is "an arrangement or state that permits several operations or tasks to be performed simultaneously rather than consecutively."
3. What is a parallel computer?
"A large collection of processing elements that can communicate and cooperate to solve large problems fast."
4. PARALLELISM
• Parallel computing: a form of computation in which many calculations are carried out simultaneously.
• Parallel computers can be roughly classified according to the level at which the hardware supports parallelism.
• Multi-core and multi-processor computers have multiple processing elements within a single machine; clusters and grids use multiple computers to work on the same task.
• Specialized parallel computer architectures, e.g. GPUs, are used alongside traditional processors to accelerate specific tasks.
5. Flynn's taxonomy
• SISD (Single Instruction, Single Data)
• SIMD (Single Instruction, Multiple Data): available on CPUs; applies a single operation to multiple data items at once.
• MISD (Multiple Instruction, Single Data)
• MIMD (Multiple Instruction, Multiple Data): several cores on a single die.
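The SISD/SIMD distinction above can be sketched in Python (a minimal illustration, not from the slides): the loop handles one element per instruction, while the NumPy version expresses one operation over a whole array, which NumPy dispatches to vectorized (often SIMD) kernels.

```python
import numpy as np

# SISD style: one instruction operates on one datum at a time.
def scale_sisd(xs, k):
    out = []
    for x in xs:              # each iteration handles a single element
        out.append(x * k)
    return out

# SIMD style: one operation (the multiply) expressed over all
# elements at once; NumPy lowers this to vectorized kernels.
def scale_simd(xs, k):
    return np.asarray(xs) * k

data = [1.0, 2.0, 3.0, 4.0]
assert scale_sisd(data, 2.0) == list(scale_simd(data, 2.0))
```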
6. Parallelism: how?
• Task parallelism
• Data parallelism
• Recent CPUs use several parallelisation techniques: branch prediction, out-of-order execution, superscalar execution.
• These increase complexity, limiting the number of CPUs on a single chip.
• In a GPU each processing unit is simple, but a large number fit on a single chip.
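The task/data distinction can be sketched with Python's standard thread pool (an illustrative example, not from the slides; the `checksum` helper is made up): data parallelism runs the same operation over different chunks, task parallelism runs different operations concurrently.

```python
from concurrent.futures import ThreadPoolExecutor

def checksum(chunk):
    return sum(chunk)

data = list(range(100))

# Data parallelism: the SAME operation applied to different chunks.
chunks = [data[i:i + 25] for i in range(0, 100, 25)]
with ThreadPoolExecutor() as pool:
    partial_sums = list(pool.map(checksum, chunks))
total = sum(partial_sums)

# Task parallelism: DIFFERENT operations run concurrently.
with ThreadPoolExecutor() as pool:
    f_min = pool.submit(min, data)
    f_max = pool.submit(max, data)
    lo, hi = f_min.result(), f_max.result()

assert total == sum(data) and (lo, hi) == (0, 99)
```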
7. Parallel Architectures
Three popular:
1. Shared memory (uniform memory access and symmetric multiprocessing),
2. Distributed memory (clusters and networks of workstations), and
3. Shared distributed (non-uniform memory access).
8. Difference from Distributed Computing
In parallel computing, different processors/computers work on a single common goal, e.g. ten men pulling one rope to lift one rock. Supercomputers implement parallel computing.
In distributed computing, several different computers work separately on a multi-faceted computing workload, e.g. ten men pulling ten ropes to lift ten different rocks, or employees in an office each doing their own work.
9. Difference from Cluster Computing
A computer cluster is a group of linked computers working together so closely that in many respects they form a single computer. For example, in an office of 50 employees, a group of 15 does one task, 25 another, and the remaining 10 something else; similarly, in a network of 20 computers, 16 work on a common goal and 4 on some other common goal.
Cluster computing is a specific case of parallel computing.
10. Difference from Grid Computing
Grid computing uses computers communicating over the Internet to work on a given problem, e.g. three people, one in the USA, one in Japan, and one in Norway, working together online on a common project. Websites like Wikipedia, Yahoo! Answers, YouTube, and Flickr, or an open-source OS like Linux, are examples of grid computing.
Again, this is an example of parallel computing.
11. Cluster Computing
• A loosely connected network of nodes (computers) linked via a high-speed LAN.
• Orchestrated by "clustering middleware".
• Relies on a centralized management approach that makes the nodes available as orchestrated shared servers.
12. GPU: Graphics Processing Unit
• The dominant, massively parallel architecture available to the masses.
• Simple yet energy-efficient computational cores.
• Thousands of simultaneously active fine-grained threads.
13. Where are GPUs used?
Designed for a particular class of applications with the following characteristics:
• Computational requirements are large.
• Parallelism is substantial.
• Throughput is more important than latency.
14. Fixed-function GPUs
• The hardware in any given stage could exploit data parallelism within that stage, processing multiple elements at the same time.
• Each stage's hardware is customized for its given task.
• A lengthy, feed-forward GPU pipeline with many stages, each typically accelerated by special-purpose parallel hardware.
• Advantage: high throughput.
• Disadvantage: load balancing.
15. GPU evolution
Six years ago:
• A fixed-function processor
• Built around the graphics pipeline
• Best described as additions of programmability to a fixed-function pipeline
Today:
• A full-fledged parallel programmable processor
• Both application programming interfaces (APIs) and hardware increasingly focus on the programmable aspects of the GPU: vertex programs and fragment programs
16. Remote Sensing Processing
• On-the-fly processing: part by part.
• Most algorithms do not consider the neighborhood of each pixel.
• The development of languages like CUDA and OpenCL motivated programmers to move to heterogeneous processing platforms.
17. Challenges for parallel-computing chips
1. Power-supply voltage scaling is diminishing.
2. Memory bandwidth improvements are slowing down.
3. Programmability:
   – Memory model
   – Degree of parallelism
   – Heterogeneity
4. Research is still going strong in parallel computing.
18. Cluster memory
Increased CPU utilisation requires limiting the number of parallel processes. However, as problem size increases, page faults occur.
19. Cluster Memory
(Diagram: effective memory usage is limited by two factors, memory fragmentation and paging overhead.)
20.
• Memory fragmentation: total memory is distributed into discrete chunks, leading to uneven and inefficient utilisation.
• Paging overhead: disk paging on heavily loaded nodes carries a high cost; hard disks are far slower than RAM.
21. NETWORK RAM
• Applications can allocate more memory than is locally available.
• Idle memory of other machines is used over a fast interconnecting network.
• No page faults to disk.
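The idea above can be sketched as a toy simulation (illustrative only; node names, frame counts, and the first-fit placement policy are assumptions, and a real system would choose servers by load): pages that do not fit in local RAM spill to idle frames on peers instead of to disk.

```python
# Toy sketch of Network RAM: overflow pages go to peers' idle RAM.
class Node:
    def __init__(self, name, frames):
        self.name = name
        self.free = frames        # free page frames
        self.store = {}           # page_id -> data

class NetworkRAM:
    def __init__(self, local_frames, peers):
        self.local = Node("local", local_frames)
        self.peers = peers

    def store_page(self, page_id, data):
        # Prefer local RAM; overflow goes to the first peer with a
        # free frame (first-fit, an assumed policy).
        for node in [self.local] + self.peers:
            if node.free > 0:
                node.free -= 1
                node.store[page_id] = data
                return node.name
        raise MemoryError("no RAM anywhere; would fall back to disk")

peers = [Node("p2", frames=2), Node("p3", frames=1)]
nram = NetworkRAM(local_frames=1, peers=peers)
placed = [nram.store_page(i, b"page") for i in range(4)]
# One page fits locally; the remaining three land on peers.
```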
22. (Diagram: memory hierarchy with Network RAM sitting between local RAM and disk.)
23. Disadvantages of existing NRAM
• A parallel job divides into processes which need to be synchronised regularly.
• Nodes seek NRAM independently, so uneven amounts may be granted and processes run at different speeds.
• The whole job is limited by the speed of the slowest process.
24. Diagram of Parallel Network-RAM: Application 2 is assigned to nodes P3, P4, and P5, but utilizes the available memory spaces in other nodes, such as P2, P6, and P7.
25. Generic Description
• All nodes host PNR servants; a servant acts as both client and server.
• Managers (some servants) coordinate client requests.
• If a server has more unallocated memory than a threshold, it grants the NRAM request and allocates memory to the manager.
• Read and write requests go directly from the clients.
26. Generic Description
1. The client attempts to allocate and de-allocate NRAM on behalf of its hosting node.
2. Once allocated, the client is informed which nodes are the servers and the amount of memory allocated.
3. The client then sends pages to the server for storage and later retrieval.
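The allocation flow on the last two slides can be sketched as follows (a minimal sketch, not the paper's protocol: the class names, the threshold value, and the greedy allocation order are all assumptions): servers grant only memory above a reserve threshold, and a manager gathers grants across servers for one client request.

```python
THRESHOLD = 4  # frames a server keeps for itself (assumed value)

class Servant:
    def __init__(self, name, free):
        self.name, self.free = name, free

    def grant(self, want):
        # Grant only memory above the reserve threshold.
        avail = max(0, self.free - THRESHOLD)
        given = min(avail, want)
        self.free -= given
        return given

def manager_allocate(servants, want):
    """Coordinate a client request across servers; return grants."""
    grants = {}
    for s in servants:
        if want == 0:
            break
        g = s.grant(want)
        if g:
            grants[s.name] = g
            want -= g
    return grants

servants = [Servant("s1", 6), Servant("s2", 10)]
grants = manager_allocate(servants, want=7)
# s1 can spare 2 frames above its reserve; s2 covers the other 5.
```

The client would then read and write pages directly against s1 and s2, bypassing the manager, as slide 25 describes.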
27. Network RAM Designs
• Centralised (CEN) strategy
• Client (CLI) strategy
• Local Managers (MAN) strategy
• Backbone (BB) strategy
28. CEN Strategy
Only one manager coordinates all client requests, and all servants know it.
Advantage: no broadcast of memory-load information.
Disadvantage: the network connection leading to the manager node becomes a bottleneck.
29. CLI Strategy
Each client is a manager and sends allocation requests directly.
Advantage: no synchronisation overhead, and NRAM is allocated quickly.
Disadvantage: some clients receive large amounts of NRAM while others may not, worsening overall performance.
30. MAN Strategy
When a job starts or stops, one client volunteers as the manager. Each servant must agree on the selected manager node.
Drawback: broadcasting memory-load information causes congestion.
31. BB Strategy
• A subset of servants act as managers.
• All clients associated with a job must agree on which manager to contact.
• It is more scalable than the centralized solution, since load is shared among many servants and fewer messages are used for synchronization.
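The core difference between the four designs is who acts as manager; a rough sketch (illustrative only; the node names and the every-4th-node backbone rule are invented, and the real protocols carry far more state):

```python
# Who manages allocation requests under each PNR design?
def pick_managers(strategy, nodes, job_nodes):
    if strategy == "CEN":       # one global manager for everyone
        return [nodes[0]]
    if strategy == "CLI":       # every client manages itself
        return list(job_nodes)
    if strategy == "MAN":       # one volunteer per job
        return [job_nodes[0]]
    if strategy == "BB":        # fixed backbone subset of servants
        return nodes[::4]       # e.g. every 4th node (assumed rule)
    raise ValueError(strategy)

nodes = [f"p{i}" for i in range(8)]
job = ["p3", "p4", "p5"]
assert pick_managers("CEN", nodes, job) == ["p0"]
assert pick_managers("MAN", nodes, job) == ["p3"]
assert pick_managers("BB", nodes, job) == ["p0", "p4"]
```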
32. Models
• Each node: 33 MHz CPU, 32 MB local RAM, hard disk with 9 ms seek time and 50 MB/s transfer rate.
• 100 Mbps Ethernet in a star topology.
• Each link has 50 ns latency; the central switch has an 80 microsecond processing delay.
• No collisions.
• System tasks are handled by separate dedicated processors.
• One centralised scheduler for the system.
• A cache hit ratio of 50% and a memory access every 4 clock cycles are assumed.
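A back-of-envelope check under these simulated parameters shows why fetching a page from remote RAM beats paging to disk (the 4 KB page size is an assumption; the other numbers come from the slide):

```python
PAGE = 4096              # bytes per page, assumed
SEEK = 9e-3              # disk seek time, 9 ms (from slide)
DISK_BW = 50e6           # disk transfer rate, 50 MB/s (from slide)
NET_BW = 100e6 / 8       # 100 Mbps Ethernet -> 12.5 MB/s
SWITCH = 80e-6           # central switch delay, 80 us (from slide)
LINK = 50e-9             # per-link latency, 50 ns (from slide)

disk_fetch = SEEK + PAGE / DISK_BW
net_fetch = 2 * LINK + SWITCH + PAGE / NET_BW  # via the star switch

# Disk: ~9.08 ms per page, dominated by the seek.
# Network: ~0.41 ms per page, dominated by transfer time.
# Remote RAM is roughly 20x faster per page under these numbers.
```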
33. Metrics
• To directly compare DP ("disk paging," a system without PNR) to the various PNR designs, we create another metric based on average response time (R): the optimization ratio.
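The slide's definition did not survive extraction. A plausible reconstruction, consistent with the later results (OR around 12% under light load, above 90% under heavy load, gains "as high as 100 percent"), is the relative improvement of PNR response time over DP response time; treat this as an assumption, not the paper's verbatim formula:

```latex
% Hedged reconstruction of the optimization ratio (OR)
\mathrm{OR} = \frac{R_{\mathrm{DP}} - R_{\mathrm{PNR}}}{R_{\mathrm{DP}}} \times 100\%
```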
34. Experimental set-up
We evaluate the performance of PNR under the following situations:
1. Varying memory loads
2. Varying network speeds
3. Different network topologies
4. Different scheduling strategies
35. Experimental variables
Varying memory:
• Vary RAM at each node
• Memory demands of jobs held constant
Varying network performance:
• Link bandwidth and processing delay
Schedulers:
• Gang scheduler
• Space-sharing scheduler
Paging methods:
• Base method is disk paging
• Four PNR methods
Topologies:
• Bus
• Star
• Fully connected network
36. Results 1
• As memory load increases, PNR and DP response times tend to infinity.
• As memory load decreases, the response time converges to a constant.
• Adding PNR to systems loaded within some bounds (and with adequate communication links) leads to a performance benefit.
37. Results 2
• PNR is very sensitive to network performance.
• PNR response time tends to infinity as network service time is increased, and converges to a constant as service time is decreased.
• DP does not follow this model.
• PNR should not be considered on low-bandwidth networks or where communication bottlenecks exist.
38. Results 3
• In a space-sharing system, only one process is allowed on a node at a time.
• In the low-load case, CLI is the best choice.
• Under heavy load, NRAM allocation coordination is a limiting factor.
• In gang scheduling, network performance is crucial.
• Under lighter load the optimization ratio is around 12%; under heavier load it exceeds 90%.
39. Future work
• For some experiments, PNR memory usage was even more non-uniform than DP's.
• More work is needed to ensure that PNR itself does not create more overloaded nodes.
• Coordination of memory-resource allocation and communication overhead needs to be taken care of.
40. CONCLUSION
• Using a coordinating PNR method under heavier loads is essential for good performance.
• Coordinating PNR methods offer the best performance enhancement under moderate load.
• Performance gains can be as high as 100 percent.
• CLI can provide acceptable or superior results under light load only.
• All PNR methods offer little benefit under very heavy or very light loads.
• Good network performance is crucial for good PNR performance.