Published in: Technology

  1. High Performance Computing
     • Jawwad Shamsi
     • Lecture #6
     • 27th January 2010
  2. Recap
     • Cache Coherence
     • NUMA
  3. Today’s topics
     • Cache Coherence (continued)
     • Vector Processing
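  4. Cache Coherence
     • In an SMP or NUMA system, each processor has its own cache, so several caches can hold copies of the same data item
     • Once a processor writes, these copies may hold different values for the same item
     • Coherence must be maintained across the copies. How?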
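The problem can be seen in a toy Python model. This is a minimal sketch, not real hardware: two processors each keep a private dictionary as a cache in front of a shared main memory, and there is no coherence mechanism at all. The class name `PrivateCache` is hypothetical.

```python
class PrivateCache:
    """Toy private cache with no coherence mechanism (hypothetical class)."""

    def __init__(self, memory):
        self.memory = memory   # shared main memory: addr -> value
        self.lines = {}        # this processor's private copies

    def read(self, addr):
        # On a miss, copy the value from main memory into the private cache.
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        # Only the private copy changes; no other cache is told.
        self.lines[addr] = value


memory = {"x": 0}
p0, p1 = PrivateCache(memory), PrivateCache(memory)

p0.read("x")          # both processors now cache x = 0
p1.read("x")
p0.write("x", 42)     # P0 updates its own copy only

print(p0.read("x"))   # 42
print(p1.read("x"))   # 0  (stale: the two copies now disagree)
```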
  5. Cache Coherence: Two Approaches
     • Write back: a write updates only the cache; main memory is updated when the modified line is flushed from the cache
     • Write through: every write updates both the cache and main memory
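A minimal Python sketch contrasting the two write policies for a single cache, assuming one-word lines and an explicit `flush` call to model eviction. The `Cache` class and its methods are hypothetical. It counts main-memory writes and shows that, under write back, memory holds a stale value until the flush.

```python
class Cache:
    """Toy single-level cache with one-word lines (hypothetical class)."""

    def __init__(self, memory, policy):
        if policy not in ("write-through", "write-back"):
            raise ValueError(f"unknown policy: {policy}")
        self.memory, self.policy = memory, policy
        self.lines = {}          # cached copies: addr -> value
        self.dirty = set()       # write-back only: lines modified since loading
        self.memory_writes = 0   # how often main memory was written

    def write(self, addr, value):
        self.lines[addr] = value
        if self.policy == "write-through":
            self.memory[addr] = value        # memory updated on every write
            self.memory_writes += 1
        else:
            self.dirty.add(addr)             # memory updated later, at flush time

    def flush(self, addr):
        # Evict the line; a dirty write-back line is copied to memory now.
        if addr in self.dirty:
            self.memory[addr] = self.lines[addr]
            self.memory_writes += 1
            self.dirty.discard(addr)
        self.lines.pop(addr, None)


for policy in ("write-through", "write-back"):
    memory = {"x": 0}
    cache = Cache(memory, policy)
    for i in range(1, 11):                   # ten writes to the same line
        cache.write("x", i)
    before = (memory["x"], cache.memory_writes)
    cache.flush("x")
    after = (memory["x"], cache.memory_writes)
    print(policy, before, after)
# write-through (10, 10) (10, 10)
# write-back (0, 0) (10, 1)
```

Write back saves memory traffic when a line is written repeatedly, but main memory is stale in the meantime, which is exactly what a coherence mechanism has to account for.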
  6. Implementations
     • Software solutions:
       • Decided at compile time
       • Conservative, which leads to inefficient cache utilization
     • Hardware solutions:
       • Decided at run time
       • More effective
  7. Hardware-based solutions
     • Directory protocol
     • Snoopy protocol
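  8. Directory
     • A centralized controller keeps a directory recording which caches hold copies of each line
     • An individual cache controller sends its request to the centralized controller
     • The centralized controller checks the directory and issues the necessary commands
     • It then updates the directory information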
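  9. Directory
     • Write:
       • The processor requests exclusive write access to the line
       • The controller sends messages to the other caches holding the line, invalidating their copies
     • Read (another processor holds exclusive access to the line):
       • The controller issues a command to the holding processor
       • The holding processor writes the line back to main memory
       • The read is then permitted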
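The write and read cases above can be sketched as a toy centralized directory in Python. This assumes one-word lines, write-back caches, and a single controller that sees every miss; the `Directory` class, its fields, and the message strings are hypothetical, not taken from any real system.

```python
class Directory:
    """Toy centralized directory controller (hypothetical; one-word lines)."""

    def __init__(self, memory, num_procs):
        self.memory = memory                               # main memory: addr -> value
        self.caches = [dict() for _ in range(num_procs)]   # per-processor caches
        self.sharers = {}   # addr -> set of processors holding a clean copy
        self.owner = {}     # addr -> processor granted exclusive (write) access
        self.messages = []  # commands issued by the controller, for inspection

    def write(self, proc, addr, value):
        # The processor requests exclusive write access to the line.
        holders = self.sharers.pop(addr, set())
        if addr in self.owner:
            holders.add(self.owner[addr])
        # The controller invalidates every other copy before granting access.
        for other in sorted(holders - {proc}):
            self.caches[other].pop(addr, None)
            self.messages.append(f"invalidate P{other} {addr}")
        self.owner[addr] = proc
        self.caches[proc][addr] = value   # write back: memory is not updated yet

    def read(self, proc, addr):
        if addr in self.caches[proc]:
            return self.caches[proc][addr]            # hit: no directory action
        holder = self.owner.pop(addr, None)
        if holder is not None:
            # Command the holding processor to write the line back to memory;
            # it keeps a clean, shared copy.
            self.memory[addr] = self.caches[holder][addr]
            self.messages.append(f"write back P{holder} {addr}")
            self.sharers.setdefault(addr, set()).add(holder)
        # The read is now permitted from up-to-date main memory.
        self.caches[proc][addr] = self.memory[addr]
        self.sharers.setdefault(addr, set()).add(proc)
        return self.caches[proc][addr]


d = Directory(memory={"x": 0}, num_procs=3)
d.read(1, "x")
d.read(2, "x")          # P1 and P2 share x = 0
d.write(0, "x", 5)      # P0 gets exclusive access; P1 and P2 are invalidated
print(d.read(2, "x"))   # 5 (P0 writes back first, then P2 reads)
print(d.messages)
```

Because every request passes through one object, the sketch also hints at the disadvantage listed for the directory approach: the centralized controller serializes all coherence traffic.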
  10. Directory
      • Disadvantage: the centralized controller can become a bottleneck
      • Advantage: useful in large-scale systems
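  11. Snoopy Protocol
      • Each cache controller announces (broadcasts) update operations on the shared bus
      • All cache controllers snoop the bus and react to these announcements
      • Well suited to bus-based architectures
      • Use with care: broadcasting increases bus traffic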
  12. Snoopy Protocol
      • Two approaches:
        • Write invalidate: one writer, multiple readers; the writer gains exclusive access by invalidating the copies in other caches
        • Write update: multiple writers; every write is broadcast and all other copies are updated
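To compare the two snoopy approaches, here is a toy Python bus model that counts bus transactions. It assumes one-word lines. Under write invalidate the writer keeps its modified copy until another cache misses on it, and under write update every write is broadcast to memory and to all caches holding the line. `SnoopyBus` and `run` are hypothetical names.

```python
class SnoopyBus:
    """Toy shared bus with snooping caches (hypothetical; one-word lines)."""

    def __init__(self, memory, num_caches, policy):
        if policy not in ("invalidate", "update"):
            raise ValueError(f"unknown policy: {policy}")
        self.memory, self.policy = memory, policy
        self.caches = [dict() for _ in range(num_caches)]
        self.exclusive = {}         # addr -> cache allowed to write without the bus
        self.bus_transactions = 0

    def read(self, cid, addr):
        if addr in self.caches[cid]:
            return self.caches[cid][addr]              # read hit: no bus traffic
        self.bus_transactions += 1                     # read miss goes on the bus
        owner = self.exclusive.pop(addr, None)
        if owner is not None:
            # The owner snoops the miss and writes its modified copy back first.
            self.memory[addr] = self.caches[owner][addr]
        self.caches[cid][addr] = self.memory[addr]
        return self.caches[cid][addr]

    def write(self, cid, addr, value):
        if self.policy == "invalidate":
            if self.exclusive.get(addr) != cid:
                self.bus_transactions += 1             # announce: invalidate other copies
                for other, cache in enumerate(self.caches):
                    if other != cid:
                        cache.pop(addr, None)
                self.exclusive[addr] = cid             # single writer from now on
            self.caches[cid][addr] = value             # memory is updated on a later miss
        else:
            self.bus_transactions += 1                 # announce every write
            self.memory[addr] = value
            for other, cache in enumerate(self.caches):
                if other == cid or addr in cache:
                    cache[addr] = value                # snooping caches update their copies


def run(policy):
    bus = SnoopyBus({"x": 0}, num_caches=2, policy=policy)
    bus.read(1, "x")            # P1 caches x
    for i in range(10):         # P0 writes x ten times in a row
        bus.write(0, "x", i)
    return bus.read(1, "x"), bus.bus_transactions


print(run("invalidate"))   # (9, 3)
print(run("update"))       # (9, 11)
```

With one producer writing ten times before a single consumer read, write invalidate needs 3 bus transactions and write update needs 11. If the consumer read after every write instead, the balance would shift toward write update, so the better choice depends on the sharing pattern.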
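  13. Write Invalidate
      • The MESI protocol, used in the Pentium 4 processor
      • Data cache: two status bits per line encode four states
        • Modified: the line differs from main memory and is held only in this cache
        • Exclusive: the line matches main memory and is held only in this cache
        • Shared: the line matches main memory and may also be held in other caches
        • Invalid: the line holds no valid data
      • See Table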
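  14. Four Possibilities
      • Read miss (transitions in the other caches that hold the line; the requester's line becomes SH, or EX if no other cache holds it):
        • EX to SH
        • SH to SH
        • MO to SH (the modified line is written back to main memory first)
      • Read hit: no state change
      • Write miss: the requester issues RWITM (read with intent to modify) and its line becomes MO; in the other caches:
        • MO to IN (after writing the line back)
        • SH or EX to IN
      • Write hit:
        • SH: the other caches' copies go SH to IN and the local line becomes MO
        • EX: the local line becomes MO without bus traffic
        • MO: the line stays MO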
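The transitions listed above can be written as a few small state functions. This is a minimal Python sketch of textbook MESI behaviour for one cache line, not the Pentium 4's actual logic; the function names are hypothetical, and write-backs and bus signalling appear only as comments.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "MO"
    EXCLUSIVE = "EX"
    SHARED = "SH"
    INVALID = "IN"


def on_local_read(state, others_have_copy):
    """New state of the requesting cache's line after its processor reads."""
    if state is not State.INVALID:
        return state                        # read hit: no state change
    # Read miss: Shared if another cache also holds the line, else Exclusive.
    return State.SHARED if others_have_copy else State.EXCLUSIVE


def local_write_needs_bus(state):
    """SH must announce an invalidate, IN must issue RWITM; EX and MO write silently."""
    return state in (State.SHARED, State.INVALID)


def on_local_write(state):
    """After a write hit or a write miss, the local line is always Modified."""
    return State.MODIFIED


def on_snooped_read(state):
    """This cache sees another cache's read miss for the line."""
    if state is State.INVALID:
        return State.INVALID
    # MO writes the line back to main memory first; EX and SH simply become Shared.
    return State.SHARED


def on_snooped_write(state):
    """This cache sees another cache's RWITM or invalidate for the line."""
    # MO writes the line back first (RWITM case); any copy ends up Invalid.
    return State.INVALID


# Write hit on a Shared line: the local copy goes SH -> MO, other sharers SH -> IN.
local, remote = State.SHARED, State.SHARED
if local_write_needs_bus(local):
    remote = on_snooped_write(remote)       # the announcement invalidates others
local = on_local_write(local)
print(local.value, remote.value)            # MO IN
```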
  15. L1-L2 Cache Consistency
  16. Parallel Programming and Amdahl's Law
      • Suppose a fraction 1/N of the execution time is spent in sequential code
      • and the remaining fraction, 1 - 1/N, in parallel code
  17. Amdahl's Law
      • Speedup: the speed gain from using N parallel processors instead of a single processor
      • $S(N) = \frac{T(1)}{T(N)} = \frac{1}{s + p/N}$
      • s = fraction of execution time spent in sequential code, p = fraction spent in parallel code (s + p = 1), N = number of processors, T(N) = execution time on N processors
      • As the problem size increases, p may rise and s may fall
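A minimal Python sketch of the speedup formula on this slide, assuming s and p are fractions of single-processor execution time with p = 1 - s. The function names are hypothetical.

```python
def amdahl_speedup(s, n):
    """Amdahl's Law: S(N) = 1 / (s + p/N), with p = 1 - s.

    s: fraction of single-processor time spent in sequential code (0 <= s <= 1)
    n: number of processors (n >= 1)
    """
    if not (0.0 <= s <= 1.0) or n < 1:
        raise ValueError("need 0 <= s <= 1 and n >= 1")
    p = 1.0 - s
    return 1.0 / (s + p / n)


def measured_speedup(t1, tn):
    """Speedup from measured run times: S = T(1) / T(N)."""
    return t1 / tn


# With 10% sequential code, the speedup flattens out below 1/s = 10.
for n in (1, 2, 4, 8, 16, 1024):
    print(n, round(amdahl_speedup(0.1, n), 2))
```

For s = 0.1 the speedup can never exceed 1/s = 10, however many processors are added; this is why a larger problem size that shrinks s helps.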