Chalmers microprocessor sept 2010

Usage Rights

© All Rights Reserved

  • Unlike current CMPs, which tile either all large cores for high serial-thread performance or all small cores for high throughput, the ACMP provides one large core and many small cores. The large core of the ACMP executes the serial (non-parallelized) part of the application, and the small cores execute the parallelized part. The ACMP has a homogeneous ISA, one (or a few) large core(s), many small cores, all cores on the same interconnect, and hardware cache coherence. Today I will show how the ACMP paradigm can also improve performance of the parallelized part by accelerating the execution of critical sections. So what are critical sections?
  • The large core and the small cores are functionally similar. The difference is in their performance characteristics. We envision the large core to be an aggressive, high-performance processor. It may include features like out-of-order execution, wide fetch, deeper pipelines, aggressive branch prediction, etc. The small core, on the other hand, must be power-efficient. It can be a simple "Mickey Mouse" core: in-order, with a narrow fetch, a shallow pipeline, and a "Mickey Mouse" branch predictor.
  • First, we compare the three approaches analytically. The Y-axis is the speedup achieved over a single conventional P6 core. The X-axis is the degree of parallelism, which is the percentage of the program parallelized by the programmer. The three curves show the performance of ACMP, Niagara, and P6-Tile. When the parallelism is low, both the ACMP and P6-Tile outperform the Niagara approach because of their high single-thread performance. When the parallelism is high, Niagara outperforms both P6-Tile and ACMP because of its high throughput. But when the parallelism is medium, the ACMP outperforms both Niagara and P6-Tile. Note that P6-Tile (the tile-large approach) never outperforms the ACMP; Niagara, however, does at high parallelism, because it has a higher throughput.
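
The shape of this comparison can be reproduced with a simple Amdahl's-law-style model. The following is only a sketch under assumed numbers that do not come from the slides: a chip area budget of 16 small-core units, a large (P6-like) core costing 4 units with performance 1.0, a small core with performance 0.5, and the ACMP running the parallel part on its small cores only. The names speedup, p6_tile, niagara, and acmp are hypothetical.

```python
# Assumed parameters (illustrative, not taken from the slides).
AREA_BUDGET = 16        # chip area, measured in small-core units
LARGE_CORE_AREA = 4     # area of one large core, in small-core units
LARGE_CORE_PERF = 1.0   # baseline: a single conventional P6 core
SMALL_CORE_PERF = 0.5   # small-core performance relative to the P6


def speedup(parallel_fraction, serial_perf, parallel_throughput):
    """Speedup over one P6 core for a given parallelized fraction of work."""
    serial_time = (1.0 - parallel_fraction) / serial_perf
    parallel_time = parallel_fraction / parallel_throughput
    return 1.0 / (serial_time + parallel_time)


def p6_tile(f):
    # Tile of large cores only: serial code on one, parallel code on all.
    n_large = AREA_BUDGET // LARGE_CORE_AREA
    return speedup(f, LARGE_CORE_PERF, n_large * LARGE_CORE_PERF)


def niagara(f):
    # Tile of small cores only: even serial code runs on a small core.
    return speedup(f, SMALL_CORE_PERF, AREA_BUDGET * SMALL_CORE_PERF)


def acmp(f):
    # One large core for serial code; remaining area holds small cores,
    # which (by assumption here) execute the parallel part on their own.
    n_small = AREA_BUDGET - LARGE_CORE_AREA
    return speedup(f, LARGE_CORE_PERF, n_small * SMALL_CORE_PERF)


if __name__ == "__main__":
    print("parallel%   P6-Tile   Niagara   ACMP")
    for percent in range(0, 101, 10):
        f = percent / 100.0
        print(f"{percent:8d}   {p6_tile(f):7.2f}   {niagara(f):7.2f}   {acmp(f):6.2f}")
```

With these assumed parameters, the ACMP matches P6-Tile on fully serial code, leads in the middle of the range, and loses to Niagara only when almost all of the program is parallelized. The exact crossover points depend entirely on the assumed area and performance ratios.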

Presentation Transcript