Algorithmic Memory Increases Memory Performance by an Order of Magnitude

Sundar Iyer, Memoir Systems

Published in: Technology, Business

  • Today, a single-port embedded memory can perform one memory operation per clock cycle. Embedded memory performance has therefore been closely tied to memory clock speed and is ultimately limited by it. Because embedded memory IP providers, responding to application needs for more on-chip memory, made early design trade-offs that favored high density over high speed, memory clock speeds lag behind processor clock speeds. With its Algorithmic Memory technology, Memoir Systems tackles a fundamental question: can we increase memory performance without increasing memory clock speeds? Historically, circuit techniques and advances in lithography have been the approach to enhancing memory performance at every generation. These approaches alone do not give enough improvement and are not keeping up with applications that require higher memory performance. The problem is that we have limited our thinking about embedded memories to a purely circuit- and process-oriented approach, so the focus has been on maximizing the number of transistors on a chip and raising the clock speed. This has been successful up to a point, but as transistors approach atomic dimensions we are running into fundamental physical barriers. For this reason, we need to rethink our approach to embedded memory design.
  • Algorithmic memory technology increases the density (lowers area) of physical memories. This also reduces the leakage power consumption.
  • Algorithmic memory technology allows system designers to treat memory performance as a configurable entity with its own set of tradeoffs with respect to speed, area and power.
  • Algorithmic memories can be generated from a small set of base physical memories and provide a broad portfolio of customized memories with any combination of read and write interfaces.
  • An algorithmic memory synthesis platform can analyze and estimate the resulting area, power and speed of custom memory configurations in seconds, and generate them in a matter of days.
  • Orange Applications?? Compare sizes, area/power
  • Logic is scan inserted. Scan chain: way to test normal logic. Flops are scan-chain, scan-enabled flops.
  • Algorithmic Memory Increases Memory Performance by an Order of Magnitude

    1. Algorithmic Memory Increases Memory Performance by an Order of Magnitude. Sundar Iyer, Co-Founder & CTO, Memoir Systems. Track F, Lecture 2: Intellectual Property for SoC & Cores. May 2, 2012
    2. Problem: Processor-Embedded Memory Performance Gap. Performance degradation can be more significant and is getting worse! [Chart: processor vs. embedded memory performance gap, normalized growth. Source: Hennessy and Patterson, 5th Edition]
    3. Why is Embedded Memory Slow? [Timing diagram over clock cycles 1 to 15: clk, read, addr A to H, data A to H] One operation per memory clock cycle. How can we increase memory performance without increasing memory clock speed?
    4. Solution: Algorithmic Memory® = Memory Macros + Algorithms. Physical Memory: 1P @ 500 MHz, allows 500 million MOPS (memory operations per second). Algorithmic Memory: an array of 1P @ 500 MHz macros plus extra memory, presented as 4P @ 500 MHz, allows 2000 million MOPS. More Ports, Same Clock.
    5. Solution Overview. 2X Performance for ~15% area overhead. Any Embedded Physical Memory. RTL Based: No Circuit or Layout changes. Simultaneous Accesses to the same Address, Row, Column, or Bank (no exceptions). Each Port can access the entire Memory Address. Exhaustively Formally Verified & Transparent to end-user. [Diagram: 1P macros plus Extra Memory forming an Algorithmic Memory with four Addr/Data ports] Using Physical 1-Port Memory to Build any Multiport Functionality.
    6. Usage & Adoption. Easily Interface: presents standard memory interface; adds no clock cycle latency; used as a drop-in replacement. Readily Integrate: fits seamlessly in SoC design flow; used in SoCs (ASICs, ASSPs, GPPs). Rapidly Implement: supports any process, node or foundry. [Diagram: physical memory, 128 width by 8K depth, and Memoir IP with A/D ports; Identical Pinout to Standard Memory]
    7. Increases Density. [Chart labels: Denser; Physical 1P Memory; Algorithmic 2P Memory; Physical 2P Memory. Normalized for 1P = 1 Mb/mm²]
    8. Increases Density. [Chart; normalized for 1P = 1 Mb/mm²]
    9. Reduces Total Power. [Chart; based on 40nm example]
    10. Reduces Total Power. [Chart; based on 40nm example]
    11. Configurable Performance. [Chart axes: Performance (MOPS), Memory Density (Mb/mm²), Power Efficiency (Mb/mW). Labels: Physical Memory (SP); Higher Performance Algorithmic Memory (4P); Area Efficient (higher density) Algorithmic Memory (2P); Power Efficient Algorithmic Memory]
    12. Increases Portfolio of Available Memories. [Diagram: Physical Memory and Algorithmic Memory port configurations: 1R1W, 1R/4W, 2R/1W, 4R/1W, 1R/2W, 3R1W, 1RW, 2RW, 2R2W, 1R2W, 1R3W, 2R1W, 3R/1W]
    13. Rapid Memory Analysis & Generation. [Flow diagram labels: Acceleration 2X, 3X, 4X; Push Button Analysis; Specify Memory (# Read Ports, # Write, # Width, # Depth, Capacity, …); Feed Inputs; GUI; SYN, GEN, CHK; Real-time Feedback; Generate Algorithmic Memory; Optimization (Reduced Latency, Power, Area); Standard SRAM, Register File, eDRAM; Standard Cell Library & Building Blocks]
    14. Multiport Memory Usages. Usages listed: Descriptor and Free Lists, Ingress Buffers; L2 MAC Lookups, Shared Caches; Descriptor and Free Lists, Egress Buffers; Cache Coherency Arrays for L2/L3 Caches; Netflow, Counters; State Tables, Linked Lists; Data and Tag Arrays for L2, L3 Caches; Route Lookup Tables; ACL Tables. Port configurations shown: 3R1W, 2R1W, 1R2W, 1R3W, 2R2W, 1R1W, 4R or 1W, 3R or 1W, 2R or 1W.
    15. Exhaustive Formal Verification Reduces Risk. Independently Verify Logic: mathematically proven algorithms; formally, exhaustively verified RTL. Separately Test Physical Memories: supports 3rd party DFT methodology; transparent customer BIST, BISR; doesn't need complex multiport BIST. [Diagram: SRAM BIST Wrapper, SCAN, BIST, Physical Memory, Memoir IP, Algorithmic Memory with A/D ports]
    16. Tier-1 OEM Evaluation: Performance, Area and Power Benefits. Large ASIC: 24 mm × 24 mm, area 576 mm²; 800 Mb of total memory; 165 memory instances; versatile memories required (4R/1W, 2R1W, 1R2W memories). Algorithmic Memory Solution with 4X MOPS memories: 21 mm × 21 mm, area 441 mm²; area savings of 135 mm² (23% die); 136 memory instances accelerated; power savings > 12 W; 4X MOPS for select memories.
    17. Summary: (1) Increases Port and Clock Performance; (2) Lowers Area and Power; (3) Easy Interface, Integration and Implementation; (4) Creates Versatile Memory Portfolio; (5) Reduces Cost, Risk and Time to Market. Algorithmic Memories are not a panacea, but present a new solution to alleviate the processor-embedded memory performance gap.
    18. Q&A. Sundar Iyer, sundaes@memoir-systems.com. Come Visit Our Booth! Memoir Systems
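
The headline numbers on slides 4 and 16 can be checked with a few lines of arithmetic. The sketch below assumes that every port completes one operation per clock cycle (as slide 3 states for a single port) and that the dies in slide 16 are square, which the 24 mm and 21 mm sides and the 576 mm² and 441 mm² areas suggest. The function names `peak_mops` and `die_area_savings` are illustrative and do not come from the slides.

```python
def peak_mops(ports: int, clock_mhz: float) -> float:
    """Peak memory operations per second, in millions.

    Assumes each port completes exactly one operation per clock cycle.
    """
    return ports * clock_mhz


def die_area_savings(old_side_mm: float, new_side_mm: float) -> tuple[float, float]:
    """Return (saved area in mm^2, saved fraction of the original die).

    Assumes both dies are square.
    """
    old_area = old_side_mm ** 2
    new_area = new_side_mm ** 2
    saved = old_area - new_area
    return saved, saved / old_area


# Slide 4: same 500 MHz clock, one port versus four ports.
physical = peak_mops(ports=1, clock_mhz=500)
algorithmic = peak_mops(ports=4, clock_mhz=500)
print(f"physical 1P: {physical:.0f} million ops/s")
print(f"algorithmic 4P: {algorithmic:.0f} million ops/s")

# Slide 16: 24 mm x 24 mm die reduced to 21 mm x 21 mm.
saved_mm2, saved_fraction = die_area_savings(24, 21)
print(f"area saved: {saved_mm2:.0f} mm^2 ({saved_fraction:.0%} of the die)")
```

The fourfold gain comes entirely from the port count, since the clock stays at 500 MHz.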

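
Slide 5 describes building multiport functionality from physical 1-port memories plus extra memory, with simultaneous accesses allowed even to the same bank. The slides do not disclose the algorithms used, so the following is only a toy model of one well-known technique, an extra XOR parity bank, and should not be read as Memoir's method. It models a "2R or 1W" memory: two reads in one cycle, or one write. The class name `TwoReadXorMemory` and all its methods are hypothetical, and the write path is simplified (it does not model the extra read needed to update parity on a real single-port bank).

```python
from functools import reduce
from operator import xor


class TwoReadXorMemory:
    """Toy model: `num_banks` single-port banks plus one extra parity bank.

    Invariant: parity[row] is the XOR of banks[b][row] over all data banks b.
    """

    def __init__(self, num_banks: int, rows_per_bank: int):
        self.num_banks = num_banks
        self.rows = rows_per_bank
        self.banks = [[0] * rows_per_bank for _ in range(num_banks)]
        self.parity = [0] * rows_per_bank  # the "extra memory"

    def _split(self, addr: int) -> tuple[int, int]:
        return divmod(addr, self.rows)  # (bank index, row index)

    def write(self, addr: int, value: int) -> None:
        """One write per cycle (simplified: the old-value read used to
        update parity is not counted as a separate port access)."""
        bank, row = self._split(addr)
        self.parity[row] ^= self.banks[bank][row] ^ value
        self.banks[bank][row] = value

    def read2(self, addr_a: int, addr_b: int) -> tuple[int, int]:
        """Two reads in one cycle, touching each bank at most once."""
        used = set()  # banks (or "parity") already accessed this cycle

        def access(bank_id, row):
            assert bank_id not in used, "single-port bank accessed twice"
            used.add(bank_id)
            store = self.parity if bank_id == "parity" else self.banks[bank_id]
            return store[row]

        bank_a, row_a = self._split(addr_a)
        bank_b, row_b = self._split(addr_b)
        value_a = access(bank_a, row_a)
        if bank_b != bank_a:
            return value_a, access(bank_b, row_b)
        # Bank conflict: rebuild the second value from the other banks
        # and the parity bank, leaving the busy bank untouched.
        others = [access(b, row_b) for b in range(self.num_banks) if b != bank_b]
        value_b = reduce(xor, others, access("parity", row_b))
        return value_a, value_b


mem = TwoReadXorMemory(num_banks=4, rows_per_bank=8)
for addr in range(32):
    mem.write(addr, addr * 3 + 1)

print(mem.read2(5, 6))   # both addresses fall in bank 0
print(mem.read2(5, 5))   # same address on both ports
print(mem.read2(5, 13))  # different banks, no conflict
```

The `used` set is what enforces the single-port constraint in this model: a conflicting read never touches the busy bank a second time, at the cost of one additional bank of storage.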