Today, a single-port embedded memory can perform one memory operation per clock cycle, so embedded memory performance has traditionally been tied to memory clock speed and ultimately limited by it. Because embedded memory IP providers, responding to application needs for more on-chip memory, made early design trade-offs that favored high density over high speed, memory clock speeds lag behind processor clock speeds. With its Algorithmic Memory technology, Memoir Systems tackles a fundamental question: can we increase memory performance without increasing memory clock speeds?
Historically, circuit techniques and advances in lithography have been the approach used at every generation to enhance memory performance. Unfortunately, these approaches alone do not deliver enough improvement and are not keeping up with applications that require higher memory performance. The problem is that we have limited our thinking about embedded memories to a purely circuit- and process-oriented approach, focusing on maximizing the number of transistors on a chip and cranking up the clock speed. This has been successful up to a point, but as transistors approach atomic dimensions, we are running into fundamental physical barriers. For this reason, we need to rethink our approach to embedded memory design.
Algorithmic memory technology increases the density (lowers area) of physical memories. This also reduces the leakage power consumption.
Algorithmic memory technology allows system designers to treat memory performance as a configurable entity with its own set of tradeoffs with respect to speed, area and power.
Algorithmic memories can be generated from a small set of base physical memories and provide a broad portfolio of customized memories with any combination of read and write interfaces.
An algorithmic memory synthesis platform can analyze and estimate the resulting area, power and speed of custom memory configurations in seconds, and generate them in a matter of days.
Logic is scan inserted. A scan chain is the way to test normal logic. Flops are scan-enabled flops on the scan chain.
Algorithmic Memory Increases Memory Performance by an Order of Magnitude
Sundar Iyer, Co-Founder & CTO, Memoir Systems
Track F, Lecture 2: Intellectual Property for SoC & Cores
May 2, 2012
Problem: Processor-Embedded Memory Performance Gap
Performance degradation can be more significant and is getting worse!
[Chart: processor vs. embedded memory performance gap, normalized growth. Source: Hennessy and Patterson, 5th Edition]
Why is Embedded Memory Slow?
One operation per memory clock cycle
[Timing diagram: clk cycles 1 to 15; read; addr A B C D E F G H; data A B C D E F G H]
How can we increase memory performance without increasing memory clock speed?
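As a rough illustration of the slide's point, throughput in millions of operations per second (MOPS) is the memory clock rate times the number of operations completed per cycle. The sketch below assumes a 500 MHz memory clock purely for illustration; the function name `mops` and the clock value are not from the slides.

```python
def mops(clock_mhz, ops_per_cycle=1):
    """Throughput in millions of operations per second (MOPS)."""
    return clock_mhz * ops_per_cycle


CLOCK_MHZ = 500  # assumed memory clock, for illustration only

single_port = mops(CLOCK_MHZ)                 # one operation per cycle
two_port = mops(CLOCK_MHZ, ops_per_cycle=2)   # two operations per cycle, same clock

print(f"1P: {single_port} MOPS, 2P: {two_port} MOPS")
```

Under this model, the only ways to raise MOPS are a faster clock or more operations per cycle; the talk pursues the second.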
Solution Overview
2X Performance for ~15% area overhead
• Any Embedded Physical Memory
• RTL Based: No Circuit or Layout changes
• Simultaneous Accesses to the same Address, Row, Column, or Bank (no exceptions)
• Each Port can access the entire Memory Address space
• Exhaustively Formally Verified & Transparent to end-user
[Diagram: an array of physical 1P memories plus Extra Memory, wrapped as an Algorithmic Memory with four Addr/Data ports]
Using Physical 1-Port Memory to Build any Multiport Functionality
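The slides do not describe Memoir's algorithms. As a hedged illustration of the general idea, the sketch below uses a well-known textbook technique instead: single-port banks plus one extra XOR parity bank, which lets two reads complete in the same cycle even when both target the same bank or the same address. All class and method names are hypothetical, writes are modeled as a separate cycle outside the port accounting, and the area overhead of this toy (one extra bank) is not the ~15% quoted on the slide.

```python
from functools import reduce
from operator import xor


class SinglePortBank:
    """One physical 1P bank: at most one access per clock cycle."""

    def __init__(self, depth):
        self.cells = [0] * depth
        self.busy = False  # True once the single port is used this cycle

    def read(self, row):
        if self.busy:
            raise RuntimeError("bank port already used this cycle")
        self.busy = True
        return self.cells[row]


class TwoReadMemory:
    """Two reads per cycle built from 1P banks plus an XOR parity bank."""

    def __init__(self, num_banks, depth):
        self.depth = depth
        self.banks = [SinglePortBank(depth) for _ in range(num_banks)]
        self.parity = SinglePortBank(depth)  # the extra memory

    def _split(self, addr):
        return divmod(addr, self.depth)  # (bank index, row)

    def write(self, addr, value):
        # Simplification: a write takes its own cycle and keeps the parity
        # row equal to the XOR of that row across all data banks.
        bank, row = self._split(addr)
        old = self.banks[bank].cells[row]
        self.banks[bank].cells[row] = value
        self.parity.cells[row] ^= old ^ value

    def read2(self, addr_a, addr_b):
        """Serve two reads in one cycle, using each bank's port at most once."""
        for bank in self.banks + [self.parity]:
            bank.busy = False  # start of a new clock cycle
        bank_a, row_a = self._split(addr_a)
        bank_b, row_b = self._split(addr_b)
        data_a = self.banks[bank_a].read(row_a)
        if bank_b != bank_a:
            data_b = self.banks[bank_b].read(row_b)
        else:
            # Bank conflict: rebuild the value from the other banks and parity.
            others = [b.read(row_b) for i, b in enumerate(self.banks) if i != bank_a]
            data_b = reduce(xor, others, self.parity.read(row_b))
        return data_a, data_b


mem = TwoReadMemory(num_banks=4, depth=8)
for addr in range(32):
    mem.write(addr, addr * 3 + 1)

print(mem.read2(5, 6))   # both addresses fall in bank 0
print(mem.read2(5, 20))  # different banks, no conflict
```

The `busy` flag enforces the one-access-per-cycle limit of each physical bank, so a conflicting read that succeeds here really was served without a second port.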
Usage & Adoption
Easily Interface
• Presents standard memory interface
• Adds no clock cycle latency
• Used as a drop-in replacement
Readily Integrate
• Fits seamlessly in SoC design flow
• Used in SoCs - ASICs, ASSPs, GPPs
Rapidly Implement
• Supports any process, node or foundry
[Diagram: physical memory, 128 width by 8K depth, wrapped by Memoir IP with four A/D ports; identical pinout to standard memory]
Increases Density
[Chart: density comparison (label: Denser) of Physical 1P Memory, Algorithmic 2P Memory and Physical 2P Memory, normalized for 1P = 1 Mb/mm²]
Reduces Total Power
Based on 40nm example
Configurable Performance
[Chart: axes are Performance (MOPS), Memory Density (Mb/mm²) and Power Efficiency (Mb/mW). Labeled points: physical memory (SP), algorithmic 2P, algorithmic 4P. Labeled regions: higher performance algorithmic memories, higher density (area efficient) algorithmic memories, power efficient algorithmic memories.]
Increases Portfolio of Available Memories
[Diagram: memory types grouped as Physical Memory or Algorithmic Memory]
Port configurations shown: 1RW, 2RW, 1R1W, 2R1W, 3R1W, 1R2W, 1R3W, 2R2W, 1R/2W, 1R/4W, 2R/1W, 3R/1W, 4R/1W
Rapid Memory Analysis & Generation
[Flow diagram: Specify Memory, Feed Inputs, Push Button Analysis, Generate Memory]
• Specify Memory: # Read Ports, # Write Ports, # Width, # Depth, Capacity, …
• Feed Inputs: GUI
• Push Button Analysis: SYN, GEN, CHK, with Real-time Feedback
• Generate Memory: Algorithmic Memory
• Acceleration: 2X, 3X, 4X
• Optimization: Reduced Latency, Power, Area
• Standard Cell Library & Building Blocks: SRAM, Register File, eDRAM
Multiport Memory Usages
[Diagram: usages arranged around the port configurations 1R1W, 2R1W, 3R1W, 1R2W, 1R3W, 2R2W, 2Ror1W, 3Ror1W, 4Ror1W]
• Descriptor and Free Lists, Ingress Buffers
• Descriptor and Free Lists, Egress Buffers
• L2 MAC Lookups, Shared Caches
• Cache Coherency Arrays for L2/L3 Caches
• Netflow, Counters
• State Tables, Linked Lists
• Data and Tag Arrays for L2, L3 Caches
• Route Lookup Tables
• ACL Tables
Exhaustive Formal Verification Reduces Risk
Independently Verify Logic
• Mathematically proven algorithms
• Formally, exhaustively verified RTL
Separately Test Physical Memories
• Supports 3rd party DFT methodology
• Transparent customer BIST, BISR
• Doesn't need complex multiport BIST
[Diagram: Algorithmic Memory (Memoir IP) with SCAN on the logic and a BIST wrapper around the physical SRAM; four A/D ports]
Tier-1 OEM Evaluation: Performance, Area and Power Benefits
Large ASIC
• Die 24 mm x 24 mm, Area 576 mm²
• 800 Mb of total memory
• 165 Memory Instances
• Versatile memories required: 4R/1W, 2R1W, 1R2W memories
Algorithmic Memory Solution
• Die 21 mm x 21 mm, Area 441 mm²
• Area Savings of 135 mm² (23% die)
• 136 Memory Instances Accelerated
• Power Savings > 12W
• 4X MOPS for select memories
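The area figures on this slide can be checked with a few lines of arithmetic. The sketch below assumes square dies (24 mm and 21 mm on a side), which is what the quoted areas of 576 mm² and 441 mm² imply; the variable names are illustrative only.

```python
# Die edge lengths in mm, assuming square dies as implied by the quoted areas
side_before_mm = 24
side_after_mm = 21

area_before = side_before_mm ** 2   # area of the original large ASIC, mm^2
area_after = side_after_mm ** 2     # area with algorithmic memories, mm^2

savings = area_before - area_after          # absolute area saved, mm^2
savings_pct = 100 * savings / area_before   # share of the original die

print(f"{area_before} mm^2 -> {area_after} mm^2, "
      f"saving {savings} mm^2 ({savings_pct:.0f}% of die)")
```

The result matches the slide: 135 mm² saved, about 23% of the original die.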
Summary
1. Increases Port and Clock Performance
2. Lowers Area and Power
3. Easy Interface, Integration and Implementation
4. Creates Versatile Memory Portfolio
5. Reduces Cost, Risk and Time to Market
Algorithmic Memories are not a panacea, but present a new solution to alleviate the processor-embedded memory performance gap
Q&A
Sundar Iyer
sundaes@memoir-systems.com
Come Visit Our Booth!
Memoir Systems