• Share
  • Email
  • Embed
  • Like
  • Save
  • Private Content
Algorithmic Memory Increases Memory Performance by an Order of Magnitude

Algorithmic Memory Increases Memory Performance by an Order of Magnitude




Iyer, Memoir Systems



Total Views
Views on SlideShare
Embed Views



4 Embeds 112

http://www.chiportal.co.il 103
http://www.directrss.co.il 4
http://translate.googleusercontent.com 3
http://chiportal.co.il 2


Upload Details

Uploaded via as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
Post Comment
Edit your comment
  • Today, a single-port embedded memory can perform one memory operation per clock cycle. Therefore embedded memory performance has traditionally been closely tied to memory clock speed and ultimately limited by it. Because embedded memory IP providers (responding to application needs for more on-chip memory) had to make design trade-offs early on that favored high density over high speed, memory clock speeds lag behind processor clock speeds. With its Algorithmic Memory technology, Memoir Systems tackles a fundamental question --- can we increase memory performance without increasing memory clock speeds? Historically, circuits and advances in lithography have been used at every generation as the approach to enhance memory performance. Unfortunately these approaches alone do not give enough performance improvement, and are not keeping up with applications that require higher memory performance. The problem is we have limited our thinking about embedded memories to a purely circuit and process oriented approach. Thus, our focus has been on maximizing the number of transistors on a chip and cranking up the clock speed. This has been successful up to a point, but as transistors approach atomic dimension, we are running into fundamental physical barriers. For this reason, we need to rethink our approach to embedded memory design.
  • Algorithmic memory technology increases the density (lowers area) of physical memories. This also reduces the leakage power consumption.
  • Algorithmic memory technology allows system designers to treat memory performance as a configurable entity with its own set of tradeoffs with respect to speed, area and power.
  • AlgorithmicMemories can be generated from a small set of base physical memories and provide a broad portfolio of customized memories with any combination of read and write interfaces.
  • An algorithmic memory synthesis platform can analyze and estimate the resulting area, power and speed of custom memory configurations in seconds, and generate it in a matter of days.
  • OrangeApplications??Compare sizes area/power
  • Logic is scan insertedScan chain way to test normal logicFlops are scan chain scan enabled flops

Algorithmic Memory Increases Memory Performance by an Order of Magnitude Algorithmic Memory Increases Memory Performance by an Order of Magnitude Presentation Transcript

  • Algorithmic Memory Increases MemoryPerformance By an Order of Magnitude Sundar Iyer Co-Founder & CTO Memoir Systems Track F, Lecture 2: Intellectual Property for SoC & Cores May 2, 2012
  • Problem: Processor-Embedded Memory Performance Gap Performance degradation can be more significant more significant and is getting worse! Processor Embedded Memory Performance GapNormalized Growth *Source: Hennessy and Patterson, 5th Edition May 2, 2012
  • Why is Embedded Memory Slow? 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15clkread One operation peraddr A B C D E F G H memory clock cycledata A B C D E F G H How can we increase memory performance without increasing memory clock speed? May 2, 2012
  • Solution: Algorithmic Memory®= Memory Macros + Algorithms Physical Memory Algorithmic Memory 1P @ 500 MHz 1P @ 500 MHz 1P 1P 1P 1P 1P 1P 1P 1P 1P 1P 1P 1P 1P 1P 1P 1P 1P 1P 1P 1P 1P 1P 1P 1P Extra Memory 1P @ 500 MHz 4P @ 500 MHz Allows 500 Million MOPS1 Allows 2000 Million MOPS (1 Memory Operations Per Second) More Ports, Same Clock May 2, 2012
  • Solution Overview 2X Performance for ~15% area overhead Any Embedded Physical MemoryRTL Based: No Circuit or Simultaneous Accesses to the 1P 1P 1P 1P Layout changes same Address, Row, Column, or Bank (no exceptions) 1P 1P 1P 1P 1P 1P 1P 1P Extra Memory Algorithmic Memory Exhaustively Formally Verified Data Data Addr Addr Addr Addr Data Data Each Port can access the & Transparent to end-user entire Memory Address Using Physical 1-Port Memory to Build any Multiport Functionality May 2, 2012
  • Usage & Adoption Easily Interface 128 Width • Presents standard memory interface • Adds no clock cycle latency • Used as a drop-in replacement 8K Depth Physical Memory Readily Integrate • Fits seamlessly in SoC design flow Memoir IP IP Memoir • Used in SoCs - ASICs, ASSPs, GPPs A D A D A D A D Rapidly Implement Identical Pinout to Standard Memory • Supports any process, node or foundry May 2, 2012
  • Increases Density Denser Physical 1P Memory Algorithmic 2P Memory Physical 2P Memory Normalized for 1P = 1 Mb/mm2 May 2, 2012
  • Increases Density Normalized for 1P = 1 Mb/mm2 May 2, 2012
  • Reduces Total Power Based on 40nm example May 2, 2012
  • Reduces Total Power Based on 40nm example May 2, 2012
  • Configurable Performance Performance (MOPS) Higher performance algorithmic memories 4P Higher density 2P algorithmic memories Memory Density (Mb/mm2) Physical Memory Power efficientalgorithmic memories Higher Performance Algorithmic Memory Algorithmic 2P SP SP Area Efficient Algorithmic Memory Power Efficient Algorithmic MemoryPower Efficiency (Mb/mW) May 2, 2012
  • Increases Portfolio of Available Memories 1R1W 1R/4W 2R/1W 4R/1W 1R/2W 3R1W 1RW 2RW 2R2W 1R2W 1R3W 2R1W 3R/1W Physical Memory Algorithmic Memory May 2, 2012
  • Rapid Memory Analysis & Generation 2X 3X Acceleration 4X Push Button Analysis # Read Ports # Write Generate Memory Real-time Algorithmic # Width Specify Capacity Feed Inputs GUI SYN GEN CHK Memory # Depth Memory … Feedback Reduced Latency Standard Power Optimization SRAM Register File Area eDRAM Standard Cell Library & Building Blocks May 2, 2012
  • Multiport Memory Usages  Descriptor and Free Lists, Ingress Buffers3R1W  L2 MAC Lookups, Shared Caches 2R1W1R2W  Descriptor and Free Lists, Egress Buffers  Cache Coherency Arrays for L2/L3 Caches1R3W2R2W  Netflow, Counters  State Tables, Linked Lists1R1W4Ror1W  Data and Tag Arrays for L2, L3 Caches  Route Lookup Tables3Ror1W  ACL Tables2Ror1W May 2, 2012
  • Exhaustive Formal Verification Reduces Risk Independently Verify Logic SRAM BIST Wrapper • Mathematically proven algorithms • Formally, exhaustively verified RTL SCAN Physical Memory Separately Test Physical Memories BIST • Supports 3rd party DFT methodology Algorithmic Memory Memoir IP • Transparent customer BIST, BISR • Doesn’t need complex multiport BIST A D A D A D A D May 2, 2012
  • Tier-1 OEM Evaluation – Performance, Area and Power Benefits Large ASIC Algorithmic Memory Solution 4X MOPS Memories24mm 21mm 24mm 21mm Area 576 mm2  Area 441 mm2 • 800 Mb of total memory • Area Savings of 135 mm2 (23% die) • 165 Memory Instances • 136 Memory Instances Accelerated Versatile memories required  Power Savings > 12W • 4R/1W, 2R1W, 1R2W memories  4X MOPS for select memories May 2, 2012
  • Summary1. Increases Port and Clock Performance2. Lowers Area and Power3. Easy Interface, Integration and Implementation4. Creates Versatile Memory Portfolio5. Reduces Cost, Risk and Time to Market Algorithmic Memories are not a panacea, but present a new solution to alleviate the processor embedded memory performance gap May 2, 2012
  • Q&A Sundar Iyersundaes@memoir-systems.com Come Visit Our Booth! Memoir Systems May 2, 2012