"Memory-centric Hardware Acceleration for Machine Intelligence," a Presentation from Crossbar


Sylvain Dubois, Vice President of Business Development and Marketing at Crossbar, presents the "Memory-centric Hardware Acceleration for Machine Intelligence" tutorial at the May 2019 Embedded Vision Summit.

Even the most advanced AI chip architectures suffer from performance and energy efficiency limitations caused by the memory bottleneck between computing cores and data. Most state-of-the-art CPUs, GPUs, TPUs and other neural network hardware accelerators are limited by the latency, bandwidth and energy consumed to access data through multiple layers of power-hungry and expensive on-chip caches and external DRAMs. Near-memory computing, based on emerging nonvolatile memory technologies, enables a new range of performance and energy efficiency for machine intelligence.
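
To make the energy argument concrete, the sketch below normalizes the read-energy figures quoted on slide 3 of the deck to a common pJ/bit unit and estimates the energy needed to read a block of data once from each memory. The 4 MiB payload and all names are illustrative assumptions, not taken from the presentation. The ReRAM entry uses the slide's upper bound (<0.5 pJ/bit), so its result is an upper bound as well.

```python
# Read-energy figures as quoted on slide 3, normalized to pJ per bit.
# HBM and DDR4 are quoted per byte on the slide, so they are divided by 8.
PJ_PER_BIT = {
    "in-processor ReRAM (upper bound)": 0.5,
    "in-processor SRAM, 8 Mbit": 6.0,
    "in-processor SRAM, 64 Mbit": 47.0,
    "in-package HBM DRAM": 64.0 / 8,
    "DDR4 DIMM": 320.0 / 8,
}


def read_energy_microjoules(num_bytes: int, pj_per_bit: float) -> float:
    """Energy to read num_bytes once, in microjoules (1 uJ = 1e6 pJ)."""
    return num_bytes * 8 * pj_per_bit / 1e6


# Hypothetical payload: 4 MiB of neural-network weights (an assumption).
PAYLOAD_BYTES = 4 * 1024 * 1024

if __name__ == "__main__":
    for name, pj in PJ_PER_BIT.items():
        energy = read_energy_microjoules(PAYLOAD_BYTES, pj)
        print(f"{name:34s} {pj:5.1f} pJ/bit {energy:9.1f} uJ")
```

On these figures, one pass over the same data costs at least 80 times more energy from a DDR4 DIMM (40 pJ/bit) than from in-processor ReRAM (under 0.5 pJ/bit).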

In this presentation, Dubois introduces innovative and affordable near-memory processing architectures for computer vision and voice recognition, and presents architectural recommendations for edge computing and cloud servers. He also discusses how nonvolatile memory technologies, such as Crossbar Inc.’s ReRAM, can be directly integrated on-chip with dedicated processing cores, enabling new memory-centric computing architectures. The superior characteristics of ReRAM over legacy nonvolatile memory technologies help to address the performance and energy efficiency demands of machine intelligence at the edge and in the data center.

Published in: Technology

  1. © 2019 CROSSBAR. Memory-centric Hardware Acceleration for Machine Intelligence. Sylvain Dubois, VP Business Development, CROSSBAR. May 22nd, 2019
  2. Pervasive Machine Intelligence to the EDGE (CLOUD → EDGE)
     • PERFORMANCE: Uploading, processing and downloading from cloud takes time
     • BATTERY LIFE: Transmitting data burns energy
     • RELIABILITY: Some apps cannot rely on wireless connection
     • SECURITY & PRIVACY: Data less exposed if processed locally
     • >37B IoT semiconductor chips in 2018
  3. The Challenge of AI: Moving data takes time and burns energy
     • In-processor Crossbar ReRAM: <0.5 pJ/bit
     • In-processor SRAM: 6 pJ/bit for 8 Mbit → 47 pJ/bit for 64 Mbit
     • In-package HBM DRAM: 64 pJ/Byte
     • DDR4 DIMMs: 320 pJ/Byte
     • GAP in read latency and read energy between COMPUTING (volatile) and BIG DATA (non-volatile)
     • Solution: Bring data closer to compute
     • AI is about energy-efficient data access
  4. Memory Technologies Trends
     • Categories: STORAGE, STORAGE CLASS MEMORY (SCM), EMBEDDED MEMORY, MEMORY; VOLATILE vs. NON-VOLATILE
     • Today: DRAM (DRAM-DDR), SRAM, MRAM (MRAM-F), ReRAM (1T1R), PCM (3D-XP), NAND (3D-NAND, QLC)
     • Near future: HBM, 3D-SRAM, MRAM-S, 3D-ReRAM (1TnR)
     • Driving factors:
       • RD & WR sequential perf (latency/bandwidth)
       • Unlimited RD & WR cycling
       • Cost-effective CMOS integration and silicon area
       • Lower energy per bit than external memory
       • Better performance than external memory
       • RD perf between DRAM and NAND
       • Cheaper than DRAM cost per GB
       • Lowest possible cost per GB
  5. Resistive RAM (ReRAM) Fundamental Technology
     • Non-volatile way to store information using nano-filaments
     • Integrated in standard CMOS back-end
     • Cell stack: Top Electrode, Switching Medium, Bottom Electrode
     • Program → reading a programmed cell: LOW RESISTANCE (ON)
     • Erase → reading an erased cell: HIGH RESISTANCE (OFF)
  6. Crossbar Selector Technology
     • 1T1R: High performance NVM
     • 1T1R + Selector ➔ 1TnR: High performance & high density NVM
     • Selector: High noise-suppression ratio (> 1 million, i.e. > 10^6 x) with sharp transition between off and active states
  7. Leading Embedded Memory Technologies
     Values are listed as: STT-MRAM-F (FLASH-LIKE) | STT-MRAM-S (SRAM-LIKE) | Crossbar ReRAM
     • Physical mechanism: Spin-polarized current, 1.5 ON/OFF ratio | Spin-polarized current, 1.5 ON/OFF ratio | Metal atoms storage, 1,000 ON/OFF ratio
     • Stack complexity (both STT-MRAM variants): Dedicated product line; 10+ layers stack, many materials; very thin layers (hard to manufacture); 4 additional masks
     • Stack complexity (Crossbar ReRAM): Simple stack, few materials; CapEx = new chamber on existing tool; only 3 films and 2 masks
     • Process nodes: 28/22 nm | 28/22 nm | 40 nm to 12-7 nm
     • Bit cell configuration: 1T-1MTJ, 20~40 F², 200 nm cell pitch limitation | 2T-2MTJ | 1T-1R, 20~40 F²
     • Read access time: 25 ns | 12.5 ns | 15 ns
     • Write access time: 200 ns | 40 ns | 10 µs
     • Read energy: 1 pJ/bit | - | 0.2 pJ/bit
     • Write current: 120 µA/bit | - | ~60 µA/bit
     • Standby current: 200 µA | 200 µA | 2 µA
     • Data retention: > 10 yr | - | > 10 yr
     • Endurance: > 1M (BER degrades with write cycles) | > 100M | > 1M
     • Operating temp: Up to 85 °C (BER degrades with hot temp) | Up to 85 °C | 125 °C
     • Magnetic immunity: NO | NO | YES
     • COST
  8. Computing & Memory Trends in AI
     • Segments, CLOUD to EDGE:
       • CPU: X86 CPU
       • AI training: GPU, FPGA
       • AI inference (cloud): Domain Specific ASICs (e.g. TPU)
       • AI inference (edge): Domain Specific ASICs (ARM, RISC-V based)
     • Key players, in slide order:
       • Intel
       • Nvidia, Xilinx, Intel (Altera), Graphcore, Kalray, Adapteva, Quest
       • Google, Amazon, Microsoft, WaveComputing, Bitmain, Horizon-Robotics, Novumind, Intel (Movidius/Nervana), Gyrfalcon, Habana, ThinCI, WaveComputing, Greenwave, Syntiant, Mythic, Brainchip
     • Memory needs, in slide order: SRAM + HBM + SCM; SRAM + HBM; SRAM + ReRAM; eFlash, SRAM, MRAM, ReRAM
     • ReRAM markets: ReRAM market (embedded), ReRAM market (SCM)
     • ReRAM entering the AI market on AI inference edge platforms
  9. Problem: Objects (vectors) Classification in AI
     • Any data sources (camera, microphones, sensors…) produce unstructured datasets: events, video, images, speech, keywords
     • Neural Network Accelerators perform Features/Vectors Extraction: <v1, v2, …>
     • There is a computing-intensive task required after every Neural Network
     • For some AI applications, the classification phase can take up to 3X as long as feature extraction with the Neural Network
  10. Solution: ReRAM for Massive Search Hardware Acceleration
      • Very wide Non-Volatile memory array
        • 50 GB/s Read throughput (@8K wide)
      • Applications
        • Massive Search: KNN, 1000’s of Distance Calculators above HPM
        • Classifications: CNN, RNN, NLP Inference; Weights in ReRAM; Embedded MACs
      • Edge or Cloud
      • Flexible architecture
        • Number of instances, 8-bit to binary
        • Scalable parallel processing with chip to chip connection
        • Spare memory enabling Learning at the Edge
      • Simultaneous Processing, Deterministic Performance
      • Block diagram: Legacy R/W Interface; ReRAM Array; Highly Parallel Read Interface; Read Bus 8192 bits; Feature Vectors 1 to N, each feeding its own Computation Engine
  11. 3+ Billion Objects LookUp Per Second (OLUPS)

      Object length    OLUPS            OLU/Watt
      1024                50,000,000       833,333,333
      512                100,000,000     1,666,666,667
      256                200,000,000     3,333,333,333
      128                400,000,000     6,666,666,667
      64                 800,000,000    13,333,333,333
      32               1,600,000,000    26,666,666,667
      16               3,200,000,000    53,333,333,333

      • Scalable to 16 Billion OLUPS per stick
      • Chart: Billions of Object LookUp per Second (y-axis: object lookups per second; x-axis: object length), comparing the 50MHz Crossbar XPU with a 1.5GHz ARM A53 + DDR4
      • 30X improvement compared to ARM+DDR4 at 500X less power!
  12. SCAiLE reference platforms
      • Breaking News: AI Innovators Join Forces in Consortium for Development and Commercialization of Best-in-Class AI Computing Platform
      • NEW member!
      • Demos on Booth #311
  13. Crossbar ReRAM: Best Memory Technology Enabling AI
      • Based in Santa Clara, CA, U.S.A.
      • $100M+ in raised capital to date
      • Leader in Resistive RAM technology
      • New class of non-volatile memory: Metal Filament Resistor
      • Back-end-of-line non-volatile memory: 40nm, 2xnm, 1xnm
      • Patented technology: 310 filed / 160 issued
      • Applications in Storage Class Memory, AI, FPGAs, eNVM
      • Efficient search and computing with Highly Parallel Memory
  14. Resource Slide
      • More about Crossbar
      • More about SCAiLE
      • Embedded Vision Summit 2019: “Memory-centric hardware acceleration for Machine Learning” by Sylvain DUBOIS, VP Business Development & Marketing, May 22nd, 2019, 1 pm
      • Come see our great demos! Booth #311
  15. Thank you. Sylvain DUBOIS. Twitter: @syl20dubois
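
The massive-search scheme on slides 10 and 11 can be modeled in software with the minimal sketch below. It is not Crossbar's implementation. It assumes binary feature vectors compared by Hamming distance (the slides mention "distance calculators" and "8-bit to binary" but do not name a metric). It also assumes that the object lengths on slide 11 are in bytes, an inference that reproduces the table's OLUPS column from one 8192-bit read per 50 MHz cycle. All function names are hypothetical.

```python
from typing import Tuple

BUS_BITS = 8192        # read-bus width from slide 10
CLOCK_HZ = 50_000_000  # 50 MHz XPU clock from slide 11


def lookups_per_second(object_bytes: int) -> int:
    """Objects compared per second if each cycle reads one full bus row.

    Assumes object lengths are in bytes (an inference, not stated on the slide).
    """
    objects_per_read = BUS_BITS // (object_bytes * 8)
    return objects_per_read * CLOCK_HZ


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two bit vectors stored as ints."""
    return bin(a ^ b).count("1")


def nearest_in_row(row: int, object_bits: int, query: int) -> Tuple[int, int]:
    """Split one bus-wide row into objects; return (slot, distance) of the closest.

    In hardware the per-slot distances would be computed in parallel;
    here a loop stands in for the array of computation engines.
    """
    mask = (1 << object_bits) - 1
    slots = BUS_BITS // object_bits
    best_slot, best_dist = 0, object_bits + 1
    for slot in range(slots):
        candidate = (row >> (slot * object_bits)) & mask
        dist = hamming(candidate, query)
        if dist < best_dist:
            best_slot, best_dist = slot, dist
    return best_slot, best_dist


if __name__ == "__main__":
    for length in (1024, 512, 256, 128, 64, 32, 16):
        print(length, lookups_per_second(length))
    # Pack 64 objects of 128 bits into one row; slot 5 holds the query itself.
    query = 0xDEADBEEF
    row = query << (5 * 128)
    print(nearest_in_row(row, 128, query))
```

Under the same assumptions, 8192 bits per cycle at 50 MHz is 51.2 GB/s, in line with the "50 GB/s" read throughput on slide 10. Dividing any OLUPS entry by its OLU/Watt entry gives 0.06 W, so the table implies a power draw of roughly 60 mW.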