“The Flex Logix InferX X1: Pairing Software and Hardware to Enable Edge Machine Learning,” a Presentation from Flex Logix

For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2022/06/the-flex-logix-inferx-x1-pairing-software-and-hardware-to-enable-edge-machine-learning-a-presentation-from-flex-logix/

Randy Allen, Vice President of Software at Flex Logix, presents the “Flex Logix InferX X1: Pairing Software and Hardware to Enable Edge Machine Learning” tutorial at the May 2022 Embedded Vision Summit.

Machine learning is not new: the term was coined in 1952. Its explosive growth over the past decade has been driven not by technical breakthroughs but by the amount of available compute power. Similarly, its future potential will be determined by the amount of compute power that can be applied to an ML problem within the constraints of allowable power, area and cost. The key to increasing computation power is properly pairing hardware and software to effectively exploit parallelism.

The Flex Logix InferX X1 accelerator is a system designed to fully utilize parallelism by teaming software with parallel hardware that is capable of being reconfigured based on the specific algorithm requirements. In this talk, Allen explores the hardware architecture of the InferX X1, the associated programming tools, and how the two work together to form a cost-effective and power-efficient machine learning system.

1. The Flex Logix InferX X1: Pairing Software and Hardware to Enable Edge Machine Learning. Randy Allen, Vice President of Software, Flex Logix Technologies.
2. History of ML/AI. Timeline, 1950-2020: the knowledge-driven era (1952: "machine learning" coined; the birth of symbolic AI; 1965: neural networks) moves through boom-and-winter cycles of tedious engineering into the data-driven era (1997: Deep Blue; 2006: deep learning, later legitimized), tracking the hardware progression from minis to killer micros to ubiquitous GPUs.
3. "It's not possible." / "No, it's necessary."
"On hitting power and thermal barriers, the hardware world threw up a Hail Mary of 'parallelism', hoping someone in the software world would run underneath and catch it." (Dave Patterson, Berkeley architecture dude)
"When we start talking about parallelism and ease of use of truly parallel computers, we're talking about a problem that's as hard as any that computer science has faced… I would be panicked if I were in the industry." (John Hennessy, Stanford compiler dude)
4. Not a new challenge. The "Dead Parallel Computer Society": Convex, Alliant, Multiflow, Encore, FPS, Inmos, Kendall Square, MasPar, nCUBE, Sequent, SGI, Thinking Machines…. They tried and died. RIP.
5. Matrix multiplication: a simple example.

Scalar:

    for (i = 0; i < n; i++)
      for (j = 0; j < n; j++)
        for (k = 0; k < n; k++)
          c[i][j] += a[i][k] * b[k][j];

Vector (the j loop is strip-mined into strips of length s so the innermost loop can execute as a vector operation):

    for (i = 0; i < n; i++)
      for (js = 0; js < n; js += s)
        for (k = 0; k < n; k++)
          for (j = js; j < js + s; j++)   // vector
            c[i][j] += a[i][k] * b[k][j];
6. Matrix multiplication in parallel. Scalar, cache-optimized: blocking the i and j loops keeps a bi x bj tile of c in cache across the entire k sweep, so each element of c is fetched once and reused many times.

    for (ib = 0; ib < n; ib += bi)
      for (jb = 0; jb < n; jb += bj)
        for (k = 0; k < n; k++)
          for (i = ib; i < ib + bi; i++)
            for (j = jb; j < jb + bj; j++)
              c[i][j] += a[i][k] * b[k][j];
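The same blocked loop nest can be made concrete as a small runnable program. The following is a minimal C++ sketch of the technique shown on slides 5 and 6, not code from the talk; the matrices are flat row-major vectors, and the tile sizes BI and BJ are illustrative choices.

    #include <cstddef>
    #include <vector>

    // Blocked (cache-tiled) matrix multiply: c += a * b, all n x n, row-major.
    // Tiling i and j keeps a BI x BJ tile of c resident in cache while the
    // k loop sweeps, so each element of c is reused rather than refetched.
    void matmul_blocked(const std::vector<float>& a,
                        const std::vector<float>& b,
                        std::vector<float>& c, std::size_t n) {
        const std::size_t BI = 64, BJ = 64;  // illustrative tile sizes
        for (std::size_t ib = 0; ib < n; ib += BI)
            for (std::size_t jb = 0; jb < n; jb += BJ)
                for (std::size_t k = 0; k < n; ++k)
                    for (std::size_t i = ib; i < ib + BI && i < n; ++i)
                        for (std::size_t j = jb; j < jb + BJ && j < n; ++j)
                            c[i * n + j] += a[i * n + k] * b[k * n + j];
    }

Picking BI and BJ so that the c tile, plus the strips of a and b it touches, fits in cache is exactly the processor/memory balance the later slides argue for.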
7. Systolic array: on each outer step, staged operands shift one cell through the array, new edge elements are fed in, and every cell multiply-accumulates in parallel.

    for (i = 0; i < n; i++) {
      // shift previously staged operands one cell onward
      for (j = i; j > 0; j--) {
        for (k = i; k > 1; k--) {
          ta(j,k) = ta(j,k-1);
          tb(k,j) = tb(k-1,j);
        }
      }
      // feed new elements of a and b in at the array edges
      it = i;
      for (j = 0; j < i; j++) {
        ta(j,i) = a(j,it);
        tb(1,j) = b(it,j);
        it = it - 1;
      }
      // every cell multiplies and accumulates in parallel
      for (j = 0; j < i; j++)
        for (k = 0; k < i; k++)
          c(j,k) += ta(j,k) * tb(j,k);
    }
8. Achieving balance. Plot of delivered performance versus number of processors, from one to many: distributed-memory machines are easy to build but hard to program; shared-memory machines are easy to program but hard to build. The design space trades efficiency against parallelism.
9. The real challenge of AI is NOT:
• packing more processors onto a die than anyone else in the world,
• creating the fastest processor synchronization in the world, or
• designing a large ({distributed, fetch-and-phi, shared}) memory system,
BUT INSTEAD:
• designing a multiprocessor system with a balance between processors and memory,
• with the right set of specialized instructions for acceleration,
• in conjunction with software that can effectively utilize the system,
BECAUSE:
• more processors are useful only if the software can compile the net to use them,
• eliminating the need for synchronization is far more effective than faster synchronization, and
• the key to fast processing is reusing data, not fetching it quickly.
10. The Flex Logix InferX™ X1:
• Dynamic TPU array
• Accelerator/co-processor for a host processor
• ASIC performance, yet dynamically reconfigurable for new models
• Low power, high performance
• Designed for edge (batch size = 1) applications
Block diagram: 4K MAC units, 8MB distributed L2 SRAM, 4MB L3 SRAM, eFPGA, LPDDR4 x32, PCIe Gen3/4 x4.
11. ASIC efficiency with CPU/GPU flexibility. Chart adapted from Xin Feng, 2019, "Computer vision algorithms and hardware implementations: A survey"; InferX position added by Flex Logix. InferX technology:
• Dynamic tensor processing
• Highly silicon-efficient
• Programmed via standard AI model paradigms (TensorFlow, PyTorch, ONNX)
• Layer-by-layer reconfiguration in microseconds
12. System-level view of software. A neural net model (Conv, AvgPool, MaxPool, Concat, …) from a standard framework is compiled by the InferXDK software development toolkit into a .ncf file. The embedded application at the edge calls the Runtime API, which sits on the X1 runtime and X1 drivers that control the X1 itself.
13. InferX compiler: unleashes the power of X1. For each NN fragment's selection of operators (Add, Convolution, AvgPool, MaxPool, Affine, Concatenate, …), the compiler selects, configures, and connects hardware resources: input memory, MACs, miscellaneous ops, and output memory, assembled from L2 SRAM blocks and 1D TPU units.
14. Runtime software API. A simple API couples the host application to the inference processing that is offloaded to the InferX X1 co-processor (embedded application + X1, through the Runtime API, down to the X1 runtime and X1 drivers).

InferX runtime C++ API:

    FUNCTION                DESCRIPTION                      ARGS
    StartInference          Executes a single inference      data_handle
    InferMulti              Executes multiple inferences     times
    WaitForInferenceReady   Waits for inference to finish    { }
    GetInferOutputData      Retrieves inference results      data_out
    SetModel                Loads model into memory          filename
    SetData                 Establishes data                 data
    Initialize              Initializes X1 HW                { }
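Read bottom to top, the table implies a call sequence. Here is a hedged sketch of how a host application might drive it; only the function names come from the slide, while the header name, signatures, types, and model filename are assumptions for illustration.

    #include "inferx_runtime.h"   // hypothetical header name
    #include <vector>

    int main() {
        std::vector<float> input(3 * 608 * 608);   // e.g. one camera frame
        std::vector<float> output(1000);           // e.g. classification scores

        Initialize();                        // bring up the X1 hardware
        SetModel("model.ncf");               // load a net compiled by InferXDK (assumed name)
        auto handle = SetData(input.data()); // stage the input tensor (assumed signature)
        StartInference(handle);              // launch a single inference (batch = 1)
        WaitForInferenceReady();             // block until the X1 finishes
        GetInferOutputData(output.data());   // copy results back to the host
        return 0;
    }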
15. Assuring high quality performance. Verification covers:
• Input/output quantization
• Activation functions
• Padding
Figure: cross-functional exploration across SW, HW, and applications; the hardware configuration depends on the Conv2D shape (e.g. a 608x608x3 input tensor, depths from 3 up to 512, N >> 128), exercised with random weight tensors to capture all attributes.
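As background on the quantization bullet above (general technique, not from the talk): affine int8 quantization maps a real value x to q = round(x / scale) + zero_point and back. A minimal C++ sketch, with names of my own choosing:

    #include <algorithm>
    #include <cmath>
    #include <cstdint>

    // Generic affine int8 quantization, shown only to illustrate the
    // "input/output quantization" item; not Flex Logix's implementation.
    std::int8_t quantize(float x, float scale, std::int32_t zero_point) {
        // q = round(x / scale) + zero_point, clamped to the int8 range
        std::int32_t q =
            static_cast<std::int32_t>(std::lround(x / scale)) + zero_point;
        return static_cast<std::int8_t>(std::clamp(q, -128, 127));
    }

    float dequantize(std::int8_t q, float scale, std::int32_t zero_point) {
        // recover an approximation of the original real value
        return (static_cast<std::int32_t>(q) - zero_point) * scale;
    }

Because the round trip is lossy, inference accuracy has to be validated against the float model, which is one reason quantization appears on this verification list.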
16. Accelerating AI for embedded processing.
• Retains the customer's programming model
  ○ Does not force the customer to move to Arm or Linux, as an SoC approach would
  ○ Minimizes time to market and development cost
  ○ Only needs an add-in PCIe or M.2 card plus drivers
• Reduces dependence on rapid interface technology evolution
  ○ The accelerator uses standard PCIe and DRAM
  ○ Camera and display interface selection/evolution stays with the host
• Enables R&D and silicon focus on acceleration processing
  ○ Dedicates more silicon to high-leverage acceleration logic
  ○ Reduces engineering complexity
Block diagram: host CPU (8GB DDR4) with MIPI CSI or USB cameras (MIPI-to-USB, USB 3), RGMII Ethernet, and eDP or LCD display, connected over PCIe Gen4 x4 to the AI accelerator with its own 4GB LPDDR4.
17. Minimal effort required to go from model to inference.
Step 1: the customer's neural net model goes through the InferX compiler, producing a bit file.
Step 2: the runtime loads the bit file onto the X1 hardware and handles device control.
Step 3: the application layer streams data to the device and receives inference results.
18. Putting it all together: can help you gain a competitive advantage.
• Compiler SW: robust; maximum performance; minimal user intervention.
• Hardware: future-proofed; GPU flexibility with ASIC performance; low power.
• Runtime SW: the InferX Runtime API is streamlined for ease of use.
• Product impact: optimized solutions around your use cases.
Contact us: inferx@flex-logix.com
19. Thank you. Visit us on the expo floor for more information.
