This document discusses considerations for building embedded deep learning systems and running neural networks on accelerators. It covers choosing hardware that delivers the required acceleration, standards such as OpenCL and SYCL for programming accelerators, and Codeplay's decision to build on these widely adopted standards in a layered approach so that AI acceleration remains adaptable. It also addresses kernel fusion techniques and the performance impact of individual algorithms and components.