This document discusses deploying neural networks on microcontrollers. It notes that edge deployment is attractive for low latency, privacy, and operation under network constraints. It describes both training models at the edge via federated learning and running inference at the edge. The document also covers shrinking models through compression techniques such as post-training quantization, weight sharing, and pruning, with examples of how different quantization methods trade model size against accuracy. Benchmark results from MLPerf Tiny for edge devices are also referenced.
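As a concrete illustration of post-training quantization (not taken from the document itself), the following is a minimal sketch using TensorFlow Lite's converter. The small Keras model and the random representative dataset are placeholders; in practice you would use your trained model and a sample of real input data for calibration.

```python
import numpy as np
import tensorflow as tf

# Placeholder model for illustration; any trained Keras model works here.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(4),
])

# A representative dataset lets the converter calibrate activation ranges
# for full-integer quantization. Random data stands in for real samples.
def representative_data():
    for _ in range(100):
        yield [np.random.rand(1, 32).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
# Restrict to int8 ops so the model can run on integer-only MCU kernels.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_model = converter.convert()
with open("model_int8.tflite", "wb") as f:
    f.write(tflite_model)
```

Quantizing weights and activations from 32-bit floats to 8-bit integers cuts model size roughly 4x, typically with a small accuracy loss. The resulting `.tflite` file is commonly embedded in firmware as a C array (e.g., via `xxd -i model_int8.tflite`) for use with TensorFlow Lite for Microcontrollers.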