Neural networks use activation functions to introduce non-linearity: each function decides how strongly a neuron fires given its weighted inputs. Without non-linear activations, any stack of layers collapses into a single linear map, so non-linearity is what allows a network to learn complex patterns from data. Common choices include sigmoid, tanh, ReLU, and ReLU variants.

Convolutional neural networks extract features with successive convolutional and pooling layers, then classify with fully connected layers applied to those extracted features. LeNet-5, an early CNN architecture, used this design for handwritten digit recognition.
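As a minimal sketch of the activation functions named above (pure Python, illustrative only; the `alpha` slope for the leaky-ReLU variant is a conventional default, not something specified here):

```python
import math

def sigmoid(x):
    # Squashes any real input into (0, 1); saturates for large |x|.
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    # Zero-centered squashing into (-1, 1).
    return math.tanh(x)

def relu(x):
    # Passes positive inputs through unchanged, zeroes out negatives.
    return max(0.0, x)

def leaky_relu(x, alpha=0.01):
    # A common ReLU variant: a small slope for negative inputs
    # avoids neurons that never activate ("dying ReLU").
    return x if x > 0 else alpha * x
```

Each function maps a neuron's pre-activation to its output; for example, `sigmoid(0)` returns `0.5` and `relu(-3.0)` returns `0.0`, which is the "not activated" case the text describes.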
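The convolution-then-pooling feature extraction step can be sketched in a few lines of pure Python (an illustrative toy, not LeNet-5 itself; the function names `conv2d` and `max_pool2d` and the edge-detecting kernel are assumptions for the example):

```python
def conv2d(image, kernel):
    # Valid (no padding) 2-D cross-correlation, the core of a conv layer:
    # slide the kernel over the image and take a weighted sum at each position.
    kh, kw = len(kernel), len(kernel[0])
    oh = len(image) - kh + 1
    ow = len(image[0]) - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for i in range(oh):
        for j in range(ow):
            out[i][j] = sum(image[i + di][j + dj] * kernel[di][dj]
                            for di in range(kh) for dj in range(kw))
    return out

def max_pool2d(fmap, size=2):
    # Non-overlapping max pooling: keep the strongest response per window,
    # shrinking the feature map and adding translation tolerance.
    oh, ow = len(fmap) // size, len(fmap[0]) // size
    return [[max(fmap[i * size + di][j * size + dj]
                 for di in range(size) for dj in range(size))
             for j in range(ow)] for i in range(oh)]

# A 4x4 image with a vertical edge, convolved with a vertical-edge kernel:
image = [[1, 1, 0, 0],
         [1, 1, 0, 0],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
kernel = [[1, -1],
          [1, -1]]
features = conv2d(image, kernel)   # strongest responses along the edges
pooled = max_pool2d(features)      # downsampled feature map
```

A full CNN stacks several such conv/pool stages (with an activation after each convolution), then flattens the final feature map into fully connected layers for classification.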