Support Vector Machines (SVM) originated in 1992 and gained popularity due to their effectiveness in tasks like handwritten digit recognition, achieving comparable error rates to neural networks. The core concept revolves around finding a decision boundary that maximizes the margin between classes, utilizing techniques like the kernel trick to handle non-linear decision boundaries effectively. SVM models can be tailored through various kernel functions and their performance is influenced by tuning the penalty parameter to mitigate overfitting.