1) Support vector machines (SVMs) find the separating hyperplane that maximizes the margin, i.e., the distance from the hyperplane to the nearest training points of either class.
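A minimal sketch of this idea using scikit-learn (assumed available); the two-blob toy dataset and C=1.0 are illustrative choices, not from the original text. For a linear SVM with weight vector w, the margin width is 2 / ||w||, so maximizing the margin is equivalent to minimizing ||w||.

```python
# Hypothetical toy example: fit a linear SVM and inspect its margin.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0)

clf = SVC(kernel="linear", C=1.0)  # C=1.0 is an arbitrary illustrative choice
clf.fit(X, y)

# The learned hyperplane is w . x + b = 0; the margin width is 2 / ||w||.
w = clf.coef_[0]
b = clf.intercept_[0]
print("margin width:", 2 / np.linalg.norm(w))
```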
2) SVMs extend to data that is not linearly separable via the kernel trick: a kernel function implicitly maps the data into a higher-dimensional feature space, where a linear separator may exist, without ever computing that mapping explicitly. Common choices include the polynomial and Gaussian radial basis function (RBF) kernels.
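A sketch of the kernel extension, again with scikit-learn; make_circles produces concentric rings that no linear hyperplane can separate, and gamma=1.0 is an illustrative hyperparameter rather than a recommendation.

```python
# Hypothetical toy example: compare a linear kernel with an RBF kernel
# on data that is not linearly separable in the input space.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, noise=0.1, factor=0.4, random_state=0)

linear = SVC(kernel="linear").fit(X, y)
# RBF kernel: K(x, x') = exp(-gamma * ||x - x'||^2)
rbf = SVC(kernel="rbf", gamma=1.0).fit(X, y)

# The RBF kernel implicitly maps points into a much higher-dimensional
# feature space, where the concentric rings become linearly separable.
print("linear training accuracy:", linear.score(X, y))
print("rbf training accuracy:   ", rbf.score(X, y))
```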
3) The dual formulation of the SVM is a quadratic programming problem whose solution identifies the support vectors: the training points that lie on or within the margin and carry nonzero dual coefficients. These support vectors alone define the optimal hyperplane; every other training point could be removed without changing the solution.
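A sketch of the dual view under the same assumptions as before: scikit-learn's fitted SVC exposes the nonzero products alpha_i * y_i as dual_coef_ together with the corresponding support vectors, so for a linear kernel the primal weight vector can be rebuilt as w = sum_i alpha_i y_i x_i and checked against coef_.

```python
# Hypothetical toy example: reconstruct the hyperplane from the dual solution.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0)
clf = SVC(kernel="linear", C=1.0).fit(X, y)

# Only the support vectors have nonzero dual coefficients, so the sum
# w = sum_i alpha_i y_i x_i runs over the support vectors alone.
w_from_dual = clf.dual_coef_ @ clf.support_vectors_
print("number of support vectors:", len(clf.support_vectors_))
print("dual reconstruction matches coef_:", np.allclose(w_from_dual, clf.coef_))
```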