On Support Vector Machines
Though the term \emph{support vector machine} sounds fancy, the basic idea behind it is quite simple. For simplicity, assume we have a binary classification problem with a dataset $\mathcal{D} = \{(x^{(i)}, y^{(i)})\}$, where $x^{(i)} \in \mathbb{R}^n$ and $y^{(i)} \in \{-1, +1\}$. Our aim is to find a somehow `optimal' hyperplane (the high-dimensional analogue of a plane) that separates the positive and negative datapoints sufficiently well. To understand how we find this optimal hyperplane, let us work in $n$-dimensional Euclidean space.

Here we can define the dot product between any two vectors, $\cdot : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$, as
\[
x \cdot y = \sum_{i=1}^n x_i y_i,
\]
where $x = (x_1, x_2, \ldots, x_n)$ and $y = (y_1, y_2, \ldots, y_n)$. Let us fix some vector $w$ and look at dot products of the form
\[
w \cdot x, \quad \text{where } \Vert x \Vert = 1.
\]
One might ask when such a product is m...
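As a quick numerical illustration of this last quantity, here is a minimal Python sketch (not part of the derivation; the vector $w$ and the helper name \texttt{dot} are arbitrary choices made for this example) that evaluates $w \cdot x$ for a few unit vectors $x$, including the unit vector pointing in the same direction as $w$.

\begin{verbatim}
import numpy as np

def dot(x, y):
    """Dot product as defined above: sum_i x_i * y_i."""
    return sum(xi * yi for xi, yi in zip(x, y))

# Fix some vector w in R^3 (an arbitrary choice for this sketch).
w = np.array([2.0, -1.0, 2.0])

# A few unit vectors x (each has ||x|| = 1).
candidates = {
    "e1 = (1, 0, 0)": np.array([1.0, 0.0, 0.0]),
    "e2 = (0, 1, 0)": np.array([0.0, 1.0, 0.0]),
    "w / ||w||":      w / np.linalg.norm(w),
    "-w / ||w||":     -w / np.linalg.norm(w),
}

for name, x in candidates.items():
    print(f"{name:>14}:  w . x = {dot(w, x): .3f}"
          f"   (||x|| = {np.linalg.norm(x):.3f})")
\end{verbatim}

Running this prints the largest value, $\Vert w \Vert$, for the unit vector $x = w / \Vert w \Vert$, in line with the Cauchy--Schwarz bound $w \cdot x \le \Vert w \Vert \, \Vert x \Vert$.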