Testing for Linear Separability with Linear Programming in R

lin-sep-points-2d-10For the previous article I needed a quick way to figure out if two sets of points are linearly separable. But for crying out loud I could not find a simple and efficient implementation for this task. Except for the perceptron and SVM – both are sub-optimal when you just want to test for linear separability. The perceptron is guaranteed to finish off with a happy ending – if feasible – but it can take quite a while. And SVMs are designed for soft-margin classification which means that they might settle for a not separating plane and also they maximize the margin around the separating plane – which is wasted computational effort.

The efficient way to get the job done is by applying linear programming (LP). That means representing the question “Is it possible to fit a hyper-plane between two sets of points?” with a number of inequalities (that make up a convex area). I’m going to give a quick walk through for the math to make the idea plausible – but this text is more describing an introductory example and not an introduction to LP itself. For solving the linear program I will use Rglpk which provides a high level interface to the GNU Linear Programming Kit (GLPK) – and of course has been co-crafted by the man himself – Kurt Hornik – who is also involved with kernlab and party – thank you, Prof. Hornik and keep up the good work!

The Gory Details

Let’s say we have two sets A and B of points in \mathbb{R}^M:

A = \{a_1, ..., a_{N_1}\} \subset \mathbb{R}^M, B = \{b_1, ..., b_{N_2}\} \subset \mathbb{R}^M

And we want to know if there is a hyper-plane in \mathbb{R}^M which separates A and B then we can formulate the necessary condition with two symmetrical inequalities:

A \beta \in \mathbb{R} and an h \in \mathbb{R}^M exist, such that we can say for all a \in A: h^Ta > \beta (1) and for all b \in B: h^Tb < \beta (2).

This is because a hyper-plane in \mathbb{R}^M can be defined as a vector
(h,\beta) \in \mathbb{R}^{M} \times \mathbb{R} and the points on either side of it can be distinguished with above stated inequalities.

lin-sep-2dThe conditions of a linear program are usually stated as a number of “weakly smaller than” inequalities. So lets transform (1) and (2) appropriately:

The conditions h^Ta > \beta and h^Tb < \beta can be written as h^Ta \geq \beta + 1 and h^Tb \leq \beta - 1. This is because we are dealing with finite sets A and B, so if we have a separating plane, then we can always fit in an \epsilon such that  h^Ta \geq \beta + \epsilon and h^Tb \leq \beta - \epsilon. If we now multiply both inequalities with 1/\epsilon, then we just end up with a different formulation (h / \epsilon, \beta / \epsilon) for the same plane (h, \beta). The first inequality we additionally multiply with -1 to turn \geq into \leq – and we and now we have:

(3) -h^Ta \leq -\beta - 1

(4) h^Tb \leq \beta - 1

Okay great – we’re almost there – now let’s get all the variables on the left hand side:

(5) -h^Ta + \beta \leq -1

(6) h^Tb - \beta \leq -1

These hyper-plane conditions have to be true for all points. All points in A have to fulfil (5) and all points in B have to fulfil (6). Then this set of N_1 + N_2 inequalities describes the convex set in \mathbb{R}^{M+1} of all possible separating hyper planes (h,\beta). Usually the description and purpose of a linear program does not stop at this point and the set of feasible solutions is used to maximize an objective function. In our case such an objective function might be introduced to maximize the distance of the plane from the points. But the article is long enough already and our objective is to just find a plane and not the the best plane. Which is why our objective function is going to be simply the 0 \in \mathbb{R}^{M+1}

So now we formulate (5) and (6) in matrix notation because this is how LP solvers expect the program description to be fed to them – we get:

A\cdot\tilde{h} \leq b with N := N_1 + N_2

A = \begin{pmatrix} -a_1^T & 1 \\ ... & ... \\ -a_{N_1}^T & 1 \\ b_1^T & -1 \\ ... & ... \\ b_{N_2}^T & -1 \end{pmatrix} \in \mathbb{R}^{N\times(M+1)}, \tilde{h} = \begin{pmatrix} h_1 \\ ... \\ h_M \\ \beta \end{pmatrix} \in \mathbb{R}^{M+1}, b = \begin{pmatrix} -1 \\ ... \\ -1 \\ -1 \\ ... \\ -1 \end{pmatrix} \in \mathbb{R}^{N}

Example with Rglpk in two dimensions

In case you are wondering how I managed to include all those pretty pretty math formulas in this post – I am using the QuickLaTeX WordPress plug-in and I must say I really like the result. In previous posts I used a LaTeX web editor and then included the rendered formulas as an image.

(original article published on www.joyofdata.de)

3 thoughts on “Testing for Linear Separability with Linear Programming in R

  1. I think there is a mistake in line 44 of the code….. the slope should be h1/h2 and not negative

  2. Could you explain the pros and cons of this method vs. using LDA?

    • As far as I understand LDA, it is a method to statistically model the relationship between independent and dependent variables. That renders it useful for a different use case as in this article the focus is just on determining separability. Further insight into the model necessarily comes with an increase of computational effort.

Comments are closed.