For the previous article I needed a quick way to figure out if two sets of points are linearly separable. But for crying out loud I could not find a simple and efficient implementation for this task. Except for the perceptron and SVM – both are sub-optimal when you just want to test for linear separability. The perceptron is guaranteed to finish off with a happy ending – if feasible – but it can take quite a while. And SVMs are designed for soft-margin classification which means that they might settle for a not separating plane and also they maximize the margin around the separating plane – which is wasted computational effort.

The efficient way to get the job done is by applying linear programming (LP). That means representing the question “Is it possible to fit a hyper-plane between two sets of points?” with a number of inequalities (that make up a convex area). I’m going to give a quick walk through for the math to make the idea plausible – but this text is more describing an introductory example and not an introduction to LP itself. For solving the linear program I will use Rglpk which provides a high level interface to the GNU Linear Programming Kit (GLPK) – and of course has been co-crafted by the man himself – Kurt Hornik – who is also involved with kernlab and party – thank you, Prof. Hornik and keep up the good work!

# The Gory Details

Let’s say we have two sets and of points in :

And we want to know if there is a hyper-plane in which separates and then we can formulate the necessary condition with two symmetrical inequalities:

A and an exist, such that we can say for all (1) and for all (2).

This is because a hyper-plane in can be defined as a vector

and the points on either side of it can be distinguished with above stated inequalities.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
N <- 100 g <- expand.grid(x=0:N/N,y=0:N/N) # definition of the hyper plane h1 <- 3 h2 <- -4 beta <- -1.3 # points on either side g$col <- ifelse(h1 * g$x + h2 * g$y > beta, "cornflowerblue","darkolivegreen3") # roughly on the hyper plane g$col <- ifelse(abs(h1 * g$x + h2 * g$y - beta) < 2/N, "red", g$col) plot(g$x, g$y, col=g$col, pch=16, cex=.5, xlab="x", ylab="y", main="h(x,y) = 3 * x + (-4) * y + 1.3 = 0") |

The conditions of a linear program are usually stated as a number of “weakly smaller than” inequalities. So lets transform (1) and (2) appropriately:

The conditions and can be written as and . This is because we are dealing with finite sets and , so if we have a separating plane, then we can always fit in an such that and . If we now multiply both inequalities with , then we just end up with a different formulation for the same plane . The first inequality we additionally multiply with to turn into – and we and now we have:

(3)

(4)

Okay great – we’re almost there – now let’s get all the variables on the left hand side:

(5)

(6)

These hyper-plane conditions have to be true for all points. All points in have to fulfil (5) and all points in have to fulfil (6). Then this set of inequalities describes the convex set in of all possible separating hyper planes . Usually the description and purpose of a linear program does not stop at this point and the set of feasible solutions is used to maximize an objective function. In our case such an objective function might be introduced to maximize the distance of the plane from the points. But the article is long enough already and our objective is to just find **a** plane and not the the best plane. Which is why our objective function is going to be simply the

So now we formulate (5) and (6) in matrix notation because this is how LP solvers expect the program description to be fed to them – we get:

with

, ,

# Example with Rglpk in two dimensions

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 |
library(Rglpk) dim <- 2 N1 <- 3 N2 <- 3 # the points of sets A and B P <- matrix( runif(dim*N1 + dim*N2,0,1), ncol=dim,byrow=T ) # the matrix A defining the lhs of the conditions A <- cbind(P * c(rep(-1,N1),rep(1,N2)), c(rep(1,N1),rep(-1,N2))) # the objective function - no optimization necessary obj <- rep(0, dim+1) # the vector b defining the rhs of the conditions b <- rep(-1, N1+N2) # by default GLPK assums positive boundaries for the # variables. but we need the full set of real numbers. bounds <- list( lower = list(ind = 1:(dim+1), val = rep(-Inf, dim+1)), upper = list(ind = 1:(dim+1), val = rep(Inf, dim+1)) ) # solving the linear program s <- Rglpk_solve_LP(obj, A, rep("<=", N1+N2), b, bounds=bounds) plot(P,col=c(rep("red",N1),rep("blue",N2)), xlab="x", ylab="y", cex=1, pch=16, xlim=c(0,1), ylim=c(0,1)) # status 0 means that a solution was found if(s$status == 0) { h1 = s$solution[1] h2 = s$solution[2] beta = s$solution[3] # drawing the separating line if(h2 != 0) { abline(beta/h2,-h1/h2) } else { abline(v=-beta/h1) } } else { cat("Not linearly separable.") } |

In case you are wondering how I managed to include all those pretty pretty math formulas in this post – I am using the QuickLaTeX WordPress plug-in and I must say I really like the result. In previous posts I used a LaTeX web editor and then included the rendered formulas as an image.

(original article published on www.joyofdata.de)

Could you explain the pros and cons of this method vs. using LDA?

As far as I understand LDA, it is a method to statistically model the relationship between independent and dependent variables. That renders it useful for a different use case as in this article the focus is just on determining separability. Further insight into the model necessarily comes with an increase of computational effort.