Linear SVM Classifier

We will use the e1071 package, that contains the svm function to run our codes

#install.packages("e1071")
require(e1071)
Loading required package: e1071

Creating a low dimensionion dataset, and run SVM classifier on it. Variable x - 20 observations across two classes (normally distributed) Variable y - 20 variables, 10 with -1 and 10 with +1

set.seed(10111)
x <- matrix(rnorm(40), 20, 2)
y <- rep(c(-1,1), c(10,10))
#For the indices where y = 1, we move the means from 0 to 1 for the x
x[y==1,] <- x[y==1,] + 1
plot(x, col = y + 3, pch = 19)

Creating a dataframe of the created dataset and running a linear SVM

#Turning y into a factor variable
dat <- data.frame(x, y = as.factor(y))
#Calling SVM function, y as response. Linear kernal. Cost parameter is 10
#We are not standardizing the variables
svm_fit <- svm(y~., data = dat, kernel = "linear", cost = 10, scale = FALSE)
print(svm_fit)

Call:
svm(formula = y ~ ., data = dat, kernel = "linear", cost = 10, 
    scale = FALSE)


Parameters:
   SVM-Type:  C-classification 
 SVM-Kernel:  linear 
       cost:  10 

Number of Support Vectors:  6

The above linear SVM fit has 6 support vectors (points on the boundary + on the wrong side of the boundary)

Plotting the svm fit. This is a simple plot, that shows the points, the regions and the SV points. But it flips the axis and is a little crude


Creating our own plot to display SVM results

  1. Create our own grid of values from X1 and X2 (columns of the data we created)
  2. Predict the class for each point
  3. Plot them color coded by the class to see the decision boundary
make_grid <- function(x, n = 75){
  grange <- apply(x, 2, range)
  x1 <- seq(from = grange[1,1], to = grange[2,1], length = n)
  x2 <- seq(from = grange[1,2], to = grange[2,2], length = n)
  expand.grid(X1 = x1, X2 = x2)
}

Plotting our prediction and datapoints

  1. Plot all the points on xgrid, then colour by the class so that we can clearly see the decision boundry
  2. Plot the original points on the decision boundry
  3. SVM fit object has an Index component that has the index of the support vectors, which we plot and highlight
plot(xgrid, col = c("red", "blue")[as.numeric(ygrid)], pch = 20, cex = .2)
points(x,col=y+3,pch=19)
points(x[svm_fit$index,],pch=5,cex=2)


The linear coefficients is not directly available in the SVM fit object, but can be derived from the components of the object. This coefficient makes sense for a linear kernel.

We extract the linear coefficients, and then using simple algebra, we include the decision boundary and the two margins.

  1. The SVM fit function has the “coefs” for the support vectors. (It is 0 for the other points as only the support vectors are instrumental in forming the hyperplane)

The formula is present in ESL

Replotting the points and using the coefficients to draw the decision boundaries

The equation is of the form - beta0 + beta1x1 + beta2x2 = 0

Figure out the slope and the intercept of the decision boundary above. abline uses the intercept and the slope

plot(xgrid,col=c("red","blue")[as.numeric(ygrid)],pch=20,cex=.2)
points(x,col=y+3,pch=19)
points(x[svm_fit$index,],pch=5,cex=2)
abline(beta0/beta[2],-beta[1]/beta[2])
abline((beta0-1)/beta[2],-beta[1]/beta[2],lty=2)
abline((beta0+1)/beta[2],-beta[1]/beta[2],lty=2)

We can use cross-validation to select the right cost


Non linear SVM Classifier

Taking example from ESL (mixture data, which is simulated), where the decision boundry was non-linear

names(ESL.mixture)
[1] "x"        "y"        "xnew"     "prob"     "marginal" "px1"     
[7] "px2"      "means"   

Plotting the data and viewing the classes

There is a good amount of overlap in the data.

Creating a dataframe of the data and fitting a SVM with radial kernerl to it

print(radial_svm_fit)

Call:
svm(formula = factor(y) ~ ., data = dat, kernel = "radial", 
    cost = 5, scale = FALSE)


Parameters:
   SVM-Type:  C-classification 
 SVM-Kernel:  radial 
       cost:  5 

Number of Support Vectors:  103

Creating a grid such as before and making predictions on the grid. The ESL.mixture dataset comes with the grid as a component, and so we can use that directly. Else we will have to use the function created earlier.

xgrid <- expand.grid(X1=px1,X2=px2)
ygrid <- predict(radial_svm_fit, xgrid)

Plotting the grid and the predictions

The decision boundary is not linear.

Creating the actual decision boundary and plotting it using the contour function.

The ESL.mixture has a prob component, that gives the true probability of a +1 vs -1 at every value of the grid.

So, we will plot the contour at 0.5, which is the true decision boundary of the data. And the predict the decision boundary from our model (the function value itself and not the class output)

Plotting the above decision values

plot(xgrid,col=as.numeric(ygrid),pch=20,cex=.2)
points(x,col=y+1,pch=19)
contour(px1,px2,matrix(func,69,99),level=0,add=TRUE)
contour(px1,px2,matrix(prob,69,99),level=.5,add=TRUE,col="blue",lwd=2)


LS0tDQp0aXRsZTogIlIgTm90ZWJvb2siDQpvdXRwdXQ6IGh0bWxfbm90ZWJvb2sNCi0tLQ0KDQojIyNMaW5lYXIgU1ZNIENsYXNzaWZpZXINCg0KV2Ugd2lsbCB1c2UgdGhlIGUxMDcxIHBhY2thZ2UsIHRoYXQgY29udGFpbnMgdGhlIHN2bSBmdW5jdGlvbiB0byBydW4gb3VyIGNvZGVzDQoNCmBgYHtyfQ0KI2luc3RhbGwucGFja2FnZXMoImUxMDcxIikNCnJlcXVpcmUoZTEwNzEpDQpgYGANCg0KDQpDcmVhdGluZyBhIGxvdyBkaW1lbnNpb25pb24gZGF0YXNldCwgYW5kIHJ1biBTVk0gY2xhc3NpZmllciBvbiBpdC4NClZhcmlhYmxlIHggLSAyMCBvYnNlcnZhdGlvbnMgYWNyb3NzIHR3byBjbGFzc2VzIChub3JtYWxseSBkaXN0cmlidXRlZCkNClZhcmlhYmxlIHkgLSAyMCB2YXJpYWJsZXMsIDEwIHdpdGggLTEgYW5kIDEwIHdpdGggKzENCmBgYHtyfQ0Kc2V0LnNlZWQoMTAxMTEpDQp4IDwtIG1hdHJpeChybm9ybSg0MCksIDIwLCAyKQ0KeSA8LSByZXAoYygtMSwxKSwgYygxMCwxMCkpDQoNCiNGb3IgdGhlIGluZGljZXMgd2hlcmUgeSA9IDEsIHdlIG1vdmUgdGhlIG1lYW5zIGZyb20gMCB0byAxIGZvciB0aGUgeA0KI1RoaXMgaXMgc28gdGhhdCB3ZSBjYW4gZ2V0IHRoZSBkYXRhc2V0IGEgbGl0dGxlIHNlcGFyYXRlZA0KeFt5PT0xLF0gPC0geFt5PT0xLF0gKyAxDQoNCnBsb3QoeCwgY29sID0geSArIDMsIHBjaCA9IDE5KQ0KYGBgDQoNCg0KQ3JlYXRpbmcgYSBkYXRhZnJhbWUgb2YgdGhlIGNyZWF0ZWQgZGF0YXNldCBhbmQgcnVubmluZyBhIGxpbmVhciBTVk0NCg0KYGBge3J9DQoNCiNUdXJuaW5nIHkgaW50byBhIGZhY3RvciB2YXJpYWJsZQ0KZGF0IDwtIGRhdGEuZnJhbWUoeCwgeSA9IGFzLmZhY3Rvcih5KSkNCg0KI0NhbGxpbmcgU1ZNIGZ1bmN0aW9uLCB5IGFzIHJlc3BvbnNlLiBMaW5lYXIga2VybmFsLiBDb3N0IHBhcmFtZXRlciBpcyAxMA0KI1dlIGFyZSBub3Qgc3RhbmRhcmRpemluZyB0aGUgdmFyaWFibGVzDQpzdm1fZml0IDwtIHN2bSh5fi4sIGRhdGEgPSBkYXQsIGtlcm5lbCA9ICJsaW5lYXIiLCBjb3N0ID0gMTAsIHNjYWxlID0gRkFMU0UpDQoNCnByaW50KHN2bV9maXQpDQpgYGANCg0KVGhlIGFib3ZlIGxpbmVhciBTVk0gZml0IGhhcyA2IHN1cHBvcnQgdmVjdG9ycyAocG9pbnRzIG9uIHRoZSBib3VuZGFyeSArIG9uIHRoZSB3cm9uZyBzaWRlIG9mIHRoZSBib3VuZGFyeSkNCg0KUGxvdHRpbmcgdGhlIHN2bSBmaXQuIFRoaXMgaXMgYSBzaW1wbGUgcGxvdCwgdGhhdCBzaG93cyB0aGUgcG9pbnRzLCB0aGUgcmVnaW9ucyBhbmQgdGhlIFNWIHBvaW50cy4gQnV0IGl0IGZsaXBzIHRoZSBheGlzIGFuZCBpcyBhIGxpdHRsZSBjcnVkZQ0KDQpgYGB7cn0NCnBsb3Qoc3ZtX2ZpdCwgZGF0KQ0KYGBgDQoNCi0tLQ0KDQpDcmVhdGluZyBvdXIgb3duIHBsb3QgdG8gZGlzcGxheSBTVk0gcmVzdWx0cw0KDQoxKSBDcmVhdGUgb3VyIG93biBncmlkIG9mIHZhbHVlcyBmcm9tIFgxIGFuZCBYMiAoY29sdW1ucyBvZiB0aGUgZGF0YSB3ZSBjcmVhdGVkKQ0KMikgUHJlZGljdCB0aGUgY2xhc3MgZm9yIGVhY2ggcG9pbnQNCjMpIFBsb3QgdGhlbSBjb2xvciBjb2RlZCBieSB0aGUgY2xhc3MgdG8gc2VlIHRoZSBkZWNpc2lvbiBib3VuZGFyeQ0KDQpgYGB7cn0NCiNGdW5jdGlvbiBjcmVhdGluZyBhIGdyaWQsIHdpdGggbnVtYmVyIG9mIHBvaW50cyBpbiBlYWNoIGRpcmVjdGlvbg0KbWFrZV9ncmlkIDwtIGZ1bmN0aW9uKHgsIG4gPSA3NSl7DQogIGdyYW5nZSA8LSBhcHBseSh4LCAyLCByYW5nZSkNCiAgeDEgPC0gc2VxKGZyb20gPSBncmFuZ2VbMSwxXSwgdG8gPSBncmFuZ2VbMiwxXSwgbGVuZ3RoID0gbikNCiAgeDIgPC0gc2VxKGZyb20gPSBncmFuZ2VbMSwyXSwgdG8gPSBncmFuZ2VbMiwyXSwgbGVuZ3RoID0gbikNCiAgZXhwYW5kLmdyaWQoWDEgPSB4MSwgWDIgPSB4MikNCn0NCg0KeGdyaWQgPC0gbWFrZV9ncmlkKHgpDQp5Z3JpZCA8LSBwcmVkaWN0KHN2bV9maXQsIHhncmlkKQ0KDQpgYGANCg0KDQpQbG90dGluZyBvdXIgcHJlZGljdGlvbiBhbmQgZGF0YXBvaW50cw0KDQoxKSBQbG90IGFsbCB0aGUgcG9pbnRzIG9uIHhncmlkLCB0aGVuIGNvbG91ciBieSB0aGUgY2xhc3Mgc28gdGhhdCB3ZSBjYW4gY2xlYXJseSBzZWUgdGhlIGRlY2lzaW9uIGJvdW5kcnkNCjIpIFBsb3QgdGhlIG9yaWdpbmFsIHBvaW50cyBvbiB0aGUgZGVjaXNpb24gYm91bmRyeQ0KMykgU1ZNIGZpdCBvYmplY3QgaGFzIGFuIEluZGV4IGNvbXBvbmVudCB0aGF0IGhhcyB0aGUgaW5kZXggb2YgdGhlIHN1cHBvcnQgdmVjdG9ycywgd2hpY2ggd2UgcGxvdCBhbmQgaGlnaGxpZ2h0DQoNCmBgYHtyfQ0KcGxvdCh4Z3JpZCwgY29sID0gYygicmVkIiwgImJsdWUiKVthcy5udW1lcmljKHlncmlkKV0sIHBjaCA9IDIwLCBjZXggPSAuMikNCnBvaW50cyh4LGNvbD15KzMscGNoPTE5KQ0KcG9pbnRzKHhbc3ZtX2ZpdCRpbmRleCxdLHBjaD01LGNleD0yKQ0KDQpgYGANCg0KLS0tDQoNClRoZSBsaW5lYXIgY29lZmZpY2llbnRzIGlzIG5vdCBkaXJlY3RseSBhdmFpbGFibGUgaW4gdGhlIFNWTSBmaXQgb2JqZWN0LCBidXQgY2FuIGJlIGRlcml2ZWQgZnJvbSB0aGUgY29tcG9uZW50cyBvZiB0aGUgb2JqZWN0LiBUaGlzIGNvZWZmaWNpZW50IG1ha2VzIHNlbnNlIGZvciBhIGxpbmVhciBrZXJuZWwuDQoNCldlIGV4dHJhY3QgdGhlIGxpbmVhciBjb2VmZmljaWVudHMsIGFuZCB0aGVuIHVzaW5nIHNpbXBsZSBhbGdlYnJhLCB3ZSBpbmNsdWRlIHRoZSBkZWNpc2lvbiBib3VuZGFyeSBhbmQgdGhlIHR3byBtYXJnaW5zLg0KDQoxKSBUaGUgU1ZNIGZpdCBmdW5jdGlvbiBoYXMgdGhlICJjb2VmcyIgZm9yIHRoZSBzdXBwb3J0IHZlY3RvcnMuIChJdCBpcyAwIGZvciB0aGUgb3RoZXIgcG9pbnRzIGFzIG9ubHkgdGhlIHN1cHBvcnQgdmVjdG9ycyBhcmUgaW5zdHJ1bWVudGFsIGluIGZvcm1pbmcgdGhlIGh5cGVycGxhbmUpDQoNClRoZSBmb3JtdWxhIGlzIHByZXNlbnQgaW4gRVNMDQpgYGB7cn0NCmJldGEgPC0gZHJvcCh0KHN2bV9maXQkY29lZnMpICUqJSB4W3N2bV9maXQkaW5kZXgsXSkNCmJldGEwIDwtIHN2bV9maXQkcmhvDQpgYGANCg0KDQpSZXBsb3R0aW5nIHRoZSBwb2ludHMgYW5kIHVzaW5nIHRoZSBjb2VmZmljaWVudHMgdG8gZHJhdyB0aGUgZGVjaXNpb24gYm91bmRhcmllcw0KDQpUaGUgZXF1YXRpb24gaXMgb2YgdGhlIGZvcm0gLQ0KYmV0YTAgKyBiZXRhMSp4MSArIGJldGEyKngyID0gMA0KDQpGaWd1cmUgb3V0IHRoZSBzbG9wZSBhbmQgdGhlIGludGVyY2VwdCBvZiB0aGUgZGVjaXNpb24gYm91bmRhcnkgYWJvdmUuIGFibGluZSB1c2VzIHRoZSBpbnRlcmNlcHQgYW5kIHRoZSBzbG9wZQ0KDQpgYGB7cn0NCnBsb3QoeGdyaWQsY29sPWMoInJlZCIsImJsdWUiKVthcy5udW1lcmljKHlncmlkKV0scGNoPTIwLGNleD0uMikNCnBvaW50cyh4LGNvbD15KzMscGNoPTE5KQ0KcG9pbnRzKHhbc3ZtX2ZpdCRpbmRleCxdLHBjaD01LGNleD0yKQ0KYWJsaW5lKGJldGEwL2JldGFbMl0sLWJldGFbMV0vYmV0YVsyXSkNCmFibGluZSgoYmV0YTAtMSkvYmV0YVsyXSwtYmV0YVsxXS9iZXRhWzJdLGx0eT0yKQ0KYWJsaW5lKChiZXRhMCsxKS9iZXRhWzJdLC1iZXRhWzFdL2JldGFbMl0sbHR5PTIpDQoNCmBgYA0KDQoNCldlIGNhbiB1c2UgY3Jvc3MtdmFsaWRhdGlvbiB0byBzZWxlY3QgdGhlIHJpZ2h0IGNvc3QNCg0KLS0tDQoNCiMjI05vbiBsaW5lYXIgU1ZNIENsYXNzaWZpZXINCg0KVGFraW5nIGV4YW1wbGUgZnJvbSBFU0wgKG1peHR1cmUgZGF0YSwgd2hpY2ggaXMgc2ltdWxhdGVkKSwgd2hlcmUgdGhlIGRlY2lzaW9uIGJvdW5kcnkgd2FzIG5vbi1saW5lYXINCg0KYGBge3J9DQpsb2FkKHVybCgiaHR0cDovL3d3dy5zdGFuZm9yZC5lZHUvfmhhc3RpZS9FbGVtU3RhdExlYXJuL2RhdGFzZXRzL0VTTC5taXh0dXJlLnJkYSIpKQ0KbmFtZXMoRVNMLm1peHR1cmUpDQphdHRhY2goRVNMLm1peHR1cmUpDQpgYGANCg0KUGxvdHRpbmcgdGhlIGRhdGEgYW5kIHZpZXdpbmcgdGhlIGNsYXNzZXMNCg0KYGBge3J9DQpwbG90KHgsIGNvbCA9IHkgKyAxKQ0KYGBgDQoNClRoZXJlIGlzIGEgZ29vZCBhbW91bnQgb2Ygb3ZlcmxhcCBpbiB0aGUgZGF0YS4NCg0KQ3JlYXRpbmcgYSBkYXRhZnJhbWUgb2YgdGhlIGRhdGEgYW5kIGZpdHRpbmcgYSBTVk0gd2l0aCByYWRpYWwga2VybmVybCB0byBpdA0KDQpgYGB7cn0NCmRhdCA8LSBkYXRhLmZyYW1lKHkgPSBmYWN0b3IoeSksIHgpDQpyYWRpYWxfc3ZtX2ZpdCA8LSBzdm0oZmFjdG9yKHkpfi4sIGRhdGEgPSBkYXQsIHNjYWxlID0gRkFMU0UsIGtlcm5lbCA9ICJyYWRpYWwiLCBjb3N0ID0gNSkNCnByaW50KHJhZGlhbF9zdm1fZml0KQ0KYGBgDQoNCkNyZWF0aW5nIGEgZ3JpZCBzdWNoIGFzIGJlZm9yZSBhbmQgbWFraW5nIHByZWRpY3Rpb25zIG9uIHRoZSBncmlkLg0KVGhlIEVTTC5taXh0dXJlIGRhdGFzZXQgY29tZXMgd2l0aCB0aGUgZ3JpZCBhcyBhIGNvbXBvbmVudCwgYW5kIHNvIHdlIGNhbiB1c2UgdGhhdCBkaXJlY3RseS4gRWxzZSB3ZSB3aWxsIGhhdmUgdG8gdXNlIHRoZSBmdW5jdGlvbiBjcmVhdGVkIGVhcmxpZXIuDQpgYGB7cn0NCnhncmlkIDwtIGV4cGFuZC5ncmlkKFgxPXB4MSxYMj1weDIpDQp5Z3JpZCA8LSBwcmVkaWN0KHJhZGlhbF9zdm1fZml0LCB4Z3JpZCkNCmBgYA0KDQpQbG90dGluZyB0aGUgZ3JpZCBhbmQgdGhlIHByZWRpY3Rpb25zDQoNCmBgYHtyfQ0KcGxvdCh4Z3JpZCwgY29sID0gYXMubnVtZXJpYyh5Z3JpZCksIHBjaCA9IDIwLCBjZXggPSAuMikNCnBvaW50cyh4LCBjb2wgPSB5KzEsIHBjaCA9IDE5KQ0KYGBgDQoNClRoZSBkZWNpc2lvbiBib3VuZGFyeSBpcyBub3QgbGluZWFyLg0KDQpDcmVhdGluZyB0aGUgYWN0dWFsIGRlY2lzaW9uIGJvdW5kYXJ5IGFuZCBwbG90dGluZyBpdCB1c2luZyB0aGUgY29udG91ciBmdW5jdGlvbi4NCg0KVGhlIEVTTC5taXh0dXJlIGhhcyBhIHByb2IgY29tcG9uZW50LCB0aGF0IGdpdmVzIHRoZSB0cnVlIHByb2JhYmlsaXR5IG9mIGEgKzEgdnMgLTEgYXQgZXZlcnkgdmFsdWUgb2YgdGhlIGdyaWQuDQoNClNvLCB3ZSB3aWxsIHBsb3QgdGhlIGNvbnRvdXIgYXQgMC41LCB3aGljaCBpcyB0aGUgdHJ1ZSBkZWNpc2lvbiBib3VuZGFyeSBvZiB0aGUgZGF0YS4gQW5kIHRoZSBwcmVkaWN0IHRoZSBkZWNpc2lvbiBib3VuZGFyeSBmcm9tIG91ciBtb2RlbCAodGhlIGZ1bmN0aW9uIHZhbHVlIGl0c2VsZiBhbmQgbm90IHRoZSBjbGFzcyBvdXRwdXQpDQoNCmBgYHtyfQ0KI1ByZWRpY3RpbmcgZml0IG9uIHRoZSBncmlkLiBEZWNpc2lvbiB2YWx1ZXMgPSBUIGJlY2F1c2Ugd2UgbmVlZCB0aGUgdmFsdWUgb2YgdGhlIGZ1bmN0aW9uIGFuZCBub3QganVzdCB0aGUgY2xhc3NpZmljYXRpb24NCmZ1bmMgPC0gcHJlZGljdChyYWRpYWxfc3ZtX2ZpdCwgeGdyaWQsIGRlY2lzaW9uLnZhbHVlcyA9IFRSVUUpDQoNCiNUaGlzIHJldHVybnMgYW4gYXR0cmlidXRlIGFuZCB3ZSBuZWVkIHRvIHB1bGwgb2YgdGhhdA0KZnVuYyA8LSBhdHRyaWJ1dGVzKGZ1bmMpJGRlY2lzaW9uDQoNCmBgYA0KDQoNClBsb3R0aW5nIHRoZSBhYm92ZSBkZWNpc2lvbiB2YWx1ZXMNCg0KYGBge3J9DQpwbG90KHhncmlkLGNvbD1hcy5udW1lcmljKHlncmlkKSxwY2g9MjAsY2V4PS4yKQ0KcG9pbnRzKHgsY29sPXkrMSxwY2g9MTkpDQoNCg0KY29udG91cihweDEscHgyLG1hdHJpeChmdW5jLDY5LDk5KSxsZXZlbD0wLGFkZD1UUlVFKQ0KI0FkZGluZyB0aGUgdHJ1dGgsIHVzaW5nIHRoZSAwLjUgY29udG91cg0KY29udG91cihweDEscHgyLG1hdHJpeChwcm9iLDY5LDk5KSxsZXZlbD0uNSxhZGQ9VFJVRSxjb2w9ImJsdWUiLGx3ZD0yKQ0KYGBgDQoNCi0tLQ==