Neural Nets with Caffe Utilizing the GPU

network-graphCaffe is an open-source deep learning framework originally created by Yangqing Jia which allows you to leverage your GPU for training neural networks. As opposed to other deep learning frameworks like Theano or Torch you don’t have to program the algorithms yourself; instead you specify your network by means of configuration files. Obviously this approach is less time consuming than programming everything on your own, but it also forces you to stay within the boundaries of the framework, of course. Practically though this won’t matter most of the time as the framework Caffe provides is quite powerful and continuously advanced.

Continue reading

GPU Powered DeepLearning with NVIDIA DIGITS on EC2

activationsIn this tutorial I am going to show you how to set up CUDA 7, cuDNN, caffe and DIGITS on a g2.2xlarge EC2 instance (running Ubuntu 14.04 64 bit) and how to get started with DIGITS. For illustrating DIGITS’ application I use a current Kaggle competition about detecting diabetic retinopathy and its state from fluorescein angiography.

Convolutional Deep Neural Networks for Image Classification

For classification or regression on images you have two choices:

  • Feature engineering and upon that translating an image into a vector
  • Relying on a convolutional DNN to figure out the features

Continue reading

“Digit Recognizer” Challenge on Kaggle using SVM Classification

MNIST-digitsThis article is about the “Digit Recognizer” challenge on Kaggle. You are provided with two data sets. One for training: consisting of 42’000 labeled pixel vectors and one for the final benchmark: consisting of 28’000 vectors while labels are not known. The vectors are of length 784 (28×28 matrix) with values from 0 to 255 (to be interpreted as gray values) and are supposed to be classified as to what number (from 0 to 9) it represents. The classification is realized using SVMs which I implement with kernlab package in R.

Three representation of the data set

The pre- and post-processing in all cases consists of removing the unused (white – pixel value = 0) frame (rows and columns) of every matrix and finally to scale the computed feature vectors (feature-wise) to mean 0 and standard deviation 1. I gave three representations a try:

Continue reading