Caffe is an open-source deep learning framework originally created by Yangqing Jia which allows you to leverage your GPU for training neural networks. Unlike with other deep learning frameworks such as Theano or Torch, you don’t have to program the algorithms yourself; instead you specify your network by means of configuration files. This approach is less time-consuming than programming everything on your own, but it also forces you to stay within the boundaries of the framework. In practice this won’t matter most of the time, as the framework Caffe provides is quite powerful and under continuous development.
The subject of this article is the composition of a multi-layer feed-forward network. This model will be trained on data from the “Otto Group Product Classification Challenge” at Kaggle. We’ll also take a look at applying the model to new data, and finally you’ll see how to visualize the network graph and the trained weights. I won’t explain all the details, as this would bloat the text beyond a bearable scale. Also, if you are like me, straightforward code says more than a thousand words. For the programmatic details, check out this IPython Notebook; here I will focus on describing the concepts and some of the stumbling blocks I encountered.
Setting Up
Most likely you don’t have Caffe installed on your system yet. If you do, good for you. If not, I recommend working on an EC2 instance that allows GPU processing, for example the g2.2xlarge instance. For instructions on how to work with EC2, have a look at Guide to EC2 from the Command Line, and for setting up Caffe and its prerequisites work through GPU Powered DeepLearning with NVIDIA DIGITS on EC2. For playing around with Caffe I also recommend installing IPython Notebook on your instance; you’ll find the instructions for this here.
Defining the Model and Meta-Parameters
Training a model and applying it requires at least three configuration files. The format of those configuration files follows an interface description language called protocol buffers. It superficially resembles JSON but is significantly different and is intended to replace it in use cases where the data document needs to be validatable (by means of a custom schema, like this one for Caffe) and serializable.
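To make the difference from JSON a bit more tangible, here is a minimal Python sketch that renders a nested dictionary in the protocol buffers text format. The helper to_prototxt is hypothetical (it is part of neither Caffe nor protobuf) and only handles strings, numbers, nested messages and repeated fields; unquoted enum values and booleans are deliberately left out.

```python
def to_prototxt(message, indent=0):
    """Render a dict as protobuf text format (illustrative helper, not a Caffe API)."""
    pad = "  " * indent
    lines = []
    for key, value in message.items():
        # A list stands for a repeated field: the key is written once per item.
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                # Nested messages use braces, with no colon and no commas.
                lines.append("%s%s {" % (pad, key))
                lines.append(to_prototxt(item, indent + 1))
                lines.append("%s}" % pad)
            elif isinstance(item, str):
                lines.append('%s%s: "%s"' % (pad, key, item))
            else:
                lines.append("%s%s: %s" % (pad, key, item))
    return "\n".join(lines)


layer = {
    "layer": {
        "name": "accuracy",
        "type": "Accuracy",
        "bottom": ["ip2", "label"],  # repeated field
        "top": "accuracy",
    }
}
print(to_prototxt(layer))
```

Note how keys are not quoted, entries are not separated by commas and a repeated field such as bottom simply shows up several times.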
For training you need one prototxt file holding the meta-parameters of the training and the model (config.prototxt) and another one defining the graph of the network (model_train_test.prototxt), which connects the layers in a directed and acyclic fashion. Note that the data flows from bottom to top with regard to the order in which the layers are specified. The example network here is composed of five layers:
- (1) data layer (one for TRAINing and one for TESTing)
- (2) inner product layer (the weights I)
- (3) rectified linear units (the hidden layer)
- (4) inner product layer (the weights II)
- (5) output layer (Soft Max for classification)
  - (5A) soft max layer giving the loss
  - (5B) accuracy layer, so we can see how the network improves while training.
The following excerpt from model_train_test.prototxt shows layer (4) and the accuracy layer (5B):
```
[...]
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  inner_product_param {
    num_output: 9
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "ip2"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}
[...]
```
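What these layers compute can be sketched in plain Python. This is a toy forward pass and not Caffe’s implementation: the weights are made-up numbers, the network is shrunk to 3 inputs, 2 hidden units and 2 classes, and the function names are mine.

```python
import math


def inner_product(x, weights, bias):
    """InnerProduct layer: one weighted sum per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    """ReLU layer: negative activations are clipped to zero."""
    return [max(0.0, v) for v in x]


def softmax(x):
    """Softmax layer: turns scores into probabilities that sum to 1."""
    peak = max(x)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(v - peak) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


# Toy parameters (assumed values; the real network maps 93 features to 9 classes).
w1, b1 = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]], [0.0, 0.0]
w2, b2 = [[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]

x = [1.0, 2.0, 3.0]
hidden = relu(inner_product(x, w1, b1))       # first inner product followed by ReLU
scores = inner_product(hidden, w2, b2)        # second inner product ("ip2" in the excerpt)
probs = softmax(scores)                       # class probabilities

label = 1
loss = -math.log(probs[label])                # what a softmax loss layer reports
correct = scores.index(max(scores)) == label  # what an accuracy layer counts
print(probs, loss, correct)
```

The accuracy layer in the excerpt takes the same two bottoms as the last line of the sketch: the scores from ip2 and the label.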
The third prototxt file (model_prod.prototxt) specifies the network used for applying the model. In this case it is mostly congruent with the specification for training, but it lacks the data layers (as we don’t read data from a data source in production) and the Soft Max layer won’t yield a loss value but classification probabilities. The accuracy layer is gone as well. Note also that, at the beginning, we now specify the input dimensions (as expected: 1,93,1,1). It is certainly confusing that all four dimensions are referred to as input_dim, that only the order defines which is which and that no explicit context is specified.
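Since only the position tells the four input_dim entries apart, it can help to generate that header from named values. A small sketch, assuming the input / input_dim syntax described above; the helper name input_header and the input name "data" are my assumptions.

```python
def input_header(name, batch_size, channels, height, width):
    """Build the input declaration; the order N, C, H, W gives each line its meaning."""
    dims = [batch_size, channels, height, width]
    lines = ['input: "%s"' % name]
    lines += ["input_dim: %d" % d for d in dims]
    return "\n".join(lines)


# One case at a time, 93 features stored as channels, height and width of 1.
print(input_header("data", batch_size=1, channels=93, height=1, width=1))
```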
Supported Data Sources
This is one of the first mental obstacles to overcome when trying to get started with Caffe. It is not as simple as handing the caffe executable some CSV file and letting it have its way with it. In practice, for non-image data, you have three options: HDF5, LMDB and LevelDB.
HDF5 is probably the easiest to use because you simply have to store the data sets in files using the HDF5 format. LMDB and LevelDB are databases, so you’ll have to go by their protocol. The size of a data set stored as HDF5 will be limited by your memory, which is why I discarded it. The choice between LMDB and LevelDB was rather arbitrary: LMDB seemed more powerful, faster and more mature judging from the sources I skimmed over. Then again, LevelDB seems more actively maintained judging from its GitHub repo, and it also has a larger Google and Stack Overflow footprint.
Blobs and Datums
Caffe internally works with a data structure called a blob, which is used to pass data forward and gradients backward. It’s a four-dimensional array whose dimensions are referred to as:
- N or batch_size
- channels
- height
- width
This is relevant to us because we’ll have to shape our cases into this structure before we can store them in LMDB, from where they are fed directly to Caffe. The shape is straightforward for images, where a batch of 64 images each defined by 100×200 RGB pixels would end up as an array shaped (64, 3, 200, 100). For a batch of 64 feature vectors each of length 93 the blob’s shape is (64, 93, 1, 1).
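As a sketch of that reshaping in plain Python (nested lists stand in for the arrays you would normally use, the data is made up and the helper names are mine):

```python
def to_blob(batch):
    """Wrap each feature in a 1x1 'image' so a batch becomes (N, channels, 1, 1)."""
    return [[[[value]] for value in case] for case in batch]


def shape(nested):
    """Return the dimensions of a regularly nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


# 64 made-up cases with 93 features each.
batch = [[float(i + j) for j in range(93)] for i in range(64)]
blob = to_blob(batch)

print(shape(batch))  # (64, 93)
print(shape(blob))   # (64, 93, 1, 1)
```

With NumPy the same step is a single reshape to (64, 93, 1, 1).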
Under “Load Data into LMDB” you can see that the individual cases or feature vectors are stored in Datum objects. Integer-valued features are stored (as a byte string) in data, float-valued features in float_data. In the beginning I made the mistake of assigning float-valued features to data, which caused the model to not learn anything. Before storing the Datum in LMDB you have to serialize the object into a byte string representation.
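A minimal sketch of that step, assuming the lmdb Python package and pycaffe are installed. The function names, the zero-padded key scheme and the map_size default are my choices for illustration and are not taken from the notebook.

```python
def make_key(index):
    """LMDB keeps keys in byte order, so zero-padding preserves the insertion order."""
    return "{:08d}".format(index).encode("ascii")


def write_lmdb(path, features, labels, map_size=1 << 30):
    """Store float feature vectors as serialized Datum objects in an LMDB database."""
    # Imported here so that make_key can be used without Caffe being installed.
    import lmdb
    from caffe.proto import caffe_pb2

    # map_size is the maximum size the database may grow to, in bytes (assumed: 1 GiB).
    env = lmdb.open(path, map_size=map_size)
    with env.begin(write=True) as txn:
        for i, (row, label) in enumerate(zip(features, labels)):
            datum = caffe_pb2.Datum()
            datum.channels = len(row)  # the features take the place of the channels
            datum.height = 1
            datum.width = 1
            datum.label = int(label)
            # Float features belong in float_data; data is meant for byte strings.
            datum.float_data.extend([float(v) for v in row])
            # LMDB stores bytes, hence the serialization of the Datum.
            txn.put(make_key(i), datum.SerializeToString())
    env.close()
```

It would be called along the lines of write_lmdb("train_lmdb", X, y), where the path and the variables X and y are placeholders for your own data.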
Bottom Line
Getting a grip on Caffe was a surprisingly non-linear experience for me. By that I mean there is no single entry point and no continuous learning path that leads you to a good understanding of the system. The information required to do something useful with Caffe is spread across many different tutorial sections, source code on GitHub, IPython notebooks and forum threads. This is why I took the time to compose this tutorial and its accompanying code, following my maxim to summarize what I learned into a text I would have liked to read myself in the beginning.
I think Caffe has a bright future ahead, provided it will not just grow horizontally by adding new features but also vertically by refactoring and improving the overall user experience. It’s definitely a great tool for high-performance deep learning. In case you want to do image processing with convolutional neural networks, I recommend you take a look at NVIDIA DIGITS, which offers a comfortable GUI for that purpose.
(original article published on www.joyofdata.de)