Sunday 14 August 2016

Setting up solver.prototxt for caffe model training

The following contains some initial notes on setting up the solver.prototxt file. These notes were prepared to help a few people get started with training a Caffe model and are provided here for archival purposes. They cover the very first things "to know" about setting up the solver.prototxt file and what kind of training output to expect.

Further reading is provided here: https://github.com/BVLC/caffe/wiki/Solver-Prototxt
And, of course, in the comments in caffe.proto.

Caffe will print the output of the training to a log file or the console, depending on what you specify.

I usually send the output both to the console and to a file, like this:
[desired training script] 2>&1 | tee logfile.txt

For instance, I would run:
python solve.py 2>&1 | tee logfile.txt

Where solve.py is a Python script that loads some pre-trained model weights, sets up new layers (for transfer learning), and initializes the solver using the solver.prototxt file. In the rest of this post I'll discuss the various variables and settings in the solver.prototxt file.
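As a rough sketch, such a script might look something like the following. The file names (solver.prototxt, pretrained.caffemodel), the GPU device, and the number of iterations are placeholders for your own setup, and the details of defining the new layers are omitted:

import caffe

caffe.set_device(0)   # use GPU 0; use caffe.set_mode_cpu() instead if needed
caffe.set_mode_gpu()

# build the solver (and its train/test nets) from solver.prototxt
solver = caffe.SGDSolver('solver.prototxt')

# copy weights from a pre-trained model into layers with matching names;
# layers that are new to this net keep their fresh initialization
solver.net.copy_from('pretrained.caffemodel')

# run training; the number of iterations here is just an example
solver.step(100000)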

The training log output will look something like this:

I0621 16:14:39.496878 20752 solver.cpp:228] Iteration 0, loss = 163741
I0621 16:14:39.496912 20752 solver.cpp:244] Train net output #0: loss = 161660 (* 1 = 161660 loss)
I0621 16:14:39.496918 20752 sgd_solver.cpp:106] Iteration 0, lr = 0.001
I0621 16:15:25.024097 20752 solver.cpp:228] Iteration 20, loss = 4.84049e+09
I0621 16:15:25.024127 20752 solver.cpp:244] Train net output #0: loss = 3.70536e+09 (* 1 = 3.70536e+09 loss)
I0621 16:15:25.024132 20752 sgd_solver.cpp:106] Iteration 20, lr = 0.001
I0621 16:16:12.229852 20752 solver.cpp:228] Iteration 40, loss = 2.78827e+09
I0621 16:16:12.229883 20752 solver.cpp:244] Train net output #0: loss = 1.38314e+09 (* 1 = 1.38314e+09 loss)

Notice that this output is printed every 20 iterations, because I specified display: 20. Two losses are reported each time: the loss on the "Iteration N, loss = ..." line is averaged over the last 20 iterations, because I also specified average_loss: 20, while the "Train net output #0" line reports the loss of the current batch.
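In solver.prototxt those two settings are simply:

display: 20
average_loss: 20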

Every test_interval training iterations, test_iter x batch_size images are fetched for validation.
The batch_size here is the one set in the data layer of the test net definition; if it is not specified, it is assumed to be 1.
Depending on how often you evaluate (test_interval) and how much memory you have, you may want to sample more or fewer validation images. For FCN, batch_size is 1.
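In solver.prototxt the corresponding settings might look like this (the numbers are only an example; with a batch size of 1, test_iter: 500 means 500 validation images are evaluated at each test phase):

test_iter: 500
test_interval: 200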

In the caffe log output, you will see something like this:
I0621 16:22:30.196046 20752 solver.cpp:337] Iteration 200, Testing net (#0)

I0621 16:22:56.249928 20752 solver.cpp:404]     Test net output #0: loss = 1.3426e+09 (* 1 = 1.3426e+09 loss)

If my test_interval is set to be 200, then I will see this output every 200 iterations.

Notice that the learning rate (lr) is also printed out. It starts at base_lr and then follows the lr_policy. For instance, if lr_policy: "fixed", the lr remains constant throughout all iterations. If I set lr_policy: "step", I also need to specify stepsize and gamma: the lr is multiplied by gamma every stepsize iterations. Decreasing the learning rate over time can help the network converge by letting it stay near minima once they are found (with the natural possibility of getting stuck in bad local minima). A stepsize that is too small might not allow the "problem landscape" to be explored enough, while a stepsize that is too large will make the network take longer to converge.
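For example (the values are made up for illustration), a step schedule could be written like this; with these settings the lr is 0.001 for iterations 0-9999, 0.0001 for iterations 10000-19999, 0.00001 from iteration 20000 on, and so forth:

base_lr: 0.001
lr_policy: "step"
gamma: 0.1
stepsize: 10000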

I typically keep momentum and weight_decay at the fairly standard values of momentum: 0.9 and weight_decay: 0.0005. These control how the weights are updated and regularized across iterations.
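Roughly speaking, the update that Caffe's plain SGD solver applies each iteration looks like this (written informally; W are the weights, V the momentum history, lr the current learning rate):

V_new = momentum * V + lr * (gradient + weight_decay * W)
W_new = W - V_new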

You can snapshot your network every snapshot iterations. The files are named using snapshot_prefix, to which _iter_[iternum].caffemodel (and a matching .solverstate) is appended during training. snapshot_prefix can contain a full path as well as a prefix name. Snapshot as frequently or as rarely as needed: more frequently if you foresee wanting to restart the model from some iteration (for retraining, or because of possible crashes), but keep in mind that snapshotting too frequently leaves a lot of bulky files that can hog your disk space.
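In solver.prototxt this might look like the following (the path and interval are placeholders):

snapshot: 4000
snapshot_prefix: "/path/to/snapshots/mymodel"

With those settings, files such as mymodel_iter_4000.caffemodel and mymodel_iter_4000.solverstate would be written every 4000 iterations.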

You can then resume training from any snapshot by loading the corresponding caffemodel (which contains the model parameters) and solverstate (which contains the solver state, such as the iteration count and momentum history).
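For example (the file names simply follow the placeholder prefix and naming from above), with the command-line tool this would be:

caffe train --solver=solver.prototxt --snapshot=/path/to/snapshots/mymodel_iter_8000.solverstate

or, from pycaffe, by calling solver.restore('/path/to/snapshots/mymodel_iter_8000.solverstate') on an existing solver.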