Sunday, 14 August 2016

Setting up solver.prototxt for caffe model training

The following are some initial notes on setting up the solver.prototxt file. These notes were prepared to help a few people get started with training a caffe model, and are provided here for archival purposes. They cover the very first things to know about setting up solver.prototxt and what kind of training output to expect.

Further reading is provided here: https://github.com/BVLC/caffe/wiki/Solver-Prototxt
And, of course, in the comments of caffe.proto

Caffe will print the output of the training to a log file or the console, depending on what you specify.

I usually print both to console and a file like this:
[desired training script] 2>&1 | tee logfile.txt

For instance, I would run:
python solve.py 2>&1 | tee logfile.txt

Here solve.py is a Python script that loads some pre-trained model weights, sets up new layers (for transfer learning), and initializes the solver from the solver.prototxt file. In the rest of this post I'll discuss the various variables and settings in the solver.prototxt file.
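For concreteness, here is a minimal sketch of what such a solve.py might look like (the file names here are hypothetical, and copy_from is just one way to transplant pre-trained weights):

import caffe

caffe.set_mode_gpu()  # or caffe.set_mode_cpu()

# Build the solver from solver.prototxt, then copy pre-trained weights
# into every layer whose name matches the old model.
solver = caffe.SGDSolver('solver.prototxt')
solver.net.copy_from('pretrained.caffemodel')

solver.solve()  # runs until max_iter; solver.step(n) gives finer control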

This output will look something like:

I0621 16:14:39.496878 20752 solver.cpp:228] Iteration 0, loss = 163741
I0621 16:14:39.496912 20752 solver.cpp:244] Train net output #0: loss = 161660 (* 1 = 161660 loss)
I0621 16:14:39.496918 20752 sgd_solver.cpp:106] Iteration 0, lr = 0.001
I0621 16:15:25.024097 20752 solver.cpp:228] Iteration 20, loss = 4.84049e+09
I0621 16:15:25.024127 20752 solver.cpp:244] Train net output #0: loss = 3.70536e+09 (* 1 = 3.70536e+09 loss)
I0621 16:15:25.024132 20752 sgd_solver.cpp:106] Iteration 20, lr = 0.001
I0621 16:16:12.229852 20752 solver.cpp:228] Iteration 40, loss = 2.78827e+09
I0621 16:16:12.229883 20752 solver.cpp:244] Train net output #0: loss = 1.38314e+09 (* 1 = 1.38314e+09 loss)

Notice that this output is printed every 20 iterations, because I specified display: 20. Two losses are reported: the loss on the "Iteration N" line is averaged over the last 20 iterations, because I also specified average_loss: 20, while the "Train net output" loss is that of the current batch.
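In other words, the solver.prototxt for this run contained:

display: 20       # print training output every 20 iterations
average_loss: 20  # average the reported loss over the last 20 iterations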

Every test_interval training iterations, a validation pass is run: test_iter batches, i.e. test_iter x batch_size images, are fetched from the test net. If batch_size is not specified, it is assumed to be 1. Depending on how often you evaluate (test_interval) and how much memory you have, you may want to sample more or fewer validation images. For FCN, batch_size is 1.
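For example (the values here are made up for illustration), with a test-net batch_size of 1:

test_interval: 200   # run a validation pass every 200 training iterations
test_iter: 100       # each pass runs 100 batches, i.e. 100 images at batch_size 1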

In the caffe log output, you will see something like this:
I0621 16:22:30.196046 20752 solver.cpp:337] Iteration 200, Testing net (#0)

I0621 16:22:56.249928 20752 solver.cpp:404]     Test net output #0: loss = 1.3426e+09 (* 1 = 1.3426e+09 loss)

If my test_interval is set to 200, then I will see this output every 200 iterations.

Notice that the learning rate (lr) is also printed out. It starts at base_lr and follows the lr_policy. For instance, if lr_policy: "fixed", the lr remains constant throughout all the iterations. If I set lr_policy: "step", I also need to specify stepsize and gamma: the lr is multiplied by gamma every stepsize iterations. Decreasing the learning rate over time can help the network converge, by staying near minima once they are found (with the natural risk of getting stuck in bad local minima). A stepsize that is too small might not allow the "problem landscape" to be explored enough before the lr shrinks; a stepsize that is too large means the network will take longer to converge.
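For example, a hypothetical step schedule (the specific values are illustrative):

base_lr: 0.001
lr_policy: "step"
gamma: 0.1        # multiply the lr by 0.1...
stepsize: 10000   # ...every 10000 iterations: 1e-03, then 1e-04, then 1e-05, ...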

I typically keep momentum and weight_decay at the fairly standard values of momentum: 0.9 and weight_decay: 0.0005. These control how the weights are updated and regularized during iterations: momentum carries over a fraction of the previous update into the current one, and weight_decay penalizes the magnitude of the weights.

You can snapshot your network every snapshot iterations; the saved files are named by appending _iter_[iternum].caffemodel (and _iter_[iternum].solverstate) to snapshot_prefix, which can contain a full path as well as a prefix name. Snapshot as frequently or rarely as needed: more frequently if you foresee wanting to restart a model from some iteration (for retraining or because of possible crashes). Snapshotting too frequently leaves a lot of bulky files that can hog your disk space.
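For example (the prefix and interval here are arbitrary):

snapshot: 4000
snapshot_prefix: "snapshots/mymodel"
# writes snapshots/mymodel_iter_4000.caffemodel
# and snapshots/mymodel_iter_4000.solverstate, then again at 8000, etc.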

You can then resume training from any snapshot by loading the corresponding caffemodel (which contains the model parameters) and solverstate (which contains the solver state).
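In pycaffe, resuming might look like this (continuing the hypothetical snapshot names from above):

import caffe

solver = caffe.SGDSolver('solver.prototxt')
# restore() reads the .solverstate, which references the .caffemodel,
# so both the model weights and the solver state are recovered.
solver.restore('snapshots/mymodel_iter_4000.solverstate')
solver.solve()

The command-line equivalent is: caffe train --solver=solver.prototxt --snapshot=snapshots/mymodel_iter_4000.solverstate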

Monday, 27 June 2016

A recipe for getting into deep learning (10 artificially quickened steps)

To answer the constantly recurring question on Quora of "how do I get into deep learning?" I've decided to write down a possible workflow - a recipe from scratch, so to speak:

(0) Preparations: get psyched
Watch Ex Machina, read I, Robot, read Andrej's short story (to get the A.I. researcher's perspective), Google the words "deep learning" and recognize how they have proliferated into the mainstream of mass media (and read the journalist's perspective). Breathe.

(1) Read over these notes: http://vision.stanford.edu/teaching/cs231n/ (not once, but twice, so you can catch all the additional tips and tricks sprinkled throughout)


(2) To complement (1), watch these lectures: https://www.youtube.com/playlist?list=PLkt2uSq6rBVctENoVBg1TpCC7OQi31AlC - too much content, too little time? https://chrome.google.com/webstore/detail/video-speed-controller/nffaoalbilbmmfgbnbgppjihopabppdk?hl=en (you're welcome)


(3) Install Evernote. Start clipping any tip or hint, blog post, forum response, or useful link. Collect knowledge at a rapid pace.

(4) Go over every example and tutorial: http://caffe.berkeleyvision.org/ 

(5) Start with a simple network like AlexNet and write out the model structure, draw out the blobs and layers, calculate the number of parameters and computations, examine how the input and output dimensions change from layer to layer. Consider some other architectures and look at how different architectural choices would affect these calculations.
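As a sketch of the kind of bookkeeping I mean, here is the arithmetic for the Caffe reference AlexNet's conv1 (96 filters of size 11x11 over 3 input channels, stride 4, 227x227 input):

# Parameters in AlexNet conv1: one 11x11x3 filter plus a bias, times 96 filters.
filters, k, channels, stride, input_size = 96, 11, 3, 4, 227
conv1_params = filters * (k * k * channels + 1)
print(conv1_params)  # 34944

# Output spatial size with no padding:
out = (input_size - k) // stride + 1
print(out)           # 55, so conv1 outputs a 96x55x55 blob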


(6) Set up iPython notebook and start playing with simple existing nets, figure out how to parse and visualize the training/test errors and monitor model performance over iterations, figure out how to visualize the features and different net computations, run existing nets on some new images, plot some beautiful results.
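For instance, a minimal pycaffe session for poking at blob and parameter shapes (the file names are placeholders for your own net):

import caffe

net = caffe.Net('deploy.prototxt', 'weights.caffemodel', caffe.TEST)

# Activation (blob) shapes, in forward-pass order:
for name, blob in net.blobs.items():
    print(name, blob.data.shape)

# Learnable parameter shapes (weights, then biases) per layer:
for name, params in net.params.items():
    print(name, [p.data.shape for p in params])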


(7) Fine-tune an existing net for a new dataset and task (bonus points for coming up with a fun new task and dataset).


(8) Hear from the giants and gain some additional high-level intuitions: http://www.ipam.ucla.edu/programs/summer-schools/graduate-summer-school-deep-learning-feature-learning/?tab=schedule


(9) Dig deeper, build stronger foundations from the bottom-up: http://www.deeplearningbook.org/

(10) Re-watch as many deep learning talks from the last few years as possible (at 1.5-2.0x speed, of course). Open ArXiv. Breathe.

Tuesday, 14 June 2016

Caffe on OS X El Capitan

There's a natural barrier to entry for anything that's interesting to work with. If you want something cool, you have to work for it.

Caffe is no exception. Getting the libraries up and running is by now known to be a little bit of a headache. The good news is: so many people have tried it out (sparked by the desire to create hypnotic and viral DeepDream images) that there are plenty of forums dedicated to debugging every possible installation and compilation error (from noob problems to more expert customizations).

To add to this, I include here a few notes on installing and compiling caffe on my Mac (in case they capture similar problems others may have).

Apparently 'make runtest' (the final set of tests that checks for correct compilation and library linking) is known to fail on El Capitan, because the new OS strips DYLD_LIBRARY_PATH from the environment; a number of libraries that are usually found through that path are not visible when the tests run, which causes errors (http://www.megastormsystems.com/news/how-to-install-caffe-on-mac-os-x-10-11). Interestingly, this can cause similar errors for other code bases compiled on El Capitan (e.g. https://www.postgresql.org/message-id/561E73AB.8060800@gmx.net).

I believe that following the original caffe instructions (http://caffe.berkeleyvision.org/install_osx.html) properly should work ok. I used all the recommended defaults (including anaconda python), except I needed to install OpenBLAS instead of the default.

Then, one can either just skip the make runtest command (since it will fail) and hope for the best, or temporarily disable the new OS security feature (http://www.macworld.com/article/2986118/security/how-to-modify-system-integrity-protection-in-el-capitan.html). I disabled SIP (using recovery mode), ran the tests to make sure I didn't get any errors, and then re-enabled SIP.


After all that, there might still be library linking issues requiring patches (https://github.com/BVLC/caffe/issues/3227#issuecomment-167540070). I also needed to add a symbolic link from my anaconda hdf5 libraries to /usr/local/lib.
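Something along these lines (the exact library names and versions will differ per machine, so treat this as a sketch):

ln -s $HOME/anaconda/lib/libhdf5.dylib /usr/local/lib/libhdf5.dylib
ln -s $HOME/anaconda/lib/libhdf5_hl.dylib /usr/local/lib/libhdf5_hl.dylib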

I added all the recommended paths to ~/.bashrc. Here's what it looks like:

export PATH=$HOME/anaconda/bin:/usr/local/cuda/bin:$PATH
export DYLD_FALLBACK_LIBRARY_PATH=/usr/local/cuda/lib:$HOME/anaconda/lib:/usr/local/lib:/usr/lib:$DYLD_FALLBACK_LIBRARY_PATH
export CPLUS_INCLUDE_PATH=$HOME/anaconda/include/python2.7/:
export DYLD_LIBRARY_PATH=/usr/local/cuda/lib:$DYLD_LIBRARY_PATH
export PYTHONPATH=~/Documents/caffe/python:$PYTHONPATH
export LD_LIBRARY_PATH=$HOME/anaconda/lib

I also wanted to get iPython notebook running on my local machine. However, I was getting library linking errors in iPython notebook but not in plain python. It turns out there's a workaround: instead of running ipython from the command line, run

python -m IPython

and, correspondingly, for ipython notebook:

python -m IPython notebook

Then everything works as expected with caffe.

In general, I’ve seen mixed opinions about whether to use the anaconda python installation, which solves some issues during caffe compilation but also comes with its own complications.

On my laptop, my set-up is to use the Pycaffe interface and train using CPU only. Here is what my final Makefile.config looks like:

## Refer to http://caffe.berkeleyvision.org/installation.html
# Contributions simplifying and improving our build system are welcome!

# cuDNN acceleration switch (uncomment to build with cuDNN).
# USE_CUDNN := 1

# CPU-only switch (uncomment to build without GPU support).
CPU_ONLY := 1

# uncomment to disable IO dependencies and corresponding data layers
# USE_OPENCV := 0
# USE_LEVELDB := 0
# USE_LMDB := 0

# uncomment to allow MDB_NOLOCK when reading LMDB files (only if necessary)
# You should not set this flag if you will be reading LMDBs with any
# possibility of simultaneous read and write
# ALLOW_LMDB_NOLOCK := 1

# Uncomment if you're using OpenCV 3
# OPENCV_VERSION := 3

# To customize your choice of compiler, uncomment and set the following.
# N.B. the default for Linux is g++ and the default for OSX is clang++
# CUSTOM_CXX := g++

# CUDA directory contains bin/ and lib/ directories that we need.
CUDA_DIR := /usr/local/cuda
# On Ubuntu 14.04, if cuda tools are installed via
# "sudo apt-get install nvidia-cuda-toolkit" then use this instead:
# CUDA_DIR := /usr

# CUDA architecture setting: going with all of them.
# For CUDA < 6.0, comment the *_50 lines for compatibility.
CUDA_ARCH := -gencode arch=compute_20,code=sm_20 \
		-gencode arch=compute_20,code=sm_21 \
		-gencode arch=compute_30,code=sm_30 \
		-gencode arch=compute_35,code=sm_35 \
		-gencode arch=compute_50,code=sm_50 \
		-gencode arch=compute_50,code=compute_50

# BLAS choice:
# atlas for ATLAS (default)
# mkl for MKL
# open for OpenBlas
BLAS := open
# Custom (MKL/ATLAS/OpenBLAS) include and lib directories.
# Leave commented to accept the defaults for your choice of BLAS
# (which should work)!
BLAS_INCLUDE := /usr/local/Cellar/openblas/0.2.18/include/
BLAS_LIB := /usr/local/Cellar/openblas/0.2.18/lib/

# This is required only if you will compile the matlab interface.
# MATLAB directory should contain the mex binary in /bin.
# MATLAB_DIR := /usr/local
# MATLAB_DIR := /Applications/MATLAB_R2012b.app

# NOTE: this is required only if you will compile the python interface.
# We need to be able to find Python.h and numpy/arrayobject.h.
# Anaconda Python distribution is quite popular. Include path:
# Verify anaconda location, sometimes it's in root.
ANACONDA_HOME := $(HOME)/anaconda
PYTHON_INCLUDE := $(ANACONDA_HOME)/include \
		$(ANACONDA_HOME)/include/python2.7 \
		$(ANACONDA_HOME)/lib/python2.7/site-packages/numpy/core/include

# We need to be able to find libpythonX.X.so or .dylib.
PYTHON_LIB := $(ANACONDA_HOME)/lib

# Uncomment to support layers written in Python (will link against Python libs)
WITH_PYTHON_LAYER := 1

# Whatever else you find you need goes here.
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib

# Uncomment to use `pkg-config` to specify OpenCV library paths.
# (Usually not necessary -- OpenCV libraries are normally installed in one of the above $LIBRARY_DIRS.)
# USE_PKG_CONFIG := 1

# N.B. both build and distribute dirs are cleared on `make clean`
BUILD_DIR := build
DISTRIBUTE_DIR := distribute

# Uncomment for debugging. Does not work on OSX due to https://github.com/BVLC/caffe/issues/171
# DEBUG := 1

# The ID of the GPU that 'make runtest' will use to run unit tests.
TEST_GPUID := 0

# enable pretty build (comment to see full commands)
Q ?= @

Tuesday, 7 June 2016

Why NOW is the best time to get into neural networks

(1) No more mystery. The ("initially cloaked in a veil of mysterious superpowers") AlexNet of 2012 that became such a frequent topic of conversation and citation has since been superseded by a plethora of new architectures, each exploring a new setting of parameters or architectural choices. One can now skip the initial phase of questioning and wonderment (of why the network works so well and how each architectural choice contributes), throw off the veil of mystery, and read about what we now know of neural network architectures in general, after the community has tried a bunch.

(2) Make-your-own psychedelic pictures. After deep dream debuted, the number of people who wanted to create a psychedelic picture for their Facebook profiles skyrocketed, resulting in a surge of caffe downloads, installation and compilation headaches, forum discussions, online tutorials, and subsequent caffe improvements. Pretty much every problem or error that could occur has been logged somewhere on the web via what has become massive, crowd-sourced QA (and I'm not even mentioning all the other possible libraries). You may now proceed.

(3) In the news. The terms "neural networks" and "deep learning" have escaped the ivory towers of academia and have appeared in all forms of media. They continue to trickle down from the more specialized channels to reach increasingly broader audiences. From introductory talks at (non-CS) conferences about the potential power of neural networks, to Google I/O 2015 announcements about the reliance on neural net technology to increasingly power applications, to endless tech news and feature articles about neural networks (and the future), to even appearing in Silicon Valley... now if Homer Simpson were to utter the words "neural networks" and they were not already part of your casual, day-to-day vocabulary, then shame on you.


(4) Spoon-fed brilliance. Brilliant people have taken the time to digest and curate a whole load of content for the rest of us. Some examples: lecture notes and tutorials, textbooks, summer schools, and online courses. Those same brilliant people answer tons of questions on forums and social networks (e.g. recurring Q&A sessions on Quora). One of the hundreds of available explanations/expositions of neural networks will surely speak to you.

(5) A glimpse of the future. No doubt we are all heading towards more advanced systems, backed by neural network architectures. Large companies are rushing to build bigger, faster, more robust deep learning software and hardware, small start-ups with neural network backing are springing up all over the place, and just about every field (from healthcare to organic chemistry to social science) is beginning to feel this tidal wave on the horizon. You should at least know what's going to hit you.

and the list goes on...

Monday, 6 June 2016

And it begins...

I was first introduced to neural networks in my cognitive science class during my first semester of undergrad, looked at them more closely in my first machine learning course in my third year, and have thought carefully about them at least 3 more times since then in course form (by taking or TA-ing various undergrad and grad versions of machine learning, where NNs were merely tiny course modules... sometimes even optional). And as of about 3 years ago, everyone I know academically has taken them over... or rather, has been taken over by them. While I had some very high-level ideas and was able to keep up casual conversations and follow the main trends, I never quite delved head first into them, to start to understand the inner workings, the architectural decisions, or the computational possibilities. I decided that this summer I would take on the quest of learning more about that which will be ubiquitous. Here, I will attempt to document this journey... as well as the resources and thoughts that surface along the way.