Deep learning - easy these days

Deep learning is easy. You don't need to take my word for it - many Kaggle competitions are won using Keras, whose author, François Chollet, says “I strongly believe that there are no difficult ideas in deep learning”, as well as “Keras, which makes deep learning as easy as manipulating LEGO bricks.” Both quotes come from his book “Deep Learning with Python”, which is as simple as advertised. You may be tempted to look down your nose at something that doesn't have the beautiful equations of Russell and Norvig's AI book (which obviously covers a far wider topic), but you should not. Programming is a combination of science, art, engineering and sticky tape, and that mix is well reflected in this book, which takes only a few hours to read.

To repeat, deep learning is easy. You shouldn't take François's word for it either. Let's see.

MNIST

MNIST is the Hello World of image recognition - 60,000 digitised handwritten digits for training, plus 10,000 for testing. The image shows 10 of them, selected at random.

Figure: ten digits chosen at random from the MNIST dataset. Inefficient for RNG, but look at the pretty colours.

Output - running on CPU

Taking around 1 minute per epoch:

60000/60000 [==============================] - 66s 1ms/step - loss: 0.2569 - acc: 0.9208 - val_loss: 0.0561 - val_acc: 0.9819
Epoch 2/12
60000/60000 [==============================] - 68s 1ms/step - loss: 0.0889 - acc: 0.9739 - val_loss: 0.0407 - val_acc: 0.9859
Epoch 3/12
60000/60000 [==============================] - 67s 1ms/step - loss: 0.0689 - acc: 0.9794 - val_loss: 0.0364 - val_acc: 0.9875
Epoch 4/12
60000/60000 [==============================] - 66s 1ms/step - loss: 0.0551 - acc: 0.9837 - val_loss: 0.0322 - val_acc: 0.9886
Epoch 5/12
60000/60000 [==============================] - 68s 1ms/step - loss: 0.0455 - acc: 0.9861 - val_loss: 0.0290 - val_acc: 0.9898
Epoch 6/12
60000/60000 [==============================] - 68s 1ms/step - loss: 0.0396 - acc: 0.9871 - val_loss: 0.0310 - val_acc: 0.9886
Epoch 7/12
60000/60000 [==============================] - 67s 1ms/step - loss: 0.0379 - acc: 0.9885 - val_loss: 0.0320 - val_acc: 0.9889
Epoch 8/12
60000/60000 [==============================] - 68s 1ms/step - loss: 0.0341 - acc: 0.9896 - val_loss: 0.0271 - val_acc: 0.9907
Epoch 9/12
60000/60000 [==============================] - 68s 1ms/step - loss: 0.0322 - acc: 0.9902 - val_loss: 0.0318 - val_acc: 0.9895
Epoch 10/12
60000/60000 [==============================] - 67s 1ms/step - loss: 0.0287 - acc: 0.9912 - val_loss: 0.0269 - val_acc: 0.9910
Epoch 11/12
60000/60000 [==============================] - 68s 1ms/step - loss: 0.0262 - acc: 0.9919 - val_loss: 0.0252 - val_acc: 0.9917
Epoch 12/12
60000/60000 [==============================] - 66s 1ms/step - loss: 0.0249 - acc: 0.9922 - val_loss: 0.0287 - val_acc: 0.9916
Test loss: 0.02867634769309134
Test accuracy: 0.9916

Output - running on GPU

Taking about 6s per epoch:

-> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1)
60000/60000 [==============================] - 7s 119us/step - loss: 0.2746 - acc: 0.9153 - val_loss: 0.0590 - val_acc: 0.9818
Epoch 2/12
60000/60000 [==============================] - 6s 95us/step - loss: 0.0868 - acc: 0.9744 - val_loss: 0.0421 - val_acc: 0.9854
Epoch 3/12
60000/60000 [==============================] - 6s 94us/step - loss: 0.0664 - acc: 0.9805 - val_loss: 0.0353 - val_acc: 0.9878
Epoch 4/12
60000/60000 [==============================] - 6s 94us/step - loss: 0.0553 - acc: 0.9837 - val_loss: 0.0330 - val_acc: 0.9896
Epoch 5/12
60000/60000 [==============================] - 6s 95us/step - loss: 0.0464 - acc: 0.9862 - val_loss: 0.0303 - val_acc: 0.9905
Epoch 6/12
60000/60000 [==============================] - 6s 99us/step - loss: 0.0394 - acc: 0.9879 - val_loss: 0.0277 - val_acc: 0.9905
Epoch 7/12
60000/60000 [==============================] - 6s 97us/step - loss: 0.0370 - acc: 0.9883 - val_loss: 0.0359 - val_acc: 0.9895
Epoch 8/12
60000/60000 [==============================] - 6s 96us/step - loss: 0.0359 - acc: 0.9894 - val_loss: 0.0334 - val_acc: 0.9897
Epoch 9/12
60000/60000 [==============================] - 5s 92us/step - loss: 0.0311 - acc: 0.9903 - val_loss: 0.0276 - val_acc: 0.9910
Epoch 10/12
60000/60000 [==============================] - 6s 97us/step - loss: 0.0305 - acc: 0.9907 - val_loss: 0.0291 - val_acc: 0.9909
Epoch 11/12
60000/60000 [==============================] - 6s 99us/step - loss: 0.0278 - acc: 0.9909 - val_loss: 0.0275 - val_acc: 0.9910
Epoch 12/12
60000/60000 [==============================] - 6s 92us/step - loss: 0.0270 - acc: 0.9915 - val_loss: 0.0258 - val_acc: 0.9919
Test loss: 0.025837273157274375
Test accuracy: 0.9919
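To put a number on the difference between the two runs, here is a small sketch that pulls the per-epoch seconds out of Keras progress lines like the ones above and compares the averages. The function names are hypothetical, and the regular expression assumes the "] - 66s 1ms/step" layout shown in these logs; other Keras versions may format the line differently.

```python
import re
from statistics import mean

# Matches the "] - 66s " part of a Keras progress line such as
# "60000/60000 [=====] - 66s 1ms/step - loss: ...".
EPOCH_SECONDS = re.compile(r"\] - (\d+)s ")


def epoch_times(log_text):
    """Return the whole-second epoch durations reported in a Keras training log."""
    return [int(match.group(1)) for match in EPOCH_SECONDS.finditer(log_text)]


def speedup(cpu_log, gpu_log):
    """Ratio of the mean CPU epoch time to the mean GPU epoch time."""
    return mean(epoch_times(cpu_log)) / mean(epoch_times(gpu_log))
```

Fed the twelve epochs of each log above, the means come out at 67.25 s on the CPU and 6 s on the GTX 1060, roughly an 11x speedup. Keras rounds each epoch to whole seconds, so treat the ratio as approximate.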

The values below show the raw data for a single entry in the dataset. Each of the 28x28 digitised pixels takes a value from 0 to 255. With each pixel's value printed using exactly three characters, it's not difficult for a human to recognise the digit three.

0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
0  0  0  0  0  0  0  0 96121121134254254192121191234 53160 38  0  0  0  0  0  0  0
0  0  0  0  0  0  0 77251253253253253253254253253253244250150  0  0  0  0  0  0  0
0  0  0  0  0  0  0201253253253253253253254253253253253225 38  0  0  0  0  0  0  0
0  0  0  0  0  0  0214253236173173173173 40166153233253253 80  0  0  0  0  0  0  0
0  0  0  0  0  0  0 45 53 42  0  0  0  0  0 29136253253253 80  0  0  0  0  0  0  0
0  0  0  0  0  0  0  0  0  0  0  6 54 54188215253253253185 21  0  0  0  0  0  0  0
0  0  0  0  0  0  0  0  0  0  0 27253253254253253253116 21  0  0  0  0  0  0  0  0
0  0  0  0  0  0  0  0  0  0  0 42253253254253253253 53  0  0  0  0  0  0  0  0  0
0  0  0  0  0  0  0  0  0  0  0161253253254253253253169 46  0  0  0  0  0  0  0  0
0  0  0  0  0  0  0  0  0  0  0 91253253254253253253253199  0  0  0  0  0  0  0  0
0  0  0  0  0  0  0  0  0  0  0  0  0  0  0128242254254254143  0  0  0  0  0  0  0
0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 11 55231253230 40  0  0  0  0  0  0
0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0151253253142  0  0  0  0  0  0
0  0  0  0  0 43  0  0  0  0  0  0  0  0  0  0  0  0 11155253249 90  0  0  0  0  0
0  0  0  0  0219151 46  0  0  0  0  0  0  0  0  0  0  0 54253196 22  0  0  0  0  0
0  0  0  0  0142253232152 54 17  0 32 26  0  0  0  0 79201253 93  0  0  0  0  0  0
0  0  0  0  0 91228253253253199174220211175 47160174237253247 78  0  0  0  0  0  0
0  0  0  0  0  0 85226248253253253253253255253253253253247120  0  0  0  0  0  0  0
0  0  0  0  0  0  0  0120240249253253253255253253253207 84  0  0  0  0  0  0  0  0
0  0  0  0  0  0  0  0  0  0 83120127253255126120120 25  0  0  0  0  0  0  0  0  0
0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
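A dump like the one above takes only a line of Python to produce. This is a sketch rather than the author's code: the function name is hypothetical, and it assumes the digit is a 2-D array of integers from 0 to 255, such as one image from the x_train array returned by keras.datasets.mnist.load_data().

```python
import numpy as np


def render_digit(pixels):
    """Format a 2-D array of 0-255 values as text, using exactly three characters per pixel."""
    # Right-align each value in a 3-character field so the columns line up for 0..255.
    return "\n".join("".join(f"{int(value):3d}" for value in row) for row in pixels)


if __name__ == "__main__":
    # A tiny made-up example: a 3x5 "image" with a single bright stroke in the middle row.
    demo = np.zeros((3, 5), dtype=np.uint8)
    demo[1, 1:4] = [96, 253, 254]
    print(render_digit(demo))
```

Because the values are padded to a fixed width rather than separated by spaces, neighbouring three-digit values run together (as in 96121121 above); the fixed width is what keeps the columns aligned so the shape shows through.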

Trained it. Now what?

You want to use it for something!

Here's One I Prepared Earlier

Although training this model takes only about a minute on a GPU, there's no point retraining it every time. Save the model once, then load it and run it on new examples.

Figure: two hand-drawn digits for Keras to recognise.
Good morning. I have two digits I'd like you to recognise.

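import sys

import keras
import matplotlib.image as image
import numpy as np

model = keras.models.load_model('mnist_model.hdf')  # saved by the training script
for filename in sys.argv[1:]:
    # Greyscale PNGs load as floats in 0-1 with white = 1; invert so the background is 0, as in MNIST
    imagedata = 1 - image.imread(filename)
    # This model takes one flat sample of 28*28 = 784 values and outputs 10 class probabilities
    probabilities = model.predict(imagedata.reshape(1, 28**2))
    # The most probable class is the digit (older Keras offered model.predict_classes for this)
    print(np.argmax(probabilities[0]), end='')
print()  # finish the line after the last digit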

This script loads the existing model from file and uses it to predict a class for the data contained in each image passed as an argument.

Data-wrangling is surely the most time-consuming part of deep learning. The MNIST data on which this model was trained has specific features. Each item is 28x28 pixels, with each pixel ranging from 0 to 1 (the raw values were scaled down from 0-255). Pass in an image of the wrong size and the script fails, because the pixels can't be reshaped into the 784 values the model expects. But MNIST also uses relatively thin pen strokes and leaves lots of whitespace around the edges of each digit. Deviate from these features, which are all the network knows, and predictions will fail. For example, if an image has a tiny inkblot in the corner, the network has never seen anything like it and struggles to classify the digit. In the greyscale PNGs loaded here, a pixel value of 1 is white, but the model was trained to treat 0 as the background, so the script inverts the PNG data before passing it to the model.
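The two conversions described above can be sketched in NumPy. The function names are hypothetical, not taken from the author's scripts, and the sketch assumes that matplotlib's imread returns a greyscale PNG as a 2-D array of floats in 0-1, which is what the prediction script relies on.

```python
import numpy as np

MNIST_SIDE = 28  # MNIST images are 28x28 pixels


def scale_training_pixels(raw):
    """Convert raw MNIST pixel values (integers 0-255, background 0) to floats in 0-1."""
    return raw.astype(np.float32) / 255.0


def png_to_model_input(png_pixels):
    """Turn a greyscale PNG loaded as floats in 0-1 (white = 1) into one flat MNIST-style sample."""
    if png_pixels.shape != (MNIST_SIDE, MNIST_SIDE):
        raise ValueError(
            f"expected a {MNIST_SIDE}x{MNIST_SIDE} greyscale image, got shape {png_pixels.shape}"
        )
    # Invert so the white background becomes 0 and dark ink approaches 1, as in MNIST.
    inverted = 1.0 - png_pixels
    # Flatten to a batch of one sample with 28 * 28 = 784 values.
    return inverted.reshape(1, MNIST_SIDE * MNIST_SIDE)
```

Checking the shape up front turns an image of the wrong size (or a colour PNG with extra channels) into an explicit error message, rather than a less obvious failure inside the reshape.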

That's really just a restatement of the importance of training with representative data. Providing useful data - including understanding what that means - generally takes more effort than using it.
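One rough way to check whether a hand-drawn digit leaves MNIST-like whitespace around it, or has picked up a stray mark, is to measure the blank margins around the ink. This is a hypothetical helper rather than part of the author's scripts; it expects the inverted 0-1 image (background 0), and the 0.1 ink threshold is an arbitrary assumption.

```python
import numpy as np


def ink_margins(pixels, threshold=0.1):
    """Count blank rows/columns around the ink in an inverted image (background 0).

    Returns (top, bottom, left, right), or None if no pixel exceeds the threshold.
    The default threshold is an arbitrary assumption, not an MNIST constant.
    """
    ink = pixels > threshold
    rows = np.flatnonzero(ink.any(axis=1))  # indices of rows containing ink
    cols = np.flatnonzero(ink.any(axis=0))  # indices of columns containing ink
    if rows.size == 0:
        return None
    height, width = pixels.shape
    return (
        int(rows[0]),
        int(height - 1 - rows[-1]),
        int(cols[0]),
        int(width - 1 - cols[-1]),
    )
```

A centred digit with generous margins gives something like (4, 4, 8, 8); a single speck in a corner collapses two of those margins to 0, a hint that the image no longer resembles the training data.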

Anyway, download the images of the two digits drawn and digitised above (first.png, second.png). If you didn't run the training script, you can even download a trained model here.

Then run python predict.py first.png second.png, and the output is, rather simply, 42.

Configuration

In the interest of honesty, I'm including the configuration stage, which will take much longer than the actual deep learning bit. Create a fresh environment with python -m venv ~/newenvname and activate it (on Linux or macOS: source ~/newenvname/bin/activate).

Now install what you need with pip install tensorflow-gpu keras. If you don't have a configured GPU, leave off the -gpu suffix.

Things may get a bit messy with library version numbers, but it's all just configuration - once it's done, the actual development / deep learning / supposedly hard stuff really is easy. For example, I'd installed the latest CUDA (9.1 today, obsolete tomorrow) and was using it to write CUDA code directly, but the most recent version supported by the prebuilt TensorFlow packages is 9.0. At this point you can choose between building TensorFlow from source (see the TensorFlow documentation) and installing CUDA 9.0 (see Nvidia's site). Just don't waste your time falling back on the CPU.