• I'll show you how to train your own CNN (convolutional neural network) on the CIFAR-10 dataset (60,000 32x32 colour images in 10 classes).

Let's get started! 

1) Start a preconfigured machine learning image with TensorFlow 2.2

  • To take advantage of multiple GPUs, you need to start an instance with more than one. Choose any number between 2 and 10 GPUs; the more GPUs, the faster the training.

  • Beware that scaling across multiple GPUs is never quite linear (e.g. 8 GPUs are faster than 4 GPUs, but not exactly twice as fast).

  • You are going to need a working installation of Jupyter on your cloud instance. All the preconfigured machine learning images are based on Anaconda virtual environments, which include many machine learning and data-science tools, including Jupyter, so you are ready to go out of the box.

  • For the purpose of this tutorial we'll choose TensorFlow 2.2.
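
  • To make the scaling caveat above concrete, here is a minimal sketch that turns per-epoch timings into a speedup and a "fraction of linear scaling" figure. The function name scaling_report and the timings are made up for illustration; they are not measurements from this tutorial, so plug in your own numbers.

```python
def scaling_report(seconds_per_epoch):
    """Return {gpu_count: (speedup, efficiency)} relative to the smallest GPU count.

    seconds_per_epoch maps a GPU count to the measured time for one epoch.
    Efficiency is the achieved speedup divided by the ideal (linear) speedup.
    """
    base_gpus = min(seconds_per_epoch)
    base_time = seconds_per_epoch[base_gpus]
    report = {}
    for gpus, seconds in sorted(seconds_per_epoch.items()):
        speedup = base_time / seconds      # how much faster than the baseline
        ideal = gpus / base_gpus           # what perfectly linear scaling would give
        report[gpus] = (speedup, speedup / ideal)
    return report


# Hypothetical timings in seconds per epoch (assumption, not measured).
timings = {2: 40.0, 4: 22.0, 8: 13.0}
for gpus, (speedup, efficiency) in scaling_report(timings).items():
    print(f"{gpus} GPUs: {speedup:.2f}x speedup, {efficiency:.0%} of linear scaling")
```

  • An efficiency below 100% is expected: the GPUs have to synchronise their gradients after every step, and that overhead grows with the number of GPUs.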

2) Connect to it via SSH

  • Log in to your instance over SSH, using the ubuntu user and the instance's public IP.

3) Start the Jupyter Notebook server

  • On your remote instance, start a Jupyter Notebook server without a browser:

jupyter notebook --no-browser

  • By default the server binds to port 8888; you can choose a different port with the --port=XXXX option. Once started, the server shows you how to access it. You should see two URLs at the bottom:

To access the notebook, open this file in a browser:
Or copy and paste one of these URLs:

4) Forward SSH traffic and open Jupyter Notebook in your browser

  • We don't want to expose the Jupyter Notebook server to everyone on the internet who knows the token or password; it should only be accessible from your local browser. To achieve this, we will use a technique called SSH port forwarding, which securely forwards the traffic between your local machine and the instance.

  • On your local machine, in a second terminal, run the following command, replacing <INSTANCE-IP> with your instance's public IP:

ssh -CNL localhost:8888:localhost:8888 ubuntu@<INSTANCE-IP>

  • This way, when you open localhost on port 8888, the traffic is securely forwarded via SSH to the instance, where the Jupyter Notebook server also runs on port 8888.

  • To access your Jupyter server, just open localhost:8888 in a browser on your local machine.
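
  • If the page does not load, it can help to check whether anything is listening on the forwarded port at all. The following is a minimal sketch using only the Python standard library; the helper name port_is_open is made up for this example, and the port assumes you kept the default of 8888.

```python
import socket


def port_is_open(host="localhost", port=8888, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection tries every address the host name resolves to
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # connection refused, timed out, or host not resolvable
        return False


if port_is_open():
    print("Something is listening on localhost:8888; open it in your browser.")
else:
    print("Nothing is listening on localhost:8888; is the ssh -CNL command still running?")
```

  • Note that an open port only tells you that something accepts connections there. It does not prove that it is your Jupyter server, so still check the page in the browser.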

5) Upload the Notebook that contains the CNN training

  • I have prepared a Jupyter Notebook file for you that you can download here: https://gist.github.com/maxbonaparte/ff9777e1ec284b95e7367a925a972fdf 

  • This notebook guides you through a simple example of training a CNN with multiple GPUs (we've chosen four earlier, but it works with any number of GPUs).

  • Just upload it to Jupyter, open it, and run each cell step by step, as shown in the video.

  • In the end it shows you the training results. If they look good, you've successfully trained a CNN on multiple GPUs!

  • Feel free to experiment and play by changing as many parameters as you like.

  • For example, you can add more convolutional layers, or more fully connected or dropout layers. You can also experiment with different filter sizes, different numbers of units in your fully connected layers, different batch sizes, etc. Which ones give you the best validation accuracy and training speed?
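
  • The notebook itself is not reproduced here, so the following is only a minimal sketch of how multi-GPU training is commonly set up in TensorFlow 2.x with tf.distribute.MirroredStrategy, with the parameters mentioned above exposed as arguments. The function names (global_batch_size, train_cifar10), the layer layout and the default values are assumptions for illustration and may differ from what the notebook does. Using the CIFAR-10 test split as validation data is also an assumption.

```python
def global_batch_size(per_gpu_batch, num_gpus):
    """Each training step splits one global batch evenly across the GPUs."""
    return per_gpu_batch * max(1, num_gpus)


def train_cifar10(filters=(32, 64), dense_units=128, dropout=0.5,
                  per_gpu_batch=64, epochs=5):
    """Train a small CNN on CIFAR-10 on all visible GPUs (hypothetical helper)."""
    # Imported here so that global_batch_size can be used without TensorFlow.
    import tensorflow as tf

    # One model replica per visible GPU; gradients are averaged across replicas.
    strategy = tf.distribute.MirroredStrategy()
    print("Number of replicas:", strategy.num_replicas_in_sync)
    batch_size = global_batch_size(per_gpu_batch, strategy.num_replicas_in_sync)

    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
    x_train = x_train.astype("float32") / 255.0  # scale pixels to [0, 1]
    x_test = x_test.astype("float32") / 255.0

    # The model must be built and compiled inside the scope so that its
    # variables are mirrored on every GPU.
    with strategy.scope():
        model = tf.keras.Sequential()
        model.add(tf.keras.Input(shape=(32, 32, 3)))
        for n in filters:  # one conv + pooling stage per entry
            model.add(tf.keras.layers.Conv2D(n, 3, padding="same", activation="relu"))
            model.add(tf.keras.layers.MaxPooling2D())
        model.add(tf.keras.layers.Flatten())
        model.add(tf.keras.layers.Dense(dense_units, activation="relu"))
        model.add(tf.keras.layers.Dropout(dropout))
        model.add(tf.keras.layers.Dense(10, activation="softmax"))
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])

    return model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs,
                     validation_data=(x_test, y_test))
```

  • In a notebook cell you could then call, for example, history = train_cifar10(filters=(32, 64, 128), per_gpu_batch=128) and compare history.history["val_accuracy"] between runs. Keeping the batch size per GPU fixed and scaling the global batch with the number of GPUs is one common choice, not the only one.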