1) Start a preconfigured machine learning image with TensorFlow 2.2
To take advantage of multiple GPUs, you need to start an instance with more than one. Choose any number between 2 and 10 GPUs; the more, the faster. Be aware that scaling across multiple GPUs is never quite linear (i.e. 8 GPUs are faster than 4 GPUs, but not exactly twice as fast).
You are also going to need a working installation of Jupyter on your cloud instance. All the preconfigured machine learning images are based on Anaconda virtual environments, which include many machine learning and data science tools, Jupyter among them, so you are ready to go out of the box. For this tutorial we'll choose TensorFlow 2.2.
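To make the "not quite linear" point concrete, here is a minimal sketch that computes speedup and scaling efficiency from epoch times. The timings are hypothetical values chosen for illustration, not measurements from any instance, and the function names are our own.

```python
def speedup(time_single_gpu: float, time_multi_gpu: float) -> float:
    """How many times faster the multi-GPU run is than the single-GPU run."""
    return time_single_gpu / time_multi_gpu


def scaling_efficiency(time_single_gpu: float, time_multi_gpu: float, num_gpus: int) -> float:
    """Fraction of ideal linear speedup actually achieved (1.0 means perfectly linear)."""
    return speedup(time_single_gpu, time_multi_gpu) / num_gpus


# Hypothetical epoch times in seconds, for illustration only (not measured).
epoch_seconds = {1: 120.0, 4: 40.0, 8: 25.0}

for gpus, seconds in epoch_seconds.items():
    s = speedup(epoch_seconds[1], seconds)
    e = scaling_efficiency(epoch_seconds[1], seconds, gpus)
    print(f"{gpus} GPUs: speedup {s:.2f}x, efficiency {e:.0%}")
```

With your own timings plugged in, an efficiency well below 1.0 at higher GPU counts usually points to communication or input-pipeline overhead rather than a misconfiguration.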
2) Connect to it via SSH
See here for help.
3) Start the Jupyter Notebook server
On your remote instance, start a browserless Jupyter Notebook server:
jupyter notebook --no-browser
By default the server binds to port 8888; you can choose a different port with the --port=XXXX option. Once started, the server shows you how to access it. You should see two URLs at the bottom:
To access the notebook, open this file in a browser:
    file:///home/ubuntu/.local/share/jupyter/runtime/nbserver-31730-open.html
Or copy and paste one of these URLs:
    http://localhost:8888/?token=ddef637a8s3d0336dd0f0606d0f2fd859b3867193c972528
 or http://127.0.0.1:8888/?token=ddef637a8s3d0336dd0f0606d0f2fd859b3867193c972528
4) Forward SSH traffic and open Jupyter Notebook in your browser
We don't want to expose the Jupyter notebook to everyone on the internet who knows the token or password; it should only be reachable from your own local browser. To achieve this we will use a technique called SSH port forwarding, which securely forwards traffic between your local machine and the instance.
On your local machine, in a second terminal, type the following, replacing <INSTANCE-IP> with the IP address of your instance:
ssh -CNL localhost:8888:localhost:8888 ubuntu@<INSTANCE-IP>
This way when you enter localhost on port 8888 the traffic is securely forwarded via SSH to the instance where the Jupyter notebook runs on port 8888 as well.
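If you started Jupyter on a different port with --port, the forwarding specification has to change accordingly. The following sketch builds the same ssh command for arbitrary ports; the function name `build_tunnel_command` and the IP address are placeholders of our own, not part of any tool.

```python
from typing import List


def build_tunnel_command(host: str, user: str = "ubuntu",
                         local_port: int = 8888, remote_port: int = 8888) -> List[str]:
    """Return the ssh argument list that forwards local_port to remote_port on the instance."""
    forward_spec = f"localhost:{local_port}:localhost:{remote_port}"
    # -C compresses traffic, -N runs no remote command, -L sets up local port forwarding.
    return ["ssh", "-C", "-N", "-L", forward_spec, f"{user}@{host}"]


# 203.0.113.10 is a placeholder address; substitute your instance IP.
print(" ".join(build_tunnel_command("203.0.113.10", remote_port=9999)))
```

In the forwarding specification, the first host and port are where ssh listens on your local machine, and the second pair is the destination as seen from the instance.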
To access your Jupyter server, just open localhost:8888 in a browser on your local machine. If you are asked for a token, use the one from the URLs the server printed.
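If the page does not load, a quick way to narrow down the problem is to check whether anything is listening on the local end of the tunnel. This is a small sketch using only the Python standard library; `port_is_open` is a helper name we made up. Note that it only confirms the local port accepts connections, not that Jupyter is answering on the other side.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 8888, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if port_is_open():
    print("Something is listening on localhost:8888; try it in your browser.")
else:
    print("Nothing is listening on localhost:8888; check that the ssh command is still running.")
```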
5) Upload the Notebook that contains the CNN training
I've prepared a Jupyter Notebook file for you that you can download here: https://gist.github.com/maxbonaparte/ff9777e1ec284b95e7367a925a972fdf
This notebook guides you through a simple example of training a CNN with multiple GPUs (we've chosen four earlier, but it works with any number of GPUs).
Just upload it to Jupyter, open it and you can run each cell step-wise as shown in the video.
In the end it shows you the training results. If they look good, you've successfully trained a CNN on multiple GPUs!
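The notebook itself is not reproduced here, but the general pattern for multi-GPU training in TensorFlow 2.x can be sketched with `tf.distribute.MirroredStrategy`. Everything specific below is an assumption made for illustration: the MNIST dataset, the layer sizes, and the helper names `global_batch_size` and `train` are ours and may differ from what the notebook does.

```python
def global_batch_size(per_gpu_batch_size: int, num_gpus: int) -> int:
    """Each training step processes one per-GPU batch on every GPU."""
    return per_gpu_batch_size * num_gpus


def train(per_gpu_batch_size: int = 64, epochs: int = 2):
    # Imported here so the helper above can be used without TensorFlow installed.
    import tensorflow as tf

    # MirroredStrategy places a copy of the model on every visible GPU and
    # keeps the copies in sync by combining gradients across them.
    strategy = tf.distribute.MirroredStrategy()
    num_gpus = strategy.num_replicas_in_sync
    print("Replicas in sync:", num_gpus)

    # MNIST is an assumption for illustration; the notebook may use other data.
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    x_train = x_train[..., None].astype("float32") / 255.0
    x_test = x_test[..., None].astype("float32") / 255.0

    # The model must be built and compiled inside the strategy scope.
    with strategy.scope():
        model = tf.keras.Sequential([
            tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
            tf.keras.layers.MaxPooling2D(),
            tf.keras.layers.Conv2D(64, 3, activation="relu"),
            tf.keras.layers.MaxPooling2D(),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(128, activation="relu"),
            tf.keras.layers.Dropout(0.5),
            tf.keras.layers.Dense(10, activation="softmax"),
        ])
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])

    # Keras splits this global batch evenly across the replicas.
    batch_size = global_batch_size(per_gpu_batch_size, num_gpus)
    return model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs,
                     validation_data=(x_test, y_test))
```

Calling `train()` in a notebook cell on the instance runs the sketch. Scaling the batch size with the number of GPUs keeps each GPU as busy as it would be in a single-GPU run, which is one common choice rather than a requirement.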
Feel free to experiment and play by changing as many parameters as you like.
For example, you can add more convolutional layers, or more fully connected or dropout layers. You can also experiment with different filter sizes, different numbers of units in your fully connected layers, different batch sizes, etc. Which ones give you the best validation accuracy and training speed?
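One way to keep such experiments organized is to write down the candidate values and enumerate every combination before training anything. The sketch below does this with the standard library; the parameter names and values are hypothetical and should be adapted to whatever the notebook exposes.

```python
from itertools import product

# Hypothetical candidate values; adjust them to your own experiments.
search_space = {
    "kernel_size": [3, 5],
    "dense_units": [64, 128, 256],
    "batch_size": [128, 256],
}


def configurations(space):
    """Yield one dict per combination of the candidate values."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


configs = list(configurations(search_space))
print(len(configs), "configurations to try")
for config in configs[:3]:
    print(config)
```

Recording validation accuracy and time per epoch for each configuration makes it easier to answer the question above with data rather than impressions.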