11 operations and points to note for running deep learning on AWS

Running large-scale deep learning on AWS is a cheap and effective way to learn and develop. For a small amount of money you can get access to tens of gigabytes of memory, dozens of CPUs, and multiple GPUs.

If you are new to EC2 or to the Linux command line, the commands below are very useful for running deep learning scripts in the cloud.
The main contents of this article include:
1) Copying data between your local machine and EC2 instances
2) Running scripts safely for days, weeks, or months
3) Monitoring the performance of processes, the system, and the GPU

Note: All commands are executed in a Linux-like environment (Linux, OS X, or Cygwin).


0, environment assumptions

Assume that an AWS EC2 instance is already up and running. For convenience, the examples use the following settings:
1) The IP address of the EC2 server is 54.218.86.47
2) The user name is ec2-user
3) The SSH key is located in ~/.ssh/ and the file name is aws-keypair.pem
4) The scripts being run are Python scripts

1, log in to the server

Before doing anything, first log in to the target server. Simply use the SSH command. Store the SSH key in ~/.ssh/ with a meaningful filename, such as aws-keypair.pem. Use the following command to log in to the EC2 host, paying attention to the address and username:
ssh -i ~/.ssh/aws-keypair.pem ec2-user@54.218.86.47

2, copy the file to the server

Use the SCP command to copy local files to the server. For example, the command to copy the script.py file to the EC2 server is as follows:
scp -i ~/.ssh/aws-keypair.pem script.py ec2-user@54.218.86.47:~/

3, make the script run in the background of the server

Run the script as a background process on the server. The command below makes the script ignore signals from other processes (such as the hangup sent when the terminal closes), ignores standard input, and redirects all output and error messages to a log file. This is necessary for deep learning models that take a long time to train.
nohup python /home/ec2-user/script.py >/home/ec2-user/script.py.log </dev/null 2>&1 &

In this command, script.py and script.py.log are both located in the /home/ec2-user/ directory. For a detailed introduction to nohup and redirection, see a reference such as Wikipedia.
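The same idea can be sketched from Python, assuming you would rather start the job from a small launcher script than type the shell line by hand. The function name launch_detached is hypothetical, and this is only a rough counterpart to nohup: start_new_session=True places the child in its own session on POSIX systems so that it is not tied to the launching terminal.

```python
import subprocess
import sys


def launch_detached(script_args, log_path):
    """Start a Python job that is not tied to the launching terminal.

    Roughly comparable to: nohup python ... >log </dev/null 2>&1 &
    """
    log_file = open(log_path, "ab")
    proc = subprocess.Popen(
        [sys.executable, *script_args],
        stdin=subprocess.DEVNULL,   # ignore standard input
        stdout=log_file,            # send standard output to the log
        stderr=subprocess.STDOUT,   # send errors to the same log
        start_new_session=True,     # new session, detached from the terminal (POSIX)
    )
    log_file.close()  # the child process keeps its own copy of the handle
    return proc
```

For example, launch_detached(["/home/ec2-user/script.py"], "/home/ec2-user/script.py.log") starts the script and returns immediately; the returned object's pid attribute can be saved for later checks.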

4, execute the script on a specific GPU of the server

If the EC2 instance supports it, running several scripts at the same time is recommended. For example, if the instance has 4 GPUs, you can run a separate script on each GPU. The following example runs a script on GPU 0:
CUDA_VISIBLE_DEVICES=0 nohup python /home/ec2-user/script.py >/home/ec2-user/script.py.log </dev/null 2>&1 &

With 4 GPUs, CUDA_VISIBLE_DEVICES can be set to any value from 0 to 3. This works with Keras on the TensorFlow backend; it has not been tested with Theano.
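Starting one job per GPU can also be sketched in Python. This is a minimal illustration, not a tested multi-GPU launcher: gpu_env and launch_on_gpu are hypothetical helper names, and the only mechanism used is the CUDA_VISIBLE_DEVICES environment variable described above.

```python
import os
import subprocess
import sys


def gpu_env(gpu_id):
    """Copy the current environment and expose only one GPU to the child."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    return env


def launch_on_gpu(script_args, gpu_id, log_path):
    """Start one job pinned to gpu_id, logging stdout and stderr to log_path."""
    with open(log_path, "ab") as log_file:
        return subprocess.Popen(
            [sys.executable, *script_args],
            env=gpu_env(gpu_id),
            stdin=subprocess.DEVNULL,
            stdout=log_file,
            stderr=subprocess.STDOUT,
        )
```

On a 4-GPU instance, calling launch_on_gpu once for each gpu_id in range(4), with a different log file per call, would give each job its own device.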

5, monitor the output of the script

If the script prints a score for each epoch or a result when an algorithm finishes, it is worth monitoring its output in real time. For example:
tail -f script.py.log

Unfortunately, AWS will close the terminal when there is no output on the screen for a while, so it's best to use:
watch "tail script.py.log"

Sometimes the standard output of the Python script does not show up in the log right away. I don't know whether this is a Python issue or an EC2 issue.
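One likely cause is output buffering: when standard output is redirected to a file, Python buffers it in blocks, so lines can reach the log late. A minimal sketch, assuming the training script is yours to edit; report_progress is a hypothetical helper name.

```python
import sys


def report_progress(epoch, score, stream=sys.stdout):
    """Write one progress line and flush it so it reaches the log at once."""
    print(f"epoch={epoch} score={score:.4f}", file=stream, flush=True)
```

An alternative that needs no code change is to start the interpreter with python -u, which turns off buffering of standard output.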

6, monitor system and process performance

It makes sense to monitor the performance of the EC2 system, especially how much memory is in use and how much is left. For example:
top -M

Or monitor a single process by specifying its process ID (PID):
top -p PID -M
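If you want the memory figures inside a script rather than on screen, a small parser is enough. The sketch below assumes a Linux server, where /proc/meminfo lists values in kB, and a kernel recent enough to report MemAvailable; the function names are hypothetical.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: value in kB}."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def memory_summary(text):
    """Return (total_mb, available_mb) from /proc/meminfo-style text."""
    info = parse_meminfo(text)
    return info["MemTotal"] // 1024, info["MemAvailable"] // 1024
```

On the server, memory_summary(open("/proc/meminfo").read()) returns the two numbers, which can be written to the same log as the training output.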

7, monitor GPU performance

If you run several scripts in parallel on the GPUs, it is a good idea to look at the performance and usage of each GPU. For example:
watch "nvidia-smi"
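The same information can be collected programmatically through the CSV query mode of nvidia-smi. This is a sketch under assumptions: the function names are hypothetical, and the parser expects every field to be numeric, which may not hold on GPUs that report some values as unavailable.

```python
import subprocess

# Fields requested from nvidia-smi, in output order.
QUERY = "index,utilization.gpu,memory.used,memory.total"


def parse_gpu_csv(text):
    """Turn 'index, util, used, total' CSV lines into a list of dicts."""
    rows = []
    for line in text.strip().splitlines():
        index, util, used, total = [field.strip() for field in line.split(",")]
        rows.append({
            "gpu": int(index),
            "util_pct": int(util),
            "mem_used_mib": int(used),
            "mem_total_mib": int(total),
        })
    return rows


def query_gpus():
    """Run nvidia-smi once and return one dict per GPU."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_csv(result.stdout)
```

Calling query_gpus() in a loop with a short sleep gives a simple usage log per GPU.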

8, check if the script is still running on the server

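To confirm that the script is still running, list the Python processes. Wrapping the command in watch refreshes the list and also keeps the terminal open:
watch "ps -ef | grep python"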
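If the PID of the job was saved when it was started, the check can also be done from Python. A minimal sketch for POSIX systems; is_running is a hypothetical helper name.

```python
import os


def is_running(pid):
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence; nothing is sent
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True
```

Note that this only says a process with that PID exists, not that it is making progress, so it complements the log monitoring rather than replacing it.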

9, edit the file on the server

Editing files directly on the server is generally not recommended, unless you are comfortable with vi:
vi ~/script.py

The usage of vi is not covered here.

10, download files from the server

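This is the reverse of uploading a file. The following example downloads all PNG files from the home directory on the server to the current local directory:
scp -i ~/.ssh/aws-keypair.pem ec2-user@54.218.86.47:~/*.png .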
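When transfers are repeated often, the scp command lines can be built in a small Python helper. This is only a sketch using the key path, user name, and IP address from the environment assumptions in section 0; the function names are hypothetical.

```python
import os
import subprocess

KEY_PATH = os.path.expanduser("~/.ssh/aws-keypair.pem")
REMOTE = "ec2-user@54.218.86.47"


def scp_upload_cmd(local_path, remote_path="~/"):
    """Build the scp command that copies a local file to the server."""
    return ["scp", "-i", KEY_PATH, local_path, f"{REMOTE}:{remote_path}"]


def scp_download_cmd(remote_path, local_dir="."):
    """Build the scp command that copies remote files to a local directory."""
    return ["scp", "-i", KEY_PATH, f"{REMOTE}:{remote_path}", local_dir]
```

For example, subprocess.run(scp_download_cmd("~/*.png"), check=True) runs the download. Because the command is passed as a list, the local shell does not expand the wildcard, and the pattern is sent to the server unchanged.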

Points to note

If you want to run multiple scripts at the same time, it's best to use an EC2 instance with multiple GPUs.

It's best to write scripts locally and copy them to the server.

Write results to files and download them to your local machine for analysis.

Use the watch command to keep the terminal session active.

Run remote commands from your local machine.
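The last point can be sketched as follows: ssh accepts a command after the host name and runs it on the server without opening an interactive session. The helper name ssh_cmd is hypothetical, and the key path, user name, and IP address are the ones assumed in section 0.

```python
import os
import subprocess

KEY_PATH = os.path.expanduser("~/.ssh/aws-keypair.pem")
REMOTE = "ec2-user@54.218.86.47"


def ssh_cmd(remote_command):
    """Build the ssh command that runs one command on the server."""
    return ["ssh", "-i", KEY_PATH, REMOTE, remote_command]
```

For example, subprocess.run(ssh_cmd("tail -n 5 script.py.log"), check=True) prints the last five lines of the remote log in the local terminal.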
