File transfer using scp¶
Learning outcomes
- Practice using the documentation of your favorite HPC cluster
- Can transfer files using scp
- (Optional) Can compress and archive files before transferring
For teachers
Teaching goals are:
- Learners have practiced using the documentation of their favorite HPC cluster
- Learners have transferred files using scp
- (Optional) Learners can compress and archive files before transferring
Info
- We start with a transferring tool in the command line.
- Let's focus on the easiest.
- Next session we try a tool with a graphical interface!
Overview of terminal transfer tools¶
Info
- Terminal transfer tools are handy when you are already working in the terminal and can sometimes be superior to graphical tools.
  - No switching applications
  - Your hands stay on the keyboard, you don't need to grab the mouse
  - Tab completion can make finding files faster
  - If you work on servers or over ssh you might not be able to use a graphical user interface (GUI).
- scp has similar arguments to the Linux copy command cp.
- sftp is more versatile, with more file management capabilities. Optional lesson
- rsync is perfect for syncing and has many capabilities. Optional lesson
- All are considered secure!
But what is wget and curl?
- These tools are used to download files from websites or ftp servers
wget
- wget saves downloaded contents to local files, like
  - wget ftp://ftp.sunet.se/mirror/archive/ftp.sunet.se/pub/pictures/space/*
  - wget https://upload.wikimedia.org/wikipedia/commons/3/37/Grace_Hopper_and_UNIVAC.jpg -O grace_hopper.jpg
- typical use cases:
- download data from a service
- download a program or compressed source code
- supports HTTP, HTTPS, and FTP
- user-friendly for basic tasks
- good for mirroring websites
- downloading entire directories recursively for offline viewing or backups.
- most popular on Unix-based systems, like Linux
curl
- curl outputs the content to the terminal by default.
  - add -O to download as file.
- supports a wide range of protocols: HTTP, HTTPS, FTP, FTPS, SCP, SFTP, TFTP...
- often preferred for scripting and automation due to its versatility
- interacting with APIs, handling complex web requests
- often available by default on Windows and macOS.
SCP is an abbreviation for Secure Copy Protocol
Pros
- Simple
- One-line command
- Use cases
- copy just a file
- copy just a specific directory (with sub-directories).
When not to use
- When needing several one-line commands
- requires credentials every time
- When looking to do more than a basic file transfer, SCP falls short.
- If a transfer is interrupted, you have to restart the entire transfer.
- If a file with the same name already exists in the destination directory, it will be overwritten.
Risk of overwriting files
- There is no warning if a file is about to be overwritten.
- There is no scp option that works like rm -i, which asks if you really want to remove the file. (scp -i exists, but it selects an SSH key file instead.)
- rsync may be a better tool if you want to sync existing content.
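Because scp overwrites without asking, a small wrapper can add the missing check for downloads. The following is a minimal sketch, not a feature of scp: the function name safe_download is hypothetical, and it only checks for an existing local file with the same name before calling scp.

```shell
# Hypothetical helper, not part of scp: refuse to download a file
# when a local file with the same name already exists.
# Usage: safe_download <user@host:/remote/path/file> <local_directory>
safe_download() {
    remote="$1"
    local_dir="$2"
    # Drop everything up to the last colon, then keep only the file name
    remote_path="${remote##*:}"
    target="${local_dir}/$(basename "$remote_path")"
    if [ -e "$target" ]; then
        echo "Refusing to overwrite existing file: $target" >&2
        return 1
    fi
    scp "$remote" "$local_dir/"
}
```

After defining the function in your local terminal, a call could look like: safe_download <username>@tetralith.nsc.liu.se:~/transfer/remote_file .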
Attention
- Some Windows users may need to use pscp instead of scp.
- In general, however, the syntax is the same.
Prior questions
- Who has heard of scp?
- Who has already used scp?
Procedure¶
Syntax for command arguments
- We use <content> to tell that this should be replaced by applicable names or paths etcetera...
- We use [content] to tell that this argument is not necessary
- Run the scp commands on YOUR computer, since you probably do not have a server address to your computer!
- In the terminal (from local, not server session)
- Where <from> is the file(s) you want to copy, and <to> is the destination.
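As a sketch, the general form is scp <from> <to>, where a location on a server is written as <username>@<server>:<path>. The snippet below only assembles and prints such a command, so it is safe to run anywhere. The username and server name are hypothetical placeholders, not real accounts.

```shell
# Hypothetical placeholders: replace with your own values
username="your_username"
server="cluster.example.org"

from="local_file"                        # file on your computer
to="${username}@${server}:~/transfer/"   # folder on the server

# Assemble the command and print it, so you can inspect it first
scp_command="scp ${from} ${to}"
echo "$scp_command"
```

Swapping the values of from and to gives the download direction.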
Example for Tetralith
This is how you copy a file from your local computer directly to your HOME folder (~/):
To copy a file from Tetralith to your local computer (and present folder), do the command above in reverse order:
- If asked, give your center's password and, possibly, a 6-digit second-factor code.
- You can get rid of this prompt if you have set up SSH keys
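The two Tetralith transfers described above could look like the sketch below. The username x_abcde is a hypothetical example and local_file is an assumed file name. The commands are only printed here, since running them needs a real account.

```shell
# Hypothetical username: replace with your own
username="x_abcde"
server="tetralith.nsc.liu.se"

# Upload: local file -> your HOME folder (~/) on Tetralith
upload="scp local_file ${username}@${server}:~/"

# Download: the same locations in reverse order; '.' is the present folder
download="scp ${username}@${server}:~/local_file ."

# Print the commands; copy one into your local terminal to run it
printf '%s\n' "$upload" "$download"
```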
Set paths
Copy a file from your local computer to the cluster:
Large or many files¶
Compress¶
- Shorten download/upload time by reducing the size of a file!
- A common tool in Linux environments is gzip.
  - Usage: gzip <filename>. The file gets a .gz ending
  - Decompress: gunzip <filename>
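To see the effect of gzip and gunzip, the following sketch can be run locally. It works in a temporary folder, and the file name numbers.txt is just an example.

```shell
# Work in a temporary folder so no existing files are touched
workdir="$(mktemp -d)"
cd "$workdir"

# Create an example text file (the numbers 1 to 10000, one per line)
seq 1 10000 > numbers.txt
size_before=$(wc -c < numbers.txt)

# Compress: numbers.txt is replaced by numbers.txt.gz
gzip numbers.txt
size_after=$(wc -c < numbers.txt.gz)
echo "Size before: ${size_before} bytes, after: ${size_after} bytes"

# Decompress: numbers.txt.gz is replaced by numbers.txt again
gunzip numbers.txt.gz
ls
```

How much a file shrinks depends on its content: text usually compresses well, while already compressed formats (such as jpg) barely shrink.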
Options for compressing during the transfer
- scp -C ...
- The file(s) are then also decompressed at the destination.
Compressing is processor intensive
- See extra section
Archive many files¶
- Transferring many files creates so-called overhead
  - each file has to be addressed individually.
- The solution is to gather the files in an archive, like tar.
  - The content then behaves like ONE file.
  - Usage: tar -cf archive.tar /path/files or tar -cf archive.tar /path/folder
- While archiving with tar you may compress the data as well!
  - tar -czf archive.tar.gz [/path/files]
Workflow
- Archive and compress a folder with many large files
  - tar -czf manylargefiles_folder.tar.gz manylargefiles_folder/
- Transfer data
- Extract at target destination
  - tar -xzf manylargefiles_folder.tar.gz
- You should now have manylargefiles_folder/ again at the target destination!
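The workflow can be rehearsed locally before involving a server. In the sketch below, a folder with 100 small files stands in for the real data and a local folder named target plays the role of the destination. Both are assumptions for illustration, and the transfer step is left as a comment with placeholders.

```shell
# Work in a temporary folder so no existing files are touched
workdir="$(mktemp -d)"
cd "$workdir"

# Stand-in for a folder with many files (here: 100 small files)
mkdir manylargefiles_folder
for i in $(seq 1 100); do
    echo "content of file $i" > "manylargefiles_folder/file_$i.txt"
done

# 1. Archive and compress the folder into ONE file
tar -czf manylargefiles_folder.tar.gz manylargefiles_folder/

# 2. Transfer data (needs a real server, so it is commented out here):
# scp manylargefiles_folder.tar.gz <username>@<server>:~/transfer/

# 3. Extract at the target destination (a local folder plays that role)
mkdir target
cd target
tar -xzf ../manylargefiles_folder.tar.gz

# The folder is back, with all its files
n_files=$(ls manylargefiles_folder | wc -l)
echo "Extracted files: ${n_files}"
```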
Archiving is often worth more than compressing
Can I use archiving and compressing in all transfer methods?
- Yes!
Exercises¶
Exercise 0: Use the documentation of your HPC cluster
- Search for how to transfer files to/from your HPC cluster using scp. At which URL is it described?
  - Tip: not all HPC centers have documented this, so you should give up searching after a while.
  - If the center maintaining your HPC cluster has not documented how to use scp, follow the Rackham documentation.
Where is that documentation?
| HPC Cluster | Documentation |
|---|---|
| Alvis | Documentation. |
| Berzelius | Documentation |
| Bianca | Available for download via the transit server, see documentation |
| COSMOS | Documented on UPPMAX page. |
| Dardel | Documentation |
| Kebnekaise | Documentation |
| LUMI | Documentation |
| Pelle | Documentation |
| Rackham | Documentation |
| Tetralith | Documentation |
| Vera | Documentation. |
Exercise 1: Upload a file from your computer, using scp
Tips
- Useful terminal commands (both locally and remotely)
  - pwd - which folder am I in?
  - cd [path] - change folder (go up in hierarchy with cd ..)
  - ls - list content of folder
  - mkdir - make a new folder
  - touch - create empty file
- (If you want to create a file in local terminal: $ touch local_file)
- (You can check the file structure in an ssh session)
- Send it to an existing folder (e.g. transfer) on Tetralith
  - use mkdir <folder name> if it is not there
- Check on server that it is there
Answer (Tetralith example)
Locally
- (If you want to create a file in local terminal: $ touch local_file)
- Send it to an existing folder (e.g. transfer) on Tetralith:
  - $ scp local_file <username>@tetralith.nsc.liu.se:~/transfer/
Check on server that it is there
- $ ls ~/transfer
Exercise 2: Download a file from the server to your computer, using scp
Tips
- (If you want to create a file in remote ssh terminal: $ touch remote_file)
- Send it to an existing local folder
- Check locally that it is there
Answer (Tetralith example)
On Server
- (If you want to create a remote file first, in an SSH session, do: $ touch remote_file)
- Get it to the present local folder:
  - $ scp <username>@tetralith.nsc.liu.se:~/transfer/remote_file .
Check locally that it is there
(Optional) Exercise 1: Download a directory with many files
Tips
- Be in the transfer directory (or similar) and create 3000 (empty) files REMOTELY in a directory with name many_files
  - $ mkdir many_files
  - $ cd many_files
  - $ touch my-file-{1..3000}.txt
- Time the download of the directory, using time, and the recursive option to include the files within the directory
  - time scp ....
Answer (Tetralith example)
- time scp -r sm_bcarl@tetralith.nsc.liu.se:~/transfer/many_files .
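Before timing the download, it can help to confirm that the directory really holds 3000 files. The sketch below creates and counts them in a temporary folder, so it can be tried locally as well as on the cluster. It uses seq instead of the brace expansion from the tips, on the assumption that your shell may not be bash.

```shell
# Work in a temporary folder so no existing files are touched
workdir="$(mktemp -d)"
cd "$workdir"

mkdir many_files
cd many_files

# Create 3000 empty files; in bash, 'touch my-file-{1..3000}.txt' does the same
touch $(seq -f 'my-file-%g.txt' 1 3000)

# Count the files to confirm they are all there
n_files=$(ls | wc -l)
echo "Number of files: ${n_files}"
```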
(Optional) Exercise 2: Test the difference between transferring one or several files (using scp)
Tips
- Archive the many_files directory
  - The original directory is still there! Check!
- Time the download of the original directory, using time scp ....
  - If time does not work, count the seconds!
- Time the download of the archived directory, using time scp ....
  - If time does not work, count the seconds!
- Focus on the user line (the CPU time used by scp itself), because real also includes the time for establishing the connection and giving the credentials!
- Do you spot any difference?
Answer (Tetralith example)
Archiving step on REMOTE
- tar -cvf many_files.tar many_files
- The original directory is still there! Check!
LOCALLY
- time scp -r sm_bcarl@tetralith.nsc.liu.se:~/transfer/many_files .
  - note the -r for recursive and including files in the folder.
- time scp sm_bcarl@tetralith.nsc.liu.se:~/transfer/many_files.tar .
- $ ls
  - (or in the File explorer)
Extra¶
Cheat sheet for extra scp
- scp followed by none or any of the following option flags and the files and servers involved
- scp provides a number of options that control its behavior. The most widely used options are:
  - -P - Specifies the remote host ssh port.
  - -p - Preserves file modification and access times.
  - -q - Suppresses the progress meter and non-error messages.
  - -C - Forces scp to compress the data as it is sent to the destination machine.
  - -r - Tells scp to copy directories recursively.
Cheat sheets for gzip and tar
Compressing is processor intensive
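- For that reason, compressing can delay the transfer, especially on a fast network where the data could be sent faster than it is compressed.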