Copying files using the command line
Learning to copy files using the command line is one of the most difficult tasks some students will encounter during Workshop practicals.
The faculty are not making students copy files using the command line out of a "that's the way we did it" mentality, but because of our current experience. Much of the data analysis that happens today is done on computing clusters where the user's only interaction with the computer is on the command line. Learning to use the command line effectively is an extremely important skill for the toolbox of anybody with data analysis ambitions.
This document is long because it attempts to explain things from as near to first principles as possible. You are likely familiar with many of the concepts discussed in the next sections, in which case skip to the topics that will be useful.
Quick start
If you are already familiar with navigating directories and using the command line to copy files, then you should find getting started with the practicals to be straightforward.
You will create some directories to organize things, and then in most cases copy scripts and data out of the appropriate directory in /faculty to the directories you've created.
Some definitions
Because this may be a completely new topic for many of the students, we'll start with some definitions.
Computer words
- command line: The command line is a text-based interface to a computer. Examples are the Microsoft Windows Command Shell and PowerShell, and the macOS Terminal. In this course, connections to the cloud computing environment's command line will be made using SSH.
- CLI: An abbreviation for "command line interface". This is often used when describing programs that are run at the command line. For example, R has a CLI, but there are also other ways to interact with R, such as RStudio. plink and PRSice also have a CLI.
- SSH: A versatile tool for securely transferring data between computers. In this course students may use SSH to access a command line in the cloud computing environment, or to copy files to or from the cloud computing environment.
- directory: A directory, also called a "folder", is an organizational unit for computer files. Files exist within a directory. Directories may contain files, other directories (subdirectories), or even nothing at all.
- folder: Synonym for "directory". The two terms can be used interchangeably.
- directory you're in: This, or other phrases involving "in", refers to the current working directory of the command line. Commands that do not specify a different directory will operate on the directory you are in. For example, less foo will show the contents of a file named "foo" in the directory you are in. less /faculty/foo will show the contents of a file named "foo" in the /faculty directory.
- cd "change directory": The command used to change which directory you are in. It is similar to setwd() in R.
- home directory: Each user has a "home directory" where all of their files are stored. This can be abbreviated as ~ (the tilde symbol).
- faculty directory: For convenience, all of the faculty members' home directories are collected in a directory called /faculty.
- subdirectory: A directory that is within another directory. All directories (except the "root" directory) are subdirectories of other directories. Usually "subdirectory" will be used when this relationship is important. For example, instructions may say "copy the 'HW2' directory into a subdirectory of your 'Day1' directory."
- / "forward slash": The forward slash is the Unix/Linux/macOS directory separator. When writing out directory names, the "/" is used to separate directories and subdirectories. For example, ~/Day1/HW2 refers to a subdirectory named "HW2", which is inside a directory named "Day1", which is inside your home directory.
- path: "Path" is used to refer to a series of directories and subdirectories. ~/Day1/HW2 is a path.
- file: an entity on a computer file system. Files may contain text, data, program instructions, or application specific data, such as a PowerPoint slide deck.
- . "dot": A single period, or "dot". This represents the directory the command line is operating in.
- .. "dot dot": Two periods, or "dots". This represents the directory one level higher in the hierarchy.
- copy: The act of duplicating a file or directory from its origin to a different location or name. This action is usually done with the cp command.
- move: The act of removing a file or directory from its origin, and putting it in a different location, or changing its name. This action is usually done with the mv command.
- cp "copy": The Unix/Linux/macOS command used to copy files or directories.
- mv "move": The Unix/Linux/macOS command used to move or rename files or directories.
- ls "list": The Unix/Linux/macOS list command. It shows the names of files and directories, and can also show other information about them.
- less "sometimes less is more": A general purpose tool for looking at the contents of a text file.
- mkdir "make directory": The command to create a directory. For example, mkdir foo will create an empty directory named "foo".
- */wild cards/globbing: These are characters which can be used to match multiple other characters. It is a powerful tool to avoid typing multiple file names when an action is to be performed on several files. For example, foo.* could be used to match foo.bar, foo.baz, and foo.boz. The collective name of the characters used is "wild cards", and the action of matching wild cards to files is called globbing.
- command line switches or options: Extra text given to a command to affect its behavior. Switches are often preceded by - or --. For example, in cp -v the "-v" is a switch to the "cp" command.
- command line arguments: The text after a command which tells the command what to operate on. For example, in cp foo bar, "foo" and "bar" are arguments to the "cp" command. Some commands may require switches before some arguments.
- ENTER or RETURN: After typing a command at the command line, the ENTER or RETURN key must be pressed to submit the command.
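Several of the definitions above (wild cards, ".", and "..") are easier to see in action. The following is a minimal sketch, assuming a throwaway directory created with mktemp; the Day1/HW2 layout and the file names are made up for illustration and are not workshop files.

```shell
# Work in a throwaway directory so nothing real is touched.
# (Day1/HW2 and the file names below are hypothetical.)
demo=$(mktemp -d)
mkdir -p "$demo/Day1/HW2"
cd "$demo/Day1/HW2"

# Create a few empty files to match against.
touch foo.bar foo.baz foo.boz other.txt

# Globbing: the shell replaces foo.* with every matching file name
# before the command (here, echo) ever runs.
matches=$(echo foo.*)
echo "$matches"

# "." is the directory you are in, ".." is the directory one level up.
here=$(cd . && pwd)
parent=$(cd .. && pwd)
echo "here:   $here"
echo "parent: $parent"
```

Note that other.txt is not listed by foo.*, because its name does not start with "foo.".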
Display conventions in this document
Text will be shown in several different fonts and formats to express meaning.
Text in a fixed, or typewriter, font represents text on a command line. Either something that the user types, or that the computer outputs.
A screen shot of a terminal will show a sequence of command line entries and responses.
Required arguments to a command will be represented by text surrounded by pointy brackets < >. For example, cp <source> <destination> shows that some argument must be provided in the "source" and "destination" positions. When substituting in real values for the arguments, the pointy brackets are not included. So the typed command would look like cp source destination, to copy the file "source" to a file named "destination".
Optional arguments are shown with square brackets [ ]. These are arguments which are not necessary for the command to function, but may be provided by the user to achieve desired results.
Anatomy of a command line
There are several items on the default command line used at the Workshop.
- The first part is your username. In this case the example username is student.
- @ is a separator.
- Then comes the computer name. In this example it is ip-10-0-201-191, but the exact name will be different depending on which cloud node you are connected to.
- : is a separator.
- ~ shows the current directory path. ~ is used as a shorthand for the current user's home directory.
- $ is the end of the command line. Anything typed will appear after the $. Instructions later may show, for example, $ ls which will mean the user has typed ls at the command line.
- The green rectangle is the cursor. Depending on your SSH client and exact terminal settings, the exact color and shape of the cursor will vary.
Putting that all together, if you see a command line showing
smith12@ip-10-0-200-233:~/day2/R-files$
That means the user smith12 is logged into the compute node ip-10-0-200-233 and is currently in R-files, a subdirectory of day2, which is itself a subdirectory of their home directory.
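To make the pieces of the prompt concrete, here is a small sketch that takes the example prompt as plain text and splits it at the separators described above. It uses standard shell parameter expansion (the ${...} forms), which is not otherwise covered in this document; the variable names are made up for illustration.

```shell
# The example prompt from the text, stored as plain text.
prompt='smith12@ip-10-0-200-233:~/day2/R-files$'

user=${prompt%%@*}     # keep everything before the first "@"
rest=${prompt#*@}      # keep everything after the first "@"
host=${rest%%:*}       # keep everything before the first ":"
dir=${rest#*:}         # keep everything after the first ":"
dir=${dir%\$}          # drop the trailing "$"

echo "user:      $user"
echo "computer:  $host"
echo "directory: $dir"
```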
Looking at files and directories
The list command
ls is the command used to list the names of files and directories.
The command ls is run at the command line, and it shows that there are two things in the current directory, named "R" and "foo".
Switches can be given to ls to have it provide more information.
The "d" in drwxr-xr-x shows that the thing named "R" is a directory, and the first "-" in -rw-r--r-- shows that "foo" is a regular file. The letters following the first one have to do with permissions, and aren't important at the moment.
Next is shown the owner of the files, "student", and the group of the files, "students". These also aren't important for what we're doing.
Next is shown the size of the file, then the date and time the file was last modified, and finally the name of the file or directory.
ls and ls -l are extremely useful for seeing what files and directories exist.
ls can be given a directory as an argument, and it will show the contents of that directory.
In all of these examples, ls is showing directories in blue. That will probably be how your screen looks, but depending on exactly which terminal and SSH client you use, directories may be shown in the same color as regular files.
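The three uses of ls described in this section can be tried safely in a scratch directory. This is a minimal sketch that recreates the situation described above (a directory named R and a regular file named foo); the file R/script.R is a made-up name, added only so that ls R has something to show.

```shell
# Recreate the example in a throwaway directory.
demo=$(mktemp -d)
cd "$demo"
mkdir R
echo "some random text" > foo
touch R/script.R   # hypothetical file, not part of the workshop material

ls        # names only
ls -l     # long listing: type and permissions, owner, group, size, date, name
ls R      # contents of the directory R instead of the current directory
```

The owner, group, sizes, and dates in the ls -l output will differ from machine to machine.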
Looking inside a file
The less command can be used to view the contents of a text file. Many files, such as R scripts and some data files are just text, and can be easily viewed with less.
To view the contents of a file, run less <file>.
student@ip-10-0-200-228:~$ less foo
will show some random text, because foo is literally filled with some random text. The final line, foo (END), is a status message from less. It gives the name of the file being viewed and shows the position in the file.
If the file is long enough, it can be scrolled by pressing the arrow keys.
To exit less, press the q (quit) key.
Navigating directories
Moving between directories is done using the cd "change directory" command. The syntax of the command is cd [destination], where the destination is the name of the directory you want to move into. The destination is optional, because running cd with no destination will return you to your home directory.
Entering cd R moves the user into the "R" directory, and the command prompt is updated to reflect this change.
A full path can be given as the argument to cd, and you will be moved to the final directory in the path. The effect is the same as using multiple cd commands.
As can be seen in the previous few examples, the "/" (forward slash) character is extremely important, and it has different meanings depending on where it is in the path.
When at the start of a name, it is telling the computer to look in the "root" directory for that item. For example, "/faculty" is in the "root" folder.
When in between names, it tells the computer that those are different directories or files. For example, "elizabeth/2022/corrs.csv" is referencing something named "corrs.csv" which is in the "elizabeth" directory and then the "2022" subdirectory.
Leaving out a leading "/" means that you are referencing something in the directory you are currently in. For example, cd R looks for "R" in the current directory, while cd /R looks for "R" in the root directory.
There is no directory called "/R", so it is not possible to change there. An error is shown, "No such file or directory". This error is not serious, and does not cause any problems. It just means that the change directory command could not complete, and you should check for typos, a misplaced /, or other problems.
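The navigation rules above can be checked in a scratch directory. This is a minimal sketch, assuming a made-up tree containing elizabeth/2022 and R; the root-level name /R-not-here-for-this-demo is hypothetical and is assumed not to exist on your machine.

```shell
# Build a small hypothetical tree to move around in.
demo=$(mktemp -d)
mkdir -p "$demo/elizabeth/2022" "$demo/R"
cd "$demo"

# One cd with a longer path ...
cd elizabeth/2022
one_step=$(pwd)

# ... lands in the same place as several shorter cd commands.
cd "$demo"
cd elizabeth
cd 2022
several_steps=$(pwd)

echo "$one_step"
echo "$several_steps"

# No leading "/": R is looked up in the directory you are in.
cd "$demo"
cd R && echo "now in $(pwd)"

# Leading "/": the name is looked up in the root directory instead.
# cd prints an error and leaves you where you were.
cd "$demo"
cd /R-not-here-for-this-demo || echo "cd failed, still in $(pwd)"
```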
Actually copying files
Using cp
"cp" is the primary command used to copy files at the command line.
The basic syntax
The basic syntax for cp is
cp <source> <destination>
cp creates a duplicate of the source file (or directory in some circumstances) at the destination.
Copying the file "foo" to another file called "bar" is done with the command
cp foo bar
This will result in two identical files, foo and bar, in the current directory.
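A quick way to convince yourself that the copy really is identical is to compare the two files afterwards. This is a minimal sketch in a throwaway directory; cmp is a standard comparison tool that is not otherwise covered in this document.

```shell
# Make a scratch directory containing one small text file.
demo=$(mktemp -d)
cd "$demo"
echo "some random text" > foo

cp foo bar    # duplicate foo under the new name bar

ls            # both names are now listed
cmp foo bar && echo "foo and bar have identical contents"
```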
cp can be combined with wild cards to copy multiple files at the same time. For example
cp /faculty/elizabeth/2022/*.R .
will copy all of the files that end in ".R" to the current directory, which is referenced by ".", usually spoken as "dot".
cp can be given the -r "recursive" switch to cause it to copy a directory, and everything in that directory. For example
cp -r /faculty/elizabeth/2022 .
copies everything in the directory /faculty/elizabeth/2022 to the current directory. Afterwards there is a new 2022 directory which contains a copy of everything that is in the /faculty/elizabeth/2022 directory.
At the start of most practicals, you will use either cp -r or cp with wild cards to copy files out of the appropriate directory under /faculty to one of your directories.
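The difference between the wild card form and the -r form can be tried without access to /faculty. The sketch below builds a stand-in for /faculty/elizabeth/2022 inside a throwaway directory; the file names model.R, plots.R, and corrs.csv and the Day1 directory are assumptions made for illustration. On the workshop machines the source would be the real directory under /faculty.

```shell
# Stand-in for /faculty/elizabeth/2022 so the sketch runs anywhere.
demo=$(mktemp -d)
src="$demo/faculty/elizabeth/2022"
mkdir -p "$src" "$demo/home/Day1"
touch "$src/model.R" "$src/plots.R" "$src/corrs.csv"   # hypothetical files

cd "$demo/home/Day1"

# Wild card: copy only the files ending in .R into the current directory (.)
cp "$src"/*.R .
ls

# Recursive: copy the whole 2022 directory, creating ./2022 with everything in it
cp -r "$src" .
ls 2022
```

After the first cp, corrs.csv has not been copied because it does not match *.R; after the second, it appears inside the new 2022 directory. The source files are still in place in both cases.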
Using mv
"mv" is the primary command used to move files at the command line. mv is used in a similar way to cp, but there are some very important differences. The most important is that mv removes the source file. After running mv you still have the same number of files or directories you started with; they are just located someplace else, or have a different name.
For example
mv foo bar
renames the file "foo" to "bar". In this case "foo" could have been a directory, and then it would be renamed to a directory called "bar".
When moving multiple files (or files and directories), the destination must be a directory.
mv *.R My-R
will move all of the files in the current directory that end in ".R" into the directory "My-R". The destination directory, "My-R" in the example, must exist before running the mv command. It will not automatically be created.
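Both uses of mv can be tried in a scratch directory. This is a minimal sketch; the file names a.R and b.R are made up for illustration, and the My-R directory is created first because mv will not create it.

```shell
# Scratch directory with one file to rename and two .R files to move.
demo=$(mktemp -d)
cd "$demo"
touch foo a.R b.R   # hypothetical file names

mv foo bar          # rename: foo is gone, bar takes its place

mkdir My-R          # the destination directory must exist first
mv *.R My-R         # move every file ending in .R into My-R

ls                  # bar and My-R remain in the current directory
ls My-R             # a.R and b.R are now here
```

Unlike cp, nothing is duplicated: the total number of files is the same before and after.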
Creating directories with mkdir
The command to "make directories" is mkdir. It is very simple to use, just mkdir <directory>.
To create a directory called "foo" just run
mkdir foo
Your friends Tab and Up Arrow
Two huge time savers are the Tab key and the Up Arrow key.
Tab
Tab is used to complete text on the command line. For example, if I want to copy files from /faculty/elizabeth/2022, I don't need to type out all of those characters. This is what my typing will actually look like, with #Tab# for each time I press the Tab key.
cp /f#Tab#
completes to
cp /faculty/
and then continue typing
cp /faculty/el#Tab#
which completes to
cp /faculty/elizabeth/
If a completion isn't unique, then pressing Tab a second time will list the possible completions. If nothing is listed after repeated presses of Tab, then there aren't any possible completions.
This is what that might look like at a terminal, with a red mark inserted each time I pressed the Tab key, and the rest of the line being what was automatically added by the computer.
Use of the Tab key is highly recommended to avoid typos in long file and directory names.
Up Arrow
The Up Arrow is used to recall previous typed commands. Those commands can then be edited or used again as they are. The red arrow shows where I pressed the up arrow.
I pressed the Up Arrow once to recover ls, pressed ENTER, and then I pressed the Up Arrow again to recover ls, but edited the line to add a -l before pressing ENTER.
That is a very trivial example, and it is hardly worth pressing the Up Arrow to recover ls, but on long and complicated commands, the Up Arrow is a large time saver.
After pressing the Up Arrow multiple times and getting into your command "history", it is possible to use the Down Arrow to move to more recent commands. You can return to an empty command line by either pressing the Down Arrow until you are back at a bare prompt, or pressing Ctrl-C.
Long commands can be edited by using the left and right arrows, and when modified to your satisfaction, pressing the ENTER (or RETURN) key will submit the command line.