Working with Unix
As new Insider's Guide classes are no longer being offered, this site is not currently being updated. Please refer to NCBI's E-utilities documentation for more up-to-date information.
Commands and arguments
Once you have access to a Unix terminal, you will interact with Unix by typing commands: instructions that tell the computer to do something. Some commands let you modify their behavior by adding arguments, which either provide data for the command to use as input or set options for the command.
For example:
esearch -db pubmed -query "seasonal affective disorder"
In the line above, esearch is the command, and it has two arguments: -db and -query. These arguments provide input to the esearch command, specifying that we should be searching the database “pubmed”, and that our search query should be “seasonal affective disorder”. (For more about the esearch command, see our esearch documentation page.)
einfo -dbs
In the line above, we are executing the command einfo with one argument: -dbs. Rather than specifying input, the -dbs argument sets an option, telling einfo what mode it should use. (For more about the einfo command, see our einfo documentation page.)
Different commands accept different arguments. Some arguments are required for certain commands, while others are optional. Using a particular command with a particular set of arguments lets you customize your instructions to the computer.
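To see how an argument changes what a command does without installing anything, here is a minimal sketch that uses the standard Unix command sort instead of the EDirect commands. The file name “fruits.txt” and its contents are made up for this illustration.

```shell
# Create a small sample file to work with (name and contents are made up).
printf 'banana\napple\ncherry\n' > fruits.txt

# One argument: the name of the file whose lines should be sorted.
sort fruits.txt

# Two arguments: -r sets an option (reverse the order),
# and fruits.txt is still the input.
sort -r fruits.txt
```

The first sort lists the lines in alphabetical order; the second lists the same lines in reverse order. The command is the same in both cases; only the arguments differ.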
Combining commands together
As mentioned earlier, one of the strengths of Unix is the ability to take the output of one command and use it as the input for another. Unix accomplishes this using the “|” character (pronounced “pipe”; on most US keyboards it is located above the Enter key, on the same key as the backslash). To send the output of one command into another, simply connect them with a “|”:
cat pmids.csv | epost -db pubmed | efetch -format xml
In the line above, we are executing the command cat on the file “pmids.csv”. We then pipe the output of that command into our next command, epost. Finally, we pipe the output of the epost command into a third command, efetch. (For more information on these commands, see our cat, epost, and efetch documentation pages.)
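The same piping pattern works with any commands that read input and write output. As a rough sketch that runs without EDirect, the example below chains four standard Unix commands. The file name “sample_ids.txt” is hypothetical, and the ID values in it are placeholders, not real PMIDs.

```shell
# Create a small sample file; the values are placeholders, not real PMIDs.
printf '11111111\n22222222\n11111111\n33333333\n' > sample_ids.txt

# cat prints the file, sort puts the lines in order, uniq drops
# repeated adjacent lines, and wc -l counts the lines that remain.
cat sample_ids.txt | sort | uniq | wc -l
```

Each command only sees the output of the command to its left, so the final count reflects the three distinct values in the file rather than the four lines it started with.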
As you build more elaborate scripts, you may find that your string of commands gets rather long. Long series of commands on a single line can be difficult to read, so it may be more convenient to write your series of commands on multiple lines. Using backslash (“\”), we can indicate to Unix that our command or series of commands continues on the next line. This lets us reformat our previous command as:
cat pmids.csv | \
epost -db pubmed | \
efetch -format xml
The backslash tells Unix that the command or series of commands is not yet complete. If you press Enter on a line that ends in a backslash, Unix will not execute the command, but will advance to the next line and allow you to finish typing your commands.
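A quick way to convince yourself that the two layouts are equivalent is to run a small pipeline both ways. This sketch uses standard Unix commands and made-up sample text rather than EDirect, so it can be tried on any Unix terminal.

```shell
# The whole pipeline on one line.
printf 'cherry\napple\nbanana\n' | sort | head -n 2

# The same pipeline split across three lines. Each backslash must be
# the very last character on its line, with no space after it.
printf 'cherry\napple\nbanana\n' | \
sort | \
head -n 2
```

Both versions print the same two lines. If a space sneaks in after a backslash, the line break is no longer hidden from Unix, and the command is likely to fail.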
Why didn’t it work?
Once you start working with Unix, you may encounter situations where a series of commands does not work the way you expect (or fails to work at all). Don’t worry; this is quite common! With Unix, every detail matters. Unix is case-sensitive, and it is also sensitive to spacing and spelling. If you misplace a space, misspell a command, or incorrectly capitalize an argument, your script will most likely fail. Additionally, Unix is not always forthcoming about why your script failed. In many cases, a script will fail and give no indication as to what was wrong.
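When a command fails quietly, two small checks can help narrow down the cause. The sketch below is one possible approach, not an official troubleshooting procedure. It uses standard Unix commands, and “srot” is a deliberate misspelling of sort chosen for this illustration.

```shell
# command -v prints where a command lives if Unix can find it.
command -v sort

# A misspelled command name is not found, so our own message is printed.
command -v srot || echo "srot: not found (check the spelling)"

# grep prints nothing when no line matches, which can look like a silent
# failure. The special variable $? holds the exit status of the previous
# command; anything other than 0 means it did not succeed.
printf 'apple\nbanana\n' | grep cherry || echo "no match, exit status $?"
```

Running each piece of a long series of commands on its own, and looking at what it prints before adding the next pipe, is often the quickest way to find the step that went wrong.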
Successfully using Unix may take some effort. Pay attention to the details. Be willing to experiment. Test early and test often. Above all, have patience!