2. The command line interface

2.1. Why would I want to use the command line?

It is likely that the most common way you interact with your computer is via the mouse or trackpad - to do things like change into different folders, open new folders or files, delete folders or files, etc. While intuitive and simple, this mouse-mediated human-computer interaction is not always a good thing.

For example, let’s imagine that you’re a business owner and you have a Microsoft Word file with a list of customers and the amount each has spent at your business. If you wanted to figure out which customer had spent the most, you might: open MS Word, click the File –> Open dropdown menu, navigate to the folder containing the file, open the file, and read through it to find the customer who had spent the most.

However, from a scientific and bioinformatic point of view, there are at least three fundamental problems with this mouse-driven, menu-dropdown, human-reading approach. First, it is error prone. In reading the file, you might misread one line, and that line may be the one containing the top-spending customer. Second, it is slow. Imagine if you had hundreds of files to look through (e.g. one from each week for the past five years). The job would take days. Third, it is not possible for you or anyone else to repeat the process in a precise and definite manner.

Fortunately, we do not have to do things this way. By using the command line, you can write down all the commands you use and save them (e.g. in a text file). Then, if you want to repeat the process exactly, you can just execute each command in this text file, in order. We will see how this might be done in a future lab.

By having such a list of commands, it is also possible to make the computer execute them in order, making it very feasible to perform this set of commands hundreds or thousands of times. Again, we will see how to do this in a future lab.

Finally, by having the computer perform all the commands, we can ensure that they are done correctly (e.g. that the top-spending customer is always found) - unless of course there is a bug in your program.
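As a rough illustration of the customer example, here is a minimal Python sketch of what "making the computer do it" could look like. It is not the method used later in the course. It assumes, purely for illustration, that each weekly file is plain text with one customer per line in the form name, tab, amount; the file names, the customer names, and the function top_spender are all hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def top_spender(paths):
    """Return (customer, total) for the customer with the largest total spend.

    Assumes every file holds lines of the form: name<TAB>amount
    """
    totals = {}
    for path in paths:
        with open(path, newline="") as handle:
            for row in csv.reader(handle, delimiter="\t"):
                if not row:  # skip blank lines
                    continue
                name, amount = row
                totals[name] = totals.get(name, 0.0) + float(amount)
    return max(totals.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Create two small made-up weekly files so the sketch runs on its own.
    with tempfile.TemporaryDirectory() as folder:
        week1 = Path(folder) / "week_01.tsv"
        week2 = Path(folder) / "week_02.tsv"
        week1.write_text("alice\t120.50\nbob\t80.00\n")
        week2.write_text("alice\t10.00\nbob\t75.25\ncarol\t200.00\n")
        # The same call works for two files or for hundreds.
        print(top_spender(sorted(Path(folder).glob("*.tsv"))))  # prints ('carol', 200.0)
```

Because the steps are written down, anyone can rerun them on the same files and get the same answer, which addresses the repeatability problem as well as the speed and accuracy problems.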

Thus, by using the command line, we have solved all of these problems: we are unlikely to have errors, the process can be automated and applied to thousands of files (Fig. 2.1) in less than a second, and the process is easily repeated by anyone, at any time. For more inspiration, read this brief article.

Fig. 2.1 This is what the future holds.

Although it may seem that using the command line only complicates your life, you must stay positive, open-minded, and determined. As time goes on, you will begin to see that there are very significant advantages to using the command line to interact with your computer. First, then, you should become acquainted with it.

2.2. Preliminaries

Note that throughout this course you may see commands or files or directories that are named something like my_awesome_file.tab or my_home_directory or myresults.txt. When you see names like this in the instructions, this does not mean that you should name your files or directories in this way. Rather, you should replace these placeholder names with names that are relevant to you, or which are descriptive for you, or which contain your directory names. For example, if you are making a file that contains the results of a quality control analysis of DNA sequences from E. coli, you might name the file ecoli_qc_results.txt.

2.2.1. Naming convention

One important aspect of organising files and directories (folders) is naming convention. When working on the command line, your life will become considerably easier if you avoid using spaces in your file and directory names. Thus, never name your file my awesome file.txt. Instead, name it my_awesome_file.txt (“snake case”), or myAwesomeFile.txt (“camel case”) or my-awesome-file.txt (“kebab case”) or my.awesome.file.txt but probably not MY_AWESOME_FILE.txt (“screaming snake case”). You should pick one of these formats (camel case, kebab case, etc.) at the start of the course, and stick to it throughout the course (see Fig. 2.2).

Fig. 2.2 Please be consistent with your naming.

The second thing to pay attention to when naming files is the extension or suffix. For example, text files are usually named with the extension .txt. MS Word files usually have the extension .doc or .docx. In this course, we will run into a wide variety of files with a wide variety of extensions, for example .fastq, .sam, .bam, .txt, .sh, .fasta, .html, .gbk, .bai, .py, .r, .gz, .aln, .tre, .phy, .vcf, .bcf, and many more! Hopefully at the conclusion of this semester you will be familiar with all of these.
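To make the idea of an extension concrete, the short Python sketch below shows how a program might read the extension(s) off a file name. The file names other than ecoli_qc_results.txt are made up for illustration, and the helper name extensions is hypothetical; PurePath.suffixes is part of Python's standard pathlib module.

```python
from pathlib import PurePath


def extensions(filename):
    """Return every extension in filename, in order (a compressed file often has two)."""
    return PurePath(filename).suffixes


for name in ["ecoli_qc_results.txt", "ecoli_reads.fastq.gz", "ecoli_variants.vcf"]:
    print(name, extensions(name))
# ecoli_qc_results.txt ['.txt']
# ecoli_reads.fastq.gz ['.fastq', '.gz']
# ecoli_variants.vcf ['.vcf']
```

Note that a dot-separated name such as my.awesome.file.txt would be reported here as having three extensions, which is one practical reason to prefer underscores or hyphens between words.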

While we are on the topic of naming conventions, there are certain characters that you should always avoid when naming files and folders. Besides spaces, these are (not necessarily exhaustive):

: ; ` " ' \ / ! @ # $ % ^ & * ( ) + , ? [ ] { } | > <

2.2.2. Directory structure

In addition to naming conventions, there are good and bad ways to organise your files and directories. Please have a brief read through this resource.

2.4. Becoming a better bioinformatician

Throughout this lab course, Google is your friend. If you have errors, or if you are not sure how you might do something, or if you forget a command, google it!

Thus, Step One as you begin the lab is: Approach the command line with confidence and in a calm manner, assured that whatever goes wrong, you can google your way out of it (Fig. 2.5).

Fig. 2.5 It’s actually a skill that takes time to develop.

It’s what all good programmers do.