KEY CONCEPTS 

  1. Unix Fundamentals
    1.1 What is an operating system, and why do we need to know how to use it?
    1.2 Your computer account and the world
    1.3 Behind the scenes: What does a computer really do?
    1.4 Graphic interfaces and window managers
  2. What you need to learn


1. Unix Fundamentals

1.1 What is an operating system, and why do we need to know how to use it?

Unix is an operating system: an environment that provides commands for creating, manipulating, and examining datafiles, and for running programs. Some operating systems with which you may be familiar are Windows XP (Intel), OS/390 (IBM mainframes), Macintosh OS, and VMS (DEC VAX systems). Despite their differences, all of these operating systems do essentially the same thing: they act as the unifying framework within which all tasks are performed.

If you have only used computers for office tasks like word processing, you have still been performing operating system tasks, using commands within the word processor to list your files, rename them, and so on. While it is useful for software packages to have these facilities, they are really reinventing the wheel, since they duplicate things that are already present in the operating system. In fact, with each new software package that provides these functions, you have to relearn how to do the same housekeeping tasks!

For a variety of reasons, it is neither possible nor desirable for every program to offer the complete facilities of an operating system. Thus, most programs concentrate on the specific task for which they were designed (such as comparing two sequences for similarity). General tasks, such as creating and editing datafiles, are carried out using the tools provided by the operating system.

The programs and databases comprising BIRCH come from a number of different sources, and perform a wide variety of tasks. There is no single program package that will do everything, and certainly none that will do everything well. Additionally, as new methods of analysis are devised, it is useful to be able to simply add new programs to the existing set. For these and other reasons, it is necessary to make the effort to learn how to do a minimal number of tasks using the operating system. The advantage is that once you have learned this minimal subset of the operating system, you can perform all of the basic housekeeping and editing functions consistently, regardless of how many new programs are added to the system.

1.2 Your computer account and the world

1.3 Behind the scenes: What does a computer really do?

1.4 Graphic Interfaces and Window Managers


Unix is intrinsically a character-based operating system. To provide users with a graphic interface, several layers of software act in concert. The figure at right shows that, from the user's point of view, X11 applications, each generating separate windows, are handled by the window manager. Window managers such as CDE, GNOME or KDE use X11 library modules to pass commands to the Unix kernel. The Unix kernel is the essence of the operating system: a relatively small program that manages system resources such as CPU time, and the only program that interacts directly with the hardware. System commands also interact directly with the kernel. This hierarchical organization of tasks results in a reliable system. In contrast, Windows XP incorporates many elements of the graphic interface directly into the kernel, so problems with the graphic interface can crash Windows XP.

This hierarchical organization has the important consequence that application developers don't need to know the details of the machine's hardware architecture. Further, X11 applications can run regardless of which window manager you use.

Multiple Windows and High Resolution Monitors: Size DOES count!

The Graphic User Interface (GUI) was pioneered at the Xerox PARC lab in the 1970s and was adopted by Unix workstations in the 1980s. The GUI was originally designed to provide multiple windows on high-end scientific workstations with high-resolution screens. High-resolution monitors, with screen sizes of 17" or greater and resolutions of 1024x768 pixels or greater, have been standard on Unix workstations since the 1980s.

In general, you want a big screen for the same fundamental reason you want a big desk: to fit more on it. For over two decades, scientists and engineers have gained great productivity from being able to run several programs side by side on the same screen. Most bioinformatics projects involve a variety of programs and different types of datafiles. Large-screen monitors are therefore essential for working efficiently in this field.

Example: a 1024x768 CDE screen.

Compare this with the screen space available on 1152x900 or 1600x1200 displays.

Unix desktops carry this idea even further by allowing multiple virtual screens. In CDE, you can move between virtual screens by clicking on the screen buttons "One" through "Four" on the control panel.

The GUI was popularized on the small screens introduced with the Apple Lisa and, soon after, the Macintosh. Until recently, costs have kept most PC monitors at 14 or 15", with resolutions of 800x600. Consequently, even though MS-Windows could create multiple windows, software on the Windows platform has been oriented to the "one window owns the screen" model. There just wasn't enough real estate to put more than one useful window onto the screen at a time. As a result, most PC users never develop working habits that enable them to take advantage of large screens.

Recently, prices for large monitors have dropped, and Windows systems now typically have 17" screens running at 1024x768. This is the lowest resolution that you're likely to see on any X-terminal or Unix workstation.

The moral of this story is that one of the most useful productivity investments you can make is to buy large monitors.
 

2. What you need to learn

Although Unix is an immense operating system, it is possible to define a small set of commands that will enable you to do most of the things you need to do, and to find out how to do new tasks as the need arises. The minimal set includes the core commands, a text editor, and some basic strategies for organizing files, each described in the sections that follow. It may also be advisable to buy a book on using Unix. In particular, A Practical Guide to the Unix System by Mark G. Sobell is easy to read and contains a useful reference guide for frequently used commands.

2.1 The core commands

If you learn the commands listed below, you will be able to do the vast majority of what you need to do on the computer, without having to learn the literally thousands of other commands that are present on the system.
cat       Display and concatenate files
cd        Move to new working directory
chmod     Change read, write, execute permissions for files
cp        Copy files
less      View files a page at a time
logout    Terminate Unix session
lpr       Send files to line printer
ls        List files and directories
man       Read or find Unix manual pages
mkdir     Make a new directory
mv        Move or rename files
passwd    Change password
rm        Remove files
rmdir     Remove an empty directory
ps        List processes
top       List most CPU-intensive processes
kill      Kill a process
If you have used MS-DOS or other operating systems, you will recognize many of these commands under different names; they accomplish the same things. For example, 'ls' is comparable to the 'dir' command in DOS, although it does a lot more. Similarly, 'cat' in Unix corresponds to 'type' in DOS, 'cp' to 'copy' and 'mv' to 'rename'. This is not an accident: later versions of DOS borrowed ideas such as hierarchical directories and I/O redirection from Unix. Consequently, if you are already familiar with DOS, you should have little trouble picking up Unix. In fact, once you have come to appreciate the extra power of Unix, you will find yourself dissatisfied with the limitations of DOS.

The first thing to do is to read through the 'Unix command summary' under 'Using Unix'.
The Unix command summary gives you a quick introduction to the core Unix commands. You should also browse through the online manual pages for these commands to get some idea of what they can do. Since the manual pages are online, you don't need to memorize all the options for every command. For example, if you wanted to know more about changing file permissions with the 'chmod' command, simply type 'man chmod' to see complete information on how this command works.
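As a minimal sketch of the core commands in action, the session below creates a file in a scratch directory and changes its permissions with chmod. It assumes a POSIX shell (sh or bash) with mktemp available; the scratch directory and the file name notes.txt are hypothetical.

```shell
# Work in a scratch directory so nothing in your home directory is touched
work=$(mktemp -d)
cd "$work"

# Create a small file (notes.txt is a hypothetical name)
echo "pea drr39 notes" > notes.txt

# chmod 600: owner may read and write; group and others get no access
chmod 600 notes.txt
ls -l notes.txt

# chmod g+r: additionally let members of your group read the file
chmod g+r notes.txt
ls -l notes.txt     # first column now reads -rw-r-----

# For the full list of numeric and symbolic modes:
# man chmod
```

The scratch directory is left in place; remove it with rm -r "$work" when you are done.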

2.2 The text editors

A text editor is a program that lets you enter data into files, and modify it, with a minimal amount of fuss. Text editors are distinct from word processors in two crucial ways. First, the text editor is a much simpler program, providing none of the formatting features (eg. footnotes, special fonts, tables, graphics, pagination) that word processors provide. This means that the text editor is simpler to learn, and what it can do is adequate for the task of entering a sequence, changing a few lines of text, or writing a quick note to send by electronic mail. For these simple tasks, it is easier and faster to use a text editor.

The second important difference between word processors and text editors is the way in which the data is stored. The price you pay for having underlining, bold face, multiple columns, and other features in word processors is the embedding of special computer codes within your file. If you used a word processor to enter data, your datafile would thus also contain these same codes. Consequently, only the word processor can directly manipulate the data in that file.

Text editors offer a way out of this dilemma, because files produced by a text editor contain only the characters that appear on the screen, and nothing more. These files are sometimes referred to as ASCII files, since they only contain standard ASCII characters.

Most files created by Unix commands, and by many other programs, are ASCII files. This seemingly innocuous fact is of great importance, because it implies a certain universality of files. Thus, regardless of which program or Unix command was used to create a file, it can be viewed on the screen ('cat filename'), sent to the printer ('lpr filename'), appended to another file ('cat filename1 >> filename2'), or used as input by other programs. More importantly, all ASCII files can be edited with any text editor.
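The universality of ASCII files can be illustrated with a short sketch. The two sequence files below are hypothetical, created in a scratch directory (this assumes a POSIX shell with mktemp); the same files are viewed, appended and fed to another program without any conversion.

```shell
# Scratch directory and hypothetical sequence files, for illustration only
work=$(mktemp -d)
cd "$work"
printf 'ATGGCT\n' > part1.seq
printf 'GGATCC\n' > part2.seq

# View a file on the screen
cat part1.seq

# Append part2.seq to the end of part1.seq
# (>> appends; a single > would overwrite part1.seq instead)
cat part2.seq >> part1.seq
cat part1.seq

# Use the same ASCII file as input to another program, e.g. count its lines
wc -l part1.seq

# Print it (commented out because it needs a configured printer)
# lpr part1.seq
```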

The vi editor is the universal screen editor available with all UNIX implementations. You can learn how to use vi in any book on Unix or at a Unix introduction class at computer services. For this reason, discussion of the actual use of this editor will not be included here.

In X-windows, several point-and-click editors are available. From the CDE root menu, choose 'Text Editing -> Text editor' to run dtpad.

dtpad can also be launched by typing 'dtpad' at the command line. Other X11 text editors available include nedit, gedit, axe and xcoral. All can be run from the command line.

2.3 File organization

It is very easy to rapidly generate so many files that they become an unmanageable mess. (Think of the desk of the 'Perfessor' in the Shoe comic strip.) This section will describe some strategies for managing your data, and keeping things simple.

Structuring Your Data in Directories
Probably the most useful habit to get into is to organize your files in tree-structured directories. Whenever you log in, you are placed in your home directory. Depending on what sort of work you are doing, it is useful to create subdirectories ('mkdir') to hold different sets of files. For example, if you were working with pea genes, you might have a subdirectory within your home directory called 'pea'.

{pssun1:/usr/home/bwf/pea}ls -l
total 3
drwx------  5 bwf      512 Mar 28 18:54 cab
drwx------  4 bwf      512 Apr 24 18:09 drr
drwx------  2 bwf      512 Nov 24 17:35 wft
{pssun1:/usr/home/bwf/pea}ls -l drr
total 49
drwx------  2 bwf     1024 Mar  8 10:02 drr39
drwx------  2 bwf     1024 Mar  8 17:45 drr49
-rw-------  1 bwf      754 Mar  9 15:15 oligos.dna
-rw-------  1 bwf    19932 Jul 10  1990 pCHS2.seq
{pssun1:/usr/home/bwf/pea}ls -l drr/drr39
total 23
-rw-------  1 bwf     1460 Mar  6 19:13 drr39.aln
-rw-------  1 bwf      354 Mar  4 17:16 drr39.pro.aln
-rw-------  1 bwf     2275 Mar  6 19:16 drr39.ref
-rw-------  1 bwf      314 Sep  7  1990 pi230.pro
-rw-------  1 bwf      570 Mar  4 18:09 pi230.seq
-rw-------  1 bwf      326 Sep  7  1990 pi39.pro
-rw-------  1 bwf    11558 Nov 14 11:18 pi39.rest
-rw-------  1 bwf      556 Mar  4 18:08 pi39.seq
-rw-------  1 bwf      469 Mar  4 18:11 pi39.wrp
In the example, the prompt (enclosed in the {} characters) shows that the current working directory is pea. Listing the files ('ls -l') shows that pea contains three subdirectories, indicated by a 'd' in the first column of each line. A listing of the drr directory shows two datafiles ('-' in column 1) and two subdirectories, each devoted to a particular multigene family (ie. drr39 and drr49). Within each of these directories are sequences and other files related to that multigene family.
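One way to reproduce the skeleton of the pea hierarchy above is sketched below. It is an illustration only: the files are empty placeholders, the tree is built in a scratch directory from mktemp rather than your home directory, and mkdir -p (which creates missing parent directories) is used for brevity.

```shell
work=$(mktemp -d)
cd "$work"

# mkdir -p creates each missing parent directory along the way
mkdir -p pea/cab pea/wft pea/drr/drr39 pea/drr/drr49

# Empty placeholder files standing in for real data
touch pea/drr/oligos.dna pea/drr/pCHS2.seq
touch pea/drr/drr39/pi39.seq pea/drr/drr39/pi39.pro

# Restrict access to the owner, as in the listing above (drwx------)
chmod -R go-rwx pea

cd pea
ls -l        # cab, drr, wft: each line starts with 'd'
ls -l drr    # two directories ('d') and two files ('-')
```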

Organization of directories can be tailored to each particular problem. If you were sequencing several genes, each gene should probably have a separate directory to contain all of the files related to the sequencing project. Another approach might be to set up directory hierarchies to match an evolutionary tree. The most important thing is to use some sort of organization that makes sense in the context of the projects you are working on. Here are some general guidelines for organizing directories:

  1. Your home directory should be mostly composed of subdirectories. Leave individual files there only on a temporary basis.
  2. Directory organization is for your convenience. Whenever a set of files all relate to the same thing, dedicate a directory to them.
  3. If a directory gets too big (eg. more files than will fit on the screen when you type 'ls -l'), it's time to split it into two or more subdirectories.
  4. Don't go overboard with directories. By splitting your files among too many directories, you could make it harder to use your data.
  5. If you need to use a deeply-nested directory often, you might define an environment variable in your .cshrc file that refers to that directory.
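Guideline 5 can be sketched as follows. The text mentions .cshrc, which is read by the C shell; the csh form appears as a comment, and the runnable lines use the equivalent sh/bash syntax. The variable name DRR39 and the scratch directory are hypothetical.

```shell
# In csh/tcsh, the line to add to ~/.cshrc would be:
#     setenv DRR39 $HOME/pea/drr/drr39
# Below is the sh/bash equivalent (e.g. for ~/.profile), run in a
# scratch directory instead of $HOME. DRR39 is a hypothetical name.
work=$(mktemp -d)
mkdir -p "$work/pea/drr/drr39"

DRR39="$work/pea/drr/drr39"    # normally: DRR39="$HOME/pea/drr/drr39"
export DRR39

# The variable now stands in for the full path in any command
cd "$DRR39"
pwd
ls "$DRR39"
```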
Directories can evolve
The tree-structured directories you create are not cast in concrete; Unix makes it easy to reshuffle directories at will. For example, if you were sequencing three cab genes, you might have three separate directories for genes a, b and c, called caba, cabb and cabc. When the sequences are completed, it might be more useful to reorganize files related to these sequences by other criteria. For example, two directories, cabpro and cabdna, might contain the amino acid and DNA sequences of the three genes, respectively. A third directory, cabfig, might contain figures for publication using the three sequences. The organizational utility of directory hierarchies is limited only by your imagination.
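The cab reorganization described above could look like this in practice. This is a sketch under stated assumptions: each gene directory holds one hypothetical DNA file (.seq) and one protein file (.pro), and everything happens in a scratch directory.

```shell
work=$(mktemp -d)
cd "$work"

# During sequencing: one directory per gene (file names are hypothetical)
mkdir caba cabb cabc
for g in caba cabb cabc; do
    touch "$g/$g.seq" "$g/$g.pro"
done

# After sequencing: regroup by data type instead of by gene
mkdir cabpro cabdna cabfig
mv caba/*.pro cabb/*.pro cabc/*.pro cabpro
mv caba/*.seq cabb/*.seq cabc/*.seq cabdna

# The per-gene directories are now empty, so rmdir can remove them
rmdir caba cabb cabc
ls cabpro cabdna
```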

It is sometimes useful to create temporary directories, even if you only use them for half an hour and then get rid of them. For example, if you were searching the databases for DNA and protein sequences of plant 'pathogenesis-related proteins', you might create a directory called prp and use it as your working directory while searching for and retrieving the sequences. Once the sequences have been retrieved, you can discard the false positives and then divide the remaining entries among directories specialized for particular classes of sequences (eg. chitinase, glucanase, and so forth). Once the sequences have been redistributed, you can delete the prp directory.
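A minimal sketch of the temporary-directory workflow follows. The database search itself is not shown; empty files with hypothetical names stand in for the retrieved hits, one of which is treated as a false positive.

```shell
work=$(mktemp -d)
cd "$work"

# Temporary working directory for the search
mkdir prp
cd prp
# (search and retrieval would happen here; placeholder hits instead)
touch chitinase1.seq glucanase1.seq falsehit1.seq

# Discard the false positive, then sort the rest into permanent directories
rm falsehit1.seq
mkdir ../chitinase ../glucanase
mv chitinase*.seq ../chitinase
mv glucanase*.seq ../glucanase

# prp is now empty, so rmdir can delete it
cd ..
rmdir prp
```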

File extensions identify the type of data in a file
Most operating systems permit files to have extensions that can be used to identify the type of data contained in the file. Although file extensions are not required, it is strongly advised that all files have them.

The drr39 directory (see above)  illustrates the strategic use of file extensions. Two members of the drr39 multigene family have been sequenced: cDNAs pi39 and pi230, whose DNA and protein sequences are stored in pi39.seq and pi230.seq, and pi39.pro and pi230.pro, respectively. Additionally, a restriction site search was done on pi39, and the output stored in pi39.rest (Note that Unix permits file extensions longer than 3 characters). Sequence similarity alignments of the DNA and protein sequences (generated using mase) are stored in the files drr39.aln and drr39.pro.aln.

Another useful convention is to use all or part of the name of the program that produced the file as the file extension. Thus, the output from a string search using grep would have the file extension '.grep'. Similarly, multiply-aligned sequences reformatted by the reform program have the extension '.ref', as in drr39.ref.
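As an illustration of naming output after the program that produced it, the sketch below saves grep output with a .grep extension. The sequence file and its contents are hypothetical; here grep simply extracts the name lines that begin with '>'.

```shell
work=$(mktemp -d)
cd "$work"

# A small hypothetical sequence file: name lines start with '>'
printf '>pi39\nATGGAATTCGGA\n>pi230\nATGCCCGGG\n' > drr39.seq

# Save grep's output with a .grep extension so its origin is obvious later
grep '^>' drr39.seq > drr39.grep
cat drr39.grep
```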

File extensions make it possible to work with groups of files in single commands. For example, if you wanted to create a new directory containing only protein sequences taken from the current directory, the following commands would create the directory 'protein', and move all '.pro' files into it:

mkdir  protein
mv  *.pro  protein
Similarly, the following commands would create a new directory containing all pi230-related files:
mkdir 230
mv pi230.* 230
 


Hint: Don't use blanks in filenames.
Most operating systems allow this, but it is bad practice. The shell breaks commands down into tokens (ie. strings of non-blanks). For example, if you had a directory called 'mouse sequences', typing
ls -l mouse sequences
would tell the shell to list two nonexistent directories, mouse and sequences. The safest way to prevent this problem is to use characters such as '_' or '.' to connect words into one long string, eg. mouse.sequences or mouse_sequences.
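The tokenizing problem, and the ways around it, can be demonstrated in a scratch directory. This sketch assumes a POSIX shell; quoting or a backslash keeps the blank inside one token, but renaming the directory is the more robust fix.

```shell
work=$(mktemp -d)
cd "$work"

mkdir "mouse sequences"     # quotes keep the blank inside one token

# Unquoted, the shell passes two arguments, mouse and sequences,
# so ls reports that neither exists (|| true keeps the script going)
ls -l mouse sequences || true

# Quoting or escaping the blank works, but is easy to forget
ls -l "mouse sequences"
ls -l mouse\ sequences

# Simplest fix: rename it so the name contains no blanks
mv "mouse sequences" mouse_sequences
ls -l
```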

The file manager
In X-windows, you can perform most file management tasks using the file manager. If you are using CDE, a copy of the file manager will automatically be launched when you begin an X11 session. Additional copies of the file manager can be launched from the workspace menu. The CDE file manager can also be launched from the command line by typing 'dtfile'.

The file manager can display files as lists, as icons, or in a tree-structured view. (Figure: one possible file manager view.)

Clicking on a file opens that file, and clicking on a folder opens that folder (directory). Where the file is an ASCII file, it will be opened in the text editor. For other types of files, the file manager will attempt to launch the appropriate application using that file.