If you have only used computers for office tasks like word processing, you have already been doing operating system tasks, using commands within the word processor to list your files, rename them, and so on. While it is useful for software packages to have these facilities, they are really re-inventing the wheel, in that they duplicate functions that are already present in the operating system. In fact, with each new software package that provides these functions, you have to re-learn how to do the same housekeeping tasks!
For a variety of reasons, it is neither possible nor desirable for every program to offer the complete facilities of an operating system. Thus, most programs concentrate on the specific task for which they were designed (such as comparing two sequences for similarity). General tasks, such as creating and editing datafiles, are carried out using the tools provided by the operating system.
The programs and databases comprising BIRCH come from a number of different sources, and perform a wide variety of tasks. There is no single program package that will do everything, and certainly none that will do everything well. Additionally, as new methods of analysis are devised, it is useful to be able to simply add new programs to the existing set. For these and other reasons, it is necessary to make the effort to learn how to do a minimal number of tasks using the operating system. The advantage is that once you have learned this minimal subset of the operating system, you can perform all of the basic housekeeping and editing functions consistently, regardless of how many new programs are added to the system.
This hierarchical organization has the important consequence that software developers don't need to know anything about the architecture of the machine. Further, X11 applications can run regardless of which window manager you use.
In general, you want a big screen for the same fundamental reason you want a big desk: to fit more on it. For over two decades, scientists and engineers have derived great productivity gains from being able to have several programs running side by side on the same screen. Most projects related to bioinformatics involve a variety of programs and different types of datafiles. Large-screen monitors are therefore essential for working efficiently in this field.
Example: 1024x768 CDE screen.
Compare this to the screen you get with 1152x900 or 1600x1200 displays.
Unix desktops carry this idea even further by allowing multiple virtual screens. In CDE, you can move between virtual screens by clicking on the screen buttons "One" through "Four" on the control panel.
The GUI was popularized on the small screens introduced with the Apple Lisa, which quickly evolved into the Macintosh. Until recently, costs have kept most PC monitors at 14 or 15", with resolutions of 800x600. Consequently, even though MS-Windows could create multiple windows, software on the Windows platform has been oriented to the "one window owns the screen" model. There just wasn't enough real estate to put more than one useful window onto the screen at a time. As a result, most PC users never develop working habits that enable them to take advantage of large screens.
Recently, prices for large monitors have dropped, and Windows systems now typically have 17" screens running at 1024x768. This is the lowest resolution that you're likely to see on any X-terminal or Unix workstation.
The moral of this story is that one of the most useful productivity
investments you can make is to buy large monitors.
cat      Write and concatenate files
cd       Move to new working directory
chmod    Change read, write, execute permissions for files
cp       Copy files
less     View files a page at a time
logout   Terminate Unix session
lpr      Send files to line printer
ls       List files and directories
man      Read or find Unix manual pages
mkdir    Make a new directory
mv       Move files
passwd   Change password
rm       Remove files
rmdir    Remove a directory
ps       List processes
top      List most CPU-intensive processes
kill     Kill a process
If you have used MS-DOS or other operating systems, you will recognize many of these commands by different names, but they accomplish the same thing. For example, 'ls' is comparable to the 'dir' command in DOS, although it does a lot more. Similarly, 'cat' in Unix corresponds to 'type' in DOS, 'cp' to 'copy' and 'mv' to 'rename'. This is not an accident, since DOS borrowed a number of its ideas from Unix. Consequently, if you are already familiar with DOS, you will have no problem picking up Unix. In fact, after you have come to appreciate the extra power of Unix, you will find yourself dissatisfied with the limitations of DOS.
The first thing to do is to read through the
'Unix command summary' under UsingUnix.
The Unix command summary gives you a quick introduction to how to use
the core Unix commands. You should also browse through the online manual
pages for these commands to get some idea of what they can do. Since they
are online, you don't need to memorize all options for all commands. For example,
if you wanted to know more about changing file permissions with the 'chmod'
command, simply type 'man chmod' to see complete information on
how this command works.
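As a small illustration of the kind of thing 'man chmod' documents, the sketch below changes the permissions on a scratch file and lists it after each change. The file name notes.txt and the scratch directory are made up for this example; the modes shown are only two of the many forms chmod accepts.

```shell
# Work in a scratch directory so no real files are affected.
workdir=$(mktemp -d)
cd "$workdir"

# Create an empty file (hypothetical name).
touch notes.txt

# Read and write for the owner only; no access for anyone else.
chmod 600 notes.txt
ls -l notes.txt

# Add read permission for the group and for others.
chmod go+r notes.txt
ls -l notes.txt
```

The first column of each 'ls -l' line shows the permissions before and after the second chmod.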
The second important difference between word processors and text editors is the way in which the data is stored. The price you pay for having underlining, bold face, multiple columns, and other features in word processors is the embedding of special computer codes within your file. If you used a word processor to enter data, your datafile would thus also contain these same codes. Consequently, only the word processor can directly manipulate the data in that file.
Text editors offer a way out of this dilemma, because files produced by a text editor contain only the characters that appear on the screen, and nothing more. These files are sometimes referred to as ASCII files, since they only contain standard ASCII characters.
Generally, files created by Unix or by other programs are ASCII files. This seemingly innocuous fact is of great importance, because it implies a certain universality of files. Thus, regardless of which program or Unix command was used to create a file, it can be viewed on the screen ('cat filename'), sent to the printer ('lpr filename'), appended to another file ('cat filename1 >> filename2'), or used as input by other programs. More importantly, all ASCII files can be edited with any text editor.
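The operations mentioned above can be tried safely on throwaway files. The following is a minimal sketch, assuming two hypothetical files named filename1 and filename2 with invented contents; printing with 'lpr' is left out because it requires a configured printer.

```shell
# Work in a scratch directory so no real files are affected.
workdir=$(mktemp -d)
cd "$workdir"

# Create two small plain-text files (names and contents are invented).
printf 'ATGGCTTCT\n' > filename1
printf 'GGATCCAAA\n' > filename2

# View a file on the screen.
cat filename1

# Append filename1 to the end of filename2.
cat filename1 >> filename2

# Use the file as input to another program: count its lines.
wc -l < filename2
```

Because both files contain nothing but text, the same commands work no matter which program originally wrote them.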
The vi editor is the universal screen editor available with all UNIX implementations. You can learn how to use vi in any book on Unix or at a Unix introduction class at computer services. For this reason, discussion of the actual use of this editor will not be included here.
In X-windows, several point-and-click editors are available. From the CDE root menu, choose 'Text Editing -> Text editor' to run dtpad.
dtpad can also be launched by typing 'dtpad' at the command line. Other X11 text editors available include nedit, gedit, axe and xcoral. All can be run from the command line.
Structuring Your Data in Directories
Probably the most useful habit to get into is to organize your files
in tree-structured directories. Whenever you login, you are placed in
your home directory. Depending on what sort of work you are doing, it is
useful to create subdirectories ('mkdir') to hold different
sets of files. For example, if you were working with pea genes, you might
have a subdirectory within your home directory called 'pea'.
{pssun1:/usr/home/bwf/pea}ls -l
total 3
drwx------ 5 bwf 512 Mar 28 18:54 cab
drwx------ 4 bwf 512 Apr 24 18:09 drr
drwx------ 2 bwf 512 Nov 24 17:35 wft
{pssun1:/usr/home/bwf/pea}ls -l drr
total 49
drwx------ 2 bwf 1024 Mar 8 10:02 drr39
drwx------ 2 bwf 1024 Mar 8 17:45 drr49
-rw------- 1 bwf 754 Mar 9 15:15 oligos.dna
-rw------- 1 bwf 19932 Jul 10 1990 pCHS2.seq
{pssun1:/usr/home/bwf/pea}ls -l drr/drr39
total 23
-rw------- 1 bwf 1460 Mar 6 19:13 drr39.aln
-rw------- 1 bwf 354 Mar 4 17:16 drr39.pro.aln
-rw------- 1 bwf 2275 Mar 6 19:16 drr39.ref
-rw------- 1 bwf 314 Sep 7 1990 pi230.pro
-rw------- 1 bwf 570 Mar 4 18:09 pi230.seq
-rw------- 1 bwf 326 Sep 7 1990 pi39.pro
-rw------- 1 bwf 11558 Nov 14 11:18 pi39.rest
-rw------- 1 bwf 556 Mar 4 18:08 pi39.seq
-rw------- 1 bwf 469 Mar 4 18:11 pi39.wrp
In the example, the prompt (enclosed in the {} characters)
shows that the current working directory is pea. Listing the files ('ls -l') shows that pea contains
three subdirectories, indicated by a 'd' in the first column of each line.
A directory listing of the drr directory shows several datafiles ('-'
in column 1) and two subdirectories, each devoted to a particular multigene
family (i.e. drr39 and drr49). Within each directory are sequences and
other files related to each multigene family.
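A directory tree like the one in the listing can be built with a single command. This is a sketch only: it recreates the directory names from the example inside a scratch directory rather than in a real home directory.

```shell
# Work in a scratch directory so no real files are affected.
workdir=$(mktemp -d)
cd "$workdir"

# -p creates any missing parent directories along the way.
mkdir -p pea/cab pea/drr/drr39 pea/drr/drr49 pea/wft

# Show every directory in the new tree.
find pea -type d | sort
```

Without '-p', each level would have to be created in turn, starting with pea.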
Organization of directories can be tailored to each particular problem. If you were sequencing several genes, each gene should probably have a separate directory to contain all of the files related to the sequencing project. Another approach might be to set up directory hierarchies to match an evolutionary tree. The most important thing is to use some sort of organization that makes sense in the context of the projects you are working on. Here are some general guidelines for organizing directories:
It is sometimes useful to create temporary directories, even if you only use them for half an hour and then get rid of them. For example, if you were searching the databases for DNA and protein sequences for plant 'pathogenesis-related proteins', you might create a directory called prp, and use this as your working directory when searching for and retrieving the sequences. Once the sequences have been retrieved, you can discard the 'false positives' and then divide the remaining entries among directories specialized for particular classes of sequences (e.g. chitinase, glucanase, and so forth). Once the sequences have been re-distributed, you can delete the prp directory.
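The temporary-directory workflow described above might look like the following sketch. The sequence file names are invented stand-ins for retrieved database entries, and everything happens inside a scratch directory.

```shell
# Work in a scratch directory so no real files are affected.
workdir=$(mktemp -d)
cd "$workdir"

# Temporary working directory for the search.
mkdir prp
cd prp

# Stand-ins for retrieved sequences (hypothetical file names).
touch chit1.seq chit2.seq gluc1.seq falsepos.seq

# Discard the false positive.
rm falsepos.seq

# Create the specialized directories next to prp and distribute the files.
mkdir ../chitinase ../glucanase
mv chit*.seq ../chitinase
mv gluc*.seq ../glucanase

# prp is now empty, so rmdir can remove it.
cd ..
rmdir prp
ls
```

Note that 'rmdir' only removes empty directories, which makes it a useful check that nothing was left behind in prp.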
File extensions identify the type of data in a file
Most operating systems permit files to have extensions that can be
used to identify the type of data contained in the file. Although use
of file extensions is not required, it is strongly advised that all files
have file extensions.
The drr39 directory (see above) illustrates the strategic use of file extensions. Two members of the drr39 multigene family have been sequenced: cDNAs pi39 and pi230, whose DNA and protein sequences are stored in pi39.seq and pi230.seq, and pi39.pro and pi230.pro, respectively. Additionally, a restriction site search was done on pi39, and the output stored in pi39.rest (Note that Unix permits file extensions longer than 3 characters). Sequence similarity alignments of the DNA and protein sequences (generated using mase) are stored in the files drr39.aln and drr39.pro.aln.
Another useful convention is to use all or part of the name of the program that produced the file as the file extension. Thus, the output from a string search using grep would have the file extension '.grep'. Similarly, multiply-aligned sequences re-formatted by the reform program have the extension '.ref', as in drr39.ref.
File extensions make it possible to work with groups of files
in single commands. For example, if you wanted to create a new directory
containing only protein sequences taken from the current directory, the
following commands would create the directory 'protein', and move all '.pro'
files into it:
mkdir protein
mv *.pro protein
Similarly, the following commands would create a new directory containing
all pi230-related files:
mkdir 230
mv pi230.* 230
FOR A LIST OF SUGGESTED FILE EXTENSIONS, click here.
Hint: Don't use blanks in filenames. Most operating systems allow this, but it is a bad practice. Commands are broken down into tokens (i.e. strings of non-blanks) by the shell. For example, if you had a directory called 'mouse sequences', typing ls -l mouse sequences would tell the shell to list files in two nonexistent directories, mouse and sequences. The safest way to prevent this problem is to use characters such as '_' or '.' to connect words into one long string, e.g. mouse.sequences or mouse_sequences.
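The way the shell splits a command into tokens can be seen directly. In the sketch below, count_args is a helper defined only for this illustration; it reports how many arguments the shell actually passed to it.

```shell
# Work in a scratch directory so no real files are affected.
workdir=$(mktemp -d)
cd "$workdir"

# One directory with a blank in its name, one without.
mkdir 'mouse sequences' mouse_sequences

# Unquoted, the shell passes two arguments, 'mouse' and 'sequences'.
# Neither exists, so ls fails and the message after || is printed.
ls -l mouse sequences 2>/dev/null || echo "ls could not find 'mouse' or 'sequences'"

# Hypothetical helper: print the number of arguments received.
count_args() { echo $#; }

count_args mouse sequences      # two tokens
count_args 'mouse sequences'    # quoting keeps the name as one token
count_args mouse_sequences      # one token, no quoting needed
```

Quoting works, but it has to be remembered every time the name is typed, which is why names without blanks are the safer habit.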
The file manager
In X-windows, you can perform most file management tasks using the
file manager. If you are using CDE, a copy of the file manager will automatically
be launched when you begin an X11 session. Additional copies of the file
manager can be launched from the workspace menu. The CDE file manager can also
be launched from the command line by typing 'dtfile'.
The file manager can display files as lists, as icons, or in a tree-structured view. Here is one possible view:
Clicking on a file opens that file, and clicking on a folder opens that folder (directory). Where the file is an ASCII file, it will be opened in the text editor. For other types of files, the file manager will attempt to launch the appropriate application using that file.