KEY CONCEPTS
- Unix Fundamentals
1.1 What is an operating
system, and why do we need to know how to use it?
1.2 Your
computer account and the world
1.3 Behind the scenes:
What does a computer actually do?
1.4 Graphic interfaces and Window
Managers
- What you need to learn
1. Unix
Fundamentals
1.1 What is an operating system, and
why do we need to know how to use it?
Unix is an operating system: an environment that provides
commands for creating, manipulating, and examining datafiles, and for
running programs. Some operating systems with which you may be familiar
are Windows XP (Intel), OS/390 (IBM mainframes), Macintosh OS, and VMS
(DEC VAX systems). Despite their differences, all of these operating
systems do essentially the same thing: they act as the unifying
framework within which all tasks are performed.
If you have only used computers for office tasks like word
processing, you have been doing operating system tasks, using commands
within the
word processor to list your files, rename them etc. While it is useful
for software packages to have these facilities, they are really
re-inventing
the wheel, in that they duplicate the things that are already present
in the operating system. In fact, with each new software package that
provides these functions, you have to re-learn how to do the same
housekeeping
tasks!
For a variety of reasons, it is neither possible nor
desirable for every program to offer the complete facilities of an
operating system. Thus, most programs concentrate on the specific task
for which they were designed (such as comparing two sequences for
similarity). General tasks, such as creating and editing datafiles, are
carried out using the tools
provided by the operating system.
The programs and databases comprising BIRCH come from a number
of different sources, and perform a wide variety of tasks. There is no
single program package that will do everything, and certainly none that
will do everything well. Additionally, as new methods of analysis are
devised,
it is useful to be able to simply add new programs to the existing set.
For these and other reasons, it is necessary to make the effort to learn
how to do a minimal number of tasks using the operating system. The
advantage is that once you have learned this minimal subset of the
operating system, you can perform all of the basic housekeeping and
editing functions consistently, regardless of how many new programs are
added to the system.
1.4 Graphic
Interfaces and Window managers
Unix is intrinsically a character-based operating system. To
provide users with a graphic interface, several layers of software act
in concert. The figure at right shows that, from the user's point of
view, X11 applications, each generating separate windows, are handled by
the window manager. Window managers such as CDE, GNOME or KDE use X11
library modules to pass commands to the Unix kernel. The Unix kernel is
the essence of the operating system. The kernel is a relatively small
program that manages system resources such as CPU time, and is the only
program to directly interact with the hardware. System commands also
interact directly with the kernel. This hierarchical organization of
tasks results in a reliable system. In contrast, Windows XP incorporates
many elements of the graphic interface directly into the kernel.
Problems with the graphic interface, then, can crash Windows XP.
This hierarchical organization has the important
consequence
that software developers don't need to know anything about the
architecture
of the machine. Further, X11 applications can run regardless of which
window manager you use.
Multiple Windows and High Resolution
Monitors: Size DOES count!
The Graphic User Interface (GUI) was pioneered at the Xerox PARC
lab in the 1970s, and was adopted on Unix workstations in the 1980s.
There it was designed to provide multiple windows on high-end
scientific workstations with high-resolution screens. High-resolution
monitors with screen sizes of 17" or greater and resolutions of
1024x768 pixels or greater have been standard on Unix workstations
since the 1980s.
In general, you want a big screen for the same fundamental
reason you want a big desk: to fit more on it. For over 2 decades
scientists and engineers have derived great productivity gains by being
able to have
several programs running side by side on the same screen. Most projects
related to bioinformatics involve a variety of programs and different
types
of datafiles. Large screen monitors are therefore essential to work
efficiently in this field.
Example: 1024x768 CDE screen.
Compare this to the screen you get with 1152x900 or 1600x1200
displays.
Unix desktops carry this idea even further by allowing
multiple virtual screens. In CDE, you can move between virtual screens
by clicking on the screen buttons "One" through "Four" on the control
panel.
The GUI was popularized on the small screens introduced with
the Apple Lisa, which quickly evolved into the Macintosh. Until
recently, costs have kept most PC monitors at 14" or 15", with
resolutions of 800x600.
Consequently, even though MS-Windows could create multiple
windows, software on the Windows platform has been oriented to the "one
window owns the screen" model. There just wasn't enough real estate to
put more than one useful window onto the screen at a time. As a result,
most PC users never develop working habits that enable them to take
advantage of large screens.
Recently, prices for large monitors have dropped, and
Windows systems now typically have 17" screens running at 1024x768.
This is the lowest resolution that you're likely to see on any
X-terminal or Unix workstation.
The moral of this story is that one of the most useful
productivity investments you can make is to buy large monitors.
2 What you
need to learn
Although Unix is an immense operating system, it is possible
to define a small set of commands that will enable you to do most of
the things you need to do, and to find out how to do new tasks, as the
need arises. The minimal set includes:
- a core of Unix commands
- a text editor
- knowledge of how to organize files in
directories, and filename syntax
- how to use a mailer
- how to read Usenet news
It may also be advisable to buy a book on using Unix. In
particular, A Practical Guide to the Unix System by Mark G. Sobell
is easy to read and contains a useful reference guide for
frequently-used commands.
2.1 The core
commands
If you learn the commands listed below, you will be able to do
the vast majority of what you need to do on the computer, without
having to learn the literally thousands of other commands that are
present on
the system.
cat      Write and concatenate files
cd       Move to new working directory
chmod    Change read, write, execute permissions for files
cp       Copy files
less     View files a page at a time
logout   Terminate Unix session
lpr      Send files to lineprinter
ls       List files and directories
man      Read or find Unix manual pages
mkdir    Make a new directory
mv       Move files
passwd   Change password
rm       Remove files
rmdir    Remove a directory
ps       List processes
top      List most CPU-intensive processes
kill     Kill a process
If you have used MS-DOS or other operating systems, you will recognize
many of these commands by different names, but they accomplish the same
thing. For example, 'ls' is comparable to the 'dir' command in DOS,
although it does a lot more. Similarly, 'cat' in Unix corresponds to
'type' in DOS, 'cp' to 'copy' and 'mv' to 'rename'. This is not an
accident, since many features of DOS were patterned after Unix.
Consequently, if you are already familiar with DOS, you will have no
problem picking up Unix. In fact, after you have come to appreciate the
extra power of Unix, you will find yourself dissatisfied with the
limitations of DOS.
The first thing to do is to read through the 'Unix command
summary' under Using Unix.
The Unix command summary gives you a quick introduction to how to use
the core Unix commands. You should also browse through the online
manual pages for these commands to have some idea of what they can do.
Since they are online, you don't need to memorize all options for all
commands. For example, if you wanted to know more about changing file
permissions with the 'chmod' command, simply type 'man chmod' to
see complete information on how this command works.
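As a small illustration of the 'chmod' command mentioned above, the following sketch makes a file readable and writable by its owner only. The file name notes.txt and the use of 'mktemp -d' to get a scratch directory are assumptions made for this example; they are not part of BIRCH.

```shell
# Work in a scratch directory so none of your real files are touched.
workdir=$(mktemp -d)
cd "$workdir"

# Create a small file (the name notes.txt is only an example).
echo "lab notes" > notes.txt

# Remove all permissions for group and others,
# and make sure the owner can read and write the file.
chmod go-rwx notes.txt
chmod u+rw notes.txt

# The first column of 'ls -l' shows the resulting permissions.
ls -l notes.txt

# To read the full documentation you would type:
#   man chmod
```

The permission string in the first column of the listing has the same form as the 'drwx------' and '-rw-------' strings that appear in directory listings later in this section.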
2.2 The text
editors
A text editor is a program that lets you enter data into
files, and modify it, with a minimal amount of fuss. Text editors are
distinct from word processors in two crucial ways. First, the text
editor is a much simpler program, providing none of the formatting
features (eg. footnotes, special fonts, tables, graphics, pagination)
that word processors provide. This means that the text editor is
simpler to learn, and what it can do
is adequate for the task of entering a sequence, changing a few lines
of
text, or writing a quick note to send by electronic mail. For these
simple
tasks, it is easier and faster to use a text editor.
The second important difference between word processors and
text editors is the way in which the data is stored. The price you pay
for
having underlining, bold face, multiple columns, and other features in
word processors is the embedding of special computer codes within your
file. If you used a word processor to enter data, your datafile would
thus also contain these same codes. Consequently, only the word
processor
can directly manipulate the data in that file.
Text editors offer a way out of this dilemma, because files
produced by a text editor contain only the characters that appear on
the screen, and nothing more. These files are sometimes referred to as
ASCII files,
since they only contain standard ASCII characters.
Generally, files created by Unix or by other programs are
ASCII files. This seemingly innocuous fact is of great importance,
because it implies a certain universality of files. Thus, regardless of
which program or Unix command was used to create a file, it can be
viewed on the screen ('cat
filename'), sent to the printer ('lpr filename'), appended
to another file ('cat
filename1 >> filename2'),
or used as input by other programs. More importantly, all ASCII files
can be edited with any text editor.
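The commands quoted in the paragraph above can be tried on two throwaway files. This is only a sketch: the file contents are made up, the scratch directory from 'mktemp -d' is an assumption, and 'wc' stands in for "other programs" that read a file as input. Printing with 'lpr' is left out because it needs a configured printer.

```shell
# Work in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir"

# Two small ASCII files; the contents are invented for the example.
printf 'ATGGCC\n' > filename1
printf 'TTGACA\n' > filename2

# View a file on the screen.
cat filename1

# Append filename1 to the end of filename2.
cat filename1 >> filename2

# filename2 now holds its original line followed by the line from filename1.
cat filename2

# Use the file as input to another program: count its lines.
wc -l < filename2
```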
The vi editor is the universal screen editor available with
all UNIX implementations. You can learn how to use vi in any book on
Unix
or at a Unix introduction class at computer services. For this reason,
discussion of the actual use of this editor will not be included here.
In X-windows, several point-and-click editors are available.
From the CDE root menu, choose 'Text Editing -> Text editor' to run
dtpad. dtpad can also be launched by typing 'dtpad' at the command
line. Other X11 text editors available include nedit and gedit. All can
be launched from the command line.
2.3 File
organization
It is very easy to rapidly generate so many files that they
become an unmanageable mess. (Think of the desk of the 'Perfessor' in
the Shoe comic strip.) This section will describe some strategies for
managing your data, and keeping things simple.
Structuring Your Data in Directories
Probably the most useful habit to get into is to organize your files in
tree-structured directories. Whenever you log in, you are placed in
your home directory. Depending on what sort of work you are doing, it
is
useful to create subdirectories ('mkdir') to hold
different
sets of files. For example, if you were working with pea genes, you
might
have a subdirectory within your home directory called 'pea'.
{pssun1:/usr/home/bwf/pea}ls -l
total 3
drwx------  5 bwf       512 Mar 28 18:54 cab
drwx------  4 bwf       512 Apr 24 18:09 drr
drwx------  2 bwf       512 Nov 24 17:35 wft
{pssun1:/usr/home/bwf/pea}ls -l drr
total 49
drwx------  2 bwf      1024 Mar  8 10:02 drr39
drwx------  2 bwf      1024 Mar  8 17:45 drr49
-rw-------  1 bwf       754 Mar  9 15:15 oligos.dna
-rw-------  1 bwf     19932 Jul 10  1990 pCHS2.seq
{pssun1:/usr/home/bwf/pea}ls -l drr/drr39
total 23
-rw-------  1 bwf      1460 Mar  6 19:13 drr39.aln
-rw-------  1 bwf       354 Mar  4 17:16 drr39.pro.aln
-rw-------  1 bwf      2275 Mar  6 19:16 drr39.ref
-rw-------  1 bwf       314 Sep  7  1990 pi230.pro
-rw-------  1 bwf       570 Mar  4 18:09 pi230.seq
-rw-------  1 bwf       326 Sep  7  1990 pi39.pro
-rw-------  1 bwf     11558 Nov 14 11:18 pi39.rest
-rw-------  1 bwf       556 Mar  4 18:08 pi39.seq
-rw-------  1 bwf       469 Mar  4 18:11 pi39.wrp
In the example, the prompt (enclosed in the {} characters)
shows that the current working directory is pea. Listing the files
('ls -l') shows that pea contains three subdirectories, indicated by a
'd' in the first column of each line. A directory listing of the drr
directory shows two datafiles ('-' in column 1) and two subdirectories,
each devoted to a particular multigene family (i.e. drr39 and drr49).
Within each directory are sequences and other files related to each
multigene family.
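A directory tree like the one in the listing can be built with a single 'mkdir' command. The sketch below recreates only the directory names from the example, inside a scratch directory obtained with 'mktemp -d' (an assumption made so that the example does not touch your home directory).

```shell
# Work in a scratch directory rather than your real home directory.
workdir=$(mktemp -d)
cd "$workdir"

# -p creates any missing parent directories along the way.
mkdir -p pea/cab pea/wft pea/drr/drr39 pea/drr/drr49

# Move into the project directory and confirm where we are.
cd pea
pwd

# Lines beginning with 'd' in the long listing are directories.
ls -l
ls -l drr
```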
Organization of directories can be tailored to each particular
problem. If you were sequencing several genes, each gene should
probably have a separate directory to contain all of the files related
to the sequencing project. Another approach might be to set up
directory hierarchies to
match an evolutionary tree. The most important thing is to use some
sort of organization that makes sense in the context of the projects
you are working on. Here are some general guidelines for organizing
directories:
- Organize your files by topic, not by type. It makes no sense
to put all presentations in one folder, all images in another folder,
and all documents in another folder. Any given task or project will
generate files of many kinds, so it makes sense to put all files
related to a particular task into a single folder or folder tree.
- Each time you start a new task or project or experiment, create a new folder.
- Your home
directory should be mostly composed of subdirectories. Leave individual
files there only on a temporary basis.
- Directory
organization is for your convenience. Whenever a set of files all
relate to the same thing, dedicate a directory to them.
- If a directory
gets too big (eg. more files than will fit on the screen when you type
'ls
-l'), it's time to split it into two or more subdirectories.
Directories can evolve
The tree-structured directories you create are not cast
in concrete. Unix is uniquely suited to re-shuffling directories at
will. For example, if you were sequencing three cab genes, you might
have three separate directories for genes a, b and c, called caba, cabb
and cabc. When the sequences are completed, it might be more useful to
reorganize files related to these sequences by other criteria. For
example, two directories, cabpro and cabdna, might contain amino acid
and DNA sequence of the three genes, respectively. A third directory,
cabfig, might contain figures for publication using the three
sequences. The organizational utility of directory hierarchies is
limited only by your imagination.
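The cab reorganization described above might look like this at the command line. The directory names come from the text; the file names inside them (caba.seq, caba.pro and so on) are hypothetical, and the scratch directory is an assumption for the sake of a safe example.

```shell
# Work in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir"

# Starting layout: one directory per gene (file names are hypothetical).
mkdir caba cabb cabc
for g in caba cabb cabc; do
    touch "$g/$g.seq" "$g/$g.pro"
done

# New layout: one directory per kind of sequence.
mkdir cabdna cabpro

# cab[abc] matches only the four-letter names caba, cabb and cabc,
# so the new cabdna and cabpro directories are not picked up.
mv cab[abc]/*.seq cabdna
mv cab[abc]/*.pro cabpro

# The per-gene directories are now empty and can be removed.
rmdir caba cabb cabc

ls cabdna cabpro
```

Because 'rmdir' refuses to remove a directory that still contains files, it doubles as a check that nothing was left behind.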
It is sometimes useful to create temporary directories, even
if you only use them for half an hour and get rid of them. For example,
if you were searching the databases for DNA and protein sequences for
plant 'pathogenesis-related proteins', you might create a directory
called prp, and use this as your working directory when searching for
and retrieving the sequences. Once the sequences have been retrieved,
you can discard the 'false positives' and then divide the remaining
entries among directories specialized for particular classes of
sequences (eg. chitinase, glucanase, and so forth). Once the sequences
have been re-distributed, you can delete the prp directory.
File extensions identify the type
of data in a file
Most operating systems permit files to have extensions that
can be used to identify the type of data contained in the file.
Although use of file extensions is not required, it is strongly advised
that all files have file extensions.
The drr39 directory (see above) illustrates the
strategic use of file extensions. Two members of the drr39 multigene
family have
been sequenced: cDNAs pi39 and pi230, whose DNA and protein sequences
are
stored in pi39.seq and pi230.seq, and pi39.pro and pi230.pro,
respectively.
Additionally, a restriction site search was done on pi39, and the
output
stored in pi39.rest (Note that Unix permits file extensions longer than
3 characters). Sequence similarity alignments of the DNA and protein
sequences
(generated using mase) are stored in the files drr39.aln and
drr39.pro.aln.
Another useful convention of file extensions is to use all or
part of the name of the program that produced the file as the file
extension. Thus, the output from a string search using grep would have
the file extension '.grep'. Similarly, multiply-aligned sequences
re-formatted by the reform program have the extension '.ref', as in
drr39.ref.
File extensions make it possible to work with groups of
files in single commands. For example, if you wanted to create a new
directory containing only protein sequences taken from the current
directory, the following commands would create the directory 'protein',
and move all '.pro' files into it:
mkdir protein
mv *.pro protein
Similarly, the following commands would create a new
directory containing all pi230-related files:
mkdir 230
mv pi230.* 230
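Before moving files with a wildcard, it can be reassuring to list what the pattern matches. The sketch below recreates a few file names from the drr39 listing as empty files in a scratch directory (both the scratch directory and the use of 'touch' are assumptions for this example), previews the match with 'ls', and then performs the move.

```shell
# Work in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir"

# Recreate some file names from the drr39 listing as empty files.
touch pi39.seq pi39.pro pi39.rest pi230.seq pi230.pro drr39.aln

# Preview which files the pattern matches before moving anything.
ls *.pro

# Create the directory and move all protein files into it.
mkdir protein
mv *.pro protein

# The .pro files are now in protein; everything else stayed put.
ls protein
ls
```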
Hint: Don't use blanks in filenames.
Most operating systems allow this but it is a bad practice. Commands
are broken down into tokens (i.e. strings of non-blanks) by the shell.
For example, if you had a directory called 'mouse sequences', typing
ls -l mouse sequences
would tell the shell to list files in two nonexistent directories,
mouse and sequences. The safest way to prevent this problem is to use
characters such as '_' or '.' to connect words into one long string,
e.g. mouse.sequences or mouse_sequences.
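If you do end up with a name that contains a blank, quoting it keeps the shell from splitting it into two tokens. The following sketch, run in a scratch directory (an assumption made for safety), shows the failure, the quoted form, and a rename that removes the blank.

```shell
# Work in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir"

# A name containing a blank must be quoted when it is created.
mkdir "mouse sequences"

# Unquoted, the shell passes two arguments, 'mouse' and 'sequences',
# and ls complains that neither exists. '|| true' lets the script continue.
ls -l mouse sequences || true

# Quoting keeps the name together as a single token.
ls -ld "mouse sequences"

# Renaming with an underscore avoids the problem from here on.
mv "mouse sequences" mouse_sequences
ls -ld mouse_sequences
```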
The file manager
In X-windows, you can perform most file management tasks
using the file manager. If you are using CDE, a copy of the file
manager will automatically be launched when you begin an X11 session.
Additional copies of the file manager can be launched from the
workspace menu. The CDE file manager can also be launched from the
command line by typing 'dtfile'.
The file manager can display files as lists, as icons, or in
a tree-structured view. Here is one possible view:
Clicking on a file opens that file, and clicking on a folder
opens that folder (directory). Where the file is an ASCII file, it will
be
opened in the text editor. For other types of files, the file manager
will
attempt to launch the appropriate application using that file.