


ALIGN(1)                 USER COMMANDS                   ALIGN(1)



NAME
     align - compute the global alignment of two protein  or  DNA
     sequences

     align0 - compute the global alignment of two protein or  DNA
     sequences without penalizing for end-gaps



SYNOPSIS
     align [ -m # -s SMATRIX -w # ] sequence-file-1 sequence-file-2


DESCRIPTION
     align produces an optimal global alignment between two
     protein or DNA sequences.  align will automatically decide
     whether the query sequence is DNA or protein by reading  the
     query  sequence  as  protein  and  determining  whether  the
     `amino-acid composition' is more than  85%  A+C+G+T.   align
     uses  a  modification of the algorithm described by E. Myers
     and W. Miller  in   "Optimal  Alignments  in  Linear  Space"
     CABIOS  (1988)  4:11-17.   The program can be invoked either
     with command line arguments or in interactive mode.
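     The composition test can be sketched as follows.  This is a
     hypothetical Python helper (looks_like_dna is not part of
     align), assuming that only letters are counted and that
     "more than 85%" is a strict inequality.

```python
def looks_like_dna(seq: str, threshold: float = 0.85) -> bool:
    """Return True if more than `threshold` of the letters are A, C, G or T.

    Hypothetical helper illustrating the heuristic described above;
    it is not align's source code.
    """
    # Count only alphabetic characters, treating case alike.
    residues = [c.upper() for c in seq if c.isalpha()]
    if not residues:
        return False
    acgt = sum(1 for c in residues if c in "ACGT")
    # "More than 85%" is read as a strict inequality.
    return acgt / len(residues) > threshold
```

     For example, looks_like_dna("acgtacgtn") returns True (8 of 9
     letters, about 89%), while looks_like_dna("MACFSRTKIM")
     returns False.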

     align penalizes end gaps, so that an alignment of the form
          -----MACF
          SRTKIMACF
     will have a lower score than:
          MACF
          MACF
     align0 uses the same algorithm, but does not penalize end
     gaps, so the two alignments above receive the same score.
     Sometimes this can have surprising effects.
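     The effect of end-gap penalties can be illustrated with a
     small global-alignment scorer.  This sketch uses textbook
     Needleman-Wunsch dynamic programming in quadratic space with
     a simple linear gap penalty, not the Myers-Miller linear-space
     method that align uses; the function name and the scores
     (match +2, mismatch -1, gap -2) are hypothetical.

```python
def global_score(a: str, b: str, match: int = 2, mismatch: int = -1,
                 gap: int = -2, penalize_end_gaps: bool = True) -> int:
    """Best global alignment score of a and b (hypothetical scores)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    # Leading end gaps: charged per residue, or free (left at 0).
    for i in range(1, n + 1):
        score[i][0] = i * gap if penalize_end_gaps else 0
    for j in range(1, m + 1):
        score[0][j] = j * gap if penalize_end_gaps else 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,  # align residues
                              score[i - 1][j] + gap,       # gap in b
                              score[i][j - 1] + gap)       # gap in a
    if penalize_end_gaps:
        return score[n][m]
    # Trailing end gaps are free: take the best cell in the last
    # row (rest of b unaligned) or last column (rest of a unaligned).
    return max(max(score[n]), max(row[m] for row in score))
```

     With these scores, global_score("MACF", "SRTKIMACF") is -2,
     because the five leading gaps cost 10, whereas
     global_score("MACF", "SRTKIMACF", penalize_end_gaps=False) is
     8, the same as global_score("MACF", "MACF").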

     align and align0 use the standard FASTA format sequence
     file.  Lines beginning with '>' or ';' are considered
     comments and ignored.  Sequences can be upper or lower case;
     blanks, tabs, and unrecognizable characters are ignored.
     align expects sequences to use the single-letter amino acid
     codes; see protcodes(5).
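     The reading rules above can be sketched in Python.  The name
     read_sequence is hypothetical, and the sketch assumes that
     `unrecognizable characters' means anything other than the
     ASCII letters A-Z and a-z; align's own parser may accept
     other symbols.

```python
def read_sequence(text: str) -> str:
    """Return the residues of a FASTA-format sequence, upper-cased.

    Hypothetical reader following the rules in this manual page:
    '>' and ';' lines are comments; case, blanks, tabs and other
    non-letter characters do not matter.
    """
    residues = []
    for line in text.splitlines():
        if line.startswith((">", ";")):
            continue  # title or comment line
        # Keep ASCII letters only; everything else is ignored.
        residues.extend(c.upper() for c in line if c.isascii() and c.isalpha())
    return "".join(residues)
```

     For example, read_sequence(">LCBO bovine preprolactin\nwill lsq\n")
     returns "WILLLSQ".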

OPTIONS
     align can be directed to change the scoring matrix and
     output format by entering options on the command line
     (preceded by a `-', or by a `/' for MS-DOS).  All of the
     options should precede the file name arguments.
     Alternatively, these options can be changed by setting
     environment variables.  The options and environment
     variables are:

     -m # (MARKX) =1,2,3.  Alternative display of matches and
          mismatches in alignments.  MARKX=1 uses ":", ".", and
          " " for identities, conservative replacements, and
          non-conservative replacements, respectively.  MARKX=2
          uses " ", "x", and "X".  MARKX=3 does not show the
          second sequence, but uses the second alignment line to
          display matches with a "." for identity, or with the
          mismatched residue for mismatches.  MARKX=3 is useful
          for aligning large numbers of similar sequences.

     -s str
          (SMATRIX) the filename of an alternative scoring matrix
          file or "120" to use the PAM120 matrix.

     -w # (LINLEN) output line length for sequence alignments
          (default 60; maximum 200).
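     The -m and -w settings can be sketched as follows.  The
     functions markx_line and wrap_alignment are hypothetical, as
     is the toy residue grouping used to decide what counts as a
     conservative replacement (align's own criterion depends on its
     scoring matrix).  The sketch also assumes that a column
     containing a gap ('-') counts as a non-conservative
     replacement, and that a LINLEN outside 1-200 is an error.

```python
from typing import List

# Toy groups of interchangeable residues; an assumption for this
# sketch, not align's actual definition of "conservative".
TOY_GROUPS = ["ILVM", "FYW", "KRH", "DE", "ST", "NQ"]


def is_conservative(x: str, y: str) -> bool:
    """True if x and y fall in the same toy group (a gap never does)."""
    return any(x in g and y in g for g in TOY_GROUPS)


def markx_line(top: str, bottom: str, markx: int) -> str:
    """Return the match-symbol line for MARKX=1 or 2, or the
    replacement second alignment line for MARKX=3."""
    symbols = {1: (":", ".", " "), 2: (" ", "x", "X")}
    out = []
    for x, y in zip(top, bottom):
        if markx == 3:
            # '.' for an identity, otherwise the residue (or gap) in bottom.
            out.append("." if x == y else y)
            continue
        identity, conservative, other = symbols[markx]
        if x == y:
            out.append(identity)
        elif is_conservative(x, y):
            out.append(conservative)
        else:
            out.append(other)
    return "".join(out)


def wrap_alignment(rows: List[str], linlen: int = 60) -> List[List[str]]:
    """Split non-empty alignment rows into blocks of linlen columns."""
    if not 1 <= linlen <= 200:
        raise ValueError("linlen must be between 1 and 200")
    width = max(len(row) for row in rows)
    return [[row[i:i + linlen] for row in rows]
            for i in range(0, width, linlen)]
```

     For example, markx_line("MACF", "MVCY", 1) gives ": :." under
     these toy groups: M and C are identities, F/Y is treated as
     conservative, and A/V is not.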

EXAMPLES
     (1)  align musplfm.aa lcbo.aa

     Compare the amino acid sequence in the file musplfm.aa with
     the amino acid sequence in the file lcbo.aa.  Each sequence
     file should be in the form:
          >LCBO bovine preprolactin
          WILLLSQ ...


     (2)  align -w 80 musplfm.aa lcbo.aa > musplfm.aln

     Compare the amino acid sequence in the file musplfm.aa with
     the amino acid sequence in the file lcbo.aa.  Show both
     sequences with 80 residues on each output line, and write
     the output to the file musplfm.aln.

     (3)  align

     Run the align program in interactive mode.  The program
     prompts for the file names of the first and second
     sequences.

SEE ALSO
     rdf2(1), protcodes(5), dnacodes(5)

AUTHOR
     Bill Pearson
     wrp@virginia.EDU












Sun Release 4.1        Last change: local                       2



