


User Commands                                   LALIGN/PLALIGN(1)



NAME
     lalign - compare two protein  or  DNA  sequences  for  local
     similarity and show the local sequence alignments

     plalign, flalign - compare two sequences for local similarity
     and plot the local sequence alignments


SYNOPSIS
     lalign [-EKfgiImnNOQqrRswxZ] sequence-file-1 sequence-file-2
     plalign [-EKfgiImnNQqrRsvwxZ] sequence-file-1 sequence-file-2


DESCRIPTION
     The lalign and plalign programs compare two sequences, looking
     for local sequence similarities.  lalign/plalign use code
     developed by X. Huang and W. Miller (Adv. Appl. Math. (1991)
     12:337-357) for the "sim" program.  (Version 2.1 uses sim2
     code.)  While ssearch reports only the best alignment between
     the query sequence and the library sequence, lalign and
     plalign report all alignments between the two sequences with
     pairwise probabilities < 0.05 (the default, which can be
     changed with -E #).  lalign shows the actual local alignments
     between the two sequences and their scores, while plalign
     produces a plot of the alignments that looks similar to a
     `dot-matrix' homology plot.  On Unix systems, plalign
     generates PostScript output.  flalign generates graphic
     commands for the GCG "figure" program.

     Probability estimates for  the  lalign/plalign/flalign  pro-
     grams  are  based on the parameters provided by Altschul and
     Gish (1996) Meth. Enzymol.  266:460-480.   These  parameters
     are  available  for  BLOSUM50,  BLOSUM62, and PAM250 scoring
     matrices with specific gap penalties, and also for DNA  com-
     parison  with  a  gap penalty of -16, -4.  Probability esti-
     mates are not available for other scoring matrices  and  gap
     penalties.

     The E(10,000) values reported with the  alignments  are  the
     pairwise-alignment probabilities multiplied by 10,000. These
     estimates approximate the significance from a  search  of  a
     10,000 entry database.  They differ from the -E 0.05 initial
     threshold by the same factor of 10,000.  This is an
     unfortunate inconsistency, but I believe that it is helpful to
     provide the perspective of a database search.
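
     The relationship between the pairwise probability used by -E
     and the reported E(10,000) value can be sketched in a few
     lines of Python.  This is only an illustration of the
     arithmetic described above; the function names are
     hypothetical and are not part of lalign.

```python
# Illustrative sketch only: these helpers are hypothetical and are
# not part of the lalign/plalign programs.

def e_value(pairwise_p, db_entries=10_000):
    """Scale a pairwise probability to a database of db_entries sequences."""
    return pairwise_p * db_entries


def passes_threshold(pairwise_p, limit=0.05):
    """Mimic the -E cutoff: keep alignments whose probability is below limit."""
    return pairwise_p < limit


if __name__ == "__main__":
    for p in (1e-6, 0.01, 0.05):
        print(p, e_value(p), passes_threshold(p))
```

     Under these assumptions, an alignment at the default -E 0.05
     cutoff corresponds to an E(10,000) value of about 500.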

     The lalign/plalign/fasta programs use a standard text format
     sequence file.  Lines beginning with '>' or ';' are considered
     comments and ignored.  Sequences can be upper or lower case;
     blanks, tabs, and unrecognizable characters are
     ignored.  lalign/plalign expect sequences to use the  single
     letter amino acid codes; see protcodes(5).
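
     The sequence file rules above can be sketched as a small
     Python reader.  This is a minimal illustration, not the
     parser used by the programs: the function name is
     hypothetical, "unrecognizable characters" is assumed to mean
     anything that is not an ASCII letter, and folding to upper
     case is a choice made here for convenience.

```python
# Illustrative sketch only: read_sequence is a hypothetical helper,
# not the parser used by lalign/plalign.

def read_sequence(lines):
    """Return the residues from sequence-file lines as one string.

    Assumptions: any character that is not an ASCII letter counts as
    "unrecognizable" and is skipped, and residues are folded to upper
    case (the programs accept either case).
    """
    residues = []
    for line in lines:
        if line.startswith((">", ";")):
            continue  # comment line, ignored
        residues.extend(
            ch.upper() for ch in line if ch.isascii() and ch.isalpha()
        )
    return "".join(residues)


# Made-up input for demonstration: a comment line, then residues with
# a blank, a tab, and a stray digit mixed in.
example = [">demo sequence (made-up header)\n", "adq lte\tEQ1IAEF\n"]

if __name__ == "__main__":
    print(read_sequence(example))
```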

OPTIONS
     lalign and the other programs can be directed to change the
     scoring matrix, search parameters, output format, and
     default search directories by entering options on the
     command line (preceded by a `-').  All of the options should
     precede the file name and ktup arguments.  Alternatively,
     these options can be changed by setting environment
     variables.  The options and environment variables are:

     -E # Pairwise-probability limit (default -E 0.05).

     -K # Maximum number of alignments to be shown (default -K 50).

     -f # Penalty for the first residue in a gap (-14 by default).

     -g # Penalty for each additional residue in  a  gap  (-4  by
          default).

     -i   Compare the reverse complement (DNA only).

     -I   Show alignment between identical sequences.   Normally,
          the identity alignment is not shown.

     -m # (MARKX) =1,2,3.  Alternate display of matches and
          mismatches in alignments.  MARKX=1 uses ":", ".", and " "
          for identities, conservative replacements, and
          non-conservative replacements, respectively.  MARKX=2 uses
          " ", "x", and "X".  MARKX=3 does not show the second
          sequence, but uses the second alignment line to display
          matches with a "." for identity, or with the
          mismatched residue for mismatches.  MARKX=3 is useful
          for aligning large numbers of similar sequences.

     -n   Pre-specify a DNA sequence, rather than inferring the
          sequence type from its contents.

     -N # Limit the first and second sequences to '#' residues.

     -s str
          (SMATRIX) The filename of an alternative scoring matrix
          file.  For protein sequences, BLOSUM50 is used by
          default; PAM250 can be selected with the command line
          option "-s P250", and BLOSUM62 with "-s BL62".

     -v str
          (LINEVAL) (plalign only) plalign can use up to  4  dif-
          ferent line styles to denote the scores of local align-
          ments.  The scores that correspond to these line styles
          can be specified with the environment variable LINEVAL,
          or with the -v option.  In either case, a  string  with
          three  numbers  separated  by  spaces  should be given.
          This string must  be  surrounded  by  double  quotation
          marks.  For example, LINEVAL="200 100 50" tells plalign
          to use solid lines for  local  alignments  with  scores
          greater  than 200, long dashed lines for scores between
          100 and 200, short dashed lines for scores  between  50
          and 100, and dotted lines for scores less than 50.
               plalign -v "200 100 50"
          Normally, the values are 200, 100, and 50  for  protein
          sequence  comparisons  and  400,  200,  and 100 for DNA
          sequence comparisons.

     -w # (LINLEN) Output line length for sequence alignments
          (normally 60; can be set up to 200).
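
     The way the three -v (LINEVAL) thresholds select one of four
     line styles can be sketched in Python as follows.  This is an
     illustration of the rule described under -v, not plalign's
     code: the function names are hypothetical, and the handling
     of a score exactly equal to a threshold (placed in the lower
     band here) is an assumption, since the text does not specify
     it.

```python
# Illustrative sketch only: parse_lineval and line_style are
# hypothetical helpers, not part of plalign.

STYLES = ("solid", "long dashed", "short dashed", "dotted")


def parse_lineval(text):
    """Parse a string such as "200 100 50" into three integer thresholds."""
    high, mid, low = (int(field) for field in text.split())
    return high, mid, low


def line_style(score, thresholds=(200, 100, 50)):
    """Pick a line style for an alignment score.

    Assumption: a score exactly equal to a threshold falls into the
    lower band; the documentation leaves this boundary unspecified.
    """
    for limit, style in zip(thresholds, STYLES):
        if score > limit:
            return style
    return STYLES[-1]


if __name__ == "__main__":
    protein = parse_lineval("200 100 50")
    for score in (250, 150, 75, 10):
        print(score, line_style(score, protein))
```

     With the DNA defaults of 400, 200, and 100, the same call
     would be line_style(score, (400, 200, 100)).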

EXAMPLES
     (1)  lalign mchu.aa mchu.aa

     Compare the amino acid sequence in  the  file  mchu.aa  with
     itself  and  report the ten best local alignments.  Sequence
     files should have the form:

          >MCHU - Calmodulin - Human ...
          ADQLTEEQIAEF ...


     (2)  plalign -K 100 -E 0.01 qrhuld.aa egmsmg.aa

     Display up to 100  local  alignments  of  the  LDL  receptor
     (qrhuld.aa)   with   epidermal   growth   factor   precursor
     (egmsmg.aa) with pairwise probabilities  better  than  0.01.
     Plot the results on the screen.

     (3)  lalign

     Run the lalign program in interactive mode.  The program
     will prompt for the names of the two sequence files and the
     number of alignments to show.

SEE ALSO
     ssearch(1), prss(1), fasta(1), protcodes(5), dnacodes(5)

AUTHOR
     Bill Pearson
     wrp@virginia.EDU









SunOS 5.10             Last change: local                       3



