


PRSS3(1)                  User Commands                  PRSS3(1)



NAME
     prss3 - test a protein sequence similarity for significance

SYNOPSIS
     prss3 [-Q -d # -f # -g # -h -O file -s SMATRIX -w # ]
     sequence-file-1 sequence-file-2 [ #-of-shuffles ]

     prss3 [-dfghsw] - interactive mode


DESCRIPTION
     prss3 is used to evaluate the significance of a  protein  or
     DNA  sequence  similarity  score.   It  first  compares  two
     sequences and calculates  optimal  similarity  scores,  then
     repeatedly shuffles the second sequence and recalculates the
     optimal similarity scores, in both cases using  the  Smith-
     Waterman algorithm. An extreme
     value  distribution  is  then  fit  to the shuffled-sequence
     scores.  The characteristic parameters of the extreme  value
     distribution  are then used to estimate the probability that
     each of the unshuffled sequence scores would be obtained  by
     chance in one sequence, or in a number of sequences equal to
     the number of shuffles.  This program is derived from  rdf2,
     described  by  Pearson and Lipman, PNAS (1988) 85:2444-2448,
     and Pearson (Meth. Enz.  183:63-98).   Use  of  the  extreme
     value  distribution  for  estimating  the  probabilities  of
     similarity scores was described by Altschul and Karlin, PNAS
     (1990)  87:2264-2268.   The  and  expectations calculated by
     prdf.  prss3 calculates optimal  scores  using   the   same
     rigorous  Smith-Waterman  algorithm  (Smith and Waterman, J.
     Mol. Biol. (1981) 147:195-197) used by the ssearch3 program.
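
     The shuffle-and-fit procedure described above can be sketched
     in a few lines of Python.  This is only an illustration under
     stated assumptions, not the prss3 implementation: the scoring
     uses a simple match/mismatch scheme with a linear gap penalty
     rather than BLOSUM50 with separate first-residue and
     additional-residue penalties, the extreme value distribution
     is fit by the method of moments rather than by the routines in
     rweibull.c, and the expectation is taken as the probability
     multiplied by the number of shuffles.  The names sw_score and
     shuffle_significance are hypothetical.

```python
import math
import random
import statistics


def sw_score(a, b, match=5, mismatch=-4, gap=-8):
    """Best local alignment score (Smith-Waterman, linear gap penalty).

    The match/mismatch/gap values are illustrative assumptions,
    not the prss3 defaults.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            cur.append(score)
            best = max(best, score)
        prev = cur
    return best


def shuffle_significance(seq1, seq2, shuffles=200, seed=0):
    """Return (observed score, probability, expectation)."""
    rng = random.Random(seed)
    observed = sw_score(seq1, seq2)

    # Score seq1 against repeatedly shuffled copies of seq2.
    residues = list(seq2)
    scores = []
    for _ in range(shuffles):
        rng.shuffle(residues)
        scores.append(sw_score(seq1, "".join(residues)))

    # Method-of-moments fit of an extreme value (Gumbel) distribution.
    sd = statistics.stdev(scores)
    if sd == 0:
        raise ValueError("shuffled scores do not vary; cannot fit")
    scale = sd * math.sqrt(6) / math.pi
    loc = statistics.fmean(scores) - 0.5772156649 * scale

    # P(score >= observed) in one comparison: 1 - exp(-exp(-z)).
    z = (observed - loc) / scale
    p = -math.expm1(-math.exp(min(-z, 700.0)))
    return observed, p, p * shuffles
```

     For example, shuffle_significance(seq1, seq2, shuffles=200)
     returns the unshuffled score together with the estimated
     probability for one sequence and the expectation for 200
     sequences.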

     prss3 also allows a  more  sophisticated  shuffling  method:
     residues  can be shuffled within a local window, so that the
     order of residues 1-10, 11-20, etc, is destroyed but a resi-
     due  in the first 10 is never swapped with a residue outside
     the first ten, and so on for each local window.
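
     The local window shuffle can be illustrated with a short
     Python sketch.  It assumes consecutive, non-overlapping
     windows starting at residue 1, as in the description above;
     the function name window_shuffle is hypothetical and this is
     not the prss3 source.

```python
import random


def window_shuffle(seq, window, rng=None):
    """Shuffle residues only within consecutive blocks of `window` residues."""
    rng = rng or random.Random()
    out = []
    for start in range(0, len(seq), window):
        block = list(seq[start:start + window])
        rng.shuffle(block)  # residues never leave their own window
        out.extend(block)
    return "".join(out)
```

     With a window of 10, the composition of residues 1-10, 11-20,
     and so on is preserved in the shuffled sequence while the
     order within each window is randomized.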

EXAMPLES
     (1)  prss3  -w 10 musplfm.aa lcbo.aa

     Compare the amino acid sequence in the file musplfm.aa  with
     that  in  lcbo.aa,  then  shuffle  lcbo.aa 200 times using a
     local shuffle with a window of 10.  Report the  significance
     of   the  unshuffled  musplfm/lcbo  comparison  scores  with
     respect to the shuffled scores.

     (2)  prss3 -d 1000 musplfm.aa lcbo.aa

     Compare the amino acid sequence in the file musplfm.aa  with
     the  sequence  in  the  file lcbo.aa, shuffling lcbo.aa 1000
     times.




     (3)  prss3

     Run prss3 in interactive mode.  The program will prompt  for
     the  file  names  of the two sequence files and the number of
     shuffles to be used.

OPTIONS
     prss3 can be directed to  change  the  scoring  matrix,  gap
     penalties, and shuffle parameters by entering options on the
     command line (preceded by a `-').  All of the options should
     precede the file names and the number of shuffles.

     -d #  Number of shuffles (200 is the default)

     -f #  Penalty for  the  first  residue  in  a  gap  (-12  by
          default) for proteins.

     -g #  Penalty for  additional  residues  in  a  gap  (-2  by
          default) for proteins.

     -h    Do not display histogram of similarity scores.

     -Q -q
          "quiet" - do not prompt for filename.

     -O filename
          send copy of results to "filename."

     -s str
          (SMATRIX) the filename of an alternative scoring matrix
          file.  For  protein  sequences,  BLOSUM50  is  used  by
          default;  PAM250  can  be  used  with  the command line
          option -s P250 (or with -s pam250.mat).   BLOSUM62  (-s
          BL62) and PAM120 (-s P120) are also available.

     -w #  Use a local window shuffle with a window  size  of  #.
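
     As a worked illustration of the -f and -g penalties listed
     above, the total penalty for a gap can be computed as the
     first-residue penalty plus the additional-residue penalty for
     each further residue.  The sketch below assumes that reading
     of the two options and uses the protein defaults; the function
     name gap_penalty is hypothetical.

```python
def gap_penalty(length, first=-12, additional=-2):
    """Total penalty for a gap of `length` residues.

    `first` corresponds to -f and `additional` to -g.
    """
    if length < 1:
        return 0
    return first + (length - 1) * additional
```

     Under these assumptions a gap of one residue costs -12 and a
     gap of three residues costs -12 + 2 * (-2) = -16.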

SEE ALSO
     ssearch3(1), fasta3(1).

AUTHOR
     Bill Pearson
     wrp@virginia.EDU

     The curve fitting routines in rweibull.c  were  provided  by
     Phil Green, U. of Washington.









SunOS 5.5.1            Last change: local                       2



