


User Commands                     FASTA/SSEARCH/FASTX/TFASTXv3(1)



NAME
     fasta35, fasta35_t - scan a protein or DNA sequence  library
     for similar sequences

     fastx35, fastx35_t  - compare a DNA sequence  to  a  protein
     sequence  database, comparing the translated DNA sequence in
     forward and reverse frames.

     tfastx35, tfastx35_t  - compare a protein sequence to a  DNA
     sequence database, calculating similarities with frameshifts
     to the forward and reverse orientations.

     fasty35, fasty35_t  - compare a DNA sequence  to  a  protein
     sequence  database, comparing the translated DNA sequence in
     forward and reverse frames.

     tfasty35, tfasty35_t  - compare a protein sequence to a  DNA
     sequence database, calculating similarities with frameshifts
     to the forward and reverse orientations.

     fasts35, fasts35_t - compare unordered peptides to a protein
     sequence database

     fastm35, fastm35_t - compare ordered peptides (or short  DNA
     sequences) to a protein (DNA) sequence database

     tfasts35, tfasts35_t  -  compare  unordered  peptides  to  a
     translated DNA sequence database

     fastf35, fastf35_t - compare mixed  peptides  to  a  protein
     sequence database

     tfastf35,  tfastf35_t  -  compare  mixed   peptides   to   a
     translated DNA sequence database

     ssearch35, ssearch35_t - compare a protein or  DNA  sequence
     to a sequence database using the Smith-Waterman algorithm.

     ggsearch35, ggsearch35_t - compare a protein or DNA sequence
     to  a sequence database using a global alignment (Needleman-
     Wunsch)

     glsearch35, glsearch35_t - compare a protein or DNA sequence
     to  a  sequence  database with alignments that are global in
     the query and local in the database sequence (global-local).

     lalign35 - produce multiple non-overlapping  alignments  for
     protein  and  DNA  sequences  using the Huang and Miller sim
     implementation of the Waterman-Eggert algorithm.

     prss35, prfx35 - (discontinued, replaced  by  ssearch35  and
     fastx35)  estimate  statistical significance of an alignment
     by comparing the score to  the  distribution  of  similarity
     scores  generated  by shuffling the second sequence.  prss35
     uses Smith-Waterman.  prfx35 uses the fastx algorithm.



DESCRIPTION
     Release 3.5 of the FASTA package provides a modular  set  of
     sequence  comparison  programs  that can run on conventional
     single processor computers or in parallel on  multiprocessor
     computers.   More   than   a   dozen   programs  -  fasta35,
     fastx35/tfastx35,    fasty35/tfasty35,     fasts35/tfasts35,
     fastm35,   fastf35/tfastf35,   ssearch35,   ggsearch35,  and
     glsearch35 - are currently available.

     All of the comparison programs share a set of basic  command
     line  options;  additional options are available for indivi-
     dual comparison functions.

     Threaded  versions  of  the   FASTA   programs   (fasta35_t,
     ssearch35_t, etc.)  will run in parallel on modern Linux and
     Unix multi-core or multi-processor  computers.   Accelerated
     versions  of  the Smith-Waterman algorithm are available for
     the Intel SSE2 and Altivec PowerPC architectures; these  can
     speed up Smith-Waterman calculations 10- to 20-fold.


Options for comparison functions
     These versions of the FASTA programs have been  modified  to
     accept  a  query sequence from the Unix "stdin" data stream.
     This makes it much easier to use fasta35 and  its  relatives
     as part of a WWW page. To indicate that stdin is to be used,
     use "@" as the query sequence file name.  "@"  can  also  be
     combined with a residue range to use only a  subset  of  the
     query sequence, e.g.:

     cat query.aa | fasta35 -q @:50-150 s

     would search  the  's'  database  with  residues  50-150  of
     query.aa.   FASTA  cannot  automatically detect the sequence
     type (protein vs DNA) when "stdin"  is  used,  so  the  '-n'
     option is required for DNA.
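
     For a DNA query read from stdin, the same pattern applies with
     '-n' added.  The following is a minimal sketch, assuming a
     hypothetical DNA query file named query.nt and the 's' library
     from the example above.  It prints the command instead of
     running it when fasta35 or the query file is not available:

```shell
#!/bin/sh
# Hypothetical names: query.nt is an assumed DNA query file; 's' is the
# library name used in the example above.
QUERY=query.nt
LIBRARY=s

# -q  do not prompt for input
# -n  force the query to be treated as a nucleotide sequence
# @   read the query sequence from stdin
set -- -q -n @ "$LIBRARY"

if command -v fasta35 >/dev/null 2>&1 && [ -r "$QUERY" ]; then
    fasta35 "$@" < "$QUERY"
else
    # Dry run: show the command line that would be used.
    echo "would run: fasta35 $* < $QUERY"
fi
```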

     -1   Sort by "init1" score.

     -3   (TFASTA35, TFASTX/Y35  only)  use  only  forward  frame
          translations

     -a # "SHOWALL"  option  attempts  to  align  all   of   both
          sequences in FASTA and SSEARCH.

     -A   force  Smith-Waterman  alignment  for  output.   Smith-
          Waterman  is  the  default  for  protein  sequences and
          FASTX35, but not for TFASTA35 or DNA  comparisons  with
          FASTA35.

     -b # number of best scores to show (must be < -E  cutoff  if
          -E is given)

     -B   show z-scores rather than bit scores

     -c # threshold for band optimization (FASTA, FASTX)

     -C # (fasta35t11d4) length of name  abbreviation  in  align-
          ments, default = 6.

     -d # number of best alignments to show (must be < -E cutoff)

     -D   turn on debugging mode.   Enables  checks  on  sequence
          alphabet  that  cause problems with tfastx35, tfasty35,
          tfasta35.

     -E # expectation value upper limit for score  and  alignment
          display.   Defaults  are 10.0 for FASTA35 and SSEARCH35
          protein searches, 5.0 for translated  DNA/protein  com-
          parisons, and 2.0 for DNA/DNA searches.

     -f # penalty for opening a gap (or first residue  for  older
          versions)

     -F # expectation value lower limit for score  and  alignment
          display.  -F 1e-6 prevents  library  sequences  with
          E()-values lower than 1e-6 from being displayed.   This
          allows the user to focus on more distant relationships.

     -g # penalty for additional residues in a gap

     -h # (FASTX35, TFASTX35, FASTY35, TFASTY35 only) penalty for
          a frameshift between two codons.

     -j # (FASTY35,  TFASTY35  only)  penalty  for  a  frameshift
          within a codon.

     -H   turn off histogram display

     -i   (DNA  only)  reverse  complement  the  query  sequence.
          (TFASTX) compare against only the reverse complement of
          the library sequence.

     -k   specify number of shuffles  for  statistical  parameter
          estimation (default=500).

     -l str
          specify FASTLIBS file

     -L   report long sequence description in alignments

     -m 0,1,2,3,4,5,6,9,10,11
          alignment display options.  -m  0,  1,  2,  3  display
          different types of alignments.  -m  4  provides  an
          alignment "map" on the  query.  -m  5  combines  the
          alignment map and a -m 0 alignment.  -m 6 provides  an
          HTML output.  -m 9 does not change the alignment
          output, but provides alignment coordinate and  percent
          identity information with the best scores report.
          -m 9c adds encoded alignment information to the -m  9;
          -m 9i provides only percent  identity  and  alignment
          length information with the best scores.  With current
          versions of the FASTA programs, independent -m options
          can be combined; e.g. -m 1 -m 9c -m 6.

     -m 11 provides lav format output from lalign35.  It does not
          currently  affect  other  alignment  algorithms.    The
          ps_lav program can be used to convert lav format output
          to postscript alignment "dot-plots".

     -M #-#
          molecular weight (residue) cutoffs.  -M "101-200" exam-
          ines only sequences that are 101-200 residues long.

     -n   force query to nucleotide sequence

     -N # break long library sequences into blocks of # residues.
          Useful  for  bacterial  genomes,  which  have  only one
          sequence entry.  -N  2000  works  well  for  bacterial
          genomes.

     -o   (FASTA) turn fasta band optimization off during initial
          phase.  This was the behavior of fasta1.x versions.

     -O file
          send output to file.

     -q/-Q
          quiet option; do not prompt for input

     -r "+n/-m"
          values for match/mismatch for DNA  comparisons.  +n  is
          used  for the maximum positive value and -m is used for
          the maximum negative value. Values between max and  min
          are  rescaled,  but  residue  pairs having the value -1
          continue to be -1.

     -R file
          save all scores to statistics file (previously -r file)

     -s name
          specify  substitution  matrix.   BLOSUM50  is  used  by
          default;  PAM250, PAM120, and BLOSUM62 can be specified
          by setting -s P250, P120, or BL62.  With this  version,
          many  more  scoring  matrices  are available, including
          BLOSUM80 (BL80), and MDM10, MDM20, MDM40  (Jones,  Tay-
          lor,  and Thornton, 1992 CABIOS 8:275-282; specified as
          -s M10, -s M20, -s M40). Alternatively, BLASTP1.4  for-
          mat scoring matrix files can be specified.  BL80, BL62,
          and P120 are scaled in 1/2 bit  units;  all  the  other
          matrices  use  1/3 bit units.  DNA scoring matrices can
          also be specified with the "-r" option.

     -S   treat lower case letters in the query  or  database  as
          low  complexity regions that are equivalent to 'X' dur-
          ing the initial database scan, but are treated as  nor-
          mal residues for the final alignment display.  Statist-
          ical estimates are based on the 'X'ed out sequence used
          during the initial search. Protein databases (and query
          sequences) can be generated in the  appropriate  format
          using  John  Wootton's "pseg"  program,  available from
          ftp://ncbi.nlm.nih.gov/pub/seg/pseg.   Once  you   have
          compiled the "pseg" program, use the command:

          pseg database.fasta -z 1 -q  > database.lc_seg

     -t # Translation  table  -  tfasta35,   fastx35,   tfastx35,
          fasty35,  and  tfasty35  support the BLAST  translation
          tables.  See
          http://www.ncbi.nlm.nih.gov/htbin-post/Taxonomy/wprintgc?mode=c/.

     -T # (threaded, parallel only) number of threads or  workers
          to use (set by default to 4 at compile time).

     -U   Do RNA sequence comparisons: treat 'T'  as  'U',  allow
          G:U  base  pairs  (by  scoring "G-A" and "T-C" as "G-G"
          -1).  Search only one strand.

     -V "?$%*"
          Allow special annotation characters in query  sequence.
          These characters will be displayed in the alignments on
          the coordinate number line.

     -w # line width for similarity score and sequence  alignment
          output.

     -W # context length (default is 1/2 of line width  -w)  for
          alignment programs, like  fasta  and  ssearch,  that
          provide additional sequence context.



     -x #match,#mismatch
          scores used for 'X:X', 'N:N', and  '*:*'  matches,  and
          for the corresponding mismatches, in place of the values
          specified in the scoring matrix.  If only one value  is
          given, it is used for both values.

     -X "#,#"
          offsets query, library sequence  for  numbering  align-
          ments

     -y # Width for band optimization; by default 16 for DNA  and
          protein ktup=2, and 32 for protein ktup=1.

     -z # Specify statistical calculation.  Default is -z  1  for
          local similarity searches, which uses regression against the
          length  of the library sequence. -z -1 disables statis-
          tics.  -z 0 estimates significance without  normalizing
          for  sequence  length. -z 2 provides maximum likelihood
          estimates for lambda and K, censoring  the  250  lowest
          and  250  highest scores. -z 3 uses Altschul and Gish's
          statistical estimates for specific protein BLOSUM scor-
          ing  matrices  and  gap penalties. -z 4,5: an alternate
          regression method.  -z 6 uses a composition based  max-
          imum  likelihood  estimate  based on the method of Mott
          (1992) Bull. Math. Biol. 54:59-75.  -z  11,12,14,15,16:
          compute the regression against scores of randomly shuf-
          fled copies of the library sequences.   Twice  as  many
          comparisons  are  performed, but accurate estimates can
          be generated from databases of related sequences. -z 11
          uses the -z 1 regression strategy, etc.

     -Z db_size
          Set the apparent database  size  used  for  expectation
          value  calculations (used for protein/protein FASTA and
          SSEARCH, and for FASTX, FASTY, TFASTX, and TFASTY).
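
     Independent options from the list above can be combined on one
     command line.  The following is a minimal sketch, not a
     recommended setting: it reuses the query.aa and 's' names from
     the stdin example, and results.out is a hypothetical output file
     name.  It looks for more distant relationships by showing only
     library sequences with E() between 1e-6 and 1.0, and prints the
     command instead of running it when fasta35 or the query file is
     not available:

```shell
#!/bin/sh
# query.aa and 's' are the names used earlier on this page;
# results.out is a hypothetical output file name.
QUERY=query.aa
LIBRARY=s
OUTPUT=results.out

# -q        do not prompt for input
# -H        turn off the histogram display
# -s BL62   use BLOSUM62 instead of the default BLOSUM50
# -E 1.0    upper E() limit for score and alignment display
# -F 1e-6   lower E() limit for score and alignment display
# -m 9c     add coordinates, percent identity and encoded alignments
# -O file   send output to a file
set -- -q -H -s BL62 -E 1.0 -F 1e-6 -m 9c -O "$OUTPUT" "$QUERY" "$LIBRARY"

if command -v fasta35 >/dev/null 2>&1 && [ -r "$QUERY" ]; then
    fasta35 "$@"
else
    # Dry run: show the command line that would be used.
    echo "would run: fasta35 $*"
fi
```

     On a multi-core machine the same arguments could be passed to
     fasta35_t, with -T added to set the number of threads.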

Environment variables:
     FASTLIBS
          location of library choice file (-l FASTLIBS)

     SMATRIX
          default scoring matrix (-s SMATRIX)

     SRCH_URL
          the format string used to  define  the  option  to  re-
          search the database.

     REF_URL
          the format string used to define the option to look  up
          the library sequence in Entrez, or some other database.
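
     These variables are set in the shell before a search is started.
     A minimal sketch for a Bourne-style shell follows.  The FASTLIBS
     path is hypothetical, and it assumes that SMATRIX accepts the
     same values as -s, as the "(-s SMATRIX)" note above suggests.
     SRCH_URL and REF_URL are left out because their format strings
     are not described on this page:

```shell
#!/bin/sh
# Hypothetical location of the library choice file; substitute the
# path used at your site.
FASTLIBS=/usr/local/lib/fasta/fastlibs

# Assumption: SMATRIX takes the same matrix names as the -s option.
SMATRIX=BL62

# Export both so that programs started from this shell can see them.
export FASTLIBS SMATRIX

echo "FASTLIBS=$FASTLIBS"
echo "SMATRIX=$SMATRIX"
```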





AUTHOR
     Bill Pearson
     wrp@virginia.EDU

SunOS 5.10             Last change: local                       7



