Improvements in database searches - April 18, 1998

Over the past few weeks a lot of improvements and additions have been made to our database search capabilities.
  1. More and faster disk space. This has allowed us to install more recent versions of GenBank and PIR. Note that PIR is now in four files: pir1 - pir4.
  2. fastx3, fasty3 and tfastx3, tfasty3 - fastx3 and fasty3 compare a DNA query (e.g. a cDNA) with a protein database; tfastx3 and tfasty3 compare a protein query with a translated DNA database.

     The fastx and fasty programs optimize the alignment of the test DNA sequence with the database protein sequence by inserting gaps into the test sequence, to test whether a frameshift will improve the alignment. This is particularly useful when the test sequence is a 1-pass DNA sequence (e.g. an EST) that is likely to have frameshift errors. The fastx programs allow the insertion of a discrete codon (i.e. 3 nt), while the fasty programs allow insertion of one or two nucleotides.
  3. ssearch - Unlike the fasta programs, which attain speed by doing "quick and dirty" alignments, ssearch performs a true Smith-Waterman optimal alignment between the test sequence and every sequence in the database. This is exhaustive, but VERY SLOW!

     It is important to keep in mind that ssearch is not necessarily more sensitive than fastx or fasty, because it cannot take frameshifts into account.
  4. Threaded programs. Versions of the fasta tools are available that can take advantage of workstations with multiple CPUs to speed up the search. Threaded programs have a "_t" appended to their names.

            old                  new             threaded
            program              program         version
          --------------------------------------------------
            fasta                fasta3          fasta3_t
            tfasta               tfasta3         tfasta3_t
            fastx                fastx3          fastx3_t
                                 fasty3          fasty3_t
                                 tfastx3         tfastx3_t
                                 tfasty3         tfasty3_t
            ssearch              ssearch3        ssearch3_t

     Even better, you don't have to keep track of which workstations have multiple CPUs (currently, these are castor, merak, mira, pollux, antares, toliman and hadar). When you log in to one of these machines, the threaded versions of the programs will automatically be chosen.
  5. GDE now lets you send fasta output directly to a file, rather than to a text editor. This means that, for a long search, you can launch the search, quit GDE, and log out. When the search is finished, the file will automatically be written.
  6. When searching the PIR database, the fasta programs will now automatically search all PIR files, rather than requiring you to search pir1, pir2, pir3 and pir4 separately.
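
To see why a Smith-Waterman search such as the one ssearch performs is exhaustive and slow, it may help to look at a minimal sketch of the scoring step. This is an illustration only, not the ssearch implementation: the function names, the toy database and the scoring values (match = 2, mismatch = -1, linear gap = -2) are assumptions made for the example, whereas a real search would use a substitution matrix and more refined gap penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between strings a and b.

    Scoring values are illustrative assumptions (simple match/mismatch
    scores and a linear gap penalty), not the ssearch defaults.
    """
    # prev is the previous row of the scoring matrix; row 0 is all zeros.
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            # A local alignment may restart anywhere, so scores never go below 0.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


def search(query, database):
    """Score the query against every database entry, best hits first.

    `database` is a hypothetical dict of name -> sequence.
    """
    hits = [(smith_waterman_score(query, seq), name)
            for name, seq in database.items()]
    return sorted(hits, reverse=True)


if __name__ == "__main__":
    toy_db = {"seq1": "TTACGTTT", "seq2": "GGGGGG", "seq3": "ACGTTACGT"}
    for score, name in search("ACGTACGT", toy_db):
        print(name, score)
```

Every cell of the matrix is filled for every database sequence, so the work grows with the query length times the total database length; that cost is what the "quick and dirty" fasta heuristics avoid.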
See also - fasta manual pages ($doc/fasta/fasta.asc).