
Sequence Selection & Formatting

Change Begin, End Points
-b {i#} , -e {i# | 0*}
These flags select the beginning (-b) or end (-e) of a subsequence to be extracted and analysed from a larger sequence. -b defaults to 1; -e defaults to the end of the sequence (which can be signified explicitly with '0').
Interactions: In the Linear Map output, the upper label indicates numbering from the beginning of the subsequence; the lower label indicates numbering from the beginning of the entire sequence. This can become more confusing if you use the --numstart option, which forces the numbering scheme you choose onto the linear map.
Warnings: The SMALLEST SEQUENCE that tacg can handle is 4 bases (10 for the ladder map (-l)). This allows analysis of primers and linkers.
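As a rough illustration of the -b/-e semantics described above, here is a minimal Python sketch. The function names are hypothetical and this is not tacg's own code; it assumes 1-based, inclusive coordinates and that the 4-base minimum applies to the selected subsequence.

```python
def select_subsequence(seq, begin=1, end=0):
    """Return seq[begin..end], 1-based and inclusive; end=0 means 'to the end'."""
    if end == 0:
        end = len(seq)
    if not 1 <= begin <= end <= len(seq):
        raise ValueError("begin/end fall outside the sequence")
    sub = seq[begin - 1:end]
    # Assumption: the 4-base minimum from the warning applies to the subsequence.
    if len(sub) < 4:
        raise ValueError("subsequence is shorter than 4 bases")
    return sub


def to_full_coordinate(sub_pos, begin=1):
    """Map a 1-based subsequence position (upper label) to the full sequence (lower label)."""
    return sub_pos + begin - 1
```

For example, select_subsequence("acgtacgtac", 3, 6) picks bases 3 through 6, and position 1 of that subsequence corresponds to position 3 of the full sequence.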
Set Topology to Linear or Circular
-f {0 | 1*}
This flag sets the form or topology of the Nucleic Acid. Linear is assumed unless otherwise specified.
Interactions: If circular topology is specified, patterns will be matched across the border as long as the pattern isn't longer than BASE_OVERLAP set in tacg.h (30 as distributed). Number and size of fragments will be adjusted to account for the topology in both Fragments Table and Gel Map. If either the --ps or --pdf flags are used, -f is set to circular.
Warnings: If topology is set to circular, Translation and ORFs will not be tracked accurately across the origin, so if you suspect that this is the case, change the origin and try again.
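One common way to match patterns across the origin of a circular sequence is to append the first few bases to the end before searching. The Python sketch below illustrates that idea under stated assumptions: the function name is hypothetical, only exact (non-degenerate) matching is shown, and it is not taken from tacg's source. Patterns longer than the overlap are still searched, but only within the linear sequence.

```python
BASE_OVERLAP = 30  # the value the text gives for tacg.h as distributed


def find_circular(seq, pattern, overlap=BASE_OVERLAP):
    """Return 1-based start positions of pattern in a circular sequence."""
    if not pattern:
        return []
    # Wrap only when the pattern fits inside the overlap (assumed behaviour).
    wrap = len(pattern) - 1 if len(pattern) <= overlap else 0
    wrap = min(wrap, len(seq))
    extended = seq + seq[:wrap]  # head of the sequence appended past the last base
    hits = []
    start = extended.find(pattern)
    while start != -1:
        if start < len(seq):  # a match must begin inside the real sequence
            hits.append(start + 1)
        start = extended.find(pattern, start + 1)
    return hits
```

With the sequence "ttaaccgg", the pattern "ggtt" is not found in a linear search but is reported at position 7 once the wrap is applied.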
Change the output width
-w {i# | 1}
-w sets the output width in bp. The value must be between 60* and 210 and is truncated to a number exactly divisible by 15 ('-w 100' will be interpreted as '-w 90'). The actual printed output will be about 20 characters wider due to numbering and other labels. The setting also applies to output of the linear, ladder and gel maps, so if you're trying to get more accuracy and your output device can display small fonts, you may want to use this flag to widen the output.
Interactions
Warnings: If you want as much output on one line as possible for external parsing/analysis, specify -w 1, which will print the output in 1 line, so that it might be easier to search with an external tool such as the grep family.
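The width rule for -w can be sketched in a few lines of Python. The function name is hypothetical, and raising an error for out-of-range values is an assumption; the text does not say how tacg itself reacts to them.

```python
def effective_width(requested):
    """Return the width actually used for '-w requested', or None for single-line output."""
    if requested == 1:
        return None  # '-w 1' requests everything on one line
    if not 60 <= requested <= 210:
        # Assumption: out-of-range values are rejected.
        raise ValueError("width must be 1 or between 60 and 210")
    return requested - requested % 15  # truncate to a multiple of 15
```

Here effective_width(100) gives 90, matching the example in the text, while 60 and 210 pass through unchanged.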
Identify Sequences Only
-i --idonly {0|1*|2}
Reduces output for sequences that have no hits when scanning multiple sequence files.
  • 0 - ID line and normal output printed regardless of hits
  • 1 - (default) ID line and normal output are printed ONLY IF there are hits.
  • 2 - ONLY ID line is printed if there are hits.
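The three --idonly modes amount to a small decision table. The Python sketch below is an illustrative reading of the list above with a hypothetical function name, not tacg's implementation.

```python
def output_plan(idonly, n_hits):
    """Return (print_id_line, print_normal_output) for one sequence."""
    if idonly not in (0, 1, 2):
        raise ValueError("idonly must be 0, 1 or 2")
    has_hits = n_hits > 0
    if idonly == 0:
        return (True, True)          # always print everything
    if idonly == 1:
        return (has_hits, has_hits)  # default: print only when there are hits
    return (has_hits, False)         # mode 2: ID line only, and only with hits
```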
Interactions
Warnings
Force raw file read
--raw
Tells tacg to consider ALL input as valid sequence (as with version 2), instead of using SEQIO to parse the input as a standard sequence format. Useful for analyzing file fragments or editor buffers, which may be missing valid formatting.
Interactions
Warnings: Note that specifying this flag will tell tacg to consider all headers, comments, etc. as sequence, if it encounters them and if the characters are valid IUPAC. ALL IUPAC degeneracies will be analyzed.
Set Degeneracy Handling
-D {0-4}
  • 0   FORCES exclusion of degens in seq; only 'acgtu' accepted; much like
  • 1 [default]   cut as NONdegen unless degen's found; then cut as '-D3'
  • 2   degen's OK; ignore in KEY, but match outside of KEY
  • 3   degen's OK; expand in KEY, find only EXACT matches
  • 4   degen's OK; expand in KEY, find ALL POSSIBLE matches
where KEY is the central hexamer under consideration.
Interactions
Warnings: Using -D 0 will silently strip all degeneracies, which may not be what you want. -D 4 will result in a very large number of hits as it will match all possible degeneracies with all possible patterns. If there are enough hits in a small region, it may overflow some formatting buffers, but this should be caught by the program.
Extract Sequences around match
-X, --extract {b#,e#,[0|1]}
eXtracts the sequence around the pattern matched, from b# bases preceding, to e# bases following the MIDDLE of a pattern (if an IUPAC pattern), or the START of the pattern (if a regular expression). If the pattern is found in the bottom strand AND the last field = 1, the extracted sequence is reverse-complemented before it's written out, so all patterns are in the same orientation; if the last field = 0, it is NOT reverse-complemented. In any event, the sequences are FASTA-formatted on output, so they are ready to be fed to a multiple alignment program such as ClustalX.
Interactions
Warnings: Don't forget that IUPAC and regex patterns are extracted according to different positions (the middle versus the start of the pattern), so if you mix them, they won't line up if you then try to recombine them.
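The windowing and optional reverse-complement step behind -X can be sketched as follows. This is a hedged Python illustration: the function names are hypothetical, the anchor is whatever position the caller chooses (pattern middle or start), and clipping the window at the sequence ends is an assumption rather than documented tacg behaviour.

```python
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_window(seq, anchor, before, after, on_bottom_strand=False, revcomp=True):
    """Return bases from anchor-before to anchor+after (1-based, inclusive)."""
    start = max(anchor - before, 1)         # assumption: clip at the sequence start
    end = min(anchor + after, len(seq))     # assumption: clip at the sequence end
    window = seq[start - 1:end]
    if on_bottom_strand and revcomp:
        window = reverse_complement(window)  # put all hits in the same orientation
    return window


def to_fasta(name, seq, width=60):
    """Format one record as FASTA with fixed-width sequence lines."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"
```

Passing revcomp=False corresponds to setting the last field of the -X argument to 0.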

Restriction Enzyme selection & Filtering by ...

Magnitude of Site
-n {3-10}
Selects enzymes by the magnitude of their recognition site. The value is a minimum: 3 = all, 4 = 4,5,6..., 5 = 5,6,7,8..., etc. ACGTU have a magnitude of 1 each, YRWSMK have a magnitude of 1/2 each, BDHV have a magnitude of 1/4 each, and N has a magnitude of 0 (doesn't count), i.e. ttca=4, tgyrca=5, tgcnnngca=6, tannnnnnnnnnta=4.
Notes: This flag filters patterns on input (while reading the REBASE or pattern file), so it decreases the number of patterns to be searched for, resulting in a faster search.
Warnings
Overhang
-o {0|1*|3|5}
Selects enzymes by the overhang generated: 5 = 5' overhang, 3 = 3' overhang, 0 = blunt, 1 (default) selects all.
Notes: Like -n, it filters on input, so it makes the search slightly faster.
Warnings
Cost
--cost {f#}
Selects REs by their cost, if you use the modified REBASE file that has the cost and units values entered appropriately. The one supplied is for the small unitage NEB enzyme products and is somewhat dated. The number is in units/$ (>100 is cheap; <10 is very expensive).
Interactions
Warnings
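The magnitude rule given for -n is simple arithmetic over the IUPAC codes, and a short Python sketch may make it concrete. The function names are hypothetical, and treating the -n value as a "greater than or equal" threshold (including for fractional magnitudes) is an assumption based on the "4 = 4,5,6..." wording.

```python
from fractions import Fraction

# Per-base weights as stated in the text.
WEIGHTS = {}
for base in "acgtu":
    WEIGHTS[base] = Fraction(1)
for base in "yrwsmk":
    WEIGHTS[base] = Fraction(1, 2)
for base in "bdhv":
    WEIGHTS[base] = Fraction(1, 4)
WEIGHTS["n"] = Fraction(0)


def magnitude(site):
    """Sum the per-base weights of a recognition site."""
    return sum((WEIGHTS[b] for b in site.lower()), Fraction(0))


def passes_filter(site, minimum):
    """Assumed reading of -n: keep sites whose magnitude is at least 'minimum'."""
    return magnitude(site) >= minimum
```

The examples from the text come out as expected: ttca and tannnnnnnnnnta both have magnitude 4, tgyrca has 5, and tgcnnngca has 6.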