Pairwise alignment parameters

A distance is calculated between every pair of sequences, and these distances are used to construct the dendrogram which guides the final multiple alignment. The distances are derived from scores calculated from separate pairwise alignments. These alignments can be calculated using 2 methods: dynamic programming (slow but accurate) or the method of Wilbur and Lipman (extremely fast but approximate).

You can choose between the 2 alignment methods using menu option 8. The slow/accurate method is fine for short sequences but will be VERY SLOW for many (e.g. >20) long (e.g. >1000 residue) sequences.

SLOW/ACCURATE alignment parameters:

These parameters do not have any effect on the speed of the alignments. They are used to give initial alignments which are then rescored to give percent identity scores. These % scores are the ones which are displayed on the screen. The scores are converted to distances for the trees.
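
The rescoring step can be illustrated with a minimal Python sketch. It is not the program's own code: the function names are hypothetical, skipping columns that contain a gap is an assumption, and the conversion distance = 1 - (% identity / 100) is assumed here as the simplest possible choice.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # assumption: columns containing a gap are not counted
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def identity_to_distance(pct: float) -> float:
    """Assumed conversion: distance = 1 - identity fraction."""
    return 1.0 - pct / 100.0


pct = percent_identity("ACGT-A", "ACCTGA")  # 4 identical out of 5 compared columns
dist = identity_to_distance(pct)
```

With this toy pair, 4 of the 5 gap-free columns match, so the score is 80% and the assumed distance is about 0.2.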

1) Gap Open Penalty: the penalty for opening a gap in the alignment.
2) Gap Extension Penalty: the penalty for extending a gap by 1 residue.
3) Protein weight matrix: the scoring table which describes the similarity of each amino acid to each other.
4) DNA weight matrix: the scores assigned to matches and mismatches (including IUB ambiguity codes).
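
As a rough illustration of how the two gap penalties combine, the Python sketch below computes the cost of a single gap. The convention used (one opening penalty plus one extension penalty per gapped residue) is an assumption, the function name is hypothetical, and the values 10.0 and 0.1 are example numbers only, not necessarily the program defaults.

```python
def gap_cost(length: int, gap_open: float, gap_extend: float) -> float:
    """Cost of one gap of `length` residues under an affine penalty.

    Assumed convention: opening penalty once, extension penalty per residue.
    """
    if length <= 0:
        return 0.0
    return gap_open + gap_extend * length


# Example values only (hypothetical, not taken from the program).
one_long_gap = gap_cost(4, gap_open=10.0, gap_extend=0.1)
two_short_gaps = 2 * gap_cost(2, gap_open=10.0, gap_extend=0.1)
```

Under this convention one gap of 4 residues costs less than two gaps of 2 residues, because the opening penalty is paid only once. A high Gap Open Penalty therefore favours fewer, longer gaps.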

FAST/APPROXIMATE alignment parameters:

These similarity scores are calculated from fast, approximate, global alignments, which are controlled by 4 parameters. 2 techniques are used to make these alignments very fast: 1) only exactly matching fragments (k-tuples) are considered; 2) only the 'best' diagonals (the ones with most k-tuple matches) are used.

K-TUPLE SIZE: This is the size of the exactly matching fragments that are used. INCREASE for speed (max = 2 for proteins; 4 for DNA), DECREASE for sensitivity. For longer sequences (e.g. >1000 residues) you may need to increase the default.
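
The effect of the k-tuple size can be seen in a small Python sketch of exact k-tuple matching. This is a simplified illustration, not the program's implementation; the function name and the toy sequences are hypothetical.

```python
from collections import defaultdict


def ktuple_matches(seq_a: str, seq_b: str, k: int) -> list:
    """Return (i, j) start positions where seq_a[i:i+k] == seq_b[j:j+k]."""
    # Index every k-tuple of seq_b by its start position.
    index = defaultdict(list)
    for j in range(len(seq_b) - k + 1):
        index[seq_b[j:j + k]].append(j)
    # Look up every k-tuple of seq_a in that index.
    matches = []
    for i in range(len(seq_a) - k + 1):
        for j in index.get(seq_a[i:i + k], ()):
            matches.append((i, j))
    return matches


matches_k2 = ktuple_matches("ACGTAC", "TACG", 2)
matches_k3 = ktuple_matches("ACGTAC", "TACG", 3)
```

For these toy sequences a k-tuple size of 2 finds 4 matches and a size of 3 finds only 2. Larger k-tuples leave fewer matches to process (faster) but can miss short regions of similarity (less sensitive).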

GAP PENALTY: This is a penalty for each gap in the fast alignments. It has little effect on the speed or sensitivity except for extreme values.

TOP DIAGONALS: The number of k-tuple matches on each diagonal (in an imaginary dot-matrix plot) is calculated. Only the best ones (with most matches) are used in the alignment. This parameter specifies how many. Decrease for speed; increase for sensitivity.
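
A minimal Python sketch of this selection step follows. It assumes k-tuple matches are given as (i, j) start positions in the two sequences, so that a diagonal is identified by i - j. The function name and the tie-breaking rule (lower diagonal number first) are assumptions made for the illustration.

```python
from collections import Counter


def top_diagonals(matches, n_top: int) -> list:
    """Return the n_top diagonals (i - j) holding the most k-tuple matches."""
    counts = Counter(i - j for i, j in matches)
    # Sort by descending match count; break ties by diagonal number
    # (an arbitrary choice made here so the result is deterministic).
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [diagonal for diagonal, _ in ranked[:n_top]]


# Toy matches: 3 on diagonal 0, 2 on diagonal 4, 1 on diagonal -4.
toy_matches = [(0, 0), (1, 1), (2, 2), (5, 1), (6, 2), (0, 4)]
best = top_diagonals(toy_matches, n_top=2)
```

Here `best` contains diagonals 0 and 4. The lone match on diagonal -4 is ignored, which saves time but would lose a weak similarity lying on that diagonal.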

WINDOW SIZE: This is the number of diagonals around each of the 'best' diagonals that will be used. Decrease for speed; increase for sensitivity.
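
The window can be pictured as a band around each selected diagonal. The Python sketch below assumes that the window size counts diagonals on each side of a 'best' diagonal and that a diagonal is numbered i - j for positions i and j in the two sequences. Both conventions and the function name are assumptions of this illustration.

```python
def banded_diagonals(best, window: int, len_a: int, len_b: int) -> list:
    """Return the sorted diagonals lying within `window` of any best diagonal."""
    lowest = -(len_b - 1)   # diagonal through the last residue of sequence b
    highest = len_a - 1     # diagonal through the last residue of sequence a
    allowed = set()
    for d in best:
        for w in range(d - window, d + window + 1):
            if lowest <= w <= highest:  # keep only diagonals inside the matrix
                allowed.add(w)
    return sorted(allowed)


narrow = banded_diagonals([0, 4], window=1, len_a=6, len_b=5)
wide = banded_diagonals([0, 4], window=2, len_a=6, len_b=5)
```

With a window of 1 the search covers 6 diagonals, and with a window of 2 it covers 8. A larger window means more of the dot-matrix is examined, which is slower but more sensitive.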