A distance is calculated between every pair of sequences and these are
used to construct the dendrogram which guides the final multiple alignment.
The scores are calculated from separate pairwise alignments. These can be
calculated using 2 methods: dynamic programming (slow but accurate) or the
method of Wilbur and Lipman (extremely fast but approximate).
You can choose between the 2 alignment methods using menu option 8. The
slow/accurate method is fine for short sequences but will be VERY SLOW
for many (e.g. >20) long (e.g. >1000 residue) sequences.
SLOW/ACCURATE alignment parameters:
These parameters do not have any effect on the speed of the alignments. They
are used to give initial alignments which are then rescored to give percent
identity scores. These % scores are the ones which are displayed on the
screen. The scores are converted to distances for the trees.
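As an illustration of the rescoring step described above, the following is a
minimal Python sketch, not the program's own code. It assumes that percent
identity is counted over the aligned columns where neither sequence has a gap,
and that the distance is simply 1 minus the fractional identity. Both are
assumptions; the function names are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap.

    Assumption: gapped columns are ignored in both the numerator and the
    denominator. The help text does not state which convention is used.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip columns that contain a gap
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def identity_to_distance(pid):
    """Convert a percent identity (0 to 100) to a distance (0 to 1).

    Assumption: distance = 1 - identity / 100.
    """
    return 1.0 - pid / 100.0


pid = percent_identity("ACGT-A", "ACCTTA")
dist = identity_to_distance(pid)
print(pid, dist)
```

Identical sequences give a distance of 0 and sequences with no matching
residues give a distance of 1 under this convention.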
1) Gap Open Penalty: the penalty for opening a gap in the alignment.
2) Gap extension penalty: the penalty for extending a gap by 1 residue.
3) Protein weight matrix: the scoring table which describes the similarity of
each amino acid to every other amino acid.
4) DNA weight matrix: the scores assigned to matches and mismatches (including
IUB ambiguity codes).
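To show how the two gap penalties interact, here is a small Python sketch of
an affine gap cost. It assumes the opening penalty is charged once per gap and
the extension penalty once per residue in the gap; other conventions exist and
the help text does not say which one applies. The penalty values are
hypothetical, chosen only for the example.

```python
def gap_cost(length, gap_open, gap_extend):
    """Cost of a single gap of `length` residues under an affine model.

    Assumption: cost = gap_open + gap_extend * length.
    """
    if length <= 0:
        return 0.0
    return gap_open + gap_extend * length


# Hypothetical penalty values, for illustration only.
GAP_OPEN = 10.0
GAP_EXTEND = 0.1

one_long_gap = gap_cost(3, GAP_OPEN, GAP_EXTEND)
three_short_gaps = 3 * gap_cost(1, GAP_OPEN, GAP_EXTEND)
print(one_long_gap, three_short_gaps)
```

With a high opening penalty and a low extension penalty, one gap of 3 residues
costs far less than three separate gaps of 1 residue, so the alignment favours
a few long gaps over many short ones.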
FAST/APPROXIMATE alignment parameters:
These similarity scores are calculated from fast, approximate, global
alignments, which are controlled by 4 parameters. 2 techniques are used to make
these alignments very fast: 1) only exactly matching fragments (k-tuples) are
considered; 2) only the 'best' diagonals (the ones with most k-tuple matches)
are used.
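The idea of counting k-tuple matches along diagonals can be sketched in Python
as follows. This is an illustrative sketch only, not the Wilbur and Lipman
implementation used by the program. A diagonal is identified here by the
offset i - j between the match position i in the first sequence and j in the
second; the function name is hypothetical.

```python
from collections import defaultdict


def ktuple_diagonal_counts(seq_a, seq_b, k):
    """Count exact k-tuple matches on each diagonal of the dot-matrix plot.

    Returns a dict mapping diagonal offset (i - j) to the number of
    positions where seq_a[i:i+k] == seq_b[j:j+k].
    """
    # Index every k-tuple of seq_b by its start position.
    positions = defaultdict(list)
    for j in range(len(seq_b) - k + 1):
        positions[seq_b[j:j + k]].append(j)

    # Look up every k-tuple of seq_a and tally the diagonal of each match.
    counts = defaultdict(int)
    for i in range(len(seq_a) - k + 1):
        for j in positions.get(seq_a[i:i + k], ()):
            counts[i - j] += 1
    return dict(counts)


counts = ktuple_diagonal_counts("ACGTACGT", "ACGTTCGT", 2)
print(counts)
```

In this example the main diagonal (offset 0) collects the most matches, so it
would be ranked as the 'best' diagonal.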
K-TUPLE SIZE: This is the size of the exactly matching fragments that are used.
INCREASE for speed (max= 2 for proteins; 4 for DNA), DECREASE for sensitivity.
For longer sequences (e.g. >1000 residues) you may need to increase the default.
GAP PENALTY: This is a penalty for each gap in the fast alignments. It has
little effect on the speed or sensitivity except for extreme values.
TOP DIAGONALS: The number of k-tuple matches on each diagonal (in an imaginary
dot-matrix plot) is calculated. Only the best ones (with most matches) are
used in the alignment. This parameter specifies how many. Decrease for speed;
increase for sensitivity.
WINDOW SIZE: This is the number of diagonals around each of the 'best'
diagonals that will be used. Decrease for speed; increase for sensitivity.
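The combined effect of TOP DIAGONALS and WINDOW SIZE can be pictured with the
short Python sketch below. It is a sketch under stated assumptions, not the
program's own procedure: the per-diagonal match counts are made-up input, the
window is taken to extend that many diagonals on each side of a 'best'
diagonal, and ties are broken by diagonal offset. The function name is
hypothetical.

```python
def select_diagonals(diag_counts, top_diagonals, window):
    """Pick the best diagonals and the band of diagonals around each.

    diag_counts   : dict mapping diagonal offset to k-tuple match count
    top_diagonals : how many of the best diagonals to keep
    window        : diagonals kept on EACH side of a best diagonal
                    (assumption; the help text does not say whether the
                    window is one-sided or a total width)
    """
    # Rank by match count, highest first; break ties by offset.
    ranked = sorted(diag_counts, key=lambda d: (-diag_counts[d], d))
    best = ranked[:top_diagonals]

    # Collect every diagonal that lies within the window of a best one.
    allowed = set()
    for d in best:
        allowed.update(range(d - window, d + window + 1))
    return best, allowed


# Made-up match counts per diagonal, for illustration only.
example_counts = {0: 5, 4: 3, -4: 2, 7: 1}
best, allowed = select_diagonals(example_counts, top_diagonals=2, window=1)
print(best, sorted(allowed))
```

Smaller values of either parameter shrink the set of diagonals that has to be
examined, which is why decreasing them trades sensitivity for speed.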