1) Before calculating a tree, you must have an ALIGNMENT in memory. Either read
one in from a file (in any format) or carry out a full multiple alignment
first, so that the alignment is still in memory. Remember: YOU MUST ALIGN THE
SEQUENCES FIRST!
The method used is the NJ (Neighbour Joining) method of Saitou and Nei. First,
distances (percent divergence) are calculated between all pairs of sequences
in the multiple alignment; second, the NJ method is applied to the distance
matrix.
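As a rough illustration of the first step (the distance matrix), here is a
minimal Python sketch. The function names are hypothetical, and skipping
positions where either sequence of a pair has a gap is an assumption of this
sketch, not a statement about how the program itself treats gaps. The NJ step
is not shown.

```python
def percent_divergence(a, b):
    """Fraction of compared positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for x, y in zip(a, b):
        # Assumption: a position with a gap in either sequence is skipped.
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differences += 1
    return differences / compared if compared else 0.0


def distance_matrix(alignment):
    """Symmetric matrix of pairwise distances for a list of aligned sequences."""
    n = len(alignment)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = percent_divergence(alignment[i], alignment[j])
            matrix[i][j] = d
            matrix[j][i] = d
    return matrix
```

Calling distance_matrix(["ACGT", "ACGA", "A-GA"]) gives a 3 x 3 matrix with
zeros on the diagonal; this matrix is what the NJ method would then take as
input.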
2) EXCLUDE POSITIONS WITH GAPS? With this option, any alignment position
where ANY of the sequences has a gap is ignored. This means that 'like' is
compared to 'like' in all distances. It also automatically throws away the
most ambiguous parts of the alignment, which are usually concentrated around
gaps. The disadvantage is that you may throw away much of the data if there
are many gaps.
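The effect of excluding gapped positions can be sketched in a few lines of
Python. This is only an illustration: the function name is hypothetical, and
it assumes gaps are written as '-'.

```python
def remove_gap_columns(alignment, gap_chars="-"):
    """Drop every alignment column in which ANY sequence has a gap."""
    keep = [
        i
        for i, column in enumerate(zip(*alignment))
        if not any(c in gap_chars for c in column)
    ]
    return ["".join(seq[i] for i in keep) for seq in alignment]
```

For example, with the three sequences "AC-GT", "ACTGT" and "A-TGA", columns 2
and 3 each contain a gap, so only three of the five columns survive. This shows
how quickly data can be lost when gaps are common.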
3) CORRECT FOR MULTIPLE SUBSTITUTIONS? For small divergence (say <10%) this
option makes almost no difference. For greater divergence, it corrects for the
fact that observed distances underestimate actual evolutionary distances.
This is because, as sequences diverge, more than one substitution will happen
at many sites, yet you see at most one difference per site when you compare
the present-day sequences. The option therefore has the effect of stretching
branch lengths in trees (especially long branches). The corrections used here
(for DNA or proteins) are both due to Motoo Kimura. See the documentation for
details.
For VERY divergent sequences, the distances cannot be reliably corrected. You
will be warned if this happens. Note that even if none of the distances in a
data set exceeds the reliable threshold, some of the distances in bootstrap
samples may exceed it by chance.
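To give a feel for the size of such corrections, the Python sketch below uses
two widely cited formulas of Kimura: the approximation K = -ln(1 - D - D^2/5)
for proteins and the two-parameter distance for DNA. That the program uses
exactly these forms is an assumption here; the documentation is the authority.
The function names are hypothetical, and the errors raised mark only where the
formulas become mathematically undefined, not the program's own warning
threshold.

```python
import math


def kimura_protein(d):
    """Corrected distance from observed fraction of differing residues d."""
    inside = 1.0 - d - 0.2 * d * d
    if inside <= 0.0:
        # The logarithm is undefined here: too divergent to correct.
        raise ValueError("distance too large to correct")
    return -math.log(inside)


def kimura_dna(p, q):
    """Two-parameter distance from observed fractions of sites showing
    transitions (p) and transversions (q)."""
    a = 1.0 - 2.0 * p - q
    b = 1.0 - 2.0 * q
    if a <= 0.0 or b <= 0.0:
        raise ValueError("distance too large to correct")
    return -0.5 * math.log(a * math.sqrt(b))
```

With these formulas an observed protein difference of 0.05 is corrected to
about 0.052, while an observed 0.5 becomes about 0.80, which is the stretching
of long branches described above.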
4) To calculate a tree, use option 4 (DRAW TREE NOW). This gives an UNROOTED
tree with all branch lengths. The root of the tree can only be inferred in one
of two ways: by using an outgroup (a sequence that you are certain, on
biological grounds, branches at the outside of the tree), OR, if you assume a
degree of constancy in the 'molecular clock', by placing the root in the
'middle' of the tree (roughly equidistant from all tips).
5) BOOTSTRAPPING is a method for deriving confidence values for the groupings in
a tree (first adapted for trees by Joe Felsenstein). It involves making N
random samples of sites from the alignment (N should be LARGE, e.g. 500-1000),
drawing N trees (one from each sample), and counting how many times each
grouping from the original tree occurs in the sample trees. You must supply a
seed number for the random number generator. Different runs with the same seed
will give the same answer. See the documentation for details.
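The resampling step can be sketched in Python as follows. This is a minimal
illustration, not the program's own code: the function name is hypothetical,
and it assumes the usual bootstrap convention of drawing sites WITH
replacement until each sample is as long as the original alignment. Building a
tree from each sample and counting the groupings are not shown.

```python
import random


def bootstrap_alignments(alignment, n_samples, seed):
    """Yield n_samples resampled alignments, reproducible for a given seed."""
    rng = random.Random(seed)  # private generator, so the seed fixes the run
    length = len(alignment[0])
    for _ in range(n_samples):
        # Draw column indices with replacement; some sites repeat, some vanish.
        columns = [rng.randrange(length) for _ in range(length)]
        yield ["".join(seq[c] for c in columns) for seq in alignment]
```

Because the generator is seeded, list(bootstrap_alignments(aln, 1000, 111))
returns the same 1000 samples on every run, which is why runs with the same
seed give the same answer.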
6) OUTPUT FORMATS: three different formats are allowed. None of these
displays the tree visually. You must draw the tree yourself (on paper) from
the results OR get the PHYLIP package and use its tree drawing facilities.
(Get the PHYLIP package anyway if you are interested in trees.)