The use of secondary structure-based penalties has been shown to improve
the accuracy of multiple alignment. Therefore CLUSTAL W now allows gap penalty
masks to be supplied with the input sequences. The masks work by raising gap
penalties in specified regions (typically secondary structure elements) so that
gaps are preferentially opened in the less well conserved regions (typically
surface loops).
Options 1 and 2 control whether the input secondary structure information
or gap penalty masks will be used.

Option 3 controls whether the secondary structure and gap penalty masks should
be included in the output alignment.

Options 4 and 5 provide the values by which the gap penalty is raised at core
Alpha Helical (A) and Beta Strand (B) residues. In CLUSTAL format, upper case
A and B in the secondary structure notation denote core residues. Basic gap
penalties are multiplied by the amount specified.

Option 6 provides the value for the gap penalty in Loops. By default this
penalty is not raised. In CLUSTAL format, loops are specified by "." in the
secondary structure notation.

Option 7 provides the value for setting the gap penalty at the ends of
secondary structures. Ends of secondary structures are observed to grow
and/or shrink in related structures. Therefore by default these are given
intermediate values, lower than the core penalties. Any secondary structure
position read in as lower case in CLUSTAL format gets the reduced terminal
penalty.

Options 8 and 9 specify the range of structure termini for the intermediate
penalties. In the alignment output, these are indicated as lower case.
For Alpha Helices, by default, the range spans the end helical turn. For
Beta Strands, the default range spans the end residue and the adjacent loop
residue, since sequence conservation often extends beyond the actual H-bonded
Beta Strand.
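As a rough illustration of the termini ranges set by Options 8 and 9, the
Python sketch below marks core residues in upper case and termini in lower
case. The function name, its arguments and the window sizes (3 residues
inside each helix end, 1 residue inside and 1 outside each strand end) are
assumptions chosen to match the description above; they are not taken from
the CLUSTAL W source code.

```python
def annotate_structure(length, elements,
                       helix_end_in=3, helix_end_out=0,
                       strand_end_in=1, strand_end_out=1):
    """Return a CLUSTAL-style secondary structure string for one sequence.

    elements: list of (kind, start, end) with kind "A" (helix) or "B"
    (strand); positions are 1-based and inclusive.
    All window sizes are assumed values, not confirmed program defaults.
    """
    ss = ["."] * length
    for kind, start, end in elements:
        if kind == "A":
            end_in, end_out = helix_end_in, helix_end_out
        else:
            end_in, end_out = strand_end_in, strand_end_out
        # The terminal range may extend end_out residues into the loop.
        first = max(1, start - end_out)
        last = min(length, end + end_out)
        for pos in range(first, last + 1):
            is_core = start + end_in <= pos <= end - end_in
            if is_core:
                ss[pos - 1] = kind
            elif not ss[pos - 1].isupper():
                # Termini never overwrite a core residue of another element.
                ss[pos - 1] = kind.lower()
    return "".join(ss)


print(annotate_structure(19, [("A", 3, 18)]))  # ..aaaAAAAAAAAAAaaa.
print(annotate_structure(8, [("B", 3, 6)]))    # .bbBBbb.
```

With these assumed windows, a helix keeps three lower case residues at each
end, while a strand's lower case range reaches one residue into the
neighbouring loop.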
CLUSTAL W can read the masks from SWISS-PROT, CLUSTAL or GDE format
input files. For many 3-D protein structures, secondary structure information
is recorded in the feature tables of SWISS-PROT database entries. You
should always check that the assignments are correct - some are quite
inaccurate. CLUSTAL W looks for SWISS-PROT HELIX and STRAND assignments e.g.
FT   HELIX       100    115
FT   STRAND      118    119
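A minimal Python sketch of picking such assignments out of an entry is given
below. It assumes the whitespace-separated layout of the FT lines shown
above; the function name and the sample entry lines are made up for
illustration, and this is not how CLUSTAL W itself is implemented.

```python
import re

# Matches FT lines like those shown above, with any amount of whitespace
# between the fields. Other feature keys (TURN, DOMAIN, ...) are ignored.
FT_LINE = re.compile(r"^FT\s+(HELIX|STRAND)\s+(\d+)\s+(\d+)")


def read_structure_features(lines):
    """Return [(kind, start, end)] with kind "A" for HELIX, "B" for STRAND."""
    elements = []
    for line in lines:
        match = FT_LINE.match(line)
        if match:
            kind = "A" if match.group(1) == "HELIX" else "B"
            elements.append((kind, int(match.group(2)), int(match.group(3))))
    return elements


# Hypothetical fragment of an entry, for illustration only.
entry = [
    "ID   EXAMPLE",
    "FT   HELIX       100    115",
    "FT   STRAND      118    119",
    "FT   TURN        120    122",
]
print(read_structure_features(entry))
# [('A', 100, 115), ('B', 118, 119)]
```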
The structure and penalty masks can also be read from CLUSTAL alignment format
as comment lines beginning "!SS_" or "!GM_" e.g.
!SS_HBA_HUMA    ..aaaAAAAAAAAAAaaa.aaaAAAAAAAAAAaaaaaaAaaa.........aaaAAAAAA
!GM_HBA_HUMA    112224444444444222122244444444442222224222111111111222444444
HBA_HUMA        VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGK
Note that the mask itself is a string of digits between 1 and 9, each of which
is assigned to the residue(s) in the same column below.
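The relationship between the structure line, the penalty mask and the
residues can be sketched in Python as follows. The values 4 (core), 2
(termini) and 1 (loop) are assumptions read off the HBA_HUMA example above,
the base gap opening penalty of 10.0 is purely illustrative, and all
function names are hypothetical. The reader is a simplification that only
handles "name data" lines.

```python
EXAMPLE = """\
!SS_HBA_HUMA    ..aaaAAAAAAAAAAaaa.aaaAAAAAAAAAAaaaaaaAaaa.........aaaAAAAAA
!GM_HBA_HUMA    112224444444444222122244444444442222224222111111111222444444
HBA_HUMA        VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGK
"""


def read_masks(text):
    """Collect !SS_ lines, !GM_ lines and sequences, keyed by sequence name."""
    structures, penalties, sequences = {}, {}, {}
    for line in text.splitlines():
        if not line or line[0].isspace():
            continue  # blank lines and conservation lines
        fields = line.split()
        if len(fields) != 2:
            continue  # e.g. the file header
        name, data = fields
        if name.startswith("!SS_"):
            target, key = structures, name[4:]
        elif name.startswith("!GM_"):
            target, key = penalties, name[4:]
        elif name.startswith("!"):
            continue  # any other comment line
        else:
            target, key = sequences, name
        # Append, so that masks split over several blocks are joined.
        target[key] = target.get(key, "") + data
    return structures, penalties, sequences


def mask_from_structure(ss, helix=4, strand=4, loop=1, terminal=2):
    """Translate structure symbols into mask digits (assumed values)."""
    values = {"A": helix, "B": strand, ".": loop, "a": terminal, "b": terminal}
    return "".join(str(values[symbol]) for symbol in ss)


def scale_penalties(mask, base_gap_open):
    """Multiply the basic gap penalty by the mask digit in each column."""
    return [base_gap_open * int(digit) for digit in mask]


structures, penalties, sequences = read_masks(EXAMPLE)
name = "HBA_HUMA"
print(mask_from_structure(structures[name]) == penalties[name])  # True
print(len(penalties[name]) == len(sequences[name]))              # True
print(scale_penalties(penalties[name], 10.0)[:6])
# [10.0, 10.0, 20.0, 20.0, 20.0, 40.0]
```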
In GDE flat file format, the masks are specified as text and the names
must begin with SS_ or GM_.
Either a structure or penalty mask or both may be used. If both are included
in an alignment, the user will be asked which is to be used.
The options in this menu let you choose whether or not to include the masks
in the CLUSTAL W output alignments. Showing both is useful for understanding
how the masks work. The secondary structure information is itself useful in
judging the alignment quality and in seeing how residue conservation patterns
vary with secondary structure.