Secondary structure / gap penalty masks


The use of secondary structure-based penalties has been shown to improve the accuracy of multiple alignment. Therefore CLUSTAL W now allows gap penalty masks to be supplied with the input sequences. The masks work by raising gap penalties in specified regions (typically secondary structure elements) so that gaps are preferentially opened in the less well conserved regions (typically surface loops).

Options 1 and 2 control whether the input secondary structure information or gap penalty masks will be used.

Option 3 controls whether the secondary structure and gap penalty masks should be included in the output alignment.

Options 4 and 5 set the values by which the gap penalty is raised at core Alpha Helical (A) and Beta Strand (B) residues. In CLUSTAL format, core residues are written as capital A and B in the secondary structure notation. The basic gap penalties are multiplied by the values specified.

Option 6 provides the value for the gap penalty in Loops. By default this penalty is not raised. In CLUSTAL format, loops are specified by "." in the secondary structure notation.

Option 7 sets the gap penalty at the ends of secondary structures. Ends of secondary structures are observed to grow and/or shrink in related structures. Therefore, by default, these are given intermediate values, lower than the core penalties. Every secondary structure position read in as lower case in CLUSTAL format gets the reduced terminal penalty.
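
As a rough illustration of how options 4 to 7 combine, the following Python sketch turns a secondary structure string into a mask of multipliers and then scales a base gap penalty with it. The multiplier values (4 for core, 2 for termini, 1 for loops) are assumptions chosen to agree with the example mask given in this help text; the function names and the base penalty of 10 are hypothetical and are not taken from CLUSTAL W itself.

```python
# Assumed multipliers: core helix/strand = 4, termini = 2, loop = 1.
# These are illustrative values, not settings read from CLUSTAL W.
DEFAULT_FACTORS = {"A": 4, "B": 4, "a": 2, "b": 2, ".": 1}


def structure_to_mask(ss, factors=DEFAULT_FACTORS):
    """Translate a secondary structure string into a string of multipliers.

    Characters that are not in `factors` are treated as loop positions
    (an assumption made for this sketch).
    """
    return "".join(str(factors.get(ch, factors["."])) for ch in ss)


def scaled_gap_penalties(mask, base_penalty):
    """Multiply the basic gap penalty by the mask digit at each position."""
    return [base_penalty * int(digit) for digit in mask]


if __name__ == "__main__":
    mask = structure_to_mask("..aaaAAAaaa.bBBb")
    print(mask)
    print(scaled_gap_penalties(mask, 10.0))
```

Keeping the mask as a string of single digits mirrors the "!GM_" notation, so a mask produced this way can be compared directly with one read from a file.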

Options 8 and 9 specify the range of structure termini for the intermediate penalties. In the alignment output, these are indicated as lower case. For Alpha Helices, by default, the range spans the end helical turn. For Beta Strands, the default range spans the end residue and the adjacent loop residue, since sequence conservation often extends beyond the actual H-bonded Beta Strand.
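
The effect of options 8 and 9 can be sketched as follows. Starting from a list of helix and strand ranges, the code marks core residues in upper case and then lowers the case of the termini. The range sizes used here (3 residues inside a helix and none outside; 1 residue inside a strand and 1 outside) are assumptions based on the description above, and the function and parameter names are hypothetical. This is an illustration of the idea, not the program's own code.

```python
# Assumed terminal ranges as (residues inside the element, residues outside).
DEFAULT_ENDS = {"A": (3, 0), "B": (1, 1)}


def annotate_structure(length, elements, ends=DEFAULT_ENDS):
    """Build a secondary structure string with lower-case termini.

    `elements` is a list of (kind, start, end) tuples, where kind is "A"
    (helix) or "B" (strand) and positions are 1-based and inclusive.
    """
    ss = ["."] * length
    for kind, start, end in elements:
        if kind not in ends or not 1 <= start <= end <= length:
            raise ValueError(f"bad element: {(kind, start, end)}")
        for i in range(start - 1, end):
            ss[i] = kind  # core residues in upper case

    for kind, start, end in elements:
        n_in, n_out = ends[kind]
        first, last = start - 1, end - 1  # 0-based, inclusive
        # Terminal residues inside the element, clamped to its extent.
        inside = list(range(first, min(first + n_in, last + 1)))
        inside += list(range(max(last - n_in + 1, first), last + 1))
        # Adjacent residues just outside the element.
        outside = list(range(first - n_out, first))
        outside += list(range(last + 1, last + 1 + n_out))
        for i in inside:
            ss[i] = kind.lower()
        for i in outside:
            # Only loop positions are changed, never a neighbouring element.
            if 0 <= i < length and ss[i] == ".":
                ss[i] = kind.lower()
    return "".join(ss)


if __name__ == "__main__":
    print(annotate_structure(20, [("A", 3, 12), ("B", 16, 18)]))
```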

CLUSTAL W can read the masks from SWISS-PROT, CLUSTAL or GDE format input files. For many 3-D protein structures, secondary structure information is recorded in the feature tables of SWISS-PROT database entries. You should always check that the assignments are correct - some are quite inaccurate. CLUSTAL W looks for SWISS-PROT HELIX and STRAND assignments e.g.

FT   HELIX       100    115
FT   STRAND      118    119
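
A minimal Python sketch of picking such assignments out of an entry is given here. It assumes the whitespace-separated layout shown in the two lines above (FT, keyword, start, end); the function name is hypothetical, and lines whose positions are not plain integers are simply skipped.

```python
def parse_ft_structure(lines):
    """Collect (kind, start, end) tuples from SWISS-PROT FT lines.

    HELIX is reported as "A" and STRAND as "B", following the notation
    used for the masks. Positions are 1-based and inclusive.
    """
    elements = []
    for line in lines:
        fields = line.split()
        if len(fields) < 4 or fields[0] != "FT":
            continue
        if fields[1] not in ("HELIX", "STRAND"):
            continue
        try:
            start, end = int(fields[2]), int(fields[3])
        except ValueError:
            continue  # uncertain or non-numeric positions are ignored here
        kind = "A" if fields[1] == "HELIX" else "B"
        elements.append((kind, start, end))
    return elements


if __name__ == "__main__":
    entry = [
        "FT   HELIX       100    115",
        "FT   STRAND      118    119",
    ]
    print(parse_ft_structure(entry))
```
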

The structure and penalty masks can also be read from CLUSTAL alignment format as comment lines beginning "!SS_" or "!GM_" e.g.

!SS_HBA_HUMA    ..aaaAAAAAAAAAAaaa.aaaAAAAAAAAAAaaaaaaAaaa.........aaaAAAAAA
!GM_HBA_HUMA    112224444444444222122244444444442222224222111111111222444444
HBA_HUMA        VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGK

Note that the mask itself is a string of digits between 1 and 9, each of which is assigned to the residue(s) in the same column below.
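
Reading these comment lines can be sketched in Python as below. The function name is hypothetical, and the code assumes that when an alignment is written in several blocks the mask lines are repeated in each block, so the pieces for one sequence name are joined in order. Penalty masks are checked to contain only the digits 1 to 9.

```python
def read_clustal_masks(lines):
    """Return (structure_masks, penalty_masks) keyed by sequence name.

    Only lines beginning "!SS_" or "!GM_" are used; everything else,
    including the sequences themselves, is ignored.
    """
    structure, penalty = {}, {}
    for line in lines:
        if not line.startswith(("!SS_", "!GM_")):
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # a tag with no mask data
        tag, name, data = fields[0][:4], fields[0][4:], fields[1]
        target = structure if tag == "!SS_" else penalty
        # Join the pieces from successive alignment blocks (assumption).
        target[name] = target.get(name, "") + data

    for name, mask in penalty.items():
        if not all(ch in "123456789" for ch in mask):
            raise ValueError(f"penalty mask for {name} must use digits 1-9")
    return structure, penalty


if __name__ == "__main__":
    example = [
        "!SS_SEQ1   ..aaAA",
        "!GM_SEQ1   112244",
        "SEQ1       VLSPAD",
    ]
    print(read_clustal_masks(example))
```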

In GDE flat file format, the masks are specified as text and the names must begin with SS_ or GM_.

Either a structure or penalty mask or both may be used. If both are included in an alignment, the user will be asked which is to be used.

Secondary structure / gap penalty mask output options


The options in this menu let you choose whether or not to include the masks in the CLUSTAL W output alignments. Showing both is useful for understanding how the masks work. The secondary structure information is itself useful in judging the alignment quality and in seeing how residue conservation patterns vary with secondary structure.