General help for CLUSTAL W
Main Menu

Clustal W is a general purpose multiple alignment program for DNA or proteins.

SEQUENCE INPUT: all sequences must be in one file, one after another. Seven formats are automatically recognised: NBRF/PIR, EMBL/SWISSPROT, Pearson (Fasta), Clustal (*.aln), GCG/MSF (Pileup), GCG9/RSF and GDE flat file. All non-alphabetic characters (spaces, digits, punctuation marks) are ignored, except "-", which is used to indicate a GAP ("." in GCG/MSF).
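As an illustration of the character rule above, here is a minimal Python sketch. It is not Clustal W's own code: the function name clean_residues and the gap_chars parameter are hypothetical, and restricting "alphabetic" to ASCII letters is an assumption.

```python
def clean_residues(raw: str, gap_chars: str = "-") -> str:
    """Keep ASCII letters and gap characters; drop everything else.

    Hypothetical helper, not part of Clustal W. Pass gap_chars="." for
    GCG/MSF input, where "." marks a gap.
    """
    return "".join(
        ch for ch in raw
        if (ch.isascii() and ch.isalpha()) or ch in gap_chars
    )


if __name__ == "__main__":
    # Spaces, digits, punctuation and the newline are dropped; "-" survives.
    print(clean_residues("MKV 10 LA-T,\n"))
```

Calling clean_residues with gap_chars="." would treat "." as the gap symbol instead, which matches the GCG/MSF convention described above.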

To do a MULTIPLE ALIGNMENT on a set of sequences, use item 1 from the main menu to INPUT them; go to menu item 2 to do the multiple alignment.

PROFILE ALIGNMENTS (menu item 3) are used to align 2 alignments. Use this to add a new sequence to an old alignment, or to use secondary structure to guide the alignment process. GAPS in the old alignments are indicated using the "-" character. PROFILES can be input in ANY of the allowed formats; just use "-" (or "." for MSF/RSF) for each gap position.

PHYLOGENETIC TREES (menu item 4) can be calculated from old alignments (read in with "-" characters to indicate gaps) OR after a multiple alignment while the alignment is still in memory.

The program tries to automatically recognise the different file formats used and to guess whether the sequences are amino acid or nucleotide. This is not always foolproof.

FASTA and NBRF/PIR formats are recognised by having a ">" as the first character in the file.

EMBL/SWISSPROT formats are recognised by the letters ID at the start of the file (the token for the entry name field).

CLUSTAL format is recognised by the word CLUSTAL at the beginning of the file.

GCG/MSF format is recognised by one of the following:

the word PileUp at the start of the file.
the word !!AA_MULTIPLE_ALIGNMENT or !!NA_MULTIPLE_ALIGNMENT at the start of the file.
the word MSF on the first line of the file, and the characters .. at the end of this line.
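The recognition rules above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the program's actual detection code: the function name guess_format is hypothetical, leading whitespace is skipped, and matching is case-sensitive. No rules are given here for GCG9/RSF or GDE, so the sketch reports those as "unknown".

```python
def guess_format(text: str) -> str:
    """Guess a sequence file format from the start of its contents.

    Hypothetical helper that mirrors the rules listed above.
    Assumptions: leading whitespace is ignored, matching is case-sensitive.
    """
    stripped = text.lstrip()
    lines = stripped.splitlines()
    first_line = lines[0].rstrip() if lines else ""

    # FASTA and NBRF/PIR both start with ">", so this rule alone
    # cannot tell them apart.
    if stripped.startswith(">"):
        return "FASTA or NBRF/PIR"
    if stripped.startswith("ID"):
        return "EMBL/SWISSPROT"
    if stripped.startswith("CLUSTAL"):
        return "CLUSTAL"
    msf_markers = ("PileUp", "!!AA_MULTIPLE_ALIGNMENT", "!!NA_MULTIPLE_ALIGNMENT")
    if stripped.startswith(msf_markers):
        return "GCG/MSF"
    # MSF header line: contains the word MSF and ends with "..".
    if "MSF" in first_line and first_line.endswith(".."):
        return "GCG/MSF"
    return "unknown"


if __name__ == "__main__":
    print(guess_format(">seq1\nACGT\n"))
    print(guess_format("CLUSTAL W multiple sequence alignment\n"))
```

Because the checks run in a fixed order, a file matching more than one rule is reported under the first rule that fits. That ordering is a choice made for this sketch.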

Note from the htmlizer (sorry): This is not the best way to input sequences from GCG. For more details see this additional note.

If 85% or more of the characters in the sequence are from A,C,G,T,U or N, the sequence will be assumed to be nucleotide. This works in 97.3% of cases but watch out!
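A minimal Python sketch of the 85% rule follows. It is not the program's own code. The name looks_like_nucleotide is hypothetical, and two details are assumptions: only alphabetic characters are counted, so gaps are excluded from the total, and the comparison is case-insensitive.

```python
NUCLEOTIDE_LETTERS = set("ACGTUN")


def looks_like_nucleotide(seq: str, threshold: float = 0.85) -> bool:
    """Return True if at least `threshold` of the letters are A, C, G, T, U or N.

    Hypothetical helper. Assumptions: non-alphabetic characters (gaps,
    digits, spaces) are excluded from the count, and case is ignored.
    """
    residues = [ch.upper() for ch in seq if ch.isalpha()]
    if not residues:
        return False  # nothing to judge; this fallback is a choice of the sketch
    hits = sum(ch in NUCLEOTIDE_LETTERS for ch in residues)
    return hits / len(residues) >= threshold


if __name__ == "__main__":
    print(looks_like_nucleotide("ACGU-ACGUNN"))  # all letters are nucleotide codes
    print(looks_like_nucleotide("MKVLAAGIT"))    # only 4 of 9 letters qualify
```

A short protein that happens to be rich in A, C, G and T would be misclassified by this rule, which is why the guess deserves a second look.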
