Clustal
Clustal is a widely used multiple sequence alignment computer program. The latest version is 2.1. There are two main variations:
- ClustalW: command line interface
- ClustalX: This version has a graphical user interface. It is available for Windows, Mac OS, and Unix/Linux.
This program is available from the Clustal Homepage or the European Bioinformatics Institute FTP server (ftp://ftp.ebi.ac.uk/pub/software/).
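As a rough illustration of the command line variant, the sketch below only assembles an argument list and does not run anything. The function name `build_clustalw_command`, the executable name `clustalw2`, and the file names are assumptions made for this example; the `-INFILE`, `-ALIGN`, `-OUTFILE`, and `-OUTPUT` options are believed to match ClustalW 2.x, but should be checked against the program's own help text before use.

```python
from typing import List, Optional

# Output format keywords assumed to be accepted by -OUTPUT in ClustalW 2.x.
# When no format is given, the flag is omitted and the program's default applies.
ASSUMED_OUTPUT_FORMATS = {"GCG", "GDE", "PHYLIP", "PIR", "NEXUS"}


def build_clustalw_command(infile: str,
                           outfile: str,
                           output_format: Optional[str] = None,
                           executable: str = "clustalw2") -> List[str]:
    """Assemble (but do not run) a ClustalW command line for a full alignment."""
    cmd = [executable, f"-INFILE={infile}", "-ALIGN", f"-OUTFILE={outfile}"]
    if output_format is not None:
        fmt = output_format.upper()
        if fmt not in ASSUMED_OUTPUT_FORMATS:
            raise ValueError(f"unsupported output format: {output_format}")
        cmd.append(f"-OUTPUT={fmt}")
    return cmd


if __name__ == "__main__":
    # Hypothetical file names; pass the list to subprocess.run to execute it.
    print(" ".join(build_clustalw_command("seqs.fasta", "seqs.phy", "phylip")))
```

Returning a list rather than a single string keeps the arguments safe to hand to `subprocess.run` without shell quoting.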
Input/Output
This program accepts a wide range of input formats, including NBRF/PIR, FASTA, EMBL/Swiss-Prot, Clustal, GCG/MSF, GCG9 RSF, and GDE.
The output format can be one or many of the following: Clustal, NBRF/PIR, GCG/MSF, PHYLIP, GDE, or NEXUS.
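To make one of the input formats concrete, here is a minimal sketch of a FASTA reader. It is not Clustal's own parser: the function name `parse_fasta` is hypothetical, and the sketch assumes that the first word after ">" is the sequence name and that names are unique.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Read FASTA text line by line into a name -> sequence mapping."""
    records: Dict[str, str] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].split()
            if not header:
                raise ValueError("FASTA header without a sequence name")
            name = header[0]  # the rest of the header line is a free-text comment
            if name in records:
                raise ValueError(f"duplicate sequence name: {name}")
            records[name] = ""
        elif name is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            records[name] += line  # sequences may span several lines
    return records


if __name__ == "__main__":
    example = ">seq1 first example\nACGT\nACG\n>seq2\nTTGA\n"
    print(parse_fasta(example.splitlines()))
```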
Multiple sequence alignment
There are three main steps:
- Do a pairwise alignment
- Create a phylogenetic tree (or use a user-defined tree)
- Use the phylogenetic tree to carry out a multiple alignment
These are done automatically when you select "Do Complete Alignment".
Other options are "Do Alignment from guide tree" and "Produce guide tree only".
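The first two steps can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Clustal's implementation: the pairwise step uses a basic Needleman-Wunsch alignment with arbitrary scores and a linear gap penalty, the distance is one minus the fraction of identical columns, and the tree is built with UPGMA, which is swapped in here for brevity. The names `global_align`, `pair_distance`, and `guide_tree` are hypothetical. The third step (aligning sequences and profiles in tree order) is omitted.

```python
from itertools import combinations


def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def pair_distance(a, b):
    """Step 1: distance = 1 - fraction of identical columns (non-empty inputs)."""
    x, y = global_align(a, b)
    matches = sum(1 for p, q in zip(x, y) if p == q)
    return 1.0 - matches / len(x)


def guide_tree(seqs):
    """Step 2: UPGMA clustering; returns nested tuples giving the merge order."""
    names = list(seqs)
    trees = {i: name for i, name in enumerate(names)}
    sizes = {i: 1 for i in trees}
    dist = {(i, j): pair_distance(seqs[names[i]], seqs[names[j]])
            for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(trees) > 1:
        i, j = min(dist, key=lambda k: (dist[k], k))  # closest pair, ties by id
        merged_size = sizes[i] + sizes[j]
        for k in trees:
            if k in (i, j):
                continue
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            # Size-weighted average distance to the new cluster.
            dist[(k, next_id)] = (sizes[i] * d_ik + sizes[j] * d_jk) / merged_size
        trees[next_id] = (trees[i], trees[j])
        sizes[next_id] = merged_size
        for old in (i, j):
            del trees[old], sizes[old]
        dist = {k: v for k, v in dist.items() if i not in k and j not in k}
        next_id += 1
    return next(iter(trees.values()))


if __name__ == "__main__":
    toy = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "TTTTGGGG"}
    print(guide_tree(toy))
```

In the toy input, A and B differ at a single position, so they are joined first and C is added last; a progressive aligner would then align sequences in that order.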
Setting
Users can align the sequences using the default settings, but occasionally it may be useful to customize the parameters. The main parameters are the gap opening penalty and the gap extension penalty.
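How the two penalties interact can be shown with a small sketch. It assumes one common affine convention, in which a run of n gap characters costs the opening penalty plus n times the extension penalty; other conventions charge the extension only for the characters after the first. The function name `gap_penalty` and the default values are illustrative and are not claimed to be Clustal's defaults.

```python
import re


def gap_penalty(aligned_row: str, gap_open: float = 10.0, gap_ext: float = 0.5) -> float:
    """Total affine penalty over every run of '-' in one row of an alignment."""
    total = 0.0
    for run in re.findall(r"-+", aligned_row):
        # One opening charge per run, one extension charge per gap character.
        total += gap_open + gap_ext * len(run)
    return total


if __name__ == "__main__":
    for row in ("ACGT", "A----C", "A-C-G-T-A"):
        print(row, gap_penalty(row))
```

Under this scheme a single long gap is cheaper than the same number of gap characters scattered across several short gaps, so raising the opening penalty relative to the extension penalty tends to produce fewer, longer gaps.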
See also
- Sequence alignment software
- T-Coffee
- Align-m
- DIALIGN-T
- DIALIGN-TX
- JAligner
- MAFFT
- MAVID
- MUSCLE
- ProbCons
External links
- Clustal Homepage (free Unix/Linux, Mac, and Windows download)
- ClustalW and ClustalX mirror at the EBI (free Unix/Linux, Mac, and Windows download)
- "Accelerating Intensive Applications at 10x-50x Speedup to Remove Bottlenecks in Computational Workflows" — White Paper by Progeniq Pte Ltd.
- Multiple Sequence Alignment by CLUSTALW