Sequence alignment software
This list of sequence alignment software is a compilation of software tools and web portals used in pairwise sequence alignment and multiple sequence alignment. See structural alignment software for structural alignment of proteins.
Database search only
Name | Description | Sequence Type* | Link | Authors | Year |
---|---|---|---|---|---|
BLAST | local search with fast k-tuple heuristic (Basic Local Alignment Search Tool) | Both | NCBI EBI DDBJ DDBJ (psi-blast) GenomeNet PIR (protein only) | Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ | 1990 |
CS-BLAST | sequence-context specific BLAST, more sensitive than BLAST, FASTA, and SSEARCH. The position-specific iterative version, CSI-BLAST, is more sensitive than PSI-BLAST | Protein | CS-BLAST server [ftp://toolkit.lmb.uni-muenchen.de/csblast/ download] | Biegert A, Söding J | 2009 |
FASTA | local search with fast k-tuple heuristic, slower but more sensitive than BLAST | Both | EBI DDBJ GenomeNet PIR (protein only) | ||
GGSEARCH / GLSEARCH | Global:Global (GG), Global:Local (GL) alignment with statistics | Protein | FASTA server | ||
HMMER | local and global search with profile Hidden Markov models, more sensitive than PSI-BLAST | Both | download | Durbin R, Eddy SR, Krogh A, Mitchison G | 1998 |
HHpred / HHsearch | pairwise comparison of profile Hidden Markov models; very sensitive, but can only search alignment databases (Pfam, PDB, InterPro...) | Protein | server [ftp://toolkit.lmb.uni-muenchen.de/hhsearch/ download] | Söding J | 2005 |
IDF | Inverse Document Frequency | Both | download | ||
Infernal | profile SCFG search | RNA | download | Eddy S | |
PSI-BLAST | position-specific iterative BLAST, local search with position-specific scoring matrices, much more sensitive than BLAST | Protein | NCBI PSI-BLAST | Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ | 1997 |
ScalaBLAST | Highly parallel Scalable BLAST | Both | ScalaBLAST | Oehmen et al. | 2011 |
Sequilab | Linking and profiling sequence alignment data from NCBI-BLAST results with major sequence analysis servers/services | Nucleotide/peptide | server | | 2010 |
SAM | local and global search with profile Hidden Markov models, more sensitive than PSI-BLAST | Both | SAM | Karplus K, Krogh A | 1999 |
SSEARCH | Smith-Waterman search, slower but more sensitive than FASTA | Both | EBI DDBJ | ||
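Several entries in the table above (BLAST, FASTA) are described as local searches with a fast k-tuple heuristic. The following is a minimal sketch of that general idea only, not the actual implementation of either program: index every word of length k in the subject sequence, look up each query word, and group the exact word hits by diagonal. The function names (`kmer_index`, `seed_hits`, `best_diagonal`) and the default `k=3` are hypothetical choices made for illustration.

```python
from collections import defaultdict


def kmer_index(seq, k):
    """Map every k-letter word in seq to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def seed_hits(query, subject, k=3):
    """Return (query_pos, subject_pos) pairs where both sequences share an exact k-letter word."""
    index = kmer_index(subject, k)
    hits = []
    for q in range(len(query) - k + 1):
        for s in index.get(query[q:q + k], []):
            hits.append((q, s))
    return hits


def best_diagonal(hits):
    """Group hits by diagonal (subject_pos - query_pos); return (diagonal, hit_count) for the fullest one."""
    counts = defaultdict(int)
    for q, s in hits:
        counts[s - q] += 1
    if not counts:
        return None
    return max(counts.items(), key=lambda item: item[1])


if __name__ == "__main__":
    hits = seed_hits("ACGT", "TTACGTTT", k=3)
    print(hits)
    print(best_diagonal(hits))
```

Real search tools differ in how they treat the seeds afterwards (extension, rescoring, statistics); the sketch stops at the seeding step, which is the part that makes the heuristic fast compared with full dynamic programming.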
Pairwise alignment
Name | Description | Sequence Type* | Alignment Type** | Link | Author | Year |
---|---|---|---|---|---|---|
ACANA | fast heuristic anchor based pairwise alignment | Both | Both | download | Huang, Umbach, Li | 2005 |
AlignMe | Alignments for low identity membrane protein sequences based on various similarity criteria | Protein | Both | download,server | K. Khafizov, R. Staritzbichler, M. Stamm, L.R. Forrest | 2010 |
Bioconductor Biostrings::pairwiseAlignment | Dynamic programming | Both | Both + Ends-free | site | P. Aboyoun | 2008 |
BioPerl dpAlign | Dynamic programming | Both | Both + Ends-free | site | Y. M. Chan | 2003 |
BLASTZ,LASTZ | Seeded pattern-matching | Nucleotide | Local | download,download | Schwartz et al. | 2004,2009 |
DNADot | Web-based dot-plot tool | Nucleotide | Global | server | R. Bowen | 1998 |
DOTLET | Java-based dot-plot tool | Both | Global | applet | M. Pagni and T. Junier | 1998 |
FEAST | Posterior based local extension with descriptive evolution model | Nucleotide | Local | site | A. K. Hudek and D. G. Brown | 2010 |
GGSEARCH, GLSEARCH | Global:Global (GG), Global:Local (GL) alignment with statistics | Protein | Global in query | FASTA server | W. Pearson | 2007 |
JAligner | Open source Java implementation of Smith-Waterman | Both | Local | JWS | A. Moustafa | 2005 |
LALIGN | Multiple, non-overlapping, local similarity (same algorithm as SIM) | Both | Local non-overlapping | server FASTA server | W. Pearson | 1991 (algorithm) |
mAlign | modelling alignment; models the information content of the sequences | Nucleotide | Both | doc [ftp://ftp.csse.monash.edu.au/software/m-align/ code] | D. Powell, L. Allison and T. I. Dix | 2004 |
matcher | Memory-optimized Needleman-Wunsch dynamic programming (based on LALIGN) | Both | Local | Pasteur | I. Longden (modified from W. Pearson) | 1999 |
MCALIGN2 | explicit models of indel evolution | DNA | Global | server | J. Wang et al. | 2006 |
MUMmer | suffix tree based | Nucleotide | Global | download | S. Kurtz et al. | 2004 |
needle | Needleman-Wunsch dynamic programming | Both | SemiGlobal | EBI Pasteur | A. Bleasby | 1999 |
Ngila | logarithmic and affine gap costs and explicit models of indel evolution | Both | Global | download | R. Cartwright | 2007 |
Path | Smith-Waterman on protein back-translation graph (detects frameshifts at protein level) | Protein | Local | server download | M. Gîrdea et al. | 2009 |
PatternHunter | Seeded pattern-matching | Nucleotide | Local | download | B. Ma et al. | 2002–2004 |
ProbA (also propA) | Stochastic partition function sampling via dynamic programming | Both | Global | download | U. Mückstein | 2002 |
PyMOL | "align" command aligns sequence & applies it to structure | Protein | Global (by selection) | site | W. L. DeLano | 2007 |
REPuter | suffix tree based | Nucleotide | Local | download | S. Kurtz et al. | 2001 |
SABERTOOTH | Alignment using predicted Connectivity Profiles | Protein | Global | download on request | F. Teichert, J. Minning, U. Bastolla, and M. Porto | 2009 |
Satsuma | Parallel whole-genome synteny alignments | DNA | Local | download | M.G. Grabherr et al. | 2010 |
SEQALN | Various dynamic programming | Both | Local or Global | server | M.S. Waterman and P. Hardy | 1996 |
SIM, GAP, NAP, LAP | Local similarity with varying gap treatments | Both | Local or global | server | X. Huang and W. Miller | 1990-6 |
SIM | Local similarity | Both | Local | servers | X. Huang and W. Miller | 1991 |
SPA: Super pairwise alignment | Fast pairwise global alignment | Nucleotide | Global | available upon request | Shen, Yang, Yao, Hwang | 2002 |
SSEARCH | Local (Smith-Waterman) alignment with statistics | Protein | Local | EBI FASTA server | W. Pearson | 1981 (Algorithm) |
Sequences Studio | Java applet demonstrating various algorithms from | Generic sequence | Local and global | code applet | A.Meskauskas | 1997 (reference book) |
SWIFT suit | Fast Local Alignment Searching | DNA | Local | site | K. Rasmussen, W. Gerlach | 2005,2008 |
stretcher | Memory-optimized dynamic programming | Both | Global | Pasteur | I. Longden (modified from G. Myers and W. Miller) | 1999 |
tranalign | Aligns nucleic acid sequences given a protein alignment | Nucleotide | NA | Pasteur | G. Williams (modified from B. Pearson) | 2002 |
UGENE | Open-source Smith-Waterman for SSE/CUDA, suffix array based repeats finder & dotplot | Both | Both | UGENE site | UniPro | 2010 |
water | Smith-Waterman dynamic programming | Both | Local | EBI Pasteur | A. Bleasby | 1999 |
wordmatch | k-tuple pairwise match | Both | NA | Pasteur | I. Longden | 1998 |
YASS | Seeded pattern-matching | Nucleotide | Local | server download | L. Noe and G. Kucherov | 2003–2007 |
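Many tools in the table above are listed as Smith-Waterman or dynamic programming aligners. As an illustration of what local dynamic programming alignment computes, here is a minimal sketch of Smith-Waterman with a linear gap penalty. The scoring values (`match=2`, `mismatch=-1`, `gap=-2`) and the function name are assumptions chosen for the example; the listed programs typically use substitution matrices and affine gap penalties instead.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] is the best score of any local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The floor at 0 is what makes the alignment local rather than global.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        step = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + step:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(smith_waterman("XXACGTXX", "YYACGTYY"))
```

A global (Needleman-Wunsch style) variant would drop the floor at zero, initialise the first row and column with cumulative gap penalties, and trace back from the bottom-right cell rather than from the highest-scoring cell.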
Multiple sequence alignment
Name | Description | Sequence Type* | Alignment Type** | Link | Author | Year | License |
---|---|---|---|---|---|---|---|
ABA | A-Bruijn alignment | Protein | Global | download | B. Raphael et al. | 2004 | Proprietary; free of charge for educational, research, and non-profit use |
ALE | Manual alignment; some software assistance | Nucleotides | Local | download | J. Blandy and K. Fogel | 1994 (latest version 2007) | GPL2 |
AMAP | Sequence annealing | Both | Global | server | A. Schwartz and L. Pachter | 2006 | |
anon. | fast, optimal alignment of three sequences using linear gap costs | Nucleotides | Global | paper [ftp://ftp.csse.monash.edu.au/software/powell/ software] | D. Powell, L. Allison and T. I. Dix | 2000 | |
BAli-Phy | Tree+Multi alignment; Probabilistic/Bayesian; Joint Estimation | Both | Global | WWW+download | BD Redelings and MA Suchard | 2005 (latest version 2010) | |
CHAOS/DIALIGN | Iterative alignment | Both | Local (preferred) | server | M. Brudno and B. Morgenstern | 2003 | |
Clustal W | Progressive alignment | Both | Local or Global | download EBI DDBJ PBIL EMBNet GenomeNet | Thompson et al. | 1994 | Proprietary; free of charge for non-commercial use |
CodonCode Aligner | Multi alignment; ClustalW & Phrap support | Nucleotides | Local or Global | download | P. Richterich et al. | 2003 (latest version 2009) | |
DIALIGN-TX and DIALIGN-T | Segment-based method | Both | Local (preferred) or Global | download and server | A.R.Subramanian | 2005 (latest version 2008) | |
DNA Alignment | Segment-based method for intraspecific alignments | Both | Local (preferred) or Global | server | A.Roehl | 2005 (latest version 2008) | |
FSA | Sequence annealing | Both | Global | download and server | R. K. Bradley et al. | 2008 | |
Geneious | Progressive/Iterative alignment; ClustalW plugin | Both | Local or Global | download | A.J. Drummond et al. | 2005 (latest version 2009) | |
Kalign | Progressive alignment | Both | Global | server EBI MPItoolkit | T. Lassmann | 2005 | |
MAFFT | Progressive/iterative alignment | Both | Local or Global | GenomeNet MAFFT | K. Katoh et al. | 2005 | |
MARNA | Multiple Alignment of RNAs | RNA | Local | server download | S. Siebert et al. | 2005 | |
MAVID | Progressive alignment | Both | Global | server | N. Bray and L. Pachter | 2004 | |
MSA | Dynamic programming | Both | Local or Global | download | D.J. Lipman et al. | 1989 (modified 1995) | |
MSAProbs | Dynamic programming | Protein | Global | download | Y. Liu, B. Schmidt, D. Maskell | 2010 | |
MULTALIN | Dynamic programming/clustering | Both | Local or Global | server | F. Corpet | 1988 | |
Multi-LAGAN | Progressive dynamic programming alignment | Both | Global | server | M. Brudno et al. | 2003 | |
MUSCLE | Progressive/iterative alignment | Both | Local or Global | server | R. Edgar | 2004 | |
Opal | Progressive/iterative alignment | Both | Local or Global | download | T. Wheeler and J. Kececioglu | 2007 | |
Pecan | Probabilistic/consistency | DNA | Global | download | B. Paten et al. | 2008 | |
Phylo | A human computing framework for comparative genomics to solve multiple alignment | Nucleotides | Local or Global | site | McGill Bioinformatics | 2010 | |
Praline | Progressive/iterative/consistency/homology-extended alignment with pre-profiling and secondary structure prediction | Protein | Global | server | J. Heringa | 1999 (latest version 2009) | |
POA | Partial order/hidden Markov model | Protein | Local or Global | download | C. Lee | 2002 | |
Probalign | Probabilistic/consistency with partition function probabilities | Protein | Global | server | Roshan and Livesay | 2006 | |
ProbCons | Probabilistic/consistency | Protein | Local or Global | server | C. Do et al. | 2005 | |
PROMALS3D | Progressive alignment/hidden Markov model/Secondary structure/3D structure | Protein | Global | server | J. Pei et al. | 2008 | |
PRRN/PRRP | Iterative alignment (especially refinement) | Protein | Local or Global | PRRP PRRN | Y. Totoki (based on O. Gotoh) | 1991 and later | |
PSAlign | Alignment preserving non-heuristic | Both | Local or Global | download | S.H. Sze, Y. Lu, Q. Yang. | 2006 | |
RevTrans | Combines DNA and Protein alignment, by back translating the protein alignment to DNA. | DNA/Protein (special) | Local or Global | server | Wernersson and Pedersen | 2003 (newest version 2005) | |
SAGA | Sequence alignment by genetic algorithm | Protein | Local or Global | download | C. Notredame et al. | 1996 (new version 1998) | |
SAM | Hidden Markov model | Protein | Local or Global | server | A. Krogh et al. | 1994 (most recent version 2002) | |
Se-Al | Manual alignment | Both | Local | download | A. Rambaut | 2002 | |
StatAlign | Bayesian co-estimation of alignment and phylogeny (MCMC) | Both | Global | download | A. Novak et al. | 2008 | |
Stemloc | Multiple alignment and secondary structure prediction | RNA | Local or Global | download | I. Holmes | 2005 | GPLv3 (part of DART) |
T-Coffee | More sensitive progressive alignment | Both | Local or Global | server download | C. Notredame et al. | 2000 (newest version 2008) | GPL2 |
UGENE | Supports multiple alignment with MUSCLE, KAlign, Clustal and MAFFT plugins | Both | Local or Global | download | UGENE team | 2010 | GPL2 |
Genomics analysis
Name | Description | Sequence Type* | Link |
---|---|---|---|
ACT (Artemis Comparison Tool) | Synteny and comparative genomics | Nucleotide | server |
AVID | Pairwise global alignment with whole genomes | Nucleotide | server |
BLAT | Alignment of cDNA sequences to a genome. | Nucleotide | |
GMAP | Alignment of cDNA sequences to a genome. Identifies splice site junctions with high accuracy. | Nucleotide | http://research-pub.gene.com/gmap |
Mauve | Multiple alignment of rearranged genomes (also available inside Geneious) | Nucleotide | download |
MGA | Multiple Genome Aligner | Nucleotide | download |
Mulan | Local multiple alignments of genome-length sequences | Nucleotide | server |
Multiz | Multiple alignment of genomes | Nucleotide | download |
PLAST-ncRNA | Search for ncRNAs in genomes by partition function local alignment | Nucleotide | server |
Sequerome | Profiling sequence alignment data with major servers/services | Nucleotide/peptide | server |
Sequilab | Profiling sequence alignment data from NCBI-BLAST results with major servers/services | Nucleotide/peptide | server |
Shuffle-LAGAN | Pairwise glocal alignment of completed genome regions | Nucleotide | server |
SIBsim4 / Sim4 | A program designed to align an expressed DNA sequence with a genomic sequence, allowing for introns | Nucleotide | download |
SLAM | Gene finding, alignment, annotation (human-mouse homology identification) | Nucleotide | server |
Motif finding
Name | Description | Sequence Type* | Link |
---|---|---|---|
BLOCKS | Ungapped motif identification from BLOCKS database | Both | server |
eMOTIF | Extraction and identification of shorter motifs | Both | servers |
Gibbs motif sampler | Stochastic motif extraction by statistical likelihood | Both | server server |
HMMTOP | Prediction of transmembrane helices and topology of proteins | Protein | homepage & download |
I-sites | Local structure motif library | Protein | server |
JCoils | Prediction of coiled coil and leucine zipper | Protein | homepage & download |
MEME/MAST | Motif discovery and search | Both | server |
MERCI | Discriminative motif discovery and search | Both | homepage & download |
PHI-Blast | Motif search and alignment tool | Both | Pasteur |
Phyloscan | Motif search tool | Nucleotide | server |
PRATT | Pattern generation for use with ScanProsite | Protein | server |
ScanProsite | Motif database search tool | Protein | server |
TEIRESIAS | Motif extraction and database search | Both | server |
Benchmarking
Name | Link | Authors |
---|---|---|
BAliBASE | download | Thompson, Plewniak, Poch |
HOMSTRAD | download | Mizuguchi |
Oxbench | download | Raghava, Searle, Audley, Barber, Barton |
PFAM | [ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release download] | |
PREFAB | download | Edgar |
SABmark | download | Van Walle, Lasters, Wyns |
SMART | download | Letunic, Copley, Schmidt, Ciccarelli, Doerks, Schultz, Ponting, Bork |
Short-Read Sequence Alignment
Name | Description | paired-end option | Use FASTQ quality | Gapped | Multi-threaded | License | Link |
---|---|---|---|---|---|---|---|
BFAST | Explicit time and accuracy tradeoff with a prior accuracy estimation, supported by indexing the reference sequences. Optimally compresses indexes. Can handle billions of short reads. Can handle insertions, deletions, SNPs, and color errors (can map ABI SOLiD color space reads). Performs a full Smith Waterman alignment. | Yes (POSIX Threads) | GPL | link | |||
BLASTN | BLAST's nucleotide alignment program; slow and not accurate for short reads, and uses a sequence database (EST, Sanger sequence) rather than a reference genome. | link | |||||
BLAT | Made by Jim Kent. Can handle one mismatch in initial alignment step. | Yes (client/server). | Free for academic and non-commercial use. | link | |||
Bowtie | Uses a Burrows-Wheeler transform to create a permanent, reusable index of the genome; 1.3 GB memory footprint for human genome. Aligns more than 25 million Illumina reads in 1 CPU hour. Supports Maq-like and SOAP-like alignment policies (can be run from inside Geneious Server). | Yes (POSIX Threads) | Artistic License | link | |||
BWA | Uses a Burrows-Wheeler transform to create an index of the genome. Slightly slower than Bowtie but allows indels in alignment (can be run from inside Geneious Server). | Yes | GPL | link | |||
CASHX | Quantify and manage large quantities of short-read sequence data. CASHX pipeline contains a set of tools that can be used together or as independent modules on their own. This algorithm is very accurate for perfect hits to a reference genome. | No | Free for academic and non-commercial use. | link | |||
CUDA-EC | Short-read alignment error correction using GPUs. | Yes (GPU enabled) | CUDA-EC- | ||||
drFAST | Read mapping alignment software that, like mrFAST and mrsFAST, implements cache obliviousness to minimize main/cache memory transfers, but is designed for the SOLiD sequencing platform (color space reads). It also returns all possible map locations for improved structural variation discovery. | BSD | link | ||||
ELAND | Implemented by Illumina. Includes ungapped alignment with a finite read length. | ||||||
GNUMAP | Accurately performs gapped alignment of sequence data obtained from next-generation sequencing machines (specifically that of Solexa/Illumina) back to a genome of any size. Includes adaptor trimming, SNP calling and Bisulfite sequence analysis. | Yes (also supports Illumina *_int.txt and *_prb.txt files with all 4 quality scores for each base) | Multithreading and MPI-enabled | link | |||
GEM | High-quality alignment engine (exhaustive mapping, that is, 100% sensitivity, for any number of substitutions; 1 non-exhaustive indel). Several standalone applications (mapper, split mapper, mappability, and others) are provided. | Yes | Yes | Yes | GPL; GEM source is currently unavailable | link | |
GMAP and GSNAP | Robust, fast, short-read alignment. GMAP: longer reads, with multiple indels and splices (see entry above under Genomics analysis); GSNAP: shorter reads, with a single indel or up to two splices per read. Useful for digital gene expression, SNP and indel genotyping. Developed by Thomas Wu at Genentech. Used by the National Center for Genome Resources (NCGR) in Alpheus. | Yes | Free for academic and non-commercial use. | link | |||
Geneious Assembler | Fast, accurate overlap assembler with the ability to handle any combination of sequencing technology, read length, any pairing orientations, with any spacer size for the pairing, with or without a reference genome. | Yes | Commercial | link | |||
LAST | link | ||||||
MAQ | Ungapped alignment that takes into account quality scores for each base (can be run from inside Geneious Server). | GPL | link | ||||
mrFAST and mrsFAST | Gapped (mrFAST) and ungapped (mrsFAST) alignment software that implements cache obliviousness to minimize main/cache memory transfers. They are designed for the Illumina sequencing platform and they can return all possible map locations for improved structural variation discovery. | BSD | mrFAST mrsFAST | ||||
MOM | MOM or maximum oligonucleotide mapping is a query matching tool that captures a maximal length match within the short read. | Yes | link | ||||
MOSAIK | Fast gapped aligner and reference-guided assembler. Aligns reads using a banded Smith-Waterman algorithm seeded by results from a k-mer hashing scheme. Supports reads ranging in size from very short to very long. | Yes | link | ||||
MPscan | Fast aligner based on a filtration strategy (no indexing; uses q-grams and Backward Nondeterministic DAWG Matching). | link | |||||
Novoalign | Gapped alignment of single end and paired end Illumina GA I & II reads and reads from the new Helicos Heliscope Genome Analyzer. High sensitivity and specificity, using base qualities at all steps in the alignment. Includes adapter trimming, base quality calibration, Bi-Seq alignment, and option to report multiple alignments per read. | Multi-threading and MPI versions available with paid license. | Single threaded version free for academic and non-commercial use. | Novocraft | |||
NextGENe | NextGENe® software has been developed specifically for use by biologists performing analysis of next generation sequencing data from Roche Genome Sequencer FLX, Illumina GA/HiSeq, Life Technologies Applied BioSystems’ SOLiD™ System, PacBio and Ion Torrent platforms. | Yes | Yes | Yes | Yes | Commercial | Softgenetics |
PALMapper | PALMapper efficiently computes both spliced and unspliced alignments at high accuracy. Relying on a machine learning strategy combined with fast mapping based on a banded Smith-Waterman-like algorithm, it aligns around 7 million reads per hour on a single CPU. It refines the originally proposed QPALMA approach. | Yes | GPL | link | |||
PerM | Indexes the genome with periodic seeds to quickly find alignments with full sensitivity up to four mismatches. It can map Illumina and SOLiD reads. Unlike most mapping programs, speed increases for longer read lengths. | Yes | GPL | link | |||
QPalma | Takes advantage of quality scores, intron lengths and computational splice site predictions to perform an unbiased alignment. Can be trained to the specifics of an RNA-seq experiment and genome. Useful for splice site/intron discovery and for gene model building. (See PALMapper for a faster version.) | Yes (client/server) | GPLv2 | link | |||
RazerS | No read length limit. Hamming or edit distance mapping with configurable error rates. Configurable and predictable sensitivity (runtime/sensitivity tradeoff). Supports paired-end read mapping. | LGPL | link | ||||
RMAP | Can map reads with or without error probability information (quality scores) and supports paired-end reads or bisulfite-treated read mapping. There are no limitations on read length or number of mismatches. | Yes | Yes | Yes | GPL v3 | link | |
rNA | A randomized Numerical Aligner for Accurate alignment of NGS reads | Yes | Low quality bases trimming | Yes | Multithreading and MPI-enabled | GPL v3 | link |
RTG Investigator | Extremely fast, tolerant to high indel and substitution counts. Includes full read alignment. Product includes comprehensive pipelines for variant detection and metagenomic analysis with any combination of Illumina, Complete Genomics and Roche 454 data. | Yes | Yes, for variant calling | Yes | Yes | Free for individual investigator use. | link |
Segemehl | Can handle insertions, deletions and mismatches. Uses enhanced suffix arrays. | No | No | Yes | Yes | Free for non-commercial use | link |
SeqMap | Up to 5 mixed substitutions and insertions/deletions. Various tuning options and input/output formats. | Free for academic and non-commercial use. | link | ||||
Shrec | Short read error correction with a Suffix trie data structure. | Yes (Java) | link | ||||
SHRiMP | Indexes the reference genome as of version 2. Uses masks to generate possible keys. Can map ABI SOLiD color space reads. | Yes | Yes | Yes | Yes (OpenMP) | BSD derivative | link |
SLIDER | Slider is an application for the Illumina Sequence Analyzer output that uses the "probability" files instead of the sequence files as an input for alignment to a reference sequence or a set of reference sequences. | link | |||||
SOAP, SOAP2 and SOAP3 | Robust with a small number (1-3) of gaps and mismatches. Speed improvement over BLAT; uses a 12-letter hash table. SOAP2 uses a bidirectional BWT to build the reference index and is much faster than the first version. A GPU-accelerated version, SOAP3/GPU, is now available and can find all 4-mismatch alignments in tens of seconds per million reads. | Yes | Yes (multithreaded); SOAP3/GPU requires a GPU. | GPL | link | ||
SOCS | For ABI SOLiD technologies. Significant increase in time to map reads with mismatches (or color errors). Uses an iterative version of the Rabin-Karp string search algorithm. | Yes | GPL | link | |||
SSAHA and SSAHA2 | Fast for a small number of variants. | Free for academic and non-commercial use. | link | ||||
Stampy | For Illumina reads. High specificity, and sensitive for reads with indels, structural variants, or many SNPs. Slow, but speed increases dramatically when BWA is used for the first alignment pass. | Yes | Yes | Yes | No | Free for academic and non-commercial use | link |
SToRM | Experimental; for single reads only (mainly SOLiD, with experimental Illumina support) and with native SAM output. Highly sensitive for reads with many errors, indels (from 1 to 16), and SNPs. Uses spaced seeds. Authors recommend SHRiMP2. | No | Yes | Yes | Yes (OpenMP) | link | |
Taipan | De novo assembler for Illumina reads. | Free for academic and non-commercial use. | link | ||||
UGENE | Visual interface for both Bowtie and an embedded aligner. | Open source, GPL | link | ||||
XpressAlign | FPGA based sliding window short read aligner which exploits the embarrassingly parallel property of short read alignment. Performance scales linearly with number of transistors on a chip (i.e. performance guaranteed to double with each iteration of Moore's Law without modification to algorithm). Low power consumption is useful for datacentre equipment. Predictable runtime. Better price/performance than software sliding window aligners on current hardware, but not better than software BWT-based aligners currently. Can cope with large numbers (>2) of mismatches. Will find all hit positions for all seeds. Single-FPGA experimental version, needs work to develop it into a multi-FPGA production version. | Free for academic and non-commercial use. | link | ||||
ZOOM | 100% sensitivity for reads between 15 and 240 bp with practical mismatches. Very fast. Supports insertions and deletions. Works with Illumina and SOLiD instruments, not 454. | Yes (GUI), No (CLI). | Commercial | link | |||
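
Several aligners in the table above (Bowtie, BWA, SOAP2) index the reference genome with a Burrows-Wheeler transform. The following Python sketch illustrates the general idea with a toy index and exact-match backward search. It is not the implementation used by any of the listed tools: the function names (`bwt_index`, `fm_tables`, `find_exact`) are hypothetical, the index is built by naively sorting suffixes, and mismatches, indels and quality scores are ignored.

```python
from collections import Counter


def bwt_index(text):
    """Return (suffix array, BWT string) for text plus a '$' sentinel.

    Naive construction by sorting all suffixes; fine for toy inputs only.
    """
    text += "$"  # sentinel; sorts before A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # The BWT character is the one preceding each sorted suffix
    # (text[-1] is the sentinel, which handles the wrap-around at i == 0).
    bwt = "".join(text[i - 1] for i in sa)
    return sa, bwt


def fm_tables(bwt):
    """Return (first, occ) lookup tables for backward search.

    first[c]  = number of characters in the text that sort before c
    occ[c][i] = number of occurrences of c in bwt[:i]
    """
    counts = Counter(bwt)
    first, total = {}, 0
    for ch in sorted(counts):
        first[ch] = total
        total += counts[ch]
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return first, occ


def find_exact(read, sa, first, occ):
    """Return sorted 0-based positions where read occurs exactly."""
    lo, hi = 0, len(sa)  # half-open range of matching suffix-array rows
    for ch in reversed(read):  # extend the match one character leftwards
        if ch not in first:
            return []
        lo = first[ch] + occ[ch][lo]
        hi = first[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


genome = "GATTACAGATTACA"
sa, bwt = bwt_index(genome)
first, occ = fm_tables(bwt)
hits = find_exact("ATTA", sa, first, occ)
```

In this sketch the index is built once and can then be queried for any number of reads, with each query taking time proportional to the read length rather than the genome length. Production aligners use compressed, sampled versions of these tables to keep the memory footprint small.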