Mass spectrometry software
Mass spectrometry software is software used for data acquisition, analysis, or representation in mass spectrometry.
MS/MS peptide identification
Within the field of protein mass spectrometry, tandem mass spectrometry (also known as MS/MS or MS2) experiments are used for protein/peptide identification.
In these experiments, sample proteins are broken up into short peptides using an enzyme like trypsin and separated in time using liquid chromatography. They are then sent through one mass spectrometer to separate them by mass. Peptides having a specific mass are then typically fragmented using collision-induced dissociation and sent through a second mass spectrometer, which generates a set of fragment peaks from which the amino acid sequence of the peptide may often be inferred. Peptide identification software is used to make these inferences reliably.
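The digestion step described above is often mirrored in software as an "in silico" digest. The following is a minimal sketch, assuming the commonly stated trypsin rule (cleave after lysine or arginine unless the next residue is proline); the function name and the `missed_cleavages` parameter are illustrative and not taken from any particular program.

```python
def tryptic_digest(protein, missed_cleavages=0):
    """Return peptides from an in silico tryptic digest of a protein string.

    Assumed rule: cleave after K or R, except when the next residue is P.
    """
    pieces, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            pieces.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        pieces.append(protein[start:])  # C-terminal piece without K/R

    # Optionally join neighbouring pieces to model missed cleavage sites.
    peptides = []
    for i in range(len(pieces)):
        for extra in range(missed_cleavages + 1):
            if i + extra < len(pieces):
                peptides.append("".join(pieces[i:i + extra + 1]))
    return peptides


print(tryptic_digest("AAKRPGGRCCKP"))
```

Real digests are rarely complete, which is why search engines typically let the user allow one or two missed cleavages.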
A typical experiment involves several hours of mass spectrometer time, and recent instruments may produce hundreds of thousands of MS/MS spectra, which must then be interpreted.
Peptide identification algorithms fall into two broad classes: database search and de novo search. The former search takes place against a database containing all amino acid sequences assumed to be present in the analyzed sample, whereas the latter infers peptide sequences without knowledge of genomic data. At present, database search is more popular and considered to produce higher quality results for most uses. With increasing instrument precision, however, de novo search may become increasingly attractive.
SEQUEST
SEQUEST is a proprietary tandem mass spectrometry data analysis program developed by John Yates and Jimmy Eng in 1994. The algorithm used by this program is covered by several US and European software patents.
SEQUEST matches collections of tandem mass spectra to peptide sequences that have been generated from databases of protein sequences. It was one of the first, if not the first, database search programs.
SEQUEST, like many engines, identifies each tandem mass spectrum individually. The software evaluates protein sequences from a database to compute the list of peptides that could result from each. The peptide's intact mass is known from the mass spectrum, and SEQUEST uses this information to restrict the set of candidate peptide sequences to those whose mass is near that of the observed peptide ion, since only these can meaningfully be compared to the spectrum. For each candidate peptide, SEQUEST projects a theoretical tandem mass spectrum and compares it to the observed tandem mass spectrum by cross correlation. The candidate sequence with the best-matching theoretical tandem mass spectrum is reported as the best identification for this spectrum.
While very successful in terms of sensitivity, it is quite slow to process data and there are concerns about specificity, especially if multiple posttranslational modifications (PTMs) are present.
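The candidate selection and scoring steps described for SEQUEST can be illustrated with a small sketch. This is not SEQUEST's actual implementation: it assumes singly charged b and y fragment ions, unit-width m/z bins, unmodified peptides, and a simplified cross-correlation score (the overlap at zero offset minus the mean overlap at nearby offsets). All function names and the tolerance value are hypothetical.

```python
WATER, PROTON = 18.01056, 1.00728
# Monoisotopic residue masses in daltons.
MONO = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
        "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
        "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
        "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
        "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931}


def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[a] for a in seq) + WATER


def fragment_mzs(seq):
    """Theoretical singly charged b- and y-ion m/z values."""
    mzs = []
    for i in range(1, len(seq)):
        mzs.append(sum(MONO[a] for a in seq[:i]) + PROTON)          # b ion
        mzs.append(sum(MONO[a] for a in seq[i:]) + WATER + PROTON)  # y ion
    return mzs


def binned(mzs, n_bins=2000):
    """Unit-width presence/absence vector over m/z 0..n_bins."""
    vec = [0.0] * n_bins
    for mz in mzs:
        b = int(round(mz))
        if 0 <= b < n_bins:
            vec[b] = 1.0
    return vec


def xcorr(observed, theoretical, max_shift=75):
    """Zero-offset overlap minus the mean overlap at non-zero offsets."""
    def dot(shift):
        return sum(o * theoretical[i + shift]
                   for i, o in enumerate(observed)
                   if o and 0 <= i + shift < len(theoretical))
    background = sum(dot(s) for s in range(-max_shift, max_shift + 1)
                     if s != 0) / (2 * max_shift)
    return dot(0) - background


def best_match(observed_mzs, precursor_mass, candidates, tol=0.5):
    """Score only candidates near the precursor mass; return (score, peptide)."""
    obs = binned(observed_mzs)
    scored = [(xcorr(obs, binned(fragment_mzs(p))), p)
              for p in candidates
              if abs(peptide_mass(p) - precursor_mass) <= tol]
    return max(scored) if scored else None


observed = fragment_mzs("PEPTIDE")  # stand-in for a measured peak list
print(best_match(observed, peptide_mass("PEPTIDE"),
                 ["PEPTIDE", "TIDEPEP", "PEPTIDK"]))
```

In this toy example the mass filter discards "PEPTIDK" before any scoring happens, while "TIDEPEP" passes the filter (same composition, same mass) and is separated from the correct sequence only by the fragment comparison.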
Mascot
Mascot is a proprietary identification program available from Matrix Science. It performs mass spectrometry data analysis through a statistical evaluation of matches between observed and projected peptide fragments rather than cross correlation. As of version 2.2, support for peptide quantitation methods is provided in addition to the identification features.
PEAKS
PEAKS DB is a proprietary proteomic mass spectrometry database search engine developed by Bioinformatics Solutions Inc. In addition to providing an independent database search, its results can be incorporated into the software’s multi-engine (Sequest, Mascot, X!Tandem, OMSSA, PEAKS DB) consensus reporting tool, inChorus. In addition to reporting database sequences, it also provides a list of sequences identified exclusively by de novo sequencing. Considering de novo sequencing results together with those of database searching increases the efficiency of the search process, maintains speed and ultimately maintains a low false discovery rate (FDR).
X!Tandem
X!Tandem is open source software that can match tandem mass spectra with peptide sequences, in a process that has come to be known as protein identification.
This software has a simple, XML-based input file format. This format is used for all of the X! series search engines, as well as the GPM and GPMDB.
Unlike some earlier-generation search engines, all of the X! series search engines calculate statistical confidence (expectation values) for all of the individual spectrum-to-sequence assignments. They also reassemble all of the peptide assignments in a data set onto the known protein sequences and assign the statistical confidence that this assembly and alignment is non-random (i.e., did not occur by chance). Therefore, separate assembly and statistical analysis software (e.g. PeptideProphet and ProteinProphet) is not needed.
This approach is good in terms of speed but poor with regard to false negatives and sensitivity.
X!!Tandem
X!!Tandem is a parallel, high performance version of X!Tandem that has been parallelized via MPI to run on clusters or other non-shared memory multiprocessors running Linux.
In X!!Tandem the search is parallelized by splitting the input spectra into as many subsets as there are processors, and processing each subset independently. Both compute-intensive stages of the processing (initial and refinement) are parallelized, and overall speedups in excess of 20-fold have been observed on real datasets.
With the exception of the details related to MPI launch, it is run exactly as X!Tandem, and produces exactly the same results using the same input and configuration files. It differs from Parallel Tandem in that the parallelism is handled internally, rather than as an external driver/wrapper.
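The splitting strategy described for X!!Tandem can be sketched as follows. This is only an illustration of the idea: the text does not say how spectra are assigned to subsets, so a round-robin split is assumed here; Python threads stand in for MPI processes; and `search_subset` is a hypothetical placeholder for the real search.

```python
from concurrent.futures import ThreadPoolExecutor


def split_spectra(spectra, n_workers):
    """Round-robin split into n_workers subsets (assumed strategy)."""
    return [spectra[i::n_workers] for i in range(n_workers)]


def search_subset(subset):
    """Hypothetical placeholder: 'search' each spectrum independently."""
    return {spectrum_id: f"result-for-{spectrum_id}" for spectrum_id in subset}


def parallel_search(spectra, n_workers):
    """Search every subset independently and merge the partial results."""
    merged = {}
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for partial in pool.map(search_subset, split_spectra(spectra, n_workers)):
            merged.update(partial)
    return merged


print(split_spectra(list(range(7)), 3))
print(parallel_search(["s1", "s2", "s3"], 2))
```

Because each spectrum is scored on its own, the merged result does not depend on how the spectra were partitioned, which is consistent with the statement that X!!Tandem produces the same results as X!Tandem.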
Phenyx
Phenyx is developed by Geneva Bioinformatics (GeneBio) in collaboration with the Swiss Institute of Bioinformatics (SIB). Phenyx incorporates OLAV, a family of statistical scoring models, to generate and optimize scoring schemes that can be tailored for all kinds of instruments, instrumental set-ups and general sample treatments. The scoring does not, however, apply to raw, unprocessed data.
Phenyx computes a score to evaluate the quality of a match between a theoretical and experimental peak list (i.e. mass spectrum). A match is thus a collection of observations deduced from this comparison. The basic peptide score is ultimately transformed into a normalized z-Score and a p-Value. A basic peptide score is the sum of raw scores for up to twelve physico-chemical properties.
In addition to regular peptide and protein identification features, Phenyx offers a number of additional functionalities, such as: a result comparison interface to visualise multiple results side by side; an import functionality to incorporate results from other search engines; and a manual validation feature to accept or reject identifications manually and dynamically recalculate protein scores.
OMSSA
OMSSA is an open source database search program developed at NCBI.
MyriMatch
MyriMatch is an open source database search program developed at the Vanderbilt Medical Center.
greylag
Greylag is an open source database search program developed at the Stowers Institute for Medical Research. Its scoring algorithm is based on that of MyriMatch, but it also includes a novel FDR (false discovery rate) validation algorithm. It is designed to perform large searches on computational clusters having hundreds of nodes. Notably, it is largely implemented in an interpreted language, Python, with only the CPU-intensive routines written in a compiled language (C++).
ByOnic
ByOnic is a database search program with a public web interface, developed at PARC. ByOnic works together with ComByne, which combines peptide identifications to produce a protein score.
InsPecT
InsPecT is an MS-alignment search engine available at the Center for Computational Mass Spectrometry at the University of California, San Diego.
SIMS
SIMS (Sequential Interval Motif Search) is a software tool designed to perform unrestrictive PTM searches over tandem mass spectra. In other words, users do not have to characterize the potential PTMs in advance; they only need to specify the range of modification mass for each individual amino acid.
MassWiz
MassWiz is a free, open source search algorithm developed at the Institute of Genomics and Integrative Biology. It is available as a Windows command-line tool and also as a web server.
De novo sequencing algorithms
De novo peptide sequencing algorithms are based, in general, on the approach proposed in.
DeNoS
DeNoS is part of the software tool Proteinmatching Analysis Software (PAS), which in turn is part of the software package Medicwave Bioinformatics Suite (MBS).
Task:
DeNoS performs complete or almost complete sequencing of peptides with high reliability (>95%). DeNoS uses all information from CAD and ECD spectra. It is a hierarchical algorithm. In the first step, fragments that are confirmed in both CAD and ECD (so-called Golden Complementary Pairs) are used along with fragments that are only found in CAD (so-called Complementary Pairs). After that, fragments with lower reliability are added step by step. In the last step, if the peptide is still not fully sequenced, the software uses a simple application of graph theory to sequence the remaining peptide parts with "unreliable" fragments.
Advantage:
DeNoS is the first algorithm able to sequence peptides with >95% reliability. About 13% of all MS/MS spectra are almost completely sequenced; for comparison, a search engine usually identifies only about 10% of all MS/MS spectra in a typical experiment.
Input:
DTA files, where each file contains data from a mass spectrum, either ECD or CAD.
Output:
Complete or almost complete peptide sequences.
PEAKS
PEAKS de novo automatically provides a complete sequence for each peptide, confidence scores on individual amino acid assignments, simple reporting for high-throughput analysis, and greater detail for scientifically sensitive, in-depth investigations. A manually assisted de novo mode is available for users who wish to tweak or optimize their results further. According to published reports, PEAKS is currently the fastest, most accurate automated de novo algorithm available. Automated de novo sequencing of an entire LC run processed data faster than 1 spectrum per second. The results were unmatched in accuracy; PEAKS determines at least 3 times as many completely correct sequences as the next best de novo software. With accurate mass capabilities, de novo sequencing at 97% accuracy is possible.
SPIDER
For the identification of proteins using MS/MS, de novo sequencing software computes one or several possible amino acid sequences (called sequence tags) for each MS/MS spectrum. Those tags are then matched, allowing for amino acid mutations, against the sequences in a protein database. If the de novo sequencing gives correct tags, the homologs of the proteins can be identified by this approach, and software such as MS-BLAST is available for the matching. The most common de novo sequencing error is that a segment of amino acids is replaced by another segment with approximately the same mass. The SPIDER algorithm matches sequence tags containing such errors to database sequences for the purpose of protein and peptide identification. BLAST (and similar) homology approaches can fail when confronted with common sequence substitutions such as I/L, N/GG, SAT/TAS. SPIDER is designed to avoid these problems. SPIDER can be used in conjunction with PEAKS mass spectrometry data analysis software.
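The substitutions named above are hard to tell apart because the swapped segments have the same, or nearly the same, total residue mass. The sketch below checks this with monoisotopic residue masses; it illustrates the problem only, not the SPIDER algorithm, and the helper names and the 0.01 Da tolerance are assumptions.

```python
# Monoisotopic residue masses in daltons (subset needed for the examples).
MONO = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "T": 101.04768,
        "L": 113.08406, "I": 113.08406, "N": 114.04293}


def segment_mass(segment):
    """Sum of residue masses for a stretch of amino acids."""
    return sum(MONO[a] for a in segment)


def indistinguishable(a, b, tol=0.01):
    """True if two segments differ by no more than tol Da in total mass."""
    return abs(segment_mass(a) - segment_mass(b)) <= tol


for pair in [("I", "L"), ("N", "GG"), ("SAT", "TAS"), ("A", "G")]:
    print(pair, indistinguishable(*pair))
```

A segment such as SAT versus TAS can only be resolved if fragment peaks fall inside the segment, which is why a matcher that tolerates these swaps is useful.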
AnalyzerPro
AnalyzerPro is proprietary software by SpectralWorks Limited. It is a vendor-independent application for processing mass spectrometry data. Using proprietary algorithms, AnalyzerPro can analyze both GC-MS and LC-MS data, with both qualitative and quantitative data processing. It is widely used for metabolomics data processing, using MatrixAnalyzer for the comparison of multiple data sets.
RemoteAnalyzer
RemoteAnalyzer is proprietary software by SpectralWorks Limited. It is a vendor-independent 'Open Access' client/server solution that provides a walk-up-and-use LC-MS and GC-MS data system. Instrument control and data processing support for multiple vendors' hardware is provided.
ESIprot 1.0
Electrospray ionization (ESI) mass spectrometry (MS) devices with relatively low resolution are widely used for proteomics and metabolomics. Ion trap devices like the Agilent MSD/XCT ultra or the Bruker HCT ultra are typical representatives. However, even though ESI-MS data can be measured for most naturally occurring proteins, the availability of data evaluation software for such low-resolution ESI protein spectra is quite limited. ESIprot 1.0 enables charge state determination and molecular weight calculation for low-resolution ESI-MS data of proteins.
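Charge state determination and molecular weight calculation can be illustrated with the textbook relation between neighbouring peaks of an ESI charge envelope. The sketch below is not ESIprot's own procedure. It assumes positive-mode ionization by protonation only and that the supplied peaks are consecutive charge states; the function names are hypothetical. For a peak at charge z and its neighbour at charge z + 1, z = (mz_low - m_p) / (mz_high - mz_low) and M = z (mz_high - m_p), where m_p is the proton mass.

```python
PROTON_MASS = 1.007276  # Da


def charge_from_adjacent_peaks(mz_low, mz_high):
    """Charge of the ion at mz_high, assuming the ion at mz_low carries one more proton."""
    return round((mz_low - PROTON_MASS) / (mz_high - mz_low))


def deconvolute(peaks):
    """Return (charges, mean neutral mass) for m/z values of consecutive charge states."""
    peaks = sorted(peaks, reverse=True)  # highest m/z has the lowest charge
    z0 = charge_from_adjacent_peaks(peaks[1], peaks[0])
    charges = [z0 + i for i in range(len(peaks))]
    masses = [z * (mz - PROTON_MASS) for z, mz in zip(charges, peaks)]
    return charges, sum(masses) / len(masses)


if __name__ == "__main__":
    # Synthetic envelope for a 16951.5 Da protein, charges 10 to 14.
    true_mass = 16951.5
    envelope = [true_mass / z + PROTON_MASS for z in range(10, 15)]
    print(deconvolute(envelope))
```

Averaging the mass over all charge states, as done here, is one simple way to reduce the effect of the limited m/z accuracy of low-resolution instruments.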
Spectrolyzer
Spectrolyzer is a flexible Microsoft Windows-based software package that provides bioinformatics data analysis tools for different mass spectrometers. It focuses on finding protein biomarkers and detecting protein deviations. Spectrolyzer is compatible with most mass spectrometry technologies, including tandem MS (MS/MS), MALDI-TOF MS and SELDI-TOF MS; the package contains several tools, each of which focuses on analyzing data from one of these technologies. The software aims to ensure high-quality analysis while allowing flexibility for special requirements and reducing the time needed for each analysis.
PROTRAWLER
ProTrawler is an LC/MS data reduction application that reads raw mass spectrometry vendor data (from a variety of well-known instrument companies) and creates lists of {mass, retention time, integrated signal intensity} triplets summarizing the LC/MS chromatogram. The measurements are reported with errors, which are essential for performing dynamic binning for comparisons between data sets. ProTrawler operates in two modes: a highly visual, hands-on (expert) mode for developing the parameters used in data reduction, and a fully automated mode for processing many chromatograms. ProTrawler's data reduction workflow includes background elimination, noise estimation, peak shape estimation, shape deconvolution, and isotopic and charge-state list deconvolution (factoring in errors and signal noise) to give a list of features. Typically, ProTrawler reduces 1 GB of raw data to 10 KB of processed results with a detection sensitivity of three orders of magnitude in 25% of the data acquisition time. No formal Bayesian methods are used, but sophisticated statistical inference is employed throughout. ProTrawler has been used for bacterial protein biomarker discovery efforts as well as for IPEx-related applications.
REGATTA
Regatta is an LC/MS list comparison application that works hand-in-hand with ProTrawler (but also accepts input in Excel/CSV form) to provide an environment for filtering and normalizing LC/MS results lists of {mass, retention time, integrated intensity} triplets. To accomplish this, Regatta addresses the transitivity problem that arises in the comparison of analytical list data: if Peak A in Sample A overlaps Peak B in Sample B, and Peak B overlaps Peak C in Sample C, but Peak A does not overlap Peak C, can we say that we have measured the same analyte in all three samples or not? Regatta also implements multivariate analysis, e.g., hierarchical cluster analysis and principal component analysis, as well as statistical tests, e.g., coefficients of variation. Input is not necessarily restricted to output from ProTrawler. Regatta has been used successfully for biomarker discovery.
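The transitivity problem can be made concrete with a small Python sketch. How Regatta itself resolves the question is not described here, so the code shows only one possible convention: single-linkage grouping, in which peaks connected through a chain of pairwise overlaps are treated as the same analyte. The fixed mass and retention time tolerances and all function names are assumptions for illustration; in practice the per-measurement errors would presumably define the overlap.

```python
def overlaps(p, q, mass_tol=0.05, rt_tol=0.5):
    """True if two (mass, retention_time) peaks agree within fixed tolerances."""
    return abs(p[0] - q[0]) <= mass_tol and abs(p[1] - q[1]) <= rt_tol


def single_linkage_groups(peaks, mass_tol=0.05, rt_tol=0.5):
    """Group peak indices that are connected by a chain of pairwise overlaps."""
    parent = list(range(len(peaks)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(peaks)):
        for j in range(i + 1, len(peaks)):
            if overlaps(peaks[i], peaks[j], mass_tol, rt_tol):
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(peaks)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    # Peaks A, B, C from three samples, plus one unrelated peak.
    peaks = [(1000.00, 12.0), (1000.04, 12.1), (1000.08, 12.2), (1500.00, 30.0)]
    print(overlaps(peaks[0], peaks[1]), overlaps(peaks[1], peaks[2]),
          overlaps(peaks[0], peaks[2]))
    print(single_linkage_groups(peaks))
```

Under this convention A, B and C fall into one group even though A and C do not overlap directly. A stricter convention (requiring every pair in a group to overlap) would split them, which is exactly the choice the question above forces.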
OmicsHub Proteomics
OmicsHub® Proteomics combines a LIMS for mass spectrometry information management with data analysis functionality on one platform. The software allows the user to import data files from multiple instruments and to conduct protein peak detection, filtering, protein identification, annotation and export of formatted reports. It is a single-server platform with a web interface for multiuser access and is proprietary software of Integromics.
VIPER and Decon2LS
The "Proteomics Research Resource for Integrative Biology" distributes software tools (VIPER, Decon2LS, and others) that can be used to analyze LC-MS features by accurate mass and chromatographic retention time. This is sometimes referred to as the Accurate Mass and Time tag approach (AMT tag approach). These tools are generally used for proteomics.
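The idea of matching features by accurate mass and retention time can be sketched in a few lines of Python. This is not code from VIPER or Decon2LS. The tolerance values, the use of a retention time normalized to the range 0 to 1, and all names and example numbers are assumptions made for illustration.

```python
def ppm_error(observed, theoretical):
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def match_features(features, amt_tags, ppm_tol=5.0, rt_tol=0.02):
    """Match (mass, normalized_rt) features to (name, mass, normalized_rt) tags.

    A feature matches a tag when both the mass error and the retention
    time difference fall within the given (hypothetical) tolerances.
    """
    matches = []
    for f_idx, (mass, rt) in enumerate(features):
        for name, tag_mass, tag_rt in amt_tags:
            if abs(ppm_error(mass, tag_mass)) <= ppm_tol and abs(rt - tag_rt) <= rt_tol:
                matches.append((f_idx, name))
    return matches


if __name__ == "__main__":
    amt_tags = [
        ("PEPTIDE_A", 1500.7500, 0.350),
        ("PEPTIDE_B", 1500.7500, 0.600),  # same mass, different elution time
        ("PEPTIDE_C", 2100.1000, 0.350),
    ]
    features = [(1500.7530, 0.355), (2100.2000, 0.350)]
    print(match_features(features, amt_tags))
```

The example shows why both dimensions are needed: the first feature has the same mass as two tags, and only the retention time separates them.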
OpenMS / TOPP
OpenMS is a C++ software library for LC/MS data management and analysis. It offers an infrastructure for the development of mass spectrometry related software. OpenMS is free software available under the LGPL. TOPP (The OpenMS Proteomics Pipeline) is a set of small applications that can be chained to create analysis pipelines tailored to a specific problem. TOPP is developed using the data structures and algorithms provided by OpenMS and is likewise free software available under the LGPL. TOPP provides ready-to-use applications for peak picking, finding peptide features and quantifying them, as well as interfaces to most of the database search engines.
OpenMS and TOPP are a joint project of the Algorithmic Bioinformatics group at the Free University of Berlin, the Applied Bioinformatics group at Tübingen University and the Junior Research Group for Protein-Protein Interactions and Computational Proteomics at Saarland University.
Mass Frontier
Mass Frontier is a software tool for interpretation and management of mass spectra of small molecules. Computer methods for interpretation of mass spectral data in Mass Frontier centre on three fundamental methodologies: library search techniques, expert system procedures and classification methods. Mass Frontier uses automated generation of possible fragments at an expert level, including complete fragmentation and rearrangement mechanisms, starting from a user-supplied chemical structure. The software contains an expert system that automatically extracts a decomposition mechanism for each fragmentation reaction in the fragmentation library and determines the compound class range to which the mechanism can be applied. The expert system applies database mechanisms to a user-provided structure and automatically predicts the fragmentation reactions for a given compound. The knowledge base uses around 30,000 fragmentation schemes that contain around 100,000 reactions collected from the mass spectrometry literature.
Mass Frontier also incorporates an automated system for detecting chromatographic components in complex GC/MS, LC/MS or MSn runs and extracting mass spectral signals from closely coeluting components (deconvolution).
Classification methods include principal component analysis, neural networks and fuzzy clustering.
massXpert
massXpert is a graphical user interface (GUI) program for simulating and analyzing mass spectrometric data obtained on known biopolymer sequences. The software runs in an identical manner on MS-Windows, Mac OS X and GNU/Linux/Unix platforms. massXpert is not intended for identifying proteins, but is useful for characterizing biopolymer sequences (post-translational modifications, intramolecular cross-links, etc.). It comprises four modules, all available in the same program interface. XpertDef lets the user define any aspect of the polymer chemistry at hand (atoms/isotopes, monomers, modifications, cleavage agents, fragmentation patterns, cross-links, default ionization, etc.). XpertCalc is a desktop calculator for mass computations: the calculation is aware of the polymer chemistry definition and is fully programmable; m/z ratios are computable with automatic replacement of the ionization agent; and isotopic patterns are computable starting from an elemental composition, with the possibility to specify the resolution of the mass spectrometer. XpertEdit is the central part of the software suite and contains all the simulation and analysis functions, such as polymer sequence editing, sequence/monomer chemical modifications, cleavages, fragmentations, elemental/monomeric composition determinations, pI/net charge calculations, and arbitrary mass searches in the polymer sequence. XpertMiner is a more recent, still experimental module into which lists of (m/z, z) pairs can be imported and submitted to calculations. Typically this module is used to apply a formula to all the pairs in a single operation, or to match two lists, one from a simulation and another from data actually acquired on the mass spectrometer. All simulation results can be exported as text, either to the clipboard or to text files.
mMass
mMass is an open source multi-platform package of tools for precise mass spectrometric data analysis and interpretation. It is written in Python, so it is portable to different computer platforms, and is released under the GNU General Public License, so it can be modified or extended with modules for specific needs.
ProteoIQ
ProteoIQ is commercial software for the post-analysis of Mascot, SEQUEST, or X!Tandem database search results. The software provides the means to combine tandem mass spectrometry database search results derived from different instruments and platforms. Since the primary goal of many proteomics projects is to determine thresholds that identify as many real proteins as possible while admitting a minimal number of false positive protein identifications, ProteoIQ incorporates the two most common methods for statistical validation of large proteome datasets: the false discovery rate and protein probability approaches. For false discovery rate calculations, ProteoIQ incorporates proprietary Protein Validation Technology (ProValT) algorithms licensed from the University of Georgia Research Foundation. Protein and peptide probabilities are generated by independent implementations of the Peptide Prophet and Protein Prophet algorithms. In ProteoIQ, relative protein quantitation is performed via spectral counting: standard deviations are automatically calculated across replicates, and spectral count abundances are normalized between samples. Integrated comparison functions allow the user to quickly compare proteomic results across biological samples.
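Spectral counting with normalization between samples and standard deviations across replicates can be sketched as follows. The normalization ProteoIQ actually applies is not described here; the sketch assumes simple total-count scaling (each sample is scaled so that its total spectral count equals the mean total), and the function names and example counts are hypothetical.

```python
from statistics import mean, stdev


def normalize_counts(samples):
    """Scale each sample so its total spectral count equals the mean total.

    `samples` is a list of {protein: spectral_count} dictionaries.
    """
    totals = [sum(s.values()) for s in samples]
    target = mean(totals)
    return [{p: c * target / t for p, c in s.items()}
            for s, t in zip(samples, totals)]


def summarize(replicates):
    """Return {protein: (mean, standard deviation)} of normalized counts.

    Proteins missing from a replicate are counted as zero; at least two
    replicates are required for the standard deviation.
    """
    normalized = normalize_counts(replicates)
    proteins = sorted(set().union(*normalized))
    return {
        p: (mean(r.get(p, 0.0) for r in normalized),
            stdev(r.get(p, 0.0) for r in normalized))
        for p in proteins
    }


if __name__ == "__main__":
    # Second replicate was loaded at twice the amount of the first.
    replicates = [{"P1": 10, "P2": 30}, {"P1": 20, "P2": 60}]
    print(summarize(replicates))
```

In this example the raw counts differ by a factor of two between replicates, but after normalization both replicates agree, so the standard deviations are zero.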
PatternLab for proteomics
PatternLab is free software for post-analysis of SEQUEST or ProLuCID database search results filtered by DTASelect or Census. It offers several tools that combine false discovery rates with statistical tests and protein fold changes to pinpoint differentially expressed proteins, find trends among proteins having similar expression profiles in time course experiments, generate area-proportional Venn diagrams, and deconvolute mass spectra to enable analysis of top-down and middle-down proteomic data (YADA module). Results can also be analyzed using its Gene Ontology Explorer module.
MolAna
MolAna was developed by Phenomenome Discoveries Inc. (PDI) for use in IONICS Mass Spectrometry Group's 3Q Molecular Analyzer, a triple quadrupole mass spectrometer.
Xcalibur
Xcalibur is proprietary software by Thermo Fisher Scientific used with mass spectrometry instruments.
MassCenter
MassCenter is proprietary software by JEOL used with mass spectrometry instruments such as the JMS AccuTOF T100LC.
MSight
MSight is free software for mass spectrometry imaging developed by the Swiss Institute of Bioinformatics.
Spectromania
Spectromania is commercial software for analysis and visualization of mass spectrometric data.
Peacock
Peacock is an open source Mac OS X application developed by Johan Kool that can be used to interpret gas chromatography/mass spectrometry (GC/MS) data files.
OpenChrom
OpenChrom is open source chromatography and mass spectrometry software. It can be extended using plug-ins and is available for several operating systems (Microsoft Windows, Linux, Unix, Mac OS X) and processor architectures (x86, x86_64, ppc). A free-of-charge, read-only converter for Agilent ChemStation (*.D) files is also available.
File formats
- Mass spectrometry data format: for a list of mass spectrometry data viewers and format converters.