Protein subcellular localization prediction
Protein subcellular localization prediction involves the computational prediction of where a protein resides in a cell. Prediction of protein subcellular localization is an important component of bioinformatics-based prediction of protein function and genome annotation, and it can aid the identification of drug targets.
Most eukaryotic proteins are encoded in the nuclear genome and synthesized in the cytosol, but many need to be further sorted before they reach their final destination. In prokaryotes, proteins are synthesized in the cytoplasm, and some must be targeted to other locations such as a cell membrane or the extracellular environment. Proteins must be localized to their appropriate subcellular compartment to perform their function.
Experimentally determining the subcellular localization of a protein is a laborious and time-consuming task. Through the development of new approaches in computer science, coupled with a growing dataset of proteins of known localization, computational tools can now provide fast and accurate localization predictions for many organisms. As a result, subcellular localization prediction has become one of the challenges successfully aided by bioinformatics. Many protein subcellular localization prediction methods now exceed the accuracy of some high-throughput laboratory methods for identifying protein subcellular localization. In particular, some recently developed predictors can handle proteins that exist simultaneously in, or move between, two or more subcellular locations.
Methods
Several computational tools for predicting the subcellular localization of a protein are publicly available, a few of which are listed below. The development of protein subcellular location prediction has been summarized in two comprehensive review articles. Predictors have also been specialized for proteins from different organisms: some for eukaryotic proteins, some for human proteins, and some for plant proteins. Methods for predicting bacterial protein localization, and their accuracy, have been recently reviewed.
- Cell-PLoc: A package of web-servers for predicting subcellular localization of proteins in various organisms.
- BaCelLo: Prediction of eukaryotic protein subcellular localization. Unlike other methods, the predictions are balanced among different classes and all the localizations that are predicted are considered as equiprobable, to avoid mispredictions.
- CELLO: CELLO uses a two-level Support Vector Machine system to assign localizations to both prokaryotic and eukaryotic proteins.
- Euk-mPLoc 2.0: Predicting the subcellular localization of eukaryotic proteins with both single and multiple sites.
- CoBaltDB: CoBaltDB is a novel powerful platform that provides easy access to the results of multiple localization tools and support for predicting prokaryotic protein localizations.
- HSLpred: Predicts the subcellular localization of human proteins. The method combines composition-based SVM models with the similarity search technique PSI-BLAST.
- LOCtree: Prediction based on mimicking the cellular sorting mechanism using a hierarchical implementation of support vector machines. LOCtree is a comprehensive predictor incorporating predictions based on PROSITE/PFAM signatures as well as SwissProt keywords.
- MultiLoc: An SVM-based prediction engine for a wide range of subcellular locations.
- PSORT: The first widely used method for protein subcellular localization prediction, developed under the leadership of Kenta Nakai. Researchers are now also encouraged to use other PSORT programs, such as WoLF PSORT and PSORTb, for making predictions for certain types of organisms (see below). The prediction performance of PSORT is lower than that of recently developed predictors.
- PSORTb: Prediction of bacterial protein localization.
- PredictNLS: Prediction of nuclear localization signals.
- Proteome Analyst: Prediction of protein localization for both prokaryotes and eukaryotes using a text mining approach.
- SecretomeP: Prediction of eukaryotic proteins that are secreted via a non-traditional secretory mechanism.
- SherLoc: An SVM-based predictor combining MultiLoc with text-based features derived from PubMed abstracts.
- TargetP: Prediction of N-terminal sorting signals.
- WoLF PSORT: An updated version of PSORT/PSORT II for predicting the localization of eukaryotic protein sequences.
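Several of the tools listed above, such as HSLpred, are described as using composition-based SVM models. As a rough illustration only, the following Python sketch computes an amino acid composition vector, the kind of fixed-length feature such a classifier could take as input. It is not the feature extraction used by any listed tool; the function name `amino_acid_composition` and the toy sequence are hypothetical.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so that every
# feature vector has the same layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in `sequence`.

    Non-standard symbols (for example 'X') are ignored.
    """
    seq = sequence.upper()
    counts = Counter(residue for residue in seq if residue in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical toy sequence, not a real protein.
features = amino_acid_composition("MKKLLAAK")
```

The resulting 20-element vector sums to 1 and could be passed, together with known localization labels, to any standard classifier. Real predictors typically combine such features with other evidence, such as sorting signals or similarity searches.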
Application
Determining subcellular localization is important for understanding protein function and is a critical step in genome annotation.
Knowledge of the subcellular localization of a protein can significantly improve target identification during the drug discovery process. For example, secreted proteins and plasma membrane proteins are easily accessible by drug molecules due to their localization in the extracellular space or on the cell surface.
Bacterial cell surface and secreted proteins are also of interest for their potential as vaccine candidates or as diagnostic targets.
Aberrant subcellular localization of proteins has been observed in cells affected by several diseases, such as cancer and Alzheimer’s disease.
Secreted proteins from some archaea that can survive in unusual environments have industrially important applications.
External links
- Cell Centered Database - Protein subcellular localization data
- Cell-PLoc 2.0 - A recently updated version of Cell-PLoc
- BaCelLo - Balanced subCellular Localization predictor
- CELLO - subCELlular LOcalization predictor for prokaryotes and eukaryotes
- CoBaltDB - Complete bacterial and archaeal orfeomes subcellular localization database and associated resources
- LOCtree - prediction webserver for prokaryotes and eukaryotes
- MultiLoc - MultiLoc prediction webserver
- Protein Analysis Subcellular Localization Prediction
- PSORT.org - A portal for protein subcellular localization predictors
- SherLoc - SherLoc prediction webserver