Protein subcellular localization prediction (or just protein localization prediction) involves the prediction of where a protein resides in a cell, its subcellular localization.
Prediction of protein subcellular localization is an important component of bioinformatics-based prediction of protein function and genome annotation, and it can aid the identification of drug targets.
Through the development of new approaches in computer science, coupled with a growing dataset of proteins of known localization, computational tools can now provide fast and accurate localization predictions for many organisms. As a result, subcellular localization prediction has become one of the challenges successfully aided by bioinformatics and machine learning.
Many prediction methods now exceed the accuracy of some high-throughput laboratory methods for the identification of protein subcellular localization.[1][2][3] In particular, some predictors have been developed[4] that can handle proteins that may simultaneously exist in, or move between, two or more different subcellular locations. Experimental validation is typically required to confirm the predicted localizations.
In 1999, PSORT was the first published program to predict subcellular localization.[5] Subsequent tools and websites have been released using techniques such as artificial neural networks, support vector machines and protein motifs. Predictors can be specialized for proteins in different organisms. Some are specialized for eukaryotic proteins,[6] some for human proteins,[7] and some for plant proteins.[8] Methods for predicting bacterial protein localization, and their accuracy, have been reviewed.[9] In 2021, SCLpred-MEM, a membrane protein prediction tool powered by artificial neural networks, was published.[10] SCLpred-EMS is another tool powered by artificial neural networks; it classifies proteins into endomembrane system and secretory pathway (EMS) versus all others.[11] Similarly, Light-Attention uses machine learning methods to predict ten different common subcellular locations.[12]
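The motif-based approach mentioned above can be illustrated with a minimal Python sketch. It is not the method used by PSORT or any other published tool. The rule table, the function name `predict_by_motif` and the three patterns are assumptions chosen for illustration: a C-terminal KDEL retention signal, a C-terminal SKL tripeptide, and a simplified monopartite nuclear localization signal consensus. Real predictors combine many more features and are trained on proteins of known localization.

```python
import re
from typing import List

# Toy rule table (hypothetical, for illustration only).
# Each entry pairs a location label with a pattern over one-letter amino acid codes.
MOTIF_RULES = [
    ("endoplasmic reticulum", re.compile(r"KDEL$")),  # C-terminal ER retention signal
    ("peroxisome", re.compile(r"SKL$")),              # C-terminal PTS1 tripeptide
    ("nucleus", re.compile(r"K[KR].[KR]")),           # simplified monopartite NLS consensus
]


def predict_by_motif(sequence: str) -> List[str]:
    """Return every location whose motif occurs in the sequence, or ["unknown"]."""
    seq = sequence.upper()
    hits = [location for location, pattern in MOTIF_RULES if pattern.search(seq)]
    return hits or ["unknown"]


print(predict_by_motif("MAAPKKKRKVGG"))    # contains an SV40-like NLS
print(predict_by_motif("MSTAGGLLAAKDEL"))  # ends with KDEL
print(predict_by_motif("MAGT"))            # no rule matches
```

Because the function returns a list, a sequence matching several rules is assigned several locations, which loosely mirrors predictors that handle proteins found in more than one compartment.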
The development of protein subcellular location prediction has been summarized in two comprehensive review articles.[13][14] More recent tools and an experience report are covered in a paper by Meinken and Min (2012).
Application
Knowledge of the subcellular localization of a protein can significantly improve target identification during the drug discovery process. For example, secreted proteins and plasma membrane proteins are easily accessible by drug molecules due to their localization in the extracellular space or on the cell surface.
Bacterial cell surface and secreted proteins are also of interest for their potential as vaccine candidates or as diagnostic targets. Aberrant subcellular localization of proteins has been observed in cells affected by several diseases, such as cancer and Alzheimer's disease. Secreted proteins from some archaea that can survive in unusual environments have industrially important applications.
Prediction allows a large number of proteins to be assessed computationally in order to find candidates that are trafficked to the desired location.
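Screening a large set of predictions for candidates in a desired location can be sketched as a simple filter. The `Prediction` record, the `shortlist` function, the confidence scale and the example protein identifiers below are all hypothetical. Actual tools report their results in their own formats, which would need to be parsed first.

```python
from typing import Iterable, List, NamedTuple


class Prediction(NamedTuple):
    protein_id: str
    location: str
    score: float  # predictor confidence, assumed here to lie in [0, 1]


def shortlist(predictions: Iterable[Prediction],
              wanted_locations: Iterable[str],
              min_score: float = 0.8) -> List[Prediction]:
    """Keep predictions in a wanted location, highest confidence first."""
    wanted = {loc.lower() for loc in wanted_locations}
    kept = [p for p in predictions
            if p.location.lower() in wanted and p.score >= min_score]
    return sorted(kept, key=lambda p: p.score, reverse=True)


# Made-up example records, not real predictor output.
predictions = [
    Prediction("protA", "extracellular", 0.95),
    Prediction("protB", "cytoplasm", 0.99),
    Prediction("protC", "plasma membrane", 0.85),
    Prediction("protD", "extracellular", 0.40),
]
candidates = shortlist(predictions, ["extracellular", "plasma membrane"])
print([p.protein_id for p in candidates])
```

The threshold `min_score` trades the number of candidates against their reliability. Shortlisted proteins would still need experimental validation.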
Databases
The results of subcellular localization prediction can be stored in databases. Examples include the multi-species database Compartments; FunSecKB2, a fungal database;[15] PlantSecKB, a plant database;[16] MetazSecKB, an animal and human database;[17] and ProtSecKB, a protist database.[18]
References
Kaleel M, Zheng Y, Chen J, Feng X, Simpson JC, Pollastri G, Mooney C (6 March 2020). "SCLpred-EMS: subcellular localization prediction of endomembrane system and secretory pathway proteins by Deep N-to-1 Convolutional Neural Networks". Bioinformatics. 36 (11): 3343–3349. doi:10.1093/bioinformatics/btaa156. hdl:10197/12182. PMID 32142105.
Chou KC, Shen HB (2008). "Cell-PLoc: a package of Web servers for predicting subcellular localization of proteins in various organisms". Nature Protocols. 3 (2): 153–62. doi:10.1038/nprot.2007.494. PMID 18274516. S2CID 226104.
Shen HB, Chou KC (Nov 2009). "A top-down approach to enhance the power of predicting human protein subcellular localization: Hum-mPLoc 2.0". Analytical Biochemistry. 394 (2): 269–74. doi:10.1016/j.ab.2009.07.046. PMID 19651102.
Bork P, Dandekar T, Diaz-Lazcoz Y, Eisenhaber F, Huynen M, Yuan Y (Nov 1998). "Predicting function: from genes to genomes and back". Journal of Molecular Biology. 283 (4): 707–25. doi:10.1006/jmbi.1998.2144. PMID 9790834.