Proteoform: a single term describing protein complexity

LM Smith, NL Kelleher - Nature methods, 2013 - nature.com
A surprise revealed by the success of the Human Genome Project was the lower-than-anticipated number of genes identified: ∼20,300, rather than the ∼100,000 estimated [1]. This finding led to the recognition that much of the complexity afforded by our biological machinery is at the level of protein variation rather than due to a high number of distinct genes [2]. The divergences among highly related, but chemically different, protein molecules arise from variation within populations, cell and tissue types and subcellular localization. On the DNA, RNA and protein levels, complexity can arise from allelic variations, from alternative splicing of RNA transcripts and from many post-translational modifications, respectively. These events create distinct protein molecules that modulate a wide variety of biological processes, from cell signaling inside or between cells to gene regulation and activation of protein complexes.
Although the complexity of protein forms was first revealed by two-dimensional gel electrophoresis, newer proteomic technologies can provide the precise compositions of whole protein molecules. Mass spectrometry has emerged as a key platform for proteomic analyses, with two contrasting approaches referred to as 'bottom-up' and 'top-down' proteomics. In the bottom-up approach, proteins are digested into peptides using trypsin or other proteases and are then identified by liquid chromatography and tandem mass spectrometry. In top-down proteomics, digestion into peptides is avoided, and protein identification is obtained directly from fragmentation of the intact protein. When available, the top-down approach provides the richest data for both precise identification (that is, the specific gene in a higher eukaryote that encodes the protein measured) [3] and full characterization of molecular composition. However, it is considerably more challenging to execute than the bottom-up approach because of the complexity of the data generated and various technical limitations.
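The digestion step of the bottom-up approach can be illustrated with a minimal in silico sketch. It assumes the commonly used simplified trypsin rule (cleavage C-terminal to lysine or arginine, except when the next residue is proline), which is not stated in the text above. The function name `tryptic_digest`, the `missed_cleavages` parameter and the toy sequence are hypothetical, chosen only for illustration.

```python
def tryptic_digest(sequence, missed_cleavages=0):
    """Return peptides from a simplified in silico trypsin digestion.

    Assumed rule: cleave after K or R unless the next residue is P.
    Real enzyme specificity is more nuanced; this is only a sketch.
    """
    # Step 1: split the sequence at every allowed cleavage site.
    pieces, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            pieces.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        pieces.append(sequence[start:])  # C-terminal remainder

    # Step 2: optionally join neighbouring pieces to model missed cleavages.
    peptides = []
    for i in range(len(pieces)):
        last = min(i + missed_cleavages, len(pieces) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides


# Toy sequence (hypothetical, not a real protein).
print(tryptic_digest("MKRPAKGG"))
```

Because each peptide is identified separately, information about which modifications or sequence variants occurred together on the same intact molecule is not preserved by this step, which is the contrast the text draws with the top-down approach.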