SignalP 4.0: discriminating signal peptides from transmembrane regions

TN Petersen, S Brunak, G Von Heijne, H Nielsen - Nature methods, 2011 - nature.com
To the Editor: The secretory signal peptide is a ubiquitous protein-sorting signal that targets its passenger protein for translocation across the endoplasmic reticulum membrane in eukaryotes and the cytoplasmic membrane in prokaryotes (ref. 1). Many methods have been published for predicting signal peptides from the amino acid sequence, including SignalP (refs. 2–4), PrediSi (ref. 5), SPEPlip (ref. 6), Signal-CF (ref. 7), Signal-3L (ref. 8) and Signal-BLAST (ref. 9). A benchmark study done in 2009 found SignalP 3.0 to be the best method (ref. 10). All these methods, however, have only limited ability to distinguish between signal peptides and N-terminal transmembrane helices. Both contain a hydrophobic region, but transmembrane helices typically have longer hydrophobic regions. Transmembrane helices also lack cleavage sites, but the cleavage-site pattern alone is not sufficient to distinguish the two types of sequence. This is a substantial problem because a scan for signal peptides in any complete genome will yield many false-positive predictions from N-terminal transmembrane regions. The hidden Markov model method included in SignalP versions 2.0 (ref. 3) and 3.0 (ref. 4) partially took this issue into account by including three submodels of eukaryotic sequences: signal peptide, signal anchor and other proteins. Other methods, such as Phobius (ref. 11), Philius (ref. 12), membrane protein structure and topology 3 (MEMSAT3; ref. 13), support vector machine–based MEMSAT (MEMSAT-SVM; ref. 14) and Spoctopus (ref. 15), try to solve the problem by predicting transmembrane topology and signal peptides with joint models.
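The length contrast described above can be illustrated with a deliberately naive sketch that looks only at the longest hydrophobic stretch near the N terminus. This is not SignalP's method. The hydrophobic residue set, the 70-residue window and the `min_core` and `tm_cutoff` thresholds are assumptions chosen for illustration, and both function names are hypothetical. As the letter notes, length alone cannot fully separate the two classes, which is exactly why such a rule produces errors.

```python
# Toy heuristic (NOT SignalP): classify an N-terminal hydrophobic region by length only.

# Assumed set of hydrophobic residues, chosen for illustration.
HYDROPHOBIC = frozenset("ACFILMVW")


def longest_hydrophobic_run(seq: str, window: int = 70) -> int:
    """Return the length of the longest contiguous hydrophobic run in seq[:window]."""
    best = current = 0
    for residue in seq[:window].upper():
        if residue in HYDROPHOBIC:
            current += 1
            best = max(best, current)
        else:
            current = 0  # a non-hydrophobic residue breaks the run
    return best


def naive_call(seq: str, min_core: int = 7, tm_cutoff: int = 18) -> str:
    """Label the N-terminal region; min_core and tm_cutoff are hypothetical thresholds."""
    run = longest_hydrophobic_run(seq)
    if run < min_core:
        return "no hydrophobic core"
    if run >= tm_cutoff:
        return "transmembrane-like"  # long hydrophobic stretch
    return "signal-peptide-like"  # shorter hydrophobic stretch
```

For example, a synthetic sequence `"MKK" + "L" * 10 + "SAQA" + "D" * 50` has a 10-residue run and is labelled signal-peptide-like, while `"MS" + "L" * 22 + "KRD" + "E" * 40` is labelled transmembrane-like. Real signal peptides and transmembrane helices overlap in length, so a fixed cutoff like this will misclassify some sequences in both directions.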
Here we present SignalP version 4.0, which we designed to discriminate between signal peptides and transmembrane regions. In training SignalP 4.0, we used two kinds of negative data: the first corresponds to the negative data used in training earlier versions of SignalP, consisting of cytoplasmic and, for eukaryotes, nuclear proteins; the second comprises sequences that contain no signal peptide but do contain transmembrane regions within the first 70 residues.
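The two kinds of negative data can be sketched as a simple partitioning rule over annotated records. The `Record` type, its field names and the location strings are hypothetical, and the reading of "within the first 70 residues" as "a transmembrane region starting at position 70 or earlier" is an assumption; the letter does not specify the exact criterion or data format.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Record:
    """Hypothetical annotated sequence record; field names are illustrative."""
    seq_id: str
    has_signal_peptide: bool
    location: str  # e.g. "cytoplasm", "nucleus", "membrane" (assumed labels)
    tm_regions: List[Tuple[int, int]] = field(default_factory=list)  # 1-based (start, end)


def negative_set(rec: Record, eukaryote: bool, n_term: int = 70) -> Optional[str]:
    """Return which negative set a record falls into, or None if it is not a negative."""
    if rec.has_signal_peptide:
        return None  # signal-peptide sequences are positives, not negatives
    # Assumption: "within the first 70 residues" = a TM region starting at or before n_term.
    if any(start <= n_term for start, _ in rec.tm_regions):
        return "transmembrane"
    # Soluble negatives: cytoplasmic proteins, plus nuclear proteins for eukaryotes.
    soluble = {"cytoplasm", "nucleus"} if eukaryote else {"cytoplasm"}
    if rec.location in soluble:
        return "cytoplasmic/nuclear"
    return None  # e.g. membrane proteins whose TM regions all start later
```

Checking for an early transmembrane region before checking location means any non-signal-peptide record with an N-terminal transmembrane segment is routed to the transmembrane negative set, which is the set aimed at the false positives described above.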