ProLuCID: An improved SEQUEST-like algorithm with enhanced sensitivity and specificity

T Xu, SK Park, JD Venable, JA Wohlschlegel, JK Diedrich, D Cociorva, B Lu, L Liao, J Hewel…
Journal of Proteomics, 2015, Elsevier
Abstract
ProLuCID, a new algorithm for peptide identification using tandem mass spectrometry and protein sequence databases, has been developed. This algorithm uses a three-tier scoring scheme. First, a binomial probability is used as a preliminary scoring scheme to select candidate peptides. The binomial probability scores generated by ProLuCID minimize molecular weight bias and are independent of database size. Second, a modified cross-correlation score (XCorr) is calculated for each candidate peptide selected by the binomial probability. This cross-correlation scoring function models the isotopic distributions of fragment ions of candidate peptides, which ultimately results in higher sensitivity and specificity than that obtained with the SEQUEST XCorr. Finally, ProLuCID uses the distribution of XCorr values for all of the selected candidate peptides to compute a Z score for the peptide hit with the highest XCorr. The ProLuCID Z score combines the discriminative power of XCorr and DeltaCN, the standard parameters for assessing the quality of a peptide identification using SEQUEST, and displays significant improvement in specificity over ProLuCID XCorr alone. ProLuCID is also able to take advantage of high-resolution MS/MS spectra, leading to further improvements in specificity when compared to low-resolution tandem MS data. A comparison of filtered data searched with SEQUEST and ProLuCID, using the same false discovery rate as estimated by a target-decoy database strategy, shows that ProLuCID was able to identify as many as 25% more proteins than SEQUEST. ProLuCID is implemented in Java and can be easily installed on a single computer or a computer cluster.
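The abstract names the scoring ideas but gives no formulas, so the following Python sketch only illustrates the generic statistics involved and is not ProLuCID's actual implementation. It assumes a textbook binomial tail probability for the preliminary score, a plain Z score of the best XCorr against all candidate XCorr values (using the population standard deviation), and one common target-decoy estimate of the false discovery rate (decoy hits divided by target hits). All function and parameter names are hypothetical.

```python
import math
from statistics import mean, pstdev
from typing import Sequence


def binomial_tail_probability(n_trials: int, n_matches: int, p_match: float) -> float:
    """P(X >= n_matches) for X ~ Binomial(n_trials, p_match).

    Assumption: n_trials stands for the number of predicted fragment ions,
    n_matches for how many were found in the spectrum, and p_match for the
    chance of a random match. A smaller value means a less likely chance hit.
    """
    return sum(
        math.comb(n_trials, k) * p_match**k * (1.0 - p_match) ** (n_trials - k)
        for k in range(n_matches, n_trials + 1)
    )


def top_hit_z_score(xcorrs: Sequence[float]) -> float:
    """Z score of the highest XCorr relative to all candidate XCorr values.

    Assumption: the reference distribution includes the top hit itself and
    uses the population standard deviation; the abstract does not specify this.
    """
    if len(xcorrs) < 2:
        raise ValueError("need at least two candidate scores")
    spread = pstdev(xcorrs)
    if spread == 0.0:
        raise ValueError("all candidate scores are identical")
    return (max(xcorrs) - mean(xcorrs)) / spread


def target_decoy_fdr(n_target_hits: int, n_decoy_hits: int) -> float:
    """Estimate FDR as decoy hits / target hits (one common variant)."""
    if n_target_hits <= 0:
        raise ValueError("need at least one target hit")
    return n_decoy_hits / n_target_hits


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(binomial_tail_probability(n_trials=20, n_matches=12, p_match=0.1))
    print(top_hit_z_score([4.1, 2.0, 1.8, 1.7, 1.5]))
    print(target_decoy_fdr(n_target_hits=1000, n_decoy_hits=10))
```

In this sketch, a top hit that stands far above the other candidates gets a large Z score, which is one way a single number can reflect both the absolute XCorr and its separation from the runners-up.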
This article is part of a Special Issue entitled: Computational Proteomics.