A model-based background adjustment for oligonucleotide expression arrays

Z Wu, RA Irizarry, R Gentleman… - Journal of the …, 2004 - Taylor & Francis
Z Wu, RA Irizarry, R Gentleman, F Martinez-Murillo, F Spencer
Journal of the American Statistical Association, 2004, Taylor & Francis
High-density oligonucleotide expression arrays are widely used in many areas of biomedical research. Affymetrix GeneChip arrays are the most popular. In the Affymetrix system, a fair amount of further preprocessing and data reduction occurs after the image-processing step. Statistical procedures developed by academic groups have been successful in improving the default algorithms provided by the Affymetrix system. In this article we present a solution to one of the preprocessing steps, background adjustment, based on a formal statistical framework. Our solution greatly improves the performance of the technology in various practical applications.

These arrays use short oligonucleotides to probe for genes in an RNA sample. Typically, each gene is represented by 11–20 pairs of oligonucleotide probes. The first component of these pairs is referred to as a perfect match probe and is designed to hybridize only with transcripts from the intended gene (i.e., specific hybridization). However, hybridization by other sequences (i.e., nonspecific hybridization) is unavoidable. Furthermore, hybridization strengths are measured by a scanner that introduces optical noise. Therefore, the observed intensities need to be adjusted to give accurate measurements of specific hybridization.

We have found that the default ad hoc adjustment, provided as part of the Affymetrix system, can be improved through the use of estimators derived from a statistical model that uses probe sequence information. A final step in preprocessing is to summarize the probe-level data for each gene to define a measure of expression that represents the amount of the corresponding mRNA species. In this article we illustrate the practical consequences of not adjusting appropriately for the presence of nonspecific hybridization and provide a solution based on our background adjustment procedure. Software that computes our adjustment is available as part of the Bioconductor Project (http://www.bioconductor.org).