Variable selection in clustering via Dirichlet process mixture models

S Kim, MG Tadesse, M Vannucci - Biometrika, 2006 - academic.oup.com
Abstract
The increased collection of high-dimensional data in various fields has raised a strong interest in clustering algorithms and variable selection procedures. In this paper, we propose a model-based method that addresses the two problems simultaneously. We introduce a latent binary vector to identify discriminating variables and use Dirichlet process mixture models to define the cluster structure. We update the variable selection index using a Metropolis algorithm and obtain inference on the cluster structure via a split-merge Markov chain Monte Carlo technique. We explore the performance of the methodology on simulated data and illustrate an application with a DNA microarray study.
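As a rough illustration of the Metropolis update of the latent binary vector mentioned above, the following minimal sketch runs a single-flip Metropolis sampler over a four-variable selection vector. It is not the authors' implementation: the names `metropolis_flip`, `toy_log_post` and `run_chain` are hypothetical, the single-flip proposal is an assumption, and the toy score stands in for the marginal posterior that the Dirichlet process mixture model would supply given the current cluster allocation.

```python
import math
import random


def metropolis_flip(gamma, log_post, rng):
    """One Metropolis step: propose flipping one randomly chosen entry of gamma.

    The proposal is symmetric, so the acceptance probability is
    min(1, post(proposal) / post(current)).
    """
    j = rng.randrange(len(gamma))
    proposal = list(gamma)
    proposal[j] = 1 - proposal[j]
    log_ratio = log_post(proposal) - log_post(gamma)
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        return proposal
    return gamma


def toy_log_post(gamma, weights=(5.0, 5.0, -5.0, -5.0)):
    """Hypothetical stand-in for the log posterior of gamma (an assumption).

    Positive weights favour selecting a variable, negative weights favour
    excluding it. A real model would use the marginal likelihood of the data
    given gamma and the cluster labels instead.
    """
    return sum(g * w for g, w in zip(gamma, weights))


def run_chain(n_iter=2000, seed=0):
    """Run the sampler and return how often each variable was selected."""
    rng = random.Random(seed)
    gamma = [0, 0, 0, 0]
    counts = [0] * len(gamma)
    for _ in range(n_iter):
        gamma = metropolis_flip(gamma, toy_log_post, rng)
        for j, g in enumerate(gamma):
            counts[j] += g
    return [c / n_iter for c in counts]


freq = run_chain()
print(freq)
```

In this toy setting the first two variables should be selected in most iterations and the last two rarely, which mirrors how selection frequencies along the chain can be read as evidence that a variable discriminates between clusters.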
Oxford University Press