Feature selection with the Boruta package

MB Kursa, WR Rudnicki - Journal of statistical software, 2010 - jstatsoft.org
Abstract
This article describes an R package, Boruta, which implements a novel feature selection algorithm for finding all relevant variables. The algorithm is designed as a wrapper around a Random Forest classification algorithm. It iteratively removes features that a statistical test shows to be less relevant than random probes. The Boruta package provides a convenient interface to the algorithm. A short description of the algorithm and examples of its application are presented.
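The iterative "compare against random probes" idea in the abstract can be illustrated with a minimal sketch. It is not the Boruta package's code and makes several assumptions the abstract does not state: probes are shuffled copies of the real columns, the importance score is the absolute Pearson correlation (a deliberately simple stand-in for Random Forest importance), and the statistical test is an uncorrected one-sided binomial test on how often a feature beats the best probe. All function names and parameters here (`select_features`, `rounds`, `alpha`, and so on) are hypothetical.

```python
import math
import random


def abs_correlation(xs, ys):
    """Stand-in importance score: |Pearson correlation|, NOT Random Forest importance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)


def binomial_tail(lo, hi, n):
    """P(lo <= X <= hi) for X ~ Binomial(n, 0.5)."""
    return sum(math.comb(n, k) for k in range(lo, hi + 1)) / 2 ** n


def select_features(columns, target, rounds=30, alpha=0.01, seed=0):
    """Label each feature 'confirmed', 'rejected', or 'tentative' (assumed labels)."""
    rng = random.Random(seed)
    undecided = sorted(columns)          # sorted for a reproducible shuffle order
    hits = {name: 0 for name in undecided}
    decision = {}
    for r in range(1, rounds + 1):
        if not undecided:
            break
        # Random probes: shuffled copies of the undecided columns (an assumption).
        best_probe = 0.0
        for name in undecided:
            probe = list(columns[name])
            rng.shuffle(probe)
            best_probe = max(best_probe, abs_correlation(probe, target))
        still_undecided = []
        for name in undecided:
            # A "hit" means the real feature scored above the best probe this round.
            if abs_correlation(columns[name], target) > best_probe:
                hits[name] += 1
            # Every undecided feature has taken part in exactly r rounds so far.
            if binomial_tail(hits[name], r, r) < alpha:
                decision[name] = "confirmed"   # beats probes too often for chance
            elif binomial_tail(0, hits[name], r) < alpha:
                decision[name] = "rejected"    # loses to probes too often for chance
            else:
                still_undecided.append(name)
        undecided = still_undecided
    for name in undecided:
        decision[name] = "tentative"
    return decision


# Toy data: "signal" tracks the target closely; "noise" is uncorrelated with it.
target = [i % 2 for i in range(100)]
columns = {
    "signal": [y + 0.01 * (i % 7) for i, y in enumerate(target)],
    "noise": [(i // 2) % 2 for i in range(100)],
}
result = select_features(columns, target)
print(result)
```

The sketch omits multiple-testing correction and uses a univariate score, so unlike a Random Forest wrapper it cannot detect features that matter only in combination with others. It is meant only to show the control flow: build probes, score, count hits, test, and drop features that are decided.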