ChromHMM: automating chromatin-state discovery and characterization

J Ernst, M Kellis - Nature methods, 2012 - nature.com
Chromatin-state annotation using combinations of chromatin modification patterns has emerged as a powerful approach for discovering regulatory regions and their cell type–specific activity patterns and for interpreting disease-association studies 1, 2, 3, 4, 5. However, the computational challenge of learning chromatin-state models from large numbers of chromatin modification datasets in multiple cell types still requires extensive bioinformatics expertise. To address this challenge, we developed ChromHMM, an automated computational system for learning chromatin states, characterizing their biological functions and correlations with large-scale functional datasets and visualizing the resulting genome-wide maps of chromatin-state annotations.
ChromHMM is based on a multivariate hidden Markov model that models the observed combination of chromatin marks using a product of independent Bernoulli random variables 2, which enables robust learning of complex patterns of many chromatin modifications. As input, it receives a list of aligned reads for each chromatin mark, which are automatically converted into presence or absence calls for each mark across the genome, based on a Poisson background distribution. An optional additional input of aligned reads for a control dataset can be used either to adjust the threshold for presence or absence calls or as an additional input mark. Alternatively, the user can input files that contain calls from an independent peak caller. By default, chromatin states are analyzed at 200-base-pair intervals that roughly approximate nucleosome sizes, but smaller or larger windows can be specified. We also developed an improved parameter-initialization procedure that enables relatively efficient inference of comparable models across different numbers of states (Supplementary Note).
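The two modeling ideas above, presence or absence calls against a Poisson background and an emission probability formed as a product of independent Bernoulli variables, can be illustrated with a minimal Python sketch. It is not ChromHMM's actual code: the function names, the use of the mean read count per bin as the Poisson rate, and the tail-probability cutoff of 1e-4 are all assumptions made for illustration.

```python
import math


def poisson_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam)."""
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term              # add P(X = i)
        term *= lam / (i + 1)    # advance to P(X = i + 1)
    return max(0.0, 1.0 - cdf)


def binarize(counts, p_cutoff=1e-4):
    """Turn per-bin read counts for one mark into 0/1 calls.

    Assumption: the background rate is the mean count per bin, and a bin is
    called present when its count reaches the smallest k with
    P(X >= k) < p_cutoff. Expects a non-empty list of counts.
    """
    lam = sum(counts) / len(counts)
    k = 1
    while poisson_tail(k, lam) >= p_cutoff:
        k += 1
    return [1 if c >= k else 0 for c in counts], k


def emission_probability(observed, p_present):
    """Probability of one bin's 0/1 mark vector in a given state.

    Marks are treated as independent Bernoulli variables, so the joint
    probability is the product of the per-mark terms.
    """
    prob = 1.0
    for value, p in zip(observed, p_present):
        prob *= p if value else (1.0 - p)
    return prob


if __name__ == "__main__":
    # Hypothetical read counts for one mark in ten 200-bp bins.
    calls, threshold = binarize([0, 1, 0, 2, 1, 0, 0, 1, 0, 15])
    print(calls, threshold)
    # Hypothetical state with two marks: present with prob. 0.9 and 0.2.
    print(emission_probability([1, 0], [0.9, 0.2]))
```

In this sketch, each state is described by one Bernoulli parameter per mark, so the number of emission parameters grows linearly with the number of marks rather than exponentially with the number of possible mark combinations.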