
A Bayesian Hidden Markov Model for Motif Discovery Through Joint Modeling of Genomic Sequence and ChIP‐Chip Data

Gelfond, Jonathan A. L.; Gupta, Mayetri; Ibrahim, Joseph G.
Biometrics 2009 v.65 no.4 pp. 1087-1095
DNA, Markov chain, algorithms, analytical methods, binding sites, biometry, chromatin, data collection, genome, microarray technology, models, nucleotide sequences, precipitin tests, transcription factors, uncertainty, yeasts
We propose a unified framework for the analysis of chromatin (Ch) immunoprecipitation (IP) microarray (ChIP‐chip) data for detecting transcription factor binding sites (TFBSs), or motifs. ChIP‐chip assays focus the genome‐wide search for TFBSs by isolating a sample of DNA fragments containing TFBSs and hybridizing this sample to a microarray whose probes correspond to tiled segments across the genome. Present analytical methods use a two‐step approach: (i) analyze the array data to estimate IP‐enrichment peaks, then (ii) analyze the corresponding sequences independently of the intensity information. The proposed model integrates peak finding and motif discovery in a unified Bayesian hidden Markov model (HMM) framework that accommodates the inherent uncertainty in both measurements. A Markov chain Monte Carlo algorithm is formulated for parameter estimation, adapting recursive techniques used for HMMs. In simulations and in an application to a yeast RAP1 dataset, the proposed method shows favorable TFBS discovery performance compared with currently available two‐stage procedures, in terms of both sensitivity and specificity.
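The abstract says only that the MCMC sampler adapts recursive HMM techniques. One standard recursion of this kind is forward-filtering backward-sampling (FFBS), which draws an entire hidden-state path in a single Gibbs step. The sketch below is a minimal illustration under stated assumptions and is not the authors' algorithm: it assumes a two-state (background vs. IP-enriched) HMM over tiled probe log-intensities with Gaussian emissions and fixed parameters, and it omits the sequence/motif component of the joint model entirely. The function name, its inputs, and the simulated data are hypothetical.

```python
import numpy as np


def forward_filter_backward_sample(y, trans, means, sds, init, rng):
    """Draw one hidden-state path from p(states | y) for a Gaussian-emission HMM.

    Hypothetical illustration only (not the paper's sampler).
    y:      (T,) observed probe log-intensities
    trans:  (K, K) transition matrix, each row sums to 1
    means:  (K,) emission means per state
    sds:    (K,) emission standard deviations per state
    init:   (K,) initial state distribution
    rng:    numpy Generator used for the sampling step
    """
    T, K = len(y), len(init)
    # Gaussian emission densities for every probe and state, shape (T, K)
    dens = np.exp(-0.5 * ((y[:, None] - means) / sds) ** 2) / (sds * np.sqrt(2 * np.pi))

    # Forward pass: filtered probabilities p(s_t | y_1..t), renormalized at each
    # step so the running probabilities do not underflow on long probe series
    alpha = np.empty((T, K))
    a = init * dens[0]
    alpha[0] = a / a.sum()
    for t in range(1, T):
        a = (alpha[t - 1] @ trans) * dens[t]  # predict with trans, then weight by emission
        alpha[t] = a / a.sum()

    # Backward pass: sample s_T from the last filtered distribution, then each
    # s_t from p(s_t | s_{t+1}, y_1..t), proportional to alpha_t(i) * trans[i, s_{t+1}]
    states = np.empty(T, dtype=int)
    states[-1] = rng.choice(K, p=alpha[-1])
    for t in range(T - 2, -1, -1):
        w = alpha[t] * trans[:, states[t + 1]]
        states[t] = rng.choice(K, p=w / w.sum())
    return states


# Usage on simulated (hypothetical) data: 50 background probes, a 20-probe
# enriched region, then 50 more background probes.
rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(0, 1, 50), rng.normal(5, 1, 20), rng.normal(0, 1, 50)])
trans = np.array([[0.95, 0.05], [0.10, 0.90]])
states = forward_filter_backward_sample(
    y,
    trans,
    means=np.array([0.0, 5.0]),
    sds=np.array([1.0, 1.0]),
    init=np.array([0.9, 0.1]),
    rng=rng,
)
```

In a full Bayesian sampler, a draw like this would typically alternate with draws of the emission and transition parameters from their conditional posteriors. Sampling the whole path jointly, rather than one probe at a time, tends to mix better because neighboring hidden states are strongly correlated.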