A Bayesian Hidden Markov Model for Motif Discovery Through Joint Modeling of Genomic Sequence and ChIP-Chip Data
- Gelfond, Jonathan A. L., Gupta, Mayetri, Ibrahim, Joseph G.
- Biometrics 2009 v.65 no.4 pp. 1087-1095
- DNA, Markov chain, algorithms, analytical methods, binding sites, biometry, chromatin, data collection, genome, microarray technology, models, nucleotide sequences, precipitin tests, transcription factors, uncertainty, yeasts
- We propose a unified framework for the analysis of chromatin (Ch) immunoprecipitation (IP) microarray (ChIP-chip) data for detecting transcription factor binding sites (TFBSs), or motifs. ChIP-chip assays focus the genome-wide search for TFBSs by isolating a sample of DNA fragments that contain TFBSs and hybridizing this sample to a microarray whose probes correspond to tiled segments across the genome. Present analytical methods use a two-step approach: (i) analyze the array data to estimate IP-enrichment peaks, then (ii) analyze the corresponding sequences independently of the intensity information. The proposed model integrates peak finding and motif discovery in a unified Bayesian hidden Markov model (HMM) framework that accommodates the inherent uncertainty in both measurements. A Markov chain Monte Carlo algorithm is formulated for parameter estimation, adapting recursive techniques used for HMMs. In simulations and in an application to a yeast RAP1 dataset, the proposed method shows favorable TFBS discovery performance, in terms of both sensitivity and specificity, compared with currently available two-stage procedures.
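- The abstract mentions adapting recursive HMM techniques inside a Markov chain Monte Carlo sampler. One standard recursion of this kind is forward filtering, backward sampling (FFBS), which draws a whole hidden-state path from its posterior in one sweep. The sketch below is illustrative only and is not the authors' algorithm: it assumes a two-state HMM over probe intensities (state 0 = background, state 1 = IP-enriched) with Gaussian emissions and fixed, hypothetical parameters, and it omits the sequence/motif component entirely. All function names (`normal_pdf`, `ffbs`, `posterior_enrichment`) and the toy data are hypothetical.

```python
import math
import random
from typing import List, Sequence


def normal_pdf(x: float, mu: float, sd: float) -> float:
    """Gaussian density; an ASSUMED emission model for probe intensities."""
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def ffbs(y: Sequence[float], init: Sequence[float],
         trans: Sequence[Sequence[float]], means: Sequence[float],
         sd: float, rng: random.Random) -> List[int]:
    """Draw one hidden-state path from p(states | y, parameters) by
    forward filtering followed by backward sampling."""
    k_states = len(init)
    n = len(y)
    # Forward pass: alpha[t][k] = p(s_t = k | y_0..y_t), renormalized at
    # every step so long probe series do not underflow.
    alpha: List[List[float]] = []
    for t in range(n):
        row = []
        for k in range(k_states):
            if t == 0:
                prior = init[k]
            else:
                prior = sum(alpha[t - 1][j] * trans[j][k] for j in range(k_states))
            row.append(prior * normal_pdf(y[t], means[k], sd))
        total = sum(row)
        alpha.append([v / total for v in row])
    # Backward pass: sample the last state from its filtered distribution,
    # then each earlier state conditional on the state that follows it.
    states = [0] * n
    states[-1] = rng.choices(range(k_states), weights=alpha[-1])[0]
    for t in range(n - 2, -1, -1):
        nxt = states[t + 1]
        weights = [alpha[t][j] * trans[j][nxt] for j in range(k_states)]
        states[t] = rng.choices(range(k_states), weights=weights)[0]
    return states


def posterior_enrichment(y: Sequence[float], init: Sequence[float],
                         trans: Sequence[Sequence[float]],
                         means: Sequence[float], sd: float,
                         n_draws: int = 2000, seed: int = 1) -> List[float]:
    """Monte Carlo estimate of p(s_t = 1 | y) per probe (state 1 = enriched)."""
    rng = random.Random(seed)
    counts = [0] * len(y)
    for _ in range(n_draws):
        for t, s in enumerate(ffbs(y, init, trans, means, sd, rng)):
            counts[t] += s
    return [c / n_draws for c in counts]


# Hypothetical toy intensities: background near 0, one enriched run near 3.
y = [0.1, -0.2, 0.0, 3.1, 2.8, 3.3, 0.2, -0.1]
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]
post = posterior_enrichment(y, init, trans, means=[0.0, 3.0], sd=1.0)
print([round(p, 2) for p in post])
```

In a full Gibbs sampler, a step like `ffbs` would alternate with draws of the emission and transition parameters; in a joint model of the kind the abstract describes, the motif/sequence likelihood would also enter the per-probe emission terms, which this sketch leaves out.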