A hierarchical Bayesian model for comparing transcriptomes at the individual transcript isoform level

Zheng, Sika, Chen, Liang
Nucleic acids research 2009 v.37 no.10 pp. e75
Bayesian theory, Markov chain, alternative splicing, data collection, genes, heteroskedasticity, high-throughput nucleotide sequencing, humans, mice, models, nucleic acids, prediction, quantitative polymerase chain reaction, reverse transcriptase polymerase chain reaction, transcriptome
The complexity of mammalian transcriptomes is compounded by alternative splicing, which allows one gene to produce multiple transcript isoforms. However, transcriptome comparison has been limited to differential analysis at the gene level instead of the individual transcript isoform level. High-throughput sequencing technologies and high-resolution tiling arrays provide an unprecedented opportunity to compare transcriptomes at the level of individual splice variants. However, sequence read coverage or probe intensity at each position may represent a family of splice variants instead of one single isoform. Here we propose a hierarchical Bayesian model, BASIS (Bayesian Analysis of Splicing IsoformS), to infer the differential expression level of each transcript isoform in response to two conditions. A latent variable was introduced to perform direct statistical selection of differentially expressed isoforms. Model parameters were inferred based on an ergodic Markov chain generated by our Gibbs sampler. BASIS has the ability to borrow information across different probes (or positions) from the same genes and different genes. BASIS can handle the heteroskedasticity of probe intensity or sequence read coverage. We applied BASIS to a human tiling-array data set and a mouse RNA-seq data set. Some of the predictions were validated by quantitative real-time RT-PCR experiments.
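
The point that the signal at one position may reflect a family of splice variants can be illustrated with a small sketch. This is not the BASIS model: it assumes a hypothetical gene with four positions and two isoforms, treats the between-condition difference at each position as the sum of the differences of the isoforms covering it, and uses ordinary least squares as a simple stand-in for the Bayesian inference described above. All names and values are made up for illustration.

```python
import numpy as np

# Hypothetical gene: 4 positions (rows), 2 isoforms (columns).
# design[j, k] is 1 when isoform k covers position j (assumed annotation).
design = np.array([
    [1, 1],  # position shared by both isoforms
    [1, 0],  # position found only in isoform A
    [1, 1],  # position shared by both isoforms
    [0, 1],  # position found only in isoform B
], dtype=float)

# Made-up per-isoform differences between the two conditions.
true_diff = np.array([2.0, -1.0])

# Noise-free difference observed at each position: shared positions
# carry the sum of both isoform-level differences.
observed_diff = design @ true_diff

# Least-squares deconvolution of the position-level signal into
# isoform-level differences (a stand-in, not the authors' Gibbs sampler).
estimated_diff, *_ = np.linalg.lstsq(design, observed_diff, rcond=None)
```

At the shared positions the observed difference is the sum of the two isoform-level differences, so it matches neither isoform on its own. Separating them requires combining information across positions of the same gene, which is what the model summarized in the abstract does in a probabilistic way.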