A high-quality annotated transcriptome of swine peripheral blood
- Liu, Haibo, Smith, Timothy P.L., Nonneman, Dan J., Dekkers, Jack C.M., Tuggle, Christopher K.
- BMC Genomics 2017 v.18 no.1 pp. 479-500
- RNA splicing, animal genetics, blood, complementary DNA, computer software, databases, exons, genome, introns, medicine, messenger RNA, physiology, sequence homology, swine, transcriptome, transcriptomics
- Background: High throughput gene expression profiling assays of peripheral blood are widely used in biomedicine, as well as in animal genetics and physiology research. Accurate, comprehensive, and precise interpretation of such high throughput assays relies on well-characterized reference genomes and/or transcriptomes. However, neither the reference genome nor the peripheral blood transcriptome of the pig has been sufficiently assembled and annotated to support such profiling assays in this emerging biomedical model organism. We aimed to assemble published and novel RNA-seq data to provide a comprehensive, well-annotated blood transcriptome for pigs by integrating a de novo assembly with a genome-guided assembly. Results: De novo and genome-guided transcriptomes of porcine whole peripheral blood were assembled from ~162 million pairs of paired-end and ~183 million single-end, trimmed and normalized Illumina RNA-seq reads (~6 billion initial reads) from five independent studies, using the Trinity and Cufflinks software, respectively. We then removed putative transcripts (PTs) of low confidence from both assemblies and merged the remaining PTs into an integrated transcriptome consisting of 132,928 PTs, with 126,225 (~95%) PTs from the de novo assembly and more than 91% of PTs spliced. In the integrated transcriptome, ~90% and 63% of PTs had significant sequence similarity to sequences in the NCBI NT and NR databases, respectively; 68,754 (~52%) PTs were annotated with 15,965 unique GO terms; and 7,618 PTs annotated with Enzyme Commission codes were assigned to 134 KEGG pathways. Full exon-intron junctions of 17,528 PTs were validated by PacBio IsoSeq full-length cDNA reads from 3 other porcine tissue types, NCBI pig RefSeq mRNAs, and transcripts from the Ensembl Sscrofa10.2 annotation. Completeness of the 5’ termini of 37,569 PTs was validated by public CAGE data.
By comparison to the Ensembl transcripts, we found that the deduced precursors of 54,402 PTs shared at least one intron or exon with those of 18,437 Ensembl transcripts, that 12,262 PTs had both longer 5’ and 3’ UTRs than their maximally overlapping Ensembl transcripts, and that 41,838 spliced PTs were entirely missing from the Sscrofa10.2 annotation. Similar results were obtained when the PTs were compared to the pig NCBI RefSeq mRNA collection. Conclusion: We built, validated and annotated a comprehensive porcine blood transcriptome with significant improvement over the annotation of Ensembl Sscrofa10.2 and the pig NCBI RefSeq mRNAs, and laid a foundation for blood-based high throughput transcriptomic assays in pigs and for advancing annotation of the pig genome.