Improvements and impacts of GRCh38 human reference on high throughput sequencing data analysis
- Guo, Yan, Dai, Yulin, Yu, Hui, Zhao, Shilin, Samuels, David C., Shyr, Yu
- Genomics 2017 v.109 no.2 pp. 83-90
- genome, genome assembly, genomics, high-throughput nucleotide sequencing, humans, single nucleotide polymorphism
- Analysis of high throughput sequencing data starts with alignment against a reference genome, which is the foundation for all re-sequencing data analyses. Each new release of the human reference genome has improved in accuracy and completeness. The latest release, GRCh38, is presumed to contribute more to high throughput sequencing data analysis by providing greater accuracy, but the amount of improvement has not yet been quantified. We conducted a study to compare genomic analysis results between the GRCh38 reference and its predecessor, GRCh37. Through analyses of alignment, single nucleotide polymorphisms, small insertions/deletions, copy number and structural variants, we show that GRCh38 offers overall more accurate analysis of human sequencing data. More importantly, GRCh38 produced fewer false-positive structural variants. In conclusion, GRCh38 is an improvement over GRCh37 not only as a genome assembly but also because it yields more reliable genomic analysis results.