
Data mining for discovery of endophytic and epiphytic fungal diversity in short-read genomic data from deciduous trees

LaBonte, Nicholas R., Jacobs, James, Ebrahimi, Aziz, Lawson, Shaneka, Woeste, Keith
Fungal ecology 2018 v.35 pp. 1-9
Castanea, DNA, DNA barcoding, Juglans, Ulmus, bioinformatics, community structure, databases, endophytes, epiphytes, fungal communities, fungi, genome, genomics, high-throughput nucleotide sequencing, internal transcribed spacers, proteins, ribosomal RNA, symbionts, trees
High-throughput sequencing of DNA barcodes, such as the internal transcribed spacer (ITS) region of the nuclear ribosomal RNA genes, has expanded the ability of researchers to investigate the endophytic fungal communities of living plants. With a large and growing database of complete fungal genomes, it may be possible to use portions of fungal symbiont genomes outside conventional marker sequences for community analysis of short-read data. We designed a bioinformatics pipeline to identify putative fungal coding sequences from 100 bp Illumina reads of DNA extracted from trees in three angiosperm genera (Castanea, Juglans, and Ulmus). Reads remaining after a two-step filtering process made up a small fraction of total reads (2–100 putative fungal reads per 10,000 plant reads) and were assigned to fungal genera and orders based on similarity to proteins from complete fungal genomes. Some of the taxa identified are known to be ubiquitous class 2 endophytes. We detected some differences in endophyte community composition between the ITS sequence data and the short-read pipeline, particularly in Ulmus. ITS results in Juglans and Castanea, however, closely reflected results from the short-read pipeline, and both methods portrayed similar intergeneric differences in endophyte community composition.
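
The final step of the pipeline, assigning filtered reads to fungal genera by protein similarity and expressing them per 10,000 plant reads, can be illustrated with a minimal sketch. It is not the authors' implementation. The input format (read ID, genus, similarity score), the function names, and the placeholder genus labels are assumptions made for illustration; the abstract does not specify the search tool, the score thresholds, or the filtering steps.

```python
from collections import Counter

def best_hit_per_read(hits):
    """Keep the highest-scoring hit for each read.

    hits: iterable of (read_id, genus, score) tuples. This tuple layout is a
    hypothetical format, not one described in the abstract.
    """
    best = {}
    for read_id, genus, score in hits:
        if read_id not in best or score > best[read_id][1]:
            best[read_id] = (genus, score)
    return best

def genus_counts(hits):
    """Count reads per genus, using each read's best hit only."""
    return Counter(genus for genus, _ in best_hit_per_read(hits).values())

def per_10k(fungal_reads, plant_reads):
    """Putative fungal reads per 10,000 plant reads."""
    return 10_000 * fungal_reads / plant_reads

# Placeholder data: genus labels and scores are invented for illustration.
hits = [
    ("r1", "GenusA", 62.0),
    ("r1", "GenusB", 55.1),
    ("r2", "GenusB", 71.3),
    ("r3", "GenusA", 48.9),
]
counts = genus_counts(hits)
rate = per_10k(sum(counts.values()), 30_000)
```

Taking only the best hit per read keeps each read from being counted under more than one genus; a real pipeline would likely also apply a minimum-score cutoff before counting.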