Cloning and sequencing of cDNAs for hypothetical genes from chromosome 2 of Arabidopsis

Xiao, Y.L., Malik, M., Whitelaw, C.A., Town, C.D.
Plant physiology 2002 v.130 no.4 pp. 2118-2128
Arabidopsis thaliana, sequence analysis, plant pathogenic bacteria, plant diseases and disorders, alternative splicing, complementary DNA, chromosomes, Xanthomonas campestris pv. campestris, seedlings, host plants, stress response, cold stress, heat stress, amino acid sequences, genetic variation, introns, roots, callus
About 25% of the genes in the fully sequenced and annotated Arabidopsis genome have structures that are predicted solely by computer algorithms with no support from either nucleic acid or protein homologs from other species or expressed sequence matches from Arabidopsis. These are referred to as "hypothetical genes." On chromosome 2, sequenced by The Institute for Genomic Research, there are approximately 800 hypothetical genes among a total of approximately 4,100 genes. To test their expression under various growth conditions and in specific tissues, we used six cDNA populations prepared from cold-treated, heat-treated, and pathogen (Xanthomonas campestris pv. campestris)-infected plants, callus, roots, and young seedlings. To date, 169 hypothetical genes have been tested, and 138 of them were found to be expressed in one or more of the six cDNA populations. By sequencing multiple clones from each 5'- and 3'-rapid amplification of cDNA ends (RACE) product and assembling the sequences, we generated full-length sequences for 16 of these genes. For 14 genes, there was one full-length assembly that precisely supported the intron-exon boundaries of their gene predictions, adding only 5'- and 3'-untranslated region sequences. However, for three of these genes, the other assemblies represent additional exons and alternatively spliced or unspliced introns. For the remaining two genes, the cDNA sequences reveal major differences from the predicted gene structures. In addition, a total of six genes displayed more than one polyadenylation site. These data will be used to update gene models in The Institute for Genomic Research annotation database ATH1.