Oral Paper

Phytochemical

Predicting enzyme functions: A case study using BAHD acyltransferases

Presenting Author
Gaurav Moghe
Description
Over a thousand plant genomes have now been deposited in databases. While assembling genomes and identifying genes is no longer a major bottleneck, predicting gene function remains a major challenge. This is especially true for genes in families, such as those involved in metabolism, where gene duplication is highly prevalent and leads to rapid functional divergence and promiscuity. It is still unclear which signals are important for predicting the functions of duplicate genes. Here, we used the large BAHD acyltransferase family as a model to address this question. Individual diploid angiosperm genomes contain several dozen BAHDs, but typically only 10-20% of them have meaningfully annotated functions. We explored whether phylogenomic data, coupled with extensive literature curation, sequence motifs, co-expression data and structural features, can aid functional prediction. Using phylogenomic data, we increased the proportion of BAHDs associated with a substrate class in cultivated tomato from 15% to 45%. Sequence analysis also revealed enrichment of specific active-site motifs in individual clades, which may indicate lineage-specific selection. We found evidence of a large, 110-amino-acid-long intrinsically disordered region in BAHDs; however, there were drastic clade-specific differences. Further analysis indicated that the flexibility of this region plays a role in substrate specialization and/or promiscuity, possibly by changing protein conformational diversity. Co-expression analysis further allowed us to define the biological roles of certain BAHDs, such as in pathogen defense, detoxification and stress response. Our analyses, based on first principles, reveal multiple features of enzymes that contribute to their in planta functions.
We are currently expanding this analysis to ten other enzyme families, comprising ~5% of angiosperm genomes, which will make thousands of previously published and predicted activities available to the phytochemical community in the future.