Oral Paper

         Population Genetics/Genomics

2 does not equal 4: Variance dissimilarities in mixed-ploidy genomic data cause irregular patterns in PCA and other clustering analyses

Presenting Author
Trevor Faske
Description
The formation of polyploids, individuals with multiple sets of chromosomes, has played a major role in the diversification of plants and can have pronounced evolutionary consequences with extended ecological effects. A history of polyploidization events is evident within roughly 50% of plant species and has been shown to vary across latitude, elevation, and environmental stressors. Moreover, it is estimated that 16% of all plant taxa are mixed-ploidy systems, where a single species has multiple known ploidal levels that can vary either between populations across geographic and environmental gradients, or within a single population. While high-throughput sequencing has made it possible to generate a genome-wide perspective easily and affordably for thousands of individuals across the landscape, there are still many statistical and bioinformatic uncertainties arising from ploidal variation. In recent years, a plethora of statistical software has been published to appropriately call genotypes and incorporate uncertainty in mixed-ploidy systems, but proper assessment of how mixed-ploidal variation affects widely used downstream population genetic analyses is lacking. To address this, we evaluated outcomes of principal components analysis (PCA) and other clustering analyses (e.g., Structure, neighbor-joining, UMAP) across a range of common variant calling and genomic standardization approaches. We simulated multiple mixed-ploidy systems that varied by the extent of genetic differentiation as well as number of demes, individuals, and loci. Our results show that currently accepted practices for variant calling and standardization can have vastly different effects on clustering outcomes and interpretations. Alarmingly, the effect of ploidal variation on clustering is more pronounced than that of true genetic differentiation, whose detection is the goal of these analyses.
We identify the cause to be dissimilar variance across ploidy levels, which influences clustering results in analyses that rely on forms of variance partitioning or maximization. We also highlight the evolutionary scenarios in which this issue is more pronounced and offer suggestions to better handle and interpret mixed-ploidy variation in clustering analyses. This research builds on recent advances in mixed-ploidy population genetics and provides researchers with an improved framework to handle and interpret results from various population and landscape genomic analyses.
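
The variance dissimilarity named above can be illustrated with a minimal sketch. It is not the simulation framework used in this study. It assumes, purely for illustration, that alternate-allele dosage is a binomial draw with the ploidy as the number of trials, that diploids and tetraploids share identical allele frequencies (no true differentiation), and that genotypes are rescaled to per-individual allele frequencies by dividing by ploidy. All names and sample sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sample sizes, chosen only for illustration
n_ind, n_loci = 200, 2000

# Assumption: both cytotypes share the same allele frequencies,
# so there is no true genetic differentiation between them
p = rng.uniform(0.1, 0.9, size=n_loci)

# Assumption: alt-allele dosage is binomial with ploidy as the number of trials
dip = rng.binomial(2, p, size=(n_ind, n_loci))
tet = rng.binomial(4, p, size=(n_ind, n_loci))

# Rescale dosage to a per-individual allele frequency in [0, 1]
dip_f = dip / 2
tet_f = tet / 4

# Mean per-locus variance within each cytotype:
# expected p(1-p)/2 for diploids and p(1-p)/4 for tetraploids
var_dip = dip_f.var(axis=0).mean()
var_tet = tet_f.var(axis=0).mean()
variance_ratio = var_dip / var_tet

# Center the combined matrix as PCA would, then measure each individual's
# squared distance from the centroid (the variance that PCA partitions)
X = np.vstack([dip_f, tet_f])
Xc = X - X.mean(axis=0)
sq_dist = (Xc ** 2).sum(axis=1)
dist_ratio = sq_dist[:n_ind].mean() / sq_dist[n_ind:].mean()

print(f"variance ratio (2x / 4x): {variance_ratio:.2f}")
print(f"squared-distance ratio (2x / 4x): {dist_ratio:.2f}")
```

Under these assumptions, diploids carry roughly twice the per-locus variance of tetraploids even though allele frequencies are identical, so a variance-maximizing method has a ploidy-driven signal available that is unrelated to genetic differentiation.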