Oral Paper

Biodiversity Informatics & Herbarium Digitization

Linking phenotype, genotype and environmental data from museum specimens in the Prunus serotina (black cherry) species complex

Presenting Author
Richie Hodel
Description
Phenotypes have been used for centuries to quantify biodiversity in museum collections, and they were the primary tool early naturalists used to study evolution. Phenotypes are the targets upon which selection acts, and therefore affect the survival and success of organisms in variable environments. Understanding patterns of biodiversity requires an understanding of both genomic and phenotypic variation, as both contribute to species diversification. However, the way genotype and environment interact to shape phenotype is complex and can be difficult to study. In the genomic and big data age, we can relatively inexpensively obtain genomic and environmental data from field-collected and/or herbarium specimens to investigate genetic patterns across environmental gradients. However, we lack methods to obtain phenotypic data in a high-throughput fashion. Recent advances in computer vision-based machine learning hold the promise of extracting high-throughput phenotypic data from digitized biological images, such as images of herbarium specimens.
The black cherry (Prunus serotina) species complex occupies a variety of environmental conditions across its wide native range in North and Central America. Within the species complex, there are five named subspecies with distinct morphological features, such as leaf shape, size, margin, and texture. Previous genetic investigations indicate some differentiation among subspecies, but genome-scale data are needed to resolve genetic relationships within and among subspecies. Here, using herbarium specimens from U.S. collections, we aim to explicitly connect genotypic, phenotypic, and environmental big data sets to determine how the phenotypic characters observed on herbarium sheets may have arisen via interactions between the genome and environment.
We use 610 nuclear loci and plastid genomes generated via Hyb-Seq to quantify genomic variation among herbarium specimens from across the range of P. serotina in the U.S., Mexico, and Central America. For the same specimens, we quantify the environment using bioclimatic variables associated with each specimen’s georeferenced coordinates, and we investigate phenotypic traits present on each herbarium sheet by extracting leaf traits using the versatile machine learning tool LeafMachine. This study has the potential to revolutionize how we collect phenotypic data from herbarium sheets, as well as redefine the scale of data we can expect to include in future studies of phenotypic variation using museum specimens, including specimens mounted as herbarium sheets.
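As a rough illustration of what linking the three data types for the same specimens could look like, the sketch below joins per-specimen genotype, environment, and phenotype records on a shared specimen identifier. It is not the authors' pipeline: the function name, the field names, the identifiers, and every value are hypothetical placeholders, and the sketch assumes each data source has already been summarized into one record per specimen.

```python
from dataclasses import dataclass


@dataclass
class LinkedSpecimen:
    """One herbarium specimen with all three data types attached."""
    specimen_id: str
    genotype: dict      # e.g. summary statistics derived from sequence data
    environment: dict   # e.g. bioclimatic variables at the collection site
    phenotype: dict     # e.g. leaf traits measured from the sheet image


def link_specimens(genotype, environment, phenotype):
    """Join three {specimen_id: record} tables on specimen_id.

    Only specimens present in all three tables are kept, so every
    returned record is complete. Results are sorted by identifier.
    """
    shared_ids = sorted(set(genotype) & set(environment) & set(phenotype))
    return [
        LinkedSpecimen(sid, genotype[sid], environment[sid], phenotype[sid])
        for sid in shared_ids
    ]


# Hypothetical placeholder records; none of these values come from the study.
example_genotype = {"sheet-001": {"pc1": 0.12}, "sheet-002": {"pc1": -0.40}}
example_environment = {"sheet-001": {"bio1": 11.5}, "sheet-003": {"bio1": 18.2}}
example_phenotype = {"sheet-001": {"leaf_area_cm2": 21.0},
                     "sheet-002": {"leaf_area_cm2": 17.5}}

for record in link_specimens(example_genotype, example_environment,
                             example_phenotype):
    print(record.specimen_id, record.genotype, record.environment,
          record.phenotype)
```

Keeping only specimens found in all three tables is one possible design choice; a study that tolerates missing data might instead keep partial records and flag the gaps.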