Oral Paper

Phylogenomics

Phylogenetic and biogeographic patterns in sequencing deficiency across vascular plants

Presenting Author
Daniel Spalink
Description
Producing a useful tree of life is a data-intensive process that relies heavily on the DNA sequences assembled over the past few decades. Despite ongoing efforts, critical gaps remain in the taxonomic sampling of sequence data, both in per-taxon quantity (e.g., number of individuals, number of genes) and in quality (e.g., barcode markers, transcriptomes, genomes). Arguably, not all newly generated sequence data are equally valuable: some contribute more information than others toward identifying new genes, alleles, lineages, or relationships. Setting priorities could therefore help fill these gaps more efficiently. Here we present a new statistic, SeqDef (Sequencing Deficiency), which uses cophenetic distances to calculate the data deficiency of each tip of a phylogeny, and thus that tip's priority for additional sequencing. SeqDef can be tailored to the needs or priorities of each user. To demonstrate this flexibility, we used SeqDef to answer three questions: 1) How is data deficiency distributed across the vascular plant tree of life? 2) How is data deficiency distributed across the globe? 3) Which rare species should be prioritized for additional sequencing before they go extinct?
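The abstract does not state the SeqDef formula, so the following is only a hypothetical illustration of how cophenetic distances can feed a per-tip deficiency score, not the authors' statistic. It computes cophenetic (patristic) distances on a toy tree, then scores each tip by how little sequence data exists at or near it. Data from relatives are discounted with an exponential decay whose scale `tau` stands in for a user-tunable priority. The tree encoding, the names `PARENT`, `cophenetic_matrix`, `seqdef_sketch`, `n_seqs`, and `tau`, and the scoring rule itself are all assumptions.

```python
import math

# Hypothetical toy tree: child -> (parent, branch length). "root" has no parent.
PARENT = {
    "A": ("n1", 1.0),
    "B": ("n1", 1.0),
    "n1": ("root", 2.0),
    "C": ("root", 3.0),
}


def depth_and_path(node, parent):
    """Return (distance from node up to the root, nodes from node to the root)."""
    path, depth = [node], 0.0
    while node in parent:
        node, length = parent[node]
        depth += length
        path.append(node)
    return depth, path


def cophenetic(a, b, parent):
    """Patristic distance: sum of branch lengths on the path between a and b."""
    da, path_a = depth_and_path(a, parent)
    db, path_b = depth_and_path(b, parent)
    on_b = set(path_b)
    lca = next(n for n in path_a if n in on_b)  # lowest common ancestor
    dl, _ = depth_and_path(lca, parent)
    return da + db - 2.0 * dl


def cophenetic_matrix(tips, parent):
    return {(a, b): cophenetic(a, b, parent) for a in tips for b in tips}


def seqdef_sketch(tips, n_seqs, parent, tau=1.0):
    """HYPOTHETICAL score, not the published SeqDef.

    coverage_i = sum_j n_seqs[j] * exp(-d_ij / tau); deficiency_i = 1 / (1 + coverage_i).
    A tip scores high when it and its close relatives have little data.
    """
    dist = cophenetic_matrix(tips, parent)
    return {
        i: 1.0 / (1.0 + sum(n_seqs[j] * math.exp(-dist[(i, j)] / tau) for j in tips))
        for i in tips
    }


if __name__ == "__main__":
    tips = ["A", "B", "C"]
    scores = seqdef_sketch(tips, {"A": 10, "B": 0, "C": 0}, PARENT, tau=2.0)
    for tip in sorted(scores, key=scores.get, reverse=True):
        print(tip, round(scores[tip], 3))
```

In this toy case only tip A has been sequenced, so the unsequenced but distant tip C ranks above its unsequenced sister B, which partly "borrows" coverage from A. Raising `tau` lets data from more distant relatives count, which is one way such a score could be tuned to different priorities.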