Symposia

Supporting inclusive and sustainable research infrastructure for systematics (SISRIS) by connecting scientists and their specimens

Let the records show: attribution of scientific credit in natural history collections

Presenting Author
Rebecca Dikow
Description
Natural history collections are essential resources for taxonomy, systematics, and ecological and climate change research. Mass digitization of these collections makes it possible to study broad biological patterns among specimens and their associated metadata at a scale that was previously impossible. Specimen metadata can also be used to study the contributions of the people who collected and identified the specimens. A proper accounting of these contributions shapes our understanding of the history of these collections and of who played a role in their growth. Here, we assess the scientific contributions of past women in science at the Smithsonian Institution, focusing on their specimen collections and identifications. The challenges of documenting the work done by women over the Smithsonian’s history led directly to the creation of the Funk List, a ‘living’ list of past and present women in science affiliated with the Smithsonian, named in honor of Vicki Funk. Using this list as a starting point, we evaluate natural history specimen records available on GBIF alongside the Smithsonian Annual Reports, whose volumes date back to the founding of the Smithsonian in 1846. First, we gathered the records of specimens collected or determined by deceased Funk List individuals. We then attributed specimens in Bionomia, to both collectors and determiners, for all Funk List individuals for whom we could find specimens. In total, we identified 40 women with specimen collections or identifications, with more than 120,000 specimens attributed to them. We then analyzed the specimens by Smithsonian unit, location, and taxonomic family. Where specimens are not yet digitized, the Annual Reports allowed us to learn more about these women’s contributions and gave a richer picture of their work at the Smithsonian. We also release a semantic search application, which allows users to search the Smithsonian Annual Reports.
To build this, we looked for mentions of Funk List individuals in the historical reports and ran optical character recognition (OCR) on downloaded, publicly available JPEG documents to produce new PDF and plain-text files. We then developed a custom spaCy pipeline in Python that queries the OCR text output for each Funk List individual, considering all possible name variants. This work relies on collaboration as well as deep institutional knowledge. Collections records are a rich resource, but there are significant barriers to accurate specimen attribution, and these barriers disproportionately affect women collectors and determiners, as well as those from other marginalized groups. We propose ways to document these problems at scale and to remedy cases of misattribution in digital repositories of record.
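
The name-variant search described above can be illustrated with a minimal sketch. It is not the authors' spaCy pipeline: it uses only Python's standard `re` module, and the function names (`name_variants`, `compile_matcher`, `find_mentions`), the page identifiers, and the person "Jane Alice Doe" are all hypothetical, invented for illustration. It assumes the OCR output is available as plain text per page and that a name may be broken across a line.

```python
import re
from itertools import product


def name_variants(first, last, middle=None, other_surnames=()):
    """Return plausible written forms of one person's name.

    Hypothetical helper: combines full and initialed given names with
    each known surname (for example a birth name and a married name).
    """
    first_forms = [first, first[0] + "."]
    middle_forms = [""]
    if middle:
        middle_forms += [middle, middle[0] + "."]
    variants = set()
    for surname in (last, *other_surnames):
        for f, m in product(first_forms, middle_forms):
            given = f if not m else f + " " + m
            variants.add(given + " " + surname)
    return variants


def compile_matcher(variants):
    """Build one regex that matches any variant.

    Tokens are joined with \\s+ so a name split across an OCR line break
    still matches; longer variants are tried first.
    """
    alternatives = []
    for v in sorted(variants, key=lambda s: (-len(s), s)):
        alternatives.append(r"\s+".join(re.escape(tok) for tok in v.split()))
    # Lookarounds keep "J. Doe" from matching inside "J. Doeskin".
    return re.compile(r"(?<!\w)(?:" + "|".join(alternatives) + r")(?!\w)")


def find_mentions(pages, matcher):
    """Return (page_id, matched_name) pairs, with whitespace normalized."""
    hits = []
    for page_id, text in pages.items():
        for m in matcher.finditer(text):
            hits.append((page_id, " ".join(m.group(0).split())))
    return hits


# Invented OCR snippets keyed by invented page identifiers.
pages = {
    "1901_p12": "Miss Jane A.\nDoe identified several lots of specimens.",
    "1902_p3": "Specimens were received from J. Doe and others.",
    "1903_p1": "A parcel addressed to J. Doeskin arrived.",
}
matcher = compile_matcher(name_variants("Jane", "Doe", middle="Alice"))
for page_id, name in find_mentions(pages, matcher):
    print(page_id, name)
```

A statistical or rule-based entity matcher, such as the spaCy pipeline the abstract mentions, would handle cases this sketch does not, including OCR character errors inside a name and titles such as "Miss" or "Mrs." followed by a surname alone.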