Oral Paper

Biodiversity Informatics & Herbarium Digitization

From leaves to labels: building modular machine learning networks for rapid herbarium specimen analysis with LeafMachine2

Presenting Author
Will Weaver
Description
Quantitative plant traits play a crucial role in biological research, but traditional methods for measuring plant morphology are time-consuming and scale poorly. We present LeafMachine2, a suite of machine learning and computer vision tools that automatically extracts a base set of traits from over 100 angiosperm families and calculates pixel-to-metric conversion factors for more than 20 commonly used ruler types. LeafMachine2 was trained on 494,766 manually prepared and expert-reviewed annotations from 5,597 herbarium images obtained from 288 herbaria, representing 2,663 species. It employs object detection and segmentation algorithms to identify and isolate individual leaves and petioles, even in cases of partial occlusion or overlap. In addition, a landmarking network identifies and measures nine pseudo-landmarks that occur on most broadleaf taxa, including apex and base angles, lamina length and width, midvein length, petiole length, and lobe tips. Archival processing algorithms prepare labels for optical character recognition and interpretation, and reproductive organs are scored. Our results demonstrate that LeafMachine2 can efficiently generate large quantities of plant trait data, even from field images and non-archival datasets, making it a valuable asset for trait-based research. Together with similar initiatives, this project has made significant progress in removing the bottleneck in acquiring plant trait data from herbarium specimens and has shifted the focus toward data revision and quality control, which are essential for validating autonomously collected measurements.
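To make the measurement step concrete, the following minimal Python sketch shows one way a pixel-to-metric conversion factor could be derived from ruler tick marks and then applied to landmark coordinates to obtain a length in centimetres and an angle in degrees. It is an illustration only, not the LeafMachine2 implementation: the function names (pixels_per_cm, distance_cm, angle_deg), the assumption of evenly spaced millimetre ticks, and the example coordinates are all hypothetical.

```python
import math
import statistics


def pixels_per_cm(tick_positions_px, ticks_per_cm=10):
    """Estimate pixels per centimetre from tick positions along a ruler axis.

    Assumption: ticks are evenly spaced (10 per cm for a millimetre scale).
    The median gap is used so a few missed or spurious ticks have less effect.
    """
    ticks = sorted(tick_positions_px)
    if len(ticks) < 2:
        raise ValueError("need at least two tick positions")
    gaps = [b - a for a, b in zip(ticks, ticks[1:])]
    return statistics.median(gaps) * ticks_per_cm


def distance_cm(p, q, px_per_cm):
    """Convert the pixel distance between two (x, y) points to centimetres."""
    return math.dist(p, q) / px_per_cm


def angle_deg(vertex, a, b):
    """Angle in degrees at `vertex` between the rays toward points a and b."""
    v1 = (a[0] - vertex[0], a[1] - vertex[1])
    v2 = (b[0] - vertex[0], b[1] - vertex[1])
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    return math.degrees(math.atan2(abs(cross), dot))


if __name__ == "__main__":
    # Hypothetical tick positions (pixels) detected along a millimetre ruler.
    scale = pixels_per_cm([10, 22, 34, 46, 58])
    # Hypothetical lamina base and apex landmarks in pixel coordinates.
    print(distance_cm((0, 0), (300, 400), scale))
    # Hypothetical apex landmark with one point on each leaf margin.
    print(angle_deg((0, 0), (1, 0), (0, 1)))
```

Because every metric trait inherits any error in the conversion factor, the scale estimate is a natural target for the data revision and quality control emphasized above.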