Metastasis Extraction from NSCLC Clinical Notes: A Retrospective Comparative Evaluation of Large Language Model-Based Classification
This retrospective multi-cohort study evaluated the performance of three large language models for extracting metastasis status from clinical notes in non-small cell lung cancer, addressing a critical gap in cancer registry data: over half of initially identified patients had missing or unknown structured metastasis labels. Across two independent patient cohorts drawn from the Winship Cancer Institute, fine-tuned and zero-shot models achieved strong classification performance, with the best models reaching patient-level F1 scores of up to 0.80 for overall metastasis and 0.93 for brain and CNS metastasis. Notably, error analysis revealed that most model errors reflected incomplete registry labels or ambiguous clinical documentation rather than true model failures, and an exploratory recovery analysis demonstrated that model predictions matched manual annotations with 90% accuracy for patients whose registry labels were missing.
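The patient-level F1 scores reported above combine precision and recall into a single harmonic mean. A minimal sketch of how such a binary F1 might be computed is shown below; the label arrays are hypothetical illustrations, not the study's data:

```python
# Hypothetical patient-level binary labels (1 = metastasis, 0 = none).
# These values are illustrative only and do not come from the study.
registry_labels = [1, 0, 1, 1, 0, 1, 0, 1]
model_predictions = [1, 0, 1, 0, 0, 1, 1, 1]

def f1_score(y_true, y_pred):
    """Binary F1: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# With the toy labels above: 4 true positives, 1 false positive,
# 1 false negative -> precision = recall = F1 = 0.80.
print(round(f1_score(registry_labels, model_predictions), 2))
```

In practice such metrics are typically computed with an established library (e.g. scikit-learn's `f1_score`), but the arithmetic is the same.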
These findings highlight the substantial promise of large language models as scalable tools for augmenting cancer registries, which remain a cornerstone of population-level surveillance, trial eligibility assessment, and treatment planning. Manual abstraction of metastasis status from clinical notes is resource-intensive and inconsistent, and this study demonstrates that AI-driven extraction can recover clinically meaningful information at scale. As real-world data become increasingly central to oncology research, robust and validated approaches to structured data recovery will be essential for ensuring the accuracy and completeness of the evidence base.