Trust Before Scale: Scientific Methods in the Age of AI
AI can scale safety insights—but only high-quality data, interoperability, and expert judgment make them trustworthy.

The journey of data reuse in preclinical safety spans more than two decades of progressive methodological refinement. Beginning with early QSAR and expert systems for endpoints such as mutagenicity and skin sensitization, which by 2006 already demonstrated >70% predictive accuracy for these mechanistically well-defined effects, the field has steadily expanded its ambitions. In 2018, a big-data concordance analysis of 1,637,449 adverse event reports from over 3,920 approved drugs across FDA and EMA regulatory filings illustrated both the power and the pitfalls of large-scale, automated preclinical-to-clinical translation.
The pitfalls identified there illustrate the prerequisites for effective data reuse, rooted in the FAIR principles—Findability, Accessibility, Interoperability, and Reusability. The concordance analysis exposed persistent challenges: the lack of controlled vocabularies and harmonized preclinical-clinical ontologies, MedDRA terms designed for human observations that map poorly to animal findings, and the heterogeneity of manual curation across decades of regulatory documents. Model reliability therefore depends on high-quality, standardized training data, and these findings underscore that data governance and semantic interoperability are not administrative formalities.
AI in safety assessment amplifies both the promise and the responsibility. Automated, agnostic big-data analyses can surface unexpected signals and generate hypotheses at previously impossible scale. Yet the same analyses reveal that the absence of an animal finding does not reliably predict human safety, and that high statistical significance does not equate to high predictive value. Mechanistic understanding, expert interpretation, and contextual judgment remain irreplaceable. AI and in silico tools must have clearly defined applicability domains, and model outputs must be transparently linked to their source data and assumptions.
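The gap between statistical significance and predictive value follows directly from Bayes' rule when the adverse outcome is rare. The sensitivity, specificity, and prevalence figures below are illustrative assumptions, not values from the cited analysis:

```python
def positive_predictive_value(sensitivity: float,
                              specificity: float,
                              prevalence: float) -> float:
    """P(true human toxicity | positive animal finding) via Bayes' rule."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)

# Assumed figures: a test with 90% sensitivity and 90% specificity,
# applied to an adverse effect with 1% prevalence.
ppv = positive_predictive_value(0.90, 0.90, 0.01)
# Despite strong test performance, PPV is only ~8%: most positive
# findings are false positives, so statistical strength alone cannot
# substitute for mechanistic and expert interpretation.
```

This is why a highly significant association in a million-record dataset can still carry little predictive weight for an individual compound.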
Organizationally and culturally, the field must move beyond data silos sustained by institutional ownership structures. Integrated cross-functional platforms linking preclinical, ADME, and clinical data are essential, as is embedding data scientists within safety teams supported by mutual training programs. Clearance committees and governance frameworks that enable responsible data sharing—not restrict it—are the critical enablers. Ultimately, translational trust is not built by algorithms alone; it is earned through transparent processes, rigorous validation, and the sustained exercise of human scientific judgment at every stage.