Validation Data

Prospective benchmarks. Reproducible protocols. Open data.

Every performance metric on this page was computed on held-out prospective test sets — compounds not in the training data that were evaluated after model training was complete. Test set SMILES and evaluation code are published with each benchmark paper.

r² 0.82: FEP correlation vs. experiment

14: Clinical targets benchmarked

312: Congeneric pairs in FEP test set

Figure: Scatter plot of predicted vs. experimental binding free energy, with the r² = 0.82 correlation line.

FEP Benchmark

Predicted vs. experimental ΔΔG across 14 targets.

The DrugSynq FEP benchmark uses publicly available congeneric series from ChEMBL and literature sources. Each target was tested prospectively — the model had no access to the test series during training or parameterization.

Target	Class	Pairs (n)	r²	RMSE (kcal/mol)	Source
CDK2	Kinase	28	0.87	0.89	Schindler 2020
p38α MAPK	Kinase	25	0.85	0.94	Wang 2015
Thrombin	Protease	22	0.84	1.02	Wang 2015
Tyk2	Kinase	16	0.91	0.78	Patel 2025
BACE	Aspartyl protease	31	0.79	1.15	Schindler 2020
MCL1	BCL-2 family	24	0.76	1.28	Patel 2025

Showing 6 of 14 benchmarked targets; the full table is in the published paper. RMSE is in kcal/mol. r² is the squared Pearson correlation between predicted and experimental ΔΔG over each target's pairs.
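The two table metrics can be sketched in a few lines. This is a minimal illustration that assumes each target's data is a pair of equal-length lists of predicted and experimental ΔΔG values in kcal/mol. The function names and the sample numbers are hypothetical and are not taken from the published evaluation code.

```python
import math

def pearson_r2(pred, expt):
    """Squared Pearson correlation between predicted and experimental values."""
    n = len(pred)
    if n != len(expt) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_p = sum(pred) / n
    mean_e = sum(expt) / n
    cov = sum((p - mean_p) * (e - mean_e) for p, e in zip(pred, expt))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_e = sum((e - mean_e) ** 2 for e in expt)
    if var_p == 0 or var_e == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov * cov / (var_p * var_e)

def rmse(pred, expt):
    """Root-mean-square error, in the same units as the inputs."""
    if len(pred) != len(expt) or not pred:
        raise ValueError("need two equal-length, non-empty lists")
    return math.sqrt(sum((p - e) ** 2 for p, e in zip(pred, expt)) / len(pred))

# Hypothetical ΔΔG values (kcal/mol), for illustration only.
predicted = [-1.0, 0.0, 1.0, 2.0]
experimental = [-0.5, 0.5, 0.5, 2.5]
print(round(pearson_r2(predicted, experimental), 3), rmse(predicted, experimental))
```

The two metrics answer different questions, which is why the table reports both: r² ignores a constant offset between prediction and experiment, while RMSE penalizes it. For example, predictions of 1, 2, 3 against measurements of 3, 4, 5 give r² = 1 but an RMSE of 2 kcal/mol.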

Aggregate Benchmark

312 pairs. One scatter plot. No cherry-picking.

The scatter plot shows all 312 congeneric pairs across 14 targets, with predicted ΔΔG on the x-axis and experimental ΔΔG on the y-axis. Teal dots fall within 1.5 kcal/mol of the diagonal and count as correct predictions; amber dots are outliers beyond that threshold (13.1% of the total).

Outliers cluster around scaffold hops and compounds with unusual binding modes — both expected failure cases for perturbation-based FEP. The protocol document (published) lists each outlier with structural rationale.
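The 1.5 kcal/mol criterion described above can be sketched as a simple filter. This sketch makes two assumptions: distance from the diagonal is read as the absolute error |predicted − experimental|, and the threshold is inclusive. Neither detail is stated on this page, and the function names and sample pairs are hypothetical.

```python
THRESHOLD_KCAL = 1.5  # threshold quoted on this page; inclusive boundary is an assumption

def split_outliers(pairs, threshold=THRESHOLD_KCAL):
    """Split (predicted, experimental) ΔΔG pairs into within-threshold and outlier lists."""
    within, outliers = [], []
    for pred, expt in pairs:
        if abs(pred - expt) <= threshold:
            within.append((pred, expt))
        else:
            outliers.append((pred, expt))
    return within, outliers

def outlier_fraction(pairs, threshold=THRESHOLD_KCAL):
    """Fraction of pairs whose absolute error exceeds the threshold."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no pairs supplied")
    _, outliers = split_outliers(pairs, threshold)
    return len(outliers) / len(pairs)

# Hypothetical pairs (kcal/mol), for illustration only.
sample = [(0.0, 0.5), (1.0, 3.0), (-2.0, -0.5), (2.0, -0.1)]
print(outlier_fraction(sample))
```

As a rough consistency check, a 13.1% outlier share of 312 pairs would correspond to about 41 pairs (41/312 ≈ 0.131). The page itself does not state the count.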

ADMET Benchmarks

Prospective ADMET model performance.

Endpoint	AUROC	MCC	Sensitivity	Specificity
hERG Inhibition	0.91	0.73	0.84	0.89
CYP3A4 Inhibition	0.88	0.68	0.81	0.87
Metabolic Stability (HLM)	0.85	0.61	0.77	0.84
Aqueous Solubility	0.87	0.66	0.80	0.86
Caco-2 Permeability	0.89	0.71	0.83	0.90

Prospective test sets. Full benchmark in published paper. See Publications.
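For readers less familiar with the four columns in the table above, the sketch below shows how each is computed for a binary endpoint. It assumes labels are 0/1 and that the model emits a score per compound. The function names, the 0.5 decision threshold, and the sample data are hypothetical and do not come from the published benchmark.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary 0/1 labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def sensitivity(tp, fn):
    """True positive rate: share of actual positives that were flagged."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True negative rate: share of actual negatives that were cleared."""
    return tn / (tn + fp)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels and scores, for illustration only.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
calls = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 cutoff is an assumption
tp, tn, fp, fn = confusion_counts(labels, calls)
print(auroc(labels, scores), mcc(tp, tn, fp, fn), sensitivity(tp, fn), specificity(tn, fp))
```

AUROC is computed from the raw scores and does not depend on a decision threshold, while MCC, sensitivity, and specificity all depend on where the cutoff is placed.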

Dataset Details

Where the data comes from.

Source	Data Type	Records
ChEMBL 33	Binding affinity (Ki, Kd, IC50)	1.2M
PubChem BioAssay	ADMET in vitro panels	247K
Literature (curated)	FEP congeneric series	312 pairs
QM reference calculations	ML potential correction terms	18K conformers

Reproducibility

All benchmarks are reproducible from published code.

Every benchmark on this page corresponds to a published paper with data splits, evaluation code, and test set SMILES. Links to code repositories and supporting information are provided in each publication.

Open Evaluation Code

Benchmark evaluation scripts are published in a public repository. Clone, run, verify. If your reproduction run disagrees with our reported numbers, please report it as an issue.
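One way to compare a reproduction run against the reported numbers is sketched below. It assumes you have collected both sets of metrics into dictionaries keyed by (target, metric). The function name, the key layout, and the 0.005 tolerance (half of the last reported decimal place) are assumptions, and the "reproduced" values are invented for illustration.

```python
def find_discrepancies(reported, reproduced, tol=0.005):
    """Return {key: (reported, reproduced)} for metrics that are missing or differ by more than tol."""
    discrepancies = {}
    for key, expected in reported.items():
        actual = reproduced.get(key)
        if actual is None or abs(actual - expected) > tol:
            discrepancies[key] = (expected, actual)
    return discrepancies

# Reported values copied from the FEP table on this page.
reported = {
    ("CDK2", "r2"): 0.87, ("CDK2", "rmse"): 0.89,
    ("Tyk2", "r2"): 0.91, ("Tyk2", "rmse"): 0.78,
}
# Hypothetical output of a local run, for illustration only.
reproduced = {
    ("CDK2", "r2"): 0.872, ("CDK2", "rmse"): 0.888,
    ("Tyk2", "r2"): 0.908, ("Tyk2", "rmse"): 0.83,
}
print(find_discrepancies(reported, reproduced))
```

Because the page reports metrics to two decimal places, differences smaller than the rounding step are treated as agreement here. A stricter or looser tolerance may suit your own run.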

Prospective, Not Retrospective

Test compounds were evaluated after models were fully trained. We explicitly publish the train/test split date boundary so you can verify that no compounds dated after the boundary entered the training set.
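A date-boundary check of this kind can be sketched as follows. It assumes each record carries a SMILES string and a date, that training records should be dated on or before the boundary, and that test records should be dated after it. The record layout, the function name, and the sample dates are hypothetical.

```python
from datetime import date

def check_split(train, test, cutoff):
    """Report records that violate a temporal train/test split.

    train, test: lists of dicts with "smiles" (str) and "date" (datetime.date).
    cutoff: last date allowed in the training set (assumed inclusive).
    """
    train_after = [r["smiles"] for r in train if r["date"] > cutoff]
    test_early = [r["smiles"] for r in test if r["date"] <= cutoff]
    # Exact string match only; equivalent structures written differently are not caught.
    shared = sorted({r["smiles"] for r in test} & {r["smiles"] for r in train})
    return {
        "train_after_cutoff": train_after,
        "test_on_or_before_cutoff": test_early,
        "shared_smiles": shared,
    }

# Hypothetical records, for illustration only.
train = [
    {"smiles": "CCO", "date": date(2023, 1, 10)},
    {"smiles": "c1ccccc1", "date": date(2024, 3, 1)},
]
test = [
    {"smiles": "CCN", "date": date(2024, 2, 1)},
    {"smiles": "CCO", "date": date(2024, 5, 1)},
]
print(check_split(train, test, cutoff=date(2023, 12, 31)))
```

A clean split returns three empty lists. Note that comparing SMILES as raw strings is only a first pass: the same molecule can be written in several ways, so a thorough check would canonicalize structures with a cheminformatics toolkit first.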

Independent Review

Benchmark methodology was reviewed by independent computational chemists before publication. Reviewer comments and author responses are included in the published supporting information.

Numbers you can trust. Science you can verify.

Schedule a methodology review with our team to evaluate accuracy on your specific target class before committing to a subscription.
