Validation Data
Prospective benchmarks. Reproducible protocols. Open data.
Every performance metric on this page was computed on held-out prospective test sets — compounds not in the training data that were evaluated after model training was complete. Test set SMILES and evaluation code are published with each benchmark paper.
FEP Benchmark
Predicted vs. experimental ΔΔG across 14 targets.
The DrugSynq FEP benchmark uses publicly available congeneric series from ChEMBL and literature sources. Each target was tested prospectively: the model had no access to the test series during training or parameterization. The table lists six of the 14 targets.
| Target | Class | n pairs | r² | RMSE (kcal/mol) | Source |
|---|---|---|---|---|---|
| CDK2 | Kinase | 28 | 0.87 | 0.89 | Schindler 2020 |
| p38α MAPK | Kinase | 25 | 0.85 | 0.94 | Wang 2015 |
| Thrombin | Protease | 22 | 0.84 | 1.02 | Wang 2015 |
| Tyk2 | Kinase | 16 | 0.91 | 0.78 | Patel 2025 |
| BACE | Aspartyl protease | 31 | 0.79 | 1.15 | Schindler 2020 |
| MCL1 | BCL-2 family | 24 | 0.76 | 1.28 | Patel 2025 |
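The r² and RMSE columns can be recomputed from paired predicted and experimental ΔΔG values. The following is a minimal sketch, not the published evaluation code. It assumes r² means the squared Pearson correlation (the page does not state which definition is used) and that all values are in kcal/mol. The function names and the toy numbers are illustrative only.

```python
import math


def rmse(predicted, experimental):
    """Root-mean-square error, in the same units as the inputs."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - e) ** 2 for p, e in zip(predicted, experimental)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r2(predicted, experimental):
    """Squared Pearson correlation (assumed definition of r²)."""
    n = len(predicted)
    if n != len(experimental) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_p = sum(predicted) / n
    mean_e = sum(experimental) / n
    cov = sum((p - mean_p) * (e - mean_e)
              for p, e in zip(predicted, experimental))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_e = sum((e - mean_e) ** 2 for e in experimental)
    if var_p == 0 or var_e == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov ** 2 / (var_p * var_e)


# Toy ΔΔG values in kcal/mol, made up for illustration (not benchmark data).
toy_predicted = [-1.2, 0.4, 0.9, -0.3]
toy_experimental = [-0.8, 0.1, 1.5, -0.6]
print("r2:", round(pearson_r2(toy_predicted, toy_experimental), 3))
print("RMSE:", round(rmse(toy_predicted, toy_experimental), 3))
```

Running the same two functions on each target's published test series should give per-target values comparable to the table, provided the definitions match those in the benchmark papers.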
Retrospective Benchmark
312 pairs. One scatter plot. No cherry-picking.
The scatter plot shows all 312 congeneric pairs across 14 targets, with predicted ΔΔG on the x-axis and experimental ΔΔG on the y-axis. Teal dots fall within 1.5 kcal/mol of the diagonal and are counted as correct predictions; amber dots fall outside that band and are counted as outliers (13.1% of the total).
Outliers cluster around scaffold hops and compounds with unusual binding modes, both expected failure cases for perturbation-based FEP. The published protocol document lists each outlier with a structural rationale.
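The 1.5 kcal/mol cutoff can be applied to any set of predicted and experimental ΔΔG pairs to recompute the outlier fraction. The sketch below assumes that "within 1.5 kcal/mol of the diagonal" means an absolute error |predicted − experimental| of at most 1.5 kcal/mol; the page does not define the distance precisely. The function names are hypothetical.

```python
def flag_outliers(predicted, experimental, threshold=1.5):
    """Return True for each pair whose absolute error exceeds the threshold.

    Assumption: distance from the diagonal is taken as |predicted - experimental|.
    """
    if len(predicted) != len(experimental):
        raise ValueError("inputs must have equal length")
    return [abs(p - e) > threshold for p, e in zip(predicted, experimental)]


def outlier_fraction(predicted, experimental, threshold=1.5):
    """Fraction of pairs flagged as outliers."""
    flags = flag_outliers(predicted, experimental, threshold)
    if not flags:
        raise ValueError("no pairs supplied")
    return sum(flags) / len(flags)


# Toy values in kcal/mol, made up for illustration (not benchmark data).
toy_predicted = [0.0, 0.0, 0.0, 0.0]
toy_experimental = [0.5, 1.5, 1.6, -2.0]
print(outlier_fraction(toy_predicted, toy_experimental))
```

For scale, 13.1% of 312 pairs corresponds to about 41 pairs.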
ADMET Benchmarks
Prospective ADMET model performance.
| Endpoint | AUROC | MCC | Sensitivity | Specificity |
|---|---|---|---|---|
| hERG Inhibition | 0.91 | 0.73 | 0.84 | 0.89 |
| CYP3A4 Inhibition | 0.88 | 0.68 | 0.81 | 0.87 |
| Metabolic Stability (HLM) | 0.85 | 0.61 | 0.77 | 0.84 |
| Aqueous Solubility | 0.87 | 0.66 | 0.80 | 0.86 |
| Caco-2 Permeability | 0.89 | 0.71 | 0.83 | 0.90 |
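The four columns in the table are standard binary-classification metrics. The sketch below shows how they could be computed from true labels and model scores. It is not the published evaluation code: the 0.5 decision threshold, the function names, and the toy data are assumptions, and the page does not state how continuous endpoints such as solubility are converted into classes.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def sensitivity(tp, fn):
    """True-positive rate."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True-negative rate."""
    return tn / (tn + fp)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 if a marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auroc(y_true, scores):
    """AUROC as the probability that a positive outscores a negative.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg),
    which is fine for a sketch but slow on large test sets.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and scores, made up for illustration (not benchmark data).
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
toy_calls = [1 if s >= 0.5 else 0 for s in toy_scores]  # assumed threshold

tp, tn, fp, fn = confusion_counts(toy_labels, toy_calls)
print("AUROC:", round(auroc(toy_labels, toy_scores), 3))
print("MCC:", round(mcc(tp, tn, fp, fn), 3))
print("Sensitivity:", round(sensitivity(tp, fn), 3))
print("Specificity:", round(specificity(tn, fp), 3))
```

AUROC is computed from the raw scores and does not depend on the threshold, while MCC, sensitivity, and specificity all change with it. Comparing against the table therefore requires the thresholds used in the benchmark papers.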
Dataset Details
Where the data comes from.
| Source | Data Type | Records |
|---|---|---|
| ChEMBL 33 | Binding affinity (Ki, Kd, IC50) | 1.2M |
| PubChem BioAssay | ADMET in vitro panels | 247K |
| Literature (curated) | FEP congeneric series | 312 pairs |
| QM reference calculations | ML potential correction terms | 18K conformers |
Reproducibility
All benchmarks are reproducible from published code.
Every benchmark on this page corresponds to a published paper with data splits, evaluation code, and test set SMILES. Links to code repositories and supporting information are provided in each publication.
Open Evaluation Code
Benchmark evaluation scripts are published to a public repository. Clone, run, verify. If your run does not reproduce our reported numbers, report the discrepancy as an issue.
Prospective, Not Retrospective
Test compounds were evaluated after models were fully trained. We explicitly publish the train/test split date boundary so you can verify that no compounds dated after the boundary entered the training set.
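A published split date makes the leakage check mechanical. The sketch below assumes each training record carries a compound identifier and a date, and that the training set should contain only records dated strictly before the boundary. The record layout, the identifiers, and the boundary date shown are hypothetical; the real values come from the benchmark publications.

```python
from datetime import date


def leaked_compounds(training_records, split_date):
    """Return IDs of training records dated on or after the split boundary.

    training_records: iterable of (compound_id, record_date) tuples.
    Assumption: the boundary is exclusive, so a record dated exactly on
    split_date is treated as leakage.
    """
    return [cid for cid, recorded in training_records if recorded >= split_date]


# Hypothetical records and boundary, for illustration only.
toy_training = [
    ("cmpd-001", date(2023, 1, 15)),
    ("cmpd-002", date(2023, 11, 30)),
    ("cmpd-003", date(2024, 6, 1)),
]
toy_boundary = date(2024, 1, 1)

leaks = leaked_compounds(toy_training, toy_boundary)
print("leaked:", leaks)
```

An empty list means no training record postdates the boundary. A date check alone does not catch a test compound that also appears in the training data under an earlier date, so comparing the published test set SMILES against the training set is a useful second check.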
Independent Review
Benchmark methodology was reviewed by independent computational chemists before publication. Reviewer comments and author responses are included in the published supporting information.
Numbers you can trust. Science you can verify.
Schedule a methodology review with our team to evaluate accuracy on your specific target class before committing to a subscription.