MAPLE: Model-Aware Parameterization from Literature Evidence
Summary
QSP models have many biological parameters, and most can’t be measured directly in the clinical context being modeled. The relevant data is usually scattered across papers, often from different species or indications. MAPLE provides a structured pipeline for turning those measurements into informative priors that account for the gap between the experimental and model contexts.
Extraction
MAPLE provides two structured YAML schemas, SubmodelTarget and CalibrationTarget, for extracting calibration data from papers.
Both schemas are filled out interactively using an MCP server with Claude Code. The server exposes the extraction prompt, valid enum values, a multi-step workflow guide, and hard rules that guard against mistakes LLMs commonly make during extraction (e.g., inventing uncertainties, using wrong input types).
A validate_target tool runs three levels of checks:
- Schema validation — Pydantic model with 30+ validators
- Prior derivation — bootstrap + forward model inversion + distribution fitting + translation sigma
- Snippet verification — checks that every extracted value appears in the source paper text (Europe PMC full text or source PDFs), catching hallucinated numbers before they enter the pipeline
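The prior-derivation check can be pictured with a small worked example. The sketch below is an illustration under stated assumptions, not MAPLE's implementation: it assumes a paper reports a half-life as mean ± SD over n replicates, that the model parameter is a first-order rate constant related to it by k = ln 2 / t_half, and that a lognormal is an acceptable prior family. The function names, the normal resampling of replicates, and all numbers are hypothetical. The translation sigma step is left out here.

```python
import math
import random
import statistics


def bootstrap_means(mean, sd, n, draws=5000, seed=0):
    """Parametric bootstrap of the reported sample mean (assumes normal replicates)."""
    rng = random.Random(seed)
    out = []
    for _ in range(draws):
        replicate = [rng.gauss(mean, sd) for _ in range(n)]
        out.append(statistics.fmean(replicate))
    return out


def invert_half_life(t_half):
    """Forward model is t_half = ln(2) / k, so the inversion is k = ln(2) / t_half."""
    return math.log(2.0) / t_half


def fit_lognormal(samples):
    """Fit a lognormal by taking the mean and SD of log(samples)."""
    logs = [math.log(s) for s in samples]
    return statistics.fmean(logs), statistics.stdev(logs)


# Hypothetical measurement: half-life of 12 h, SD 3 h, n = 6 replicates.
means = bootstrap_means(mean=12.0, sd=3.0, n=6)
k_samples = [invert_half_life(t) for t in means if t > 0]  # drop non-physical draws
mu, sigma = fit_lognormal(k_samples)
prior_median = math.exp(mu)  # median of the fitted lognormal, in 1/h
```

Propagating bootstrap draws through the inversion, rather than inverting the reported mean once, carries the measurement uncertainty into the parameter's units before any distribution is fitted.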
Inference
All SubmodelTargets are combined into a joint NumPyro model for MCMC inference. A source relevance rubric scores each target across eight axes (species, indication, TME compatibility, measurement directness, and others) and maps the scores to a translation sigma that widens the likelihood for that target. As a result, mouse in vitro data contributes less than human clinical data that constrains the same parameter.
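To make the effect of the translation sigma concrete, here is a minimal sketch of how rubric scores could widen a likelihood. Everything specific in it is an assumption for illustration: only the four axes named above are used, the level names and penalty values are made up, and combining penalties in quadrature is one plausible rule rather than MAPLE's documented mapping.

```python
import math

# Four of the eight axes are named in the text; the others are not listed there.
# Penalty values (log-scale SD contributed per axis) are hypothetical.
PENALTY = {
    "species": {"human": 0.0, "mouse": 0.5},
    "indication": {"same": 0.0, "different": 0.3},
    "tme_compatibility": {"in_vivo_tumor": 0.0, "in_vitro": 0.4},
    "measurement_directness": {"direct": 0.0, "proxy": 0.3},
}


def translation_sigma(scores):
    """Combine per-axis penalties in quadrature (an assumed rule)."""
    return math.sqrt(sum(PENALTY[axis][level] ** 2 for axis, level in scores.items()))


def log_likelihood(log_obs, log_pred, sigma_meas, sigma_trans):
    """Gaussian log-likelihood on the log scale with a widened standard deviation."""
    sigma = math.sqrt(sigma_meas**2 + sigma_trans**2)
    z = (log_obs - log_pred) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


human_clinical = {
    "species": "human",
    "indication": "same",
    "tme_compatibility": "in_vivo_tumor",
    "measurement_directness": "direct",
}
mouse_in_vitro = {
    "species": "mouse",
    "indication": "different",
    "tme_compatibility": "in_vitro",
    "measurement_directness": "proxy",
}

s_human = translation_sigma(human_clinical)
s_mouse = translation_sigma(mouse_in_vitro)

# Same one-log-unit misfit and same measurement noise for both targets.
ll_human = log_likelihood(1.0, 0.0, 0.2, s_human)
ll_mouse = log_likelihood(1.0, 0.0, 0.2, s_mouse)
```

With these made-up numbers the mouse in vitro target penalizes the same misfit far less than the human clinical one, so it pulls the posterior less.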
The joint posterior is parameterized as marginal distributions + a Gaussian copula that preserves posterior correlations.
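A Gaussian copula representation can be sketched in a few lines. The example below is a hedged illustration for two parameters using only the standard library; it assumes empirical marginals stored as sorted samples and a rank-based normal-score transform, neither of which is confirmed as MAPLE's choice. The "posterior draws" are synthetic stand-ins and all function names are hypothetical.

```python
import math
import random
import statistics

STD_NORMAL = statistics.NormalDist()


def pearson(a, b):
    """Plain Pearson correlation of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def normal_scores(values):
    """Rank-transform one parameter's samples to standard-normal scores."""
    n = len(values)
    order = sorted(range(n), key=values.__getitem__)
    scores = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        scores[idx] = STD_NORMAL.inv_cdf(rank / (n + 1))
    return scores


def empirical_quantile(sorted_values, u):
    """Inverse empirical CDF over a marginal stored as sorted samples."""
    idx = min(int(u * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[idx]


def fit_copula(xs, ys):
    """Return the two sorted marginals and the correlation of the normal scores."""
    rho = pearson(normal_scores(xs), normal_scores(ys))
    return sorted(xs), sorted(ys), rho


def sample_copula(marg_x, marg_y, rho, rng):
    """Draw correlated normals, map them to uniforms, then through each marginal."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
    return (
        empirical_quantile(marg_x, STD_NORMAL.cdf(z1)),
        empirical_quantile(marg_y, STD_NORMAL.cdf(z2)),
    )


# Synthetic stand-in for posterior draws: two positively correlated parameters.
rng = random.Random(1)
base = [rng.gauss(0.0, 1.0) for _ in range(2000)]
xs = [math.exp(0.5 * b) for b in base]
ys = [math.exp(0.3 * b + 0.2 * rng.gauss(0.0, 1.0)) for b in base]

marg_x, marg_y, rho = fit_copula(xs, ys)
draws = [sample_copula(marg_x, marg_y, rho, rng) for _ in range(2000)]
```

Separating marginals from dependence means each parameter keeps its own (possibly skewed) shape while a single correlation structure carries the couplings. Extending the sketch beyond two parameters would replace `rho` with a correlation matrix and its Cholesky factor.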
Two-Stage Calibration
Stage 1: SubmodelTargets
- Inputs: in vitro / preclinical data
- Method: joint MCMC (NumPyro/NUTS)
- Output: marginals + Gaussian copula
Stage 2: CalibrationTargets
- Inputs: clinical data + full QSP simulator
- Method: copula prior from Stage 1 + SBI (SNPE-C)
- Output: final posterior