White Paper
NextStat is a high-performance statistical inference toolkit implemented in Rust with Python bindings and a CLI. It covers two major use cases: binned likelihood inference via HistFactory workspaces (HEP and adjacent sciences) and general statistics workflows (regression, hierarchical models, time series, econometrics, pharmacometrics).
Scope
Binned Likelihood Inference
- HistFactory-style binned likelihoods via pyhf JSON and HS3 v0.2 workspaces
- Deterministic parity testing against pyhf (7-tier tolerance contract)
- MLE fit, profile likelihood scan, hypothesis test, CLs upper limits
- Systematic ranking via autodiff (reverse-mode AD)
- Batch toy fitting: CPU (Rayon) and GPU (CUDA/Metal)
General Statistics
- GLMs: linear, logistic, Poisson, negative binomial with robust SE and cross-validation
- Multilevel / hierarchical models: random intercepts and slopes, partial pooling
- Time series: Kalman filter, RTS smoother, EM estimation, forecasting
- Survival: Cox PH, Weibull, exponential, log-logistic, Schoenfeld residuals
- Econometrics: Panel FE, DiD/TWFE, IV/2SLS, doubly-robust AIPW
- Pharmacometrics: 1-compartment oral PK, NLME with log-normal random effects
Bayesian Sampling
- NUTS sampler with dual averaging (single-chain and multi-chain)
- HMC diagnostics: R-hat, ESS, divergences
- Generic LogDensityModel contract — any model that provides log_prob + gradient
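To make the log_prob + gradient contract concrete, here is a minimal Python sketch. It is not the NextStat API: the method names `dim` and `grad_log_prob`, the `StandardNormal` toy model, and the `leapfrog` helper are all hypothetical and chosen for illustration only. The point is that a gradient-based sampler such as HMC/NUTS needs nothing from a model beyond these two evaluations.

```python
import math
from typing import List, Protocol, Sequence


class LogDensityModel(Protocol):
    """Hypothetical rendering of the contract: a log density and its gradient."""

    def dim(self) -> int: ...
    def log_prob(self, theta: Sequence[float]) -> float: ...
    def grad_log_prob(self, theta: Sequence[float]) -> List[float]: ...


class StandardNormal:
    """Toy model satisfying the contract: independent N(0, 1) components."""

    def __init__(self, dim: int) -> None:
        self._dim = dim

    def dim(self) -> int:
        return self._dim

    def log_prob(self, theta: Sequence[float]) -> float:
        norm = 0.5 * self._dim * math.log(2.0 * math.pi)
        return -0.5 * sum(t * t for t in theta) - norm

    def grad_log_prob(self, theta: Sequence[float]) -> List[float]:
        return [-t for t in theta]


def leapfrog(model: LogDensityModel, theta, momentum, step_size):
    """One leapfrog step, the integrator used inside HMC/NUTS.

    Uses only the model's gradient, which is why the contract is sufficient.
    """
    grad = model.grad_log_prob(theta)
    p_half = [p + 0.5 * step_size * g for p, g in zip(momentum, grad)]
    theta_new = [t + step_size * p for t, p in zip(theta, p_half)]
    grad_new = model.grad_log_prob(theta_new)
    p_new = [p + 0.5 * step_size * g for p, g in zip(p_half, grad_new)]
    return theta_new, p_new


def hamiltonian(model: LogDensityModel, theta, momentum) -> float:
    """Total energy: negative log density plus kinetic energy (unit mass)."""
    return -model.log_prob(theta) + 0.5 * sum(p * p for p in momentum)
```

A sampler built this way can be tested on toy densities first: a small leapfrog step should leave the Hamiltonian nearly unchanged, and large energy errors are the usual source of the divergences reported by HMC diagnostics.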
Architecture
┌──────────────────────────────────────────────────────────┐
│ Python API (PyO3) | CLI (clap) | WASM │
├──────────────────────────────────────────────────────────┤
│ ns-inference: MLE, NUTS, profile scan, CLs, ranking │
├──────────────────────────────────────────────────────────┤
│ ns-compute: NLL, gradient, Hessian (SIMD / Accelerate) │
│ ns-ad: reverse-mode automatic differentiation │
├──────────────────────────────────────────────────────────┤
│ ns-translate: pyhf JSON, HS3 v0.2, TREx config, Arrow │
│ ns-root: native ROOT file reader + zstd decompression │
├──────────────────────────────────────────────────────────┤
│ ns-core: HistFactoryModel, parameters, interpolation │
└──────────────────────────────────────────────────────────┘
Compatibility Target: pyhf JSON / HistFactory
The primary compatibility target is the pyhf JSON schema. NextStat aims to:
- Parse pyhf JSON workspaces and build an internal model representation
- Match pyhf's canonical objective (logpdf and twice_nll) in deterministic mode
- Match fit results (best-fit, uncertainties, profile scans) within documented tolerances
- Provide a fast production mode (SIMD, Accelerate, CUDA) without breaking the reference contract
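For readers unfamiliar with the objective being matched, the following sketch shows the relationship between logpdf and twice_nll on a deliberately simplified model: one channel, a signal scaled by a single strength parameter `mu`, a fixed background, and no constraint terms. It is plain Python, not pyhf or NextStat code, and all function names here are illustrative. In pyhf, twice_nll is defined as -2 times logpdf; a real HistFactory model adds modifiers and constraint terms on top of the Poisson part shown here.

```python
import math


def poisson_logpmf(n: float, lam: float) -> float:
    """log Poisson(n | lam), using lgamma(n + 1) for log(n!)."""
    return n * math.log(lam) - lam - math.lgamma(n + 1.0)


def expected(mu: float, signal, background):
    """Per-bin expected yield: mu * signal + background (no systematics)."""
    return [mu * s + b for s, b in zip(signal, background)]


def logpdf(mu: float, signal, background, observed) -> float:
    """Sum of per-bin Poisson log-probabilities (constraint terms omitted)."""
    rates = expected(mu, signal, background)
    return sum(poisson_logpmf(n, nu) for n, nu in zip(observed, rates))


def twice_nll(mu: float, signal, background, observed) -> float:
    """The objective minimised in an MLE fit: -2 * logpdf."""
    return -2.0 * logpdf(mu, signal, background, observed)


# Illustrative two-bin inputs, constructed so that mu = 1 reproduces the data.
signal = [5.0, 10.0]
background = [50.0, 60.0]
observed = [55.0, 70.0]

# A coarse scan over mu; the minimum of twice_nll marks the best-fit value.
scan = {mu: twice_nll(mu, signal, background, observed) for mu in (0.5, 1.0, 1.5)}
best_mu = min(scan, key=scan.get)
```

Comparing implementations on this objective, rather than only on fit results, separates disagreements in the model evaluation from disagreements in the optimiser.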
Validation Methodology
NextStat uses a dual-mode validation strategy:
- Parity mode: Kahan summation, single-thread, Accelerate disabled — deterministic CI reference
- Fast mode: naive summation, Rayon parallelism, SIMD/GPU — production inference
- 7-tier tolerance hierarchy from per-bin expected data (1e-12) to toy ensemble statistics (0.05)
- 3-way cross-validation against pyhf + ROOT/RooFit on canonical HistFactory fixtures
- Continuous integration: pyhf parity is CI-gated; ROOT comparison is informational
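The difference between the two summation strategies can be illustrated in a few lines of Python. This is a sketch of the general technique, not NextStat's implementation: `naive_sum`, `kahan_sum`, and `within_tolerance` are hypothetical helpers, and the tolerance check (absolute for small values, relative for large ones) is an assumed convention rather than the documented tier definition.

```python
import math


def naive_sum(values) -> float:
    """Plain left-to-right accumulation; rounding errors can build up."""
    total = 0.0
    for v in values:
        total += v
    return total


def kahan_sum(values) -> float:
    """Kahan compensated summation: carries the lost low-order bits forward."""
    total = 0.0
    comp = 0.0  # running compensation for rounding error
    for v in values:
        y = v - comp            # apply the correction from the previous step
        t = total + y           # low-order bits of y may be lost here
        comp = (t - total) - y  # recover what was lost
        total = t
    return total


def within_tolerance(candidate: float, reference: float, tol: float) -> bool:
    """Assumed tier check: absolute near zero, relative for large magnitudes."""
    return abs(candidate - reference) <= tol * max(1.0, abs(reference))


# One million copies of 0.1, which is not exactly representable in binary.
values = [0.1] * 1_000_000
reference = math.fsum(values)  # correctly rounded sum, used as ground truth

naive_error = abs(naive_sum(values) - reference)
kahan_error = abs(kahan_sum(values) - reference)
```

Compensated summation also makes the result far less sensitive to the order of the terms, which is one reason a deterministic reference mode pairs it with single-threaded execution: parallel reductions change the order of accumulation from run to run.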
Performance Highlights
| Operation | Speedup / throughput |
|---|---|
| Profile scan (31 points) vs ROOT | 283×–880× |
| Profile scan vs pyhf | 37×–75× |
| Batch toys (1000, CUDA) vs serial | ~200× |
| TTree parse + histogram fill vs uproot | ~8.5× |
| Ranking (16 NPs, AD) vs pyhf FD | ~4× |
| Zstd decompression (ns-zstd fork) | ~783 MiB/s (PGO) |
Non-Goals (Near-Term)
- Exact duplication of every pyhf backend and feature
- Full numerical identity across all parallel backends and hardware (GPU/threads)
- Replacing analysis frameworks or end-user modeling tools
- Full drop-in replacement for the entire statsmodels surface area
License
AGPL-3.0-or-later with a dual commercial license option. See LICENSE and LICENSE-COMMERCIAL in the repository root.
