White Paper

NextStat is a high-performance statistical inference toolkit implemented in Rust, with Python bindings and a CLI. It covers two major use cases: binned likelihood inference via HistFactory workspaces (high-energy physics and adjacent sciences) and general statistics workflows (regression, hierarchical models, time series, survival analysis, econometrics, pharmacometrics).

Scope

Binned Likelihood Inference

HistFactory-style binned likelihoods via pyhf JSON and HS3 v0.2 workspaces
Deterministic parity testing against pyhf (7-tier tolerance contract)
MLE fit, profile likelihood scan, hypothesis test, CLs upper limits
Systematic ranking via autodiff (reverse-mode AD)
Batch toy fitting: CPU (Rayon) and GPU (CUDA/Metal)
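
To make the quantities in this list concrete, here is a minimal pure-Python sketch of a binned Poisson likelihood with a single signal-strength parameter mu. It is not NextStat's API: the function names and the two-bin yields are assumptions chosen for illustration, and because the toy model has no nuisance parameters the scan is a plain likelihood scan rather than a profiled one.

```python
import math

def twice_nll(mu, signal, background, observed):
    """Return 2 * negative log-likelihood for independent Poisson bins.

    Expected yield per bin is nu_i = mu * s_i + b_i (no nuisance parameters).
    """
    total = 0.0
    for s, b, n in zip(signal, background, observed):
        nu = mu * s + b
        # -log Poisson(n | nu) = nu - n*log(nu) + log(n!)
        total += nu - n * math.log(nu) + math.lgamma(n + 1.0)
    return 2.0 * total

def scan(signal, background, observed, lo=0.0, hi=3.0, points=31):
    """Evaluate twice_nll on an evenly spaced grid of mu values."""
    step = (hi - lo) / (points - 1)
    grid = [lo + i * step for i in range(points)]
    return [(mu, twice_nll(mu, signal, background, observed)) for mu in grid]

# Hypothetical two-bin example; observed counts equal the mu = 1 expectation.
signal = [5.0, 10.0]
background = [50.0, 60.0]
observed = [55.0, 70.0]

curve = scan(signal, background, observed)
mu_hat, q_min = min(curve, key=lambda point: point[1])
# Scan curve relative to the grid minimum: 2*NLL(mu) - 2*NLL(mu_hat)
delta = [(mu, q - q_min) for mu, q in curve]
print(f"grid minimum at mu = {mu_hat:.2f}")
```

A production fit would minimize over mu and all nuisance parameters with a gradient-based optimizer instead of reading the minimum off a grid; the grid is used here only to keep the sketch dependency-free.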

General Statistics

GLMs: linear, logistic, Poisson, negative binomial with robust SE and cross-validation
Multilevel / hierarchical models: random intercepts and slopes, partial pooling
Time series: Kalman filter, RTS smoother, EM estimation, forecasting
Survival: Cox PH, Weibull, exponential, log-logistic, Schoenfeld residuals
Econometrics: Panel FE, DiD/TWFE, IV/2SLS, doubly-robust AIPW
Pharmacometrics: 1-compartment oral PK, NLME with log-normal random effects
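
As one concrete instance from this list, the 1-compartment oral PK model has a standard closed form for first-order absorption and elimination. The sketch below writes it out in plain Python. The function names and parameter values are hypothetical and are not taken from NextStat's API; the log-normal random effect is shown in its usual textbook form, where an individual parameter is the population value times exp(eta).

```python
import math

def conc_oral_1cpt(t, dose, ka, ke, volume, bioavail=1.0):
    """Plasma concentration at time t for first-order absorption and elimination.

    C(t) = F * D * ka / (V * (ka - ke)) * (exp(-ke*t) - exp(-ka*t)), for ka != ke.
    """
    if math.isclose(ka, ke):
        raise ValueError("ka == ke needs the limiting form, not handled in this sketch")
    scale = bioavail * dose * ka / (volume * (ka - ke))
    return scale * (math.exp(-ke * t) - math.exp(-ka * t))

def time_of_peak(ka, ke):
    """Time at which C(t) is maximal: t_max = ln(ka / ke) / (ka - ke)."""
    return math.log(ka / ke) / (ka - ke)

def individual_param(theta_pop, eta):
    """Log-normal random effect: theta_i = theta_pop * exp(eta_i)."""
    return theta_pop * math.exp(eta)

# Hypothetical parameter values, chosen only for illustration.
dose, ka, ke, volume = 100.0, 1.2, 0.15, 20.0
t_max = time_of_peak(ka, ke)
c_max = conc_oral_1cpt(t_max, dose, ka, ke, volume)
print(f"t_max = {t_max:.3f}, c_max = {c_max:.3f}")
```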

Bayesian Sampling

NUTS sampler with dual averaging (single-chain and multi-chain)
HMC diagnostics: R-hat, ESS, divergences
Generic LogDensityModel contract — any model that provides log_prob + gradient
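
The contract in the last item can be pictured as a two-method interface. The sketch below is a hypothetical Python rendering, not NextStat's actual trait or binding: the method name grad_log_prob, the StdNormal example model and the leapfrog helper are all assumptions. It shows why log_prob and its gradient are sufficient: the leapfrog integrator at the core of HMC and NUTS needs nothing else from the model.

```python
from typing import List, Protocol, Sequence

class LogDensityModel(Protocol):
    """Hypothetical rendering of a log_prob + gradient contract."""
    def log_prob(self, theta: Sequence[float]) -> float: ...
    def grad_log_prob(self, theta: Sequence[float]) -> List[float]: ...

class StdNormal:
    """Independent standard normal in any dimension (unnormalized)."""
    def log_prob(self, theta):
        return -0.5 * sum(x * x for x in theta)

    def grad_log_prob(self, theta):
        return [-x for x in theta]

def leapfrog(model, theta, momentum, step, n_steps):
    """Kick-drift-kick leapfrog integration of Hamiltonian dynamics."""
    theta = list(theta)
    momentum = list(momentum)
    grad = model.grad_log_prob(theta)
    for _ in range(n_steps):
        momentum = [p + 0.5 * step * g for p, g in zip(momentum, grad)]  # half kick
        theta = [q + step * p for q, p in zip(theta, momentum)]          # full drift
        grad = model.grad_log_prob(theta)
        momentum = [p + 0.5 * step * g for p, g in zip(momentum, grad)]  # half kick
    return theta, momentum

def hamiltonian(model, theta, momentum):
    """Total energy H = -log_prob(theta) + |p|^2 / 2."""
    return -model.log_prob(theta) + 0.5 * sum(p * p for p in momentum)

model = StdNormal()
theta0, p0 = [1.0, -0.5], [0.3, 0.8]
theta1, p1 = leapfrog(model, theta0, p0, step=0.05, n_steps=20)
energy_error = abs(hamiltonian(model, theta1, p1) - hamiltonian(model, theta0, p0))
print(f"energy error after 20 steps: {energy_error:.2e}")
```

A large energy error along a trajectory is what samplers typically flag as a divergence, which is why the diagnostics listed above track it.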

Architecture

┌──────────────────────────────────────────────────────────┐
│  Python API (PyO3)        CLI (clap)        WASM        │
├──────────────────────────────────────────────────────────┤
│  ns-inference: MLE, NUTS, profile scan, CLs, ranking     │
├──────────────────────────────────────────────────────────┤
│  ns-compute: NLL, gradient, Hessian (SIMD / Accelerate)  │
│  ns-ad: reverse-mode automatic differentiation            │
├──────────────────────────────────────────────────────────┤
│  ns-translate: pyhf JSON, HS3 v0.2, TREx config, Arrow   │
│  ns-root: native ROOT file reader + zstd decompression    │
├──────────────────────────────────────────────────────────┤
│  ns-core: HistFactoryModel, parameters, interpolation     │
└──────────────────────────────────────────────────────────┘

Compatibility Target: pyhf JSON / HistFactory

The primary compatibility target is the pyhf JSON schema. NextStat aims to:

Parse pyhf JSON workspaces and build an internal model representation
Match pyhf's canonical objective (logpdf and twice_nll) in deterministic mode
Match fit results (best-fit, uncertainties, profile scans) within documented tolerances
Provide a fast production mode (SIMD, Accelerate, CUDA) without breaking the reference contract
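
For orientation, the sketch below parses a small workspace laid out in the style of pyhf's published two-bin example (channels, samples, modifiers, observations, measurements) and sums the nominal per-bin yields. It uses only the Python standard library. The helper nominal_yields is a hypothetical illustration rather than NextStat's parser, and it deliberately ignores modifiers, which a real model builder must apply.

```python
import json

WORKSPACE_JSON = """
{
  "channels": [
    {
      "name": "singlechannel",
      "samples": [
        {"name": "signal", "data": [5.0, 10.0],
         "modifiers": [{"name": "mu", "type": "normfactor", "data": null}]},
        {"name": "background", "data": [50.0, 60.0],
         "modifiers": [{"name": "uncorr_bkguncrt", "type": "shapesys", "data": [5.0, 12.0]}]}
      ]
    }
  ],
  "observations": [{"name": "singlechannel", "data": [55.0, 70.0]}],
  "measurements": [{"name": "Measurement", "config": {"poi": "mu", "parameters": []}}],
  "version": "1.0.0"
}
"""

def nominal_yields(spec):
    """Sum the nominal per-bin sample yields for every channel (modifiers ignored)."""
    out = {}
    for channel in spec["channels"]:
        n_bins = len(channel["samples"][0]["data"])
        totals = [0.0] * n_bins
        for sample in channel["samples"]:
            for i, value in enumerate(sample["data"]):
                totals[i] += value
        out[channel["name"]] = totals
    return out

spec = json.loads(WORKSPACE_JSON)
expected = nominal_yields(spec)
observed = {obs["name"]: obs["data"] for obs in spec["observations"]}
poi = spec["measurements"][0]["config"]["poi"]
print(poi, expected, observed)
```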

Validation Methodology

NextStat validates results in two execution modes, backed by a tolerance hierarchy and cross-checks against reference implementations:

Parity mode: Kahan summation, single-thread, Accelerate disabled — deterministic CI reference
Fast mode: naive summation, Rayon parallelism, SIMD/GPU — production inference
7-tier tolerance hierarchy from per-bin expected data (1e-12) to toy ensemble statistics (0.05)
3-way cross-validation against pyhf + ROOT/RooFit on canonical HistFactory fixtures
Continuous integration: pyhf parity is CI-gated; ROOT comparison is informational
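
The difference between the two summation strategies can be seen in a few lines. The following sketch is generic Python, not NextStat code, and the input values are contrived to expose the effect: it compares naive and Kahan (compensated) summation against math.fsum as a reference, then checks each result against an absolute tolerance of 1e-12, the tightest tier quoted above.

```python
import math

def naive_sum(values):
    """Plain left-to-right accumulation."""
    total = 0.0
    for v in values:
        total += v
    return total

def kahan_sum(values):
    """Compensated summation: carry the rounding error of each add forward."""
    total = 0.0
    comp = 0.0  # running compensation for lost low-order bits
    for v in values:
        y = v - comp
        t = total + y
        comp = (t - total) - y
        total = t
    return total

def within_tolerance(a, b, atol):
    """Absolute-tolerance comparison, as used in a parity check."""
    return abs(a - b) <= atol

# One large term followed by many terms too small to register individually.
values = [1.0] + [1e-16] * 100_000
reference = math.fsum(values)
naive = naive_sum(values)
kahan = kahan_sum(values)

print(f"naive error: {abs(naive - reference):.3e}")
print(f"kahan error: {abs(kahan - reference):.3e}")
print("kahan within 1e-12:", within_tolerance(kahan, reference, 1e-12))
print("naive within 1e-12:", within_tolerance(naive, reference, 1e-12))
```

Compensated summation costs a few extra floating-point operations per term and is order-dependent, which is consistent with pairing it with single-threaded execution in a deterministic reference mode and leaving naive summation to the parallel fast path.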

Performance Highlights

Operation	Speedup / throughput
Profile scan (31 points) vs ROOT	283×–880×
Profile scan vs pyhf	37×–75×
Batch toys (1000, CUDA) vs serial	~200×
TTree parse + histogram fill vs uproot	~8.5×
Ranking (16 nuisance parameters, AD) vs pyhf finite differences	~4×
Zstd decompression (ns-zstd fork)	~783 MiB/s (PGO)

Non-Goals (Near-Term)

Exact duplication of every pyhf backend and feature
Full numerical identity across all parallel backends and hardware (GPU/threads)
Replacing analysis frameworks or end-user modeling tools
Full drop-in replacement for the entire statsmodels surface area

License

AGPL-3.0-or-later with a dual commercial license option. See LICENSE and LICENSE-COMMERCIAL in the repository root.