NextStat

White Paper

NextStat is a high-performance statistical inference toolkit implemented in Rust with Python bindings and a CLI. It spans two major use cases: binned likelihood inference via HistFactory workspaces (HEP and adjacent sciences), and general statistics workflows (regression, hierarchical models, time series, econometrics, pharmacometrics).

Scope

Binned Likelihood Inference

  • HistFactory-style binned likelihoods via pyhf JSON and HS3 v0.2 workspaces
  • Deterministic parity testing against pyhf (7-tier tolerance contract)
  • MLE fits, profile likelihood scans, hypothesis tests, CLs upper limits (the binned likelihood behind these is sketched after this list)
  • Systematic ranking via autodiff (reverse-mode AD)
  • Batch toy fitting: CPU (Rayon) and GPU (CUDA/Metal)
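
Every item above operates on the same objective: a product of per-bin Poisson terms for the expected event counts, multiplied by constraint terms for the nuisance parameters. The numpy/scipy sketch below writes out that objective for a toy single-channel model with one Gaussian-constrained background normalization; it illustrates the math only (the counts and the 10% uncertainty are invented) and is not NextStat code.

    import numpy as np
    from scipy.stats import norm, poisson

    # Toy single-channel model: the signal strength mu scales the signal
    # template, and one nuisance parameter alpha scales the background
    # normalization under a unit-Gaussian constraint.
    signal = np.array([5.0, 10.0])
    background = np.array([50.0, 60.0])
    observed = np.array([55.0, 65.0])

    def twice_nll(mu, alpha, bkg_uncert=0.10):
        """-2 log L for the toy model: Poisson bins times a Gaussian constraint."""
        expected = mu * signal + background * (1.0 + bkg_uncert * alpha)
        nll_bins = -poisson.logpmf(observed, expected).sum()
        nll_constraint = -norm.logpdf(alpha, loc=0.0, scale=1.0)
        return 2.0 * (nll_bins + nll_constraint)

    # An MLE fit minimizes twice_nll over (mu, alpha); a profile likelihood
    # scan fixes mu on a grid and minimizes over alpha at each point.
    print(twice_nll(mu=1.0, alpha=0.0))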

General Statistics

  • GLMs: linear, logistic, Poisson, negative binomial with robust SEs and cross-validation (see the example after this list)
  • Multilevel / hierarchical models: random intercepts and slopes, partial pooling
  • Time series: Kalman filter, RTS smoother, EM estimation, forecasting
  • Survival: Cox PH, Weibull, exponential, log-logistic, Schoenfeld residuals
  • Econometrics: Panel FE, DiD/TWFE, IV/2SLS, doubly-robust AIPW
  • Pharmacometrics: 1-compartment oral PK, NLME with log-normal random effects
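
As a concrete instance of the first item above, the snippet below fits a Poisson GLM with heteroskedasticity-robust (HC1) standard errors on synthetic data using statsmodels; it shows the kind of workflow targeted, not NextStat's own API, and the data and settings are made up for illustration.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    x = rng.normal(size=200)
    y = rng.poisson(lam=np.exp(0.3 + 0.5 * x))     # synthetic Poisson counts

    X = sm.add_constant(x)                         # intercept + slope design matrix
    fit = sm.GLM(y, X, family=sm.families.Poisson()).fit(cov_type="HC1")
    print(fit.params)                              # point estimates
    print(fit.bse)                                 # robust standard errors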

Bayesian Sampling

  • NUTS sampler with dual averaging (single-chain and multi-chain)
  • HMC diagnostics: R-hat, ESS, divergences
  • Generic LogDensityModel contract — any model that provides log_prob + gradient
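
A minimal sketch of that contract, using a standard normal target: any object exposing log_prob and a matching gradient can, in principle, be handed to a gradient-based sampler such as NUTS. The Python class below is illustrative only; the concrete NextStat trait or binding may differ.

    import numpy as np

    class StandardNormalModel:
        """Toy target satisfying a log_prob + gradient contract (illustrative)."""

        dim = 2

        def log_prob(self, theta: np.ndarray) -> float:
            # log density of an isotropic standard normal, up to a constant
            return -0.5 * float(theta @ theta)

        def gradient(self, theta: np.ndarray) -> np.ndarray:
            # gradient of log_prob with respect to theta
            return -theta

    model = StandardNormalModel()
    theta = np.zeros(model.dim)
    print(model.log_prob(theta), model.gradient(theta))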

Architecture

┌──────────────────────────────────────────────────────────┐
│  Python API (PyO3)        CLI (clap)        WASM         │
├──────────────────────────────────────────────────────────┤
│  ns-inference: MLE, NUTS, profile scan, CLs, ranking     │
├──────────────────────────────────────────────────────────┤
│  ns-compute: NLL, gradient, Hessian (SIMD / Accelerate)  │
│  ns-ad: reverse-mode automatic differentiation           │
├──────────────────────────────────────────────────────────┤
│  ns-translate: pyhf JSON, HS3 v0.2, TREx config, Arrow   │
│  ns-root: native ROOT file reader + zstd decompression   │
├──────────────────────────────────────────────────────────┤
│  ns-core: HistFactoryModel, parameters, interpolation    │
└──────────────────────────────────────────────────────────┘

Compatibility Target: pyhf JSON / HistFactory

The primary compatibility target is the pyhf JSON schema. NextStat aims to:

  • Parse pyhf JSON workspaces and build an internal model representation
  • Match pyhf's canonical objective (logpdf and twice_nll) in deterministic mode (see the sketch after this list)
  • Match fit results (best-fit, uncertainties, profile scans) within documented tolerances
  • Provide a fast production mode (SIMD, Accelerate, CUDA) without breaking the reference contract
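
Because pyhf's logpdf is the reference objective, the parity contract can be phrased directly against the public pyhf API. The sketch below builds a two-bin counting model with pyhf and evaluates the quantities NextStat is expected to reproduce; the numbers are arbitrary, and only the pyhf side of the comparison is shown.

    import pyhf

    # Two-bin counting experiment with an uncorrelated background uncertainty.
    model = pyhf.simplemodels.uncorrelated_background(
        signal=[5.0, 10.0], bkg=[50.0, 60.0], bkg_uncertainty=[5.0, 6.0]
    )
    data = [55.0, 65.0] + model.config.auxdata    # main bins + auxiliary data
    init = model.config.suggested_init()

    logpdf = model.logpdf(init, data)             # pyhf's canonical objective
    twice_nll = -2.0 * logpdf                     # the quantity matched in parity mode
    best_fit = pyhf.infer.mle.fit(data, model)    # reference best-fit parameters
    print(twice_nll, best_fit)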

Validation Methodology

NextStat uses a dual-mode validation strategy:

  • Parity mode: Kahan summation, single-threaded, Accelerate disabled — deterministic CI reference
  • Fast mode: naive summation, Rayon parallelism, SIMD/GPU — production inference
  • 7-tier tolerance hierarchy from per-bin expected data (1e-12) to toy ensemble statistics (0.05); see the sketch after this list
  • Three-way cross-checks against pyhf and ROOT/RooFit on canonical HistFactory fixtures
  • Continuous integration: pyhf parity is CI-gated; ROOT comparison is informational
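
Operationally, a tiered contract reduces to elementwise comparisons at per-tier tolerances. The sketch below shows the idea with numpy; the tightest and loosest values (1e-12 for per-bin expected data, 0.05 for toy ensemble statistics) are taken from the list above, while the intermediate tier shown is a hypothetical placeholder, not the actual contract.

    import numpy as np

    # Tolerance per comparison tier.  Only the tightest and loosest values are
    # quoted above; "best_fit" is a hypothetical middle tier for illustration.
    TOLERANCES = {
        "expected_data": 1e-12,   # per-bin expected data (deterministic parity)
        "best_fit": 1e-6,         # hypothetical intermediate tier
        "toy_ensemble": 0.05,     # statistical agreement of toy ensembles
    }

    def check_parity(tier, nextstat_values, reference_values):
        """True if the two result vectors agree within the tier's tolerance."""
        tol = TOLERANCES[tier]
        return np.allclose(nextstat_values, reference_values, rtol=tol, atol=tol)

    print(check_parity("expected_data", [55.0, 65.0], [55.0, 65.0]))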

Performance Highlights

  Operation                                 Speedup
  Profile scan (31 points) vs ROOT          283×–880×
  Profile scan vs pyhf                      37×–75×
  Batch toys (1000, CUDA) vs serial         ~200×
  TTree parse + histogram fill vs uproot    ~8.5×
  Ranking (16 NPs, AD) vs pyhf FD           ~4×
  Zstd decompression (ns-zstd fork)         ~783 MiB/s (PGO)

Non-Goals (Near-Term)

  • Exact duplication of every pyhf backend and feature
  • Full numerical identity across all parallel backends and hardware (GPU/threads)
  • Replacing analysis frameworks or end-user modeling tools
  • Full drop-in replacement for the entire statsmodels surface area

License

AGPL-3.0-or-later with a dual commercial license option. See LICENSE and LICENSE-COMMERCIAL in the repository root.