NextStat

Changelog

Format: Keep a Changelog · Semantic Versioning

[Unreleased]

Added

  • Unified Python API — merged model-type and device variants into single functions with runtime dispatch: ranking(), hypotest()/hypotest_toys(), profile_scan(), fit_toys(), upper_limit(). All accept device="cpu"|"cuda"|"metal" and dispatch on HistFactoryModel vs UnbinnedModel automatically. Old unbinned_*, *_gpu, *_batch_gpu variants removed. See the sketch after this list.
  • TypedDict return types — ~25 structured TypedDict definitions (RankingEntry, ProfileScanResult, HypotestToysMetaResult, PanelFeResult, etc.) replacing opaque Dict[str, Any]. IDE autocomplete now works for all inference functions.
  • profile_scan(return_curve=True) — merges former profile_curve() into profile_scan() with plot-friendly arrays (mu_values, q_mu_values, twice_delta_nll).
  • upper_limit(method="root") — merges former upper_limits_root() into upper_limit() with method="bisect"|"root".
  • LAPS Metal backend — GPU-accelerated MAMS sampler on Apple Silicon (M1–M4) via Metal Shading Language. Built-in models (StdNormal, EightSchools, NealFunnel, GlmLogistic) with f32 compute, fused multi-step kernel, and SIMD-group cooperative kernel for data-heavy GLM. Automatic fallback: CUDA (f64) > Metal (f32) > error.
  • LAPS windowed mass adaptation — warmup Phase 2+3 now uses Stan-style doubling windows for inv_mass estimation. Each window resets Welford statistics and dual averaging, improving convergence on multi-scale models. Configurable via n_mass_windows.
  • LAPS Riemannian MAMS for Neal Funnel — new neal_funnel_riemannian model with position-dependent Fisher metric. Uses effective potential where log-determinant terms cancel, yielding scale-invariant natural gradients. Available on both Metal (f32) and CUDA (f64).
  • BCa confidence interval engine — reusable bootstrap CI utilities in ns-inference (percentile + bca) with diagnostics (z0, acceleration, adjusted alphas, sample counts).
  • HEP toy-summary CI controls — nextstat unbinned-fit-toys now supports opt-in summary CI computation: --summary-ci-method percentile|bca, --summary-ci-level, --summary-ci-bootstrap. Output includes summary.mean_ci with requested/effective method and fallback diagnostics.
  • Churn bootstrap CI method selection — nextstat churn bootstrap-hr now supports --ci-method percentile|bca (default percentile) and --n-jackknife for BCa acceleration estimation. Output includes method metadata and per-coefficient diagnostics with fallback reason.
  • Python churn parity for CI methods — nextstat.churn_bootstrap_hr() now accepts ci_method and n_jackknife and returns per-coefficient effective method/diagnostics.
  • Single-artifact visualization renderer (nextstat viz render) — direct rendering of JSON viz artifacts to image outputs (pulls, corr, ranking) without full report generation. Supports title/DPI options and correlation filters (--corr-include, --corr-exclude, --corr-top-n).
  • Rootless HEP dataset generator — bench_hep_dataset_bootstrap_ci.py now supports --root-writer auto|uproot|root-cli and defaults to rootless uproot when available.
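
As an illustration of the unified API, a minimal sketch follows. The workspace path, the from_workspace() loading call, and positional argument placement are assumptions; the function names, the device=, method=, and return_curve= options, and the result keys are as listed above.

```python
import nextstat

# Load a pyhf JSON workspace (format auto-detection per the 0.9.0 notes);
# the exact loading call shown here is an assumption.
model = nextstat.HistFactoryModel.from_workspace(open("workspace.json").read())

# One entry point per task; model-type and device dispatch is automatic.
entries = nextstat.ranking(model, device="cpu")            # list of RankingEntry
scan = nextstat.profile_scan(model, return_curve=True)     # ProfileScanResult
print(scan["mu_values"][:3], scan["twice_delta_nll"][:3])  # plot-friendly arrays

# upper_limit() folds in the former upper_limits_root() via method=.
limit = nextstat.upper_limit(model, method="root", device="cuda")
```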

Fixed

  • panel_fe() parameter order — changed from (entity_ids, x, y, p) to (y, x, entity_ids, p) to match the econometrics module convention (y first); see the sketch after this list.
  • HS3/pyhf CLI scope clarity — corrected command help/docs and added explicit fail-fast diagnostics for pyhf-only commands when HS3 input is provided.
  • Arrow/Parquet from_arrow large-offset compatibility — fixed ingestion for Utf8/LargeUtf8 and List<Float64>/LargeList<Float64> in core paths, so Arrow tables from Polars/DuckDB are accepted without pre-normalization.
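
A minimal sketch of the corrected panel_fe() call (response first). The array shapes and the meaning of p (taken here as the number of regressors) are assumptions:

```python
import numpy as np
import nextstat

rng = np.random.default_rng(0)
entity_ids = np.repeat(np.arange(20), 5)       # 20 entities x 5 periods
x = rng.normal(size=(100, 3))                  # regressors
y = x @ np.array([0.5, -1.0, 2.0]) + rng.normal(size=100)

# New order: (y, x, entity_ids, p), with y first as in the econometrics module.
result = nextstat.panel_fe(y, x, entity_ids, 3)
```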

[0.9.5] — 2026-02-15

Fixed

  • PyPI wheel coverage — pip install nextstat now works out of the box on all major platforms. Pre-built wheels for Linux x86_64/aarch64, macOS arm64/x86_64, and Windows x86_64 (Python 3.11–3.13).
  • Linux x86_64 glibc compatibility — wheels now target manylinux_2_17 (glibc 2.17+) instead of manylinux_2_35, fixing installation on CentOS 7/8, RHEL 7+, Ubuntu 18.04+.
  • macOS Intel support — added x86_64-apple-darwin target to the release matrix. Intel Mac users no longer fall back to source builds.
  • Multi-interpreter wheel builds — Linux wheels built inside manylinux Docker container; macOS/Windows use explicit setup-python with 3.11/3.12/3.13.

Added

  • HEP Full Workflow Tutorial — comprehensive 1200-line tutorial covering workspace construction, all modifier types, MLE fitting, CLs hypothesis testing, upper limits (Brazil band), NP ranking, pulls, correlation matrix, profile likelihood scans, workspace combination, mass scans, GPU acceleration, preprocessing, and automated reports.
  • Detailed Installation & Quickstart guides — rewritten with step-by-step instructions, expected outputs, troubleshooting sections, and GPU acceleration flags.

[0.9.4] — 2026-02-15

Native ROOT I/O

  • LRU basket cache — decompressed TTree basket payloads are cached per-RootFile with byte-bounded LRU eviction (default 256 MiB). Eliminates redundant decompression on repeated branch reads. RootFile::basket_cache() for stats, RootFile::set_cache_config() to tune.
  • Lazy branch reader — RootFile::lazy_branch_reader() decompresses only the baskets needed for the requested entries. read_f64_at(entry) touches one basket; read_f64_range(start, end) touches only overlapping ones.
  • ChainedSlice — zero-copy concatenation of multiple decompressed basket payloads via Arc sharing. O(log n) random access across non-contiguous segments.
  • ROOT leaflist parsing — ns-root now parses compound leaf-list branches (multiple scalars packed per entry).

ns-zstd Performance

  • Encoder hot-path optimizations — 20+ targeted improvements to the pure-Rust Zstd encoder: faster FSE state transitions, packed sequence bit writes, hash-chain collision reduction via head tags and u64 reject, common-prefix u128 comparison, fast-path depth-1 search, lazy/history check reduction, no-match skip heuristic.
  • zstd-shim — transparent backend selection crate: uses native libzstd on desktop for maximum throughput, falls back to pure-Rust ns-zstd on WASM and embedded targets.

GPU Acceleration

  • Multi-channel GPU batch toys (Metal + CUDA) — batch toy fitter now handles workspaces with multiple channels on both GPU backends.
  • Unbinned CUDA multi-GPU batch toys (--gpu-devices) — nextstat unbinned-fit-toys and nextstat unbinned-hypotest-toys can shard host-sampled toys across multiple CUDA devices.
  • Unbinned CUDA device-resident shard orchestration (--gpu-shards) — sharded --gpu-sample-toys execution with round-robin device mapping. Single-GPU emulation via --gpu cuda --gpu-sample-toys --gpu-shards N.
  • Unbinned CUDA host-toy shard orchestration — CUDA toy workflows support sharded host-toy execution (pipeline: cuda_host_sharded) with shard plan exposed in metrics.
  • Unbinned CUDA sharded toy-path metrics/tests — integration coverage for --gpu-sample-toys --gpu-shards with metrics contract checks for pipeline: cuda_device_sharded and device_shard_plan.
  • Metal --gpu-sample-toys — device-resident toy sampling on Apple Silicon (previously CUDA-only).
  • Parquet observed data for unbinned --gpu — unbinned GPU path can now ingest observed data directly from Parquet files.
  • TensorRT execution provider for neural PDFs — --features neural-tensorrt enables TensorRT EP with FP16 inference, engine caching (~/.cache/nextstat/tensorrt/), and dynamic batch-size optimization profiles. Automatic fallback chain: TensorRT → CUDA EP → CPU. FlowGpuConfig for custom TRT settings; FlowPdf::from_manifest_with_config() constructor; FlowPdf::gpu_ep_kind() for runtime introspection.
  • Analytical Jacobian gradients for flow PDFs — when the manifest includes a log_prob_grad ONNX model, FlowPdf::log_prob_grad_batch() computes exact ∂ log p / ∂ context in a single forward pass instead of 2 × n_context finite-difference evaluations. Fused CUDA kernel flow_nll_grad_reduce computes NLL + gradient intermediates in one launch; GpuFlowSession::nll_grad_analytical() assembles the full gradient on CPU.
  • Multi-GPU batch toy fitting for flow PDFs — fit_flow_toys_batch_cuda() runs lockstep L-BFGS-B across thousands of toys using the flow_batch_nll_reduce kernel (1 block = 1 toy). shard_flow_toys() partitions toys evenly across devices; fit_flow_toys_batch_multi_gpu() drives all GPUs and merges results.
  • CUDA EP f32 zero-copy NLL — GpuFlowSession::nll_device_ptr_f32() accepts a raw CUDA device pointer to float log-probs from ONNX Runtime CUDA EP, eliminating the host-to-device memcpy. Up to 57× faster than the f64 host-upload path at typical event counts (~1K). Python: GpuFlowSession.nll_device_ptr_f32(ptr, params).
  • PF3.1-OPT2: Parallel device-resident multi-GPU — cuda_device_sharded pipeline runs each shard on its own thread with its own CUDA context via std::thread::scope, enabling true multi-GPU concurrency. Validated on 4× A40: 2 GPUs = 1.97× (near-linear), 3 GPUs = 2.9×. Before the fix: flat 1.00×.
  • PF3.1 2× H100 benchmark snapshot — full unbinned toy matrix (10k/50k/100k) for CPU, 1-GPU/2-GPU host-toy, and device-resident sharded paths.
  • PF3.1 single-GPU runtime snapshot (RTX 4000 Ada) — 10k/20k validation artifacts for worker-per-device orchestration.
  • CLI --gpu cuda for flow PDFs (G2-R1) — nextstat unbinned-fit-toys --gpu cuda supports flow/conditional_flow/dcr_surrogate PDFs. CPU sampling → CPU logp → CUDA NLL reduction via flow_batch_nll_reduce kernel → lockstep L-BFGS-B.
  • Normalization grid cache — QuadratureGrid::auto() selects Gauss-Legendre (1–3D), tensor product (4–5D), or Halton quasi-Monte Carlo (6D+). NormalizationCache avoids recomputation when parameters haven't changed.

Hybrid Likelihood (Phase 4)

  • HybridLikelihood<A, B> — generic combined likelihood that sums NLLs from two LogDensityModel implementations with shared parameters. Implements LogDensityModel, PoiModel, and FixedParamModel.
  • SharedParameterMap — merges parameter vectors from two models by name. Shared parameters get intersected bounds; model-specific parameters are appended.
  • CLI nextstat hybrid-fit — --binned (pyhf/HS3 JSON) + --unbinned (YAML/JSON spec). Prints a summary of shared/total parameters, runs MLE via HybridLikelihood.
  • WeightSummary diagnostics — EventStore::weight_summary() returns ESS, sum/min/max/mean weights, n_zero.
  • HS3 unbinned extension — Hs3UnbinnedDist extends the HS3 v0.2 schema with event-level channels. export_unbinned_hs3() and export_hybrid_hs3() for serialization.
  • Fused single-pass CPU NLL kernel — fused_gauss_exp_nll computes per-event log-likelihood inline for the Gaussian+Exponential topology, eliminating intermediate allocations. Adaptive parallelism: sequential for N < 8k, rayon par_chunks(1024) for N ≥ 8k.
  • SIMD-vectorized fused kernel (wide::f64x4) — processes 4 events per iteration (AVX2 on x86, NEON on ARM). 4–12× speedup over generic path on x86; 2–5× on ARM. ~770 M events/s NLL throughput at 100k events on x86.
  • Unbinned benchmark suite — Criterion benchmarks: ~770 M events/s NLL throughput on x86 (fused+SIMD); full 5-param fit in 637 µs at 10k events.
  • Fused CrystalBall+Exponential kernel — fused_cb_exp_nll extends the fused single-pass path to the CB+Exp topology. Scalar event loop with rayon adaptive parallelism; analytical gradients for all 5 shape parameters. 2.7–5.8× speedup over generic on M5.
  • Unbinned fused-vs-generic benchmark mode — UnbinnedModel::nll_generic() / grad_nll_generic() force the generic multi-pass path for apples-to-apples CPU comparison against the default fused path.

Cross-Vertical Statistical Features

  • Gamma GLM (GammaRegressionModel) — Gamma distribution with log link, shared shape parameter α. Analytical NLL and gradient. For insurance claim amounts, hospital costs, strictly positive continuous responses.
  • Tweedie GLM (TweedieRegressionModel) — compound Poisson-Gamma with power p ∈ (1, 2), log link. Saddle-point NLL approximation (Dunn & Smyth 2005). Handles exact zeros. For insurance aggregate claims, rainfall, zero-inflated positive data.
  • GEV distribution (GevModel) — Generalized Extreme Value for block maxima (Fréchet ξ>0, Gumbel ξ≈0, Weibull ξ<0). MLE via L-BFGS-B with analytical gradient. return_level(T) for T-block return levels. For reinsurance, hydrology, climate extremes.
  • GPD distribution (GpdModel) — Generalized Pareto for peaks-over-threshold. MLE with analytical gradient. quantile(p) for excess quantiles (VaR/ES). For tail risk in finance, reinsurance pricing.
  • Meta-analysis (meta_fixed, meta_random) — fixed-effects (inverse-variance) and random-effects (DerSimonian–Laird) pooling. Heterogeneity: Cochran's Q, I², H², τ². Forest plot data with per-study weights and CIs. For pharma, epidemiology, social science.
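
For reference, the T-block return level reported by return_level(T) is presumably the standard GEV quantile: z_T = μ + (σ/ξ)·[(−ln(1 − 1/T))^(−ξ) − 1] for ξ ≠ 0, reducing to z_T = μ − σ·ln(−ln(1 − 1/T)) in the Gumbel limit ξ → 0; this is the level exceeded on average once every T blocks.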

WASM Playground

  • Slim 454 KB binary — stripped Arrow/Parquet from WASM build, custom [profile.release-wasm] with opt-level = "z", lto = "fat", strip = true, plus wasm-opt -Oz. Down from 5.7 MB to 454 KB.
  • UI polish — standard site header with logo, compact single-screen layout, loading spinner on Run, ⌘+Enter shortcut, auto-load simple example on first visit, converged/failed badges in MLE Fit, fade-in animations.
  • Guided examples — 4 HistFactory + 3 GLM examples with contextual descriptions. Dropdown filtered by active operation tab — only compatible examples shown.
  • Auto-run on tab switch — switching between workspace operations auto-runs on the loaded workspace. GLM ↔ workspace transitions clear the editor to prevent format mismatches.
  • GLM Regression tab — run_glm() WASM endpoint: linear, logistic, Poisson regression via L-BFGS-B. Model/intercept selectors, parameter table in results.
  • Mass Scan (Type B Brazil Band) — ATLAS/CMS-style exclusion plot: 95% CL upper limit on μ vs signal peak position with ±1σ/±2σ expected bands.

Inference Server (ns-server)

  • API key authentication — --api-keys <file> or NS_API_KEYS env var. Bearer token validation on all endpoints except GET /v1/health.
  • Per-IP rate limiting — --rate-limit <N> (requests/second/IP). Token-bucket with lazy prune.
  • POST /v1/unbinned/fit — unbinned MLE fit endpoint.
  • POST /v1/nlme/fit — NLME / PK population fit endpoint. Supports pk_1cpt and nlme_1cpt. LLOQ policies: ignore, replace_half, censored.
  • Async job system — POST /v1/jobs/submit, GET /v1/jobs/{id}, DELETE /v1/jobs/{id}. In-memory store with TTL pruning, cancellation tokens.
  • GET /v1/openapi.json — OpenAPI 3.1 specification covering all 16 endpoints.

Survival Analysis (Non-parametric)

  • Kaplan-Meier estimator — nextstat.kaplan_meier(times, events, conf_level): non-parametric survival curve with Greenwood variance, log-log transformed CIs, median survival, number-at-risk table; see the sketch after this list.
  • Log-rank test — nextstat.log_rank_test(times, events, groups): Mantel-Cox chi-squared test for 2+ groups.
  • CLI: nextstat survival km, nextstat survival log-rank-test.
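
A minimal sketch using the signatures above (keyword usage for conf_level is an assumption; the keys of the returned structures are not shown here):

```python
import nextstat

times = [5.0, 8.0, 12.0, 12.0, 21.0, 30.0]
events = [1, 1, 0, 1, 0, 1]        # 1 = event observed, 0 = right-censored
groups = [0, 0, 0, 1, 1, 1]        # two treatment arms

km = nextstat.kaplan_meier(times, events, conf_level=0.95)  # survival curve + CIs
lr = nextstat.log_rank_test(times, events, groups)          # Mantel-Cox test
```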

Subscription / Churn Vertical

  • Synthetic SaaS churn dataset — nextstat.churn_generate_data(): deterministic, seeded cohort data with right-censored churn times, plan/region/usage covariates, and treatment assignment; see the workflow sketch after this list.
  • Cohort retention — nextstat.churn_retention(): stratified KM per group + log-rank comparison.
  • Churn risk model — nextstat.churn_risk_model(): Cox PH hazard ratios with CIs.
  • Causal uplift — nextstat.churn_uplift(): AIPW-based intervention impact with Rosenbaum sensitivity.
  • CLI: nextstat churn generate-data, nextstat churn retention, nextstat churn risk-model, nextstat churn uplift.
  • Diagnostics trust gate — nextstat.churn_diagnostics(): censoring rates per segment, covariate balance (SMD), propensity overlap, sample-size adequacy.
  • Cohort retention matrix — nextstat.churn_cohort_matrix(): life-table per cohort with period boundaries, cumulative retention.
  • Segment comparison — nextstat.churn_compare(): pairwise log-rank + HR proxy + Benjamini-Hochberg / Bonferroni MCP correction.
  • Survival-native uplift — nextstat.churn_uplift_survival(): RMST, IPW-weighted KM, ΔS(t) at eval horizons.
  • Bootstrap hazard ratios — nextstat.churn_bootstrap_hr(): parallel bootstrap Cox PH with percentile CIs.
  • Data ingestion — nextstat.churn_ingest(): validate & clean raw customer arrays, observation-end censoring cap.
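
A hedged sketch of the churn workflow; whether each step accepts the generated dataset object directly, and all argument names, are assumptions:

```python
import nextstat

data = nextstat.churn_generate_data()        # deterministic, seeded cohort data

checks = nextstat.churn_diagnostics(data)    # trust gate before modeling
retention = nextstat.churn_retention(data)   # stratified KM + log-rank
hazards = nextstat.churn_risk_model(data)    # Cox PH hazard ratios with CIs
uplift = nextstat.churn_uplift(data)         # AIPW intervention impact
```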

CLI

  • nextstat mass-scan — batch asymptotic CLs upper limits across multiple workspaces (Type B Brazil Band).
  • nextstat significance — discovery significance (p₀ and Z-value).
  • nextstat goodness-of-fit — saturated-model goodness-of-fit test (χ²/ndof and p-value).
  • nextstat combine — merge multiple pyhf JSON workspaces into a single combined workspace with automatic systematic correlation.
  • nextstat fit --asimov — blind fit on Asimov (expected) data.
  • nextstat viz gammas — postfit γ parameter values with prefit/postfit uncertainties.
  • nextstat viz summary — multi-fit μ summary artifact from multiple fit result JSONs.
  • nextstat viz pie — sample composition pie chart per channel.
  • nextstat viz separation — signal vs background shape comparison with separation metric.
  • nextstat preprocess smooth — native Rust 353QH,twice smoothing for HistoSys templates.
  • nextstat preprocess prune — native Rust pruning of negligible systematics.

pyhf Feature Parity

  • Workspace::prune() — remove channels, samples, modifiers, or measurement POIs by name.
  • Workspace::rename() — rename channels, samples, modifiers, or measurement POIs.
  • Workspace::sorted() — return workspace with sorted channels, samples, and modifiers.
  • Workspace::digest() — SHA-256 content digest of canonicalised workspace JSON.
  • Workspace::combine() — merge two workspaces with configurable channel join semantics.
  • pyhf::simplemodels — uncorrelated_background() and correlated_background() quick workspace builders.
  • pyhf::xml_export — export pyhf workspace to HistFactory XML format.
  • HistoSys interpolation code2 — quadratic interpolation with linear extrapolation. Scalar, SIMD, and tape-based AD paths.
  • Test statistics t_μ and t̃_μ — Eq. 8 and Eq. 11 of arXiv:1007.1727.
  • OptimizerStrategy presets — Default, MinuitLike, HighPrecision.
  • docs/pyhf-parity.md — comprehensive feature matrix.

Econometrics & Causal Inference (Phase 12)

  • Panel fixed-effects regression — entity-demeaned OLS with Liang–Zeger cluster-robust SE. panel_fe_fit() in Rust, nextstat.panel_fe() in Python.
  • Difference-in-Differences (DiD) — canonical 2×2 estimator and multi-period event-study with leads/lags. Python: nextstat.did(), nextstat.event_study().
  • Instrumental Variables / 2SLS — two-stage least squares with first-stage F-statistic, partial R², Stock–Yogo test. Python: nextstat.iv_2sls().
  • AIPW (Doubly Robust) — Augmented Inverse Probability Weighting for ATE. Influence-function SE, propensity score trimming. Python: nextstat.aipw_ate().
  • Rosenbaum sensitivity analysis — Wilcoxon signed-rank bounds for matched-pair sensitivity to unobserved confounding. Python: nextstat.rosenbaum_bounds().
  • docs/references/econometrics.md — reference documentation with code examples, assumptions table, and limitations.
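
An illustrative sketch of the Python entry points above; the positional argument layouts below are assumptions, so consult docs/references/econometrics.md for the confirmed signatures:

```python
import numpy as np
import nextstat

rng = np.random.default_rng(0)
n = 200
y = rng.normal(size=n)                       # outcome
treated = rng.integers(0, 2, size=n)         # treatment indicator
post = rng.integers(0, 2, size=n)            # post-period indicator
covariates = rng.normal(size=(n, 3))

att = nextstat.did(y, treated, post)             # canonical 2x2 DiD
ate = nextstat.aipw_ate(y, treated, covariates)  # doubly robust ATE
```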

API Stabilization

  • ns-core re-exports — LogDensityModel, PoiModel, FixedParamModel, PreparedNll, PreparedModelRef now re-exported from the crate root.
  • Deprecated Model trait — superseded by LogDensityModel. Will be removed in 1.0.
  • ns-inference re-exports — added scan, scan_metal, NegativeBinomialRegressionModel, QualityGates, compute_diagnostics, quality_summary to crate root.
  • nextstat.unbinned.UnbinnedAnalysis — high-level workflow wrapper over UnbinnedModel: .from_config(), .fit(), .scan(), .hypotest(), .hypotest_toys(), .ranking(), .summary().
  • Python __all__ completeness — added volatility, UnbinnedModel, HybridModel, and 9 more to nextstat.__all__.
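
A sketch of the high-level unbinned workflow; from_config() accepting a file path, and the path itself, are assumptions, while the method names are as listed above:

```python
from nextstat.unbinned import UnbinnedAnalysis

ana = UnbinnedAnalysis.from_config("analysis.yaml")  # unbinned spec (YAML/JSON)
fit = ana.fit()            # MLE
scan = ana.scan()          # profile likelihood scan
test = ana.hypotest()      # CLs hypothesis test
print(ana.summary())
```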

Fixed

  • L-BFGS steepest-descent fallback — optimizer now correctly falls back to steepest descent when the L-BFGS update produces a non-descent direction.

[0.9.0] — 2026-02-09

Major release

Neural Density Estimation

  • Flow PDF — ONNX-backed normalizing flow as an unbinned PDF. Loads pre-trained flows from flow_manifest.json + ONNX models. Supports unconditional and conditional flows with nuisance parameters as context. Spec YAML: type: flow / type: conditional_flow. Feature-gated: --features neural.
  • DCR Surrogate — neural Direct Classifier Ratio surrogate replacing binned template morphing. Drop-in replacement for morphing histograms — smooth, continuous, bin-free systematic morphing trained via FAIR-HUC protocol. Spec YAML: type: dcr_surrogate.
  • Unbinned spec YAML supports flow, conditional_flow, and dcr_surrogate PDF types with automatic feature gating.
  • Normalization verification — Gauss-Legendre quadrature (orders 32–128) for normalization verification and correction of neural PDFs.
  • Training helpers — Python scripts for flow training (zuko NSF + ONNX export), DCR distillation from HistFactory templates, and validation (normalization, PIT/KS, closure checks).
  • Python bindings for FlowPdf / DcrSurrogate — nextstat.FlowPdf and nextstat.DcrSurrogate classes in ns-py behind --features neural. Standalone ONNX flow evaluation from Python: from_manifest(), log_prob_batch(), update_normalization(), validate_nominal_normalization().

Documentation

  • Unbinned spec reference — docs/references/unbinned-spec.md: dedicated human-readable reference for nextstat_unbinned_spec_v0 covering all 13 PDF types, yield expressions, rate modifiers (NormSys + WeightSys), per-event systematics, neural PDFs, and GPU acceleration constraints.

GPU Acceleration

  • CUDA (NVIDIA, f64) — fused NLL+gradient kernel covering all 7 HistFactory modifier types in a single launch. Lockstep batch optimizer fits thousands of toys in parallel. Dynamic loading via cudarc — binary works without CUDA installed.
  • Metal (Apple Silicon, f32) — same fused kernel in MSL. Zero-copy unified memory. NLL parity vs CPU f64: 1.27e-6 relative diff.
  • Apple Accelerate — vDSP/vForce vectorized NLL on macOS. <5% overhead vs naive summation.
  • GPU-resident toy pipeline (CUDA) — --gpu-sample-toys now keeps sampled events on the GPU device, eliminating the D2H+H2D round-trip of the large obs_flat buffer between sampler and batch fitter.
  • Unbinned GPU WeightSys — weightsys rate modifier now lowered to CUDA/Metal kernels (code0/code4p interpolation). Spec YAML: type: weightsys, param, lo, hi, optional interp_code.
  • CPU batch toys — Rayon-parallel fitting with per-thread tape reuse, seed-based reproducibility.
  • Reverse-mode tape — faster gradient computation with reduced memory allocation.
  • CLI: --gpu cuda, --gpu metal · Python: device="cuda", device="metal"

Differentiable Analysis (PyTorch)

  • Zero-copy CUDA kernel reads signal histogram from a PyTorch tensor and writes dNLL/dsignal directly into the grad buffer — no device-host roundtrip.
  • DifferentiableSession: NLL + signal gradient at fixed nuisance parameters.
  • ProfiledDifferentiableSession: profiled test statistics with envelope-theorem gradients — enables NN → signal histogram → profiled CLs → loss.
  • nextstat.torch module: NextStatNLLFunction, NextStatProfiledQ0Function (autograd), NextStatLayer(nn.Module).
  • profiled_zmu_loss() — Zμ loss wrapper (sqrt(qμ) with numerical stability) for signal-strength optimization.
  • SignificanceLoss(model) — ML-friendly class wrapping profiled −Z₀. Init once, call per-batch: loss_fn(signal_hist).backward().
  • SoftHistogram — differentiable binning (Gaussian KDE / sigmoid): NN classifier scores → soft histogram → SignificanceLoss.
  • batch_profiled_q0_loss() — profiled q₀ for a batch of signal histograms (ensemble training).
  • signal_jacobian(), signal_jacobian_numpy() — direct ∂q₀/∂signal without autograd for SciPy bridge and fast pruning.
  • as_tensor() — DLPack array-API bridge: JAX, CuPy, Arrow, NumPy → torch.Tensor.
  • nextstat.mlops — fit metrics extraction for W&B / MLflow / Neptune: metrics_dict(result), significance_metrics(z0), StepTimer.
  • nextstat.interpret — systematic-impact ranking as Feature Importance: rank_impact(model), rank_impact_df(), plot_rank_impact().
  • nextstat.tools — LLM tool definitions (OpenAI function calling, LangChain, MCP) for 7 operations: fit, hypotest, upper_limit, ranking, significance, scan, workspace_audit.
  • nextstat.distill — surrogate training dataset generator. generate_dataset(model, n_samples=100k, method="sobol") produces (params, NLL, gradient) tuples. Export to PyTorch TensorDataset, .npz, or Parquet. Built-in train_mlp_surrogate() with Sobolev loss.
  • Fit convergence check: returns error if GPU profile fit fails to converge.
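
A sketch of the NN → soft histogram → significance-loss loop described above. The workspace path, classifier architecture, and SoftHistogram constructor arguments are assumptions; SignificanceLoss(model) and the per-batch loss_fn(signal_hist).backward() call are as documented:

```python
import torch
import nextstat
from nextstat.torch import SignificanceLoss, SoftHistogram

# Workspace loading as in the HistFactory sections; the path is illustrative.
model = nextstat.HistFactoryModel.from_workspace(open("workspace.json").read())

classifier = torch.nn.Sequential(
    torch.nn.Linear(4, 16), torch.nn.ReLU(),
    torch.nn.Linear(16, 1), torch.nn.Sigmoid(),
)
batch = torch.randn(256, 4)

loss_fn = SignificanceLoss(model)                     # init once: wraps profiled -Z0
soft_hist = SoftHistogram(bins=10, range=(0.0, 1.0))  # ctor args are assumptions

scores = classifier(batch).squeeze(-1)                # NN scores in [0, 1]
loss = loss_fn(soft_hist(scores))                     # differentiable binning -> loss
loss.backward()                                       # gradients reach the classifier
```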

Gymnasium RL Environment

  • nextstat.gym — optional Gymnasium/Gym wrapper treating a HistFactory workspace as an RL/DOE environment.
  • Propose updates to a sample's nominal yields, receive a NextStat metric as reward (NLL, q₀, Z₀, qμ, Zμ).
  • make_histfactory_env() factory with configurable reward_metric, action_mode, action_scale.
  • Compatible with gymnasium (preferred) and legacy gym.
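
A sketch of the standard Gymnasium loop against this environment. The factory and reward metrics are documented above; the workspace payload form, the string spelling of reward_metric, and episode handling follow the generic gymnasium API rather than anything NextStat-specific:

```python
from nextstat.gym import make_histfactory_env

workspace = open("workspace.json").read()   # payload form is an assumption
env = make_histfactory_env(workspace, reward_metric="z0")

obs, info = env.reset(seed=0)
for _ in range(10):
    action = env.action_space.sample()      # propose a nominal-yield update
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
```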

Deterministic Validation

  • EvalMode — process-wide flag: Parity (Kahan summation, single-threaded, bit-exact) vs Fast (default, SIMD/GPU, multi-threaded).
  • CLI: --parity · Python: nextstat.set_eval_mode("parity")
  • 7-tier tolerance contract vs pyhf (per-bin ~1e-14 worst case).

Native ROOT I/O

  • TTree reader — mmap file access, native binary deserialization, basket decompression (zlib/LZ4/ZSTD) with rayon-parallel extraction. 9 leaf types + jagged branches.
  • Expression engine — bytecode-compiled, vectorized. Full grammar: arithmetic, comparisons, boolean logic, ternary, builtins. Dynamic jagged indexing (jet_pt[idx]) follows ROOT/TTreeFormula convention. Python wrapper: nextstat.analysis.expr_eval.
  • Histogram filler — single-pass with selection cuts, weights, variable binning.
  • Unsplit vector branch decoding — best-effort decoding for std::vector<T> branches without offset tables.
  • ~8.5× faster than uproot+numpy on the full pipeline.

Ntuple-to-Workspace Pipeline

  • NtupleWorkspaceBuilder: ROOT ntuples → HistFactory Workspace via fluent Rust API.
  • Per-sample modifiers: NormFactor, NormSys, WeightSys, TreeSys, HistoSys, StatError.
  • Produces the same Workspace struct as the pyhf JSON path — no ROOT C++ dependency.

TRExFitter Interop

  • nextstat import trex-config — import TRExFitter .config into pyhf JSON workspace.
  • nextstat build-hists — run NTUP pipeline, write workspace.json.
  • HIST mode — read pre-built ROOT histograms (ReadFrom: HIST) alongside NTUP.
  • Analysis Spec v0 (YAML + JSON Schema) — nextstat run <spec.yaml> orchestrates import/fit/scan/report.
  • Jagged column support and TRExFitter-style expression compatibility.

Systematics Preprocessing

  • Smoothing: 353QH,twice algorithm (ROOT TH1::Smooth equivalent) + Gaussian kernel.
  • Pruning: shape, norm, and overall pruning with audit trail.
  • nextstat preprocess CLI with declarative YAML config and content-hash caching.

HistFactory Enhancements

  • HS3 v0.2 ingestion — load HS3 JSON workspaces (ROOT 6.37+) natively. Auto-detects format (pyhf vs HS3) at load time.
  • HS3 roundtrip export — export HistFactoryModel back to HS3 JSON with bestfit parameter points.
  • Python: HistFactoryModel.from_workspace() (auto-detect), HistFactoryModel.from_hs3(json_str). CLI: auto-detection in nextstat fit, nextstat scan.
  • HS3 inputs use ROOT HistFactory defaults (NormSys Code1, HistoSys Code0). For pyhf JSON, NextStat defaults to smooth interpolation (NormSys Code4, HistoSys Code4p); use --interp-defaults pyhf (CLI) or from_workspace_with_settings(Code1, Code0) (Rust) for strict pyhf defaults.
  • HEPData patchset support: nextstat import patchset, Python nextstat.apply_patchset().
  • Arrow / Polars ingestion — nextstat.from_arrow(table) creates a HistFactoryModel from a PyArrow Table, RecordBatch, or any Arrow-compatible source (Polars, DuckDB, Spark). nextstat.from_parquet(path) reads Parquet directly.
  • Arrow export — nextstat.to_arrow(model, what="yields"|"params") exports expected yields or parameter metadata as a PyArrow Table; see the sketch after this list.
  • ConstraintTerm semantics — LogNormal alpha-transform, Gamma constraint for ShapeSys, Uniform and NoConstraint handling. Parsed from <ConstraintTerm> metadata in HistFactory XML.
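
A sketch of the Arrow round-trip; the Parquet file name and its expected column schema are assumptions, while the function names and what= values are as listed above:

```python
import nextstat

model = nextstat.from_parquet("histograms.parquet")   # schema per NextStat docs

yields_tbl = nextstat.to_arrow(model, what="yields")  # PyArrow Table of yields
params_tbl = nextstat.to_arrow(model, what="params")  # parameter metadata
```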

Unbinned Likelihood

  • Product PDF — joint likelihood over independent observables: log p(x,y) = log p₁(x) + log p₂(y). Enables multi-observable unbinned fits without manual factorization.
  • Spline PDF — monotonic cubic Hermite (Fritsch–Carlson) interpolation from user-specified knot positions and density values. Analytically normalized, inverse-CDF sampling for toys.
  • Multi-dimensional KDE — 2-D/3-D Gaussian kernel density estimator with Silverman bandwidth, truncated on bounded observable support.
  • ARGUS PDF — ARGUS background shape for B-meson spectroscopy. Gauss-Legendre normalization on bounded support.
  • Voigtian PDF — pseudo-Voigt (Thompson–Cox–Hastings) resonance line shape. Gaussian ⊗ Breit-Wigner convolution for resonance + detector resolution modeling.
  • Normalization integrals are cached across optimizer iterations, avoiding redundant quadrature when parameters haven't changed.
  • GPU flow NLL reduction — CUDA kernel for extended unbinned likelihood from externally-computed log-prob values (flow PDFs). Supports multi-process logsumexp reduction, Gaussian constraints, and both host-upload and device-resident (ONNX CUDA EP) input paths.
  • GPU flow session — orchestrates flow PDF evaluation (CPU or CUDA EP) with GPU NLL reduction. Central finite-difference gradient, yield computation from parameter vector, and Gaussian constraint handling.

Report System

  • nextstat report — generates distributions, pulls, correlations, yields (.json/.csv/.tex), and uncertainty ranking.
  • Python rendering: multi-page PDF + per-plot SVGs via matplotlib.
  • --blind flag masks observed data for unblinded regions.
  • --deterministic for stable JSON key ordering.
  • nextstat validation-report — unified validation artifact combining Apex2 results with workspace fingerprints. Outputs validation_report.json (schema validation_report_v1) with dataset SHA-256, model spec, environment, regulated-review notes, and per-suite pass/fail summary. Optional --pdf renders a 7-page audit-ready PDF via matplotlib.
  • --json-only flag for validation-pack: generate JSON artifacts without PDF rendering (no matplotlib dependency).
  • Validation pack manifest — validation_pack_manifest.json with SHA-256 hashes and sizes for all pack files. Convenient index for replication and signing.
  • Optional signing — --sign-openssl-key / --sign-openssl-pub flags produce Ed25519 signatures over the manifest digest.

Survival Analysis

  • Parametric models: Exponential, Weibull, LogNormal AFT (with right-censoring).
  • Cox Proportional Hazards: Efron/Breslow ties, robust sandwich SE, Schoenfeld residuals.
  • Python: nextstat.survival.{exponential,weibull,lognormal_aft,cox_ph}.fit(...)
  • CLI: nextstat survival fit, nextstat survival predict

Linear Mixed Models

  • Analytic marginal likelihood (random intercept, random intercept + slope).
  • Laplace approximation for approximate posteriors.

Ordinal Models

  • Ordered logit/probit with stable cutpoint parameterization.

Econometrics & Causal Inference

  • Panel FE with 1-way cluster SE.
  • DiD TWFE + event-study helpers.
  • IV / 2SLS with weak-IV diagnostics (first-stage F, partial R²).
  • AIPW for ATE/ATT + E-value helper. Propensity scores, IPW weights, overlap diagnostics. Python: nextstat.causal.aipw(), nextstat.causal.propensity_scores().
  • GARCH / Stochastic Volatility — GARCH(1,1) and stochastic volatility models for financial time series. CLI: nextstat volatility fit, nextstat volatility forecast. Python: nextstat.volatility.garch(), nextstat.volatility.sv().

Pharmacometrics

  • RK4 integrator for linear ODE systems.
  • One-compartment oral PK model with LLOQ censoring.
  • NLME extension with per-subject random effects.
  • Error model enum — ErrorModel::Additive, Proportional, Combined(σ_add, σ_prop) with variance, NLL, and gradient helpers.
  • 2-compartment PK models — TwoCompartmentIvPkModel (IV bolus, 4 params) and TwoCompartmentOralPkModel (oral, 5 params). Analytical bi/tri-exponential solutions.
  • Dosing regimen — DosingRegimen supporting IV bolus, oral, and IV infusion. Single-dose, repeated-dose, and mixed-route schedules. Closed-form infusion solutions.
  • NONMEM dataset reader — NonmemDataset::from_csv() parses standard NONMEM-format CSV. Auto-converts dosing records to DosingRegimen per subject.
  • FOCE/FOCEI estimation — FoceEstimator: per-subject ETA optimization (damped Newton-Raphson) + population parameter updates. Laplace approximation with ridge-regularized Hessian.
  • Correlated random effects — OmegaMatrix stores full Ω via Cholesky factor (Ω = L·Lᵀ). FoceEstimator::fit_1cpt_oral_correlated() fits with full Ω; FoceResult includes omega_matrix and correlation fields.
  • Stepwise Covariate Modeling (SCM) — ScmEstimator: forward selection + backward elimination using ΔOFV (χ²(1) LRT). Power, proportional, and exponential relationships on CL/V/Ka. Full audit trace.
  • VPC and GOF diagnostics — vpc_1cpt_oral(): Visual Predictive Checks with simulated quantile prediction intervals. gof_1cpt_oral(): PRED, IPRED, IWRES, CWRES.
  • Pharma benchmark suite — Warfarin (32 subjects), Theophylline (12), Phenobarbital (40 neonates). Parameter recovery, GOF, VPC. Includes correlated-Ω variant. cargo test --test pharma_benchmark.
  • NLME artifact schema (v2.0.0) — NlmeArtifact wraps all estimation results into a single JSON-serializable structure. CSV exports for fixed effects, random effects, GOF, VPC, SCM trace.
  • Run bundle (provenance) — RunBundle captures NextStat version, git revision, Rust toolchain, OS/CPU, random seeds, dataset provenance.
  • SAEM algorithm — SaemEstimator: Stochastic Approximation EM for NLME (Monolix-class). Metropolis-Hastings E-step with adaptive proposal variance, closed-form M-step. Returns SaemDiagnostics (acceptance rates, OFV trace). Supports diagonal and correlated Ω.
  • PD models — EmaxModel, SigmoidEmaxModel (Hill equation), IndirectResponseModel (Types I–IV). ODE-based IDR via adaptive RK45. PkPdLink for PK→PD concentration interpolation.
  • Adaptive ODE solvers — rk45() (Dormand-Prince 4(5) with PI step-size control) for non-stiff systems and esdirk4() (L-stable SDIRK2) for stiff systems. Generic OdeSystem trait.

Applied Statistics API

  • Formula parsing + deterministic design matrices (nextstat.formula).
  • from_formula wrappers for all GLM and hierarchical builders.
  • Wald summaries + robust covariance (HC0–HC3, 1-way cluster).
  • scikit-learn adapters: NextStatLinearRegression, NextStatLogisticRegression, NextStatPoissonRegressor.
  • Missing-data policies: drop_rows, impute_mean.
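
A sketch of the scikit-learn adapter usage; the import path is an assumption, while the estimator names and the fit/predict protocol are as listed above:

```python
import numpy as np
from nextstat import NextStatLinearRegression  # import path is an assumption

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

est = NextStatLinearRegression().fit(X, y)     # standard sklearn estimator API
pred = est.predict(X)
```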

WASM Playground

  • Browser-based inference via wasm-bindgen: fit_json(), hypotest_json(), upper_limit_json().
  • Drag-and-drop workspace.json → asymptotic CLs Brazil bands. No Python, no server.

Visualization

  • plot_cls_curve(), plot_brazil_limits(), plot_profile_curve().
  • nextstat viz distributions, viz pulls, viz corr, viz ranking subcommands.
  • Kalman: plot_kalman_states(), plot_forecast_bands().

Pure-Rust Zstd Codec (ns-zstd)

  • ns-zstd crate — pure-Rust Zstd decompressor and compressor for ROOT file I/O. Zero C dependency — enables WASM and embedded targets. Supports compression levels 1–19 with FSE (Finite State Entropy) and Huffman entropy coding. Decode output matches libzstd byte-for-byte (verified via fixture tests). Hash-chain match finder with configurable search depth.

R Bindings

  • nextstat R package — native R interface via extendr (bindings/ns-r/). Provides nextstat_fit(), nextstat_hypotest(), nextstat_upper_limit(), nextstat_scan(), nextstat_ranking() for HistFactory workspaces. Unbinned event Parquet I/O and neural PDF bindings (FlowPdf, DcrSurrogate) also exposed. Install: R CMD INSTALL bindings/ns-r.

CLI & Infrastructure

  • Structured logging (--log-level), reproducible run bundles (--bundle).
  • fit() supports init_pars= for warm-start MLE.
  • CI: pyhf parity gate on push/PR, TREx baseline refresh (nightly), HEPData workspace tests.
  • Apex2 validation: NLL parity, bias/pulls regression, SBC calibration, NUTS quality gates.
  • nextstat-server — self-hosted REST API for shared GPU inference. POST /v1/fit, POST /v1/ranking, GET /v1/health. Flags: --gpu cuda|metal, --port, --host, --threads.
  • nextstat.remote — pure-Python thin client (httpx). client = nextstat.remote.connect("http://gpu-server:3742"), then client.fit(workspace), client.ranking(workspace), client.health().
  • Batch API — POST /v1/batch/fit fits up to 100 workspaces in one request; POST /v1/batch/toys runs GPU-accelerated toy fitting. client.batch_fit(workspaces), client.batch_toys(workspace, n_toys=1000).
  • Model cache — POST /v1/models uploads a workspace and returns a model_id (SHA-256); subsequent /v1/fit and /v1/ranking calls accept model_id= to skip re-parsing. LRU eviction (64 models). See the client sketch after this list.
  • Docker & Helm — multi-stage Dockerfile for CPU and CUDA builds, Helm chart with health probes, GPU resource requests, configurable replicas.
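
A sketch of the thin client against a running server; the calls are as documented above, while the JSON payload shape passed to fit() is an assumption:

```python
import json
import nextstat.remote

client = nextstat.remote.connect("http://gpu-server:3742")
print(client.health())

workspace = json.load(open("workspace.json"))      # payload shape is an assumption
result = client.fit(workspace)
ranking = client.ranking(workspace)
toys = client.batch_toys(workspace, n_toys=1000)   # GPU-accelerated toy fitting
```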

Fixed

  • End-to-end discovery script (e2e_discovery.py): fixed --no-deterministic flag handling. Script now correctly writes summary.json and summary.md.
  • CUDA batch toys (--gpu cuda) crash when some toys converge before others.
  • GPU profiled session (ProfiledDifferentiableSession) convergence failure near parameter bounds.
  • Optimizer early-stop with negative NLL (target_cost(0.0) removed).
  • kalman_simulate(): init="sample|mean" and x0=... support.
  • StatError: incorrect sqrt(sumw2) propagation with zero nominal counts.
  • Metal GPU: scratch buffer reuse (~40% less allocation overhead).
  • HistFactory XML: strip <!DOCTYPE> declarations before parsing.
  • CUDA/Metal signal gradient race condition: incorrect accumulation when multiple samples contribute to the same bin.
  • 10 missing Python re-exports in __init__.py: has_metal, read_root_histogram, workspace_audit, cls_curve, profile_curve, kalman_filter/smooth/em/forecast/simulate.
  • ROOT TTree decompression: cap output buffer to prevent OOM on corrupted/oversized baskets.
  • HistFactory XML: absolute InputFile path fallback and ROOT C macro special-character escaping.
  • Metal GPU: ranking now explicitly rejected with a clear error (Metal does not yet support ranking).
  • StatError: HistoName uncertainties are now treated as relative and converted to absolute (sigma_abs = rel * nominal), matching ROOT/HistFactory semantics.

[0.1.0] — 2026-02-05

Initial public release

Core Engine

  • HistFactory workspace data model with full pyhf JSON compatibility.
  • Poisson NLL with all modifier types + Barlow-Beeston.
  • SIMD-accelerated NLL via wide::f64x4.
  • Automatic differentiation: forward-mode (dual numbers) and reverse-mode (tape AD).

Frequentist Inference

  • MLE via L-BFGS-B with Hessian-based uncertainties.
  • Asymptotic CLs hypothesis testing (q-tilde test statistic).
  • Profile likelihood scans, CLs upper limits (bisection + linear scan), Brazil bands.
  • Batch MLE, toy studies, nuisance parameter ranking.

Bayesian Sampling

  • No-U-Turn Sampler (NUTS) with dual averaging.
  • HMC diagnostics: divergences, tree depth, step size, E-BFMI.
  • Rank-normalized folded R-hat + improved ESS.
  • Python: sample() returning ArviZ-compatible dict.

Regression & GLM

  • Linear, logistic, Poisson, negative binomial regression.
  • Ridge regression (MAP/L2), separation detection, exposure/offset support.
  • Cross-validation and metrics (RMSE, log-loss, Poisson deviance).

Hierarchical Models

  • Random intercepts/slopes, correlated effects (LKJ + Cholesky), non-centered parameterization.
  • Posterior Predictive Checks.

Time Series

  • Linear-Gaussian Kalman filter + RTS smoother.
  • EM parameter estimation, multi-step-ahead forecasting with prediction intervals.
  • Local-level, local-trend, AR(1) builders. Missing observation handling.

Probability Distributions

  • Normal, StudentT, Bernoulli, Binomial, Poisson, NegativeBinomial, Gamma, Exponential, Weibull, Beta.
  • Bijector/transform layer: Identity, Exp, Softplus, Sigmoid, Affine.

Visualization

  • Profile likelihood curves and CLs Brazil band plots.
  • CLI: viz profile, viz cls. Python: viz_profile_curve(), viz_cls_curve().

Python Bindings & CLI

  • nextstat Python package (PyO3/maturin) with Model, FitResult classes.
  • nextstat CLI: fit, hypotest, upper-limit, scan, version.
  • CI workflows + GitHub release pipeline (multi-arch wheels + CLI binary).

Validation (Apex2)

  • Master report aggregator with NLL parity, GLM benchmarks, bias/pulls regression, SBC calibration, NUTS quality gates.
  • Nightly slow CI workflow.