# Changelog

Format: Keep a Changelog · Semantic Versioning
## [Unreleased]

### Added

- Unified Python API — merged model-type and device variants into single functions with runtime dispatch: `ranking()`, `hypotest()`/`hypotest_toys()`, `profile_scan()`, `fit_toys()`, `upper_limit()`. All accept `device="cpu"|"cuda"|"metal"` and dispatch on `HistFactoryModel` vs `UnbinnedModel` automatically. Old `unbinned_*`, `*_gpu`, `*_batch_gpu` variants removed. (See the sketch after this list.)
- TypedDict return types — ~25 structured `TypedDict` definitions (`RankingEntry`, `ProfileScanResult`, `HypotestToysMetaResult`, `PanelFeResult`, etc.) replacing opaque `Dict[str, Any]`. IDE autocomplete now works for all inference functions.
- `profile_scan(return_curve=True)` — merges the former `profile_curve()` into `profile_scan()` with plot-friendly arrays (`mu_values`, `q_mu_values`, `twice_delta_nll`).
- `upper_limit(method="root")` — merges the former `upper_limits_root()` into `upper_limit()` with `method="bisect"|"root"`.
- LAPS Metal backend — GPU-accelerated MAMS sampler on Apple Silicon (M1–M4) via Metal Shading Language. Built-in models (`StdNormal`, `EightSchools`, `NealFunnel`, `GlmLogistic`) with f32 compute, fused multi-step kernel, and SIMD-group cooperative kernel for data-heavy GLM. Automatic fallback: CUDA (f64) → Metal (f32) → error.
- LAPS windowed mass adaptation — warmup Phase 2+3 now uses Stan-style doubling windows for `inv_mass` estimation. Each window resets Welford statistics and dual averaging, improving convergence on multi-scale models. Configurable via `n_mass_windows`.
- LAPS Riemannian MAMS for Neal Funnel — new `neal_funnel_riemannian` model with position-dependent Fisher metric. Uses an effective potential where log-determinant terms cancel, yielding scale-invariant natural gradients. Available on both Metal (f32) and CUDA (f64).
- BCa confidence interval engine — reusable bootstrap CI utilities in `ns-inference` (percentile + BCa) with diagnostics (z0, acceleration, adjusted alphas, sample counts).
- HEP toy-summary CI controls — `nextstat unbinned-fit-toys` now supports opt-in summary CI computation: `--summary-ci-method percentile|bca`, `--summary-ci-level`, `--summary-ci-bootstrap`. Output includes `summary.mean_ci` with requested/effective method and fallback diagnostics.
- Churn bootstrap CI method selection — `nextstat churn bootstrap-hr` now supports `--ci-method percentile|bca` (default `percentile`) and `--n-jackknife` for BCa acceleration estimation. Output includes method metadata and per-coefficient diagnostics with fallback reason.
- Python churn parity for CI methods — `nextstat.churn_bootstrap_hr()` now accepts `ci_method` and `n_jackknife` and returns per-coefficient effective method/diagnostics.
- Single-artifact visualization renderer (`nextstat viz render`) — direct rendering of JSON viz artifacts to image outputs (`pulls`, `corr`, `ranking`) without full report generation. Supports title/DPI options and correlation filters (`--corr-include`, `--corr-exclude`, `--corr-top-n`).
- Rootless HEP dataset generator — `bench_hep_dataset_bootstrap_ci.py` now supports `--root-writer auto|uproot|root-cli` and defaults to rootless `uproot` when available.
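A minimal sketch of the unified dispatch described in the first entry above. Only the function names, the `device=` values, and the `return_curve`/`method` options come from this changelog; the workspace path and the `mu=` keyword are illustrative assumptions.

```python
import nextstat

# Hypothetical workspace path; from_workspace() is documented in 0.9.0 below.
model = nextstat.HistFactoryModel.from_workspace(open("workspace.json").read())

# One function per operation; the backend is chosen via `device`, and the
# model type (HistFactoryModel vs UnbinnedModel) is dispatched at runtime.
result = nextstat.hypotest(model, mu=1.0, device="cpu")  # `mu=` is an assumption
scan = nextstat.profile_scan(model, return_curve=True)   # mu_values, q_mu_values, twice_delta_nll
limit = nextstat.upper_limit(model, method="root")       # former upper_limits_root()
```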
### Fixed

- `panel_fe()` parameter order — changed from `(entity_ids, x, y, p)` to `(y, x, entity_ids, p)` to match the econometrics module convention (`y` first). (See the sketch after this list.)
- HS3/pyhf CLI scope clarity — corrected command help/docs and added explicit fail-fast diagnostics for pyhf-only commands when HS3 input is provided.
- Arrow/Parquet `from_arrow` large-offset compatibility — fixed ingestion for `Utf8`/`LargeUtf8` and `List<Float64>`/`LargeList<Float64>` in core paths, so Arrow tables from Polars/DuckDB are accepted without pre-normalization.
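A before/after sketch of the `panel_fe()` signature change. The data is synthetic, and the meaning of the trailing `p` argument is not specified in this entry, so it is passed through as documented.

```python
import numpy as np
import nextstat

rng = np.random.default_rng(0)
y = rng.normal(size=100)                  # response (now first)
x = rng.normal(size=(100, 3))             # regressors
entity_ids = np.repeat(np.arange(20), 5)  # panel entity labels
p = 3                                     # pass-through; meaning is an assumption

# Old (pre-fix) order was panel_fe(entity_ids, x, y, p).
res = nextstat.panel_fe(y, x, entity_ids, p)
```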
## [0.9.5] — 2026-02-15

### Fixed

- PyPI wheel coverage — `pip install nextstat` now works out of the box on all major platforms. Pre-built wheels for Linux x86_64/aarch64, macOS arm64/x86_64, and Windows x86_64 (Python 3.11–3.13).
- Linux x86_64 glibc compatibility — wheels now target manylinux_2_17 (glibc 2.17+) instead of manylinux_2_35, fixing installation on CentOS 7/8, RHEL 7+, and Ubuntu 18.04+.
- macOS Intel support — added the `x86_64-apple-darwin` target to the release matrix. Intel Mac users no longer fall back to source builds.
- Multi-interpreter wheel builds — Linux wheels are built inside the manylinux Docker container; macOS/Windows use explicit `setup-python` with 3.11/3.12/3.13.
### Added
- HEP Full Workflow Tutorial — comprehensive 1200-line tutorial covering workspace construction, all modifier types, MLE fitting, CLs hypothesis testing, upper limits (Brazil band), NP ranking, pulls, correlation matrix, profile likelihood scans, workspace combination, mass scans, GPU acceleration, preprocessing, and automated reports.
- Detailed Installation & Quickstart guides — rewritten with step-by-step instructions, expected outputs, troubleshooting sections, and GPU acceleration flags.
## [0.9.4] — 2026-02-15

### Native ROOT I/O

- LRU basket cache — decompressed TTree basket payloads are cached per-`RootFile` with byte-bounded LRU eviction (default 256 MiB). Eliminates redundant decompression on repeated branch reads. `RootFile::basket_cache()` for stats, `RootFile::set_cache_config()` to tune.
- Lazy branch reader — `RootFile::lazy_branch_reader()` decompresses only the baskets needed for the requested entries. `read_f64_at(entry)` touches one basket; `read_f64_range(start, end)` touches only the overlapping ones.
- ChainedSlice — zero-copy concatenation of multiple decompressed basket payloads via `Arc` sharing. O(log n) random access across non-contiguous segments.
- ROOT leaflist parsing — `ns-root` now parses compound leaf-list branches (multiple scalars packed per entry).
### ns-zstd Performance

- Encoder hot-path optimizations — 20+ targeted improvements to the pure-Rust Zstd encoder: faster FSE state transitions, packed sequence bit writes, hash-chain collision reduction via head tags and u64 reject, common-prefix u128 comparison, fast-path depth-1 search, lazy/history check reduction, and a no-match skip heuristic.
- `zstd-shim` — transparent backend-selection crate: uses native `libzstd` on desktop for maximum throughput, falls back to the pure-Rust `ns-zstd` on WASM and embedded targets.
### GPU Acceleration
- Multi-channel GPU batch toys (Metal + CUDA) — batch toy fitter now handles workspaces with multiple channels on both GPU backends.
- Unbinned CUDA multi-GPU batch toys (`--gpu-devices`) — `nextstat unbinned-fit-toys` and `nextstat unbinned-hypotest-toys` can shard host-sampled toys across multiple CUDA devices.
- Unbinned CUDA device-resident shard orchestration (`--gpu-shards`) — sharded `--gpu-sample-toys` execution with round-robin device mapping. Single-GPU emulation via `--gpu cuda --gpu-sample-toys --gpu-shards N`.
- Unbinned CUDA host-toy shard orchestration — CUDA toy workflows support sharded host-toy execution (`pipeline: cuda_host_sharded`) with the shard plan exposed in metrics.
- Unbinned CUDA sharded toy-path metrics/tests — integration coverage for `--gpu-sample-toys --gpu-shards` with metrics contract checks for `pipeline: cuda_device_sharded` and `device_shard_plan`.
- Metal `--gpu-sample-toys` — device-resident toy sampling on Apple Silicon (previously CUDA-only).
- Parquet observed data for unbinned `--gpu` — the unbinned GPU path can now ingest observed data directly from Parquet files.
- TensorRT execution provider for neural PDFs — `--features neural-tensorrt` enables the TensorRT EP with FP16 inference, engine caching (`~/.cache/nextstat/tensorrt/`), and dynamic batch-size optimization profiles. Automatic fallback chain: TensorRT → CUDA EP → CPU. `FlowGpuConfig` for custom TRT settings; `FlowPdf::from_manifest_with_config()` constructor; `FlowPdf::gpu_ep_kind()` for runtime introspection.
- Analytical Jacobian gradients for flow PDFs — when the manifest includes a `log_prob_grad` ONNX model, `FlowPdf::log_prob_grad_batch()` computes exact ∂ log p / ∂ context in a single forward pass instead of 2 × n_context finite-difference evaluations. The fused CUDA kernel `flow_nll_grad_reduce` computes NLL + gradient intermediates in one launch; `GpuFlowSession::nll_grad_analytical()` assembles the full gradient on CPU.
- Multi-GPU batch toy fitting for flow PDFs — `fit_flow_toys_batch_cuda()` runs lockstep L-BFGS-B across thousands of toys using the `flow_batch_nll_reduce` kernel (1 block = 1 toy). `shard_flow_toys()` partitions toys evenly across devices; `fit_flow_toys_batch_multi_gpu()` drives all GPUs and merges results.
- CUDA EP f32 zero-copy NLL — `GpuFlowSession::nll_device_ptr_f32()` accepts a raw CUDA device pointer to float log-probs from the ONNX Runtime CUDA EP, eliminating the host-to-device memcpy. Up to 57× faster than the f64 host-upload path at typical event counts (~1K). Python: `GpuFlowSession.nll_device_ptr_f32(ptr, params)`.
- PF3.1-OPT2: Parallel device-resident multi-GPU — the `cuda_device_sharded` pipeline runs each shard on its own thread with its own CUDA context via `std::thread::scope`, enabling true multi-GPU concurrency. Validated on 4× A40: 2 GPU = 1.97× (near-linear), 3 GPU = 2.9×. Before the fix: flat 1.00×.
- PF3.1 2× H100 benchmark snapshot — full unbinned toy matrix (10k/50k/100k) for CPU, 1-GPU/2-GPU host-toy, and device-resident sharded paths.
- PF3.1 single-GPU runtime snapshot (RTX 4000 Ada) — 10k/20k validation artifacts for worker-per-device orchestration.
- CLI `--gpu cuda` for flow PDFs (G2-R1) — `nextstat unbinned-fit-toys --gpu cuda` supports flow/conditional_flow/dcr_surrogate PDFs. CPU sampling → CPU logp → CUDA NLL reduction via the `flow_batch_nll_reduce` kernel → lockstep L-BFGS-B.
- Normalization grid cache — `QuadratureGrid::auto()` selects Gauss-Legendre (1–3D), tensor product (4–5D), or Halton quasi-Monte Carlo (6D+). `NormalizationCache` avoids recomputation when parameters haven't changed.
### Hybrid Likelihood (Phase 4)

- `HybridLikelihood<A, B>` — generic combined likelihood that sums NLLs from two `LogDensityModel` implementations with shared parameters. Implements `LogDensityModel`, `PoiModel`, and `FixedParamModel`.
- `SharedParameterMap` — merges parameter vectors from two models by name. Shared parameters get intersected bounds; model-specific parameters are appended.
- CLI `nextstat hybrid-fit` — `--binned` (pyhf/HS3 JSON) + `--unbinned` (YAML/JSON spec). Prints a summary of shared/total parameters and runs MLE via `HybridLikelihood`.
- `WeightSummary` diagnostics — `EventStore::weight_summary()` returns ESS, sum/min/max/mean weights, and n_zero.
- HS3 unbinned extension — `Hs3UnbinnedDist` extends the HS3 v0.2 schema with event-level channels. `export_unbinned_hs3()` and `export_hybrid_hs3()` for serialization.
- Fused single-pass CPU NLL kernel — `fused_gauss_exp_nll` computes the per-event log-likelihood inline for the Gaussian+Exponential topology, eliminating intermediate allocations. Adaptive parallelism: sequential for N < 8k, rayon `par_chunks(1024)` for N ≥ 8k.
- SIMD-vectorized fused kernel (`wide::f64x4`) — processes 4 events per iteration (AVX2 on x86, NEON on ARM). 4–12× speedup over the generic path on x86; 2–5× on ARM. ~770 M events/s NLL throughput at 100k events on x86.
- Unbinned benchmark suite — Criterion benchmarks: ~770 M events/s NLL throughput on x86 (fused+SIMD); full 5-param fit in 637 µs at 10k events.
- Fused CrystalBall+Exponential kernel — `fused_cb_exp_nll` extends the fused single-pass path to the CB+Exp topology. Scalar event loop with rayon adaptive parallelism; analytical gradients for all 5 shape parameters. 2.7–5.8× speedup over generic on M5.
- Unbinned fused-vs-generic benchmark mode — `UnbinnedModel::nll_generic()`/`grad_nll_generic()` force the generic multi-pass path for an apples-to-apples CPU comparison against the default fused path.
### Cross-Vertical Statistical Features

- Gamma GLM (`GammaRegressionModel`) — Gamma distribution with log link, shared shape parameter α. Analytical NLL and gradient. For insurance claim amounts, hospital costs, and strictly positive continuous responses.
- Tweedie GLM (`TweedieRegressionModel`) — compound Poisson-Gamma with power `p ∈ (1, 2)`, log link. Saddle-point NLL approximation (Dunn & Smyth 2005). Handles exact zeros. For insurance aggregate claims, rainfall, and zero-inflated positive data.
- GEV distribution (`GevModel`) — Generalized Extreme Value for block maxima (Fréchet ξ>0, Gumbel ξ≈0, Weibull ξ<0). MLE via L-BFGS-B with analytical gradient. `return_level(T)` for T-block return levels. For reinsurance, hydrology, and climate extremes.
- GPD distribution (`GpdModel`) — Generalized Pareto for peaks-over-threshold. MLE with analytical gradient. `quantile(p)` for excess quantiles (VaR/ES). For tail risk in finance and reinsurance pricing.
- Meta-analysis (`meta_fixed`, `meta_random`) — fixed-effects (inverse-variance) and random-effects (DerSimonian–Laird) pooling. Heterogeneity: Cochran's Q, I², H², τ². Forest-plot data with per-study weights and CIs. For pharma, epidemiology, and social science.
### WASM Playground

- Slim 454 KB binary — stripped Arrow/Parquet from the WASM build; custom `[profile.release-wasm]` with `opt-level = "z"`, `lto = "fat"`, `strip = true`, plus `wasm-opt -Oz`. Down from 5.7 MB to 454 KB.
- UI polish — standard site header with logo, compact single-screen layout, loading spinner on Run, `⌘+Enter` shortcut, auto-load of a simple example on first visit, converged/failed badges in MLE Fit, fade-in animations.
- Guided examples — 4 HistFactory + 3 GLM examples with contextual descriptions. Dropdown filtered by the active operation tab — only compatible examples are shown.
- Auto-run on tab switch — switching between workspace operations auto-runs on the loaded workspace. GLM ↔ workspace transitions clear the editor to prevent format mismatches.
- GLM Regression tab — `run_glm()` WASM endpoint: linear, logistic, and Poisson regression via L-BFGS-B. Model/intercept selectors, parameter table in results.
- Mass Scan (Type B Brazil Band) — ATLAS/CMS-style exclusion plot: 95% CL upper limit on μ vs signal peak position with ±1σ/±2σ expected bands.
### Inference Server (ns-server)

- API key authentication — `--api-keys <file>` or the `NS_API_KEYS` env var. Bearer-token validation on all endpoints except `GET /v1/health`. (See the request sketch after this list.)
- Per-IP rate limiting — `--rate-limit <N>` (requests/second/IP). Token bucket with lazy prune.
- `POST /v1/unbinned/fit` — unbinned MLE fit endpoint.
- `POST /v1/nlme/fit` — NLME / PK population fit endpoint. Supports `pk_1cpt` and `nlme_1cpt`. LLOQ policies: ignore, replace_half, censored.
- Async job system — `POST /v1/jobs/submit`, `GET /v1/jobs/{id}`, `DELETE /v1/jobs/{id}`. In-memory store with TTL pruning and cancellation tokens.
- `GET /v1/openapi.json` — OpenAPI 3.1 specification covering all 16 endpoints.
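A hedged sketch of calling the server with bearer-token auth. The host/port and request payload are assumptions; the authoritative request schema is the served `GET /v1/openapi.json`.

```python
import httpx

# Bearer token per the API-key entry above (all endpoints except /v1/health).
headers = {"Authorization": "Bearer MY_API_KEY"}

# Placeholder payload; fill per the schema in /v1/openapi.json.
spec = {}

resp = httpx.post("http://localhost:3742/v1/unbinned/fit", json=spec, headers=headers)
resp.raise_for_status()
print(resp.json())
```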
### Survival Analysis (Non-parametric)

- Kaplan-Meier estimator — `nextstat.kaplan_meier(times, events, conf_level)`: non-parametric survival curve with Greenwood variance, log-log transformed CIs, median survival, and a number-at-risk table. (Example after this list.)
- Log-rank test — `nextstat.log_rank_test(times, events, groups)`: Mantel-Cox chi-squared test for 2+ groups.
- CLI: `nextstat survival km`, `nextstat survival log-rank-test`.
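A small sketch using the two functions above with their documented positional arguments. The data is synthetic, and the structure of the returned objects is not specified here, so printing them is illustrative.

```python
import nextstat

times = [5.0, 8.0, 12.0, 12.0, 20.0, 23.0, 27.0, 31.0]
events = [1, 1, 0, 1, 0, 1, 1, 0]   # 1 = event observed, 0 = right-censored
groups = [0, 0, 0, 0, 1, 1, 1, 1]

km = nextstat.kaplan_meier(times, events, 0.95)      # Greenwood variance, log-log CIs
lr = nextstat.log_rank_test(times, events, groups)   # Mantel-Cox test across groups
print(km, lr)
```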
### Subscription / Churn Vertical

- Synthetic SaaS churn dataset — `nextstat.churn_generate_data()`: deterministic, seeded cohort data with right-censored churn times, plan/region/usage covariates, and treatment assignment.
- Cohort retention — `nextstat.churn_retention()`: stratified KM per group + log-rank comparison.
- Churn risk model — `nextstat.churn_risk_model()`: Cox PH hazard ratios with CIs.
- Causal uplift — `nextstat.churn_uplift()`: AIPW-based intervention impact with Rosenbaum sensitivity.
- CLI: `nextstat churn generate-data`, `nextstat churn retention`, `nextstat churn risk-model`, `nextstat churn uplift`.
- Diagnostics trust gate — `nextstat.churn_diagnostics()`: censoring rates per segment, covariate balance (SMD), propensity overlap, sample-size adequacy.
- Cohort retention matrix — `nextstat.churn_cohort_matrix()`: life table per cohort with period boundaries and cumulative retention.
- Segment comparison — `nextstat.churn_compare()`: pairwise log-rank + HR proxy + Benjamini-Hochberg / Bonferroni MCP correction.
- Survival-native uplift — `nextstat.churn_uplift_survival()`: RMST, IPW-weighted KM, ΔS(t) at eval horizons.
- Bootstrap hazard ratios — `nextstat.churn_bootstrap_hr()`: parallel bootstrap Cox PH with percentile CIs. (Workflow sketch after this list.)
- Data ingestion — `nextstat.churn_ingest()`: validate and clean raw customer arrays, observation-end censoring cap.
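A hedged workflow sketch chaining the functions above. How the generated dataset is passed into the downstream calls is an assumption; the `ci_method`/`n_jackknife` keywords appear in the Unreleased notes above.

```python
import nextstat

data = nextstat.churn_generate_data()    # deterministic, seeded synthetic cohort

# Argument passing below is an assumption for illustration; consult the API
# reference for the real signatures.
diag = nextstat.churn_diagnostics(data)  # trust gate before modeling
hr = nextstat.churn_bootstrap_hr(data, ci_method="bca", n_jackknife=200)
uplift = nextstat.churn_uplift(data)     # AIPW impact + Rosenbaum sensitivity
```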
### CLI

- `nextstat mass-scan` — batch asymptotic CLs upper limits across multiple workspaces (Type B Brazil Band).
- `nextstat significance` — discovery significance (p₀ and Z-value).
- `nextstat goodness-of-fit` — saturated-model goodness-of-fit test (χ²/ndof and p-value).
- `nextstat combine` — merge multiple pyhf JSON workspaces into a single combined workspace with automatic systematic correlation.
- `nextstat fit --asimov` — blind fit on Asimov (expected) data.
- `nextstat viz gammas` — postfit γ parameter values with prefit/postfit uncertainties.
- `nextstat viz summary` — multi-fit μ summary artifact from multiple fit-result JSONs.
- `nextstat viz pie` — sample composition pie chart per channel.
- `nextstat viz separation` — signal vs background shape comparison with separation metric.
- `nextstat preprocess smooth` — native Rust 353QH,twice smoothing for HistoSys templates.
- `nextstat preprocess prune` — native Rust pruning of negligible systematics.
### pyhf Feature Parity

- `Workspace::prune()` — remove channels, samples, modifiers, or measurement POIs by name.
- `Workspace::rename()` — rename channels, samples, modifiers, or measurement POIs.
- `Workspace::sorted()` — return a workspace with sorted channels, samples, and modifiers.
- `Workspace::digest()` — SHA-256 content digest of the canonicalised workspace JSON.
- `Workspace::combine()` — merge two workspaces with configurable channel-join semantics.
- `pyhf::simplemodels` — `uncorrelated_background()` and `correlated_background()` quick workspace builders.
- `pyhf::xml_export` — export a pyhf workspace to HistFactory XML format.
- HistoSys interpolation code2 — quadratic interpolation with linear extrapolation. Scalar, SIMD, and tape-based AD paths.
- Test statistics t_μ and t̃_μ — Eq. 8 and Eq. 11 of arXiv:1007.1727.
- `OptimizerStrategy` presets — Default, MinuitLike, HighPrecision.
- `docs/pyhf-parity.md` — comprehensive feature matrix.
### Econometrics & Causal Inference (Phase 12)

- Panel fixed-effects regression — entity-demeaned OLS with Liang–Zeger cluster-robust SE. `panel_fe_fit()` in Rust, `nextstat.panel_fe()` in Python.
- Difference-in-Differences (DiD) — canonical 2×2 estimator and multi-period event study with leads/lags. Python: `nextstat.did()`, `nextstat.event_study()`. (Sketch after this list.)
- Instrumental Variables / 2SLS — two-stage least squares with first-stage F-statistic, partial R², and the Stock–Yogo test. Python: `nextstat.iv_2sls()`.
- AIPW (Doubly Robust) — Augmented Inverse Probability Weighting for ATE. Influence-function SE, propensity-score trimming. Python: `nextstat.aipw_ate()`.
- Rosenbaum sensitivity analysis — Wilcoxon signed-rank bounds for matched-pair sensitivity to unobserved confounding. Python: `nextstat.rosenbaum_bounds()`.
- `docs/references/econometrics.md` — reference documentation with code examples, assumptions table, and limitations.
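A hedged sketch of the canonical 2×2 DiD call. Only the function name `nextstat.did()` comes from this entry; the positional arguments (outcome, treatment indicator, post indicator) are assumptions based on the standard estimator.

```python
import numpy as np
import nextstat

rng = np.random.default_rng(1)
n = 400
treated = rng.integers(0, 2, size=n)           # treatment-group indicator
post = rng.integers(0, 2, size=n)              # post-period indicator
y = 0.5 * treated * post + rng.normal(size=n)  # true ATT = 0.5

# Argument order is an assumption, not the confirmed signature.
did_result = nextstat.did(y, treated, post)
print(did_result)
```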
### API Stabilization

- ns-core re-exports — `LogDensityModel`, `PoiModel`, `FixedParamModel`, `PreparedNll`, `PreparedModelRef` now re-exported from the crate root.
- Deprecated `Model` trait — superseded by `LogDensityModel`. Will be removed in 1.0.
- ns-inference re-exports — added `scan`, `scan_metal`, `NegativeBinomialRegressionModel`, `QualityGates`, `compute_diagnostics`, `quality_summary` to the crate root.
- `nextstat.unbinned.UnbinnedAnalysis` — high-level workflow wrapper over `UnbinnedModel`: `.from_config()`, `.fit()`, `.scan()`, `.hypotest()`, `.hypotest_toys()`, `.ranking()`, `.summary()`. (Sketch after this list.)
- Python `__all__` completeness — added `volatility`, `UnbinnedModel`, `HybridModel`, and 9 more to `nextstat.__all__`.
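A minimal sketch of the `UnbinnedAnalysis` workflow using only the methods named above; the config path and the `from_config()` argument type are assumptions.

```python
from nextstat.unbinned import UnbinnedAnalysis

# Hypothetical spec path; .from_config() is documented, its argument type is not.
ana = UnbinnedAnalysis.from_config("analysis.yaml")

fit = ana.fit()        # MLE
scan = ana.scan()      # profile likelihood scan
rank = ana.ranking()   # nuisance-parameter ranking
print(ana.summary())
```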
### Fixed
- L-BFGS steepest-descent fallback — optimizer now correctly falls back to steepest descent when the L-BFGS update produces a non-descent direction.
## [0.9.0] — 2026-02-09

Major release.
### Neural Density Estimation

- Flow PDF — ONNX-backed normalizing flow as an unbinned PDF. Loads pre-trained flows from `flow_manifest.json` + ONNX models. Supports unconditional and conditional flows with nuisance parameters as context. Spec YAML: `type: flow` / `type: conditional_flow`. Feature-gated: `--features neural`.
- DCR Surrogate — neural Direct Classifier Ratio surrogate replacing binned template morphing. Drop-in replacement for morphing histograms — smooth, continuous, bin-free systematic morphing trained via the FAIR-HUC protocol. Spec YAML: `type: dcr_surrogate`.
- Unbinned spec YAML supports `flow`, `conditional_flow`, and `dcr_surrogate` PDF types with automatic feature gating.
- Normalization verification — Gauss-Legendre quadrature (orders 32–128) for normalization verification and correction of neural PDFs.
- Training helpers — Python scripts for flow training (zuko NSF + ONNX export), DCR distillation from HistFactory templates, and validation (normalization, PIT/KS, closure checks).
- Python bindings for FlowPdf / DcrSurrogate — `nextstat.FlowPdf` and `nextstat.DcrSurrogate` classes in `ns-py` behind `--features neural`. Standalone ONNX flow evaluation from Python: `from_manifest()`, `log_prob_batch()`, `update_normalization()`, `validate_nominal_normalization()`. (Sketch after this list.)
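A hedged sketch of standalone flow evaluation from Python using the binding methods listed above; the manifest path and the event-array shape are assumptions.

```python
import numpy as np
import nextstat

# Requires a build with `--features neural`.
flow = nextstat.FlowPdf.from_manifest("flow_manifest.json")

events = np.random.randn(1000, 1)      # observable layout is an assumption
logp = flow.log_prob_batch(events)     # per-event log-density

flow.validate_nominal_normalization()  # quadrature normalization check
```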
### Documentation

- Unbinned spec reference — `docs/references/unbinned-spec.md`: dedicated human-readable reference for `nextstat_unbinned_spec_v0` covering all 13 PDF types, yield expressions, rate modifiers (NormSys + WeightSys), per-event systematics, neural PDFs, and GPU acceleration constraints.
### GPU Acceleration

- CUDA (NVIDIA, f64) — fused NLL+gradient kernel covering all 7 HistFactory modifier types in a single launch. Lockstep batch optimizer fits thousands of toys in parallel. Dynamic loading via cudarc — the binary works without CUDA installed.
- Metal (Apple Silicon, f32) — the same fused kernel in MSL. Zero-copy unified memory. NLL parity vs CPU f64: 1.27e-6 relative diff.
- Apple Accelerate — vDSP/vForce vectorized NLL on macOS. <5% overhead vs naive summation.
- GPU-resident toy pipeline (CUDA) — `--gpu-sample-toys` now keeps sampled events on the GPU device, eliminating the D2H+H2D round trip of the large `obs_flat` buffer between sampler and batch fitter.
- Unbinned GPU WeightSys — the `weightsys` rate modifier is now lowered to CUDA/Metal kernels (code0/code4p interpolation). Spec YAML: `type: weightsys`, `param`, `lo`, `hi`, optional `interp_code`.
- CPU batch toys — Rayon-parallel fitting with per-thread tape reuse and seed-based reproducibility.
- Reverse-mode tape — faster gradient computation with reduced memory allocation.
- CLI: `--gpu cuda`, `--gpu metal` · Python: `device="cuda"`, `device="metal"`.
### Differentiable Analysis (PyTorch)

- Zero-copy CUDA kernel — reads the signal histogram from a PyTorch tensor and writes dNLL/dsignal directly into the grad buffer; no device-host round trip.
- `DifferentiableSession` — NLL + signal gradient at fixed nuisance parameters.
- `ProfiledDifferentiableSession` — profiled test statistics with envelope-theorem gradients; enables NN → signal histogram → profiled CLs → loss.
- `nextstat.torch` module — `NextStatNLLFunction`, `NextStatProfiledQ0Function` (autograd), `NextStatLayer` (nn.Module).
- `profiled_zmu_loss()` — Zμ loss wrapper (sqrt(qμ) with numerical stability) for signal-strength optimization.
- `SignificanceLoss(model)` — ML-friendly class wrapping profiled −Z₀. Init once, call per batch: `loss_fn(signal_hist).backward()`. (Sketch after this list.)
- `SoftHistogram` — differentiable binning (Gaussian KDE / sigmoid): NN classifier scores → soft histogram → `SignificanceLoss`.
- `batch_profiled_q0_loss()` — profiled q₀ for a batch of signal histograms (ensemble training).
- `signal_jacobian()`, `signal_jacobian_numpy()` — direct ∂q₀/∂signal without autograd for the SciPy bridge and fast pruning.
- `as_tensor()` — DLPack array-API bridge: JAX, CuPy, Arrow, NumPy → `torch.Tensor`.
- `nextstat.mlops` — fit-metrics extraction for W&B / MLflow / Neptune: `metrics_dict(result)`, `significance_metrics(z0)`, `StepTimer`.
- `nextstat.interpret` — systematic-impact ranking as feature importance: `rank_impact(model)`, `rank_impact_df()`, `plot_rank_impact()`.
- `nextstat.tools` — LLM tool definitions (OpenAI function calling, LangChain, MCP) for 7 operations: fit, hypotest, upper_limit, ranking, significance, scan, workspace_audit.
- `nextstat.distill` — surrogate training dataset generator. `generate_dataset(model, n_samples=100k, method="sobol")` produces (params, NLL, gradient) tuples. Export to PyTorch `TensorDataset`, `.npz`, or Parquet. Built-in `train_mlp_surrogate()` with Sobolev loss.
- Fit convergence check — returns an error if the GPU profile fit fails to converge.
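A hedged end-to-end sketch of the `SoftHistogram` → `SignificanceLoss` chain described above. The workspace path, the `SoftHistogram` constructor arguments, and the toy classifier are assumptions; only the class names and the init-once/call-per-batch pattern come from this changelog.

```python
import torch
import nextstat
from nextstat.torch import SignificanceLoss, SoftHistogram

# Hypothetical workspace; construction per the HistFactory section below.
model = nextstat.HistFactoryModel.from_workspace(open("workspace.json").read())
loss_fn = SignificanceLoss(model)                # wraps profiled -Z0; init once
soft_hist = SoftHistogram(bins=10, low=0.0, high=1.0)  # kwargs are assumptions

# Toy classifier standing in for a real network.
classifier = torch.nn.Sequential(torch.nn.Linear(4, 1), torch.nn.Sigmoid())
scores = classifier(torch.randn(512, 4)).squeeze(-1)

signal_hist = soft_hist(scores)   # scores -> differentiable histogram
loss = loss_fn(signal_hist)       # profiled -Z0
loss.backward()                   # gradients flow back into the classifier
```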
### Gymnasium RL Environment

- `nextstat.gym` — optional Gymnasium/Gym wrapper treating a HistFactory workspace as an RL/DOE environment.
- Propose updates to a sample's nominal yields, receive a NextStat metric as reward (NLL, q₀, Z₀, qμ, Zμ). (Sketch after this list.)
- `make_histfactory_env()` factory with configurable `reward_metric`, `action_mode`, `action_scale`.
- Compatible with `gymnasium` (preferred) and legacy `gym`.
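A hedged Gymnasium-style loop. Only `nextstat.gym`, `make_histfactory_env()`, and the keyword names come from this section; the workspace argument and the `reward_metric` value shown are assumptions.

```python
from nextstat.gym import make_histfactory_env

# Whether the factory takes a workspace path or dict is an assumption.
env = make_histfactory_env("workspace.json", reward_metric="z0")

obs, info = env.reset(seed=0)
for _ in range(10):
    action = env.action_space.sample()  # propose yield updates
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        break
```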
### Deterministic Validation

- EvalMode — process-wide flag: Parity (Kahan summation, single-threaded, bit-exact) vs Fast (default, SIMD/GPU, multi-threaded).
- CLI: `--parity` · Python: `nextstat.set_eval_mode("parity")`. (Example after this list.)
- 7-tier tolerance contract vs pyhf (per-bin ~1e-14 worst case).
### Native ROOT I/O

- TTree reader — mmap file access, native binary deserialization, basket decompression (zlib/LZ4/ZSTD) with rayon-parallel extraction. 9 leaf types + jagged branches.
- Expression engine — bytecode-compiled, vectorized. Full grammar: arithmetic, comparisons, boolean logic, ternary, builtins. Dynamic jagged indexing (`jet_pt[idx]`) follows the ROOT/TTreeFormula convention. Python wrapper: `nextstat.analysis.expr_eval`.
- Histogram filler — single-pass with selection cuts, weights, and variable binning.
- Unsplit vector branch decoding — best-effort decoding for `std::vector<T>` branches without offset tables.
- ~8.5× faster than uproot+numpy on the full pipeline.
### Ntuple-to-Workspace Pipeline

- `NtupleWorkspaceBuilder` — ROOT ntuples → HistFactory `Workspace` via a fluent Rust API.
- Per-sample modifiers: NormFactor, NormSys, WeightSys, TreeSys, HistoSys, StatError.
- Produces the same `Workspace` struct as the pyhf JSON path — no ROOT C++ dependency.
### TRExFitter Interop

- `nextstat import trex-config` — import a TRExFitter `.config` into a pyhf JSON workspace.
- `nextstat build-hists` — run the NTUP pipeline, write `workspace.json`.
- HIST mode — read pre-built ROOT histograms (`ReadFrom: HIST`) alongside NTUP.
- Analysis Spec v0 (YAML + JSON Schema) — `nextstat run <spec.yaml>` orchestrates import/fit/scan/report.
- Jagged column support and TRExFitter-style expression compatibility.
### Systematics Preprocessing

- Smoothing: 353QH,twice algorithm (ROOT `TH1::Smooth` equivalent) + Gaussian kernel.
- Pruning: shape, norm, and overall pruning with an audit trail.
- `nextstat preprocess` CLI with declarative YAML config and content-hash caching.
### HistFactory Enhancements

- HS3 v0.2 ingestion — load HS3 JSON workspaces (ROOT 6.37+) natively. Auto-detects format (pyhf vs HS3) at load time.
- HS3 roundtrip export — export a `HistFactoryModel` back to HS3 JSON with best-fit parameter points.
- Python: `HistFactoryModel.from_workspace()` (auto-detect), `HistFactoryModel.from_hs3(json_str)`. CLI: auto-detection in `nextstat fit`, `nextstat scan`.
- HS3 inputs use ROOT HistFactory defaults (NormSys Code1, HistoSys Code0). For pyhf JSON, NextStat defaults to smooth interpolation (NormSys Code4, HistoSys Code4p); use `--interp-defaults pyhf` (CLI) or `from_workspace_with_settings(Code1, Code0)` (Rust) for strict pyhf defaults.
- HEPData patchset support: `nextstat import patchset`, Python `nextstat.apply_patchset()`.
- Arrow / Polars ingestion — `nextstat.from_arrow(table)` creates a HistFactoryModel from a PyArrow Table, RecordBatch, or any Arrow-compatible source (Polars, DuckDB, Spark). `nextstat.from_parquet(path)` reads Parquet directly. (Sketch after this list.)
- Arrow export — `nextstat.to_arrow(model, what="yields"|"params")` exports expected yields or parameter metadata as a PyArrow Table.
- ConstraintTerm semantics — LogNormal alpha-transform, Gamma constraint for ShapeSys, Uniform and NoConstraint handling. Parsed from `<ConstraintTerm>` metadata in HistFactory XML.
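A hedged sketch of the Arrow round trip. The column layout expected by `from_arrow` is not specified here, so the table below is a placeholder; only the function names and the `what=` values are documented.

```python
import pyarrow as pa
import nextstat

# Placeholder table; the exact column schema from_arrow expects is an
# assumption — see the ingestion docs for the real layout.
table = pa.table({"channel": ["SR"], "sample": ["bkg"], "yield": [42.0]})

model = nextstat.from_arrow(table)                # also accepts RecordBatch, Polars, DuckDB
yields = nextstat.to_arrow(model, what="yields")  # back out as a PyArrow Table
```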
### Unbinned Likelihood

- Product PDF — joint likelihood over independent observables: `log p(x,y) = log p₁(x) + log p₂(y)`. Enables multi-observable unbinned fits without manual factorization.
- Spline PDF — monotonic cubic Hermite (Fritsch–Carlson) interpolation from user-specified knot positions and density values. Analytically normalized; inverse-CDF sampling for toys.
- Multi-dimensional KDE — 2-D/3-D Gaussian kernel density estimator with Silverman bandwidth, truncated on bounded observable support.
- ARGUS PDF — ARGUS background shape for B-meson spectroscopy. Gauss-Legendre normalization on bounded support.
- Voigtian PDF — pseudo-Voigt (Thompson–Cox–Hastings) resonance line shape. Gaussian ⊗ Breit-Wigner convolution for resonance + detector-resolution modeling.
- Normalization integrals are cached across optimizer iterations, avoiding redundant quadrature when parameters haven't changed.
- GPU flow NLL reduction — CUDA kernel for the extended unbinned likelihood from externally computed log-prob values (flow PDFs). Supports multi-process logsumexp reduction, Gaussian constraints, and both host-upload and device-resident (ONNX CUDA EP) input paths.
- GPU flow session — orchestrates flow PDF evaluation (CPU or CUDA EP) with GPU NLL reduction. Central finite-difference gradient, yield computation from the parameter vector, and Gaussian constraint handling.
### Report System

- `nextstat report` — generates distributions, pulls, correlations, yields (.json/.csv/.tex), and uncertainty ranking.
- Python rendering: multi-page PDF + per-plot SVGs via matplotlib.
- `--blind` flag masks observed data in blinded regions. `--deterministic` for stable JSON key ordering.
- `nextstat validation-report` — unified validation artifact combining Apex2 results with workspace fingerprints. Outputs `validation_report.json` (schema `validation_report_v1`) with dataset SHA-256, model spec, environment, regulated-review notes, and a per-suite pass/fail summary. Optional `--pdf` renders a 7-page audit-ready PDF via matplotlib.
- `--json-only` flag for the validation pack: generate JSON artifacts without PDF rendering (no matplotlib dependency).
- Validation pack manifest — `validation_pack_manifest.json` with SHA-256 hashes and sizes for all pack files. A convenient index for replication and signing.
- Optional signing — `--sign-openssl-key`/`--sign-openssl-pub` flags produce Ed25519 signatures over the manifest digest.
### Survival Analysis

- Parametric models: Exponential, Weibull, LogNormal AFT (with right-censoring).
- Cox Proportional Hazards: Efron/Breslow ties, robust sandwich SE, Schoenfeld residuals.
- Python: `nextstat.survival.{exponential,weibull,lognormal_aft,cox_ph}.fit(...)`. (Sketch after this list.)
- CLI: `nextstat survival fit`, `nextstat survival predict`.
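A hedged sketch of the parametric survival namespace above. The changelog records only the namespace shape, so the `.fit(...)` argument names are assumptions.

```python
import nextstat

times = [3.1, 5.4, 7.2, 9.8, 12.0, 15.5]
events = [1, 1, 0, 1, 0, 1]   # 1 = observed, 0 = right-censored

# Argument names are assumptions for illustration.
wb = nextstat.survival.weibull.fit(times, events)
aft = nextstat.survival.lognormal_aft.fit(times, events)
print(wb, aft)
```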
### Linear Mixed Models
- Analytic marginal likelihood (random intercept, random intercept + slope).
- Laplace approximation for approximate posteriors.
### Ordinal Models
- Ordered logit/probit with stable cutpoint parameterization.
### Econometrics & Causal Inference

- Panel FE with 1-way cluster SE.
- DiD TWFE + event-study helpers.
- IV / 2SLS with weak-IV diagnostics (first-stage F, partial R²).
- AIPW for ATE/ATT + E-value helper. Propensity scores, IPW weights, overlap diagnostics. Python: `nextstat.causal.aipw()`, `nextstat.causal.propensity_scores()`.
- GARCH / Stochastic Volatility — GARCH(1,1) and stochastic volatility models for financial time series. CLI: `nextstat volatility fit`, `nextstat volatility forecast`. Python: `nextstat.volatility.garch()`, `nextstat.volatility.sv()`.
### Pharmacometrics

- RK4 integrator for linear ODE systems.
- One-compartment oral PK model with LLOQ censoring.
- NLME extension with per-subject random effects.
- Error model enum — `ErrorModel::Additive`, `Proportional`, `Combined(σ_add, σ_prop)` with variance, NLL, and gradient helpers.
- 2-compartment PK models — `TwoCompartmentIvPkModel` (IV bolus, 4 params) and `TwoCompartmentOralPkModel` (oral, 5 params). Analytical bi/tri-exponential solutions.
- Dosing regimen — `DosingRegimen` supporting IV bolus, oral, and IV infusion. Single-dose, repeated-dose, and mixed-route schedules. Closed-form infusion solutions.
- NONMEM dataset reader — `NonmemDataset::from_csv()` parses standard NONMEM-format CSV. Auto-converts dosing records to a `DosingRegimen` per subject.
- FOCE/FOCEI estimation — `FoceEstimator`: per-subject ETA optimization (damped Newton-Raphson) + population parameter updates. Laplace approximation with ridge-regularized Hessian.
- Correlated random effects — `OmegaMatrix` stores full Ω via Cholesky factor (Ω = L·Lᵀ). `FoceEstimator::fit_1cpt_oral_correlated()` fits with full Ω; `FoceResult` includes `omega_matrix` and `correlation` fields.
- Stepwise Covariate Modeling (SCM) — `ScmEstimator`: forward selection + backward elimination using ΔOFV (χ²(1) LRT). Power, proportional, and exponential relationships on CL/V/Ka. Full audit trace.
- VPC and GOF diagnostics — `vpc_1cpt_oral()`: Visual Predictive Checks with simulated quantile prediction intervals. `gof_1cpt_oral()`: PRED, IPRED, IWRES, CWRES.
- Pharma benchmark suite — Warfarin (32 subjects), Theophylline (12), Phenobarbital (40 neonates). Parameter recovery, GOF, VPC. Includes a correlated-Ω variant. `cargo test --test pharma_benchmark`.
- NLME artifact schema (v2.0.0) — `NlmeArtifact` wraps all estimation results into a single JSON-serializable structure. CSV exports for fixed effects, random effects, GOF, VPC, SCM trace.
- Run bundle (provenance) — `RunBundle` captures NextStat version, git revision, Rust toolchain, OS/CPU, random seeds, and dataset provenance.
- SAEM algorithm — `SaemEstimator`: Stochastic Approximation EM for NLME (Monolix-class). Metropolis-Hastings E-step with adaptive proposal variance, closed-form M-step. Returns `SaemDiagnostics` (acceptance rates, OFV trace). Supports diagonal and correlated Ω.
- PD models — `EmaxModel`, `SigmoidEmaxModel` (Hill equation), `IndirectResponseModel` (Types I–IV). ODE-based IDR via adaptive RK45. `PkPdLink` for PK→PD concentration interpolation.
- Adaptive ODE solvers — `rk45()` (Dormand-Prince 4(5) with PI step-size control) for non-stiff systems and `esdirk4()` (L-stable SDIRK2) for stiff systems. Generic `OdeSystem` trait.
### Applied Statistics API

- Formula parsing + deterministic design matrices (`nextstat.formula`).
- `from_formula` wrappers for all GLM and hierarchical builders.
- Wald summaries + robust covariance (HC0–HC3, 1-way cluster).
- scikit-learn adapters: `NextStatLinearRegression`, `NextStatLogisticRegression`, `NextStatPoissonRegressor`. (Sketch after this list.)
- Missing-data policies: `drop_rows`, `impute_mean`.
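A hedged sketch of the scikit-learn adapter classes named above; the import path is an assumption, as only the class names appear in this changelog.

```python
import numpy as np
from nextstat import NextStatLinearRegression  # import path is an assumption

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = X @ np.array([1.5, -0.5]) + 0.1 * rng.normal(size=50)

reg = NextStatLinearRegression()
reg.fit(X, y)               # standard scikit-learn estimator protocol
print(reg.predict(X[:5]))
```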
### WASM Playground

- Browser-based inference via `wasm-bindgen`: `fit_json()`, `hypotest_json()`, `upper_limit_json()`.
- Drag-and-drop `workspace.json` → asymptotic CLs Brazil bands. No Python, no server.
### Visualization

- `plot_cls_curve()`, `plot_brazil_limits()`, `plot_profile_curve()`.
- `nextstat viz distributions`, `viz pulls`, `viz corr`, `viz ranking` subcommands.
- Kalman: `plot_kalman_states()`, `plot_forecast_bands()`.
### Pure-Rust Zstd Codec (ns-zstd)

- `ns-zstd` crate — pure-Rust Zstd decompressor and compressor for ROOT file I/O. Zero C dependency — enables WASM and embedded targets. Supports compression levels 1–19 with FSE (Finite State Entropy) and Huffman entropy coding. Decode output matches `libzstd` byte-for-byte (verified via fixture tests). Hash-chain match finder with configurable search depth.
### R Bindings

- `nextstat` R package — native R interface via `extendr` (`bindings/ns-r/`). Provides `nextstat_fit()`, `nextstat_hypotest()`, `nextstat_upper_limit()`, `nextstat_scan()`, `nextstat_ranking()` for HistFactory workspaces. Unbinned event Parquet I/O and neural PDF bindings (`FlowPdf`, `DcrSurrogate`) are also exposed. Install: `R CMD INSTALL bindings/ns-r`.
### CLI & Infrastructure

- Structured logging (`--log-level`), reproducible run bundles (`--bundle`).
- `fit()` supports `init_pars=` for warm-start MLE.
- CI: pyhf parity gate on push/PR, TREx baseline refresh (nightly), HEPData workspace tests.
- Apex2 validation: NLL parity, bias/pulls regression, SBC calibration, NUTS quality gates.
- `nextstat-server` — self-hosted REST API for shared GPU inference. `POST /v1/fit`, `POST /v1/ranking`, `GET /v1/health`. Flags: `--gpu cuda|metal`, `--port`, `--host`, `--threads`.
- `nextstat.remote` — pure-Python thin client (httpx). `client = nextstat.remote.connect("http://gpu-server:3742")`, then `client.fit(workspace)`, `client.ranking(workspace)`, `client.health()`. (Sketch after this list.)
- Batch API — `POST /v1/batch/fit` fits up to 100 workspaces in one request; `POST /v1/batch/toys` runs GPU-accelerated toy fitting. `client.batch_fit(workspaces)`, `client.batch_toys(workspace, n_toys=1000)`.
- Model cache — `POST /v1/models` uploads a workspace and returns a `model_id` (SHA-256); subsequent `/v1/fit` and `/v1/ranking` calls accept `model_id=` to skip re-parsing. LRU eviction (64 models).
- Docker & Helm — multi-stage Dockerfile for CPU and CUDA builds, Helm chart with health probes, GPU resource requests, configurable replicas.
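A short sketch of the remote client, assembled from the calls documented above; only the workspace path is an assumption.

```python
import json
import nextstat.remote

client = nextstat.remote.connect("http://gpu-server:3742")
print(client.health())

workspace = json.load(open("workspace.json"))
fit = client.fit(workspace)                       # single remote MLE fit
toys = client.batch_toys(workspace, n_toys=1000)  # GPU-accelerated toy fitting
```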
### Fixed

- End-to-end discovery script (`e2e_discovery.py`): fixed `--no-deterministic` flag handling. The script now correctly writes `summary.json` and `summary.md`.
- CUDA batch toys (`--gpu cuda`): crash when some toys converge before others.
- GPU profiled session (`ProfiledDifferentiableSession`): convergence failure near parameter bounds.
- Optimizer early-stop with negative NLL (`target_cost(0.0)` removed).
- `kalman_simulate()`: `init="sample|mean"` and `x0=...` support.
- StatError: incorrect `sqrt(sumw2)` propagation with zero nominal counts.
- Metal GPU: scratch buffer reuse (~40% less allocation overhead).
- HistFactory XML: strip `<!DOCTYPE>` declarations before parsing.
- CUDA/Metal signal gradient race condition: incorrect accumulation when multiple samples contribute to the same bin.
- 10 missing Python re-exports in `__init__.py`: `has_metal`, `read_root_histogram`, `workspace_audit`, `cls_curve`, `profile_curve`, `kalman_filter`/`smooth`/`em`/`forecast`/`simulate`.
- ROOT TTree decompression: cap the output buffer to prevent OOM on corrupted/oversized baskets.
- HistFactory XML: absolute `InputFile` path fallback and ROOT C macro special-character escaping.
- Metal GPU: ranking is now explicitly rejected with a clear error (Metal does not yet support ranking).
- StatError: `HistoName` uncertainties are now treated as relative and converted to absolute (`sigma_abs = rel * nominal`), matching ROOT/HistFactory semantics.
## [0.1.0] — 2026-02-05

Initial public release.

### Core Engine

- HistFactory workspace data model with full pyhf JSON compatibility.
- Poisson NLL with all modifier types + Barlow-Beeston.
- SIMD-accelerated NLL via `wide::f64x4`.
- Automatic differentiation: forward-mode (dual numbers) and reverse-mode (tape AD).
### Frequentist Inference
- MLE via L-BFGS-B with Hessian-based uncertainties.
- Asymptotic CLs hypothesis testing (q-tilde test statistic).
- Profile likelihood scans, CLs upper limits (bisection + linear scan), Brazil bands.
- Batch MLE, toy studies, nuisance parameter ranking.
### Bayesian Sampling
- No-U-Turn Sampler (NUTS) with dual averaging.
- HMC diagnostics: divergences, tree depth, step size, E-BFMI.
- Rank-normalized folded R-hat + improved ESS.
- Python: `sample()` returning an ArviZ-compatible dict. (Sketch after this list.)
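A hedged sketch of pulling NUTS draws into ArviZ. Only `sample()` and its ArviZ-compatible return value are documented here; the model construction and the keyword names are assumptions.

```python
import arviz as az
import nextstat

# Constructor form is an assumption; the 0.1.0 bindings expose a Model class.
model = nextstat.Model(open("workspace.json").read())

# Keyword names are assumptions for illustration.
posterior = nextstat.sample(model, n_chains=4, n_draws=1000)

idata = az.from_dict(posterior)   # ArviZ-compatible dict of draws
print(az.summary(idata))
```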
### Regression & GLM
- Linear, logistic, Poisson, negative binomial regression.
- Ridge regression (MAP/L2), separation detection, exposure/offset support.
- Cross-validation and metrics (RMSE, log-loss, Poisson deviance).
### Hierarchical Models
- Random intercepts/slopes, correlated effects (LKJ + Cholesky), non-centered parameterization.
- Posterior Predictive Checks.
### Time Series
- Linear-Gaussian Kalman filter + RTS smoother.
- EM parameter estimation, multi-step-ahead forecasting with prediction intervals.
- Local-level, local-trend, AR(1) builders. Missing observation handling.
### Probability Distributions
- Normal, StudentT, Bernoulli, Binomial, Poisson, NegativeBinomial, Gamma, Exponential, Weibull, Beta.
- Bijector/transform layer: Identity, Exp, Softplus, Sigmoid, Affine.
### Visualization
- Profile likelihood curves and CLs Brazil band plots.
- CLI: `viz profile`, `viz cls`. Python: `viz_profile_curve()`, `viz_cls_curve()`.
### Python Bindings & CLI

- `nextstat` Python package (PyO3/maturin) with `Model`, `FitResult` classes.
- `nextstat` CLI: `fit`, `hypotest`, `upper-limit`, `scan`, `version`.
- CI workflows + GitHub release pipeline (multi-arch wheels + CLI binary).
### Validation (Apex2)
- Master report aggregator with NLL parity, GLM benchmarks, bias/pulls regression, SBC calibration, NUTS quality gates.
- Nightly slow CI workflow.
