
NextStat for ML Engineers

Rosetta Stone: Physics ↔ ML

NextStat is a high-performance statistical inference engine built in Rust with first-class Python and PyTorch integration. If you come from ML/Data Science rather than particle physics, this page translates the core concepts into familiar terms and shows you what NextStat can do for your training pipeline.

Why Should ML Engineers Care?

In particle physics, the goal is to determine whether a signal exists in noisy data, with rigorous uncertainty quantification. The ML analogue is training a classifier whose loss function itself accounts for systematic uncertainties. NextStat lets you do exactly that: train a neural network whose loss is a full statistical test, not just cross-entropy.

The Key Idea

Instead of training a classifier with cross-entropy and then running a statistical test, NextStat lets you differentiate through the statistical test itself. Your neural network directly optimises discovery significance (Z₀), with all systematic uncertainties profiled out automatically.
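
To make the idea concrete, here is a toy single-bin sketch of significance as a differentiable loss. It is an illustration of the principle only, not the NextStat API: it uses the standard Asimov approximation Z = sqrt(2((s+b)·ln(1+s/b) − s)) for a counting experiment with known background. NextStat's profiled_z0_loss() (see the terminology table below) plays the same role for full binned models, with nuisance parameters profiled out.

import torch

def toy_z0(s: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Asimov discovery significance for a single-bin counting experiment.

    Z0 = sqrt(2 * ((s + b) * ln(1 + s / b) - s)); differentiable in s and b,
    so it can be minimised (as -Z0) like any other PyTorch loss.
    """
    return torch.sqrt(2.0 * ((s + b) * torch.log1p(s / b) - s))

# Expected signal and background yields; in practice these would come from an
# upstream, differentiable part of the pipeline (e.g. a soft histogram).
s = torch.tensor(10.0, requires_grad=True)
b = torch.tensor(100.0)

loss = -toy_z0(s, b)   # maximise significance by minimising its negative
loss.backward()
print(s.grad)          # d(-Z0)/ds: how the signal yield should move to raise Z0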

Terminology Bridge

| Physics Term | ML Equivalent | NextStat API |
|---|---|---|
| Nuisance Parameters | Latent Variables / Systematics | model.parameters() |
| Signal Strength (μ) | Parameter of Interest / Scale Factor | fit_result.mu |
| Profile Likelihood | Loss Function (marginalised over latents) | profiled_q0_loss() |
| Significance (Z₀) | Metric (higher = better separation) | profiled_z0_loss() |
| Asimov Dataset | Representative Synthetic Data | nextstat.asimov_data() |
| Ranking Plot | Feature Importance | nextstat.interpret.rank_impact() |
| Histogram Template | Binned Distribution / Soft Histogram | SoftHistogram() |
| HistFactory Workspace | Model Config / Experiment Specification | nextstat.from_pyhf() |
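
The names in the right-hand column come straight from the table above; the call signatures in the short sketch below are illustrative assumptions, shown only to indicate where each row surfaces in code.

import nextstat

# HistFactory workspace -> statistical model (workspace_json: pyhf JSON, as in the pipeline below)
model = nextstat.from_pyhf(workspace_json)

# Asimov dataset: the "representative synthetic data" row (argument assumed)
asimov = nextstat.asimov_data(model)

# Ranking plot ~ feature importance: per-nuisance impact on mu (argument assumed)
ranking = nextstat.interpret.rank_impact(model)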

The End-to-End Pipeline

# The full differentiable pipeline in a dozen lines
import torch
import nextstat
from nextstat.torch import SignificanceLoss, SoftHistogram

# 1. Load your statistical model (HistFactory JSON from pyhf)
model = nextstat.from_pyhf(workspace_json)

# 2. Create a differentiable loss (profiled significance, evaluated on the GPU)
loss_fn = SignificanceLoss(model, "signal")

# 3. Differentiable binning: NN scores → soft histogram
soft_hist = SoftHistogram(bin_edges=torch.linspace(0, 1, 11))

# 4. Training loop (assumes `classifier`, `dataloader`, and `optimizer` are defined elsewhere)
for batch_x, batch_w in dataloader:
    optimizer.zero_grad()
    scores = classifier(batch_x)                # NN → [N] scores
    histogram = soft_hist(scores, batch_w)      # → [10] soft bins
    loss = loss_fn(histogram.double().cuda())   # → -Z₀ scalar
    loss.backward()                             # gradients flow to the NN
    optimizer.step()
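
The SoftHistogram step is what keeps the chain differentiable: a hard histogram has zero gradient almost everywhere, so events are instead assigned to bins with a smooth kernel. Below is a minimal sketch of the idea using sigmoid-based bin membership; it illustrates the technique, not NextStat's actual implementation.

import torch

def soft_histogram(scores, weights, bin_edges, temperature=0.01):
    """Differentiable binning: each event contributes fractionally to every bin.

    A hard histogram uses indicators 1[lo <= x < hi]; here each indicator is
    replaced by sigmoid((x - lo)/T) - sigmoid((x - hi)/T), which recovers the
    hard histogram as T -> 0 but keeps gradients finite.
    """
    lo = bin_edges[:-1].unsqueeze(0)                       # [1, B] lower edges
    hi = bin_edges[1:].unsqueeze(0)                        # [1, B] upper edges
    x = scores.unsqueeze(1)                                # [N, 1]
    membership = torch.sigmoid((x - lo) / temperature) - torch.sigmoid((x - hi) / temperature)
    return (membership * weights.unsqueeze(1)).sum(dim=0)  # [B] soft bin counts

# Toy usage: 1000 scores in [0, 1], unit weights, 10 bins
scores = torch.rand(1000, requires_grad=True)
weights = torch.ones(1000)
hist = soft_histogram(scores, weights, torch.linspace(0, 1, 11))
hist.sum().backward()   # gradients flow back to every score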

What's Different from pyhf / neos?

| Feature | pyhf + neos | NextStat |
|---|---|---|
| Language | Pure Python (JAX/NumPy) | Rust core + Python bindings |
| GPU backend | JAX XLA (limited) | Metal + CUDA (zero-copy with PyTorch) |
| Profiled gradients | Fixed-point differentiation | Envelope theorem (exact at convergence) |
| NN integration | Requires special pyhf branch | Native PyTorch autograd.Function |
| Maintained | neos: last commit Sep 2023 | Active development |
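
The envelope-theorem row is worth unpacking: for a profiled loss L(h) = min_θ f(h, θ), the gradient with respect to the histogram h is simply ∂f/∂h evaluated at the inner optimum θ̂(h), so nothing has to backpropagate through the inner fit. Below is a minimal, self-contained sketch of that pattern in a custom torch.autograd.Function; the toy objective f, the closed-form inner_minimise, and LAM are stand-ins for illustration, not NextStat code.

import torch

# Toy objective with a closed-form inner optimum, standing in for the profiled
# likelihood: f(h, theta) = ||h - theta||^2 + LAM * ||theta||^2, profiled over theta.
LAM = 0.5

def f(h, theta):
    return ((h - theta) ** 2).sum() + LAM * (theta ** 2).sum()

def inner_minimise(h):
    # Closed form here; in a real profiled fit this is an iterative optimiser.
    return h / (1.0 + LAM)

class ProfiledLoss(torch.autograd.Function):
    """Envelope-theorem gradient for L(h) = min_theta f(h, theta).

    At the inner optimum theta_hat, dL/dh = (df/dh)(h, theta_hat); the
    dependence of theta_hat on h drops out to first order, so the backward
    pass never differentiates through the inner optimiser.
    """

    @staticmethod
    def forward(ctx, h):
        theta_hat = inner_minimise(h.detach())           # inner fit, no gradient tracking
        with torch.enable_grad():
            h_req = h.detach().clone().requires_grad_(True)
            value = f(h_req, theta_hat)                  # objective at the profiled optimum
            (grad_h,) = torch.autograd.grad(value, h_req)
        ctx.save_for_backward(grad_h)
        return value.detach()

    @staticmethod
    def backward(ctx, grad_output):
        (grad_h,) = ctx.saved_tensors
        return grad_output * grad_h                      # envelope-theorem gradient

h = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
ProfiledLoss.apply(h).backward()
print(h.grad)   # equals the analytic d/dh of min_theta f(h, theta)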

Quick Links

  • Training Guide — step-by-step tutorial with SignificanceLoss and SoftHistogram → ML Training
  • MLOps & Interpretability — W&B logging, feature importance, Optuna → MLOps
  • Differentiable Analysis — low-level sessions and autograd internals → Differentiable
  • Gymnasium RL — reinforcement learning for analysis optimisation → Gymnasium