NextStat for ML Engineers
Rosetta Stone: Physics ↔ ML
NextStat is a high-performance statistical inference engine built in Rust with first-class Python and PyTorch integration. If you come from ML/Data Science rather than particle physics, this page translates the core concepts into familiar terms and shows you what NextStat can do for your training pipeline.
Why Should ML Engineers Care?
In particle physics, the goal is to determine whether a signal exists in noisy data, with rigorous uncertainty quantification. In ML terms, that is a classification problem in which the loss function itself has to account for systematic uncertainties. NextStat lets you do exactly that: train a neural network whose loss is a full statistical test, not just cross-entropy.
The Key Idea
Instead of training a classifier with cross-entropy and then running a statistical test, NextStat lets you differentiate through the statistical test itself. Your neural network directly optimises discovery significance (Z₀), with all systematic uncertainties profiled out automatically.
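For readers who want the objective written out: SignificanceLoss is built around the standard profile-likelihood discovery test statistic used throughout HEP. The definitions below are textbook material (the asymptotic approximation), shown only for orientation; the notation is generic, not NextStat-specific.

```latex
% Profile-likelihood discovery test statistic (asymptotic approximation).
% \hat{\mu}, \hat{\theta}        : unconditional best fit of signal strength and nuisance parameters
% \hat{\hat{\theta}}(\mu)        : best-fit nuisance parameters with the signal strength fixed to \mu
%                                  (the "profiling" step)
q_0 = -2 \ln \frac{L\bigl(\mu = 0,\ \hat{\hat{\theta}}(0)\bigr)}
                  {L\bigl(\hat{\mu},\ \hat{\theta}\bigr)},
\qquad
Z_0 = \sqrt{q_0} \quad \text{(for } \hat{\mu} \ge 0\text{, else } q_0 = 0\text{)}.
```

Training minimises −Z₀, the scalar the pipeline below feeds to loss.backward(), so every gradient step pushes the classifier towards a larger expected discovery significance.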
Terminology Bridge
| Physics Term | ML Equivalent | NextStat API |
|---|---|---|
| Nuisance Parameters | Latent Variables / Systematics | model.parameters() |
| Signal Strength (μ) | Parameter of Interest / Scale Factor | fit_result.mu |
| Profile Likelihood | Loss Function (marginalised over latents) | profiled_q0_loss() |
| Significance (Z₀) | Metric (higher = better separation) | profiled_z0_loss() |
| Asimov Dataset | Representative Synthetic Data (defined after this table) | nextstat.asimov_data() |
| Ranking Plot | Feature Importance | nextstat.interpret.rank_impact() |
| Histogram Template | Binned Distribution / Soft Histogram | SoftHistogram() |
| HistFactory Workspace | Model Config / Experiment Specification | nextstat.from_pyhf() |
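One row above deserves a definition, because the term rarely appears outside HEP: the Asimov dataset is the synthetic dataset in which every observed count is replaced by its expected value, so a fit to it recovers the assumed parameters exactly and the resulting significance is the median expected one. This is the standard definition, not NextStat-specific behaviour:

```latex
% Asimov dataset: observed counts set to their expectations, bin by bin.
% \nu_i is the expected yield in bin i for signal strength \mu' and nuisance parameters \theta.
n_i^{\mathrm{A}} \;=\; E[n_i] \;=\; \nu_i(\mu', \boldsymbol{\theta})
```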
The End-to-End Pipeline
```python
# The full differentiable pipeline in ~10 lines
import torch

import nextstat
from nextstat.torch import SignificanceLoss, SoftHistogram
# 1. Load your statistical model (HistFactory JSON from pyhf)
model = nextstat.from_pyhf(workspace_json)
# 2. Create a differentiable loss (uses GPU profiled significance)
loss_fn = SignificanceLoss(model, "signal")
# 3. Differentiable binning: NN scores → soft histogram
soft_hist = SoftHistogram(bin_edges=torch.linspace(0, 1, 11))
# 4. Training loop (classifier, optimizer and dataloader are your own PyTorch objects)
for batch_x, batch_w in dataloader:
    optimizer.zero_grad()
    scores = classifier(batch_x)               # NN → [N] scores
    histogram = soft_hist(scores, batch_w)     # → [10] soft bins
    loss = loss_fn(histogram.double().cuda())  # → -Z₀ scalar
    loss.backward()                            # gradients flow back to the NN
    optimizer.step()
```
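The SoftHistogram in step 3 is what makes the binning differentiable. Its actual implementation is not reproduced here; the sketch below shows one common construction (soft bin membership built from a pair of sigmoids with a temperature), which is enough to see why gradients flow from the bin contents back to the network scores. The function name, the temperature argument and the shapes are illustrative assumptions, not NextStat API.

```python
import torch

def soft_histogram(scores, weights, bin_edges, temperature=0.01):
    """Illustrative differentiable binning (not NextStat's implementation).

    Each event contributes a soft membership weight to every bin instead of a
    hard 0/1 assignment, so bin contents are smooth functions of `scores`.
    """
    lo = bin_edges[:-1].unsqueeze(0)     # [1, B] lower edges
    hi = bin_edges[1:].unsqueeze(0)      # [1, B] upper edges
    s = scores.unsqueeze(1)              # [N, 1] broadcasts against the B bins
    # ~1 inside the bin, ~0 outside, with a smooth transition of width ~temperature.
    membership = torch.sigmoid((s - lo) / temperature) - torch.sigmoid((s - hi) / temperature)
    return (weights.unsqueeze(1) * membership).sum(dim=0)   # [B] soft bin contents
```

As the temperature shrinks, the soft bins approach hard binning; choosing it is a trade-off between a faithful histogram and informative gradients.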
What's Different from pyhf / neos?

| Feature | pyhf + neos | NextStat |
|---|---|---|
| Language | Pure Python (JAX/NumPy) | Rust core + Python bindings |
| GPU backend | JAX XLA (limited) | Metal + CUDA (zero-copy with PyTorch) |
| Profiled gradients | Fixed-point differentiation | Envelope theorem (exact at convergence; see the note below the table) |
| NN integration | Requires special pyhf branch | Native PyTorch autograd.Function |
| Maintained | neos: last commit Sep 2023 | Active development |
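The "envelope theorem" entry is worth unpacking, because it is what makes profiled gradients cheap. The statement below is the general mathematical fact the table refers to, written for an arbitrary profiled objective f; it is not a description of NextStat internals beyond what the table already claims:

```latex
% Envelope theorem for a profiled objective:
%   \hat{\theta}(\mu) = \arg\min_{\theta} f(\mu, \theta),   F(\mu) = f(\mu, \hat{\theta}(\mu)).
\frac{dF}{d\mu}
  \;=\; \frac{\partial f}{\partial \mu}\bigg|_{\theta=\hat{\theta}(\mu)}
  \;+\; \underbrace{\frac{\partial f}{\partial \theta}\bigg|_{\theta=\hat{\theta}(\mu)}}_{=\,0\ \text{at the minimum}}
        \cdot \frac{d\hat{\theta}}{d\mu}
  \;=\; \frac{\partial f}{\partial \mu}\bigg|_{\theta=\hat{\theta}(\mu)}
```

Hence "exact at convergence": if the inner fit over the nuisance parameters has not converged, the underbraced term is not exactly zero and the gradient is only approximate.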
Quick Links
- Training Guide — step-by-step tutorial with SignificanceLoss and SoftHistogram → ML Training
- MLOps & Interpretability — W&B logging, feature importance, Optuna → MLOps
- Differentiable Analysis — low-level sessions and autograd internals → Differentiable
- Gymnasium RL — reinforcement learning for analysis optimisation → Gymnasium
