NextStat for ML Engineers
Rosetta Stone: Physics ↔ ML
NextStat is a high-performance statistical inference engine built in Rust with first-class Python and PyTorch integration. If you come from ML/Data Science rather than particle physics, this page translates the core concepts into familiar terms and shows you what NextStat can do for your training pipeline.
Why Should ML Engineers Care?
In particle physics, the goal is to determine whether a signal exists in noisy data, with rigorous uncertainty quantification. In ML terms, that is a classification problem in which the loss function itself has to account for systematic uncertainties. NextStat lets you do exactly that: train a neural network whose loss is a full statistical test, not just cross-entropy.
The Key Idea
Instead of training a classifier with cross-entropy and then running a statistical test, NextStat lets you differentiate through the statistical test itself. Your neural network directly optimises discovery significance (Z₀), with all systematic uncertainties profiled out automatically.
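For readers who want the objective written out: SignificanceLoss is built around the standard profile-likelihood discovery test statistic used throughout HEP. The definitions below are textbook material (the asymptotic approximation), shown only for orientation; the notation is generic, not NextStat-specific.

```latex
% Profile-likelihood discovery test statistic (asymptotic approximation).
% \hat{\mu}, \hat{\theta}        : unconditional best fit of signal strength and nuisance parameters
% \hat{\hat{\theta}}(\mu)        : best-fit nuisance parameters with the signal strength fixed to \mu
%                                  (the "profiling" step)
q_0 = -2 \ln \frac{L\bigl(\mu = 0,\ \hat{\hat{\theta}}(0)\bigr)}
                  {L\bigl(\hat{\mu},\ \hat{\theta}\bigr)},
\qquad
Z_0 = \sqrt{q_0} \quad \text{(for } \hat{\mu} \ge 0\text{, else } q_0 = 0\text{)}.
```

Training minimises −Z₀, the scalar the pipeline below feeds to loss.backward(), so every gradient step pushes the classifier towards a larger expected discovery significance.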
Terminology Bridge
| Physics Term | ML Equivalent | NextStat API |
|---|---|---|
| Nuisance Parameters | Latent Variables / Systematics | model.parameters() |
| Signal Strength (μ) | Parameter of Interest / Scale Factor | fit_result.mu |
| Profile Likelihood | Loss Function (marginalised over latents) | profiled_q0_loss() |
| Significance (Z₀) | Metric (higher = better separation) | profiled_z0_loss() |
| Asimov Dataset | Representative Synthetic Data (defined after this table) | nextstat.asimov_data() |
| Ranking Plot | Feature Importance | nextstat.interpret.rank_impact() |
| Histogram Template | Binned Distribution / Soft Histogram | SoftHistogram() |
| HistFactory Workspace | Model Config / Experiment Specification | nextstat.from_pyhf() |
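One row above deserves a definition, because the term rarely appears outside HEP: the Asimov dataset is the synthetic dataset in which every observed count is replaced by its expected value, so a fit to it recovers the assumed parameters exactly and the resulting significance is the median expected one. This is the standard definition, not NextStat-specific behaviour:

```latex
% Asimov dataset: observed counts set to their expectations, bin by bin.
% \nu_i is the expected yield in bin i for signal strength \mu' and nuisance parameters \theta.
n_i^{\mathrm{A}} \;=\; E[n_i] \;=\; \nu_i(\mu', \boldsymbol{\theta})
```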
The End-to-End Pipeline
```python
# The full differentiable pipeline in ~10 lines
import torch

import nextstat
from nextstat.torch import SignificanceLoss, SoftHistogram
# 1. Load your statistical model (HistFactory JSON from pyhf)
model = nextstat.from_pyhf(workspace_json)
# 2. Create a differentiable loss (uses GPU profiled significance)
loss_fn = SignificanceLoss(model, "signal")
# 3. Differentiable binning: NN scores → soft histogram
soft_hist = SoftHistogram(bin_edges=torch.linspace(0, 1, 11))
# 4. Training loop (classifier, optimizer and dataloader are your own PyTorch objects)
for batch_x, batch_w in dataloader:
    optimizer.zero_grad()
    scores = classifier(batch_x)               # NN → [N] scores
    histogram = soft_hist(scores, batch_w)     # → [10] soft bins
    loss = loss_fn(histogram.double().cuda())  # → -Z₀ scalar
    loss.backward()                            # gradients flow back to the NN
    optimizer.step()
```
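The SoftHistogram in step 3 is what makes the binning differentiable. Its actual implementation is not reproduced here; the sketch below shows one common construction (soft bin membership built from a pair of sigmoids with a temperature), which is enough to see why gradients flow from the bin contents back to the network scores. The function name, the temperature argument and the shapes are illustrative assumptions, not NextStat API.

```python
import torch

def soft_histogram(scores, weights, bin_edges, temperature=0.01):
    """Illustrative differentiable binning (not NextStat's implementation).

    Each event contributes a soft membership weight to every bin instead of a
    hard 0/1 assignment, so bin contents are smooth functions of `scores`.
    """
    lo = bin_edges[:-1].unsqueeze(0)     # [1, B] lower edges
    hi = bin_edges[1:].unsqueeze(0)      # [1, B] upper edges
    s = scores.unsqueeze(1)              # [N, 1] broadcasts against the B bins
    # ~1 inside the bin, ~0 outside, with a smooth transition of width ~temperature.
    membership = torch.sigmoid((s - lo) / temperature) - torch.sigmoid((s - hi) / temperature)
    return (weights.unsqueeze(1) * membership).sum(dim=0)   # [B] soft bin contents
```

As the temperature shrinks, the soft bins approach hard binning; choosing it is a trade-off between a faithful histogram and informative gradients.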
What's Different from pyhf / neos?

| Feature | pyhf + neos | NextStat |
|---|---|---|
| Language | Pure Python (JAX/NumPy) | Rust core + Python bindings |
| GPU backend | JAX XLA (limited) | Metal + CUDA (zero-copy with PyTorch) |
| Profiled gradients | Fixed-point differentiation | Envelope theorem (exact at convergence; see the note below the table) |
| NN integration | Requires special pyhf branch | Native PyTorch autograd.Function |
| Maintained | neos: last commit Sep 2023 | Active development |
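The "envelope theorem" entry is worth unpacking, because it is what makes profiled gradients cheap. The statement below is the general mathematical fact the table refers to, written for an arbitrary profiled objective f; it is not a description of NextStat internals beyond what the table already claims:

```latex
% Envelope theorem for a profiled objective:
%   \hat{\theta}(\mu) = \arg\min_{\theta} f(\mu, \theta),   F(\mu) = f(\mu, \hat{\theta}(\mu)).
\frac{dF}{d\mu}
  \;=\; \frac{\partial f}{\partial \mu}\bigg|_{\theta=\hat{\theta}(\mu)}
  \;+\; \underbrace{\frac{\partial f}{\partial \theta}\bigg|_{\theta=\hat{\theta}(\mu)}}_{=\,0\ \text{at the minimum}}
        \cdot \frac{d\hat{\theta}}{d\mu}
  \;=\; \frac{\partial f}{\partial \mu}\bigg|_{\theta=\hat{\theta}(\mu)}
```

Hence "exact at convergence": if the inner fit over the nuisance parameters has not converged, the underbraced term is not exactly zero and the gradient is only approximate.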
Quick Links
- Training Guide — step-by-step tutorial with SignificanceLoss and SoftHistogram → ML Training
- MLOps & Interpretability — W&B logging, feature importance, Optuna → MLOps
- Differentiable Analysis — low-level sessions and autograd internals → Differentiable
- Gymnasium RL — reinforcement learning for analysis optimisation → Gymnasium
