# MLOps & Interpretability

Logging, Feature Importance, Hyperparameter Tuning
NextStat provides lightweight hooks for experiment tracking (W&B, MLflow), physics-aware feature importance (systematic impact ranking), and integration with hyperparameter optimisation frameworks (Optuna, Ray Tune).
## Experiment Logging
The `nextstat.mlops` module extracts fit metrics as plain Python dicts, with no dependency on any logging framework: you call your own `wandb.log()` or `mlflow.log_metrics()`.
### Weights & Biases
```python
import wandb
import nextstat
from nextstat.mlops import metrics_dict, significance_metrics, StepTimer

wandb.init(project="hep-search", config={"lr": 1e-3})

result = nextstat.fit(model)
wandb.log(metrics_dict(result, prefix="fit/"))
# → {"fit/mu": 1.05, "fit/nll": 42.3, "fit/converged": 1.0, ...}

# In a training loop:
timer = StepTimer()
for step, batch in enumerate(dataloader):
    timer.start()
    optimizer.zero_grad()
    loss = loss_fn(histogram)  # histogram: differentiable histogram built from `batch`; loss = -Z0
    loss.backward()
    optimizer.step()
    elapsed = timer.stop()

    z0_val = -loss.item()  # negate to recover the significance
    wandb.log(significance_metrics(z0_val, prefix="train/", step_time_ms=elapsed))
    # → {"train/z0": 2.31, "train/q0": 5.34, "train/step_time_ms": 48.2}
```

### MLflow
```python
import mlflow
import nextstat
from nextstat.mlops import metrics_dict

mlflow.set_experiment("hep-search")

with mlflow.start_run():
    result = nextstat.fit(model)
    mlflow.log_metrics(metrics_dict(result))
    mlflow.log_param("model_type", "histfactory")
```

### metrics_dict Reference
| Key | Type | Description |
|---|---|---|
| `mu` | float | Best-fit signal strength (POI) |
| `nll` | float | Negative log-likelihood at the minimum |
| `edm` | float | Estimated distance to minimum |
| `converged` | float | 1.0 if converged, 0.0 otherwise |
| `time_ms` | float | Fit wall-clock time (ms) |
| `param/<name>` | float | Best-fit value per nuisance parameter |
| `error/<name>` | float | Hesse error per nuisance parameter |
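
Because `metrics_dict` returns a plain dict, it can be routed anywhere, not just to a tracker. A minimal sketch, assuming the same `model` as above, that appends one JSON line per fit to a local file:

```python
import json

import nextstat
from nextstat.mlops import metrics_dict

result = nextstat.fit(model)

# No tracking server needed: append one JSON line per fit
with open("runs.jsonl", "a") as f:
    f.write(json.dumps(metrics_dict(result)) + "\n")
```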
## Feature Importance (Systematic Impact)
In ML, feature importance tells you which inputs matter most. The physics equivalent is the ranking plot: which systematic uncertainties have the largest impact on the parameter of interest. NextStat's `nextstat.interpret` module wraps this as a familiar API.
```python
from nextstat.interpret import rank_impact, rank_impact_df, plot_rank_impact

# Sorted list of dicts (highest impact first)
table = rank_impact(model, top_n=10)
for row in table:
    print(f"{row['rank']:2d}. {row['name']:30s} impact={row['total_impact']:.4f}")

# As a pandas DataFrame
df = rank_impact_df(model, top_n=15)
print(df[["rank", "name", "total_impact", "pull"]])

# Matplotlib bar chart
fig = plot_rank_impact(model, top_n=20)
fig.savefig("ranking.png", dpi=150)
```

### rank_impact Output
| Field | ML Analogy | Description |
|---|---|---|
| `name` | Feature name | Systematic / nuisance parameter name |
| `total_impact` | Importance score | \|Δμ_up\| + \|Δμ_down\|: total shift in POI |
| `delta_mu_up` | — | POI shift when the NP is shifted +1σ |
| `delta_mu_down` | — | POI shift when the NP is shifted −1σ |
| `pull` | Posterior shift | How far the NP moved from its prior (in σ) |
| `constraint` | Prior width | Post-fit constraint (like regularisation strength) |
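
For orientation, the conventional ranking-plot definitions behind these fields (standard HEP conventions, stated as background rather than as NextStat's exact normalisation):

$$
\Delta\mu_\pm = \hat\mu\!\left(\hat\theta \pm \sigma_{\hat\theta}\right) - \hat\mu,
\qquad
\text{total\_impact} = \lvert\Delta\mu_+\rvert + \lvert\Delta\mu_-\rvert,
\qquad
\text{pull} = \frac{\hat\theta - \theta_0}{\sigma_0}
$$

where $\hat\theta$ is the post-fit nuisance-parameter value, $\theta_0$ the prior central value, and $\sigma_0$ the prior width.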
Practical use: if your network isn't learning, check the ranking plot. A dominant systematic (e.g. Jet Energy Scale) might be washing out the signal. You can then focus on reducing that uncertainty or designing your NN to be robust to it.
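
A quick way to act on this in code, a minimal sketch reusing the `rank_impact` output from above (the factor-2 threshold is purely illustrative):

```python
table = rank_impact(model, top_n=5)
top, rest = table[0], table[1:]

# Flag a single dominant systematic (threshold is illustrative)
if rest and top["total_impact"] > 2 * max(r["total_impact"] for r in rest):
    print(f"{top['name']} dominates the ranking; "
          f"consider decorrelating the network against it")
```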
## Fast Pruning via Jacobian
Identify histogram bins that don't contribute to the significance and can be dropped to simplify the model:
```python
from nextstat.torch import signal_jacobian

grad = signal_jacobian(signal_hist, session)
important_bins = grad.abs() > 0.01
print(f"Keep {important_bins.sum()}/{len(important_bins)} bins")
# Bins where |∂q₀/∂s_i| ≈ 0 have no impact on the result
```
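
The boolean mask can then be applied directly. A minimal sketch, assuming `signal_hist` is a plain torch tensor (how you rebuild the model from the pruned histogram depends on your workspace builder):

```python
# Keep only the bins that actually move the test statistic
pruned_signal = signal_hist[important_bins]
print(f"Pruned histogram: {len(signal_hist)} -> {len(pruned_signal)} bins")
```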
## Hyperparameter Tuning with Optuna

NextStat's fast inference (~1–50 ms per fit) makes it ideal as an Optuna objective. A 200-trial binning optimisation completes in seconds:
```python
import optuna
import nextstat

def objective(trial):
    n_bins = trial.suggest_int("n_bins", 3, 40)
    lo = trial.suggest_float("lo", 0.0, 0.3)
    hi = trial.suggest_float("hi", 0.7, 1.0)
    ws = build_workspace(n_bins=n_bins, lo=lo, hi=hi)
    model = nextstat.from_pyhf(ws)
    hypo = nextstat.hypotest(model, mu=0.0)
    return float(hypo.significance)  # Z₀ in σ

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=200)
```

Full tutorial with workspace builder, W&B logging, GPU objectives, and Ray Tune parallel search → Optuna Tutorial
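
The same objective also drops into Ray Tune for parallel search. A minimal sketch, assuming the `build_workspace` helper from the Optuna example and the Ray Tune 2.x API (not NextStat-specific):

```python
from ray import tune

import nextstat

def trainable(config):
    ws = build_workspace(n_bins=config["n_bins"], lo=config["lo"], hi=config["hi"])
    model = nextstat.from_pyhf(ws)
    hypo = nextstat.hypotest(model, mu=0.0)
    return {"z0": float(hypo.significance)}  # final metric reported to Tune

tuner = tune.Tuner(
    trainable,
    param_space={
        "n_bins": tune.randint(3, 41),  # upper bound exclusive
        "lo": tune.uniform(0.0, 0.3),
        "hi": tune.uniform(0.7, 1.0),
    },
    tune_config=tune.TuneConfig(metric="z0", mode="max", num_samples=200),
)
best = tuner.fit().get_best_result()
print(best.config, best.metrics["z0"])
```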
## API Summary
| Module | Function | Purpose |
|---|---|---|
| `nextstat.mlops` | `metrics_dict(result)` | Fit metrics → dict for any logger |
| `nextstat.mlops` | `significance_metrics(z0)` | Per-step Z₀/q₀ → dict |
| `nextstat.mlops` | `StepTimer()` | Wall-clock timer for training steps |
| `nextstat.interpret` | `rank_impact(model)` | Sorted systematic impact (feature importance) |
| `nextstat.interpret` | `rank_impact_df(model)` | Same as above → pandas DataFrame |
| `nextstat.interpret` | `plot_rank_impact(model)` | Matplotlib ranking bar chart |
| `nextstat.torch` | `signal_jacobian(hist, session)` | Raw ∂q₀/∂signal for pruning / SciPy |
