# MLOps & Interpretability

Logging, Feature Importance, Hyperparameter Tuning
NextStat provides lightweight hooks for experiment tracking (W&B, MLflow), physics-aware feature importance (systematic impact ranking), and integration with hyperparameter optimisation frameworks (Optuna, Ray Tune).
## Experiment Logging
The `nextstat.mlops` module extracts fit metrics as plain Python dicts, with no dependency on any logging framework: you call your own `wandb.log()` or `mlflow.log_metrics()`.
### Weights & Biases
```python
import wandb
import nextstat
from nextstat.mlops import metrics_dict, significance_metrics, StepTimer

wandb.init(project="hep-search", config={"lr": 1e-3})

result = nextstat.fit(model)
wandb.log(metrics_dict(result, prefix="fit/"))
# → {"fit/mu": 1.05, "fit/nll": 42.3, "fit/converged": 1.0, ...}

# In a training loop:
timer = StepTimer()
for step, batch in enumerate(dataloader):
    timer.start()
    optimizer.zero_grad()
    loss = loss_fn(histogram)  # histogram: differentiable histogram built from `batch`; loss = -Z0
    loss.backward()
    optimizer.step()
    elapsed = timer.stop()

    z0_val = -loss.item()  # negate to recover the significance
    wandb.log(significance_metrics(z0_val, prefix="train/", step_time_ms=elapsed))
    # → {"train/z0": 2.31, "train/q0": 5.34, "train/step_time_ms": 48.2}
```

### MLflow
```python
import mlflow
import nextstat
from nextstat.mlops import metrics_dict

mlflow.set_experiment("hep-search")

with mlflow.start_run():
    result = nextstat.fit(model)
    mlflow.log_metrics(metrics_dict(result))
    mlflow.log_param("model_type", "histfactory")
```

### metrics_dict Reference
| Key | Type | Description |
|---|---|---|
| `mu` | float | Best-fit signal strength (POI) |
| `nll` | float | Negative log-likelihood at the minimum |
| `edm` | float | Estimated distance to minimum |
| `converged` | float | 1.0 if converged, 0.0 otherwise |
| `time_ms` | float | Fit wall-clock time (ms) |
| `param/<name>` | float | Best-fit value per nuisance parameter |
| `error/<name>` | float | Hesse error per nuisance parameter |
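
Because `metrics_dict` returns a plain dict, it can be routed anywhere, not just to a tracker. A minimal sketch, assuming the same `model` as above, that appends one JSON line per fit to a local file:

```python
import json

import nextstat
from nextstat.mlops import metrics_dict

result = nextstat.fit(model)

# No tracking server needed: append one JSON line per fit
with open("runs.jsonl", "a") as f:
    f.write(json.dumps(metrics_dict(result)) + "\n")
```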
## Feature Importance (Systematic Impact)
In ML, feature importance tells you which inputs matter most. The physics equivalent is the ranking plot: which systematic uncertainties have the largest impact on the parameter of interest. NextStat's `nextstat.interpret` module wraps this as a familiar API.
```python
from nextstat.interpret import rank_impact, rank_impact_df, plot_rank_impact

# Sorted list of dicts (highest impact first)
table = rank_impact(model, top_n=10)
for row in table:
    print(f"{row['rank']:2d}. {row['name']:30s} impact={row['total_impact']:.4f}")

# As a pandas DataFrame
df = rank_impact_df(model, top_n=15)
print(df[["rank", "name", "total_impact", "pull"]])

# Matplotlib bar chart
fig = plot_rank_impact(model, top_n=20)
fig.savefig("ranking.png", dpi=150)
```

### rank_impact Output
| Field | ML Analogy | Description |
|---|---|---|
| `name` | Feature name | Systematic / nuisance parameter name |
| `total_impact` | Importance score | \|Δμ_up\| + \|Δμ_down\|: total shift in POI |
| `delta_mu_up` | — | POI shift when the NP is shifted +1σ |
| `delta_mu_down` | — | POI shift when the NP is shifted −1σ |
| `pull` | Posterior shift | How far the NP moved from its prior (in σ) |
| `constraint` | Prior width | Post-fit constraint (like regularisation strength) |
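
For orientation, the conventional ranking-plot definitions behind these fields (standard HEP conventions, stated as background rather than as NextStat's exact normalisation):

$$
\Delta\mu_\pm = \hat\mu\!\left(\hat\theta \pm \sigma_{\hat\theta}\right) - \hat\mu,
\qquad
\text{total\_impact} = \lvert\Delta\mu_+\rvert + \lvert\Delta\mu_-\rvert,
\qquad
\text{pull} = \frac{\hat\theta - \theta_0}{\sigma_0}
$$

where $\hat\theta$ is the post-fit nuisance-parameter value, $\theta_0$ the prior central value, and $\sigma_0$ the prior width.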
Practical use: if your network isn't learning, check the ranking plot. A dominant systematic (e.g. Jet Energy Scale) might be washing out the signal. You can then focus on reducing that uncertainty or designing your NN to be robust to it.
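
A quick way to act on this in code, a minimal sketch reusing the `rank_impact` output from above (the factor-2 threshold is purely illustrative):

```python
table = rank_impact(model, top_n=5)
top, rest = table[0], table[1:]

# Flag a single dominant systematic (threshold is illustrative)
if rest and top["total_impact"] > 2 * max(r["total_impact"] for r in rest):
    print(f"{top['name']} dominates the ranking; "
          f"consider decorrelating the network against it")
```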
## Fast Pruning via Jacobian
Identify histogram bins that don't contribute to the significance and can be dropped to simplify the model:
```python
from nextstat.torch import signal_jacobian

grad = signal_jacobian(signal_hist, session)
important_bins = grad.abs() > 0.01
print(f"Keep {important_bins.sum()}/{len(important_bins)} bins")
# Bins where |∂q₀/∂s_i| ≈ 0 have no impact on the result
```
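
The boolean mask can then be applied directly. A minimal sketch, assuming `signal_hist` is a plain torch tensor (how you rebuild the model from the pruned histogram depends on your workspace builder):

```python
# Keep only the bins that actually move the test statistic
pruned_signal = signal_hist[important_bins]
print(f"Pruned histogram: {len(signal_hist)} -> {len(pruned_signal)} bins")
```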
## Hyperparameter Tuning with Optuna

NextStat's fast inference (~1–50 ms per fit) makes it ideal as an Optuna objective. A 200-trial binning optimisation completes in seconds:
```python
import optuna
import nextstat

def objective(trial):
    n_bins = trial.suggest_int("n_bins", 3, 40)
    lo = trial.suggest_float("lo", 0.0, 0.3)
    hi = trial.suggest_float("hi", 0.7, 1.0)
    ws = build_workspace(n_bins=n_bins, lo=lo, hi=hi)
    model = nextstat.from_pyhf(ws)
    hypo = nextstat.hypotest(model, mu=0.0)
    return float(hypo.significance)  # Z₀ in σ

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=200)
```

Full tutorial with workspace builder, W&B logging, GPU objectives, and Ray Tune parallel search → Optuna Tutorial
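
The same objective also drops into Ray Tune for parallel search. A minimal sketch, assuming the `build_workspace` helper from the Optuna example and the Ray Tune 2.x API (not NextStat-specific):

```python
from ray import tune

import nextstat

def trainable(config):
    ws = build_workspace(n_bins=config["n_bins"], lo=config["lo"], hi=config["hi"])
    model = nextstat.from_pyhf(ws)
    hypo = nextstat.hypotest(model, mu=0.0)
    return {"z0": float(hypo.significance)}  # final metric reported to Tune

tuner = tune.Tuner(
    trainable,
    param_space={
        "n_bins": tune.randint(3, 41),  # upper bound exclusive
        "lo": tune.uniform(0.0, 0.3),
        "hi": tune.uniform(0.7, 1.0),
    },
    tune_config=tune.TuneConfig(metric="z0", mode="max", num_samples=200),
)
best = tuner.fit().get_best_result()
print(best.config, best.metrics["z0"])
```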
## API Summary
| Module | Function | Purpose |
|---|---|---|
| `nextstat.mlops` | `metrics_dict(result)` | Fit metrics → dict for any logger |
| `nextstat.mlops` | `significance_metrics(z0)` | Per-step Z₀/q₀ → dict |
| `nextstat.mlops` | `StepTimer()` | Wall-clock timer for training steps |
| `nextstat.interpret` | `rank_impact(model)` | Sorted systematic impact (feature importance) |
| `nextstat.interpret` | `rank_impact_df(model)` | Same as above → pandas DataFrame |
| `nextstat.interpret` | `plot_rank_impact(model)` | Matplotlib ranking bar chart |
| `nextstat.torch` | `signal_jacobian(hist, session)` | Raw ∂q₀/∂signal for pruning / SciPy |
