NextStat

MLOps & Interpretability

Logging, Feature Importance, Hyperparameter Tuning

NextStat provides lightweight hooks for experiment tracking (W&B, MLflow), physics-aware feature importance (systematic impact ranking), and integration with hyperparameter optimisation frameworks (Optuna, Ray Tune).

Experiment Logging

The nextstat.mlops module extracts fit metrics as plain Python dicts — no dependency on any logging framework. You call your own wandb.log() or mlflow.log_metrics().
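
As an illustrative sketch of this framework-agnostic design, here is a toy re-implementation of the key-prefixing behaviour (assumed from the metrics_dict reference below — the real function also flattens per-parameter values and errors into `param/<name>` / `error/<name>` keys):

```python
# Toy version of the key-prefixing that metrics_dict(prefix=...) performs.
# Booleans are coerced to floats so any metrics logger can ingest them.
def prefix_metrics(metrics, prefix=""):
    return {prefix + key: float(value) for key, value in metrics.items()}

print(prefix_metrics({"mu": 1.05, "converged": True}, prefix="fit/"))
# → {'fit/mu': 1.05, 'fit/converged': 1.0}
```

Because the output is a plain dict, the same payload works unchanged with wandb.log(), mlflow.log_metrics(), or a JSON-lines file.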

Weights & Biases

import wandb
import nextstat
from nextstat.mlops import metrics_dict, significance_metrics, StepTimer

wandb.init(project="hep-search", config={"lr": 1e-3})

result = nextstat.fit(model)
wandb.log(metrics_dict(result, prefix="fit/"))
# → {"fit/mu": 1.05, "fit/nll": 42.3, "fit/converged": 1.0, ...}

# In a training loop:
timer = StepTimer()
for step, batch in enumerate(dataloader):
    timer.start()
    optimizer.zero_grad()
    loss = loss_fn(histogram)  # histogram: differentiable summary built from batch
    loss.backward()
    optimizer.step()
    elapsed = timer.stop()

    z0_val = -loss.item()  # loss is -Z₀, so negate to recover the significance
    wandb.log(significance_metrics(z0_val, prefix="train/", step_time_ms=elapsed))
    # → {"train/z0": 2.31, "train/q0": 5.34, "train/step_time_ms": 48.2}
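
The start()/stop() timer pattern above can be approximated in a few lines if you want it outside NextStat; this is a hypothetical stand-in, not the library's implementation:

```python
import time

class SimpleStepTimer:
    """Hypothetical stand-in mirroring the StepTimer usage shown above:
    start() marks a point, stop() returns elapsed wall-clock milliseconds."""

    def start(self):
        self._t0 = time.perf_counter()

    def stop(self):
        return (time.perf_counter() - self._t0) * 1000.0
```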

MLflow

import mlflow
from nextstat.mlops import metrics_dict

mlflow.set_experiment("hep-search")
with mlflow.start_run():
    result = nextstat.fit(model)
    mlflow.log_metrics(metrics_dict(result))
    mlflow.log_param("model_type", "histfactory")

metrics_dict Reference

| Key | Type | Description |
|---|---|---|
| `mu` | float | Best-fit signal strength (POI) |
| `nll` | float | Negative log-likelihood at minimum |
| `edm` | float | Estimated distance to minimum |
| `converged` | float | 1.0 if converged, 0.0 otherwise |
| `time_ms` | float | Fit wall-clock time (ms) |
| `param/<name>` | float | Best-fit value per nuisance parameter |
| `error/<name>` | float | Hesse error per nuisance parameter |
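
Because the result is a flat dict, downstream filtering is ordinary Python. For example, extracting just the per-nuisance-parameter values (key shapes as in the reference above; the numbers are made up):

```python
# Example metrics_dict output; values are illustrative.
fit_metrics = {
    "mu": 1.05,
    "nll": 42.3,
    "converged": 1.0,
    "param/jes": 0.12,
    "param/lumi": -0.03,
    "error/jes": 0.9,
    "error/lumi": 1.0,
}

# Pull out best-fit nuisance-parameter values, stripping the prefix.
pulls = {
    k.removeprefix("param/"): v
    for k, v in fit_metrics.items()
    if k.startswith("param/")
}
print(pulls)  # → {'jes': 0.12, 'lumi': -0.03}
```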

Feature Importance (Systematic Impact)

In ML, feature importance tells you which inputs matter most. In physics, the equivalent is the ranking plot — which systematic uncertainties have the largest impact on the parameter of interest. NextStat's nextstat.interpret module wraps this as a familiar API.

from nextstat.interpret import rank_impact, rank_impact_df, plot_rank_impact

# Sorted list of dicts (highest impact first)
table = rank_impact(model, top_n=10)
for row in table:
    print(f"{row['rank']:2d}. {row['name']:30s}  impact={row['total_impact']:.4f}")

# As a pandas DataFrame
df = rank_impact_df(model, top_n=15)
print(df[["rank", "name", "total_impact", "pull"]])

# Matplotlib bar chart
fig = plot_rank_impact(model, top_n=20)
fig.savefig("ranking.png", dpi=150)

rank_impact Output

| Field | ML Analogy | Description |
|---|---|---|
| `name` | Feature name | Systematic / nuisance parameter name |
| `total_impact` | Importance score | \|Δμ_up\| + \|Δμ_down\| — total shift in POI |
| `delta_mu_up` | | POI shift when NP shifted +1σ |
| `delta_mu_down` | | POI shift when NP shifted −1σ |
| `pull` | Posterior shift | How far the NP moved from its prior (in σ) |
| `constraint` | Prior width | Post-fit constraint (like regularisation strength) |

Practical use: if your network isn't learning, check the ranking plot. A dominant systematic (e.g. Jet Energy Scale) might be washing out the signal. You can then focus on reducing that uncertainty or designing your NN to be robust to it.

Fast Pruning via Jacobian

Identify histogram bins that don't contribute to the significance and can be dropped to simplify the model:

from nextstat.torch import signal_jacobian

grad = signal_jacobian(signal_hist, session)
important_bins = grad.abs() > 0.01

print(f"Keep {int(important_bins.sum())}/{len(important_bins)} bins")
# Bins where |∂q₀/∂s_i| ≈ 0 have no impact on the result
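
The same mask-based pruning idea, sketched with plain Python lists and made-up gradient values (a stand-in for the |∂q₀/∂s_i| returned by signal_jacobian — no torch required):

```python
# Invented per-bin gradients of the test statistic w.r.t. the signal yields.
grads = [0.5, 0.001, -0.3, 0.0, 0.02]
threshold = 0.01

keep = [abs(g) > threshold for g in grads]        # boolean mask over bins
kept_bins = [i for i, k in enumerate(keep) if k]  # indices of surviving bins

print(f"Keep {sum(keep)}/{len(keep)} bins")  # → Keep 3/5 bins
```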

Hyperparameter Tuning with Optuna

NextStat's fast inference (~1–50 ms per fit) makes it ideal as an Optuna objective. A 200-trial binning optimisation completes in seconds:

import optuna, nextstat

def objective(trial):
    n_bins = trial.suggest_int("n_bins", 3, 40)
    lo = trial.suggest_float("lo", 0.0, 0.3)
    hi = trial.suggest_float("hi", 0.7, 1.0)

    ws = build_workspace(n_bins=n_bins, lo=lo, hi=hi)
    model = nextstat.from_pyhf(ws)
    hypo = nextstat.hypotest(model, mu=0.0)
    return float(hypo.significance)  # Z₀ in σ

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=200)
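
To sanity-check the search space before spending real trials, the same objective structure can be exercised with a stdlib random search; toy_objective here is hypothetical, standing in for the hypotest significance:

```python
import random

def toy_objective(n_bins, lo, hi):
    """Made-up score peaking at n_bins=20, lo=0.1, hi=0.9."""
    return 3.0 - 0.05 * abs(n_bins - 20) - abs(lo - 0.1) - abs(hi - 0.9)

random.seed(42)
best_score, best_params = float("-inf"), None
for _ in range(200):
    # Same search space as the Optuna objective above.
    params = {
        "n_bins": random.randint(3, 40),
        "lo": random.uniform(0.0, 0.3),
        "hi": random.uniform(0.7, 1.0),
    }
    score = toy_objective(**params)
    if score > best_score:
        best_score, best_params = score, params
```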

For a full tutorial covering the workspace builder, W&B logging, GPU objectives, and Ray Tune parallel search, see the Optuna Tutorial.

API Summary

| Module | Function | Purpose |
|---|---|---|
| `nextstat.mlops` | `metrics_dict(result)` | Fit metrics → dict for any logger |
| `nextstat.mlops` | `significance_metrics(z0)` | Per-step Z₀/q₀ → dict |
| `nextstat.mlops` | `StepTimer()` | Wall-clock timer for training steps |
| `nextstat.interpret` | `rank_impact(model)` | Sorted systematic impact (feature importance) |
| `nextstat.interpret` | `rank_impact_df(model)` | Same as above → pandas DataFrame |
| `nextstat.interpret` | `plot_rank_impact(model)` | Matplotlib ranking bar chart |
| `nextstat.torch` | `signal_jacobian(hist, session)` | Raw ∂q₀/∂signal for pruning / SciPy |