# Optimizer Convergence & Best-NLL Philosophy
NextStat uses L-BFGS-B and targets the best NLL minimum by default. Differences vs pyhf in best-fit parameters on large models (>100 params) are expected and documented behavior, not a bug.
## Position: Best-NLL by Default
- NextStat does not intentionally constrain the optimizer to match a specific external tool.
- If L-BFGS-B finds a deeper minimum than pyhf's SLSQP, that is a correct result.
- Objective parity is validated: NextStat and pyhf compute the same NLL at the same parameter point (agreement at the ~1e-9 to 1e-13 level).
- Differences come from the optimizer, not the model.
## Typical Mismatch Scale
| Model | Parameters | ΔNLL (NS − pyhf) | Reason |
|---|---|---|---|
| simple_workspace | 2 | 0.0 | Both converge |
| complex_workspace | 9 | 0.0 | Both converge |
| tchannel | 184 | −0.01 to −0.08 | pyhf SLSQP premature stop |
| tHu | ~200 | −0.08 | pyhf SLSQP premature stop |
| tttt | 249 | −0.01 | pyhf SLSQP premature stop |
Negative ΔNLL means NextStat finds a better (lower) minimum.
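The ΔNLL column can be reproduced with a short script. The sketch below is illustrative, assuming `result.nll` is on the same −log L scale that objective parity validates; the pyhf side uses `return_fitted_val=True` to obtain 2·NLL at its minimum.

```python
# Illustrative sketch: reproduce the ΔNLL (NS − pyhf) column for one workspace.
# Assumes result.nll is on the same -log L scale that objective parity validates.
import json

import pyhf
import nextstat

ws = json.load(open("workspace.json"))

workspace = pyhf.Workspace(ws)
model = workspace.model()
data = workspace.data(model)

# pyhf minimum: fit returns (best-fit parameters, 2*NLL at the minimum)
_, twice_nll = pyhf.infer.mle.fit(data, model, return_fitted_val=True)
pyhf_nll = float(twice_nll) / 2.0

# NextStat minimum from its own L-BFGS-B fit
ns_nll = nextstat.fit(nextstat.from_pyhf(json.dumps(ws))).nll

print(f"ΔNLL (NS − pyhf) = {ns_nll - pyhf_nll:+.4f}")  # negative ⇒ deeper minimum
```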
## Parity Levels
### Level 1: Objective Parity (P0, required)
NLL(params) matches between NextStat and pyhf at the same params. Tolerance: rtol=1e-6, atol=1e-8. Verified by golden tests on all fixture workspaces.
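A minimal sketch of such a golden check at one shared parameter point. `nextstat.nll_at` is a hypothetical point-evaluation helper used only for illustration; the pyhf calls are real pyhf API.

```python
# Sketch of an objective-parity golden check at one shared parameter point.
# `nextstat.nll_at` is a hypothetical point-evaluation helper used only for
# illustration; the pyhf calls are the real pyhf API.
import json

import numpy as np
import pyhf
import nextstat

ws = json.load(open("workspace.json"))

workspace = pyhf.Workspace(ws)
model = workspace.model()
data = workspace.data(model)

pars = model.config.suggested_init()            # same point for both tools
pyhf_nll = -float(model.logpdf(pars, data)[0])  # pyhf NLL = -log L

ns_model = nextstat.from_pyhf(json.dumps(ws))
ns_nll = nextstat.nll_at(ns_model, pars)        # hypothetical NextStat evaluation

np.testing.assert_allclose(ns_nll, pyhf_nll, rtol=1e-6, atol=1e-8)
```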
### Level 2: Fit Parity (P1, conditional)
Best-fit parameters match within tolerances: atol=2e-4 on parameters, atol=5e-4 on uncertainties. Full agreement on small models (<50 params); mismatches on large models due to different optimizers. Not a defect if NS NLL ≤ pyhf NLL.
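The acceptance rule can be summarized in a few lines. A sketch; how the best-fit parameters and NLLs are extracted from each tool's result object is assumed.

```python
# Sketch of the conditional fit-parity rule described above. How the best-fit
# parameters and NLLs are extracted from each tool's result object is assumed.
import numpy as np

def fit_parity_ok(ns_pars, ns_nll, pyhf_pars, pyhf_nll,
                  par_atol=2e-4, nll_slack=1e-9):
    """Pass if parameters agree, or if NextStat reached an equal/deeper minimum."""
    params_match = np.allclose(ns_pars, pyhf_pars, atol=par_atol)
    deeper_or_equal = ns_nll <= pyhf_nll + nll_slack
    return params_match or deeper_or_equal
```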
### Level 3: Optimizer Compatibility (rejected)
Intentionally degrading the optimizer to match SLSQP is rejected — it is an artificial constraint with no scientific value.
## How to Verify
```python
# For users
import nextstat, json

ws = json.load(open("workspace.json"))
model = nextstat.from_pyhf(json.dumps(ws))
result = nextstat.fit(model)
print(f"NLL: {result.nll}")  # lower is better
```

```bash
# For developers (parity checks)
make pyhf-audit-nll   # Objective parity (must always pass)
make pyhf-audit-fit   # Fit parity (may differ on large models)

# Cross-eval diagnostic
python tests/diagnose_optimizer.py workspace.json
```

## Warm-Start for pyhf Reproducibility
If a specific use case requires matching pyhf (e.g. reproducing a published result):
```python
import pyhf, nextstat, json

# 1. Fit in pyhf
ws = json.load(open("workspace.json"))
workspace = pyhf.Workspace(ws)
model = workspace.model()
data = workspace.data(model)
pyhf_pars = pyhf.infer.mle.fit(data, model)

# 2. Warm-start NextStat from the pyhf point
ns_model = nextstat.from_pyhf(json.dumps(ws))
result = nextstat.fit(ns_model, init_pars=pyhf_pars.tolist())
# result.nll <= pyhf NLL (guaranteed)
```

## L-BFGS-B vs SLSQP
| Aspect | L-BFGS-B (NextStat) | SLSQP (pyhf/scipy) |
|---|---|---|
| Hessian | Limited-memory BFGS (m=10 correction pairs) | Dense quasi-Newton (BFGS) update |
| Bounds | Native box constraints | Native box constraints |
| Convergence | Projected gradient < pgtol, or relative f-decrease < ftol | Objective-change tolerance (ftol/acc) |
| Scaling | O(m·n) per iteration | O(n²) per iteration |
| Large models (>100 params) | Robust | Often premature stop |
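The behavioral difference can be probed outside NextStat and pyhf as well. The sketch below runs both scipy methods on the same bounded, badly scaled toy objective and compares where each stops; it is a generic illustration, not either tool's code path.

```python
# Generic illustration: run scipy's L-BFGS-B and SLSQP on the same bounded,
# badly scaled quadratic "NLL" and compare where each stops. This is a toy,
# not the NextStat or pyhf code path.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 150                                   # mimic a >100-parameter model
center = rng.normal(1.0, 0.1, size=n)
scales = rng.uniform(0.5, 50.0, size=n)   # poorly scaled directions

def nll(x):
    return 0.5 * np.sum(scales * (x - center) ** 2)

x0 = np.ones(n)
bounds = [(-5.0, 10.0)] * n

for method in ("L-BFGS-B", "SLSQP"):
    res = minimize(nll, x0, method=method, bounds=bounds,
                   options={"maxiter": 1000, "ftol": 1e-8})
    print(f"{method:9s} NLL={res.fun:.6e} nit={res.nit} success={res.success}")
```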
## Profile Scan Evidence
| Fixture | NS vs pyhf \|dq(μ)\| | NS vs ROOT \|dq(μ)\| | ROOT fit |
|---|---|---|---|
| xmlimport | 1e-7 | 0.051 | Converged |
| multichannel | 4e-7 | 3.4e-8 | Converged |
| coupled_histosys | 5e-6 | 22.5 | FAILED (status=-1) |
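For reference, the pyhf side of such a scan can be computed from one unconditional fit plus a series of fixed-POI fits. The sketch below assumes q(μ) is the profile likelihood ratio 2·(NLL(μ) − NLL(μ̂)); the NextStat and ROOT sides of the comparison are not shown.

```python
# Sketch: compute a pyhf-side profile scan, q(mu) = 2*(NLL(mu) - NLL(mu_hat)),
# the quantity the |dq(mu)| columns compare. The NextStat and ROOT sides of the
# comparison are not shown here.
import json

import pyhf

ws = json.load(open("workspace.json"))
workspace = pyhf.Workspace(ws)
model = workspace.model()
data = workspace.data(model)

# Unconditional fit: 2*NLL at the global minimum
_, twice_nll_hat = pyhf.infer.mle.fit(data, model, return_fitted_val=True)

def q_of_mu(mu):
    # Conditional fit with the POI fixed at mu
    _, twice_nll_mu = pyhf.infer.mle.fixed_poi_fit(
        mu, data, model, return_fitted_val=True
    )
    return float(twice_nll_mu - twice_nll_hat)

for mu in (0.0, 0.5, 1.0, 1.5, 2.0):
    print(f"mu={mu:4.1f}  q(mu)={q_of_mu(mu):+.6f}")
```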
