# Optimizer Convergence & Best-NLL Philosophy
NextStat uses L-BFGS-B and targets the best NLL minimum by default. Differences vs pyhf in best-fit parameters on large models (>100 params) are expected and documented behavior, not a bug.
## Position: Best-NLL by Default
- NextStat does not intentionally constrain the optimizer to match a specific external tool.
- If L-BFGS-B finds a deeper minimum than pyhf's SLSQP, that is a correct result.
- Objective parity is validated: NextStat and pyhf compute the same NLL at the same parameter point (agreement at the ~1e-9 to 1e-13 level).
- Differences come from the optimizer, not the model.
## Typical Mismatch Scale
| Model | Parameters | ΔNLL (NS − pyhf) | Reason |
|---|---|---|---|
| simple_workspace | 2 | 0.0 | Both converge |
| complex_workspace | 9 | 0.0 | Both converge |
| tchannel | 184 | −0.01 to −0.08 | pyhf SLSQP premature stop |
| tHu | ~200 | −0.08 | pyhf SLSQP premature stop |
| tttt | 249 | −0.01 | pyhf SLSQP premature stop |
Negative ΔNLL means NextStat finds a better (lower) minimum.
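The ΔNLL column can be reproduced with a short script. The sketch below is illustrative, assuming `result.nll` is on the same −log L scale that objective parity validates; the pyhf side uses `return_fitted_val=True` to obtain 2·NLL at its minimum.

```python
# Illustrative sketch: reproduce the ΔNLL (NS − pyhf) column for one workspace.
# Assumes result.nll is on the same -log L scale that objective parity validates.
import json

import pyhf
import nextstat

ws = json.load(open("workspace.json"))

workspace = pyhf.Workspace(ws)
model = workspace.model()
data = workspace.data(model)

# pyhf minimum: fit returns (best-fit parameters, 2*NLL at the minimum)
_, twice_nll = pyhf.infer.mle.fit(data, model, return_fitted_val=True)
pyhf_nll = float(twice_nll) / 2.0

# NextStat minimum from its own L-BFGS-B fit
ns_nll = nextstat.fit(nextstat.from_pyhf(json.dumps(ws))).nll

print(f"ΔNLL (NS − pyhf) = {ns_nll - pyhf_nll:+.4f}")  # negative ⇒ deeper minimum
```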
## Parity Levels
### Level 1: Objective Parity (P0, required)
NLL(params) matches between NextStat and pyhf at the same params. Tolerance: rtol=1e-6, atol=1e-8. Verified by golden tests on all fixture workspaces.
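A minimal sketch of such a golden check at one shared parameter point. `nextstat.nll_at` is a hypothetical point-evaluation helper used only for illustration; the pyhf calls are real pyhf API.

```python
# Sketch of an objective-parity golden check at one shared parameter point.
# `nextstat.nll_at` is a hypothetical point-evaluation helper used only for
# illustration; the pyhf calls are the real pyhf API.
import json

import numpy as np
import pyhf
import nextstat

ws = json.load(open("workspace.json"))

workspace = pyhf.Workspace(ws)
model = workspace.model()
data = workspace.data(model)

pars = model.config.suggested_init()            # same point for both tools
pyhf_nll = -float(model.logpdf(pars, data)[0])  # pyhf NLL = -log L

ns_model = nextstat.from_pyhf(json.dumps(ws))
ns_nll = nextstat.nll_at(ns_model, pars)        # hypothetical NextStat evaluation

np.testing.assert_allclose(ns_nll, pyhf_nll, rtol=1e-6, atol=1e-8)
```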
### Level 2: Fit Parity (P1, conditional)
Best-fit parameters match within tolerances: atol=2e-4 on parameters, atol=5e-4 on uncertainties. Full agreement on small models (<50 params); mismatches on large models due to different optimizers. Not a defect if NS NLL ≤ pyhf NLL.
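The acceptance rule can be summarized in a few lines. A sketch; how the best-fit parameters and NLLs are extracted from each tool's result object is assumed.

```python
# Sketch of the conditional fit-parity rule described above. How the best-fit
# parameters and NLLs are extracted from each tool's result object is assumed.
import numpy as np

def fit_parity_ok(ns_pars, ns_nll, pyhf_pars, pyhf_nll,
                  par_atol=2e-4, nll_slack=1e-9):
    """Pass if parameters agree, or if NextStat reached an equal/deeper minimum."""
    params_match = np.allclose(ns_pars, pyhf_pars, atol=par_atol)
    deeper_or_equal = ns_nll <= pyhf_nll + nll_slack
    return params_match or deeper_or_equal
```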
### Level 3: Optimizer Compatibility (rejected)
Intentionally degrading the optimizer to match SLSQP is rejected — it is an artificial constraint with no scientific value.
## How to Verify
```python
# For users
import nextstat, json

ws = json.load(open("workspace.json"))
model = nextstat.from_pyhf(json.dumps(ws))
result = nextstat.fit(model)
print(f"NLL: {result.nll}")  # lower is better
```

```bash
# For developers (parity checks)
make pyhf-audit-nll   # Objective parity (must always pass)
make pyhf-audit-fit   # Fit parity (may differ on large models)

# Cross-eval diagnostic
python tests/diagnose_optimizer.py workspace.json
```

## Warm-Start for pyhf Reproducibility
If a specific use case requires matching pyhf (e.g. reproducing a published result):
```python
import pyhf, nextstat, json

# 1. Fit in pyhf
ws = json.load(open("workspace.json"))
workspace = pyhf.Workspace(ws)
model = workspace.model()
data = workspace.data(model)
pyhf_pars = pyhf.infer.mle.fit(data, model)

# 2. Warm-start NextStat from the pyhf point
ns_model = nextstat.from_pyhf(json.dumps(ws))
result = nextstat.fit(ns_model, init_pars=pyhf_pars.tolist())
# result.nll <= pyhf NLL (guaranteed)
```

## L-BFGS-B vs SLSQP
| Aspect | L-BFGS-B (NextStat) | SLSQP (pyhf/scipy) |
|---|---|---|
| Hessian | Limited-memory BFGS (m=10 correction pairs) | Dense quasi-Newton (BFGS) update |
| Bounds | Native box constraints | Native box constraints |
| Convergence | Projected gradient < pgtol, or relative f-decrease < ftol | Objective-change tolerance (ftol/acc) |
| Scaling | O(m·n) per iteration | O(n²) per iteration |
| Large models (>100 params) | Robust | Often premature stop |
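The behavioral difference can be probed outside NextStat and pyhf as well. The sketch below runs both scipy methods on the same bounded, badly scaled toy objective and compares where each stops; it is a generic illustration, not either tool's code path.

```python
# Generic illustration: run scipy's L-BFGS-B and SLSQP on the same bounded,
# badly scaled quadratic "NLL" and compare where each stops. This is a toy,
# not the NextStat or pyhf code path.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 150                                   # mimic a >100-parameter model
center = rng.normal(1.0, 0.1, size=n)
scales = rng.uniform(0.5, 50.0, size=n)   # poorly scaled directions

def nll(x):
    return 0.5 * np.sum(scales * (x - center) ** 2)

x0 = np.ones(n)
bounds = [(-5.0, 10.0)] * n

for method in ("L-BFGS-B", "SLSQP"):
    res = minimize(nll, x0, method=method, bounds=bounds,
                   options={"maxiter": 1000, "ftol": 1e-8})
    print(f"{method:9s} NLL={res.fun:.6e} nit={res.nit} success={res.success}")
```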
## Profile Scan Evidence
| Fixture | NS vs pyhf \|dq(μ)\| | NS vs ROOT \|dq(μ)\| | ROOT fit |
|---|---|---|---|
| xmlimport | 1e-7 | 0.051 | Converged |
| multichannel | 4e-7 | 3.4e-8 | Converged |
| coupled_histosys | 5e-6 | 22.5 | FAILED (status=-1) |
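For reference, the pyhf side of such a scan can be computed from one unconditional fit plus a series of fixed-POI fits. The sketch below assumes q(μ) is the profile likelihood ratio 2·(NLL(μ) − NLL(μ̂)); the NextStat and ROOT sides of the comparison are not shown.

```python
# Sketch: compute a pyhf-side profile scan, q(mu) = 2*(NLL(mu) - NLL(mu_hat)),
# the quantity the |dq(mu)| columns compare. The NextStat and ROOT sides of the
# comparison are not shown here.
import json

import pyhf

ws = json.load(open("workspace.json"))
workspace = pyhf.Workspace(ws)
model = workspace.model()
data = workspace.data(model)

# Unconditional fit: 2*NLL at the global minimum
_, twice_nll_hat = pyhf.infer.mle.fit(data, model, return_fitted_val=True)

def q_of_mu(mu):
    # Conditional fit with the POI fixed at mu
    _, twice_nll_mu = pyhf.infer.mle.fixed_poi_fit(
        mu, data, model, return_fitted_val=True
    )
    return float(twice_nll_mu - twice_nll_hat)

for mu in (0.0, 0.5, 1.0, 1.5, 2.0):
    print(f"mu={mu:4.1f}  q(mu)={q_of_mu(mu):+.6f}")
```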
