ROOT/HistFactory 3-Way Comparison

Comprehensive validation of NextStat against both pyhf (specification reference) and ROOT/RooFit (legacy implementation) on canonical HistFactory fixtures. Agreement with pyhf is sub-1e-5 on q(μ) across all fixtures.

Executive Summary

Fixture           Modifiers               NS vs pyhf |dq(μ)|   NS vs ROOT |dq(μ)|   ROOT Status
xmlimport         OverallSys + StatError  1e-7                 0.051                0 (converged)
multichannel      ShapeSys                4e-7                 3.4e-8               0 (converged)
coupled_histosys  HistoSys (coupled NP)   5e-6                 22.5                 -1 (FAILED)

Real-World TREx Exports

Case                    NS vs ROOT |dq(μ)|   ROOT Issue
simple_fixture          1.6e-10              None (perfect)
histfactory_fixture     1.89                 Optimizer divergence
hepdata EWK             0.0                  Free fit blowup (μ̂ = 4.9e23)
tttt-prod (249 params)  0.04                 Tail optimizer convergence

Methodology

Each fixture is processed through three independent pipelines reading the same XML + ROOT histograms:

HistFactory XML + ROOT histograms
        │
        ├──► hist2workspace → RooFit → ROOT profile scan (C++ via PyROOT)
        ├──► pyhf.readxml   → pyhf   → pyhf profile scan  (Python)
        └──► NextStat import → PreparedModel → NextStat profile scan (Rust)

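The profile scan computes q̃(μ) = 2·[NLL(μ) − NLL(μ̂)] at 31 evenly spaced points in μ ∈ [0, 3] (step 0.1), where NLL(μ) is minimized over the nuisance parameters at fixed μ and μ̂ is the free-fit value. The test statistic is the standard q_mu_tilde (Cowan et al., arXiv:1007.1727).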
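As a rough illustration of the scan grid and the formula above, the following sketch evaluates q(μ) for a toy single-bin Poisson counting model. The model, the function names, and the numbers n, s, b are assumptions made up for this example; they are not taken from the fixtures or from NextStat's API. The toy has no nuisance parameters, so the profiling step is trivial and μ̂ is known analytically.

```python
import math

def nll(mu, n=25, s=10.0, b=15.0):
    """Poisson negative log-likelihood for one bin (constant log(n!) dropped)."""
    lam = mu * s + b
    return lam - n * math.log(lam)

def profile_scan(n=25, s=10.0, b=15.0, lo=0.0, hi=3.0, npoints=31):
    """Return (mu_hat, [(mu, q), ...]) on an evenly spaced grid in [lo, hi]."""
    # Analytic best fit for this toy; a real scan would minimize numerically
    # and would also profile the nuisance parameters at each fixed mu.
    mu_hat = (n - b) / s
    nll_hat = nll(mu_hat, n, s, b)
    grid = [lo + (hi - lo) * i / (npoints - 1) for i in range(npoints)]
    return mu_hat, [(mu, 2.0 * (nll(mu, n, s, b) - nll_hat)) for mu in grid]

mu_hat, scan = profile_scan()
for mu, q in scan[::10]:
    print(f"mu = {mu:.1f}  q = {q:.4f}")
```

The grid has 31 points with spacing 0.1, and q vanishes at the grid point that coincides with μ̂ and is non-negative everywhere else.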

Detailed q(μ) Comparison

xmlimport — ROOT vs NextStat vs pyhf

μ     ROOT q(μ)   pyhf q(μ)   NS q(μ)   NS − pyhf   ROOT − NS
1.2   0.01957     0.01956     0.01956   +1e-8       +1e-5
2.0   2.07272     2.06669     2.06669   −4e-7       +6e-3
3.0   9.05788     9.00676     9.00676   +1e-7       +5.1e-2

NextStat and pyhf agree to within 1e-6 at every listed point. ROOT systematically overshoots at high μ, which is consistent with Minuit2's conditional minimizer converging to a slightly higher NLL at extreme values.

coupled_histosys — ROOT divergence

μ     ROOT q(μ)   pyhf q(μ)   NS q(μ)   NS − pyhf   ROOT − NS
1.0   0.991       0.445       0.445     +4e-6       +0.545
2.0   15.526      6.543       6.543     +5e-6       +8.98
3.0   41.566      19.042      19.042    +4e-6       +22.52

NextStat and pyhf agree to better than 1e-5. ROOT disagrees at every listed point: the gap is 0.545 at μ = 1.0 and grows to 22.52 at μ = 3.0. ROOT also reports status_free = -1 (Minuit2 could not determine a positive-definite covariance matrix).

Root Cause: Why ROOT Diverges

If both tools evaluate the same likelihood, the NLL offset between ROOT and NextStat should be the same at every μ, because it comes only from parameter-independent constants in the constraint terms. For coupled_histosys it is not:

Point      ROOT NLL   NS NLL   Offset
Free fit   434.754    14.017   420.737
μ = 0.0    434.841    14.103   420.738
μ = 2.0    442.517    17.288   425.229
μ = 3.0    455.537    23.537   432.000

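The offset grows from 420.74 at the free fit to 432.00 at μ = 3.0, a drift of 11.26. Twice that drift accounts for the 22.52 discrepancy in q(μ) at μ = 3.0. This rules out a pure optimizer difference and indicates that ROOT evaluates the coupled HistoSys likelihood differently at large alpha values.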
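The offset arithmetic can be checked directly from the NLL table above. Since q(μ) = 2·[NLL(μ) − NLL(μ̂)], the difference q_ROOT(μ) − q_NS(μ) equals twice the change in offset relative to the free fit. The short script below is only a check on the tabulated numbers; the variable names are illustrative.

```python
# NLL values copied from the coupled_histosys table: (point, ROOT NLL, NS NLL)
rows = [
    ("free fit", 434.754, 14.017),
    ("mu = 0.0", 434.841, 14.103),
    ("mu = 2.0", 442.517, 17.288),
    ("mu = 3.0", 455.537, 23.537),
]

# Offset = ROOT NLL - NS NLL; it would be constant if both likelihoods agreed.
offsets = {label: root - ns for label, root, ns in rows}

# Drift = change in offset relative to the free fit.
drift = {label: off - offsets["free fit"] for label, off in offsets.items()}

for label in offsets:
    print(
        f"{label:9s} offset = {offsets[label]:8.3f}  "
        f"drift = {drift[label]:+7.3f}  2*drift = {2 * drift[label]:+7.3f}"
    )
```

The 2*drift column should reproduce the ROOT − NS column of the coupled_histosys q(μ) table at μ = 2.0 and μ = 3.0, up to rounding of the tabulated values.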

Timing Comparison

Fixture           ROOT     pyhf     NextStat   Speedup vs ROOT   Speedup vs pyhf
xmlimport         0.91 s   0.23 s   0.003 s    303×              73×
multichannel      1.98 s   0.26 s   0.007 s    283×              37×
coupled_histosys  1.76 s   0.15 s   0.002 s    880×              75×

Validation Hierarchy

SPECIFICATION (mathematical definition, arXiv:1007.1727)
    │
    ├── pyhf (ATLAS reference implementation)
    │       │
    │       ├── NextStat  ✓  < 1e-5 on q(μ), CI-gated
    │       │
    │       └── ROOT/RooFit
    │           - ShapeSys: < 1e-6 (excellent)
    │           - OverallSys: ≈ 0.05 (optimizer)
    │           - Coupled HistoSys: DIVERGES (status=-1)
    │           NOT CI-gated (informational only)
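
A tolerance gate of the kind described in the diagram might look roughly like the sketch below. The function names and the way the threshold is wired are hypothetical and are not NextStat's actual test code; only the 1e-5 tolerance and the q(μ) values (the three xmlimport rows listed earlier) come from this page.

```python
def max_abs_diff(q_a, q_b):
    """Largest pointwise |q_a - q_b|; both scans must share the same mu grid."""
    if len(q_a) != len(q_b):
        raise ValueError("scans must have the same number of points")
    return max(abs(a - b) for a, b in zip(q_a, q_b))

def passes_gate(q_a, q_b, tol=1e-5):
    """True if the two scans agree to better than tol at every point."""
    return max_abs_diff(q_a, q_b) < tol

# xmlimport q(mu) at mu = 1.2, 2.0, 3.0, as tabulated (five decimals)
q_root = [0.01957, 2.07272, 9.05788]
q_pyhf = [0.01956, 2.06669, 9.00676]
q_ns = [0.01956, 2.06669, 9.00676]

print("NS vs pyhf passes:", passes_gate(q_ns, q_pyhf))
print("ROOT vs NS passes:", passes_gate(q_root, q_ns))
print("ROOT vs NS max |dq|:", max_abs_diff(q_root, q_ns))
```

At the precision printed in the table the NextStat and pyhf values are indistinguishable, so the sub-1e-6 differences quoted earlier cannot be resolved from these rounded numbers; a real gate would compare the full-precision scan outputs.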

Reproducing These Results

python tests/validate_root_profile_scan.py \
  --histfactory-xml tests/fixtures/pyhf_xmlimport/config/example.xml \
  --rootdir tests/fixtures/pyhf_xmlimport \
  --include-pyhf --keep