Python API Reference
The nextstat Python package exposes the Rust engine through PyO3 bindings. All heavy computation runs in Rust; Python provides the API surface.
Core Functions
nextstat.from_pyhf(json_str: str) → Model
Parse a pyhf JSON workspace string into a NextStat Model object.
nextstat.from_histfactory(xml_path: str) → Model
Create model from HistFactory XML + ROOT histogram files.
HistFactory XML Ingest Semantics
- ShapeSys histograms are treated as relative per-bin uncertainties and converted to absolute σ = rel × nominal.
- StatErrorConfig: Poisson maps StatError to Barlow-Beeston shapesys; Gaussian maps to staterror.
- NormalizeByTheory="True" → the sample receives a lumi modifier; LumiRelErr is surfaced via the measurement config.
- NormFactor Val/Low/High → surfaced via the measurement parameter config (inits and bounds).
Unified API (0.9.6+)
All inference functions accept any applicable model type (HistFactoryModel, UnbinnedModel) and dispatch automatically. GPU acceleration uses a single device="cpu"|"cuda"|"metal" parameter. Old prefixed variants (unbinned_*, *_gpu) have been removed. All structured returns use TypedDict definitions for IDE autocomplete.
nextstat.fit(model, *, data=None, init_pars=None, device="cpu") → FitResult
Maximum Likelihood Estimation using L-BFGS-B. Returns best-fit parameters, uncertainties, and NLL at minimum.
nextstat.hypotest(poi_test, model, *, data=None, return_tail_probs=False) → float
Asymptotic CLs hypothesis test. Dispatches on model type: HistFactory (binned) or Unbinned (qμ).
nextstat.hypotest_toys(poi_test, model, *, n_toys=1000, seed=42, expected_set=False, device="cpu") → float | dict
Toy-based CLs hypothesis test. Dispatches on model type.
nextstat.profile_scan(model, mu_values, *, data=None, device="cpu", return_curve=False) → ProfileScanResult
Profile likelihood scan. return_curve=True adds plot-friendly arrays (mu_values, q_mu_values, twice_delta_nll).
nextstat.fit_toys(model, params, *, n_toys=1000, seed=42, device="cpu") → list[FitResult]
Unified toy fitting. CPU uses Rayon parallelism; GPU uses lockstep L-BFGS-B kernels.
nextstat.ranking(model, *, device="cpu") → list[RankingEntry]
Nuisance parameter ranking (impact on POI). Dispatches on model type.
nextstat.upper_limit(model, *, method="bisect", alpha=0.05) → float | tuple
Upper limit. method="bisect" (observed only) or method="root" (observed + 5 expected bands).
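The method="bisect" option amounts to root-finding CLs(μ) = α on a monotone curve. The idea can be sketched in plain Python; the `cls` callable here is a toy stand-in, not the real NextStat computation:

```python
import math

def bisect_upper_limit(cls, alpha=0.05, lo=0.0, hi=10.0, tol=1e-6):
    """Find mu with cls(mu) == alpha, assuming cls(mu) decreases in mu."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cls(mid) > alpha:
            lo = mid  # CLs still above alpha: the limit lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Toy monotone CLs curve: exp(-mu) crosses 0.05 at mu = -ln(0.05)
mu_up = bisect_upper_limit(lambda mu: math.exp(-mu))
```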
HS3 Support (ROOT 6.37+)
NextStat natively loads HS3 (HEP Statistics Serialization Standard) v0.2 JSON files as produced by ROOT 6.37+. Format is auto-detected — no flag needed.
HistFactoryModel.from_workspace(json_str) → Model
Auto-detects pyhf vs HS3 format. Works transparently with both.
HistFactoryModel.from_hs3(json_str, analysis=None, param_points=None) → Model
Explicit HS3 loading with optional analysis selection and parameter point set.
import nextstat
# Auto-detect: works with both pyhf and HS3
json_str = open("workspace-postFit_PTV.json").read()
model = nextstat.HistFactoryModel.from_workspace(json_str)
# Explicit HS3 with analysis selection
model = nextstat.HistFactoryModel.from_hs3(
json_str,
analysis="combPdf_obsData",
param_points="default_values",
)
result = nextstat.fit(model)
Supported modifiers: normfactor, normsys, histosys, staterror, shapesys, shapefactor, lumi. Unknown types are silently skipped (forward-compatible with future HS3 versions).
Model Object
model.poi_index() → int
Index of the parameter of interest in the parameter vector.
FitResult Object
| Attribute | Type | Description |
|---|---|---|
| bestfit | list[float] | Best-fit parameter values |
| uncertainties | list[float] | Parameter uncertainties (Hessian-based) |
| nll | float | NLL at the minimum |
| warnings | list[str] | Identifiability warnings: near-singular Hessian (cond > 1e8), non-finite uncertainties, near-zero Hessian diagonal. Empty when model is well-identified. |
Profile Likelihood CI
Generic bisection-based profile confidence intervals for any LogDensityModel. Properly asymmetric — no normality assumption. Computes {θ : 2*(NLL(θ) - NLL_min) ≤ χ²(1, α)}.
# CI for a single parameter
ci = nextstat.profile_ci(model, fit_result, param_idx=0)
print(f"[{ci['ci_lower']:.3f}, {ci['ci_upper']:.3f}]")
# CI for all parameters
all_ci = nextstat.profile_ci(model, fit_result)
for c in all_ci:
    print(f"param {c['param_idx']}: [{c['ci_lower']:.3f}, {c['ci_upper']:.3f}]")
Fault Tree MC (CE-IS)
Cross-Entropy Importance Sampling with multi-level adaptive biasing for rare-event fault tree probability estimation. Handles probabilities down to ~10⁻¹⁶ with ~10⁴–10⁵ scenarios instead of brute-force vanilla MC. Supports all failure modes: Bernoulli, WeibullMission, and BernoulliUncertain. Also available on GPU via fault_tree_mc(spec, n_scenarios, device='metal').
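The importance-sampling identity that CE-IS builds on can be illustrated with a toy rare event in plain Python. This is a generic sketch of the reweighting trick, not the NextStat algorithm: draw from a biased density q, then reweight each scenario by the likelihood ratio p/q.

```python
import math
import random

def tail_prob_is(t, lam_bias, n=100_000, seed=42):
    """Estimate P(X > t) for X ~ Exp(1) by sampling from Exp(lam_bias)
    and reweighting with the likelihood ratio p(x)/q(x)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.expovariate(lam_bias)                            # draw from biased q
        w = math.exp(-x) / (lam_bias * math.exp(-lam_bias * x))  # weight p/q
        total += w * (x > t)
    return total / n

# True value: P(X > 20) = exp(-20) ~ 2.06e-9 -- hopeless for vanilla MC
est = tail_prob_is(20.0, lam_bias=0.05)
```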
result = nextstat.fault_tree_mc_ce_is(
spec, # FaultTreeSpec dict
n_per_level=10000,
max_levels=20, # multi-level adaptive biasing
q_max=0.99, # soft importance function threshold
seed=42,
)
print(f"P(failure) = {result['p_failure']:.2e} ± {result['se']:.2e}")
print(f"Levels: {result['n_levels']}, scenarios: {result['n_total_scenarios']}")
# GPU-accelerated vanilla MC (Metal / CUDA)
mc = nextstat.fault_tree_mc(spec, n_scenarios=1_000_000, device='metal')
Bayesian Sampling (Unified Interface)
nextstat.sample() is the unified entry point for all MCMC methods: NUTS, MAMS, and LAPS (GPU). Set return_idata=True for direct ArviZ InferenceData.
import nextstat
# Unified dispatcher — NUTS (default)
idata = nextstat.sample(model, method="nuts", n_samples=2000, return_idata=True)
# MAMS — typically better ESS/grad on hierarchical models
idata = nextstat.sample(model, method="mams", n_samples=2000, return_idata=True)
# LAPS — GPU-accelerated MAMS on CUDA (4096+ chains)
result = nextstat.sample("eight_schools", method="laps",
model_data={"y": [28,8,-3,7,-1,1,18,12],
"sigma": [15,10,16,11,9,11,10,18]},
n_chains=4096, n_samples=2000)
# Save to disk
nextstat.sample(model, n_samples=2000, return_idata=True, out="trace.json")
# ArviZ convenience wrapper (return_idata=True by default)
idata = nextstat.bayes.sample(model, method="mams", n_samples=2000)
Visualization Submodule
import numpy as np
import nextstat

scan = np.linspace(0.0, 5.0, 101)
cls_art = nextstat.viz.cls_curve(model, scan, alpha=0.05)
nextstat.viz.plot_cls_curve(cls_art, title="CLs Brazil band")

mu = [0.0, 0.5, 1.0, 2.0]
prof_art = nextstat.viz.profile_curve(model, mu)
nextstat.viz.plot_profile_curve(prof_art, title="Profile likelihood")
Gymnasium / RL Environment
Gymnasium-compatible environment for optimizing signal yields via reinforcement learning. Requires gymnasium + numpy.
from nextstat.gym import HistFactoryEnv
env = HistFactoryEnv(
workspace_json=json_str,
channel="SR",
sample="signal",
reward_metric="z0", # "nll", "q0", "z0", "qmu", "zmu"
mu_test=5.0,
max_steps=128,
action_scale=0.05,
action_mode="logmul", # "add" or "logmul"
)
obs, info = env.reset(seed=42)
obs, reward, terminated, truncated, info = env.step(action)
| Parameter | Description |
|---|---|
| reward_metric | Reward signal: "nll", "q0", "z0", "qmu", or "zmu" |
| action_mode | "add" (additive) or "logmul" (log-multiplicative) |
| max_steps | Episode length before truncation |
Factory function: make_histfactory_env(workspace_json, channel, sample, reward_metric)
Remote Server Client
Pure-Python HTTP client for a remote nextstat-server instance. Requires httpx.
Core
nextstat.remote.connect(url, timeout=300) → NextStatClient
Create a client. Supports context manager.
client.fit(workspace, *, model_id=None, gpu=True) → FitResult
Remote MLE fit. Pass workspace (dict/JSON) or model_id from cache.
client.ranking(workspace, *, model_id=None, gpu=True) → RankingResult
Remote nuisance-parameter ranking. Supports model_id.
client.health() → HealthResult
Server health (status, version, uptime, device, counters, cached_models).
Batch API
client.batch_fit(workspaces, gpu=True) → BatchFitResult
Fit up to 100 workspaces in one request. Returns .results (list of FitResult | None).
client.batch_toys(workspace, n_toys=1000, seed=42, gpu=True) → BatchToysResult
GPU-accelerated toy fitting. Returns .results, .n_converged.
Model Cache
client.upload_model(workspace, name=None) → str
Upload to LRU cache. Returns model_id (SHA-256).
client.list_models() → list[ModelInfo]
List cached models (id, name, params, channels, age, hits).
client.delete_model(model_id) → bool
Evict a model from the cache.
import nextstat.remote as remote
client = remote.connect("http://gpu-server:3742")
# Single fit
result = client.fit(workspace_json)
print(result.bestfit, result.nll, result.converged)
# Model cache — upload once, fit many times without re-parsing
model_id = client.upload_model(workspace_json, name="my-analysis")
result = client.fit(model_id=model_id) # ~4x faster
# Batch fit
batch = client.batch_fit([ws1, ws2, ws3])
for r in batch.results:
print(r.nll if r else "failed")
# Batch toys (GPU-accelerated)
toys = client.batch_toys(workspace_json, n_toys=10_000, seed=42)
print(f"{toys.n_converged}/{toys.n_toys} converged")
# Ranking
ranking = client.ranking(workspace_json)
for e in ranking.entries:
    print(f"{e.name}: Δμ = {e.delta_mu_up:+.3f} / {e.delta_mu_down:+.3f}")
Raises NextStatServerError(status_code, detail) on non-2xx HTTP responses. Result types: FitResult, RankingResult, BatchFitResult, BatchToysResult, ModelInfo.
GARCH / Stochastic Volatility
Volatility models for financial time series. Available via the nextstat.volatility convenience module or directly from nextstat._core.
nextstat.volatility.garch(ys, *, max_iter=1000, tol=1e-6, alpha_beta_max=0.999, min_var=1e-18) → dict
Fit Gaussian GARCH(1,1) by MLE. Returns: params ({mu, omega, alpha, beta}), conditional_variance, conditional_sigma, log_likelihood, converged.
nextstat.volatility.sv(ys, *, max_iter=1000, tol=1e-6, log_eps=1e-12) → dict
Approximate stochastic volatility via log(χ²₁) Gaussian approximation + Kalman MLE. Returns: params ({mu, phi, sigma}), smoothed_h, smoothed_sigma, log_likelihood, converged.
Low-level (no wrapper): nextstat._core.garch11_fit(), nextstat._core.sv_logchi2_fit()
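The conditional-variance recursion that GARCH(1,1) fitting is built on, σ²ₜ = ω + α·ε²ₜ₋₁ + β·σ²ₜ₋₁, can be written out directly. This is a plain-Python sketch with example coefficients, not a call into the Rust engine:

```python
def garch_variance(returns, mu, omega, alpha, beta):
    """Conditional variance path: sigma2[t] = omega + alpha*eps[t-1]^2 + beta*sigma2[t-1]."""
    # Common initialization: the unconditional variance omega / (1 - alpha - beta)
    sigma2 = [omega / (1.0 - alpha - beta)]
    for t in range(1, len(returns)):
        eps_prev = returns[t - 1] - mu   # previous innovation
        sigma2.append(omega + alpha * eps_prev**2 + beta * sigma2[-1])
    return sigma2

path = garch_variance([0.01, -0.02, 0.005, 0.03], mu=0.0,
                      omega=1e-5, alpha=0.05, beta=0.9)
```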
from nextstat.volatility import garch, sv
returns = [0.01, -0.02, 0.005, 0.03, -0.015, 0.002, -0.04, 0.035]
# GARCH(1,1)
g = garch(returns)
print(f"omega={g['params']['omega']:.4f}, alpha={g['params']['alpha']:.4f}, beta={g['params']['beta']:.4f}")
print(f"Conditional sigma: {g['conditional_sigma'][:5]}")
# Stochastic Volatility
s = sv(returns)
print(f"phi={s['params']['phi']:.4f}, sigma_eta={s['params']['sigma']:.4f}")
print(f"Smoothed sigma: {s['smoothed_sigma'][:5]}")
Neural PDFs (--features neural)
The nextstat.FlowPdf and nextstat.DcrSurrogate classes are available when built with --features neural. They enable standalone ONNX-backed flow evaluation from Python.
FlowPdf
import nextstat
import numpy as np
# Load a pre-trained flow from manifest
flow = nextstat.FlowPdf.from_manifest("models/signal_flow/flow_manifest.json")
print(f"Context params: {flow.n_context()}")
print(f"Observables: {flow.observable_names()}")
print(f"Norm correction: {flow.log_norm_correction():.6f}")
# Compute log p(x) for an array of events
events = np.array([5100.0, 5200.0, 5300.0, 5400.0])
params = np.array([]) # unconditional flow — no context params
log_probs = flow.log_prob_batch(events, params)
print(f"log p(x): {log_probs}")
# Update normalization (Gauss-Legendre quadrature)
flow.update_normalization(bounds=(5000.0, 6000.0), order=64)
DcrSurrogate
import nextstat
import numpy as np
# Load DCR surrogate
dcr = nextstat.DcrSurrogate.from_manifest("models/bkg_dcr/flow_manifest.json")
print(f"Process: {dcr.process_name()}")
print(f"Systematics: {dcr.systematic_names()}")
print(f"Norm tolerance: {dcr.norm_tolerance():.4f}")
# Compute log p(x | α) for given systematic values
events = np.array([5100.0, 5200.0, 5300.0])
alpha = np.array([0.5, -0.3]) # jes_alpha, jer_alpha values
log_probs = dcr.log_prob_batch(events, alpha)
# Validate normalization against template
is_ok = dcr.validate_nominal_normalization(rtol=0.01)
| Method | Class | Description |
|---|---|---|
| from_manifest(path) | Both | Load from flow_manifest.json + ONNX |
| log_prob_batch(events, params) | Both | log p(x|θ) for an array of events |
| update_normalization() | Both | Recompute normalization (GL quadrature) |
| n_context() | FlowPdf | Number of context params (0 = unconditional) |
| validate_nominal_normalization() | DcrSurrogate | Verify normalization against template |
GLM — Gamma & Tweedie
nextstat.gamma_glm(y, X, *, include_intercept=True)
Gamma GLM with log link. Returns dict with params, alpha (shape), se, nll.
nextstat.tweedie_glm(y, X, *, p=1.5, include_intercept=True)
Tweedie compound Poisson-Gamma GLM with log link. Power p ∈ (1, 2). Handles exact zeros. Returns dict with params, phi, p.
Extreme Value Theory (EVT)
nextstat.gev_fit(data) → dict
GEV for block maxima. Parameters: mu, sigma, xi. Static: GevModel.return_level(params, T).
nextstat.gpd_fit(exceedances) → dict
GPD for peaks-over-threshold. Parameters: sigma, xi. Static: GpdModel.quantile(params, p) for VaR/ES.
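The quantile mapping for VaR follows the standard GPD inverse CDF. As a plain-Python illustration of the formula (the textbook expression, not the library internals):

```python
import math

def gpd_quantile(sigma, xi, p):
    """Inverse CDF of GPD(sigma, xi) at probability p (excess over the threshold)."""
    if abs(xi) < 1e-12:                  # xi -> 0 limit: exponential tail
        return -sigma * math.log(1.0 - p)
    return (sigma / xi) * ((1.0 - p) ** (-xi) - 1.0)

# 99% excess quantile for a moderately heavy tail (xi = 0.2)
var99 = gpd_quantile(sigma=1.0, xi=0.2, p=0.99)
```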
Meta-Analysis
nextstat.meta_fixed(estimates, se, *, labels=None, conf_level=0.95) → dict
Fixed-effects meta-analysis (inverse-variance). Returns pooled estimate, CI, heterogeneity (Q, I², H², τ²), forest plot data.
nextstat.meta_random(estimates, se, *, labels=None, conf_level=0.95) → dict
Random-effects meta-analysis (DerSimonian–Laird). Same return structure as meta_fixed.
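The inverse-variance pooling behind the fixed-effects estimate reduces to a few lines. A schematic re-derivation in plain Python (not the package code; the 95% CI uses the normal quantile):

```python
import math

def pool_fixed(estimates, se):
    """Inverse-variance fixed-effects pooled estimate with a 95% CI."""
    w = [1.0 / s**2 for s in se]                     # weights 1/se^2
    pooled = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    pooled_se = math.sqrt(1.0 / sum(w))
    z = 1.959963984540054                            # 97.5% normal quantile
    return pooled, pooled - z * pooled_se, pooled + z * pooled_se

est, lo, hi = pool_fixed([0.5, 0.3, 0.7], [0.1, 0.2, 0.15])
```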
Survival Analysis
nextstat.kaplan_meier(times, events, *, conf_level=0.95) → dict
Non-parametric KM estimate with Greenwood variance and log-log CI.
nextstat.log_rank_test(times, events, groups) → dict
Mantel-Cox log-rank test for 2+ groups. Returns chi², df, p-value, per-group observed/expected.
nextstat.survival.cox_ph.fit(times, events, x, *, ties="efron", robust=True) → CoxPhFit
Cox PH with Efron/Breslow ties, robust sandwich SE. .hazard_ratio_confint().
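The product-limit estimate behind kaplan_meier above can be sketched in plain Python (unweighted, no Greenwood variance or CI, just the survival curve):

```python
def km_survival(times, events):
    """Kaplan-Meier product-limit estimator.

    Returns (event_times, S(t)); S drops at each observed event time.
    events[i] = 1 for an event, 0 for right-censoring.
    """
    order = sorted(range(len(times)), key=lambda i: times[i])
    n_at_risk = len(times)
    s, out_t, out_s = 1.0, [], []
    i = 0
    while i < len(order):
        t = times[order[i]]
        d = n_leave = 0
        while i < len(order) and times[order[i]] == t:
            d += events[order[i]]   # events at time t
            n_leave += 1            # subjects leaving the risk set at t
            i += 1
        if d > 0:
            s *= 1.0 - d / n_at_risk
            out_t.append(t)
            out_s.append(s)
        n_at_risk -= n_leave
    return out_t, out_s

t, s = km_survival([2.0, 3.0, 3.0, 5.0, 8.0], [1, 1, 0, 1, 0])
```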
Churn / Subscription
nextstat.churn_generate_data(n_customers, *, n_cohorts=6, max_time=24.0, seed=42) → dict
Deterministic synthetic SaaS churn dataset with covariates and treatment assignment.
nextstat.churn_retention(times, events, groups, *, conf_level=0.95) → dict
Stratified KM per group + log-rank comparison.
nextstat.churn_risk_model(times, events, covariates, names) → dict
Cox PH hazard ratios with CI for churn risk factors.
nextstat.churn_uplift(times, events, treated, covariates, *, horizon=12.0) → dict
AIPW causal uplift with Rosenbaum sensitivity (ATE, SE, CI, gamma_critical).
nextstat.churn_diagnostics(times, events, groups, *, treated=[], covariates=[], covariate_names=[], trim=0.01) → dict
Data quality trust gate: censoring rates per segment, covariate balance (SMD), propensity overlap, sample-size adequacy. Returns trust_gate_passed + warnings.
nextstat.churn_cohort_matrix(times, events, groups, period_boundaries) → dict
Life-table cohort retention matrix. Per-cohort period retention, cumulative retention, at-risk/event/censored counts.
nextstat.churn_compare(times, events, groups, *, correction="benjamini_hochberg", alpha=0.05) → dict
Pairwise segment comparison: log-rank + HR proxy + Benjamini-Hochberg / Bonferroni MCP correction.
nextstat.churn_uplift_survival(times, events, treated, *, covariates=[], horizon=12.0, eval_horizons=[3,6,12,24], trim=0.01) → dict
Survival-native uplift: RMST, IPW-weighted KM, ΔS(t) at eval horizons. Propensity trimming + ESS reporting.
nextstat.churn_bootstrap_hr(times, events, covariates, names, *, n_bootstrap=1000, seed=42, conf_level=0.95, ci_method="percentile", n_jackknife=None) → dict
Parallel bootstrap Cox PH hazard ratios with percentile or BCa CIs. ci_method: "percentile" (default) or "bca". n_jackknife controls BCa acceleration estimation sample size. Returns hr_point, hr_ci_lower/upper, per-coefficient ci_diagnostics with effective method and fallback reason.
nextstat.churn_ingest(times, events, *, groups=None, treated=None, covariates=[], covariate_names=[], observation_end=None) → dict
Validate & clean raw customer arrays: drop invalid rows, apply observation-end censoring cap. Returns clean dataset + n_dropped + warnings.
Pharmacometrics
OneCompartmentOralPkModel(times, y, *, dose, bioavailability=1.0, error_model="additive", sigma=1.0, lloq=None, lloq_policy="ignore")
1-compartment oral PK with first-order absorption. .predict(params), .nll(params).
TwoCompartmentIvPkModel(times, y, *, dose, error_model="additive", sigma=1.0, lloq=None)
2-compartment IV bolus (CL, V1, V2, Q). Analytical bi-exponential solution.
TwoCompartmentOralPkModel(times, y, *, dose, error_model="additive", sigma=1.0, lloq=None)
2-compartment oral (CL, V1, V2, Q, Ka). Analytical tri-exponential solution.
DosingRegimen.iv_bolus(amt, time) | .oral(amt, time) | .infusion(amt, time, duration)
Build dosing schedules (single, repeated, mixed-route). Used by all PK models for concentration-time profiles.
NonmemDataset.from_csv(path) → NonmemDataset
Parse NONMEM-format CSV (ID, TIME, DV, AMT, EVID, MDV, CMT, RATE). Auto-converts dosing records to DosingRegimen per subject.
FoceEstimator.focei(model, data, omega) | .foce(model, data, omega)
First-Order Conditional Estimation (with/without interaction). Per-subject ETA optimization + population parameter updates. Laplace approximation.
OmegaMatrix.from_diagonal(variances) | .from_correlation(sds, corr) | .from_covariance(matrix)
Full Ω variance-covariance via Cholesky factor (Ω = L·Lᵀ). Efficient inv_quadratic() and log_det().
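The Cholesky trick behind OmegaMatrix (Ω = L·Lᵀ) is what makes inv_quadratic() and log_det() cheap: both reduce to operations on the triangular factor. A minimal plain-Python sketch for a 2×2 Ω (illustrative, not the library implementation):

```python
import math

def cholesky_2x2(a, b, c):
    """Lower-triangular L with [[a, b], [b, c]] = L @ L.T."""
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(c - l21**2)
    return l11, l21, l22

def log_det(l11, l21, l22):
    # log|Omega| = 2 * sum(log(diag(L)))
    return 2.0 * (math.log(l11) + math.log(l22))

def inv_quadratic(l11, l21, l22, x1, x2):
    # x.T @ inv(Omega) @ x = ||inv(L) @ x||^2, via forward substitution
    z1 = x1 / l11
    z2 = (x2 - l21 * z1) / l22
    return z1**2 + z2**2

L = cholesky_2x2(4.0, 2.0, 3.0)   # Omega = [[4, 2], [2, 3]], det = 8
```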
SaemEstimator(model, data, omega, *, n_burn=300, n_iter=200) → FoceResult
Stochastic Approximation EM (Monolix-class). MH E-step with adaptive proposal, closed-form M-step. SaemDiagnostics includes acceptance rates and OFV trace.
ScmEstimator(estimator, data, covariates, *, alpha_forward=0.05, alpha_backward=0.01)
Stepwise Covariate Modeling: forward selection + backward elimination using ΔOFV. Returns full audit trace with per-step p-values and coefficients.
vpc_1cpt_oral(result, data, *, n_sim=200, n_bins=10) → VpcResult
Visual Predictive Checks: simulates replicates, bins by time, computes observed vs simulated quantile prediction intervals.
gof_1cpt_oral(result, data) → GofResult
Goodness-of-Fit diagnostics: PRED, IPRED, IWRES, CWRES per observation.
EmaxModel(ec50, emax) | SigmoidEmaxModel(ec50, emax, gamma) | IndirectResponseModel(kin, kout, imax, ic50, type=1..4)
PD models: direct Emax, Hill equation, IDR Types I–IV (ODE-based via adaptive RK45). PkPdLink for PK→PD concentration interpolation.
rk45(system, y0, t_span, *, rtol=1e-6, atol=1e-9) | esdirk4(system, y0, t_span)
Adaptive ODE solvers: Dormand-Prince 4(5) for non-stiff, L-stable SDIRK2 for stiff (transit compartments ktr > 100).
NlmeArtifact(result, *, scm=None, vpc=None, gof=None, bundle=None) → JSON/CSV
Wraps all estimation results into a single serializable structure. CSV exports for fixed/random effects, GOF, VPC bins, SCM trace. RunBundle captures provenance (version, git, toolchain, seeds).
Hybrid Models (Binned + Unbinned)
HybridModel.from_models(binned, unbinned, poi_from="binned") → HybridModel
Combine a HistFactoryModel and an UnbinnedModel into a single likelihood with shared parameters matched by name. n_shared(), poi_index(), with_fixed_param().
Sampling / NUTS
nextstat.sample(model, *, n_chains=4, n_warmup=500, n_samples=1000, seed=42, max_treedepth=10, target_accept=0.8) → dict
NUTS (No-U-Turn Sampler). Accepts Posterior as well. Returns chains, diagnostics, and ArviZ-compatible output.
nextstat.map_fit(posterior) → FitResult
MAP estimation for Bayesian posteriors.
Toy Data
nextstat.asimov_data(model, params) → list[float]
Asimov dataset (expected counts).
nextstat.poisson_toys(model, params, *, n_toys=1000, seed=42) → list[list[float]]
Poisson-fluctuated toy datasets.
nextstat.fit_toys(model, params, *, n_toys=1000, seed=42) → list[FitResult]
CPU parallel toy fitting (Rayon; tape reuse).
nextstat.unbinned_fit_toys(model, params, *, n_toys=1000, seed=42, init_params=None, max_retries=3, max_iter=5000, compute_hessian=False) → list[FitResult]
Generate and fit Poisson-fluctuated toys for unbinned models (CPU parallel). Warm-start from MLE θ̂, retry with jitter, smooth bounds escalation. Hessian skipped by default for throughput; set compute_hessian=True when parameter pulls are needed.
GPU Acceleration
nextstat.has_cuda() → bool | nextstat.has_metal() → bool
Check CUDA (NVIDIA) or Metal (Apple Silicon) backend availability at runtime.
nextstat.fit_toys_batch_gpu(model, params, *, n_toys=1000, seed=42, device="cpu") → list[FitResult]
GPU-accelerated batch toy fitting. device="cuda" (f64, NVIDIA), "metal" (f32, Apple Silicon), or "cpu" (Rayon fallback).
Evaluation Modes (Parity / Fast)
nextstat.set_eval_mode("fast" | "parity") | nextstat.get_eval_mode() → str
fast (default): naive summation, SIMD/Accelerate/CUDA. parity: Kahan compensated summation for deterministic pyhf parity validation. Overhead <5%.
nextstat.set_threads(n: int) → bool
Configure global Rayon thread pool size (best-effort, returns True if applied). Set to 1 for full determinism in parity mode.
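Parity mode's compensated summation is the classic Kahan algorithm: each add tracks the low-order bits that plain floating-point accumulation discards. A plain-Python sketch:

```python
def kahan_sum(values):
    """Compensated summation: recover the low-order bits lost at each add."""
    total = 0.0
    c = 0.0                  # running compensation for lost precision
    for v in values:
        y = v - c
        t = total + y        # low-order digits of y are lost here...
        c = (t - total) - y  # ...and recovered into c for the next step
        total = t
    return total

# 1.0 followed by many tiny terms: plain accumulation drops every one of them
vals = [1.0] + [1e-16] * 10_000
naive = 0.0
for v in vals:
    naive += v               # 1.0 + 1e-16 rounds back to 1.0 each time
```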
Ordinal Regression
OrderedLogitModel(x, y, *, n_levels) | OrderedProbitModel(x, y, *, n_levels)
Proportional odds (logit link) and ordered probit models for ordinal outcomes. Convenience: nextstat.ordinal.logit(), .probit().
Hierarchical / Mixed Models
LmmMarginalModel(x, y, *, include_intercept, group_idx, n_groups, random_slope_feature_idx)
Gaussian mixed model (marginal likelihood).
ComposedGlmModel.linear_regression(...) | .logistic_regression(...) | .poisson_regression(...)
Hierarchical GLMs via static constructors with random intercepts, random slopes, correlated effects (LKJ prior), and non-centered parameterization.
Utilities
nextstat.ols_fit(x, y, *, include_intercept=True) → list[float]
Closed-form OLS regression coefficients.
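For a single feature with intercept, the closed form reduces to the familiar normal equations. A plain-Python sketch of what the coefficients are conceptually (not the library routine, which handles the general multi-feature case):

```python
def ols_1d(x, y):
    """Closed-form OLS for y = b0 + b1*x: slope = covariance / variance."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1

# Exact line y = 1 + 2x is recovered exactly
b0, b1 = ols_1d([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```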
nextstat.rk4_linear(a, y0, t0, t1, dt) → dict
RK4 ODE solver for linear systems.
nextstat.has_accelerate() → bool
Check Apple Accelerate backend availability.
Arrow / Parquet Authoring (nextstat.arrow_io)
Schema validation and manifest writing helpers. Requires pyarrow (pip install "nextstat[io]").
nextstat.arrow_io.validate_histogram_table(table) → HistogramTableStats
Validate the histogram table contract.
nextstat.arrow_io.validate_modifiers_table(table) → ModifiersTableStats
Validate the modifiers table contract (binned Parquet v2).
nextstat.arrow_io.write_histograms_parquet(table, path, *, compression="zstd", write_manifest=True, ...) → dict
Write Parquet + optional manifest JSON; returns manifest dict.
nextstat.arrow_io.validate_histograms_parquet_manifest(manifest, *, check_sha256=True) → None
Validate a manifest against the referenced Parquet file.
nextstat.arrow_io.load_parquet_as_histfactory_model(path, *, poi="mu", observations=None) → HistFactoryModel
Validate Parquet schema with PyArrow, then call nextstat.from_parquet().
nextstat.arrow_io.load_parquet_v2_as_histfactory_model(yields_path, modifiers_path, ...) → HistFactoryModel
Validate Parquet v2 schemas, then call nextstat.from_parquet_with_modifiers().
nextstat.arrow_io.validate_event_table(table) → EventTableStats
Validate an unbinned event table contract.
nextstat.arrow_io.write_events_parquet(table, path, *, observables=None, compression="zstd") → dict
Write an unbinned event table to Parquet with NextStat metadata.
TREx Replacement (nextstat.analysis)
nextstat.analysis.read_root_histogram(root_path, hist_path, *, flow_policy="drop") → dict
Read one ROOT TH1 histogram. Optionally fold under/overflow into edge bins. Guarantees sumw2.
nextstat.analysis.read_root_histograms(root_path, hist_paths, *, flow_policy="drop") → dict[str, dict]
Read many histograms from the same ROOT file.
nextstat.analysis.eval_expr(expr, env, *, n=None) → np.ndarray
Vectorized TREx-style expression evaluation. env maps variable names to 1D arrays/scalars.
TRExFitter Config Migration (nextstat.trex_config)
nextstat.trex_config.parse_trex_config(text, *, path=None) → TrexConfigDoc
Parse a TRExFitter .config file from a string.
nextstat.trex_config.parse_trex_config_file(path) → TrexConfigDoc
Parse a TRExFitter .config file from disk.
nextstat.trex_config.trex_doc_to_analysis_spec_v0(doc, ...) → (dict, dict)
Convert TREx config doc to analysis spec v0. Returns (spec, report); unsupported keys are recorded in the report.
nextstat.trex_config.trex_config_file_to_analysis_spec_v0(config_path, ...) → (dict, dict)
Convenience wrapper: parse file → convert → (spec, report).
nextstat.trex_config.dump_yaml(obj) → str
Deterministic minimal YAML emitter (no external deps).
Audit / Run Bundles (nextstat.audit)
nextstat.audit.environment_fingerprint() → dict[str, Any]
Small, privacy-preserving environment fingerprint for reproducibility.
nextstat.audit.write_bundle(bundle_dir, *, command, args, input_path, output_value, tool_version=None) → None
Write a reproducible run bundle to a directory (Python mirror of CLI --bundle).
Report Rendering (nextstat.report)
nextstat.report.render_report(input_dir, *, pdf, svg_dir, corr_include=None, corr_exclude=None, corr_top_n=None) → None
Render a report PDF (+ optional per-plot SVG) from an artifacts directory. Requires matplotlib.
CLI: python -m nextstat.report render --input-dir ... --pdf ... [--svg-dir ...]
Validation Report (nextstat.validation_report)
CLI: python -m nextstat.validation_report render --input validation_report.json --pdf out.pdf. Requires matplotlib.
Differentiable Layer (nextstat.torch)
PyTorch autograd.Function wrappers for end-to-end differentiable HEP inference. Two GPU backends (CUDA f64, Metal f32) plus CPU fallback.
CUDA path (zero-copy)
nextstat.torch.create_session(model, signal_sample_name="signal") → DifferentiableSession
Create CUDA session for differentiable NLL (requires --features cuda).
nextstat.torch.nll_loss(signal_histogram, session, params=None) → torch.Tensor
Differentiable NLL (zero-copy). signal_histogram must be contiguous CUDA float64.
nextstat.torch.create_profiled_session(model, signal_sample_name="signal", device="auto")
Create GPU session for profiled test statistics. Returns ProfiledDifferentiableSession (CUDA) or MetalProfiledDifferentiableSession.
nextstat.torch.profiled_q0_loss / profiled_z0_loss / profiled_qmu_loss / profiled_zmu_loss
Differentiable profiled q₀, Z₀ = √q₀, qμ, Zμ = √qμ.
nextstat.torch.batch_profiled_q0_loss / batch_profiled_qmu_loss
Batch q₀ over multiple histograms / qμ for multiple μ values.
nextstat.torch.signal_jacobian(signal_histogram, session) → torch.Tensor
Extract ∂q₀/∂signal without going through autograd. NumPy variant: signal_jacobian_numpy().
Metal path (f32)
MetalProfiledDifferentiableSession(model, signal_sample_name)
Metal GPU session for profiled test statistics (f32 compute). Signal uploaded via CPU.
High-level helpers
nextstat.torch.SignificanceLoss(model, signal_sample_name="signal", *, device="auto", negate=True, eps=1e-12)
ML-friendly callable loss wrapper around profiled Z₀. Returns −Z₀ by default (for SGD minimization).
nextstat.torch.SoftHistogram(bin_edges, bandwidth="auto", mode="kde")
Differentiable binning (KDE/sigmoid) to produce a signal histogram for SignificanceLoss.
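The sigmoid-mode soft binning idea can be sketched in plain Python: each event contributes σ((x − left)/h) − σ((x − right)/h) to each bin, which approaches hard binning as the bandwidth h → 0 while staying differentiable in x. This is a hypothetical minimal version, not the SoftHistogram implementation:

```python
import math

def soft_histogram(xs, edges, bandwidth):
    """Differentiable 'soft' bin counts via pairs of shifted sigmoids."""
    sig = lambda z: 1.0 / (1.0 + math.exp(-z))
    counts = []
    for left, right in zip(edges[:-1], edges[1:]):
        c = sum(sig((x - left) / bandwidth) - sig((x - right) / bandwidth)
                for x in xs)
        counts.append(c)
    return counts

# Small bandwidth: effectively hard binning of [0.5, 1.5, 1.6, 2.5]
counts = soft_histogram([0.5, 1.5, 1.6, 2.5], [0.0, 1.0, 2.0, 3.0],
                        bandwidth=0.01)
```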
MLOps Integration (nextstat.mlops)
nextstat.mlops.metrics_dict(fit_result, *, prefix="", include_time=True, extra=None) → dict[str, float]
Flat dict from FitResult for experiment loggers (W&B, MLflow, Neptune).
nextstat.mlops.significance_metrics(z0, q0=0.0, *, prefix="", step_time_ms=0.0) → dict[str, float]
Per-step metrics for training loop logging.
nextstat.mlops.StepTimer
Lightweight wall-clock timer: .start(), .stop() → float (ms).
Interpretability (nextstat.interpret)
Systematic-impact ranking translated into ML-style Feature Importance.
nextstat.interpret.rank_impact(model, *, gpu=False, sort_by="total", top_n=None, ascending=False) → list[dict]
Sorted impact table. Each dict: name, delta_mu_up/down, total_impact, pull, rank.
nextstat.interpret.rank_impact_df(model, **kwargs) → pd.DataFrame
Same as above, pandas output (requires pandas).
nextstat.interpret.plot_rank_impact(model, *, top_n=20, gpu=False, ...) → matplotlib.Figure
Horizontal bar chart of systematic impact (requires matplotlib).
Agentic Analysis (nextstat.tools)
LLM tool definitions for AI-driven statistical analysis. Compatible with OpenAI function calling, LangChain, and MCP.
nextstat.tools.get_toolkit(*, transport="local", server_url=None, timeout_s=10.0) → list[dict]
OpenAI-compatible tool definitions for 9 operations.
nextstat.tools.execute_tool(name, arguments, ...) → dict
Execute a tool call locally or via server. Returns {ok, result, error, meta}.
nextstat.tools.get_langchain_tools() → list[StructuredTool]
LangChain adapter (requires langchain-core).
nextstat.tools.get_mcp_tools() → list[dict] | handle_mcp_call(name, arguments) → dict
MCP tool definitions and dispatcher.
Surrogate Distillation (nextstat.distill)
Generate training datasets for neural likelihood surrogates. NextStat serves as the ground-truth oracle; the surrogate provides nanosecond inference.
nextstat.distill.generate_dataset(model, n_samples=100_000, *, method="sobol", ...) → SurrogateDataset
Sample parameter space and evaluate NLL + gradient. Methods: "sobol", "lhs", "uniform", "gaussian".
nextstat.distill.train_mlp_surrogate(ds, *, hidden_layers=(256,256,128), epochs=100, ...) → nn.Module
Convenience MLP trainer with Sobolev loss.
nextstat.distill.to_torch_dataset / to_numpy / to_npz / from_npz / to_parquet
Export/import helpers for PyTorch DataLoader, NumPy, compressed .npz, and Parquet.
nextstat.distill.predict_nll(surrogate, params_np) → ndarray
Evaluate trained surrogate on raw parameters.
