NextStat

Bayesian Sampling (NUTS & MAMS)

NextStat includes two gradient-based MCMC samplers implemented in Rust: the classic No-U-Turn Sampler (NUTS) and the newer Metropolis-Adjusted Microcanonical Sampler (MAMS, Robnik et al., 2025). Both produce the same Chain output and are compatible with ArviZ diagnostics. The Python [bayes] extra installs emcee (ensemble MCMC) and ArviZ (diagnostics/visualization) as optional Python-side samplers and analysis tools.

Installation

pip install "nextstat[bayes]"

This installs emcee (≥3.1.6), ArviZ (≥0.23.4), and numpy (≥2.0) alongside the core package.

Python Usage

import json
from pathlib import Path
import nextstat

workspace = json.loads(Path("workspace.json").read_text())
model = nextstat.from_pyhf(json.dumps(workspace))

idata = nextstat.bayes.sample(
    model,
    n_chains=2,
    n_warmup=500,
    n_samples=1000,
    seed=42,
    target_accept=0.8,
)

print(idata)
# Returns arviz.InferenceData with posterior samples

Parameters

Parameter        Default   Description
n_chains         4         Number of independent chains
n_warmup         500       Warmup (adaptation) iterations per chain
n_samples        1000      Post-warmup samples per chain
seed             None      Random seed for reproducibility
target_accept    0.8       Target acceptance rate for step size adaptation

Diagnostics

The returned InferenceData object is fully compatible with ArviZ for diagnostics:

import arviz as az

# Trace plot
az.plot_trace(idata)

# Summary table (R-hat, ESS)
print(az.summary(idata))

# Pair plot
az.plot_pair(idata, var_names=["mu"])

MAMS: Microcanonical Sampler

MAMS (arXiv:2503.01707) is a microcanonical MCMC method that uses isokinetic dynamics on the unit sphere instead of tree-based trajectory selection. The paper reports a 2–7× improvement in ESS per gradient evaluation over NUTS on its benchmarks. Key differences:

  • Fixed trajectory length — no tree building, predictable cost per transition
  • Velocity lives on the unit sphere S^(d-1) with projected gradient updates
  • Partial velocity refresh for decorrelation between transitions (sketched below)
  • Overflow-safe MH criterion via log-sum-exp formulation
  • 3-phase auto-tuning warmup: step size → preconditioner → decoherence length
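
The partial refresh step can be pictured as follows. This is a schematic, generic form of partial momentum refreshment on the unit sphere (mix the current velocity with isotropic Gaussian noise and re-normalize), not nextstat's exact implementation; the noise scale nu stands in for whatever decoherence parameter the tuned sampler uses.

import numpy as np

rng = np.random.default_rng(0)

def partial_velocity_refresh(u, nu, rng):
    # Mix the current unit velocity with isotropic Gaussian noise and project
    # back onto S^(d-1); nu controls how much decoherence each transition
    # introduces (nu -> infinity recovers a full refresh).
    z = rng.standard_normal(u.shape)
    u_new = u + nu * z
    return u_new / np.linalg.norm(u_new)

u = rng.standard_normal(8)
u /= np.linalg.norm(u)                      # velocity lives on S^(d-1)
u = partial_velocity_refresh(u, nu=0.3, rng=rng)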

MAMS Usage

import nextstat

# Eight schools data
y = [28, 8, -3, 7, -1, 1, 18, 12]
sigma = [15, 10, 16, 11, 9, 11, 10, 18]
model = nextstat.EightSchoolsModel(y, sigma, mu_prior=0.0, tau_prior=5.0)

# MAMS sampling — drop-in replacement for NUTS
result = nextstat.sample_mams(
    model,
    n_chains=4,
    n_warmup=1000,
    n_samples=2000,
    seed=42,
)

# Same diagnostics as NUTS
print(result.summary())      # R-hat, ESS, divergences
idata = result.to_arviz()    # ArviZ InferenceData

NUTS vs MAMS

Property              NUTS                                MAMS
Trajectory            Adaptive binary tree                Fixed length (L/ε steps)
Dynamics              Hamiltonian (p ∈ ℝ^d)               Microcanonical (u ∈ S^(d-1))
Velocity refresh      Full resample each step             Partial (momentum persistence)
Cost predictability   Varies (tree depth)                 Fixed per transition
Warmup tuning         Dual averaging (ε) + mass matrix    Binary search (ε) + Welford + L-tuning
Multi-chain           Rayon-parallel                      Rayon-parallel

Both samplers produce identical output formats and support the same diagnostics (R-hat, bulk/tail ESS, divergences). Choose NUTS for its well-studied reliability; try MAMS when you want potentially higher ESS per gradient on challenging geometries (funnels, multiscale posteriors).
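
If you want to compare the two on your own model, ArviZ's ESS utilities work on the InferenceData returned by either sampler. The snippet below is illustrative; it assumes model is any nextstat model object accepted by bayes.sample.

import arviz as az
import nextstat

# Run both samplers on the same model and compare bulk ESS per parameter.
idata_nuts = nextstat.bayes.sample(model, n_chains=4, n_samples=2000, seed=1)
idata_mams = nextstat.bayes.sample(model, method="mams", n_chains=4, n_samples=2000, seed=1)
print(az.ess(idata_nuts, method="bulk"))
print(az.ess(idata_mams, method="bulk"))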

Unified sample() Interface

nextstat.sample() is a single entry point for all three MCMC methods. The method parameter selects the sampler; all method-specific kwargs are forwarded.

import nextstat as ns

model = ns.EightSchoolsModel([28,8,-3,7,-1,1,18,12], [15,10,16,11,9,11,10,18])

# NUTS (default)
idata = ns.sample(model, method="nuts", n_samples=2000, return_idata=True)

# MAMS
idata = ns.sample(model, method="mams", n_samples=2000, return_idata=True)

# LAPS (GPU — requires CUDA build)
result = ns.sample("eight_schools", method="laps",
                   model_data={"y": [28,8,-3,7,-1,1,18,12],
                               "sigma": [15,10,16,11,9,11,10,18]},
                   n_chains=4096, n_samples=2000)

# Save trace to disk
ns.sample(model, n_samples=2000, return_idata=True, out="trace.json")

# ArviZ convenience wrapper (return_idata=True by default)
idata = ns.bayes.sample(model, method="mams", n_samples=2000)

Pathfinder Init Strategy

Both NUTS and MAMS support init_strategy="pathfinder", which runs a full L-BFGS fit (with Hessian) to find the posterior mode and extracts the diagonal of the inverse Hessian as an initial mass matrix. This gives warmup a head start on the preconditioner, allowing a shorter warmup (e.g. 200 instead of 1000 iterations). It falls back to random initialization if the fit fails.

import nextstat as ns

# Pathfinder init for NUTS
idata = ns.sample(model, method="nuts", init_strategy="pathfinder", n_warmup=200)

# Pathfinder init for MAMS
idata = ns.sample(model, method="mams", init_strategy="pathfinder", n_warmup=200)

LAPS: GPU-Accelerated Sampling

LAPS (Late-Adjusted Parallel Sampler) is a GPU-accelerated MAMS variant that runs 4096+ chains simultaneously. Warmup is two-phase: Phase 1 runs unadjusted MCLMC for fast posterior exploration; Phase 2 is MH-corrected for exact sampling. It supports both CUDA (f64, NVIDIA GPUs) and Metal (f32, Apple Silicon M1–M4), with automatic fallback: CUDA > Metal > error.

import nextstat as ns

# Built-in models: std_normal, eight_schools, neal_funnel, glm_logistic
result = ns.sample_laps(
    "eight_schools",
    model_data={"y": [28,8,-3,7,-1,1,18,12],
                "sigma": [15,10,16,11,9,11,10,18]},
    n_chains=4096,
    n_warmup=500,
    n_samples=2000,
    seed=42,
)
print(f"GPU chains: {result['n_gpu_chains']}")
print(f"Wall time: {result['wall_time_s']:.2f}s")

# Multi-GPU (CUDA only)
result = ns.sample_laps("glm_logistic", model_data=data,
                        n_chains=65536, device_ids=[0,1,2,3])

Requires an NVIDIA GPU (CUDA build) or an Apple Silicon Mac (Metal build) at runtime. On CUDA, sampling runs in f64 precision; on Metal, in f32 precision with a fused multi-step kernel and a SIMD-group cooperative kernel for data-heavy GLMs.
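
If your code may also run on machines without a supported GPU, guard the call; the exception type raised when no backend is available is an assumption here, so the sketch catches broadly.

import nextstat as ns

y = [28, 8, -3, 7, -1, 1, 18, 12]
sigma = [15, 10, 16, 11, 9, 11, 10, 18]

try:
    result = ns.sample_laps("eight_schools",
                            model_data={"y": y, "sigma": sigma},
                            n_chains=4096, n_samples=1000)
except Exception as exc:  # no CUDA/Metal backend available at runtime
    print(f"LAPS unavailable, falling back to CPU sampling: {exc}")
    # e.g. idata = ns.sample(model, method="nuts", n_samples=2000, return_idata=True)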

Windowed Mass Adaptation

LAPS warmup Phases 2 and 3 now use Stan-style doubling windows (default: 3) for inv_mass estimation. Each window resets the Welford statistics and dual averaging, which improves convergence on multi-scale models. This is configurable via n_mass_windows in LapsConfig (set it to 1 for the single-pass legacy behavior).
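
As an illustration of the window layout (not the exact schedule LAPS uses), one way to split a warmup budget into doubling windows is shown below; within each window the Welford estimator and dual averaging restart, and the last window absorbs the rounding remainder.

def doubling_windows(n_warmup, n_windows=3):
    # Doubling weights 1, 2, 4, ... spread over the warmup budget.
    weights = [2**i for i in range(n_windows)]
    total = sum(weights)
    sizes = [n_warmup * w // total for w in weights]
    sizes[-1] += n_warmup - sum(sizes)   # absorb rounding into the last window
    return sizes

print(doubling_windows(500))   # -> [71, 142, 287]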

Riemannian MAMS for Neal Funnel

The new neal_funnel_riemannian model uses a position-dependent Fisher metric G = diag(1/9, exp(−v), …). Its effective potential is constructed so that the log-determinant terms cancel, yielding scale-invariant natural gradients. It is available on both Metal (f32) and CUDA (f64).
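
A minimal NumPy sketch of that cancellation, assuming the usual Riemannian-HMC convention U_eff = U + ½ log det G and the standard Neal funnel (v ~ N(0, 3²), x_i | v ~ N(0, exp(v))); the function names are illustrative, not part of the nextstat API.

import numpy as np

def funnel_effective_potential(v, x):
    # U(v, x) = v^2/18 + exp(-v)/2 * sum(x^2) + (d-1)*v/2 + const, while
    # 1/2 * log det G = const - (d-1)*v/2, so the linear-in-v terms cancel:
    return v**2 / 18.0 + 0.5 * np.exp(-v) * np.sum(x**2)

def funnel_natural_gradient(v, x):
    # Natural gradient G^{-1} grad U_eff. The x-component reduces to x itself,
    # independent of v: the scale-invariant property mentioned above.
    dv = 9.0 * (v / 9.0 - 0.5 * np.exp(-v) * np.sum(x**2))
    dx = np.exp(v) * (np.exp(-v) * x)    # simplifies to x
    return dv, dx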

import nextstat as ns

# Riemannian MAMS on GPU — handles funnel geometry naturally
result = ns.sample_laps("neal_funnel_riemannian",
                        model_data={"dim": 10},
                        n_chains=4096, n_samples=2000)