NextStat

Engineering Blog

Deep dives into differentiable inference, GPU kernel design, and the engineering behind NextStat.

15 min read

NUTS v10: Progressive Sampling, ESS/Leapfrog, and Reproducible Benchmarks

A NUTS implementation with artifact-driven benchmarks. The ESS/leapfrog diagnostic isolates algorithmic efficiency from wall-time effects. Progressive sampling at the top-level tree join doubles ESS/leapfrog on GLM benchmarks, and NextStat reaches 3.2× the ESS/sec of CmdStan 2.38 on hierarchical posteriors.

NUTS · Bayesian · ESS/sec · Progressive Sampling · Stan · Reproducibility
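
The ESS/leapfrog diagnostic mentioned in this teaser normalizes effective sample size by the total number of leapfrog steps, separating algorithmic efficiency from hardware and implementation speed; ESS/sec captures both. A minimal illustrative sketch (not NextStat's harness), assuming ArviZ for the ESS estimate and externally recorded leapfrog counts and timings:

```python
# Illustrative only: compute ESS, ESS/leapfrog, and ESS/sec for one parameter
# from a finished NUTS run. Assumes ArviZ is available for the ESS estimate.
import numpy as np
import arviz as az

def efficiency_metrics(draws, n_leapfrog_steps, wall_time_s):
    """draws: (chains, draws) array of post-warmup samples for one parameter;
    n_leapfrog_steps: total post-warmup leapfrog steps across all chains;
    wall_time_s: post-warmup sampling wall time in seconds."""
    ess_ds = az.ess(np.asarray(draws))          # ndarray input -> Dataset with variable "x"
    ess = float(ess_ds["x"].values)             # bulk ESS
    return {
        "ess": ess,
        "ess_per_leapfrog": ess / n_leapfrog_steps,  # algorithmic efficiency
        "ess_per_sec": ess / wall_time_s,            # algorithm + implementation + hardware
    }
```
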
20 min read

Compiler-Symbolic vs Hybrid-Neural: GPU-Accelerated Unbinned Fits in HEP

Mathematical comparison of MoreFit (symbolic differentiation + JIT OpenCL) and NextStat (analytical CUDA kernels + ONNX flows + reverse-mode AD). Three-tier gradient system, fused NLL kernel, systematics, conditional flows, and R&D.

GPU · Unbinned · ONNX · CUDA · Analytical Gradients · Systematics
15 min read

Unbinned Event-Level Analysis: Why Work With Every Event

Extended unbinned likelihood mathematics, catalog of 10 parametric PDFs (Crystal Ball, KDE, Chebyshev, etc.), columnar EventStore (SoA), and a full resonance search workflow — from JSON spec to toy studies.

Unbinned · Event-Level · Crystal Ball · KDE · Extended Likelihood · HistFactory
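
For reference, the extended unbinned log-likelihood this article builds on takes the standard form (generic notation, not NextStat-specific):

```latex
% Extended unbinned log-likelihood for N observed events x_i, with expected
% yield \nu(\theta) and unit-normalized event-level density f(x;\theta);
% the parameter-independent constant -\ln N! is dropped.
\ln L(\theta) \;=\; -\,\nu(\theta) \;+\; \sum_{i=1}^{N} \ln\!\bigl[\,\nu(\theta)\, f(x_i;\theta)\,\bigr]
```
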
6 min read

JAX Compile vs Execution: The Benchmark You Actually Need

Why compile latency matters in scientific ML pipelines and how NextStat benchmarks cold-start vs warm throughput with reproducible harnesses.

JAX · Compile Latency · ML · Benchmarks
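
The cold-start vs warm-throughput split is easy to reproduce in miniature. An illustrative sketch (not the NextStat harness) that times the first jitted call, which includes tracing and compilation, separately from subsequent cached calls:

```python
# Illustrative only: separate JAX compile latency (cold start) from
# steady-state execution time (warm calls) for a toy NLL.
import time
import jax
import jax.numpy as jnp

@jax.jit
def nll(params, x):
    mu, log_sigma = params
    sigma = jnp.exp(log_sigma)
    return jnp.sum(0.5 * ((x - mu) / sigma) ** 2 + log_sigma)

x = jnp.ones(1_000_000)
params = jnp.array([0.1, 0.0])

t0 = time.perf_counter()
nll(params, x).block_until_ready()          # cold: trace + compile + execute
cold = time.perf_counter() - t0

t0 = time.perf_counter()
for _ in range(100):
    nll(params, x).block_until_ready()      # warm: cached executable only
warm = (time.perf_counter() - t0) / 100

print(f"cold start: {cold*1e3:.1f} ms, warm call: {warm*1e3:.3f} ms")
```
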
8 min read

Pharma Benchmarks: PK and NLME Without Benchmark Theater

Rigorous benchmarking of PK/NLME workflows: objective definitions, stopping rules, scaling protocols, correctness gates, and publishable artifacts.

Pharmacometrics · PK · NLME · Benchmarks · Regulated
7 min read

Bayesian Benchmarks That Mean Something: ESS/sec vs Wall-Time

How NextStat benchmarks Bayesian inference rigorously: ESS/sec methodology, stable protocols, diagnostics settings, and publishable artifacts for comparisons vs Stan and PyMC.

Bayesian · NUTS · ESS/sec · Stan · PyMC
10 min read

Building a Trustworthy HEP Benchmark Harness

How NextStat benchmarks HistFactory inference engines without benchmark theater: correctness gates, optimizer convergence checks, warm-start policies, and auditable artifacts.

HistFactory · pyhf · ROOT · Benchmarks · Correctness
6 min read

Third-Party Replication: External Reruns + Signed Reports

The strongest trust signal is an independent rerun. Same harness, published manifests, GPG/Sigstore signed reports, and verifiable validation_report.json with SHA-256 hashes.

Replication · Trust · Signed Reports · GPG · Sigstore
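
The SHA-256 check described in this teaser can be scripted in a few lines. The sketch below is illustrative only; the validation_report.json layout (an "artifacts" list with "path" and "sha256" fields) is an assumption, not the published schema:

```python
# Illustrative only: recompute SHA-256 for each artifact listed in a
# validation report and compare against the recorded digest.
import hashlib
import json
from pathlib import Path

def sha256_of(path: Path, chunk: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

report = json.loads(Path("validation_report.json").read_text())
for entry in report.get("artifacts", []):       # assumed schema: [{"path": ..., "sha256": ...}]
    ok = sha256_of(Path(entry["path"])) == entry["sha256"]
    print(f'{"OK " if ok else "FAIL"} {entry["path"]}')
```
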
5 min read

Benchmark Snapshots as Products: CI Artifacts, Manifests, and Baselines

How NextStat publishes benchmark snapshots as rerunnable artifact sets: CI automation, baseline manifests, correctness gates, and validation-report-backed evidence.

Benchmarks · CI · Reproducibility · Trust
7 min read

The End of the Scripting Era

Scripts gave us speed. Reproducible benchmark harnesses give us trust. How performance claims become scientific experiments — with protocols, correctness gates, and auditable artifacts.

Reproducibility · Scientific Computing · Trust
8 min read

Trust Offensive: Why We Publish Reproducible Benchmarks

Public benchmark snapshots designed like experiments — with protocols, pinned environments, correctness gates, and artifacts that others can rerun. 6 suites across HEP, pharma, Bayesian, ML, time series, and econometrics.

Benchmarks · Reproducibility · Trust · HistFactory · pyhf
18 min read

How NextStat Makes HistFactory Differentiable in PyTorch

Four-layer architecture: SoftHistogram → fused CUDA kernel → Rust GPU sessions → envelope theorem. Zero-copy device pointers, analytical gradients, 2.07×10⁻⁹ accuracy.

PyTorch · CUDA · Differentiable · GPU · HistFactory
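
The envelope-theorem layer in this teaser rests on a simple fact: at the profiled optimum of the nuisance parameters, the gradient of the profiled NLL needs only the explicit dependence on the outer parameter. A toy PyTorch sketch of that idea (not NextStat's kernel code; profile_fn and nll_fn are hypothetical stand-ins):

```python
# Toy sketch of the envelope-theorem trick:
#   L*(mu) = min_theta NLL(mu, theta)  implies
#   dL*/dmu = dNLL/dmu evaluated at the profiled theta_hat(mu),
# so the backward pass can hold theta_hat fixed.
import torch

class ProfiledNLL(torch.autograd.Function):
    @staticmethod
    def forward(ctx, mu, data, profile_fn, nll_fn):
        with torch.no_grad():
            theta_hat = profile_fn(mu, data)     # inner minimization (black box)
        ctx.save_for_backward(mu, data, theta_hat)
        ctx.nll_fn = nll_fn
        return nll_fn(mu, theta_hat, data)

    @staticmethod
    def backward(ctx, grad_out):
        mu, data, theta_hat = ctx.saved_tensors
        with torch.enable_grad():
            mu_ = mu.detach().requires_grad_(True)
            # Differentiate only the explicit mu-dependence; d(theta_hat)/d(mu)
            # is not needed at the profiled optimum.
            val = ctx.nll_fn(mu_, theta_hat, data)
            (g,) = torch.autograd.grad(val, mu_)
        return grad_out * g, None, None, None

# Usage: loss = ProfiledNLL.apply(mu, observed, profile_fn, nll_fn); loss.backward()
```
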
22 min read

Where ROOT Gets It Wrong: Numerical Comparison of HistFactory Implementations

3-way profile scan comparison: ROOT/RooFit vs pyhf vs NextStat. ROOT overestimates q(μ) on OverallSys, diverges on coupled HistoSys, and fails catastrophically on EWK; NextStat is 37×–880× faster.

ROOT · pyhf · HistFactory · Validation · Minuit2 · L-BFGS-B