NextStat

Arrow / Polars Integration

Zero-Copy Columnar Data Interchange

NextStat speaks Apache Arrow natively. Ingest histogram data from PyArrow, Polars, DuckDB, or Spark — and export model results back — with zero Python-side deserialization overhead. The bridge uses Arrow IPC under the hood, backed by Rust's arrow 57.3 and parquet 57.3 crates.

Data Flow

PyArrow Table / Polars DataFrame / DuckDB Result
        │
        ▼  .to_arrow() or native
  Arrow RecordBatch
        │
        ▼  IPC serialize (~1 memcpy)
  nextstat.from_arrow(table) ──► Rust arrow crate
        │                            │
        ▼                            ▼
  HistFactoryModel            Arrow RecordBatch
        │                            │
        ▼                            ▼  IPC deserialize
  nextstat.to_arrow(model)    PyArrow Table

Quick Start

import pyarrow as pa
import nextstat

# Define histogram data as an Arrow table
table = pa.table({
    "channel": ["SR", "SR", "CR"],
    "sample":  ["signal", "background", "background"],
    "yields":  [[5., 10., 15.], [100., 200., 150.], [500., 600.]],
    "stat_error": [[1., 2., 3.], [10., 14., 12.], None],
})

# Create model and fit
model = nextstat.from_arrow(table, poi="mu")
result = nextstat.fit(model)
print(result)

Table Schema

The input Arrow table must follow this schema. Each row represents one (channel, sample) pair.

Column      Arrow Type      Required  Description
channel     Utf8            yes       Channel (region) name
sample      Utf8            yes       Sample (process) name
yields      List<Float64>   yes       Expected event counts per bin
stat_error  List<Float64>   no        Per-bin statistical uncertainties

Polars

import polars as pl
import nextstat

# Read histogram data from Parquet via Polars
df = pl.read_parquet("histograms.parquet")
model = nextstat.from_arrow(df.to_arrow(), poi="mu")

# Or read Parquet directly (Rust-native, no Python overhead)
model = nextstat.from_parquet("histograms.parquet", poi="mu")

DuckDB

import duckdb
import nextstat

con = duckdb.connect()
table = con.sql("""
    SELECT channel, sample, yields
    FROM 'histograms.parquet'
""").arrow()

model = nextstat.from_arrow(table)

Export

Export model data back to Arrow for downstream analysis, dashboards, or ML pipelines.

model = nextstat.from_pyhf(workspace_json)

# Expected yields per channel
yields = nextstat.to_arrow(model, what="yields")
print(yields.to_pandas())
#   channel sample              yields
# 0      CR  total      [500.0, 600.0]
# 1      SR  total  [105.0, 210.0, 165.0]

# Parameter metadata
params = nextstat.to_arrow(model, what="params")
print(params.to_pandas())
#              name  index  value  bound_lo  bound_hi  init
# 0              mu      0    1.0       0.0      10.0   1.0
# 1  staterror_SR[0]    1    1.0       1e-10    10.0   1.0

Custom Observations

# By default, Asimov data (the per-bin sum of yields over all samples in a channel) is used.
# To fit observed data instead, pass it explicitly, keyed by channel name:
model = nextstat.from_arrow(
    table,
    poi="mu",
    observations={
        "SR": [110., 215., 170.],
        "CR": [510., 590.],
    },
)
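
For reference, the default Asimov data can be reproduced by hand. The sketch below assumes that "sum of yields" means the per-bin sum over all samples in a channel at nominal parameter values, which is consistent with the totals shown in the Export section. The asimov function is a hypothetical helper, not a nextstat API.

```python
# Rows of (channel, sample, yields), taken from the Quick Start table.
rows = [
    ("SR", "signal", [5., 10., 15.]),
    ("SR", "background", [100., 200., 150.]),
    ("CR", "background", [500., 600.]),
]


def asimov(rows):
    """Per-bin sum of yields over all samples in each channel."""
    totals = {}
    for channel, _sample, yields in rows:
        if channel not in totals:
            totals[channel] = list(yields)
            continue
        if len(yields) != len(totals[channel]):
            raise ValueError(f"bin count mismatch in channel {channel!r}")
        totals[channel] = [a + b for a, b in zip(totals[channel], yields)]
    return totals


observations = asimov(rows)
print(observations)  # {'SR': [105.0, 210.0, 165.0], 'CR': [500.0, 600.0]}
```

The result has the same shape as the observations argument: a dict mapping each channel name to a list with one entry per bin.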

Low-Level IPC API

For maximum control, use the raw IPC bytes interface directly. This is what from_arrow() and to_arrow() use internally.

import pyarrow as pa

# Serialize table to IPC bytes
sink = pa.BufferOutputStream()
writer = pa.ipc.new_stream(sink, table.schema)
for batch in table.to_batches():
    writer.write_batch(batch)
writer.close()
ipc_bytes = sink.getvalue().to_pybytes()

# Ingest IPC bytes directly
model = nextstat.from_arrow_ipc(ipc_bytes, poi="mu")

# Export as IPC bytes
yields_bytes = nextstat.to_arrow_yields_ipc(model)
params_bytes = nextstat.to_arrow_params_ipc(model)

# Deserialize in Python
yields_table = pa.ipc.open_stream(yields_bytes).read_all()

API Reference

  • nextstat.from_arrow(table, *, poi, observations) — PyArrow Table/RecordBatch → HistFactoryModel.
  • nextstat.to_arrow(model, *, params, what) — HistFactoryModel → PyArrow Table. what="yields" or "params".
  • nextstat.from_parquet(path, *, poi, observations) — Parquet file → HistFactoryModel (Rust-native reader).
  • nextstat.from_arrow_ipc(bytes, poi, observations) — raw IPC stream bytes → HistFactoryModel.
  • nextstat.to_arrow_yields_ipc(model, params) — HistFactoryModel → IPC bytes (yields).
  • nextstat.to_arrow_params_ipc(model, params) — HistFactoryModel → IPC bytes (parameters).