# Gymnasium RL Environment

*Status: experimental*
`nextstat.gym` provides an optional Gymnasium/Gym wrapper that treats a HistFactory workspace as an RL / design-of-experiments environment: the agent proposes updates to one sample's nominal yields (e.g. a signal histogram) and receives a NextStat metric as the reward.
## Installation

```bash
pip install nextstat gymnasium numpy
```

## Quick Start
```python
from pathlib import Path
from nextstat.gym import make_histfactory_env

ws_json = Path("workspace.json").read_text()
env = make_histfactory_env(
    ws_json,
    channel="singlechannel",
    sample="signal",
    reward_metric="q0",    # maximize discovery significance
    max_steps=64,
    action_scale=0.02,
    action_mode="logmul",  # multiplicative updates in log-space
    init_noise=0.0,
)

obs, info = env.reset(seed=123)
total = 0.0
for _ in range(64):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    total += float(reward)
    if terminated or truncated:
        break
print("episode reward:", total)
```

## Reward Metrics
| Metric | Profiled? | Description |
|---|---|---|
| `nll` | No | -NLL at fixed parameters (fast; many steps per second) |
| `q0` / `z0` | Yes | Discovery test statistic / significance |
| `qmu` / `zmu` | Yes | -qμ / -sqrt(qμ), for upper-limit optimization |
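Metric names map directly to the `reward_metric` argument. A minimal sketch of switching modes, reusing `ws_json` from the Quick Start (only keyword arguments documented above are used; other options keep their defaults):

```python
from nextstat.gym import make_histfactory_env

# Fast, unprofiled reward: -NLL at fixed parameters.
fast_env = make_histfactory_env(
    ws_json, channel="singlechannel", sample="signal", reward_metric="nll"
)

# Profiled reward: each step runs an internal fit, so expect far fewer steps/sec.
disc_env = make_histfactory_env(
    ws_json, channel="singlechannel", sample="signal", reward_metric="z0"
)
```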
## Configuration
- `action_mode` – `"additive"` (direct delta) or `"logmul"` (multiplicative in log-space); see the sketch after this list
- `action_scale` – scale factor applied to actions (default `0.02`)
- `max_steps` – episode length before truncation
- `init_noise` – Gaussian noise added to the initial yields on reset
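The two action modes imply different update rules for the chosen sample's yields. A minimal NumPy sketch of the arithmetic suggested by the descriptions above (the actual internals of `nextstat.gym` may differ):

```python
import numpy as np

yields = np.array([5.0, 10.0])   # current nominal yields
action = np.array([0.5, -1.0])   # raw action sampled by the agent
action_scale = 0.02

# "additive": the scaled action is added directly to the yields
additive = yields + action_scale * action

# "logmul": multiplicative update in log-space, which keeps yields positive
logmul = yields * np.exp(action_scale * action)
```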
## Notes
- Profiled rewards (`q0`, `qmu`) run an optimization internally, so they are heavier per step than NLL mode.
- Compatible with both `gymnasium` (preferred) and legacy `gym`.
- The environment modifies the model in place by overriding one sample's nominal yields.
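Because the wrapper exposes the standard `Env` interface, any agent or search routine can drive it. A minimal random-search sketch (hypothetical driver code, not part of `nextstat`) that keeps the seed of the best-scoring rollout from the Quick Start's `env`:

```python
best_total, best_seed = float("-inf"), None
for seed in range(20):
    obs, info = env.reset(seed=seed)
    total = 0.0
    while True:
        obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
        total += float(reward)
        if terminated or truncated:
            break
    if total > best_total:
        best_total, best_seed = total, seed

print(f"best episode reward {best_total:.3f} (seed {best_seed})")
```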
