Appearance
Remote Simulation
The ColliderML SaaS backend runs the same pipeline as Local Simulation — but on NERSC Perlmutter, dispatched via a small FastAPI service. You submit a job, get a request ID, and collect the results from a per-request HuggingFace dataset when the run completes. No container runtime needed on your side.
This page covers the operator-facing story. For the API reference (every function signature, error type, env var), see colliderml.remote.
When to use remote simulation
| Use remote when | Use local instead when |
|---|---|
| You need more than a few hundred events | You want tight iteration cycles |
| You want realistic pileup (mu ≥ 10) | You have Docker/Podman already |
| You are on a laptop / CI runner without Docker | You need offline reproducibility |
| You want the job's output hosted on HuggingFace automatically | You are debugging the pipeline scripts themselves |
| You want to earn credits by submitting benchmark models |
Remote runs are idempotent and deduplicated: if a request with the same parameters has already succeeded, the backend reuses the existing dataset and charges zero credits.
Prerequisites
- Python 3.10 or newer.
- The remote extra:bashThis pulls in
pip install "colliderml[remote]"requests. Everything else (huggingface_hub,pyyaml) is already in the base install. - A HuggingFace account and token with read permissions:bashThe token is used both to authenticate with the backend and to pull the resulting dataset from HuggingFace.
huggingface-cli login # interactive # or export HF_TOKEN=hf_xxxxxxxx
Credits
The backend uses a simple credit model:
- 1 credit ≈ 1 node-hour ≈ 100 events at
pu0(roughly; the backend prices each request based on the pipeline stages it will run) - New users get a 10-credit seed grant on first login
- Benchmark wins and PRs to the model zoo earn additional credits — see Benchmark Tasks
Check your balance any time:
python
import colliderml
print(colliderml.balance(), "credits")Or on the CLI:
bash
colliderml balanceSubmitting a job
There are two entry points:
colliderml.simulate(remote=True) — integrated
The drop-in counterpart to local simulation. Returns a SimulationResult whose remote_request_id is set so you can poll it later.
python
import colliderml
result = colliderml.simulate(
preset="higgs-portal-quick",
remote=True,
)
print(result.remote_request_id)simulate() returns immediately after the backend acknowledges the request. Use the request ID with colliderml.remote.wait_for() or the colliderml status CLI to track progress.
colliderml.remote.submit(...) — direct
Lower-level than simulate(). You get the full :class:RemoteSubmission object back, with credits charged, estimated node-hours, and the initial state. Use this when you want tight control over the submit → poll → download loop.
python
from colliderml.remote import submit, wait_for
submission = submit(channel="ttbar", events=1000, pileup=200)
print(submission.request_id, submission.credits_charged, "credits charged")
final = wait_for(
submission.request_id,
poll_interval=30,
on_poll=lambda s: print(f" state={s.state}"),
)
print("done:", final.output_hf_repo)Polling, status, cancellation
python
from colliderml.remote import status, wait_for
# One-shot status check
snap = status("req-abc123")
print(snap.state, snap.output_hf_repo)
# Block until done, with progress callback
final = wait_for("req-abc123", poll_interval=60)Terminal states are completed, failed, and cancelled. wait_for raises RuntimeError on failed / cancelled and TimeoutError when a supplied timeout elapses.
Cancellation — contact the backend admin (see the internal admin runbook) for now; a user-facing cancellation endpoint is planned.
Collecting outputs
Completed requests expose their output dataset at:
hf.co/datasets/ColliderML/req-<request_id>You can load it with the same convenience you use for the canonical release:
python
import colliderml
from colliderml.core.hf_download import DownloadSpec, download_config
from colliderml.core.loader import load_tables
from colliderml.core.data.loader_config import LoaderConfig
# Pull outputs for the completed request into the local cache
download_config(DownloadSpec(
dataset_id="ColliderML/req-abc123",
config="ttbar_pu200_tracker_hits",
))
# Then load locally
tables = load_tables(LoaderConfig(
dataset_id="ColliderML/req-abc123",
channels="ttbar",
pileup="pu200",
objects=["tracker_hits"],
split="train",
lazy=False,
))(After commit B4 lands, the one-liner colliderml.load("ttbar_pu200", dataset_id="ColliderML/req-abc123") will subsume all of the above.)
Configuration
| Env var | Meaning | Default |
|---|---|---|
COLLIDERML_BACKEND | Backend base URL | https://api.colliderml.com |
HF_TOKEN | Explicit HuggingFace token (overrides hub login) | — |
HUGGING_FACE_HUB_TOKEN | Fallback env var | — |
Pass backend_url= to any client function to override per-call.
Troubleshooting
RuntimeError: Authentication failed — the backend rejected your token. Re-run huggingface-cli login and try again. Make sure your token has at least read permissions.
RuntimeError: Insufficient credits — the backend priced your request higher than your current balance. Reduce events, lower pileup, or earn more credits by submitting a benchmark entry. Run colliderml balance to see what you have.
RuntimeError: Duplicate request — someone (possibly you) already ran exactly this request. The backend typically returns the existing dataset URL in the error detail — use that instead of re-submitting.
Could not reach ColliderML backend — network, DNS, or the backend is down. Check the status page linked from the docs home.
ImportError: The colliderml.remote client requires the 'remote' extra — run pip install 'colliderml[remote]'.
Next steps
- Remote API reference
- Local simulation guide — when to stay on your own machine
- Benchmark tasks — earn more credits