Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[0.4.0] - 2026-04-09
Added
- [library] Top-level `colliderml.load(dataset, *, tables, max_events, event_range, …)`: a one-liner that downloads missing parquets on demand, then loads them with the existing Polars backend. Uses the same `<channel>_<pileup>` shorthand as the benchmark task runner.
- [library] `colliderml.simulate` subpackage: run the full Pythia / MadGraph / Geant4 / ACTS pipeline locally inside a Docker or Podman container, or submit to the SaaS backend with `remote=True`. Local simulate auto-clones the companion `colliderml-production` repo at a pinned git ref for pipeline scripts, mirroring the existing ODD and MG5aMC_PY8_interface cache pattern. Ships a bundled preset catalogue (`ttbar-quick`, `higgs-portal-quick`, `ttbar-dev`, `ttbar-benchmark`, …) accessible via `colliderml.simulate.load_presets()`.
- [library] `colliderml.remote` subpackage: HTTP client for the ColliderML SaaS backend service. Exposes `submit`, `status`, `wait_for`, `get_me`, `balance`, and a `RemoteSubmission` dataclass. Authentication is via a HuggingFace token resolved from the environment or the hub's saved credentials.
- [library] `colliderml.tasks` subpackage: registry and runner for six benchmark tasks (`tracking`, `jets`, `anomaly`, `tracking_latency`, `tracking_small`, `data_loading`). Each task defines a dataset, eval event range, input tables, metrics, and a `higher_is_better` direction. Reference baselines (CKF for tracking, scikit-learn GBDT for jets, IsolationForest for anomaly detection) run as `python -m colliderml.tasks.<task>.baselines.<name>`.
- [library] CLI subcommands: `simulate`, `list-presets`, `balance`, `status`. Each handler lazily imports its subsystem so existing `download` / `list-configs` commands stay fast.
- [library] Optional extras: `[sim]`, `[remote]`, `[tasks]`, and an `[all]` meta-extra that bundles everything plus `[dev]`. Base install stays lean.
- [library] `pyarrow>=14.0.0` added to `install_requires` (used by the task scoring code at the polars/arrow boundary).
- [docs] New guide pages: `guide/simulation`, `guide/remote-simulation`, `guide/tasks`. New library reference pages: `library/simulate`, `library/remote`, `library/tasks`. Navigation and sidebar extended to surface the new content.
- [docs] Installation page documents the new optional extras with prerequisites.
- [docs] Quickstart and library overview pages extended with simulate + tasks examples.
- [infra] `docs/.vitepress/config.ts` reads `VITEPRESS_BASE` from the environment so the same config builds both the canonical `/ColliderML/` site and a `/ColliderML/staging/` preview (infrastructure landing in a sibling commit; see B7).
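
As a rough illustration of the task registry idea in this release, the sketch below shows how a task record with a `higher_is_better` direction could drive score comparison. It is not the `colliderml.tasks` implementation: `TaskSpec`, `register`, `is_improvement`, the metric names, the event ranges, and the `ttbar_pu0` dataset string are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class TaskSpec:
    """Hypothetical task record; field names are assumptions, not the library's."""
    name: str
    dataset: str                      # assumed <channel>_<pileup> shorthand
    eval_event_range: Tuple[int, int]
    tables: Tuple[str, ...]
    metric: str                       # illustrative metric name
    higher_is_better: bool


_REGISTRY: Dict[str, TaskSpec] = {}


def register(spec: TaskSpec) -> TaskSpec:
    """Add a task to the registry, refusing duplicate names."""
    if spec.name in _REGISTRY:
        raise ValueError(f"task {spec.name!r} is already registered")
    _REGISTRY[spec.name] = spec
    return spec


def is_improvement(task_name: str, new_score: float, best_score: float) -> bool:
    """Compare two scores in the direction the task declares."""
    spec = _REGISTRY[task_name]
    if spec.higher_is_better:
        return new_score > best_score
    return new_score < best_score


# Illustrative entries only; the values are made up for the sketch.
register(TaskSpec("tracking", "ttbar_pu0", (0, 1000),
                  ("particles", "tracker_hits", "tracks"), "efficiency", True))
register(TaskSpec("tracking_latency", "ttbar_pu0", (0, 1000),
                  ("tracker_hits",), "latency_ms", False))
```

Storing the direction on the task keeps comparison logic in one place, so a latency-style task and an efficiency-style task can share the same runner.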
Changed
- [library] Top-level `colliderml` package now uses a `__getattr__` lazy loader for the optional `simulate`, `remote`, and `tasks` subsystems so `import colliderml` stays cheap when extras aren't installed.
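
For readers unfamiliar with module-level `__getattr__` (PEP 562), here is a minimal sketch of the lazy-loading pattern. It is not the package's actual code: `make_lazy_package` and the `lazy_demo` name are hypothetical, and standard-library modules stand in for the optional subsystems so the example runs without any extras installed.

```python
import importlib
import types
from typing import Dict

# Assumed mapping: attribute name -> module imported on first access.
# Stdlib modules are stand-ins for optional subsystems.
_LAZY = {"simulate": "json", "remote": "csv", "tasks": "statistics"}


def make_lazy_package(name: str, lazy_map: Dict[str, str]) -> types.ModuleType:
    """Build a module object whose listed attributes are imported lazily."""
    pkg = types.ModuleType(name)

    def _lazy_getattr(attr: str):
        # Called only when normal attribute lookup on the module fails.
        target = lazy_map.get(attr)
        if target is None:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        try:
            module = importlib.import_module(target)
        except ImportError as exc:
            raise ImportError(
                f"{name}.{attr} requires an optional dependency"
            ) from exc
        # Cache on the module so later accesses skip this hook.
        setattr(pkg, attr, module)
        return module

    # PEP 562: a function named __getattr__ in the module namespace
    # handles lookups for missing attributes.
    pkg.__getattr__ = _lazy_getattr
    return pkg


demo = make_lazy_package("lazy_demo", _LAZY)
```

Accessing `demo.simulate` triggers the import once and caches the result; names outside the mapping still raise `AttributeError`, so `hasattr` keeps working as usual.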
[0.3.0] - 2025-12-19
Added
- [library] CLI commands: `colliderml download` for downloading HuggingFace datasets with local caching, `colliderml list-configs` for discovering available configurations.
- [library] Polars-based data loader with lazy/eager loading support and structured configuration (YAML/dict).
- [library] Physics utilities: pileup subsampling, decay-chain traversal with primary ancestor assignment, calorimeter calibration with region-specific scaling factors.
- [library] Flattening helpers to explode nested Parquet structures into pandas DataFrames for analysis.
- [library] Visualization utilities: coordinate transformations (eta, r), binned energy profile plotting.
- [library] Download timing benchmarks integrated into CI/CD.
- [docs] Reference exploration notebook demonstrating loader, physics utilities, and visualization.
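
The coordinate transformations mentioned above can be sketched with the standard definitions of transverse radius, r = sqrt(x² + y²), and pseudorapidity, η = asinh(z / r). The function names below are hypothetical and are not taken from the library's visualization module.

```python
import math


def transverse_radius(x: float, y: float) -> float:
    """Distance from the beam (z) axis."""
    return math.hypot(x, y)


def pseudorapidity(x: float, y: float, z: float) -> float:
    """eta = asinh(z / r), equivalent to -ln(tan(theta / 2))."""
    r = math.hypot(x, y)
    if r == 0.0:
        if z == 0.0:
            raise ValueError("pseudorapidity is undefined at the origin")
        # A point on the beam axis has infinite |eta|.
        return math.copysign(math.inf, z)
    return math.asinh(z / r)
```

Using `asinh` avoids computing the polar angle explicitly and stays numerically stable for points close to the transverse plane.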
Changed
- [library] Data loading now uses local Parquet cache populated by CLI, not direct HuggingFace streaming.
- [library] Exploded/flattened data structures now return pandas DataFrames for downstream analysis.
- [library] Physics constants (calorimeter calibration factors) centralized in `colliderml.physics.constants`.
Fixed
- [library] Decay traversal now correctly identifies root particles by broken parent links, ignoring the `primary` flag.
- [library] Pileup subsampling now correctly filters tracker hits and calorimeter contributions by particle ID.
- [library] Calorimeter calibration now handles both region names and raw detector IDs.
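
To make the decay-traversal fix concrete, the sketch below treats a particle as a root when its parent link is missing or points outside the recorded particle set, without consulting any `primary` flag. The data layout (a dict from particle ID to parent ID) and the names `find_roots` and `primary_ancestor` are assumptions for illustration, not the library's API.

```python
from typing import Dict, List, Optional


def find_roots(parent_of: Dict[int, Optional[int]]) -> List[int]:
    """Return IDs whose parent is None or absent from the record."""
    return sorted(
        pid
        for pid, parent in parent_of.items()
        if parent is None or parent not in parent_of
    )


def primary_ancestor(parent_of: Dict[int, Optional[int]], pid: int) -> int:
    """Walk parent links upward until the link breaks; return that root."""
    if pid not in parent_of:
        raise KeyError(pid)
    seen = {pid}
    while True:
        parent = parent_of[pid]
        if parent is None or parent not in parent_of:
            return pid
        if parent in seen:
            raise ValueError("cycle detected in parent links")
        seen.add(parent)
        pid = parent


# Toy record: particle 10's parent (99) was not stored, so 10 is a root.
example = {1: None, 2: 1, 3: 2, 10: 99, 11: 10}
```

In the toy record, particles 1 and 10 are roots, and every descendant resolves to one of them by following links until one is broken.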
[0.2.0] - 2025-11-07
Added
- [dataset] Datasets now hosted on HuggingFace Hub for easier access and distribution.
- [dataset] Support for standard HuggingFace `datasets` library for data loading.
- [docs] Interactive dataset configurator with dynamic channel discovery from HuggingFace API.
- [docs] Updated documentation site with HuggingFace integration examples.
Changed
- [dataset] Migrated from NERSC manifest-based distribution to HuggingFace datasets.
- [dataset] Data now stored in Parquet format with improved compression and accessibility.
- [docs] Simplified data access workflow using `load_dataset()` instead of custom CLI.
Removed
- [docs] Removed NERSC manifest.json dependency from documentation build process.
[0.1.0] - 2025-09-08
Added
- [dataset] Initial release of ColliderML dataset with ttbar and ggf physics processes.
- [dataset] Four detector hierarchy levels: particles, tracker_hits, calo_hits, and tracks.
- [dataset] Approximately 100,000 simulated events per process with no pileup (pu0).
- [library] Initial `colliderml` Python library with data access utilities.
- [docs] Documentation website with VitePress framework.
- [docs] Dataset configuration modal for exploring available data.
Unreleased changes should be added under a ## [Unreleased] header above with entries marked as [dataset], [library], or [docs].