Installation

Requirements

Python 3.10 or newer
pip or conda

Install from PyPI

The easiest way to install ColliderML is via pip:

bash

pip install colliderml

This will install ColliderML and its core dependencies:

datasets - HuggingFace datasets library for data loading
numpy - Numerical computing
h5py - HDF5 file support (for local data inspection)

Install from Source

For development or to get the latest features:

bash

# Clone the repository
git clone https://github.com/OpenDataDetector/ColliderML.git
cd colliderml

# Install in development mode
pip install -e .

Optional feature sets (`extras`)

ColliderML ships a handful of optional extras so that the base install stays lean. Pick whichever you need:

bash

# Run simulation pipelines locally (requires Docker or Podman separately)
pip install "colliderml[sim]"

# Submit simulation jobs to the SaaS backend (no Docker needed)
pip install "colliderml[remote]"

# Reference baselines for the benchmark task runner (brings scikit-learn)
pip install "colliderml[tasks]"

# Development tools (testing, formatting, linting, type-checking)
pip install "colliderml[dev]"

The sim extra pulls in only the Python dependencies required to drive the container runtime — the actual container image and supporting data are fetched on first use of colliderml.simulate(). See Local Simulation for the full story and disk-space expectations.

The remote extra installs the requests library and gives you colliderml.remote.submit, status, balance, and the integrated colliderml.simulate(remote=True) path — see Remote Simulation. You will also need a HuggingFace account and token, which is how the backend authenticates you.

The tasks extra pulls in scikit-learn for the reference baselines shipped with each benchmark task (GBDT for jets, IsolationForest for anomaly detection). You only need it if you want to run the shipped baselines verbatim — the task registry and local scoring work with the base install. See Benchmark Tasks for details.

The dev extra includes:

pytest and pytest-cov for testing
black for code formatting
ruff for linting
mypy for type checking

Using Conda

If you prefer conda:

bash

# Create a new environment
conda create -n colliderml python=3.11
conda activate colliderml

# Install ColliderML
pip install colliderml

Verify Installation

Test your installation:

python

import colliderml
print(f"ColliderML version: {colliderml.__version__}")

# Test loading a dataset
from datasets import load_dataset
dataset = load_dataset(
    "CERN/Colliderml-release-1",
    "ttbar_pu0_particles",
    split="train"
)
print(f"Successfully loaded {len(dataset)} events")

Troubleshooting

HuggingFace Hub Access

ColliderML datasets are public and don't require authentication. However, if you experience connection issues:

Check your internet connection
Try setting a HuggingFace cache directory:
bash
```
export HF_HOME=/path/to/cache
```
Consult the HuggingFace datasets documentation

Python Version Issues

ColliderML requires Python 3.10 or newer. Check your version:

bash

python --version

If you have an older version, consider using conda or pyenv to manage multiple Python versions.

Next Steps

Continue to the Quickstart Guide to start using ColliderML
Learn about Data Structure and available datasets

Installation ​

Requirements ​

Install from PyPI ​

Install from Source ​

Optional feature sets (extras) ​

Using Conda ​

Verify Installation ​

Troubleshooting ​

HuggingFace Hub Access ​

Python Version Issues ​

Next Steps ​

Installation

Requirements

Install from PyPI

Install from Source

Optional feature sets (`extras`)

Using Conda

Verify Installation

Troubleshooting

HuggingFace Hub Access

Python Version Issues

Next Steps