Leaderboard Overview
The ORBIT Leaderboard provides a comprehensive evaluation of recommendation models across multiple benchmark datasets and metrics. Our goal is to offer a transparent, reproducible, and fair comparison of models across domains and settings.
Available Rankings
- Public Benchmark Leaderboard – Performance of models on each individual dataset.
- ClueWeb-Reco Leaderboard – Evaluation under the ClueWeb-Reco benchmark.
Metrics Used
- Recall@K (K = 1, 10, 50, 100)
- NDCG@K (K = 10, 50, 100)
All values are reported as raw decimals (up to four decimal places); a sketch of how these metrics are computed follows below.
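For reference, here is a minimal sketch of how Recall@K and NDCG@K are typically computed under binary relevance. The function names (`recall_at_k`, `ndcg_at_k`) and the toy data are illustrative assumptions, not part of ORBIT's released evaluation scripts.

```python
import math

def recall_at_k(ranked_items, relevant_items, k):
    """Fraction of the relevant items that appear in the top-k ranking."""
    if not relevant_items:
        return 0.0
    hits = sum(1 for item in ranked_items[:k] if item in relevant_items)
    return hits / len(relevant_items)

def ndcg_at_k(ranked_items, relevant_items, k):
    """NDCG@K with binary relevance: the DCG of the ranking divided by
    the DCG of an ideal ranking that places all relevant items first."""
    dcg = sum(1.0 / math.log2(rank + 2)          # rank is 0-indexed
              for rank, item in enumerate(ranked_items[:k])
              if item in relevant_items)
    ideal_hits = min(len(relevant_items), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Toy example: two relevant items; the model ranks one first, one fourth.
ranked = ["a", "b", "c", "d", "e"]
relevant = {"a", "d"}
print(f"Recall@1  = {recall_at_k(ranked, relevant, 1):.4f}")   # 0.5000
print(f"Recall@10 = {recall_at_k(ranked, relevant, 10):.4f}")  # 1.0000
print(f"NDCG@10   = {ndcg_at_k(ranked, relevant, 10):.4f}")    # 0.8772
```

The toy example shows why both metrics are reported: Recall@10 is perfect because all relevant items fall inside the cutoff, while NDCG@10 is penalized because one of them is ranked fourth rather than near the top.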
Contributing
Interested in submitting a model to the leaderboard? Stay tuned — submission instructions and evaluation scripts will be released soon.