Leaderboard Overview
The ORBIT Leaderboard provides a comprehensive evaluation of recommendation models across multiple benchmark datasets and metrics. Our goal is to offer a transparent, reproducible, and fair comparison of models across domains and settings.
ORBIT enforces fixed train/validation/test splits, standardized evaluation metrics, and uniform candidate pools for all models, addressing the protocol inconsistencies that arise in flexible toolkits such as RecBole and Elliot. Because every result shown on the leaderboard is computed under the same evaluation protocol and candidate set, model rankings are directly comparable.
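To make the shared-protocol idea concrete, here is a minimal Python sketch of ranking under a fixed candidate pool. The file layout, the field names, and the `model_score` callable are hypothetical placeholders rather than ORBIT's actual interface; the point is simply that every model ranks the same candidates for the same held-out cases.

```python
# A minimal sketch of evaluation under a fixed candidate pool.
# The file layout, field names, and the `model_score` callable are
# hypothetical; they stand in for whatever interface a submitted model exposes.
import json

def rank_candidates(model_score, test_cases_path="test_cases.jsonl"):
    """Rank the shared candidate pool for every test case.

    Every model ranks exactly the same candidates for the same users,
    so downstream metrics are directly comparable across models.
    """
    rankings = []
    with open(test_cases_path) as f:
        for line in f:
            case = json.loads(line)
            candidates = case["candidates"]  # identical pool for all models
            scores = {item: model_score(case["history"], item) for item in candidates}
            ranked = sorted(candidates, key=scores.get, reverse=True)
            rankings.append((ranked, set(case["relevant_items"])))
    return rankings
```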
Available Rankings
- Public Benchmark Leaderboard – Model performance on each individual public benchmark dataset.
- ClueWeb-Reco Leaderboard – Evaluation under the ClueWeb-Reco benchmark.
Metrics Used
- Recall@K (K = 1, 10, 50, 100)
- NDCG@K (K = 10, 50, 100)
All values are reported as raw decimals, rounded to four decimal places.
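For reference, below is a minimal sketch of how these two metrics are typically computed from a ranked candidate list and a set of relevant items, assuming binary relevance. This is not the official evaluation script, and the item identifiers in the example are made up.

```python
# A minimal sketch of Recall@K and NDCG@K over a ranked candidate list,
# assuming binary relevance. Item IDs below are illustrative only.
import math

def recall_at_k(ranked, relevant, k):
    """Fraction of relevant items that appear in the top-k of the ranking."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)

def ndcg_at_k(ranked, relevant, k):
    """DCG of the top-k ranking, normalized by the ideal DCG (binary relevance)."""
    dcg = sum(1.0 / math.log2(rank + 2)  # rank is 0-based
              for rank, item in enumerate(ranked[:k]) if item in relevant)
    ideal_dcg = sum(1.0 / math.log2(rank + 2)
                    for rank in range(min(len(relevant), k)))
    return dcg / ideal_dcg if ideal_dcg > 0 else 0.0

# Example: one relevant item ranked third out of a ten-item candidate pool.
ranked = ["i7", "i2", "i5", "i9", "i1", "i3", "i8", "i4", "i6", "i0"]
relevant = {"i5"}
print(round(recall_at_k(ranked, relevant, 10), 4))  # 1.0
print(round(ndcg_at_k(ranked, relevant, 10), 4))    # 1 / log2(4) = 0.5
```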
Contributing
Interested in submitting a model to the leaderboard? Stay tuned — submission instructions and evaluation scripts will be released soon.