# 🏆 ClueWeb-Reco Leaderboard
## Candidate Ranking Results
Models are ranked by Recall@10, descending.

| Model | Recall@10 | NDCG@10 | Recall@50 | NDCG@50 | Recall@100 | NDCG@100 |
|---|---|---|---|---|---|---|
| GPT-3.5-Turbo-QueryGen | 0.0068 | 0.0027 | 0.0176 | 0.0050 | 0.0312 | 0.0072 |
| GPT-4o-QueryGen | 0.0068 | 0.0042 | 0.0146 | 0.0058 | 0.0264 | 0.0077 |
| Gemini-QueryGen | 0.0068 | 0.0042 | 0.0146 | 0.0058 | 0.0264 | 0.0077 |
| TASTE | 0.0020 | 0.0015 | 0.0039 | 0.0019 | 0.0039 | 0.0019 |
## Prompt Construction for Query Generation
To assess the generalization power of LLM-based recommenders, ClueWeb-Reco includes a query generation task. The titles from a user's browsing history are formatted into a prompt, and the LLM is asked to generate a query describing the user's next likely interest rather than merely rephrasing the history. The generated query is then embedded and matched against the candidate pool via dense retrieval.
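
The sketch below illustrates this pipeline under stated assumptions: the prompt wording, the `llm_generate` callable (a stand-in for whichever LLM backend is used, e.g. GPT-4o or Gemini), and the `all-MiniLM-L6-v2` embedder are all placeholders, since the exact prompt and embedding model used by ClueWeb-Reco are not specified here.

```python
# Minimal sketch of the QueryGen pipeline: build a prompt from history titles,
# generate a query with an LLM, then rank candidates by dense retrieval.
import numpy as np
from sentence_transformers import SentenceTransformer


def build_prompt(history_titles: list[str]) -> str:
    """Format browsing-history titles into a next-interest prompt (wording is illustrative)."""
    history = "\n".join(f"{i + 1}. {t}" for i, t in enumerate(history_titles))
    return (
        "A user visited the following pages, in order:\n"
        f"{history}\n"
        "Write a short search query describing the page the user is most likely "
        "to visit next. Do not rephrase the titles above."
    )


def rank_candidates(
    query: str,
    candidate_titles: list[str],
    embedder: SentenceTransformer,
    k: int = 10,
) -> list[int]:
    """Embed the generated query and return indices of the top-k candidates by cosine similarity."""
    q = embedder.encode([query], normalize_embeddings=True)           # (1, d)
    c = embedder.encode(candidate_titles, normalize_embeddings=True)  # (n, d)
    scores = (c @ q.T).squeeze(-1)  # dot product equals cosine since vectors are unit-norm
    return np.argsort(-scores)[:k].tolist()


def evaluate_example(history_titles, candidate_titles, llm_generate):
    """llm_generate is a hypothetical callable mapping a prompt string to a query string."""
    query = llm_generate(build_prompt(history_titles))
    embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedder, not the benchmark's
    return rank_candidates(query, candidate_titles, embedder)
```

In a full evaluation, the returned indices would be compared against the held-out next page to compute Recall@k and NDCG@k, as reported in the table above.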
