# ORBIT

## 🏆 ClueWeb-Reco Leaderboard

### Candidate Ranking Results

| Model | Recall@10 | NDCG@10 | Recall@50 | NDCG@50 | Recall@100 | NDCG@100 |
|---|---|---|---|---|---|---|
| GPT-4.1-QueryGen | 0.0107 | 0.0050 | 0.0195 | 0.0068 | 0.0254 | 0.0077 |
| HLLM | 0.0088 | 0.0041 | 0.0137 | 0.0052 | 0.0176 | 0.0059 |
| GPT-3.5-Turbo-QueryGen | 0.0088 | 0.0027 | 0.0176 | 0.0050 | 0.0312 | 0.0072 |
| GPT-4o-QueryGen | 0.0068 | 0.0042 | 0.0146 | 0.0058 | 0.0264 | 0.0077 |
| Gemini-2.5-Flash-QueryGen | 0.0068 | 0.0042 | 0.0146 | 0.0058 | 0.0264 | 0.0077 |
| Claude-Sonnet-4-QueryGen | 0.0068 | 0.0032 | 0.0166 | 0.0052 | 0.0215 | 0.0060 |
| TASTE | 0.0020 | 0.0015 | 0.0039 | 0.0019 | 0.0039 | 0.0019 |
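
Recall@k is the fraction of held-out ground-truth items retrieved in the top k, and NDCG@k additionally discounts each hit by its rank. The sketch below shows these metrics under binary relevance, assuming (as is typical for next-item prediction) that each user has a single held-out ground-truth item; the function names are illustrative, not the benchmark's official scorer.

```python
import math

def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of relevant items that appear in the top-k ranking."""
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)

def ndcg_at_k(ranked_ids, relevant_ids, k):
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 2)  # rank is 0-based, so position 1 -> log2(2)
        for rank, doc_id in enumerate(ranked_ids[:k])
        if doc_id in relevant_ids
    )
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# With one relevant item per user, Recall@k reduces to hit rate:
# recall_at_k(["d3", "d7", "d1"], {"d7"}, k=10) -> 1.0
# ndcg_at_k(["d3", "d7", "d1"], {"d7"}, k=10)   -> 1/log2(3) ≈ 0.631
```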

### Prompt Construction for Query Generation

To assess the generalization ability of LLM-based recommenders, ClueWeb-Reco includes a query generation task. Each user's browsing-history titles are formatted into a prompt, and the LLM is asked to infer the user's next likely interest without merely rephrasing the history. The generated query is then embedded and matched against the candidate pool via dense retrieval.

*Figure: prompt construction for the query generation task.*
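
To make the pipeline concrete, here is a minimal sketch. The prompt wording, the OpenAI-style chat API, the sentence-transformers encoder, and all model and function names are assumptions for illustration, not ClueWeb-Reco's exact setup.

```python
from openai import OpenAI
from sentence_transformers import SentenceTransformer, util

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model

def build_prompt(history_titles):
    """Format browsing-history titles into a next-interest prompt."""
    numbered = "\n".join(f"{i + 1}. {t}" for i, t in enumerate(history_titles))
    return (
        "A user visited the following web pages, in order:\n"
        f"{numbered}\n\n"
        "Predict the user's next likely interest as a short search query. "
        "Do not rephrase the titles above; respond with the query only."
    )

def generate_query(history_titles):
    """Ask the LLM to infer the next likely interest."""
    response = client.chat.completions.create(
        model="gpt-4.1",  # any of the leaderboard's QueryGen backbones
        messages=[{"role": "user", "content": build_prompt(history_titles)}],
    )
    return response.choices[0].message.content.strip()

def rank_candidates(query, candidate_titles, k=10):
    """Embed the generated query and rank candidates by cosine similarity."""
    query_emb = encoder.encode(query, convert_to_tensor=True)
    cand_embs = encoder.encode(candidate_titles, convert_to_tensor=True)
    scores = util.cos_sim(query_emb, cand_embs)[0]
    top = scores.argsort(descending=True)[:k]
    return [candidate_titles[int(i)] for i in top]
```

In practice the candidate embeddings would be precomputed and indexed (e.g., with an ANN library) rather than re-encoded per query; the exhaustive cosine scoring above is kept only to make the dense-retrieval step explicit.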