RECOM: A Validity Discrimination Tradeoff in Automatic Metrics for Open Ended Reddit Question Answering
Researchers introduce RECOM, a new evaluation dataset for open-ended Reddit question answering that prioritizes validity over discriminative power. RECOM contains 15,000 r/AskReddit questions from September 2025 and focuses on content alignment rather than system ranking. The dataset aims to help builders develop and evaluate LLMs that generate high-quality, genuinely aligned responses.
Key takeaways
- The RECOM dataset prioritizes validity over discriminative power when evaluating LLM responses.
- Contains 15,000 r/AskReddit questions from September 2025.
- Focuses on content alignment over system ranking.