Benchmarks
Every story we’ve tagged Benchmarks.

🔬Research
Long-Context vs. Short-Context Models: When Does a Long-Context Model Win?
Anthropic's Claude 3.5 Sonnet outperforms GPT-4o in long-context tasks, with strengths in summarization and code analysis.
🚀Launches
Meta Watermelon 🍉, Anthropic Samsung chips 🤝, autoresearch in practice 📈
Meta's new model, Watermelon, matches GPT-5.5 on benchmarks even though it is still in training, and it uses significant compute.
