Model Release
Every story we’ve tagged Model Release.

Long Context vs. Short Context Model: When Does a Long Context Model Win?
Anthropic's Claude 3.5 Sonnet outperforms GPT-4o in long-context tasks, with strengths in summarization and code analysis.

GPT and Claude failed Bridgewater's finance tests because the right answers were never public
Bridgewater and Thinking Machines Lab have developed a fine-tuned open-source AI model that outperforms commercial models in financial document analysis. The model, a fine-tuned Qwen3-235B, achieved 84.7 percent accuracy at a significantly lower cost. The result shows that companies can build powerful AI solutions from their own proprietary data.

Meta's AI agent push is moving slower than Zuckerberg planned
Meta's AI agent development is progressing more slowly than expected, according to CEO Mark Zuckerberg. Despite this, the company's AI chief, Alexandr Wang, expressed optimism about upcoming models. Meta is investing heavily in AI infrastructure, with plans to spend up to $145 billion this year.

A behind-the-scenes look at Midjourney’s medical scanner leaves many questions unanswered
Midjourney showcased its medical ultrasound scanner in a behind-the-scenes video, but experts question its ability to overcome ultrasound limitations and generate detailed images at scale. The scanner is planned for launch as a wellness product focused on body composition.

Presentation: Fine Tuning the Enterprise: Reinforcement Learning in Practice
OpenAI introduced Agent RFT, a platform for fine-tuning models through reinforcement learning and real-time tool interactions. It aims to improve agent performance in complex tasks.

Anthropic says it cut 80 percent of Claude Code's system prompt because Fable 5 models "want a smaller system prompt"
Anthropic reduced Claude Code's system prompt by 80 percent due to the new Fable 5 models' preference for smaller prompts. This change marks a shift in how AI models are steered, with a focus on context over hard rules. The update allows for more imaginative responses.

Meta Watermelon 🍉, Anthropic Samsung chips 🤝, autoresearch in practice 📈
Meta's new model, Watermelon, matches GPT-5.5 on benchmarks. It is still in training and uses significant compute.

Leanstral 1.5: Proof Abundance for All
Mistral AI released Leanstral 1.5, a free and open-source model for formal verification in Lean 4, achieving state-of-the-art results on several benchmarks. The model excels in agentic proof engineering and real-world code verification, uncovering previously unknown bugs in open-source repositories. Leanstral 1.5 is available via Hugging Face and a free API.