Responsible AI
Every story we’ve tagged Responsible AI.

Claude Code's complicated China problem involves bans on both sides of the Pacific
Anthropic is restricting access to its AI coding tool Claude Code for Chinese companies, while Alibaba is banning its employees from using the tool over concerns about hidden code that could flag Chinese users.

Security vulnerability reports have exploded since AI models started hunting for bugs
Anthropic's Claude Mythos Preview model has led to a surge in reported security vulnerabilities, with over 1,500 high-severity and critical vulnerabilities reported in June 2026. This follows Anthropic's announcement in April 2026 that its model can find software vulnerabilities on its own.

We can debate the ethics of AI but can’t seem to change course | Letters
The development of AI is being driven by commercial and geopolitical interests, potentially setting its course before society has had a chance to debate its direction. This raises concerns about the ethics and values guiding AI development.

UK's AI Security Institute finds standard benchmarks systematically underestimate what AI agents can actually do
The UK's AI Security Institute found that standard benchmarks underestimate AI agents' capabilities when computing budgets are limited. The study showed that increasing the token budget can improve success rates by up to 25%.

Google DeepMind and A24 announce first-of-its-kind research partnership
Google DeepMind and A24 have partnered to research and develop new creative tools and techniques for artists. The collaboration aims to shape future technology and expand storytelling possibilities in entertainment.

Pragmatic FDT, and predictors as game theory
Stuart Armstrong proposed a pragmatic version of Functional Decision Theory (FDT) that focuses on exploitable isomorphisms between an agent's decision process and parts of the world. This approach sidesteps theoretical pitfalls and views predictors through the lens of game theory.

Meta's AI agent push is moving slower than Zuckerberg planned
Meta's AI agent development is progressing more slowly than expected, according to CEO Mark Zuckerberg. Even so, the company's AI chief, Alexandr Wang, expressed optimism about upcoming models. Meta is investing heavily in AI infrastructure, with plans to spend up to $145 billion this year.

UK parents warned over posting images of children amid AI sexual abuse fears
The UK National Crime Agency and Internet Watch Foundation have warned parents about posting images of their children online due to the increasing risk of AI-generated child sexual abuse material. The guidance advises parents to make their social media accounts private and be cautious about sharing images of their children online.

AI prey: why watchdogs are telling parents to protect children from nudification apps
The UK's National Crime Agency and the Internet Watch Foundation have warned parents to protect their children's images online due to the rise of AI-generated child sexual abuse material. The amount of such material found online surged by 14% in 2025. Guidance has been issued to help parents mitigate the risk.

Jul 2, 2026 Announcements More details on Fable 5’s cyber safeguards and our jailbreak framework
Anthropic has provided more information on Fable 5's cybersecurity safeguards and proposed a framework for evaluating the severity of AI jailbreaks. The company aims to balance preventing misuse with allowing defensive uses of the technology. This move is part of a broader effort to establish industry standards for AI safety.

Australia news live: minister says it’s safe to ‘keep buying eggs, keep buying chicken’ as east coast records first suspected bird flu case
The shadow minister for the arts, Angie Bell, emphasizes the need for AI companies to obtain permission and pay for using Australian creative works, highlighting the importance of copyright protections.

‘Don’t kill music’: Anthony Albanese’s favourite bands beg PM to stop AI companies from stealing their work
The Australian government is considering a proposal to weaken copyright laws, allowing tech companies to use Australian musicians' work to improve AI models. Musicians, including those favored by Prime Minister Anthony Albanese, are speaking out against the proposal, citing concerns about fair compensation and the impact on the music industry.

Anthropic says it cut 80 percent of Claude Code's system prompt because Fable 5 models "want a smaller system prompt"
Anthropic reduced Claude Code's system prompt by 80 percent due to the new Fable 5 models' preference for smaller prompts. This change marks a shift in how AI models are steered, with a focus on context over hard rules. The update allows for more imaginative responses.

Mark Zuckerberg tells staff that AI agents haven’t progressed as quickly as he’d hoped
Mark Zuckerberg told Meta staff that AI agent development has not progressed as quickly as hoped, despite significant investment in AI infrastructure.

“AI contributions are demoralizing”: Godot bans coding agents to save its mentoring model
Godot has banned coding agents in order to preserve the effectiveness of its mentoring model, citing concerns that AI contributions were demoralizing.

How Amazon Bedrock catches AI-generated phishing
Amazon describes how Amazon Bedrock, its managed service for large-scale general-purpose AI models, can analyze behavioral patterns in email content to detect phishing attempts. The approach targets the growing threat of AI-generated phishing emails that can bypass traditional security filters.

Best practices for multi-turn reinforcement learning in Amazon SageMaker AI
Amazon SageMaker AI shares best practices for multi-turn reinforcement learning, including environment design and reward alignment. The goal is to improve the reliability of agentic RL training. These practices draw from the SOP-Bench dataset and focus on trustworthy environments and evaluation.

More compute, more capability: Why AI agent evaluations need to account for test-time compute
AISI's research highlights the importance of accounting for test-time compute in AI agent evaluations, as fixed budgets can underestimate capabilities, especially for newer models. Increasing compute can improve performance, and the benefits are more significant for more advanced models.

You Can Now Sound the Alarm on AI Behaving Badly
A new platform called FLARE-AI lets users report harmful AI behavior and model flaws, aiming to improve transparency and accountability in AI development. WIRED covered the launch, with input from CSET's Jessica Ji, and it is described as a positive step toward more transparent AI.
