Tag

#content-moderation

Every item tagged content-moderation, newest first.

5 items

ChatGPT's image generator can be manipulated to produce violent, sexual content

Researchers found that specific prompts can manipulate ChatGPT's DALL-E image generator into producing violent and sexual content by bypassing the model's safety filters, which raises concerns about misuse. Builders integrating image generation should assess their content moderation risks. The finding highlights the need for robust safeguards in AI systems.

Key takeaways

ChatGPT's DALL-E can produce violent and sexual content.
Safety filters can be bypassed with specific prompts.
Builders should assess content moderation risks in image generation.

Hacker News · 94 pts · #ai-safety #image-generation #content-moderation

other · 19h

A study found 59% of the videos TikTok serves new accounts are AI "slop"

A study by Kapwing found that 59% of videos served to new TikTok accounts were AI-generated or low-effort content, roughly three times the rate on YouTube Shorts. The share was highest in kids' content: 97% of videos under the #CartoonKids tag were AI-generated. TikTok offers an option to see less AI content.

Key takeaways

59% of videos served to new TikTok accounts are AI-generated or low-effort content.
97% of kids' videos under #CartoonKids are AI-generated.
TikTok offers a 'see less AI content' option.

r/artificial · #ai-content #social-media #content-moderation

models · Jun 9

Anthropic says these topics are too dangerous to let its Fable 5 model talk about

Anthropic's Fable 5 model now refuses to engage with certain topics deemed too risky, including cybersecurity, biology, and chemistry. The restriction aims to prevent misuse and potential harm. Similar content moderation can be expected in other frontier models. The specific topics blocked may change over time as Anthropic refines its approach.

Key takeaways

Fable 5 blocks cybersecurity, biology, and chemistry queries.
Model proactively refuses to engage on risky topics.
Anthropic may expand or change blocked topics over time.

Ars Technica · #ai-safety #content-moderation #frontier-models

models · Apr 29

Welcoming Llama Guard 4 on Hugging Face Hub

Meta has released Llama Guard 4, a new content moderation model available on the Hugging Face Hub. Llama Guard 4 helps detect and filter harmful content. You can deploy it locally or access it through an API.

Key takeaways

Llama Guard 4 is available on the Hugging Face Hub.
The model detects and filters harmful content.
Local deployment and API access are supported.

Hugging Face Blog · #content-moderation #open-source #hugging-face

other · Mar 1

How Hugging Face Accelerated Development of Witty Works Writing Assistant

Hugging Face accelerated development of Witty Works' writing assistant by integrating text classification models, which enabled faster iteration and more accurate content moderation. Builders can apply similar strategies to enhance their own applications with pre-trained models.

Key takeaways

Hugging Face integrated text classification models for Witty Works.
Enabled faster iteration and more accurate content moderation.
Builders can apply similar strategies to their applications.

Hugging Face Blog · #hugging-face #text-classification #content-moderation