gpt-oss-safeguard technical report
OpenAI released a technical report on GPT-OSS-Safeguard, two open-weight models (120B, 20B) for content labeling under a given policy. The models are post-trained from GPT-OSS and evaluated on safety benchmarks. You can use these models to build applications that require policy-based content moderation. The report provides baseline safety evaluations and comparisons to the underlying GPT-OSS models.
Key takeaways
- Two open-weight models: 120B and 20B parameters.
- Post-trained from GPT-OSS models for policy-based labeling.
- Baseline safety evaluations provided in the report.
OpenAI released a technical report on GPT-OSS-Safeguard, two open-weight models (120B, 20B) for content labeling under a given policy. The models are post-trained from GPT-OSS and evaluated on safety benchmarks. You can use these models to build applications that require policy-based content moderation. The report provides baseline safety evaluations and comparisons to the underlying GPT-OSS models.
Key takeaways
- Two open-weight models: 120B and 20B parameters.
- Post-trained from GPT-OSS models for policy-based labeling.
- Baseline safety evaluations provided in the report.