1sec.ai

Tag

#multi-speaker

Every item tagged multi-speaker, newest first.

1 item

Reference-Driven Multi-Speaker Audio Scene Generation from In-the-Wild Priors

Researchers propose ScenA, a method for multi-speaker audio scene generation that conditions a text-to-audio model directly on reference voices. Unlike prior methods, ScenA uses in-the-wild data and skips structured supervision such as per-turn tags or learnable embeddings, producing more realistic ambient conversations. The approach could make dialogue systems sound more natural, and builders may find it worth exploring for applications such as voice assistants or audio content creation.

Key takeaways
  • ScenA uses in-the-wild data for multi-speaker audio generation.
  • Skips structured supervision such as per-turn tags or learnable embeddings.
  • Conditions the text-to-audio model directly on reference voices for realism.