1sec.ai
research · 1d ago

Variable-Width Transformers

arXiv · score 0.24

This work empirically investigates nonuniform capacity allocation across network depth. The proposed ×-Transformer, whose width varies across layers, consistently outperforms parameter-matched constant-width baselines on a range of tasks, suggesting that nonuniform capacity allocation improves performance.

Key takeaways

  • Most transformer architectures maintain constant width across all layers.
  • The proposed ×-Transformer consistently outperforms parameter-matched baselines.
  • Nonuniform capacity allocation improves performance on a range of tasks.
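
To make "parameter-matched" concrete, here is a minimal sketch of how a constant-width baseline could be sized against a variable-width schedule. It is an illustration only, not the paper's method: the summary does not say how widths are scheduled or how layers of different width are connected. The width schedule, the function names, and the per-block estimate of roughly 12·d² weights (4·d² for the attention projections plus 8·d² for an MLP with 4x expansion, ignoring biases, norms, embeddings, and any projections between layers of different width) are all assumptions.

```python
import math


def block_params(width: int) -> int:
    """Approximate weight count of one transformer block of the given width.

    Assumption: 4*d^2 for the Q, K, V and output projections plus 8*d^2 for
    an MLP with 4x expansion. Biases, norms and embeddings are ignored.
    """
    return 12 * width * width


def total_params(widths) -> int:
    """Sum the approximate block parameters over a per-layer width schedule."""
    return sum(block_params(w) for w in widths)


def matched_constant_width(widths) -> int:
    """Constant width whose total is closest to the schedule's total.

    The baseline keeps the same depth; only the (uniform) width is chosen.
    """
    target = total_params(widths)
    depth = len(widths)
    exact = math.sqrt(target / (12 * depth))
    # The total grows monotonically with width, so the best integer width
    # is one of the two integers surrounding the exact solution.
    candidates = (math.floor(exact), math.ceil(exact))
    return min(candidates, key=lambda w: abs(total_params([w] * depth) - target))


# Hypothetical schedule: narrow at the ends, wide in the middle.
schedule = [256, 384, 512, 512, 384, 256]
baseline_width = matched_constant_width(schedule)

print("variable-width params:", total_params(schedule))
print("matched constant width:", baseline_width)
print("constant-width params:", total_params([baseline_width] * len(schedule)))
```

Under these assumptions, both models have the same depth and nearly the same weight budget, so any difference in task performance can be attributed to where the capacity is placed rather than to how much of it there is.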